# questions
o
Please correct me if I’m wrong, but it looks like Kedro’s implementation overlooks the input dataset as a differentiating factor for an experiment. That is, Kedro doesn’t treat a run on a different input dataset as a different session/experiment run.
j
Do you mean in the context of Kedro-Viz experiment tracking?
o
Yes, for example. How do you store two experiments with the same code but different data side by side?
That's the simple execution/flow question. The second question is about methodology: how do you compare them?
j
Since the metrics and JSON outputs can be versioned, every time you do
kedro run
the tracking outputs you've defined will be saved in a different directory, named with the timestamp of the run: https://docs.kedro.org/en/stable/visualisation/experiment_tracking.html#generate-the-run-data
So the question is how to identify which input dataset was used for each run, am I right?
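One hedged way to make the input dataset identifiable per run is to log a fingerprint of the input file next to the metrics. The helper below (`describe_input` is a hypothetical name, not a Kedro API) is plain Python; the assumption is that a node would return this dict into a versioned tracking JSON dataset, so it ends up in the same timestamped folder as that run's metrics.

```python
import hashlib
import json
from pathlib import Path


def describe_input(path: str, chunk_size: int = 1 << 20) -> dict:
    """Return a JSON-serialisable record identifying an input file.

    Hypothetical helper: the SHA-256 digest changes whenever the file's
    bytes change, so two runs on different data get different records
    even if the file name is reused.
    """
    digest = hashlib.sha256()
    p = Path(path)
    with p.open("rb") as fh:
        # Read in fixed-size chunks so large Parquet/CSV files are not loaded at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return {
        "input_path": str(p),
        "input_sha256": digest.hexdigest(),
        "input_bytes": p.stat().st_size,
    }


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "my_first_dataset.csv"
        data.write_text("a,b\n1,2\n")
        # Print the record that would be stored alongside the run's metrics.
        print(json.dumps(describe_input(str(data)), indent=2))
```

Comparing `input_sha256` across runs then tells you whether two runs saw the same data, independently of the file name.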
o
Yes, and how to easily specify input datasets without modifying files, e.g.
kedro run --name=my_first_exp --input-dataset=my_first_dataset.parquet
kedro run --name=my_second_exp --input-dataset=my_second_dataset.parquet
kedro viz
Something along these lines.
j
Modular pipelines allow you to reuse the same pipeline structure with different inputs: https://docs.kedro.org/en/stable/nodes_and_pipelines/modular_pipelines.html#how-to-use-a-modular-pipeline-with-different-parameters
You could then define separate pipelines that run on the different inputs already declared in your catalog. This is similar to a question that was asked a few days ago: https://kedro-org.slack.com/archives/C03RKP2LW64/p1682343231008289
Also, @Ofir, you might want to use
kedro run --from-inputs
o
Thanks a lot!