xvc pipeline step dependency

Purpose

Define a dependency to an existing step in the pipeline.

Synopsis

$ xvc pipeline step dependency --help
Add a dependency to a step

Usage: xvc pipeline step dependency [OPTIONS] --step-name <STEP_NAME>

Options:
  -s, --step-name <STEP_NAME>
          Name of the step to add the dependency to

      --generic <GENERICS>
          Add a generic command output as a dependency. Can be used multiple times. Please delimit the command with ' ' to avoid shell expansion

      --url <URLS>
          Add a URL dependency to the step. Can be used multiple times

      --file <FILES>
          Add a file dependency to the step. Can be used multiple times

      --step <STEPS>
          Add a step dependency to a step. Can be used multiple times. Steps are referred with their names

      --glob_items <GLOB_ITEMS>
          Add a glob items dependency to the step.
          
          You can depend on multiple files and directories with this dependency.
          
          The difference between this and the glob option is that this option keeps track of all matching files, but glob only keeps track of the matched files' digest. When you want to use ${XVC_GLOB_ITEMS}, ${XVC_ADDED_GLOB_ITEMS}, or ${XVC_REMOVED_GLOB_ITEMS} environment variables in the step command, use the glob-items dependency. Otherwise, you can use the glob option to save disk space.

      --glob <GLOBS>
          Add a glob dependency to the step. Can be used multiple times.
          
          You can depend on multiple files and directories with this dependency.
          
          The difference between this and the glob-items option is that the glob-items option keeps track of all matching files individually, but this option only keeps track of the matched files' digest. This dependency uses considerably less disk space.

      --param <PARAMS>
          Add a parameter dependency to the step in the form filename.yaml::model.units . Can be used multiple times

      --regex_items <REGEX_ITEMS>
          Add a regex dependency in the form filename.txt:/^regex/ . Can be used multiple times.
          
          The difference between this and the regex option is that the regex-items option keeps track of all matching lines, but regex only keeps track of the matched lines' digest. When you want to use ${XVC_REGEX_ITEMS}, ${XVC_ADDED_REGEX_ITEMS}, ${XVC_REMOVED_REGEX_ITEMS} environment variables in the step command, use the regex option. Otherwise, you can use the regex-digest option to save disk space.

      --regex <REGEXES>
          Add a regex dependency in the form filename.txt:/^regex/ . Can be used multiple times.
          
          The difference between this and the regex option is that the regex option keeps track of all matching lines that can be used in the step command. This option only keeps track of the matched lines' digest.

      --line_items <LINE_ITEMS>
          Add a line dependency in the form filename.txt::123-234
          
          The difference between this and the lines option is that the line-items option keeps track of all matching lines that can be used in the step command. This option only keeps track of the matched lines' digest. When you want to use ${XVC_ALL_LINE_ITEMS}, ${XVC_ADDED_LINE_ITEMS}, ${XVC_CHANGED_LINE_ITEMS} options in the step command, use the line option. Otherwise, you can use the lines option to save disk space.

      --lines <LINES>
          Add a line digest dependency in the form filename.txt::123-234
          
          The difference between this and the line-items dependency is that the line option keeps track of all matching lines that can be used in the step command. This option only keeps track of the matched lines' digest. If you don't need individual lines to be kept, use this option to save space.

  -h, --help
          Print help (see a summary with '-h')

Generic Command Dependencies

This command works only in Xvc repositories.

$ git init
...
$ xvc init

You can use the output of a shell command as a dependency to a step. Xvc runs the command, saves the hash of its output, and invalidates the step when that output changes.

You can use this for any command that outputs a string.

$ xvc pipeline step new --step-name morning-message --command "echo 'Good Morning!'"

$ xvc pipeline step dependency --step-name morning-message --generic 'date +%F'

When the date changes, the step is invalidated and runs again.

$ xvc pipeline run
[OUT] [morning-message] Good Morning!
 
[DONE] morning-message (echo 'Good Morning!')

The step won't run until tomorrow, when date +%F changes.

$ xvc pipeline run
[OUT] [morning-message] Good Morning!
 
[DONE] morning-message (echo 'Good Morning!')

You can mimic all kinds of pipeline behavior with this generic dependency.

For example, if you want to run a command when directory contents change, you can depend on the output of ls:

$ xvc pipeline step new --step-name directory-contents --command "echo 'Files changed'"
$ xvc pipeline step dependency --step-name directory-contents --generic 'ls'

$ xvc pipeline run
[OUT] [directory-contents] Files changed
 
[DONE] directory-contents (echo 'Files changed')

When you add a file to the directory, the step is invalidated and run again:

$ xvc pipeline run

$ xvc-test-helper generate-random-file new-file.txt
$ xvc pipeline run
[OUT] [directory-contents] Files changed
 
[DONE] directory-contents (echo 'Files changed')
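
The comparison behind a generic dependency can be sketched with standard tools. This is only an illustration of the idea, not Xvc's implementation: the `check_generic` function, the `last-digest` state file, and the use of `cksum` as the digest are all assumptions made for the example.

```shell
# Sketch only: cksum stands in for the real hash, and the digest is kept in
# a plain file. Both are assumptions for illustration, not Xvc internals.
workdir=$(mktemp -d)
state="$workdir/last-digest"
mkdir "$workdir/data"

# Run the command in $1, store the digest of its output, and report whether
# the digest differs from the one stored by the previous call.
check_generic() {
    new=$(sh -c "$1" | cksum)
    old=$(cat "$state" 2>/dev/null)
    printf '%s\n' "$new" > "$state"
    if [ "$new" = "$old" ]; then echo unchanged; else echo invalidated; fi
}

first=$(check_generic "ls '$workdir/data'")   # nothing stored yet
second=$(check_generic "ls '$workdir/data'")  # same listing as before
touch "$workdir/data/new-file.txt"
third=$(check_generic "ls '$workdir/data'")   # listing has a new entry

echo "$first $second $third"   # prints: invalidated unchanged invalidated
rm -rf "$workdir"
```

Any command whose output changes when the thing you care about changes can play the role of `ls` here.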

File Dependencies

This command works only in Xvc repositories.

$ git init
...
$ xvc init

Begin by adding a new step.

$ xvc pipeline step new --step-name file-dependency --command "echo data.txt has changed"

Add a file dependency to the step.

$ xvc pipeline step dependency --step-name file-dependency --file data.txt

When you run the pipeline, the step prints data.txt has changed if the file data.txt has changed.

$ xvc pipeline run
[OUT] [file-dependency] data.txt has changed
 
[DONE] file-dependency (echo data.txt has changed)

You can add multiple dependencies to a step with multiple invocations.

$ xvc pipeline step dependency --step-name file-dependency --file data2.txt

A step will run if any of its dependencies have changed.

$ xvc pipeline run
[OUT] [file-dependency] data.txt has changed
 
[DONE] file-dependency (echo data.txt has changed)

By default, steps are not run if none of their dependencies have changed.

$ xvc pipeline run

However, if you want to run the step even if none of the dependencies have changed, you can set the --when option to always.

$ xvc pipeline step update --step-name file-dependency --when always

Now the step will run even if none of the dependencies have changed.

$ xvc pipeline run
[OUT] [file-dependency] data.txt has changed
 
[DONE] file-dependency (echo data.txt has changed)

Glob Dependencies

A step can depend on multiple files specified with globs. Unlike the glob-items dependency, this dependency doesn't track the matched files individually and doesn't pass the list of files to the command in environment variables.

This command works only in Xvc repositories.

$ git init
...
$ xvc init

Let's create a set of files:

$ xvc-test-helper create-directory-tree --directories 2 --files 3 --seed 2023

$ tree
.
├── dir-0001
│   ├── file-0001.bin
│   ├── file-0002.bin
│   └── file-0003.bin
└── dir-0002
    ├── file-0001.bin
    ├── file-0002.bin
    └── file-0003.bin

3 directories, 6 files

Add a step that reports when the files have changed.

$ xvc pipeline step new --step-name files-changed --command "echo 'Files have changed.'"

$ xvc pipeline step dependency --step-name files-changed --glob 'dir-*/*'

The step is invalidated when a file described by the glob is added, removed or changed.

$ xvc pipeline run
[OUT] [files-changed] Files have changed.
 
[DONE] files-changed (echo 'Files have changed.')

$ xvc pipeline run

When a file is removed from the files described by the glob, the step is invalidated.

$ rm dir-0001/file-0001.bin

$ xvc pipeline run
[OUT] [files-changed] Files have changed.
 
[DONE] files-changed (echo 'Files have changed.')
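
To see why a digest-only glob dependency can detect a change without knowing which file caused it, consider the following sketch. It is not Xvc's code: the `glob_digest` function and the use of `cksum` are assumptions for illustration.

```shell
# Sketch only: one digest summarises every file matched by a glob.
# cksum stands in for the real hash; this is not how Xvc stores digests.
workdir=$(mktemp -d)
for d in dir-0001 dir-0002; do
    mkdir "$workdir/$d"
    for f in file-0001.bin file-0002.bin file-0003.bin; do
        echo "$d/$f" > "$workdir/$d/$f"
    done
done

# Hash each matched file, then hash the sorted list of per-file results.
# $1 is left unquoted on purpose so the shell expands the glob pattern.
glob_digest() {
    ( cd "$workdir" && cksum $1 | sort | cksum )
}

before=$(glob_digest 'dir-*/*')
rm "$workdir/dir-0001/file-0001.bin"
after=$(glob_digest 'dir-*/*')

if [ "$before" != "$after" ]; then
    echo "glob dependency changed"
fi
rm -rf "$workdir"
```

Only the final digest needs to be kept between runs, which is why this form needs less space than tracking each file.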

Regex Dependencies

You can specify a regular expression matched against the lines from a file as a dependency. The step is invalidated when the matched results change.

This command works only in Xvc repositories.

$ git init
...
$ xvc init

We'll use a sample CSV file in this example:

$ cat people.csv
"Name",     "Sex", "Age", "Height (in)", "Weight (lbs)"
"Alex",       "M",   41,       74,      170
"Bert",       "M",   42,       68,      166
"Carl",       "M",   32,       70,      155
"Dave",       "M",   39,       72,      167
"Elly",       "F",   30,       66,      124
"Fran",       "F",   33,       66,      115
"Gwen",       "F",   26,       64,      121
"Hank",       "M",   30,       71,      158
"Ivan",       "M",   53,       72,      175
"Jake",       "M",   32,       69,      143
"Kate",       "F",   47,       69,      139
"Luke",       "M",   34,       72,      163
"Myra",       "F",   23,       62,       98
"Neil",       "M",   36,       75,      160
"Omar",       "M",   38,       70,      145
"Page",       "F",   31,       67,      135
"Quin",       "M",   29,       71,      176
"Ruth",       "F",   28,       65,      131


Now, let's add a step to the pipeline to count females in the file:

$ xvc pipeline step new --step-name count-females --command "grep -c '\"F\",' people.csv"

This command is run when the regex dependency changes.

$ xvc pipeline step dependency --step-name count-females --regex 'people.csv:/^.*"F",.*$'

When you run the pipeline initially, the step is run.

$ xvc pipeline run
[OUT] [count-females] 7
 
[DONE] count-females (grep -c '"F",' people.csv)

When you run the pipeline again, the step is not run because the regex result didn't change.

$ xvc pipeline run

When you add a new female record to the file, the step is run and the command prints the new count.

$ zsh -c "echo '\"Asude\",      \"F\",   12,       55,      110' >> people.csv"

$ cat people.csv
"Name",     "Sex", "Age", "Height (in)", "Weight (lbs)"
"Alex",       "M",   41,       74,      170
"Bert",       "M",   42,       68,      166
"Carl",       "M",   32,       70,      155
"Dave",       "M",   39,       72,      167
"Elly",       "F",   30,       66,      124
"Fran",       "F",   33,       66,      115
"Gwen",       "F",   26,       64,      121
"Hank",       "M",   30,       71,      158
"Ivan",       "M",   53,       72,      175
"Jake",       "M",   32,       69,      143
"Kate",       "F",   47,       69,      139
"Luke",       "M",   34,       72,      163
"Myra",       "F",   23,       62,       98
"Neil",       "M",   36,       75,      160
"Omar",       "M",   38,       70,      145
"Page",       "F",   31,       67,      135
"Quin",       "M",   29,       71,      176
"Ruth",       "F",   28,       65,      131

"Asude",      "F",   12,       55,      110

$ xvc pipeline run
[OUT] [count-females] 8
 
[DONE] count-females (grep -c '"F",' people.csv)
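
The effect of a regex dependency can be approximated by hashing only the lines that match. The sketch below uses a shortened, hypothetical version of people.csv; `regex_digest` and `cksum` are assumptions for illustration and do not reflect Xvc internals.

```shell
# Sketch only: digest of the lines matching a regex, with cksum standing in
# for the real hash. The three-column CSV is a shortened stand-in.
workdir=$(mktemp -d)
csv="$workdir/people.csv"
printf '%s\n' '"Alex", "M", 41' '"Elly", "F", 30' > "$csv"
pattern='^.*"F",.*$'

# Digest of the matching lines only; other lines do not contribute.
regex_digest() { grep -E "$pattern" "$csv" | cksum; }

d1=$(regex_digest)
echo '"Bert", "M", 42' >> "$csv"     # does not match the pattern
d2=$(regex_digest)
echo '"Asude", "F", 12' >> "$csv"    # matches the pattern
d3=$(regex_digest)

[ "$d1" = "$d2" ] && echo "male record added: unchanged"
[ "$d2" != "$d3" ] && echo "female record added: invalidated"
rm -rf "$workdir"
```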

Line Dependencies

You can make your steps depend on lines of text files. The lines are defined by starting and ending indices.

When the text in those lines changes, the step is invalidated.

This command works only in Xvc repositories.

$ git init
...
$ xvc init

We'll use a sample CSV file in this example:

$ cat people.csv
"Name",     "Sex", "Age", "Height (in)", "Weight (lbs)"
"Alex",       "M",   41,       74,      170
"Bert",       "M",   42,       68,      166
"Carl",       "M",   32,       70,      155
"Dave",       "M",   39,       72,      167
"Elly",       "F",   30,       66,      124
"Fran",       "F",   33,       66,      115
"Gwen",       "F",   26,       64,      121
"Hank",       "M",   30,       71,      158
"Ivan",       "M",   53,       72,      175
"Jake",       "M",   32,       69,      143
"Kate",       "F",   47,       69,      139
"Luke",       "M",   34,       72,      163
"Myra",       "F",   23,       62,       98
"Neil",       "M",   36,       75,      160
"Omar",       "M",   38,       70,      145
"Page",       "F",   31,       67,      135
"Quin",       "M",   29,       71,      176
"Ruth",       "F",   28,       65,      131


Let's add a step to show the first 10 lines of the file:

$ xvc pipeline step new --step-name print-top-10 --command "head people.csv"

The command is run only when those lines change.

$ xvc pipeline step dependency --step-name print-top-10 --lines 'people.csv::1-10'

When you run the pipeline initially, the step is run.

$ xvc pipeline run
[OUT] [print-top-10] "Name",     "Sex", "Age", "Height (in)", "Weight (lbs)"
"Alex",       "M",   41,       74,      170
"Bert",       "M",   42,       68,      166
"Carl",       "M",   32,       70,      155
"Dave",       "M",   39,       72,      167
"Elly",       "F",   30,       66,      124
"Fran",       "F",   33,       66,      115
"Gwen",       "F",   26,       64,      121
"Hank",       "M",   30,       71,      158
"Ivan",       "M",   53,       72,      175
 
[DONE] print-top-10 (head people.csv)

When you run the pipeline again, the step is not run because the specified lines didn't change.

$ xvc pipeline run

When you change a line in the file, the step is invalidated.

$ perl -i -pe 's/Hank/Ferzan/g' people.csv

Now, when you run the pipeline, it will print the first 10 lines again.

$ xvc pipeline run
[OUT] [print-top-10] "Name",     "Sex", "Age", "Height (in)", "Weight (lbs)"
"Alex",       "M",   41,       74,      170
"Bert",       "M",   42,       68,      166
"Carl",       "M",   32,       70,      155
"Dave",       "M",   39,       72,      167
"Elly",       "F",   30,       66,      124
"Fran",       "F",   33,       66,      115
"Gwen",       "F",   26,       64,      121
"Ferzan",       "M",   30,       71,      158
"Ivan",       "M",   53,       72,      175
 
[DONE] print-top-10 (head people.csv)
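
A line dependency only looks at the given range, so edits outside the range leave it untouched. The following sketch illustrates that idea under stated assumptions: `lines_digest`, the generated data.txt file, and `cksum` as the digest are hypothetical and not part of Xvc.

```shell
# Sketch only: digest of a line range, with cksum standing in for the
# real hash. data.txt is a generated 20-line file used for illustration.
workdir=$(mktemp -d)
file="$workdir/data.txt"
awk 'BEGIN { for (i = 1; i <= 20; i++) print "row " i }' > "$file"

# Digest of lines $1..$2 of the file.
lines_digest() { sed -n "$1,$2p" "$file" | cksum; }

d1=$(lines_digest 1 10)
# Edit line 15, which is outside the tracked range.
sed '15s/.*/edited/' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
d2=$(lines_digest 1 10)
# Edit line 9, which is inside the tracked range.
sed '9s/.*/edited/' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
d3=$(lines_digest 1 10)

[ "$d1" = "$d2" ] && echo "line 15 edited: unchanged"
[ "$d2" != "$d3" ] && echo "line 9 edited: invalidated"
rm -rf "$workdir"
```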

Glob Items Dependency

A step can depend on multiple files specified with globs. When any of the files change, or a new file is added or removed from the files specified by glob, the step is invalidated.

Unlike the glob dependency, the glob items dependency keeps track of the individual files that match a glob. If your command runs with the list of files from a glob and you want to track added and removed files, use this dependency. Otherwise, if your command runs for all the files in a glob and you don't need to track which files have changed, use the glob dependency.

This dependency injects ${XVC_GLOB_ADDED_ITEMS}, ${XVC_GLOB_REMOVED_ITEMS}, ${XVC_GLOB_CHANGED_ITEMS} and ${XVC_GLOB_ALL_ITEMS} into the command environment.

This command works only in Xvc repositories.

$ git init
...
$ xvc init

Let's create a set of files:

$ xvc-test-helper create-directory-tree --directories 2 --files 3 --seed 2023

$ tree
.
├── dir-0001
│   ├── file-0001.bin
│   ├── file-0002.bin
│   └── file-0003.bin
└── dir-0002
    ├── file-0001.bin
    ├── file-0002.bin
    └── file-0003.bin

3 directories, 6 files

Add a step to list the added, removed, and changed files.

$ xvc pipeline step new --step-name files-changed --command 'echo "### Added Files:\n${XVC_GLOB_ADDED_ITEMS}\n### Removed Files:\n${XVC_GLOB_REMOVED_ITEMS}\n### Changed Files:\n${XVC_GLOB_CHANGED_ITEMS}"'

$ xvc pipeline step dependency --step-name files-changed --glob-items 'dir-*/*'

The step is invalidated when a file described by the glob is added, removed or changed.

$ xvc pipeline run
[OUT] [files-changed] ### Added Files:
dir-0001/file-0001.bin
dir-0001/file-0002.bin
dir-0001/file-0003.bin
dir-0002/file-0001.bin
dir-0002/file-0002.bin
dir-0002/file-0003.bin
### Removed Files:

### Changed Files:

 
[DONE] files-changed (echo "### Added Files:\n${XVC_GLOB_ADDED_ITEMS}\n### Removed Files:\n${XVC_GLOB_REMOVED_ITEMS}\n### Changed Files:\n${XVC_GLOB_CHANGED_ITEMS}")

$ xvc pipeline run

If you add or remove a file matched by the glob, it is printed in the corresponding list.

$ rm dir-0001/file-0001.bin

$ xvc pipeline run
[OUT] [files-changed] ### Added Files:

### Removed Files:
dir-0001/file-0001.bin
### Changed Files:

 
[DONE] files-changed (echo "### Added Files:\n${XVC_GLOB_ADDED_ITEMS}\n### Removed Files:\n${XVC_GLOB_REMOVED_ITEMS}\n### Changed Files:\n${XVC_GLOB_CHANGED_ITEMS}")

When you change a file, it's printed in the changed files:

$ xvc-test-helper generate-filled-file dir-0001/file-0002.bin

$ xvc pipeline run
[OUT] [files-changed] ### Added Files:

### Removed Files:

### Changed Files:
dir-0001/file-0002.bin
 
[DONE] files-changed (echo "### Added Files:\n${XVC_GLOB_ADDED_ITEMS}\n### Removed Files:\n${XVC_GLOB_REMOVED_ITEMS}\n### Changed Files:\n${XVC_GLOB_CHANGED_ITEMS}")
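
Tracking items individually amounts to comparing two snapshots of path and digest pairs. The sketch below shows one way to derive the added, removed, and changed lists with `comm` and `join`. It is an illustration under assumptions, not Xvc's implementation: the `snapshot` function, the snapshot file names, and `cksum` as the digest are hypothetical.

```shell
# Sketch only: derive added, removed and changed files from two snapshots.
# cksum stands in for the real hash; snapshot files are an assumption.
export LC_ALL=C   # keep sort, comm and join ordering consistent
workdir=$(mktemp -d)
mkdir -p "$workdir/data/dir-0001"
for f in file-0001.bin file-0002.bin file-0003.bin; do
    echo "original" > "$workdir/data/dir-0001/$f"
done

# Write "path digest" lines for every file matched by the glob to $workdir/$1.
snapshot() {
    ( cd "$workdir/data" && cksum dir-*/* ) |
        awk '{ print $3, $1 ":" $2 }' | sort > "$workdir/$1"
}

snapshot old
rm "$workdir/data/dir-0001/file-0001.bin"
echo "changed content" > "$workdir/data/dir-0001/file-0002.bin"
echo "original" > "$workdir/data/dir-0001/file-0004.bin"
snapshot new

cut -d' ' -f1 "$workdir/old" > "$workdir/old.paths"
cut -d' ' -f1 "$workdir/new" > "$workdir/new.paths"
added=$(comm -13 "$workdir/old.paths" "$workdir/new.paths")
removed=$(comm -23 "$workdir/old.paths" "$workdir/new.paths")
# Paths present in both snapshots whose digests differ.
changed=$(join "$workdir/old" "$workdir/new" | awk '$2 != $3 { print $1 }')

printf 'added: %s\nremoved: %s\nchanged: %s\n' "$added" "$removed" "$changed"
rm -rf "$workdir"
```

Keeping the per-file snapshot is what makes the three lists possible, and it is also why this dependency type needs more space than the digest-only glob dependency.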

Regex Item Dependencies

You can specify a regular expression matched against the lines from a file as a dependency. The step is invalidated when the matched results change.

Unlike regex dependencies, regex item dependencies keep track of the matched items. You can access them with ${XVC_REGEX_ALL_ITEMS}, ${XVC_REGEX_ADDED_ITEMS}, and ${XVC_REGEX_REMOVED_ITEMS} environment variables.

This command works only in Xvc repositories.

$ git init
...
$ xvc init

We'll use a sample CSV file in this example:

$ cat people.csv
"Name",     "Sex", "Age", "Height (in)", "Weight (lbs)"
"Alex",       "M",   41,       74,      170
"Bert",       "M",   42,       68,      166
"Carl",       "M",   32,       70,      155
"Dave",       "M",   39,       72,      167
"Elly",       "F",   30,       66,      124
"Fran",       "F",   33,       66,      115
"Gwen",       "F",   26,       64,      121
"Hank",       "M",   30,       71,      158
"Ivan",       "M",   53,       72,      175
"Jake",       "M",   32,       69,      143
"Kate",       "F",   47,       69,      139
"Luke",       "M",   34,       72,      163
"Myra",       "F",   23,       62,       98
"Neil",       "M",   36,       75,      160
"Omar",       "M",   38,       70,      145
"Page",       "F",   31,       67,      135
"Quin",       "M",   29,       71,      176
"Ruth",       "F",   28,       65,      131


Now, let's add steps to the pipeline to list the newly added male and female records in the file:

$ xvc pipeline step new --step-name new-males --command 'echo "New Males:\n ${XVC_REGEX_ADDED_ITEMS}"'
$ xvc pipeline step new --step-name new-females --command 'echo "New Females:\n ${XVC_REGEX_ADDED_ITEMS}"'
$ xvc pipeline step dependency --step-name new-females --step new-males

We also added a step dependency so that the steps always run in the same order.

These commands are run when the lines matched by the following regexes change.

$ xvc pipeline step dependency --step-name new-males --regex-items 'people.csv:/^.*"M",.*$'

$ xvc pipeline step dependency --step-name new-females --regex-items 'people.csv:/^.*"F",.*$'

When you run the pipeline initially, the steps are run.

$ xvc pipeline run
[OUT] [new-males] New Males:
 "Alex",       "M",   41,       74,      170
"Bert",       "M",   42,       68,      166
"Carl",       "M",   32,       70,      155
"Dave",       "M",   39,       72,      167
"Hank",       "M",   30,       71,      158
"Ivan",       "M",   53,       72,      175
"Jake",       "M",   32,       69,      143
"Luke",       "M",   34,       72,      163
"Neil",       "M",   36,       75,      160
"Omar",       "M",   38,       70,      145
"Quin",       "M",   29,       71,      176
 
[DONE] new-males (echo "New Males:\n ${XVC_REGEX_ADDED_ITEMS}")
[OUT] [new-females] New Females:
 "Elly",       "F",   30,       66,      124
"Fran",       "F",   33,       66,      115
"Gwen",       "F",   26,       64,      121
"Kate",       "F",   47,       69,      139
"Myra",       "F",   23,       62,       98
"Page",       "F",   31,       67,      135
"Ruth",       "F",   28,       65,      131
 
[DONE] new-females (echo "New Females:\n ${XVC_REGEX_ADDED_ITEMS}")

When you run the pipeline again, the steps are not run because the regexes didn't change.

$ xvc pipeline run

When you add a new female record to the file, only the new-females step is run.

$ zsh -c "echo '\"Asude\",      \"F\",   12,       55,      110' >> people.csv"

$ cat people.csv
"Name",     "Sex", "Age", "Height (in)", "Weight (lbs)"
"Alex",       "M",   41,       74,      170
"Bert",       "M",   42,       68,      166
"Carl",       "M",   32,       70,      155
"Dave",       "M",   39,       72,      167
"Elly",       "F",   30,       66,      124
"Fran",       "F",   33,       66,      115
"Gwen",       "F",   26,       64,      121
"Hank",       "M",   30,       71,      158
"Ivan",       "M",   53,       72,      175
"Jake",       "M",   32,       69,      143
"Kate",       "F",   47,       69,      139
"Luke",       "M",   34,       72,      163
"Myra",       "F",   23,       62,       98
"Neil",       "M",   36,       75,      160
"Omar",       "M",   38,       70,      145
"Page",       "F",   31,       67,      135
"Quin",       "M",   29,       71,      176
"Ruth",       "F",   28,       65,      131

"Asude",      "F",   12,       55,      110

$ xvc pipeline run
[OUT] [new-females] New Females:
 "Asude",      "F",   12,       55,      110
 
[DONE] new-females (echo "New Females:\n ${XVC_REGEX_ADDED_ITEMS}")

Line Item Dependencies

You can make your steps depend on lines of text files. The lines are defined by starting and ending indices.

When the text in those lines changes, the step is invalidated.

Unlike line dependencies, this dependency type keeps track of the individual lines in the file. You can use the ${XVC_LINE_ALL_ITEMS}, ${XVC_LINE_ADDED_ITEMS}, and ${XVC_LINE_REMOVED_ITEMS} environment variables in the command. Be aware that for a large set of lines, this dependency can take up considerable space to keep track of all lines. If you don't need to keep track of changed lines, use the --lines dependency instead.

This command works only in Xvc repositories.

$ git init
...
$ xvc init

We'll use a sample CSV file in this example:

$ cat people.csv
"Name",     "Sex", "Age", "Height (in)", "Weight (lbs)"
"Alex",       "M",   41,       74,      170
"Bert",       "M",   42,       68,      166
"Carl",       "M",   32,       70,      155
"Dave",       "M",   39,       72,      167
"Elly",       "F",   30,       66,      124
"Fran",       "F",   33,       66,      115
"Gwen",       "F",   26,       64,      121
"Hank",       "M",   30,       71,      158
"Ivan",       "M",   53,       72,      175
"Jake",       "M",   32,       69,      143
"Kate",       "F",   47,       69,      139
"Luke",       "M",   34,       72,      163
"Myra",       "F",   23,       62,       98
"Neil",       "M",   36,       75,      160
"Omar",       "M",   38,       70,      145
"Page",       "F",   31,       67,      135
"Quin",       "M",   29,       71,      176
"Ruth",       "F",   28,       65,      131


Let's add a step to show the added and removed lines among the first 10 lines of the file:

$ xvc pipeline step new --step-name print-top-10 --command 'echo "Added Lines:\n ${XVC_LINE_ADDED_ITEMS}\nRemoved Lines:\n${XVC_LINE_REMOVED_ITEMS}"'

The command is run only when those lines change.

$ xvc pipeline step dependency --step-name print-top-10 --line-items 'people.csv::1-10'

When you run the pipeline initially, the step is run.

$ xvc pipeline run
[OUT] [print-top-10] Added Lines:
 "Alex",       "M",   41,       74,      170
"Bert",       "M",   42,       68,      166
"Carl",       "M",   32,       70,      155
"Dave",       "M",   39,       72,      167
"Elly",       "F",   30,       66,      124
"Fran",       "F",   33,       66,      115
"Gwen",       "F",   26,       64,      121
"Hank",       "M",   30,       71,      158
"Ivan",       "M",   53,       72,      175
Removed Lines:

 
[DONE] print-top-10 (echo "Added Lines:\n ${XVC_LINE_ADDED_ITEMS}\nRemoved Lines:\n${XVC_LINE_REMOVED_ITEMS}")

When you run the pipeline again, the step is not run because the specified lines didn't change.

$ xvc pipeline run

When you change a line in the file, the step is invalidated.

$ perl -i -pe 's/Hank/Ferzan/g' people.csv

Now, when you run the pipeline, it will print the changed line, with its new and old versions.

$ xvc pipeline run
[OUT] [print-top-10] Added Lines:
 "Ferzan",       "M",   30,       71,      158
Removed Lines:
"Hank",       "M",   30,       71,      158
 
[DONE] print-top-10 (echo "Added Lines:\n ${XVC_LINE_ADDED_ITEMS}\nRemoved Lines:\n${XVC_LINE_REMOVED_ITEMS}")
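
A changed line shows up as one added line and one removed line because the old and new contents of the range are compared as sets of lines. The sketch below reproduces that behavior with `sed` and `comm` on a shortened, hypothetical CSV. The `lines_snapshot` function and the snapshot files are assumptions for illustration, not Xvc internals.

```shell
# Sketch only: compare the tracked line range before and after an edit.
# The four-line CSV and the snapshot files are assumptions for illustration.
export LC_ALL=C   # keep sort and comm ordering consistent
workdir=$(mktemp -d)
csv="$workdir/people.csv"
printf '%s\n' '"Name", "Sex"' '"Gwen", "F"' '"Hank", "M"' '"Ivan", "M"' > "$csv"

# Save the sorted content of lines $1..$2 of the CSV into the file $3.
lines_snapshot() { sed -n "$1,$2p" "$csv" | sort > "$3"; }

lines_snapshot 1 3 "$workdir/old"
# Replace Hank with Ferzan (a portable alternative to in-place editing).
sed 's/Hank/Ferzan/' "$csv" > "$csv.tmp" && mv "$csv.tmp" "$csv"
lines_snapshot 1 3 "$workdir/new"

added=$(comm -13 "$workdir/old" "$workdir/new")     # only in the new range
removed=$(comm -23 "$workdir/old" "$workdir/new")   # only in the old range
printf 'Added Lines:\n%s\nRemoved Lines:\n%s\n' "$added" "$removed"
rm -rf "$workdir"
```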

(Hyper-)Parameter Dependencies

You may be keeping pipeline-wide parameters in structured text files. You can specify such parameters found in JSON, TOML and YAML files as dependencies.

This command works only in Xvc repositories.

$ git init
...
$ xvc init

Suppose we have a YAML file, myparams.yaml, in which we specify various parameters:

param: value
database:
  server: example.com
  port: 5432
  connection:
    timeout: 5000
numeric_param: 13

Now, we create two steps to read different variables from the file, and a dependency between them so that they always run in the same order.

$ xvc pipeline step new --step-name read-database-config --command 'echo "Updated Database Configuration"'

$ xvc pipeline step new --step-name read-hyperparams --command 'echo "Update Hyperparameters"'

$ xvc pipeline step dependency --step-name read-database-config --step read-hyperparams

Let's add dependencies on different parts of this parameters file:

$ xvc pipeline step dependency --step-name read-database-config --param 'myparams.yaml::database.port' --param 'myparams.yaml::database.server' --param 'myparams.yaml::database.connection'

$ xvc pipeline step dependency --step-name read-hyperparams --param 'myparams.yaml::param' --param 'myparams.yaml::numeric_param'

Run for the first time, as initially all dependencies are invalid:

$ xvc pipeline run
[OUT] [read-hyperparams] Update Hyperparameters
 
[DONE] read-hyperparams (echo "Update Hyperparameters")
[OUT] [read-database-config] Updated Database Configuration
 
[DONE] read-database-config (echo "Updated Database Configuration")

For the second time, it won't read the configuration as nothing is changed:

$ xvc pipeline run

When you update a value in this file, it will only invalidate the steps that depend on the value, not other dependencies that rely on the same file.

Let's update the database port:

$ perl -pi -e 's/5432/9876/g' myparams.yaml

$ xvc pipeline run
[OUT] [read-database-config] Updated Database Configuration
 
[DONE] read-database-config (echo "Updated Database Configuration")

Note that read-hyperparams is not invalidated, even though its values are in the same file.
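
The reason only one step is invalidated is that each parameter dependency is compared by its own value rather than by the whole file. The sketch below imitates this with a deliberately crude lookup: `param_digest` matches a key name with `grep` instead of parsing YAML or following the full key path, and `cksum` stands in for the real hash. Both are assumptions for illustration only.

```shell
# Sketch only: per-key digests from a parameters file. The grep lookup is a
# crude stand-in for real YAML parsing and ignores the full key path.
workdir=$(mktemp -d)
params="$workdir/myparams.yaml"
cat > "$params" <<'EOF'
param: value
database:
  server: example.com
  port: 5432
  connection:
    timeout: 5000
numeric_param: 13
EOF

# Digest of the first line that defines the key named in $1.
param_digest() { grep "^ *$1:" "$params" | head -n 1 | cksum; }

port_before=$(param_digest port)
numeric_before=$(param_digest numeric_param)
sed 's/5432/9876/' "$params" > "$params.tmp" && mv "$params.tmp" "$params"
port_after=$(param_digest port)
numeric_after=$(param_digest numeric_param)

[ "$port_before" != "$port_after" ] && echo "database.port: invalidated"
[ "$numeric_before" = "$numeric_after" ] && echo "numeric_param: unchanged"
rm -rf "$workdir"
```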

Step Dependencies

This command works only in Xvc repositories.

$ git init
...
$ xvc init

You can add a step dependency to a step. These dependencies specify relationships between steps explicitly, without relying on changed files or directories.

$ xvc pipeline step new --step-name world --command "echo world"
$ xvc pipeline step new --step-name hello --command "echo hello"
$ xvc pipeline step dependency --step-name world --step hello

When the pipeline runs, the dependency (hello) runs first and the dependent step (world) runs after it.

$ xvc pipeline run
[OUT] [hello] hello
 
[DONE] hello (echo hello)
[OUT] [world] world
 
[DONE] world (echo world)

If the dependency is not run, the dependent step won't run either.

$ xvc pipeline step update --step-name hello --when never
$ xvc pipeline run

If you want the dependent step to run regardless, you can explicitly set it to always run.

$ xvc pipeline step update --step-name world --when always
$ xvc pipeline run
[OUT] [world] world
 
[DONE] world (echo world)
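
The run order implied by step dependencies is a topological order of the dependency graph. The standard `tsort` utility can illustrate the idea. This is only an analogy for the ordering, not how Xvc schedules steps.

```shell
# Sketch only: each input line is "DEPENDENCY DEPENDENT", meaning the first
# step must run before the second. tsort prints one valid run order.
order=$(printf '%s\n' 'hello world' | tsort)
echo "$order"
# prints:
# hello
# world
```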

URL Dependencies

This command works only in Xvc repositories.

$ git init
...
$ xvc init

You can use a web URL as a dependency to a step. Xvc fetches the URL, saves the hash of the response, and invalidates the step when the content at the URL changes.

You can use this with any URL.

$ xvc pipeline step new --step-name xvc-docs-update --command "echo 'Xvc docs updated!'"

$ xvc pipeline step dependency --step-name xvc-docs-update --url https://docs.xvc.dev/

The step is invalidated when the page is updated.

$ xvc pipeline run
[OUT] [xvc-docs-update] Xvc docs updated!
 
[DONE] xvc-docs-update (echo 'Xvc docs updated!')

The step won't run again until a new version of the page is published.

$ xvc pipeline run

Note that Xvc doesn't download the page every time. It checks the Last-Modified and ETag headers and only downloads the page if it has changed.

If there are more complex requirements than just the URL changing, you can use a generic dependency to get the output of a command and use that as a dependency.

Caveats

Tips

Most shells support editing longer commands with an editor. For bash, you can use Ctrl+X Ctrl+E.

Pipeline commands can get longer quickly. You can use xvc aliases for shorter versions. Type source $(xvc aliases) to load the aliases into your shell.