xvc pipeline step dependency
Purpose
Define a dependency to an existing step in the pipeline.
Synopsis
$ xvc pipeline step dependency --help
Add a dependency to a step
Usage: xvc pipeline step dependency [OPTIONS] --step-name <STEP_NAME>
Options:
-s, --step-name <STEP_NAME>
Name of the step to add the dependency to
--generic <GENERICS>
Add a generic command output as a dependency. Can be used multiple times. Please delimit the command with ' ' to avoid shell expansion
--url <URLS>
Add a URL dependency to the step. Can be used multiple times
--file <FILES>
Add a file dependency to the step. Can be used multiple times
--step <STEPS>
Add a step dependency to a step. Can be used multiple times. Steps are referred with their names
--glob_items <GLOB_ITEMS>
Add a glob items dependency to the step.
You can depend on multiple files and directories with this dependency.
The difference between this and the glob option is that this option keeps track of all matching files, but glob only keeps track of the matched files' digest. When you want to use ${XVC_GLOB_ITEMS}, ${XVC_ADDED_GLOB_ITEMS}, or ${XVC_REMOVED_GLOB_ITEMS} environment variables in the step command, use the glob-items dependency. Otherwise, you can use the glob option to save disk space.
--glob <GLOBS>
Add a glob dependency to the step. Can be used multiple times.
You can depend on multiple files and directories with this dependency.
The difference between this and the glob-items option is that the glob-items option keeps track of all matching files individually, but this option only keeps track of the matched files' digest. This dependency uses considerably less disk space.
--param <PARAMS>
Add a parameter dependency to the step in the form filename.yaml::model.units
The file can be a JSON, TOML, or YAML file. You can specify hierarchical keys like my.dict.key
--regex_items <REGEX_ITEMS>
Add a regex dependency in the form filename.txt:/^regex/ . Can be used multiple times.
The difference between this and the regex option is that the regex-items option keeps track of all matching lines, but regex only keeps track of the matched lines' digest. When you want to use ${XVC_REGEX_ITEMS}, ${XVC_ADDED_REGEX_ITEMS}, ${XVC_REMOVED_REGEX_ITEMS} environment variables in the step command, use the regex-items option. Otherwise, you can use the regex option to save disk space.
--regex <REGEXES>
Add a regex dependency in the form filename.txt:/^regex/ . Can be used multiple times.
The difference between this and the regex-items option is that the regex-items option keeps track of all matching lines that can be used in the step command. This option only keeps track of the matched lines' digest.
--line_items <LINE_ITEMS>
Add a line dependency in the form filename.txt::123-234
The difference between this and the lines option is that the line-items option keeps track of all matching lines that can be used in the step command. The lines option only keeps track of the matched lines' digest. When you want to use ${XVC_ALL_LINE_ITEMS}, ${XVC_ADDED_LINE_ITEMS}, ${XVC_CHANGED_LINE_ITEMS} environment variables in the step command, use the line-items option. Otherwise, you can use the lines option to save disk space.
--lines <LINES>
Add a line digest dependency in the form filename.txt::123-234
The difference between this and the line-items dependency is that the line-items option keeps track of all matching lines that can be used in the step command. This option only keeps track of the matched lines' digest. If you don't need individual lines to be kept, use this option to save space.
--sqlite-query <SQLITE_FILE> <SQLITE_QUERY>
Add a sqlite query dependency to the step with the file and the query. Can be used once.
The step is invalidated when the query is run and its result differs from previous runs, e.g. when an aggregate changes or a new row is added to a table.
-h, --help
Print help (see a summary with '-h')
File Dependencies
This command works only in Xvc repositories.
$ git init
...
$ xvc init
Begin by adding a new step.
$ xvc pipeline step new --step-name file-dependency --command "echo data.txt has changed"
Add a file dependency to the step.
$ xvc pipeline step dependency --step-name file-dependency --file data.txt
When you run the pipeline, it prints data.txt has changed if the file data.txt has changed.
$ xvc pipeline run
[OUT] [file-dependency] data.txt has changed
[DONE] [file-dependency] (echo data.txt has changed)
You can add multiple dependencies to a step with multiple invocations.
$ xvc pipeline step dependency --step-name file-dependency --file data2.txt
A step will run if any of its dependencies have changed.
$ xvc pipeline run
[OUT] [file-dependency] data.txt has changed
[DONE] [file-dependency] (echo data.txt has changed)
By default, steps are not run if none of their dependencies have changed.
$ xvc pipeline run
However, if you want to run the step even if none of the dependencies have changed, you can set the --when option to always.
$ xvc pipeline step update --step-name file-dependency --when always
Now the step will run even if none of the dependencies have changed.
$ xvc pipeline run
[OUT] [file-dependency] data.txt has changed
[DONE] [file-dependency] (echo data.txt has changed)
Glob Dependencies
A step can depend on multiple files specified with globs. The difference between this and the glob-items dependency is that this one doesn't track individual files and doesn't pass the list of files to the command in environment variables.
This command works only in Xvc repositories.
$ git init
...
$ xvc init
Let's create a set of files:
$ xvc-test-helper create-directory-tree --directories 2 --files 3 --seed 2023
$ tree
.
├── dir-0001
│ ├── file-0001.bin
│ ├── file-0002.bin
│ └── file-0003.bin
└── dir-0002
├── file-0001.bin
├── file-0002.bin
└── file-0003.bin
3 directories, 6 files
Add a step that prints a message when the files change.
$ xvc pipeline step new --step-name files-changed --command "echo 'Files have changed.'"
$ xvc pipeline step dependency --step-name files-changed --glob 'dir-*/*'
The step is invalidated when a file described by the glob is added, removed or changed.
$ xvc pipeline run
[OUT] [files-changed] Files have changed.
[DONE] [files-changed] (echo 'Files have changed.')
When you run the pipeline again, the step is not run because the files didn't change.
$ xvc pipeline run
When a file is removed from the files described by the glob, the step is invalidated.
$ rm dir-0001/file-0001.bin
$ xvc pipeline run
[OUT] [files-changed] Files have changed.
[DONE] [files-changed] (echo 'Files have changed.')
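To build intuition for why a glob dependency is cheap, you can model it as a single checksum over everything the glob matches. The sketch below is a conceptual model, not xvc's implementation: the function name glob_digest is hypothetical, and it hashes each path followed by its content with the POSIX cksum utility, whereas xvc's actual hash and what it includes may differ. Since only one value is kept, adding, removing, or changing any matched file changes the digest, but the digest alone can't tell which file it was.

```shell
# Conceptual model of a glob digest dependency (not xvc's implementation).
# glob_digest PATTERN prints one checksum over the paths and contents of all
# regular files matching PATTERN.
glob_digest() {
  # $1 is deliberately unquoted so the shell expands the glob; pathname
  # expansion results are sorted, so the digest doesn't depend on directory order.
  for f in $1; do
    # Skip directories, and the literal pattern left over when nothing matches
    [ -f "$f" ] || continue
    printf '%s\n' "$f"
    cat "$f"
  done | cksum
}
```

For example, save the value of glob_digest 'dir-*/*' before a change and compare it with the value afterwards: removing dir-0001/file-0001.bin makes them differ.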
Regex Dependencies
You can specify a regular expression matched against the lines from a file as a dependency. The step is invalidated when the matched results change.
This command works only in Xvc repositories.
$ git init
...
$ xvc init
We'll use a sample CSV file in this example:
$ cat people.csv
"Name", "Sex", "Age", "Height (in)", "Weight (lbs)"
"Alex", "M", 41, 74, 170
"Bert", "M", 42, 68, 166
"Carl", "M", 32, 70, 155
"Dave", "M", 39, 72, 167
"Elly", "F", 30, 66, 124
"Fran", "F", 33, 66, 115
"Gwen", "F", 26, 64, 121
"Hank", "M", 30, 71, 158
"Ivan", "M", 53, 72, 175
"Jake", "M", 32, 69, 143
"Kate", "F", 47, 69, 139
"Luke", "M", 34, 72, 163
"Myra", "F", 23, 62, 98
"Neil", "M", 36, 75, 160
"Omar", "M", 38, 70, 145
"Page", "F", 31, 67, 135
"Quin", "M", 29, 71, 176
"Ruth", "F", 28, 65, 131
Now, let's add a step to the pipeline to count females in the file:
$ xvc pipeline step new --step-name count-females --command "grep -c '\"F\",' people.csv"
Add a regex dependency so that the step runs only when the lines matching the regex change.
$ xvc pipeline step dependency --step-name count-females --regex 'people.csv:/^.*"F",.*$'
When you run the pipeline initially, the step is run.
$ xvc pipeline run
[OUT] [count-females] 7
[DONE] [count-females] (grep -c '"F",' people.csv)
When you run the pipeline again, the step is not run because the regex result didn't change.
$ xvc pipeline run
When you add a new female record to the file, the step is run and the command prints the new count.
$ zsh -c "echo '\"Asude\", \"F\", 12, 55, 110' >> people.csv"
$ cat people.csv
"Name", "Sex", "Age", "Height (in)", "Weight (lbs)"
"Alex", "M", 41, 74, 170
"Bert", "M", 42, 68, 166
"Carl", "M", 32, 70, 155
"Dave", "M", 39, 72, 167
"Elly", "F", 30, 66, 124
"Fran", "F", 33, 66, 115
"Gwen", "F", 26, 64, 121
"Hank", "M", 30, 71, 158
"Ivan", "M", 53, 72, 175
"Jake", "M", 32, 69, 143
"Kate", "F", 47, 69, 139
"Luke", "M", 34, 72, 163
"Myra", "F", 23, 62, 98
"Neil", "M", 36, 75, 160
"Omar", "M", 38, 70, 145
"Page", "F", 31, 67, 135
"Quin", "M", 29, 71, 176
"Ruth", "F", 28, 65, 131
"Asude", "F", 12, 55, 110
$ xvc pipeline run
[OUT] [count-females] 8
[DONE] [count-females] (grep -c '"F",' people.csv)
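Before declaring a regex dependency, it can help to preview which lines the pattern will watch. The helper below, regex_preview (a hypothetical name), simply runs grep -E on the file. This assumes that POSIX extended regular expressions match the pattern the same way xvc's regex engine does, which holds for a simple pattern like this one but may not for patterns using engine-specific syntax. Pass the regex without the surrounding slashes used in the people.csv:/.../ form.

```shell
# Preview the lines a regex dependency would watch.
# regex_preview FILE REGEX prints every line of FILE matching REGEX.
# Assumption: grep -E (POSIX ERE) behaves like xvc's regex engine for REGEX.
regex_preview() {
  # "--" stops option parsing in case REGEX starts with "-"
  grep -E -- "$2" "$1"
}
```

With the original people.csv, regex_preview people.csv '^.*"F",.*$' | wc -l reports 7, the same count the step printed on its first run.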
Line Dependencies
You can make your steps depend on lines of text files. The lines are defined by starting and ending indices.
When the text in those lines changes, the step is invalidated.
This command works only in Xvc repositories.
$ git init
...
$ xvc init
We'll use a sample CSV file in this example:
$ cat people.csv
"Name", "Sex", "Age", "Height (in)", "Weight (lbs)"
"Alex", "M", 41, 74, 170
"Bert", "M", 42, 68, 166
"Carl", "M", 32, 70, 155
"Dave", "M", 39, 72, 167
"Elly", "F", 30, 66, 124
"Fran", "F", 33, 66, 115
"Gwen", "F", 26, 64, 121
"Hank", "M", 30, 71, 158
"Ivan", "M", 53, 72, 175
"Jake", "M", 32, 69, 143
"Kate", "F", 47, 69, 139
"Luke", "M", 34, 72, 163
"Myra", "F", 23, 62, 98
"Neil", "M", 36, 75, 160
"Omar", "M", 38, 70, 145
"Page", "F", 31, 67, 135
"Quin", "M", 29, 71, 176
"Ruth", "F", 28, 65, 131
Let's add a step to show the first 10 lines of the file:
$ xvc pipeline step new --step-name print-top-10 --command "head people.csv"
Add a lines dependency so that the command runs only when those lines change.
$ xvc pipeline step dependency --step-name print-top-10 --lines 'people.csv::1-10'
When you run the pipeline initially, the step is run.
$ xvc pipeline run
[OUT] [print-top-10] "Name", "Sex", "Age", "Height (in)", "Weight (lbs)"
"Alex", "M", 41, 74, 170
"Bert", "M", 42, 68, 166
"Carl", "M", 32, 70, 155
"Dave", "M", 39, 72, 167
"Elly", "F", 30, 66, 124
"Fran", "F", 33, 66, 115
"Gwen", "F", 26, 64, 121
"Hank", "M", 30, 71, 158
"Ivan", "M", 53, 72, 175
[DONE] [print-top-10] (head people.csv)
When you run the pipeline again, the step is not run because the specified lines didn't change.
$ xvc pipeline run
When you change a line in that range, the step is invalidated.
$ perl -i -pe 's/Hank/Ferzan/g' people.csv
Now, when you run the pipeline, it will print the first 10 lines again.
$ xvc pipeline run
[OUT] [print-top-10] "Name", "Sex", "Age", "Height (in)", "Weight (lbs)"
"Alex", "M", 41, 74, 170
"Bert", "M", 42, 68, 166
"Carl", "M", 32, 70, 155
"Dave", "M", 39, 72, 167
"Elly", "F", 30, 66, 124
"Fran", "F", 33, 66, 115
"Gwen", "F", 26, 64, 121
"Ferzan", "M", 30, 71, 158
"Ivan", "M", 53, 72, 175
[DONE] [print-top-10] (head people.csv)
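Conceptually, a lines dependency reduces the selected range to one digest, so edits outside the range are ignored and any edit inside it invalidates the step. The sketch below models that with sed and cksum; lines_digest is a hypothetical helper, not xvc code. It assumes 1-based, inclusive bounds as sed uses them; xvc's own interpretation of a range like people.csv::1-10 may differ, so check it against your pipeline.

```shell
# Conceptual model of a lines digest dependency (not xvc's implementation).
# lines_digest FILE FIRST LAST prints a checksum of lines FIRST..LAST of FILE.
# Assumption: FIRST and LAST are 1-based and inclusive, as in sed.
lines_digest() {
  sed -n "$2,$3p" "$1" | cksum
}
```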
Glob Items Dependency
A step can depend on multiple files specified with globs. When any of the files change, or a new file is added or removed from the files specified by glob, the step is invalidated.
Unlike the glob dependency, the glob items dependency keeps track of the individual files that match a glob. If your command runs with the list of files from a glob and you want to track added and removed files, use this. Otherwise, if your command runs on all the files in a glob and doesn't need to know which files have changed, use the glob dependency.
This one injects ${XVC_ADDED_GLOB_ITEMS}, ${XVC_REMOVED_GLOB_ITEMS}, ${XVC_CHANGED_GLOB_ITEMS} and ${XVC_ALL_GLOB_ITEMS} into the command environment.
This command works only in Xvc repositories.
$ git init
...
$ xvc init
Let's create a set of files:
$ xvc-test-helper create-directory-tree --directories 2 --files 3 --seed 2023
$ tree
.
├── dir-0001
│ ├── file-0001.bin
│ ├── file-0002.bin
│ └── file-0003.bin
└── dir-0002
├── file-0001.bin
├── file-0002.bin
└── file-0003.bin
3 directories, 6 files
Add a step that lists the added, removed, and changed files.
$ xvc pipeline step new --step-name files-changed --command 'echo "### Added Files:\n${XVC_ADDED_GLOB_ITEMS}\n### Removed Files:\n${XVC_REMOVED_GLOB_ITEMS}\n### Changed Files:\n${XVC_CHANGED_GLOB_ITEMS}"'
$ xvc pipeline step dependency --step-name files-changed --glob-items 'dir-*/*'
The step is invalidated when a file described by the glob is added, removed or changed.
$ xvc pipeline run
[OUT] [files-changed] ### Added Files:
dir-0001/file-0001.bin
dir-0001/file-0002.bin
dir-0001/file-0003.bin
dir-0002/file-0001.bin
dir-0002/file-0002.bin
dir-0002/file-0003.bin
### Removed Files:
### Changed Files:
[DONE] [files-changed] (echo "### Added Files:\n${XVC_ADDED_GLOB_ITEMS}\n### Removed Files:\n${XVC_REMOVED_GLOB_ITEMS}\n### Changed Files:\n${XVC_CHANGED_GLOB_ITEMS}")
When you run the pipeline again, the step is not run because no file has changed.
$ xvc pipeline run
If you add or remove a file matched by the glob, it's printed under the corresponding heading.
$ rm dir-0001/file-0001.bin
$ xvc pipeline run
[OUT] [files-changed] ### Added Files:
### Removed Files:
dir-0001/file-0001.bin
### Changed Files:
[DONE] [files-changed] (echo "### Added Files:\n${XVC_ADDED_GLOB_ITEMS}\n### Removed Files:\n${XVC_REMOVED_GLOB_ITEMS}\n### Changed Files:\n${XVC_CHANGED_GLOB_ITEMS}")
When you change a file, it's printed under the changed files:
$ xvc-test-helper generate-filled-file dir-0001/file-0002.bin
$ xvc pipeline run
[OUT] [files-changed] ### Added Files:
### Removed Files:
### Changed Files:
dir-0001/file-0002.bin
[DONE] [files-changed] (echo "### Added Files:\n${XVC_ADDED_GLOB_ITEMS}\n### Removed Files:\n${XVC_REMOVED_GLOB_ITEMS}\n### Changed Files:\n${XVC_CHANGED_GLOB_ITEMS}")
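Because glob items dependencies pass file lists to the command, a step can process only what changed instead of everything. A minimal sketch, assuming ${XVC_ADDED_GLOB_ITEMS} holds one path per line as the output above suggests: the hypothetical process_added_files function reads the list line by line and skips empty entries, so it does nothing when no files were added.

```shell
# Sketch of a step command that handles only newly added files.
# Assumptions: XVC_ADDED_GLOB_ITEMS is newline-separated; paths contain no newlines.
process_added_files() {
  printf '%s\n' "${XVC_ADDED_GLOB_ITEMS}" | while IFS= read -r path; do
    # Skip the single empty line printed when no files were added
    [ -n "$path" ] || continue
    # Replace this with the real work on each file
    printf 'processing %s\n' "$path"
  done
}
```

You could keep such a loop in a script and call that script from the step's --command, which keeps the command itself short.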
Regex Item Dependencies
You can specify a regular expression matched against the lines from a file as a dependency. The step is invalidated when the matched results change.
Unlike regex dependencies, regex item dependencies keep track of the matched items. You can access them with the ${XVC_ALL_REGEX_ITEMS}, ${XVC_ADDED_REGEX_ITEMS}, and ${XVC_REMOVED_REGEX_ITEMS} environment variables.
This command works only in Xvc repositories.
$ git init
...
$ xvc init
We'll use a sample CSV file in this example:
$ cat people.csv
"Name", "Sex", "Age", "Height (in)", "Weight (lbs)"
"Alex", "M", 41, 74, 170
"Bert", "M", 42, 68, 166
"Carl", "M", 32, 70, 155
"Dave", "M", 39, 72, 167
"Elly", "F", 30, 66, 124
"Fran", "F", 33, 66, 115
"Gwen", "F", 26, 64, 121
"Hank", "M", 30, 71, 158
"Ivan", "M", 53, 72, 175
"Jake", "M", 32, 69, 143
"Kate", "F", 47, 69, 139
"Luke", "M", 34, 72, 163
"Myra", "F", 23, 62, 98
"Neil", "M", 36, 75, 160
"Omar", "M", 38, 70, 145
"Page", "F", 31, 67, 135
"Quin", "M", 29, 71, 176
"Ruth", "F", 28, 65, 131
Now, let's add steps to the pipeline that list the newly added males and females in the file:
$ xvc pipeline step new --step-name new-males --command 'echo "New Males:\n ${XVC_ADDED_REGEX_ITEMS}"'
$ xvc pipeline step new --step-name new-females --command 'echo "New Females:\n ${XVC_ADDED_REGEX_ITEMS}"'
$ xvc pipeline step dependency --step-name new-females --step new-males
We also add a step dependency so that the steps always run in the same order.
These commands run when the lines matched by the following regexes change.
$ xvc pipeline step dependency --step-name new-males --regex-items 'people.csv:/^.*"M",.*$'
$ xvc pipeline step dependency --step-name new-females --regex-items 'people.csv:/^.*"F",.*$'
When you run the pipeline initially, the steps are run.
$ xvc pipeline run
[OUT] [new-males] New Males:
"Alex", "M", 41, 74, 170
"Bert", "M", 42, 68, 166
"Carl", "M", 32, 70, 155
"Dave", "M", 39, 72, 167
"Hank", "M", 30, 71, 158
"Ivan", "M", 53, 72, 175
"Jake", "M", 32, 69, 143
"Luke", "M", 34, 72, 163
"Neil", "M", 36, 75, 160
"Omar", "M", 38, 70, 145
"Quin", "M", 29, 71, 176
[DONE] [new-males] (echo "New Males:\n ${XVC_ADDED_REGEX_ITEMS}")
[OUT] [new-females] New Females:
"Elly", "F", 30, 66, 124
"Fran", "F", 33, 66, 115
"Gwen", "F", 26, 64, 121
"Kate", "F", 47, 69, 139
"Myra", "F", 23, 62, 98
"Page", "F", 31, 67, 135
"Ruth", "F", 28, 65, 131
[DONE] [new-females] (echo "New Females:\n ${XVC_ADDED_REGEX_ITEMS}")
When you run the pipeline again, the steps are not run because the matched lines didn't change.
$ xvc pipeline run
When you add a new female record to the file, only the new-females step is run.
$ zsh -c "echo '\"Asude\", \"F\", 12, 55, 110' >> people.csv"
$ cat people.csv
"Name", "Sex", "Age", "Height (in)", "Weight (lbs)"
"Alex", "M", 41, 74, 170
"Bert", "M", 42, 68, 166
"Carl", "M", 32, 70, 155
"Dave", "M", 39, 72, 167
"Elly", "F", 30, 66, 124
"Fran", "F", 33, 66, 115
"Gwen", "F", 26, 64, 121
"Hank", "M", 30, 71, 158
"Ivan", "M", 53, 72, 175
"Jake", "M", 32, 69, 143
"Kate", "F", 47, 69, 139
"Luke", "M", 34, 72, 163
"Myra", "F", 23, 62, 98
"Neil", "M", 36, 75, 160
"Omar", "M", 38, 70, 145
"Page", "F", 31, 67, 135
"Quin", "M", 29, 71, 176
"Ruth", "F", 28, 65, 131
"Asude", "F", 12, 55, 110
$ xvc pipeline run
[OUT] [new-females] New Females:
"Asude", "F", 12, 55, 110
[DONE] [new-females] (echo "New Females:\n ${XVC_ADDED_REGEX_ITEMS}")
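The matched lines arrive in the command as plain text, so ordinary text tools can post-process them. As a sketch, the hypothetical added_names function below prints only the names of the newly matched people. It assumes ${XVC_ADDED_REGEX_ITEMS} holds one matched line per line, as the output above suggests, and that the name is the first comma-separated field of people.csv.

```shell
# Sketch: print the names from the newly matched lines of people.csv.
# Assumptions: XVC_ADDED_REGEX_ITEMS is newline-separated, fields are
# separated by ", ", and the first field is the quoted name.
added_names() {
  printf '%s\n' "${XVC_ADDED_REGEX_ITEMS}" |
    awk -F', ' 'NF { gsub(/"/, "", $1); print $1 }'
}
```

For the run above, this would print only Asude; empty input produces no output because awk skips lines with no fields.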
Line Item Dependencies
You can make your steps depend on lines of text files. The lines are defined by starting and ending indices.
When the text in those lines changes, the step is invalidated.
Unlike line dependencies, this dependency type keeps track of the lines in the file. You can use the ${XVC_ALL_LINE_ITEMS}, ${XVC_ADDED_LINE_ITEMS}, and ${XVC_REMOVED_LINE_ITEMS} environment variables in the command. Please be aware that for a large set of lines, this dependency can take up considerable space to keep track of all lines. If you don't need to keep track of changed lines, you can use the --lines dependency.
This command works only in Xvc repositories.
$ git init
...
$ xvc init
We'll use a sample CSV file in this example:
$ cat people.csv
"Name", "Sex", "Age", "Height (in)", "Weight (lbs)"
"Alex", "M", 41, 74, 170
"Bert", "M", 42, 68, 166
"Carl", "M", 32, 70, 155
"Dave", "M", 39, 72, 167
"Elly", "F", 30, 66, 124
"Fran", "F", 33, 66, 115
"Gwen", "F", 26, 64, 121
"Hank", "M", 30, 71, 158
"Ivan", "M", 53, 72, 175
"Jake", "M", 32, 69, 143
"Kate", "F", 47, 69, 139
"Luke", "M", 34, 72, 163
"Myra", "F", 23, 62, 98
"Neil", "M", 36, 75, 160
"Omar", "M", 38, 70, 145
"Page", "F", 31, 67, 135
"Quin", "M", 29, 71, 176
"Ruth", "F", 28, 65, 131
Let's add a step that reports changes in the first 10 lines of the file:
$ xvc pipeline step new --step-name print-top-10 --command 'echo "Added Lines:\n ${XVC_ADDED_LINE_ITEMS}\nRemoved Lines:\n${XVC_REMOVED_LINE_ITEMS}"'
Add a line items dependency so that the command runs only when those lines change.
$ xvc pipeline step dependency --step-name print-top-10 --line-items 'people.csv::1-10'
When you run the pipeline initially, the step is run.
$ xvc pipeline run
[OUT] [print-top-10] Added Lines:
"Alex", "M", 41, 74, 170
"Bert", "M", 42, 68, 166
"Carl", "M", 32, 70, 155
"Dave", "M", 39, 72, 167
"Elly", "F", 30, 66, 124
"Fran", "F", 33, 66, 115
"Gwen", "F", 26, 64, 121
"Hank", "M", 30, 71, 158
"Ivan", "M", 53, 72, 175
Removed Lines:
[DONE] [print-top-10] (echo "Added Lines:\n ${XVC_ADDED_LINE_ITEMS}\nRemoved Lines:\n${XVC_REMOVED_LINE_ITEMS}")
When you run the pipeline again, the step is not run because the specified lines didn't change.
$ xvc pipeline run
When you change a line in that range, the step is invalidated.
$ perl -i -pe 's/Hank/Ferzan/g' people.csv
Now, when you run the pipeline, it will print the changed line, with its new and old versions.
$ xvc pipeline run
[OUT] [print-top-10] Added Lines:
"Ferzan", "M", 30, 71, 158
Removed Lines:
"Hank", "M", 30, 71, 158
[DONE] [print-top-10] (echo "Added Lines:\n ${XVC_ADDED_LINE_ITEMS}\nRemoved Lines:\n${XVC_REMOVED_LINE_ITEMS}")
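The added and removed lists can be understood as a comparison of the tracked lines between two runs: lines only in the new version are added and lines only in the old version are removed, so an edited line shows up in both. The sketch below models this with sort and comm; line_item_changes is a hypothetical helper, and treating the lines as an unordered set is a simplification, since xvc may also take line positions into account.

```shell
# Conceptual model of line items change detection (not xvc's implementation).
# line_item_changes OLD NEW compares two files holding the tracked lines from
# the previous and the current run. comm requires sorted input, so both are
# sorted with the same (C) collation first.
line_item_changes() {
  LC_ALL=C sort "$1" > "$1.sorted"
  LC_ALL=C sort "$2" > "$2.sorted"
  echo "Added Lines:"
  # -13 suppresses lines only in OLD and lines in both: what remains is only in NEW
  LC_ALL=C comm -13 "$1.sorted" "$2.sorted"
  echo "Removed Lines:"
  # -23 suppresses lines only in NEW and lines in both: what remains is only in OLD
  LC_ALL=C comm -23 "$1.sorted" "$2.sorted"
}
```

Fed the Hank row before and the Ferzan row after the edit, it reports Ferzan as added and Hank as removed, like the run above.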
SQLite Query Dependency
You can add a dependency on the result of an SQLite query to a step. When the query results change, the step is invalidated.
SQLite query dependencies don't track the results of the query. They only check whether the query results have changed.
This command works only in Xvc repositories.
$ git init
...
$ xvc init
Suppose we have an SQLite database people.db with the following schema and data:
CREATE TABLE People (
Name TEXT,
Sex TEXT,
Age INTEGER,
Height_in INTEGER,
Weight_lbs INTEGER
);
INSERT INTO People (Name, Sex, Age, Height_in, Weight_lbs) VALUES
('Alex', 'M', 41, 74, 170),
('Bert', 'M', 42, 68, 166),
('Carl', 'M', 32, 70, 155),
('Dave', 'M', 39, 72, 167),
('Elly', 'F', 30, 66, 124),
('Fran', 'F', 33, 66, 115),
('Gwen', 'F', 26, 64, 121),
('Hank', 'M', 30, 71, 158),
('Ivan', 'M', 53, 72, 175),
('Jake', 'M', 32, 69, 143),
('Kate', 'F', 47, 69, 139),
('Luke', 'M', 34, 72, 163),
('Myra', 'F', 23, 62, 98),
('Neil', 'M', 36, 75, 160),
('Omar', 'M', 38, 70, 145),
('Page', 'F', 31, 67, 135),
('Quin', 'M', 29, 71, 176),
('Ruth', 'F', 28, 65, 131);
Now, we'll add a step to the pipeline to calculate the average age of these people.
$ xvc pipeline step new --step-name average-age --command "sqlite3 people.db 'SELECT AVG(Age) FROM People;'"
Let's run the step without a dependency first.
$ xvc pipeline run
[OUT] [average-age] 34.6666666666667
[DONE] [average-age] (sqlite3 people.db 'SELECT AVG(Age) FROM People;')
Now, we'll add a dependency to this step so that it runs only when the result of a dependency query changes.
$ xvc pipeline step dependency --step-name average-age --sqlite-query people.db 'SELECT count(*) FROM People;'
The dependency query is run every time the pipeline runs, so it should be lightweight to avoid performance issues.
So, when the number of people in the table changes, the step will run. As no query result has been recorded yet, the step runs again now.
$ xvc pipeline run
[OUT] [average-age] 34.6666666666667
[DONE] [average-age] (sqlite3 people.db 'SELECT AVG(Age) FROM People;')
But it won't run the step a second time, as the table didn't change.
$ xvc pipeline run
Let's add another row to the table:
$ sqlite3 people.db "INSERT INTO People (Name, Sex, Age, Height_in, Weight_lbs) VALUES ('Asude', 'F', 10, 74, 170);"
This time, the step runs again because the result of the dependency query (SELECT count(*) FROM People) changed.
$ xvc pipeline run
[OUT] [average-age] 33.3684210526316
[DONE] [average-age] (sqlite3 people.db 'SELECT AVG(Age) FROM People;')
Xvc opens the database in read-only mode to avoid locking.
(Hyper-)Parameter Dependencies
You may be keeping pipeline-wide parameters in structured text files. You can specify such parameters found in JSON, TOML and YAML files as dependencies.
This command works only in Xvc repositories.
$ git init
...
$ xvc init
Suppose we have a YAML file, myparams.yaml, in which we specify various parameters for the whole project.
param: value
database:
  server: example.com
  port: 5432
  connection:
    timeout: 5000
numeric_param: 13
Now, we create two steps that read different variables from the file, and a step dependency between them so that they always run in the same order.
$ xvc pipeline step new --step-name read-database-config --command 'echo "Updated Database Configuration"'
$ xvc pipeline step new --step-name read-hyperparams --command 'echo "Update Hyperparameters"'
$ xvc pipeline step dependency --step-name read-database-config --step read-hyperparams
Let's add dependencies on different pieces of this parameters file to each step:
$ xvc pipeline step dependency --step-name read-database-config --param 'myparams.yaml::database.port' --param 'myparams.yaml::database.server' --param 'myparams.yaml::database.connection'
$ xvc pipeline step dependency --step-name read-hyperparams --param 'myparams.yaml::param' --param 'myparams.yaml::numeric_param'
Run for the first time, as initially all dependencies are invalid:
$ xvc pipeline run
[OUT] [read-hyperparams] Update Hyperparameters
[DONE] [read-hyperparams] (echo "Update Hyperparameters")
[OUT] [read-database-config] Updated Database Configuration
[DONE] [read-database-config] (echo "Updated Database Configuration")
The second time, the steps don't run, as nothing has changed:
$ xvc pipeline run
When you update a value in this file, it will only invalidate the steps that depend on the value, not other dependencies that rely on the same file.
Let's update the database port:
$ perl -pi -e 's/5432/9876/g' myparams.yaml
$ xvc pipeline run
[OUT] [read-database-config] Updated Database Configuration
[DONE] [read-database-config] (echo "Updated Database Configuration")
Note that read-hyperparams is not invalidated, even though its parameters are in the same file.
Step Dependencies
This command works only in Xvc repositories.
$ git init
...
$ xvc init
You can add a step dependency to a step. Step dependencies specify dependency relationships explicitly, without relying on changed files or directories.
$ xvc pipeline step new --step-name world --command "echo world"
$ xvc pipeline step new --step-name hello --command "echo hello"
$ xvc pipeline step dependency --step-name world --step hello
When run, the dependency will be run first and the step will be run after.
$ xvc pipeline run
[OUT] [hello] hello
[DONE] [hello] (echo hello)
[OUT] [world] world
[DONE] [world] (echo world)
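Ordering steps by their step dependencies is a topological sort. Purely as an illustration (xvc schedules steps itself), the POSIX tsort utility applies the same rule. The hypothetical step_order function reads lines of the form STEP DEPENDENCY, mirroring --step-name STEP --step DEPENDENCY, swaps each pair so the dependency comes first, and prints an order in which every step follows its dependencies.

```shell
# Illustration of step ordering with tsort (not how xvc schedules steps).
# Input: one "STEP DEPENDENCY" pair per line.
# tsort expects "BEFORE AFTER" pairs, so awk swaps the two fields.
step_order() {
  awk '{ print $2, $1 }' | tsort
}
```

For the pipeline above, printf 'world hello\n' | step_order prints hello and then world, the same order as the run.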
If the dependency is not run, the dependent step won't run either.
$ xvc pipeline step update --step-name hello --when never
$ xvc pipeline run
If you want the dependent step to run regardless, you can set it to always run explicitly.
$ xvc pipeline step update --step-name world --when always
$ xvc pipeline run
[OUT] [world] world
[DONE] [world] (echo world)
URL Dependencies
This command works only in Xvc repositories.
$ git init
...
$ xvc init
You can use a web URL as a dependency to a step. When the URL is fetched, a hash of its content is saved for comparison, and the step is invalidated when the content of the URL changes.
You can use this with any URL.
$ xvc pipeline step new --step-name xvc-docs-update --command "echo 'Xvc docs updated!'"
$ xvc pipeline step dependency --step-name xvc-docs-update --url https://docs.xvc.dev/
The step is invalidated when the page is updated.
$ xvc pipeline run
[OUT] [xvc-docs-update] Xvc docs updated!
[DONE] [xvc-docs-update] (echo 'Xvc docs updated!')
The step won't run again until a new version of the page is published.
$ xvc pipeline run
Note that Xvc doesn't download the page every time. It checks the Last-Modified and ETag headers and only downloads the page if it has changed.
If there are more complex requirements than just the URL changing, you can use a generic dependency to get the output of a command and use that as a dependency.
Generic Command Dependencies
This command works only in Xvc repositories.
$ git init
...
$ xvc init
You can use the output of a shell command as a dependency to a step. When the command is run, a hash of its output is saved for comparison, and the step is invalidated when the output of the command changes.
You can use this for any command that outputs a string.
$ xvc pipeline step new --step-name morning-message --command "echo 'Good Morning!'"
$ xvc pipeline step dependency --step-name morning-message --generic 'date +%F'
The step is invalidated when the date changes and the step is run again.
$ xvc pipeline run
[OUT] [morning-message] Good Morning!
[DONE] [morning-message] (echo 'Good Morning!')
The step won't run again until tomorrow, when the output of date +%F changes.
$ xvc pipeline run
[OUT] [morning-message] Good Morning!
[DONE] [morning-message] (echo 'Good Morning!')
You can mimic all kinds of pipeline behavior with this generic dependency.
For example, if you want to run a command when directory contents change, you can depend on the output of ls (or ls -lR to include subdirectories and file details):
$ xvc pipeline step new --step-name directory-contents --command "echo 'Files changed'"
$ xvc pipeline step dependency --step-name directory-contents --generic 'ls'
$ xvc pipeline run
[OUT] [directory-contents] Files changed
[DONE] [directory-contents] (echo 'Files changed')
When you add a file to the directory, the step is invalidated and run again:
$ xvc pipeline run
$ xvc-test-helper generate-random-file new-file.txt
$ xvc pipeline run
[OUT] [directory-contents] Files changed
[DONE] [directory-contents] (echo 'Files changed')
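The save-and-compare behavior of generic dependencies can be modeled in a few lines of shell. This is a conceptual sketch, not xvc's implementation: output_changed is a hypothetical function, the hash is POSIX cksum rather than whatever xvc uses, and the previous hash is kept in a plain file you choose.

```shell
# Conceptual model of a generic command dependency (not xvc's implementation).
# output_changed STATE_FILE COMMAND [ARGS...]
# Runs COMMAND, checksums its output, and compares the checksum with the one
# saved in STATE_FILE by the previous call. Returns 0 if the output changed
# (or on the first call) and saves the new checksum; returns 1 otherwise.
output_changed() {
  state_file=$1
  shift
  new_hash=$("$@" | cksum)
  if [ -f "$state_file" ] && [ "$(cat "$state_file")" = "$new_hash" ]; then
    return 1
  fi
  printf '%s\n' "$new_hash" > "$state_file"
  return 0
}
```

For instance, output_changed /tmp/morning-message.hash date +%F && echo 'Good Morning!' prints the message at most once per day for that state file, mirroring the morning-message step.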
Caveats
Tips
Most shells support editing longer commands with an editor. For bash, you can use Ctrl+X Ctrl+E.
Pipeline commands can get longer quickly. You can use xvc aliases for shorter versions. Type source $(xvc aliases) to load the aliases into your shell.