xvc file track

Purpose

xvc file track is used to register any kind of file to Xvc for tracking versions.

Synopsis

$ xvc file track --help Add file and directories to Xvc Usage: xvc file track [OPTIONS] [TARGETS]... Arguments: [TARGETS]... Files/directories to track Options: --recheck-method <RECHECK_METHOD> How to track the file contents in cache: One of copy, symlink, hardlink, reflink. Note: Reflink uses copy if the underlying file system doesn't support it. --no-commit Do not copy/link added files to the file cache --text-or-binary <TEXT_OR_BINARY> Calculate digests as text or binary file without checking contents, or by automatically. (Default: auto) --include-git-files Include git tracked files as well. (Default: false) Xvc doesn't track files that are already tracked by git by default. You can set files.track.include-git to true in the configuration file to change this behavior. --force Add targets even if they are already tracked --no-parallel Don't use parallelism -h, --help Print help (see a summary with '-h')

Examples

File tracking works only in Xvc repositories.

$ git init ... $ xvc init

Let's create a directory tree for these examples.

$ xvc-test-helper create-directory-tree --directories 4 --files 3 --seed 20231021 $ tree . ├── dir-0001 │   ├── file-0001.bin │   ├── file-0002.bin │   └── file-0003.bin ├── dir-0002 │   ├── file-0001.bin │   ├── file-0002.bin │   └── file-0003.bin ├── dir-0003 │   ├── file-0001.bin │   ├── file-0002.bin │   └── file-0003.bin └── dir-0004 ├── file-0001.bin ├── file-0002.bin └── file-0003.bin 5 directories, 12 files

By default, the command runs similar to git add and git commit.

You can track individual files.

$ xvc file track dir-0001/file-0001.bin

You can track directories with the same command.

$ xvc file track dir-0002/

You can specify more than one target in a single command.

$ xvc file track dir-0001/file-0002.bin dir-0001/file-0003.bin

Files tracked by Git

By default, Xvc doesn't track files tracked by Git. You need to specify --include-git-files options to track files tracked by git.

Warning

Xvc detects files tracked by git with the output of git ls-files. In default configuration Git encodes UTF-8 file names in octal format. As Xvc uses UTF-8 internally to keep track of paths, it cannot identify files are tracked by Git if they have non-ASCII characters.

Please set

git config core.quotepath off

in your Xvc repository to let Git list files in UTF-8.

Caching

When you track a file, Xvc moves the file to the cache directory under .xvc/ and connects the workspace file with the cached file. This connection is called rechecking and analogous to Git checkout. For example, the above commands create a directory tree under .xvc as follows:

$ tree .xvc/b3 .xvc/b3 ├── 493 │   └── eeb │   └── 6525ea5e94e1e760371108e4a525c696c773a774a4818e941fd6d1af79 │   └── 0.bin ├── ab3 │   └── 619 │   └── 814cae0456a5a291e4d5c8d339a8389630e476f9f9e8d3a09accc919f0 │   └── 0.bin └── e51 └── 7d6 └── b9a3617fdcd96bd128142a39f1eca26ed77a338d2b93ba4921a0116c70 └── 0.bin 10 directories, 3 files

There are different recheck (checkout) methods that Xvc connects the workspace file to the cache. The default method for this is copying the file to the workspace. This way a separate copy of the cache file is created in the workspace.

If you want to make this connection with symbolic links, you can specify it with --recheck-method option.

$ xvc file track --recheck-method symlink dir-0003/file-0001.bin $ ls -l dir-0003/file-0001.bin lrwxr-xr-x[..] dir-0003/file-0001.bin -> [CWD]/.xvc/b3/e51/7d6/b9a3617fdcd96bd128142a39f1eca26ed77a338d2b93ba4921a0116c70/0.bin

You can also use --hardlink and --reflink options. Please see xvc file recheck reference for details.

$ xvc file track --recheck-method hardlink dir-0003/file-0002.bin $ xvc file track --recheck-method reflink dir-0003/file-0003.bin $ ls -l dir-0003/ total 16 l[..] file-0001.bin -> [CWD]/.xvc/b3/e51/7d6/b9a3617fdcd96bd128142a39f1eca26ed77a338d2b93ba4921a0116c70/0.bin -[..] file-0002.bin -[..] file-0003.bin

Info

Note that, unlike DVC that specifies checkout/recheck option repository wide, Xvc lets you specify per file. You can recheck files data files as symbolic links (which are non-writable) and save space and make model files as copies of the cached original and commit (carry-in) every time they change.

When you track a file in Xvc, it's automatically commit (carry-in) to the cache directory. If you want to postpone this operation and don't need a cached copy for a file, you can use --no-commit option. You can later use xvc file carry-in command to move these files to the repository cache.

$ xvc file track --no-commit --recheck-method symlink dir-0004/ $ ls -l dir-0004/ total 24 -rw-r--r--[..] file-0001.bin -rw-r--r--[..] file-0002.bin -rw-r--r--[..] file-0003.bin $ xvc file list dir-0004/ FS [..] ab361981 ab361981 dir-0004/file-0003.bin FS [..] 493eeb65 493eeb65 dir-0004/file-0002.bin FS [..] e517d6b9 e517d6b9 dir-0004/file-0001.bin Total #: 3 Workspace Size: 6006 Cached Size: 6006

You can carry-in (commit) these files to the cache with xvc file carry-in command. Note that, as the files are deduplicated, we need to use --force in carry-in command. This behavior may change in the future.

$ xvc file carry-in --force dir-0004/ $ ls -l dir-0004/ total 0 lrwxr-xr-x[..] file-0001.bin -> [CWD]/.xvc/b3/e51/7d6/b9a3617fdcd96bd128142a39f1eca26ed77a338d2b93ba4921a0116c70/0.bin lrwxr-xr-x[..] file-0002.bin -> [CWD]/.xvc/b3/493/eeb/6525ea5e94e1e760371108e4a525c696c773a774a4818e941fd6d1af79/0.bin lrwxr-xr-x[..] file-0003.bin -> [CWD]/.xvc/b3/ab3/619/814cae0456a5a291e4d5c8d339a8389630e476f9f9e8d3a09accc919f0/0.bin

Xvc deduplicates files in the cache. If you track a file that is already in the cache, it won't be moved to the cache again. It will be copied, linked from the same copy.

$ tree .xvc/b3 .xvc/b3 ├── 493 │   └── eeb │   └── 6525ea5e94e1e760371108e4a525c696c773a774a4818e941fd6d1af79 │   └── 0.bin ├── ab3 │   └── 619 │   └── 814cae0456a5a291e4d5c8d339a8389630e476f9f9e8d3a09accc919f0 │   └── 0.bin └── e51 └── 7d6 └── b9a3617fdcd96bd128142a39f1eca26ed77a338d2b93ba4921a0116c70 └── 0.bin 10 directories, 3 files

Caveats

  • This command doesn't discriminate symbolic links or hardlinks. Links are followed and any broken links may cause errors.

  • Under the hood, Xvc tracks only the files, not directories. Directories are considered as path collections. It doesn't matter if you track a directory or files in it separately.

Technical Details

  • Detecting changes in files and directories employ different kinds of associated digests. If a file has different metadata digest, its content digest is calculated. If file's content digest has changed, the file is considered changed. A directory that contains different set of files, or files with changed content is considered changed.