Converts a list of htids to relative stubbytree paths that rsync can use to download the corresponding files

htid_to_rsync(htids, file)

Arguments

htids

A character vector of HathiTrust ids (htids), a workset generated by workset_builder, or a data frame with a column named 'htid' containing the htids.

file

Path to a text file in which to save the resulting list of relative stubbytree paths, for use in the command rsync -av --files-from FILE.txt data.analytics.hathitrust.org::features-2020.03/ hathi-ef/

Value

The list of relative paths saved to the file (invisibly).

Details

If you have many files to download, generating the list of relative stubbytree paths and using rsync is much faster than calling get_hathi_counts over a list of htids. However, rsync only downloads json files, so the first call to get_hathi_counts on a downloaded json file will be slower, because the function must first cache the json file to csv or another format. It is best to run cache_htids after using rsync to reduce this performance penalty.
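
To make the idea of a relative stubbytree path more concrete, here is a minimal base R sketch of how such a path might be derived from an htid. It is an illustration only, not this package's implementation. The helper stubbytree_path is hypothetical, and the convention it encodes is an assumption: library prefix as the top directory, every third character of the cleaned volume id as the stub directory, and a .json.bz2 file name.

```r
# Hypothetical helper, not part of any package: builds the relative
# stubbytree path for a single htid under the assumed convention.
stubbytree_path <- function(htid) {
  # Split at the first "." into library prefix and volume id.
  dot <- regexpr(".", htid, fixed = TRUE)[1]
  lib <- substr(htid, 1, dot - 1)
  vol <- substr(htid, dot + 1, nchar(htid))

  # Make the volume id filesystem-safe: ":" -> "+", "/" -> "=", "." -> ",".
  vol_clean <- chartr(":/.", "+=,", vol)

  # Assumed stub directory: every third character of the cleaned volume id.
  chars <- strsplit(vol_clean, "")[[1]]
  stub <- paste(chars[seq(1, length(chars), by = 3)], collapse = "")

  paste0(lib, "/", stub, "/", lib, ".", vol_clean, ".json.bz2")
}

htids <- c("nc01.ark:/13960/t2v41mn4r", "mdp.39015001796443")
paths <- vapply(htids, stubbytree_path, character(1), USE.NAMES = FALSE)
print(paths)
```

A file passed to rsync --files-from would contain one such path per line. In practice, htid_to_rsync should be used to generate it, since it also accepts worksets and data frames.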

Examples

htid_to_rsync(c("nc01.ark:/13960/t2v41mn4r", "mdp.39015001796443"), tempfile())
#> Use rsync -av --files-from /tmp/RtmpdUz0R0/file32831953f9cc data.analytics.hathitrust.org::features-2020.03/ hathi-ef/ to download EF files to hathi-ef directory