This function downloads the full hathifile catalog, which holds basic metadata
for the more than 17 million digitized volumes in the Hathi Trust digital
library collection. It can be used in conjunction with workset_builder and
rsync to select an appropriate sample of Hathi Trust Extracted Features files
and metadata for further analysis. Warning: the file is about 1GB. A new
version is published every month; if the latest version has already been
downloaded, the function just returns the file name and does not download it
again.
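The "skip the download if the file is already there" behaviour described above can be sketched in base R. This is an illustration of the general pattern, not the package's actual implementation: the helper `fetch_if_missing()`, the stub `fake_fetch()`, and the example.org URL and file name are all hypothetical, and the stub stands in for a real download so that nothing is fetched from the network.

```r
# Hypothetical helper (not part of hathiTools): download a file only if it
# is not already present in `dir`, and return the local file name either way.
fetch_if_missing <- function(file_url, dir, fetch = utils::download.file) {
  # Create the target directory if it does not exist yet
  if (!dir.exists(dir)) dir.create(dir, recursive = TRUE)
  dest <- file.path(dir, basename(file_url))
  if (file.exists(dest)) {
    message("File already exists; skipping download: ", dest)
    return(dest)
  }
  fetch(file_url, destfile = dest, mode = "wb")
  dest
}

# Stub downloader that writes a small placeholder file and counts its calls,
# so the sketch runs without network access.
calls <- 0
fake_fetch <- function(url, destfile, mode) {
  calls <<- calls + 1
  writeLines("placeholder", destfile)
}

demo_dir <- tempfile("raw-hathifiles-")
demo_url <- "https://example.org/hathi_full_20240101.txt.gz"  # made-up URL

first  <- fetch_if_missing(demo_url, demo_dir, fake_fetch)  # "downloads"
second <- fetch_if_missing(demo_url, demo_dir, fake_fetch)  # skips
```

Both calls return the same file name, and the stub downloader runs only once.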
download_hathifile(
url = "https://www.hathitrust.org/hathifiles",
dir = getOption("hathiTools.hathifile.dir"),
full_catalog = TRUE
)
url: The URL of the Hathi Trust hathifiles page, https://www.hathitrust.org/hathifiles.
dir: The directory in which to save the downloaded hathifile. Defaults to
getOption("hathiTools.hathifile.dir"), which on loading the package is just
./raw-hathifiles (a directory which will be created if it doesn't exist
already when you call the function).
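Because the default for the directory argument is read with getOption() when the function is called, setting that option should change where files are saved without passing the directory explicitly. A minimal sketch using only base R; the directory name "hathifiles-demo" is an arbitrary choice for illustration:

```r
# Set the option to a custom directory, keeping the previous value
old <- options(hathiTools.hathifile.dir = file.path(tempdir(), "hathifiles-demo"))

# This is the value the default argument would pick up at call time
chosen <- getOption("hathiTools.hathifile.dir")
print(chosen)

# Restore the previous setting
options(old)
```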
full_catalog: Whether to download the full catalog (>17 million records), or
just the latest update (there's a new "update file" every day, and a new
version of the full catalog every month). Default is TRUE: download the full
catalog.
Returns the downloaded filename.
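A usage sketch, assuming the hathiTools package is installed and network access is available. It requests the daily update file rather than the full catalog, and saves it under a temporary directory chosen here purely for illustration. The call is guarded so that nothing is downloaded in a non-interactive session.

```r
# Sketch only: needs the hathiTools package and a network connection
if (interactive() && requireNamespace("hathiTools", quietly = TRUE)) {
  # full_catalog = FALSE asks for the latest update file, not the full catalog
  update_file <- hathiTools::download_hathifile(
    dir = file.path(tempdir(), "raw-hathifiles"),
    full_catalog = FALSE
  )
  # The return value is the name of the downloaded file
  print(update_file)
}
```

Calling the function again with the same arguments should return the same file name without a second download, provided no newer file has been published in the meantime.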