Removes cached files for a set of Hathi Trust ids
A character vector of Hathi Trust ids, a workset created with
workset_builder, or a data frame with a column named "htid" containing
the Hathi Trust ids that require caching. If the JSON Extracted Features
files for these htids have not been downloaded via rsync_from_hathi or
get_hathi_counts to dir, nothing will be cached (unless attempt_rsync
is TRUE).
The directory where the downloaded extracted features files are to
be found. Defaults to getOption("hathiTools.ef.dir"), which is just
"hathi-ef" on load.
Type of information to remove. The default is c("ef",
"meta", "pagemeta"), which refers to the extracted features, the volume
metadata, and the page metadata in dir. Supplying only some of these
removes only those types (e.g., cache_type = "ef" removes only the EF
files, not their associated metadata or page metadata).
The format of the cached EF files to remove. Defaults to c("csv.gz", "rds", "feather", "text2vec.csv", "parquet"), i.e., all formats.
Whether to keep any downloaded JSON files. Default is TRUE; if FALSE,
all JSON extracted features files associated with the set of htids
are deleted.
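Because dir falls back to the "hathiTools.ef.dir" option, it can help to check what that option currently resolves to before removing anything. The following is a minimal base R sketch; the names ef_dir, old, and scratch_dir are illustrative only, and the "hathi-ef" fallback simply mirrors the on-load default described above.

```r
# Read the option, falling back to the documented on-load value if it is unset.
ef_dir <- getOption("hathiTools.ef.dir", default = "hathi-ef")

# Temporarily point the option at a scratch directory, then restore it.
# options() returns the previous values, so passing them back undoes the change.
old <- options(hathiTools.ef.dir = tempdir())
scratch_dir <- getOption("hathiTools.ef.dir")
options(old)
```

Passing dir explicitly, as the examples do, avoids relying on the session-wide option at all.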
(Invisible) a character vector with the deleted paths.
Warning! This function does not double-check that you want to delete your cache. It will go ahead and do it.
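Since no confirmation is requested, one cautious approach is to list the candidate files yourself before calling the function. The sketch below uses only base R; preview_cached_files is a hypothetical helper, not part of hathiTools, and it assumes that cached files are named "<htid>.<extension>" somewhere under dir, as in the example output. It may miss htids whose file names differ from the id itself, and it does not replicate the cache_type or cache_format filtering.

```r
# Hypothetical helper (not part of hathiTools): list files under `dir`
# whose base names start with "<htid>." for any of the supplied htids.
preview_cached_files <- function(htids, dir) {
  files <- list.files(dir, recursive = TRUE, full.names = TRUE)
  keep <- vapply(
    basename(files),
    function(f) any(startsWith(f, paste0(htids, "."))),
    logical(1),
    USE.NAMES = FALSE
  )
  files[keep]
}

# Demonstration on empty placeholder files in a scratch directory.
demo_dir <- file.path(tempdir(), "clear-cache-preview")
vol_dir <- file.path(demo_dir, "mdp", "31003")
dir.create(vol_dir, recursive = TRUE, showWarnings = FALSE)
file.create(file.path(vol_dir, "mdp.39015008706338.csv.gz"))
file.create(file.path(vol_dir, "mdp.39015008706338.json.bz2"))
file.create(file.path(vol_dir, "other.txt"))

# Files that share the htid prefix; "other.txt" is not included.
to_check <- preview_cached_files("mdp.39015008706338", demo_dir)
basename(to_check)
```

Reviewing the returned paths first gives a chance to back out before anything is deleted.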
# \donttest{
dir <- tempdir()
htids <- c("mdp.39015008706338", "mdp.39015058109706")
cache_htids(htids, dir = dir, cache_type = "ef", attempt_rsync = TRUE)
#> 2 HTIDs have already been cached to csv.gz format.
#> All existing JSON files already cached to required formats.
#> # A tibble: 2 × 5
#> htid local_loc cache…¹ cache…² exists
#> <chr> <glue> <chr> <chr> <lgl>
#> 1 mdp.39015008706338 /tmp/RtmpdUz0R0/mdp/31003/mdp.39015… csv.gz ef TRUE
#> 2 mdp.39015058109706 /tmp/RtmpdUz0R0/mdp/31500/mdp.39015… csv.gz ef TRUE
#> # … with abbreviated variable names ¹cache_format, ²cache_type
# Clears the cached files (here only "csv.gz") but keeps the JSON files
deleted <- clear_cache(htids, dir = dir)
#> Now deleting 2 cached files in /tmp/RtmpdUz0R0 (../..)
deleted
#> /tmp/RtmpdUz0R0/mdp/31003/mdp.39015008706338.csv.gz
#> /tmp/RtmpdUz0R0/mdp/31500/mdp.39015058109706.csv.gz
# Also clears the JSON files
deleted <- clear_cache(htids, dir = dir, keep_json = FALSE)
#> Now deleting 2 cached files in /tmp/RtmpdUz0R0 (../..)
deleted
#> /tmp/RtmpdUz0R0/mdp/31003/mdp.39015008706338.json.bz2
#> /tmp/RtmpdUz0R0/mdp/31500/mdp.39015058109706.json.bz2
# }