Removes cached files for a set of Hathi Trust ids

clear_cache(
  htids,
  dir = getOption("hathiTools.ef.dir"),
  cache_type = c("ef", "meta", "pagemeta"),
  cache_format = c("csv.gz", "rds", "feather", "text2vec.csv", "parquet"),
  keep_json = TRUE
)

Arguments

htids

A character vector of Hathi Trust ids, a workset created with workset_builder, or a data frame with a column named "htid" containing the Hathi Trust ids whose cached files should be removed. If no files for these htids have been downloaded to dir (via rsync_from_hathi or get_hathi_counts) or cached there, there is nothing to remove.

dir

The directory where the downloaded Extracted Features files are to be found. Defaults to getOption("hathiTools.ef.dir"), which is just "hathi-ef" on load.

cache_type

Type of information to remove. The default is c("ef", "meta", "pagemeta"), which refers to the extracted features, the volume metadata, and the page metadata in dir. Specifying a subset removes only those types (e.g., cache_type = "ef" removes only the cached EF files, not their associated volume metadata or page metadata).

cache_format

The format of the cached EF files to remove. Defaults to c("csv.gz", "rds", "feather", "text2vec.csv", "parquet"), i.e., all formats.

keep_json

Whether to keep any downloaded JSON files. Default is TRUE; if FALSE, all JSON Extracted Features files associated with the set of htids are deleted as well.
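As a rough sketch of how these arguments combine, the calls below use only the arguments documented above. The htids and the directory are placeholders, clear_selected is a hypothetical helper (not part of hathiTools), and the calls are wrapped in a function so that nothing is deleted until it is called.

```r
# Placeholder inputs for illustration only.
htids <- c("mdp.39015008706338", "mdp.39015058109706")
ef_dir <- file.path(tempdir(), "hathi-ef")

# Hypothetical helper: wraps two selective calls to clear_cache().
clear_selected <- function(htids, dir) {
  # Remove only cached EF files in rds format; other formats,
  # metadata caches, and JSON files are left alone.
  removed_rds <- hathiTools::clear_cache(
    htids, dir = dir, cache_type = "ef", cache_format = "rds"
  )
  # Remove cached volume and page metadata using the default
  # cache_format; JSON files are kept because keep_json defaults to TRUE.
  removed_meta <- hathiTools::clear_cache(
    htids, dir = dir, cache_type = c("meta", "pagemeta")
  )
  c(removed_rds, removed_meta)
}

# Usage (deletes files, so only run it on a cache you can rebuild):
# clear_selected(htids, dir = ef_dir)
```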

Value

(Invisible) a character vector with the deleted paths.
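Because the value is returned invisibly, it has to be assigned to be inspected. One possible use, sketched here with a hypothetical helper named check_deleted (not part of hathiTools), is to confirm that none of the returned paths still exist on disk.

```r
# Hypothetical helper: report any returned path that still exists.
check_deleted <- function(deleted) {
  remaining <- deleted[file.exists(deleted)]
  if (length(remaining) > 0) {
    warning(length(remaining), " file(s) still present after clearing.")
  }
  # Return the leftover paths invisibly, mirroring clear_cache().
  invisible(remaining)
}

# Usage, assuming `htids` and `dir` are defined as in the Examples:
# deleted <- clear_cache(htids, dir = dir)
# check_deleted(deleted)
```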

Note

Warning! This function does not double-check that you want to delete your cache. It will go ahead and do it.
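If a safety prompt is wanted, one option is to wrap the call yourself. The following is a minimal sketch, assuming a hypothetical wrapper named confirm_clear_cache (not part of hathiTools) that asks for confirmation before passing its arguments on to clear_cache.

```r
# Hypothetical wrapper: ask before deleting anything.
confirm_clear_cache <- function(htids, ..., confirm = interactive()) {
  # htids may be a data frame with an "htid" column or a character vector.
  n <- if (is.data.frame(htids)) nrow(htids) else length(htids)
  if (confirm) {
    answer <- readline(
      sprintf("Delete cached files for %d htid(s)? [y/N] ", n)
    )
    # Anything other than an explicit yes aborts without deleting.
    if (!tolower(trimws(answer)) %in% c("y", "yes")) {
      message("Nothing deleted.")
      return(invisible(character(0)))
    }
  }
  hathiTools::clear_cache(htids, ...)
}

# Usage:
# deleted <- confirm_clear_cache(htids, dir = dir, keep_json = FALSE)
```

By default the prompt only appears in interactive sessions, so scripts behave exactly like a plain clear_cache call unless confirm = TRUE is passed.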

Examples

# \donttest{
htids <- c("mdp.39015008706338", "mdp.39015058109706")
dir <- tempdir()

cache_htids(htids, dir = dir, cache_type = "ef", attempt_rsync = TRUE)
#> 2 HTIDs have already been cached to csv.gz format.
#> All existing JSON files already cached to required formats.
#> # A tibble: 2 × 5
#>   htid               local_loc                            cache…¹ cache…² exists
#>   <chr>              <glue>                               <chr>   <chr>   <lgl> 
#> 1 mdp.39015008706338 /tmp/RtmpdUz0R0/mdp/31003/mdp.39015… csv.gz  ef      TRUE  
#> 2 mdp.39015058109706 /tmp/RtmpdUz0R0/mdp/31500/mdp.39015… csv.gz  ef      TRUE  
#> # … with abbreviated variable names ¹​cache_format, ²​cache_type

# Clears the cached files (here only csv.gz files exist) but keeps the JSON files

deleted <- clear_cache(htids, dir = dir)
#> Now deleting 2 cached files in /tmp/RtmpdUz0R0 (../..) 
deleted
#> /tmp/RtmpdUz0R0/mdp/31003/mdp.39015008706338.csv.gz
#> /tmp/RtmpdUz0R0/mdp/31500/mdp.39015058109706.csv.gz

# Also clears the JSON files

deleted <- clear_cache(htids, dir = dir, keep_json = FALSE)
#> Now deleting 2 cached files in /tmp/RtmpdUz0R0 (../..) 
deleted
#> /tmp/RtmpdUz0R0/mdp/31003/mdp.39015008706338.json.bz2
#> /tmp/RtmpdUz0R0/mdp/31500/mdp.39015058109706.json.bz2

# }