Connecting to the Hathi Trust Bookworm

Functions for connecting to the Hathi Trust Bookworm at https://bookworm.htrc.illinois.edu/develop/ and downloading word frequency data.

query_bookworm()

Queries the Hathi Trust Bookworm Server at https://bookworm.htrc.illinois.edu/develop/

Building Worksets using the Workset Builder 2.0

Functions for connecting to the Hathi Trust Workset Builder 2.0 at https://solr2.htrc.illinois.edu/solr-ef/ and downloading Hathi Trust IDs matching specific criteria and their metadata.

get_workset_meta()

Gets metadata for a set of Hathi Trust IDs

workset_builder()

Builds a Workset of Hathi Trust volume IDs by querying the Workset Builder 2.0

browse_htids()

Browses a set of Hathi Trust IDs interactively at the Hathi Trust digital library site
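
As a rough illustration of what browsing a set of IDs involves, the sketch below turns volume IDs into page-turner links. It is written in Python as a language-neutral sketch and is not the implementation of browse_htids(). The URL pattern, the helper names htid_to_url() and browse(), and the example IDs are assumptions made for illustration.

```python
import webbrowser
from urllib.parse import quote

# Assumption: the public Hathi Trust page-turner URL pattern.
PAGE_TURNER = "https://babel.hathitrust.org/cgi/pt?id="


def htid_to_url(htid: str) -> str:
    """Return a page-turner URL for one volume ID (hypothetical helper)."""
    # Percent-encode characters such as ':' and '/' that occur in some IDs.
    return PAGE_TURNER + quote(htid, safe="")


def browse(htids, limit: int = 5) -> None:
    """Open at most `limit` volumes in the default web browser."""
    for htid in list(htids)[:limit]:
        webbrowser.open(htid_to_url(htid))


# Illustrative IDs only; they have not been checked against the catalogue.
example_ids = ["mdp.39015001796443", "uc2.ark:/13960/t4qj7cb2f"]
urls = [htid_to_url(h) for h in example_ids]
```

Capping the number of opened tabs with a limit is a design choice in this sketch, so that a large workset does not open hundreds of browser windows at once.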

Syncing Hathi Trust Extracted Features files

Functions for syncing selected Extracted Features files to your local machine from the Hathi Trust rsync server and caching them in a fast-loading format.

htid_to_rsync()

Converts a list of Hathi Trust IDs (htids) to relative paths for rsync to download
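
To make the ID-to-path conversion concrete, here is a minimal Python sketch. It assumes the "stubbytree" directory layout used by the Extracted Features 2.0 release, in which the volume part of the ID is cleaned for the filesystem and every third character of the cleaned ID forms the directory name. The function names and the example ID are hypothetical, and the output may not match what htid_to_rsync() produces.

```python
# Filesystem-safe substitutions applied to the volume part of an ID
# (the same character mapping used by the pairtree convention).
_CLEAN = str.maketrans({":": "+", "/": "=", ".": ","})


def clean_volume_id(volume_id: str) -> str:
    """Replace characters that are awkward in file names."""
    return volume_id.translate(_CLEAN)


def htid_to_stubbytree_path(htid: str, suffix: str = ".json.bz2") -> str:
    """Build a relative path for one ID, assuming the stubbytree layout."""
    # The library prefix is everything before the first '.'.
    library, volume = htid.split(".", 1)
    cleaned = clean_volume_id(volume)
    # The stub directory is every third character of the cleaned volume ID.
    stub = cleaned[::3]
    return f"{library}/{stub}/{library}.{cleaned}{suffix}"


# Illustrative ID only.
path = htid_to_stubbytree_path("mdp.39015001796443")
# path == "mdp/31094/mdp.39015001796443.json.bz2"
```

A list of such paths, one per line, is the kind of input that rsync accepts through its --files-from option.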

rsync_from_hathi()

Downloads Hathi Trust Extracted Features files from the Hathi Trust rsync server

cache_htids()

Caches downloaded JSON Extracted Features files in another, faster-loading format

clear_cache()

Removes cached files for a set of Hathi Trust IDs

find_cached_htids()

Finds cached Extracted Features files for a set of Hathi Trust IDs

read_cached_htids()

Reads cached Extracted Features files for a set of Hathi Trust IDs

Downloading and working with a big hathifile

Functions for downloading and working with one of the big ‘hathifiles’ (metadata for over 17 million records) at https://www.hathitrust.org/hathifiles.

download_hathifile()

Downloads the Hathi Trust big hathifile

load_raw_hathifile()

Loads the raw hathifile into memory

add_imputed_date()

Adds an imputed date to the hathifile data

Working with a single Hathi Trust Extracted Features file

Functions for downloading and working with the Extracted Features file for a single Hathi Trust volume.

get_hathi_counts()

Reads the downloaded Extracted Features file for a given Hathi Trust ID

get_hathi_meta()

Reads the volume-level metadata of a single downloaded Hathi Trust Extracted Features file

get_hathi_page_meta()

Reads the page-level metadata of a single Hathi Trust Extracted Features file

Datasets

fiction

Fiction Dataset

drama

Drama Dataset

poetry

Poetry Dataset

iso639

ISO 639 language codes