
HathiTrust: What’s in it, and what’s in it for us?

September 30th, 2013 by Rachel Fox Von Swearingen

Librarians constantly strive to give their users the perfect collection with perfect access. Reaching this utopian goal assumes both the complete availability of information (ideally, everything in existence) and ideal access, including full-text indexing, bibliographic information, controlled vocabulary, and even electronic availability.

Our quest is corrupted by the commercial nature of the electronic information environment. Print collections afford libraries the peace of mind of real ownership, but preserving the physical volumes is costly. We also risk irrelevance in the modern information world if we do not offer the improved indexing and digital content available in licensed resources, assuming that equivalent electronic access to our print collection is even available. Libraries are bound by what is for sale or rent, not by what our users actually need or want. What is an information professional to do? At Syracuse University, one answer was to join HathiTrust.

HathiTrust is many things:

  • It is a partnership, built from over 80 (and growing) partner institutions and libraries from around the world.
  • It is a digital preservation repository, providing guaranteed long-term preservation of and access to public domain volumes, as well as any titles owned by SU Libraries that are brittle or otherwise unusable, as covered under Section 108 of the U.S. Copyright Act.
  • It is a public digital library, providing full-view access to public domain content for the whole world.
  • It is an access platform, providing traditional library catalog search, full-text search, and other features to all users. Additionally, it facilitates access to print materials for users with print disabilities at SU.
  • It is a research center, providing datasets of its content, a research portal, and a variety of data services for researchers.

We at Syracuse University are still getting to know the content of HathiTrust’s digital library. One of the most difficult things to explain is what, exactly, is in HathiTrust. To unpack this, let’s discuss some common questions about content.

What formats are digitized in HathiTrust?
Collection content is formed from deposits made by partner institutions and shaped by strategic direction from the collections committee. The content is therefore equivalent to what you might find in an academic research library, including books, serials, journals, government documents, musical scores, manuscripts, and more. Details on contributions by institution can be found in HathiTrust’s monthly updates.

How big is HathiTrust?
Currently, HathiTrust contains more than 10.8 million volumes. Statistics are updated daily on the HathiTrust website. For context, Syracuse University Libraries holds just over 3 million print volumes, and Harvard University holds about 17 million volumes.

This HathiTrust blog post from 2012 has an excellent timeline outlining contributions and growth in the digital library.

HathiTrust is just for public domain items, right?
If you’re only talking about full-text viewing and downloading (evoking a comparison with an e-book database), then yes. Works in the public domain or otherwise not protected by copyright are what is provided in full view (see HathiTrust’s rights determinations).

But if you’re talking about full-text indexed content and what is being preserved in a non-commercial environment, then it is certainly not a public-domain-only resource. Only 32% of HathiTrust’s content is in the public domain, but 100% is ripe for discovery. Check out HathiTrust’s visualizations for more statistics.


Isn’t the content the same as Google Books?
HathiTrust began with digital preservation copies that came from the Google Books project. However, it has developed into a separate entity with many additional access features and some overlapping content. Unique content is increasing daily through deposits from the Internet Archive as well as partner digitization projects. For a full comparison, see this great chart from Dartmouth comparing Google Books and HathiTrust.

How useful is HathiTrust for X discipline?
Go to HathiTrust whenever you would go to the library catalog, regardless of your research discipline. HathiTrust content is distributed over all Library of Congress classification areas, just like the print collection in an academic research library. Bear in mind the commercial nature of information: HathiTrust does not contain licensed e-journal content from last week, nor proprietary industry or marketing data. But neither does a library print collection.


Image caption: Screen capture of HathiTrust visualization by call number from

How is HathiTrust different from the library catalog?
HathiTrust does not replace the library catalog. It does, however, improve the discovery of print items in SU Libraries’ collections (and leads to the discovery of other items) by providing full-text indexing and searching across HathiTrust’s full library of 10.8 million volumes.

Currently, about half of the print volumes owned by SU Libraries are full-text indexed in HathiTrust. So, use the library catalog to search the bibliographic data (author, title, publisher, etc.) of SU Libraries’ collections, but also search HathiTrust to find more.

Syracuse University Libraries invites you to investigate for yourself what is in HathiTrust. For more information on logging in, search tips, viewing and downloading full text, and other features, please visit our HathiTrust research guide at

