How Archival Book Scans Are Fueling New Waves of Literary Remixing
In October 2004, Google announced the Google Print program, subsequently renamed Google Books, with the stated ambition of scanning and making searchable every book ever published. The technical achievement was real: by 2015, when the Second Circuit upheld the program as fair use (the US Supreme Court declined to hear the Authors Guild's final appeal the following year), Google had scanned approximately 25 million volumes from partner libraries including the University of Michigan, Harvard, Stanford, the New York Public Library, and the Bodleian at Oxford. The legal status of those scans remained disputed for over a decade. The cultural consequence, the sudden availability of an unprecedented body of textual material that could be searched, excerpted, and recombined, has been reshaping literary culture in ways that copyright law and traditional publishing economics are still struggling to process.
The Archives
Project Gutenberg, founded by Michael Hart in 1971 when he typed the Declaration of Independence into a mainframe computer at the University of Illinois, is the oldest digital library and the foundational institution of the public domain text ecosystem. Its catalog now exceeds 70,000 items, all in the public domain and freely downloadable in multiple formats. The archive's primary value is not recreational: these are texts whose cultural significance is established, reproduced with minimal editorial apparatus. For writers interested in the public domain as a creative resource, however, they are also a library of raw material of extraordinary historical range.
The Internet Archive's Open Library operates on a different model: it holds digital scans of physical books, including copyrighted works, and lends them one copy at a time on the same principle as a physical library. The legal theory, Controlled Digital Lending, has been challenged in court by publishers; in 2023 the Internet Archive lost a significant ruling, which found that its practice of lending scanned books infringed copyright. The case raised fundamental questions, still unresolved, about the relationship between physical ownership, digital access, and the rights of libraries.
The HathiTrust Digital Library, a partnership of academic and research libraries, holds over 17 million digitized items, of which approximately 40% are in the public domain and available in full text. The remainder are searchable, but because they remain under copyright their text cannot be viewed in full. HathiTrust's scale makes it among the most comprehensive digital resources for literary research.
The Remix Phenomenon
The most commercially visible example of literary remixing enabled by the availability of public domain texts is Seth Grahame-Smith's Pride and Prejudice and Zombies (Quirk Books, 2009), which wove a zombie narrative into the text of Jane Austen's novel, retaining most of the original alongside interpolated horror material. The book sold over 1 million copies and spawned a genre, the "monster mashup," that produced dozens of imitators and a 2016 film adaptation. Its success demonstrated that public domain texts were not merely archival material but active literary resources with commercial potential.
More sophisticated remixing practices operate through literary and academic channels. The Electronic Literature Organization maintains an archive of digital works that engage with archival material through algorithmic, collage, and generative techniques. Writers including Amaranth Borsuk and Nick Montfort have produced works that use public domain texts as generative source material in ways that raise genuine questions about authorship, attribution, and the ontology of the literary work.
The Legal and Ethical Frame
The Google Books litigation did not end in a settlement. It ended with a 2015 Second Circuit ruling that Google's scanning program was a fair use, left in place when the Supreme Court declined to hear the final appeal in 2016. That ruling established that digitizing books for the purpose of creating a searchable index constituted transformative use under copyright law. What it did not establish was any clarity about the many downstream uses enabled by that index. The problem of orphan works (books whose copyright holders cannot be identified or located), which the US Copyright Office continues to study, remains one of the clearest practical obstacles to making archival material fully available for creative reuse.