Google Partners with Oxford, Harvard & Others to Digitize Libraries

By Gary Price, News Editor
December 14, 2004

Google is working closely with five new content partners on a massive scanning project that will bring millions of volumes of printed books into the Google Print database.

Google is working with libraries at the University of Michigan, Harvard University, Stanford University, Oxford University and the New York Public Library to digitize books in their collections and make them accessible via Google Print. Google Print was expanded in October and allows publishers to make scanned copies of books available through Google. (See: Google Print Opens Widely To Publishers)

At the University of Michigan, the plan is to scan seven million titles over a six-year period using a non-destructive scanning technology that Google has developed. The university will also be given a copy of each file to use as it sees fit. A "digitize the complete library" arrangement is also the current plan at Stanford and Oxford, and the New York Public Library will also be running a pilot project.

Harvard's involvement in the program is a "pilot project," according to Peter Kosewski, director of Publications and Communications, Harvard University Libraries. For now, Harvard is allowing Google to digitize 40,000 titles. The university wants to use the project to learn about large-scale digitization projects. The first set of materials will come from the Harvard Depository. The total size of the Harvard book collection is over 15 million volumes.

Google will begin the scanning process with a focus on out-of-copyright content. Product Manager Adam Smith said that many variables come into play in deciding what order to scan material in, including the way it might be shelved in these libraries.

Google stressed that today's announcement simply introduces the partnerships. In fact, just a small number of scanned items from either library are currently available in the Google Print database.
Google has no plans to introduce a Google Print "only" search interface. Google Print results appear in the "OneBox" area at the top of Google search result pages, in much the same way that news headlines or products from Froogle appear in response to relevant queries. However, tools have been created to help isolate Google Print material.

Books that are scanned from either library's collection will also have a direct link to find the book in a local library (along with links to purchase the book) using OCLC Open WorldCat data. Other books (materials not scanned from the library collections) will not have the "Find it in A Library" link available. Searching by subject (using a controlled vocabulary) is not available, at least at launch.

"In-copyright" books that are in these collections will have basic bibliographic information available, but the full text will not be accessible. Smith told us that out-of-copyright material will be available in full text, though printing will be disabled when viewing this content.

All books will be scanned by Google, in many cases on-site. "Both parties will work conservatively within the laws of copyright," Smith said. Material is scanned into image files, though Google declined to discuss specific file or viewing formats. Google developed this scanning technology for the Google Catalogs project, which has remained dormant for most of this year.

Although Google has no current plans to include material from other libraries, Smith said that the company would be happy to talk with libraries interested in potentially participating in the program.

This is a massive digitization project, and it will be very interesting to monitor how the work progresses over the next year. It will also be interesting to see if other web search companies (Yahoo, MSN, Ask Jeeves) partner with libraries and repositories of printed content.

Other Sources For Full Text Books Online

Placing full-text books on the web is not a new idea.
Many services, both free and fee-based, allow you to access books online. The longest-running such service is Project Gutenberg, founded by Michael Hart in 1971, with over 13,000 books available.

I wrote about The Online Books Page for SearchDay last year. This wonderful collection has been online for more than 10 years, and currently provides searchable access to over 20,000 free full text books. The OBP is edited by John Mark Ockerbloom, a digital library planner at the University of Pennsylvania.

The Internet Archive is also digitizing books. The goal of the Million Book Project is to "create a free-to-read, searchable digital library the approximate size of the combined libraries at Carnegie Mellon University, and one much bigger than the holdings of any high school library."

One publisher that makes a large portion of its new and old material available online, free, searchable, and in full image, is The National Academy Press. It currently offers access to more than 3000 publications.

Two fee-based services are NetLibrary and ebrary. NetLibrary offers access to about 76,000 books, with about 1300 new titles added each month. You can access NetLibrary books through your local public or university library, often at no charge. ebrary provides access to more than 50,000 titles (books, maps, sheet music, etc.). Like NetLibrary, ebrary licenses its service to libraries and educational organizations, and users can log in and access it from any computer with web access, in most cases for free.

~~

Details, details: such is the Achilles' heel of the visionary temperament. When Google put forth a massive online literary digitization effort, the scholar, the literati, the purist self-educator, the mousey, buck-toothed, four-eyed little girl in all of us cheered the soon-to-be nearer reach of all those words. But visions, especially the grandiose, face the speed bumps, the hurdles, of real-world logic or, worse, lawyers.
Google Print Faces Copyright Hurdles

Editor's Note: Google Print for Libraries has opened a Pandora's Box of copyright issues with publishers around the world. Google says everything is covered under the provisions of Fair Use, the original publisher agreements in Print for Publishers, and the ruling Kelly v. Arriba Soft. Publishers dispute all of Google's assertions. Do you think Google has covered all its bases? Or are the publishers entitled to a separate, collective agreement despite what the library, as is, could do for business? Discuss at WebProWorld.

--------------------------------------------------------------------------------

Two major associations of publishers have sent letters to Google demanding that it halt the digitization project, which involves scanning the entire text of copyrighted material, until all pertinent questions are answered and a collective copyright agreement can be reached.

One letter, written by Peter Givler, executive director of the Association of American University Presses (AAUP), on behalf of his organization and several others, claims that Google Print for Libraries was sneaked in under provisions for the enthusiastically received Google Print for Publishers. But, the letter says, the library project was never mentioned in meetings about the Print for Publishers program, and news of the project was a huge surprise to everyone.

"...News of Google Print for Libraries came as a complete surprise. It had not been mentioned by Google representatives during any of the discussions they were having with our members, and Google's subsequent explanations of Google Print for Libraries have only increased that confusion and transformed it into mounting alarm and concern at a plan that appears to involve systematic infringement of copyright on a massive scale."

The letter outlines 16 pointed questions the respective associations would like answered.
The Association of Learned and Professional Society Publishers (ALPSP), a non-profit trade association representing 300+ publishers in more than 30 countries, wrote its own nasty letter to Google. In it, Chief Executive Sally Morris contends that the project is not covered by Fair Use/Fair Dealing, and requests a collective agreement with publishers. Less detailed and more pointed than the AAUP letter, the ALPSP letter contains some chastising remarks.

"We cannot believe that a business which prides itself on its cooperation with publishers could seriously wish to build part of its business on a basis of copyright infringement," wrote Morris.

Quick Overview of Google Print for Libraries

Announced in December of 2004, Google Print for Libraries is a plan to digitize the entire collections of several prominent US libraries and one English library. In the ten-year, $200 million project, material for scanning would be provided by Harvard, Stanford, Oxford, the University of Michigan, and the New York Public Library. Google says the goal is to provide a "virtual card catalog of all books in all languages," while respecting authors' and publishers' copyrights.

The search giant addresses copyright issues with self-described "conservative" measures. All books published in the US before 1923 will be considered books in the public domain. The entire text of these books will be available online without worry of copyright infringement. Because of various international copyright laws, books published outside the US will be considered public domain only if they were published before 1900.

As for books published in 1923 or later in the US, and in 1900 or later outside the US, Google plans to provide "snippets" of text related to a search term. If a searcher wishes to have a complete copy of the book, the service will provide links to libraries and booksellers where the book can be found.

So What's All The Fuss About?

At first glance it would appear that Google has its bases covered here. They cite a precedent ruling, Kelly v.
Arriba Soft, which allows search engines to index copyrighted images already on the web. And as only snippets of copyrighted material are provided, along with a link to where the book can be purchased, one may conclude that the library would be good for business in the same way that Google Print for Publishers is good for business. After all, they use the same search-snippet-to-vendor-link method.

According to the publishers, the difference is that they've been left out of the process here, and they're crying foul when Google says the library project is under the same umbrella of principles as the publisher project.

Peter Givler says that while Google's effort is "enormously seductive," it has a fundamental flaw: the process involves the mass copying of copyrighted material.

"Copyright means the right to make copies, period," Givler said. "Copyright law can seem pretty byzantine and technical and elaborate and complicated, but at its simplest, that's what it is. It's the right to make copies."

Sally Morris concurs. "The law does not permit wholesale copying (which is what digitisation is) by a commercial organisation of works that are still in copyright," she wrote. "It is also illegal to make those works available digitally once they have been copied."

It is not only the legality of physically copying the material that concerns the publishers; it is also what happens to the works after they have been copied.

"Nobody has convinced us that this can't be hacked," says Kay Murray, general counsel for the Authors Guild.

Though the piracy concern doesn't seem to have been addressed, Google rejects the claim that it has not attempted to work with publishers on the matter.

Adam M. Smith, a senior business-product manager for Google, said, "We've actually gone out of our way to speak with everyone and have a very open, receptive conversation with [publishers].
We believe we're creating a product that is beneficial to publishers and to libraries -- that by allowing full-text search of the books that we would spur additional interest in books and in using books and in purchasing books in a way that will benefit all people that are interested in publishing generally."

If Google is sued and found liable for copyright infringement, the penalty could reach $150,000 per infringement. In a digitization effort involving millions of books, that could add up to one big headache.
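
For readers who want to see the numbers, the short Python sketch below restates two rules reported in this article: the publication-year cutoffs Google says it will use to decide between full text and snippets, and the per-infringement penalty figure. The function names and the one-million-book figure are illustrative assumptions, not anything Google or the publishers have published, and the result is only a rough upper bound that treats every book as a single infringement.

```python
# Cutoff years and penalty figure as reported in the article.
# All names below are hypothetical and exist only for illustration.
US_PUBLIC_DOMAIN_BEFORE = 1923
NON_US_PUBLIC_DOMAIN_BEFORE = 1900
MAX_PENALTY_PER_INFRINGEMENT = 150_000  # dollars


def display_mode(year_published: int, published_in_us: bool) -> str:
    """Return "full_text" for books treated as public domain, else "snippets"."""
    cutoff = US_PUBLIC_DOMAIN_BEFORE if published_in_us else NON_US_PUBLIC_DOMAIN_BEFORE
    return "full_text" if year_published < cutoff else "snippets"


def max_exposure(infringing_books: int) -> int:
    """Upper bound in dollars, assuming each book counts as one infringement."""
    return infringing_books * MAX_PENALTY_PER_INFRINGEMENT


if __name__ == "__main__":
    print(display_mode(1910, published_in_us=True))   # full_text
    print(display_mode(1910, published_in_us=False))  # snippets
    # Assumed count of in-copyright books, chosen only to show the scale.
    print(max_exposure(1_000_000))                    # 150000000000
```

Under these assumptions, a book published in 1910 would be shown in full if it was published in the US but only as snippets if it was published elsewhere, and one million infringements at the maximum figure would come to $150 billion.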