Google Books

Google Books (also Google Book Search) is the world's largest digital library. It is operated by the U.S. company Google Inc., which has set itself the goal of making the knowledge stored in the world's books available for full-text search, mainly through digitization. However, the full text is accessible for only some of the books. It is not known how many digitized books the company already holds.

Description

Google Books draws from two sources:

  • Google Print in the narrower sense, the uncontroversial project carried out in cooperation with publishers, and
  • Google Library, under which books are scanned in bulk at large academic libraries without prior permission of the copyright holders, which was and remains legally controversial.

History

In October 2004, Google Print was presented at the Frankfurt Book Fair (at a press conference with Google's founders Sergey Brin and Larry Page). In December 2004, search results from scanned books began to appear in the result lists of the English search interface Google.com. Google set out to scan 15 million books by 2015, which corresponds to about 4.5 billion pages.

Since April 2005, a separate search for the contents of the program has existed. In October 2005, user interfaces in German and other languages were presented at the Frankfurt Book Fair.

On 4 November 2005, the search page, now with an advanced search (queries by time period became possible), was officially presented. On 17 November 2005, Google announced the renaming of the service in the company's blog. Since then, requests to print.google.com have been forwarded to books.google.com.

In September 2008, Google announced that it would digitize newspapers together with North American newspaper publishers. The digitized versions were to be searchable, navigable in the web browser, and displayed as in the print edition, complete with photographs, headlines and advertisements.

A number of books are now also available at the Internet Archive through a partnership. Editions are offered there in different formats; for the PDF, the reader is referred to Google (where works published after 1864 are not available to non-US users, see Controversies).

In 2009 and 2012, the datasets for the Ngram Viewer were created in different languages from the corpus of Google Books.

Cooperation with publishers

Google receives books or PDF files from the publishers. The books are scanned and entered into the index as e-texts by means of OCR. Users can view only comparatively few pages of each book. After a few pages, only (free) registered users can view a number of further pages. A set of pages is blocked from access from the outset. Once the daily quota is exhausted, no more pages can be viewed. The table of contents, and not infrequently the index, are generally freely accessible.

Google tries to protect the contents through a kind of copy protection (so-called digital rights management). That this is not always fully applied can easily be seen in various books. Pages that have been viewed can afterwards be read out of the web browser's cache by certain methods and then merged into a PDF file with appropriate tools.

Cooperation with libraries

Since around 2005, Google has been scanning the complete collection of the library of the University of Michigan (over 7 million volumes) as well as large parts of the collections of Harvard University and Stanford University in the U.S., the New York Public Library and, in Europe, the Bodleian Library at Oxford University. The libraries of the University of Virginia, the University of Wisconsin-Madison, Princeton University, the University of California and the University of Texas at Austin also participate.

At the end of 2006, two further institutions joined the network of libraries whose books are digitized by Google: the National Library of Catalonia (Biblioteca de Catalunya) in Barcelona and the library of the Universidad Complutense of Madrid.

On 6 March 2007, the Bavarian State Library in Munich announced that it would be the first German library to cooperate with the project. About a million copyright-free works from its historical and special collections are to be digitized. Only the manuscripts and incunabula as well as rare and especially valuable historical prints are excluded from the digitization project.

In July 2008, the Bibliothèque Municipale de Lyon announced that it would be the first French library to have its books digitized.

On 15 June 2010, the Austrian National Library (ANL) announced that Google would digitize its copyright-free book holdings. The cost of digitizing 400,000 books is about 30 million euros and is borne by Google. ANL Director-General Johanna Rachinger described the project as one of the largest public-private partnerships in the Austrian cultural landscape. The 400,000 volumes from the 16th to the 19th century (with the exception of those books where conservation concerns suggest otherwise) are to be captured in full text; about 120 million pages will then be available online free of charge.

Fierce criticism from authors and publishers led Google to suspend the scanning of copyrighted books until November 2005. By that date, rights holders were to specify which books they did not want made accessible (opt-out solution). While Google relies on fair use under U.S. law and is supported by renowned lawyers, the publishers' and authors' associations demand that no book be included in the program without consent (opt-in). In October 2005, a lawsuit by authors and publishers against Google was filed in the United States.

Application in research

A paper published in Science in December 2010 reports on ways to use Google Books for the quantitative analysis of culture (culturomics). For their analyses, the scientists had about 4% of all books ever printed at their disposal. They converted the books into a massive database of the word sequences contained in them (n-grams). The approach is intended for research in various fields such as lexicography, the evolution of grammar, collective memory, technology adoption, fame, censorship, and historical epidemiology. Based on the database, the research team estimated, for example, that the size of the English vocabulary almost doubled in the last century. In another analysis, the cultural influence of Freud was compared with that of Darwin. Freud lost influence; Darwin overtook Freud in 2005.
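The idea of turning books into an n-gram database can be illustrated with a minimal sketch. The tokenizer, the function names and the two-entry corpus below are assumptions made for illustration only; they are not the method or the data used by the researchers or by Google.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def ngram_counts(text, n):
    """Count every run of n consecutive tokens in the text."""
    tokens = tokenize(text)
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)


# Hypothetical mini-corpus: publication year -> text of the books from that year.
corpus = {
    1900: "the origin of species and the descent of man",
    2000: "the origin of the web and the rise of search",
}

# One table of 2-gram counts per year, so usage can be compared over time.
table = {year: ngram_counts(text, 2) for year, text in corpus.items()}
```

Looking up a phrase such as `table[1900]["the origin"]` then gives its count for that year; dividing by the total number of n-grams in the year would give a relative frequency that can be compared across years.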

Controversies

Problems with the selection of digitized works

The historian Jean-Noël Jeanneney, former director of the French National Library, which runs a free European digitization project with Gallica, advocates that Europe set up an alternative to Google's digitization project. He is particularly critical of the hegemony of English at Google and of the cumulative effect (which he calls the "eye-catcher method"; the usual term is "ranking", see: PageRank), whereby, in the battle for the reader's attention, attention is deliberately concentrated on the top of the list. The stronger provider grows stronger still at the expense of the weaker. This, he argues, is especially important to Google for advertising. Jeanneney would counter this "capitalist" Google principle with a model in which the state has the final say in matters of cultural memory. Nineteen national and university libraries in Europe have signed the appeal of the French National Library to prevent an impending intellectual and cultural dominance of the United States.

The problem that Google Books, with its market dominance, crowds out alternatives through its selection practice is also recognized in Germany, especially in research on specialized areas such as local history or dialect research.

Problems with the search function

A systematic assignment of subject groups and keywords to books, as in library catalogs, does not take place. It is not possible to select books of a particular subject area. Google assumes that it is sufficient for thematic search to index all words in the books. However, entering a search term can only yield results in the language used. No account is taken of the fact that searches are often made across languages, and that a word can be used in several disciplines with different meanings.
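The limitation described above can be illustrated with a minimal sketch of a purely word-based index. The book titles, texts and function names are invented for illustration, and the code makes no claim about how Google's search is actually implemented.

```python
import re
from collections import defaultdict


def build_index(books):
    """Map every word that occurs in a book's text to the set of titles containing it."""
    index = defaultdict(set)
    for title, text in books.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(title)
    return index


def search(index, keyword):
    """Return the titles whose text contains the exact keyword, sorted by title."""
    return sorted(index.get(keyword.lower(), set()))


# Hypothetical books: title -> full text.
books = {
    "English dialect survey": "a survey of dialect and local history",
    "Deutsche Mundartforschung": "eine studie zu dialekt und heimatgeschichte",
    "Cell biology": "the cell is the unit of life",
    "Mobile networks": "each cell covers one area",
}

index = build_index(books)
english_only = search(index, "dialect")  # the German book on the same subject is missed
mixed_fields = search(index, "cell")     # biology and telephony are not distinguished
```

Without subject headings, the query "dialect" finds only the English title, while "cell" returns books from two unrelated disciplines.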

Problems with the text recognition

In 2007, Der Spiegel criticized the often poor OCR quality and the lack of metadata.

There are cases in which even the author's name was incorrectly recognized by the OCR, so that the work cannot be found under the author's name. The visual quality of the pages has also been criticized repeatedly. This concerns missing passages and the visible fingers of staff on the scanner.

In 2009, Google bought reCAPTCHA, a CAPTCHA service used on various websites worldwide, in order to have the automatic text recognition checked by people as a by-product.

Copyright

Members of the competing project, the Open Content Alliance, criticize Google's approach for taking no account of copyright.

In Germany, writers, publishers and scientists call in the Heidelberg Appeal for the protection of copyright against its erosion. The manifesto links two things: criticism of Google's book digitization and criticism of open access policy in general. This has led to a fragmentation of the critics of Google's rapid advances in its digitization project. The Heidelberg Appeal sees a major problem particularly in the so-called Google Book Settlement.

Google Book Settlement

The Google Book Settlement is a settlement proposal that Google Inc. worked out in a class action brought against it by American publishers and authors. If this settlement is concluded before the New York court, it will also affect non-US publishers and authors, since Google is accessible worldwide via the Internet. In addition, under U.S. law authors could no longer litigate against the terms of the settlement later if they had not previously excluded themselves from the class action by individual objection. Google could then make every work by German-speaking authors who have not filed an objection in the U.S. available for viewing in digital form on its platform, with no further legal recourse open to the authors.

In early May 2009, the final hearing on the Google Book Settlement was postponed from 11 July 2009 to 6 October 2009. The objection period for publishers and authors ("opt-out deadline") was extended from 5 May 2009 to 4 September 2009. For the German book market, VG Wort has drafted its own proposed arrangement. On the one hand, VG Wort criticizes aspects of this possible deal and is taking action against them before an American court. On the other hand, VG Wort is also working with Google on the planned implementation of the agreement.

On 1 September 2009, the German federal government criticized the settlement proposal. It demanded that at least a separate class be formed for German rights holders and that this class be excluded from the blanket agreement. In addition, it argued, Google's copyright infringement and its "do first, ask later" behavior hinder projects such as the European digital library Europeana, which respect copyright from the outset.

In the U.S., the American Society of Journalists and Authors criticized the agreement as an insider deal in favor of those involved. The FAZ also raised the suspicion of so-called "coupon settlements", in which self-appointed plaintiffs' attorneys negotiate an "agreement" with Google in order to obtain a considerable sum of money and a market-dominant position for Google.

On the occasion of an expert hearing held by the European Commission on 7 September 2009, Google announced that it would try to address the concerns of publishers and authors and to involve their representatives in the supervision of the Google Books project. In Europe, copyrighted books that are still available would not be scanned and made available online without express permission. At the same time, the European Commission announced its intention to change copyright law, since under the current legal situation only the United States would benefit from the advantages of digitization and online marketing.

In September 2011 it was announced:

"In a surprising move, authors' associations have filed a court action against their former allies, the universities, in which they demand that the universities give up their digital collections of books and cease cooperation with Google. The action opens a new round in the battle over digital libraries and comes in the same week in which the controversial "Google book settlement" is expected to be put to rest in court."

In November 2013, in the American case Authors Guild v. Google, the motion for a jury trial was rejected, and it was held at the same time that Google Books is in principle covered by the "fair use" principle of copyright law.
