The approach is to split scanned books into single pages so that the workload for each individual proofreader stays as low as possible. Following a brute-force method (here meaning that the largest possible number of participants each correct only a single page out of the thousands on offer), the project as a whole can still achieve the highest possible throughput.
The procedure follows the same principle as distributed computing. The crucial difference is that it is not a very large number of computers linked over the Internet, but an arbitrarily large number of people on the Internet who contribute their effort, so that hundreds of books can be digitized in a short time through their proofreading.
The approximately 1,400 currently active participants organize themselves voluntarily into teams by origin or interest; for example, the very active Team Germany has almost 300 members, who participate at all levels of DP.
The book digitization process
Basically, three phases can be distinguished in the process.
- In the initialization phase, a book is selected by an experienced, long-standing proofreader. The selected book must be free of copyright. The original project applies American copyright law (texts published up to 1922); Distributed Proofreaders Europe applies the rule, largely uniform across Europe, that the author of the book must have died more than 70 years ago.
- The initiator begins by scanning every page of the book. The scans cover the whole book: cover page, table of contents, text, and images.
- The pages are then analyzed by OCR software. This yields a first raw text, which is still highly error-prone.
- The data is then uploaded to the Distributed Proofreaders website and presented in the forum as a new project proposal for discussion. After a positive vote, the project is released for proofreading. It can then be opened from the home page, alongside the other projects, from anywhere in the world.
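The two copyright rules mentioned in the initialization phase can be sketched as simple checks. This is only an illustration: the function and constant names are hypothetical, and the comparison by calendar year is a simplifying assumption that ignores the legal details of how such terms are actually counted.

```python
# Hypothetical helper names; the thresholds are the ones stated in the text.
US_CUTOFF_YEAR = 1922        # original project: texts published up to 1922
EU_YEARS_AFTER_DEATH = 70    # DP Europe: author dead for more than 70 years


def eligible_us(publication_year: int) -> bool:
    """True if the text was published in or before the US cutoff year."""
    return publication_year <= US_CUTOFF_YEAR


def eligible_europe(author_death_year: int, current_year: int) -> bool:
    """True if more than 70 calendar years have passed since the author died.

    Assumption: whole calendar years are compared; real copyright terms
    are computed more precisely.
    """
    return current_year - author_death_year > EU_YEARS_AFTER_DEATH
```

For example, `eligible_us(1920)` accepts a book published in 1920, while `eligible_europe(1950, 2011)` rejects a book whose author died in 1950.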
Phases of proofreading
Rounds 1 to 3 of proofreading ("proofing")
After the project is opened, one page of the book is displayed at a time. The scanned original page is shown (as an image) in the upper half of the screen, and the text recognized by OCR in the lower half. The proofreader reads the text of the original page and compares it with the OCR text (the source text). In the process, scanning errors are corrected and special characters are added.
This actual proofreading ("proofing") takes place in two or three rounds, with each page being processed by two different participants. Only experienced proofreaders are admitted to the higher rounds.
Rounds 4 and 5 ("formatting")
In the fourth and fifth rounds, formatting is added (for example italics, headings, and footnotes). While the barriers to entry for the fourth round are relatively low, only experienced users have access to the fifth round (the second formatting round).
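The way a single page moves through the five rounds described above can be sketched as a small state machine. This is a minimal sketch, not the software Distributed Proofreaders actually runs: the round labels, the `Page` class, and the `submit` function are hypothetical, the rule that no participant handles the same page twice is an assumption, and the experience requirements for the higher rounds are not modeled.

```python
from dataclasses import dataclass, field

# Hypothetical labels: three proofing rounds followed by two formatting rounds.
ROUNDS = ["P1", "P2", "P3", "F1", "F2"]


@dataclass
class Page:
    number: int
    text: str                     # current state of the OCR text
    round_index: int = 0          # index into ROUNDS of the next round
    worked_by: list = field(default_factory=list)

    @property
    def done(self) -> bool:
        """True once the page has passed every round."""
        return self.round_index >= len(ROUNDS)


def submit(page: Page, user: str, corrected_text: str) -> None:
    """Record one participant's pass over a page and advance it one round."""
    if page.done:
        raise ValueError("page has already passed all rounds")
    if user in page.worked_by:
        # Assumption: the same person never handles a page twice.
        raise ValueError(f"{user} has already worked on page {page.number}")
    page.text = corrected_text
    page.worked_by.append(user)
    page.round_index += 1
```

A page would be created from the raw OCR text, for example `Page(number=1, text="Tbe quick brown fox")`, and each call to `submit` with a different participant's corrected text moves it one round further.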
Final processing ("post-processing")
The previously unconnected pages of the source text are automatically combined into a single text document. An experienced proofreader who has attained the status of "post-processor" then completes the layout with the illustrations, that is, adjusts them, improves them, or fills in any gaps in the text. The post-processor checks the document for full agreement with the original work. Finally, in addition to the mandatory plain-text format, the post-processor may produce further formats, especially HTML.
The project is then complete. The digitized work is published on the Project Gutenberg server (not to be confused with the commercial provider Projekt Gutenberg-DE). Any Internet user can now download and read the work, which is thus available to the whole world.
Importance of Distributed Proofreaders
Over time, Distributed Proofreaders (DP) developed into the largest source of e-texts for Project Gutenberg, and in 2002 it became an official part of Project Gutenberg. As of January 2011, approximately 19,500 texts had been published by Distributed Proofreaders. The texts are not limited to specific subject areas; literature, science, music books, magazines, and popular fiction are all represented, to name just a few.
On 9 March 2007, Distributed Proofreaders announced the completion and publication of its first 10,000 titles. To celebrate this and to demonstrate the variety of subjects covered by DP books, a selection of 15 titles was published together:
- Slave Narratives, Oklahoma (A Folk History of Slavery in the United States From Interviews with Former Slaves)
- Eighth annual report of the Bureau of ethnology. (1891 N 08 / 1886-1887)
- R. Caldecott's First Collection of Pictures and Songs
- Como atravessei Àfrica (Volume II)
- Punch 10/27/1920
- Sylva, or, A Discourse of Forest-Trees
- Encyclopedia of Needlework
- The annals of the Cakchiquels
- The Shanty Book, Part I, Sailor Shanties (1921)
- Le marchand de Venise
- Agriculture for beginners, Rev. ed
- Species Plantarum (Part 1)