CuneiForm (software)

CuneiForm (named after cuneiform script) is optical character recognition (OCR) software for printed documents, developed by the Russian company Cognitive Technologies and now available as free software.

Features

CuneiForm recognizes printed text, but not handwriting or similar material, using language models for more than 20 languages. It copes well with fairly complex table structures. Results can be saved as RTF, HTML, ASCII text or an Excel spreadsheet, or exported directly to Word or another word processor. The software preserves document structure and fonts and supports batch processing.

History

CuneiForm was once the market leader in Russia (in competition with ABBYY's FineReader) and was bundled with some scanners.

In 1993, Cognitive Technologies signed an OEM agreement with the Canadian Corel Corporation, which allowed the recognition library to be included in the Corel Draw package from version 3.0 onward.

In 1996, CuneiForm'96 was released. It was the first text recognition package to use an adaptive recognition method, i.e. a method that combines multi-font and omnifont recognition: during recognition, an internal model of the fonts used in the document is built from those characters that appear in easily recognizable quality. As a result, poorly printed characters can also be recognized in context, because the software adapts dynamically as it works. This method considerably increases recognition accuracy.

In 1997, neural networks were introduced into the recognition process.

Since 1999, the software has been able to preserve the appearance of the original document by reconstructing the arrangement of its elements in the output.

As part of a program with the declared aim of making text recognition technology available to everyone, Cognitive Technologies announced on 2 April 2008 that it would eventually release the software entirely as free software. As a first step, after several years without development progress, a freeware version had been released on 12 December 2007. In addition, a free web-based text recognition service was set up in June 2008.

As investor and project coordinator, Cognitive Technologies aims to promote the development of a new version of the software. Since the beginning of April 2008, the core of the recognition engine has been freely available under the simplified BSD license, which also permits commercial use. On 30 August 2009, the original user interface was made public as well.

Cuneiform Linux

Jussi Pakkanen created a portable version of the software that can be compiled on multiple platforms and runs on Linux, BSD, Mac OS X and Windows. These independent developments are eventually to be integrated into the main branch maintained by Cognitive Technologies. It is a purely command-line version. Through integration with ImageMagick it can read a variety of file formats, whereas otherwise only uncompressed Windows Bitmap (BMP) is supported. Since version 0.5, the software can also produce output in the hOCR markup format.
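hOCR is HTML in which each recognized element carries its position in a title attribute of the form "bbox x0 y0 x1 y1". As a minimal sketch of how such output can be consumed, the following Python code collects the bounding boxes from an hOCR string using only the standard library. The sample markup and the names HocrBoxes and extract_boxes are illustrative assumptions, not part of CuneiForm, and real output may contain further properties.

```python
import re
from html.parser import HTMLParser

# hOCR stores coordinates as "bbox x0 y0 x1 y1" inside the title attribute.
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class HocrBoxes(HTMLParser):
    """Collects (class, bbox) pairs from elements whose class starts with 'ocr'."""

    def __init__(self):
        super().__init__()
        self.boxes = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        css_class = attributes.get("class") or ""
        match = BBOX_RE.search(attributes.get("title") or "")
        if css_class.startswith("ocr") and match:
            bbox = tuple(int(value) for value in match.groups())
            self.boxes.append((css_class, bbox))


def extract_boxes(hocr_text):
    """Return all (class, (x0, y0, x1, y1)) pairs found in an hOCR string."""
    parser = HocrBoxes()
    parser.feed(hocr_text)
    return parser.boxes


# Hand-written sample for illustration only, not actual CuneiForm output.
SAMPLE = """<html><body>
<div class="ocr_page" title="bbox 0 0 800 600">
<span class="ocr_line" title="bbox 10 20 300 45">Hello world</span>
</div></body></html>"""

for css_class, bbox in extract_boxes(SAMPLE):
    print(css_class, bbox)
```

The coordinates are pixel positions in the source image, which is what allows other tools to place invisible text behind a scanned page.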

Frontends

  • YAGF is a Qt 4-based graphical user interface that can read images directly from a scanner via xsane and perform spell checking with libaspell.
  • Cuneiform-Qt is another Qt-based frontend.
  • OCRFeeder provides a complete desktop OCR solution (scanning, image analysis, editing, layout preservation, proofreading, ...) that can use CuneiForm, among other engines, as a backend.
  • WatchOCR is a free OCR server for PDFs. It uses CuneiForm to create searchable PDFs from PDFs that contain scanned images. Through a web interface, WatchOCR can be configured to automatically convert new scanned PDFs in a specific folder into searchable PDFs. WatchOCR is available as an Ubuntu-based live CD and preconfigured in Deb format.

Using a script (xsane2cunei), CuneiForm can also be integrated into the scanning software xsane. With the command-line program hocr2pdf, searchable PDF files can be produced from images and the hOCR output of CuneiForm. The command-line tools pdfsandwich and pdfocr automate this process. The document management system Archivista also uses CuneiForm and hocr2pdf to make PDFs machine-searchable.
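The image-to-searchable-PDF step described above can be sketched as a small Python wrapper around the two command-line programs. This is only an illustration under stated assumptions: the options shown (-l, -f and -o for cuneiform, -i and -o for hocr2pdf, with the hOCR fed on standard input) reflect common usage of these tools and should be checked against the installed versions. The function names are hypothetical.

```python
import subprocess
from pathlib import Path


def cuneiform_command(image_path, hocr_path, language="eng"):
    """Build the cuneiform call that writes hOCR (option names are assumed)."""
    return ["cuneiform", "-l", language, "-f", "hocr", "-o", hocr_path, image_path]


def hocr2pdf_command(image_path, pdf_path):
    """Build the hocr2pdf call; the hOCR itself is supplied on standard input."""
    return ["hocr2pdf", "-i", image_path, "-o", pdf_path]


def make_searchable_pdf(image_path, pdf_path, language="eng"):
    """Run both steps; requires cuneiform and hocr2pdf to be installed."""
    hocr_path = str(Path(pdf_path).with_suffix(".hocr"))
    subprocess.run(cuneiform_command(image_path, hocr_path, language), check=True)
    with open(hocr_path, "rb") as hocr_file:
        subprocess.run(hocr2pdf_command(image_path, pdf_path),
                       stdin=hocr_file, check=True)


if __name__ == "__main__":
    # Only prints the commands, so the sketch runs without the tools installed.
    print(" ".join(cuneiform_command("page.bmp", "page.hocr")))
    print(" ".join(hocr2pdf_command("page.bmp", "page.pdf")) + " < page.hocr")
```

Keeping command construction separate from execution makes it possible to inspect or log the exact calls before running them on a batch of scans.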
