Nutch

Nutch is a Java framework for Internet search engines. The software is open source and is developed by the Apache Software Foundation under the Apache License. Nutch builds on, among other projects, Lucene (stemming, indexing, etc.), Solr (web functionality) and Hadoop (scaling).

Nutch can search arbitrarily large amounts of data. Its plug-in architecture allows it to be adapted to company-specific needs, for example to support additional document formats.
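
The idea of adding document formats through plug-ins can be illustrated with a minimal sketch. The code below is not Nutch's actual plug-in API; the names `DocumentParser`, `ParserRegistry` and the MIME-type lookup are hypothetical and chosen only to show the general pattern, in which the core looks up a handler per format and new formats are added without changing the core.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class PluginSketch {

    /** Hypothetical extension point: converts raw bytes of one format to plain text. */
    interface DocumentParser {
        String parse(byte[] content);
    }

    /** Hypothetical registry that maps a MIME type to the plug-in handling it. */
    static final class ParserRegistry {
        private final Map<String, DocumentParser> parsers = new HashMap<>();

        /** Registers a plug-in for the given MIME type, replacing any earlier one. */
        void register(String mimeType, DocumentParser parser) {
            parsers.put(mimeType, parser);
        }

        /** Returns the extracted text, or empty if no plug-in handles the format. */
        Optional<String> parse(String mimeType, byte[] content) {
            return Optional.ofNullable(parsers.get(mimeType))
                           .map(p -> p.parse(content));
        }
    }

    public static void main(String[] args) {
        ParserRegistry registry = new ParserRegistry();

        // A plug-in for plain text: decode the bytes as UTF-8.
        registry.register("text/plain",
                bytes -> new String(bytes, StandardCharsets.UTF_8));

        // A second plug-in adds a new format without touching the registry code.
        registry.register("text/csv",
                bytes -> new String(bytes, StandardCharsets.UTF_8).replace(',', ' '));

        byte[] csv = "a,b,c".getBytes(StandardCharsets.UTF_8);
        System.out.println(registry.parse("text/csv", csv).orElse("(unsupported)"));
        System.out.println(registry.parse("application/pdf", csv).orElse("(unsupported)"));
    }
}
```

In this sketch, an unsupported format yields an empty result rather than an error, so the core can skip documents for which no plug-in is installed.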
