YaCy

YaCy (from "Yet another Cyberspace", a homophone of the English "ya see") is a search engine that works on the peer-to-peer (P2P) principle. There is no central server; all participants are equal.

Installing YaCy provides a local proxy. All websites retrieved through this proxy, as well as other data supplied by plugins, are indexed locally and can be searched by the user through the YaCy web interface. This index is then (optionally) distributed redundantly to other peers of the global YaCy network, which produces a global index. A global search queries this global index, which is made up of the peers that are currently online. This decentralization makes YaCy resistant to failures.

A peer's own index (and thus, indirectly, the global one) can be extended by launching the peer's own web crawler. Alternatively, separate YaCy-based networks can be configured that form a shared index of their own; one example is Sciencenet.

The YaCy project was founded by Michael Christen in 2003. The search engine is used, for example, at the Johannes Gutenberg University Mainz.


Advantages and Disadvantages

Benefits

  • A global search engine built with YaCy would be virtually fail-safe, because some part of the network will always be available.
  • With YaCy as their search engine, internet users are independent of companies, of their rankings (placement in which can possibly be bought), and of their censorship.
  • The software is open source, released under the GNU General Public License, and free of charge.
  • Because indexing takes place via the proxy on each client, YaCy can also index pages from the deep web or from non-public networks (e.g. I2P) that the crawler of a public search engine such as Google cannot reach.
  • YaCy does not require participation in the public YaCy cluster. It can also be used, for example, as a search engine in private networks (e.g. a company intranet) or as a private search engine over pages the user has visited (and indexed).

Disadvantages

  • Because YaCy has to contact other peers for a search query and verifies search results by reloading each hit in order to avoid spam, a search takes longer than with conventional search engines.
  • If only a few peers exist, fewer results can be found than with the major search engines. The failure or shutdown of individual (large) peers can cause further degradation. With the release of version 1.0 in November 2011, however, growing awareness raised the number of peers to approximately 1000, so that this disadvantage can currently be neglected.
  • The YaCy protocol works with individual HTTP requests, so it has a higher latency than UDP or TCP with persistent connections.
  • Search queries are stored on the queried peer only temporarily, in RAM, for caching purposes. The hash function used to encode the search words serves mainly to control the distributed hash table (DHT); with a dictionary, some search words can be recovered, revealing the queries in plain text.
  • In theory, spammers could operate their own peers that return spam as results. False search results are nevertheless not possible, because a peer verifies each hit by reloading the result before displaying it.

The program

Unlike other search engines, the heart of YaCy is not a central site but a computer program that runs on almost all operating systems. Searches are made through a local web page served by the installed program. Results are displayed, as usual, as an HTML page.

Coupled with the P2P system is an optional proxy server that automatically indexes the pages visited. Pages are not indexed when further data is passed via GET or POST, or when cookies or HTTP authentication are used (such as pages in a log-in area). This ensures that only genuinely publicly accessible data is indexed.
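The indexing rule described above can be illustrated with a minimal Python sketch. It is not YaCy's actual code: the ProxiedRequest type and the is_indexable function are hypothetical names introduced here, and the checks are only one plausible reading of the rule.

```python
from dataclasses import dataclass, field
from urllib.parse import urlsplit


@dataclass
class ProxiedRequest:
    """Hypothetical, simplified view of a request passing through the proxy."""
    method: str
    url: str
    headers: dict = field(default_factory=dict)
    body: bytes = b""


def is_indexable(req: ProxiedRequest) -> bool:
    """Return True only if the request looks like a plain, public page fetch."""
    header_names = {name.lower() for name in req.headers}
    if req.method.upper() != "GET":       # POST (or any other method) passes data
        return False
    if req.body:                          # a request body also counts as passed data
        return False
    if urlsplit(req.url).query:           # GET parameters in the URL
        return False
    if "cookie" in header_names:          # the page may be personalized
        return False
    if "authorization" in header_names:   # HTTP authentication
        return False
    return True
```

Under these assumptions, a request for http://example.org/page.html without cookies would be indexed, while http://example.org/search?q=x or any request carrying a Cookie header would be skipped.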

Other Features

Technology

The program is based on a web server that also acts as a caching proxy. Through this web server the user reaches the interface for searching or for managing their own peer. The proxy shares its code with the crawler, which means that all pages that are not personalized are automatically added to the index. YaCy has used Apache Solr since version 1.04.9097. In addition, the YaCy network offers its own domains, which are reachable via the proxy.

Index distribution

Unlike in file-sharing networks, the results of a P2P search engine must be available immediately. To ensure this, YaCy uses a distributed hash table (DHT). This means that all recognized URLs and words are sent to the peers whose peer hash matches the corresponding word hash or URL hash. A search does the reverse: it queries only the peers that, based on their hash, can know URLs for the word.

As a result, only a fraction of the peers has to be contacted during a search to still get good results.
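The idea of matching word hashes to peer hashes can be sketched in Python as follows. This is an illustration only: the use of SHA-256, the 32-bit ring, the clockwise-distance rule, and the redundancy of 2 are all assumptions made for the example and are not taken from YaCy's implementation.

```python
import hashlib

RING = 2 ** 32  # size of the hypothetical hash ring


def ring_hash(text: str) -> int:
    """Map a string to a ring position (SHA-256 is an assumption, not YaCy's hash)."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def responsible_peers(word: str, peers: list[str], redundancy: int = 2) -> list[str]:
    """Pick the `redundancy` peers whose hash follows the word hash on the ring."""
    target = ring_hash(word)
    # Sort peers by clockwise distance from the word hash to the peer hash.
    by_distance = sorted(peers, key=lambda p: (ring_hash(p) - target) % RING)
    return by_distance[:redundancy]


peers = [f"peer-{i}" for i in range(20)]

# Indexing: references for the word are sent to the responsible peers.
store_at = responsible_peers("cyberspace", peers)

# Searching: the same computation tells the searcher which peers to ask.
ask = responsible_peers("cyberspace", peers)
```

Because storing and searching use the same deterministic function, the searcher contacts only the few peers that can hold entries for the word, rather than all 20.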

Peer types

YaCy distinguishes four different types of peers:

Protocol

The YaCy protocol consists of text servlets that the built-in web server provides under /yacy/servletname.html. Other peers transmit data via GET parameters and receive a simple text response; the exact format differs between the servlets.
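A request of this kind might be assembled as in the following Python sketch. The peer address, the parameter names, and the key=value response format are hypothetical; since the actual format differs between servlets, the parser shows only one possible shape.

```python
from urllib.parse import urlencode


def servlet_url(peer: str, servlet: str, params: dict) -> str:
    """Build the GET URL for a servlet under /yacy/ on another peer."""
    return f"http://{peer}/yacy/{servlet}.html?{urlencode(params)}"


def parse_text_response(text: str) -> dict:
    """Parse a plain-text reply, ASSUMING one key=value pair per line."""
    result = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without a '=' separator
            result[key.strip()] = value.strip()
    return result


# Hypothetical peer address and parameters, for illustration only.
url = servlet_url("192.0.2.10:8090", "servletname", {"query": "abc", "count": 10})
reply = parse_text_response("status=ok\ncount=10\n")
```

The sketch performs no network access; it only shows how data travels in the URL's query string and how a line-oriented text answer could be read back.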

Bootstrapping

When bootstrapping, YaCy tries to find a connection to the other peers in the network. It starts by looking for a seed list. First, the URL of a seed list that a YaCy peer uploads periodically is selected from superseed.txt, and that list is then downloaded. The seeds.txt file contains the references to other peers, so that contact with the YaCy network can be established. On the next start, YaCy can bootstrap from the seeds it already knows; the seed lists are needed only when many of the references are no longer valid.
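The bootstrapping order described above can be sketched in Python. The function names, the one-entry-per-line file layout, the min_alive threshold, and the random choice of a seed-list URL are assumptions for illustration; the liveness check and the download are passed in as callables so that the sketch stays independent of any real network code.

```python
import random
from typing import Callable, Iterable


def non_empty_lines(text: str) -> list[str]:
    """Split text into stripped, non-empty lines (assumed file layout)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def bootstrap(known_seeds: Iterable[str],
              is_alive: Callable[[str], bool],
              fetch: Callable[[str], str],
              superseed_text: str,
              min_alive: int = 3) -> list[str]:
    """Return reachable seeds, falling back to a seed list only when needed."""
    alive = [s for s in known_seeds if is_alive(s)]
    if len(alive) >= min_alive:
        return alive  # enough known seeds are still valid
    # Too many stale references: pick a seed-list URL and download that list.
    seed_list_urls = non_empty_lines(superseed_text)
    if not seed_list_urls:
        return alive
    seeds_txt = fetch(random.choice(seed_list_urls))
    fresh = [s for s in non_empty_lines(seeds_txt) if is_alive(s)]
    return list(dict.fromkeys(alive + fresh))  # de-duplicate, keep order
```

With enough live seeds from the previous run, fetch is never called, which mirrors the statement that seed lists are needed only once many references have gone stale.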
