Email filtering

A spam filter is a computer program or program module that filters unsolicited bulk email (spam).

Spam filters are occasionally also referred to by the ambiguous term spambot.

Filtering approaches

  • Verifying the sender based on its email address or URL
  • Checking the servers that send or forward the content
  • Sorting based on the header
  • Sorting based on the message text (content filter)

Filtering methods

Blacklist method

This method checks the content of an email for certain expressions or phrases, or checks its sender, against the entries of a negative list (blacklist). If a listed term appears in the email, the email is sorted out. These blacklists generally have to be created manually and are correspondingly costly to maintain, although many spam filters ship with predefined blacklists. In addition, the hit rate is not very high: spam is occasionally classified as legitimate email, and legitimate email as spam. Such filters are also easy to bypass. If, for example, "Viagra" is on the blacklist, the filter will not recognize "Vla*gr-a". If the filter accepts regular expressions, however, more sophisticated patterns can be written that cover the possible spellings, such as v.{0,1}[iíì1\|!l].{0,1}[aáàâä@].{0,1}g.{0,1}r.{0,1}[aáàâä@] (matched case-insensitively).
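A blacklist check along these lines might look like the following minimal sketch. It assumes a hypothetical blacklist containing one regular expression in the spirit of the pattern described above (tolerating one inserted character between letters and look-alike characters such as "l" for "i"), one plain phrase, and one hypothetical sender address; a real filter would load its lists from configuration.

```python
import re

# Hypothetical blacklist entries (assumptions for illustration only).
BLACKLIST_PATTERNS = [
    # Tolerates one arbitrary character between letters and look-alike
    # characters, so "Vla*gr-a" is caught as well as "Viagra".
    re.compile(r"v.{0,1}[iíì1\|!l].{0,1}[aáàâä@].{0,1}g.{0,1}r.{0,1}[aáàâä@]",
               re.IGNORECASE),
    # Plain phrase, escaped so it is matched literally.
    re.compile(re.escape("free money"), re.IGNORECASE),
]
BLACKLISTED_SENDERS = {"offers@spam.example"}  # hypothetical sender blacklist


def is_blacklisted(sender: str, body: str) -> bool:
    """Return True if the sender is listed or any blacklisted pattern occurs in the body."""
    if sender.lower() in BLACKLISTED_SENDERS:
        return True
    return any(pattern.search(body) for pattern in BLACKLIST_PATTERNS)


print(is_blacklisted("someone@example.org", "Buy Vla*gr-a now"))        # True
print(is_blacklisted("someone@example.org", "Lunch tomorrow at noon?"))  # False
```

The looser the pattern, the more variants it catches, but also the more legitimate text it may match by accident, which is one reason the hit rate of blacklists stays limited.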

One of the most widely used programs on Linux and other Unix derivatives is SpamAssassin, which awards points to each email according to various criteria (an obviously invalid sender, known spam text passages, HTML content, sending dates in the future, etc.) and classifies the email as spam once a certain score is reached. SpamPal also works with a blacklist, as does SPAVI, which examines not only the email itself but also the pages linked in it for suspicious elements. Razor and Pyzor, in turn, compute a hash value for each email and check central databases to see whether other recipients of the same email have classified it as spam.
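The point-based classification described above can be sketched as follows. The rule names, point values and the "act now" phrase are hypothetical and only mirror the kinds of criteria listed in the text; real SpamAssassin rules and scores live in its own configuration files. The threshold of 5.0 matches SpamAssassin's default `required_score`.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    score: float
    test: Callable[[dict], bool]


# Hypothetical rules loosely modeled on the criteria in the text.
RULES = [
    Rule("INVALID_SENDER", 2.5, lambda m: "@" not in m["from"]),
    Rule("SPAM_PHRASE", 3.0, lambda m: "act now" in m["body"].lower()),
    Rule("HTML_CONTENT", 0.5, lambda m: re.search(r"<html", m["body"], re.I) is not None),
    # Sending timestamp more than one day after the receiving timestamp.
    Rule("DATE_IN_FUTURE", 1.5, lambda m: m["sent_ts"] > m["received_ts"] + 24 * 3600),
]

THRESHOLD = 5.0


def score(message: dict) -> tuple[float, list[str]]:
    """Sum the points of all matching rules and report which rules fired."""
    hits = [rule for rule in RULES if rule.test(message)]
    return sum(rule.score for rule in hits), [rule.name for rule in hits]


def is_spam(message: dict) -> bool:
    total, _ = score(message)
    return total >= THRESHOLD


msg = {"from": "nobody", "body": "<html>Act now!</html>", "sent_ts": 0, "received_ts": 0}
print(score(msg))  # (6.0, ['INVALID_SENDER', 'SPAM_PHRASE', 'HTML_CONTENT'])
```

No single rule is decisive here; only the combination of several weak indicators pushes a message over the threshold, which keeps individual rules from causing false positives on their own.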

Bayesian method

Alternatively, spam can be filtered with a self-learning Bayesian filter based on Bayesian probability. The user must first manually classify the first 1,000 emails as spam or non-spam. After that, the system recognizes most spam largely on its own, with a hit rate of over 95%. Emails that the system misclassifies must be re-sorted manually by the user, which steadily increases the hit rate. This method is usually far superior to the blacklist method.
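The training-and-correction cycle described above can be illustrated with a minimal multinomial naive Bayes sketch. The tokenizer, the add-one (Laplace) smoothing and the class labels "spam"/"ham" are assumptions for illustration; the filters named in this article use their own, more refined variants.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesFilter:
    """Minimal multinomial naive Bayes spam filter with add-one smoothing."""

    def __init__(self) -> None:
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text: str, label: str) -> None:
        # Used both for the initial manual classification and for every
        # later correction of a misclassified email.
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def _log_score(self, tokens: list[str], label: str, vocab_size: int) -> float:
        total_docs = sum(self.doc_counts.values())
        # Smoothed prior so an untrained class does not produce log(0).
        log_p = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
        total_words = sum(self.word_counts[label].values())
        for token in tokens:
            count = self.word_counts[label][token]
            log_p += math.log((count + 1) / (total_words + vocab_size))
        return log_p

    def is_spam(self, text: str) -> bool:
        tokens = tokenize(text)
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        v = max(len(vocab), 1)
        return self._log_score(tokens, "spam", v) > self._log_score(tokens, "ham", v)


f = NaiveBayesFilter()
f.train("buy cheap pills now", "spam")
f.train("cheap pills discount offer", "spam")
f.train("meeting agenda for monday", "ham")
f.train("lunch on monday with the team", "ham")
print(f.is_spam("cheap pills offer"))    # True
print(f.is_spam("team meeting monday"))  # False
```

Working with log probabilities avoids numeric underflow when many word probabilities are multiplied, and each call to `train` after a misclassification shifts the counts, which is how the hit rate improves over time.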

Bogofilter, Mozilla Thunderbird and Spamihilator, which is especially popular in German-speaking countries, use this mechanism in their current versions. Such a filter must be trained by the user before it can reliably detect spam.

A method related to the Bayesian filter is the Markov filter. It uses a Markov chain and is more effective than a Bayesian filter, as William Yerazunis was able to show with his spam filter CRM114.
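The key difference from a Bayesian filter is that a Markov filter takes word order into account. The sketch below illustrates this with a simple first-order (bigram) Markov chain per class and add-one smoothing; this is an assumed, simplified technique chosen for illustration, not CRM114's actual feature scheme, which is considerably more elaborate.

```python
import math
import re
from collections import Counter, defaultdict

START = "<s>"  # pseudo-word marking the start of a message


def tokenize(text):
    return re.findall(r"[a-z0-9']+", text.lower())


class MarkovChainModel:
    """First-order Markov chain over words: P(word | previous word)."""

    def __init__(self):
        self.transitions = defaultdict(Counter)  # previous word -> next-word counts
        self.vocab = set()

    def train(self, text):
        prev = START
        for word in tokenize(text):
            self.transitions[prev][word] += 1
            self.vocab.add(word)
            prev = word

    def log_likelihood(self, text, vocab_size):
        log_p, prev = 0.0, START
        for word in tokenize(text):
            # .get avoids inserting empty entries while scoring
            counts = self.transitions.get(prev, Counter())
            log_p += math.log((counts[word] + 1) / (sum(counts.values()) + vocab_size))
            prev = word
        return log_p


def classify(text, spam_model, ham_model):
    v = len(spam_model.vocab | ham_model.vocab) + 1  # +1 reserves mass for unseen words
    if spam_model.log_likelihood(text, v) > ham_model.log_likelihood(text, v):
        return "spam"
    return "ham"


spam, ham = MarkovChainModel(), MarkovChainModel()
spam.train("click here to claim your prize")
ham.train("here is the agenda for the meeting")
print(classify("click here to claim", spam, ham))  # spam
print(classify("here is the agenda", spam, ham))   # ham
```

Because probabilities are attached to word transitions rather than single words, the same words in a different order receive a different score.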

Database-based solutions

As early as the 1990s, Usenet discussions considered detecting spam by the URLs (and possibly phone numbers) advertised in the message. Although spammers can modify and personalize their messages at will, unsolicited commercial email (UCE) ultimately always aims to entice the recipient to make contact, and the space of possible addresses is not infinitely variable; this approach therefore allows very good recognition in theory. It is particularly interesting that no heuristics are used, since heuristics always carry the risk of false positives. For a long time, however, the approach was considered impractical because of technical requirements, reaction times and similar constraints. The spam filter Stop Here (a centrally hosted solution) is based at its core on exactly this idea and shows that it can indeed work well in practice.
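The core idea, extracting the advertised URLs and looking them up in a database, can be sketched as below. The domain list is hypothetical, and a local set stands in for what would be a lookup against a central, continuously updated database in a hosted service; phone numbers are not handled.

```python
import re
from urllib.parse import urlparse

# Hypothetical stand-in for a central database of domains seen in reported spam.
KNOWN_SPAM_DOMAINS = {"cheap-pills.example", "win-big.example"}

URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def advertised_domains(body):
    """Return the set of host names of all http(s) URLs found in the body."""
    domains = set()
    for url in URL_RE.findall(body):
        host = urlparse(url).hostname  # lower-cased, port removed
        if host:
            domains.add(host.removeprefix("www."))
    return domains


def contains_known_spam_url(body):
    return not advertised_domains(body).isdisjoint(KNOWN_SPAM_DOMAINS)


print(contains_known_spam_url("Visit https://WWW.Cheap-Pills.example/offer?id=1 today"))  # True
print(contains_known_spam_url("Agenda at https://intranet.example.org/meeting"))          # False
```

However much the surrounding text is varied, the message still has to carry a contact address, which is why the lookup needs no heuristic scoring of the text itself.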

Problems

Sorting emails always involves a certain error rate. On the one hand, spam messages may go undetected and thus reach the inbox; this is called a "false negative". If wanted emails are classified as spam, this is called a "false positive" detection. If the filter is trained for a sufficiently long time, "positive" errors can be almost ruled out (e.g. by using a whitelist), and "negative" errors can be pushed down into the range of 10% to less than 1%. However, this requires a certain effort. In addition, filters must be continually adapted with improved methods to keep up with the spammers' new techniques.
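The two error rates can be computed from the counts of a filter's decisions, treating "positive" as "classified as spam". The function name and the example counts below are hypothetical and serve only to show the arithmetic.

```python
def error_rates(tp, fp, tn, fn):
    """Return (false_positive_rate, false_negative_rate).

    tp: spam correctly flagged       fn: spam missed (false negatives)
    tn: legitimate mail passed       fp: legitimate mail flagged (false positives)
    """
    fpr = fp / (fp + tn) if fp + tn else 0.0  # share of legitimate mail flagged
    fnr = fn / (fn + tp) if fn + tp else 0.0  # share of spam that got through
    return fpr, fnr


# Hypothetical counts: 1,000 spam (990 caught), 500 legitimate (1 flagged).
print(error_rates(tp=990, fp=1, tn=499, fn=10))  # (0.002, 0.01)
```

Each rate is measured against its own class, so a filter can have a false negative rate of 1% while wrongly flagging only a tiny share of legitimate mail.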

Example of a concealment method

The following spam was sent to the same recipient list at intervals of a few days. It comes from the same sender and has the same content, and it clearly shows the spammers' technique of deceiving spam filters through small variations in order to reach the recipient directly.
