Fuzzy Retrieval

The fuzzy information retrieval has developed since the 1970s. Here identifies fuzzy information retrieval, an information retrieval based on the fuzzy logic.

  • 5.1 Examples of term relations
  • 5.2 Construction of the Term Relations
  • 5.3 Use of term relations in IR

The fuzzy model IR

The fuzzy IR model is to define a quadruple , where

  • T = { t1, t2, ..., tn } is a set of index terms that describe queries and documents.
  • Q = { q1, q2, ..., qm } be a set of queries that consist of index terms. Here, the index terms by logical operations AND, OR and NOT can be linked.
  • D = { d1, d2, ..., dk } is a set of documents. Each dj ∈ D, j = 1, 2, ..., k is represented by { (t1, wj1 ), ..., (tn WJN, ) } to represent, whereby W ji (i = 1,2, ..., n), the importance of term ti in dj represents and takes a value from the interval [ 0, 1].
  • F is a ranking function

This value represents the similarity between the document D and the query q.

For a query, the following applies:

The fuzzy sets of operations may be used as follows:

Is an example to illustrate the application of fuzzy IR model is called. The query is:

There are two documents:

After surgery, it concludes:

The same results at d1 and d2 state that the similarity between d1 and q1 is between d2 and q1 equal. But most people would decide that d2 would be the q1 similar than d1. Here, the undesired result is due to the fact that the operation of only one weight term is considerate. In addition, the simple fuzzy set operations only limit to two terms. Two developed fuzzy models are presented Following up to be evaluated many terms. It can further be a parameter as "softness factor" to solve the above problem (the need of a weight result ) into the models

Advanced fuzzy IR models

The Waller - force model

F ( dj, t1 AND ... AND tn ) = (1 - γ ) · MIN { wj1, ..., WJN } γ · MAX { wj1, ..., WJN }, 0 ≤ γ ≤ 0.5;

F ( dj, t 1 OR ... OR tn ) = (1 - γ ) · MIN { wj1, ..., WJN } γ · MAX { wj1, ..., WJN }, 0.5 ≤ γ ≤ 1

The model mixes the maximum with minimum operation and has better effectiveness than the simple fuzzy model.

The Paice model

In an AND operation: wji sorted by size in ascending order, ie wj1 ≤ ... ≤ WJN

F ( dj, t1 AND ... AND tn ) = [? I = 1 N ( r i -1 · Wji )] / [? I = 1 n r i -1], 0 ≤ r ≤ 1

In an OR operation: wji sorted by size in descending order, ie ≥ ... ≥ wj1 WJN

F ( dj, t 1 OR ... OR tn ) = [? I = 1 n (ri -1 · W ji )] / [? I = 1 n r i -1], 0 ≤ r ≤ 1

This model takes into account all the term weights in the calculation of similarity. But it requires much higher computational effort than the Waller - force model.

Comparison

In the following table the results of d1 and d2, with a simple fuzzy IR model, Waller power model and Paice model are compared with each other.

The degree of similarity between d1 and q1 is the three models equal, that's understandable. The difference arises in the case of D2, the models are of the two expanded larger than the case of simple fuzzy IR model, which corresponds to the more likely expectation. Therefore it can be said that the two models have greater effectiveness in locating than the simple fuzzy IR model.

While mixing the Waller - power model with maximum minimum, but it only pays attention to these two term weights, which can lead to problems for queries with more than two terms. example:

It is clear that the degree of similarity is greater than that between D3 and D4 between Q2 and Q2. But according to the equation at Waller - force model same results are calculated at d3 and d4, which value for the parameter γ is also determined because it is taken in this model only to the MIN and MAX - term weight into consideration. Thus, the problem arises. In comparison, the Paice model is more complex, but it takes into account all term weights in the calculation and therefore avoids this problem.

The introduction of the term weight in the query

The models just shown excludes weights of terms in query, where all terms have the same importance in queries. It is known that the introduction of the weights of terms in the queries can improve the efficiency of retrieval. With the weight of the query term is represented: qk = { (t1, wk1 ), ..., (tn, wkn ) }, wk ∈ [0, 1]. In the retrieval, the weights of terms are multiplied in queries and documents, that is,

A query without term weight like a query in which the weights of all terms be "1". A term is removed when the weight is zero, this means that the term does not influence the query.

Although the Waller - force model and the Paice model offer no method to evaluate the term weights in queries, the P- norm model formulas for the calculation of term weights in queries.

Fuzzy IR model with query weights

The P- norm model with query weights

F ( dj, ( tq (k ) 1, wq (k ) 1) AND ... AND ( tq (k ) n, wq (k ) n) ) = 1 - [[? I = 1n (1 - wji ) p · wq (k ) ip ] / [? i = 1n wjip ] ] 1 / p, 1 ≤ p < ∞,

F ( dj, ( tq (k ) 1, wq (k ) 1 ) OR ... OR ( tq (k ) n, wq (k ) n) ) = 1 - [[? I = 1n wjip · wq (k ) ip ] / [? i = 1n wjip ] ] 1 / p, 1 ≤ p < ∞.

Here, " p" parameter and represents the degree of precision. "1" means little in common, while " ∞ " very accurately called.

Term relations

Fuzzy -term relations is called fuzzy thesaurus. Here, this relation is a fuzzy relation to a fuzzy set, which has the interpretation of a fuzzy graph. Formally, it is assumed: T = { t1, t2, ..., tm } is a set of terms and D = { d1, d2, ..., dn } is a set of documents. A ( general ) term - relation is defined by a fuzzy relation on T ∪ D: R ( x, y), x, y ∈ T ∪ D. ( Here terms and documents are combined into a whole lot, even though it is called term - relation. ) Three types of relations are involved:

The below problems in term relations are then discussed:

Examples of term relations

The thesauri and their fuzzy versions are typical examples of term relations, the fuzzy relation R is not defined on T ∪ D, but on T. Different types of fuzzy thesauri are considered. For example, see Reisinger fuzzy equivalence relations and fuzzy ordering as natural generalizations of sharp categorical and hierarchical relations. Tahani also mentioned fuzzy partial order. Redecki proposes the use of a fuzzy equivalence relation together with a subset of the basic terms and a Termgeneralisationsrelation.

In the research of fuzzy thesauri balanced and unbalanced fuzzy relations and fuzzy transitivity be noted, however, their assumption leads to a problem because you can not find a fuzzy transitivity directly into reality. This problem can be solved by taking into account fuzzy graphs ( undirected graphs ) and digraphs. Given is a fuzzy relation R, which need not be transitive. This relation can be represented by a fuzzy digraph, and a " transitive closure" is covered, R * = R ∪ R2 ∪ ... ∪ Rk ∪ ... (Rk = Rk - 1οR, where ο the MAX-MIN composition implies ). R * denotes the degree of accessibility to the digraph, namely R * (x, y ) is the max value of α - section, whereby on the x y sharp digraph is reachable.

The above operations and properties of fuzzy relations are summarized here:

Construction of the term relations

Various researches treat under different assumptions, methods of automatic construction of fuzzy relation of terms or documents. A typical method is the use of the document -term matrix A = ( Aij ), where aij the weight of term tj in the document is di. It is assumed here: γj =? I aij / di is the fuzzy set, which corresponds to the term tj. A symmetric relation R ( tj, tk ) and an asymmetrical relation Rn ( tj, tk ) are defined by

Where | γj | = A1J A2J ... anj the Σ - sum. This method is based on the assumption that the meaning of the two terms is also similar when the two patterns of γj and γk are similar. The adoption of Rn ( tj, tk ) is that γj γk has a narrower meaning than when γj is the γk included.

Use of term relations in IR

There are two basic methods of using the term relations in information retrieval. If a term - relation is made possible as a network in which the documents are terminal nodes and a query is an original node, the retrieval is performed by tracing from the network. On the other hand, if a term - relation R on T together with a fuzzy relation F (d, t) and a fuzzy query vector q =? J wj / tj is specified, a simple standard method for retrieval of documents is the calculation of a fuzzy set δ = FοRοq by the application of MAX-MIN composition of fuzzy relations.

357619
de