FIELD: information technology.
SUBSTANCE: edit distance is used to determine the relevance of a document for result ranking by detecting near-matches of a whole query or part of a query. The edit distance measures how close the query string is to a given data stream containing document information such as TAUC (title, anchor text, URL, clicks) information. The architecture includes index-time splitting of compound terms in the URL so that query terms can be found more effectively. Additionally, index-time filtering of anchor text is used to find the top N anchors of one or more of the document results. The TAUC information can be input to a neural network (e.g., a two-layer network) to improve relevance metrics for ranking the search results.
EFFECT: improved relevance of search results.
19 cl, 12 dwg
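The near-match detection and URL splitting summarized in the SUBSTANCE paragraph can be illustrated with a minimal Python sketch. The abstract does not give the cost model, the comparison granularity, or the splitting method, so the following are assumptions made only for illustration: unit-cost Levenshtein distance, comparison at the level of whitespace-separated terms, and a dictionary-based splitter driven by a hypothetical `vocabulary` set. The function names `edit_distance`, `near_match_distance`, and `split_compound` are likewise hypothetical.

```python
from typing import List, Sequence, Set


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Unit-cost Levenshtein distance between two sequences.

    Works on strings (character level) or on lists of terms (term level).
    """
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def near_match_distance(query: str, stream: str) -> int:
    """Term-level distance between a query and a data stream such as a title.

    Assumption: terms are lowercased and split on whitespace.
    """
    return edit_distance(query.lower().split(), stream.lower().split())


def split_compound(token: str, vocabulary: Set[str]) -> List[str]:
    """Split a compound URL token into known terms, if a full split exists.

    Assumption: `vocabulary` is a hypothetical set of known terms.
    Returns [token] unchanged when no complete segmentation is found.
    """
    # best[i] holds one segmentation of token[:i], or None if there is none.
    best: List = [None] * (len(token) + 1)
    best[0] = []
    for end in range(1, len(token) + 1):
        for start in range(end):
            piece = token[start:end]
            if best[start] is not None and piece in vocabulary:
                best[end] = best[start] + [piece]
                break
    return best[-1] if best[-1] is not None else [token]
```

A smaller distance indicates a closer match, so a ranking function could treat `near_match_distance` over each TAUC stream as one input feature. Splitting the URL token before comparison lets terms such as "edit" and "distance" inside "editdistance" be matched individually.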
Dates
2013-12-10—Published
2009-03-10—Filed