Yandex-like search with your own hands.
It is a rare web programmer who has never faced the task of writing a search for a site. It does not matter whether it was for your own CMS or for your very first site, knocked together in the tenth grade for your cousin's uncle's firm.
Often the task of site search is solved with a simple SQL query of the form where content like '%semenovich%', in which the query phrase is split into words and each word is searched for with SQL among the rows in the database. Despite the simplicity of this solution, the quality of the results leaves much to be desired. Responsible developers use indexing and take relevance and even morphology into account. Still, I have not yet seen a single site with a search as beautiful as Yandex's.
Here is what I mean by beautiful search:
- Sorting results by relevance
- Taking Russian morphology into account
- And, most importantly, a "perhaps you were looking for" function
Is it possible to build such a search on your own site, spending only a little time and without using bulky databases of word forms? It is.
I should warn you right away: this is not a description of how Yandex search works. It is a description of how to build a search that is 80% similar to Yandex's :) In other words, I will show the methods that give the maximum effect for the minimum effort.
1. Sorting by relevance.
"Relevance is a subjective concept expressing the degree to which what was found matches what was searched for, the appropriateness of the result" (Wikipedia).
Since the concept is subjective, we have to define it ourselves. Let us make the relevance of a document a function of two parameters: the number of query words that are present in the document, and the total number of occurrences of all those words in the text.
For example, take the query "computer interiors of Syktyvkar".
In the text, "computer" occurs 5 times, "interiors" 2 times, and "Syktyvkar" 0 times. Two of the query words were found, with 7 occurrences in total. In the language of mathematics, x = 2, y = 7.
Let us define the function relevance(x, y) = 1000x + y.
With this ranking, the pages that contain more of the query words come first, and among themselves they are sorted by how often those words occur (as long as y stays below 1000, so that it never outweighs one extra matched word).
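A minimal Python sketch of this scoring function may make the ranking concrete. It assumes words are compared by exact lowercase match (no morphology yet); the tokenizer and the function name are illustrative choices, not part of the original description.

```python
import re
from collections import Counter


def relevance(query: str, text: str) -> int:
    """Return 1000 * x + y for a query and a document text.

    x: number of distinct query words that occur in the text
    y: total number of occurrences of those words in the text
    """
    # Assumption: a "word" is any run of letters or digits, compared in lowercase.
    text_counts = Counter(re.findall(r"\w+", text.lower()))
    query_words = set(re.findall(r"\w+", query.lower()))

    x = sum(1 for word in query_words if text_counts[word] > 0)
    y = sum(text_counts[word] for word in query_words)
    return 1000 * x + y


# The example from the text: "computer" 5 times, "interiors" 2 times,
# "Syktyvkar" 0 times, so x = 2 and y = 7.
sample_text = "computer " * 5 + "interiors " * 2 + "shop"
print(relevance("computer interiors of Syktyvkar", sample_text))  # 2007
```

The factor 1000 only works as a tie-breaker boundary while a page has fewer than 1000 occurrences in total; for very long documents a larger factor or a two-column sort would be safer.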
With a properly built site index, sorting by relevance can easily be done directly in the SQL query using order by (technical details are below).
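As a rough illustration of the idea, here is a sketch using Python's built-in sqlite3 module. The table name word_index and its columns (page_id, word, hits) are assumptions made up for this example, with one row per distinct word per page; the index layout the article describes later may differ.

```python
import sqlite3


def build_index(pages):
    """Build an in-memory index: one row per (page, word) with its hit count."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema, chosen only for this sketch.
    conn.execute(
        "CREATE TABLE word_index ("
        "page_id INTEGER, word TEXT, hits INTEGER, "
        "PRIMARY KEY (page_id, word))"
    )
    for page_id, text in pages.items():
        counts = {}
        for word in text.lower().split():
            counts[word] = counts.get(word, 0) + 1
        conn.executemany(
            "INSERT INTO word_index VALUES (?, ?, ?)",
            [(page_id, word, hits) for word, hits in counts.items()],
        )
    return conn


def search(conn, query):
    """Return (page_id, relevance) pairs, most relevant first."""
    words = sorted(set(query.lower().split()))
    if not words:
        return []
    placeholders = ", ".join("?" for _ in words)
    # COUNT(*) is x (distinct query words on the page), SUM(hits) is y.
    sql = (
        "SELECT page_id, 1000 * COUNT(*) + SUM(hits) AS relevance "
        "FROM word_index "
        f"WHERE word IN ({placeholders}) "
        "GROUP BY page_id "
        "ORDER BY relevance DESC"
    )
    return conn.execute(sql, words).fetchall()


pages = {
    1: "computer computer interiors",
    2: "computer computer computer computer",
    3: "syktyvkar weather",
}
conn = build_index(pages)
print(search(conn, "computer interiors syktyvkar"))
# [(1, 2003), (2, 1004), (3, 1001)]
```

Page 1 wins even though page 2 mentions "computer" more often, because it matches two query words rather than one.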
2. Taking Russian morphology into account.
Search with morphology means search that is not sensitive to the form of a word; that is, it finds not only "neighbours" but also "neighbour", "neighbour's", and so on.
There are two ways to achieve this: the correct one and the "almost correct" one.
The correct way requires a huge (about 50 MB) dictionary of word forms, properly converted into the format of your database and hooked up to the site's engine. A free dictionary can be found in the Ispell dictionary list, although you will have to figure out its structure.
The "almost correct" way is to extract the root of a word using the general linguistic rules of the language (discarding standard suffixes and endings). This is done with the help of a "stemmer". (Stemmer source code is available for PHP.)
The drawback of this algorithm is that it is based on rules. And, as is well known, every rule has exceptions: the algorithm is powerless against pairs like "to go" and "went", and it is also unpredictable with fleeting vowels, which are quite common in Russian (such as the Russian words for "lion" and "sheep", whose roots gain or lose a vowel between forms).
When indexing the site, before adding each word to the database, we extract its root and store only the root.
When searching, we extract the roots of the query words and look up only those roots in the database.
As a result we compare roots rather than words, and search with morphology is ready.
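The index-by-root and search-by-root steps can be sketched in Python as follows. The stemmer here is a deliberately tiny toy that strips two English endings; it is a stand-in invented for this example and is not the Russian stemmer the article refers to. All names are illustrative.

```python
import re

# Hypothetical, minimal suffix list; a real stemmer uses a far richer rule set.
SUFFIXES = ("'s", "s")


def toy_stem(word):
    """Strip the first matching suffix, keeping at least 3 characters of root."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Split text into lowercase words (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def build_index(pages):
    """Map each root to the set of page ids whose text contains it."""
    index = {}
    for page_id, text in pages.items():
        for word in tokenize(text):
            index.setdefault(toy_stem(word), set()).add(page_id)
    return index


def search(index, query):
    """Return ids of pages containing the root of any query word."""
    found = set()
    for word in tokenize(query):
        found |= index.get(toy_stem(word), set())
    return found


pages = {
    1: "My neighbours are loud",
    2: "The neighbour's dog",
    3: "Quiet street",
}
index = build_index(pages)
print(search(index, "neighbour"))  # {1, 2}
```

Because both indexing and searching go through the same toy_stem function, any form of the word reaches the same index entry. The weakness described above shows up here too: toy_stem("went") and toy_stem("go") stay different, so a rule-based approach cannot connect them.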
3. The "perhaps you were looking for" function.
I could explain at length why and for whom this function is needed. Instead I will say just one thing: Yandex has it (for Russian) and Google has it (for English). And that is despite the fact that these search engines are very cautious and careful about choosing their functionality.
