Stemmer.
The stemmer code can be taken from the article "Heuristic extraction of the stem of a Russian word": <http://www.internet-technologies.ru/?url=http%3A%2F%2Fforum.dklab.ru%2Fphp%2Fadvises%2FHeuristicWithoutTheDictionaryExtrac>
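The linked stemmer is the one this article relies on. To illustrate the general idea of dictionary-free (heuristic) stemming, here is a much cruder sketch: it strips the longest matching ending from a small list of Russian endings and always keeps at least two letters of the stem. The function name `simple_ru_stem` and the ending list are assumptions chosen for illustration, not the linked algorithm. It needs the mbstring extension and a UTF-8 source file.

```php
<?php
// Crude suffix-stripping stemmer for Russian (illustrative only, NOT the
// dklab algorithm): remove the longest matching ending, keep a 2+ letter stem.
function simple_ru_stem(string $word): string
{
    $word = mb_strtolower($word, 'UTF-8');
    // Hypothetical sample of noun/adjective/verb endings, grouped longest first
    $endings = array('иями', 'ями', 'ами', 'ого', 'его', 'ому', 'ему', 'ыми', 'ими',
                     'ая', 'яя', 'ое', 'ее', 'ые', 'ие', 'ый', 'ий', 'ой', 'ом', 'ем',
                     'ах', 'ях', 'ов', 'ев', 'ть', 'ет', 'ит', 'ут', 'ют',
                     'а', 'я', 'о', 'е', 'ы', 'и', 'у', 'ю', 'ь', 'й');
    $len = mb_strlen($word, 'UTF-8');
    foreach ($endings as $ending) {
        $elen = mb_strlen($ending, 'UTF-8');
        // Strip only if the word ends with this ending and 2+ letters remain
        if ($len - $elen >= 2 && mb_substr($word, -$elen, null, 'UTF-8') === $ending) {
            return mb_substr($word, 0, $len - $elen, 'UTF-8');
        }
    }
    return $word; // no ending matched: the word is its own stem
}
```

With this sketch, `книга`, `книги`, and `книгами` all reduce to `книг`, so a query for one form matches pages indexed under the others. Real stemmers use far larger, context-sensitive rule sets.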
Transliteration:
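function rus2lat($string) {
    // letters that become two Latin letters; the hard and soft signs are dropped
    $rus = array('ё', 'ж', 'ц', 'ч', 'ш', 'щ', 'ю', 'я', 'Ё', 'Ж', 'Ц', 'Ч', 'Ш', 'Щ', 'Ю', 'Я', 'ъ', 'ь', 'Ъ', 'Ь');
    $lat = array('e', 'zh', 'c', 'ch', 'sh', 'sh', 'ju', 'ja', 'E', 'ZH', 'C', 'CH', 'SH', 'SH', 'JU', 'JA', '', '', '', '');
    $string = str_replace($rus, $lat, $string);
    // the remaining letters map one-to-one; the two-string form of strtr() works
    // byte by byte and breaks on UTF-8 text, so build a per-character array instead
    $from = preg_split('//u', "АБВГДЕЗИЙКЛМНОПРСТУФХЫЭабвгдезийклмнопрстуфхыэ", -1, PREG_SPLIT_NO_EMPTY);
    $to = str_split("ABVGDEZIJKLMNOPRSTUFHIEabvgdezijklmnoprstufhie");
    return strtr($string, array_combine($from, $to));
}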
Running a search.
First, we split the search query into words and store them in the array $words, reducing each Russian word to its stem with the stemmer.
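A minimal sketch of this step, assuming the query arrives as a UTF-8 string. The function name `split_query` is hypothetical, and the `$stem` callback stands for whichever stemmer you plug in; only words containing Cyrillic letters are passed to it.

```php
<?php
// Split a query into lowercase words; Cyrillic words go through $stem.
function split_query(string $query, callable $stem): array
{
    // Any run of characters that are not letters or digits separates words
    $parts = preg_split('/[^\p{L}\p{N}]+/u', mb_strtolower($query, 'UTF-8'), -1, PREG_SPLIT_NO_EMPTY);
    $words = array();
    foreach ($parts as $part) {
        // Only words that contain Cyrillic letters are reduced to a stem
        $words[] = preg_match('/\p{Cyrillic}/u', $part) ? $stem($part) : $part;
    }
    // Drop duplicates and renumber, so each stem appears once in the SQL condition
    return array_values(array_unique($words));
}
```

After this, `$num = count($words)` gives the loop bound used when building the query. Removing duplicates matters because two forms of one word (for example `книги` and `книгами`) collapse to the same stem.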
The search query is built as follows:
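$if_clause = "";
for ($i = 0; $i < $num; $i++)
{
    // escape each word, since it comes from the visitor ($db is an assumed mysqli connection)
    $if_clause .= "iw.word = '" . $db->real_escape_string($words[$i]) . "'";
    if ($i != $num - 1) $if_clause .= " or ";
}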
$query = "select il.id, il.url, il.title, il.short, count(distinct iw.id)*1000 + sum(ii.times) as rel
from indexing_link il, indexing_index ii, indexing_word iw
where (" . $if_clause . ") and ii.word = iw.id and il.id = ii.link
group by il.id order by rel desc";
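Gluing words into the SQL string is easy to get wrong. Here is a sketch of the same query built with PDO placeholders, assuming a PDO connection to the same MySQL tables; `build_search_sql` is a hypothetical helper name. The condition `iw.word in (?, ?, ...)` matches the same rows as the chain of `iw.word = ... or ...` comparisons.

```php
<?php
// Build the relevance query with one "?" placeholder per search word.
function build_search_sql(int $num): string
{
    if ($num < 1) {
        // "in ()" is invalid SQL, so an empty query must not reach the database
        throw new InvalidArgumentException('empty search query');
    }
    $placeholders = implode(', ', array_fill(0, $num, '?'));
    return "select il.id, il.url, il.title, il.short,
                   count(distinct iw.id)*1000 + sum(ii.times) as rel
            from indexing_link il, indexing_index ii, indexing_word iw
            where iw.word in ($placeholders) and ii.word = iw.id and il.id = ii.link
            group by il.id order by rel desc";
}

// Hypothetical usage with an existing PDO connection $pdo and a 0-indexed $words:
// $stmt = $pdo->prepare(build_search_sql(count($words)));
// $stmt->execute($words);
// $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
```

On the ranking: every distinct matched word adds 1000 to `rel`, and `sum(ii.times)` adds the total number of occurrences. So a page matching two query words (rel of at least 2002) always outranks a page matching only one, as long as that one word occurs fewer than 1000 times (rel below 2000).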
Related links
A Russian stemmer with improved characteristics, by the developer of the Rambler search engine:
http://linguist.nm.ru/stemka/stemka.html
A PHP function for calculating the edit distance between strings:
http://www.php.net/manual/en/function.levenshtein.php
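One typical use in site search is a "did you mean" hint: compare a query word with the indexed words and suggest the closest one. A sketch, assuming the candidate words are already loaded into an array (in practice they might come from the indexing_word table); `closest_word` and the default limit of 2 edits are hypothetical choices.

```php
<?php
// Return the dictionary word with the smallest Levenshtein distance to $input,
// or null if nothing is within $maxDistance edits.
function closest_word(string $input, array $dictionary, int $maxDistance = 2): ?string
{
    $best = null;
    $bestDistance = $maxDistance + 1;
    foreach ($dictionary as $word) {
        $d = levenshtein($input, $word);
        if ($d < $bestDistance) { // strict "<": on ties the first word wins
            $best = $word;
            $bestDistance = $d;
        }
    }
    return $best;
}
```

Note that levenshtein() counts bytes, not characters, so for UTF-8 Cyrillic text each letter weighs two bytes and distances can come out up to twice as large. Transliterating both strings first (for example with rus2lat) is one possible workaround.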
Yandex.XML is a service that lets you send automated search queries to Yandex and publish its results on your own site in your own design.
http://xml.yandex.ru/
Yandex.Server is an application for full-text search on your web server or in a local network that takes Russian morphology into account.
http://company.yandex.ru/technology/products/yandex-server.xml
A slow but interesting spelling corrector:
http://norvig.com/spell-correct.html
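The core idea of the linked corrector fits in a few lines of PHP: generate every string one edit away from the word (delete, transpose, replace, insert) and pick the known candidate with the highest frequency. This sketch covers only one-edit corrections (the original also tries two edits), handles only the Latin alphabet, and assumes `$counts` is a hypothetical word-to-frequency table built from your own texts; `edits1` and `correct` are illustrative names.

```php
<?php
// All strings exactly one edit away from $word (lowercase Latin letters only).
function edits1(string $word): array
{
    $letters = 'abcdefghijklmnopqrstuvwxyz';
    $n = strlen($word);
    $edits = array();
    for ($i = 0; $i <= $n; $i++) {
        $left = substr($word, 0, $i);
        $right = substr($word, $i);
        if ($right !== '') {
            $edits[] = $left . substr($right, 1);                          // delete
        }
        if (strlen($right) > 1) {
            $edits[] = $left . $right[1] . $right[0] . substr($right, 2);  // transpose
        }
        for ($j = 0; $j < 26; $j++) {
            $c = $letters[$j];
            if ($right !== '') {
                $edits[] = $left . $c . substr($right, 1);                 // replace
            }
            $edits[] = $left . $c . $right;                                // insert
        }
    }
    return array_unique($edits);
}

// Known words are returned as-is; otherwise the most frequent known word one
// edit away wins; if there is none, the input is returned unchanged.
function correct(string $word, array $counts): string
{
    if (isset($counts[$word])) {
        return $word;
    }
    $best = $word;
    $bestCount = 0;
    foreach (edits1($word) as $candidate) {
        if (isset($counts[$candidate]) && $counts[$candidate] > $bestCount) {
            $best = $candidate;
            $bestCount = $counts[$candidate];
        }
    }
    return $best;
}
```

Frequency is what breaks ties between equally close candidates: with `$counts = array('spelling' => 10, 'spewing' => 2)`, the input `speling` is one edit from both words, and the more frequent `spelling` wins.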
http://sphinxsearch.com
A free search engine. It has to be compiled and installed on the server.
It provides APIs for integration with PHP, Python, Perl, and Ruby.
Features include:
- High speed (indexing at 10 MB/s, search in 0.1 s on a 2 GB database)
- Scalability and distributed search
- Russian and English stemming
- The PHP module does not need to be compiled to work
http://www.dataparksearch.org
An open-source search engine. The front end is a CGI application. It supports:
- ISpell word-form dictionaries
- Synonyms for Russian and English
- A boolean query language
- Spelling correction based on Aspell dictionaries
P.S. This article was written in the hope that more sites will get convenient and useful search. As the amount of information on the web keeps growing, I believe that making it efficiently searchable is one of the priority tasks for developers.
