Do-it-yourself Yandex-like search

Stemmer.


The stemmer code is available here: Heuristic extraction of the root of a Russian word <http://www.internet-technologies.ru/?url=http%3A%2F%2Fforum.dklab.ru%2Fphp%2Fadvises%2FHeuristicWithoutTheDictionaryExtrac>



Transliteration:



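function rus2lat($string) {
    // Letters that become several Latin characters or disappear.
    // The Cyrillic originals were lost to encoding damage and are restored from
    // the Latin column; the last two (hard and soft sign) are an assumption.
    $rus = array('ё', 'ж', 'ц', 'ч', 'ш', 'щ', 'ю', 'я', 'Ё', 'Ж', 'Ц', 'Ч', 'Ш', 'Щ', 'Ю', 'Я', 'ъ', 'ь');
    $lat = array('e', 'zh', 'c', 'ch', 'sh', 'sh', 'ju', 'ja', 'E', 'ZH', 'C', 'CH', 'SH', 'SH', 'JU', 'JA', '', '');
    $string = str_replace($rus, $lat, $string);

    // One-to-one letters. The string form of strtr() works byte by byte, so it is
    // only safe in a single-byte encoding; the array form below also works for UTF-8.
    $from = preg_split('//u', 'АБВГДЕЗИЙКЛМНОПРСТУФХЫЭабвгдезийклмнопрстуфхыэ', -1, PREG_SPLIT_NO_EMPTY);
    $to = str_split('ABVGDEZIJKLMNOPRSTUFHIEabvgdezijklmnoprstufhie');
    $string = strtr($string, array_combine($from, $to));

    return $string;
}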



Search by query.


First, we split the search query into words and store them in the array $words, using the stemmer to reduce each Russian word to its root.
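The splitting step might look like the sketch below. It is only an illustration: the function name query_to_words, the $stem callback, and the toy stemmer are hypothetical and stand in for the real stemmer. It assumes UTF-8 input and the mbstring extension.

```php
<?php
/**
 * Split a search query into lowercase words and reduce each one to a root.
 * $stem is a hypothetical callback standing in for the real stemmer.
 */
function query_to_words(string $query, callable $stem): array
{
    // Split on anything that is not a letter or a digit (Unicode-aware)
    $tokens = preg_split(
        '/[^\p{L}\p{N}]+/u',
        mb_strtolower($query, 'UTF-8'),
        -1,
        PREG_SPLIT_NO_EMPTY
    );

    $words = array();
    foreach ($tokens as $token) {
        $words[] = $stem($token);
    }

    // Drop duplicates so that each root appears only once in the query
    return array_values(array_unique($words));
}

// Toy stand-in for a stemmer: strips a trailing "s". Not a real stemming algorithm.
$toy_stem = function (string $word): string {
    return preg_replace('/s$/', '', $word);
};

$words = query_to_words('Fast sites, FAST site!', $toy_stem);
```

Removing duplicates keeps the later count of distinct matched words meaningful when the user repeats a word.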


We build the search query like this.



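// $words must already be escaped with your database driver's quoting function
$if_clause = '';
$num = count($words);
for ($i = 0; $i < $num; $i++)
{
    $if_clause .= "iw.word = '" . $words[$i] . "'";
    if ($i != $num - 1) $if_clause .= " or ";
}

// rel: 1000 points for every distinct query word found on the page,
// plus the total number of occurrences of those words
$query = "select il.id, il.url, il.title, il.short, count(distinct iw.id) * 1000 + sum(ii.times) as rel
from indexing_link il, indexing_index ii, indexing_word iw
where (" . $if_clause . ") and ii.word = iw.id and il.id = ii.link
group by il.id order by rel desc";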
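In this query, rel gives 1000 points for each distinct query word found on a page and adds the total number of occurrences, so a page that matches two of the words outranks a page that only repeats one of them (as long as the occurrence count stays below 1000). As a hedged alternative to concatenating the words into the SQL string, the same query can be built with placeholders. The function name build_search_query is hypothetical, and the PDO usage in the trailing comments assumes an existing connection in $pdo.

```php
<?php
/**
 * Build the ranking query with one placeholder per word.
 * Returns array($sql, $params) for use with a prepared statement.
 */
function build_search_query(array $words): array
{
    if (count($words) === 0) {
        throw new InvalidArgumentException('At least one word is required');
    }

    // "?, ?, ?" with as many placeholders as there are words
    $placeholders = implode(', ', array_fill(0, count($words), '?'));

    // "iw.word in (...)" is equivalent to the chain of "or" conditions
    $sql = "select il.id, il.url, il.title, il.short,
       count(distinct iw.id) * 1000 + sum(ii.times) as rel
from indexing_link il, indexing_index ii, indexing_word iw
where iw.word in ($placeholders) and ii.word = iw.id and il.id = ii.link
group by il.id order by rel desc";

    return array($sql, array_values($words));
}

list($sql, $params) = build_search_query(array('fast', 'site'));

// With a PDO connection in $pdo (an assumption), the query could be run as:
// $stmt = $pdo->prepare($sql);
// $stmt->execute($params);
```

With placeholders the driver handles quoting, so the words do not need to be escaped by hand.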




Related links


A Russian stemmer with improved characteristics, from the developer of Rambler's search:

http://linguist.nm.ru/stemka/stemka.html

A PHP function for calculating the edit distance between two strings:

http://www.php.net/manual/en/function.levenshtein.php
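One way this function could be used for a "did you mean" hint is sketched below. The function name closest_word, the sample dictionary, and the default threshold of 2 are assumptions made for illustration. Note that levenshtein() compares bytes, so for UTF-8 Cyrillic text the distance counts bytes rather than letters; the example sticks to ASCII words.

```php
<?php
/**
 * Return the dictionary word closest to $input by Levenshtein distance,
 * or null if no word is within $max_distance edits.
 */
function closest_word(string $input, array $dictionary, int $max_distance = 2): ?string
{
    $best = null;
    $best_distance = $max_distance + 1;

    foreach ($dictionary as $candidate) {
        $distance = levenshtein($input, $candidate);
        if ($distance < $best_distance) {
            $best = $candidate;
            $best_distance = $distance;
        }
    }

    return $best;
}

// Hypothetical dictionary; in practice it could come from the indexed words
$dictionary = array('search', 'server', 'stemmer');
$suggestion = closest_word('serach', $dictionary);
```

Scanning the whole dictionary is linear in its size, which is acceptable for a small word list but would need an index or a pre-filter for a large one.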


Yandex.XML is a service that lets you send automated search queries to Yandex and publish its results on your own site in your own design.

http://xml.yandex.ru/

Yandex.Server is an application for full-text search on your web server or local network that takes Russian morphology into account.

http://company.yandex.ru/technology/products/yandex-server.xml


A slow but interesting spelling corrector:

http://norvig.com/spell-correct.html


http://sphinxsearch.com

A free search engine. Requires compilation and installation on the server.

Provides APIs for integration with PHP, Python, Perl, and Ruby.

Features:


- Fast operation (indexing at 10 MB/s, searches in 0.1 s on a 2 GB database)

- Scalability and distributed search

- Russian and English stemming

- The PHP module works without compilation


http://www.dataparksearch.org

An open-source search engine. The front end is a CGI application. Supports:

- ISpell word-form dictionaries

- Synonyms for Russian and English

- Boolean query language

- Spelling correction based on Aspell dictionaries


P.S. This article was written in the hope that more sites will offer convenient and useful search. The amount of information on the web keeps growing, and I think one of the priority tasks for developers is making that information effectively searchable.