Mini-Google
Internal documentation
Parser
- Scope
- Parse each crawled file to extract further URLs to be crawled, and output both the links of the current URL and the bindings between words and URLs, with their hits.
- Interface
- To the store_server it sends the messages :
GET FILE
GET PATH
RMV _path_
and receives, in response to the GET messages, the messages :
FILE _size_info_byte_ _info_
PATH _path_
where _info_ is compressed data containing the _urlid_, the MIME type, and the crawled file.
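As an illustration of the two replies above, here is a minimal Python sketch of how the Parser side might split them. The wire framing is an assumption, since this document does not specify it: fields are taken to be separated by a single space, _size_info_byte_ is taken to be a decimal byte count, and the function name parse_store_reply is hypothetical. The compressed _info_ payload is returned as-is because its compression format is not described here.

```python
def parse_store_reply(data: bytes):
    """Split a store_server reply into (kind, payload).

    Assumed framing (not confirmed by the documentation):
      b"PATH <path>"
      b"FILE <size in bytes, decimal> <info bytes>"
    """
    kind, _, rest = data.partition(b" ")
    if kind == b"PATH":
        return "PATH", rest.decode("utf-8").strip()
    if kind == b"FILE":
        # Split only on the first space: the payload itself may contain spaces.
        size_field, _, info = rest.partition(b" ")
        size = int(size_field)
        if len(info) < size:
            raise ValueError("truncated FILE reply")
        # Still compressed; decoding is left to the caller.
        return "FILE", info[:size]
    raise ValueError(f"unexpected reply: {kind!r}")
```

For example, under these assumptions parse_store_reply(b"FILE 3 a b") yields the pair ("FILE", b"a b").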
- To the URL Resolver it sends the message :
GETID _urlid_crt_ _basestring_ _url_
|__________|
if 0, it is the current URL
and receives the message :
URLID _url_id_
- To the lexicon it sends the message :
UPD _mot_
and receives the message :
WID _wid_
- To the Links server it sends the message :
PUT _urlid_ _urlid_
- To the Indexer server it sends the message :
PUT _urlid_ _wid_ _hit_
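The messages exchanged with the URL Resolver, the lexicon, the Links server and the Indexer server could be built and read along the following lines. This is only a sketch in Python under stated assumptions: fields are separated by single spaces, identifiers are decimal integers, and the first _urlid_ of the Links message is the source page while the second is the target. None of this is confirmed by the message list above, and all function names are hypothetical.

```python
def getid(urlid_crt: int, basestring: str, url: str) -> str:
    """Request sent to the URL Resolver."""
    return f"GETID {urlid_crt} {basestring} {url}"


def upd(word: str) -> str:
    """Request sent to the lexicon (_mot_ is the word)."""
    return f"UPD {word}"


def put_link(from_urlid: int, to_urlid: int) -> str:
    """Message sent to the Links server (argument order is an assumption)."""
    return f"PUT {from_urlid} {to_urlid}"


def put_hit(urlid: int, wid: int, hit: int) -> str:
    """Message sent to the Indexer server."""
    return f"PUT {urlid} {wid} {hit}"


def parse_id_reply(line: str, expected: str) -> int:
    """Read a one-field reply such as 'URLID 42' or 'WID 7'."""
    keyword, _, value = line.strip().partition(" ")
    if keyword != expected:
        raise ValueError(f"expected {expected}, got {keyword!r}")
    return int(value)
```

A typical exchange for one word would then be: send upd(word) to the lexicon, read the identifier with parse_id_reply(reply, "WID"), and forward put_hit(urlid, wid, hit) to the Indexer server.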
- File giving the server sockets of the servers it should contact :