The definitions (and/or names)
associated with directory entries can be treated as full text that is
indexed and queried by a search engine. This type of search can be divided
into the following levels, depending on the linguistic level reached:
1. the “keyword” level: the search looks exclusively for the words entered by the user. This means that the user’s wording must correspond exactly to the wording used in the definition (and/or name) of the directory entries being queried;
2. the “semantic” level: the search at this level focuses not only on the user’s words, but also on words with a similar meaning (typically synonyms and words with a more generic or more specific meaning). This involves either using a semantic network to interpret the request (to extend the search) or inserting the words with a similar meaning into the directories being queried. It also means that the system has to be capable of recognising compound words in the language so that, for example, it can treat “runner bean” as a single concept (and extend the search to vegetables in general) rather than as the juxtaposition of two words (which could lead to the search being extended to sport);
3. the “simple phrase” level: the system considers the user’s request not as a list of words, but as a phrase with its own meaning and chooses all the appropriate meanings for each word in view of the initial phrase (for example, with the phrase “cycle race”, the system will choose words whose meaning is close to the “competition” meaning of race and will not expand on the idea of ethnic groups). This is known as semantic disambiguation. This system also recognises the semantic “heads” of compound words and can extend searches to these terms (for example, if the term “table wine” is in the dictionary, a search can be conducted on “wine” but not on “table”);
4. the “complex phrase” level: this level involves processing simple phrases but also takes complex linguistic structures into account, such as coordination and exclusion. Searches deduced from the initial request can also be carried out in order to extend it. For example, in a request for “vegetable seed for sowing”, the search (or indexing process) will be conducted on “vegetable seed” and “seed for sowing”, but not on “vegetable for sowing”.
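The difference between the first two levels can be sketched in Python. This is a minimal illustration under stated assumptions, not the method of any particular tool: the compound list, the small semantic network and the function names are all invented for the example.

```python
# Toy data: every entry below is invented for illustration only.
COMPOUNDS = {("runner", "bean"), ("table", "wine")}

# Hypothetical semantic network: concept -> related concepts
# (synonyms and more generic terms).
RELATED = {
    "runner bean": {"bean", "vegetable"},
    "runner": {"athlete", "sport"},
    "bean": {"vegetable"},
}


def tokenize(text):
    """Split a text into lower-case words."""
    return text.lower().split()


def concepts(text):
    """Group words into concepts, preferring a known two-word compound."""
    tokens = tokenize(text)
    result = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in COMPOUNDS:
            result.append(tokens[i] + " " + tokens[i + 1])
            i += 2
        else:
            result.append(tokens[i])
            i += 1
    return result


def keyword_match(query, definition):
    """Level 1: every query word must appear verbatim in the definition."""
    words = set(tokenize(definition))
    return all(w in words for w in tokenize(query))


def expand(query):
    """Level 2: add the related concepts of each query concept."""
    expanded = set()
    for concept in concepts(query):
        expanded.add(concept)
        expanded |= RELATED.get(concept, set())
    return expanded


def semantic_match(query, definition):
    """Level 2: match if an expanded query concept occurs in the definition."""
    return bool(expand(query) & set(concepts(definition)))
```

With this toy data, `keyword_match("runner bean", "growing of vegetable crops")` fails because neither word occurs in the definition, whereas `semantic_match` succeeds through the more generic concept “vegetable”. Because “runner bean” is recognised as one concept, the expansion never reaches “sport”.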
The fourth level is
needed to handle the complexity of textual descriptions in directories. This
requires a powerful dictionary model and a linguistic engine that uses this
dictionary, together with several dedicated modules, to handle the linguistic
phenomena that occur very frequently in directories.
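As an illustration of the decomposition described for the “complex phrase” level, the following Python sketch derives the extended searches from a phrase whose structure is already known. It assumes the parse is supplied by hand; the `NounPhrase` class and `derived_searches` function are hypothetical names, and a real linguistic engine would have to produce this structure itself.

```python
from dataclasses import dataclass, field


@dataclass
class NounPhrase:
    """Hypothetical parse result: a head noun plus the modifiers attached to it."""
    head: str
    pre_modifiers: list = field(default_factory=list)   # e.g. ["vegetable"]
    post_modifiers: list = field(default_factory=list)  # e.g. ["for sowing"]


def derived_searches(phrase):
    """Pair each modifier with the head; never pair two modifiers together."""
    searches = [f"{m} {phrase.head}" for m in phrase.pre_modifiers]
    searches += [f"{phrase.head} {m}" for m in phrase.post_modifiers]
    return searches


# "vegetable seed for sowing", with "seed" as the head of the phrase.
phrase = NounPhrase(
    head="seed",
    pre_modifiers=["vegetable"],
    post_modifiers=["for sowing"],
)
print(derived_searches(phrase))  # ['vegetable seed', 'seed for sowing']
```

Since every derived search keeps the head, “vegetable for sowing” is never generated.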
A dictionary and engine of this kind allow the system to
compute the semantic distance between the user query and the entries
found. With a high-quality dictionary, this distance can also be computed
between entries in different languages, allowing the comparison of a query in
French with a description in English, for example.
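The text does not say how the semantic distance is computed. The Python sketch below shows one simple possibility, chosen purely for illustration: words are mapped to language-independent concept identifiers through a small invented bilingual dictionary, and the distance is the Jaccard distance between the two concept sets. The dictionary contents and function names are assumptions.

```python
# Hypothetical bilingual dictionary:
# (language, word) -> language-independent concept identifier.
CONCEPT_OF = {
    ("fr", "graine"): "SEED",
    ("fr", "légume"): "VEGETABLE",
    ("fr", "vin"): "WINE",
    ("en", "seed"): "SEED",
    ("en", "vegetable"): "VEGETABLE",
    ("en", "wine"): "WINE",
    ("en", "table"): "TABLE",
}


def to_concepts(text, lang):
    """Map the known words of a text to concept ids; unknown words are ignored."""
    return {
        CONCEPT_OF[(lang, word)]
        for word in text.lower().split()
        if (lang, word) in CONCEPT_OF
    }


def semantic_distance(query, query_lang, entry, entry_lang):
    """Jaccard distance between concept sets: 0.0 = same concepts, 1.0 = none shared."""
    a = to_concepts(query, query_lang)
    b = to_concepts(entry, entry_lang)
    if not a and not b:
        return 1.0  # nothing recognised on either side: treat as unrelated
    return 1.0 - len(a & b) / len(a | b)


# Rank English entries against a French query, closest first.
entries = ["table wine", "vegetable seed"]
ranked = sorted(
    entries,
    key=lambda e: semantic_distance("graine de légume", "fr", e, "en"),
)
print(ranked)  # ['vegetable seed', 'table wine']
```

Under these assumptions the French query “graine de légume” is at distance 0.0 from the English entry “vegetable seed” and at distance 1.0 from “table wine”. A real system would weight concepts and use the semantic network rather than exact concept overlap.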
This cutting-edge technology is
implemented in the BSR Semantic Tool to
allow powerful searches on directories, including in cross-language mode (for
example, querying English directories in French).