Kinds of linguistic search and characteristics of the BSR Semantic Tool

The definitions (and/or names) associated with directory entries can be treated as full text that a search engine indexes and queries. This type of search can be divided into the following levels, depending on the linguistic level reached:

   1. the “keyword” level: the search looks exclusively for the words entered by the user. The user’s wording must therefore match exactly the wording used in the definition (and/or name) of the queried directory entries;

   2. the “semantic” level: the search covers not only the user’s words but also words with a similar meaning (typically synonyms, and words with a more generic or more specific meaning). This requires either a semantic network that interprets the request in order to extend the search, or the insertion of words with a similar meaning into the queried directory entries. The system must also recognise the compound words of the language so that, for example, it treats “runner bean” as a single concept (and extends the search to vegetables in general) rather than as the juxtaposition of two words (which could extend the search to sport);

   3. the “simple phrase” level: the system treats the user’s request not as a list of words but as a phrase with its own meaning, and selects the appropriate meaning of each word in the context of the whole phrase (for example, with the phrase “cycle race”, the system extends the search to words close to the “competition” sense of “race”, not to the “ethnic group” sense). This is known as semantic disambiguation. The system also recognises the semantic “heads” of compound words and can extend searches to them (for example, if the term “table wine” is in the dictionary, the search can be extended to “wine” but not to “table”);

   4. the “complex phrase” level: in addition to processing simple phrases, this level takes complex linguistic structures into account, such as coordination and exclusion. Searches deduced from the initial request can also be carried out to extend it. For example, for a request for “vegetable seed for sowing”, the search (or indexing process) is conducted on “vegetable seed” and “seed for sowing”, but not on “vegetable for sowing”.
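The difference between the first two levels can be illustrated with a toy sketch in Python. Everything in it is hypothetical and far simpler than a real dictionary: COMPOUNDS lists the two-word compounds the system recognises, and RELATED maps a concept to synonyms and to more generic or more specific terms. Recognising “runner bean” as one concept before expansion keeps the search among vegetables, whereas expanding word by word brings in “sport”.

```python
# Toy illustration of levels 1 and 2; all data below is hypothetical.

# Two-word compounds the (hypothetical) dictionary treats as single concepts.
COMPOUNDS = {("runner", "bean"): "runner bean"}

# Related terms per concept: synonyms, more generic and more specific terms.
RELATED = {
    "runner bean": {"bean", "vegetable"},
    "runner": {"athlete", "sport"},
    "vegetable": {"runner bean", "carrot"},
}


def tokenize(text):
    """Lowercase, split on whitespace, then merge known two-word compounds."""
    words = text.lower().split()
    terms, i = [], 0
    while i < len(words):
        pair = tuple(words[i:i + 2])
        if pair in COMPOUNDS:
            terms.append(COMPOUNDS[pair])
            i += 2
        else:
            terms.append(words[i])
            i += 1
    return terms


def expand(terms):
    """Add the related terms of every term to the set of searched terms."""
    expanded = set(terms)
    for term in terms:
        expanded |= RELATED.get(term, set())
    return expanded


def keyword_match(query, entry):
    """Level 1: every word typed by the user must appear verbatim in the entry."""
    return set(query.lower().split()) <= set(entry.lower().split())


def semantic_match(query, entry):
    """Level 2: each query concept must match the entry itself or via a related term."""
    entry_terms = set(tokenize(entry))
    return all(expand([term]) & entry_terms for term in tokenize(query))


if __name__ == "__main__":
    entry = "fresh vegetable supplier"
    print(keyword_match("runner bean", entry))      # False: query words absent
    print(semantic_match("runner bean", entry))     # True: "vegetable" is related
    print(sorted(expand("runner bean".split())))    # word by word: includes "sport"
    print(sorted(expand(tokenize("runner bean"))))  # as a compound: no "sport"
```

In this sketch the level-2 behaviour comes from expanding the query at search time; the alternative mentioned in the text, inserting related words into the directory entries at indexing time, would apply the same expand step to the entries instead.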
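Levels 3 and 4 can be sketched in the same toy style. The data and function names (SENSES, HEADS, disambiguate, compound_head, deduce_subphrases) are hypothetical. Senses are chosen by counting context cue words, a simple stand-in for whatever disambiguation method a real engine uses, and the phrase structure of “vegetable seed for sowing” is supplied by hand rather than produced by a parser.

```python
# Toy illustration of levels 3 and 4; all data below is hypothetical.

# Senses of an ambiguous word: context cues that select the sense,
# and the related terms the search may be extended to once it is selected.
SENSES = {
    "race": {
        "competition": {"cues": {"cycle", "horse", "car"},
                        "related": {"competition", "contest"}},
        "ethnic group": {"cues": {"human", "ethnic"},
                         "related": {"ethnicity", "people"}},
    },
}

# Semantic heads of compound words recorded in the (hypothetical) dictionary.
HEADS = {"table wine": "wine"}


def disambiguate(word, phrase_words):
    """Level 3: pick the sense whose cues overlap most with the rest of the phrase."""
    senses = SENSES.get(word)
    if not senses:
        return None
    context = set(phrase_words) - {word}
    return max(senses, key=lambda sense: len(senses[sense]["cues"] & context))


def compound_head(compound):
    """Level 3: return the head a search on the compound may be extended to."""
    return HEADS.get(compound)


def deduce_subphrases(pre_modifiers, head, post_modifiers):
    """Level 4: attach one modifier at a time to the head; never drop the head."""
    phrases = [f"{modifier} {head}" for modifier in pre_modifiers]
    phrases += [f"{head} {modifier}" for modifier in post_modifiers]
    return phrases


if __name__ == "__main__":
    sense = disambiguate("race", ["cycle", "race"])
    print(sense, sorted(SENSES["race"][sense]["related"]))
    print(compound_head("table wine"))  # wine
    # Hand-supplied structure of "vegetable seed for sowing": head "seed".
    print(deduce_subphrases(["vegetable"], "seed", ["for sowing"]))
    # ['vegetable seed', 'seed for sowing']
```

Because every deduced phrase keeps the head “seed”, the unwanted combination “vegetable for sowing” is never produced. When no cue matches, this sketch simply falls back to the first listed sense; a real engine would need a better default.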

The fourth level is needed to handle the complexity of textual descriptions in directories. It requires a powerful dictionary model and a linguistic engine that uses this dictionary, along with several dedicated modules, to handle the linguistic phenomena that occur very frequently in directory descriptions.

This allows the system to compute the semantic distance between the user query and the entries found. With a high-quality dictionary, this distance can also be computed between a query and entries in a different language, so that, for example, a query in French can be compared with a description in English.
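The text does not say how the semantic distance is computed. One common textbook approach, used here only as an illustration, maps the words of every language to language-independent concept identifiers and counts the edges between concepts in a hierarchy. The hierarchy (PARENT), the bilingual lexicon (LEXICON) and the averaging rule in semantic_distance are all hypothetical.

```python
# Hypothetical concept hierarchy: child concept -> parent concept.
PARENT = {
    "bean": "vegetable",
    "carrot": "vegetable",
    "vegetable": "food",
    "wine": "beverage",
    "beverage": "food",
}

# Hypothetical bilingual lexicon: (language, word) -> language-independent concept.
LEXICON = {
    ("en", "bean"): "bean", ("fr", "haricot"): "bean",
    ("en", "carrot"): "carrot", ("fr", "carotte"): "carrot",
    ("en", "vegetable"): "vegetable", ("fr", "légume"): "vegetable",
    ("en", "wine"): "wine", ("fr", "vin"): "wine",
}


def ancestors(concept):
    """Return [concept, parent, grandparent, ...] up to the root."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def concept_distance(a, b):
    """Edges on the path from a to b through their lowest common ancestor."""
    chain_b = ancestors(b)
    for steps_up, concept in enumerate(ancestors(a)):
        if concept in chain_b:
            return steps_up + chain_b.index(concept)
    return float("inf")  # the concepts share no ancestor


def to_concepts(text, lang):
    """Map the words of a text to concepts, ignoring words not in the lexicon."""
    return [LEXICON[(lang, w)] for w in text.lower().split() if (lang, w) in LEXICON]


def semantic_distance(query, query_lang, entry, entry_lang):
    """Average, over query concepts, of the distance to the closest entry concept."""
    q, e = to_concepts(query, query_lang), to_concepts(entry, entry_lang)
    if not q or not e:
        return float("inf")
    return sum(min(concept_distance(a, b) for b in e) for a in q) / len(q)


if __name__ == "__main__":
    # A French query compared with English directory descriptions.
    for description in ["bean supplier", "carrot grower", "wine merchant"]:
        print(description, semantic_distance("haricot", "fr", description, "en"))
```

Here the French “haricot” is at distance 0.0 from “bean supplier”, 2.0 from “carrot grower” (sibling vegetables) and 4.0 from “wine merchant”, so ranking entries by this value yields a cross-language result list. A production system would also weight edges and handle ambiguous words, which this sketch ignores.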

The BSR Semantic Tool implements this technology to support powerful searches on directories, including in cross-language mode (for example, querying English directories in French).