Question by Simon, Oct 4, 2016 7:20 PM

Building your own Word Corrector Lexicon

The Query Correction feature (Did You Mean) is based on the Word Corrector Lexicon, which is automatically generated from the content of the index.

I cannot find anything about how to build your own lexicon so I am guessing this is not available. Is there a plan to allow administrators to build their own lexicon?

Answer by Daniel Lavoie, Oct 12, 2016 11:06 AM

The word corrector lexicon can be influenced by a bias file containing word/bias pairs. For example, an e-commerce site could feed its product catalog into a bias file to make sure that corrections suggest product names instead of common English words. This is the closest you can get to building your own word corrector lexicon.

I don't think this is officially documented yet, so I will copy the relevant documentation found in our internal wiki page here:

Influencing the algorithm

It is possible to influence the algorithm with a bias file, using the following format:

Word1 [-]Bias

Word2 [-]Bias

For example, this would make sure that the "verison" typo suggests "Verizon" rather than "version":

verizon 100000

version -20000

The number of occurrences to add or subtract is of course dependent on the index content. Your mileage may vary.

The file will be loaded from the path configured in PhysicalIndex/WordCorrectorBiasFilePath. Once you are satisfied with your bias file, simply click on Rebuild Word Corrector Lexicon in the administration tool (Index/Advanced).
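If the bias entries come from something like a product catalog, the file can be generated rather than typed by hand. Here is a minimal Python sketch that writes the Word/Bias format described above. The function names, the output file name, and the UTF-8 encoding are my own assumptions and are not part of the product; the sketch also assumes each entry must be a single word with no spaces, since the format separates word and bias with whitespace.

```python
from pathlib import Path


def format_bias_lines(biases):
    """Return one 'Word Bias' line per entry, in insertion order."""
    lines = []
    for word, bias in biases.items():
        # Assumption: a word cannot contain whitespace, because whitespace
        # separates the word from its bias in the file format.
        if not word or any(ch.isspace() for ch in word):
            raise ValueError(f"bias entries must be single words: {word!r}")
        lines.append(f"{word} {int(bias)}")
    return lines


def write_bias_file(path, biases, encoding="utf-8"):
    """Write the bias lines to 'path'. The encoding is an assumption."""
    text = "\n".join(format_bias_lines(biases)) + "\n"
    Path(path).write_text(text, encoding=encoding)


# Same values as the "verison" example; real values depend on the index content.
example_biases = {"verizon": 100000, "version": -20000}
for line in format_bias_lines(example_biases):
    print(line)
```

Calling write_bias_file("word_corrector_bias.txt", example_biases) would produce the file; it then has to be placed at the path configured in PhysicalIndex/WordCorrectorBiasFilePath, and the lexicon rebuilt, before it has any effect.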

Using this technique, it is also possible to make the algorithm output suggestions containing ligated characters, which is impossible otherwise, since the index does not contain any ligated character.

For example, to make the index output the "œsophage" suggestion, using the same number of occurrences as "oesophage":

œsophage 0

Of course, the bias can be nonzero, to boost or dampen the suggestion.
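For a long word list, candidate ligated entries can be produced with a small script. This is only a sketch: the ligature table and function names below are hypothetical, and a plain string replacement will also hit words where the two letters are not a real ligature, so the output needs to be reviewed by hand before it goes into the bias file.

```python
# Hypothetical mapping of letter pairs to their ligated form.
LIGATURES = {"oe": "œ", "ae": "æ"}


def ligated_variants(word):
    """Return the ligated spellings of 'word', or an empty list if none apply."""
    variants = set()
    for plain, ligature in LIGATURES.items():
        if plain in word:
            variants.add(word.replace(plain, ligature))
    return sorted(variants)


def ligature_bias_lines(words, bias=0):
    """Build 'Word Bias' lines for every ligated variant of the given words."""
    return [f"{variant} {bias}" for word in words for variant in ligated_variants(word)]


for line in ligature_bias_lines(["oesophage", "version"]):
    print(line)
```

With a bias of 0, each generated entry keeps the same number of occurrences as its unligated spelling, as in the "œsophage" example.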

