Question by ebpo, Aug 25, 2016 8:31 AM

Some characters not "indexed" for pages in Russian

Hi!

I have a website in multiple languages, including Russian. I noticed that visitors get few results when they search for Russian text, so I looked in the Coveo admin interface.

For a lot of words, some characters are simply missing. Surprisingly, they aren't necessarily Russian characters. For example: "Агрессия - Обобщение | Энциклопедия раннего детского развития" becomes "гре и - Обобщение | Энциклопеди раннего дет кого развити"

I'm pretty sure the same thing happens with other Cyrillic languages, but I haven't tested it.

Does anyone have a solution to this?

Thanks,

3 Replies

Answer by binduan, Aug 29, 2016 8:38 AM

It's a little strange. I tested with CES 7, and these Russian characters are indexed and searched correctly. Could you please check whether you have any special settings (such as a stopword file or a thesaurus), or a custom preprocessing script that could filter out these characters? (We have encountered this sort of problem before.)
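
To narrow down what might be filtering the characters, it can help to list exactly which ones go missing. The following is a minimal Python sketch, not a Coveo tool: it compares the two strings quoted in the question and prints each dropped character with its code point and UTF-8 bytes. The helper names (`dropped_characters`, `is_valid_cp1252`) are made up for this illustration. The Windows-1252 check is only a hypothesis to test, since nothing in this thread confirms that such a conversion takes place.

```python
from collections import Counter

# Strings copied from the question: the page title and what the index shows.
original = "Агрессия - Обобщение | Энциклопедия раннего детского развития"
indexed = "гре и - Обобщение | Энциклопеди раннего дет кого развити"


def dropped_characters(before: str, after: str) -> Counter:
    """Count characters that occur more often in `before` than in `after`."""
    return Counter(before) - Counter(after)


def is_valid_cp1252(byte: int) -> bool:
    """Return True if the single byte maps to a character in Windows-1252."""
    try:
        bytes([byte]).decode("cp1252")
        return True
    except UnicodeDecodeError:
        return False


missing = dropped_characters(original, indexed)

for ch in sorted(missing):
    utf8 = ch.encode("utf-8")
    # Hypothesis only: bytes with no Windows-1252 mapping could be lost if
    # the UTF-8 text were misread as Windows-1252 somewhere in the pipeline.
    unmapped = [f"0x{b:02X}" for b in utf8 if not is_valid_cp1252(b)]
    print(
        f"U+{ord(ch):04X} {ch!r} dropped {missing[ch]}x, "
        f"utf-8={utf8.hex()}, no cp1252 mapping: {unmapped}"
    )
```

For this example, the dropped characters are А, с and я. Each has a UTF-8 trailing byte (0x90, 0x81, 0x8F) that has no mapping in Windows-1252, so one thing worth checking is whether the pages, or any conversion step before indexing, declare or assume a non-UTF-8 encoding.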

Answer by jbouvet, Sep 20, 2016 2:35 PM

Thanks for the reply, and sorry for the delay! There doesn't seem to be any kind of preprocessing filter in place. I will keep looking!

Thanks again,