Question by mark, Oct 26, 2015 4:43 PM

Stop words with JS Frameworks

Hi, I am using JS Frameworks, and currently when my client queries specific terms such as "font size", Coveo returns documents that contain HTML markup such as 'font-size'.

How do I configure the query to ignore HTML tags, or how do I include a stop word list?

Thanks in advance.

Answer by Jean-François L'Heureux, Oct 26, 2015 11:59 PM

HTML tags are not indexed when Coveo Enterprise Search indexes HTML documents unless the index was unable to detect the documents as HTML documents. When correctly detected, HTML documents are handled by the HTML converter and only the text of the HTML documents is indexed. All the tags, scripts and CSS rules are excluded.

Can you check those documents in the CES Administration Tool Index Browser and report the document type detected by CES in the details section of those documents please? If it's not HTML, your documents may contain just HTML tags without the <html>, <head> and <body> elements or just invalid HTML. You would have to fix your source documents to have them recognized as HTML documents by CES.
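
As a quick way to spot such fragments in your source documents, a rough check along these lines may help. It is only a sketch in Python: the function name `missing_structural_tags` is made up for illustration, and it does not reproduce how CES itself detects document types, which isn't described here.

```python
import re


def missing_structural_tags(markup):
    """Return which of <html>, <head>, <body> have no opening tag in the markup.

    Hypothetical helper: a plain text heuristic, not the CES detection logic.
    """
    lowered = markup.lower()
    missing = []
    for tag in ("html", "head", "body"):
        # Require whitespace or '>' after the name so <header> is not
        # mistaken for <head>.
        if not re.search(rf"<{tag}[\s>]", lowered):
            missing.append(tag)
    return missing
```

An empty list means all three elements are present; anything else points at a fragment rather than a full HTML document.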

Comment by mark, Oct 27, 2015 8:44 AM

Hi Jeff, thanks for your speedy reply. It's not an HTML document, but the document has custom fields with partial HTML tags. If I were to parse out the HTML tags in the converter, how would this affect the 'Term has formatting' ranking setting?

Thanks in advance.

Comment by Jean-François L'Heureux, Oct 27, 2015 11:18 AM

Fields are not "converted" by CES. Only the binary data (body) of the documents is converted to keep only the text. This is why all the text in your fields, tags included, is searchable.

The "Term has formatting" ranking also applies only to the binary data (body) of the indexed documents. It doesn't check the content of fields. So stripping the HTML tags from your field values won't have any effect on ranking.

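If you still want to remove the tags from the field values (for example so that a query for "font size" stops matching markup), a minimal sketch of the stripping step might look like this. It uses only Python's standard `html.parser` module; `strip_html` and `_TextExtractor` are hypothetical names, and where you hook this into your indexing pipeline is an assumption that depends on your setup. A tag that is cut off before its closing `>` is not covered by this sketch.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and drops tags, scripts and CSS rules."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join chunks with a space so words from adjacent elements are not
        # glued together, then collapse the resulting runs of whitespace.
        return " ".join(" ".join(self._chunks).split())


def strip_html(field_value):
    """Return the field value with HTML tags removed (illustrative helper)."""
    parser = _TextExtractor()
    parser.feed(field_value)
    parser.close()  # flush any text still buffered by the parser
    return parser.text()
```

For example, `strip_html('<span style="font-size: 12px">Quarterly report</span>')` returns `Quarterly report`.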
