Question by Daniel Reedy, Apr 17, 2015 2:16 PM

impact of excluding GUID field

We are finding GUIDs and some other unwanted content showing up in the search result excerpts on our website. My first inclination was to exclude the GUID field in defaultIndexConfiguration, but then I thought that might remove it for all use cases. Sometimes I need to search the Sitecore Desktop by GUID, and I don't want to handicap that.

Is there a best practice that you can advise?

1 Reply

Answer by Jean-François L'Heureux, Apr 17, 2015 2:34 PM

Which processor are you using to generate the HTML version of the indexed items and what is its configuration? Is it the BasicHtmlContentInBodyProcessor or the HtmlContentInBodyWithRequestsProcessor?

If you are using the BasicHtmlContentInBodyProcessor, you can set the IncludeTextFieldsOnly option to true to avoid indexing non-text fields in the body of your documents. That way, they won't show up in the result excerpts.

Comment by Daniel Reedy, Apr 17, 2015 3:16 PM

BasicHtmlContentInBodyProcessor.

That change worked well to remove the GUID. Thank you. I credited your answer.

I still see items in the excerpt that I need to filter out. They look like field titles:

Relative Value Balanced
menu title ... title TCW Relative Value Balanced Strategy ... browser title ... web bit title ... css class ... file description ... factivaid ... metatags-other keywords ... doc title

Any ideas what is happening? (Please advise if I should open a new ticket)

Comment by Jean-François L'Heureux, Apr 17, 2015 3:29 PM

The BasicHtmlContentInBodyProcessor includes the field names by default (see Indexing Documents with Basic HTML Content for all the options).

You can turn it off by setting the IncludeFieldNames option to false.
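
To illustrate why the two options clean up the excerpt, here is a toy Python model of how a body might be assembled from an item's fields. It is not the Coveo implementation: the `Field` class, the `build_body` function, the `is_text` flag, and the sample GUID are all hypothetical. Only the option names and the default for field names (included by default) come from this thread.

```python
from dataclasses import dataclass


@dataclass
class Field:
    name: str
    value: str
    is_text: bool  # hypothetical flag: True for text-like field types


def build_body(fields, include_field_names=True, include_text_fields_only=False):
    """Toy model of building an indexed body; not the real processor."""
    parts = []
    for field in fields:
        # Mirrors IncludeTextFieldsOnly=true: skip non-text fields such as IDs.
        if include_text_fields_only and not field.is_text:
            continue
        # Mirrors IncludeFieldNames=true: the field name itself lands in the body.
        if include_field_names:
            parts.append(field.name)
        if field.value:
            parts.append(field.value)
    return " ".join(parts)


# Sample item; the GUID is a made-up placeholder.
fields = [
    Field("id", "{11111111-2222-3333-4444-555555555555}", is_text=False),
    Field("title", "TCW Relative Value Balanced Strategy", is_text=True),
    Field("browser title", "", is_text=True),
]

default_body = build_body(fields)
clean_body = build_body(fields, include_field_names=False, include_text_fields_only=True)
print(default_body)
print(clean_body)
```

In this model, the default body carries the GUID and the field names (even for empty fields), while the body built with both options changed contains only the text values, which is what the excerpt is drawn from.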

Comment by Daniel Reedy, Apr 17, 2015 3:56 PM

There it is! It worked beautifully. Thank you.

Comment by Daniel Reedy, Apr 29, 2015 2:52 PM

The excerpt looks clean most of the time, but some rows occasionally contain field names. I'm not sure how that is possible, since the IncludeFieldNames setting is server-wide. Here is a public example:

https://www.tcw.com/Search%20TCW.aspx#q=Fidelity%20Money%20Market%20Fund

Any ideas what I can do about it?

Comment by Jean-François L'Heureux, Apr 29, 2015 3:04 PM

Maybe those Sitecore items were not reindexed after you changed the BasicHtmlContentInBodyProcessor settings? To confirm, in the CES Administration Tool Index Browser, search for those documents, expand their details, and check their indexed date. If it's earlier than April 17th, 15:56 (the time of your last comment), their body was built with the default BasicHtmlContentInBodyProcessor settings.

You can rebuild your whole index from the Sitecore Indexing Manager to force the items to be re-indexed.
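
If you have many documents to check, the date comparison above can be sketched in a few lines of Python. This is only an illustration: the document names and indexed dates below are invented, and it assumes you have copied the indexed dates out of the Index Browser by hand. The cutoff is the April 17th, 15:56 time mentioned above.

```python
from datetime import datetime

# Time the processor settings were confirmed changed (from this thread).
SETTINGS_CHANGED_AT = datetime(2015, 4, 17, 15, 56)


def stale_documents(indexed_dates, changed_at=SETTINGS_CHANGED_AT):
    """Return the names of documents last indexed before the settings change."""
    return sorted(
        name for name, indexed in indexed_dates.items() if indexed < changed_at
    )


# Hypothetical sample data, as if copied from the Index Browser.
indexed_dates = {
    "Fund page A": datetime(2015, 4, 16, 9, 30),
    "Fund page B": datetime(2015, 4, 20, 11, 0),
}

stale = stale_documents(indexed_dates)
print(stale)
```

Any document this reports still has a body built with the old settings and needs to be re-indexed.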

Comment by Daniel Reedy, Apr 30, 2015 3:40 AM

These are new servers, in a new environment; but your advice still applied to us. Thank you.
