Question by dwebb, Dec 29, 2014 6:40 PM

What fields are used to derive the Excerpt?

Hello -- I am using Coveo for Sitecore, Nov 2014 version.

I have a web page which has some custom fields on it, and uses a custom MVC rendering to display those fields. These fields are either single-line text or multi-line rich text.

One of the pages has, in one of its custom fields, a sentence containing the words "oil mist". When the page is published, the custom field is added to the index with the full sentence, and I can find that page using the search term "oil mist" (or oil or mist), so the full-text search is working fine.

However, the "Excerpt" for that page in the search results is empty. I would have expected that the Excerpt field would contain some snippet of text with the words "oil mist".

PDF documents containing the words "oil mist" somewhere in the body of the document DO have a valid Excerpt field which shows the highlighted words.

I have enabled the "HtmlContentInBodyWithRequestsProcessor" in the post-item processing pipeline, but the result is the same with or without that processor.

Is there something special I have to do to get a valid Excerpt for pages which have custom fields?

Cheers, David

1 Reply

Answer by Vincent Séguin, Dec 29, 2014 8:25 PM

Hi David,

The excerpt is built from the body of the document. What you could do is implement a PostItemProcessingPipeline processor that adds the desired fields to the body, which is called 'BinaryData' on the CoveoItem you receive as a parameter.

You can learn more about this pipeline here: https://developers.coveo.com/display/public/SC201412/Using+the+Coveo+Pipelines

And you can see an example here: https://developers.coveo.com/display/public/SC201412/Indexing+Documents+with+Custom+Pipeline+Processor
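
To illustrate why an empty body gives an empty excerpt, here is a minimal conceptual sketch in Python. It is not the Coveo for Sitecore API (a real processor would be written in C# against that API); the functions `build_body` and `make_excerpt`, the field names, and the sample page are all hypothetical and only model the idea that the excerpt is cut from the body, not from the individual indexed fields.

```python
from typing import Dict, Iterable


def build_body(fields: Dict[str, str], body_fields: Iterable[str]) -> bytes:
    """Concatenate the chosen field values into one UTF-8 body (hypothetical helper)."""
    parts = [fields[name].strip() for name in body_fields if fields.get(name, "").strip()]
    return "\n".join(parts).encode("utf-8")


def make_excerpt(body: bytes, query: str, width: int = 40) -> str:
    """Return a snippet of the body around the first match, or '' when there is none."""
    text = body.decode("utf-8")
    pos = text.lower().find(query.lower())
    if pos < 0:
        return ""
    start = max(0, pos - width)
    end = min(len(text), pos + len(query) + width)
    return text[start:end]


# Hypothetical page with one custom field that mentions the search terms.
page = {
    "Title": "Compressor maintenance",
    "Summary": "Check the separator for oil mist before each restart.",
}

empty_body = build_body(page, [])            # nothing copied into the body
filled_body = build_body(page, ["Summary"])  # custom field copied into the body

print(repr(make_excerpt(empty_body, "oil mist")))
print(repr(make_excerpt(filled_body, "oil mist")))
```

In this model the page is still findable through its indexed fields, but only the second body can yield a snippet, which matches the behaviour described in the question.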

Comment by dwebb, Dec 30, 2014 12:47 PM

Hi Vincent - ok, that makes sense, and I've tested it and it works. However, we will have lots of different kinds of pages in our application … it will be a maintenance headache to have to continually update the list of templates (and fields) which we want to include in the document "body". Is there no way to get the "rendered" form of the document with all the text fields on it automatically? In other CMS systems, they are able to get the full rendered HTML from the page and automatically pick that up as the body, in addition to all the individual fields.

David

Comment by Vincent Séguin, Dec 30, 2014 1:45 PM

Hi David,

The HtmlContentInBodyWithRequestsProcessor is supposed to do exactly that: it puts the 'rendered' form of the page in the body. However, this requires the document to be accessible from the web, i.e. published. If it doesn't do anything, there may be an error while fetching the content. You could also use the BasicHtmlContentInBodyProcessor, which includes all the text fields, for instance. Learn more about this processor here: https://developers.coveo.com/display/SC201412/Indexing+Documents+with+Basic+HTML+Content
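
As a rough illustration of the "include all the text fields" idea, the Python sketch below builds a basic HTML body from every text-like field of an item, so no per-template field list has to be maintained. This is only a conceptual model, not the actual BasicHtmlContentInBodyProcessor: the function `basic_html_body`, the set of field types, and the sample fields are assumptions made for the example.

```python
from html import escape
from typing import List, Tuple

# Assumed set of text-like field types; the real list is not given in this thread.
TEXT_FIELD_TYPES = {"Single-Line Text", "Rich Text"}


def basic_html_body(fields: List[Tuple[str, str, str]]) -> str:
    """Wrap every non-empty text field value in a minimal HTML document."""
    blocks = []
    for _name, field_type, value in fields:
        if field_type not in TEXT_FIELD_TYPES or not value.strip():
            continue  # skip non-text and empty fields
        # Rich text already holds markup; plain text is escaped.
        content = value if field_type == "Rich Text" else escape(value)
        blocks.append(f"<div>{content}</div>")
    return "<html><body>" + "".join(blocks) + "</body></html>"


# Hypothetical item: (field name, field type, value).
page_fields = [
    ("Title", "Single-Line Text", "Compressor maintenance"),
    ("Details", "Rich Text", "<p>Check the separator for oil mist.</p>"),
    ("Thumbnail", "Image", "thumb.png"),
]

html_body = basic_html_body(page_fields)
print(html_body)
```

Because the selection is driven by field type and not by template, a new page template with extra text fields would be picked up without any change to the sketch.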

Comment by dwebb, Dec 30, 2014 1:58 PM

OK, I'll give the BasicHtmlContentInBodyProcessor a try.

Is there a way to see the BinaryData through the Coveo admin tool? I can see all the individual indexed fields, but I don't see a field called "Body" or "BinaryData" on the docs.

Comment by Vincent Séguin, Dec 30, 2014 2:04 PM

Yes. When the documents have something in their body, there's a 'Quick View' button on the document in the Admin Tool. You can try it by indexing a PDF from the Media Library, for instance.

Comment by dwebb, Dec 30, 2014 2:39 PM

OK, got it. BTW, I removed my custom post-item pipeline processor step and replaced it with the BasicHtmlContentInBodyProcessor, and now I'm getting an excerpt automatically for all the pages. So everything is working as I expect.

Thanks, David
