Question by jrioux, Sep 4, 2015 8:58 AM

Index Item and associated Media Item

Hello,

I have an Item Template "DocumentItem" in Sitecore, which is just a placeholder with a link to a Media Library Item (mostly PDFs) and some custom properties.

I'm looking for a way to index my DocumentItem so that the excerpt comes from the content of the linked Media Library file.

Is there any way I can achieve this?

2 Replies

Answer by Jean-François L'Heureux, Sep 4, 2015 11:19 AM

Hi,

Simon's answer about the coveoPostItemProcessingPipeline pipeline is right: you should create a custom processor for this pipeline. However, you don't need to put the linked document text in a custom field or change the UI to display that field.

Instead, in your processor, detect whether the item about to be indexed is an instance of your "DocumentItem" template. When it is, get the linked document's binary data from the Sitecore API and set it as the value of the CoveoItem.BinaryData property of the CoveoPostItemProcessingPipelineArgs object you receive in the Process method.

All the other fields on the item will be handled by Coveo for Sitecore. You just need to set the BinaryData property. Then, when CES indexes the document, it will automatically detect the type of file contained in the binary data, extract its text, and allow full-text search on all the words in the document. The excerpt in the search results will be generated automatically from the original document text.
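The steps above could look roughly like the following. This is only a sketch, not tested against a specific Coveo for Sitecore release: the class name, the "Document" link field name, and the use of a Link field are assumptions for illustration, and the namespaces and the shape of p_Args.Item should be checked against the assemblies you have installed.

```csharp
using System.IO;
using Coveo.Framework.Processor;
using Coveo.SearchProvider.Pipelines;
using Sitecore.ContentSearch;
using Sitecore.Data.Fields;
using Sitecore.Data.Items;

// Hypothetical processor name; register it in coveoPostItemProcessingPipeline.
public class DocumentItemBinaryDataProcessor : IProcessor<CoveoPostItemProcessingPipelineArgs>
{
    // Assumed names: adjust to your own template and link field.
    private const string DocumentTemplateName = "DocumentItem";
    private const string MediaLinkFieldName = "Document";

    public void Process(CoveoPostItemProcessingPipelineArgs p_Args)
    {
        // Assumption: p_Args.Item wraps the Sitecore item being indexed.
        var indexable = p_Args.Item as SitecoreIndexableItem;
        if (indexable == null || p_Args.CoveoItem == null)
        {
            return;
        }

        Item item = indexable.Item;
        if (item == null || item.TemplateName != DocumentTemplateName)
        {
            return; // Not a DocumentItem: leave the item untouched.
        }

        // Follow the link field to the Media Library item.
        LinkField link = item.Fields[MediaLinkFieldName];
        if (link == null || link.TargetItem == null)
        {
            return;
        }

        var mediaItem = new MediaItem(link.TargetItem);
        using (Stream mediaStream = mediaItem.GetMediaStream())
        {
            if (mediaStream == null)
            {
                return; // No file attached to the media item.
            }

            using (var buffer = new MemoryStream())
            {
                mediaStream.CopyTo(buffer);
                // CES extracts the text and the excerpt from these bytes.
                p_Args.CoveoItem.BinaryData = buffer.ToArray();
            }
        }
    }
}
```

Returning early for every other template keeps the processor from affecting the rest of the index.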

Comment by jrioux, Sep 12, 2015 10:38 AM

It seems to work, but I have two things I want to correct.

  1. I want the title field to be the title of my "DocumentItem" item, not the PDF title.
  2. The generated link has no layout, so I can't access it. I need to change the URL.

I tried to do something like this:

pArgs.CoveoItem.BinaryData = memoryStream.ToArray();
pArgs.CoveoItem.Title = item.DisplayName;
pArgs.CoveoItem.ClickableUri = UrlHelper.GetMediaUrl(mediaItem);

But it seems like everything is overwritten by what is extracted from the BinaryData.

Comment by Jean-François L'Heureux, Sep 14, 2015 10:23 AM

As you found, CES adjusts the document title at indexing time. The title you set in the processor is only the title sent in the RabbitMQ message, overriding the one set by the Coveo for Sitecore SearchProvider module. CES uses a "Title Selection Sequence", defined on the source and optionally on certain document types, to choose the title it will use for each document.

Comment by Jean-François L'Heureux, Sep 14, 2015 10:23 AM

By default, the Coveo for Sitecore sources and "Sitecore Search Provider Document Types Set" have the following title selection sequence:

  1. Use the title extracted by the converter (use the title extracted from binary data (from the Title Metadata Name))
  2. Automatically detect the title of documents (use an algorithm to generate a meaningful title)
  3. Use the filename (use the title set by Coveo for Sitecore or processors)

Comment by Jean-François L'Heureux, Sep 14, 2015 10:25 AM

  • https://onlinehelp.coveo.com/en/ces/7.0/administrator/modifyinggeneralsource_parameters.htm
  • https://onlinehelp.coveo.com/en/ces/7.0/administrator/modifyinghowceshandlesadocumenttype.htm

As explained in the documentation links, you can change the title selection sequence of your source or of individual document types in the "Sitecore Search Provider Document Types Set". For your use case, I recommend changing the sequence on your sources by moving "Use the filename" to the first position. Then rebuild your Sitecore indexes, and you should get the titles you expect.

Comment by Jean-François L'Heureux, Sep 14, 2015 10:35 AM

For the ClickableUri, if you use a recent version of Coveo for Sitecore, you have a ResolveResultClickableUriProcessor in the coveoProcessParsedRestResponse pipeline of the Coveo.SearchProvider.Rest.config file. This processor modifies the clickable URI of each result at query time to ensure the URIs point to the current Sitecore site in multisite setups.

I recommend adding a processor after this one that detects items based on your "DocumentItem" template and changes their clickable URI to the URL of the associated media item at query time.
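Registering such a processor could be done with a Sitecore config patch along these lines. Treat it as a sketch: the type and assembly names are hypothetical, and it assumes the coveoProcessParsedRestResponse pipeline sits under sitecore/pipelines, so check the element path in your own Coveo.SearchProvider.Rest.config before using it.

```xml
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <pipelines>
      <coveoProcessParsedRestResponse>
        <!-- Hypothetical processor type and assembly; replace with your own. -->
        <!-- patch:after places it right after the built-in URI resolver. -->
        <processor
          patch:after="processor[contains(@type, 'ResolveResultClickableUriProcessor')]"
          type="MyProject.Search.DocumentItemClickableUriProcessor, MyProject" />
      </coveoProcessParsedRestResponse>
    </pipelines>
  </sitecore>
</configuration>
```

Keeping this in a separate patch file, rather than editing the Coveo config directly, should make it easier to carry the change across Coveo for Sitecore upgrades.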

Answer by Simon, Sep 4, 2015 9:29 AM

I have never tried this particular case, but my suggestion would be to use the Coveo Item Processing pipeline:

https://developers.coveo.com/display/public/SC201508/Using+the+Coveo+Pipelines

From there, you can grab the body of the linked document and place it in a custom field. Then you can replace the out-of-the-box excerpt with your custom field:

https://developers.coveo.com/display/public/SupportKB/Pointing+a+Search+Result%27s+Excerpt+to+a+Specific+Field

Try it and if you hit a roadblock with your code, post it as a comment and we can work on this together.

Cheers,
Simon

Comment by debu_biswas, Mar 2, 2017 11:16 PM

Is "coveoPostItemProcessingPipeline" still the way to go for CES 7.0 x64 Build 8691.0 and Coveo for Sitecore 4.0? Please let me know if there is a better way.

Comment by Jean-François L'Heureux, Mar 3, 2017 8:33 AM

Yes, it is the right way to do it with the latest releases.
