Question by Dan Cruickshank, May 27, 2015 10:45 AM

HTML Content Processor Not Processing On Publish (Oct 2014 Release)

Hello!

Using the October 2014 build of CFS.

  • A page in Sitecore contains a multilist field
  • The items in the multilist field are rendered as HTML
  • Items in the multilist are changed, the page is published, CES Console shows the following:

Indexed - sitecore://database/web/ItemId/{E517F78E-9FBE-408F-B470-794521DD485B}/Language/en/Version/1

However, the HTML on the page never appears to be re-crawled. New HTML content on the page cannot be found via search.

I understand the values could be added as a computed field or as a reference field but ideally, I'd like the HTML to be crawled.

Thanks!

I've looked at these answers previously. I'm not certain if they're related.

https://answers.coveo.com/questions/2702/html-content-processor-not-working
https://answers.coveo.com/questions/2995/using-sitecore-multilist-with-the-basichtmlcontentinbody-processor

Comment by Jean-François L'Heureux, May 27, 2015 11:06 AM

I don't have an answer but I'll share what I would check for problems like this.

  • Maybe the processor is run before the Sitecore HTML cache is cleared on publish?
  • Enabling debug Log4Net logging for both Sitecore and Coveo for Sitecore would help to understand the order of events.

Comment by Dan Cruickshank, May 27, 2015 11:09 AM

"Maybe the processor is run before the Sitecore HTML cache is cleared on publish?"

Interesting idea. :)

2 Replies

Answer by Dan Cruickshank, Jun 25, 2015 10:09 AM

Fwiw:

The cache clearing checked out in the logs. Turns out, there were changes to the admin user, so permissions had to be reset on the index.

As a cheat, instead of configuring foreign key fields to trigger changes in the file, I created a new indexing strategy that re-indexes the websites from their respective roots on publish. So it's a partial re-index and crawl of "web" and "pub" on every publish. It happens quite quickly. It's a shortcut, but a successful outcome ultimately.
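
For anyone trying to picture the idea, here is a rough, language-neutral model of it in Python. It is only a sketch of the concept and not Sitecore or Coveo API: the `Item` class, the `ReindexFromRootsOnPublish` class, the `on_publish_end` hook, and the example paths are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List


@dataclass
class Item:
    """Hypothetical stand-in for a content item in a tree."""
    path: str
    children: List["Item"] = field(default_factory=list)


def walk(root: Item) -> Iterator[Item]:
    """Yield the root and every descendant, depth-first."""
    yield root
    for child in root.children:
        yield from walk(child)


class ReindexFromRootsOnPublish:
    """On every publish, re-index each configured site root and its subtree."""

    def __init__(self, roots: List[Item], index_item: Callable[[Item], None]):
        self.roots = roots            # one root item per site
        self.index_item = index_item  # callback that (re)indexes a single item

    def on_publish_end(self) -> int:
        """Re-index everything under each root; return how many items were sent."""
        count = 0
        for root in self.roots:
            for item in walk(root):
                self.index_item(item)
                count += 1
        return count


# Hypothetical example trees for two sites.
web = Item("/sitecore/content/web/home", [Item("/sitecore/content/web/home/about")])
pub = Item("/sitecore/content/pub/home")

indexed: List[str] = []
strategy = ReindexFromRootsOnPublish([web, pub], lambda item: indexed.append(item.path))
strategy.on_publish_end()
```

The trade-off is visible in the model: every publish touches the whole subtree of each root, regardless of which item actually changed, so the referencing page is always re-crawled at the cost of redundant work.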

Thanks for your insights, gents.

Answer by Vincent Séguin, May 27, 2015 10:52 AM

Hi Dan,

Have you enabled the processor needed for the HTML content to be crawled? It is explained here: https://developers.coveo.com/display/public/SC201505/Indexing+Documents+with+HTML+Content+Processor, in the "Configuring" section.

Thank you

Comment by Dan Cruickshank, May 27, 2015 10:58 AM

Yes sir.

If I rebuild the index, or "re-index the tree" (using the developer tab), I can see the updated HTML gets added. But it doesn't get added on a publish.

I know one approach could be to create a custom indexing strategy that re-indexes from the site's /home node on each publish. But that's pretty hamfisted, if you will. :)