Question by ArthurN, Sep 13, 2018 4:25 PM

Crawling HTML for Sitecore wildcard rendering with query string parameter

We are trying to get the HTML body of Sitecore items indexed, where multiple subsections of a page render data from the children of that item through a wildcard node. We have two different data types rendered by two different wildcard nodes: for one item type the HTML body is indexed, for the other it is not. The only difference we can see is that one uses a query string parameter to determine which specific child of an item to display, and the other does not.

We have implemented a custom link manager, so the ClickUri and PrintableUri are indexed correctly for both types: both item types show up in the results, and both link to the intended destination. However, only one of them has its body crawled, so you can only search for terms within the content of that one. Both item types render properly in the Preview view of the Sitecore Content Editor.

We also added the HtmlContentInBodyWithRequestsProcessor to our config.

I'm not sure what else to check. Is there anything separate we have to do because one of them requires a query string to select the proper item?

Comment by Jean-François L'Heureux, Sep 13, 2018 5:12 PM

Which one does not work? The one that requires a query string?

Comment by ArthurN, Sep 13, 2018 5:22 PM

Correct, the one that requires a query string does not work.

Answer by Jean-François L'Heureux, Sep 13, 2018 5:57 PM

As a troubleshooting step, could you look at the Sitecore logs while indexing that problematic wildcard item? The `HtmlContentInBodyWithRequestsProcessor` and its related processors log ERROR messages like these when errors occur:

  • "An exception occurred while trying to fetch the HTML content of the document {0}."

  • "Impossible to create a web request with the following url : {0}"

If you do not see those errors, try splitting the Coveo logs into their own log file using the commented-out log4net section in the `Coveo.SearchProvider.Custom.config` file, and set the logging level to DEBUG. There is also a DEBUG message logged when the item has no layout: "The item {0} doesn't have a layout, hence its html content will not be retrieved."
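A minimal Python sketch for checking a log file for the three messages quoted in this answer. It assumes the Coveo log is a plain-text file with one entry per line; the script name and log path in the usage line are hypothetical, and each `{0}` placeholder is treated as a wildcard whose matched value is reported.

```python
import re
import sys
from pathlib import Path

# Message templates quoted in the answer; "{0}" marks the variable part
# (a document URI, a URL, or an item), which becomes a capture group below.
TEMPLATES = [
    "An exception occurred while trying to fetch the HTML content of the document {0}.",
    "Impossible to create a web request with the following url : {0}",
    "The item {0} doesn't have a layout, hence its html content will not be retrieved.",
]


def template_to_regex(template):
    """Escape the literal text and turn the {0} placeholder into a greedy capture group."""
    literal_parts = template.split("{0}")
    return re.compile("(.+)".join(re.escape(part) for part in literal_parts))


PATTERNS = [template_to_regex(t) for t in TEMPLATES]


def find_coveo_messages(lines):
    """Yield (line_number, template_index, captured_value) for each matching line."""
    for number, line in enumerate(lines, start=1):
        for index, pattern in enumerate(PATTERNS):
            match = pattern.search(line)
            if match:
                yield number, index, match.group(1)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python scan_coveo_log.py path/to/coveo.log
    with Path(sys.argv[1]).open(encoding="utf-8", errors="replace") as handle:
        for number, index, value in find_coveo_messages(handle):
            print(f"line {number}: message {index + 1}: {value}")
```

If the wildcard item with the query string never appears in any of the three messages, that is the situation the answer's last paragraph covers.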

If you do not see any of those three log messages, then I am out of ideas, and you should open a support ticket.

Comment by ArthurN, Sep 13, 2018 6:30 PM

I see zero errors, only warnings, in the log viewer. The items in question do have warnings, but nothing like the messages you listed above. It just says:

  • Error message: The JSON for document sitecore://database/master/ItemId/BD65F3E9-385F-4B45-B4BC-35B550824FD6/Language/en/Version/1 contains case insensitive duplicate keys '[ParentID, Version, ID, haschildren]'

  • Error code: DUPLICATEJSONKEYS
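To illustrate what the DUPLICATEJSONKEYS warning describes, here is a minimal Python sketch (not Coveo's own validation logic) that lists the keys of a JSON object which collide once letter case is ignored, such as a hypothetical document containing both `ParentID` and `parentid`. It assumes the document JSON is a single top-level object.

```python
import json
from collections import defaultdict
from typing import List


def case_insensitive_duplicates(raw_json: str) -> List[List[str]]:
    """Return groups of top-level keys that differ only by letter case."""
    # object_pairs_hook receives every (key, value) pair as a list, so keys
    # are kept exactly as written instead of being collapsed into a dict.
    pairs = json.loads(raw_json, object_pairs_hook=lambda items: items)
    groups = defaultdict(list)
    for key, _value in pairs:
        groups[key.casefold()].append(key)
    # Only groups with more than one spelling are case-insensitive duplicates.
    return [keys for keys in groups.values() if len(keys) > 1]


# Hypothetical document fields, for illustration only.
sample = '{"ParentID": "a", "parentid": "a", "Version": 1, "version": "1", "Title": "x"}'
print(case_insensitive_duplicates(sample))
```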

I can try setting the logging level to DEBUG and see if I get that third message.

Comment by ArthurN, Sep 13, 2018 7:11 PM

The logging level already appears to be set to DEBUG, and I am not seeing any message about the item having no layout.