
Question by Adam, Jan 23, 2015 3:21 PM

Where does Coveo for Sitecore draw the line between Meta Data & Sitecore Data?

Greetings,

I'm currently setting up a Coveo for Sitecore server and I've got my system indexed. I'm at the point of tailoring the results to suit our needs, and I'm having a hard time, conceptually, drawing the line between where the system reads Meta Data from the crawled HTML site (i.e. the special crawler template/layout I created) and where it goes into the Sitecore DB/Item Index and reads the Field Data.

Is there an easy explanation, or am I making the mistake of naming my HTML Meta Data the same fields as my Sitecore Indexed Fields?

Does one have priority over the other? For example, if they both share the same "name", does the Meta Data win over the Sitecore Indexed Field?

Any and all help is appreciated.

Thank you kindly, Adam

4 Replies

Answer by Vincent Séguin, Jan 26, 2015 11:06 AM

Hi Adam,

The Sitecore fields get priority over the HTML crawler. If you want to use the plain text value, for instance, you should create a computed field for that. You can learn more about it here: https://developers.coveo.com/display/SC201501/Creating+Computed+Fields

Basically, you create a new field with the value you want, and this new field is added to every document for which it doesn't return null. This is the safest way to avoid conflicts.
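
To make the idea concrete, here is a minimal sketch of that behaviour in Python. It is only a model: real computed fields are C# classes registered in the Sitecore configuration, as the linked page describes. The field name "authorname", the AUTHOR_NAMES lookup, and both function names are hypothetical, chosen to illustrate resolving an "Author" GUID to plain text under a name that cannot collide with an existing field.

```python
# Hypothetical lookup from an author GUID to a display name (illustration only).
AUTHOR_NAMES = {"{11111111-2222-3333-4444-555555555555}": "Jane Doe"}


def author_name(item):
    """Hypothetical computed field: plain-text author name, or None if unknown."""
    return AUTHOR_NAMES.get(item.get("Author"))


def build_document(item, computed_fields):
    """Copy the item's fields, then add each computed field that returns a value."""
    document = dict(item)
    for field_name, compute in computed_fields.items():
        value = compute(item)
        if value is not None:  # a None ("null") result means the field is skipped
            document[field_name] = value
    return document


computed = {"authorname": author_name}
with_author = build_document(
    {"Author": "{11111111-2222-3333-4444-555555555555}"}, computed
)
without_author = build_document({"Title": "No author here"}, computed)
```

In this sketch, with_author gains an "authorname" entry next to the untouched "Author" GUID, while without_author gets no "authorname" key at all.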


Answer by Jean-François L'Heureux, Jan 23, 2015 6:01 PM

I'm not sure I understand your question, but I'll try to give you some information on the indexing process.

  1. The Coveo for Sitecore SearchProvider inside Sitecore gathers all the fields of the items. The fields are prefixed with "f" and suffixed with a hash of the index source name to be unique to each database index.
  2. If you enable the HtmlContentInBodyWithRequestsProcessor in your Coveo for Sitecore configuration, Coveo for Sitecore will fire an HTTP GET request to the item URL with the Coveo Sitecore Search Provider UserAgent string that will render the item with a custom device and layout. The HTML returned will be set as the body of the item sent to the Coveo index.
  3. Then, the fields and item body are sent to the Coveo index by the RabbitMQ queue.
  4. The Queue Crawler inside CES receives the item fields and body, creates a document with the fields as "metadata" and passes it to the HTML converter module because the body content type is HTML.
  5. The HTML converter module extracts the meta tags inside the HTML body and also sets them as "metadata".
  6. After all those steps, all the metadata is set in the fields of the CES documents. Metadata keys that match a field name (fsomethingXXXXX) are set directly on those fields. Metadata keys that don't match a field name but match a field's "Metadata Name" are set on the corresponding fields.

For example, I have a "TemplateID" field in Sitecore. Its Coveo index field name is "@ftemplateid61896", where "61896" is the source name hash. This Coveo index field has "TemplateID" as its "Metadata Name". If the indexed Sitecore item has this field, the CES document metadata will have a key/value like "ftemplateid61896"/"ANID". If the HTML body of this item contains a meta tag with name="TemplateID" and content="ANOTHERID", the CES document metadata will also have a key/value like "TemplateID"/"ANOTHERID". The final value of the Coveo index @ftemplateid61896 field will be "ANOTHERID".
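
The mapping in step 6 and the TemplateID example can be sketched roughly as follows. This is a simplified Python model, not CES code: the function name resolve_fields and the data shapes are hypothetical, key matching is assumed to be exact, and the order of the two passes (direct field-name matches first, "Metadata Name" matches second) is an assumption inferred from the outcome stated in the example. Check it against your own index before relying on it.

```python
def resolve_fields(field_defs, metadata):
    """Model how metadata keys end up on index fields.

    field_defs: dict mapping index field name -> its "Metadata Name" (or None).
    metadata:   list of (key, value) pairs gathered for one document.
    """
    by_metadata_name = {
        meta_name: field for field, meta_name in field_defs.items() if meta_name
    }
    values = {}
    # Pass 1: keys that are themselves field names are set directly.
    for key, value in metadata:
        if key in field_defs:
            values[key] = value
    # Pass 2 (assumed to run last): keys that match a field's "Metadata Name".
    for key, value in metadata:
        if key not in field_defs and key in by_metadata_name:
            values[by_metadata_name[key]] = value
    return values


field_defs = {"ftemplateid61896": "TemplateID"}
metadata = [
    ("ftemplateid61896", "ANID"),   # sent by the Sitecore search provider
    ("TemplateID", "ANOTHERID"),    # extracted from the HTML meta tag
]
result = resolve_fields(field_defs, metadata)
print(result)  # {'ftemplateid61896': 'ANOTHERID'}
```

Without the meta tag, the same call keeps "ANID", which is why a meta tag that shares a field's "Metadata Name" changes the result in this model.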


Answer by Adam, Jan 26, 2015 10:39 AM

Hi jflheureux,

Thank you for the response.

Very insightful and pretty close to what I was asking.

Beyond what you've explained so far, what's the conflict resolution when Coveo has the same "key" from two different sources? In Sitecore I have a template field called "Author" (GUID) and in my crawled HTML I have a Meta tag called "Author" (plain text). Which gets priority? From what you've described, it sounds like Sitecore fields get priority over the HTML crawler.

I only ask because I can't easily change the field name in Sitecore, and the default "key" in Coveo's field definitions for sysauthor is -seemingly- not set… and I don't know why it picks up "Author" from either Meta or Sitecore.

I don't want to use the GUID, but the plain text value.

This part of the configuration process I'm having a difficult time wrapping my head around.

Appreciate the help so far.

Thank you kindly, Adam


Answer by Adam, Jan 29, 2015 11:37 AM

Hi Vincent,

Apologies for the delayed response.

Awesome, thank you.

I was reviewing this during my research phase, but I wanted to see what was possible without coding anything. This seems to be where the secret to fine-tuning SC data in Coveo lies.

Thank you both for your assistance.

Adam
