Question by Cris Corra, Jul 1, 2014 12:34 PM

Can you retrieve the value of a different attribute for a field built from meta header tags?

Is there a way to retrieve an attribute other than "content" from the meta tags? The website we are spidering has an "id" attribute that holds the unique id for the text in the "content" attribute. We need this unique id to match other data in another source. The best chance of a match would be to use the value in the "id" attribute.

-Cris Corra

*Update (07/07/14): I am using the web connector to index a public website (not a Sitecore site). I have mapped the meta tags in the header of each HTML page to the fields in my source. For example, the meta tag whose "name" attribute is "citation_author" maps to the field "Authors". One of the meta tags I need to map has both a "name" and an "id" attribute. The "id" attribute will be our primary key to match this data to an item in our Sitecore data. The "content" attribute holds the item name of the data, but will not always be an exact match. Is there a way I can map the "id" attribute value to a field in my fieldset?

Comment by Martin Laporte, Jul 2, 2014 3:44 AM

I was about to answer explaining how you'd do that when using the web crawler… but then I realized you might be using the Sitecore connector, in which case the metadata is not related to HTML tags. Right? If so, it's outside my expertise, but the Sitecore people should be back today (yesterday was Canada Day, so most people there were off).

Comment by Nicolas, Jul 4, 2014 8:18 AM

Hi Cris, can you clarify your question? We are not quite sure we understand what you are asking for. Thanks

Comment by Cris Corra, Jul 7, 2014 11:04 AM

Thanks for your replies! I've updated the original post to better explain the scenario. -Cris

1 Reply

Answer by Sebastien Desilets, Jul 11, 2014 1:50 PM

Cris,

In CES, you can add a post-conversion script to your Web Crawler source that parses the HTML headers, retrieves the information you need, and assigns it to custom metadata. You can write that post-conversion script in VBScript, JScript or C#.

You can check https://developers.coveo.com/display/Converter/Analyzing+Text+to+Find+Metadata for more information on your specific case and https://developers.coveo.com/display/Converter/Conversion+Scripts for general information about the conversion process.

Comment by jpdery, Jul 16, 2014 10:50 AM

@sdesilets: you must not use PostConversion for this purpose, as it does not provide you with the raw/original document. You should use a PreConversion script and parse from PreConversion.InputDocument. There is no sample of PreConversion parsing, but there is at least one that uses PreConversion.InputDocument for another purpose: https://developers.coveo.com/display/Converter/Filtering+Based+on+Size+and+Date
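
As a rough illustration of the parsing step only: the sketch below is plain Python, not a CES conversion script, and it does not use any Coveo API. It assumes the raw HTML of the page is already available as a string (in CES that would presumably come from PreConversion.InputDocument, in a script written in VBScript, JScript or C#). The class name, function name, meta tag name "example_item" and all values are hypothetical, made up for the example.

```python
from html.parser import HTMLParser


class MetaIdParser(HTMLParser):
    """Collect the "id" and "content" attributes of <meta> tags, keyed by "name"."""

    def __init__(self):
        super().__init__()
        self.meta = {}  # name -> {"id": ..., "content": ...}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name:
            self.meta[name] = {
                "id": attributes.get("id"),  # None when the tag has no id
                "content": attributes.get("content"),
            }


def extract_meta_ids(raw_html):
    """Return a dict mapping each meta name to its id and content values."""
    parser = MetaIdParser()
    parser.feed(raw_html)
    return parser.meta


# Hypothetical page header; the tag names and values are invented.
sample = """<html><head>
<meta name="citation_author" content="Jane Doe">
<meta name="example_item" id="12345" content="Example Item Name">
</head><body></body></html>"""

result = extract_meta_ids(sample)
print(result["example_item"]["id"])  # prints 12345
```

The value read from "id" is what would then be assigned to the custom field, using whatever mechanism the conversion script provides for setting metadata.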
