Question by Kumaran C, Nov 1, 2016 12:41 PM

Get HTML tags as metadata using sitemap source in Coveo Cloud

We want to add a few HTML pages to our source. However, we want to remove the header and footer, which are common across all pages, and keep only the required content.

Also, we want to take metadata from the <meta name="..." /> tags inside the HTML.

I see a custom conversion option in Coveo Cloud, and I believe we need to use a post-conversion script. Can you share a few sample scripts that strip the header and footer from a page, read the meta tags, and add them to a custom field?

Comment by François Lachance-Guillemette, Nov 1, 2016 4:13 PM

Hi @kumaranc

Are you using Coveo Cloud V1 (cloudplatform.coveo.com) or Coveo Cloud V2 (platform.cloud.coveo.com)?

FLG

Comment by Mahesh Patil, Nov 7, 2016 6:53 AM

We are using Coveo Cloud V1

Answer by Sebastien Desilets, Nov 7, 2016 9:41 AM

Hi @kumaranc

You can check this sample https://developers.coveo.com/display/public/Converter/Analyzing+Text+to+Find+Metadata to get a general idea of how to read the content of a document in a post-conversion script. From there, it's a matter of finding the header, metadata, and footer you are interested in.
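
To make the metadata part concrete, here is a minimal sketch of pulling name/content pairs out of <meta name="..." /> tags. It is plain Python that only illustrates the parsing logic; it is not a Coveo post-conversion script, and it would have to be ported to whatever script language your source supports. The names MetaTagCollector and extract_meta_tags are hypothetical, and how the values get written to a custom field is left out because it depends on the converter API described in the linked sample.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collects name/content pairs from <meta name="..." content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, and also routes
        # self-closing tags such as <meta ... /> through this method.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        # Skip tags without a name (for example <meta charset="utf-8">).
        if name and content is not None:
            self.metadata[name] = content


def extract_meta_tags(html):
    """Return a dict mapping each meta name to its content (hypothetical helper)."""
    collector = MetaTagCollector()
    collector.feed(html)
    return collector.metadata
```

Each key in the returned dictionary would then map to one custom field in the index.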

If your site's template has a delimiter around the header and footer, you can match them with a regular expression, replace the matched header and footer with an empty string, and feed the result back to HTMLOutputToOverride and/or TextToOverride.
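
As a rough illustration of the regular-expression approach, the sketch below assumes the template wraps the header and footer in HTML comment markers. The markers (<!-- BEGIN HEADER --> and so on) and the function name strip_header_and_footer are made up for the example; substitute whatever delimiter your own template actually contains. It is plain Python showing the matching logic only, not a Coveo post-conversion script.

```python
import re

# Hypothetical delimiters: replace them with the markers used in your template.
# re.DOTALL lets "." match newlines, and ".*?" is non-greedy so each pattern
# stops at the first closing marker instead of swallowing the page content.
HEADER_PATTERN = re.compile(r"<!-- BEGIN HEADER -->.*?<!-- END HEADER -->", re.DOTALL)
FOOTER_PATTERN = re.compile(r"<!-- BEGIN FOOTER -->.*?<!-- END FOOTER -->", re.DOTALL)


def strip_header_and_footer(html):
    """Return the HTML with the delimited header and footer blocks removed."""
    without_header = HEADER_PATTERN.sub("", html)
    return FOOTER_PATTERN.sub("", without_header)
```

The cleaned string is what you would then write back through the override objects mentioned above.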

Please take a look at https://developers.coveo.com/display/public/Converter/Changing+View+as+HTML+Version+of+Document

Specifically at

  • PostConversion.HTMLOutputToOverride.WriteString (for Quick View)
  • PostConversion.TextToOverride.WriteString (for Document Body)