Get HTML tags as metadata using sitemap source in Coveo Cloud
We want to add a few HTML pages to our source. However, we want to remove the header and footer, which are common across all pages, and keep only the required content.
Also, we want to take metadata from the <meta name="..." /> tags inside the HTML.
I see a custom conversion option in Coveo Cloud, and I believe we need to use a post-conversion script. Can you share a few sample scripts that strip the header and footer from a page, read the meta tags, and add them to a custom field?
This sample gives a general idea of how to read the content of a document in a post-conversion script: https://developers.coveo.com/display/public/Converter/Analyzing+Text+to+Find+Metadata After that, it's a matter of finding the header, metadata, and footer you are interested in.
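To illustrate the "finding the metadata" step, here is a minimal sketch in plain Python. It is not Coveo's API: the names MetaTagCollector and extract_meta_tags are hypothetical, and the sketch assumes you already have the page's HTML as a string (for example, read from the document in the post-conversion step). It only shows one way to collect name/content pairs from <meta> tags; mapping them to a custom field is left to whatever mechanism your Coveo environment provides.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collects name/content pairs from <meta name="..." content="..."> tags.

    Hypothetical helper for illustration; not part of any Coveo API.
    """

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, and also routes
        # self-closing tags (<meta ... />) through this method.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        # Skip tags such as <meta charset="utf-8"> that have no name/content pair.
        if name and content is not None:
            self.meta[name.lower()] = content


def extract_meta_tags(html):
    """Return a dict mapping lowercased meta names to their content values."""
    collector = MetaTagCollector()
    collector.feed(html)
    return collector.meta
```

Calling extract_meta_tags(html) returns a dictionary keyed by the lowercased meta name, so each entry can then be copied to the matching custom field.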
If your site's template has a delimiter around the header and footer, you can match them with a regular expression, replace the matched header and footer with an empty string, and feed the result back to HTMLOutputToOverride and/or TextToOverride.
Please take a look at https://developers.coveo.com/display/public/Converter/Changing+View+as+HTML+Version+of+Document
- PostConversion.HTMLOutputToOverride.WriteString (for Quick View)
- PostConversion.TextToOverride.WriteString (for Document Body)
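The header and footer removal described above can be sketched as follows, again in plain Python rather than Coveo's own script API. The comment markers (HEADER-START, HEADER-END, FOOTER-START, FOOTER-END) and the function name strip_header_and_footer are assumptions made up for this example; substitute whatever delimiters your site's template actually emits.

```python
import re

# Hypothetical delimiters: replace these with the markers found in your own template.
HEADER_PATTERN = re.compile(
    r"<!--\s*HEADER-START\s*-->.*?<!--\s*HEADER-END\s*-->",
    re.DOTALL | re.IGNORECASE,
)
FOOTER_PATTERN = re.compile(
    r"<!--\s*FOOTER-START\s*-->.*?<!--\s*FOOTER-END\s*-->",
    re.DOTALL | re.IGNORECASE,
)


def strip_header_and_footer(html):
    """Replace the delimited header and footer blocks with an empty string."""
    # re.DOTALL lets ".*?" span line breaks; the non-greedy match stops
    # at the first closing marker instead of swallowing the page body.
    without_header = HEADER_PATTERN.sub("", html)
    return FOOTER_PATTERN.sub("", without_header)
```

In an actual post-conversion script, the cleaned string would be what you pass to PostConversion.HTMLOutputToOverride.WriteString (Quick View) and, after reducing it to text, to PostConversion.TextToOverride.WriteString (document body). If the template has no explicit delimiters, the same approach could target the header and footer elements themselves, at the cost of a more fragile pattern.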