
Question by sholmesby, Jun 16, 2015 12:05 PM

Set custom field based on HTML of External Source

I followed this tutorial to be able to set a custom field in my external source's index. https://developers.coveo.com/display/public/SC201505/Displaying+External+Content+in+a+Search+Interface

Now I am looking to basically set another custom field depending on specific markup on each of the pages. Essentially I am looking to set a hasvideo field to true or false depending on whether the HTML for the page has a link to YouTube on it.

I was wondering:

  • Do I still do this in a PostConversion step, as described in the above article?
  • What Coveo classes do I need to be able to do this?
  • Is there a way to debug the Post Conversion code as it is being run (by attaching to some process or something)?
2 Replies

Answer by sholmesby, Jun 25, 2015 2:28 PM

I managed to get this working with a Post Conversion step, as Jeff mentioned. The code needs to run a regular expression on the HTMLOutput, though, and this is not a string, it's a byte string, so the syntax is a little different from the VB examples.

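// Requires: using System.Text.RegularExpressions;
public override void RunPostConverter(PostConversion p_PostConversion, DocumentInfo p_DocumentInfo)
{
    // HTMLOutput is not a string, so read all of its bytes into a UTF-8 string first.
    string html = p_PostConversion.HTMLOutput.ReadByteString(
        p_PostConversion.HTMLOutput.BytesCount, Charsets.UTF8_CHARSET);

    // Escape the dot so it matches a literal "." rather than any character.
    string youTubeString = @"youtube\.com";
    Match matchHtml = Regex.Match(html, youTubeString);

    // Tag the document as a video when the page mentions youtube.com.
    if (matchHtml.Success)
    {
        p_DocumentInfo.SetFieldValue("articleType", "Video");
    }
    else
    {
        p_DocumentInfo.SetFieldValue("articleType", "Article");
    }
}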
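
For anyone who wants a stricter check than a plain "youtube.com" substring, here is a minimal standalone sketch of a regular expression that only matches actual links. The class name YouTubeLinkDetector, the method HasYouTubeLink and the pattern itself are my own illustration, not part of the Coveo API, and the pattern assumes the links appear as href attributes pointing at youtube.com or youtu.be.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not a Coveo class: detects links to YouTube in an HTML string.
public static class YouTubeLinkDetector
{
    // Matches an href whose URL host is youtube.com or youtu.be,
    // with an optional scheme and optional subdomains (www., m., ...).
    private static readonly Regex YouTubeHref = new Regex(
        @"href\s*=\s*[""'](?:https?:)?//(?:[\w-]+\.)*(?:youtube\.com|youtu\.be)[/""'?#]",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    public static bool HasYouTubeLink(string html)
    {
        return !string.IsNullOrEmpty(html) && YouTubeHref.IsMatch(html);
    }

    public static void Main()
    {
        // A real link is detected.
        Console.WriteLine(HasYouTubeLink("<a href=\"https://www.youtube.com/watch?v=abc\">Watch</a>"));
        // Plain text that merely resembles the domain is not.
        Console.WriteLine(HasYouTubeLink("<p>See notyoutube-com.example for details.</p>"));
    }
}
```

In the post conversion script, the result of HasYouTubeLink(html) could replace matchHtml.Success. If the pages embed videos with an iframe rather than a link, the pattern would need to accept src as well as href.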

Note that because we're reading the stream, when you debug this you will need to reset the stream if you've already stepped past the first line that reads it. (This was an issue I hit.)
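
To illustrate why a second read comes back empty, here is a small sketch that uses a plain .NET MemoryStream as a stand-in. It is not the Coveo HTMLOutput object, whose rewind mechanism is not shown in this thread and may differ; the class StreamRereadDemo and the method ReadRemaining are hypothetical names for the example.

```csharp
using System;
using System.IO;
using System.Text;

// Stand-in demo with MemoryStream; the Coveo stream API may differ (assumption).
public static class StreamRereadDemo
{
    // Reads everything from the current position to the end as UTF-8 text.
    public static string ReadRemaining(Stream stream)
    {
        // leaveOpen: true keeps the underlying stream usable after the reader is disposed.
        using (var reader = new StreamReader(stream, Encoding.UTF8, false, 1024, true))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        byte[] bytes = Encoding.UTF8.GetBytes("<a href=\"https://www.youtube.com/\">v</a>");
        var stream = new MemoryStream(bytes);

        string first = ReadRemaining(stream);   // consumes the whole stream
        string second = ReadRemaining(stream);  // position is at the end, nothing left

        stream.Position = 0;                    // rewind before reading again
        string third = ReadRemaining(stream);

        Console.WriteLine(second.Length);       // 0
        Console.WriteLine(third == first);      // True
    }
}
```

The same thing happens in the debugger: once the line that reads the stream has run, evaluating it again returns nothing until the position is reset.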

For information on how to debug a script, follow the instructions here.


Answer by Jean-François L'Heureux, Jun 16, 2015 2:35 PM

Hi Sean,

The job you have to do (parsing the HTML content of an indexed document) requires a postconversion script, as you already figured out. For easier debugging, I suggest using .NET scripts for the job (see About .NET Conversion Scripts).

You can learn more about postconversion scripts in general in the Converter API documentation (see Postconversion Scripts). This documentation lists all the objects available in conversion scripts.


Comment by sholmesby, Jun 17, 2015 11:29 AM

Hi Jeff, thanks for your response. Would my best approach be to do something similar to the code here:

https://developers.coveo.com/display/Converter/Analyzing+Text+to+Find+Metadata

…but in a C# form?

I'm having trouble converting that code over to C# (some of the classes don't seem to resolve with my references).

Also, I don't quite know what regular expression I'd use to look for a link to YouTube. Could you help me out with this?


Comment by Jean-François L'Heureux, Jun 17, 2015 11:41 AM

Hi Sean,

I'm not an expert in conversion scripts. The example you found seems to be a good way to do it. I think it's normal that classes don't resolve in Visual Studio. There's a note about that on the link I gave you yesterday:

Important: In Visual Studio, do not worry that IntelliSense is not able to detect classes from the using declarations. When you will configure your script file (.cs or .DLL) as a CES preconversion or postconversion script, it will work.

Regular expressions for YouTube links are not my expertise either. You may have more luck on StackOverflow for that part.


Comment by sholmesby, Jun 17, 2015 12:08 PM

OK, thanks Jeff. Yeah, I looked at the Interop DLL and (from my first look) it won't resolve anyway. I'm just in the process of getting this built locally so I can debug it, so I'll see how it goes.

Do you know if there are more examples of .NET post conversion scripts anywhere? I think I might be able to work it out if I can get more of a feel from these sorts of examples.


Comment by Jean-François L'Heureux, Jun 17, 2015 1:45 PM

Unfortunately, I didn't find any other .Net conversion script but this very small one:


Comment by Jean-François L'Heureux, Jun 17, 2015 1:46 PM

.Net Postconversion script example:

using System;
using Coveo.CES.ConversionScriptLoader;
using Coveo.CES.ConversionScriptLoader.Interops.x64;

namespace SampleCustomConverter {
    public class SamplePostConverter : CustomConverter {
        public override void RunPostConverter(IPostConversion p_PostConversion, IDocumentInfo p_DocumentInfo) {
            p_PostConversion.Trace("Hello from my .NET post converter: " + p_DocumentInfo.URI, SeverityEnumeration.SeverityNormal);
        }
    }
}

Comment by sholmesby, Jun 17, 2015 2:21 PM

OK, yeah I've only been able to find that one, and the one in the tutorial (link posted in the original question). Thanks anyway.


Comment by Jean-François L'Heureux, Jun 17, 2015 2:31 PM

Just found another example in a page about references: https://developers.coveo.com/display/Converter/Add+reference+in+a+.NET+Converter
