V2 Pipeline Extension: How to access the Body text
What Python code do I need to access the Body text? I can't find this anywhere in the documentation. In our V1 script custom field, we have these lines of code:
var UTF8_CHARSET = 2;
var documentContent = PostConversion.HTMLOutput.ReadByteString(PostConversion.HTMLOutput.BytesCount, UTF8_CHARSET);
Also, it may be that I need access to the Original file. If so, please advise.
The V2 indexing pipeline extensions are _very_ different from the V1 version. They are now written in Python, can import packages, and have a streamlined API, which makes them both more powerful and easier to use.
The Coveo Cloud V2 Indexing Pipeline Extensions page is a useful resource for getting started with Cloud V2 extensions.
The Body text is a readable stream, as described in the Document Object Python API Reference (Get Data Streams).
You should be able to get that stream and do whatever you want with it.
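As a rough sketch of what that could look like, the snippet below reads a stream and decodes it to a string. It assumes the `document` object exposes `get_data_stream(name)` and that the Body text stream is named `body_text`, which is how I recall the reference describing it; please verify both against the Document Object Python API Reference. The UTF-8 encoding is carried over from your V1 script and is also an assumption. `read_body_text` and `FakeDocument` are hypothetical names used only for illustration; `FakeDocument` exists so the helper can be tried outside of Coveo.

```python
import io


def read_body_text(document, stream_name="body_text", encoding="utf-8"):
    """Return the content of one of the document's data streams as a string."""
    # Assumption: get_data_stream(name) returns a readable, file-like stream.
    stream = document.get_data_stream(stream_name)
    raw = stream.read()
    # Decode only when the stream hands back bytes; the encoding is an assumption.
    if isinstance(raw, bytes):
        return raw.decode(encoding, errors="replace")
    return raw


class FakeDocument:
    """Hypothetical stand-in for the extension's `document`, for local testing only."""

    def __init__(self, streams):
        self._streams = streams

    def get_data_stream(self, name):
        # Mimic a readable stream over the stored bytes.
        return io.BytesIO(self._streams[name])


# Local check with fake data; inside an extension you would pass the real `document`.
sample = FakeDocument({"body_text": "Hello, body text".encode("utf-8")})
body = read_body_text(sample)
```

Inside the extension itself you would call `read_body_text(document)` with the `document` object the pipeline provides. If you need the original file instead, the same helper should work with a different stream name (I believe it is `documentdata`, but check the reference), and you would likely want the raw bytes rather than a decoded string. Also make sure the extension is configured to have access to the streams it reads, otherwise the stream may not be available.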