Question by Arulselvan, Nov 17, 2017 7:20 PM

Extract Text from pdf

My requirement is to inject PDF content into "BinaryData" in my own "coveoPostItemProcessingPipeline" processor. My code is below. I am getting an empty string, even though the PDF reader reports the correct number of pages. Any suggestions would be appreciated.

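// Needs: using System.IO; using System.Text;
//        using iTextSharp.text.pdf; using iTextSharp.text.pdf.parser;
StringBuilder text = new StringBuilder();
using (Stream stream = mediaItem1.GetMediaStream())
using (PdfReader reader = new PdfReader(stream)) {
  for (int i = 1; i <= reader.NumberOfPages; i++) {
    // Create a fresh strategy per page; a reused strategy keeps the text of earlier pages.
    ITextExtractionStrategy strategy = new LocationTextExtractionStrategy();
    text.Append(PdfTextExtractor.GetTextFromPage(reader, i, strategy));
  }
}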

Comment by Jean-François L'Heureux, Nov 17, 2017 7:35 PM

You should post this in the iTextSharp help forum, not on the Coveo Q&A website.