Question by christian, May 25, 2015 12:48 PM

Get phone numbers from content of documents with a RegEx

Hello,

We are trying to get a post-conversion script to work. The script would read each document's content and return any phone numbers it finds, using a regular expression to locate them.

The issue we are having is getting the script to read the content of the file.

What would need to be corrected in order to get this to work?

So far we have the following:

// ********************************************************************************************************
//  This post-conversion script sample searches files for phone numbers with a RegEx, then adds them to a metadata field:
//*********************************************************************************************************



// Extract the document content.
PostConversion.Trace("Extract Document");

try {
  var CHARSET_CP1252 = 1;
  PostConversion.InputDocument.SeekReadPointerInChars(0);
  var documentContent = Postconversion.InputDocument.ReadByteString(PostConversion.InputDocument.BytesCount, CHARSET_CP1252);

  PostConversion.Trace("Get RegEx");

  // Get phone numbers with the REGEX
  var myRegEx = /(\d\d\d-\d\d\d\d)/g;
  var PhoneNumber = DocumentContent.match(myRegEx);
  var PhoneNumbers = "";

  // Add the phone number
  for (i = 0; i < PhoneNumber.length; i++) {
    if (PhoneNumbers === "") {
      PhoneNumbers += PhoneNumber[i];
    }
    else {
        PhoneNumbers += ", " + PhoneNumber[i];
    }
  }
  PostConversion.Trace("Set Field Value");
  DocumentInfo.SetFieldValue("PhoneNumber", PhoneNumbers);
}
catch(e) {
  PostConversion.Trace(e);
}

Thanks!

1 Reply

Answer by Luc Bergeron, May 25, 2015 4:41 PM

I think you made a typo: you are using the documentContent variable with a capital D at the beginning, but it is declared with a lowercase d.

Old line: var PhoneNumber = DocumentContent.match(myRegEx);

New line: var PhoneNumber = documentContent.match(myRegEx);
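
One related point that may be worth checking: in JavaScript, match() with a global regular expression returns null when nothing matches, so reading PhoneNumber.length would throw for any document that contains no phone number. A minimal sketch of a guard, assuming a hypothetical helper named extractPhoneNumbers (this name is for illustration only and is not part of the Coveo API):

```javascript
// Hypothetical helper (not part of the Coveo API): returns every
// NNN-NNNN match found in the text as one comma-separated string.
function extractPhoneNumbers(text) {
  var phoneRegEx = /\d\d\d-\d\d\d\d/g;

  // With the g flag, match() returns an array of all matches,
  // or null when there is no match at all.
  var matches = text.match(phoneRegEx);

  // Guard against null so documents without phone numbers
  // produce an empty value instead of an exception.
  return matches === null ? "" : matches.join(", ");
}
```

In the script above, the result could then be passed to the existing call, for example DocumentInfo.SetFieldValue("PhoneNumber", extractPhoneNumbers(documentContent)).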

I hope this helps

Comment by christian, May 26, 2015 3:10 PM

Thanks lbergeron, that was indeed an issue. We also changed the line that reads the content to:

try{ var documentContent = PostConversion.Text.ReadString(PostConversion.Text.CharsCount)

With that change we started to get some results. We are still getting some errors, but at least the field is showing up with the right content. The goal now is to get this to work for any document type. So far, from the tests we have made, it only works for HTML and TXT files.

Thanks!