Question by sgafur, Oct 2, 2015 6:02 PM

Index pipe delimited files in Coveo

Hi, is there any documentation about using Coveo to index pipe- or comma-delimited files? I'm trying to figure out the best approach for mapping the values in the input file to fields in the index. Will I need to define a custom document type to do that? Are there any other steps to take into account?

Any guidance appreciated!

2 Replies

Answer by Jean-François L'Heureux, Oct 5, 2015 9:54 AM

Does your file contain more than one row of data (multiple indexed documents per file), or should one file map to a single indexed document?

If it contains only a single document, I would create and use a post-conversion script to extract the values and assign them to CES fields.

If it contains data that should be indexed in multiple indexed documents, I think you would have to code your own connector for your file format.

Comment by sgafur, Oct 5, 2015 2:56 PM

Hey Jeff,

The document actually has a few thousand rows of data, and I need each row to be treated as a separate document in Coveo. With an out-of-the-box setup, only a single row ends up in the index, as the @systitle field.

Are there any examples of how to approach creating a custom file format converter for this type of use case?

Comment by Jean-François L'Heureux, Oct 5, 2015 4:04 PM

The Coveo index conversion phase converts one source document into one Coveo index document. You will not be able to split a CSV file into multiple Coveo indexed documents in the conversion phase.

Coveo does not have a CSV connector for the moment, but I know that a CSV file can be indexed with the Database connector using a custom schema.ini file and the "Microsoft Text Driver". However, Coveo doesn't have public documentation about it.
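
For context, a schema.ini file sits in the same folder as the data file and tells the Microsoft text driver how to parse it (delimiter, header row, column types). Below is a minimal Python sketch that generates such a file's contents for a pipe-delimited input. The helper name `build_schema_ini`, the file name, and the column names are all hypothetical and only for illustration; nothing here is a Coveo API, and the result should be checked against the Microsoft schema.ini documentation.

```python
def build_schema_ini(file_name, delimiter="|", columns=None, has_header=True):
    """Return schema.ini text describing one delimited text file.

    `columns` is an optional list of (name, type) pairs, for example
    ("Title", "Text"). When omitted, the driver guesses column types.
    """
    # Comma and tab have dedicated format keywords; other delimiters
    # use the Delimited(<char>) form.
    if delimiter == ",":
        file_format = "CSVDelimited"
    elif delimiter == "\t":
        file_format = "TabDelimited"
    else:
        file_format = f"Delimited({delimiter})"

    lines = [
        f"[{file_name}]",  # section name is the data file's name
        f"Format={file_format}",
        f"ColNameHeader={'True' if has_header else 'False'}",
        "MaxScanRows=0",  # 0 = scan the whole file when guessing types
    ]
    # Optional explicit column definitions: Col1=Name Type, Col2=...
    for position, (name, col_type) in enumerate(columns or [], start=1):
        lines.append(f"Col{position}={name} {col_type}")
    return "\n".join(lines) + "\n"


# Hypothetical file and column names, for illustration only.
print(build_schema_ini(
    "MyDatabaseFile.csv",
    columns=[("Title", "Text"), ("Quantity", "Long")],
))
```

The generated text would be saved as schema.ini next to the data file. Whether explicit column definitions are needed depends on how reliably the driver guesses the types for your data.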

Answer by Denis Shelgunov, Oct 6, 2015 9:55 AM

Hi,

As Jeff mentioned in his last comment, the Database connector is probably the easiest way to go. All you need is an ODBC or OLE DB driver for your type of data source.

Microsoft already installs some ODBC drivers by default. I suggest using the Microsoft Text Driver. You will need a connection string and a configuration file to use in the CES Database source.

For example, to crawl a CSV file with the Database connector using the Microsoft Text ODBC Driver, so that each row becomes a separate document:

  1. I would use a connection string as shown here: https://www.connectionstrings.com/microsoft-text-odbc-driver/
  2. My configuration file would have a query similar to this one: "SELECT * FROM [MyDatabaseFile.csv]"
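
To make the two steps above concrete, here is a small Python sketch that assembles both strings. With the text driver, the connection string points at the folder holding the file (the `Dbq` value), and the file name itself acts as the table name in the query. The helper names and the folder path are hypothetical, the connection string follows the general form listed on the connectionstrings.com page and should be verified there, and how these values are entered into the CES Database source is not covered here.

```python
def build_text_driver_connection_string(folder, extensions=("asc", "csv", "tab", "txt")):
    """Build an ODBC connection string for the Microsoft Text Driver.

    `folder` is the directory containing the delimited file(s),
    not the path to a single file.
    """
    return (
        "Driver={Microsoft Text Driver (*.txt; *.csv)};"
        f"Dbq={folder};"
        f"Extensions={','.join(extensions)};"
    )


def build_select_all(file_name):
    """Build a query that returns every row of the file, one row per record."""
    # Square brackets are needed because the "table" name contains a dot.
    return f"SELECT * FROM [{file_name}]"


# Hypothetical folder and file name, for illustration only.
print(build_text_driver_connection_string("C:\\data\\exports\\"))
print(build_select_all("MyDatabaseFile.csv"))
```

For a pipe-delimited file, the delimiter is not part of the connection string; it would come from a schema.ini file placed in the same folder.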

Useful links:

  • https://www.connectionstrings.com
  • https://msdn.microsoft.com/en-us/library/ms714091(v=vs.85).aspx
  • https://onlinehelp.coveo.com/en/ces/7.0/administrator/database_connector.htm

Comment by sgafur, Oct 6, 2015 11:35 AM

Thanks all for the feedback!