Question by rloeb, Jul 11, 2016 12:19 PM

SharePoint crawler and "attachment" index documents

We're crawling a fairly large SharePoint source, and we notice that some documents in the index show up as "Attachment: Yes" in the management UI. These also seem to have the file type ".oleFile".

What exactly are these, and what if we don't want them in our index? Thanks.

1 Reply

Answer by Denis Shelgunov, Jul 11, 2016 4:08 PM

Hi,

I could not find any information about the ".oleFile" file type. Can you provide an example, please? You can take a screenshot, upload it to Google Drive or Dropbox, and then send me the link. These may be temporary files or a custom extension used internally by SharePoint.

If by "file type" you mean file extension (e.g., ".txt" in "filename.txt"), you can use the document types menu in the Administration Tool. More info here: https://onlinehelp.coveo.com/en/ces/7.0/administrator/administrationtool-documenttypes_menu.htm

Thanks for writing.

Denis S.

Comment by Denis Shelgunov, Jul 11, 2016 4:13 PM

A quick clarification: with a document type set, you can tell CES how to handle a file based on its type. In your case, you will want to select the "Reject document" action for the ".oleFile" file type.

Denis S.

Comment by rloeb, Jul 13, 2016 4:07 PM

OK, I will follow up later with an example, but I'm more interested in how and why Coveo flags it as an attachment.

Comment by Denis Shelgunov, Jul 14, 2016 2:31 PM

I have found the missing information.

'.oleFile' is an extension used by the converters when they extract OLE objects from Microsoft files (docx, ppt…). Each extracted object is indexed as a separate attachment document.

More information about Microsoft OLE objects here: https://support.office.com/en-us/article/Create-change-or-delete-an-OLE-object-F767F0F1-4170-4850-9B96-0B6C07EC6EA4
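
To check which source files contain embedded objects before they reach the index, a minimal sketch along these lines may help. It does not reproduce what the converters do. It assumes the file is a ZIP-based Office Open XML package (docx, pptx, xlsx), where embedded objects are conventionally stored under an "embeddings" folder. Legacy binary formats such as .doc or .ppt are not ZIP files and are not covered. The function name `list_embedded_objects` is made up for this example.

```python
import zipfile

# Folders where Office Open XML packages conventionally keep embedded objects.
# Assumption: the file follows the usual Word / PowerPoint / Excel layout.
EMBEDDING_PREFIXES = ("word/embeddings/", "ppt/embeddings/", "xl/embeddings/")


def list_embedded_objects(path):
    """Return the part names of embedded objects in an OOXML package.

    `path` may be a filename or a binary file-like object.
    """
    with zipfile.ZipFile(path) as package:
        return [
            name
            for name in package.namelist()
            # Skip directory entries; keep only actual embedded parts.
            if name.startswith(EMBEDDING_PREFIXES) and not name.endswith("/")
        ]
```

A file for which this returns a non-empty list is a likely source of the extra attachment entries described above.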

Denis S.

Comment by rloeb, Mar 6, 2017 11:36 PM

Hi Denis, is there a way to stop the crawler from creating separate index entries for the embedded documents, and instead have their content treated as part of the parent document's full-text searchable content? Otherwise we will have to use folding…

Comment by fdeschodt, Mar 7, 2017 2:17 PM

Hi,

The crawler does not create these entries. The converters separate the attachments from their parent document and, unfortunately, there is no way to change that behavior.
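
Since the attachments stay as separate entries, the folding idea mentioned in the question amounts to grouping them under their parent when results are displayed. The sketch below only illustrates that idea and is not Coveo's folding feature. The result shape and the keys "id", "parent_id", and "title" are hypothetical and would need to be mapped to whatever fields the index actually exposes.

```python
from collections import defaultdict


def fold_attachments(results):
    """Group attachment results under their parent document.

    Each result is a dict with hypothetical keys: "id", "parent_id"
    (None for a top-level document) and "title".
    Attachments whose parent is not in `results` are dropped.
    """
    children = defaultdict(list)
    parents = []
    for result in results:
        if result.get("parent_id") is None:
            parents.append(result)
        else:
            children[result["parent_id"]].append(result)
    # Return parents in their original order, each with its attachments attached.
    return [
        {**parent, "attachments": children.get(parent["id"], [])}
        for parent in parents
    ]
```

With this grouping, a Word file and its extracted '.oleFile' entries appear as one result with nested attachments instead of several unrelated hits.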

Fabien
