
Question by raju goud, Oct 27, 2016 7:56 AM

How can I prevent PDF content from being indexed and appearing in search results?

Hi,

Currently we have documents in different categories in our products, and the search results show each document's title along with a description taken from the scraped content of its PDF (the document file).

As per a new requirement, documents of one category type should show only the document title in the search results. This has been done with the help of UI changes (category filters).

But the problem is that when I search with text from the PDF attached to a document, that document still shows up in the search results. This should not be the case. These documents should appear in the search results only when the search matches the document title.

How can I hide the document from the search results when we search with PDF content? The basic purpose of the requirement is that the PDF content not be indexed for that particular document category.

Kindly help me with this. Thanks in advance.


Comment by Jean-François L'Heureux, Oct 27, 2016 9:40 AM

Hi,

By default, Coveo indexes the body of the documents. This can be changed for certain file types, but not for a subset of files of a certain type. @lbergeron's answer explains a way to do it.

As a best practice, it's very important for search results relevance sorting to have the document body indexed and free-text searchable.

Can you explain your use case with more details for me to understand why you are trying to reduce the relevance of your search results by making their body not free-text searchable?

Thanks,

Jeff


Comment by raju goud, Oct 28, 2016 7:35 AM

Thanks Jeff for your reply. Here is the use case: currently, when we search for documents, most searches return a large number of 3D models (a document type). It appears that this is because the entire "document" of a 3D model is being indexed by Coveo.

This causes a bad user experience because Coveo is pulling tons of lines of 3D model code and delivering them as relevant results for the user. I'd like to stop Coveo from indexing the entire document for the 3D model document type, so that only the title and description are indexed and delivered with relevant search results. I hope this helps you understand the problem correctly.

Thanks!


Comment by Jean-François L'Heureux, Oct 28, 2016 9:47 AM

Hi Raju,

This is a very good use case indeed. Thanks for the clarification.

One quick fix to improve your user experience is to negatively boost the 3D model documents in the search results. They then have less importance than other types of content, but are still returned in case someone is looking for them using terms found in their body. This can be done with Query Ranking Expressions (QRE).

If you are using Coveo for Sitecore, Coveo for Salesforce, or simply the Coveo JavaScript Search Framework for your search interface, you can add this to your advanced query to decrease the ranking of the 3D models:

$qre(expression: "Coveo query that only returns 3D model documents", modifier: "-100")

The expression will depend on the values of your indexed fields and your setup but can look like this:

@filetype==PDF @AnotherField=="A value that is only there for 3D models"

As for the modifier, you can start with -10 and decrease until you get a satisfying user experience. Boosting with a large modifier is not recommended as it overrides the default ranking algorithms of Coveo.
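
To make the tuning step concrete, here is a minimal sketch that only builds the clause text in the shape quoted above, so you can try one modifier at a time. It is plain string formatting, not a Coveo API. The helper name `build_qre`, the field `@documentcategory`, and its value `3DModel` are made up for illustration; replace the expression with whatever actually identifies the 3D models in your index.

```python
def build_qre(expression: str, modifier: int) -> str:
    """Format a $qre clause in the shape quoted in this thread."""
    if not expression.strip():
        raise ValueError("expression must not be empty")
    if '"' in expression:
        # The thread does not show how nested double quotes are handled,
        # so this sketch refuses them instead of guessing an escape rule.
        raise ValueError("expression must not contain double quotes")
    return f'$qre(expression: "{expression}", modifier: "{modifier}")'


# Hypothetical expression: @documentcategory and 3DModel are made-up names.
MODEL_EXPRESSION = "@filetype==PDF @documentcategory==3DModel"

# Start at -10 and step down, checking the search results after each change.
for modifier in (-10, -20, -30):
    print(build_qre(MODEL_EXPRESSION, modifier))
```

Each printed line is a candidate clause for the advanced query. Stop at the first modifier that pushes the 3D models low enough, rather than jumping straight to a large value.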

I hope this helps,

Jeff

1 Reply

Answer by Luc Bergeron, Oct 27, 2016 9:14 AM

Hi,

If I understand correctly, you want to disable the free-text search on the document body for a specific set of documents. One way to achieve that goal would be to not index the PDF content for those documents. This can be done using post conversion scripts. Here is a sample showing how to override the document body. In your specific case, you would set the document body to an empty string.

https://developers.coveo.com/display/Converter/Changing+View+as+HTML+Version+of+Document
https://onlinehelp.coveo.com/en/ces/7.0/administrator/addingapostconversion_script.htm

The drawback is that you would no longer have a quickview for those documents.

I hope this helps.


Comment by raju goud, Oct 28, 2016 7:37 AM

Thanks for your answer, lbergeron. I will try your solution and send my feedback on it.
