Skip adding a document to the index based on a unique metadata value
On Coveo Cloud, is there a way to skip adding a document to the index in a scenario like the one below?
"If a metadata field holds a unique key, check whether any other document in the source has the same value for that key. If so, do not add this item to the index."
If this is not possible, how else can we ensure uniqueness in search results? At this point, our only ways to track uniqueness are a special metadata value or a combination of parameters in the URL. Essentially, two items count as duplicates when the resulting page is the same even though the URL parameters are subtly different.
Query-time duplicate filtering (`enableDuplicateFiltering`) is not an option for us, because it leads to incorrect counts in the result listing and may not be efficient.
Is there anything else we can do to solve this duplicate issue?
The Coveo index primary key is the URI field of the indexed item, and it is unique per source. If two items with the same URI are indexed during the same rebuild, the second one overrides the first. However, nothing prevents two sources from containing the same document with the same URI.
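A toy model of the keying behavior described above may help. This is only an illustration using a plain Python dict keyed by a hypothetical `(source, uri)` tuple, not how the Coveo index is implemented: a second item with the same URI in the same source overwrites the first, while the same URI in another source, or a URL that differs only in its parameters, produces a separate entry.

```python
from typing import Dict, Tuple

# Toy index: maps a (source, uri) pair to the item's content.
# This only models the "URI is unique per source" rule; it is not Coveo's storage.
Index = Dict[Tuple[str, str], str]


def index_item(index: Index, source: str, uri: str, content: str) -> None:
    """Store an item; a later item with the same (source, uri) replaces the earlier one."""
    index[(source, uri)] = content


if __name__ == "__main__":
    idx: Index = {}
    index_item(idx, "web", "https://example.com/page", "v1")
    index_item(idx, "web", "https://example.com/page", "v2")        # overrides v1
    index_item(idx, "other", "https://example.com/page", "v1")      # separate source, separate entry
    index_item(idx, "web", "https://example.com/page?ref=a", "v2")  # different URI, separate entry
    print(len(idx))
```

Running it prints `4`: the overwrite collapses the first two calls into one entry, but the other source and the URL with an extra parameter each add their own.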
It is possible to avoid indexing some items based on their metadata values by using indexing pipeline extensions in Coveo Cloud. You can write Python scripts that run before or after conversion, and those scripts can tell Coveo not to index the document. In your case, it would be a post-conversion extension, because the meta tags of HTML documents are read during conversion.
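A minimal sketch of what such a post-conversion decision could look like, under stated assumptions. In the extension environment, the script receives a `document` object; we assume it offers methods along the lines of `get_meta_data_value(name)` (returning a list of values) and `reject()`, so check the indexing pipeline extension reference for the exact names and signatures. `StubDocument` is a hypothetical stand-in so the logic can run locally, and the `canonicalurl` metadata field and the rule "reject when the page declares a different canonical URL" are also hypothetical. The rule only looks at the document's own metadata, because the script cannot look at other indexed items.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StubDocument:
    """Hypothetical local stand-in for the `document` object an extension receives."""
    uri: str
    meta: Dict[str, List[str]] = field(default_factory=dict)
    rejected: bool = False

    def get_meta_data_value(self, name: str) -> List[str]:
        # Assumed to return a list of values (empty if the field is absent).
        return self.meta.get(name, [])

    def reject(self) -> None:
        # In the real extension, rejecting means the item is not indexed.
        self.rejected = True


def run_extension(document) -> None:
    """Reject the item if its (hypothetical) canonical URL points elsewhere."""
    values = document.get_meta_data_value("canonicalurl")
    if values and values[0] != document.uri:
        document.reject()


if __name__ == "__main__":
    variant = StubDocument("https://example.com/p?a=1", {"canonicalurl": ["https://example.com/p"]})
    run_extension(variant)
    print(variant.rejected)
```

Running it prints `True`, because the variant URL declares a different canonical page. A document without the field, or whose canonical URL matches its own URI, is left alone.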
However, a conversion script cannot query the index to check the field values of documents that are already indexed. Even if it could, this should be avoided: every indexed item would require a query, which would put a heavy load on the index and make indexing very slow.
The Coveo index can compare indexed items and determine which ones are content duplicates (same original document content), even when they have different field values or different URIs. This is what the `enableDuplicateFiltering` parameter you already found relies on, and it is an effective way to filter out duplicates at query time.
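For reference, here is a sketch of a query that turns this option on. It builds, but does not send, a Search API request using only the standard library. The endpoint URL and the bearer-token header are assumptions to verify against the Search API documentation for your organization (some setups also need extra parameters, such as an organization ID). `enableDuplicateFiltering` is the query parameter mentioned above.

```python
import json
import urllib.request

# Assumed Coveo Cloud Search API v2 endpoint; confirm it for your organization/region.
SEARCH_URL = "https://platform.cloud.coveo.com/rest/search/v2"


def build_search_request(query: str, access_token: str) -> urllib.request.Request:
    """Build (but do not send) a search request with duplicate filtering enabled."""
    body = {
        "q": query,
        # Ask the index to filter out content duplicates at query time.
        "enableDuplicateFiltering": True,
    }
    return urllib.request.Request(
        SEARCH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_search_request("pricing", "YOUR_ACCESS_TOKEN")  # placeholder token
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```

Building the request separately from sending it keeps the parameter easy to inspect and test.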