Question by rcapple, Feb 27, 2017 3:10 PM

Need the ability to do de-duplication conversion prior to receiving a query

Need the ability to do more robust deduplication

Protocols and extensions on URLs are not easily deduplicated. For example, HTTP vs. HTTPS URLs, URLs with and without a trailing slash “/”, and file extensions (.htm, .html, .aspx) should all be treated as synonyms when deduplicating.

Current deduplication efforts have caused other issues, such as incorrect pagination and result counts for the end user.

Please Help.

Comment by François Lachance-Guillemette, Feb 28, 2017 2:26 PM

I suppose you are using the enableDuplicateFiltering parameter, which removes similar documents from the results of a query. This is why you are getting incorrect pagination and result counts.

For robust deduplication, items that are similar should not be indexed in the first place.
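As a rough illustration of what deduplicating before indexing could look like, here is a minimal Python sketch that reduces URLs to a comparison key, so that protocol, trailing slash, and the extensions mentioned in the question no longer make two URLs look different. The names dedup_key, first_per_key and EQUIVALENT_EXTENSIONS are hypothetical and not part of any Coveo or Sitecore API. How the key would be plugged into the indexing pipeline depends on the source type, and this assumes that URLs differing only in these ways really do serve the same content.

```python
from urllib.parse import urlsplit
import posixpath

# Assumption: these extensions are treated as equivalent (taken from the question).
EQUIVALENT_EXTENSIONS = {".htm", ".html", ".aspx"}


def dedup_key(url: str) -> str:
    """Return a comparison key that ignores scheme, trailing slash and known extensions."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()        # host names are case-insensitive
    path = parts.path.rstrip("/")      # "/page/" and "/page" become the same
    root, ext = posixpath.splitext(path)
    if ext.lower() in EQUIVALENT_EXTENSIONS:
        path = root                    # "/page.html" and "/page" become the same
    key = host + path                  # the scheme (http/https) is dropped entirely
    if parts.query:
        key += "?" + parts.query       # keep the query string, it may select content
    return key


def first_per_key(urls):
    """Keep only the first URL seen for each key, e.g. before sending items to the index."""
    seen = set()
    kept = []
    for url in urls:
        key = dedup_key(url)
        if key not in seen:
            seen.add(key)
            kept.append(url)
    return kept
```

With this sketch, http://example.com/page.html, https://example.com/page/ and https://Example.com/page.aspx all map to the same key, so only the first one would be kept. Because the duplicates never reach the index, pagination and result counts are not affected at query time.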

What is the type of documents you are trying to deduplicate? Is it Sitecore items or items in another source type?

