Question by pavel, Mar 14, 2017 1:23 PM

Sitemap order of URL indexing

Hello, when Coveo crawls a sitemap, does it request URLs in the order they appear in the sitemap?

Currently, in the Coveo testing org, crawling looks to me like a single-threaded process that follows the exact order of the sitemap index and of the URLs within each sitemap.

Does it behave exactly the same in Production? If not, how does it handle URL order in a sitemap? Does it respect the order at all?

The reason I’m asking is that, if the order is respected, I could implement a special server-side tracking feature: I could add a special URL at the end of every sitemap, so that a request for it tells me for sure that crawling is done for that particular sitemap.
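To make the idea concrete, here is a minimal sketch of generating a sitemap with such a marker URL appended as the last entry. The function name `build_sitemap` and the marker URL are made up for illustration, and whether the marker is really the last URL requested depends on the crawler honouring sitemap order.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls, sentinel_url):
    """Return sitemap XML with sentinel_url appended as the last <url> entry.

    sentinel_url is a hypothetical tracking URL; the server would log a
    request for it as a hint that the crawler reached the end of the sitemap.
    """
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url in list(urls) + [sentinel_url]:
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
    return ET.tostring(urlset, encoding="unicode")
```

For example, `build_sitemap(page_urls, "https://example.com/crawl-complete?sitemap=1")` would return the XML for one sitemap file, with the marker entry last.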

1 Reply
Answer by ldblanchet, Mar 14, 2017 2:06 PM

The sitemap is downloaded to extract all URLs to refresh. Depending on the number of refresh threads (which I think is 2 by default), we process URLs in parallel, but in the order they are defined in the sitemap.
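As a rough illustration only (this is not Coveo's actual code), the behaviour described above can be modelled as a few worker threads pulling URLs from one shared queue that was filled in sitemap order. The names `crawl_in_order` and `fetch` are hypothetical, and the shared-queue design is an assumption.

```python
import queue
import threading


def crawl_in_order(urls, fetch, num_threads=2):
    """Hand urls to num_threads workers in sitemap order.

    Returns (started, finished): the order in which URLs were picked up
    and the order in which their downloads completed.
    """
    pending = queue.Queue()
    for url in urls:
        pending.put(url)

    started, finished = [], []
    lock = threading.Lock()

    def worker():
        while True:
            # Dequeue and record under one lock so 'started' matches sitemap order.
            with lock:
                try:
                    url = pending.get_nowait()
                except queue.Empty:
                    return
                started.append(url)
            fetch(url)  # the download itself runs in parallel with other workers
            with lock:
                finished.append(url)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return started, finished
```

In this model, URLs are always picked up in sitemap order, but with more than one thread they need not finish in that order: a slow page can still be downloading when a later URL is requested.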

Comment by pavel, Mar 14, 2017 2:13 PM

Good to know, thank you! Just to clarify a bit more: are URLs distributed among the threads so that all threads finish at approximately the same time, or is it possible for one thread to finish a lot earlier than the others?

Comment by ldblanchet, Mar 14, 2017 2:18 PM

Unless a page takes significantly longer to download than the average, all threads should finish at roughly the same time.
