Question by sirfergall, Feb 29, 2016 10:56 AM

Sitemap Connector Clarification

I am considering indexing a web site via a sitemap and was hoping to get a bit of clarification on how the sitemap connector operates.

Say the page www.sample.com/article1/ is listed in the sitemap. That page links to a document at www.sample.com/article1/attachments/document1.pdf and to another at www.sampledocumentrepo.com/document2.pdf, neither of which is listed in the sitemap.

My assumption is that the sitemap connector would simply index www.sample.com/article1/ and not index either www.sample.com/article1/attachments/document1.pdf or www.sampledocumentrepo.com/document2.pdf.

However, I noticed in the sitemap connector overview, "Configuring and Indexing a Sitemap Source", that there is an option called "Index Subfolders".

Does this mean that, with this option selected, www.sample.com/article1/attachments/document1.pdf (a descendant of www.sample.com/article1/) would be indexed, while www.sampledocumentrepo.com/document2.pdf (on an entirely different site) would not?

2 Replies

Answer by Matthieu Thériault, Mar 1, 2016 9:38 AM

The "Index Subfolders" option is not used by the Sitemap connector (it is a generic parameter that appears in all of our source configurations).

The sitemap connector will only index the content referenced by your sitemap (XML format for example). If you want to index the attachments, you need to specify them in the sitemap file.
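To illustrate the point above, here is a minimal sketch (not Coveo's implementation) of what "referenced by your sitemap" means in practice. It parses a sitemap in the standard sitemaps.org XML format and lists the URLs it contains. The sitemap content, the http:// scheme, and the helper name listed_urls are assumptions made for this example; the URLs are the ones from the question. Note that document1.pdf appears only because it has its own entry, and document2.pdf does not appear because it has none.

```python
import xml.etree.ElementTree as ET

# Hypothetical sitemap for the site in the question. The attachment is
# listed explicitly with its own <url> entry; document2.pdf is not listed.
SITEMAP_XML = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.sample.com/article1/</loc></url>
  <url><loc>http://www.sample.com/article1/attachments/document1.pdf</loc></url>
</urlset>
"""

# Namespace used by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def listed_urls(sitemap_xml: str) -> list[str]:
    """Return every <loc> value found in a sitemap <urlset>."""
    root = ET.fromstring(sitemap_xml.encode("utf-8"))
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS) if loc.text]


if __name__ == "__main__":
    # Only these URLs are candidates for indexing; links found on the
    # pages themselves are not followed in this sketch.
    for url in listed_urls(SITEMAP_XML):
        print(url)
```

A source that works this way never discovers links inside the pages it fetches, so any attachment you want indexed needs its own entry in the sitemap file.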

Comment by sirfergall, Mar 1, 2016 10:55 AM

Ah, okay. I wasn't sure whether the Sitemap connector also had a crawling/spidering element to it, like the web connector. Thanks for the info!

Answer by Jonathan Garneau, Jun 16, 2017 7:14 PM

In the same vein, if the sitemap URL redirects to another URL, is it the sitemap URL that gets indexed and appears in the document's fields in the index (clickuri, sysuri, etc.)?