Sitemap Connector Clarification
I am looking into the sitemap connector and was hoping to get some clarification on how it operates, since I am considering indexing a website via a sitemap.
Say the page www.sample.com/article1/ is listed in the sitemap and contains a link to a document at www.sample.com/article1/attachments/document1.pdf, as well as a link to www.sampledocumentrepo.com/document2.pdf, and neither document is listed in the sitemap.
My assumption is that the sitemap connector would simply index www.sample.com/article1/ and index neither document.
However, I noticed in the overview of the sitemap connector at "Configuring and Indexing a Sitemap Source" that there is an option to "Index Subfolders".
Does this mean that, with this option selected, www.sample.com/article1/attachments/document1.pdf (a descendant of www.sample.com/article1/) would be indexed, while www.sampledocumentrepo.com/document2.pdf (an entirely different site) would not?
The "Index Subfolders" option is not used by the Sitemap Connector; it is a generic parameter that appears in the configuration of all our sources.
The Sitemap Connector only indexes the content referenced by your sitemap (an XML file, for example). If you want the attachments indexed, you need to list them in the sitemap file.
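As a minimal sketch, assuming the connector reads a standard sitemaps.org XML file, the Python below generates a sitemap in which the attachment is listed explicitly next to the page that links to it, so it no longer depends on link discovery. The https:// scheme and the output file name sitemap.xml are assumptions (the sitemap protocol requires absolute URLs in <loc>). www.sampledocumentrepo.com/document2.pdf is left out because the sitemaps.org protocol expects all URLs in one sitemap to be on the same host as the sitemap; whether the Sitemap Connector enforces that rule is not stated here, so treat this as an illustration rather than the connector's documented behavior.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Serialize elements in this namespace without a prefix (xmlns="...").
ET.register_namespace("", SITEMAP_NS)

# Assumed URLs: the post omits the scheme, https:// is an assumption.
PAGES = [
    "https://www.sample.com/article1/",
    # Listed explicitly: a link on the page above is not enough on its own.
    "https://www.sample.com/article1/attachments/document1.pdf",
]


def build_sitemap(urls):
    """Return sitemap XML as UTF-8 bytes, one <url><loc> entry per URL."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url in urls:
        url_el = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        loc = ET.SubElement(url_el, f"{{{SITEMAP_NS}}}loc")
        loc.text = url
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


def listed_urls(sitemap_xml):
    """Return the <loc> values of a sitemap, in document order."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text for loc in root.iter(f"{{{SITEMAP_NS}}}loc")]


if __name__ == "__main__":
    xml_bytes = build_sitemap(PAGES)
    # Hypothetical output path; publish the file wherever the source points.
    with open("sitemap.xml", "wb") as f:
        f.write(xml_bytes)
    for url in listed_urls(xml_bytes):
        print(url)
```

Running it prints the two listed URLs; anything not in PAGES simply never appears in the file, which mirrors the point that only referenced content is indexed.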
In the same vein, if the sitemap URL redirects to another URL, is it the sitemap URL that gets indexed and appears in the document's fields in the index (clickuri, sysuri, etc.)?