Exclude directories from a sitemap search
I have a sitemap (https://blog.ipswitch.com/sitemap.xml) that also contains localized versions of some of the blog posts.
I do not want to index any pages whose URLs match:
*/es/* */fr/* */de/* */jp/* */tw/*
I cannot find a way to exclude directories the way the shared web source allows.
Any suggestions?
Would you not prefer to index them all and add a language metadata field based on the URL, which you could eventually display as a facet?
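As a rough illustration of the URL-based language field suggested here, the sketch below derives a language label from the path segments listed in the question. The function name `language_from_url`, the `SEGMENT_TO_LANGUAGE` table, and the labels themselves are assumptions made for the example; how the value would be written to a field depends on your setup.

```python
from urllib.parse import urlparse

# Hypothetical mapping from a URL path segment to a language label.
# The labels are assumptions; adjust them to match your own field values.
SEGMENT_TO_LANGUAGE = {
    "es": "Spanish",
    "fr": "French",
    "de": "German",
    "jp": "Japanese",
    "tw": "Chinese (Traditional)",
}


def language_from_url(url, default="English"):
    """Return the label for the first path segment found in the mapping."""
    segments = urlparse(url).path.strip("/").split("/")
    for segment in segments:
        if segment in SEGMENT_TO_LANGUAGE:
            return SEGMENT_TO_LANGUAGE[segment]
    # No localized segment found, so assume the default language.
    return default
```

Only whole path segments are compared, so a post under /desktop/ would not be mistaken for German.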
If you don't want to index them, you can reject documents with an Indexing Pipeline Extension, or you could try the inclusion filters directly on the source.
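A minimal sketch of the extension approach, assuming the extension is a Python script in which the platform supplies a `document` object exposing `uri` and `reject()`; check the extension API reference before relying on those names. The helper `is_localized` and the pattern are made up for this example and cover only the five directories from the question.

```python
import re

# Matches the localized directories from the question: /es/, /fr/, /de/, /jp/, /tw/.
LOCALIZED_PATH = re.compile(r"/(es|fr|de|jp|tw)/")


def is_localized(uri):
    """Return True when the URI contains one of the localized directories."""
    return LOCALIZED_PATH.search(uri) is not None


# Assumption: inside an extension, `document` is provided by the platform.
# The guard keeps this sketch runnable outside of it.
if "document" in globals():
    if is_localized(document.uri):
        document.reject()
```

Because the pattern requires a slash on both sides, a path such as /desktop/ is left alone.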
What I did was edit the Tab filter expression to:
@syssource=="Sitemap - Ipswitch Blog" AND language="English"
That seems to have worked fine.
Any downside to this approach?
Comment by Greg Jankowski, Mar 19, 2018 3:21 PM
At this point, I'd like not to index them.
I'll look at both methods you described.
Thanks
Comment by Greg Jankowski, Mar 19, 2018 3:23 PM
The issue with the inclusion (and exclusion) filters is that they are not available on a sitemap source.