Question by Greg Jankowski (gjankowski@ipswitch.com), Mar 17, 2018 1:22 PM

Exclude directories from a sitemap search

I have a sitemap (https://blog.ipswitch.com/sitemap.xml) that also contains localized versions of some of the blog posts.

I do not want to index any pages that contain:

*/es/* */fr/* */de/* */jp/* */tw/*
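
To check locally which URLs those wildcard patterns would catch, Python's standard `fnmatch` module is a convenient stand-in: `fnmatchcase` treats `*` as "any run of characters, slashes included". This is only a sketch for testing the patterns; the sample URLs are hypothetical, and it says nothing about how the Coveo source itself evaluates patterns.

```python
from fnmatch import fnmatchcase

# The wildcard patterns from the question; "*" also matches "/" here.
EXCLUDED_PATTERNS = ["*/es/*", "*/fr/*", "*/de/*", "*/jp/*", "*/tw/*"]


def is_excluded(url: str) -> bool:
    """Return True if the URL matches any of the localized-directory patterns."""
    return any(fnmatchcase(url, pattern) for pattern in EXCLUDED_PATTERNS)


# Hypothetical URLs used only to exercise the patterns.
for url in [
    "https://blog.ipswitch.com/es/some-post",
    "https://blog.ipswitch.com/some-post",
    "https://blog.ipswitch.com/es",
]:
    print(url, is_excluded(url))
```

Note that `*/es/*` needs a slash after `es`, so a URL ending exactly in `/es` is not matched.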

I cannot find a way to exclude directories like I can with the shared web source.

Any suggestions?

Answer by Etienne (erocheleau@coveo.com), Mar 19, 2018 2:49 PM

Wouldn't you rather index them all and add a language metadata field based on the URL, which you could eventually display as a facet?
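
Deriving that language value from the URL could look roughly like the sketch below. The directory-to-language mapping is an assumption (in particular the label for `tw`), English is assumed as the default for unprefixed posts, and how the value is attached to the indexed document as a metadata field is not shown.

```python
from urllib.parse import urlparse

# Assumed mapping from URL directory to a language label; adjust to the
# site's real locales.
LANGUAGE_BY_DIRECTORY = {
    "es": "Spanish",
    "fr": "French",
    "de": "German",
    "jp": "Japanese",
    "tw": "Chinese (Traditional)",
}


def language_from_url(url: str, default: str = "English") -> str:
    """Return the language for the first known locale segment in the URL path."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    for segment in segments:
        if segment in LANGUAGE_BY_DIRECTORY:
            return LANGUAGE_BY_DIRECTORY[segment]
    # No locale directory found: assume the default (untranslated) language.
    return default


print(language_from_url("https://blog.ipswitch.com/fr/some-post"))  # French
print(language_from_url("https://blog.ipswitch.com/some-post"))  # English
```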

If you don't want to index them, you can reject the documents with an Indexing Pipeline Extension, or you could try the inclusion filters directly on the source.
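
The rejection logic of such an extension might be sketched as follows. `StubDocument` is a hypothetical stand-in for the document object an extension receives; the assumption that it exposes the URI and a `reject()` call should be confirmed against Coveo's Indexing Pipeline Extension documentation before adapting this.

```python
import re
from dataclasses import dataclass
from urllib.parse import urlparse

# Matches a localized directory as a whole path segment, e.g. "/es/..." or a
# path ending in "/es". The locale list comes from the question.
LOCALIZED_DIR = re.compile(r"/(es|fr|de|jp|tw)(/|$)")


@dataclass
class StubDocument:
    """Hypothetical stand-in for the extension's document object."""
    uri: str
    rejected: bool = False

    def reject(self) -> None:
        self.rejected = True


def process(document: StubDocument) -> None:
    """Reject any document whose URL path contains a localized directory."""
    if LOCALIZED_DIR.search(urlparse(document.uri).path):
        document.reject()


doc = StubDocument("https://blog.ipswitch.com/de/some-post")
process(doc)
print(doc.rejected)  # True
```

Matching on whole path segments means a slug such as `/desk-tips` is not caught by the `de` entry.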

Comment by Greg Jankowski, Mar 19, 2018 3:21 PM

At this point, I'd rather not index them.

I'll look at both methods you described.

Thanks

Comment by Greg Jankowski, Mar 19, 2018 3:23 PM

The issue with the inclusion filters (and exclusion) is that they are not available on a sitemap source.

Answer by Greg Jankowski, Mar 19, 2018 4:53 PM

So what I did was just edit the Tab filter expression to

@syssource=="Sitemap - Ipswitch Blog" AND language="English"

That seems to have worked fine.

Any downside to this approach?

Ask a question