Question by atit j, Dec 8, 2016 6:49 AM

No documents indexed from external source (web pages)

I added a Web Pages type external source for a website hosted in our own infrastructure. We use on-premises Coveo, and it sits in the same network as the web servers. However, no documents have been indexed at all since I started building the index yesterday. The Coveo server definitely has access to the website, as I can browse to it. I also checked the log, but it contains no useful information.

Indexing hang Operation log

Comment by Simon, Dec 8, 2016 11:34 AM

It could be a filter issue. What is the starting address (or its format, if it is sensitive info), and what are the inclusion/exclusion filters (in the filter menu)?

Comment by atit j, Dec 9, 2016 4:57 AM

@Simon there is only one inclusion filter: http://sitedomainname.com/*. I checked, and there is no redirection from http to https. Could it have something to do with the "User Agent" parameter?
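
For anyone checking a similar filter, here is a minimal sketch of how a wildcard inclusion pattern like the one above behaves against a few URLs. It uses Python's `fnmatch` only as a stand-in: it is an assumption that Coveo's filter matching works the same way, and the sample URLs are made up for illustration.

```python
from fnmatch import fnmatchcase

# Inclusion filter quoted in the comment above.
INCLUSION_FILTER = "http://sitedomainname.com/*"

def matches_filter(url: str, pattern: str = INCLUSION_FILTER) -> bool:
    """Return True if the URL matches the shell-style wildcard pattern.

    Assumption: plain wildcard semantics, where * matches any run of
    characters. The real crawler may apply different rules.
    """
    return fnmatchcase(url, pattern)

# Hypothetical URLs, chosen to show common near-misses.
for url in (
    "http://sitedomainname.com/products/index.html",
    "https://sitedomainname.com/products/index.html",  # different scheme
    "http://www.sitedomainname.com/",                  # different host
):
    print(url, matches_filter(url))
```

If the pages are actually served under a different scheme or host name than the filter states, a pattern like this would exclude all of them.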

Comment by Simon, Dec 9, 2016 1:34 PM

Well, the first log says something about robots.txt support. Have you tried not respecting robots.txt? You can change it in the source parameters; see here: https://onlinehelp.coveo.com/en/ces/7.0/administrator/modifyingadvancedsource_parameters.htm
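
Before turning the setting off, it can help to confirm that robots.txt really is what blocks the crawler. The sketch below uses Python's standard `urllib.robotparser` to evaluate robots.txt rules for a given user agent. The robots.txt content and the `ExampleCrawler` user agent name are hypothetical, not taken from the actual site or from Coveo.

```python
from urllib.robotparser import RobotFileParser

def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# Hypothetical robots.txt that blocks every crawler from the whole site.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

url = "http://sitedomainname.com/index.html"
print(is_allowed(BLOCK_ALL, "ExampleCrawler", url))  # rules disallow everything
print(is_allowed("", "ExampleCrawler", url))         # empty file has no restrictions
```

To test a live site, paste the contents of its /robots.txt into the first argument and use the user agent string configured on the source.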

Answer by atit j, Dec 12, 2016 9:54 AM

I finally found the solution, as @Simon suggested. I started over from the beginning and managed to find the option to not respect robots.txt in the advanced section. (It is just a single drop-down list.)

Here is another article that mentions the robots.txt file: https://onlinehelp.coveo.com/en/cloud/addeditweb_source.htm

Thanks a lot @Simon

Comment by Simon, Dec 12, 2016 10:16 AM

Oops, I linked the On-Premises settings. Glad you found the one for the cloud!