Question by Alok Gupta, Nov 1, 2017 5:41 PM

Crawling web pages hidden behind a cookie?

We currently index web pages on the Google Search Appliance (GSA) that are hidden behind a page requiring the user to accept a terms and conditions agreement. The terms page drops a cookie once the user accepts the agreement. On the GSA, we set up a cookie so that the crawler passes it with every request; the web server therefore allows the Google crawler to crawl the subsequent pages.

Is there a way in the Coveo platform to set up a predefined cookie when configuring a web connector?

Please note: the GSA uses a forms authentication rule to accept the terms and conditions and generate the cookie. No username or password is involved.

Comment by fjalbert, Nov 3, 2017 1:47 PM

This question is currently being addressed in a case.

We will come back with an answer once the case is solved.

1 Reply

Answer by fjalbert, Jan 12, 2018 3:05 PM

Hi @DEEPTHI KATTA,

At the moment, the best way to address this situation is to use the sitemap connector instead of the web connector.

This connector allows the use of the UseCookies and ManualCookies parameters.

Once you have created the sitemap source, edit its JSON configuration and find the parameters section.

It should look something like this:

    "parameters": {
      "PauseOnError": {
        "sensitive": false,
        "value": "true"
      },
      "EnableJavaScript": {
        "sensitive": false,
        "value": "true"
      },
      "Timeout": {
        "sensitive": false,
        "value": "100"
      },
      "UserAgent": {
        "sensitive": false,
        "value": "Mozilla/5.0 (compatible; Coveobot/2.0;+http://www.coveo.com/bot.html)"
      },
      "JavaScriptLoadingDelayInMilliseconds": {
        "sensitive": false,
        "value": "0"
      },
      "OrganizationId": {
        "sensitive": false,
        "value": "MyOrgIDabc123"
      },
      "SourceId": {
        "sensitive": false,
        "value": "q4qdhjdeidzm6wjbtr32lmkheq-MyOrgIDabc123"
      }
    },

Here is the format of the parameters to add inside the parameters section:

      "UseCookies": {
        "sensitive": false,
        "value": "true"
      },
      "ManualCookies": {
        "sensitive": false,
        "value": "MyFirstCookieName=MyFirstCookieValue;Path=/samplepath;Domain=www.example.com;;MySecondCookieName=MySecondCookieValue;Path=/samplepath;Domain=www.example.com"
      },
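For anyone scripting this edit rather than typing it by hand, here is a minimal Python sketch. It assumes the ManualCookies format is exactly what the example above shows (name=value;Path=...;Domain=... per cookie, with ;; between cookies); that format is inferred from this one example only. The helper names format_manual_cookies and add_cookie_parameters are hypothetical and are not part of any Coveo API. The script only builds the JSON locally; you still paste or save the result into the source configuration yourself.

```python
import json


def format_manual_cookies(cookies):
    """Build the ManualCookies string in the format shown in the answer.

    Assumption: each cookie is 'name=value;Path=...;Domain=...' and
    cookies are separated by ';;' (inferred from the example only).
    """
    parts = []
    for c in cookies:
        parts.append(
            f"{c['name']}={c['value']};Path={c['path']};Domain={c['domain']}"
        )
    return ";;".join(parts)


def add_cookie_parameters(source_config, cookies):
    """Add UseCookies and ManualCookies to the 'parameters' section."""
    params = source_config.setdefault("parameters", {})
    params["UseCookies"] = {"sensitive": False, "value": "true"}
    params["ManualCookies"] = {
        "sensitive": False,
        "value": format_manual_cookies(cookies),
    }
    return source_config


# Trimmed-down stand-in for the source's JSON configuration.
config = {"parameters": {"Timeout": {"sensitive": False, "value": "100"}}}

# Placeholder cookie names and values taken from the example above.
cookies = [
    {"name": "MyFirstCookieName", "value": "MyFirstCookieValue",
     "path": "/samplepath", "domain": "www.example.com"},
    {"name": "MySecondCookieName", "value": "MySecondCookieValue",
     "path": "/samplepath", "domain": "www.example.com"},
]

add_cookie_parameters(config, cookies)
print(json.dumps(config, indent=2))
```

Existing parameters such as Timeout are left untouched; only the two cookie entries are added or overwritten.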

Comment by fjalbert, Jan 12, 2018 3:14 PM

Note that you need to know the name and value of each cookie for this solution. In this example, the source sends the cookies MyFirstCookieName and MySecondCookieName when crawling www.example.com/samplepath.
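Since the cookie names and values must be known up front, it can help to confirm outside the crawler that they really get you past the terms page. The following is a rough Python sketch using only the standard library. The URL and cookie names are the placeholders from the example, and build_request_with_cookies is a hypothetical helper. It shows a plain HTTP Cookie header, which is not the same syntax as the ManualCookies parameter.

```python
import urllib.request


def build_request_with_cookies(url, cookies):
    """Return a Request that carries the given cookies in a Cookie header."""
    # Standard HTTP Cookie header syntax: 'name=value; name2=value2'.
    header = "; ".join(f"{name}={value}" for name, value in cookies.items())
    req = urllib.request.Request(url)
    req.add_header("Cookie", header)
    return req


req = build_request_with_cookies(
    "http://www.example.com/samplepath",  # placeholder URL from the example
    {
        "MyFirstCookieName": "MyFirstCookieValue",
        "MySecondCookieName": "MySecondCookieValue",
    },
)
print(req.get_header("Cookie"))
# prints: MyFirstCookieName=MyFirstCookieValue; MySecondCookieName=MySecondCookieValue

# To send it for real, call urllib.request.urlopen(req) and check that the
# response body is the protected page rather than the terms page.
```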

Hope this helps.

Franck

Comment by DEEPTHI KATTA, Jan 12, 2018 3:51 PM

Perfect! Thank you so much. This is very helpful.

Will try this out.
