Question by Cris Corra, Apr 7, 2015 3:49 PM

Is there an example of how to successfully provide web form login values to be able to crawl secured content in a web pages source?

I'm indexing a web pages source (a site we own), in which some pages are secured by authentication. I'm passing a valid user account/password and other values to the form fields that were downloaded in the Forms section. However, when I test a secure page from the Form details screen in the source, it is returning the login page (not authenticated). The page is supposed to redirect to the login page, then redirect back to the original page after authentication. I also used Fiddler to make sure I am capturing and passing the correct form values. Am I missing something? Or will this not work?

Comment by Jean-François L'Heureux, Apr 7, 2015 3:58 PM

Did you follow this guide to configure your forms?

Comment by Cris Corra, Apr 7, 2015 4:26 PM

Yes I did.

Comment by ldblanchet, Apr 7, 2015 5:27 PM

Well, so far, everything you have done is correct. What the crawler does is a simple HTTP request. If Fiddler shows that the request sent during a manual login and the request sent by the crawler are identical in every detail, then it should work.

I'll try getting a meaningful example, but each form makes for pretty specific cases.
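As a rough illustration of the "simple HTTP request" mentioned above, here is a minimal Python sketch of a plain form post. This is not the Coveo crawler's actual code. The URL and the field names (`username`, `password`) are hypothetical placeholders, and the request is only built, never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_login_request(login_url, form_fields):
    """Build the POST request that a simple form login would send."""
    # Form fields are sent URL-encoded in the request body.
    body = urlencode(form_fields).encode("utf-8")
    return Request(
        login_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Hypothetical URL and credentials, for illustration only.
request = build_login_request(
    "https://example.com/login",
    {"username": "crawler", "password": "secret"},
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Comparing the printed method, URL, and body with what Fiddler shows for a manual login is one way to spot a missing or extra field.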

Comment by ldblanchet, Apr 7, 2015 5:57 PM

Two things to note. You can try disabling cookies in the Advanced section of the source; this can influence how the authentication happens. Also, if the redirect is done in JavaScript, I am not sure our crawler can follow it, as we do not execute any JavaScript. If that is the case, you could manually inject the authenticated cookies.
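For reference, an authenticated session is normally carried by a `Cookie` request header. The Python sketch below only shows how such a header value is assembled from name/value pairs copied from a logged-in browser session. The cookie names and values are hypothetical, and how the source accepts injected cookies is not covered here.

```python
def build_cookie_header(cookies):
    """Join cookie name/value pairs into a single Cookie header value."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


# Hypothetical cookies, as they might be copied from a logged-in browser.
authenticated_cookies = {
    "session_id": "abc123",
    "auth_token": "xyz789",
}

cookie_header = build_cookie_header(authenticated_cookies)
print("Cookie:", cookie_header)
```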

Comment by Matthieu Thériault, Apr 7, 2015 6:46 PM

Just as another clarification: cookies should be enabled in this case, so the "Disable cookies" option (in Advanced) should not be checked.

Comment by Sebastien Desilets, Apr 8, 2015 9:30 AM

Also, depending on how the login page is implemented, you may have to remove some of the values automatically retrieved during "Retrieving the form parameters from a website". Anything time-sensitive in the login form state may block your login attempt. An ASP.NET ViewState value, for example, may prevent a login if it is no longer valid.
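To see which values might be time-sensitive, it can help to list the hidden inputs of the login page. The following is a minimal Python sketch under stated assumptions: the sample HTML and its `__VIEWSTATE` value are made up, and a real login page may need more careful parsing.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect the name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        is_hidden = (attributes.get("type") or "").lower() == "hidden"
        if is_hidden and attributes.get("name"):
            self.fields[attributes["name"]] = attributes.get("value") or ""


def extract_hidden_fields(html):
    """Return a dict of the hidden form fields found in the given HTML."""
    parser = HiddenInputParser()
    parser.feed(html)
    return parser.fields


# Made-up login form, for illustration only.
sample_html = """
<form method="post" action="/login">
  <input type="hidden" name="__VIEWSTATE" value="dDwtMTIzNDU2Nzg5" />
  <input type="text" name="username" />
  <input type="password" name="password" />
</form>
"""

hidden_fields = extract_hidden_fields(sample_html)
print(hidden_fields)
```

Hidden fields whose values change on every page load are good candidates to remove from the stored form parameters.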

Do you see any differences when comparing a Fiddler trace of your manual login, using the same credentials as the Web Pages source, with a Fiddler trace captured when clicking "Test the Form Using This Address" in the Coveo Admin Tool?
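One way to compare the two traces is to diff the URL-encoded POST bodies field by field. Here is a small Python sketch of that comparison. The two bodies are invented examples, not real captures, and the field names are assumptions.

```python
from urllib.parse import parse_qs


def diff_form_bodies(manual_body, crawler_body):
    """Return the fields whose values differ between two URL-encoded bodies.

    Each entry maps a field name to (manual_values, crawler_values);
    None means the field is absent from that body.
    """
    manual = parse_qs(manual_body, keep_blank_values=True)
    crawler = parse_qs(crawler_body, keep_blank_values=True)
    differences = {}
    for name in sorted(set(manual) | set(crawler)):
        if manual.get(name) != crawler.get(name):
            differences[name] = (manual.get(name), crawler.get(name))
    return differences


# Invented POST bodies, as they might be copied from two Fiddler traces.
manual_body = "username=crawler&password=secret&__VIEWSTATE=fresh"
crawler_body = "username=crawler&password=secret&__VIEWSTATE=stale&remember="

differences = diff_form_bodies(manual_body, crawler_body)
for name, (manual_value, crawler_value) in differences.items():
    print(name, manual_value, crawler_value)
```

Request headers and cookies can be compared the same way once they are copied out of each trace.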

Comment by Cris Corra, Apr 8, 2015 10:03 AM

Thanks for the info, guys! I'm still not able to get the pages to authenticate with the suggestions you posted above. I'm trying to get some info about the login to see if JavaScript is used. This is a Sitecore 6.5 site that uses non-Sitecore/.NET authentication. I tried setting it up with a Sitecore connector using the FormsAuthUserControl and FormsAuthPasswordControl parameters, but I couldn't find an example of how to pass the actual values to those parameters. Any examples with this method would help too.

Comment by Matthieu Thériault, Apr 8, 2015 2:42 PM

Does this page help you to configure these hidden parameters for Sitecore?

Comment by Cris Corra, Apr 8, 2015 3:07 PM

Yes, but how do you pass the actual login values in the Sitecore source? Those params are for setting pointers to the input controls. I have a dedicated username/password to use for all authentication on all pages.

Comment by Matthieu Thériault, Apr 8, 2015 4:01 PM

It will pass the username and the password (user identity) defined in the Sitecore source (Security > Authentication).

Comment by Cris Corra, Apr 8, 2015 4:38 PM

Doesn't the user identity have to be a Sitecore (usually admin) account? The page authentication uses a custom external username/password combination.

Comment by Jean-François L'Heureux, Apr 9, 2015 4:23 PM

Your login form seems to be really tricky. Is it public so we can have a look at it to understand its behavior?

Troubleshooting a login form is out of scope for this site. I suggest you open a ticket with Coveo Support. Please provide two Fiddler traces with your ticket:

  • One of a successful manual login in your browser. (The trace should begin when you load the URL of the page that asks for a login. The same URL that you enter in the Administration Tool to test your form login)
  • One of the unsuccessful login test from the Forms section of your Web Pages source.

Thank you.

Comment by Cris Corra, Apr 17, 2015 4:39 PM

Here is the public site.

The page which I've been testing with (and redirects to login and back after login) is
