Question by Troy Sheaffer, Jul 11, 2016 6:58 PM

Index failing for source with multiple addresses on rebuild: InvalidStartingAddressException

The observed behaviour is that indexing of a single source terminates when ANY starting address fails or returns a 404 (Not found):

NORMAL 03:55:01 PM The source Static from collection ESRegion was rebuilt. (1 seconds) ESRegion Static
ERROR 03:55:00 PM class Merlin::InvalidStartingAddressException: The source could not be refreshed: at least one invalid address was found. ESRegion Static
ERROR 03:55:00 PM Invalid starting address: https://www.mysite.com/TEST.html ((404 : Not found) class CGLNetwork::NetworkFileNotFoundException: https://www.mysite.com/TEST.html (404))

This source has many seed addresses. One asset was removed, and the indexed document count for the whole source then dropped to 0.

Where is this behaviour documented? I believe I have previously read that it is expected, but I cannot find it in the online documentation.

Thanks

1 Reply

Answer by Denis Shelgunov, Jul 12, 2016 11:52 AM

Hi and thanks for writing.

Your issue looks similar to this one.

If the proposed solution does not work for you, I suggest you split your source into multiple smaller sources. That way, when you get a 404 error, it will have less impact on the rest of the crawled addresses.

If you know in advance which addresses are more likely to fail, isolate each of them in its own source and apply the solution proposed in my link. Otherwise, split the addresses randomly into multiple sources of similar size.
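To illustrate the split described above, here is a minimal Python sketch that only plans the grouping of starting addresses; it does not call any Coveo API. The function name `plan_sources`, the generated source names (`Static-risky-1`, `Static-1`, ...) and the `max_per_source` limit are all hypothetical, chosen for the example. You would still have to create the sources yourself with the addresses it returns.

```python
import random
from typing import Dict, Iterable, List


def plan_sources(
    addresses: Iterable[str],
    risky: Iterable[str] = (),
    max_per_source: int = 50,   # hypothetical limit, pick what suits you
    seed: int = 0,
) -> Dict[str, List[str]]:
    """Group starting addresses into several smaller sources.

    Addresses listed in `risky` each get a source of their own; the
    remaining ones are shuffled and dealt into groups of similar size.
    The source names are made up for this example.
    """
    if max_per_source < 1:
        raise ValueError("max_per_source must be at least 1")

    unique = list(dict.fromkeys(addresses))  # drop duplicates, keep order
    risky_set = set(risky)
    plan: Dict[str, List[str]] = {}

    # One dedicated source per address that is likely to fail.
    likely_to_fail = [a for a in unique if a in risky_set]
    for i, address in enumerate(likely_to_fail, start=1):
        plan[f"Static-risky-{i}"] = [address]

    # Random split of everything else into groups of similar size.
    rest = [a for a in unique if a not in risky_set]
    random.Random(seed).shuffle(rest)
    if rest:
        n_groups = -(-len(rest) // max_per_source)  # ceiling division
        for i in range(n_groups):
            plan[f"Static-{i + 1}"] = rest[i::n_groups]

    return plan
```

For example, `plan_sources(urls, risky=["https://www.mysite.com/TEST.html"], max_per_source=25)` would put the TEST.html address alone in its own source, so a 404 on it fails only that source.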

Hope it helps,

Denis S.

Comment by Troy Sheaffer, Jul 12, 2016 12:52 PM

Thank you very much for pointing me to where this behavior is documented, and for the suggested remedies.
