Index failing for source with multiple addresses on rebuild: InvalidStartingAddressException
The observed behaviour is that indexing of a single source terminates when ANY starting address fails or returns a 404 (Not found):
NORMAL 03:55:01 PM The source Static from collection ESRegion was rebuilt. (1 seconds) ESRegion Static
ERROR 03:55:00 PM class Merlin::InvalidStartingAddressException: The source could not be refreshed: at least one invalid address was found. ESRegion Static
ERROR 03:55:00 PM Invalid starting address: https://www.mysite.com/TEST.html ((404 : Not found) class CGLNetwork::NetworkFileNotFoundException: https://www.mysite.com/TEST.html (404))
This source has many seed addresses. One asset was removed, and the document count for the whole source then dropped to 0.
Where is this behaviour documented? I believe I have previously read that it is expected, but I cannot find it in the online documentation.
Hi and thanks for writing.
Your issue looks similar to this one.
If the proposed solution does not work for you, I suggest you split your source into multiple smaller sources. That way, when a 404 error occurs, it only affects the source containing the failing address and not the rest of the crawled addresses.
If you know in advance which addresses are more likely to fail, isolate each of them in its own source and apply the solution proposed in my link. Otherwise, split the addresses randomly into multiple sources of similar size.
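As a rough illustration of that split, here is a minimal Python sketch. It only plans how the addresses could be grouped; it does not call the indexing product or create any source. The function name `plan_sources` and its parameters (`risky`, `group_count`, `seed`) are hypothetical and chosen for this example only.

```python
import random


def plan_sources(addresses, risky=(), group_count=3, seed=0):
    """Return a list of address groups, one group per proposed source.

    Hypothetical helper: it only computes a grouping and does not
    talk to the indexing server.
    """
    risky_set = set(risky)
    # Each address that is likely to fail gets a source of its own.
    groups = [[addr] for addr in addresses if addr in risky_set]
    # The remaining addresses are shuffled, then dealt round-robin
    # so the resulting groups differ in size by at most one.
    rest = [addr for addr in addresses if addr not in risky_set]
    random.Random(seed).shuffle(rest)
    count = max(1, min(group_count, len(rest)))
    if rest:
        groups.extend(rest[i::count] for i in range(count))
    return groups


if __name__ == "__main__":
    seeds = [f"https://www.mysite.com/page{i}.html" for i in range(7)]
    seeds.append("https://www.mysite.com/TEST.html")
    plan = plan_sources(seeds, risky=["https://www.mysite.com/TEST.html"])
    for number, group in enumerate(plan, start=1):
        print(f"Source {number}: {len(group)} address(es)")
```

Each returned group would then be entered by hand as the starting addresses of one source. A fixed `seed` keeps the grouping reproducible between runs.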
Hope it helps,