Slow initial index rebuild with a large number of Sitecore fields
We're using Coveo on an existing Sitecore installation. The instance has been around for many years and has been worked on by many teams, so it has a lot of templates with a lot of fields. There are also 5 'web' targets in addition to the core and master databases.
We're using Sitecore 7.1, so we have 7 Coveo indexes (there would be many more on Sitecore 8+). Each index uses the default setup for its database, where everything is indexed.
As we've been rebuilding these indexes one by one, we've noticed that each rebuild is severely slow.
Looking at the logs, indexing doesn't actually start until all of the fields for the database have been added to the respective field set in Coveo. Initially, the logs said:
Adding 500 to the field set.
but now they say
Adding 1 to the field set.
every 5 minutes. I assume it slows down because, for each field, it checks all of the existing fields to make sure the field hasn't already been added. It may even be checking fields in the other field sets, as my final index has now taken over a day to create its field set and hasn't even started indexing yet.
Is the large field set the cause of our slowness (completed web indexes show around 9600 fields in each field set)? Is there any way to speed this up? Will rebuilds be quicker after the initial one, since the field set won't have to be recreated?
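To put the assumption above in numbers: if every new field were compared against each field already in the set, the k-th field would cost k - 1 comparisons, so adding n fields would cost n(n - 1)/2 comparisons in total, versus roughly n with a hash-based lookup. This is only an illustration of that hypothesis; it is not a description of how CES actually stores or checks fields, and the function names are made up for this sketch.

```python
def comparisons_linear(n: int) -> int:
    """Total checks if each new field is compared with every field already in the set."""
    # Field 1 is checked against 0 fields, field 2 against 1, ..., field n against n - 1.
    return n * (n - 1) // 2


def comparisons_hashed(n: int) -> int:
    """Total checks if each new field is looked up in a hash set (about one check each)."""
    return n


if __name__ == "__main__":
    fields = 9600  # roughly the size of a completed web field set, per the question
    print(comparisons_linear(fields))  # 46075200
    print(comparisons_hashed(fields))  # 9600
```

Under this model, the per-field cost keeps growing as the set fills up, which would match the log slowing from 500 fields per batch to 1.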
9600 fields per field set is a huge number of fields! I think you are hitting a problem where the CES configuration is bigger than the default cache size for it. When this happens, configuration changes get slower as the configuration grows.
Can you scan your CES system logs for the last message of this type: "The ConfigObjectCache is swapping to disk. Performance might be adversely affected. Consider increasing the ConfigCacheSize registry value to [number of bits required]".
This message gives the number of bits CES thinks is enough to store your CES configuration in its cache. I suggest setting a value above that number to allow for future growth of your configuration.
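If the logs are large, a small script can find the most recent suggestion. The sketch below is a minimal example that assumes the warning quoted above appears on a single log line with the suggested value written as a plain integer; the log file location, the function names, and the 1.5 headroom factor are all hypothetical choices, not CES or Coveo settings.

```python
import re
from typing import Iterable, Optional

# Assumes the whole warning sits on one line and the value is a plain integer
# right after "registry value to".
SWAP_WARNING = re.compile(
    r"The ConfigObjectCache is swapping to disk\..*?"
    r"ConfigCacheSize registry value to (\d+)"
)


def last_suggested_cache_size(log_lines: Iterable[str]) -> Optional[int]:
    """Return the value from the most recent swap warning, or None if there is none."""
    suggested = None
    for line in log_lines:
        match = SWAP_WARNING.search(line)
        if match:
            suggested = int(match.group(1))  # later warnings overwrite earlier ones
    return suggested


def with_headroom(suggested: int, factor: float = 1.5) -> int:
    """Pad the suggestion to leave room for growth (the factor is an arbitrary choice)."""
    return int(suggested * factor)
```

Because the function accepts any iterable of lines, an open file object can be passed directly, for example `with open(path, encoding="utf-8", errors="replace") as log: last_suggested_cache_size(log)`, where `path` points at whichever CES log file you are inspecting.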
Then, follow this KB article to set the new configuration cache size: https://developers.coveo.com/display/SupportKB/The+ConfigObjectCache+is+swapping+to+disk.+Performance+might+be+adversely+affected+in+CES+Admin+Tools
I think you will need to restart the CES service after the change. Stopping the service may take a long time because of the same cache problem.
On a side note, each time an indexing operation occurs in Sitecore (rebuild, item created, item modified, item deleted, publish…), the field sets are synchronized. You will see a spike in network usage, as Coveo for Sitecore downloads the entire field configuration through the Coveo Admin Service to compare it with its own field configuration. If no changes are required, no fields are sent back to CES; if changes are required, only the fields that need to change are sent.
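The comparison step described above amounts to a diff between two field configurations. The sketch below shows that idea, assuming a hypothetical representation where each configuration is a dictionary mapping a field name to its settings; it is not the actual Coveo for Sitecore implementation, and it ignores field removal, which the answer does not discuss.

```python
from typing import Dict

# Hypothetical representation: field name -> settings (e.g. {"type": "String", "facet": False}).
FieldConfig = Dict[str, Dict[str, object]]


def fields_to_send(ces_fields: FieldConfig, sitecore_fields: FieldConfig) -> FieldConfig:
    """Return only the fields that are missing in CES or whose settings differ."""
    changes: FieldConfig = {}
    for name, settings in sitecore_fields.items():
        # Dictionary lookup keeps each check cheap even with thousands of fields.
        if ces_fields.get(name) != settings:
            changes[name] = settings
    return changes
```

When both configurations already match, the result is empty and nothing needs to be sent back; the network cost in that case comes from downloading the full CES configuration, not from the comparison itself.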
Coveo for Sitecore synchronizes the fields at the beginning of each indexing operation to keep the configuration up to date at all times. We will introduce a configurable synchronization delay in an upcoming release to avoid synchronizing too often.
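As a rough illustration of what a synchronization delay could look like, the sketch below skips the field sync when the previous one ran less than `min_interval` seconds ago. The class and parameter names are hypothetical and this is not how the upcoming feature is necessarily implemented; the injectable clock only exists to make the behavior easy to test.

```python
import time
from typing import Callable, Optional


class ThrottledFieldSync:
    """Run a field synchronization at most once per `min_interval` seconds (hypothetical)."""

    def __init__(self, sync: Callable[[], None], min_interval: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._sync = sync
        self._min_interval = min_interval
        self._clock = clock
        self._last_sync: Optional[float] = None

    def before_indexing(self) -> bool:
        """Call at the start of each indexing operation; return True if a sync ran."""
        now = self._clock()
        if self._last_sync is not None and now - self._last_sync < self._min_interval:
            return False  # synced recently: skip the download and comparison
        self._sync()
        self._last_sync = now  # only recorded if the sync did not raise
        return True
```

The trade-off is freshness: a field added in Sitecore during the delay window would not reach CES until the next sync that is allowed to run.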
I hope this helps!