Question by jupitersspot, Dec 7, 2016 3:24 PM

Salesforce - how much data is too much

In an org I have a custom object that is transactional: we write ~6M rows to it per month. The object has ~20 fields that I'd like to index. I imagine each row in the object would constitute one Coveo Document. In terms of data volume, is six million Coveo Documents per month trivial, nothing to worry about, or a problem? Assuming it's a manageable amount, for how many months could I continue adding ~6M Coveo Documents to my index before the data volume presents a performance or capacity issue?

Question 2: Let's say I add 150k rows per day and set Coveo to index daily. How many API calls will Salesforce count? In other words, is my data capacity question above moot simply because Salesforce won't allow the API to extract that much data anyway?

Comment by maveilleux, Dec 8, 2016 5:20 AM

Hi Jupiter's spot,

Question 1: Before answering the question, I'd like to know more about that object. How is it secured? What's the permission model? Does it need to be secured in the search, or is it public? Is it replicateable?

Question 2: Normally, we fetch the objects incrementally every 5 minutes; that's about 300 calls per day. SOQL pages contain 2,000 records, so if we're unlucky and must fetch two pages every 5 minutes, that totals 600 calls per day. Then, depending on the security model, resolving permissions may add anywhere from 0 to 1 call per 200 records. In the worst-case scenario, we need roughly 1,500 calls per day to fetch the 150k new documents and their security information. That's the crawling part. For the security part, it really depends on the number of users and the permission model. How many users are in your organization?
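As a rough back-of-envelope check on those figures: the 5-minute polling interval, 2,000-record SOQL pages, and up-to-1-call-per-200-records security lookup are the assumptions stated above; the function name and parameters are illustrative, not part of any Coveo or Salesforce API.

```python
import math

def estimate_daily_api_calls(rows_per_day, poll_interval_min=5,
                             pages_per_poll=2, security_batch_size=200,
                             security_enabled=True):
    """Rough estimate of Salesforce API calls/day for incremental crawling."""
    polls_per_day = 24 * 60 // poll_interval_min        # 288, ~300 in round numbers
    crawl_calls = polls_per_day * pages_per_poll        # unlucky case: 2 pages per poll
    security_calls = (math.ceil(rows_per_day / security_batch_size)
                      if security_enabled else 0)       # worst case: 1 call per 200 records
    return crawl_calls + security_calls

print(estimate_daily_api_calls(150_000))  # → 1326, i.e. ~1,500 with rounding headroom
```

This lands in the same ballpark as the ~1,500-calls-per-day worst case, which is comfortably within typical Salesforce daily API limits.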


Comment by jupitersspot, Dec 8, 2016 12:53 PM

  1. The Org-Wide Defaults are set to private, with 2-3 specific profiles able to read and update. In practice, rows are generated by an external process that uses the API to insert new rows; very few humans update them. This search would be used solely by a small number of internal users; it's not public. The use case is failure analysis: there is a status code on the object, and if the status is "Bad", a number of other fields provide details on why, including the identity of the user who (programmatically) created the row. My use case is to see which users are most responsible for which types of user error. Effectively, I'd be using Coveo search as an analytic front end. A couple of times each month I'd look at search results, but it would be over a ton of data.
  2. There are literally thousands of users in the org, but most have no access to the object in question. We're talking about fewer than 10 users ever using this search, and we could find mechanisms outside of Coveo security to restrict access to that short list of authorized users.

Answer by Gauthier Robe, Dec 9, 2016 9:11 AM


Regarding your question about "how much data is too much": today we have a limit of around 30 million items in our cloud index, and we are planning to raise that limit to 60 million in the late Q1 to Q2 2017 timeframe. If you are looking to add 6M items every month, you should be able to keep that pace for 10 months or so.
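The capacity arithmetic works out as follows, assuming the index starts empty (a minimal sketch; the function name is mine, not a Coveo API):

```python
def months_until_capacity(index_limit, items_per_month, current_items=0):
    """Whole months of ingestion that fit under the index item limit."""
    return (index_limit - current_items) // items_per_month

print(months_until_capacity(30_000_000, 6_000_000))  # → 5, at today's 30M limit
print(months_until_capacity(60_000_000, 6_000_000))  # → 10, at the raised 60M limit
```

The "10 months or so" figure corresponds to the planned 60M limit; at today's 30M limit it would be about 5 months.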

Note that we are also evaluating other solutions that would raise the item limit much higher, but I don't have a committed timeline to share. We will keep you updated on our progress.

Hope this helps.
