Restricting Web Pages refresh
We have a web site that deals with about 45,000 Laserfiche documents. You do not have a connector for Laserfiche, so the only way I can include them in a Web Pages source is to create a separate web page (ListLaserficheDocs.aspx) that lists every document as a link. Each link goes to an ASP.NET page (ViewDoc.ashx), where I retrieve the information and set the Last-Modified HTTP header. The problem is that a refresh takes 18+ hours to complete and we want it to run daily. It takes this long because we have to retrieve each document from Laserfiche.
I tried setting only the Last-Modified header without retrieving the document from Laserfiche, but Coveo still marked the document as Updated, and since no content was sent it got back 0 bytes.
Is there a way to set the HTTP header so that Coveo does not reindex a document that has not changed? Does Coveo also check the file size? I can code the page so that it only writes the header information if the file has not been changed in the last x days.
The web crawler first sends a HEAD request and compares the Last-Modified date with the one stored in the index for that document. If the dates differ, it downloads the document; if they are the same, it does not. You will most likely need to run one full refresh that re-downloads all the documents before this starts to work.
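To make the idea concrete, here is a minimal, language-neutral sketch in Python (not the actual ViewDoc.ashx code) of how a handler could answer the HEAD check cheaply and only fetch the document on a real download. It assumes the crawler behaves as described above. The names handle_request, get_modified_time and fetch_document are hypothetical: the first stands for the handler logic, the other two for a cheap metadata lookup and the expensive Laserfiche retrieval. The important parts are that Last-Modified comes from the document's stored modification time (not the current time) and that a GET always returns the full content.

```python
from email.utils import formatdate


def handle_request(method, doc_id, get_modified_time, fetch_document):
    """Return (status, headers, body) for one crawler request.

    get_modified_time(doc_id) -> POSIX timestamp of the last change (cheap).
    fetch_document(doc_id)    -> document bytes (expensive).
    Both callables are hypothetical placeholders for the real lookups.
    """
    # Use the stored modification time so the value stays identical
    # between refreshes as long as the document has not changed.
    modified = get_modified_time(doc_id)
    headers = {"Last-Modified": formatdate(modified, usegmt=True)}

    if method == "HEAD":
        # Headers only: the expensive retrieval is skipped entirely.
        return 200, headers, b""

    # A GET means the crawler wants the content, so send all of it.
    body = fetch_document(doc_id)
    headers["Content-Length"] = str(len(body))
    return 200, headers, body


if __name__ == "__main__":
    # Stub lookups, for illustration only.
    status, headers, body = handle_request(
        "HEAD", "42", lambda _id: 1700000000, lambda _id: b"document bytes"
    )
    print(status, headers, len(body))
```

With this split, an unchanged document costs one metadata lookup per refresh instead of a full retrieval. Returning an empty body on a GET, as in the earlier attempt, would still leave the crawler with 0 bytes whenever it decides to download.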
Please let us know if this was helpful.