Crawl a Subversion site exposed over WebDAV using HTTP
I would like to understand whether there is a way to index a Subversion repository that currently houses various types of documents (Word, PDF, Excel, TXT, ZIP, etc.).
Since SVN (Subversion) itself can be exposed over WebDAV using the HTTP protocol, can Coveo crawl this site, following the links as deep as possible, and create an index?
I would then go ahead and build a web front end that searches the index and links back to the SVN repository.
Thanks for your help
If the SVN repository is accessible over HTTP (i.e., in the browser), you can use the Web Crawler to index the content. However, you must set up the source so that index pages are filtered out. For instance, you can crawl this repository (which I found on Google and which seems to be public).
The Web Crawler will follow the hyperlinks as it would on any website and download the binary files, depending on the Document type settings on the source.
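To illustrate what "following the hyperlinks" means on an SVN listing page, here is a rough Python sketch. It is not how the Web Crawler is implemented; it only assumes that the listing is plain HTML with one link per entry and that folder links end with a slash. The host svn.example.com, the sample page and the names LinkCollector and split_listing are made up for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects absolute URLs from every <a href="..."> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the listing page URL.
                self.links.append(urljoin(self.base_url, value))


def split_listing(base_url, html):
    """Return (folders to crawl next, files to download) for one listing page."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    # Assumption: folders end with "/"; skip the parent link and the page itself.
    folders = [u for u in collector.links
               if u.endswith("/") and u.startswith(base_url) and u != base_url]
    files = [u for u in collector.links if not u.endswith("/")]
    return folders, files


# Hypothetical listing page, similar in shape to what SVN serves over HTTP.
SAMPLE_LISTING = """
<html><body><ul>
<li><a href="../">..</a></li>
<li><a href="specs/">specs/</a></li>
<li><a href="report.pdf">report.pdf</a></li>
</ul></body></html>
"""
BASE = "http://svn.example.com/repo/docs/"

folders, files = split_listing(BASE, SAMPLE_LISTING)
print(folders)
print(files)
```

The folder URLs would be fetched in turn to discover deeper content, while the file URLs are the documents worth indexing.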
Then, you can add a source filter to remove pages without extensions, so that only binary files are indexed and "index" pages are removed. Remember to check the "Expand before filtering" checkbox in the UI!
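If it helps to see the "no extension" rule spelled out, here is a small Python sketch of the same idea. It is only an illustration of the rule, not the source filter syntax used in the UI; the function name has_file_extension and the example URLs are hypothetical.

```python
import posixpath
from urllib.parse import urlsplit


def has_file_extension(url):
    """True if the last path segment of the URL ends in an extension."""
    path = urlsplit(url).path          # ignore query string and fragment
    name = posixpath.basename(path)    # "" for folder URLs ending in "/"
    _, ext = posixpath.splitext(name)
    return ext != ""


# Hypothetical URLs: two listing pages and two documents.
urls = [
    "http://svn.example.com/repo/docs/",
    "http://svn.example.com/repo/docs/specs",
    "http://svn.example.com/repo/docs/report.pdf",
    "http://svn.example.com/repo/docs/specs/budget.xls?rev=12",
]
kept = [u for u in urls if has_file_extension(u)]
print(kept)
```

Note that a rule like this also drops real files that have no extension (a README, for example), so check whether the repository contains any before relying on it.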