Question about wildcard searches
When I perform a Coveo search for rm*,
the search returns items containing rm by itself first, ahead of any item that contains rm plus more characters. There are pages and pages of results with rm before any with rma (for example) are shown.
Why are the items that contain only the first part ranked higher than the other items? It does not make any sense to me.
The index tends to give more weight to items that match exactly what was entered than to items that match a derivative. For example, when searching for "universe", a document containing this exact keyword will rank higher than one containing "universities", even though both documents match the query because of stemming. Something similar is probably happening here, although it involves wildcards instead of stemming. Still, in my opinion it makes sense to favor keywords that are closer to the search string.
Assuming that the index is configured to return all the candidates for a wildcard expression (this is a setting, since some expressions could return the whole lexicon), the results will be ranked using all the words that match the expression. For example, if "rm*" matches [rm, rma, rm12, rms, rmve], documents will be ranked as if the query were actually: rm OR rma OR rm12 OR rms OR rmve.
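To illustrate the expansion described above, here is a minimal Python sketch. It is not Coveo's implementation: the lexicon contents and the function names expand_wildcard and to_or_query are made up for the example, and the matching is done with Python's standard fnmatch module.

```python
from fnmatch import fnmatchcase

def expand_wildcard(pattern, lexicon):
    """Return the lexicon terms matching the wildcard pattern, in lexicon order."""
    return [term for term in lexicon if fnmatchcase(term, pattern)]

def to_or_query(terms):
    """Join the expanded terms into a single OR query string."""
    return " OR ".join(terms)

# Hypothetical lexicon; a real index lexicon would be far larger.
lexicon = ["rm", "rma", "rm12", "rms", "rmve", "arm", "room"]

terms = expand_wildcard("rm*", lexicon)
query = to_or_query(terms)
print(query)  # rm OR rma OR rm12 OR rms OR rmve
```

Note that "arm" and "room" are left out because the pattern is anchored at the start of the term. After the expansion, nothing in the query records that "rm" was the literal prefix typed by the user.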
So ranking does not use the length of the keywords or their closeness to the original wildcard expression. Instead, it takes the terms gathered from the expression and ranks the documents using the standard algorithm (match in [title, summary, concepts], Okapi BM25, etc.). So if "rm" matches many document titles and "rma" matches few, documents matching "rm" will be ranked higher.
Does that make more sense?
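As a rough illustration of that point, the toy scorer below ranks documents using only the expanded terms. It is a deliberately simplified stand-in, not the actual ranking algorithm and not BM25: the weights, the score function, and the sample documents are all assumptions made up for this example. What it shows is that every expanded term is treated the same way, so a document wins because of where and how often a term occurs, not because the term is shorter or closer to "rm*".

```python
def score(doc, terms, title_weight=3.0, body_weight=1.0):
    """Toy score: weighted count of expanded-term hits in title and body."""
    wanted = set(terms)
    title_hits = sum(1 for word in doc["title"].lower().split() if word in wanted)
    body_hits = sum(1 for word in doc["body"].lower().split() if word in wanted)
    return title_weight * title_hits + body_weight * body_hits

# Terms produced by expanding "rm*" (taken from the example in the thread).
terms = ["rm", "rma", "rm12", "rms", "rmve"]

# Hypothetical documents for illustration only.
docs = [
    {"id": "a", "title": "rma request form", "body": "how to file a return"},
    {"id": "b", "title": "rm command reference",
     "body": "rm removes files; rm -r removes directories"},
    {"id": "c", "title": "shell basics", "body": "use rma or rms as needed"},
]

ranked = sorted(docs, key=lambda d: score(d, terms), reverse=True)
ids = [d["id"] for d in ranked]
print(ids)  # ['b', 'a', 'c']
```

Document "b" comes first because "rm" appears in its title and twice in its body, while "a" only has a single title match on "rma". If "rma" were the term with more title and body matches, the order would flip.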