Question by whyy, Oct 31, 2014 10:47 PM

Will there be a @sysconcepts field in every document indexed?

Will there be a @sysconcepts field in every document indexed? I just noticed that one of the documents I indexed has no @sysconcepts. What does that mean?

I am using @sysconcepts to retrieve related results. If a document has no @sysconcepts, are there any suggestions for what else I can use?

I am using the web connector.

Thanks.

1 Reply

Answer by Martin Laporte, Nov 2, 2014 9:19 AM

All documents in the index will have a @sysconcepts field, provided that the document contains textual data from which such concepts can be extracted. This happens no matter what connector is being used. If some of your documents have no @sysconcepts, I suspect they simply contain no relevant text from which the concept extractor can do its job.

Comment by whyy, Nov 2, 2014 7:12 PM

We are currently using the keywords extracted from @sysconcepts to retrieve related results. If a document can return zero @sysconcepts, what other ways can I use to retrieve related results?

Will text analytics be a better way of getting keywords for related results?

Comment by whyy, Nov 2, 2014 11:04 PM

I understand that the web connector doesn't work very well if the whole page is rendered with JavaScript. The page that we indexed with no @sysconcepts uses Storify (JavaScript). Do you think that is the cause?

http://www.thestar.com.my/News/Nation/2014/10/28/Storify-Anwar-Verdict/

Comment by Martin Laporte, Nov 3, 2014 6:23 AM

Yes, from what I see the HTML doesn't contain any part of the article body. The web connector doesn't evaluate JavaScript. One possibility would be to look for a user agent that causes the site to return the article in a bot-friendly fashion. Text analytics works on the same body as the one retrieved by the web connector, so it wouldn't fare better here.
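As a rough way to check this for any page, the sketch below extracts the text that is present in the raw HTML without running any scripts, which approximates what a crawler that does not evaluate JavaScript would see. It uses only the Python standard library. The `StaticTextExtractor` class, the `static_text` helper, and the sample markup are illustrative assumptions and are not part of the Coveo web connector.

```python
from html.parser import HTMLParser


class StaticTextExtractor(HTMLParser):
    """Collects text found in the raw HTML, skipping <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def static_text(html):
    """Return the text available without executing any JavaScript."""
    parser = StaticTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page whose article body is only produced by a script.
sample = """
<html><head><title>Storify page</title>
<script>document.write("Article body rendered by JavaScript");</script>
</head><body><div id="story"></div></body></html>
"""

print(static_text(sample))
```

If the output contains little more than the page title, the article body is most likely injected by script and will not be available to a crawler that reads only the static HTML. To try different user agents against a live page, the HTML could be fetched with `urllib.request.Request(url, headers={"User-Agent": ...})` and passed to `static_text`.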

Comment by whyy, Nov 3, 2014 8:04 AM

Is there a way to crawl a JavaScript page?

Comment by Simon, Nov 3, 2014 5:21 PM

No, not for now, but the R&D team is working on an enhancement to the web connector. There is no ETA, however.

Comment by whyy, Nov 4, 2014 7:38 PM

What if it's a JavaScript page but the content is added to the meta fields? Will that help?

Comment by Martin Laporte, Nov 5, 2014 5:18 AM

Hmm, that should work for making the content searchable: you can arrange for a field to contain the content of a <meta> tag (using the field's "meta" property in the admin tool). That field will then be populated using the meta value from the HTML. If that field has the "free text" option selected, the keywords in it will be allowed to match against free text queries.

But this won't impact the extracted @sysconcepts (as far as I know). Those are computed from the document body.
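To make the "meta value from the HTML" concrete, here is a minimal sketch of reading `name`/`content` pairs from `<meta>` tags with the Python standard library. It only illustrates which values are present in the static HTML; the `MetaExtractor` class, the `extract_meta` helper, and the sample markup are hypothetical and do not reflect how the Coveo indexer populates fields internally.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collects the content of <meta name="..." content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        # Ignore meta tags without a name (for example charset declarations).
        if name and content is not None:
            self.meta[name.lower()] = content


def extract_meta(html):
    """Return a dict mapping meta names to their content values."""
    parser = MetaExtractor()
    parser.feed(html)
    parser.close()
    return parser.meta


# Hypothetical page that exposes its summary through meta tags.
sample = """
<html><head>
<meta charset="utf-8">
<meta name="description" content="Summary of the article">
<meta name="keywords" content="news, verdict, court">
</head><body><div id="story"></div></body></html>
"""

print(extract_meta(sample))
```

Because these values sit in the static markup, they are available even when the visible article body is only produced by JavaScript.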

Would it be possible to just include the content in some kind of hidden element in the page?