How to make this keyword search work?


I am trying to create a keyword search for my Meteor web app. For the most part it works; the problem is that it is very slow. In its current form, when a user creates an article they give it keywords. keyS queries MongoDB for one article at a time that has a keyword from the search array (skeywords), gives it a score, and then the 100 highest-scored articles are sent to the user. How could it query all the relevant articles at once?

P.S. Am I going about this all wrong?

The data coming from the client looks like this.

var keyw = ['java','code','jdk','food','good','cook'];
Meteor.call('keyS',keyw);

The result of 'keyS' is an array of article ids, which the method adds to the user's Sarticles field.

For example:

Sarticles = [someid,someid]

server

Meteor.methods({
    keyS: function(skeywords) {
        var score = {
            article: 'tempid',
            totalScore: 0
        };
        var potentials = [];
        var badArticles = []; // ids already looked at, excluded from later queries
        var i = 0;
        while (i < skeywords.length) {
            var key = [skeywords[i]];
            console.log(key);
            var theArticle;
            if (badArticles.length === 0) {
                theArticle = Articles.findOne({
                    articlekeywords: {
                        $in: key
                    }
                });
            } else {
                theArticle = Articles.findOne({
                    $and: [{
                        articlekeywords: {
                            $in: key
                        }
                    }, {
                        _id: {
                            $nin: badArticles
                        }
                    }]
                });
            }
            // findOne returns undefined when nothing matches: move on to the next keyword.
            if (typeof theArticle == "undefined") {
                console.log("no more articles with that keyword");
                i++;
                continue;
            }
            score.article = theArticle._id;
            console.log(score.article);
            // Score = how many of the search keywords this article is tagged with.
            var theKeywords = theArticle.articlekeywords;
            var points = 0;
            for (var a = 0; a < skeywords.length; a++) {
                if (theKeywords.indexOf(skeywords[a]) > -1) {
                    points++;
                }
            }
            score.totalScore = points;
            console.log(score.totalScore);
            //limiter on number of posts looked at and number added to potentials
            if (score.totalScore > 2) {
                potentials.push({
                    iD: score.article,
                    totalScore: score.totalScore
                });
                console.log("added to potentials: " + score.article);
            } else {
                console.log("marked as bad: " + score.article);
            }
            // Either way, never fetch this article again.
            badArticles.push(score.article);
        }
        potentials.sort(function(a, b) {
            return b.totalScore - a.totalScore;
        });
        // Store at most the 100 best ids on the current user.
        for (var b = 0; b < potentials.length && b < 100; b++) {
            Meteor.users.update({
                "_id": this.userId
            }, {
                "$addToSet": {
                    "Sarticles": potentials[b].iD
                }
            });
        }
    }
});
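To the question of fetching all relevant articles at once: MongoDB's $in operator matches a document if its array field contains any of the listed values, so a single query with the whole skeywords array returns every candidate, and the scoring can then happen in memory. The sketch below assumes the articles have already been fetched that way; rankArticles, minScore and limit are hypothetical names, and the scoring rule is the same one keyS uses (one point per search keyword the article has).

```javascript
// Hypothetical helper (not part of Meteor): ranks articles that were all
// fetched by ONE query, e.g. on the server:
//   var articles = Articles.find({ articlekeywords: { $in: skeywords } }).fetch();
// Returns the ids of the best `limit` articles whose score exceeds `minScore`.
function rankArticles(articles, skeywords, minScore, limit) {
  var scored = [];
  articles.forEach(function (article) {
    // Same rule as keyS: one point per search keyword the article is tagged with.
    var have = new Set(article.articlekeywords || []);
    var points = 0;
    skeywords.forEach(function (k) {
      if (have.has(k)) points++;
    });
    if (points > minScore) {
      scored.push({ iD: article._id, totalScore: points });
    }
  });
  // Highest score first, then keep only the top `limit` ids.
  scored.sort(function (a, b) { return b.totalScore - a.totalScore; });
  return scored.slice(0, limit).map(function (s) { return s.iD; });
}
```

Inside keyS the result could then be saved with one update instead of up to 100, because $addToSet accepts an $each list: Meteor.users.update({_id: this.userId}, {$addToSet: {Sarticles: {$each: ids}}}).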

Answer by Crenshinibon:

I guess the problem is the server round-trip. For a better user experience, you should publish/subscribe the keyword list, i.e. make it available on the client, and then search on the client side.

Keep in mind that the keyword list might grow very long. In my search package (Spomet, not 1.0-ready, though) I publish only the 1000 most frequently used words (excluding the most common stop words, such as 'and').

My code isn't very tidy, but it might help nevertheless:

Here is the client-side handling. It searches on the client and then later updates the client-side results with the real results from the server: https://github.com/Crenshinibon/spomet-pkg/blob/master/client.coffee

Here is the server-side code. The publishing happens close to the end of this file: https://github.com/Crenshinibon/spomet-pkg/blob/master/server.coffee
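As a rough illustration of publishing only the most frequently used words, here is a sketch of how such a list might be chosen. It is not Spomet's code; topKeywords and STOP_WORDS are hypothetical names, the stop-word list is a tiny placeholder, and it assumes articles carry an articlekeywords array as in the question.

```javascript
// Hypothetical sketch (not Spomet's code): choose the keywords worth
// publishing by counting how many articles use each one, skipping stop
// words, and keeping the `n` most frequent.
var STOP_WORDS = new Set(['and', 'or', 'the', 'a', 'an', 'of', 'to', 'in']);

function topKeywords(articles, n) {
  var counts = new Map();
  articles.forEach(function (article) {
    // A Set, so an article listing a keyword twice still counts once.
    new Set(article.articlekeywords || []).forEach(function (word) {
      if (STOP_WORDS.has(word)) return;
      counts.set(word, (counts.get(word) || 0) + 1);
    });
  });
  return Array.from(counts.entries())
    // Most used first; alphabetical order breaks ties.
    .sort(function (a, b) {
      return b[1] - a[1] || a[0].localeCompare(b[0]);
    })
    .slice(0, n)
    .map(function (entry) { return entry[0]; });
}
```

The resulting list (for example with n = 1000) is what would be sent to clients through publish/subscribe, so that the search itself can run locally.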

One other aspect: you might consider reversing the data representation for your keywords. Use the keywords as the lookup key (in a separate collection) and store, in an array, the ids of the articles where that keyword is used. Search Wikipedia for 'Inverted Index' for some background.
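A minimal sketch of that inverted index, assuming the question's article shape ({ _id, articlekeywords }). In Meteor it would live in its own collection, e.g. a hypothetical Keywords collection with documents like { _id: 'java', articles: [ids] }; here it is an in-memory Map so the idea runs on its own. buildIndex and search are hypothetical names, and the score is the same "number of matching search keywords" used in keyS.

```javascript
// Inverted index: keyword -> Set of article ids that use that keyword.
function buildIndex(articles) {
  var index = new Map();
  articles.forEach(function (article) {
    (article.articlekeywords || []).forEach(function (k) {
      if (!index.has(k)) index.set(k, new Set());
      index.get(k).add(article._id);
    });
  });
  return index;
}

// One lookup per search keyword; an article's score is the number of
// search keywords whose id set contains it. Returns the top `limit` ids
// with a score above `minScore`, best first.
function search(index, skeywords, minScore, limit) {
  var scores = new Map();
  skeywords.forEach(function (k) {
    var ids = index.get(k);
    if (!ids) return; // keyword not used by any article
    ids.forEach(function (id) {
      scores.set(id, (scores.get(id) || 0) + 1);
    });
  });
  return Array.from(scores.entries())
    .filter(function (e) { return e[1] > minScore; })
    .sort(function (a, b) { return b[1] - a[1]; })
    .slice(0, limit)
    .map(function (e) { return e[0]; });
}
```

With a collection instead of a Map, the server would fetch only the keyword documents it needs (one query with _id: {$in: skeywords}), and each new article would add its id to the matching keyword documents when it is saved.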