Optimize SPARQL Query


I have a query that finds users with similar taste in movies: two users match on a genre if the absolute difference between their average ratings for movies in that genre is less than 1:

SELECT ?p ?p1 ?genre
WHERE {
  ?p movies:hasRated ?rate .
  ?p1 foaf:knows ?p .
  ?rate movies:ratedMovie ?mov .
  ?rate movies:hasRating ?rating .
  ?mov movies:hasGenre ?genre .
  ?p1 movies:hasRated ?ratep1 .
  ?ratep1 movies:ratedMovie ?movp1 .
  ?ratep1 movies:hasRating ?ratingp1 .
  ?movp1 movies:hasGenre ?genre .
  FILTER (?p = movies:user1)
}
GROUP BY ?p ?p1 ?genre
HAVING (abs(AVG(?rating) - AVG(?ratingp1)) < 1.0)

Is it possible to optimize it? It looks rather bad.

Here is part of the dataset it will be run against:

movies:Man_of_steel movies:hasGenre "action", "thriller" .

movies:Elysium movies:hasGenre "drama", "sci-fi" .

movies:Gravity movies:hasGenre "sci-fi", "drama" .

movies:Django_Unchained movies:hasGenre "thriller", "action" .

movies:user1 movies:hasGender "male" ;
             movies:hasAge "30"^^xsd:float ;
             movies:hasRated movies:Rating1, movies:Rating2 .

movies:Rating1 movies:ratedMovie movies:Gravity ;
               movies:hasRating "4.0"^^xsd:float .

movies:Rating2 movies:ratedMovie movies:Django_Unchained ;
               movies:hasRating "9.0"^^xsd:float .

movies:user2 movies:hasGender "female" ;
             movies:hasAge "27"^^xsd:float ;
             movies:hasRated movies:Rating3, movies:Rating4 ;
             foaf:knows movies:user1 .

movies:Rating3 movies:ratedMovie movies:Elysium ;
               movies:hasRating "3.0"^^xsd:float .

movies:Rating4 movies:ratedMovie movies:Gravity ;
               movies:hasRating "5.0"^^xsd:float .
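To see which numbers the HAVING clause actually compares, a diagnostic query can list each user's average rating per genre. This is a sketch only: the question does not show the IRI behind the `movies:` prefix, so `http://example.org/movies#` below is a hypothetical placeholder that has to be replaced with the dataset's real namespace.

```sparql
# Hypothetical namespace IRI: the question does not show the real one
PREFIX movies: <http://example.org/movies#>

# Each user's average rating per genre: the per-group values the HAVING clause compares
SELECT ?user ?genre (AVG(?rating) AS ?avgRating)
WHERE {
  ?user movies:hasRated ?rate .
  ?rate movies:ratedMovie ?mov ;
        movies:hasRating ?rating .
  ?mov  movies:hasGenre ?genre .
}
GROUP BY ?user ?genre
ORDER BY ?user ?genre
```

On the sample data this should give user1 an average of 4.0 for drama and sci-fi and 9.0 for action and thriller, and user2 4.0 for drama and sci-fi (the mean of 3.0 and 5.0, since Elysium and Gravity both carry those two genres). The differences for drama and sci-fi are 0, so the original query would be expected to return user1/user2 for those two genres, while action and thriller drop out because user2 has not rated any movie in them.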

There are 2 answers

Jeen Broekstra (best answer)

A slight variation on Joshua's query that should work on your Sesame database (an older version, 2.7.8, which contains a bug in property path evaluation):

SELECT ?p ?p1 ?genre WHERE {

  ?p  movies:hasRated [ movies:ratedMovie [ movies:hasGenre ?genre ];
                        movies:hasRating ?rating ].

  ?p1 foaf:knows ?p ;
      movies:hasRated [ movies:ratedMovie [ movies:hasGenre ?genre ];
                        movies:hasRating ?ratingp1 ].
  FILTER (?p = movies:user1 )
}
GROUP BY ?p ?p1 ?genre
HAVING (abs (AVG(?rating)-AVG(?ratingp1))<1.0)

As you can see, this is similar to Joshua's query, except that it uses a further nested blank node instead of a property path, and it does not use a VALUES clause (which also has a bug in 2.7.8).
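Going one step further in the same direction, the fixed user can be written straight into the triple patterns, so the query needs neither the FILTER nor a VALUES clause. This is an untested sketch (in particular, not checked against Sesame 2.7.8): the `movies:` namespace IRI is a hypothetical placeholder, and `?p` is dropped from the results because it would always be `movies:user1`.

```sparql
# Hypothetical namespace IRI: the question does not show the real one
PREFIX movies: <http://example.org/movies#>
PREFIX foaf:   <http://xmlns.com/foaf/0.1/>

# user1 is a constant in the patterns, so no FILTER or VALUES is needed to select it
SELECT ?p1 ?genre WHERE {
  movies:user1 movies:hasRated [ movies:ratedMovie [ movies:hasGenre ?genre ] ;
                                 movies:hasRating ?rating ] .

  ?p1 foaf:knows movies:user1 ;
      movies:hasRated [ movies:ratedMovie [ movies:hasGenre ?genre ] ;
                        movies:hasRating ?ratingp1 ] .
}
GROUP BY ?p1 ?genre
HAVING (abs(AVG(?rating) - AVG(?ratingp1)) < 1.0)
```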

I would really recommend updating your Sesame database, though: 2.7.8 was released in 2013, and we've fixed a ton of bugs since then (not to mention significantly improving the query editor in the Workbench, which now has nice colors and autocomplete features).

Joshua Taylor

I don't see that your query is particularly poorly optimized, but since you mention that it looks bad, I expect you're asking about formatting. It's OK as it is, but you could remove a few of the variables and use blank nodes and property paths instead. E.g.:

SELECT ?p ?p1 ?genre WHERE {
  values ?p { movies:user1 }

  ?p  movies:hasRated [ movies:ratedMovie/movies:hasGenre ?genre ;
                        movies:hasRating ?rating ].

  ?p1 foaf:knows ?p ;
      movies:hasRated [ movies:ratedMovie/movies:hasGenre ?genre ;
                        movies:hasRating ?ratingp1 ].
}
GROUP BY ?p ?p1 ?genre
HAVING (abs (AVG(?rating)-AVG(?ratingp1))<1.0)
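On the optimization side, one thing the query does is worth knowing: inside each (?p, ?p1, ?genre) group it joins every rating of ?p in that genre with every rating of ?p1 in that genre, so a group holds n × m rows when the two users have n and m matching ratings. The averages are unaffected (each of ?p's ratings is repeated m times and each of ?p1's n times), but those rows still have to be produced. The sketch here instead averages each side separately in a subquery and only then joins the two results on ?genre. Whether it actually runs faster depends on the engine; the `movies:` namespace IRI is a hypothetical placeholder, and the patterns use nested blank nodes rather than property paths because of the Sesame 2.7.8 bug mentioned in the other answer.

```sparql
# Hypothetical namespace IRI: the question does not show the real one
PREFIX movies: <http://example.org/movies#>
PREFIX foaf:   <http://xmlns.com/foaf/0.1/>

SELECT ?p ?p1 ?genre WHERE {
  # user1's average rating per genre, computed once
  {
    SELECT ?genre (AVG(?rating) AS ?avg)
    WHERE {
      movies:user1 movies:hasRated [ movies:ratedMovie [ movies:hasGenre ?genre ] ;
                                     movies:hasRating ?rating ] .
    }
    GROUP BY ?genre
  }
  # average rating per genre for every user who knows user1
  {
    SELECT ?p1 ?genre (AVG(?ratingp1) AS ?avgp1)
    WHERE {
      ?p1 foaf:knows movies:user1 ;
          movies:hasRated [ movies:ratedMovie [ movies:hasGenre ?genre ] ;
                            movies:hasRating ?ratingp1 ] .
    }
    GROUP BY ?p1 ?genre
  }
  # keep the ?p column of the original result
  BIND (movies:user1 AS ?p)
  # applied to the join of the two subquery results (shared variable: ?genre)
  FILTER (abs(?avg - ?avgp1) < 1.0)
}
```

On the sample data in the question this should return the same rows as the original query: user1 and user2 for the genres drama and sci-fi.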