TechQA.

pandas HDFStore: SQL "GROUP BY" equivalent

Asked by ARF on 17 November 2014 at 11:52

I am trying to translate the following SQL query to run on a large pandas HDFStore:

SELECT * FROM mytable
JOIN (
  SELECT col1, col2, col3, max(colN) as maxColN
  FROM mytable
  GROUP BY col1, col2, col3
) m
ON m.col1=mytable.col1 AND m.col2=mytable.col2 AND m.col3=mytable.col3
WHERE colN=maxColN

What would be the best way to implement this? I have indexes on col1, col2, col3.
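
For reference, one possible way to express the query is a two-pass scan over chunks: the first pass keeps a running max(colN) per (col1, col2, col3) group, and the second pass joins each chunk against those maxima and keeps the rows where colN equals the group maximum. This is only a sketch under assumptions: the helper names (group_max, rows_at_group_max, make_chunks) are hypothetical, the table is assumed to be stored in "table" format so that store.select(..., chunksize=...) can iterate over it, and the per-group maxima and the matching rows are assumed to fit in memory. It scans the whole table twice and does not use the indexes on col1, col2, col3.

```python
import pandas as pd


def group_max(chunks, keys, value):
    """Pass 1: running maximum of `value` per `keys` group across all chunks."""
    running = None
    for chunk in chunks:
        part = chunk.groupby(keys)[value].max()
        if running is None:
            running = part
        else:
            # Combine the new per-chunk maxima with the maxima seen so far.
            running = pd.concat([running, part]).groupby(level=keys).max()
    if running is None:
        raise ValueError("no chunks were provided")
    return running.rename("max_" + value).reset_index()


def rows_at_group_max(make_chunks, keys, value):
    """Pass 2: keep the rows whose `value` equals the maximum of their group.

    `make_chunks` is a hypothetical zero-argument callable that returns a
    fresh iterator of DataFrames each time it is called (the data is read twice).
    """
    maxima = group_max(make_chunks(), keys, value)
    hits = []
    for chunk in make_chunks():
        merged = chunk.merge(maxima, on=keys, how="inner")
        hits.append(merged[merged[value] == merged["max_" + value]])
    return pd.concat(hits, ignore_index=True)


# Assumed usage with an HDFStore (file name and chunk size are placeholders):
#
# with pd.HDFStore("data.h5", mode="r") as store:
#     result = rows_at_group_max(
#         lambda: store.select("mytable", chunksize=500_000),
#         keys=["col1", "col2", "col3"],
#         value="colN",
#     )
```

As in the SQL version, the result carries the group maximum as an extra column (here named max_colN). If the first pass is the bottleneck, passing columns=["col1", "col2", "col3", "colN"] to store.select for that pass may reduce the amount of data read.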

There are 0 answers