I have a big Excel file, which I read with the Excel type provider in F#. The rows should be grouped by some column. Processing crashes with an OutOfMemoryException. I am not sure whether the Seq.groupBy call or the Excel type provider is to blame.
To simplify, I use a 3D Point as the row here.
type Point = { x : float; y : float; z : float }

let points =
    seq {
        for x in 1 .. 1000 do
            for y in 1 .. 1000 do
                for z in 1 .. 1000 ->
                    { x = float x; y = float y; z = float z }
    }

let groups = points |> Seq.groupBy (fun point -> point.x)
The rows are already ordered by the grouping column, e.g. 10 points with x = 10, then 20 points with x = 20, and so on. Instead of grouping them, I just need to split the rows into chunks, starting a new chunk whenever the value changes. Is there some way to enumerate the sequence just once and get a sequence of rows split, not grouped, by some column value or some f(row) value?
Let's start with the input:
If we try to evaluate points, we get an OutOfMemoryException:
This might be the same reason that groupBy fails, but I'm not sure. Either way, it tells us that we have to use seq and yield to return the groups lazily, one at a time. So we get this implementation:
It is not the most readable code. It takes each point from the points sequence and prepends it to an accumulator list for as long as the groupBy function keeps returning the same key. When the key changes, the old accumulator list is yielded and a new one is started. Note that, because points are prepended, each accumulator list is in reverse order.
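Since the code itself is not shown here, the following is only a minimal sketch of the approach just described, not the original implementation. The name splitBy and the parameter names key and source are hypothetical, and the sketch makes two choices of its own: it reverses each chunk before yielding it, so rows come out in their original order, and it emits the last chunk once the input is exhausted.

```fsharp
// Hypothetical helper (name and signature are assumptions, not from the answer).
// Splits an already ordered sequence into chunks of consecutive items sharing a key.
// The source is enumerated once, and only the current chunk is held in memory.
let splitBy key (source: seq<'T>) =
    seq {
        // State lives in ref cells created per enumeration of the result.
        let currentKey = ref None
        let acc = ref []
        for item in source do
            let k = key item
            match currentKey.Value with
            | Some prev when prev = k ->
                // Same key: prepend to the accumulator (built in reverse order).
                acc.Value <- item :: acc.Value
            | Some prev ->
                // Key changed: yield the finished chunk, then start a new one.
                yield prev, List.rev acc.Value
                currentKey.Value <- Some k
                acc.Value <- [ item ]
            | None ->
                // First item of the whole sequence.
                currentKey.Value <- Some k
                acc.Value <- [ item ]
        // Emit the final chunk, if the input was not empty.
        match currentKey.Value with
        | Some prev -> yield prev, List.rev acc.Value
        | None -> ()
    }

// Small usage example on an already ordered list.
let chunks = [ 1; 1; 2; 2; 2; 3 ] |> splitBy id |> List.ofSeq
// chunks = [(1, [1; 1]); (2, [2; 2; 2]); (3, [3])]
```

Unlike Seq.groupBy, this only compares each item with the previous key, so it gives the expected result only when equal keys are adjacent in the input, as they are in the question.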
Testing the function:
It terminates without an exception (after some time).
Another implementation, with a bug fix and better formatting: