Streaming big data while sorting


I have a huge data set that I cannot hold in memory all at once, so I keep getting out-of-memory errors. One obvious solution would be to use streaming in Node.js, but as far as I know streaming is not possible with sorting, which is one of the operations I apply to my data. Is there an algorithm, perhaps a divide-and-conquer one, that I can use to combine streaming and sorting?


There is 1 answer

Alexander Patrikalakis

You can stream the data through Kinesis and either consume it with the Kinesis Client Library or subscribe a Lambda function to your Kinesis stream, incrementally maintaining sorted materialized views as records arrive. Where you store those views and how you divide your data will depend on your application. If you cannot store an entire sorted view, you could keep rolling views instead. If your data is a time series, or has some other natural order, you could divide the range of the ordered attribute into chunks and keep, for example, 1-day or 1-hour sorted chunks of your data. In other words, choose a subdivision small enough that each sorted chunk fits in memory when you need it.
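To make the chunking idea concrete, here is a minimal Node.js sketch. It is not a Kinesis or Lambda example: it assumes each record has a numeric `ts` field (milliseconds), uses newline-delimited JSON files in a temporary directory as the chunk store, and the function names (`partitionByTime`, `readSorted`, `sortByTimeChunks`) are made up for illustration. The first pass routes each record to the file for its 1-hour range. The second pass sorts one chunk at a time, so only a single chunk has to fit in memory.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

const HOUR_MS = 60 * 60 * 1000;

// Pass 1: append each record to the file for its time range.
// Only one record is held in memory at a time.
// Assumption: every record has a numeric `ts` field in milliseconds.
function partitionByTime(records, dir, chunkMs = HOUR_MS) {
  const buckets = new Set();
  for (const rec of records) {
    const bucket = Math.floor(rec.ts / chunkMs);
    const file = path.join(dir, `${bucket}.ndjson`);
    fs.appendFileSync(file, JSON.stringify(rec) + "\n");
    buckets.add(bucket);
  }
  // Bucket ids are numbers, so compare numerically, not as strings.
  return [...buckets].sort((a, b) => a - b);
}

// Pass 2: load one chunk, sort it in memory, emit it, move on.
// Chunks cover disjoint, increasing ranges of `ts`, so emitting them
// in bucket order yields a globally sorted sequence.
function* readSorted(dir, buckets) {
  for (const bucket of buckets) {
    const text = fs.readFileSync(path.join(dir, `${bucket}.ndjson`), "utf8");
    const chunk = text
      .split("\n")
      .filter(Boolean)
      .map((line) => JSON.parse(line));
    chunk.sort((a, b) => a.ts - b.ts);
    yield* chunk;
  }
}

// Ties both passes together and removes the temporary files afterwards.
function* sortByTimeChunks(records, chunkMs = HOUR_MS) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), "chunks-"));
  try {
    const buckets = partitionByTime(records, dir, chunkMs);
    yield* readSorted(dir, buckets);
  } finally {
    fs.rmSync(dir, { recursive: true, force: true });
  }
}
```

`sortByTimeChunks` accepts any iterable of records and returns a generator, so the sorted output can itself be consumed one record at a time. The synchronous `fs` calls keep the sketch short. In a real pipeline the records would more likely arrive from a stream, and the chunk size would be tuned so that the busiest hour (or day) still fits in memory.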