I'm experimenting with Encog (Java) to build a regression model on a very large training dataset.
In production, the dataset will be close to 70 million records per day. I understand that Encog makes good use of multiple cores (from the documentation and from a few tests I ran). What I'd like to understand is how to retrain the model every day as new data arrives. Say I get 70M records on Day 1, another 70M on Day 2, and so on. Can I update the existing model by training it on just the current day's data? I mean updating the existing model, not replacing it.
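To make the "update, not replace" idea concrete, here is a minimal plain-Java sketch of what I have in mind. It deliberately does not use any Encog API: `IncrementalRegression`, its `update` method, and the synthetic data are hypothetical names made up for illustration, and the model is just a linear regressor trained with stochastic gradient descent. The only point is that each day's training starts from the weights left by the previous day instead of from scratch.

```java
import java.util.Random;

// Hypothetical illustration only; this is NOT Encog code.
class IncrementalRegression {
    private final double[] weights;
    private double bias;
    private final double learningRate;

    IncrementalRegression(int nFeatures, double learningRate) {
        this.weights = new double[nFeatures]; // starts at zero
        this.learningRate = learningRate;
    }

    double predict(double[] x) {
        double y = bias;
        for (int i = 0; i < weights.length; i++) {
            y += weights[i] * x[i];
        }
        return y;
    }

    // Continues training from the CURRENT weights using only the given batch.
    void update(double[][] inputs, double[] targets, int epochs) {
        for (int e = 0; e < epochs; e++) {
            for (int n = 0; n < inputs.length; n++) {
                double error = predict(inputs[n]) - targets[n];
                for (int i = 0; i < weights.length; i++) {
                    weights[i] -= learningRate * error * inputs[n][i];
                }
                bias -= learningRate * error;
            }
        }
    }

    double meanSquaredError(double[][] inputs, double[] targets) {
        double sum = 0.0;
        for (int n = 0; n < inputs.length; n++) {
            double error = predict(inputs[n]) - targets[n];
            sum += error * error;
        }
        return sum / inputs.length;
    }

    // Synthetic "daily batch": inputs in [0, 1).
    static double[][] makeInputs(Random rng, int n) {
        double[][] x = new double[n][1];
        for (int i = 0; i < n; i++) {
            x[i][0] = rng.nextDouble();
        }
        return x;
    }

    // Targets follow y = 2x + 1 (an assumption made for this demo).
    static double[] makeTargets(double[][] x) {
        double[] y = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            y[i] = 2.0 * x[i][0] + 1.0;
        }
        return y;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        IncrementalRegression model = new IncrementalRegression(1, 0.05);

        // Day 1: train on the first batch only.
        double[][] day1X = makeInputs(rng, 1000);
        double[] day1Y = makeTargets(day1X);
        model.update(day1X, day1Y, 5);
        System.out.println("MSE after day 1: " + model.meanSquaredError(day1X, day1Y));

        // Day 2: same model object, new batch only; day 1 data is not reloaded.
        double[][] day2X = makeInputs(rng, 1000);
        double[] day2Y = makeTargets(day2X);
        model.update(day2X, day2Y, 5);
        System.out.println("MSE after day 2: " + model.meanSquaredError(day2X, day2Y));
    }
}
```

What I'm asking is whether Encog supports this pattern for its networks: load yesterday's trained model, train it further on today's 70M records, and save it again.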
Also, my understanding is that a model can only be trained on a single machine (there is no distributed processing as in Spark ML etc.). Is this correct?
I'm also curious how people in the industry use Encog, and how they deal with a similar case.