Storing tweets using Spark in a multicore cluster


I want to store real-time tweets that match some filtering criteria in a MySQL database, and I want to understand which approach is better given that I have a single 16-CPU machine. Since the streaming API is the better fit for my case, I could easily build a Java application using the Twitter4J library; filtering and storing would then be done with multithreaded code. On the other hand, I just discovered Spark, which can do the same in a few lines, but the bottleneck of having only one machine's memory remains.
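A minimal sketch of what the multithreaded variant could look like, assuming a producer/consumer layout with a bounded queue and a fixed pool of workers. Everything named here is hypothetical: `Tweet` stands in for whatever status object the streaming client delivers, `run` replays a fixed list instead of a live stream, and the synchronized list stands in for the MySQL table (in a real application each worker would more likely batch JDBC inserts).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

public class TweetPipelineSketch {

    // Hypothetical stand-in for the tweet object delivered by the streaming client.
    public static final class Tweet {
        public final long id;
        public final String text;

        public Tweet(long id, String text) {
            this.id = id;
            this.text = text;
        }
    }

    // Sentinel that tells a worker to stop (compared by identity).
    private static final Tweet POISON = new Tweet(-1L, "");

    // Filters the incoming tweets on `workers` threads and returns what was "stored".
    public static List<Tweet> run(List<Tweet> incoming, Predicate<Tweet> filter, int workers)
            throws InterruptedException {
        // Bounded queue: the producer blocks if the workers fall behind.
        BlockingQueue<Tweet> queue = new LinkedBlockingQueue<>(10_000);
        // Assumption: this list stands in for the MySQL table.
        List<Tweet> stored = Collections.synchronizedList(new ArrayList<>());

        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            pool.execute(() -> {
                try {
                    while (true) {
                        Tweet t = queue.take();
                        if (t == POISON) {
                            break;
                        }
                        if (filter.test(t)) {
                            stored.add(t); // a real worker would insert into the database here
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        // Producer side: in a real application the stream listener would call queue.put.
        for (Tweet t : incoming) {
            queue.put(t);
        }
        // One sentinel per worker so that every worker terminates.
        for (int i = 0; i < workers; i++) {
            queue.put(POISON);
        }

        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return stored;
    }

    public static void main(String[] args) throws InterruptedException {
        List<Tweet> sample = new ArrayList<>();
        sample.add(new Tweet(1L, "learning spark streaming"));
        sample.add(new Tweet(2L, "nothing relevant here"));
        sample.add(new Tweet(3L, "spark on a single machine"));

        int workers = Runtime.getRuntime().availableProcessors();
        List<Tweet> kept = run(sample, t -> t.text.contains("spark"), workers);
        System.out.println("stored " + kept.size() + " tweets");
    }
}
```

With this layout the worker count can be matched to the available cores. The order in which tweets are stored is not guaranteed, since workers run concurrently.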

I want to understand whether Spark would be a real improvement, given that it is pretty difficult to reach Twitter's rate limit and I can't take advantage of a distributed cluster.

Thanks for helping.

