Can someone explain, from a data-model point of view, how the flow should work from the point where a user action is tracked to the live feed? My dev team is having issues with this:
When a user performs an activity, the footprint is logged to a user_activity table, which is the master table for all user footprints. This means every action by every user that needs to be tracked gets written here.
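Roughly, the master table looks like this (a minimal sketch in SQLite; the column names are just my working shorthand, not the exact schema):

    import sqlite3

    conn = sqlite3.connect(":memory:")

    # Minimal sketch of the master footprint table: one row per tracked action.
    # The object reference columns discussed in problem 2 below would also live here.
    conn.executescript("""
    CREATE TABLE IF NOT EXISTS user_activity (
        activity_id INTEGER PRIMARY KEY,
        user_id     INTEGER NOT NULL,                 -- who performed the action
        action_type TEXT    NOT NULL,                 -- e.g. 'tag', 'like', 'upload'
        created_at  TEXT    DEFAULT CURRENT_TIMESTAMP
    );
    """)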
Problems:
1) An activity can be 1:M. For example, I can tag 10 people in 1 photo, and obviously I don't want to write 10 separate footprints to the activity table for that. So do I need another table for storing the activity details?
2) This table logs all activity on all objects, and from there it is fed into an activity feed table that drives the activity feed. The feed needs to know all the objects involved in an activity so it can say "X tagged Mark, John, Sarah in Tim's photo.", where Mark, John, Sarah are user objects linking to their profiles, and the photo is a photo object linking to the photo table...
That is just one example; there are many, many object types (Movies, Music, Brands, Cities, etc.). So somehow the system needs to know, between the log table and the activity feed, which object is what and where it lives, so it can pull the relevant data into the feed. To do this I have 2 columns: object_id and object_type_id, where object_type is something like "User", "Photo", "Brand", etc. and object_id is the ID of the object. But how do I connect this with the actual tables? (See the rough sketch below this list for what I have so far.)
3) Lastly, is my design the best way to go from tracked data to a feed? That is: log to a log table, the log table may have a detail table, and the log table joins with a session table. Every 2 minutes a cron job is scheduled to pull this data into an activity feed table, which is denormalized and combines the log data with the object tables for direct reads by the live feed.
- The 2-minute cron job also scares me, because if there are a lot of records the job may take longer than 2 minutes to finish and a backlog will build up. Are there any other methods I can use?
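For reference, here is a rough sketch of the pieces I have so far for problems 2 and 3 (SQLite again; the table and column names are approximations of my real schema). The part I can't work out is how the object_type_id/object_id pair should join back to the actual object tables:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    -- lookup of object types; each row names a kind of object (and, implicitly, its table)
    CREATE TABLE IF NOT EXISTS object_type (
        object_type_id INTEGER PRIMARY KEY,
        name           TEXT NOT NULL          -- 'User', 'Photo', 'Brand', 'City', ...
    );

    -- denormalized feed table the 2-minute cron job is supposed to fill
    CREATE TABLE IF NOT EXISTS activity_feed (
        feed_id        INTEGER PRIMARY KEY,
        actor_user_id  INTEGER NOT NULL,
        object_type_id INTEGER NOT NULL REFERENCES object_type (object_type_id),
        object_id      INTEGER NOT NULL,      -- an id in whichever table object_type points at
        feed_text      TEXT,                  -- e.g. "X tagged Mark, John, Sarah in Tim's photo."
        created_at     TEXT DEFAULT CURRENT_TIMESTAMP
    );
    """)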
Log each of the 10 actions, but add an activitybatchid which is common to all the actions so that you can track which actions occurred together.
I would also write the activitybatchid into a queue table for processing, which is what the cron job would read to add items to the feed table. After processing, the activitybatchid row would be deleted from the queue.
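A minimal sketch of the logging and queueing side, assuming SQLite and invented names (activity_batch_id, activity_queue); the reference to the photo the tags belong to is left out to keep it short:

    import sqlite3
    import uuid

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    -- same master table as before, now carrying the shared batch id
    CREATE TABLE IF NOT EXISTS user_activity (
        activity_id       INTEGER PRIMARY KEY,
        activity_batch_id TEXT    NOT NULL,   -- identical for actions logged together
        user_id           INTEGER NOT NULL,
        action_type       TEXT    NOT NULL,
        object_type_id    INTEGER NOT NULL,
        object_id         INTEGER NOT NULL,
        created_at        TEXT DEFAULT CURRENT_TIMESTAMP
    );

    -- one queue row per batch; the row is deleted once the batch has been processed
    CREATE TABLE IF NOT EXISTS activity_queue (
        activity_batch_id TEXT PRIMARY KEY,
        queued_at         TEXT DEFAULT CURRENT_TIMESTAMP,
        claimed_at        TEXT                -- NULL until a worker locks the row
    );
    """)

    def log_tagging(actor_id, tagged_user_ids, user_type_id=1):
        """Log one footprint per tagged user, all sharing one batch id, then enqueue the batch.
        (The photo the tags belong to would be a second object reference; omitted here.)"""
        batch_id = str(uuid.uuid4())
        with conn:
            for tagged_id in tagged_user_ids:
                conn.execute(
                    "INSERT INTO user_activity "
                    "(activity_batch_id, user_id, action_type, object_type_id, object_id) "
                    "VALUES (?, ?, 'tag', ?, ?)",
                    (batch_id, actor_id, user_type_id, tagged_id),
                )
            conn.execute(
                "INSERT INTO activity_queue (activity_batch_id) VALUES (?)", (batch_id,)
            )
        return batch_id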
I recommend using a recursive cron job in such a situation, which reads one row or a batch of rows at a time and processes them while holding a lock so that no other process can read the table. You can tweak the number of rows processed at a time for performance. Also, in case the process dies, the lock will be released after a certain idle time.
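Something along these lines; again only a sketch, and the claimed_at lock column, batch size, and idle timeout are all made up for illustration (build_feed_entry is sketched in the next snippet):

    from datetime import datetime, timedelta, timezone

    BATCH_SIZE = 50                        # queue rows handled per claim; tune for performance
    LOCK_TIMEOUT = timedelta(minutes=10)   # stale locks are released after this much idle time

    def run_worker_pass(conn):
        """One pass of the queue worker: claim a batch, process it, delete it, and
        repeat until the queue is empty, instead of relying on a fixed 2-minute window."""
        while True:
            now = datetime.now(timezone.utc).isoformat()
            stale_before = (datetime.now(timezone.utc) - LOCK_TIMEOUT).isoformat()
            with conn:  # the transaction doubles as the lock acquisition
                # release locks left behind by a worker that died mid-processing
                conn.execute(
                    "UPDATE activity_queue SET claimed_at = NULL WHERE claimed_at < ?",
                    (stale_before,),
                )
                rows = conn.execute(
                    "SELECT activity_batch_id FROM activity_queue "
                    "WHERE claimed_at IS NULL LIMIT ?",
                    (BATCH_SIZE,),
                ).fetchall()
                if not rows:
                    return  # queue drained, nothing left to do
                batch_ids = [r[0] for r in rows]
                marks = ",".join("?" * len(batch_ids))
                conn.execute(
                    f"UPDATE activity_queue SET claimed_at = ? "
                    f"WHERE activity_batch_id IN ({marks})",
                    [now, *batch_ids],
                )
            for batch_id in batch_ids:
                build_feed_entry(conn, batch_id)  # sketched in the next snippet
                with conn:
                    conn.execute(
                        "DELETE FROM activity_queue WHERE activity_batch_id = ?",
                        (batch_id,),
                    )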
The processing of the activitybatchid will read the relevant data from the activity table to build the necessary feed details, and since this is done once, up front, the application does not need to reassemble it on every read.
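For the "X tagged Mark, John, Sarah in Tim's photo" part, the object_type is what tells the processor which table to read display data from. A rough sketch, assuming a hard-coded mapping from type name to table and columns (all names here are invented, and it assumes the tables from the earlier sketches exist):

    # Each object type maps to the table and columns needed to render it in the feed.
    # These table/column names are assumptions for illustration, not the real schema.
    OBJECT_SOURCES = {
        "User":  ("users",  "user_id",  "display_name"),
        "Photo": ("photos", "photo_id", "title"),
        "Brand": ("brands", "brand_id", "name"),
    }
    ACTION_PHRASES = {"tag": "tagged", "like": "liked"}

    def object_label(conn, type_name, object_id):
        """Resolve an (object type, object id) pair to display text via the mapping above.
        Table/column names come from the fixed mapping, never from user input."""
        table, pk, label_col = OBJECT_SOURCES[type_name]
        row = conn.execute(
            f"SELECT {label_col} FROM {table} WHERE {pk} = ?", (object_id,)
        ).fetchone()
        return row[0] if row else f"[deleted {type_name.lower()} {object_id}]"

    def build_feed_entry(conn, batch_id):
        """Turn one activity batch into a single denormalized feed row."""
        rows = conn.execute(
            "SELECT a.user_id, a.action_type, a.object_type_id, t.name, a.object_id "
            "FROM user_activity a JOIN object_type t ON t.object_type_id = a.object_type_id "
            "WHERE a.activity_batch_id = ?",
            (batch_id,),
        ).fetchall()
        if not rows:
            return
        actor_id, action, first_type_id, _, first_obj_id = rows[0]
        actor = object_label(conn, "User", actor_id)
        targets = ", ".join(object_label(conn, tname, oid) for _, _, _, tname, oid in rows)
        text = f"{actor} {ACTION_PHRASES.get(action, action)} {targets}"
        # the first object is stored as the feed row's primary object; the rest live in the text
        conn.execute(
            "INSERT INTO activity_feed (actor_user_id, object_type_id, object_id, feed_text) "
            "VALUES (?, ?, ?, ?)",
            (actor_id, first_type_id, first_obj_id, text),
        )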
So basically you end up with: an activity table with the raw data, a queue table with activities waiting to be converted into feeds, and a feed table containing the generated feeds, ready for display or rendering.