I have a very high traffic site with lots of photos on it and am trying to track which photo each user has viewed.
My first instinct would be a SQL table with two columns: user_id & photo_id. But, that wouldn't scale to my traffic level, and the table would get unmanageable very quickly.
Any recommendations for anoher solution, either SQL or NoSQL (mongodb,couch,redis,...)
My code is mostly PHP if that matters.
Thanks!
Edit There are 10s of millions of views day.
Edit I don't need to know the total times a user viewed a particular photo, just whether its been viewed at all by that user
Your best bet is to create a collection with { _id:Generated automagically, pictureID, viewerID }
with find( pictureID, viewerID ).limit(1) and an index on pictureID AND viewerID will make checking super ultra fast level 99. very important to set the index. I use find().limit(1) because its faster than findOne, at least from the current benchmarks.
why not have one entry per user with an array of viewed images? because searching through the array is slower than searching for a whole document in a collection. 10 million images? no problem. this is where mongodb shines. its designed to scale for big-ass databases like yours. As long as your documents are less than 16 mb, and with 3 properties, it is :) you have little to worry about.
When you delete an image, just db.viewed.remove( { pictureID : pictureID } ) and it will remove all relating the image.
db.viewed.remove( { viewerID : viewerID } ) for when you remove a user! Don't do this when the user deletes the image or the account. Do this at maintenance time or say, once an hour. Create a collection with pendingRemovingImages and pendingRemovingUsers where you store the things you want removed. check $in to perform bulk removal by images and/or by users.
I find your question to be most exciting and I feel strongly that you should go in my direction.