My understanding is that IPFS and Bittorrent Mainline DHT are built on top of a Distributed hash Table (Kademlia). They use the file hash as Kademlia key to find a list of peer that might have this file.
1- What I don't understand is if this is all decentralized who remove from the DHT peer that no longer host a file content?
2- What prevent someone from storing large amount of data for free inside the DHT?
3- What prevent someone from disrupting the network by adding large number of invalid peer for a popular file.
4- What prevent a bad actor from joining the DHT ring and not following the routing protocol thus preventing discovery message from reaching correct nodes.
Not sure why this was downvoted. These are excellent questions.
I think that DHT entries are regularly re-broadcast. So if a peer goes away, its DHT entries will no longer be broadcast and the network will forget about the data it provides unless some other node has it.
Unless you re-publish or somebody else is interested in the data, it will vanish. The amount of data that you can store directly in a DHT entry is limited. So you can make other nodes store some of your data by putting data directly into DHT entries, but the effort outweighs the benefits.
I think there are some mechanisms envisioned in IPFS to protect the DHT against attacks. However, I don't think the current implementation is all that sophisticated. I don't think that current IPFS would deal well with a large scale distributed DDOS attack.
I think a single node would be insufficient to do much damage, because a node will ask multiple peers. You would have to have multiple nodes to do significant damage.
But IPFS as it is now would not survive a sophisticated attack by state actors.