Optimum directory structure for a large number of files to display on a page


I currently have a single directory called "files" that contains 200,000 photos from about 100,000 members. When the number of members grows into the millions, I expect the number of files in the "files" directory to get very large. The file names are effectively random because the users chose them. The only way I can see to organize them is by the name of the user who created them. In essence, each user would have their own sub-directory.

The server runs Linux with an ext3 file system. Should I split the files into sub-directories inside the "files" directory? Is there any benefit to splitting them into many sub-directories? I have seen arguments that it doesn't matter.

If I do need to split, I am thinking of creating directories based on the first two characters of the user ID, then a third-level sub-directory named after the user ID, like this:

files/0/0/00024userid/  (so all user IDs starting with 00 go in files/0/0/...)
files/0/1/01auser/
files/0/2/0242myuserid/
.
files/0/a/0auser/
files/0/b/0bsomeuser/
files/0/c/0comeuser/
.
files/0/z/0zero/
files/1/0/10293832/
files/1/1/11029user/
.
files/9/z/9zl34/
files/a/0/a023user2/
..
files/z/z/zztopuser/
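
The layout above can be expressed as a small path-building function. This is only a sketch: the name `user_dir` is made up for illustration, and it assumes user IDs are at least two characters long and consist of ASCII letters and digits, which the examples suggest but which is not stated.

```python
from pathlib import PurePosixPath


def user_dir(user_id: str, root: str = "files") -> PurePosixPath:
    """Build <root>/<1st char>/<2nd char>/<user_id> for one user.

    Assumption (not from the question): IDs have at least two
    characters, all of them ASCII letters or digits.
    """
    if len(user_id) < 2 or not (user_id.isascii() and user_id.isalnum()):
        raise ValueError(f"unsupported user ID: {user_id!r}")
    # The first two characters select the two shard levels.
    return PurePosixPath(root, user_id[0], user_id[1], user_id)


print(user_dir("00024userid"))  # files/0/0/00024userid
print(user_dir("zztopuser"))    # files/z/z/zztopuser
```

With lowercase letters and digits only, this gives 36 x 36 = 1,296 second-level directories. How evenly users spread across them depends entirely on how the IDs are distributed.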

I will be showing 50 photos at a time. What is the most efficient (fastest) way for the server to pick up the files for static display: all from the same directory, or from 50 different sub-directories? Any comments or thoughts are appreciated. Thanks.


There is 1 answer

Christoph Sommer

Depending on the file system, there might be an upper limit on how many files a directory can hold. This, and the performance impact of storing many files in one directory, is also discussed at some length in another question.

Also keep in mind that your file names will likely not be truly random: quite a lot might start with "DSC", "IMG" and the like. In a similar vein, different users (or, indeed, the same user) might try to store two images with the same name, necessitating a level of abstraction from the file name anyway.
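
One possible way to get that abstraction, sketched here under assumptions of my own rather than as the only option, is to derive the stored name from a hash of the file contents and keep the user's original name in a database. The function name `stored_path` and the two-level hex sharding are illustrative choices.

```python
import hashlib
from pathlib import PurePosixPath


def stored_path(data: bytes, original_name: str, root: str = "files") -> PurePosixPath:
    """Map uploaded bytes to a storage path that ignores the user's file name.

    Assumption: the original name is recorded elsewhere (e.g. a database
    row) together with the returned path.
    """
    digest = hashlib.sha256(data).hexdigest()
    # Keep only the extension of the user-supplied name, lowercased.
    ext = PurePosixPath(original_name).suffix.lower()
    # Two levels of two hex characters each: 256 x 256 buckets.
    return PurePosixPath(root, digest[:2], digest[2:4], digest + ext)


a = stored_path(b"first photo bytes", "IMG_0001.JPG")
b = stored_path(b"second photo bytes", "IMG_0001.JPG")
print(a != b)  # True: same user-supplied name, different stored paths
```

Because hash output is close to uniform, the "DSC"/"IMG" clustering no longer affects how files spread across directories. One consequence to be aware of: identical uploads from different users map to the same path, which deduplicates storage but means deletion has to be reference-counted; mixing the user ID into the hashed input would avoid that if it is unwanted.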