According to GitHub, SeaweedFS is intended to be a simple and highly scalable distributed file system which enables you to store and fetch billions of files, fast. However, I don't understand the point of SeaweedFS Filer since it requires an external data store on top of SeaweedFS:
On top of the object store, optional Filer can support directories and POSIX attributes. Filer is a separate linearly-scalable stateless server with customizable metadata stores, e.g., MySql, Postgres, Redis, Cassandra, HBase, Mongodb, Elastic Search, LevelDB, RocksDB, Sqlite, MemSql, TiDB, Etcd, CockroachDB, etc.
For the Filer to work, it first needs to "lookup metadata from Filer Store, which can be Cassandra/Mysql/Postgres/Redis/LevelDB/etcd/Sqlite" and then read the data from volume servers.
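As I understand it, the read path looks roughly like the sketch below. This is a toy model, not SeaweedFS code: `FILER_STORE`, `VOLUME_SERVERS`, `read_file`, and the file ids are all made up for illustration, with plain dicts standing in for the metadata store and the volume servers.

```python
# Toy model of the two-step read path; not SeaweedFS code.
# FILER_STORE and VOLUME_SERVERS are hypothetical in-memory stand-ins.

# Metadata store: path -> small entry (attributes + list of chunk ids).
FILER_STORE = {
    "/docs/report.txt": {"mode": 0o644, "chunks": ["3,01637037d6", "4,02a1b2c3d4"]},
}

# Volume servers: chunk id -> raw bytes (ids here are invented examples).
VOLUME_SERVERS = {
    "3,01637037d6": b"hello, ",
    "4,02a1b2c3d4": b"world",
}


def read_file(path):
    # Hop 1: look up the entry in the metadata store.
    entry = FILER_STORE[path]
    # Hop 2: fetch each chunk from the volume servers and concatenate.
    return b"".join(VOLUME_SERVERS[fid] for fid in entry["chunks"])
```

Calling `read_file("/docs/report.txt")` performs one metadata lookup followed by the chunk reads, which is the two-hop pattern I am asking about.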
Since SeaweedFS Filer needs to retrieve the file metadata from another data store (such as Cassandra, ScyllaDB, or HBase) before it can retrieve the actual file, why not use the same data store to store the actual file? What is gained by storing the file metadata in one data store and storing the actual file in SeaweedFS?
GlusterFS, for example, stores metadata as xattrs in the underlying file system, so there is no need for external data stores.
Doesn't requiring an external data store defeat the whole purpose of using SeaweedFS, since it requires two hops (round trips) instead of one? We now need to 1) get the file metadata from external storage and 2) get the actual file. If we stored the actual file in the external data store, we could get it in one step instead of two.
The metadata includes per-file metadata and also the directory structure.
The former is similar to xattrs, as you mentioned. The latter is more like a graph database, which can be implemented by a key-value store or SQL store.
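To make the directory-structure part concrete, here is a minimal sketch of how a sorted key-value store could hold a directory tree. It is an illustration under assumptions, not the actual Filer schema: the class `ToyFilerStore`, its methods, and the `(directory, name)` key layout are hypothetical.

```python
import bisect


class ToyFilerStore:
    """Hypothetical sorted key-value store keyed by (directory, name)."""

    def __init__(self):
        self._keys = []    # sorted list of (directory, name) tuples
        self._values = {}  # (directory, name) -> small metadata dict

    def insert(self, directory, name, attrs):
        key = (directory, name)
        if key not in self._values:
            bisect.insort(self._keys, key)  # keep keys ordered for range scans
        self._values[key] = attrs

    def list_dir(self, directory):
        # Range scan: jump to the first key in this directory, then walk
        # forward until the directory component changes.
        start = bisect.bisect_left(self._keys, (directory, ""))
        names = []
        for d, name in self._keys[start:]:
            if d != directory:
                break
            names.append(name)
        return names


store = ToyFilerStore()
store.insert("/a", "y", {"size": 10})
store.insert("/a", "x", {"size": 20})
store.insert("/a/sub", "w", {"size": 30})
store.insert("/b", "z", {"size": 40})
```

Listing a directory is then an ordered range scan over small entries, which is the kind of workload key-value and SQL stores handle well.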
For a key-value store or SQL store, saving large chunks of file content is not efficient: to maintain the data ordering needed for efficient lookups, the store may read and rewrite each key's value many times. This kind of write amplification is costly, especially if the file size is in GB/TB/PB.
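A rough way to see the effect is the toy model below. It assumes a deliberately naive sorted store in which inserting a key rewrites every record that sorts after it; real B-tree and LSM-tree stores are much smarter, but the cost of each rewrite still scales with the size of the stored value. The function `bytes_rewritten` and the sizes used are hypothetical.

```python
import bisect


def bytes_rewritten(keys_in_insert_order, value_size):
    """Toy sorted store: each insert rewrites the new record plus every
    record that sorts after it. Returns total bytes physically written."""
    sorted_keys = []
    total = 0
    for key in keys_in_insert_order:
        pos = bisect.bisect_left(sorted_keys, key)
        # Records from pos onward are shifted (rewritten), plus the new one.
        total += (len(sorted_keys) - pos + 1) * value_size
        sorted_keys.insert(pos, key)
    return total


keys = ["d", "b", "a", "c"]
# Assumption: file content stored inline, 64 MiB per value.
inline = bytes_rewritten(keys, value_size=64 * 1024 * 1024)
# Assumption: only a ~256-byte metadata entry pointing at the content.
pointer = bytes_rewritten(keys, value_size=256)
```

In this model both cases rewrite the same number of records, but the inline case moves 64 MiB per rewrite while the pointer case moves only 256 bytes. Keeping small metadata in the ordered store and the bulk content elsewhere keeps that cost low.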