Retrieval after serialization to disk using gob

I have been learning about databases and wanted to implement one myself, purely for learning purposes and not for production. I have defined the following schema:

type Row struct {
    ID       int32
    Username string
    Email    string
}

Currently, I am able to encode structs of this type to a file in an append-only manner.

// Just to show that I use a file for the encoding; some details are omitted.

func NewEncoder(db *DB) *gob.Encoder{
    return gob.NewEncoder(db.File)
}

func SerializeRow(r Row, encoder *gob.Encoder, db *DB) {
    err := encoder.Encode(r)
    if err != nil {
        log.Println("encode error:", err)
    }
}
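
For completeness, here is a minimal sketch of the setup the snippets above assume; the DB struct and the file-opening flags are my own guesses, since only fragments are shown:

import "os"

// Hypothetical DB wrapper assumed by the snippets above: it just holds the
// append-only data file.
type DB struct {
    File *os.File
}

// OpenDB opens (or creates) the data file for append-only writes.
func OpenDB(path string) (*DB, error) {
    f, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_RDWR, 0644)
    if err != nil {
        return nil, err
    }
    return &DB{File: f}, nil
}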

Now, it's relatively easy to mimic a "select" statement by simply decoding the entire file with the gob decoder:

func DeserializeRow(decoder *gob.Decoder, db *DB) {
    // Rewind to the start of the file. Note that the decoder must not have
    // buffered data from an earlier position, or the seek has no effect.
    if _, err := db.File.Seek(0, io.SeekStart); err != nil {
        log.Println("seek error:", err)
        return
    }
    var row Row
    for {
        err := decoder.Decode(&row)
        if err == io.EOF {
            break
        }
        if err != nil {
            log.Println("decode error:", err)
            break
        }
        fmt.Printf("%d %s %s\n", row.ID, row.Username, row.Email)
    }
}

My current issue is that I want to be able to retrieve specific rows based on ID. I know SQLite uses 4 KB paging, in the sense that serialized rows occupy a "page" (i.e. 4 KB) until the page can't hold any more, at which point another is created. How do I mimic such behaviour using gob in the simplest and most idiomatic way?

I have already seen this and this.

1 Answer

Answer by icza (accepted):

A Gob stream may contain type definitions and decoding instructions, so you can't seek a Gob stream. You can only read it from the beginning up to the point you find what you need.
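
To make that concrete, a lookup by ID against such a stream degenerates into a linear scan from the start of the file. A rough sketch; the FindRowByID helper and its error handling are illustrative only:

import (
    "encoding/gob"
    "fmt"
    "io"
)

// FindRowByID scans the whole gob stream from the beginning until it
// reaches the row with the requested ID, or runs out of records.
func FindRowByID(db *DB, id int32) (*Row, error) {
    if _, err := db.File.Seek(0, io.SeekStart); err != nil {
        return nil, err
    }
    // A fresh decoder is needed here; an existing one may already have
    // buffered data read before the seek.
    dec := gob.NewDecoder(db.File)
    for {
        var r Row
        err := dec.Decode(&r)
        if err == io.EOF {
            return nil, fmt.Errorf("row %d not found", id)
        }
        if err != nil {
            return nil, err
        }
        if r.ID == id {
            return &r, nil
        }
    }
}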

A Gob stream is completely unsuitable for a database storage format in which you need to skip elements.

You could create a new encoder for, and serialize, each record separately; in that case you could skip elements (by maintaining a file index that stores which record starts at which position), but it would be terribly inefficient and redundant. As described in the linked answer, the speed and storage cost amortize as you write more values of the same type, and always creating new encoders loses this gain.
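
If you went down that road regardless, the shape of it would be roughly the following; the Offsets index field and the helper names are assumptions for illustration:

import (
    "encoding/gob"
    "fmt"
    "io"
)

// Sketch: one gob encoder per record, plus an in-memory offset index.
// Assumes the DB type also carries Offsets map[int32]int64.
func AppendRow(db *DB, r Row) error {
    // Remember where this record starts.
    off, err := db.File.Seek(0, io.SeekEnd)
    if err != nil {
        return err
    }
    // A brand-new encoder re-emits the type definition every time,
    // which is exactly the redundancy described above.
    if err := gob.NewEncoder(db.File).Encode(r); err != nil {
        return err
    }
    db.Offsets[r.ID] = off
    return nil
}

func ReadRow(db *DB, id int32) (*Row, error) {
    off, ok := db.Offsets[id]
    if !ok {
        return nil, fmt.Errorf("row %d not found", id)
    }
    if _, err := db.File.Seek(off, io.SeekStart); err != nil {
        return nil, err
    }
    var r Row
    // Each record is its own self-describing gob stream, so a fresh
    // decoder starting at its offset can decode it.
    if err := gob.NewDecoder(db.File).Decode(&r); err != nil {
        return nil, err
    }
    return &r, nil
}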

A much better approach would be not to use encoding/gob for this, but rather to define your own format. To efficiently support searches (select), you have to build some kind of index on the searchable columns/fields; otherwise you still need to perform a full table scan.
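
One possible direction (among many): a fixed-width binary record format makes record offsets trivially computable and packs a predictable number of rows into a 4 KB page, and an ID-to-offset map can serve as a first, in-memory index. The field widths below are arbitrary choices for the sketch, not anything prescribed:

import (
    "bytes"
    "encoding/binary"
)

// Fixed-width record layout (widths chosen arbitrarily for this sketch):
// 4-byte ID + 32-byte username + 64-byte email = 100 bytes per record.
// The offset of record n is therefore simply n*recordSize, and a 4 KB
// page holds a fixed number of records.
const (
    usernameLen = 32
    emailLen    = 64
    recordSize  = 4 + usernameLen + emailLen
)

func encodeRow(r Row) [recordSize]byte {
    var buf [recordSize]byte
    binary.LittleEndian.PutUint32(buf[0:4], uint32(r.ID))
    copy(buf[4:4+usernameLen], r.Username) // truncated or zero-padded
    copy(buf[4+usernameLen:], r.Email)     // truncated or zero-padded
    return buf
}

func decodeRow(buf [recordSize]byte) Row {
    return Row{
        ID:       int32(binary.LittleEndian.Uint32(buf[0:4])),
        Username: string(bytes.TrimRight(buf[4:4+usernameLen], "\x00")),
        Email:    string(bytes.TrimRight(buf[4+usernameLen:], "\x00")),
    }
}

A map[int32]int64 from ID to file offset (rebuilt on startup by scanning the file, or persisted alongside it) then answers point lookups with a single Seek and Read; replacing that map with a B-tree over pages is what brings you closer to what SQLite actually does.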