When creating a file format. Is it better to create multiple formats or have several optional sections?

63 views Asked by At

Let's say I am the technical lead for a software company specializing in writing applications for chefs that help organize recipes. Starting out we are developing one app for bakers who make cakes and another for sushi chefs. One of the requirements is to create a standard file format for importing and exporting recipes. (This file format will go on to be an industry standard with other companies using it to interface with our products) we are faced with two options: Make a standard recipes format (lets say .recipe) which uses common properties where applicable and optional properties where they differ, or making independent formats for each application (let us say .sushi and .cake).

Imagine the file format would look something like this for sushi:

{
  "name":"Big California",
  "type":"sushi",
  "spiciness": 0,
  "ingredients": [
    {
      "name":"rice"
      "amount": 20.0,
      "units": "ounces"
    },
    {
      ...
    }
  ],
}

and imagine the file format would look something like this for cakes:

{
  "name":"Wedding Cake",
  "type":"cake",
  "layers": 3,
  "ingredients": [
    {
      "name":"flour"
      "amount": 40.0,
      "units": "ounces"
    },
    {
      ...
    }
  ],
}

Notice the file formats are very similar with only the spiciness and layers properties differing between them. Undoubtedly as the applications grow in complexity and sophistication, and will cause many more specialized properties to be added. There will also be more applications added to the suite for other types of chefs. With this context,

Is it wiser to have each application read/write .recipe files that adhere to a somewhat standardized interface, or is it wiser to remove all interdependence and have each application read/write their respective .sushi and .cake file types?

1

There are 1 answers

0
bazza On

This kind of thing get be a very, very deep thing to get right. I think a lot of it depends on what you want to be able to do with recipes beyond simply displaying them for use by a chef.

For example, does one want to normalise data across the whole system? That is, when a recipe specifies "flour", what do you want to say about that flour, and how standardised do you want that to be? Imagine a chef is preparing an entire menu, and wants to know how much "plain white high gluten zero additives flour" is used by all the recipes in that menu. They might want to know this so they know how much to buy. There's actually quite a lot you can say about just flour that means simply having "flour" as a data item in a recipe may not go far enough.

The "modern" way of going about these things is to simply have plain text fields, and rely on some kind of flexible search to make equivalency associations between fields like "flour, white, plain, strong" and "high gluten white flour". That's what Google does...

The "proper" way to do it is to come up with a rigid schema that allows "flour" to be fully specified. It's going to be hard to come up with a file / database format schema that can exhaustively and unambiguously describe every single possible aspect of "flour". And if you take it too far, then you have the problem of making associations between two different records of "flour" that, for all reasonable purposes are identical, but differ in some minor aspect. Suppose you had a field for particle size; the search would have to be clever enough to realise that there's no real difference in flours that differ by, for example, 0.5 micrometer in average particle size.

We've discussed the possible extent to the definition of a "flour". One also has to consider the method by which the ingredient is prepared. That adds a whole new dimension of difficulty. And then one would have to desribed all the concievable kitchen utensils too. One can see the attractions of free text...

With that in mind, I would aim to have a single file format (.recipe), but not to break down the data too much. I would forget about trying to categorise each and every ingredient down to the last possible level of detail. Instead, for each ingredient I'd have a free text description, then perhaps a well structured quantity field (e.g. a number and a unit, 1 and cup), and finally a piece of free text describing the ingredient preparation (e.g. sieved). Then I'd have something that describes a preparation step, referencing the ingredients; that would have some free text fields and structured fields ("sieve the", , "into a bowl"). The file will contain a list of these. You might also have a list of required utensils, and a general description field too. You'll be wanting to add structured fields for recipe metadata (e.g. 'cake', or 'sushi', or serves 2).

Or something like that. Having some structure allows some additional functionality to be implemented (e.g. tidy layout of the recipe on a screen). Just having a single free-text field for the whole thing means that it'd be difficult to add, say, an ingredient ordering feature - who is to say what lines in the text represent ingredients?

Having separate file formats would involve coming up with a large number of schema. It would be even more unmanagable.