Costly process re-seeding production bad data with Laravel in large MySQL dataset

245 views Asked by At

I'm looking for a saner way to correct errors in a large seeded database. Solutions are welcome, the saner they are (feel free to question the tools for the job i.e. relational database). I can't afford to truncate the data and reseed all values. Especially when data integrity comes into play. This needs to be robust enough for developer error without having to do entire re-seeds to account for error. Migrations/MySQL/relational DBs don't seem to be flexible enough for my seeded dataset. That's where my knowledge is missing.

Scenario:

  • Laravel project, using seeds (JSON) in migration classes, see: [Laravel : Migrations & Seeding for production data][1]

    [1]: Laravel : Migrations & Seeding for production data

    • MySQL database 10k JSON objects representing football players containing 10 values e.g. name, weight, height, team, DOB. Objects iterated through, then inserted into DB with an AI id.

Problem: MySQL 'height' column was originally, by developer error, of type int and not float, so all heights were rounded to 2m.

Via a migration, I have changed the type to float, and now all height values need to be altered via a reseed, since they're still all 2m (and not, say, 1.76). We need to think about time and cost, having to re-seed thousands of JSON objects into MySQL to alter values on a column whilst maintaining data integrity, since the objects from JSON are assigned their unique Ids on MySQL insert.

To clarify: there are ID-less JSON objects which I seed from. There are several JSON files for the respective football leagues, so I seed these in a modular fashion i.e. when I insert new players. If I need to go in and make a change, there's no integrity between the objects in the JSON and those in the database. I have thought about working with the values I have to create a compound key,(?) but it still feels like the data is so fragile and prone to developer error, and I need a way of sanely managing this, which I don't feel Laravel migrations/seeds offer, especially since you are restricted by only running migrations based on their timestamp values (you cannot specify to rollback a certain migration file that you ran earlier). Perhaps a "source control" for databases is a good idea, if such exists. Note the post is tagged database-theory and data-integrity, I know that I can just reseed but this can take a few hours of my working day.

Cheers.

0

There are 0 answers