What would you call the process of roughening data to make it more realistic?


In my current project I work with synthetic grid data, and to make it more realistic I add noise and omit some measurements, as I do not expect to have measurements everywhere in a real grid. Is there an established term for roughening synthetic data so that it better matches realistic data?

Up to now I have gone with 'data impairment', but it does not feel right, as the resulting data is not impaired relative to the real-world scenario. On the other hand, 'data augmentation' does not fit either, as I am not gaining more data through this process.

As an example: let's say I have a three-node grid A-B-C, and at some time point the corresponding (synthetic) voltage vector might look like [2, 5, 3]. In a real-world scenario, however, I might have no measurement at node B and some noise when measuring nodes A and C, so the vector at hand looks more like [2.1, 0, 2.9]. To test my network's real-world applicability, I want to train it on data of the second kind and thus transform data of the first kind.
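A minimal sketch of such a degradation step, assuming the measurements live in NumPy arrays (the function name `degrade` and the parameters are my own, not established terminology):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def degrade(measurements, noise_std=0.1, dropout_prob=0.3):
    """Add Gaussian sensor noise, then zero out randomly 'unmeasured' nodes."""
    noisy = measurements + rng.normal(0.0, noise_std, size=measurements.shape)
    observed = rng.random(measurements.shape) >= dropout_prob
    return np.where(observed, noisy, 0.0)

clean = np.array([2.0, 5.0, 3.0])   # synthetic voltages at nodes A, B, C
rough = degrade(clean)               # e.g. something like [2.1, 0.0, 2.9]
```

Applying `degrade` to every synthetic sample before training yields data of the second kind described above; keeping the clean version alongside it also gives you paired data for evaluation.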


1 Answer


Without a concrete example of the shape of your data and how you plan to process it (DNN, regression, etc.), I refer to the following question from the Data Science Stack Exchange. In general, adding noise to data is definitely a kind of data augmentation used to increase robustness and decrease overfitting; for example, adding noise to the brightness values of images so that an image recognition algorithm performs well regardless of the time of day.
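The brightness example can be sketched as follows (a hypothetical augmentation helper, assuming grayscale images as NumPy arrays scaled to [0, 1]):

```python
import numpy as np

rng = np.random.default_rng(42)

def augment_brightness(images, max_shift=0.2):
    """Shift each image's brightness by a random per-image offset."""
    shifts = rng.uniform(-max_shift, max_shift, size=(images.shape[0], 1, 1))
    return np.clip(images + shifts, 0.0, 1.0)

batch = rng.random((4, 8, 8))        # 4 grayscale 8x8 images in [0, 1]
augmented = augment_brightness(batch)
```

Training on both the original and the shifted images makes the model less sensitive to overall lighting conditions.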

Omitting data is usually done to extract features and produce a better fit (the opposite of the former method). This goes by various names, truncation and feature selection among them. In that sense both of your methods could be termed regularization, though your truncation serves a different purpose. Maybe run a few tests to check whether removing data really produces more robust results.
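One way to run such a test, sketched with a plain least-squares fit (the data, the `degrade` helper, and the comparison setup here are all illustrative assumptions, not part of the question):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic linear data: y = X @ w_true + small noise.
n, d = 200, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + rng.normal(0.0, 0.05, size=n)

def degrade(X, drop_prob=0.3):
    """Zero out a random subset of measurements, mimicking missing sensors."""
    return X * (rng.random(X.shape) >= drop_prob)

# Fit one model on clean inputs and one on degraded inputs,
# then compare their error on a degraded "real-world" test set.
X_test = degrade(X)
w_clean, *_ = np.linalg.lstsq(X, y, rcond=None)
w_rough, *_ = np.linalg.lstsq(degrade(X), y, rcond=None)

err_clean = np.mean((X_test @ w_clean - y) ** 2)
err_rough = np.mean((X_test @ w_rough - y) ** 2)
```

Comparing `err_clean` and `err_rough` over several random seeds would indicate whether training on degraded data actually helps on degraded inputs for your setting.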