Suppose that I have a function
double f(vector<double> &x) {
    // do something with x
    return answer;
}
Mathematically, f is continuous with respect to each component of x. Now I want to evaluate the numerical gradient of f with respect to x. There are two methods, as follows.
Method 1.
double DELTA = 1e-5;
double y1 = f(x);                     // baseline value, reused for every component
vector<double> gradX(x.size());
for (size_t i = 0; i < x.size(); ++i) {
    x[i] += DELTA;                    // perturb the i-th component
    double y2 = f(x);
    gradX[i] = (y2 - y1) / DELTA;     // forward difference
    x[i] -= DELTA;                    // restore x[i]
}
Method 2.
double DELTA = 1e-5;
vector<double> gradX(x.size());
for (size_t i = 0; i < x.size(); ++i) {
    x[i] += DELTA;                          // evaluate at x_i + DELTA
    double y2 = f(x);
    x[i] -= 2.0 * DELTA;                    // evaluate at x_i - DELTA
    double y1 = f(x);
    gradX[i] = (y2 - y1) / (2.0 * DELTA);   // central difference
    x[i] += DELTA;                          // restore x[i]
}
I am observing that Method 1 gives very unreasonable numbers (six-digit magnitudes), while Method 2 gives much more reasonable ones.
Is there a reason that Method 2 is better? Should it always be preferred?
Thanks.
Edit: For a little more context: these implementations are done in C, with some use of CUDA kernels.
This is an expected result: Method 1 is first-order accurate (forward difference), while Method 2 is second-order accurate (central difference). This can easily be shown using a Taylor series; for further detail, see any book on finite difference methods.
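Sketching the Taylor argument (writing Δ for DELTA and treating the i-th component as the only variable, with the others held fixed): expanding f around x,

$$f(x+\Delta) = f(x) + \Delta f'(x) + \frac{\Delta^2}{2} f''(x) + \frac{\Delta^3}{6} f'''(x) + O(\Delta^4),$$

so the forward difference of Method 1 satisfies

$$\frac{f(x+\Delta) - f(x)}{\Delta} = f'(x) + \frac{\Delta}{2} f''(x) + O(\Delta^2),$$

a truncation error of order Δ. Expanding f(x-Δ) the same way and subtracting, the even-order terms cancel, so the central difference of Method 2 satisfies

$$\frac{f(x+\Delta) - f(x-\Delta)}{2\Delta} = f'(x) + \frac{\Delta^2}{6} f'''(x) + O(\Delta^4),$$

a truncation error of order Δ². With DELTA = 1e-5, that is roughly 5·10^{-6}·|f''| for Method 1 versus about 1.7·10^{-11}·|f'''| for Method 2.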
To get similar accuracy from the first-order method, you have to use a smaller DELTA than you would for the second-order method.
As is clear from your implementation, Method 2 is more costly: it evaluates both y1 and y2 for every component (2n function calls instead of n + 1), so it may be beneficial to use Method 1 with a smaller DELTA. However, if accuracy is of more concern, then you may use Method 2, possibly with a smaller DELTA as well.
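As a quick sanity check, here is a minimal, self-contained sketch (my own illustration, not from the question: the test function f(x) = Σ sin(x_i) and all names in it are assumed purely for demonstration) that measures the error of both methods against the analytic gradient cos(x_i):

#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>
using namespace std;

// Illustrative test function with a known gradient: df/dx_i = cos(x_i).
double f(vector<double> &x) {
    double s = 0.0;
    for (double v : x) s += sin(v);
    return s;
}

int main() {
    vector<double> x = {0.3, 1.1, 2.7};
    const double DELTA = 1e-5;
    double maxErrFwd = 0.0, maxErrCen = 0.0;

    for (size_t i = 0; i < x.size(); ++i) {
        double y0 = f(x);                        // f at the unperturbed point
        x[i] += DELTA;
        double yp = f(x);                        // f with x[i] + DELTA
        x[i] -= 2.0 * DELTA;
        double ym = f(x);                        // f with x[i] - DELTA
        x[i] += DELTA;                           // restore x[i]

        double fwd = (yp - y0) / DELTA;          // Method 1 (forward)
        double cen = (yp - ym) / (2.0 * DELTA);  // Method 2 (central)
        double exact = cos(x[i]);

        maxErrFwd = max(maxErrFwd, fabs(fwd - exact));
        maxErrCen = max(maxErrCen, fabs(cen - exact));
    }

    // The forward-difference error should be several orders of
    // magnitude larger than the central-difference error.
    printf("max forward-difference error: %.3e\n", maxErrFwd);
    printf("max central-difference error: %.3e\n", maxErrCen);
    return 0;
}

The forward-difference error should come out on the order of Δ/2·|f''| ≈ 10^{-6}, while the central-difference error should be near 10^{-11}, matching the O(Δ) versus O(Δ²) truncation errors derived above.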