How do I go about predicting Closing Price of a Financial Symbol (EURUSD) using Machine Learning?


I did a simple experiment using EURUSD OHLC 1-Day data.
My features were Open Price, Low Price, High Price, and I was trying to predict the future Closing price.

The code worked, as expected, but the results were very misleading.

I got a 99% Accuracy score, which as we all know is impossible.

1) So what am I doing wrong?
2) How can I correct my mistakes?

The full system I am building would use BoP, PPI, interest rates, GDP, a number of momentum indicators, etc. as features, some 60 in total.

import pandas as pd
import numpy as np
#import matplotlib.pyplot as plt
#import pickle

# 1. Read the EURUSD csv data.
# 2. Process the DataFrame, using only the Open, High, Low, Close columns.
df = pd.read_csv( 'EURUSD1440.csv', index_col= 'Date' )
df = df[['Open','High','Low','Close']]
array = df.values

# Features consist of Open, High, Low column, and stored in x.
# Label is the Close column stored in y.
x = array[:,0:3]
y = array[:,3]


# Split Data into Test and Train.
# 60% Train and 40% Test.
# (sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection.)
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split( x, y, test_size = 0.4 )


# 1. Train the Model using .fit method.
# 2. Predict the future Closing prices using the .predict method.
# 3. Know how Accurate the Model is using the .score method.
from sklearn.linear_model import LinearRegression
from sklearn.metrics import accuracy_score

model = LinearRegression()
model.fit( x_train, y_train )
forecast = model.predict( x_test )
accuracy = model.score( x_test, y_test )

print( forecast, accuracy )

There are 2 answers

user3666197 On

Prologue:
Having spent several decades in quantitative modelling, and operating a set of 4th-generation distributed systems with M/L predictors, I can guarantee that even your 60 features are overly optimistic. One should expect a feature space of about an order of magnitude higher dimensionality, containing both technical and fundamental factors, to train a model reasonably, if the ambition goes beyond an academic paper. Why? The Market Rules.

Your experiment exhibits two types of principal errors:

The first - a conceptual miss:
A Machine Learning task that strives to predict a continuous value is Regression (there are no "classification" labels, only regression target values). The metric for "prediction success" there is not an accuracy score but some sort of absolute distance measure in the price domain. Yes, a distance, not a percentage, because that distance is what a trade execution translates into a monetary reward.

A percentage gives no means of comparing two regression models against each other, and it is inconsistent with professional risk management, which is highly non-linear.
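As a minimal sketch of what a price-domain distance looks like in practice, the snippet below computes the mean absolute error (MAE) and root mean squared error (RMSE) and contrasts them with R², which is what `LinearRegression.score()` actually returns (it is a coefficient of determination, not an accuracy). The price values are made up for illustration and are not taken from the asker's data.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical closing prices and predictions, for illustration only.
y_true = np.array([1.1000, 1.1020, 1.0990, 1.1050])
y_pred = np.array([1.1010, 1.1000, 1.1000, 1.1030])

mae = mean_absolute_error(y_true, y_pred)            # average miss, in price units
rmse = np.sqrt(mean_squared_error(y_true, y_pred))   # penalises large misses more
r2 = r2_score(y_true, y_pred)                        # same quantity as model.score()

# For EURUSD one pip is 0.0001, so dividing by 0.0001 expresses the error in pips.
PIP = 0.0001
print(f"MAE:  {mae:.5f} ({mae / PIP:.1f} pips)")
print(f"RMSE: {rmse:.5f} ({rmse / PIP:.1f} pips)")
print(f"R^2:  {r2:.3f}")
```

Unlike R², the MAE and RMSE are in the same units as the price, so they can be compared directly against spreads, stop distances and position sizes.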

This post does not have room to discuss the additional dependencies for defining and assessing a successful trading TruStrategy, which operates in at least five dimensions of policy: { Select, Detect, Act, Allocate, Terminate }. Without a full definition of the TruStrategy SDAAT-model parameters, there is no chance of computing any performance expectations for a market ride of any trading model under review.


Next:

Your model peeks into the future. You have allowed it to learn from values that reality will never hand you at the time of prediction: a day's High and Low are only known once that day has ended, which is exactly when the Close itself becomes known. So, short of clairvoyance, the model is fundamentally skewed by its training DataSET and will never provide fair service in real circumstances.
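One way to avoid this kind of look-ahead, sketched under assumptions: build every feature for day t from bars that were already complete before day t, and split the data chronologically instead of randomly (`train_test_split` shuffles rows by default, which mixes future days into the training set). The OHLC data below is a synthetic random walk standing in for `EURUSD1440.csv`, and the `prev_` column names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# Synthetic daily OHLC data (a random walk); an assumption, not real EURUSD prices.
rng = np.random.default_rng(0)
n = 500
close = 1.10 + np.cumsum(rng.normal(0, 0.005, n))
open_ = np.concatenate(([1.10], close[:-1]))
spread = np.abs(rng.normal(0, 0.003, n))
df = pd.DataFrame({
    "Open": open_,
    "High": np.maximum(open_, close) + spread,
    "Low": np.minimum(open_, close) - spread,
    "Close": close,
})

# Features for day t come from day t-1 only, so nothing from day t leaks in.
features = df[["Open", "High", "Low", "Close"]].shift(1).add_prefix("prev_")
data = pd.concat([features, df["Close"].rename("target")], axis=1).dropna()

# Chronological split: first 60% for training, last 40% for testing, no shuffling.
split = int(len(data) * 0.6)
train, test = data.iloc[:split], data.iloc[split:]

X_cols = list(features.columns)
model = LinearRegression().fit(train[X_cols], train["target"])
pred = model.predict(test[X_cols])

mae_model = mean_absolute_error(test["target"], pred)
# Naive baseline: predict that today's close equals yesterday's close.
mae_naive = mean_absolute_error(test["target"], test["prev_Close"])
print(f"model MAE: {mae_model:.5f}  naive MAE: {mae_naive:.5f}")
```

Comparing against the naive baseline is a design choice rather than part of the original answer: it shows how much of the model's apparent skill is just the previous close carried forward.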


Epilogue:

One need not be ashamed of making this mistake, as Google has published its own Machine Learning "success" that made the very same error. (If interested in details, search for Michal Illich + Google Machine Learning blogs on their experience.)


Ex post:

Do not give up. If your project is well funded, has a reasonable technical infrastructure in place and is reasonably grounded in the business domain, you can hire a mix of professional expertise to get a FOREX market-prediction engine working within a reasonable time and budget.

Reinventing the wheel could hardly be more expensive than in FOREX, given the costs of failure.

A.raoof Hujairi On

user3666197's discussion of the conceptual flaws is spot on.

Following extensive research, I would attest that the only way to make use of the basic machine-learning workflow (load > transform > fit > predict, using sklearn or keras, or even TPOT to automate model parameter optimization) is to incorporate some future-predicted or pre-calculated "data of some relation".

To point you in the right direction, experiment with the following:

  • Astrology data (planetary positions), provided by the NASA JPL Horizons system.
  • Solar wind and geomagnetism data, provided by NASA.

Furthermore, it is more practical to focus your work on feature engineering and selection rather than on model selection.

Best of luck.