I am working on a forex classification problem and need help creating the features detailed below. I have shared my code below and also attached a picture as a visual reference for the issue at hand.
- Feature `opensimilarclose`: 1 if open = close plus or minus 2 pips, 0 otherwise
- Feature `opencloselow`: 1 if both open and close > 90% of candle size, 0 otherwise
- Feature `openclosehigh`: 1 if both open and close < 10% of candle size, 0 otherwise
My code:

```python
import numpy as np

data['opensimilarclose'] = np.where(data.Open-data.Close<=0.02, 1,0)
data['openclosehigh'] = np.where((abs(data.Close-data.Low)>=abs(data.High-data.Low)*0.9 and ()), 1, 0)
data['opencloselow'] = np.where(abs(data.Close-data.Low)<=abs(data.High-data.Low)*0.1, 1, 0)
```
Please find a sample of the data below:
| Date | Timestamp | Open | High | Low | Close | Volume |
|---|---|---|---|---|---|---|
| 2004-01-01 | 00:00:00 | 414.92199999999997 | 414.92199999999997 | 414.23199999999997 | 414.55800000000005 | 0.738269000896253 |
| 2004-01-02 | 00:00:00 | 414.32199999999995 | 416.098 | 413.86699999999996 | 415.395 | 3.82642700810902 |
| 2004-01-04 | 00:00:00 | 414.278 | 414.69800000000004 | 414.096 | 414.444 | 0.0564850000591832 |
| 2004-01-05 | 00:00:00 | 415.376 | 423.981 | 414.23400000000004 | 421.89300000000003 | 10.4188560213806 |
| 2004-01-06 | 00:00:00 | 422.332 | 430.17800000000005 | 420.07800000000003 | 421.777 | 11.182643023759699 |
| 2004-01-07 | 00:00:00 | 420.773 | 424.121 | 418.974 | 419.626 | 11.956311026187901 |
| 2004-01-08 | 00:00:00 | 419.574 | 424.798 | 416.27 | 423.298 | 12.439296027514501 |
| 2004-01-09 | 00:00:00 | 423.298 | 426.897 | 419.42699999999996 | 425.404 | 9.2499640192309 |
| 2004-01-11 | 00:00:00 | 426.49800000000005 | 426.49800000000005 | 425.876 | 426.23 | 0.0673800002332428 |
| 2004-01-12 | 00:00:00 | 425.853 | 428.459 | 422.219 | 424.598 | 10.6995250192995 |
| 2004-01-13 | 00:00:00 | 424.598 | 426.395 | 421.651 | 423.69800000000004 | 11.1990780260712 |
| 2004-01-14 | 00:00:00 | 423.389 | 424.397 | 416.78 | 419.298 | 10.835633025399101 |
| 2004-01-15 | 00:00:00 | 418.98 | 421.098 | 406.906 | 408.44699999999995 | 12.266192030985598 |
| 2004-01-16 | 00:00:00 | 408.546 | 410.398 | 404.43300000000005 | 406.298 | 9.26100601695725 |
| 2004-01-18 | 00:00:00 | 405.842 | 406.098 | 405.543 | 405.75300000000004 | 0.0658050001220545 |
| 2004-01-19 | 00:00:00 | 407.18800000000005 | 408.68300000000005 | 405.402 | 406.751 | 5.688531011830491 |
| 2004-01-20 | 00:00:00 | 406.449 | 412.69699999999995 | 404.417 | 411.921 | 10.6885030245794 |
| 2004-01-21 | 00:00:00 | 411.99800000000005 | 412.91 | 406.721 | 409.832 | 10.672994028404 |
| 2004-01-22 | 00:00:00 | 410.043 | 412.69800000000004 | 407.216 | 409.033 | 9.949593026152801 |
| 2004-01-23 | 00:00:00 | 409.398 | 412.29699999999997 | 405.461 | 407.398 | 8.921345019130971 |
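You have a few small errors in your code: `opensimilarclose` doesn't take the absolute value of `Open - Close`, so every candle that closes above its open is flagged; `openclosehigh` combines conditions with Python's `and`, which raises a `ValueError` on a Series (use `&` with each condition in parentheses), and the `()` where the Open condition belongs is empty; and `opencloselow` only checks `Close`.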
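For comparison, here is a minimal sketch of the question's own `np.where` approach with those errors fixed. It follows the question's code in measuring positions up from the Low, keeps 0.02 as the 2-pip threshold from the question, and uses made-up candles (not rows from the posted sample) chosen so that each flag fires once.

```python
import numpy as np
import pandas as pd

# Hypothetical candles for illustration only, not taken from the question's sample.
data = pd.DataFrame({
    "Open":  [100.00, 100.45, 99.55, 99.70],
    "High":  [100.50, 100.50, 100.50, 100.50],
    "Low":   [99.50, 99.50, 99.50, 99.50],
    "Close": [100.01, 100.48, 99.52, 100.30],
})

candle = (data.High - data.Low).abs()      # candle size
open_pos = (data.Open - data.Low).abs()    # distance of Open above the Low
close_pos = (data.Close - data.Low).abs()  # distance of Close above the Low

# abs() so that a Close above the Open is not automatically flagged.
data["opensimilarclose"] = np.where((data.Open - data.Close).abs() <= 0.02, 1, 0)

# Element-wise '&' (each side in parentheses) instead of Python's 'and',
# with the Open condition added next to the Close condition.
data["openclosehigh"] = np.where((open_pos >= candle * 0.9) & (close_pos >= candle * 0.9), 1, 0)
data["opencloselow"] = np.where((open_pos <= candle * 0.1) & (close_pos <= candle * 0.1), 1, 0)

print(data[["opensimilarclose", "openclosehigh", "opencloselow"]])
```

One caveat with this version: a zero-range candle (High == Low) satisfies both the high and the low condition, so you may want to exclude such rows.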
I personally prefer to work with pandas directly instead of `where`, since it's unneeded here: you have a simple condition. Check the following example:
The 3rd line calculates `opensimilarclose` by directly asking whether the absolute difference between open and close is smaller than 0.02. It's a condition, so the result is True/False; to change it to 1/0 I added `.astype(int)`. In my opinion, applying a condition directly to a whole column like this is more convenient than using `where`.
Then, for your second and third columns, I thought it was more convenient to first calculate the percentages and then check the condition. The columns "relative_open" and "relative_close" hold the percentages of open/close, and only in the next two lines do I condition on both to fill "opencloselow" and "openclosehigh". You can remove the extra columns with `drop`, or by using `loc` over all the other columns. You can also just store the result in a temporary series instead of an extra column (`tmp_series = (df["Close"]...`).
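Below is a minimal sketch of the two-step approach, again with made-up candles. The helper column names follow the answer; the rest is an assumption. In particular, it measures each percentage down from the High, which is one reading under which the question's definitions (`opencloselow` when both are above 90%, `openclosehigh` when both are below 10%) match the feature names. If you measure up from the Low instead, swap which flag gets which condition.

```python
import pandas as pd

# Hypothetical candles for illustration only, not taken from the question's sample.
df = pd.DataFrame({
    "Open":  [100.00, 100.45, 99.55, 99.70],
    "High":  [100.50, 100.50, 100.50, 100.50],
    "Low":   [99.50, 99.50, 99.50, 99.50],
    "Close": [100.01, 100.48, 99.52, 100.30],
})

candle = df["High"] - df["Low"]  # candle size

# Assumption: position measured down from the High (0.0 = at the High, 1.0 = at the Low).
# A zero-range candle gives NaN here; NaN compares as False, so it gets 0 in both flags.
df["relative_open"] = (df["High"] - df["Open"]) / candle
df["relative_close"] = (df["High"] - df["Close"]) / candle

# Condition on both percentages; '&' combines the boolean Series element-wise.
df["opencloselow"] = ((df["relative_open"] > 0.9) & (df["relative_close"] > 0.9)).astype(int)
df["openclosehigh"] = ((df["relative_open"] < 0.1) & (df["relative_close"] < 0.1)).astype(int)

# Remove the helper columns once the flags exist.
df = df.drop(columns=["relative_open", "relative_close"])

# Alternative without helper columns: keep the percentages in temporary Series.
tmp_open = (df["High"] - df["Open"]) / candle
tmp_close = (df["High"] - df["Close"]) / candle
opencloselow_alt = ((tmp_open > 0.9) & (tmp_close > 0.9)).astype(int)

print(df)
print(opencloselow_alt)
```

The temporary-Series variant gives the same flags without ever adding columns to `df`, so there is nothing to drop afterwards.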