Background
Using pandas
, the following program has been successfully used to resample millisecond-resolution tick data into 1 second interval OHLC (Open High Low Close) data. The tick data is read from a .csv file, and the program writes the 1 second OHLC data to a new .csv file with new column headers. The problem stems from the way the output data is formatted (tab-delimited instead of comma-delimited, and with undesirable column headers). This has made it challenging to plot OHLC candlestick charts using plotly
. I have been opening the .csv file using Sublime Text 3.
Goal:
To 1) ensure the output OHLC (Open High Low Close) data is comma-delimited, and 2) to output the time only, without today's date. The desired format can be seen further below, beneath "Desired output format of .csv".
Current attempt:
import pandas as pd
import sys
# Read data from file
data_frame = pd.read_csv('input_data.csv',
names=['Date_time', 'Bid', 'Bid_qty', 'Ask', 'Ask_qty'],
index_col=0, parse_dates=True, delimiter = ',')
data_frame.head()
# Sample data at set intervals
data_bid=data_frame['Bid'].resample('1S').ohlc()
data_ask=data_frame['Ask'].resample('1S').ohlc()
data_bid.head()
data_ask.head()
# Concatenate bid and ask
data_ask_bid = pd.concat(
[data_bid, data_ask], axis=1,
keys=['Bid', 'Ask'])
# Display options
pd.set_option('display.max_rows', None) # Display all rows
pd.set_option('display.max_columns', None) # Display all columns
pd.set_option('display.width', 150) # Fit columns on same line
pd.set_option('display.max_colwidth', None) # Set max width
# Print to file
original_stdout = sys.stdout # Save reference to original standard output
with open('output_data.csv', 'w') as f:
sys.stdout = f # Change the standard output
print(data_ask_bid)
sys.stdout = original_stdout # Reset standard output to original value
For the pandas.read_csv
function, an attempt was made to use the argument sep=','
which controls the delimiter character for the read_csv, but had no success. An attempt was also made to use the pandas.DataFrame.to_csv
function also with the argument sep=','
when printing to the output file, which also was unsuccessful.
Input .csv data format (time, bid, bid quantity, ask, ask quantity):
00:00:00.033,10.6674,1000000,10.6724,1000000
00:00:00.060,10.6674,1000000,10.6724,2000000
00:00:01.645,10.6674,1000000,10.6724,1000000
00:00:01.670,10.6674,1000000,10.6724,2000000
00:00:02.198,10.6674,1000000,10.6724,1000000
00:00:03.198,10.6674,1000000,10.672,1000000
00:00:03.202,10.6671,1000000,10.672,1000000
00:00:03.202,10.6672,1000000,10.672,1000000
Current output format of .csv:
Bid Ask
open high low close open high low close
Date_time
2020-10-21 00:00:00 10.6674 10.6674 10.6674 10.6674 10.6724 10.6724 10.6724 10.6724
2020-10-21 00:00:01 10.6674 10.6674 10.6674 10.6674 10.6724 10.6724 10.6724 10.6724
2020-10-21 00:00:02 10.6674 10.6674 10.6674 10.6674 10.6724 10.6724 10.6724 10.6724
2020-10-21 00:00:03 10.6674 10.6674 10.6671 10.6672 10.6720 10.6720 10.6720 10.6720
Desired output format of .csv:
time,bid_open,bid_high,bid_low,bid_close,ask_open,ask_high,ask_low,ask_close
00:00:00,10.6674,10.6674,10.6674,10.6674,10.6724,10.6724,10.6724,10.6724
00:00:01,10.6674,10.6674,10.6674,10.6674,10.6724,10.6724,10.6724,10.6724
00:00:02,10.6674,10.6674,10.6674,10.6674,10.6724,10.6724,10.6724,10.6724
00:00:03,10.6674,10.6674,10.6671,10.6672,10.6720,10.6720,10.6720,10.6720
Summary:
Q1. How can each value (including the column headers) that is printed to the output .csv file be comma-delimited?
Q2. How can the time be shown without today's date in the output .csv file?
You seem to print the string representation of the dataframe to file, which is incorrect. If you do this it should work: