Pandas: How to change the format of output .csv data to use comma-delimiters?

1.1k views Asked by At

Background

Using pandas, the following program has been successfully used to resample millisecond-resolution tick data into 1 second interval OHLC (Open High Low Close) data. The tick data is read from a .csv file, and the program writes the 1 second OHLC data to a new .csv file with new column headers. The problem stems from the way the output data is formatted (tab-delimited instead of comma-delimited, and with undesirable column headers). This has made it challenging to plot OHLC candlestick charts using plotly. I have been opening the .csv file using Sublime Text 3.

Goal:

To 1) ensure the output OHLC (Open High Low Close) data is comma-delimited, and 2) to output the time only, without today's date. The desired format can be seen further below, beneath "Desired output format of .csv".

Current attempt:

import pandas as pd
import sys

# Read data from file
data_frame = pd.read_csv('input_data.csv',
                         names=['Date_time', 'Bid', 'Bid_qty', 'Ask', 'Ask_qty'],
                         index_col=0, parse_dates=True, delimiter = ',')
data_frame.head()

# Sample data at set intervals
data_bid=data_frame['Bid'].resample('1S').ohlc()
data_ask=data_frame['Ask'].resample('1S').ohlc()

data_bid.head()
data_ask.head()

# Concatenate bid and ask
data_ask_bid = pd.concat( 
    [data_bid, data_ask], axis=1,  
    keys=['Bid', 'Ask'])

# Display options
pd.set_option('display.max_rows', None) # Display all rows
pd.set_option('display.max_columns', None) # Display all columns
pd.set_option('display.width', 150) # Fit columns on same line
pd.set_option('display.max_colwidth', None) # Set max width

# Print to file
original_stdout = sys.stdout # Save reference to original standard output
with open('output_data.csv', 'w') as f:
    sys.stdout = f # Change the standard output
    print(data_ask_bid)
    sys.stdout = original_stdout # Reset standard output to original value

For the pandas.read_csv function, an attempt was made to use the argument sep=',' which controls the delimiter character for the read_csv, but had no success. An attempt was also made to use the pandas.DataFrame.to_csv function also with the argument sep=',' when printing to the output file, which also was unsuccessful.

Input .csv data format (time, bid, bid quantity, ask, ask quantity):

00:00:00.033,10.6674,1000000,10.6724,1000000
00:00:00.060,10.6674,1000000,10.6724,2000000
00:00:01.645,10.6674,1000000,10.6724,1000000
00:00:01.670,10.6674,1000000,10.6724,2000000
00:00:02.198,10.6674,1000000,10.6724,1000000
00:00:03.198,10.6674,1000000,10.672,1000000
00:00:03.202,10.6671,1000000,10.672,1000000
00:00:03.202,10.6672,1000000,10.672,1000000

Current output format of .csv:

                         Bid                                 Ask                           
                        open     high      low    close     open     high      low    close
Date_time                                                                                  
2020-10-21 00:00:00  10.6674  10.6674  10.6674  10.6674  10.6724  10.6724  10.6724  10.6724
2020-10-21 00:00:01  10.6674  10.6674  10.6674  10.6674  10.6724  10.6724  10.6724  10.6724
2020-10-21 00:00:02  10.6674  10.6674  10.6674  10.6674  10.6724  10.6724  10.6724  10.6724
2020-10-21 00:00:03  10.6674  10.6674  10.6671  10.6672  10.6720  10.6720  10.6720  10.6720

Desired output format of .csv:

time,bid_open,bid_high,bid_low,bid_close,ask_open,ask_high,ask_low,ask_close
00:00:00,10.6674,10.6674,10.6674,10.6674,10.6724,10.6724,10.6724,10.6724
00:00:01,10.6674,10.6674,10.6674,10.6674,10.6724,10.6724,10.6724,10.6724
00:00:02,10.6674,10.6674,10.6674,10.6674,10.6724,10.6724,10.6724,10.6724
00:00:03,10.6674,10.6674,10.6671,10.6672,10.6720,10.6720,10.6720,10.6720

Summary:

Q1. How can each value (including the column headers) that is printed to the output .csv file be comma-delimited?

Q2. How can the time be shown without today's date in the output .csv file?

1

There are 1 answers

2
arno_v On BEST ANSWER

You seem to print the string representation of the dataframe to file, which is incorrect. If you do this it should work:

import pandas as pd

# Read data from file
data_frame = pd.read_csv('input_data.csv',
                         names=['Date_time', 'Bid', 'Bid_qty', 'Ask', 'Ask_qty'],
                         index_col=0, parse_dates=True, delimiter = ',')

# Sample data at set intervals
data_bid=data_frame['Bid'].resample('1S').ohlc()
data_ask=data_frame['Ask'].resample('1S').ohlc()

# Concatenate bid and ask
data_ask_bid = pd.concat( 
    [data_bid, data_ask], axis=1,  
    keys=['Bid', 'Ask'])

data_ask_bid.to_csv('file.csv', index=False)