Write Pandas DataFrame to file using FORTRAN format string

1.8k views Asked by At

I would like to write a pandas dataframe to a file using a FORTRAN format string. I haven't been able to find anything online except a discussion of how this functionality would be nice. Does anyone know if this is possible?

I suppose I don't need to use a fortran format string...I just need to get the output file in a specific format that fortran can easily read.

UPDATE: For example, I have a large data file that has a specified fortran format. I load the file into my python function, manipulate the data, and then would like to export the manipulated data into a file with the same format it had originally. An example of the file format would be something like:

FORMAT (1X,F12.6,2F9.6,F11.7,T61,2F9.6,F10.7,T142,I6,1X,A2,T236,A1)

The reason I need to export the data in a specific format is because the output file will be read directly into a well-established fortran code (meaning the fortran code cannot be altered).

4

There are 4 answers

7
JohnE On

Update:

I would now do this sort of thing in two steps:

  • Step 1 -- Convert from pandas dataframe to numpy array or rec-array. This is trivial via the values or to_numpy methods. It's a little trickier if you have strings but see here for one technique. If you have simple numeric data (and no strings), just stick to a regular numpy array and don't bother with a rec-array or structured array.

  • Step 2 -- use numpy's tofile to write out a Fortran-readable binary

Original Answer:

I guess the bigger question is how to output from pandas to fortran and I'm not sure of the best way, but I'll try to show some fairly simple solutions mainly with to_csv().

Doing this will always give you faster IO, and I actually find binary easier than text in this case, although you do lose the ability to view the data as text.

df = pd.DataFrame({ 'x':[1.03,2.9,3.7],'y':[1,22,5] })

      x   y
0  1.03   1
1  2.90  22  
2  3.70   5

Standard pandas output is actually exactly what you are asking for here, but I'm not sure how to get that into a file except with copy and paste. Maybe there is a way with ipython (though not that I can find).

And here's some default csv output, which is obviously not columnar:

df.to_csv('foo.csv',index=False)

%more foo.csv
x,y
1.03,1
2.9,22
3.7,5

But you may be able to get this into fortran with list directed input.

If you can live with the same format for all numbers, you could do something like this:

df.astype(float).to_csv('foo.raw',index=False,float_format='%10.5f')

%more foo.raw
x,y
   1.03000,   1.00000
   2.90000,  22.00000
   3.70000,   5.00000

A couple notes here: that's not bad but limited in forcing you to use the same format for all numbers, which is pretty wasteful for single digit integers, for example. Also, I tried this with some NaNs and that didn't work very well. And also the commas are not needed there but when I tried to change the separator to ' ', then it quoted everything, so I just left it out.

Finally, the most flexible way might be to convert to strings and format them. This gives you some flexibility to format each column individually. Here's a simple example using a right justified format (and width of 8 for 'x' and 4 for 'y'):

df.x = df.x.map('{:>8}'.format)
df.y = df.y.map('{:>4}'.format)
df.to_csv('foo.str',index=False)

%more foo.str
x,y
    1.03,   1
     2.9,  22
     3.7,   5

I still can't figure out how to get rid of those commas, but this way does handle NaNs successfully.

0
Duncan WP On

I know it's not ideal, but the way I've resorted to doing this is by printing out each record line by line:

df = pd.DataFrame({'alt':[1435.2, 1234.7], 'lat':[0.145, 0.324], 'lon':[12.45, 12.23]})

with open('flight.trk', 'w') as f:
    f.write("! Alt Lat Lon\n")
    for ix, alt, lat, lon in df.itertuples():
        f.write("{:10.2f} {:9.4f} {:9.4f}\n".format(alt, lat, lon))

Note that I've used itertuples to iterate over the rows as suggested here (What is the most efficient way to loop through dataframes with pandas?) but that relies on knowing the order of your columns (which are alphabetical in this case).

I've used this for tables longer than 10,000 lines and it's actually reasonably quick in my experience though I've not done any rigorous timing experiments.

0
ztl On

Coming on this much later, here is a solution which worked in my situation (myDF having a first column of strings, then 2 columns of floats). Supposing you need to read something in Fortran with

FORMAT (A19,F11.6,F11.6)

Then (with import numpy as np):

np.savetxt(myfile, myDF.to_numpy(), fmt="%19s %10.6f %10.6f")

Or (to avoid the additional space, but less clear):

np.savetxt(myfile, myDF.to_numpy(), fmt="%19s%11.6f%11.6f") 

(Just in case it helps someone else ;-))

1
chris On

Here's a nice tidy solution that uses the fortranformat package (pip install fotranformat, https://pypi.org/project/fortranformat/) and df.apply() that let's you use a standard fortran format string:

import fortranformat as ff
import pandas as pd 

df = pd.DataFrame({
        'sampleId': ['A','B','C','D'],        
        'var1' : [0.002,0.004,0.006,0.002],
        'var2' : [1.2,1.4,1.6,1.2],
        'Nobs': [32,12,9,30]
    })

format_string = '(a5, f8.3, f8.1, i5)'
header_line = ff.FortranRecordWriter(format_string)
Formatted_df = df.apply(lambda x : header_line.write(x.values),axis=1)

The Formatted_df object will be a Series with a string element for each row of the dataframe:

>>> print(Formatted_df)
0        A   0.002     1.2   32
1        B   0.004     1.4   12
2        C   0.006     1.6    9
3        D   0.002     1.2   30
dtype: object
>>> print(Formatted_df.loc[0])
    A   0.002     1.2   32
>>> print(type(Formatted_df.loc[0]))
<class 'str'>

To write it to file you can then just use to_csv:

Formatted_df.to_csv('formatted_df.csv',index=False,header=False)

Note that this won't include any column names, so you may wish to initialize the output file then append to it:

output_fi='formatted_df.csv'
col_names=df.columns.tolist()
with open(output_fi,'w') as outfi: 
    outfi.write('# '+' '.join(col_names)+"\n")
    outfi.write('# '+format_string+"\n")
    
Formatted_df.to_csv(output_fi,mode='a',index=False,header=False)

Also note that this assumes you know the ordering of your dataframe columns already.

ALSO note that you may run into memory issues if you're dealing with very large dataframes as Formatted_df will be a complete copy of df. If that's the case, you'll have to chunk it up!