Keep Leading Zeros in Converted CSV Using tabula-py and Pandas


Is there a way to maintain leading zeros in cells while still using the tabula-py convert_into function? Perhaps by passing something into the 'options' parameter to read them as strings? The documentation didn't seem very clear on what could be used there, but maybe I missed something (Source docs)

from tabula import convert_into

# Convert PDF file to csv doc (source_path and csv_path are defined elsewhere)
convert_into(source_path, csv_path, output_format="csv", pages='1-2', stream=True)

There is 1 answer

Nick08 On

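If I open the CSV in something other than Excel, I can see that the leading zeros are still there, so the converted file itself is fine; the zeros are lost later, when the CSV is read back in and the values are interpreted as numbers. The solution that seems to work is to pass dtype=str to pandas.read_csv() so that every column is read as a string, like so: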

import pandas as pd

# Convert csv to Excel doc, reading every column as text so leading zeros survive
df = pd.read_csv(csv_path, dtype=str)
df.to_excel(excel_path, index=False, header=True)
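
To see the effect of dtype=str in isolation, here is a minimal sketch that reads the same CSV text three ways. The sample data, the column names (zip_code, count) and the variable names are made up for illustration and stand in for whatever convert_into wrote; only pd.read_csv and its dtype parameter come from the answer above.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the CSV produced from the PDF.
SAMPLE_CSV = "zip_code,count\n00501,7\n02134,12\n"

# Default behaviour: pandas infers integer columns, so leading zeros are dropped.
inferred = pd.read_csv(io.StringIO(SAMPLE_CSV))

# dtype=str: every column is kept as text, exactly as written in the file.
as_text = pd.read_csv(io.StringIO(SAMPLE_CSV), dtype=str)

# A per-column mapping keeps only the chosen column as text
# and lets pandas infer the rest.
mixed = pd.read_csv(io.StringIO(SAMPLE_CSV), dtype={"zip_code": str})

print(inferred["zip_code"].tolist())  # [501, 2134]
print(as_text["zip_code"].tolist())   # ['00501', '02134']
print(mixed["count"].tolist())        # [7, 12]
```

The per-column form may be preferable if other columns should stay numeric in the resulting Excel file, since dtype=str turns every column into text.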