Keep unformated values while opening an Excel-file with Python

43 views Asked by At

Background:

We provide our suppliers an excel-file, that they have to fill out. They have to fill out the cells in column I, starting at row 11 (See Picture: Head of the excel-file). In these cells they have to input values for exapmple like this: 8,76545 or 0,98565. (The decimal seperator is a comma: ,). The supplier decides, in which cells they add a value in column I. So some cells can be empty. The rows 1 to 9 in the excel-file can be ignored for this problem. Unfortunately the cells in column I are formated as percentage.

Head of the excel-file: Head

Example with filled values: Filled Cells

We get the filled file back as excel-file or csv-file.

We remove the rows 1 to 9 aswell as all rows, that have no value in column I (starting at row 11), we convert the excel file into csv and upload them into our ERP-System. This is done manually.

The goal is to automate this steps/process with Python:

  • Ignore rows 1 to 9

  • remove all rows with empty cells in column I (starting at row 11)

  • remove the percentage format

  • save the result as csv

The result should look like this: Result

Problem:

We are getting the filled file back as excel- or csv-file, with or without the percentage format. The excel-file with the percentage format is problematic.

If I open the excel-file, with the percentage format with pandas, pandas ist converting the values like this:.

  • Original value in the excel file: 8,61565740319208%

  • Converted value by pandas: 0.0861565740319208

The value should be 8,61565740319208% befor I remove the percentage sign with Python. (The dot-decimal seperator is not a problem)

Since the suppliers delivers the excel-files with or withou the percentage format, I just cant multiply the values in pandas by 100. Pandas is converting the values automatically, so I have no chance to check, if there is a % or not in the cell. I am using openpyxl as engine.

While my code is multiplying the values with 100, this is wrong.

Code:



import os
import pandas as pd
from datetime import datetime

def clear_date(date: str) -> str:
    if len(date.strip()) == 10 and "." in date: #10.02.2023
        return date
    elif len(date.strip()) > 10 and "-" in date:
        return datetime.fromisoformat(date).strftime("%d.%m.%Y")
    

def clear_faktor(faktor: str) -> str:
    try:
        faktor = faktor.replace("%","")

        if "," in faktor:
            return f"{faktor:0<8}"[:8]
        elif float(faktor) < 0.1:
            return f"{str(float(faktor[:10])*100).replace(".", ","):0<8}"[:8]
        else:
            return f"{faktor:0<8}"[:8]
    except Exception as error:
        print(error)
    

def clear_LF(lf: str) -> str:
    if lf == "LF10":
        return "LFA"
    else:
        return lf


if __name__ == "__main__":
    #Set Path
    path_main = os.path.dirname(os.path.realpath(__file__))
    path_offer = os.path.join(path_main, "Offer")
    path_cleared = os.path.join(path_main, "Cleared")

    #create necessary folders"
    if not os.path.exists(path_offer):
        os.makedirs(path_offer)
    if not os.path.exists(path_cleared):
        os.makedirs(path_cleared)

    original_files = iter(os.listdir(path_offer))


    #Clear files
    for file in original_files:
        o_file = pd.DataFrame()
        
        if os.path.splitext(file)[1].upper() == ".XLSX":
            try:
                o_file = pd.read_excel(os.path.join(path_offer, file), dtype="str")
            except PermissionError:
                continue

        elif os.path.splitext(file)[1].upper() == ".CSV":
            try:
                o_file = pd.read_csv(os.path.join(path_offer, file), dtype="str", sep=";", encoding_errors="ignore")
            except PermissionError:
                continue
        else:
            continue

        index_header = o_file[o_file.isin(["Typ", "MODEL_GUID", "VALID_FROM", "VALID_TO", "LAUFZEIT", "FAKTOR"]).any(axis=1)].index.values
        
        if index_header and index_header.size == 1:
            header = o_file.loc[index_header[0]].values
            o_file = o_file.loc[index_header[0]+1:, :]
            o_file.columns = header
            o_file.dropna(subset=["FAKTOR"], inplace=True)
            o_file["Typ"] = "LF"
            o_file["LF"] = o_file["LF"].apply(clear_LF)
            o_file["VALID_FROM"] = o_file["VALID_FROM"].apply(clear_date)
            o_file["VALID_TO"] = o_file["VALID_TO"].apply(clear_date)
            o_file["FAKTOR"] = o_file["FAKTOR"].apply(clear_faktor)
            o_file = o_file.loc[:, "Typ":"FAKTOR"]

            try:
                o_file.to_csv(os.path.join(path_cleared, f"{os.path.splitext(file)[0]}.csv"), index=False, sep=";", encoding="utf-8")
            except PermissionError:
                continue
        else:
            continue

What I have tried:

I tried the converters parameter in the read_excel method with this function:

def to_excel(strng): return str(strng)

The parameter dtype="str" in the read_excel method has also not the effect, I want.

I also tried to open the excel-file with openpyxl, but the result is the same, like opening the file with pandas (pandas is using openpyxl as engine).

Nothing of this is working.

0

There are 0 answers