I have a PDF and I extracted its data using tabula:
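    import pandas as pd
    import tabula
    import numpy as np

    # read_pdf returns a list of DataFrames, one per detected table;
    # stream=True splits cells by whitespace instead of ruling lines
    dfs = tabula.read_pdf("sample.pdf", stream=True, pages="all")
    df = dfs[0]
    df.columns = ["NumVenta", "FechaDeVenta", "NombreDePaciente", "NombreDeVendedor",
                  "Articulo", "Importe", "Pago", "FormaDePago"]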
But because the data comes from a PDF, the layout is odd: the first note, JA2944, is split across 3 different rows, and I need to join those rows into a single row.
    #Note,Date,Pacient,Seller,Articles,Import,Bill,Way of Payment
    JA2944,30/09/20,Gonzalez Tabera,Erick Eduardo, Armazon RAY BAN RB 6421,"$8,577.00","$2,700.0",Efectivo
    "",23,Alfonso,Lopez Osorio,2997,,0,
    "",,,, Lente Progresivo Transition,,,
    JA2943,30/09/20,Cuevas Bates,Erick Eduardo, Lente Monofocal,$990.00,$990.00,Terminal Bancomer
    "",23,Isabela,Lopez Osorio, TINTE,,,
    JA2942,30/09/20,Villanueva Batun,Erick Eduardo, Armazon OAKLEY OX 8046,"$3,765.00","$2,975.0",Efectivo
    "",23,Hector Alexis,Lopez Osorio,1657,,0,
    "",,,, Lente C/ AntiReflejante,,,
    "",,,,Monofocal,,,
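A minimal sketch of the idea behind joining the rows, assuming tabula leaves the note cell of each continuation row empty (NaN): every row that has a note starts a new record, so forward-filling the note column tags each continuation row with the note above it. The `notes` Series is a hypothetical, hand-typed stand-in for `df["NumVenta"]`, not something read from the PDF.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df["NumVenta"], typed from the sample above:
# continuation rows have no note, so they are NaN.
notes = pd.Series(["JA2944", np.nan, np.nan,
                   "JA2943", np.nan,
                   "JA2942", np.nan, np.nan, np.nan])

# ffill() copies the last seen note down onto the empty rows below it,
# so every row now says which note it belongs to.
print(notes.ffill().tolist())
# ['JA2944', 'JA2944', 'JA2944', 'JA2943', 'JA2943', 'JA2942', 'JA2942', 'JA2942', 'JA2942']
```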
I tried to use `merge`, but I'm confused because it needs two DataFrames. I did manage to build a second DataFrame with all the rows that have a null value in the Note column, but I don't know how to merge each of those rows into the note row above it. This is how I selected them:
    Empty_Row_Notes = df[df["NumVenta"].isna()]
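A sketch of an approach that avoids `merge` entirely, under stated assumptions: instead of splitting the rows into two DataFrames, number the records with a cumulative sum over `df["NumVenta"].notna()` (the opposite of the `isna()` mask above) and collapse each record with `groupby().agg()`. The first five sample rows are typed in by hand rather than read with tabula, empty cells are assumed to be NaN, and the helper names (`record_id`, `NO_SPACE_COLUMNS`, `join_with`, `agg_spec`, `merged`) are hypothetical. The per-column separators are only a guess from the sample: the date and the payment look wrapped mid-value (`30/09/20` + `23`, `$2,700.0` + `0`), so those pieces are glued without a space.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for dfs[0]: the first five sample rows, typed by hand,
# with empty cells as NaN (assumed to match what tabula returns).
columns = ["NumVenta", "FechaDeVenta", "NombreDePaciente", "NombreDeVendedor",
           "Articulo", "Importe", "Pago", "FormaDePago"]
rows = [
    ["JA2944", "30/09/20", "Gonzalez Tabera", "Erick Eduardo",
     "Armazon RAY BAN RB 6421", "$8,577.00", "$2,700.0", "Efectivo"],
    [np.nan, "23", "Alfonso", "Lopez Osorio", "2997", np.nan, "0", np.nan],
    [np.nan, np.nan, np.nan, np.nan, "Lente Progresivo Transition",
     np.nan, np.nan, np.nan],
    ["JA2943", "30/09/20", "Cuevas Bates", "Erick Eduardo",
     "Lente Monofocal", "$990.00", "$990.00", "Terminal Bancomer"],
    [np.nan, "23", "Isabela", "Lopez Osorio", "TINTE", np.nan, np.nan, np.nan],
]
df = pd.DataFrame(rows, columns=columns)

# notna() is True on rows that start a new note; the running total is
# 1, 1, 1, 2, 2, i.e. one id per record, shared by its continuation rows.
record_id = df["NumVenta"].notna().cumsum().rename("record")

# Guess from the sample: these columns were wrapped in the middle of a value,
# so their pieces are glued with no space; all other columns use a space.
NO_SPACE_COLUMNS = {"FechaDeVenta", "Pago"}

def join_with(sep):
    """Hypothetical helper: build an aggregator joining a group's non-empty cells."""
    def join(values):
        return sep.join(values.dropna().astype(str).str.strip())
    return join

agg_spec = {col: join_with("" if col in NO_SPACE_COLUMNS else " ")
            for col in df.columns}
merged = df.groupby(record_id).agg(agg_spec).reset_index(drop=True)

print(merged.loc[0, "FechaDeVenta"])      # 30/09/2023
print(merged.loc[0, "Pago"])              # $2,700.00
print(merged.loc[1, "NombreDePaciente"])  # Cuevas Bates Isabela
```

Using `cumsum()` rather than `ffill()` on the note keeps two records apart even if the same note number ever appeared twice in the PDF. If another column also turns out to be wrapped mid-value, adding it to `NO_SPACE_COLUMNS` is the only change needed.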