I have a large DataFrame, and I need to create new columns using formulas that involve 3 different columns. For example, I want to assign the value of the 2nd column to a 3rd (new) column if the 1st column has a given value. The problem is that I get warnings for everything I try, such as:

  • SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead

  • FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas.
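The second warning usually means that a value of one dtype is being written into a column of another dtype. For example, float quantities written into a column that was created with the integer literal `0` (so it became `int64`). Here is a minimal sketch. It assumes `Qty` holds floats, and the toy data is made up. It creates the new column with the same dtype as `Qty` before filling it, and turns all warnings into errors to show that none are raised:

```python
import warnings

import pandas as pd

# Made-up sample data; assumes 'Qty' is a float column
df = pd.DataFrame({"Unit": ["KG", "LB", "KG"], "Qty": [1.5, 2.0, 3.25]})

# Turn every warning into an exception so a dtype warning would surface
with warnings.catch_warnings():
    warnings.simplefilter("error")
    # Initialise with 0 cast to Qty's dtype (float64 here), not a bare int 0
    df["QtyKG"] = pd.Series(0, index=df.index, dtype=df["Qty"].dtype)
    is_kg = df["Unit"] == "KG"
    # Both sides now share a dtype, so no incompatible-dtype FutureWarning
    df.loc[is_kg, "QtyKG"] = df.loc[is_kg, "Qty"]

print(df["QtyKG"].tolist())  # [1.5, 0.0, 3.25]
```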

Could you please help me solve this? I couldn't find a way to do it that works without warnings.

The DataFrame has around 49 million rows × 100 columns. Here I only use 2 columns as an example, but I need to create many of these new columns, each combining several columns of the DataFrame in a formula.
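When many derived columns are needed, each one can usually be written as a single vectorized expression over whole columns. `numpy.select` handles several conditions at once. The following is a sketch under stated assumptions: the `'G'` unit, the `Price` column, and the `QtyInKG` and `Cost` columns are hypothetical. They were invented only to show several columns combined in one step:

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'Price' and the 'G' (grams) unit are illustrative only
df = pd.DataFrame({
    "Unit": ["KG", "G", "LB", "KG"],
    "Qty": [2.0, 500.0, 3.0, 1.0],
    "Price": [10.0, 4.0, 7.0, 12.0],
})

# Several conditions: convert known units to kilograms, 0 for anything else
df["QtyInKG"] = np.select(
    [df["Unit"] == "KG", df["Unit"] == "G"],  # conditions, checked in order
    [df["Qty"], df["Qty"] / 1000],            # value used for each condition
    default=0.0,                              # rows matching no condition
)

# Arithmetic across several columns is also evaluated column-wise, no loop
df["Cost"] = df["QtyInKG"] * df["Price"]
print(df)
```

Each new column is computed in one pass over whole arrays, which matters at ~49 million rows.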

These are all my attempts, but every one of them produced pandas warnings:

```python
# df is the large source DataFrame (loaded elsewhere)
df2 = df[['Unit', 'Qty']]

# 1st try: copy the Qty column to QtyKG, then set it to zero where Unit
# isn't 'KG'.
df2['QtyKG'] = df['Qty'].copy()
df2.loc[df2['Unit'] != 'KG', 'QtyKG'] = 0

# 2nd try: create a new column named QtyKG filled with zeros, then
# assign Qty to the rows where Unit == 'KG'.
# (Assigning a Series aligns on the index, so rows where Unit != 'KG'
# end up as NaN here, not 0.)
df2['QtyKG'] = 0
df2['QtyKG'] = df2.loc[df2['Unit'] == 'KG', 'Qty']

# 3rd try: set QtyKG to zero for all rows, then use a for loop over the
# list of index values where Unit is 'KG' to copy Qty into QtyKG.
df2.loc[:, 'QtyKG'] = 0
for i in df2.loc[df2['Unit'] == 'KG'].index.to_list():
    df2.loc[i, 'QtyKG'] = df2.loc[i, 'Qty']

# 4th try: for each index from 0 to len(df2), copy Qty to QtyKG if
# Unit is 'KG', else set QtyKG to 0 (assumes a default 0..n-1 index).
for i in range(len(df2)):
    if df2.loc[i, 'Unit'] == 'KG':
        df2.loc[i, 'QtyKG'] = df2.loc[i, 'Qty']
    else:
        df2.loc[i, 'QtyKG'] = 0

df2
```
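All four attempts compute the same column. The two loop-based ones make one `.loc` call per row, which will be very slow on ~49 million rows. Below is a minimal vectorized sketch with made-up toy data. It builds the column in one step directly on the original frame with `Series.where`, which keeps `Qty` where the condition is true and uses `0` elsewhere. Because nothing is written into a selection of another frame, `SettingWithCopyWarning` does not come into play:

```python
import pandas as pd

# Made-up stand-in for the real 49M-row frame
df = pd.DataFrame({"Unit": ["KG", "LB", "KG", "G"], "Qty": [1.5, 2.0, 3.25, 4.0]})

# Keep Qty where Unit == 'KG', put 0 everywhere else, in one vectorized step
df["QtyKG"] = df["Qty"].where(df["Unit"].eq("KG"), 0)

print(df["QtyKG"].tolist())  # [1.5, 0.0, 3.25, 0.0]
```

Because the whole column is produced at once, there is no intermediate zero-filled column whose dtype could clash with `Qty`.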

They seem to work, but I received these warnings:

```
C:\Users\arlain\AppData\Local\Temp\ipykernel_24992\3058135089.py:4:
SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
```

The warning was raised on these lines of each attempt:

```python
# 1st
df2['QtyKG'] = df['Qty'].copy()
df2.loc[df2['Unit'] != 'KG', 'QtyKG'] = 0
# 2nd
df2['QtyKG'] = 0
df2['QtyKG'] = df2.loc[df2['Unit'] == 'KG', 'Qty']
# 3rd
df2.loc[:, 'QtyKG'] = 0
df2.loc[i, 'QtyKG'] = df2.loc[i, 'Qty']
# 4th
df2.loc[i, 'QtyKG'] = df2.loc[i, 'Qty']
df2.loc[i, 'QtyKG'] = 0
```
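The `.loc[...]` hint in this warning can be misleading here. The flagged lines already use `.loc`, but `df2` itself was created as a column selection of `df` (`df[['Unit','Qty']]`), so pandas cannot tell whether writes to it are meant to reach `df`. Below is a minimal sketch with made-up toy data. When a separate frame is really wanted, it takes an explicit `.copy()` so that every later assignment is unambiguous. All warnings are turned into errors to show that none are raised:

```python
import warnings

import pandas as pd

# Made-up stand-in for the large source frame
df = pd.DataFrame({"Unit": ["KG", "LB", "KG"], "Qty": [1.5, 2.0, 3.25], "Other": [1, 2, 3]})

# Explicit copy: df2 is now an independent frame, not a tracked slice of df
df2 = df[["Unit", "Qty"]].copy()

with warnings.catch_warnings():
    warnings.simplefilter("error")  # any warning would raise here
    df2["QtyKG"] = df2["Qty"]
    df2.loc[df2["Unit"] != "KG", "QtyKG"] = 0

print(df2["QtyKG"].tolist())  # [1.5, 0.0, 3.25]
print("QtyKG" in df.columns)  # False: the original frame is untouched
```

The copy only costs memory for the selected columns, not for all ~100 columns of the original frame.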

In the end they all do the job, but I want to know how to avoid these warnings completely, and what the correct way to do this is.

By the way, questions #64096923, #45170312 and #60849563 ask something similar, but take a different approach from my case.

Thanks a lot in advance.

[ I'm using Python 3.10.6 and Pandas 2.1.2 ]
