pandas combine two columns with null values

Question

61.6k views Asked by vagabond At 03 January 2017 at 17:56

I have a df with two columns and I want to combine both columns ignoring the NaN values. The catch is that sometimes both columns have NaN values in which case I want the new column to also have NaN. Here's the example:

df = pd.DataFrame({'foodstuff':['apple-martini', 'apple-pie', None, None, None], 'type':[None, None, 'strawberry-tart', 'dessert', None]})

df
Out[10]:
       foodstuff             type
0  apple-martini             None
1      apple-pie             None
2           None  strawberry-tart
3           None          dessert
4           None             None

I tried to solve this with fillna:

df['foodstuff'].fillna('') + df['type'].fillna('')

and I got:

0      apple-martini
1          apple-pie
2    strawberry-tart
3            dessert
4                   
dtype: object

Row 4 has become an empty string. What I want in this case is a NaN value, since both of the columns being combined are NaN:

0      apple-martini
1          apple-pie
2    strawberry-tart
3            dessert
4            None       
dtype: object

There are 9 answers

Vikash Singh On 03 January 2017 at 18:02

You can always replace the empty strings in the new column with NaN:

import numpy as np

df['new_col'] = df['new_col'].replace(r'^\s*$', np.nan, regex=True)

Complete code:

import pandas as pd
import numpy as np

df = pd.DataFrame({'foodstuff':['apple-martini', 'apple-pie', None, None, None], 'type':[None, None, 'strawberry-tart', 'dessert', None]})

df['new_col'] = df['foodstuff'].fillna('') + df['type'].fillna('')

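# Assign the result back instead of using inplace=True on a column selection,
# which may operate on a temporary copy and leave df unchanged.
df['new_col'] = df['new_col'].replace(r'^\s*$', np.nan, regex=True)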

df

output:

       foodstuff             type          new_col
0  apple-martini             None    apple-martini
1      apple-pie             None        apple-pie
2           None  strawberry-tart  strawberry-tart
3           None          dessert          dessert
4           None             None              NaN
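
As an alternative to the regex replacement, the rows where both inputs are missing can be masked directly. The following is only a minimal sketch of that idea, using the question's DataFrame; the names has_value and combined are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'foodstuff': ['apple-martini', 'apple-pie', None, None, None],
    'type': [None, None, 'strawberry-tart', 'dessert', None],
})

# True for rows where at least one of the two columns holds a value.
has_value = df[['foodstuff', 'type']].notna().any(axis=1)

# Concatenate with empty-string placeholders, then set the rows where
# both inputs were missing back to NaN.
combined = (df['foodstuff'].fillna('') + df['type'].fillna('')).where(has_value)
```

Unlike the regex approach, this does not turn a legitimately blank or whitespace-only string in the source data into NaN; only rows that were null in both columns are affected.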

piRSquared On 03 January 2017 at 18:07

Apply fillna to both columns together, use sum(axis=1) to concatenate them row-wise, then replace('', np.nan):

df.fillna('').sum(axis=1).replace('', np.nan)

0      apple-martini
1          apple-pie
2    strawberry-tart
3            dessert
4                NaN
dtype: object

sirfz On 03 January 2017 at 18:15

You can use the combine method with a lambda:

df['foodstuff'].combine(df['type'], lambda a, b: ((a or "") + (b or "")) or None, None)

(a or "") returns "" if a is None. The same logic is then applied to the concatenation, so the result is None if the concatenation is an empty string. The final None argument is the fill_value.
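
Note that the (a or "") trick relies on the missing values being None: a float NaN is truthy, so it would not be replaced by "". A possible variant that treats None and NaN alike is sketched below; join_or_none is a hypothetical helper name, not something from the original answer.

```python
import pandas as pd

def join_or_none(a, b):
    """Concatenate two values, treating None/NaN as empty; return None if both are missing."""
    left = '' if pd.isna(a) else a
    right = '' if pd.isna(b) else b
    return (left + right) or None

df = pd.DataFrame({
    'foodstuff': ['apple-martini', 'apple-pie', None, None, None],
    'type': [None, None, 'strawberry-tart', 'dessert', None],
})

# Series.combine calls join_or_none once per pair of aligned elements.
combined = df['foodstuff'].combine(df['type'], join_or_none)
```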

Mastan Basha Shaik On 20 April 2020 at 05:23

You can replace the non-zero values with the column names:

df1= df.replace(1, pd.Series(df.columns, df.columns))

Afterwards, replace the 0s with empty strings and then concatenate the columns:

f = f.replace(0, '')
f['new'] = f.First+f.Second+f.Three+f.Four

Refer to the full code below:

import pandas as pd
df = pd.DataFrame({'Second':[0,1,0,0],'First':[1,0,0,0],'Three':[0,0,1,0],'Four':[0,0,0,1], 'cl': ['3D', 'Wireless','Accounting','cisco']})
df2=pd.DataFrame({'pi':['Accounting','cisco','3D','Wireless']})
df1= df.replace(1, pd.Series(df.columns, df.columns))
f = pd.merge(df1,df2,how='right',left_on=['cl'],right_on=['pi'])
f = f.replace(0, '')
f['new'] = f.First+f.Second+f.Three+f.Four

df1:

In [3]: df1                                                                                                                                                                              
Out[3]: 
   Second  First  Three  Four          cl
0       0  First      0     0          3D
1  Second      0      0     0    Wireless
2       0      0  Three     0  Accounting
3       0      0      0  Four       cisco

df2:

In [4]: df2                                                                                                                                                                              
Out[4]: 
           pi
0  Accounting
1       cisco
2          3D
3    Wireless

Final DataFrame f will be:

In [2]: f                                                                                                                                                                                
Out[2]: 
   Second  First  Three  Four          cl          pi     new
0          First                       3D          3D   First
1  Second                        Wireless    Wireless  Second
2                 Three        Accounting  Accounting   Three
3                        Four       cisco       cisco    Four

Sway Wu On 15 January 2021 at 02:21

We can generalize this problem and give a universal solution for this type of task.

The key requirement is to join a group of columns together while ignoring NaNs.

Here is my answer:

import numpy as np
import pandas as pd

df = pd.DataFrame({'foodstuff': ['apple-martini', 'apple-pie', None, None, None],
                   'type': [None, None, 'strawberry-tart', 'dessert', None],
                   'type1': [98324, None, None, 'banan', None],
                   'type2': [3, None, 'strawberry-tart', np.nan, None]})

df=df.fillna("NAN")
df=df.astype('str')
df["output"] = df[['foodstuff', 'type', 'type1', 'type2']].agg(', '.join, axis=1)
df['output'] = df['output'].str.replace('NAN, ', '')
df['output'] = df['output'].str.replace(', NAN', '')
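
One caveat with the placeholder approach is that a row in which every column is missing ends up as the literal string 'NAN' rather than a real NaN. A possible alternative, sketched here with a hypothetical helper named join_present, skips the missing values row by row instead of inserting and removing a sentinel:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'foodstuff': ['apple-martini', 'apple-pie', None, None, None],
                   'type': [None, None, 'strawberry-tart', 'dessert', None],
                   'type1': [98324, None, None, 'banan', None],
                   'type2': [3, None, 'strawberry-tart', np.nan, None]})

def join_present(row, sep=', '):
    """Join the non-missing values of a row as strings; NaN if nothing is present."""
    parts = [str(v) for v in row if pd.notna(v)]
    return sep.join(parts) if parts else np.nan

cols = ['foodstuff', 'type', 'type1', 'type2']
output = df[cols].apply(join_present, axis=1)
```

This is slower than vectorised string operations on large frames, so treat it as a readability-first sketch rather than a tuned solution.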

keepAlive On 15 May 2021 at 13:21

If your columns are mutually exclusive, i.e. each one holds a value only where the others do not, a one-liner that does the job well is

>>> df.rename(columns={'type': 'foodstuff'}).stack().unstack()
         foodstuff
0    apple-martini
1        apple-pie
2  strawberry-tart
3          dessert

This solution also generalises well if you have multiple columns to interleave, as long as you can define your ~.rename mapping. The intention behind the renaming is to create duplicate column names that ~.stack().unstack() will then process for you.

As explained, this solution only suits configurations with orthogonal columns, i.e. columns that never hold values simultaneously. Note also that row 4, where both columns are null, is dropped from the output above.

rachwa On 18 June 2022 at 21:22

With combine_first you can fill null values in one column with non-null values from another column:

In [3]: df['foodstuff'].combine_first(df['type'])
Out[3]: 
0      apple-martini
1          apple-pie
2    strawberry-tart
3            dessert
4               None
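
If there are more than two columns, the same idea can be chained so that each row takes its first non-null value, reading the columns from left to right. This is only a sketch: the third column, category, and its values are made up for illustration.

```python
from functools import reduce

import pandas as pd

df = pd.DataFrame({
    'foodstuff': ['apple-martini', None, None, None],
    'type': [None, 'strawberry-tart', None, None],
    'category': [None, 'ignored', 'dessert', None],  # hypothetical extra column
})

cols = ['foodstuff', 'type', 'category']

# Start from the first column and fill its gaps from each later column in turn.
first_non_null = reduce(
    lambda acc, name: acc.combine_first(df[name]),
    cols[1:],
    df[cols[0]],
)
```

In row 1 the value from type wins over the one from category because type comes earlier in cols; the last row stays null because no column has a value there.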

KBurchfiel On 13 March 2024 at 19:37

If you initialize your DataFrame with NaNs for missing values rather than None, you can use Series.add() to fill NaN values on the fly when adding the columns together.

Example:

# np.nan is used here because the np.NaN alias was removed in NumPy 2.0.
df = pd.DataFrame({'foodstuff':['apple-martini', 'apple-pie', np.nan, np.nan, np.nan], 
'type':[np.nan, np.nan, 'strawberry-tart', 'dessert', np.nan]})

df['foodstuff'].add(df['type'], fill_value = '')

Result:

0      apple-martini
1          apple-pie
2    strawberry-tart
3            dessert
4                NaN

This also works nicely for adding numerical columns that have some NaN values, as it allows you to add a number to a NaN value and get the number. Example:

df_test_nums = pd.DataFrame({'left_numbers':[1, 1, np.nan, 3.7, 2.4], 
'right_numbers':[4, np.nan, np.nan, 2.7, 9.4]})
print(df_test_nums)

Result:

   left_numbers  right_numbers
0           1.0            4.0
1           1.0            NaN
2           NaN            NaN
3           3.7            2.7
4           2.4            9.4

Adding these columns together so that the sum of a number and a NaN value will be the number:

df_test_nums['left_numbers'].add(
    df_test_nums['right_numbers'], fill_value = 0)

Result:

0     5.0
1     1.0
2     NaN
3     6.4
4    11.8

Compare this to the use of the + operator, which converts the sum of NaN and a number into NaN:

df_test_nums['left_numbers'] + df_test_nums['right_numbers']

Result:

0     5.0
1     NaN
2     NaN
3     6.4
4    11.8

For operations that involve multiple columns, a more elegant approach is available via df.sum().

print(df_test_nums[
          ['left_numbers', 'right_numbers']].sum(
              axis=1, min_count = 1))

output:

0     5.0
1     1.0
2     NaN
3     6.4
4    11.8

Note that, if min_count is set to 0 (the default), the 3rd row will equal 0, since that's the default output when values consisting only of NaNs are added together. (See the df.sum() documentation for more information.)
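
To make the effect of min_count concrete, here is a small sketch that computes the row sums both ways on the same data; the variable names default_sum and strict_sum are only illustrative.

```python
import numpy as np
import pandas as pd

df_test_nums = pd.DataFrame({
    'left_numbers': [1, 1, np.nan, 3.7, 2.4],
    'right_numbers': [4, np.nan, np.nan, 2.7, 9.4],
})

# Default min_count=0: a row made up only of NaNs sums to 0.0.
default_sum = df_test_nums.sum(axis=1)

# min_count=1: at least one non-NaN value is required, otherwise the result is NaN.
strict_sum = df_test_nums.sum(axis=1, min_count=1)
```

The two results differ only in the third row (index 2), which is 0.0 in default_sum and NaN in strict_sum.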

root On 03 January 2017 at 18:05 BEST ANSWER

Use fillna on one column with the fill values being the other column:

df['foodstuff'].fillna(df['type'])

The resulting output:

0      apple-martini
1          apple-pie
2    strawberry-tart
3            dessert
4               None
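
It may be worth noting how this differs from the concatenation attempted in the question: fillna keeps the first column's value whenever it is present, so the two approaches only agree when at most one column is populated per row. A small sketch, using made-up data in which row 2 has values in both columns:

```python
import pandas as pd

df = pd.DataFrame({
    'foodstuff': ['apple-pie', None, 'cake', None],
    'type': [None, 'dessert', 'dessert', None],  # row 2 has both values (hypothetical)
})

# Take 'foodstuff' where present, otherwise fall back to 'type'.
coalesced = df['foodstuff'].fillna(df['type'])

# String concatenation, as attempted in the question.
concatenated = df['foodstuff'].fillna('') + df['type'].fillna('')
```

Here coalesced gives 'cake' for row 2 and leaves the last row null, while concatenated gives 'cakedessert' for row 2 and an empty string for the last row.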
