Splitting a Pandas Series and Assigning the Parts to Separate Columns


I have the following data frame (df):

mut   gene   pvalue    chrom
1:23456_A>G  0.005     chr1  
2:28484_A>G  0.0001    chr2
4:47629_A>G  0.05      chr4
3:88382_A>G  0.00001   chr3
10:88273_A>G 0.005    chr10

[30 rows x 4 columns]

I am trying to split the "mut" column of df into four labeled columns and assign them to a newly created data frame, df_new, that looks like this:

chr    st    ref   alt 
1     23456   A     G  
2     28484   A     G  
4     47629   A     G

The resulting data frame (df_new) is built by taking the mut column of df and splitting each string in stages: split(":"), then split("_"), and finally split(">"). This yields the 4 parts of the original field (1, 23456, A, G), which are then placed into their own columns.
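A minimal sketch of the staged split just described, using a few rows from the question (the gene, pvalue and chrom columns are left out). It assumes every value follows the chr:pos_REF>ALT layout; n=1 limits each split to the first separator, and expand=True returns the parts as columns.

```python
import pandas as pd

# A few sample rows from the question's "mut" column
df = pd.DataFrame({"mut": ["1:23456_A>G", "2:28484_A>G", "4:47629_A>G"]})

# Stage 1: split on ":" -> chromosome and the remainder "23456_A>G"
chr_rest = df["mut"].str.split(":", n=1, expand=True)
# Stage 2: split the remainder on "_" -> start position and "A>G"
st_change = chr_rest[1].str.split("_", n=1, expand=True)
# Stage 3: split "A>G" on ">" -> reference and alternate allele
ref_alt = st_change[1].str.split(">", n=1, expand=True)

# Assemble the labeled result
df_new = pd.DataFrame({
    "chr": chr_rest[0],
    "st": st_change[0],
    "ref": ref_alt[0],
    "alt": ref_alt[1],
})
print(df_new)
```

This works, but three passes are not needed: a single regex split on any of the three separators produces the same four parts.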

Here is my attempt:

df_new["chr"], df_new["st"], df_new["ref"], df_new["alt"] = df.mut.str.split("[:_>]")

but I get the following error:

ValueError: too many values to unpack (expected 4)

A simple print statement shows what the split on the right-hand side returns:



0   [1, 23456, A, G]  
1   [2, 28484, A, G]
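A short sketch of why the unpacking fails, using the five sample rows from the question. Without expand=True, str.split returns a Series holding one list per row. Tuple unpacking iterates over that Series' elements (the rows), so the number of values equals the number of rows, not the number of parts. With the question's 30 rows, 30 values are unpacked into 4 targets.

```python
import pandas as pd

df = pd.DataFrame({"mut": ["1:23456_A>G", "2:28484_A>G", "4:47629_A>G",
                           "3:88382_A>G", "10:88273_A>G"]})

# Without expand=True the result is a Series of lists, one list per row.
# A multi-character pattern such as "[:_>]" is treated as a regex.
parts = df["mut"].str.split("[:_>]")
print(len(parts))     # one entry per row, not one per part
print(parts.iloc[0])  # ['1', '23456', 'A', 'G']

# Unpacking iterates over the rows, so 5 rows cannot fill 4 targets
message = None
try:
    chr_, st, ref, alt = parts
except ValueError as err:
    message = str(err)
print(message)
```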

Is there a way in pandas to create a new data frame by splitting the string field into 4 columns, with the column labels included?




Let's try .str.split(expand=True) with the same pattern, then label the resulting columns:
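A minimal sketch of that approach, assuming the same pattern as in the question. With expand=True the split returns a DataFrame with positional columns 0 to 3 instead of a Series of lists, so the four parts only need labels. On the five sample rows this produces the table that follows.

```python
import pandas as pd

# Sample rows taken from the question's "mut" column
df = pd.DataFrame({"mut": ["1:23456_A>G", "2:28484_A>G", "4:47629_A>G",
                           "3:88382_A>G", "10:88273_A>G"]})

# expand=True gives one column per part; "[:_>]" is a regex character
# class matching ":", "_" or ">"
df_new = df["mut"].str.split("[:_>]", expand=True)

# Replace the positional labels 0..3 with meaningful names
df_new.columns = ["chr", "st", "ref", "alt"]
print(df_new)
```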


 chr     st ref alt
0   1  23456   A   G
1   2  28484   A   G
2   4  47629   A   G
3   3  88382   A   G
4  10  88273   A   G
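An alternative not taken from the answer above: Series.str.extract with named regex groups, where the group names become the column labels directly. This sketch assumes every value follows the chr:pos_REF>ALT layout. Rows that do not match come back as NaN, and in that case the integer conversion of the start position below would fail.

```python
import pandas as pd

df = pd.DataFrame({"mut": ["1:23456_A>G", "2:28484_A>G", "10:88273_A>G"]})

# Each named group (?P<name>...) becomes a column of the result
pattern = r"(?P<chr>[^:]+):(?P<st>\d+)_(?P<ref>[^>]+)>(?P<alt>.+)"
df_new = df["mut"].str.extract(pattern)

# Extracted parts are strings; convert the start position if needed
df_new["st"] = df_new["st"].astype(int)
print(df_new)
```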