I have a square matrix like this:
ACSM3 ACSX12 ADXM28 ... UGT2B15 VCAN XK
ACSM3 1.000000 0.929347 0.999914 ... 0.986433 0.999947 -0.999680
ACSX12 0.929347 1.000000 0.924428 ... 0.977350 0.925496 -0.919704
ADXM28 0.999914 0.924428 1.000000 ... 0.984196 0.999996 -0.999925
ADAM28 0.999976 0.926774 0.999981 ... 0.985275 0.999994 -0.999831
ADH1B -0.999509 -0.917317 -0.999834 ... -0.980802 -0.999778 0.999982
ADTRP -0.999039 -0.912273 -0.999528 ... -0.978290 -0.999438 0.999828
AEBP1 0.983312 0.846668 0.985611 ... 0.940104 0.985133 -0.987601
AKR1B10 -0.999658 -0.919371 -0.999915 ... -0.981800 -0.999874 1.000000
UBL3 0.997347 0.900002 0.998215 ... 0.971864 0.998043 -0.998870
UGT2B15 0.986433 0.977350 0.984196 ... 1.000000 0.984690 -0.981961
VCAN 0.999947 0.925496 0.999996 ... 0.984690 1.000000 -0.999887
XK -0.999680 -0.919704 -0.999925 ... -0.981961 -0.999887 1.000000
After using the stack function I get the data into the shape I want, but as you can see every value appears twice, because each pair of genes is compared in both directions.
dfHealty = df_healtyWithGenes.stack().reset_index()
dfHealty.columns = ['gene1', 'gene2', 'score']
dfHealty = dfHealty[dfHealty.gene1 != dfHealty.gene2]
I could filter by score, but that is not a good idea: unrelated pairs could share the same score, so the data might break.
How can I filter by the gene columns instead?
gene1 gene2 score
EPB41L4B PGC 0.496713249
PGC EPB41L4B 0.496713249
CHGA MT1G 0.496751983
MT1G CHGA 0.496751983
AEBP1 FCER1G 0.497061368
FCER1G AEBP1 0.497061368
ADTRP CAPN9 0.497122603
CAPN9 ADTRP 0.497122603
FAM189A2 GLUL 0.49721763
GLUL FAM189A2 0.49721763
CA9 DUOX1 0.497233294
DUOX1 CA9 0.497233294
EDNRA MSLN 0.497267565
MSLN EDNRA 0.497267565
HRASLS2 LIPF 0.497581499
LIPF HRASLS2 0.497581499
EPB41L4B NEDD4L 0.497613643
NEDD4L EPB41L4B 0.497613643
I need to convert the data to look like this:
gene1 gene2 score
EPB41L4B PGC 0.496713249
CHGA MT1G 0.496751983
AEBP1 FCER1G 0.497061368
ADTRP CAPN9 0.497122603
FAM189A2 GLUL 0.49721763
CA9 DUOX1 0.497233294
EDNRA MSLN 0.497267565
Using the data given, you can remove the duplicate pairs like this:
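A minimal sketch of one way to do it, assuming the stacked frame has the columns gene1, gene2 and score from the question. The sample rows are copied from the question; the names df, pair_key and deduped are only illustrative. The idea is to sort the two gene names within each row, so that (A, B) and (B, A) collapse to the same key, and then keep the first row for each key.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question: each pair appears in both orders.
df = pd.DataFrame({
    "gene1": ["EPB41L4B", "PGC", "CHGA", "MT1G", "AEBP1", "FCER1G"],
    "gene2": ["PGC", "EPB41L4B", "MT1G", "CHGA", "FCER1G", "AEBP1"],
    "score": [0.496713249, 0.496713249, 0.496751983,
              0.496751983, 0.497061368, 0.497061368],
})

# Sort the two gene names within each row so that (A, B) and (B, A)
# produce the same order-independent key.
pair_key = pd.DataFrame(
    np.sort(df[["gene1", "gene2"]].to_numpy(), axis=1),
    index=df.index,
)

# Keep only the first occurrence of each unordered pair.
deduped = df[~pair_key.duplicated()].reset_index(drop=True)
print(deduped)
# Keeps three rows: EPB41L4B/PGC, CHGA/MT1G, AEBP1/FCER1G
```

Because the comparison is done on the gene names only, two different pairs that happen to share a score are both kept.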
Which produces output like this
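The mirrored pairs can also be avoided before stacking, by keeping only the part of the matrix strictly above the diagonal. This is a sketch under assumptions, not necessarily what the original answer did: corr is a hypothetical 3x3 stand-in for df_healtyWithGenes, filled with three values from the question's matrix, and it assumes the matrix is symmetric with identical row and column order.

```python
import numpy as np
import pandas as pd

# Hypothetical small symmetric matrix standing in for df_healtyWithGenes.
genes = ["ACSM3", "VCAN", "XK"]
corr = pd.DataFrame(
    [[1.000000, 0.999947, -0.999680],
     [0.999947, 1.000000, -0.999887],
     [-0.999680, -0.999887, 1.000000]],
    index=genes, columns=genes,
)

# True strictly above the diagonal (k=1): this drops the self-pairs
# and the mirrored lower half in one step.
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)

# where() turns the masked-out cells into NaN; dropna() removes them
# after stacking (explicit, so it does not rely on stack's NaN handling).
pairs = corr.where(upper).stack().dropna().reset_index()
pairs.columns = ["gene1", "gene2", "score"]
print(pairs)
# Keeps three rows: ACSM3/VCAN, ACSM3/XK, VCAN/XK
```

One caveat: any genuine NaN scores in the matrix would be dropped together with the masked cells, so this only fits when the matrix has no missing values.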