I am working with a pandas DataFrame in which some columns contain non-standard values. Is there a way to extract or replace characters and digits in a column? I am very new to applying regex patterns to clean data.
One column is Precise_Age and the second column is Browser.
In the Browser column I want only the name and the major version (if the version is 10.1.2, I want only 10): Android 10, Android 4, iOS 11, etc.
Browser desired_output
75.0.3770.143 | Chrome Dev | Android | 9 Android 9
78.0.3904.108 | Chrome Dev | Android | 9 Android 9
79.0.3945.93 | Chrome Dev | Android | 9 Android 9
79.0.3945.93 | Chrome Dev | Android | 8.0.0 Android 8
| | Android | 8.1.0 Android 8
79.0.3945.116 | Chrome Dev | Android | 10 Android 10
79.0.3945.93 | Chrome Dev | Android | 5.1 Android 5
| | Android | 10 Android 10
| Facebook | Android | 8.1.0 Android 8
79.0.3945.116 | Chrome Dev | Android | 4.4.4 Android 4
| | Android | 8.1.0 Android 8
79.0.3945.79 | Chrome Dev | Windows | 8 Windows 8
77.0.3865.116 | Chrome Dev | Android | 9 Android 9
88.1.284108841| Google Search | iOS | 13.3 iOS 13
In the Age column I want only standard values, with blanks, commas, etc. removed. If the age is greater than 100, the value should be set to missing.
Age desired_output
67 67
66 66
67.5 67
60대후반 60
1949ë…„ null
63세 63
83ë…„ìƒ 83
11세 11
7217861839 null
59 years 59
60세 60
73.87083774 73
54ë…„ìƒ 54
55세 55
327 null
37ë…„ìƒ 37
642 null
523 null
0.61 0
53세 53
42ë…„ìƒ 42
757575 null
91.98192554 91
1.11991 1
83세(만82세) 83
4324234 null
8827 null
11 Years 11
After splitting the Browser column using | as the separator, you can use map to transform the data the way you need, extracting or replacing characters and digits in each part. Join the last two columns to obtain the desired output for this frame. The same principle can be applied to the Age column, now using re.sub as the map function to keep only "the standard values".
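The approach described above could be sketched roughly as follows. This is only an illustration, not the original answer's code: the helper names clean_browser and clean_age and the max_age parameter are made up here, and the age helper assumes the number of interest always appears at the start of the value and that decimals are truncated, as in the sample data. The helpers are plain functions so they can be passed to map.

```python
import re


def clean_browser(value):
    """Return '<name> <major version>' from a pipe-separated string."""
    # Split on the separator and trim the whitespace around each field.
    parts = [p.strip() for p in str(value).split("|")]
    # The last two fields are expected to hold the name and the version.
    if len(parts) < 2 or not parts[-2] or not parts[-1]:
        return None
    name, version = parts[-2], parts[-1]
    major = version.split(".")[0]  # keep only the part before the first dot
    return f"{name} {major}"


def clean_age(value, max_age=100):
    """Return the leading whole number, or None if absent or above max_age."""
    # re.sub drops everything from the first non-digit character onward,
    # so decimals, units and trailing text are all removed.
    digits = re.sub(r"\D.*", "", str(value).strip(), flags=re.DOTALL)
    if not digits:
        return None
    age = int(digits)
    return age if age <= max_age else None


# A few rows taken from the question.
browsers = [
    "75.0.3770.143 | Chrome Dev | Android | 9",
    "| | Android | 8.1.0",
    "88.1.284108841| Google Search | iOS | 13.3",
]
ages = ["67.5", "60대후반", "59 years", "83세(만82세)", "327", "0.61"]

print(list(map(clean_browser, browsers)))
print(list(map(clean_age, ages)))

# With pandas, the same functions can be passed to Series.map
# (column names taken from the question):
# df["Browser_clean"] = df["Browser"].map(clean_browser)
# df["Age_clean"] = df["Precise_Age"].map(clean_age)
```

Values that do not start with a digit, or that exceed the limit, come back as None, which pandas treats as missing. If the real data can have the number in the middle of the text, the pattern would need adjusting.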