I have a dataset with NaNs. The count of missing values per column is:
County 0
City 0
State 0
Postal Code 0
Model Year 0
Make 0
Model 286
Electric Vehicle Type 0
Clean Alternative Fuel Vehicle (CAFV) Eligibility 0
Electric Range 0
Base MSRP 0
Legislative District 312
DOL Vehicle ID 0
Vehicle Location 0
Electric Utility 0
2020 Census Tract 0
dtype: int64
As shown, there are 286 missing Models and 312 missing Legislative Districts.
For the Models, I've already identified the 21 Makes that have missing values, listed in the array below:
array(['KIA', 'VOLVO', 'NISSAN', 'BMW', 'TESLA', 'HONDA', 'CHEVROLET',
'TOYOTA', 'FORD', 'VOLKSWAGEN', 'JEEP', 'HYUNDAI', 'LAND ROVER',
'AUDI', 'CHRYSLER', 'POLESTAR', 'SUBARU', 'CADILLAC', 'MITSUBISHI',
'FISKER', 'RIVIAN'], dtype=object)
What I'm trying to do is fill each missing Model with the mode (most frequent Model) of its Make.
Ex. Model (Mode) = NIRO WHEN Make = KIA
I've also retrieved a list of the Makes and Models with the Model column showing the Mode:
df1 = (df.groupby('Make')['Model']
         .apply(lambda x: x.mode().iat[0])
         .reset_index())
    Make                  Model
0   AUDI                  E-TRON
1   AZURE DYNAMICS        TRANSIT CONNECT ELECTRIC
2   BENTLEY               BENTAYGA
3   BMW                   X5
4   CADILLAC              ELR
5   CHEVROLET             BOLT EV
6   CHRYSLER              PACIFICA
7   FIAT                  500
8   FISKER                KARMA
9   FORD                  MUSTANG MACH-E
10  GENESIS               GV60
11  HONDA                 CLARITY
12  HYUNDAI               IONIQ 5
13  JAGUAR                I-PACE
14  JEEP                  WRANGLER
15  KIA                   NIRO
16  LAND ROVER            RANGE ROVER SPORT
17  LEXUS                 NX
18  LINCOLN               AVIATOR
19  LUCID                 AIR
20  MAZDA                 CX-90
21  MERCEDES-BENZ         GLC-CLASS
22  MINI                  HARDTOP
23  MITSUBISHI            OUTLANDER
24  NISSAN                LEAF
25  POLESTAR              PS2
26  PORSCHE               TAYCAN
27  RIVIAN                R1T
28  SMART                 FORTWO ELECTRIC DRIVE
29  SUBARU                SOLTERRA
30  TESLA                 MODEL 3
31  TH!NK                 CITY
32  TOYOTA                PRIUS PRIME
33  VOLKSWAGEN            ID.4
34  VOLVO                 XC90
35  WHEEGO ELECTRIC CARS  WHEEGO
How can I:
A) Replace each NaN in Model with the mode from the list above, based on the row's Make?
B) Do the mode calculation, the lookup list, and the NaN replacement in a single step instead of splitting the work, if that is possible?
Thanks a lot, any input will be valuable.
Expecting to replace NaNs with the Model Modes according to their Make.
I would say the easiest way to do this is to build a mask of the rows where Model is NaN and then fill only those rows from a dictionary that maps each Make to its mode.
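A minimal sketch of that idea, using a small made-up frame in place of your real data (the toy rows and the names `raw`, `mode_by_make`, `missing` and `one_step` are just illustrative assumptions). The first part covers A with the mask and dictionary; the second part is one possible answer to B, doing the same thing in a single groupby/transform call. Both assume every Make has at least one non-missing Model, otherwise `mode()` comes back empty and `.iat[0]` raises.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real dataset.
raw = pd.DataFrame({
    "Make": ["KIA", "KIA", "KIA", "KIA", "TESLA", "TESLA", "TESLA"],
    "Model": ["NIRO", "NIRO", "EV6", np.nan, "MODEL 3", np.nan, "MODEL 3"],
})

# --- A) mask + dictionary ---
df = raw.copy()

# Dictionary of Make -> most frequent Model (mode() ignores NaN by default).
mode_by_make = (
    df.groupby("Make")["Model"]
      .agg(lambda s: s.mode().iat[0])
      .to_dict()
)

# Boolean mask of the rows whose Model is missing.
missing = df["Model"].isna()

# Fill only the masked rows by looking up each row's Make in the dictionary.
df.loc[missing, "Model"] = df.loc[missing, "Make"].map(mode_by_make)

# --- B) the same result in one step ---
one_step = raw.copy()

# Within each Make group, fill NaN with that group's mode.
one_step["Model"] = (
    one_step.groupby("Make")["Model"]
            .transform(lambda s: s.fillna(s.mode().iat[0]))
)

print(df)
print(one_step)
```

If two Models tie for most frequent within a Make, `mode()` returns them sorted and `.iat[0]` picks the first, so the choice is alphabetical in that case.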
Hope this helps!