I have cleaned my data down to the 6 columns shown below, which are my input data. I split the dataset so that the label is in a separate variable, Y.
My main problem: I don't know how to preprocess the data so that it is a good input for a model.
My dataset X looks like this:
| desc | tipo | address | region | latitude | longitude |
|---|---|---|---|---|---|
| Galpón | Industria | Subdivisión de la Finca Denominada Violeta S/N | Región de Arica y Parinacota-Arica | -19.423411 | -11.371551 |
- desc - string
- tipo - string
- address - string
- region - string
- latitude - string
- longitude - string
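
Since latitude and longitude are stored as strings, one possible first step is to convert them to floats so they can be treated as numbers rather than categories. This is only a minimal sketch, assuming the values are plain decimal strings like those in the sample row; the one-row `df` below is built purely for illustration and is not my real dataset.

```python
import pandas as pd

# Illustrative one-row frame mirroring the sample row above (not the real dataset).
df = pd.DataFrame({
    "desc": ["Galpón"],
    "tipo": ["Industria"],
    "address": ["Subdivisión de la Finca Denominada Violeta S/N"],
    "region": ["Región de Arica y Parinacota-Arica"],
    "latitude": ["-19.423411"],
    "longitude": ["-11.371551"],
})

# Convert the coordinate strings to floats; values that fail to parse become NaN.
for col in ["latitude", "longitude"]:
    df[col] = pd.to_numeric(df[col], errors="coerce")

print(df.dtypes)
```

With `errors="coerce"`, malformed entries turn into `NaN` instead of raising, so they can be inspected or imputed afterwards.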
My dataset Y looks like this:
| CIO |
|---|
| 169379 |
What I tried
I followed this tutorial, which helped me understand tabular data a little, but its data is completely different from mine and I don't know whether the approach fits my case. My code label-encodes every column with LabelEncoder, but that obviously doesn't make sense for latitude and longitude.
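
    from sklearn.preprocessing import LabelEncoder

    # Fill missing values, then label-encode every column
    # (this includes latitude and longitude)
    for col in df.columns:
        if df.dtypes[col] == "object":
            df[col] = df[col].fillna("NA")
        else:
            df[col] = df[col].fillna(0)
        df[col] = LabelEncoder().fit_transform(df[col])

    # Mark every column as categorical
    for col in df.columns:
        df[col] = df[col].astype('category')
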
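
For comparison, here is a rough sketch of what treating the two kinds of columns differently could look like: the short categorical columns are one-hot encoded, while latitude and longitude are converted to floats and scaled. It uses scikit-learn's `ColumnTransformer`, which is not what the tutorial does. The three-row frame holds made-up values, and `address` is left out because it is free text that would need separate handling.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Small made-up frame with the same column names (values are illustrative only).
df = pd.DataFrame({
    "desc": ["Galpón", "Oficina", "Galpón"],
    "tipo": ["Industria", "Comercio", "Industria"],
    "region": ["Región A", "Región B", "Región A"],
    "latitude": ["-19.42", "-33.45", "-18.47"],
    "longitude": ["-70.31", "-70.66", "-70.30"],
})

categorical_cols = ["desc", "tipo", "region"]
numeric_cols = ["latitude", "longitude"]

# Coordinates become floats so they are scaled instead of label-encoded.
df[numeric_cols] = df[numeric_cols].apply(pd.to_numeric, errors="coerce")
df[categorical_cols] = df[categorical_cols].fillna("NA")

preprocess = ColumnTransformer([
    # One indicator column per category; unseen categories are ignored at predict time.
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    # Zero mean, unit variance for the coordinates.
    ("num", StandardScaler(), numeric_cols),
])

X = preprocess.fit_transform(df)
print(X.shape)
```

Whether one-hot encoding is the right choice for the categorical columns depends on how many distinct values they have in the real data, so this is just a starting point.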
The author also used categorical embeddings, and I don't know whether they would work properly with my kind of data.
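
As far as I understand it, a categorical embedding is a lookup table: each integer category code selects one row of a matrix, and that matrix is learned during training. The NumPy sketch below only illustrates the lookup. The codes, sizes, and random table are hypothetical, and in a real model the table would be a trainable layer. This would seem to apply to columns like `desc`, `tipo`, and `region`, but not to continuous values like latitude and longitude.

```python
import numpy as np

# Hypothetical integer codes for a categorical column such as `tipo`
# (for example, the output of LabelEncoder).
codes = np.array([0, 2, 1, 0])

n_categories = 3   # number of distinct categories in the column
embedding_dim = 4  # size of each vector (a free design choice)

# In a real model this table would be a trainable parameter; here it is
# randomly initialised just to show the lookup.
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(n_categories, embedding_dim))

# Embedding a column is a row lookup: one vector per input code.
embedded = embedding_table[codes]
print(embedded.shape)  # (4, 4)
```

Rows with the same code (the first and last here) get the same vector, which is what lets the model share what it learns about a category across all rows that contain it.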