Categorizing a large dataset into different classes

Question

45 views · Asked by orangecar on 17 January 2024 at 16:43

I have a dataset containing the diameters of lunar craters. I need to group them into different categories (using Python).

The column names in my data are ID, latitude, longitude, diameter and depth. Values are separated by spaces.

For example, a crater with a diameter of 980 m goes into the class of craters whose diameter is less than 1 km (let us call that category SET1). Similarly, a crater with a diameter of 40 km goes into the category of craters whose diameter is less than 50 km but greater than 30 km (let us call it SETX). I need to create such categories and classify all the craters into them.

I also need to count the number of craters in each category. Note that there are almost 0.8 million craters in my data.

I need ideas or a solution for how I can solve the above problem.

There is 1 answer

**AKX** · Answer 1 · 2024-01-17T16:49:06+00:00

0.8 million craters isn't that much; a table of that size fits comfortably in memory.

Since I don't know your data format exactly, this isn't guaranteed to work out of the box, but the basic idea is simply to read the data, bin it using pd.cut, and print the value counts.

import pandas as pd

# Read text file into dataframe
df = pd.read_csv(
    "craters.txt",
    sep=" ",
    header=None,
    names=["ID", "latitude", "longitude", "diameter", "depth"],
)

# Create new category column based on bins and names (change as desired)
df["category"] = pd.cut(
    df["diameter"],
    bins=[0, 1, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
    labels=["SET1", "SET2", "SET3", "SET4", "SET5", "SET6", "SET7", "SET8", "SET9", "SET10", "SET11"],
)

# Print the counts.
print(df["category"].value_counts())
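A few edge cases may be worth handling, depending on the data. By default pd.cut uses right-closed bins, so a crater of exactly 1 km lands in SET1 even though the question says "less than 1 km", and any diameter above the last edge becomes NaN and is left out of the counts. The sketch below is one way to deal with both. It assumes the diameters are stored in km, and the inline sample rows, bin edges and labels are made up for illustration rather than taken from the real file.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample standing in for craters.txt (diameter assumed to be in km).
sample = io.StringIO(
    "1 10.5 20.1 0.98 0.10\n"
    "2 -5.2 33.0 40.0 2.10\n"
    "3 12.0 -7.4 45.0 2.50\n"
    "4 0.3 88.8 250.0 4.00\n"
)

df = pd.read_csv(
    sample,
    sep=" ",
    header=None,
    names=["ID", "latitude", "longitude", "diameter", "depth"],
)

# Illustrative bin edges in km; the final np.inf edge catches every larger crater.
edges = [0, 1, 10, 30, 50, np.inf]
labels = ["SET1", "SET2", "SET3", "SET4", "SET5"]

# right=False makes each bin half-open, [low, high), so "less than 1 km" is exact.
df["category"] = pd.cut(df["diameter"], bins=edges, labels=labels, right=False)

# sort=False keeps the counts in bin order instead of sorting by frequency.
counts = df["category"].value_counts(sort=False)
print(counts)
```

If the real file starts with a header row, dropping header=None and names from the read_csv call (or passing header=0) should stop that row from being read as data.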
