R get all categories in column

I have a large dataset (a data frame) and I want to find the number and the names of the categories in one of its columns.

For example, suppose my df looks like this:

 A    B
 1    car
 2    car
 3    bus
 4    car
 5    plane
 6    plane
 7    plane
 8    plane
 9    plane
 10   train

I would like to get:

  car
  bus
  plane
  train
  4

How would I do that?

There are 6 answers

0
CCD (best answer)
# Distinct values in the column, in order of first appearance
categories <- unique(yourDataFrame$yourColumn)
# Number of distinct values
numberOfCategories <- length(categories)

Pretty painless.
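
As a minimal sketch, here is the same approach applied to the example data from the question. The data frame is rebuilt by hand, so the exact construction (and the choice to keep B as a character column) is an assumption:

```r
# Rebuild the example data from the question (columns A and B as shown there)
df <- data.frame(
  A = 1:10,
  B = c("car", "car", "bus", "car", "plane", "plane",
        "plane", "plane", "plane", "train"),
  stringsAsFactors = FALSE
)

# Distinct values, in order of first appearance
categories <- unique(df$B)
# Number of distinct values
numberOfCategories <- length(categories)

print(categories)
# [1] "car"   "bus"   "plane" "train"
print(numberOfCategories)
# [1] 4
```

Note that unique() keeps the order in which values first appear, so wrap it in sort() if you want the names alphabetically.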

1
sconfluentus

You can simply use unique:

x <- unique(df$B)

This extracts the unique values in the column. To get them for every column at once, combine it with lapply (or apply), for example lapply(df, unique).
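
A rough sketch of the per-column version, assuming the example data from the question. lapply is used here rather than apply because apply first converts the data frame to a matrix, which turns every column into character when the types are mixed. The names uniquePerColumn and countPerColumn are just illustrative:

```r
# Example data from the question, rebuilt by hand
df <- data.frame(
  A = 1:10,
  B = c("car", "car", "bus", "car", "plane", "plane",
        "plane", "plane", "plane", "train"),
  stringsAsFactors = FALSE
)

# Named list with the unique values of each column
uniquePerColumn <- lapply(df, unique)

# Named vector with the number of unique values in each column
countPerColumn <- sapply(df, function(col) length(unique(col)))

print(uniquePerColumn$B)
print(countPerColumn)
```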

0
AudioBubble

This gives the unique values, the number of unique values, and the frequency of each:

table(df$B)
  bus   car plane train 
    1     3     5     1 

length(table(df$B))
[1] 4

1
Rich Scriven

I would recommend you use factors here, if you are not already. It's straightforward and simple.

levels() gives the unique categories and nlevels() gives the number of them. If we run droplevels() on the data first, we take care of any levels that may no longer be in the data.

with(droplevels(df), list(levels = levels(B), nlevels = nlevels(B)))
# $levels
# [1] "bus"   "car"   "plane" "train"
#
# $nlevels
# [1] 4
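
To illustrate why droplevels() matters, here is a small sketch under a couple of assumptions: B is created as a factor explicitly (since R 4.0.0, data.frame() and read.csv() no longer convert strings to factors by default, and levels() returns NULL for a character column), and the subset called noTrain is made up for the example:

```r
# Example data from the question, with B stored explicitly as a factor
df <- data.frame(
  A = 1:10,
  B = factor(c("car", "car", "bus", "car", "plane", "plane",
               "plane", "plane", "plane", "train"))
)

# Hypothetical subset that no longer contains any "train" rows
noTrain <- df[df$B != "train", ]

# The unused level is still attached to the factor
before <- nlevels(noTrain$B)

# droplevels() removes levels that no longer occur in the data
noTrain <- droplevels(noTrain)
after <- nlevels(noTrain$B)

print(before)            # 4, "train" is still a level
print(after)             # 3
print(levels(noTrain$B)) # "bus" "car" "plane"
```

Factor levels come back in sorted order by default, unlike unique(), which keeps the order of first appearance.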

0
Rachael_Adl

First, make sure your column has the right data type. R has most likely read it in as 'chr', which you can check with 'str(df)'. For the example data you have provided, you will want to convert it to a 'factor':

df$column <- as.factor(df$column)

Once the column is a factor, you can use 'levels(df$column)' to list the levels present in the dataset.

0
V C

Additionally, to see the categories sorted by how often they occur, you can use the following:

sort(table(df$B), decreasing = TRUE)

This lists the categories with their counts in decreasing order of frequency.
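
A short sketch of how the sorted table can also give back just the category names, assuming the example data from the question (the variable names counts and byFrequency are illustrative):

```r
# Example data from the question, rebuilt by hand
df <- data.frame(
  A = 1:10,
  B = c("car", "car", "bus", "car", "plane", "plane",
        "plane", "plane", "plane", "train"),
  stringsAsFactors = FALSE
)

# Frequency of each category, most frequent first
counts <- sort(table(df$B), decreasing = TRUE)

# Category names in the same order, and the number of categories
byFrequency <- names(counts)
numberOfCategories <- length(counts)

print(counts)
print(byFrequency)
print(numberOfCategories)
```

Here "plane" (5 rows) comes first and "car" (3 rows) second; "bus" and "train" are tied at one row each.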