(R) 'factor' function discards specified levels in vector


First post on StackExchange, please forgive me if I format incorrectly!

If I specify the levels of a vector v in R and then call factor(v), not all of the levels show up. I'm trying to figure out why this is the case, because I need to see all levels (including "empty" levels) when I call factor for a project that I am working on.

A very simple replication of this:

x <- c('a', 'a', 'b', 'b', 'c', 'c')
levels(x) <- c('a', 'b', 'c', 'd')

Now if we call levels(x), it will output exactly what you'd expect:

> levels(x)
[1] "a" "b" "c" "d"

However, the levels change when calling factor(x):

> factor(x)
[1] a a b b c c
Levels: a b c

What happened to the 'd' level that I introduced? I know there is no data point associated with this level, but I don't see why the level should get removed when I call 'factor'. Unfortunately, I need to be able to reference all levels when I call 'factor', so is there any way to work around this?


There are 2 answers

1
Answer by Gregor Thomas

When you first create x, its class is character. When you assign it levels, it gains a levels attribute, but it is still character class, not a factor:

x <- c('a', 'a', 'b', 'b', 'c', 'c')
levels(x) <- c('a', 'b', 'c', 'd')
class(x)
# [1] "character"
str(x)
# chr [1:6] "a" "a" "b" "b" "c" "c"
#  - attr(*, "levels")= chr [1:4] "a" "b" "c" "d"

When you call factor() on an object, it is converted to factor class. The ?factor documentation describes the levels argument and its default as follows:

levels

an optional vector of the unique values (as character strings) that x might have taken. The default is the unique set of values taken by as.character(x), sorted into increasing order of x. Note that this set can be specified as smaller than sort(unique(x)).

Any existing levels attribute is ignored:

y = factor(x)
str(y)
# Factor w/ 3 levels "a","b","c": 1 1 2 2 3 3
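
To make the documented default concrete, here is a small sketch that rebuilds the default levels by hand and compares them with what factor() produces. The variable name default_levels is only for illustration.

```r
x <- c('a', 'a', 'b', 'b', 'c', 'c')
levels(x) <- c('a', 'b', 'c', 'd')

# The default described in ?factor: sorted unique values of as.character(x)
default_levels <- sort(unique(as.character(x)))
default_levels
# [1] "a" "b" "c"

# factor() arrives at the same set; the levels attribute on x is not consulted
identical(levels(factor(x)), default_levels)
# [1] TRUE

# The attribute is still present on x itself
attr(x, "levels")
# [1] "a" "b" "c" "d"
```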

Even if we start with a factor class object, calling factor on it "re-factors" it with the default levels--which are only the values that are present:

z = factor(c('a', 'a', 'b', 'b', 'c', 'c'))
levels(z) <- c('a', 'b', 'c', 'd')

str(z)
# Factor w/ 4 levels "a","b","c","d": 1 1 2 2 3 3

z = factor(z)
str(z)
# Factor w/ 3 levels "a","b","c": 1 1 2 2 3 3

As for workarounds:

  • Don't call factor on things that are already factors unless you want to change the levels. It's not clear why you need to do this. Use is.factor() to test if your object is a factor or not, and only call factor() on it if it isn't already.

  • If you really have to call factor on a factor object (or even a character object with a levels attribute) and want to preserve its levels, specify its old levels in the levels argument, e.g., x = factor(x, levels = levels(x)). Note that this won't work on an object without a levels attribute. As above, you probably want to use is.factor() to test your input and act accordingly.
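
The two workarounds above could be combined into one small helper. This is only a sketch: to_factor_keep_levels is a hypothetical name, not a base R function, and it assumes that an input with no levels attribute should simply get the default levels.

```r
# Hypothetical helper (not part of base R): convert to factor without
# discarding levels that are already declared on the input.
to_factor_keep_levels <- function(v) {
  if (is.factor(v)) {
    return(v)                     # already a factor: leave its levels untouched
  }
  declared <- levels(v)           # NULL when no levels attribute is set
  if (is.null(declared)) {
    return(factor(v))             # nothing declared: fall back to default levels
  }
  factor(v, levels = declared)    # reuse the declared levels, empty ones included
}

x <- c('a', 'a', 'b', 'b', 'c', 'c')
levels(x) <- c('a', 'b', 'c', 'd')

levels(to_factor_keep_levels(x))
# [1] "a" "b" "c" "d"

levels(to_factor_keep_levels(c('a', 'b')))
# [1] "a" "b"
```

Checking is.null(levels(v)) first avoids passing levels = NULL to factor() for inputs that never had levels declared.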

0
Answer by Elin

levels is an argument of the factor() function, so you can set the full set of levels when you create the factor:

z <- factor(c('a', 'a', 'b', 'b', 'c', 'c'), levels=c('a', 'b', 'c', 'd'))

If you don't explicitly set the levels, it will create them from the actual values.
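
As a quick check that the empty level really is kept, the sketch below builds the same factor and tabulates it. The variable name counts is only for illustration.

```r
z <- factor(c('a', 'a', 'b', 'b', 'c', 'c'), levels = c('a', 'b', 'c', 'd'))

levels(z)
# [1] "a" "b" "c" "d"

# table() reports every level, so the unused level 'd' appears with a count of 0
counts <- table(z)
counts[["d"]]
# [1] 0
```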