Sum NA values in R


I am working with a data frame that has many NA values, so I want to sort the attributes (columns) by their number of NA values. I tried using a for loop, and this is what I have so far:

> data <- read.csv("C:/Users/Nikita/Desktop/first1k.csv")
> for (i in 1:length(data) ) {
+ temp <- c(sum(is.na(data[i])))}
> temp
[1] 0

This is the first time I have used a for loop in R, so I am sure it is just a silly syntax problem, but I can't work out what exactly.

Ultimately, I need a list that shows the name of the attribute and its NA count. This way I could sort the list and get the desired information. Here is some mock data to make it easier.

data <- data.frame(A = c(500, 600, 700, 1000),
                   B = c(500, 600, 700, NA),
                   C = c(NA, NA, 500, 700),
                   D = c(800, NA, 933, NA),
                   E = c(NA, NA, NA, NA))

Edit: Thank you all for the help. All three solutions worked for me. I do wonder, though, whether there is a one-line way to sort those attributes before I export them to a file. As I mentioned before, I am quite new to R, so I am not sure if it is possible.

Edit 2: When I run the sort, it gives me the following error:

temp <- sort(temp)
Error in sort.int(x, na.last = na.last, decreasing = decreasing, ...) : 
  'x' must be atomic

Any idea why?

There are 3 answers

7
Eli Korvigo On BEST ANSWER

The idiomatic way to write iterative code in R is to avoid explicit for loops and use the apply family (apply, sapply, lapply and friends) or vectorized functions instead. @jeremycg gave you the idiomatic R answer. As for your own code, it needs a couple of changes to work:

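temp <- integer(0)                  # create the result vector once, outside the loop
for (i in seq_along(data)) {
    # store each count under its column name instead of overwriting temp
    temp[names(data)[i]] <- sum(is.na(data[[i]]))
}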

Your loop overwrote temp on every iteration, and it never stored the names of your variables in temp. Hence the output you saw is the number of NAs in the last column of your dataset only.
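
As a rough sketch of the apply-family approach mentioned above, sapply() can replace the loop entirely. This uses the mock data from the question; the name na_counts is only illustrative.

```r
# Mock data from the question
data <- data.frame(A = c(500, 600, 700, 1000),
                   B = c(500, 600, 700, NA),
                   C = c(NA, NA, 500, 700),
                   D = c(800, NA, 933, NA),
                   E = c(NA, NA, NA, NA))

# sapply() visits each column and simplifies the results
# into a named vector of NA counts
na_counts <- sapply(data, function(col) sum(is.na(col)))
na_counts
# A B C D E
# 0 1 2 2 4
```

Because the result is a plain named vector, it can be passed straight to sort().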

Regarding the OP's edit, sorting the result:

temp <- sort(temp) # pass decreasing = TRUE if you want
                   # the reversed (largest first) order
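
The "'x' must be atomic" error from Edit 2 is what sort() reports when it is given a list rather than an atomic vector. The question does not show how temp was built at that point, so the following is only a guess at the cause: it assumes temp was a list (for example, the result of lapply()), and the names temp_list and temp_sorted are hypothetical.

```r
# Mock data from the question
data <- data.frame(A = c(500, 600, 700, 1000),
                   B = c(500, 600, 700, NA),
                   C = c(NA, NA, 500, 700),
                   D = c(800, NA, 933, NA),
                   E = c(NA, NA, NA, NA))

# Assumed cause: lapply() returns a list, which is not atomic
temp_list <- lapply(data, function(col) sum(is.na(col)))
is.atomic(temp_list)   # FALSE, so sort(temp_list) would raise the error

# unlist() flattens the list into a named atomic vector that sort() accepts
temp_sorted <- sort(unlist(temp_list), decreasing = TRUE)
temp_sorted            # E comes first with 4 NAs, A last with 0
```

If temp is already a named vector, as in the loop above, sort(temp) should work without the unlist() step.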

6
jeremycg On

Here is a quick answer using is.na and colSums:

colSums(is.na(data))

returning:

 A B C D E 
 0 1 2 2 4 

for your above data.
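
For the follow-up about sorting in one line before exporting, colSums() and sort() can be chained. The sketch below is one possible way to do it, not the only one; the names na_sorted, na_table and out_file are made up for illustration, and tempfile() merely stands in for whatever output path you actually want.

```r
# Mock data from the question
data <- data.frame(A = c(500, 600, 700, 1000),
                   B = c(500, 600, 700, NA),
                   C = c(NA, NA, 500, 700),
                   D = c(800, NA, 933, NA),
                   E = c(NA, NA, NA, NA))

# Count NAs per column and sort in a single expression, largest count first
na_sorted <- sort(colSums(is.na(data)), decreasing = TRUE)

# Reshape into two columns (attribute name, NA count) for export
na_table <- data.frame(attribute = names(na_sorted),
                       na_count  = unname(na_sorted))

# Hypothetical output location; replace with a real path as needed
out_file <- tempfile(fileext = ".csv")
write.csv(na_table, out_file, row.names = FALSE)
```

Since colSums() returns an atomic named vector, sort() accepts it directly.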

Thanks to @akrun for pointing out that my original apply call was superfluous.

2
blakeoft On

This answer shows how to make the for loop work.

temp <- integer(ncol(data))   # preallocate one count per column

for (i in seq_along(data)) {
   temp[i] <- sum(is.na(data[, i]))   # NA count for column i
}

names(temp) <- colnames(data)   # label each count with its column name

temp
# A B C D E 
# 0 1 2 2 4