I am trying to create a correlation matrix of the variables in the IMDB movie prediction dataset from Kaggle. When I plot the correlation matrix, some of the cells show question marks:

[Image: correlation matrix plot]

All the variables are numeric. How should I interpret the question marks?

library(corrplot)

# Keep only the numeric columns of df
numeric_col <- sapply(df, is.numeric)
movie_numeric <- df[, numeric_col]
# Compute and plot the correlation matrix
Correlation <- cor(movie_numeric)
corrplot(Correlation)

1 Answer

Karolis Koncevičius

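As @neilfws said in his comment, the question marks represent NA values. By default, cor() returns NA for any pair of variables in which at least one variable contains a missing value, so a single NA in a column is enough to blank out its correlations with every other column.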
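To check where the NA values come from, you can count the missing values per column and look at what cor() returns by default. The sketch below uses a small made-up data frame (toy, with hypothetical columns budget, gross and score) in place of movie_numeric, so treat it as an illustration rather than the actual dataset:

```r
# Toy data standing in for movie_numeric; the column names are hypothetical.
toy <- data.frame(
  budget = c(10, 20, NA, 40, 50),
  gross  = c(12, 25, 31, 38, 55),
  score  = c(6.1, 7.0, 6.5, 8.2, 7.9)
)

# Count the missing values in each column.
na_counts <- colSums(is.na(toy))
print(na_counts)

# With the default use = "everything", every off-diagonal entry that
# involves a column containing an NA is itself NA.
default_cor <- cor(toy)
print(default_cor)
```

Running colSums(is.na(movie_numeric)) on your own data should show which columns are responsible for the question marks.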

You can try to avoid having NA values by using only pairwise-complete observations when computing the correlation matrix:

Correlation <- cor(movie_numeric, use="pairwise.complete.obs")
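As a rough illustration of what this option changes, the sketch below compares use = "pairwise.complete.obs" with use = "complete.obs" on a small made-up data frame (toy and its column names are hypothetical, not taken from the IMDB data):

```r
# Toy data standing in for movie_numeric; the column names are hypothetical.
toy <- data.frame(
  budget = c(10, 20, NA, 40, 50),
  gross  = c(12, 25, 31, 38, 55),
  score  = c(6.1, 7.0, 6.5, 8.2, 7.9)
)

# Pairwise: each coefficient uses every row where both variables are present.
pairwise_cor <- cor(toy, use = "pairwise.complete.obs")

# Complete: rows with an NA in any column are dropped before computing anything.
complete_cor <- cor(toy, use = "complete.obs")

print(pairwise_cor)
print(complete_cor)
print(anyNA(pairwise_cor))
```

Note that with pairwise deletion each coefficient may be based on a different subset of rows, and a column that is entirely NA or has zero variance will still produce NA. If some NA entries remain, corrplot() has an na.label argument that controls how they are drawn.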