I recently built a simple R script to summarize three different data frames. Since updating to the newest versions of R and RStudio, I get output I haven't seen before when I use dplyr's summarise() on one of the data frames (the other two are fine). I also receive a series of unfamiliar warnings. Before updating, I ran the script exactly as written with no issues for any of the data frames.
The data frame with the problem is called VO2 and is set up as follows:
Name Sex VO2
AthleteA M 50
AthleteA M 52
AthleteA M NA
AthleteB M 49
AthleteB M 56
AthleteB M 47
AthleteC M 42
AthleteC M NA
AthleteC M 41
AthleteD M NA
AthleteD M NA
AthleteD M NA
The code I run is:
library(dplyr)
Test.Summary.VO2 = VO2 %>% group_by(Name, Sex) %>%
  summarise(Best.Score = max(VO2, na.rm=TRUE))
This code generates the following summary:
Name Sex Best.Score
AthleteA M 52
AthleteB M 56
AthleteC M 42
AthleteD M -Inf
The -Inf value is new in the output. I cannot figure out why it now appears for groups where every value is NA.
As mentioned above, I have a second data frame with exactly the same layout and run the same summary on it. There everything works fine: with na.rm=TRUE, the NA cases are removed and no -Inf values appear.
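A minimal sketch that reproduces this, assuming the data frame looks exactly like the table above (the reconstruction is hypothetical, and the `.groups = "drop"` argument needs dplyr 1.0.0 or later). The key behavior is in base R: once `na.rm = TRUE` has dropped every value in a group, `max()` is left with an empty set, and it returns `-Inf` with a warning rather than `NA`.

```r
library(dplyr)

# Hypothetical reconstruction of the VO2 table shown in the question
VO2 <- data.frame(
  Name = rep(c("AthleteA", "AthleteB", "AthleteC", "AthleteD"), each = 3),
  Sex  = "M",
  VO2  = c(50, 52, NA, 49, 56, 47, 42, NA, 41, NA, NA, NA)
)

# Base R: removing all NAs leaves an empty set, so max() returns -Inf
# and warns "no non-missing arguments to max; returning -Inf"
max(c(NA_real_, NA_real_, NA_real_), na.rm = TRUE)

# The same thing happens per group inside summarise(): AthleteD has no
# non-NA values, so its Best.Score is -Inf
Test.Summary.VO2 <- VO2 %>%
  group_by(Name, Sex) %>%
  summarise(Best.Score = max(VO2, na.rm = TRUE), .groups = "drop")

Test.Summary.VO2
```

Groups with at least one non-NA value (AthleteA to AthleteC here) are unaffected, so a data frame with no all-NA group would show no -Inf at all.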
Where this gets a bit more unusual is that when I view the data frame using:
View(Test.Summary.VO2)
I receive the following series of warning messages:
There were 38 warnings (use warnings() to see them)
warnings()
Warning messages:
1: Unknown or uninitialised column: 'Quad'.
2: Unknown or uninitialised column: 'Quad'.
3: Unknown or uninitialised column: 'Quad'.
4: Unknown or uninitialised column: 'Quad'.
Later in the script I create a new variable called "Quad". But the warning above appears even after I clear the environment and restart RStudio. I have even tried renaming the .csv file and importing it under a different data frame name. It's almost as if the 'Quad' column generated later in the script is lingering somewhere in the environment.
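For context on the second warning: "Unknown or uninitialised column" is the warning the tibble package emits when code reads a column that does not exist using `$`, and summarise() returns a tibble. The sketch below, built on a hypothetical `summary_tbl`, only shows where the message text comes from. It is an assumption, not something established in the question, that something in the session (for example a line such as `Test.Summary.VO2$Quad`, or code RStudio evaluates while displaying the object) is reading `Quad` before it exists.

```r
library(tibble)

# Hypothetical summary table that has no Quad column yet
summary_tbl <- tibble(Name = c("AthleteA", "AthleteB"), Best.Score = c(52, 56))

# Reading a missing column with `$` on a tibble returns NULL and
# emits an "Unknown or uninitialised column" warning
quad <- summary_tbl$Quad
is.null(quad)
```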
I am really at a loss as to what might be happening here.
I hope one of the R experts here can suggest how to remedy this issue.
Thanks for your consideration.
See ?max: you don't have any non-NA values for group D, so max() returns the value for an empty set, which is -Inf.
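If NA is the preferred result for an athlete with no recorded scores, one option is a small wrapper that checks for an all-NA group before calling max(). `max_or_na()` is a hypothetical helper name, and the data frame is the same assumed reconstruction of the question's table.

```r
library(dplyr)

# Hypothetical helper: NA for an all-NA group, otherwise the usual maximum
max_or_na <- function(x) {
  if (all(is.na(x))) NA_real_ else max(x, na.rm = TRUE)
}

# Hypothetical reconstruction of the VO2 table from the question
VO2 <- data.frame(
  Name = rep(c("AthleteA", "AthleteB", "AthleteC", "AthleteD"), each = 3),
  Sex  = "M",
  VO2  = c(50, 52, NA, 49, 56, 47, 42, NA, 41, NA, NA, NA)
)

# max() is never called on an empty set, so no -Inf and no warning
Test.Summary.VO2 <- VO2 %>%
  group_by(Name, Sex) %>%
  summarise(Best.Score = max_or_na(VO2), .groups = "drop")

Test.Summary.VO2
```

Another option is to keep the original summary and convert afterwards with `mutate(Best.Score = na_if(Best.Score, -Inf))`, although max() will still warn about the empty groups in that case.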