Thanks to a lot of people, I have my charts working despite being new to R.
I have three charts: a plot in the default (unordered) station order, a plot ordered by frequency, and a plot with a Pareto overlay.
If you look closely at the Pareto plot, you can see the scaled, ordered frequency chart at the bottom.
```{r}
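library(dplyr)

# Keep rows with a known end station; filter() also drops rows where the test is NA
df <- filter(df_clean_distances, end_station_name != "NA")

# Count rows per end station
d <- df %>%
  select(end_station_name) %>%
  group_by(end_station_name) %>%
  summarize(freq = n())

head(d$freq)
dput(head(d))

# Sort stations by descending frequency
d2 <- d[order(-d$freq), ]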
d2
```
plot random
```{r}
ggplot(d2, aes( x=end_station_name, y= freq)) +
geom_bar( stat = "identity") +
theme( axis.text.x = element_blank()) +
ylim( c(0,40000))
```
plot freq ordered
```{r}
ggplot(d2, aes( x=reorder(end_station_name,-freq), y= freq)) +
geom_bar( stat = "identity") +
theme(axis.text.x = element_blank()) +
ylim( c(0,40000))+
labs( title = "end station by freq", x = "Station Name")
```
Plot with Pareto overlay
```{r}
ggplot(d2, aes( x=reorder(end_station_name,-freq), y= freq)) +
geom_bar( stat = "identity") + theme(axis.text.x = element_blank()) +
ggQC::stat_pareto( point.color = "red", point.size = 0.5) +
labs( title = "end station by freq", x = "Station Name")
```
dput(head) output
```{r}
> dput(head(d, n=20))
structure(list(end_station_name = c("2112 W Peterson Ave", "63rd St Beach",
"900 W Harrison St", "Aberdeen St & Jackson Blvd", "Aberdeen St & Monroe St",
"Aberdeen St & Randolph St", "Ada St & 113th St", "Ada St & Washington Blvd",
"Adler Planetarium", "Albany Ave & 26th St", "Albany Ave & Bloomingdale Ave",
"Albany Ave & Montrose Ave", "Archer (Damen) Ave & 37th St",
"Artesian Ave & Hubbard St", "Ashland Ave & 13th St", "Ashland Ave & 50th St",
"Ashland Ave & 63rd St", "Ashland Ave & 66th St", "Ashland Ave & 69th St",
"Ashland Ave & 73rd St"), freq = c(1032L, 2524L, 3836L, 8383L,
6587L, 6136L, 18L, 6281L, 12050L, 397L, 2833L, 1875L, 710L, 1879L,
2659L, 151L, 112L, 102L, 78L, 8L)), row.names = c(NA, -20L), class =
c("tbl_df", "tbl", "data.frame"))
```
As you can see, the Pareto plot works for the right-hand scale, but the left-hand scale is out of whack by a lot. With about 3 million rows, the scaling on the y axis has squashed the frequencies into a very small curve along the bottom; it is there on the left, just hard to see.
How do I fix the left y axis so that it is limited to about 40,000 and the frequency curve shows up correctly?
Here is a solution, not with package `ggQC` but with `sec_axis`. The trick is to pre-compute `max(freq)` and then use it as a scale factor in order to align the two axes. This data preparation code is inspired by this rstudio-pubs blog post.
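
As a rough illustration of that idea, here is a minimal sketch that uses only the first six rows of the `dput()` output from the question. The names `max_freq`, `cum_pct` and `cum_scaled` are illustrative choices, not taken from the original post, and the styling is just one possible option.

```r
library(dplyr)
library(ggplot2)

# First six rows of the dput() output in the question (sample data only)
d <- data.frame(
  end_station_name = c("2112 W Peterson Ave", "63rd St Beach",
                       "900 W Harrison St", "Aberdeen St & Jackson Blvd",
                       "Aberdeen St & Monroe St", "Aberdeen St & Randolph St"),
  freq = c(1032L, 2524L, 3836L, 8383L, 6587L, 6136L)
)

# Pre-computed scale factor that ties the two axes together (assumed name)
max_freq <- max(d$freq)

d2 <- d %>%
  arrange(desc(freq)) %>%
  mutate(
    # Fix the bar order to the sorted order
    end_station_name = factor(end_station_name, levels = end_station_name),
    cum_pct    = cumsum(freq) / sum(freq),  # cumulative share, 0 to 1
    cum_scaled = cum_pct * max_freq         # same share expressed in count units
  )

p <- ggplot(d2, aes(x = end_station_name)) +
  geom_col(aes(y = freq)) +
  geom_line(aes(y = cum_scaled, group = 1), colour = "red") +
  geom_point(aes(y = cum_scaled), colour = "red", size = 0.5) +
  scale_y_continuous(
    name = "freq",
    # Right-hand axis: undo the scaling so it reads as a share
    sec.axis = sec_axis(~ . / max_freq, name = "cumulative share",
                        labels = scales::percent)
  ) +
  theme(axis.text.x = element_blank()) +
  labs(title = "end station by freq", x = "Station Name")
p
```

Because the cumulative line is multiplied by `max_freq`, it tops out at the height of the tallest bar, so the left axis stays in the range of the counts instead of stretching to the total number of rows.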