Count the number of columns that have data by group

Question

Count the number of columns that have data by group

37 views Asked by WormWarbler At 08 February 2024 at 02:07

I have a dataframe with number of times a bird species was detected at a site (n=110) during two separate visits:

   Site     Yr Total.ACFL Total.AMCR Total.AMGO Total.AMKE Total.AMRE
   <chr> <int>      <int>      <int>      <int>      <int>      <int>
 1 BRY1   2021          0          0          0          0          0
 2 BRY2   2021          0          0          3          0          0
 3 BRY2   2021          1          0          2          0          0
 4 BRY3   2021          0          0          2          0          0
 5 BRY3   2021          0          0          7          0          0
 6 BRY4   2021          0          0          1          0          0
 7 BRY4   2021          0          0          1          0          0
 8 BRY5   2021          0          0          4          0          0
 9 BRY5   2021          0          0          4          0          0
10 BRY6   2021          0          0          8          0          0

There are a total of 56 species (columns that start with Total.). What I want to do is determine the total species richness (number of unique species detected) at each site. I want to produce a script that does this by determining the number of columns (e.g. Total.ACFL) that contain at least one detection per site. For example, the site BRY2 would have a species richness of 2 since AMGO and ACFL were detected during the two visits.

Any help would be greatly appreciated.

I have not tried anything as of yet (at a bit of a loss, frankly), but I am hoping to be steered in the proper direction. Assuming I might use the vegan package to my advantage.

Original Q&A

There are 2 answers

Mark On 08 February 2024 at 03:41

Here's a one-liner:

summarise(df, richness = rowSums(across(starts_with("Total"), ~ sum(.) > 0)), .by = Site)

Output:

# A tibble: 6 × 2
  Site  richness
  <chr>    <dbl>
1 BRY1         0
2 BRY2         2
3 BRY3         1
4 BRY4         1
5 BRY5         1
6 BRY6         1

**benson23** · Accepted Answer · 2024-02-08T02:20:13+00:00

pivot_longer and summarize

You can first reshape your dataframe into a "long" format for easier manipulation, then count the number of species having the value > 0.

library(dplyr)

df |> 
  pivot_longer(starts_with("Total")) |> 
  summarize(value = sum(value), .by = c(Site, Yr, name)) |> 
  summarize(count = sum(value > 0), .by = c(Site, Yr))

summarize and reframe

Another method using reframe(). This does not require reshaping the dataframe.

df |>
  summarize(across(starts_with("Total"), sum), .by = c(Site, Yr)) |>
  reframe(count = rowSums(pick(starts_with("Total")) > 0), .by = c(Site, Yr))

They have the same output:

# A tibble: 6 × 3
  Site     Yr count
  <chr> <int> <dbl>
1 BRY1   2021     0
2 BRY2   2021     2
3 BRY3   2021     1
4 BRY4   2021     1
5 BRY5   2021     1
6 BRY6   2021     1

TechQA.

Count the number of columns that have data by group

There are 2 answers

pivot_longer and summarize

summarize and reframe

Related Questions in R

Related Questions in DATAFRAME

Related Questions in VEGAN

Popular Questions

Trending Questions