After separating into columns, how do I sort my data so that it's in the right column?

Question

Asked by sisherb on 05 April 2022 at 15:01 · 218 views

I have a taxonomy list created from Bracken that I want to import into phyloseq. All of my taxa are in one column, as "|"-separated strings in which each rank carries a prefix such as k__ or p__.

I've managed to separate these into columns based on their taxa (see code below). The problem arises when there are candidate species such as

k__Bacteria|f__Candidatus_Chazhemtobacteraceae|g__Candidatus_Chazhemtobacterium

As you can see, the f__ entry needs to go into the Family column, and the Phylum, Class, and Order columns should be left as NA. I'm stuck on how to do this: after separating into columns, how can I sort the taxa into the correct classification based on the "x__" prefix, and then remove that prefix?

Thank you!

I've managed to separate into the columns "Kingdom", "Phylum", "Class", "Order", "Family" and "Genus" using dplyr and tidyr:

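library(dplyr)
library(tidyr)
new_tax <- s_abund %>% 
  dplyr::rename("taxonomy" = "#Classification") %>% 
  dplyr::select(taxonomy) %>%
  # separate() fills columns by position, so a lineage with missing ranks shifts left
  tidyr::separate(taxonomy, sep = "\\|", remove = FALSE,
                  into = c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus"))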

There is 1 answer

**langtang** · Answer 1 · 2022-04-05T15:43:24+00:00

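You can use strsplit() to break each lineage into its pieces, split each piece into prefix and name, and then use pivot_wider() so that every prefix becomes its own column, like this: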

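library(dplyr)
library(tidyr)

d %>% 
  # Split each lineage into its "x__Name" pieces and remember the source row
  mutate(taxs = strsplit(taxonomy, split = "|", fixed = TRUE),
         rowid = row_number()) %>% 
  unnest(taxs) %>%                                            # one row per piece
  separate(taxs, into = c("level", "value"), sep = "__") %>%  # prefix vs. name
  pivot_wider(id_cols = rowid, names_from = level, values_from = value)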

Output:

  rowid k        p                    c                   o                     f                               g                            
  <int> <chr>    <chr>                <chr>               <chr>                 <chr>                           <chr>                        
1     1 Bacteria Coprothermobacterota Coprothermobacteria Coprothermobacterales Coprothermobacteraceae          Coprothermobacter            
2     2 Bacteria NA                   NA                  NA                    Candidatus_Chazhemtobacteraceae Candidatus_Chazhemtobacterium
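
The columns in this output are named after the prefix letters (k, p, c, ...), whereas the question asks for "Kingdom", "Phylum" and so on. One possible follow-up step, sketched here under the assumption that the prefixes map to ranks as listed in `rank_lookup` (a hypothetical name, not part of the original answer), is to rename and reorder the columns with `any_of()`:

```r
library(dplyr)
library(tidyr)

# Same two example rows as in the answer's input
d <- data.frame(taxonomy = c(
  "k__Bacteria|p__Coprothermobacterota|c__Coprothermobacteria|o__Coprothermobacterales|f__Coprothermobacteraceae|g__Coprothermobacter",
  "k__Bacteria|f__Candidatus_Chazhemtobacteraceae|g__Candidatus_Chazhemtobacterium"
), stringsAsFactors = FALSE)

# Assumed lookup (hypothetical): new column name = prefix letter
rank_lookup <- c(Kingdom = "k", Phylum = "p", Class = "c",
                 Order = "o", Family = "f", Genus = "g")

tax_wide <- d %>%
  mutate(taxs = strsplit(taxonomy, split = "|", fixed = TRUE),
         rowid = row_number()) %>%
  unnest(taxs) %>%
  separate(taxs, into = c("level", "value"), sep = "__") %>%
  pivot_wider(id_cols = rowid, names_from = level, values_from = value) %>%
  # any_of() skips prefixes that never occur instead of raising an error
  rename(any_of(rank_lookup)) %>%
  # Keep only the rank columns, in rank order (this drops rowid)
  select(any_of(names(rank_lookup)))
```

Note that a rank that appears in no row at all produces no column, so it would have to be added separately if a fixed set of columns is required.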

Input:

d <- structure(list(taxonomy = c("k__Bacteria|p__Coprothermobacterota|c__Coprothermobacteria|o__Coprothermobacterales|f__Coprothermobacteraceae|g__Coprothermobacter", 
"k__Bacteria|f__Candidatus_Chazhemtobacteraceae|g__Candidatus_Chazhemtobacterium"
)), class = "data.frame", row.names = c(NA, -2L))
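
For comparison, the same prefix-based placement can be sketched in base R without any packages. This is only an illustration: `rank_names` and `parse_lineage()` are hypothetical names, and the prefix-to-rank mapping is an assumption based on the six ranks named in the question. The result is a character matrix with one column per rank, so every rank column exists even when no lineage contains it.

```r
# Assumed mapping from prefix letters to rank names (hypothetical)
rank_names <- c(k = "Kingdom", p = "Phylum", c = "Class",
                o = "Order", f = "Family", g = "Genus")

# Hypothetical helper: one lineage string -> character vector with one slot per rank
parse_lineage <- function(lineage) {
  parts  <- strsplit(lineage, "|", fixed = TRUE)[[1]]
  prefix <- sub("__.*$", "", parts)      # text before the first "__"
  value  <- sub("^[^_]*__", "", parts)   # text after the first "__"
  out <- setNames(rep(NA_character_, length(rank_names)), unname(rank_names))
  known <- prefix %in% names(rank_names) # ignore prefixes outside the lookup
  out[unname(rank_names[prefix[known]])] <- value[known]
  out
}

d <- data.frame(taxonomy = c(
  "k__Bacteria|p__Coprothermobacterota|c__Coprothermobacteria|o__Coprothermobacterales|f__Coprothermobacteraceae|g__Coprothermobacter",
  "k__Bacteria|f__Candidatus_Chazhemtobacteraceae|g__Candidatus_Chazhemtobacterium"
), stringsAsFactors = FALSE)

# One row per lineage, one column per rank; ranks that are absent stay NA
tax_mat <- do.call(rbind, lapply(d$taxonomy, parse_lineage))
colnames(tax_mat) <- unname(rank_names)
```

With the example input, the second row of `tax_mat` has values only in the Kingdom, Family and Genus columns.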

TechQA.
