Calculate the proportion of bacteria and protein in R (R-beginner question)

20 views Asked by At
library(readxl)
Data <- read_excel("20240214.xlsx")

library(tidyverse)
Data_frame <- separate(Data, col = "FUN", into = c("Bacteria", "Protein"), remove = TRUE, sep = ";" )
View(Data_frame)
Bacteria       Protein    Log2quantity Sample 1   Log2quantity Sample 2  Log2quantity Sample 3 ...

clostridium    ABC transporter   15.2                     5.2                   2.1
clostridium    Kinase1            8.2                     1.2                   8.2
bacillus       ABC transporter    5.5                     8.8                   24.2
bacillus       Oxidoreductase     3.2                     10.2                  12.2
bacillus       Kinase1            2.1                     1.2                   42.2
firmicutes     Kinase1            9.9                     9.2                   22.2
...            ...                ...                     ...                   ...

I have this protein data that shows (1) The protein name (2) the Bacteria name (deduced from the protein) (3) The amount of protein.

I would like to summarize the proportions of protein and bacteria per each sample. After, since there are too many different names for bacteria and proteins, I want to see the short data table containing only the top 20 most abundant in both bacteria and proteins per sample.

For instance, it would look like - 

clostridium= 15%, bacillus = 5%,  firmicutes= 2% ... in sample 1
clostridium= 2%, bacillus = 15%,  firmicutes= 42% ...in sample 2
clostridium= 12%, bacillus = 11%,  firmicutes= 6% ...in sample 3
 
ABC transporter= 30%, Kinase1= 15%, Oxidoreductase= 3% ...in sample 1
ABC transporter= 10%, Kinase1= 11%, Oxidoreductase= 12% ...in sample 2
ABC transporter= 20%, Kinase1= 55%, Oxidoreductase= 21% ...in sample 3

How should I do it in R and export the data in an Excel file? Because I expect the summary data will be huge with the long list. Therefore, I would like to view it more readable in Excel.

I'm a beginner in R and just know a few visualization methods with ggplot2. It seems like it could work with dplyr, but I'm not sure how should I start with it. Especially, I would like to know if it is even possible to export relatively huge data in an Excel file with R.

0

There are 0 answers