Removing one row of headers

141 views Asked by At

My professor wants us to download an excel file directly from the website, and part of the analysis, we need to generate the sum of some of the columns (the professor suggested that we use starts_with). The point is that the lines with the same name (the second header, if I can call it this way) are the second line, and the RStudio is reading as an observation instead of a proper header. I tried to delete the first row, but the r deleted the header I didn't want. I am going to put the codes here. Initially, I tried this one:

install.packages("tidyverse", dependencies = T)
install.packages("data.table", dependencies = T)
install.packages("readxl", dependencies = T)
install.packages("ggplot2", dependencies = T)
install.packages("openxlsx", dependencies = T)
library(tidyverse)
library(data.table)
library(readxl)
library(ggplot2)
library(openxlsx)
datatable <- data.table(openxlsx::read.xlsx('https://doi.org/10.1371/journal.pone.0242866.s001')) %>%
  tail(-1)

Later on, I tried separately (I uploaded the same without the tail(-1)) and in a second line I wrote:

dt <- dt[-1,]

I Also tried something that I saw on the internet with the:

name(dt) = NULL

but it gave me this problem: Error in View : Internal error: length of names (0) is not length of dt (61)

Can someone tell me the proper way? (In the second and third line I added one object dt = datatable, that is why it is different from the first one)

1

There are 1 answers

0
Matt On

You just need to use the startRow parameter:

datatable <- data.table::data.table(openxlsx::read.xlsx('https://doi.org/10.1371/journal.pone.0242866.s001', startRow = 2)) 

Which makes your column names:

> names(datatable)
 [1] "Time.mark"                "EXP1"                     "CAL1"                     "EXP2"                     "CAL2"                    
 [6] "EXP3"                     "CAL3"                     "EXP4"                     "CAL4"                     "EXP5"                    
[11] "CAL5"                     "EXP6"                     "CAL6"                     "EXP7"                     "CAL7"                    
[16] "EXP8"                     "CAL8"                     "EXP9"                     "CAL9"                     "EXP10"                   
[21] "CAL10"                    "EXP11"                    "CAL11"                    "EXP12"                    "CAL12"                   
[26] "EXP13"                    "CAL13"                    "EXP14"                    "CAL14"                    "EXP15"                   
[31] "CAL15"                    "EXP16"                    "CAL16"                    "EXP17"                    "CALIDAD/PRECIO.RTES"     
[36] "EXP18"                    "CAL18"                    "EXP19"                    "CAL19"                    "SAT1"                    
[41] "SAT2"                     "SAT3"                     "SAT4"                     "SAT5"                     "SAT6"                    
[46] "SAT7"                     "LEALTAD1"                 "LEALTAD2"                 "LEALTAD3"                 "LEALTAD4"                
[51] "LEALTAD5"                 "MEDINA.AZAHARA"           "MEZQUITA-CATEDRAL"        "ALCAZAR.REYES.CRISTIANOS" "SINAGOGA"                
[56] "FIESTA.PATIOS"            "IGLESIAS.FERNANDINAS"     "BARRIO.JUDERIA"           "Sex"                      "age"                     
[61] "level.of.study"  

Then you can use starts_with() to select columns:

datatable %>% 
  select(starts_with('EXP'))

     EXP1 EXP2 EXP3 EXP4 EXP5 EXP6 EXP7 EXP8 EXP9 EXP10 EXP11 EXP12 EXP13 EXP14 EXP15 EXP16 EXP17 EXP18 EXP19
  1:    5    6    5    4    7    5    6    6    5     6     5     5     6     6     6     6     6     6     6
  2:    6    7    7    6    6    7    6    7    7     7     5     7     7     6     7     6     6     6     6
  3:    5    7    5    6    6    6    5    4    6     7     4     5     6     5     5     5     5     6     4
  4:    5    4    6    5    7    6    2    5    6     6     7     7     7     5     5     6     6     5     4
  5:    5    4    6    5    7    6    7    5    6     6     7     7     7     5     5     6     6     6     3
 ---                                                                                                         
258:    6    6    6    6    5    6    7    5    7     6     7     6     5     5     7     5     6     6     5
259:    6    6    7    6    7    6    7    7    6     6     7     6     7     7     6     7     6     7     6
260:    6    6    7    6    6    6    6    6    5     7     6     7     6     7     7     6     7     7     7
261:    6    7    6    7    7    7    5    5    6     6     6     7     6     7     6     7     6     6     6
262:    5    7    7    6    6    7    6    5    6     7     5     7     7     5     7     6     7     6     6