R: How to create an edgelist from a data frame with many empty/blank fields?


I need an edgelist for a network analysis. My dataset is a big .csv file with 56400 rows and 1330 columns, but most fields are empty/blank. To create the edgelist, I want to combine the fields from my dataset into a data frame with 2 columns and X rows. My code:

#read my dataset
mydf <- read.csv("....\\clean5.csv",
                 header = TRUE, sep = ";", encoding = "UTF-8")

#create an edgelist without blank fields
library(magrittr)  # provides %>% (dplyr attaches it too)
edgelist <- mydf %>% 
  apply(., 1, function(.x){   
    .x[!is.na(.x)] %>%        
      combn(2) %>%             
      t() %>%                
      as.data.frame()         
  }) %>% 
  do.call(rbind, .)

My problem: my code can't detect which fields are empty/blank, so most rows in my edgelist are combinations with blank fields, which makes the edgelist far too big. All fields have the data type "character"; I think that is my problem, but I can't solve it.

I tried converting all blank fields to NA so that my code can detect them. The data type doesn't change from character to logical, so the fields become <NA> and not "real" NA. I searched for solutions, but nothing worked for me.
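For reference, `<NA>` is simply how R prints a missing value in a character column of a data frame, and `is.na()` treats it like any other NA. Below is a minimal sketch of the conversion, assuming the blanks are empty or whitespace-only strings. The vector `x` is made up for illustration and is not taken from the real file.

```r
# Hypothetical character vector: padded values, an empty field,
# a whitespace-only field and an existing NA
x <- c("A ", "", "  ", "B", NA)

# Strip leading/trailing whitespace, so "A " becomes "A" and "  " becomes ""
x_clean <- trimws(x)

# Replace empty strings with a character NA; which() drops the NA
# comparison result so only true matches are indexed
x_clean[which(x_clean == "")] <- NA_character_

# The vector is still character, but is.na() now flags the blanks
is_missing <- is.na(x_clean)
kept <- x_clean[!is_missing]
```

Here `kept` holds only the non-blank values, so the column staying of type character does not prevent `is.na()` from working.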

Dose someone has an idea? I'm very thankful for every suport :)

Update:

Here is a dput() from a small part of my data:

test <- 
structure(list(
X1 = c("A ", "A ", "A ", "A ", "A ", "A ", "A ", "A ", "A ", "A ", "A ", "A ", "A ", "A ", "A ", "A ", "A ", "A ", "A ", "A "),
X2 = c("B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B"),
X3 = c("", "C ", "", "C ", "", "C ", "", "C ", "", "C ", "", "C ", "", "C ", "", "C ", "", "C ", "", "C "),
X4 = c("", "", "D", "", "", "", "D", "", "", "", "D", "", "", "", "D", "", "", "", "D", ""),
X5 = c("", "", "E ", "", "", "", "E ", "", "", "", "E ", "", "", "", "E ", "", "", "", "E ", ""),
X6 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA),
X7 = c("", "", "G ", "", "", "", "G ", "", "", "", "G ", "", "", "", "G ", "", "", "", "G ", ""),
X8 = c("", "", "", "", "", "", "", "", "", "", "", "", "", "A ", "A ", "A ", "A ", "A ", "", ""),
X9 = c("", "", "", "", "", "", "", "", "", "", "", "", "", "B", "B", "B", "B", "B", "", ""),
X10 = c("", "", "", "", "", "", "", "", "", "", "", "", "", "C ", "", "C ", "", "C ", "", ""),
X11 = c("", "", "", "", "", "", "", "", "", "", "", "", "", "", "D", "", "", "", "", ""),
X12 = c("", "", "", "", "", "", "", "", "", "", "", "", "", "", "E ", "", "", "", "", ""),
X13 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA),
X14 = c("", "", "", "", "", "", "", "", "", "", "", "", "", "", "G ", "", "", "", "", ""),
X15 = c("", "", "", "", "", "", "", "", "", "", "", "", "", "A ", "A ", "A ", "A ", "A ", "", ""),
X16 = c("", "", "", "", "", "", "", "", "", "", "", "", "", "B", "B", "B", "B", "B", "", ""),
X17 = c("", "", "", "", "", "", "", "", "", "", "", "", "", "C ", "", "C ", "", "C ", "", ""),
X18 = c("", "", "", "", "", "", "", "", "", "", "", "", "", "", "D", "", "", "", "", ""),
X19 = c("", "", "", "", "", "", "", "", "", "", "", "", "", "", "E ", "", "", "", "", ""),
X20 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, FALSE, NA, NA, NA, NA, NA)
), row.names = c(NA, 20L), class = "data.frame")
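
One possible way to build the edgelist from data shaped like this is sketched below. It assumes that an edge is any pair of non-blank values in the same row, and that rows with fewer than two values should contribute nothing. The data frame `df`, the helper `row_pairs` and the column names `from`/`to` are hypothetical and chosen only for this example.

```r
# Hypothetical miniature of the data: padded values, blanks and an all-NA column
df <- data.frame(
  X1 = c("A ", "A ", "A "),
  X2 = c("B", "B", ""),
  X3 = c("", "C ", ""),
  X4 = c(NA, NA, NA),
  stringsAsFactors = FALSE
)

# Return all pairs of non-blank values of one row as a 2-column matrix
row_pairs <- function(values) {
  values <- trimws(as.character(values))            # "A " -> "A"
  values <- values[!is.na(values) & values != ""]   # drop NA and blanks
  if (length(values) < 2) return(NULL)              # combn() needs at least 2 values
  t(combn(values, 2))
}

# Apply the helper row by row; unlist() coerces each row to a character vector
pairs <- lapply(seq_len(nrow(df)), function(i) row_pairs(unlist(df[i, ])))

# rbind() ignores the NULL entries from rows without any pair
edgelist <- as.data.frame(do.call(rbind, pairs), stringsAsFactors = FALSE)
names(edgelist) <- c("from", "to")
```

Note that `unlist()` turns a logical value such as the `FALSE` in `X20` into the string "FALSE", so such columns may need separate handling before the pairs are built.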