Reading top level blocks from a text file in R


I am working in R with files that contain blocks, e.g.:

block name { block contents can be anything: strings, numbers or even curly braces {} or whatever}

blockn4m3 containing numbers {
                                 Can be something junk like: 
                   ans{a{a[sf'asödfä'asdösdö'äasdö'äasdö}}}
}}

I would like to extract them into a character vector with one element per top-level block:

"block name { block contents can be anything strings, numbers or even brackets {} or whatever}","blockn4m3 containing numbers {
                                 Can be something junk like: 
                   ans{a{a[sf'asödfä'asdösdö'äasdö'äasdö}}}
}}"

I assume regular expressions do not work, since there can be curly braces (and nested blocks) within blocks?

So I thought that maybe I could just read every file character by character, and I wrote the following function:

library(magrittr) #Provides the %>% pipe used below

separateBlocksFromFile <- \(file) {
  input <- file %>% readLines %>% {paste(., collapse = "\n")}
  blocks <- c()
  blockNumber = 1 #We start from the first block
  netBracketValue = 0 #0, when reading a block name
  for(i in seq_len(nchar(input))) { #seq_len() also handles empty input
    currentCharacter = substr(input,i,i)
    
    #Did we enter a block?
    netBracketValue = netBracketValue + (currentCharacter == "{")
    
    #Write the character into its correct place.
    
    #Previous characters in the current block...
    previousCharacters <- ifelse(is.na(blocks[blockNumber]),"",blocks[blockNumber])
    #...are put before current character
    blocks[blockNumber] <- paste0(previousCharacters,currentCharacter)
    
    
    #Did we exit a block? If so, the netBracketValue becomes 0 here.
    netBracketValue = netBracketValue - (currentCharacter == "}")
    
    #Block number is updated, if needed.
    #Updated when we pass "}" character and the character ends a block i.e.
    #netBracketValue == 0
    blockNumber <- blockNumber + (netBracketValue == 0)*(currentCharacter == "}")
  }
  
  return(blocks)
}

While this works, it tends to be slow on larger files. Is there a faster method to accomplish this?
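One possible direction, offered as a sketch rather than a definitive answer: the loop above probably spends most of its time calling substr() once per character and regrowing a string with paste0(). The same bracket counting can be done on the whole text at once with cumsum(). The helper name split_top_level_blocks is hypothetical, the sketch assumes braces are balanced, and unlike the original function it trims the whitespace between blocks and ignores any text after the last closing brace.

```r
# Hypothetical helper: vectorised version of the net-bracket-count idea.
split_top_level_blocks <- function(text) {
  # One element per character of the input
  chars <- strsplit(text, "")[[1]]
  is_open  <- chars == "{"
  is_close <- chars == "}"

  # Nesting depth after each character (+1 for "{", -1 for "}")
  depth <- cumsum(is_open - is_close)

  # A top-level block ends where a "}" brings the depth back to 0
  ends <- which(is_close & depth == 0)
  if (length(ends) == 0) return(character(0))

  # Each block starts right after the previous block's end
  starts <- c(1L, head(ends, -1L) + 1L)

  # Assumption: whitespace between blocks is not wanted, so trim it
  trimws(substring(text, starts, ends))
}
```

With a file, this could be called as split_top_level_blocks(paste(readLines(file), collapse = "\n")).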

EDIT: The block contents cannot have a closing } before its opening {. If they could, there would be no way of knowing for sure whether we had exited a block.
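Given that braces are balanced, a regular expression may be workable after all. R's perl = TRUE mode uses PCRE, which supports recursive patterns. The following is a minimal sketch under that assumption. The helper name extract_blocks_regex is hypothetical, the result is trimmed of surrounding whitespace, and very large or deeply nested input might run into PCRE's internal limits, so it is worth testing on real files first.

```r
# Hypothetical helper: match "name { balanced contents }" with a recursive pattern.
extract_blocks_regex <- function(text) {
  # [^{}]*             block name: any run of characters without braces
  # ( \{ ... \} )      group 1: one balanced pair of braces
  # (?:[^{}]++|(?1))*  inside the braces: brace-free runs or a nested group 1
  pattern <- "[^{}]*(\\{(?:[^{}]++|(?1))*\\})"

  # gregexpr() finds all non-overlapping matches; regmatches() extracts them
  matches <- regmatches(text, gregexpr(pattern, text, perl = TRUE))[[1]]

  # Assumption: whitespace between blocks is not wanted, so trim it
  trimws(matches)
}
```

Because the pattern only ever closes a brace it has opened, an unbalanced block simply fails to match instead of being split in the wrong place.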
