I saved the report at https://www.otcmarkets.com/otcapi/company/financial-report/317949/content to my computer as a PDF and used `pdf_text()` (from the pdftools package) to read it into R.
I am trying to create a data frame from the 5th element of the result (page 5 of the PDF). I split it at the line breaks (`\n`). The problem is that the cells in the source table span varying numbers of lines.

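```r
library(pdftools)  # pdf_text()
library(stringr)   # str_split(), str_which()

report <- "C:/Users/projects/report.pdf"  # the PDF saved from the link above
txt <- pdf_text(report)                    # one string per page
s <- str_split(txt[5], "\n")[[1]]          # page 5, one element per line
# drop the "OTC Markets Group Inc" footer line and everything after it
excess_end <- str_which(s, "OTC Markets Group Inc")[1]
s <- s[seq_len(excess_end - 1)]
s
```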

Because some table cells contain several lines of text, the line breaks don't correspond to what should be the rows of the data frame. I would normally use `str_split_fixed()` on runs of two or more spaces (`s <- str_split_fixed(s, "\\s\\s+", n = 10)`), but the columns become misaligned because of the lines that spill out of the multi-line cells. For example, element [1] below (shown before applying `str_split_fixed()`) is overflow from a cell in the source table.

```
head(s)
[1] " Common Silverback Capital Debt"
[2] " 03/01/19 New Issue 523,500 0.0285 NO Unrestricted R.144"
[3] " Shares *Alison Biddle Conversion"
[4] " Jefferson Street"
[5] " Common Debt"
[6] " 03/19/19 New Issue 584,795 0.0342 NO Capital LLC Unrestricted R.144"
```
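One possible workaround, sketched under assumptions: `pdf_text()` appears to pad lines with spaces to keep the horizontal layout of the page, so each column should sit at roughly the same character positions on every line. Cutting each line at fixed positions, instead of splitting on runs of spaces, keeps an empty cell in its own column rather than shifting the later values left. The function `split_fixed_width()`, the column start positions and the two sample lines are hypothetical; on the real page the starts would have to be measured from the header line.

```r
# Hypothetical helper: cut every line at fixed character positions instead of
# splitting on runs of spaces, so an empty cell stays in its own column.
split_fixed_width <- function(lines, starts) {
  # each column ends just before the next one starts; the last runs to the end
  ends <- c(starts[-1] - 1L, 1000000L)
  cells <- vapply(lines,
                  function(l) trimws(substring(l, starts, ends)),
                  character(length(starts)),
                  USE.NAMES = FALSE)
  t(cells)  # one row per line, one column per table column
}

# Two hypothetical padded lines like pdf_text() output; columns start at
# characters 1, 11 and 22, and the second line has an empty middle cell.
lines <- c(
  sprintf("%-10s%-11s%s", "03/01/19", "New Issue", "523,500"),
  sprintf("%-10s%-11s%s", "03/19/19", "", "584,795")
)

m <- split_fixed_width(lines, starts = c(1L, 11L, 22L))
m
# Splitting on runs of 2+ spaces instead moves "584,795" into the 2nd column:
strsplit(lines[2], "\\s{2,}")[[1]]
```

This only helps if the real columns really are aligned to fixed character positions in the `pdf_text()` output, which is worth checking by printing a few lines with `cat()`.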

Is there another way to import this data that will give me a character string I can work with?
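If importing differently is an option, pdftools (version 2.0 or later) also provides `pdf_data()`, which returns one data frame per page with one row per word and its `x`/`y` position. Words can then be assigned to a column by their `x` coordinate and to a record by the nearest line that holds a date, so wrapped cells no longer break the alignment. The sketch below uses a small hand-made data frame in place of real `pdf_data()` output; the coordinates, the column boundaries in `col_starts`, the column names and the date pattern are all hypothetical and would need to be read off the actual page.

```r
# Hand-made stand-in for one page of pdftools::pdf_data() output; on the real
# file this would be something like: words <- pdf_data(report)[[5]]
# (the real data frame also has width, height and space columns).
words <- data.frame(
  x    = c( 40, 120, 120, 200, 245, 300,  40, 120, 120, 200, 250, 200, 240, 300),
  y    = c(100,  92, 108, 100, 100, 100, 140, 132, 148, 132, 132, 140, 140, 140),
  text = c("03/01/19", "Common", "Shares", "Silverback", "Capital", "523,500",
           "03/19/19", "Common", "Shares", "Jefferson", "Street", "Capital",
           "LLC", "584,795"),
  stringsAsFactors = FALSE
)

col_starts   <- c(0, 100, 180, 280)   # hypothetical left edges of the columns
col_names    <- c("date", "security", "holder", "amount")
date_pattern <- "^\\d{2}/\\d{2}/\\d{2}$"

# y positions of the words that are dates = the main line of each record
anchors <- words$y[grepl(date_pattern, words$text)]

# column = which interval of col_starts the word's x falls into
words$col <- findInterval(words$x, col_starts)
# record = vertically nearest date line
words$rec <- vapply(words$y, function(y) which.min(abs(anchors - y)), integer(1))

# read words top-to-bottom, left-to-right, then paste them per cell
words <- words[order(words$rec, words$col, words$y, words$x), ]
cells <- tapply(words$text, list(words$rec, words$col), paste, collapse = " ")

tbl <- as.data.frame(cells, stringsAsFactors = FALSE)
names(tbl) <- col_names[as.integer(colnames(cells))]
rownames(tbl) <- NULL
tbl
```

Working from coordinates avoids relying on how `pdf_text()` happens to lay out the spaces, at the cost of having to pick the column boundaries once for the page layout.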

I tried `str_split_fixed()` on the spaces and a variety of other splits, but I cannot get the columns to align properly.
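Staying with the `pdf_text()` lines, another heuristic is to treat every line that contains a date as the main line of a record and attach each remaining line to the nearest such line, breaking ties toward the later record. On the six lines printed by `head(s)` this puts [1] and [3] with the 03/01/19 record and [4] and [5] with the 03/19/19 record, but it is only a heuristic and can misassign lines when a cell wraps onto many lines, so the result should be checked against the PDF. `group_lines()` and its default date pattern are hypothetical names chosen for this sketch.

```r
# Hypothetical helper: attach every line without a date to the nearest line
# with one (ties go to the later date line), then return one group per record.
# Assumes at least one line matches date_pattern.
group_lines <- function(lines, date_pattern = "\\d{2}/\\d{2}/\\d{2}") {
  anchors <- grep(date_pattern, lines)   # indices of the date lines
  rec <- vapply(seq_along(lines), function(i) {
    d <- abs(anchors - i)
    max(which(d == min(d)))              # nearest anchor, later one on a tie
  }, integer(1))
  split(lines, rec)
}

# The six lines from head(s), with the spacing as it appears in the question
s <- c(" Common Silverback Capital Debt",
       " 03/01/19 New Issue 523,500 0.0285 NO Unrestricted R.144",
       " Shares *Alison Biddle Conversion",
       " Jefferson Street",
       " Common Debt",
       " 03/19/19 New Issue 584,795 0.0342 NO Capital LLC Unrestricted R.144")

groups <- group_lines(s)
# one string per record, ready to be cut into columns
vapply(groups, function(g) paste(trimws(g), collapse = " "), character(1))
```

Each group could then be cut into columns (for example at fixed character positions, if the original spacing is kept) and the pieces of each column pasted together to form one row of the data frame.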
