I have a FASTQ quality string, which is presented as a series of ASCII characters. In this case, ASCII characters 64 to 126 (likely) represent scores of 0 to 62 (presuming it is Illumina). This is the string in question:
feffefdfbefdfffcfdeTddaYddffbfcI``S_KKX_]]MR[D_TY[VTVXQ]`Q_BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
How do I extract the numeric ASCII code of each character?
Thank you, San
EDIT: This string denotes the quality of a biological sequence made up of bases (from base pairs in nucleic acids, i.e. the characters A, T, G and C). A base quality is the Phred-scaled base error probability, which equals -10 log10 Pr{base is wrong}.
Well, as Marek said, you might find a function to convert Illumina quality scores in Bioconductor. You can also ask at biostar.stackexchange.com.
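Using base functions, you can use charToRaw(), which returns the raw bytes of a string; as.integer() then turns those bytes into ASCII codes.
Mind you, if the string contains a backslash, you'll have to escape it when you type it as a literal in R code, or you'll get into trouble. Whether that applies depends on how you read in your data and so forth.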
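As a minimal sketch of the charToRaw() approach, using only base R: the variable names (qual, offset, phred, err_prob) are made up for illustration, the string is just the first 24 characters of the one in the question, and the offset of 64 is the question's own assumption about the Illumina encoding, so adjust it if your data uses a different one.

```r
# First 24 characters of the quality string from the question.
qual <- "feffefdfbefdfffcfdeTddaY"

# charToRaw() returns one raw byte per character; as.integer() gives the ASCII codes.
ascii_codes <- as.integer(charToRaw(qual))

# Assumption: an offset of 64, as the question supposes for Illumina data.
offset <- 64L
phred <- ascii_codes - offset

# Invert Q = -10 * log10(p) to recover the per-base error probability.
err_prob <- 10^(-phred / 10)

print(head(ascii_codes))  # 102 101 102 102 101 102
print(head(phred))        # 38 37 38 38 37 38

# A backslash must be typed as "\\" in an R string literal; it is still one character.
backslash_codes <- as.integer(charToRaw("a\\b"))
print(backslash_codes)    # 97 92 98
```

If the string is read from a file (for example with readLines()) rather than typed into the script, no escaping should be needed, since the backslash is then just an ordinary byte in the data.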