How to correctly split a binary file for parallel decoding


I have a binary file which I am successfully decoding sequentially using asn1tools. The problem is that it takes a long time, and I am trying to speed up the process.

My approach was to split the binary data using the known starting bytes of the records I am looking for.

For example, I split on bf 4f 80 80. The issue is that sequential decoding yields some number of records, say 1000, even though the file may contain fewer than 1000 occurrences of the pattern bf 4f 80 80. So when I split the data I always get fewer records than sequential decoding gives me. Note that the file contains a single type of record and I am sure of the starting bytes.



1 answer

Answer by Kevin

You do not specify what the encoding is, but the answer is that you cannot, in general, divide ASN.1 data into pieces without parsing it. That is certainly true for PER. For BER, if only definite lengths have been encoded, you could parse the tag and length of each TLV and skip ahead to the next TLV without parsing the contents of the current one. However, unless you are using DER, definite lengths are not required, and any indefinite length would require parsing through the data to find the end-of-contents marker. Note that you cannot simply scan ahead for the end marker, as that same sequence of bytes could appear somewhere as actual data (e.g. in an OCTET STRING).
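
To make the definite-length case concrete, here is a minimal Python sketch of walking the top-level TLVs of a BER buffer by reading only the tag and length octets. It assumes the file is a plain concatenation of BER records; the names `read_tlv_header` and `split_top_level` are made up for illustration and are not part of asn1tools. It stops with an error as soon as it meets an indefinite length (a length octet of 0x80), because at that point the record has to be parsed properly.

```python
def read_tlv_header(buf, pos):
    """Return (header_len, content_len) for the TLV starting at pos.

    content_len is None when the length octet is 0x80 (indefinite form).
    """
    start = pos
    first = buf[pos]
    pos += 1
    if first & 0x1F == 0x1F:
        # High-tag-number form: following octets carry the tag number,
        # and every octet except the last has its top bit set.
        while buf[pos] & 0x80:
            pos += 1
        pos += 1  # consume the last tag octet
    length_octet = buf[pos]
    pos += 1
    if length_octet < 0x80:
        # Short form: the octet itself is the content length.
        return pos - start, length_octet
    if length_octet == 0x80:
        # Indefinite form: the end can only be found by parsing.
        return pos - start, None
    # Long form: low 7 bits give the number of length octets that follow.
    n = length_octet & 0x7F
    content_len = int.from_bytes(buf[pos:pos + n], "big")
    return pos - start + n, content_len


def split_top_level(buf):
    """Split buf into top-level TLVs without decoding their contents."""
    chunks = []
    pos = 0
    while pos < len(buf):
        header_len, content_len = read_tlv_header(buf, pos)
        if content_len is None:
            raise ValueError(
                f"indefinite length at offset {pos}; a full parse is needed"
            )
        end = pos + header_len + content_len
        if end > len(buf):
            raise ValueError(f"record at offset {pos} runs past end of data")
        chunks.append(buf[pos:end])
        pos = end
    return chunks
```

If this succeeds on your file, the resulting chunks are independent records and could be handed to worker processes for decoding. If it raises on the very first record, the data uses indefinite lengths and cannot be split this way.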

From your description, you believe BF 4F 80 80 marks the start of a record, yet sequential decoding yields more records than there are start markers. One possible explanation is that you are using unaligned PER for the encoding and the start of a record is sometimes not byte-aligned.
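
One rough way to test the alignment hypothesis is to look for the marker at every bit offset rather than every byte offset. The sketch below is a diagnostic only, and `find_bit_offsets` is a hypothetical helper name. It builds a string of bits that is eight times the size of the input, so it is meant for a sample of the file rather than the whole thing. A hit at a non-byte-aligned offset may also be a coincidence in the data, so treat the result as a hint, not as proof.

```python
def find_bit_offsets(data: bytes, pattern: bytes):
    """Return every bit offset (0-based) at which pattern occurs in data."""
    bits = "".join(f"{b:08b}" for b in data)
    needle = "".join(f"{b:08b}" for b in pattern)
    offsets = []
    i = bits.find(needle)
    while i != -1:
        offsets.append(i)
        # Continue one bit further so overlapping matches are also found.
        i = bits.find(needle, i + 1)
    return offsets


def unaligned_hits(data: bytes, pattern: bytes):
    """Return only the matches that do not start on a byte boundary."""
    return [off for off in find_bit_offsets(data, pattern) if off % 8 != 0]
```

If `unaligned_hits(sample, bytes.fromhex("bf4f8080"))` returns roughly as many offsets as there are "missing" records, that is consistent with an unaligned encoding, and splitting on byte boundaries will not work.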