Extract the whole .xsl content from a .str file to an xsl/txt file


I am learning some forensics and have a .str file that contains an entire .xsl file:

Content of the xsl file

I need to extract all that .xsl file from the .str file. I have used something like:

cat pc1.str | grep "<From>" > talk.txt

The problem is that I get almost all of the text, but not in a readable format. I think I am only getting the lines that contain <From>.

Can you help me to get the text from <?xml version="1.0"?> to </log>?

Edit for clarity: I want to get all text, beginning from the xml until the /log.

The .str file is created by strings.

Here is the actual file I am using: https://www.dropbox.com/s/j02elywhkhpbqvg/pc1.str?dl=0

From line 20893696 to 20919817.

There are 2 answers

1
Sobrique On

I'd probably use perl:

#!/usr/bin/perl

use strict;
use warnings;

while ( <> ) {
     print if m,<\?xml version, .. m,</log>,;
}

This makes use of the range operator (..), which returns true from the line that matches the first pattern through the line that matches the second. By default, input is read one record at a time, split on the input record separator $/, which is a newline. If your data has newlines it's easy; otherwise you can read fixed-size records instead. (Just bear in mind that a marker may then straddle a record boundary.)

E.g.

$/ = \80; 

Will read 80 bytes at a time.
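
For comparison, here is a minimal shell sketch of the same line-based range idea using sed's address ranges. The function name extract_range is made up for illustration, and the sketch assumes the two markers sit on separate lines. Unlike Perl's .. operator, sed only starts looking for the end pattern on the line after the start match, and it prints every such range in the input, not just the first.

```shell
# Hypothetical helper: print every line from one containing
# "<?xml version" through the next line containing "</log>".
# In sed's basic regular expressions '?' is a literal character,
# so only the '/' in </log> needs escaping.
extract_range() {
    sed -n '/<?xml version/,/<\/log>/p' "$@"
}
```

It could be called as extract_range pc1.str > talk.xml, or fed from a pipe.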

6
Etan Reisner On

If you want all the lines of your .str file from the line that contains <?xml version="1.0"?> to the first line that contains </log> then this should work.

awk '/<\?xml version="1\.0"\?>/{p=1} p; /<\/log>/{exit}' pc1.str

Match the opening line and set p=1. If p is truth-y then print the current line. Match the line with the closing tag and exit.
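
As a variation on the same flag pattern, the markers can be matched as plain substrings with index(), which sidesteps regex metacharacters such as ? altogether. This is only a sketch: the function name extract_between and the variable names start and stop are made up for illustration, and it assumes the declaration appears exactly as <?xml version="1.0"?>.

```shell
# Hypothetical helper: print lines from the first one containing the
# literal start marker through the first later line containing the
# literal stop marker. index() does plain substring matching, so no
# regex escaping is needed.
extract_between() {
    awk -v start='<?xml version="1.0"?>' -v stop='</log>' '
        index($0, start)      { p = 1 }   # opening marker seen: start printing
        p                                 # print every line while p is set
        p && index($0, stop)  { exit }    # closing marker seen: stop reading
    ' "$@"
}
```

Usage would look like extract_between pc1.str > talk.txt. The p && guard means a stray </log> before the declaration does not end the run early.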

If you want output without the radix field from the file then something like this should work.

cut -f 2 pc1.str | awk '/<\?xml version="1\.0"\?>/{p=1} p; /<\/log>/{exit}'

This adds cut to trim off the first radix field (awk isn't as good at field ranges).
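
One caveat: cut splits on tabs by default and passes through unchanged any line that contains no tab. If the offset column in pc1.str turns out to be space-padded rather than tab-separated (an assumption about the file, not something verified here), a sed substitution is one way to strip it. The function name strip_offset is made up for illustration.

```shell
# Hypothetical helper: drop a leading offset column from each line:
# optional padding, then octal/decimal/hex digits, then one space or tab.
# Assumes every line starts with such an offset; a line that instead
# begins with a hex-looking word followed by a space would lose that word.
strip_offset() {
    sed 's/^[[:space:]]*[0-9A-Fa-f]\{1,\}[[:space:]]//' "$@"
}
```

It would take the place of cut -f 2 in the pipeline, for example strip_offset pc1.str | awk '...'.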

If you also want to ignore anything before the opening xml marker and after the closing </log> tag something like this should work (untested).

cut -f 2 pc1.str | awk '/<\?xml version="1\.0"\?>/{p=1; $0=substr($0, index($0, "<?xml version=\"1.0\"?>"))} p{sub(/<\/log>.*/, "</log>")} p; /<\/log>/{exit}'

This uses substr and sub to remove parts of lines that aren't desired.