Remove text matching a pattern from a txt file

I am looking for a way to remove the text described in pattern.txt from another file, acc.txt (or acc.html). The block to match looks like this, with the variable parts written in braces:

<table cellpadding="5" cellspacing="0" border="0" width="100%">
<tr>
<td style="border-bottom-color: #d0d0d0; border-bottom-style: solid; border-bottom-width: 1px; background-color: #eaeaea;"><!-- F1E896 -->
<font style="font-size: 13px;"><b>{.* (everything up to the first <blockquote>)}
<blockquote>
{.{1,5}? (any letters, spaces, or tabs; at most 5 characters)}
</blockquote>
</td>
</tr>
</table><br>
<br>

The match should ignore whatever characters appear in the brace placeholders. I would prefer to do this from the command prompt. I know that working with .html files is not as easy as it looks; does it make a difference if we just save the file as .txt?

Edit: something like this would probably work:

<table {skip everything up to the first}<blockquote>{max 5 letters}</blockquote>{skip everything up to}<br>

Answer by Martin Brandl:

Save the pattern in a file, e.g. "c:\pattern.txt":

(?<=<b>).*?(?=<blockquote>)|(?<=<blockquote>).*?(?=<\/blockquote>)

Load the pattern and the text file with the Get-Content cmdlet, then replace every match with an empty string:

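# -Raw reads each file as one string instead of an array of lines
$content = Get-Content 'c:\acc.txt' -Raw
# Trim() drops a trailing newline that would otherwise become part of the pattern
$pattern = (Get-Content 'c:\pattern.txt' -Raw).Trim()

# Singleline lets '.' match newlines, so a match can span several lines
[regex]::Replace($content, $pattern, '',`
     [System.Text.RegularExpressions.RegexOptions]::Multiline `
     -bor [System.Text.RegularExpressions.RegexOptions]::Singleline)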

Now you can pipe the output to the Out-File or Set-Content cmdlet to save it.
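
To try the approach without touching real files, here is a minimal end-to-end sketch. The sample text, the `$options` and `$result` variable names, and the output file name `acc_clean.txt` are assumptions made for illustration; they are not from the question. The pattern is written inline with lazy quantifiers (`.*?`) so that each match stops at the nearest following tag rather than the last one in the input.

```powershell
# Hypothetical sample standing in for the contents of acc.txt
$content = @'
<font style="font-size: 13px;"><b>Some heading
<blockquote>
abc
</blockquote>
</td>
'@

# Pattern as it would be stored in pattern.txt, written inline for the demo
$pattern = '(?<=<b>).*?(?=<blockquote>)|(?<=<blockquote>).*?(?=</blockquote>)'

# Singleline lets '.' match newlines, so a match can span several lines
$options = [System.Text.RegularExpressions.RegexOptions]::Singleline
$result  = [regex]::Replace($content, $pattern, '', $options)

# Save the cleaned text (hypothetical file name, written to the current directory)
$result | Set-Content -Path 'acc_clean.txt'
```

With this sample, the text between `<b>` and `<blockquote>` and the text inside the blockquote are removed, while the tags themselves and the `</td>` line are left in place, because the lookarounds only assert the tags without consuming them.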