Replace strings in only part of a document

I have about 1,400 Markdown files that I am trying to clean up. As part of this, I need to find certain strings and replace them, but only in the part of each file that comes after the header.

Here is the example file:

---
title: 'This is the post&#8217;s title'
author: foobar
date: 2007-12-04 12:41:01 -0800
layout: post
permalink: /2007/12/04/foo/
categories:
  - General
---


Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium doloremque laudantium, totam rem aperiam, eaque ipsa quae ab illo inventore veritatis et quasi architecto beatae vitae dicta &#8217; sunt explicabo. Nemo enim ipsam voluptatem quia voluptas sit aspernatur aut odit aut fugit, sed quia consequuntur magni dolores eos qui ratione voluptatem sequi nesciunt. Neque porro quisquam est, qui dolorem ipsum quia dolor sit amet, consectetur.

&#8217;

I want to replace all of the &#8217; strings with ', but only after the header.

I can capture the header with this:

(---((.|\n)*?)---)

But I am having difficulty capturing the rest of the text after the header.

Any suggestions? I am using TextMate, but could also do this in the terminal (on Mac).

There are 2 answers

Casimir et Hippolyte (best answer)

In TextMate:

search: ((?:---(?>[^-]++|-(?!--))*---|\G(?<!\A))(?>[^&]++|&(?!#8217;))*)&#8217;
replace: $1'

Pattern details:

(                    # capture group 1: all possible content before &#8217;
    (?:              # non capturing group: possible "anchors"
        ---          # beginning of the header: entry point
        (?>          # atomic group: possible content of the header
            [^-]++   # all that is not a -
          |          # OR
            -(?!--)  # a - not followed by --
        )*           # repeat the atomic group zero or more times
        ---          # end of the header
      |              # OR
        \G(?<!\A)    # contiguous with the previous match (not at the start)
    )                # close the non capturing group
    (?>              # atomic group: all that is not &#8217;
        [^&]++       # any character except &
      |              # OR
        &(?!#8217;)  # & not followed by #8217;
    )*               # repeat the atomic group zero or more times
)                    # close the capturing group
&#8217;

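The idea is to use the \G anchor, which matches only at the position where the previous match ended, so that every match after the first has to be contiguous with the one before it.

First match: the entry point is the header. Once the header is found (first alternative in the non-capturing group), the pattern matches everything that is not &#8217; (second atomic group) up to the next &#8217;.

Other matches: \G forces each later match to be contiguous with the previous one. The second match starts where the first one ends, the third where the second ends, and so on. The (?<!\A) lookbehind stops \G from matching at the very start of the text, so nothing can match before the header has been consumed.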

Barmar

awk can do this by counting the header delimiter lines and substituting only once the second one has been seen:

awk -v quote="'" '/^---$/ { header++} { if (header >= 2) { gsub("&#8217;", quote); }}1' infile > outfile
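
With about 1400 files to process, the same awk program can be wrapped in a loop. This is a minimal sketch under a few assumptions that are not in the question: the files end in .md, they all sit directly inside one directory, and each starts with a --- header block. The function name fix_markdown_dir is hypothetical. It overwrites files in place, so it is safer to try it on a copy first.

```shell
# Hypothetical helper: rewrite every *.md file directly inside the given directory.
fix_markdown_dir() {
  dir=$1
  for f in "$dir"/*.md; do
    [ -f "$f" ] || continue          # skip if the glob matched nothing
    tmp=$(mktemp) || return 1
    # Count the --- delimiter lines; substitute only from the second one on.
    awk -v quote="'" '
      /^---$/     { delims++ }
      delims >= 2 { gsub(/&#8217;/, quote) }
                  { print }
    ' "$f" > "$tmp" &&
      cat "$tmp" > "$f"              # overwrite in place, keeping the file's permissions
    rm -f "$tmp"
  done
}
```

It would be called with the directory as its only argument, for example fix_markdown_dir path/to/posts. Writing to a temporary file first avoids truncating the input while awk is still reading it.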