Transpose within Transpose Notepad++?


I have a text file that looks like this (but with 132k lines):

********
name : one
Place : city
Initial: none
********
name : two
Place : city2
Initial: none
********
name : three
Place : city3
Initial: none
Limits : some

I'm trying to move it into a friendlier format (Excel or database records). Each 'record' is separated by a ******** line. The fields are the same for 90% of the records, but some records have additional fields, like Limits in the 3rd record.

I would like a csv, or similar output like:

name,place,initial,limit
one,city,none,n/a
two,city2,none,n/a
three,city3,none,some

Is Python better suited for parsing and manipulating this?

1 answer

AdrianHHH (best answer)

In Notepad++, a regular-expression replace of ([^*\r\n])\R([^*\r\n]) with \1,\2 changes the example input to:

********
name : one,Place : city,Initial: none
********
name : two,Place : city2,Initial: none
********
name : three,Place : city3,Initial: none,Limits : some

This can be followed by marking the separator lines (menu => Search => Mark..., with "Bookmark line" ticked) using a regex of ^\*\*\*\*\*\*\*\*$, and finally removing the marked lines (menu => Search => Bookmark => Remove Bookmarked Lines).

You may need to tidy up the very start and end of the text, including adding the line of column titles.

Variations:

Whitespace at the start or end of lines may lead to unwanted changes, so it might be best to remove it before replacing line-breaks with commas. Use menu => Edit => Blank Operations => Trim Leading and Trailing Space.

The number of asterisks may differ on some lines. If so, change the marking regex to ^\*\*\*\*\*\*\**$ and adjust the number of \* to match the minimum found in the source text.
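If it helps to try these two variations outside the editor, here is a small Python sketch. It is not part of the Notepad++ procedure: the names trim_lines, is_separator and messy are made up for illustration, and the pattern is simply the variation regex copied from above.

```python
import re

# The variation regex from above: the final unescaped * is a quantifier
# on the last literal asterisk, so longer runs of asterisks also match.
LOOSE_SEP = re.compile(r"^\*\*\*\*\*\*\**$")

def trim_lines(text):
    """Strip leading and trailing whitespace from every line."""
    return "\n".join(line.strip() for line in text.splitlines())

def is_separator(line):
    """True if the (already trimmed) line consists only of a run of asterisks."""
    return LOOSE_SEP.match(line) is not None

# Hypothetical input with stray whitespace and separators of different lengths.
messy = "  ********\nname : one  \n**********\t\nname : two\n"

for line in trim_lines(messy).splitlines():
    print(repr(line), is_separator(line))
```

Trimming first matters because the pattern is anchored with ^ and $, so a separator line with a trailing space or tab would otherwise not be recognised.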

The Regular Expressions

(                 Start of capture group 1
  [^*\r\n]        A negated character class: Not the characters *, CR and LF
                  This captures the last character on the line
)                 End of capture group
\R                A line break
(                 Start of capture group 2
  [^*\r\n]        As above, captures the first character on the line
)

The replacement is \1,\2, meaning insert the two captured characters separated by a comma.
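The same replacement can be approximated in Python if you want to experiment with it. This is only a sketch: Python's re module does not support \R, so \r?\n is used in its place, and join_fields and sample are illustrative names.

```python
import re

# Same shape as the Notepad++ pattern; \r?\n stands in for \R,
# which Python's re module does not understand.
JOIN_RE = re.compile(r"([^*\r\n])\r?\n([^*\r\n])")

def join_fields(text):
    """Replace a line break between two non-asterisk characters with a comma."""
    return JOIN_RE.sub(r"\1,\2", text)

sample = (
    "********\n"
    "name : one\n"
    "Place : city\n"
    "Initial: none\n"
    "********\n"
    "name : two\n"
    "Place : city2\n"
    "Initial: none\n"
)

print(join_fields(sample))
```

Line breaks next to an asterisk are left alone, because the negated character class refuses to match * on either side. That is what keeps the separator lines on their own.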

The marking regex ^\*\*\*\*\*\*\*\*$ means: start of line (^), then eight \* (each one a literal asterisk), then end of line ($). The variation ^\*\*\*\*\*\*\**$ ends with an unescaped *, meaning zero or more occurrences of the last literal asterisk, so lines with more asterisks than the minimum also match.
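On the question of whether Python is better suited: the Notepad++ steps leave the field labels inside each value, and records with extra fields end up with extra columns. A short script can deal with both. The following is a rough sketch under some assumptions: every field line has the form "label: value", separator lines consist only of asterisks, and the function names and the "n/a" filler are my own choices, not anything from the question.

```python
import csv
import io
import re

SEPARATOR_RE = re.compile(r"^\*+$")  # assumption: a separator is a line of asterisks only

def parse_records(lines):
    """Yield one dict per record, with lower-cased field names as keys."""
    record = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if SEPARATOR_RE.match(line):
            if record:
                yield record
            record = {}
            continue
        key, sep, value = line.partition(":")
        if sep:  # skip lines that do not look like "label: value"
            record[key.strip().lower()] = value.strip()
    if record:  # last record has no trailing separator
        yield record

def to_csv(lines, missing="n/a"):
    """Return CSV text; columns are the union of all fields, in first-seen order."""
    records = list(parse_records(lines))
    columns = []
    for rec in records:
        for key in rec:
            if key not in columns:
                columns.append(key)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, restval=missing,
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()

sample = """********
name : one
Place : city
Initial: none
********
name : two
Place : city2
Initial: none
********
name : three
Place : city3
Initial: none
Limits : some
"""

print(to_csv(sample.splitlines()))
```

For a real file you would pass an open file object instead of sample.splitlines() and write the result to disk. The header comes from the labels in the data, so the last column is named limits rather than limit.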