Expression matches on Windows but not on Mac


The following regular expression

\s*([\w_]*)\s*(,\s*|=\s*(\d*)\s*,)\n

matches the following line (with appended newline)

  _FIRST_ELEMENT_      = 10000,

on Windows but not on Mac. I'm running it in the Python environment embedded in Cinema 4D (3D software), which uses the CPython 2.6 interpreter.

I do not own a Mac, so someone ran a quick test for me, but he does not have time to run more.

The same code was tested on both platforms (Windows and Mac) in the Scripting Window of Cinema 4D:

import re
enum_match = re.compile(r'\s*(\w*)\s*(,\s*|=\s*(\d*)\s*,)\n')
line = '  _FIRST_ELEMENT_      = 10000,\n'
match = enum_match.match(line)

if not match:
    print "Regex did not match."
else:
    print match.groups()

Output on Windows:

('_FIRST_ELEMENT_', '= 10000,', '10000')

Output on Mac:

Regex did not match.

The only thing I can think of is that the underscore (_) is not included in \w on Mac.

Do you know why the regular expression matches on Windows but not on Mac?
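
The underscore hypothesis can be checked directly, and printing repr() of a line shows which line-ending characters it really contains. A minimal diagnostic sketch, assuming it is run in the same interpreter on each platform; the CRLF sample string is a hypothetical variant, not data from the actual test:

```python
import re

# \w is documented to include the underscore, so this should hold on any platform.
underscore_ok = re.match(r'\w', '_') is not None
print(underscore_ok)

# repr() makes otherwise invisible characters such as '\r' visible.
lf_line = '  _FIRST_ELEMENT_      = 10000,\n'
crlf_line = '  _FIRST_ELEMENT_      = 10000,\r\n'  # hypothetical CRLF variant
print(repr(lf_line))
print(repr(crlf_line))

# The pattern from the question, written as a raw string.
pattern = re.compile(r'\s*(\w*)\s*(,\s*|=\s*(\d*)\s*,)\n')
lf_matches = pattern.match(lf_line) is not None
crlf_matches = pattern.match(crlf_line) is not None
print(lf_matches)
print(crlf_matches)
```

If the line being matched turns out to contain a '\r' before the '\n', the pattern fails for that reason rather than because of the underscore.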


There are 2 answers

chepner (best answer)

Use this instead:

 enum_match = re.compile(r'\s*(\w*)\s*(,\s*|=\s*(\d*)\s*,)$')

Mac OS X and Windows use different characters to mark the end of a line in text files; it appears that your file uses the Windows variety. '\n', I believe, matches the character(s) used by the operating system the code is running under, which may not be the characters used in the file. Using '$' instead of '\n' in your regular expression should work under either operating system (even if this explanation isn't quite correct).
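
A minimal sketch for trying the '$' variant. In Python's re module, '\n' in a pattern matches only the LF character, and '$' matches at the end of the string or just before a trailing LF, so a line that still ends in '\r\n' is not matched by '$' alone. The sketch therefore strips trailing CR/LF characters before matching; the helper name parse_enum_line is hypothetical and the line endings tried are assumptions about what the input might contain:

```python
import re

# Same pattern as suggested above, anchored with $ instead of \n.
enum_match = re.compile(r'\s*(\w*)\s*(,\s*|=\s*(\d*)\s*,)$')

body = '  _FIRST_ELEMENT_      = 10000,'

def parse_enum_line(line):
    """Hypothetical helper: drop trailing CR/LF characters, then match."""
    match = enum_match.match(line.rstrip('\r\n'))
    return match.groups() if match else None

# Without stripping, a CRLF-terminated line does not match.
unstripped_crlf = enum_match.match(body + '\r\n')
print(unstripped_crlf)

# With stripping, every ending gives the same groups.
results = [parse_enum_line(body + ending) for ending in ('\n', '\r\n', '\r', '')]
for groups in results:
    print(groups)
```

Stripping first keeps the pattern independent of whichever newline convention the file was written with.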

stema

I assume the newline character \n is the problem, since it is not the same on all systems.

You can do something more general like

\s*([\w_]*)\s*(,\s*|=\s*(\d*)\s*,)(?:\r\n?|\n)

This matches \r with an optional \n following, or \n alone. I think this covers all of the newline sequences in use nowadays.
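
A short sketch that checks the pattern above against the three common newline sequences. The test strings are assumptions built from the line in the question:

```python
import re

enum_match = re.compile(r'\s*([\w_]*)\s*(,\s*|=\s*(\d*)\s*,)(?:\r\n?|\n)')

body = '  _FIRST_ELEMENT_      = 10000,'
endings = {'LF': '\n', 'CRLF': '\r\n', 'CR': '\r'}

# Record whether each line-ending variant matches.
matched = {}
for name, ending in sorted(endings.items()):
    matched[name] = enum_match.match(body + ending) is not None
    print('%s: %s' % (name, matched[name]))

# A line with no newline at all is not matched, since the pattern requires one.
no_newline = enum_match.match(body) is not None
print('none: %s' % no_newline)
```

If the last line of a file may lack a trailing newline, the final group could be made optional or replaced by stripping the line before matching.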