I have a file like this:
A\r
B\n
C\r\n
(By \r I'm referring to CR, and \n is LF)
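In case anyone wants to reproduce this, here is a minimal sketch for creating such a file. The helper names (write_sample, read_raw, make_temp_sample) are made up for illustration, and it assumes the file holds exactly the three lines shown above:

```python
import os
import tempfile

SAMPLE = b"A\rB\nC\r\n"  # CR, LF and CRLF line endings, as described above


def write_sample(path):
    # Binary mode, so Python does not translate the line endings on write.
    with open(path, "wb") as f:
        f.write(SAMPLE)


def read_raw(path):
    # Read the bytes back untouched to confirm what is actually on disk.
    with open(path, "rb") as f:
        return f.read()


def make_temp_sample():
    # Hypothetical helper: writes the sample into a fresh temporary directory.
    path = os.path.join(tempfile.mkdtemp(), "myfile.txt")
    write_sample(path)
    return path
```

Writing in binary mode matters here: in text mode, Python may rewrite the line endings on some platforms, and the file would no longer contain the mix of CR, LF and CRLF.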
And this script:
import fileinput

for line in fileinput.input(mode='rU'):
    print(line)
When I call python script.py myfile.txt, I get the correct output:
A
B
C
But when I pipe the file in like this: type myfile.txt | python script.py, I get this:
B
C
You see? No more "A".
What is happening? I thought mode='rU' would take care of every newline problem...
EDIT: In Python 3 there is no such problem! Only in Python 2. But that does not solve the problem.
Thanks
EDIT:
Just for the sake of completeness:
- It also happens on Linux.
- Python 3 handles every newline type (\n, \r or \r\n) transparently. Whichever one your file uses, you don't have to worry about it.
- Python 2 needs the parameter mode='rU' passed to fileinput.input to handle every newline type transparently.
The thing is, in Python 2 this does not work correctly when content is piped in. I tried piping a file like this:
CR: \r
LF: \n
CRLF: \r\n
Python 2 treats the first two lines as a single line. If you print every line with this code:
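from __future__ import print_function  # needed for print(..., end='') on Python 2
import fileinput
for i, line in enumerate(fileinput.input(mode='rU')):
    print("Line {}: {}".format(i, line), end='')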
It outputs this:
Line 0: CR:
LF:
Line 1: CRLF:
This doesn't happen in Python 3, where these are two separate lines. In Python 2 it also works correctly when the text is passed as a file argument instead of being piped.
Piping data like this:
LF: \n
CR: \r
CRLF: \r\n
Gives me a similar result:
Line 0: LF:
Line 1: CR:
CRLF:
My conclusion is the following:
For some reason, when data is piped in, Python 2 looks for the first newline symbol it encounters and, from then on, treats only that specific character as a newline. In this example Python 2 encounters \r as the first newline character, and all the others (\n or \r\n) are treated as ordinary characters.
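As a possible workaround, here is a minimal sketch that bypasses fileinput for piped data and lets io.TextIOWrapper do the newline translation instead. It assumes the input is UTF-8 encoded. The names universal_lines, demo and main are hypothetical, and I have only reasoned through it on an in-memory sample, so treat the stdin part as untested on Python 2:

```python
import io
import sys


def universal_lines(binary_stream, encoding="utf-8"):
    # newline=None turns on universal newlines: CR, LF and CRLF all end a
    # line, and each is translated to a plain LF in the returned text.
    text = io.TextIOWrapper(binary_stream, encoding=encoding, newline=None)
    for line in text:
        yield line


def demo():
    # In-memory stand-in for piped data with mixed line endings.
    sample = io.BytesIO(b"A\rB\nC\r\n")
    return list(universal_lines(sample))


def main():
    # Reopen stdin's file descriptor in binary mode so no earlier layer has
    # touched the line endings; closefd=False leaves the real stdin open.
    raw_stdin = io.open(sys.stdin.fileno(), "rb", closefd=False)
    for line in universal_lines(raw_stdin):
        sys.stdout.write(line)
```

Calling main() from the script and piping the file in would be the intended usage. The point of the design is that the io module does its own newline handling, so it should not depend on whatever fileinput does with stdin.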