The logic behind the difference between the FileInputStream and Scanner classes


I'm trying to understand the difference between Scanner.nextByte() and FileInputStream.read(). I have read similar topics, but I didn't find the answer to my question. A similar question was asked in the topic "Scanner vs FileInputStream".

Here is what I understand:

Say that a .txt file contains

1

Then, FileInputStream.read() will return 49

Scanner.nextByte() will return 1

If the .txt file contains

a

FileInputStream.read() will return 97.

Scanner.nextByte() will throw a java.util.InputMismatchException.

The answers in the topic I linked say:

FileInputStream.read() will evaluate the 1 as a byte, and return its value: 49. Scanner.nextByte() will read the 1 and try to evaluate it as an integer regular expression of radix 10, and give you: 1.

FileInputStream.read() will evaluate the a as a byte, and return its value: 97. Scanner.nextByte() will read the a and try to evaluate it as an integer regular expression of radix 10, and throw a java.util.InputMismatchException.

But I didn't understand what they actually mean. Can you explain this in simple words, with clearer examples? I looked at the ASCII table, and the character 1 corresponds to 49. Is that why FileInputStream.read() returns 49?

I'm totally confused. Please explain it to me in simple words.


There is 1 answer

Best answer, by JB Nizet

Files contain bytes. FileInputStream reads these bytes. So if a file contains one byte whose value is 49, stream.read() will return 49. If the file contains two identical bytes 49, calling read() twice will return 49, then 49.
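As a minimal sketch of this point, the following writes two bytes with the value 49 to a temporary file and calls read() three times. The class name RawByteReadDemo, the helper readThree, and the temporary file name are made up for illustration. The third call is there only to show that read() returns -1 once the file has no more bytes.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical demo class, not part of any library.
class RawByteReadDemo {

    // Writes the given bytes to a temporary file, then calls read() three times.
    static int[] readThree(byte[] content) {
        try {
            Path file = Files.createTempFile("raw-bytes", ".txt");
            try {
                Files.write(file, content);
                try (FileInputStream stream = new FileInputStream(file.toFile())) {
                    // read() returns the next byte as an int in 0..255, or -1 at end of file.
                    return new int[] { stream.read(), stream.read(), stream.read() };
                }
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The file holds two identical bytes with the value 49.
        System.out.println(Arrays.toString(readThree(new byte[] { 49, 49 }))); // [49, 49, -1]
    }
}
```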

Characters like 'a', '1' or 'Z' can be stored in files. To be stored in files, they first have to be transformed into bytes, because that's what files contain. There are various ways (called "character encodings") to transform characters to bytes. Some of them (like ASCII, ISO-8859-1 or UTF-8) transform the character '1' into the byte 49.
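To make the character-to-byte step concrete, here is a small sketch that encodes the characters '1' and 'a' with the three encodings named above. The class name EncodingDemo and the helper encode are hypothetical. UTF-16BE is added only as a contrast, to show that not every encoding maps '1' to the single byte 49.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of any library.
class EncodingDemo {

    // Transforms characters into bytes using the given character encoding.
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(encode("1", StandardCharsets.US_ASCII)));   // [49]
        System.out.println(Arrays.toString(encode("1", StandardCharsets.ISO_8859_1))); // [49]
        System.out.println(Arrays.toString(encode("1", StandardCharsets.UTF_8)));      // [49]
        System.out.println(Arrays.toString(encode("a", StandardCharsets.UTF_8)));      // [97]
        // A different encoding can produce different bytes for the same character.
        System.out.println(Arrays.toString(encode("1", StandardCharsets.UTF_16BE)));   // [0, 49]
    }
}
```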

Scanner reads characters from a file. So it transforms the bytes in the file to characters (using the character encoding, but in the other direction: from bytes to characters). Some sequences of characters form decimal numbers, for example '123', '-5265', or '1'. Some don't, like 'abc'.

When you call nextByte() on a Scanner, you ask the scanner to read the next sequence of characters (until the next whitespace, or until the end of the file if there is no whitespace), then to check that this sequence of characters represents a valid decimal number, and that this decimal number fits into a byte (i.e. it is a number between -128 and 127). If it does, the sequence of characters is parsed as a decimal number, stored in a byte, and returned. Otherwise, a java.util.InputMismatchException is thrown.
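The rules for nextByte() can be sketched with a few inputs. To keep the example short, it reads from an in-memory stream instead of a real file; the class name NextByteDemo and the helper tryNextByte are made up for illustration. The explicit "US-ASCII" charset and Locale.US are assumptions added so that the result does not depend on the machine's defaults.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.InputMismatchException;
import java.util.Locale;
import java.util.Scanner;

// Hypothetical demo class, not part of any library.
class NextByteDemo {

    // Pretends the given text is the content of a file and calls nextByte() once.
    static String tryNextByte(String fileContent) {
        byte[] bytes = fileContent.getBytes(StandardCharsets.US_ASCII);
        try (Scanner scanner = new Scanner(new ByteArrayInputStream(bytes), "US-ASCII")) {
            scanner.useLocale(Locale.US); // assumption: fixed locale for number parsing
            return "byte " + scanner.nextByte();
        } catch (InputMismatchException e) {
            // The token is not a decimal number, or it does not fit into a byte.
            return "InputMismatchException";
        }
    }

    public static void main(String[] args) {
        System.out.println(tryNextByte("1"));     // byte 1
        System.out.println(tryNextByte("12 34")); // byte 12 (the token ends at the whitespace)
        System.out.println(tryNextByte("127"));   // byte 127
        System.out.println(tryNextByte("128"));   // InputMismatchException (out of range)
        System.out.println(tryNextByte("a"));     // InputMismatchException (not a number)
    }
}
```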

So if the file contains the byte 49 twice, the sequence of characters read and parsed by nextByte() would be '11', which would be transformed into the byte 11.
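Putting both views side by side, this sketch stores the byte 49 twice in a temporary file and reads it once with FileInputStream.read() and once with Scanner.nextByte(). The class name SameBytesTwoViews and its helpers are hypothetical, and the "US-ASCII" charset and Locale.US are assumptions chosen to make the run reproducible.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Scanner;

// Hypothetical demo class, not part of any library.
class SameBytesTwoViews {

    // Creates a temporary file holding exactly the given bytes.
    private static Path writeTemp(byte[] content) throws IOException {
        Path file = Files.createTempFile("two-views", ".txt");
        file.toFile().deleteOnExit();
        return Files.write(file, content);
    }

    // View 1: the raw bytes, one read() call per byte.
    static List<Integer> readRaw(byte[] content) {
        List<Integer> values = new ArrayList<>();
        try (FileInputStream stream = new FileInputStream(writeTemp(content).toFile())) {
            int value;
            while ((value = stream.read()) != -1) {
                values.add(value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return values;
    }

    // View 2: the bytes decoded to characters, then parsed as one decimal number.
    static byte scanByte(byte[] content) {
        try (Scanner scanner = new Scanner(writeTemp(content).toFile(), "US-ASCII")) {
            scanner.useLocale(Locale.US);
            return scanner.nextByte();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] content = { 49, 49 };            // the characters '1' and '1'
        System.out.println(readRaw(content));   // [49, 49]
        System.out.println(scanByte(content));  // 11
    }
}
```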