Unix read() and write() functions

/*
Low Level I/O - Read and Write
Chapter 8 - The C Programming Language - K&R
Header file in the original code is "syscalls.h"
Also BUFSIZ is supposed to be defined in the same header file   
*/

#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define BUFSIZ 1

int main()  /* copy input to output */
{
    char buf[BUFSIZ];
    ssize_t n;  /* read() returns ssize_t, not int */

    while ((n = read(0, buf, BUFSIZ)) > 0)
        write(1, buf, n);

    return 0;
}

When I feed "∂∑∑®†¥¥¥˚π∆˜˜∫∫√ç tu 886661~EOF" as input, the same text is copied to the output. How are so many non-ASCII characters stored at the same time?

BUFSIZ is the number of bytes to be transferred. How does BUFSIZ limit the transfer if, for any value, input of any length is copied to the output?

How does char buf[BUFSIZ] store non-ASCII characters?

There are 3 answers

5
KAction On

You read by little chunks until EOF:

while ((n = read(0, buf, BUFSIZ)) > 0)

That's why: you literally copy the input to the output byte by byte. Converting the bytes back to Unicode characters is the console's job, not your program's. My guess is that the console does not display anything until it has received enough bytes to recognize a complete character.

0
Maksim Skurydzin On

Since you call read in a loop until end of file is reached or an error is encountered, buf holds exactly 1 byte after each successful call, and that byte is then written out with the write system call. read is guaranteed to transfer no more bytes than the count given in its last argument. If you passed 10 as that argument with the buffer in your code, which holds only BUFSIZ (1) byte, read would try to store the data beyond the array bounds.
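To make the limit set by the last argument of read visible, here is a minimal sketch. The function name count_reads and its 64-byte internal buffer are hypothetical, chosen only for illustration; it counts how many read calls are needed to drain a descriptor when each call may transfer at most chunk bytes.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: drain fd with read() calls of at most `chunk`
   bytes each. Returns the number of calls that returned data, or -1 on
   error. `chunk` is rejected if it would overrun the local buffer. */
long count_reads(int fd, size_t chunk)
{
    char buf[64];
    long calls = 0;
    ssize_t n;

    if (chunk == 0 || chunk > sizeof buf)
        return -1;              /* never ask read() for more than buf holds */

    while ((n = read(fd, buf, chunk)) > 0)
        calls++;                /* each call delivered between 1 and chunk bytes */

    return n < 0 ? -1 : calls;
}
```

With five bytes already waiting in a pipe whose write end is closed, count_reads(fd, 1) should take five calls, while a chunk of 64 should need only one. The same data gets through either way; only the number of system calls changes.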

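As for the characters you fed in: a char can hold any byte value, including 128-255, so there is no problem here. Whether each character arrives as a single byte in that range (a single-byte "extended ASCII" encoding) or as a sequence of several such bytes (UTF-8) depends on the terminal's encoding; either way the program just copies bytes.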
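As a rough illustration, assuming the terminal sends UTF-8 (an assumption, since the question does not say which encoding is in use), a character such as ∂ (U+2202) occupies three bytes, 0xE2 0x88 0x82. The helper name utf8_code_points below is hypothetical; it counts characters by skipping UTF-8 continuation bytes, which shows that the number of bytes and the number of characters differ.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the code points in a UTF-8 string.
   Every byte except a continuation byte (bit pattern 10xxxxxx)
   starts a new character. Assumes s is valid UTF-8. */
size_t utf8_code_points(const char *s)
{
    size_t count = 0;

    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

For the string "\xE2\x88\x82 tu" (that is, "∂ tu" in UTF-8), strlen reports 6 bytes while utf8_code_points reports 4 characters. A copy loop with a 1-byte buffer moves those 6 bytes one at a time, and the terminal reassembles them.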

0
Dmitry Poroh On

When you call read on standard input, you are reading from a pipe that is bound to a terminal or to another program. There is of course a buffer (or several) between the writer (the terminal or other program) and your program. When this buffer is empty, the reader (your program) blocks in read. When the buffer is full, the writer (the terminal, etc.) blocks in write.

When you write to standard output, you are writing to a pipe that is bound to a terminal or to another program.

So if your program is run by the shell from a terminal, then your program's input and output are bound to the (pseudo)terminal. A (pseudo)terminal is a program that converts the user's key presses to characters and converts encoded strings (ISO 8859-1, UTF-8, etc.) to symbols on the screen.

  1. Characters are held by the terminal until you press EOF or EOL. This is the terminal's canonical mode. After you press Enter, the bytes are written to the pipe bound to your program.
  2. BUFSIZ is the number of bytes that you try to read from the input in one operation. The return value n is the number of bytes actually read when the operation completes. So BUFSIZ is the maximum number of bytes your program can read from the pipe per call.
  3. char buf[BUFSIZ] is an array of bytes (not characters of some charset), so it can hold any values (including non-printable bytes and even zero).
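The three points above can be sketched as a copy routine that treats the buffer purely as bytes. The names copy_fd and write_all and the 4096-byte buffer size are hypothetical choices for this sketch, not part of the K&R program; unlike the original loop, it also retries when write transfers fewer bytes than requested.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write all len bytes, retrying after short writes
   and EINTR. Returns 0 on success, -1 on error. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t w = write(fd, buf, len);
        if (w < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before writing: try again */
            return -1;
        }
        buf += w;
        len -= (size_t)w;
    }
    return 0;
}

/* Hypothetical helper: copy in_fd to out_fd until end of file.
   Returns the number of bytes copied, or -1 on error. */
long copy_fd(int in_fd, int out_fd)
{
    char buf[4096];             /* plain bytes; any value, including 0 */
    long total = 0;

    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof buf);   /* n <= sizeof buf */
        if (n == 0)
            break;              /* end of file */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (write_all(out_fd, buf, (size_t)n) != 0)
            return -1;
        total += n;
    }
    return total;
}
```

Calling copy_fd(0, 1) from main would behave like the program in the question, only with fewer system calls. The buffer size affects how many read calls are made, not which bytes can pass through.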