
While researching file I/O in C, I came across two functions: fgetc() and read().

//code1.c
#include <stdio.h>

int main(void)
{
  int ch;   /* fgetc() returns an int, so that EOF is distinct from every byte value */

  ch = fgetc(stdin);

  return 0;
}
//code2.c
#include <unistd.h>

int main(void)
{
  char ch;

  read(STDIN_FILENO, &ch, 1);

  return 0;
}

In both of the above programs, if I type hello and press Enter:

  • The first program stores the first character typed at the keyboard (stdin) in ch and then terminates. So ch contains h, and the remaining characters just disappear.

  • The second program also stores the first character in ch, but the remaining characters (ello) do not disappear. Instead, after the program terminates, the shell receives them as a command.


I don't understand why this happens. Is it related to how input is buffered in C (and by computers in general)?

  • Note that fgetc returns an int, not a char. Commented Feb 9 at 17:54
  • Read this answer: stackoverflow.com/a/20482594/4593267 Commented Feb 9 at 18:15
  • @John "if I enter hello:" --> Did you enter 5 keystrokes hello or 6 hello\n? <Enter> is a key too. Commented Feb 9 at 19:40
  • @chux Yes, I meant to include '\n' too. Commented Feb 10 at 4:04
  • You could look at canonical vs. non-canonical terminal input. Commented Feb 10 at 21:44

4 Answers


Yes, it is related to how inputs are buffered.

The standard I/O functions for reading data (fgetc() et al.) wait until the terminal driver makes data available, then read all of the available data from the terminal into the stream's buffer (usually a full line; in your example, the characters 'h', 'e', 'l', 'l', 'o', '\n'), and fgetc() returns the first one, 'h'. Consequently, the other characters are no longer available to other programs.

The system call read() will also wait for the terminal driver to make data available, but then reads only the first character, leaving the other characters available to other programs.

On POSIX-based systems, the fgetc() function typically uses the read() system call (indirectly) to get the data from the terminal, but it usually requests up to a buffer-full of data, anywhere from 512 to 8192 bytes (or more; it is usually a power of two). The read() call returns with whatever is available, which is usually much less than a buffer-full when the input is a terminal. The rules are somewhat different when the input is a disk file, pipe or socket.

Note that the read() system call does not add a null byte to the end of the data, so what it reads is not a string.

I've glossed over numerous details and caveats, seeking to keep my answer easy to understand while avoiding gross distortions of reality. There are ways to control the behaviour of terminals; I've described more or less what happens in the default case.


2 Comments

You might expand a bit on the buffering: by default, buffering is line-based both for the C standard stream stdin when it is tied to a terminal and for read system calls on the terminal itself in cooked mode. Buffering is controlled by setvbuf for FILE * and by tcgetattr/tcsetattr for the terminal itself.
@chqrlie: Yes, there are lots of details that could be added. I've chosen to keep my answer simple — the extra details are not all that relevant to people wanting an answer to the question as posed. I think I've avoided grossly distorting the reality.

I'm going to start by explaining the difference between read and fread.

About fread:

  • fread is a stdio library function.
  • fread works with "streams" (what it calls FILE *).
  • Streams are usually associated with a file descriptor, but not always.
  • fread buffers.
  • fread may read more than requested, storing the excess in a buffer.
  • fread may perform multiple system calls.
  • fread may block if less data than requested is available. It will return the amount of data requested unless EOF is encountered or an error occurs. (The C standard's description of fread allows a short count only on a read error or end-of-file.)

About read:

  • read is a unix system call.
  • read works with "file descriptors" (OS file handles).
  • read doesn't buffer.
  • read will not read more than requested.
  • read only performs one system call.
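  • read returns immediately if data is available to be returned, even if the amount of data is less than the amount requested. (POSIX explicitly allows read to return fewer bytes than requested when the file is a pipe, FIFO or special file such as a terminal and fewer bytes are immediately available.)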

Where fgetc falls into this

fgetc is a function from the stdio library, just like fread. As such, the following are equivalent:

int rv = fgetc( stdin );
if ( rv == EOF ) {
   // Handle error or EOF.
} else {
   char ch = rv;
   // Do something with byte read.
}

and

char ch;
size_t rv = fread( &ch, 1, 1, stdin );  // fread( buffer, item size, item count, stream )
if ( rv == 0 ) {
   // Handle error or EOF.
} else {
   // Do something with byte read.
}

Because of the buffering performed by stdio functions, it's not wise to use both stdio and non-stdio functions with the same file descriptor.


About the difference in behaviour of your programs

Because fgetc and fread are buffering functions, they may read more than requested. This is why your fgetc program absorbs ello\n. The excess is stored in the stream's buffer for future calls to fgetc and/or fread to return. None occur before the program exits, so the ello\n is lost.

Because read doesn't buffer, it doesn't read more than requested, and it doesn't consume the ello\n.

1 Comment

While read does not buffer, the underlying device does, which is why characters can be read later by another program (such as the shell) when the program terminates.

fgetc(stdin) uses buffered input, reading a full line into a buffer and returning one character at a time, while read(STDIN_FILENO, &ch, 1) is unbuffered, reading only one character and leaving the rest for future input processing.


fgetc is a C stdio library function that uses its own input buffer for FILE *stdin.

You can use strace to see what system calls your process makes. read is an actual system call; the libc wrapper for it just passes its args on to the kernel. (On x86-64, by doing mov eax, 1 (__NR_read); syscall; ret, plus a check of the return value so it can set errno on error.) So your read program will just make that one system call (after libc startup), but fgetc has to make its own read call to get bytes from stdin.

On my x86-64 Arch GNU/Linux system:

$ strace -o fgetc.tr  ./a.out
hello<enter>
$ tail fgetc.tr
...
prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
munmap(0x753744d52000, 278107)          = 0
fstat(0, {st_mode=S_IFCHR|0600, st_rdev=makedev(0x88, 0x9), ...}) = 0
getrandom("\x62\xf6\x57\x14\x0d\xb2\x41\x75", 8, GRND_NONBLOCK) = 8
brk(NULL)                               = 0x59eadd786000
brk(0x59eadd7a7000)                     = 0x59eadd7a7000
read(0, "hello\n", 1024)                = 6
lseek(0, -5, SEEK_CUR)                  = -1 ESPIPE (Illegal seek)
exit_group(0)                           = ?

The last brk() calls (finding the current break, then moving it) might have been before main started, or might have been allocating space on demand for the stdin buffer on first use. Even the fstat(0, ...) might have been part of the fgetc, since it's also querying what kind of file it is (in this case a character device file).

Because stdin is buffered (by default), glibc uses a 1024-byte read system call.

fd 0 was connected to a terminal, so the system call blocked until I hit Enter, because the terminal is in "cooked" mode (footnote 1) and there weren't already any queued keystrokes / input. If it had been a regular file, the read system call wouldn't stop at newlines, only at EOF or the requested size. (Or, if the requested size and the file were huge, at some kernel-chosen size limit for a single read.)

TTYs have a buffer so you can type even when there isn't a process blocked on a read system call. (And in "cooked" mode there's even line editing, like backspace, before the end-of-line character (normally newline) or end-of-file character (normally ctrl-D) submits the line.)

If the TTY buffer isn't empty when your process exits, those characters will still be there for the shell to read from it, since your process and the shell both have their stdin connected to the same TTY.


Footnote 1: "Cooked" as opposed to raw mode, like your shell uses, or like an editor such as vim would use. You could run stty -a < /dev/pts/9 from another terminal while your process is running, and again while you're at the shell prompt, to see the different settings, where /dev/pts/9 is the tty for the xterm, SSH session or whatever you're using. One easy way to find the right path is to run ls -l /proc/self/fd and look at where the symlinks for ls's stdin/out/err point.
And BTW, stty operates on its stdin, printing output if any on its stdout, that's why we redirect from the terminal we want to query or set.

The shell itself puts the terminal in "cooked" mode before starting a command, because that's the default environment for stuff like cat >> foo.txt which lets you type something with line-editing into a file, or programs that print a prompt and wait to read a multi-character response. So strace on your own program won't show ioctl system calls for that.
