C62. Text Files versus Binary Files

C supports two kinds of files: text and binary. The bytes in a text file represent characters, making it possible for a human to examine the file or edit it. The source code for a C program is stored in a text file, for example. In a binary file, on the other hand, bytes don't necessarily represent characters; groups of bytes might represent other types of data, such as integers and floating-point numbers. An executable C program is stored in a binary file, as you'll quickly realize if you try to look at the contents of one.
Text files have two characteristics that binary files don't possess:
Text files are divided into lines. Each line in a text file normally ends with one or two special characters; the choice of characters depends on the operating system. In Windows, the end-of-line marker is a carriage-return character ('\x0d') followed immediately by a line-feed character ('\x0a'). In UNIX and newer versions of the Macintosh operating system (Mac OS), the end-of-line marker is a single line-feed character. Older versions of Mac OS use a single carriage-return character.

Text files may contain a special "end-of-file" marker. Some operating systems allow a special byte to be used as a marker at the end of a text file. In Windows, the marker is '\x1a' (Ctrl-Z). There's no requirement that Ctrl-Z be present, but if it is, it marks the end of the file; any bytes after Ctrl-Z are to be ignored. The Ctrl-Z convention is a holdover from DOS, which in turn inherited it from CP/M, an early operating system for personal computers. Most other operating systems, including UNIX, have no special end-of-file character.

Binary files aren't divided into lines. In a binary file, there are no end-of-line or end-of-file markers; all bytes are treated equally.
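The two text-file conventions described above can be checked on raw bytes. The following is a minimal sketch, not a library facility: the names eol_counts, count_eols, and text_length are hypothetical, and the buffer is assumed to hold bytes exactly as they are stored on disk (that is, read in binary mode, with no translation applied).

```c
#include <stddef.h>

/* Hypothetical helper type: tallies of each end-of-line convention. */
struct eol_counts {
    size_t crlf;  /* '\x0d' followed by '\x0a': Windows */
    size_t lf;    /* lone '\x0a': UNIX and newer Mac OS */
    size_t cr;    /* lone '\x0d': older Mac OS */
};

/* Counts the end-of-line markers found in the first len bytes of buf. */
struct eol_counts count_eols(const unsigned char *buf, size_t len)
{
    struct eol_counts c = {0, 0, 0};

    for (size_t i = 0; i < len; i++) {
        if (buf[i] == 0x0d) {
            if (i + 1 < len && buf[i + 1] == 0x0a) {
                c.crlf++;
                i++;  /* skip the line feed that completes the pair */
            } else {
                c.cr++;
            }
        } else if (buf[i] == 0x0a) {
            c.lf++;
        }
    }
    return c;
}

/* Returns the number of bytes that precede the first Ctrl-Z ('\x1a'),
   or len if the buffer contains no Ctrl-Z. Under the Windows
   convention, only these bytes belong to the text. */
size_t text_length(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (buf[i] == 0x1a)
            return i;
    return len;
}
```

A buffer with a nonzero crlf count and zero counts elsewhere probably came from a Windows text file, although a binary file can contain the same byte values by coincidence.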
When we write data to a file, we'll need to consider whether to store it in text form or in binary form. To see the difference, consider how we might store the number 32767 in a file. One option would be to write the number in text form as the characters 3, 2, 7, 6, and 7. If the character set is ASCII, we'd have the following five bytes:
00110011  00110010  00110111  00110110  00110111
   '3'       '2'       '7'       '6'       '7'

The other option is to store the number in binary, which would take as few as two bytes:

01111111  11111111

(The bytes will be reversed on systems that store data in little-endian order.) As this example shows, storing numbers in binary can often save quite a bit of space.
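The size difference can be observed by writing the same value both ways. This is a sketch under stated assumptions: write_as_text and write_as_binary are hypothetical names, the value is held in a short (typically two bytes, though the C standard guarantees only at least 16 bits), and the binary file receives the bytes in whatever order the host machine uses.

```c
#include <stdio.h>

/* Writes value as decimal characters.
   Returns the number of bytes written, or -1 on failure. */
long write_as_text(const char *path, short value)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    int n = fprintf(fp, "%d", value);   /* one byte per digit */
    if (fclose(fp) != 0 || n < 0)
        return -1;
    return n;
}

/* Writes the in-memory representation of value.
   Returns the number of bytes written, or -1 on failure. */
long write_as_binary(const char *path, short value)
{
    FILE *fp = fopen(path, "wb");       /* "b" requests binary mode */
    if (fp == NULL)
        return -1;

    size_t n = fwrite(&value, sizeof value, 1, fp);
    if (fclose(fp) != 0 || n != 1)
        return -1;
    return (long) sizeof value;
}
```

For 32767, write_as_text reports five bytes, while write_as_binary reports sizeof(short) bytes. The binary file is smaller, but it can be read back reliably only by a program that assumes the same integer size and byte order.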
When we're writing a program that reads from a file or writes to a file, we need to take into account whether it's a text file or a binary file. A program that displays the contents of a file on the screen will probably assume it's a text file. A file-copying program, on the other hand, can't assume that the file to be copied is a text file. If it does, binary files containing an end-of-file character won't be copied completely. When we can't say for sure whether a file is text or binary, it's safer to assume that it's binary.
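A file-copying routine that follows this advice might look like the sketch below. The name copy_file and the 4096-byte buffer size are arbitrary choices made for illustration; the essential point is that both files are opened in binary mode ("rb" and "wb"), so no byte is treated as an end-of-line or end-of-file marker.

```c
#include <stdio.h>

/* Copies src to dst byte for byte.
   Returns 0 on success, -1 on failure. */
int copy_file(const char *src, const char *dst)
{
    FILE *in = fopen(src, "rb");
    if (in == NULL)
        return -1;

    FILE *out = fopen(dst, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }

    unsigned char buf[4096];
    size_t n;
    int status = 0;

    /* fread returns 0 at end of file or on a read error. */
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            status = -1;
            break;
        }
    }
    if (ferror(in))
        status = -1;

    fclose(in);
    if (fclose(out) != 0)   /* a failed close can mean lost output */
        status = -1;
    return status;
}
```

Opening the files with "r" and "w" instead would make no difference on UNIX, but on Windows the copy could stop at the first Ctrl-Z byte and carriage-return/line-feed pairs could be translated along the way.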

