C62. C Text Files versus Binary Files


Text Files versus Binary Files
<stdio.h> supports two kinds of files: text and binary. The bytes in a text file represent characters, making it possible for a human to examine the file or edit it. The source code for a C program is stored in a text file, for example. In a binary file, on the other hand, bytes don't necessarily represent characters; groups of bytes might represent other types of data, such as integers and floating-point numbers. An executable C program is stored in a binary file, as you'll quickly realize if you try to look at the contents of one.
Text files have two characteristics that binary files don't possess:
Text files are divided into lines. Each line in a text file normally ends with one or two special characters; the choice of characters depends on the operating system. In Windows, the end-of-line marker is a carriage-return character ('\x0d') followed immediately by a line-feed character ('\x0a'). In UNIX and newer versions of the Macintosh operating system (Mac OS), the end-of-line marker is a single line-feed character. Older versions of Mac OS use a single carriage-return character.

Text files may contain a special "end-of-file" marker. Some operating systems allow a special byte to be used as a marker at the end of a text file. In Windows, the marker is '\x1a' (Ctrl-Z). There's no requirement that Ctrl-Z be present, but if it is, it marks the end of the file; any bytes after Ctrl-Z are to be ignored. The Ctrl-Z convention is a holdover from DOS, which in turn inherited it from CP/M, an early operating system for personal computers. Most other operating systems, including UNIX, have no special end-of-file character.

Binary files aren't divided into lines. In a binary file, there are no end-of-line or end-of-file markers; all bytes are treated equally.
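To make the two text-file conventions concrete, here is a minimal sketch that inspects raw bytes the way a binary reader would see them. The names count_eols, eol_counts, and length_before_ctrl_z are hypothetical helpers invented for this illustration, not part of <stdio.h>.

```c
#include <assert.h>
#include <stddef.h>

/* Tally of each end-of-line convention found in a buffer. */
struct eol_counts {
    size_t crlf;  /* '\x0d' '\x0a' pair: Windows */
    size_t lf;    /* lone '\x0a': UNIX and newer Mac OS */
    size_t cr;    /* lone '\x0d': older Mac OS */
};

/* Scan n raw bytes and count the end-of-line markers of each kind. */
struct eol_counts count_eols(const unsigned char *buf, size_t n)
{
    struct eol_counts c = {0, 0, 0};
    assert(buf != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (buf[i] == '\x0d') {
            if (i + 1 < n && buf[i + 1] == '\x0a') {
                c.crlf++;
                i++;              /* skip the line feed of the pair */
            } else {
                c.cr++;
            }
        } else if (buf[i] == '\x0a') {
            c.lf++;
        }
    }
    return c;
}

/* Number of bytes before the first Ctrl-Z ('\x1a'), or n if there is none.
   Under the Windows convention, bytes from that point on are ignored. */
size_t length_before_ctrl_z(const unsigned char *buf, size_t n)
{
    assert(buf != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (buf[i] == '\x1a')
            return i;
    return n;
}
```

Given the 13 bytes "a\r\nb\nc\rd\x1a" followed by "junk", count_eols reports one marker of each kind, and length_before_ctrl_z returns 8, the position of the Ctrl-Z.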
When we write data to a file, we'll need to consider whether to store it in text form or in binary form. To see the difference, consider how we might store the number 32767 in a file. One option would be to write the number in text form as the characters 3, 2, 7, 6, and 7. If the character set is ASCII, we'd have the following five bytes:
   '3'       '2'       '7'       '6'       '7'
00110011  00110010  00110111  00110110  00110111

The other option is to store the number in binary, which would take as few as two bytes:
01111111  11111111
(The bytes will be reversed on systems that store data in little-endian order.) As this example shows, storing numbers in binary can often save quite a bit of space.
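The size difference can be checked with a short sketch. The functions encode_text and encode_binary_be are hypothetical names chosen for this example; the binary version assumes a 16-bit unsigned value and writes the most significant byte first, so a little-endian layout would have the two bytes swapped.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Text form: one byte per decimal digit. Returns the number of bytes
   produced, or 0 if the output buffer is too small. */
size_t encode_text(uint16_t value, char *out, size_t out_size)
{
    assert(out != NULL);
    int n = snprintf(out, out_size, "%u", (unsigned)value);
    return (n < 0 || (size_t)n >= out_size) ? 0 : (size_t)n;
}

/* Binary form: always two bytes, most significant byte first. */
size_t encode_binary_be(uint16_t value, unsigned char out[2])
{
    assert(out != NULL);
    out[0] = (unsigned char)(value >> 8);
    out[1] = (unsigned char)(value & 0xFF);
    return 2;
}
```

For 32767, encode_text produces five bytes (the digit characters), while encode_binary_be produces the two bytes 0x7F and 0xFF.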
When we're writing a program that reads from a file or writes to a file, we need to take into account whether it's a text file or a binary file. A program that displays the contents of a file on the screen will probably assume it's a text file. A file-copying program, on the other hand, can't assume that the file to be copied is a text file. If it does, binary files containing an end-of-file character won't be copied completely. When we can't say for sure whether a file is text or binary, it's safer to assume that it's binary.
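A file-copying routine that follows this advice might look like the sketch below. The function name copy_file and the 4096-byte buffer size are arbitrary choices for illustration; the essential point is the "b" in the fopen mode strings, which asks for a binary stream so that no end-of-line or Ctrl-Z translation takes place.

```c
#include <assert.h>
#include <stdio.h>

/* Copy src_name to dst_name byte for byte.
   Returns 0 on success, -1 on any failure. */
int copy_file(const char *src_name, const char *dst_name)
{
    assert(src_name != NULL && dst_name != NULL);

    FILE *src = fopen(src_name, "rb");   /* binary mode for reading */
    if (src == NULL)
        return -1;

    FILE *dst = fopen(dst_name, "wb");   /* binary mode for writing */
    if (dst == NULL) {
        fclose(src);
        return -1;
    }

    unsigned char buf[4096];
    size_t n;
    int status = 0;

    /* Move the data in blocks; every byte is treated equally. */
    while ((n = fread(buf, 1, sizeof buf, src)) > 0) {
        if (fwrite(buf, 1, n, dst) != n) {
            status = -1;
            break;
        }
    }
    if (ferror(src))
        status = -1;

    fclose(src);
    if (fclose(dst) != 0)   /* a failed flush on close also loses data */
        status = -1;
    return status;
}
```

On systems such as UNIX, where text and binary streams behave the same way, the "b" has no effect, so including it costs nothing and keeps the program portable.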
