Page 651 - Introduction to Programming with Java: A Problem Solving Approach
15.6 Text File Data Format Versus Binary File Data Format
Note that a newline character does not put the data on a separate “line” in a file. Bob 2222 and Paul 5555
print on separate lines, but within a file the data is stored sequentially, one byte after another.
Binary Format
When writing primitive values to a binary file or an object file, Java uses each data type’s native storage
format. We don’t like the name “binary file” because it implies that binary files use binary numbers and text
files do not. Actually, all computer file information is “binary” in the sense that everything on a computer
is represented with 1’s and 0’s. We’d prefer that “binary files” be called “native storage files,” but alas,
“binary” is the term everyone uses. So when we talk about a binary format, we mean the native storage format
recognized by the processor. For example, in a binary file, a char uses 16-bit Unicode, an int uses 32-bit
2’s complement, a double uses the standard 64-bit IEEE floating-point, and so on. See Chapter 11 for a
discussion of Unicode. 2’s complement⁶ means: if $\text{binaryValue} \ge 2^{\text{bits}-1}$ then $\text{value} = \text{binaryValue} - 2^{\text{bits}}$. IEEE stands for Institute of Electrical and Electronics Engineers.
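The 2’s complement rule can be checked with a few lines of Java. The sketch below is only an illustration: the class name TwosComplementDemo and the method toSigned are hypothetical names, not from the book. It treats a bit pattern as an unsigned number and subtracts 2 raised to the number of bits whenever the pattern is at least 2 raised to (bits - 1), that is, whenever the most significant bit is 1.

```java
class TwosComplementDemo {
    // Hypothetical helper: interprets an unsigned bit pattern of the given
    // width as a 2's complement value.
    // Assumes 1 <= bits <= 62 so that 2^bits fits in a long.
    static long toSigned(long binaryValue, int bits) {
        long half = 1L << (bits - 1); // 2^(bits-1): smallest pattern with the top bit set
        long full = 1L << bits;       // 2^bits
        return (binaryValue >= half) ? binaryValue - full : binaryValue;
    }

    public static void main(String[] args) {
        // 32-bit pattern 1000...0, read as unsigned, is 2147483648
        System.out.println(toSigned(2147483648L, 32)); // prints -2147483648
        // 32-bit pattern 0111...1 has a 0 in the top bit, so it is unchanged
        System.out.println(toSigned(2147483647L, 32)); // prints 2147483647
        // 8-bit pattern 11111111, read as unsigned, is 255
        System.out.println(toSigned(255L, 8));         // prints -1
    }
}
```

The first call reproduces the largest-magnitude negative int, and the second shows that the largest positive int passes through unchanged.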
Binary files are not line oriented. Binary file read methods do not recognize end-of-line characters as having any special function. These characters may be present, just like any other characters, but they do not affect the extent of what’s read, and methods that write to binary files never append end-of-line characters automatically. Therefore, programs that access primitive data in binary files do not read or write whole lines. That is, they do not use nextLine and println methods to read a line or print a line. Instead, they use methods like readChar, writeChar, readInt, writeInt, readDouble, writeDouble, and so on to read and write individual primitive variable values.
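As a minimal sketch of value-by-value access, the code below writes three char values and one int with writeChar and writeInt, then reads them back with readChar and readInt. It assumes the standard java.io classes DataOutputStream and DataInputStream, which this section does not name, and it uses an in-memory byte array in place of a real file so it can run anywhere. The class name BinaryRoundTrip and its methods are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class BinaryRoundTrip {
    // Writes each character of name as a 16-bit char, then number as a 32-bit int.
    // No end-of-line character is appended.
    static byte[] write(String name, int number) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream(); // stands in for a file
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            for (int i = 0; i < name.length(); i++) {
                out.writeChar(name.charAt(i));
            }
            out.writeInt(number);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Reads nameLength chars followed by one int. The reader must know the
    // layout in advance because the data has no line structure.
    static String read(byte[] data, int nameLength) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            StringBuilder name = new StringBuilder();
            for (int i = 0; i < nameLength; i++) {
                name.append(in.readChar());
            }
            int number = in.readInt();
            return name + " " + number;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = write("Bob", 2147483647);
        System.out.println(data.length);   // prints 10 (3 chars x 2 bytes + 4 bytes)
        System.out.println(read(data, 3)); // prints Bob 2147483647
    }
}
```

With a real file, the ByteArrayOutputStream would typically be replaced by a FileOutputStream; the read and write calls stay the same.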
For example, suppose you have this data:
Bob2147483647
Figure 15.9 shows how it’s stored in a binary or object file.
Figure 15.9 Raw form of binary format
Unicode characters and int number are shown in blue above 16-bit character and 32-bit number sequences.
Characters in a Java program use the 16-bit Unicode storage scheme. Therefore, a ‘B’ is usually stored in a binary file using the 16-bit Unicode storage scheme. In this scheme, the first byte has eight 0’s, and the second byte’s bit sequence matches the ASCII value for ‘B’ shown in Figure 15.8. The left eight bits for ‘B’, ‘o’, and ‘b’ are all zeros and that doesn’t provide any useful information. So why are these extra eight bits there? As described in Chapter 11, they’re there to handle Unicode characters that are not in the ASCII character set. Those other characters need the extra eight bits on the left to hold their full code values.
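To see the 16-bit patterns directly, the following sketch prints each char as two groups of eight bits. The class name CharBits and the method bits16 are hypothetical, and the Greek letter pi (written as the escape \u03C0) is an assumed example of a non-ASCII character that is not taken from the book.

```java
class CharBits {
    // Returns the 16-bit code of c as two 8-bit groups separated by a space.
    static String bits16(char c) {
        String raw = Integer.toBinaryString(c); // no leading zeros
        StringBuilder padded = new StringBuilder();
        for (int i = raw.length(); i < 16; i++) {
            padded.append('0');                 // left-pad to 16 bits
        }
        padded.append(raw);
        return padded.substring(0, 8) + " " + padded.substring(8);
    }

    public static void main(String[] args) {
        System.out.println(bits16('B'));      // prints 00000000 01000010
        System.out.println(bits16('o'));      // prints 00000000 01101111
        System.out.println(bits16('b'));      // prints 00000000 01100010
        // A non-ASCII character uses the left eight bits as well.
        System.out.println(bits16('\u03C0')); // prints 00000011 11000000
    }
}
```

For the three ASCII letters the left byte is all zeros, while the non-ASCII character needs bits in both bytes.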
How is the 2147483647 stored? If it were a text file, the digits would be stored as 10 separate ASCII characters, which would take 10 bytes. But with a binary file, 2147483647 is stored as a single int number. Since an int takes 32 bits, binary files use 32 bits to store int’s, and that takes only 4 bytes. The most significant bit indicates the sign of the number. A 0 in the most significant position says the number is positive. A 1 in the most significant position says the number is negative. The number 2147483647
6 Here’s an int example (where $\text{bits} = 32$): If $\text{binaryValue} = (10000000000000000000000000000000)_2 = 2^{31} = 2147483648$, then $\text{value} = 2147483648 - 4294967296 = -2147483648$. For a more extensive explanation of 2’s complement, see http://en.wikipedia.org/wiki/Two's_complement.
Figure 15.9 contents (value, followed by its stored bit sequence):
B            00000000 01000010
o            00000000 01101111
b            00000000 01100010
2147483647   01111111 11111111 11111111 11111111