MindView Inc.
[ Viewing Hints ] [ Revision History ] [ Report an Error ]
[ 1st Edition ] [ Free Newsletter ]
[ Seminars ] [ Seminars on CD ROM ] [ Consulting ]

Thinking in Java, 2nd edition, Revision 9

©2000 by Bruce Eckel

[ Previous Chapter ] [ Short TOC ] [ Table of Contents ] [ Index ] [ Next Chapter ]

11: The Java
IO System

Creating a good input/output (IO) system is one of the more difficult tasks for the language designer.

This is evidenced by the number of different approaches. The challenge seems to be in covering all eventualities. Not only are there different kinds of IO that you want to communicate with (files, the console, network connections), but you need to talk to them in a wide variety of ways (sequential, random-access, binary, character, by lines, by words, etc.).

The Java library designers attacked the problem by creating lots of classes. In fact, there are so many classes for Java’s IO system that it can be intimidating at first (ironically, the Java IO design actually prevents an explosion of classes). There was also a significant change in the IO library after Java 1.0. Instead of simply replacing the old library with a new one, the designers at Sun extended the old library and added the new one alongside it. As a result you can sometimes end up mixing the old and new libraries and creating even more intimidating code.

This chapter will help you understand the variety of IO classes in the standard Java library and how to use them. The first portion of the chapter will introduce the “old” Java 1.0 IO stream library, since there is a significant amount of existing code that uses that library. The remainder of the chapter will introduce the new features in the Java 1.1 IO library. Note that when you compile some of the code in the first part of the chapter with a Java 2 compiler you can get a “deprecated feature” warning message at compile-time. The code still works; the compiler is just suggesting that you use certain new features that are described in the latter part of this chapter. It is valuable, however, to see the difference between the old and new way of doing things and that’s why it was left in—to increase your understanding (and to allow you to read code written for Java 1.0).

Input and output

The Java library classes for IO are divided by input and output, as you can see by looking at the online Java class hierarchy with your Web browser. By inheritance, all classes derived from InputStream have basic methods called read( ) for reading a single byte or array of bytes. Likewise, all classes derived from OutputStream have basic methods called write( ) for writing a single byte or array of bytes. However, you won’t generally use these methods; they exist so more sophisticated classes can use them as they provide a more useful interface. Thus, you’ll rarely create your stream object by using a single class, but instead will layer multiple objects together to provide your desired functionality. The fact that you create more than one object to create a single resulting stream is the primary reason that Java’s stream library is confusing.

It’s helpful to categorize the classes by their functionality. The library designers started by deciding that all classes that had anything to do with input would be inherited from InputStream and all classes that were associated with output would be inherited from OutputStream.

Types of InputStream

InputStream’s job is to represent classes that produce input from different sources. These sources can be (and each has an associated subclass of InputStream):

  1. An array of bytes
  2. A String object
  3. A file
  4. A “pipe,” which works like a physical pipe: you put things in one end and they come out the other
  5. A sequence of other streams, so you can collect them together into a single stream
  6. Other sources, such as an Internet connection. (This will be discussed in a later chapter.)

In addition, the FilterInputStream is also a type of InputStream, to provide a base class for "decorator" classes that attach attributes or useful interfaces to input streams. This is discussed later.

Table 10-1. Types of InputStream

Class

Function

Constructor Arguments

How to use it

ByteArray-InputStream

Allows a buffer in memory to be used as an InputStream

The buffer from which to extract the bytes.

As a source of data. Connect it to a FilterInputStream object to provide a useful interface.

StringBuffer-InputStream

Converts a String into an InputStream

A String. The underlying implementation actually uses a StringBuffer.

As a source of data. Connect it to a FilterInputStream object to provide a useful interface.

File-InputStream

For reading information from a file

A String representing the file name, or a File or FileDescriptor object.

As a source of data. Connect it to a FilterInputStream object to provide a useful interface.

Piped-InputStream

Produces the data that’s being written to the associated PipedOutput-Stream. Implements the “piping” concept.

PipedOutputStream

As a source of data in multithreading. Connect it to a FilterInputStream object to provide a useful interface.

Sequence-InputStream

Converts two or more InputStream objects into a single InputStream.

Two InputStream objects or an Enumeration for a container of InputStream objects.

As a source of data. Connect it to a FilterInputStream object to provide a useful interface.

Filter-InputStream

Abstract class which is an interface for decorators that provide useful functionality to the other InputStream classes. See Table 10-3.

See Table 10-3.

See Table 10-3.

Types of OutputStream

This category includes the classes that decide where your output will go: an array of bytes (no String, however; presumably you can create one using the array of bytes), a file, or a “pipe.”

In addition, the FilterOutputStream provides a base class for "decorator" classes that attach attributes or useful interfaces to output streams. This is discussed later.

Table 10-2. Types of OutputStream

Class

Function

Constructor Arguments

How to use it

ByteArray-OutputStream

Creates a buffer in memory. All the data that you send to the stream is placed in this buffer.

Optional initial size of the buffer.

To designate the destination of your data. Connect it to a FilterOutputStream object to provide a useful interface.

File-OutputStream

For sending information to a file.

A String representing the file name, or a File or FileDescriptor object.

To designate the destination of your data. Connect it to a FilterOutputStream object to provide a useful interface.

Piped-OutputStream

Any information you write to this automatically ends up as input for the associated PipedInput-Stream. Implements the “piping” concept.

PipedInputStream

To designate the destination of your data for multithreading. Connect it to a FilterOutputStream object to provide a useful interface.

Filter-OutputStream

Abstract class which is an interface for decorators that provide useful functionality to the other OutputStream classes. See Table 10-4.

See Table 10-4.

See Table 10-4.

Adding attributes
and useful interfaces

The use of layered objects to dynamically and transparently add responsibilities to individual objects is referred to as the decorator pattern. (Patterns[56] are the subject of Thinking in Patterns with Java, downloadable at www.BruceEckel.com.) The decorator pattern specifies that all objects that wrap around your initial object have the same interface, to make the use of the decorators transparent—you send the same message to an object whether it’s been decorated or not. This is the reason for the existence of the “filter” classes in the Java IO library: the abstract “filter” class is the base class for all the decorators. (A decorator must have the same interface as the object it decorates, but the decorator can also extend the interface, which occurs in several of the “filter” classes).

Decorators are often used when subclassing requires a large number of subclasses to support every possible combination needed—so many that subclassing becomes impractical. The Java IO library requires many different combinations of features which is why the decorator pattern is a good approach. There is a drawback to the decorator pattern, however. Decorators give you much more flexibility while you’re writing a program (since you can easily mix and match attributes), but they add complexity to your code. The reason that the Java IO library is awkward to use is that you must create many classes—the “core” IO type plus all the decorators—in order to get the single IO object that you want.

The classes that provide the decorator interface to control a particular InputStream or OutputStream are the FilterInputStream and FilterOutputStream—which don’t have very intuitive names. They are derived, respectively, from InputStream and OutputStream, and they are abstract classes, in theory to provide a common interface for all the different ways you want to talk to a stream. In fact, FilterInputStream and FilterOutputStream simply mimic their base classes, which is the key requirement of the decorator.

Reading from an InputStream
with FilterInputStream

The FilterInputStream classes accomplish two significantly different things. DataInputStream allows you to read different types of primitive data as well as String objects. (All the methods start with “read,” such as readByte( ), readFloat( ), etc.) This, along with its companion DataOutputStream, allows you to move primitive data from one place to another via a stream. These “places” are determined by the classes in Table 10-1. If you’re reading data in blocks and parsing it yourself, you won’t need DataInputStream, but in most other cases you will want to use it to automatically format the data you read.

The remaining classes modify the way an InputStream behaves internally: whether it’s buffered or unbuffered, if it keeps track of the lines it’s reading (allowing you to ask for line numbers or set the line number), and whether you can push back a single character. The last two classes look a lot like support for building a compiler (that is, they were added to support the construction of the Java compiler), so you probably won’t use them in general programming.

You’ll probably need to buffer your input almost every time, regardless of the IO device you’re connecting to, so it would have made more sense for the IO library to make a special case for unbuffered input rather than buffered input.

Table 10-3. Types of FilterInputStream

Class

Function

Constructor Arguments

How to use it

Data-InputStream

Used in concert with DataOutputStream, so you can read primitives (int, char, long, etc.) from a stream in a portable fashion.

InputStream

Contains a full interface to allow you to read primitive types.

Buffered-InputStream

Use this to prevent a physical read every time you want more data. You’re saying “Use a buffer.”

InputStream, with optional buffer size.

This doesn’t provide an interface per se, just a requirement that a buffer be used. Attach an interface object.

LineNumber-InputStream

Keeps track of line numbers in the input stream; you can call getLineNumber( ) and setLineNumber(int).

InputStream

This just adds line numbering, so you’ll probably attach an interface object.

Pushback-InputStream

Has a one byte push-back buffer so that you can push back the last character read.

InputStream

Generally used in the scanner for a compiler and probably included because the Java compiler needed it. You probably won’t use this.

Writing to an OutputStream
with FilterOutputStream

The complement to DataInputStream is DataOutputStream, which formats each of the primitive types and String objects onto a stream in such a way that any DataInputStream, on any machine, can read them. All the methods start with “write,” such as writeByte( ), writeFloat( ), etc.

If you want to do true formatted output, for example, to the console, use a PrintStream. This is the endpoint that allows you to print all of the primitive data types and String objects in a viewable format as opposed to DataOutputStream, whose goal is to put them on a stream in a way that DataInputStream can portably reconstruct them. The System.out static object is a PrintStream.

The two important methods in PrintStream are print( ) and println( ), which are overloaded to print all the various types. The difference between print( ) and println( ) is that the latter adds a newline when it’s done.

BufferedOutputStream is a modifier and tells the stream to use buffering so you don’t get a physical write every time you write to the stream. You’ll probably always want to use this with files, and possibly console IO.

Table 10-4. Types of FilterOutputStream

Class

Function

Constructor Arguments

How to use it

Data-OutputStream

Used in concert with DataInputStream so you can write primitives (int, char, long, etc.) to a stream in a portable fashion.

OutputStream

Contains full interface to allow you to write primitive types.

PrintStream

For producing formatted output. While DataOutputStream handles the storage of data, PrintStream handles display.

OutputStream, with optional boolean indicating that the buffer is flushed with every newline.

Should be the “final” wrapping for your OutputStream object. You’ll probably use this a lot.

Buffered-OutputStream

Use this to prevent a physical write every time you send a piece of data. You’re saying “Use a buffer.” You can call flush( ) to flush the buffer.

OutputStream, with optional buffer size.

This doesn’t provide an interface per se, just a requirement that a buffer is used. Attach an interface object.

Off by itself:
RandomAccessFile

RandomAccessFile is used for files containing records of known size so that you can move from one record to another using seek( ), then read or change the records. The records don’t have to be the same size; you just have to be able to determine how big they are and where they are placed in the file.

At first it’s a little bit hard to believe that RandomAccessFile is not part of the InputStream or OutputStream hierarchy. It has no association with those hierarchies other than that it happens to implement the DataInput and DataOutput interfaces (which are also implemented by DataInputStream and DataOutputStream). It doesn’t even use any of the functionality of the existing InputStream or OutputStream classes—it’s a completely separate class, written from scratch, with all of its own (mostly native) methods. The reason for this may be that RandomAccessFile has essentially different behavior than the other IO types, since you can move forward and backward within a file. In any event, it stands alone, as a direct descendant of Object.

Essentially, a RandomAccessFile works like a DataInputStream pasted together with a DataOutputStream and the methods getFilePointer( ) to find out where you are in the file, seek( ) to move to a new point in the file, and length( ) to determine the maximum size of the file. In addition, the constructors require a second argument (identical to fopen( ) in C) indicating whether you are just randomly reading (“r”) or reading and writing (“rw”). There’s no support for write-only files, which could suggest that RandomAccessFile might have worked well if it were inherited from DataInputStream.

What’s even more frustrating is that you could easily imagine wanting to seek within other types of streams, such as a ByteArrayInputStream, but the seeking methods are available only in RandomAccessFile, which works for files only. BufferedInputStream does allow you to mark( ) a position (whose value is held in a single internal variable) and reset( ) to that position, but this is limited and not too useful.

The File class

The File class has a deceiving name—you might think it refers to a file, but it doesn’t. It can represent either the name of a particular file or the names of a set of files in a directory. If it’s a set of files, you can ask for the set with the list( ) method, and this returns an array of String. It makes sense to return an array rather than one of the flexible container classes because the number of elements is fixed, and if you want a different directory listing you just create a different File object. In fact, “FilePath” would have been a better name. This section shows a complete example of the use of this class, including the associated FilenameFilter interface.

A directory lister

Suppose you’d like to see a directory listing. The File object can be listed in two ways. If you call list( ) with no arguments, you’ll get the full list that the File object contains. However, if you want a restricted list, for example, all of the files with an extension of .java, then you use a “directory filter,” which is a class that tells how to select the File objects for display.

Here’s the code for the example. Note that the result has been effortlessly sorted (alphabetically) using the java.utils.Array.sort( ) method and the AlphabeticComparator defined in Chapter 9:

//: c11:DirList.java
// Displays directory listing.
import java.io.*;
import java.util.*;
import com.bruceeckel.util.*;

public class DirList {
  public static void main(String[] args) {
    try {
      File path = new File(".");
      String[] list;
      if(args.length == 0)
        list = path.list();
      else 
        list = path.list(new DirFilter(args[0]));
      Arrays.sort(list,
        new AlphabeticComparator());
      for(int i = 0; i < list.length; i++)
        System.out.println(list[i]);
    } catch(Exception e) {
      e.printStackTrace();
    }
  }
}

class DirFilter implements FilenameFilter {
  String afn;
  DirFilter(String afn) { this.afn = afn; }
  public boolean accept(File dir, String name) {
    // Strip path information:
    String f = new File(name).getName();
    return f.indexOf(afn) != -1;
  }
} ///:~

The DirFilter class “implements” the interface FilenameFilter. (Interfaces were covered in Chapter 8.) It’s useful to see how simple the FilenameFilter interface is:

public interface FilenameFilter {
  boolean accept(File dir, String name);
}

It says that all this type of object does is provide a method called accept( ). The whole reason behind the creation of this class is to provide the accept( ) method to the list( ) method so that list( ) can call back accept( ) to determine which file names should be included in the list. Thus, this technique is often referred to as a callback or sometimes a functor (that is, DirFilter is a functor because its only job is to hold a method). Because list( ) takes a FilenameFilter object as its argument, it means that you can pass an object of any class that implements FilenameFilter to choose (even at run-time) how the list( ) method will behave. The purpose of a callback is to provide flexibility in the behavior of code.

DirFilter shows that just because an interface contains only a set of methods, you’re not restricted to writing only those methods. (You must at least provide definitions for all the methods in an interface, however.) In this case, the DirFilter constructor is also created.

The accept( ) method must accept a File object representing the directory that a particular file is found in, and a String containing the name of that file. You might choose to use or ignore either of these arguments, but you will probably at least use the file name. Remember that the list( ) method is calling accept( ) for each of the file names in the directory object to see which one should be included—this is indicated by the boolean result returned by accept( ).

To make sure that what you’re working with is only the name and contains no path information, all you have to do is take the String object and create a File object out of it, then call getName( ) which strips away all the path information (in a platform-independent way). Then accept( ) uses the String class indexOf( ) method to see if the search string afn appears anywhere in the name of the file. If afn is found within the string, the return value is the starting index of afn, but if it’s not found the return value is -1. Keep in mind that this is a simple string search and does not have regular expression “wildcard” matching such as “fo?.b?r*” which is much more difficult to implement.

The list( ) method returns an array. You can query this array for its length and then move through it selecting the array elements. This ability to easily pass an array in and out of a method is a tremendous improvement over the behavior of C and C++.

Anonymous inner classes

This example is ideal for rewriting using an anonymous inner class (described in Chapter 8). As a first cut, a method filter( ) is created that returns a reference to a FilenameFilter:

//: c11:DirList2.java
// Uses anonymous inner classes.
import java.io.*;
import java.util.*;
import com.bruceeckel.util.*;

public class DirList2 {
  public static FilenameFilter 
  filter(final String afn) {
    // Creation of anonymous inner class:
    return new FilenameFilter() {
      String fn = afn;
      public boolean accept(File dir, String n) {
        // Strip path information:
        String f = new File(n).getName();
        return f.indexOf(fn) != -1;
      }
    }; // End of anonymous inner class
  }
  public static void main(String[] args) {
    try {
      File path = new File(".");
      String[] list;
      if(args.length == 0)
        list = path.list();
      else 
        list = path.list(filter(args[0]));
      Arrays.sort(list,
        new AlphabeticComparator());
      for(int i = 0; i < list.length; i++)
        System.out.println(list[i]);
    } catch(Exception e) {
      e.printStackTrace();
    }
  }
} ///:~

Note that the argument to filter( ) must be final. This is required by the anonymous inner class so that it can use an object from outside its scope.

This design is an improvement because the FilenameFilter class is now tightly bound to DirList2. However, you can take this approach one step further and define the anonymous inner class as an argument to list( ), in which case it’s even smaller:

//: c11:DirList3.java
// Building the anonymous inner class "in-place."
import java.io.*;
import java.util.*;
import com.bruceeckel.util.*;

public class DirList3 {
  public static void main(final String[] args) {
    try {
      File path = new File(".");
      String[] list;
      if(args.length == 0)
        list = path.list();
      else 
        list = path.list(
          new FilenameFilter() {
            public boolean 
            accept(File dir, String n) {
              String f = new File(n).getName();
              return f.indexOf(args[0]) != -1;
            }
          });
      Arrays.sort(list,
        new AlphabeticComparator());
      for(int i = 0; i < list.length; i++)
        System.out.println(list[i]);
    } catch(Exception e) {
      e.printStackTrace();
    }
  }
} ///:~

The argument to main( ) is now final, since the anonymous inner class uses args[0] directly.

This shows you how anonymous inner classes allow the creation of quick-and-dirty classes to solve problems. Since everything in Java revolves around classes, this can be a useful coding technique. One benefit is that it keeps the code that solves a particular problem isolated together in one spot. On the other hand, it is not always as easy to read, so you must use it judiciously.

Checking for and creating directories

The File class is more than just a representation for an existing directory path, file, or group of files. You can also use a File object to create a new directory or an entire directory path if it doesn’t exist. You can also look at the characteristics of files (size, last modification date, read/write), see whether a File object represents a file or a directory, and delete a file. This program shows the remaining methods available with the File class:

//: c11:MakeDirectories.java
// Demonstrates the use of the File class to
// create directories and manipulate files.
import java.io.*;

public class MakeDirectories {
  private final static String usage =
    "Usage:MakeDirectories path1 ...\n" +
    "Creates each path\n" +
    "Usage:MakeDirectories -d path1 ...\n" +
    "Deletes each path\n" +
    "Usage:MakeDirectories -r path1 path2\n" +
    "Renames from path1 to path2\n";
  private static void usage() {
    System.err.println(usage);
    System.exit(1);
  }
  private static void fileData(File f) {
    System.out.println(
      "Absolute path: " + f.getAbsolutePath() +
      "\n Can read: " + f.canRead() +
      "\n Can write: " + f.canWrite() +
      "\n getName: " + f.getName() +
      "\n getParent: " + f.getParent() +
      "\n getPath: " + f.getPath() +
      "\n length: " + f.length() +
      "\n lastModified: " + f.lastModified());
    if(f.isFile())
      System.out.println("it's a file");
    else if(f.isDirectory())
      System.out.println("it's a directory");
  }
  public static void main(String[] args) {
    if(args.length < 1) usage();
    if(args[0].equals("-r")) {
      if(args.length != 3) usage();
      File 
        old = new File(args[1]),
        rname = new File(args[2]);
      old.renameTo(rname);
      fileData(old);
      fileData(rname);
      return; // Exit main
    }
    int count = 0;
    boolean del = false;
    if(args[0].equals("-d")) {
      count++;
      del = true;
    }
    for( ; count < args.length; count++) {
      File f = new File(args[count]);
      if(f.exists()) {
        System.out.println(f + " exists");
        if(del) {
          System.out.println("deleting..." + f);
          f.delete();
        }
      } 
      else { // Doesn't exist
        if(!del) {
          f.mkdirs();
          System.out.println("created " + f);
        }
      }
      fileData(f);
    }  
  }
} ///:~

In fileData( ) you can see the various file investigation methods put to use to display information about the file or directory path.

The first method that’s exercised by main( ) is renameTo( ), which allows you to rename (or move) a file to an entirely new path represented by the argument, which is another File object. This also works with directories of any length.

If you experiment with the above program, you’ll find that you can make a directory path of any complexity because mkdirs( ) will do all the work for you. In Java 1.0, the -d flag reports that the directory is deleted but it’s still there; in Java 1.1 the directory is actually deleted.

Typical uses of IO streams

Although there are a lot of IO stream classes in the library that can be combined in many different ways, there are just a few ways that you’ll probably end up using them. However, they require attention to get the correct combinations. The following rather long example shows the creation and use of typical IO configurations so you can use it as a reference when writing your own code. Note that each configuration begins with a commented number and title that corresponds to the heading for the appropriate explanation that follows in the text.

//: c11:IOStreamDemo.java
// Typical IO stream configurations.
import java.io.*;
import com.bruceeckel.tools.*;

public class IOStreamDemo {
  public static void main(String[] args) {
    try {
      // 1. Buffered input file
      DataInputStream in =
        new DataInputStream(
          new BufferedInputStream(
            new FileInputStream(args[0])));
      String s, s2 = new String();
      while((s = in.readLine())!= null)
        s2 += s + "\n";
      in.close();

      // 2. Input from memory
      StringBufferInputStream in2 =
          new StringBufferInputStream(s2);
      int c;
      while((c = in2.read()) != -1)
        System.out.print((char)c);

      // 3. Formatted memory input
      try {
        DataInputStream in3 =
          new DataInputStream(
            new StringBufferInputStream(s2));
        while(true)
          System.out.print((char)in3.readByte());
      } catch(EOFException e) {
        System.out.println(
          "End of stream encountered");
      }

      // 4. Line numbering & file output
      try {
        LineNumberInputStream li =
          new LineNumberInputStream(
            new StringBufferInputStream(s2));
        DataInputStream in4 =
          new DataInputStream(li);
        PrintStream out1 =
          new PrintStream(
            new BufferedOutputStream(
              new FileOutputStream(
                "IODemo.out")));
        while((s = in4.readLine()) != null )
          out1.println(
            "Line " + li.getLineNumber() + s);
        out1.close(); // finalize() not reliable!
      } catch(EOFException e) {
        System.out.println(
          "End of stream encountered");
      }

      // 5. Storing & recovering data
      try {
        DataOutputStream out2 =
          new DataOutputStream(
            new BufferedOutputStream(
              new FileOutputStream("Data.txt")));
        out2.writeBytes(
          "Here's the value of pi: \n");
        out2.writeDouble(3.14159);
        out2.close();
        DataInputStream in5 =
          new DataInputStream(
            new BufferedInputStream(
              new FileInputStream("Data.txt")));
        System.out.println(in5.readLine());
        System.out.println(in5.readDouble());
      } catch(EOFException e) {
        System.out.println(
          "End of stream encountered");
      }

      // 6. Reading/writing random access files
      RandomAccessFile rf =
        new RandomAccessFile("rtest.dat", "rw");
      for(int i = 0; i < 10; i++)
        rf.writeDouble(i*1.414);
      rf.close();

      rf =
        new RandomAccessFile("rtest.dat", "rw");
      rf.seek(5*8);
      rf.writeDouble(47.0001);
      rf.close();

      rf =
        new RandomAccessFile("rtest.dat", "r");
      for(int i = 0; i < 10; i++)
        System.out.println(
          "Value " + i + ": " +
          rf.readDouble());
      rf.close();

      // 7. File input shorthand
      InFile in6 = new InFile(args[0]);
      System.out.println(
        "First line in file: " +
        in6.readLine());
        in6.close();

      // 8. Formatted file output shorthand
      PrintFile out3 = new PrintFile("Data2.txt");
      out3.print("Test of PrintFile");
      out3.close();

      // 9. Data file output shorthand
      OutFile out4 = new OutFile("Data3.txt");
      out4.writeBytes("Test of outDataFile\n\r");
      out4.writeChars("Test of outDataFile\n\r");
      out4.close();

    } catch(FileNotFoundException e) {
      System.out.println(
        "File Not Found:" + args[0]);
    } catch(IOException e) {
      System.out.println("IO Exception");
    }
  }
} ///:~


Input streams

Of course, one common thing you’ll want to do is print formatted output to the console, but that’s already been simplified in the package com.bruceeckel.tools created in Chapter 5.

Parts 1 through 4 demonstrate the creation and use of input streams (although part 4 also shows the simple use of an output stream as a testing tool).

1. Buffered input file

To open a file for input, you use a FileInputStream with a String or a File object as the file name. For speed, you’ll want that file to be buffered so you give the resulting reference to the constructor for a BufferedInputStream. To read input in a formatted fashion, you give that resulting reference to the constructor for a DataInputStream, which is your final object and the interface you read from.

In this example, only the readLine( ) method is used, but of course any of the DataInputStream methods are available. When you reach the end of the file, readLine( ) returns null so that is used to break out of the while loop.

The String s2 is used to accumulate the entire contents of the file (including newlines that must be added since readLine( ) strips them off). s2 is then used in the later portions of this program. Finally, close( ) is called to close the file. Technically, close( ) will be called when finalize( ) is run, and this is supposed to happen (whether or not garbage collection occurs) as the program exits. However, Java 1.0 has a rather important bug, so this doesn’t happen. In Java 1.1 you must explicitly call System.runFinalizersOnExit(true) to guarantee that finalize( ) will be called for every object in the system. The safest approach is to explicitly call close( ) for files.

2. Input from memory

This piece takes the String s2 that now contains the entire contents of the file and uses it to create a StringBufferInputStream. (A String, not a StringBuffer, is required as the constructor argument.) Then read( ) is used to read each character one at a time and send it out to the console. Note that read( ) returns the next byte as an int and thus it must be cast to a char to print properly.

3. Formatted memory input

The interface for StringBufferInputStream is limited, so you usually enhance it by wrapping it inside a DataInputStream. However, if you choose to read the characters out a byte at a time using readByte( ), any value is valid so the return value cannot be used to detect the end of input. Instead, you can use the available( ) method to find out how many more characters are available. Here’s an example that shows how to read a file one byte at a time:

//: c11:TestEOF.java
// Testing for the end of file 
// while reading a byte at a time.
import java.io.*;

public class TestEOF {
  public static void main(String[] args) {
    try {
      DataInputStream in = 
        new DataInputStream(
         new BufferedInputStream(
          new FileInputStream("TestEof.java")));
      while(in.available() != 0)
        System.out.print((char)in.readByte());
    } catch (IOException e) {
      System.err.println("IOException");
    }
  }
} ///:~

Note that available( ) works differently depending on what sort of medium you’re reading from—it’s literally “the number of bytes that can be read without blocking.” With a file this means the whole file, but with a different kind of stream this might not be true, so use it thoughtfully.

You could also detect the end of input in cases like these by catching an exception. However, the use of exceptions for control flow is considered a misuse of that feature.

4. Line numbering and file output

This example shows the use of the LineNumberInputStream to keep track of the input line numbers. Here, you cannot simply gang all the constructors together, since you have to keep a reference to the LineNumberInputStream. (Note that this is not an inheritance situation, so you cannot simply cast in4 to a LineNumberInputStream.) Thus, li holds the reference to the LineNumberInputStream, which is then used to create a DataInputStream for easy reading.

This example also shows how to write formatted data to a file. First, a FileOutputStream is created to connect to the file. For efficiency, this is made a BufferedOutputStream, which is what you’ll virtually always want to do, but you’re forced to do it explicitly. Then for the formatting it’s turned into a PrintStream. The data file created this way is readable as an ordinary text file.

One of the methods that indicates when a DataInputStream is exhausted is readLine( ), which returns null when there are no more strings to read. Each line is printed to the file along with its line number, which is acquired through li.

You’ll see an explicit close( ) for out1, which would make sense if the program were to turn around and read the same file again. However, this program ends without ever looking at the file IODemo.out. As mentioned before, if you don’t call close( ) for all your output files, you might discover that the buffers don’t get flushed so they’re incomplete.

Output streams

The two primary kinds of output streams are separated by the way they write data: one writes it for human consumption, and the other writes it to be re-acquired by a DataInputStream. The RandomAccessFile stands alone, although its data format is compatible with the DataInputStream and DataOutputStream.

5. Storing and recovering data

A PrintStream formats data so it’s readable by a human. To output data so that it can be recovered by another stream, you use a DataOutputStream to write the data and a DataInputStream to recover the data. Of course, these streams could be anything, but here a file is used, buffered for both reading and writing.

Note that the character string is written using writeBytes( ) and not writeChars( ). If you use the latter, you’ll be writing the 16-bit Unicode characters. Since there is no complementary “readChars” method in DataInputStream, you’re stuck pulling these characters off one at a time with readChar( ). So for ASCII, it’s easier to write the characters as bytes followed by a newline; then use readLine( ) to read back the bytes as a regular ASCII line.

The writeDouble( ) stores the double number to the stream and the complementary readDouble( ) recovers it. But for any of the reading methods to work correctly, you must know the exact placement of the data item in the stream, since it would be equally possible to read the stored double as a simple sequence of bytes, or as a char, etc. So you must either have a fixed format for the data in the file or extra information must be stored in the file that you parse to determine where the data is located.

6. Reading and writing random access files

As previously noted, the RandomAccessFile is almost totally isolated from the rest of the IO hierarchy, save for the fact that it implements the DataInput and DataOutput interfaces. So you cannot combine it with any of the aspects of the InputStream and OutputStream subclasses. Even though it might make sense to treat a ByteArrayInputStream as a random access element, you can use RandomAccessFile to only open a file. You must assume a RandomAccessFile is properly buffered since you cannot add that.

The one option you have is in the second constructor argument: you can open a RandomAccessFile to read (“r”) or read and write (“rw”).

Using a RandomAccessFile is like using a combined DataInputStream and DataOutputStream (because it implements the equivalent interfaces). In addition, you can see that seek( ) is used to move about in the file and change one of the values.

Shorthand for file manipulation

Since there are certain canonical forms that you’ll be using regularly with files, you may wonder why you have to do all of that typing—this is one of the drawbacks of the decorator pattern. This portion shows the creation and use of shorthand versions of typical file reading and writing configurations. These shorthands are placed in the package com.bruceeckel.tools that was begun in Chapter 5 (See page 249). To add each class to the library, simply place it in the appropriate directory and add the package statement.

7. File input shorthand

The creation of an object that reads a file from a buffered DataInputStream can be encapsulated into a class called InFile:

//: com:bruceeckel:tools:InFile.java
// Shorthand class for opening an input file.
package com.bruceeckel.tools;
import java.io.*;

public class InFile extends DataInputStream {
  public InFile(String filename)
    throws FileNotFoundException {
    super(
      new BufferedInputStream(
        new FileInputStream(filename)));
  }
  public InFile(File file)
    throws FileNotFoundException {
    this(file.getPath());
  }
} ///:~

Both the String versions of the constructor and the File versions are included, to parallel the creation of a FileInputStream.

Now you can reduce your chances of repetitive stress syndrome while creating files, as seen in the example.

8. Formatted file output shorthand

The same kind of approach can be taken to create a PrintStream that writes to a buffered file. Here’s the extension to com.bruceeckel.tools:

//: com:bruceeckel:tools:PrintFile.java
// Shorthand class for opening an output file
// for human-readable output.
package com.bruceeckel.tools;
import java.io.*;

public class PrintFile extends PrintStream {
  public PrintFile(String filename)
    throws IOException {
    super(
      new BufferedOutputStream(
        new FileOutputStream(filename)));
  }
  public PrintFile(File file)
    throws IOException {
    this(file.getPath());
  }
} ///:~

Note that it is not possible for a constructor to catch an exception that’s thrown by a base-class constructor.

9. Data file output shorthand

Finally, the same kind of shorthand can create a buffered output file for data storage (as opposed to human-readable storage):

//: com:bruceeckel:tools:OutFile.java
// Shorthand class for opening an output file
// for data storage.
package com.bruceeckel.tools;
import java.io.*;

public class OutFile extends DataOutputStream {
  public OutFile(String filename)
    throws IOException {
    super(
      new BufferedOutputStream(
        new FileOutputStream(filename)));
  }
  public OutFile(File file)
    throws IOException {
    this(file.getPath());
  }
} ///:~

It is curious (and unfortunate) that the Java library designers didn’t think to provide these conveniences as part of their standard.

Reading from standard input

Following the approach pioneered in Unix of “standard input,” “standard output,” and “standard error output,” Java has System.in, System.out, and System.err. Throughout this book you’ve seen how to write to standard output using System.out, which is already prewrapped as a PrintStream object. System.err is likewise a PrintStream, but System.in is a raw InputStream, with no wrapping. This means that while you can use System.out and System.err right away, System.in must be wrapped before you can read from it.

Typically, you’ll want to read input a line at a time using readLine( ), so you’ll want to wrap System.in in a DataInputStream. This is the “old” Java 1.0 way to do line input. A bit later in the chapter you’ll see the Java 1.1 solution. Here’s an example that simply echoes each line that you type in:

//: c11:Echo.java
// How to read from standard input.
import java.io.*;

public class Echo {
  public static void main(String[] args) {
    DataInputStream in =
      new DataInputStream(
        new BufferedInputStream(System.in));
    String s;
    try {
      while((s = in.readLine()).length() != 0)
        System.out.println(s);
      // An empty line terminates the program
    } catch(IOException e) {
      e.printStackTrace();
    }
  }
} ///:~

The reason for the try block is that readLine( ) can throw an IOException. Note that System.in should also be buffered, as with most streams.

It’s a bit inconvenient that you’re forced to wrap System.in in a DataInputStream in each program, but perhaps it was designed this way to allow maximum flexibility.

Piped streams

The PipedInputStream and PipedOutputStream have been mentioned only briefly in this chapter. This is not to suggest that they aren’t useful, but their value is not apparent until you begin to understand multithreading, since the piped streams are used to communicate between threads. This is covered along with an example in Chapter 14.

StreamTokenizer

Although StreamTokenizer is not derived from InputStream or OutputStream, it works only with InputStream objects, so it rightfully belongs in the IO portion of the library.

The StreamTokenizer class is used to break any InputStream into a sequence of “tokens,” which are bits of text delimited by whatever you choose. For example, your tokens could be words, and then they would be delimited by white space and punctuation.

Consider a program to count the occurrence of words in a text file:

//: c11:SortedWordCount.java
// Counts words from a file, outputs
// results in sorted form.
import java.io.*;
import java.util.*;

class Counter {
  private int i = 1;
  int read() { return i; }
  void increment() { i++; }
}

public class SortedWordCount {
  private FileReader file;
  private StreamTokenizer st;
  // A TreeMap keeps keys in sorted order:
  private TreeMap counts = new TreeMap();
  SortedWordCount(String filename)
    throws FileNotFoundException {
    try {
      file = new FileReader(filename);
      st = new StreamTokenizer(
        new BufferedReader(file));
      st.ordinaryChar('.');
      st.ordinaryChar('-');
    } catch(FileNotFoundException e) {
      System.out.println(
        "Could not open " + filename);
      throw e;
    }
  }
  void cleanup() {
    try {
      file.close();
    } catch(IOException e) {
      System.out.println(
        "file.close() unsuccessful");
    }
  }
  void countWords() {
    try {
      while(st.nextToken() !=
        StreamTokenizer.TT_EOF) {
        String s;
        switch(st.ttype) {
          case StreamTokenizer.TT_EOL:
            s = new String("EOL");
            break;
          case StreamTokenizer.TT_NUMBER:
            s = Double.toString(st.nval);
            break;
          case StreamTokenizer.TT_WORD:
            s = st.sval; // Already a String
            break;
          default: // single character in ttype
            s = String.valueOf((char)st.ttype);
        }
        if(counts.containsKey(s))
          ((Counter)counts.get(s)).increment();
        else
          counts.put(s, new Counter());
      }
    } catch(IOException e) {
      System.out.println(
        "st.nextToken() unsuccessful");
    }
  }
  Collection values() {
    return counts.values();
  }
  Set keySet() { return counts.keySet(); }
  Counter getCounter(String s) {
    return (Counter)counts.get(s);
  }
  public static void main(String[] args) {
    try {
      SortedWordCount wc =
        new SortedWordCount(args[0]);
      wc.countWords();
      Iterator keys = wc.keySet().iterator();
      while(keys.hasNext()) {
        String key = (String)keys.next();
        System.out.println(key + ": "
                 + wc.getCounter(key).read());
      }
      wc.cleanup();
    } catch(Exception e) {
      e.printStackTrace();
    }
  }
} ///:~

It makes sense to present these in a sorted form, which is easy enough to do with by storing the data in a TreeMap, which automatically organizes its keys in sorted order (see Chapter 9). When you get a set of keys using keySet( ), they will also be in sorted order.

To open the file, a FileInputStream is used, and to turn the file into words a StreamTokenizer is created from the FileInputStream. In StreamTokenizer, there is a default list of separators, and you can add more with a set of methods. Here, ordinaryChar( ) is used to say “This character has no significance that I’m interested in,” so the parser doesn’t include it as part of any of the words that it creates. For example, saying st.ordinaryChar('.') means that periods will not be included as parts of the words that are parsed. You can find more information in the online documentation that comes with Java.

In countWords( ), the tokens are pulled one at a time from the stream, and the ttype information is used to determine what to do with each token, since a token can be an end-of-line, a number, a string, or a single character.

Once a token is found, the TreeMap counts is queried to see if it already contains the token as a key. If it does, the corresponding Counter object is incremented to indicate that another instance of this word has been found. If not, a new Counter is created—since the Counter constructor initializes its value to one, this also acts to count the word.

SortedWordCount is not a type of TreeMap, so it wasn’t inherited. It performs a specific type of functionality, so even though the keys( ) and values( ) methods must be re-exposed, that still doesn’t mean that inheritance should be used since a number of TreeMap methods are inappropriate here. In addition, other methods like getCounter( ), which get the Counter for a particular String, and sortedKeys( ), which produces an Iterator, finish the change in the shape of SortedWordCount’s interface.

In main( ) you can see the use of a SortedWordCount to open and count the words in a file—it just takes two lines of code. Then an Iterator to a sorted list of keys (words) is extracted, and this is used to pull out each key and associated Count. Note that the call to cleanup( ) is necessary to ensure that the file is closed.

StringTokenizer

Although it isn’t part of the IO library, the StringTokenizer has sufficiently similar functionality to StreamTokenizer that it will be described here.

The StringTokenizer returns the tokens within a string one at a time. These tokens are consecutive characters delimited by tabs, spaces, and newlines. Thus, the tokens of the string “Where is my cat?” are “Where”, “is”, “my”, and “cat?” Like the StreamTokenizer, you can tell the StringTokenizer to break up the input in any way that you want, but with StringTokenizer you do this by passing a second argument to the constructor, which is a String of the delimiters you wish to use. In general, if you need more sophistication, use a StreamTokenizer.

You ask a StringTokenizer object for the next token in the string using the nextToken( ) method, which either returns the token or an empty string to indicate that no tokens remain.

As an example, the following program performs a limited analysis of a sentence, looking for key phrase sequences to indicate whether happiness or sadness is implied.

//: c11:AnalyzeSentence.java
// Look for particular sequences
// within sentences.
import java.util.*;

public class AnalyzeSentence {
  public static void main(String[] args) {
    analyze("I am happy about this");
    analyze("I am not happy about this");
    analyze("I am not! I am happy");
    analyze("I am sad about this");
    analyze("I am not sad about this");
    analyze("I am not! I am sad");
    analyze("Are you happy about this?");
    analyze("Are you sad about this?");
    analyze("It's you! I am happy");
    analyze("It's you! I am sad");
  }
  static StringTokenizer st;
  static void analyze(String s) {
    prt("\nnew sentence >> " + s);
    boolean sad = false;
    st = new StringTokenizer(s);
    while (st.hasMoreTokens()) {
      String token = next();
      // Look until you find one of the
      // two starting tokens:
      if(!token.equals("I") &&
         !token.equals("Are"))
        continue; // Top of while loop
      if(token.equals("I")) {
        String tk2 = next();
        if(!tk2.equals("am")) // Must be after I
          break; // Out of while loop
        else {
          String tk3 = next();
          if(tk3.equals("sad")) {
            sad = true;
            break; // Out of while loop
          }
          if (tk3.equals("not")) {
            String tk4 = next();
            if(tk4.equals("sad"))
              break; // Leave sad false
            if(tk4.equals("happy")) {
              sad = true;
              break;
            }
          }
        }
      }
      if(token.equals("Are")) {
        String tk2 = next();
        if(!tk2.equals("you"))
          break; // Must be after Are
        String tk3 = next();
        if(tk3.equals("sad"))
          sad = true;
        break; // Out of while loop
      }
    }
    if(sad) prt("Sad detected");
  }
  static String next() {
    if(st.hasMoreTokens()) {
      String s = st.nextToken();
      prt(s);
      return s;
    } 
    else
      return "";
  }
  static void prt(String s) {
    System.out.println(s);
  }
} ///:~

For each string being analyzed, a while loop is entered and tokens are pulled off the string. Notice the first if statement, which says to continue (go back to the beginning of the loop and start again) if the token is neither an “I” nor an “Are.” This means that it will get tokens until an “I” or an “Are” is found. You might think to use the == instead of the equals( ) method, but that won’t work correctly, since == compares reference values while equals( ) compares contents.

The logic of the rest of the analyze( ) method is that the pattern that’s being searched for is “I am sad,” “I am not happy,” or “Are you sad?” Without the break statement, the code for this would be even messier than it is. You should be aware that a typical parser (this is a primitive example of one) normally has a table of these tokens and a piece of code that moves through the states in the table as new tokens are read.

You should think of the StringTokenizer only as shorthand for a simple and specific kind of StreamTokenizer. However, if you have a String that you want to tokenize and StringTokenizer is too limited, all you have to do is turn it into a stream with StringBufferInputStream and then use that to create a much more powerful StreamTokenizer.

Java 1.1 IO streams

At this point you might be scratching your head, wondering if there is another design for IO streams that could require more typing. Could someone have come up with an odder design?” Prepare yourself: Java 1.1 makes some significant modifications to the IO stream library. When you see the Reader and Writer classes your first thought (like mine) might be that these were meant to replace the InputStream and OutputStream classes. But that’s not the case. Although some aspects of the original streams library are deprecated (if you use them you will receive a warning from the compiler), the old streams have been left in for backward compatibility and:

  1. New classes have been put into the old hierarchy, so it’s obvious that Sun is not abandoning the old streams.
  2. There are times when you’re supposed to use classes in the old hierarchy in combination with classes in the new hierarchy and to accomplish this there are “bridge” classes: InputStreamReader converts an InputStream to a Reader and OutputStreamWriter converts an OutputStream to a Writer.

As a result there are situations in which you have more layers of wrapping with the new IO stream library than with the old. Again, this is a drawback of the decorator pattern—the price you pay for added flexibility.

The most important reason for adding the Reader and Writer hierarchies in Java 1.1 is for internationalization. The old IO stream hierarchy supports only 8-bit byte streams and doesn’t handle the 16-bit Unicode characters well. Since Unicode is used for internationalization (and Java’s native char is 16-bit Unicode), the Reader and Writer hierarchies were added to support Unicode in all IO operations. In addition, the new libraries are designed for faster operations than the old.

As is the practice in this book, I will attempt to provide an overview of the classes but assume that you will use online documentation to determine all the details, such as the exhaustive list of methods.

Sources and sinks of data

Almost all of the Java 1.0 IO stream classes have corresponding Java 1.1 classes to provide native Unicode manipulation. It would be easiest to say “Always use the new classes, never use the old ones,” but things are not that simple. Sometimes you are forced into using the Java 1.0 IO stream classes because of the library design; in particular, the java.util.zip libraries are new additions to the old stream library and they rely on old stream components. So the most sensible approach to take is to try to use the Reader and Writer classes whenever you can, and you’ll discover the situations when you have to drop back into the old libraries because your code won’t compile.

Here is a table that shows the correspondence between the sources and sinks of information (that is, where the data physically comes from or goes to) in the old and new libraries.

Sources & Sinks:
Java 1.0 class

Corresponding Java 1.1 class

InputStream

Reader
converter: InputStreamReader

OutputStream

Writer
converter: OutputStreamWriter

FileInputStream

FileReader

FileOutputStream

FileWriter

StringBufferInputStream

StringReader

(no corresponding class)

StringWriter

ByteArrayInputStream

CharArrayReader

ByteArrayOutputStream

CharArrayWriter

PipedInputStream

PipedReader

PipedOutputStream

PipedWriter

In general, you’ll find that the interfaces in the old library components and the new ones are similar if not identical.

Modifying stream behavior

In Java 1.0, streams were adapted for particular needs using “decorator” subclasses of FilterInputStream and FilterOutputStream. Java 1.1 IO streams continues the use of this idea, but the model of deriving all of the decorators from the same “filter” base class is not followed. This can make it a bit confusing if you’re trying to understand it by looking at the class hierarchy.

In the following table, the correspondence is a rougher approximation than in the previous table. The difference is because of the class organization: while BufferedOutputStream is a subclass of FilterOutputStream, BufferedWriter is not a subclass of FilterWriter (which, even though it is abstract, has no subclasses and so appears to have been put in either as a placeholder or simply so you wouldn’t wonder where it was). However, the interfaces to the classes are quite a close match and it’s apparent that you’re supposed to use the new versions instead of the old whenever possible (that is, except in cases where you’re forced to produce a Stream instead of a Reader or Writer).

Filters:
Java 1.0 class

Corresponding Java 1.1 class

FilterInputStream

FilterReader

FilterOutputStream

FilterWriter (abstract class with no subclasses)

BufferedInputStream

BufferedReader
(also has readLine( ))

BufferedOutputStream

BufferedWriter

DataInputStream

use DataInputStream
(Except when you need to use readLine( ), when you should use a BufferedReader)

PrintStream

PrintWriter

LineNumberInputStream

LineNumberReader

StreamTokenizer

StreamTokenizer
(use constructor that takes a Reader instead)

PushBackInputStream

PushBackReader

There’s one direction that’s quite clear: Whenever you want to use readLine( ), you shouldn’t do it with a DataInputStream any more (this is met with a deprecation message at compile-time), but instead use a BufferedReader. Other than this, DataInputStream is still a “preferred” member of the Java 1.1 IO library.

To make the transition to using a PrintWriter easier, it has constructors that take any OutputStream object. However, PrintWriter has no more support for formatting than PrintStream does; the interfaces are virtually the same.

Unchanged Classes

Apparently, the Java library designers felt that they got some of the classes right the first time so there were no changes to these and you can go on using them as they are:

Java 1.0 classes without corresponding Java 1.1 classes

DataOutputStream

File

RandomAccessFile

SequenceInputStream

The DataOutputStream, in particular, is used without change, so for storing and retrieving data in a transportable format you’re forced to stay in the InputStream and OutputStream hierarchies.

An example

To see the effect of the new classes, let’s look at the appropriate portion of the IOStreamDemo.java example modified to use the Reader and Writer classes:

//: c11:NewIODemo.java
// Java 1.1 IO typical usage.
import java.io.*;

public class NewIODemo {
  public static void main(String[] args) {
    try {
      // 1. Reading input by lines:
      BufferedReader in =
        new BufferedReader(
          new FileReader(args[0]));
      String s, s2 = new String();
      while((s = in.readLine())!= null)
        s2 += s + "\n";
      in.close();

      // 1b. Reading standard input:
      BufferedReader stdin =
        new BufferedReader(
          new InputStreamReader(System.in));      
      System.out.print("Enter a line:");
      System.out.println(stdin.readLine());

      // 2. Input from memory
      StringReader in2 = new StringReader(s2);
      int c;
      while((c = in2.read()) != -1)
        System.out.print((char)c);

      // 3. Formatted memory input
      try {
        DataInputStream in3 =
          new DataInputStream(
            // Oops: must use deprecated class:
            new StringBufferInputStream(s2));
        while(true)
          System.out.print((char)in3.readByte());
      } catch(EOFException e) {
        System.out.println("End of stream");
      }

      // 4. Line numbering & file output
      try {
        LineNumberReader li =
          new LineNumberReader(
            new StringReader(s2));
        BufferedReader in4 =
          new BufferedReader(li);
        PrintWriter out1 =
          new PrintWriter(
            new BufferedWriter(
              new FileWriter("IODemo.out")));
        while((s = in4.readLine()) != null )
          out1.println(
            "Line " + li.getLineNumber() + s);
        out1.close();
      } catch(EOFException e) {
        System.out.println("End of stream");
      }

      // 5. Storing & recovering data
      try {
        DataOutputStream out2 =
          new DataOutputStream(
            new BufferedOutputStream(
              new FileOutputStream("Data.txt")));
        out2.writeDouble(3.14159);
        out2.writeBytes("That was pi");
        out2.close();
        DataInputStream in5 =
          new DataInputStream(
            new BufferedInputStream(
              new FileInputStream("Data.txt")));
        BufferedReader in5br =
          new BufferedReader(
            new InputStreamReader(in5));
        // Must use DataInputStream for data:
        System.out.println(in5.readDouble());
        // Can now use the "proper" readLine():
        System.out.println(in5br.readLine());
      } catch(EOFException e) {
        System.out.println("End of stream");
      }

      // 6. Reading and writing random access
      // files is the same as before.
      // (not repeated here)

    } catch(FileNotFoundException e) {
      System.out.println(
        "File Not Found:" + args[1]);
    } catch(IOException e) {
      System.out.println("IO Exception");
    }
  }
} ///:~

In general, you’ll see that the conversion is fairly straightforward and the code looks quite similar. There are some important differences, though. First of all, since random access files have not changed, section 6 is not repeated.

Section 1 shrinks a bit because if all you’re doing is reading line input you need only to wrap a BufferedReader around a FileReader. Section 1b shows the new way to wrap System.in for reading console input, and this expands because System.in is a DataInputStream and BufferedReader needs a Reader argument, so InputStreamReader is brought in to perform the translation.

In section 2 you can see that if you have a String and want to read from it you just use a StringReader instead of a StringBufferInputStream and the rest of the code is identical.

Section 3 shows a bug in the design of the new IO stream library. If you have a String and you want to read from it, you’re not supposed to use a StringBufferInputStream any more. When you compile code involving a StringBufferInputStream constructor, you get a deprecation message telling you to not use it. Instead, you’re supposed to use a StringReader. However, if you want to do formatted memory input as in section 3, you’re forced to use a DataInputStream—there is no “DataReader” to replace it—and a DataInputStream constructor requires an InputStream argument. So you have no choice but to use the deprecated StringBufferInputStream class. The compiler will give you a deprecation message but there’s nothing you can do about it.[57]

Section 4 is a reasonably straightforward translation from the old streams to the new, with no surprises. In section 5, you’re forced to use all the old streams classes because DataOutputStream and DataInputStream require them and there are no alternatives. However, you don’t get any deprecation messages at compile-time. If a stream is deprecated, typically its constructor produces a deprecation message to prevent you from using the entire class, but in the case of DataInputStream only the readLine( ) method is deprecated since you’re supposed to use a BufferedReader for readLine( ) (but a DataInputStream for all other formatted input).

If you compare section 5 with that section in IOStreamDemo.java, you’ll notice that in this version, the data is written before the text. That’s because a bug was introduced in Java 1.1, which is shown in the following code:

//: c11:IOBug.java
// Java 1.1 (and higher?) IO Bug.
import java.io.*;

public class IOBug {
  public static void main(String[] args) 
  throws Exception {
    DataOutputStream out =
      new DataOutputStream(
        new BufferedOutputStream(
          new FileOutputStream("Data.txt")));
    out.writeDouble(3.14159);
    out.writeBytes("That was the value of pi\n");
    out.writeBytes("This is pi/2:\n");
    out.writeDouble(3.14159/2);
    out.close();

    DataInputStream in =
      new DataInputStream(
        new BufferedInputStream(
          new FileInputStream("Data.txt")));
    BufferedReader inbr =
      new BufferedReader(
        new InputStreamReader(in));
    // The doubles written BEFORE the line of text
    // read back correctly:
    System.out.println(in.readDouble());
    // Read the lines of text:
    System.out.println(inbr.readLine());
    System.out.println(inbr.readLine());
    // Trying to read the doubles after the line
    // produces an end-of-file exception:
    System.out.println(in.readDouble());
  }
} ///:~

It appears that anything you write after a call to writeBytes( ) is not recoverable. This is a rather limiting bug, and we can hope that it will be fixed by the time you read this. You should run the above program to test it; if you don’t get an exception and the values print correctly then you’re out of the woods.

Redirecting standard IO

Java 1.1 has added methods in class System that allow you to redirect the standard input, output, and error IO streams using simple static method calls:

setIn(InputStream)
setOut(PrintStream)
setErr(PrintStream)

Redirecting output is especially useful if you suddenly start creating a large amount of output on your screen and it’s scrolling past faster than you can read it. Redirecting input is valuable for a command-line program in which you want to test a particular user-input sequence repeatedly. Here’s a simple example that shows the use of these methods:

//: c11:Redirecting.java
// Demonstrates the use of redirection 
// for standard IO in Java 1.1
import java.io.*;

class Redirecting {
  public static void main(String[] args) {
    try {
      BufferedInputStream in = 
        new BufferedInputStream(
          new FileInputStream(
            "Redirecting.java"));
      // Produces deprecation message:
      PrintStream out =
        new PrintStream(
          new BufferedOutputStream(
            new FileOutputStream("test.out")));
      System.setIn(in);
      System.setOut(out);
      System.setErr(out);

      BufferedReader br = 
        new BufferedReader(
          new InputStreamReader(System.in));
      String s;
      while((s = br.readLine()) != null)
        System.out.println(s);
      out.close(); // Remember this!
    } catch(IOException e) {
      e.printStackTrace();
    }
  }
} ///:~

This program attaches standard input to a file, and redirects standard output and standard error to another file.

This is another example in which a deprecation message is inevitable. The message you can get when compiling with the -deprecation flag is:

Note: The constructor java.io.PrintStream(java.io.OutputStream)
has been deprecated.

However, both System.setOut( ) and System.setErr( ) require a PrintStream object as an argument, so you are forced to call the PrintStream constructor. You might wonder, if Java 1.1 deprecates the entire PrintStream class by deprecating the constructor, why the library designers, at the same time as they added this deprecation, also add new methods to System that required a PrintStream rather than a PrintWriter, which is the new and preferred replacement. It’s a mystery.

Compression

Java 1.1 has also added some classes to support reading and writing streams in a compressed format. These are wrapped around existing IO classes to provide compression functionality.

One aspect of these Java 1.1 classes stands out: They are not derived from the new Reader and Writer classes, but instead are part of the InputStream and OutputStream hierarchies. So you might be forced to mix the two types of streams. (Remember that you can use InputStreamReader and OutputStreamWriter to provide easy conversion between one type and another.)

Java 1.1 Compression class

Function

CheckedInputStream

GetCheckSum( ) produces checksum for any InputStream (not just decompression)

CheckedOutputStream

GetCheckSum( ) produces checksum for any OutputStream (not just compression)

DeflaterOutputStream

Base class for compression classes

ZipOutputStream

A DeflaterOutputStream that compresses data into the Zip file format

GZIPOutputStream

A DeflaterOutputStream that compresses data into the GZIP file format

InflaterInputStream

Base class for decompression classes

ZipInputStream

A DeflaterInputStream that Decompresses data that has been stored in the Zip file format

GZIPInputStream

A DeflaterInputStream that decompresses data that has been stored in the GZIP file format

Although there are many compression algorithms, Zip and GZIP are possibly the most commonly used. Thus you can easily manipulate your compressed data with the many tools available for reading and writing these formats.

Simple compression with GZIP

The GZIP interface is simple and thus is probably more appropriate when you have a single stream of data that you want to compress (rather than a container of dissimilar pieces of data). Here’s an example that compresses a single file:

//: c11:ZipCompress.java
// Uses Java 1.1 Zip compression to compress
// any number of files whose names are passed
// on the command line.
import java.io.*;
import java.util.*;
import java.util.zip.*;

public class ZipCompress {
  public static void main(String[] args) {
    try {
      FileOutputStream f =
        new FileOutputStream("test.zip");
      CheckedOutputStream csum =
        new CheckedOutputStream(
          f, new Adler32());
      ZipOutputStream out =
        new ZipOutputStream(
          new BufferedOutputStream(csum));
      out.setComment("A test of Java Zipping");
      // Can't read the above comment, though
      for(int i = 0; i < args.length; i++) {
        System.out.println(
          "Writing file " + args[i]);
        BufferedReader in =
          new BufferedReader(
            new FileReader(args[i]));
        out.putNextEntry(new ZipEntry(args[i]));
        int c;
        while((c = in.read()) != -1)
          out.write(c);
        in.close();
      }
      out.close();
      // Checksum valid only after the file
      // has been closed!
      System.out.println("Checksum: " +
        csum.getChecksum().getValue());
      // Now extract the files:
      System.out.println("Reading file");
      FileInputStream fi =
         new FileInputStream("test.zip");
      CheckedInputStream csumi =
        new CheckedInputStream(
          fi, new Adler32());
      ZipInputStream in2 =
        new ZipInputStream(
          new BufferedInputStream(csumi));
      ZipEntry ze;
      System.out.println("Checksum: " +
        csumi.getChecksum().getValue());
      while((ze = in2.getNextEntry()) != null) {
        System.out.println("Reading file " + ze);
        int x;
        while((x = in2.read()) != -1)
          System.out.write(x);
      }
      in2.close();
      // Alternative way to open and read
      // zip files:
      ZipFile zf = new ZipFile("test.zip");
      Enumeration e = zf.entries();
      while(e.hasMoreElements()) {
        ZipEntry ze2 = (ZipEntry)e.nextElement();
        System.out.println("File: " + ze2);
        // ... and extract the data as before
      }
    } catch(Exception e) {
      e.printStackTrace();
    }
  }
} ///:~

The use of the compression classes is straightforward—you simply wrap your output stream in a GZIPOutputStream or ZipOutputStream and your input stream in a GZIPInputStream or ZipInputStream. All else is ordinary IO reading and writing. This is, however, a good example of when you’re forced to mix the old IO streams with the new: in uses the Reader classes, whereas GZIPOutputStream’s constructor can accept only an OutputStream object, not a Writer object.

Multi-file storage with Zip

The Java 1.1 library that supports the Zip format is much more extensive. With it you can easily store multiple files, and there’s even a separate class to make the process of reading a Zip file easy. The library uses the standard Zip format so that it works seamlessly with all the tools currently downloadable on the Internet. The following example has the same form as the previous example, but it handles as many command-line arguments as you want. In addition, it shows the use of the Checksum classes to calculate and verify the checksum for the file. There are two Checksum types: Adler32 (which is faster) and CRC32 (which is slower but slightly more accurate).

//: c11:ZipCompress.java
// Uses Java 1.1 Zip compression to compress
// any number of files whose names are passed
// on the command line.
import java.io.*;
import java.util.*;
import java.util.zip.*;

public class ZipCompress {
  public static void main(String[] args) {
    try {
      FileOutputStream f =
        new FileOutputStream("test.zip");
      CheckedOutputStream csum =
        new CheckedOutputStream(
          f, new Adler32());
      ZipOutputStream out =
        new ZipOutputStream(
          new BufferedOutputStream(csum));
      out.setComment("A test of Java Zipping");
      // Can't read the above comment, though
      for(int i = 0; i < args.length; i++) {
        System.out.println(
          "Writing file " + args[i]);
        BufferedReader in =
          new BufferedReader(
            new FileReader(args[i]));
        out.putNextEntry(new ZipEntry(args[i]));
        int c;
        while((c = in.read()) != -1)
          out.write(c);
        in.close();
      }
      out.close();
      // Checksum valid only after the file
      // has been closed!
      System.out.println("Checksum: " +
        csum.getChecksum().getValue());
      // Now extract the files:
      System.out.println("Reading file");
      FileInputStream fi =
         new FileInputStream("test.zip");
      CheckedInputStream csumi =
        new CheckedInputStream(
          fi, new Adler32());
      ZipInputStream in2 =
        new ZipInputStream(
          new BufferedInputStream(csumi));
      ZipEntry ze;
      System.out.println("Checksum: " +
        csumi.getChecksum().getValue());
      while((ze = in2.getNextEntry()) != null) {
        System.out.println("Reading file " + ze);
        int x;
        while((x = in2.read()) != -1)
          System.out.write(x);
      }
      in2.close();
      // Alternative way to open and read
      // zip files:
      ZipFile zf = new ZipFile("test.zip");
      Enumeration e = zf.entries();
      while(e.hasMoreElements()) {
        ZipEntry ze2 = (ZipEntry)e.nextElement();
        System.out.println("File: " + ze2);
        // ... and extract the data as before
      }
    } catch(Exception e) {
      e.printStackTrace();
    }
  }
} ///:~

For each file to add to the archive, you must call putNextEntry( ) and pass it a ZipEntry object. The ZipEntry object contains an extensive interface that allows you to get and set all the data available on that particular entry in your Zip file: name, compressed and uncompressed sizes, date, CRC checksum, extra field data, comment, compression method, and whether it’s a directory entry. However, even though the Zip format has a way to set a password, this is not supported in Java’s Zip library. And although CheckedInputStream and CheckedOutputStream support both Adler32 and CRC32 checksums, the ZipEntry class supports only an interface for CRC. This is a restriction of the underlying Zip format, but it might limit you from using the faster Adler32.

To extract files, ZipInputStream has a getNextEntry( ) method that returns the next ZipEntry if there is one. As a more succinct alternative, you can read the file using a ZipFile object, which has a method entries( ) to return an Enumeration to the ZipEntries.

In order to read the checksum you must somehow have access to the associated Checksum object. Here, a reference to the CheckedOutputStream and CheckedInputStream objects is retained, but you could also just hold onto a reference to the Checksum object.

A baffling method in Zip streams is setComment( ). As shown above, you can set a comment when you’re writing a file, but there’s no way to recover the comment in the ZipInputStream. Comments appear to be supported fully on an entry-by-entry basis only via ZipEntry.

Of course, you are not limited to files when using the GZIP or Zip libraries—you can compress anything, including data to be sent through a network connection.

The Java archive (jar) utility

The Zip format is also used in the Java 1.1 JAR (Java ARchive) file format, which is a way to collect a group of files into a single compressed file, just like Zip. However, like everything else in Java, JAR files are cross-platform so you don’t need to worry about platform issues. You can also include audio and image files as well as class files.

JAR files are particularly helpful when you deal with the Internet. Before JAR files, your Web browser would have to make repeated requests of a Web server in order to download all of the files that make up an applet. In addition, each of these files was uncompressed. By combining all of the files for a particular applet into a single JAR file, only one server request is necessary and the transfer is faster because of compression. And each entry in a JAR file can be digitally signed for security (refer to the Java documentation for details).

A JAR file consists of a single file containing a collection of zipped files along with a “manifest” that describes them. (You can create your own manifest file; otherwise the jar program will do it for you.) You can find out more about JAR manifests in the online documentation.

The jar utility that comes with Sun’s JDK automatically compresses the files of your choice. You invoke it on the command line:

jar [options] destination [manifest] inputfile(s)
The options are simply a collection of letters (no hyphen or any other indicator is necessary). These are:

c

Creates a new or empty archive.

t

Lists the table of contents.

x

Extracts all files

x file

Extracts the named file

f

Says: “I’m going to give you the name of the file.” If you don’t use this, jar assumes that its input will come from standard input, or, if it is creating a file, its output will go to standard output.

m

Says that the first argument will be the name of the user-created manifest file

v

Generates verbose output describing what jar is doing

0

Only store the files; doesn’t compress the files (use to create a JAR file that you can put in your classpath)

M

Don’t automatically create a manifest file

If a subdirectory is included in the files to be put into the JAR file, that subdirectory is automatically added, including all of its subdirectories, etc. Path information is also preserved.

Here are some typical ways to invoke jar:

jar cf myJarFile.jar *.class

This creates a JAR file called myJarFile.jar that contains all of the class files in the current directory, along with an automatically-generated manifest file.

jar cmf myJarFile.jar myManifestFile.mf *.class

Like the previous example, but adding a user-created manifest file called myManifestFile.mf.

jar tf myJarFile.jar

Produces a table of contents of the files in myJarFile.jar.

jar tvf myJarFile.jar

Adds the “verbose” flag to give more detailed information about the files in myJarFile.jar.

jar cvf myApp.jar audio classes image

Assuming audio, classes, and image are subdirectories, this combines all of the subdirectories into the file myApp.jar. The “verbose” flag is also included to give extra feedback while the jar program is working.

If you create a JAR file using the 0 option, that file can be placed in your CLASSPATH:

CLASSPATH="lib1.jar;lib2.jar;"

Then Java can search lib1.jar and lib2.jar for class files.

The jar tool isn’t as useful as a zip utility. For example, you can’t add or update files to an existing JAR file; you can create JAR files only from scratch. Also, you can’t move files into a JAR file, erasing them as they are moved. However, a JAR file created on one platform will be transparently readable by the jar tool on any other platform (a problem that sometimes plagues zip utilities).

As you will see in Chapter 13, JAR files are also used to package Java Beans.

Object serialization

Java 1.1 has added an interesting feature called object serialization that allows you to take any object that implements the Serializable interface and turn it into a sequence of bytes that can later be restored fully into the original object. This is even true across a network, which means that the serialization mechanism automatically compensates for differences in operating systems. That is, you can create an object on a Windows machine, serialize it, and send it across the network to a Unix machine where it will be correctly reconstructed. You don’t have to worry about the data representations on the different machines, the byte ordering, or any other details.

By itself, object serialization is interesting because it allows you to implement lightweight persistence. Remember that persistence means an object’s lifetime is not determined by whether a program is executing—the object lives in between invocations of the program. By taking a serializable object and writing it to disk, then restoring that object when the program is reinvoked, you’re able to produce the effect of persistence. The reason it’s called “lightweight” is that you can’t simply define an object using some kind of “persistent” keyword and let the system take care of the details (although this might happen in the future). Instead, you must explicitly serialize and de-serialize the objects in your program.

Object serialization was added to the language to support two major features. Java 1.1’s remote method invocation (RMI) allows objects that live on other machines to behave as if they live on your machine. When sending messages to remote objects, object serialization is necessary to transport the arguments and return values. RMI is discussed in Chapter 15.

Object serialization is also necessary for Java Beans, introduced in Java 1.1. When a Bean is used, its state information is generally configured at design time. This state information must be stored and later recovered when the program is started; object serialization performs this task.

Serializing an object is quite simple, as long as the object implements the Serializable interface (this interface is just a flag and has no methods). In Java 1.1, many standard library classes have been changed so they’re serializable, including all of the wrappers for the primitive types, all of the container classes, and many others. Even Class objects can be serialized. (See Chapter 12 for the implications of this.)

To serialize an object, you create some sort of OutputStream object and then wrap it inside an ObjectOutputStream object. At this point you need only call writeObject( ) and your object is serialized and sent to the OutputStream. To reverse the process, you wrap an InputStream inside an ObjectInputStream and call readObject( ). What comes back is, as usual, a reference to an upcast Object, so you must downcast to set things straight.

A particularly clever aspect of object serialization is that it not only saves an image of your object but it also follows all the references contained in your object and saves those objects, and follows all the references in each of those objects, etc. This is sometimes referred to as the “web of objects” that a single object can be connected to, and it includes arrays of references to objects as well as member objects. If you had to maintain your own object serialization scheme, maintaining the code to follow all these links would be a bit mind–boggling. However, Java object serialization seems to pull it off flawlessly, no doubt using an optimized algorithm that traverses the web of objects. The following example tests the serialization mechanism by making a “worm” of linked objects, each of which has a link to the next segment in the worm as well as an array of references to objects of a different class, Data:

//: c11:Worm.java
// Demonstrates object serialization.
import java.io.*;

class Data implements Serializable {
  private int i;
  Data(int x) { i = x; }
  public String toString() {
    return Integer.toString(i);
  }
}

public class Worm implements Serializable {
  // Generate a random int value:
  private static int r() {
    return (int)(Math.random() * 10);
  }
  private Data[] d = {
    new Data(r()), new Data(r()), new Data(r())
  };
  private Worm next;
  private char c;
  // Value of i == number of segments
  Worm(int i, char x) {
    System.out.println(" Worm constructor: " + i);
    c = x;
    if(--i > 0)
      next = new Worm(i, (char)(x + 1));
  }
  Worm() {
    System.out.println("Default constructor");
  }
  public String toString() {
    String s = ":" + c + "(";
    for(int i = 0; i < d.length; i++)
      s += d[i].toString();
    s += ")";
    if(next != null)
      s += next.toString();
    return s;
  }
  public static void main(String[] args) {
    Worm w = new Worm(6, 'a');
    System.out.println("w = " + w);
    try {
      ObjectOutputStream out =
        new ObjectOutputStream(
          new FileOutputStream("worm.out"));
      out.writeObject("Worm storage");
      out.writeObject(w);
      out.close(); // Also flushes output
      ObjectInputStream in =
        new ObjectInputStream(
          new FileInputStream("worm.out"));
      String s = (String)in.readObject();
      Worm w2 = (Worm)in.readObject();
      System.out.println(s + ", w2 = " + w2);
    } catch(Exception e) {
      e.printStackTrace();
    }
    try {
      ByteArrayOutputStream bout =
        new ByteArrayOutputStream();
      ObjectOutputStream out =
        new ObjectOutputStream(bout);
      out.writeObject("Worm storage");
      out.writeObject(w);
      out.flush();
      ObjectInputStream in =
        new ObjectInputStream(
          new ByteArrayInputStream(
            bout.toByteArray()));
      String s = (String)in.readObject();
      Worm w3 = (Worm)in.readObject();
      System.out.println(s + ", w3 = " + w3);
    } catch(Exception e) {
      e.printStackTrace();
    }
  }
} ///:~

To make things interesting, the array of Data objects inside Worm are initialized with random numbers. (This way you don’t suspect the compiler of keeping some kind of meta-information.) Each Worm segment is labeled with a char that’s automatically generated in the process of recursively generating the linked list of Worms. When you create a Worm, you tell the constructor how long you want it to be. To make the next reference it calls the Worm constructor with a length of one less, etc. The final next reference is left as null, indicating the end of the Worm.

The point of all this was to make something reasonably complex that couldn’t easily be serialized. The act of serializing, however, is quite simple. Once the ObjectOutputStream is created from some other stream, writeObject( ) serializes the object. Notice the call to writeObject( ) for a String, as well. You can also write all the primitive data types using the same methods as DataOutputStream (they share the same interface).

There are two separate try blocks that look similar. The first writes and reads a file and the second, for variety, writes and reads a ByteArray. You can read and write an object using serialization to any DataInputStream or DataOutputStream including, as you will see in the networking chapter, a network. The output from one run was:

Worm constructor: 6
Worm constructor: 5
Worm constructor: 4
Worm constructor: 3
Worm constructor: 2
Worm constructor: 1
w = :a(262):b(100):c(396):d(480):e(316):f(398)
Worm storage, w2 = :a(262):b(100):c(396):d(480):e(316):f(398)
Worm storage, w3 = :a(262):b(100):c(396):d(480):e(316):f(398)

You can see that the deserialized object really does contain all of the links that were in the original object.

Note that no constructor, not even the default constructor, is called in the process of deserializing a Serializable object. The entire object is restored by recovering data from the InputStream.

Object serialization is another Java 1.1 feature that is not part of the new Reader and Writer hierarchies, but instead uses the old InputStream and OutputStream hierarchies. Thus you might encounter situations in which you’re forced to mix the two hierarchies.

Finding the class

You might wonder what’s necessary for an object to be recovered from its serialized state. For example, suppose you serialize an object and send it as a file or through a network to another machine. Could a program on the other machine reconstruct the object using only the contents of the file?

The best way to answer this question is (as usual) by performing an experiment. The following file goes in the subdirectory for this chapter:

//: c11:Alien.java
// A serializable class.
import java.io.*;

public class Alien implements Serializable {
} ///:~

The file that creates and serializes an Alien object goes in the same directory:

//: c11:FreezeAlien.java
// Create a serialized output file.
import java.io.*;

public class FreezeAlien {
  public static void main(String[] args) 
      throws Exception {
    ObjectOutput out = 
      new ObjectOutputStream(
        new FileOutputStream("file.x"));
    Alien zorcon = new Alien();
    out.writeObject(zorcon); 
  }
} ///:~

Rather than catching and handling exceptions, this program takes the quick and dirty approach of passing the exceptions out of main( ), so they’ll be reported on the command line.

Once the program is compiled and run, copy the resulting file.x to a subdirectory called xfiles, where the following code goes:

//: c11:xfiles:ThawAlien.java
// Try to recover a serialized file without the 
// class of object that's stored in that file.
package c11.xfiles;
import java.io.*;

public class ThawAlien {
  public static void main(String[] args) 
      throws Exception {
    ObjectInputStream in =
      new ObjectInputStream(
        new FileInputStream("file.x"));
    Object mystery = in.readObject();
    System.out.println(mystery.getClass());
  }
} ///:~

This program opens the file and reads in the object mystery successfully. However, as soon as you try to find out anything about the object—which requires the Class object for Alien—the Java Virtual Machine (JVM) cannot find Alien.class (unless it happens to be in the Classpath, which it shouldn’t be in this example). You’ll get a ClassNotFoundException. (Once again, all evidence of alien life vanishes before proof of its existence can be verified!)

If you expect to do much after you’ve recovered an object that has been serialized, you must make sure that the JVM can find the associated .class file either in the local class path or somewhere on the Internet.

Controlling serialization

As you can see, the default serialization mechanism is trivial to use. But what if you have special needs? Perhaps you have special security issues and you don’t want to serialize portions of your object, or perhaps it just doesn’t make sense for one sub-object to be serialized if that part needs to be created anew when the object is recovered.

You can control the process of serialization by implementing the Externalizable interface instead of the Serializable interface. The Externalizable interface extends the Serializable interface and adds two methods, writeExternal( ) and readExternal( ), that are automatically called for your object during serialization and deserialization so that you can perform your special operations.

The following example shows simple implementations of the Externalizable interface methods. Note that Blip1 and Blip2 are nearly identical except for a subtle difference (see if you can discover it by looking at the code):

//: c11:Blips.java
// Simple use of Externalizable & a pitfall.
import java.io.*;
import java.util.*;

class Blip1 implements Externalizable {
  public Blip1() {
    System.out.println("Blip1 Constructor");
  }
  public void writeExternal(ObjectOutput out)
      throws IOException {
    System.out.println("Blip1.writeExternal");
  }
  public void readExternal(ObjectInput in)
     throws IOException, ClassNotFoundException {
    System.out.println("Blip1.readExternal");
  }
}

class Blip2 implements Externalizable {
  Blip2() {
    System.out.println("Blip2 Constructor");
  }
  public void writeExternal(ObjectOutput out)
      throws IOException {
    System.out.println("Blip2.writeExternal");
  }
  public void readExternal(ObjectInput in)
     throws IOException, ClassNotFoundException {
    System.out.println("Blip2.readExternal");
  }
}

public class Blips {
  public static void main(String[] args) {
    System.out.println("Constructing objects:");
    Blip1 b1 = new Blip1();
    Blip2 b2 = new Blip2();
    try {
      ObjectOutputStream o =
        new ObjectOutputStream(
          new FileOutputStream("Blips.out"));
      System.out.println("Saving objects:");
      o.writeObject(b1);
      o.writeObject(b2);
      o.close();
      // Now get them back:
      ObjectInputStream in =
        new ObjectInputStream(
          new FileInputStream("Blips.out"));
      System.out.println("Recovering b1:");
      b1 = (Blip1)in.readObject();
      // OOPS! Throws an exception:
//!   System.out.println("Recovering b2:");
//!   b2 = (Blip2)in.readObject();
    } catch(Exception e) {
      e.printStackTrace();
    }
  }
} ///:~

The output for this program is:

Constructing objects:
Blip1 Constructor
Blip2 Constructor
Saving objects:
Blip1.writeExternal
Blip2.writeExternal
Recovering b1:
Blip1 Constructor
Blip1.readExternal

The reason that the Blip2 object is not recovered is that trying to do so causes an exception. Can you see the difference between Blip1 and Blip2? The constructor for Blip1 is public, while the constructor for Blip2 is not, and that causes the exception upon recovery. Try making Blip2’s constructor public and removing the //! comments to see the correct results.

When b1 is recovered, the Blip1 default constructor is called. This is different from recovering a Serializable object, in which the object is constructed entirely from its stored bits, with no constructor calls. With an Externalizable object, all the normal default construction behavior occurs (including the initializations at the point of field definition), and then readExternal( ) is called. You need to be aware of this—in particular the fact that all the default construction always takes place—to produce the correct behavior in your Externalizable objects.

Here’s an example that shows what you must do to fully store and retrieve an Externalizable object:

//: c11:Blip3.java
// Reconstructing an externalizable object.
import java.io.*;
import java.util.*;

class Blip3 implements Externalizable {
  int i;
  String s; // No initialization
  public Blip3() {
    System.out.println("Blip3 Constructor");
    // s, i not initialized
  }
  public Blip3(String x, int a) {
    System.out.println("Blip3(String x, int a)");
    s = x;
    i = a;
    // s & i initialized only in non-default
    // constructor.
  }
  public String toString() { return s + i; }
  public void writeExternal(ObjectOutput out)
      throws IOException {
    System.out.println("Blip3.writeExternal");
    // You must do this:
    out.writeObject(s); out.writeInt(i);
  }
  public void readExternal(ObjectInput in)
     throws IOException, ClassNotFoundException {
    System.out.println("Blip3.readExternal");
    // You must do this:
    s = (String)in.readObject(); 
    i =in.readInt();
  }
  public static void main(String[] args) {
    System.out.println("Constructing objects:");
    Blip3 b3 = new Blip3("A String ", 47);
    System.out.println(b3);
    try {
      ObjectOutputStream o =
        new ObjectOutputStream(
          new FileOutputStream("Blip3.out"));
      System.out.println("Saving object:");
      o.writeObject(b3);
      o.close();
      // Now get it back:
      ObjectInputStream in =
        new ObjectInputStream(
          new FileInputStream("Blip3.out"));
      System.out.println("Recovering b3:");
      b3 = (Blip3)in.readObject();
      System.out.println(b3);
    } catch(Exception e) {
      e.printStackTrace();
    }
  }
} ///:~

The fields s and i are initialized only in the second constructor, but not in the default constructor. This means that if you don’t initialize s and i in readExternal, it will be null (since the storage for the object gets wiped to zero in the first step of object creation). If you comment out the two lines of code following the phrases “You must do this” and run the program, you’ll see that when the object is recovered, s is null and i is zero.

If you are inheriting from an Externalizable object, you’ll typically call the base-class versions of writeExternal( ) and readExternal( ) to provide proper storage and retrieval of the base-class components.

So to make things work correctly you must not only write the important data from the object during the writeExternal( ) method (there is no default behavior that writes any of the member objects for an Externalizable object), but you must also recover that data in the readExternal( ) method. This can be a bit confusing at first because the default construction behavior for an Externalizable object can make it seem like some kind of storage and retrieval takes place automatically. It does not.

The transient keyword

When you’re controlling serialization, there might be a particular subobject that you don’t want Java’s serialization mechanism to automatically save and restore. This is commonly the case if that subobject represents sensitive information that you don’t want to serialize, such as a password. Even if that information is private in the object, once it’s serialized it’s possible for someone to access it by reading a file or intercepting a network transmission.

One way to prevent sensitive parts of your object from being serialized is to implement your class as Externalizable, as shown previously. Then nothing is automatically serialized and you can explicitly serialize only the necessary parts inside writeExternal( ).

If you’re working with a Serializable object, however, all serialization happens automatically. To control this, you can turn off serialization on a field-by-field basis using the transient keyword, which says “Don’t bother saving or restoring this—I’ll take care of it.”

For example, consider a Login object that keeps information about a particular login session. Suppose that, once you verify the login, you want to store the data, but without the password. The easiest way to do this is by implementing Serializable and marking the password field as transient. Here’s what it looks like:

//: c11:Logon.java
// Demonstrates the "transient" keyword.
import java.io.*;
import java.util.*;

class Logon implements Serializable {
  private Date date = new Date();
  private String username;
  private transient String password;
  Logon(String name, String pwd) {
    username = name;
    password = pwd;
  }
  public String toString() {
    String pwd =
      (password == null) ? "(n/a)" : password;
    return "logon info: \n   " +
      "username: " + username +
      "\n   date: " + date +
      "\n   password: " + pwd;
  }
  public static void main(String[] args) {
    Logon a = new Logon("Hulk", "myLittlePony");
    System.out.println( "logon a = " + a);
    try {
      ObjectOutputStream o =
        new ObjectOutputStream(
          new FileOutputStream("Logon.out"));
      o.writeObject(a);
      o.close();
      // Delay:
      int seconds = 5;
      long t = System.currentTimeMillis()
             + seconds * 1000;
      while(System.currentTimeMillis() < t)
        ;
      // Now get them back:
      ObjectInputStream in =
        new ObjectInputStream(
          new FileInputStream("Logon.out"));
      System.out.println(
        "Recovering object at " + new Date());
      a = (Logon)in.readObject();
      System.out.println( "logon a = " + a);
    } catch(Exception e) {
      e.printStackTrace();
    }
  }
} ///:~

You can see that the date and username fields are ordinary (not transient), and thus are automatically serialized. However, the password is transient, and so is not stored to disk; also the serialization mechanism makes no attempt to recover it. The output is:

logon a = logon info:
   username: Hulk
   date: Sun Mar 23 18:25:53 PST 1997
   password: myLittlePony
Recovering object at Sun Mar 23 18:25:59 PST 1997
logon a = logon info:
   username: Hulk
   date: Sun Mar 23 18:25:53 PST 1997
   password: (n/a)

When the object is recovered, the password field is null. Note that toString( ) must check for a null value of password because if you try to assemble a String object using the overloaded ‘+’ operator, and that operator encounters a null reference, you’ll get a NullPointerException. (Newer versions of Java might contain code to avoid this problem.)

You can also see that the date field is stored to and recovered from disk and not generated anew.

Since Externalizable objects do not store any of their fields by default, the transient keyword is for use with Serializable objects only.

An alternative to Externalizable

If you’re not keen on implementing the Externalizable interface, there’s another approach. You can implement the Serializable interface and add (notice I say “add” and not “override” or “implement”) methods called writeObject( ) and readObject( ) that will automatically be called when the object is serialized and deserialized, respectively. That is, if you provide these two methods they will be used instead of the default serialization.

The methods must have these exact signatures:

private void 
  writeObject(ObjectOutputStream stream)
    throws IOException;

private void 
  readObject(ObjectInputStream stream)
    throws IOException, ClassNotFoundException

From a design standpoint, things get really weird here. First of all, you might think that because these methods are not part of a base class or the Serializable interface, they ought to be defined in their own interface(s). But notice that they are defined as private, which means they are to be called only by other members of this class. However, you don’t actually call them from other members of this class, but instead the writeObject( ) and readObject( ) methods of the ObjectOutputStream and ObjectInputStream objects call your object’s writeObject( ) and readObject( ) methods. (Notice my tremendous restraint in not launching into a long diatribe about using the same method names here. In a word: confusing.) You might wonder how the ObjectOutputStream and ObjectInputStream objects have access to private methods of your class. We can only assume that this is part of the serialization magic.

In any event, anything defined in an interface is automatically public so if writeObject( ) and readObject( ) must be private, then they can’t be part of an interface. Since you must follow the signatures exactly, the effect is the same as if you’re implementing an interface.

It would appear that when you call ObjectOutputStream.writeObject( ), the Serializable object that you pass it to is interrogated (using reflection, no doubt) to see if it implements its own writeObject( ). If so, the normal serialization process is skipped and the writeObject( ) is called. The same sort of situation exists for readObject( ).

There’s one other twist. Inside your writeObject( ), you can choose to perform the default writeObject( ) action by calling defaultWriteObject( ). Likewise, inside readObject( ) you can call defaultReadObject( ). Here is a simple example that demonstrates how you can control the storage and retrieval of a Serializable object:

//: c11:SerialCtl.java
// Controlling serialization by adding your own
// writeObject() and readObject() methods.
import java.io.*;

public class SerialCtl implements Serializable {
  String a;
  transient String b;
  public SerialCtl(String aa, String bb) {
    a = "Not Transient: " + aa;
    b = "Transient: " + bb;
  }
  public String toString() {
    return a + "\n" + b;
  }
  private void 
    writeObject(ObjectOutputStream stream)
      throws IOException {
    stream.defaultWriteObject();
    stream.writeObject(b);
  }
  private void 
    readObject(ObjectInputStream stream)
      throws IOException, ClassNotFoundException {
    stream.defaultReadObject();
    b = (String)stream.readObject();
  }
  public static void main(String[] args) {
    SerialCtl sc = 
      new SerialCtl("Test1", "Test2");
    System.out.println("Before:\n" + sc);
    ByteArrayOutputStream buf = 
      new ByteArrayOutputStream();
    try {
      ObjectOutputStream o =
        new ObjectOutputStream(buf);
      o.writeObject(sc);
      // Now get it back:
      ObjectInputStream in =
        new ObjectInputStream(
          new ByteArrayInputStream(
            buf.toByteArray()));
      SerialCtl sc2 = (SerialCtl)in.readObject();
      System.out.println("After:\n" + sc2);
    } catch(Exception e) {
      e.printStackTrace();
    }
  }
} ///:~

In this example, one String field is ordinary and the other is transient, to prove that the non-transient field is saved by the defaultWriteObject( ) method and the transient field is saved and restored explicitly. The fields are initialized inside the constructor rather than at the point of definition to prove that they are not being initialized by some automatic mechanism during deserialization.

If you are going to use the default mechanism to write the non-transient parts of your object, you must call defaultWriteObject( ) as the first operation in writeObject( ) and defaultReadObject( ) as the first operation in readObject( ). These are strange method calls. It would appear, for example, that you are calling defaultWriteObject( ) for an ObjectOutputStream and passing it no arguments, and yet it somehow turns around and knows the reference to your object and how to write all the non-transient parts. Spooky.

The storage and retrieval of the transient objects uses more familiar code. And yet, think about what happens here. In main( ), a SerialCtl object is created, and then it’s serialized to an ObjectOutputStream. (Notice in this case that a buffer is used instead of a file—it’s all the same to the ObjectOutputStream.) The serialization occurs in the line:

o.writeObject(sc);

The writeObject( ) method must be examining sc to see if it has its own writeObject( ) method. (Not by checking the interface—there isn’t one—or the class type, but by actually hunting for the method using reflection.) If it does, it uses that. A similar approach holds true for readObject( ). Perhaps this was the only practical way that they could solve the problem, but it’s certainly strange.

Versioning

It’s possible that you might want to change the version of a serializable class (objects of the original class might be stored in a database, for example). This is supported but you’ll probably do it only in special cases, and it requires an extra depth of understanding that we will not attempt to achieve here. The JDK1.1 HTML documents downloadable from Sun (which might be part of your Java package’s online documents) cover this topic quite thoroughly.

Using persistence

It’s quite appealing to use serialization technology to store some of the state of your program so that you can easily restore the program to the current state later. But before you can do this, some questions must be answered. What happens if you serialize two objects that both have a reference to a third object? When you restore those two objects from their serialized state, do you get only one occurrence of the third object? What if you serialize your two objects to separate files and deserialize them in different parts of your code?

Here’s an example that shows the problem:

//: c11:MyWorld.java
import java.io.*;
import java.util.*;

class House implements Serializable {}

class Animal implements Serializable {
  String name;
  House preferredHouse;
  Animal(String nm, House h) { 
    name = nm; 
    preferredHouse = h;
  }
  public String toString() {
    return name + "[" + super.toString() + 
      "], " + preferredHouse + "\n";
  }
}

public class MyWorld {
  public static void main(String[] args) {
    House house = new House();
    ArrayList  animals = new ArrayList();
    animals.add(
      new Animal("Bosco the dog", house));
    animals.add(
      new Animal("Ralph the hamster", house));
    animals.add(
      new Animal("Fronk the cat", house));
    System.out.println("animals: " + animals);

    try {
      ByteArrayOutputStream buf1 = 
        new ByteArrayOutputStream();
      ObjectOutputStream o1 =
        new ObjectOutputStream(buf1);
      o1.writeObject(animals);
      o1.writeObject(animals); // Write a 2nd set
      // Write to a different stream:
      ByteArrayOutputStream buf2 = 
        new ByteArrayOutputStream();
      ObjectOutputStream o2 =
        new ObjectOutputStream(buf2);
      o2.writeObject(animals);
      // Now get them back:
      ObjectInputStream in1 =
        new ObjectInputStream(
          new ByteArrayInputStream(
            buf1.toByteArray()));
      ObjectInputStream in2 =
        new ObjectInputStream(
          new ByteArrayInputStream(
            buf2.toByteArray()));
      ArrayList animals1 = (ArrayList)in1.readObject();
      ArrayList animals2 = (ArrayList)in1.readObject();
      ArrayList animals3 = (ArrayList)in2.readObject();
      System.out.println("animals1: " + animals1);
      System.out.println("animals2: " + animals2);
      System.out.println("animals3: " + animals3);
    } catch(Exception e) {
      e.printStackTrace();
    }
  }
} ///:~

One thing that’s interesting here is that it’s possible to use object serialization to and from a byte array as a way of doing a “deep copy” of any object that’s Serializable. (A deep copy means that you’re duplicating the entire web of objects, rather than just the basic object and its references.) Copying is covered in depth in Appendix A.

Animal objects contain fields of type House. In main( ), an ArrayList of these Animals is created and it is serialized twice to one stream and then again to a separate stream. When these are deserialized and printed, you see the following results for one run (the objects will be in different memory locations each run):

animals: [Bosco the dog[Animal@1cc76c], House@1cc769
, Ralph the hamster[Animal@1cc76d], House@1cc769
, Fronk the cat[Animal@1cc76e], House@1cc769
]
animals1: [Bosco the dog[Animal@1cca0c], House@1cca16
, Ralph the hamster[Animal@1cca17], House@1cca16
, Fronk the cat[Animal@1cca1b], House@1cca16
]
animals2: [Bosco the dog[Animal@1cca0c], House@1cca16
, Ralph the hamster[Animal@1cca17], House@1cca16
, Fronk the cat[Animal@1cca1b], House@1cca16
]
animals3: [Bosco the dog[Animal@1cca52], House@1cca5c
, Ralph the hamster[Animal@1cca5d], House@1cca5c
, Fronk the cat[Animal@1cca61], House@1cca5c
]

Of course you expect that the deserialized objects have different addresses from their originals. But notice that in animals1 and animals2 the same addresses appear, including the references to the House object that both share. On the other hand, when animals3 is recovered the system has no way of knowing that the objects in this other stream are aliases of the objects in the first stream, so it makes a completely different web of objects.

As long as you’re serializing everything to a single stream, you’ll be able to recover the same web of objects that you wrote, with no accidental duplication of objects. Of course, you can change the state of your objects in between the time you write the first and the last, but that’s your responsibility—the objects will be written in whatever state they are in (and with whatever connections they have to other objects) at the time you serialize them.

The safest thing to do if you want to save the state of a system is to serialize as an “atomic” operation. If you serialize some things, do some other work, and serialize some more, etc., then you will not be storing the system safely. Instead, put all the objects that comprise the state of your system in a single container and simply write that container out in one operation. Then you can restore it with a single method call as well.

The following example is an imaginary computer-aided design (CAD) system that demonstrates the approach. In addition, it throws in the issue of static fields—if you look at the documentation you’ll see that Class is Serializable, so it should be easy to store the static fields by simply serializing the Class object. That seems like a sensible approach, anyway.

//: c11:CADState.java
// Saving and restoring the state of a 
// pretend CAD system.
import java.io.*;
import java.util.*;

abstract class Shape implements Serializable {
  public static final int 
    RED = 1, BLUE = 2, GREEN = 3;
  private int xPos, yPos, dimension;
  private static Random r = new Random();
  private static int counter = 0;
  abstract public void setColor(int newColor);
  abstract public int getColor();
  public Shape(int xVal, int yVal, int dim) {
    xPos = xVal;
    yPos = yVal;
    dimension = dim;
  }
  public String toString() {
    return getClass() + 
      " color[" + getColor() +
      "] xPos[" + xPos +
      "] yPos[" + yPos +
      "] dim[" + dimension + "]\n";
  }
  public static Shape randomFactory() {
    int xVal = r.nextInt() % 100;
    int yVal = r.nextInt() % 100;
    int dim = r.nextInt() % 100;
    switch(counter++ % 3) {
      default: 
      case 0: return new Circle(xVal, yVal, dim);
      case 1: return new Square(xVal, yVal, dim);
      case 2: return new Line(xVal, yVal, dim);
    }
  }
}

class Circle extends Shape {
  private static int color = RED;
  public Circle(int xVal, int yVal, int dim) {
    super(xVal, yVal, dim);
  }
  public void setColor(int newColor) { 
    color = newColor;
  }
  public int getColor() { 
    return color;
  }
}

class Square extends Shape {
  private static int color;
  public Square(int xVal, int yVal, int dim) {
    super(xVal, yVal, dim);
    color = RED;
  }
  public void setColor(int newColor) { 
    color = newColor;
  }
  public int getColor() { 
    return color;
  }
}

class Line extends Shape {
  private static int color = RED;
  public static void 
  serializeStaticState(ObjectOutputStream os)
      throws IOException {
    os.writeInt(color);
  }
  public static void 
  deserializeStaticState(ObjectInputStream os)
      throws IOException {
    color = os.readInt();
  }
  public Line(int xVal, int yVal, int dim) {
    super(xVal, yVal, dim);
  }
  public void setColor(int newColor) { 
    color = newColor;
  }
  public int getColor() { 
    return color;
  }
}

public class CADState {
  public static void main(String[] args) 
      throws Exception {
    ArrayList shapeTypes, shapes;
    if(args.length == 0) {
      shapeTypes = new ArrayList();
      shapes = new ArrayList();
      // Add references to the class objects:
      shapeTypes.add(Circle.class);
      shapeTypes.add(Square.class);
      shapeTypes.add(Line.class);
      // Make some shapes:
      for(int i = 0; i < 10; i++)
        shapes.add(Shape.randomFactory());
      // Set all the static colors to GREEN:
      for(int i = 0; i < 10; i++)
        ((Shape)shapes.get(i))
          .setColor(Shape.GREEN);
      // Save the state vector:
      ObjectOutputStream out =
        new ObjectOutputStream(
          new FileOutputStream("CADState.out"));
      out.writeObject(shapeTypes);
      Line.serializeStaticState(out);
      out.writeObject(shapes);
    } else { // There's a command-line argument
      ObjectInputStream in =
        new ObjectInputStream(
          new FileInputStream(args[0]));
      // Read in the same order they were written:
      shapeTypes = (ArrayList)in.readObject();
      Line.deserializeStaticState(in);
      shapes = (ArrayList)in.readObject();
    }
    // Display the shapes:
    System.out.println(shapes);
  }
} ///:~

The Shape class implements Serializable, so anything that is inherited from Shape is automatically Serializable as well. Each Shape contains data, and each derived Shape class contains a static field that determines the color of all of those types of Shapes. (Placing a static field in the base class would result in only one field, since static fields are not duplicated in derived classes.) Methods in the base class can be overridden to set the color for the various types (static methods are not dynamically bound, so these are normal methods). The randomFactory( ) method creates a different Shape each time you call it, using random values for the Shape data.

Circle and Square are straightforward extensions of Shape; the only difference is that Circle initializes color at the point of definition and Square initializes it in the constructor. We’ll leave the discussion of Line for later.

In main( ), one ArrayList is used to hold the Class objects and the other to hold the shapes. If you don’t provide a command line argument the shapeTypes ArrayList is created and the Class objects are added, and then the shapes ArrayList is created and Shape objects are added. Next, all the static color values are set to GREEN, and everything is serialized to the file CADState.out.

If you provide a command line argument (presumably CADState.out), that file is opened and used to restore the state of the program. In both situations, the resulting ArrayList of Shapes is printed. The results from one run are:

>java CADState
[class Circle color[3] xPos[-51] yPos[-99] dim[38]
, class Square color[3] xPos[2] yPos[61] dim[-46]
, class Line color[3] xPos[51] yPos[73] dim[64]
, class Circle color[3] xPos[-70] yPos[1] dim[16]
, class Square color[3] xPos[3] yPos[94] dim[-36]
, class Line color[3] xPos[-84] yPos[-21] dim[-35]
, class Circle color[3] xPos[-75] yPos[-43] dim[22]
, class Square color[3] xPos[81] yPos[30] dim[-45]
, class Line color[3] xPos[-29] yPos[92] dim[17]
, class Circle color[3] xPos[17] yPos[90] dim[-76]
]

>java CADState CADState.out
[class Circle color[1] xPos[-51] yPos[-99] dim[38]
, class Square color[0] xPos[2] yPos[61] dim[-46]
, class Line color[3] xPos[51] yPos[73] dim[64]
, class Circle color[1] xPos[-70] yPos[1] dim[16]
, class Square color[0] xPos[3] yPos[94] dim[-36]
, class Line color[3] xPos[-84] yPos[-21] dim[-35]
, class Circle color[1] xPos[-75] yPos[-43] dim[22]
, class Square color[0] xPos[81] yPos[30] dim[-45]
, class Line color[3] xPos[-29] yPos[92] dim[17]
, class Circle color[1] xPos[17] yPos[90] dim[-76]
]

You can see that the values of xPos, yPos, and dim were all stored and recovered successfully, but there’s something wrong with the retrieval of the static information. It’s all ‘3’ going in, but it doesn’t come out that way. Circles have a value of 1 (RED, which is the definition), and Squares have a value of 0 (remember, they are initialized in the constructor). It’s as if the statics didn’t get serialized at all! That’s right—even though class Class is Serializable, it doesn’t do what you expect. So if you want to serialize statics, you must do it yourself.

This is what the serializeStaticState( ) and deserializeStaticState( ) static methods in Line are for. You can see that they are explicitly called as part of the storage and retrieval process. (Note that the order of writing to the serialize file and reading back from it must be maintained.) Thus to make CADState.java run correctly you must (1) Add a serializeStaticState( ) and deserializeStaticState( ) to the shapes, (2) Remove the ArrayList shapeTypes and all code related to it, and (3) Add calls to the new serialize and deserialize static methods in the shapes.

Another issue you might have to think about is security, since serialization also saves private data. If you have a security issue, those fields should be marked as transient. But then you have to design a secure way to store that information so that when you do a restore you can reset those private variables.

Checking capitalization style

In this section we’ll look at a more complete example of the use of Java IO. This project is directly useful because it performs a style check to make sure that your capitalization conforms to the Java style as found at http://java.sun.com/docs/codeconv/index.html. It opens each .java file in the current directory and extracts all the class names and identifiers, then shows you if any of them don’t meet the Java style.

For the program to operate correctly, you must first build a class name repository to hold all the class names in the standard Java library. You do this by moving into all the source code subdirectories for the standard Java library and running ClassScanner in each subdirectory. Provide as arguments the name of the repository file (using the same path and name each time) and the -a command-line option to indicate that the class names should be added to the repository.

To use the program to check your code, run it and hand it the path and name of the repository to use. It will check all the classes and identifiers in the current directory and tell you which ones don’t follow the typical Java capitalization style.

You should be aware that the program isn’t perfect; there are a few times when it will point out what it thinks is a problem but on looking at the code you’ll see that nothing needs to be changed. This is a little annoying, but it’s still much easier than trying to find all these cases by staring at your code.

The explanation immediately follows the listing:

//: c11:ClassScanner.java
// Scans all files in directory for classes
// and identifiers, to check capitalization.
// Assumes properly compiling code listings.
// Doesn't do everything right, but is a very
// useful aid.
import java.io.*;
import java.util.*;

class MultiStringMap extends HashMap {
  public void add(String key, String value) {
    if(!containsKey(key))
      put(key, new ArrayList());
    ((ArrayList)get(key)).add(value);
  }
  public ArrayList getArrayList(String key) {
    if(!containsKey(key)) {
      System.err.println(
        "ERROR: can't find key: " + key);
      System.exit(1);
    }
    return (ArrayList)get(key);
  }
  public void printValues(PrintStream p) {
    Iterator k = keySet().iterator();
    while(k.hasNext()) {
      String oneKey = (String)k.next();
      ArrayList val = getArrayList(oneKey);
      for(int i = 0; i < val.size(); i++)
        p.println((String)val.get(i));
    }
  }
}

public class ClassScanner {
  private File path;
  private String[] fileList;
  private Properties classes = new Properties();
  private MultiStringMap 
    classMap = new MultiStringMap(),
    identMap = new MultiStringMap();
  private StreamTokenizer in;
  public ClassScanner() {
    path = new File(".");
    fileList = path.list(new JavaFilter());
    for(int i = 0; i < fileList.length; i++) {
      System.out.println(fileList[i]);
      scanListing(fileList[i]);
    }
  }
  void scanListing(String fname) {
    try {
      in = new StreamTokenizer(
          new BufferedReader(
            new FileReader(fname)));
      // Doesn't seem to work:
      // in.slashStarComments(true);
      // in.slashSlashComments(true);
      in.ordinaryChar('/');
      in.ordinaryChar('.');
      in.wordChars('_', '_');
      in.eolIsSignificant(true);
      while(in.nextToken() != 
            StreamTokenizer.TT_EOF) {
        if(in.ttype == '/')
          eatComments();
        else if(in.ttype == 
                StreamTokenizer.TT_WORD) {
          if(in.sval.equals("class") || 
             in.sval.equals("interface")) {
            // Get class name:
               while(in.nextToken() != 
                     StreamTokenizer.TT_EOF
                     && in.ttype != 
                     StreamTokenizer.TT_WORD)
                 ;
               classes.put(in.sval, in.sval);
               classMap.add(fname, in.sval);
          }
          if(in.sval.equals("import") ||
             in.sval.equals("package"))
            discardLine();
          else // It's an identifier or keyword
            identMap.add(fname, in.sval);
        }
      }
    } catch(IOException e) {
      e.printStackTrace();
    }
  }
  void discardLine() {
    try {
      while(in.nextToken() != 
            StreamTokenizer.TT_EOF
            && in.ttype != 
            StreamTokenizer.TT_EOL)
        ; // Throw away tokens to end of line
    } catch(IOException e) {
      e.printStackTrace();
    }
  }
  // StreamTokenizer's comment removal seemed
  // to be broken. This extracts them:
  void eatComments() {
    try {
      if(in.nextToken() != 
         StreamTokenizer.TT_EOF) {
        if(in.ttype == '/')
          discardLine();
        else if(in.ttype != '*')
          in.pushBack();
        else 
          while(true) {
            if(in.nextToken() == 
              StreamTokenizer.TT_EOF)
              break;
            if(in.ttype == '*')
              if(in.nextToken() != 
                StreamTokenizer.TT_EOF
                && in.ttype == '/')
                break;
          }
      }
    } catch(IOException e) {
      e.printStackTrace();
    }
  }
  public String[] classNames() {
    String[] result = new String[classes.size()];
    Iterator e = classes.keySet().iterator();
    int i = 0;
    while(e.hasNext())
      result[i++] = (String)e.next();
    return result;
  }
  public void checkClassNames() {
    Iterator files = classMap.keySet().iterator();
    while(files.hasNext()) {
      String file = (String)files.next();
      ArrayList cls = classMap.getArrayList(file);
      for(int i = 0; i < cls.size(); i++) {
        String className = (String)cls.get(i);
        if(Character.isLowerCase(
             className.charAt(0)))
          System.out.println(
            "class capitalization error, file: "
            + file + ", class: " 
            + className);
      }
    }
  }
  public void checkIdentNames() {
    Iterator files = identMap.keySet().iterator();
    ArrayList reportSet = new ArrayList();
    while(files.hasNext()) {
      String file = (String)files.next();
      ArrayList ids = identMap.getArrayList(file);
      for(int i = 0; i < ids.size(); i++) {
        String id = (String)ids.get(i);
        if(!classes.contains(id)) {
          // Ignore identifiers of length 3 or
          // longer that are all uppercase
          // (probably static final values):
          if(id.length() >= 3 &&
             id.equals(
               id.toUpperCase()))
            continue;
          // Check to see if first char is upper:
          if(Character.isUpperCase(id.charAt(0))){
            if(reportSet.indexOf(file + id)
                == -1){ // Not reported yet
              reportSet.add(file + id);
              System.out.println(
                "Ident capitalization error in:"
                + file + ", ident: " + id);
            }
          }
        }
      }
    }
  }
  static final String usage =
    "Usage: \n" + 
    "ClassScanner classnames -a\n" +
    "\tAdds all the class names in this \n" +
    "\tdirectory to the repository file \n" +
    "\tcalled 'classnames'\n" +
    "ClassScanner classnames\n" +
    "\tChecks all the java files in this \n" +
    "\tdirectory for capitalization errors, \n" +
    "\tusing the repository file 'classnames'";
  private static void usage() {
    System.err.println(usage);
    System.exit(1);
  }
  public static void main(String[] args) {
    if(args.length < 1 || args.length > 2)
      usage();
    ClassScanner c = new ClassScanner();
    File old = new File(args[0]);
    if(old.exists()) {
      try {
        // Try to open an existing 
        // properties file:
        InputStream oldlist =
          new BufferedInputStream(
            new FileInputStream(old));
        c.classes.load(oldlist);
        oldlist.close();
      } catch(IOException e) {
        System.err.println("Could not open "
          + old + " for reading");
        System.exit(1);
      }
    }
    if(args.length == 1) {
      c.checkClassNames();
      c.checkIdentNames();
    }
    // Write the class names to a repository:
    if(args.length == 2) {
      if(!args[1].equals("-a"))
        usage();
      try {
        BufferedOutputStream out =
          new BufferedOutputStream(
            new FileOutputStream(args[0]));
        c.classes.save(out,
          "Classes found by ClassScanner.java");
        out.close();
      } catch(IOException e) {
        System.err.println(
          "Could not write " + args[0]);
        System.exit(1);
      }
    }
  }
}

class JavaFilter implements FilenameFilter {
  public boolean accept(File dir, String name) {
    // Strip path information:
    String f = new File(name).getName();
    return f.trim().endsWith(".java");
  }
} ///:~

The class MultiStringMap is a tool that allows you to map a group of strings onto each key entry. As in the previous example, it uses a HashMap (this time with inheritance) with the key as the single string that’s mapped onto the ArrayList value. The add( ) method simply checks to see if there’s a key already in the HashMap, and if not it puts one there. The getArrayList( ) method produces an ArrayList for a particular key, and printValues( ), which is primarily useful for debugging, prints out all the values ArrayList by ArrayList.

To keep life simple, the class names from the standard Java libraries are all put into a Properties object (from the standard Java library). Remember that a Properties object is a HashMap that holds only String objects for both the key and value entries. However, it can be saved to disk and restored from disk in one method call, so it’s ideal for the repository of names. Actually, we need only a list of names, and a HashMap can’t accept null for either its key or its value entry. So the same object will be used for both the key and the value.

For the classes and identifiers that are discovered for the files in a particular directory, two MultiStringMaps are used: classMap and identMap. Also, when the program starts up it loads the standard class name repository into the Properties object called classes, and when a new class name is found in the local directory that is also added to classes as well as to classMap. This way, classMap can be used to step through all the classes in the local directory, and classes can be used to see if the current token is a class name (which indicates a definition of an object or method is beginning, so grab the next tokens—until a semicolon—and put them into identMap).

The default constructor for ClassScanner creates a list of file names (using the JavaFilter implementation of FilenameFilter, as described in Chapter 11). Then it calls scanListing( ) for each file name.

Inside scanListing( ) the source code file is opened and turned into a StreamTokenizer. In the documentation, passing true to slashStarComments( ) and slashSlashComments( ) is supposed to strip those comments out, but this seems to be a bit flawed (it doesn’t quite work in Java 1.0). Instead, those lines are commented out and the comments are extracted by another method. To do this, the ‘/’ must be captured as an ordinary character rather than letting the StreamTokenizer absorb it as part of a comment, and the ordinaryChar( ) method tells the StreamTokenizer to do this. This is also true for dots (‘.’), since we want to have the method calls pulled apart into individual identifiers. However, the underscore, which is ordinarily treated by StreamTokenizer as an individual character, should be left as part of identifiers since it appears in such static final values as TT_EOF etc., used in this very program. The wordChars( ) method takes a range of characters you want to add to those that are left inside a token that is being parsed as a word. Finally, when parsing for one-line comments or discarding a line we need to know when an end-of-line occurs, so by calling eolIsSignificant(true) the eol will show up rather than being absorbed by the StreamTokenizer.

The rest of scanListing( ) reads and reacts to tokens until the end of the file, signified when nextToken( ) returns the final static value StreamTokenizer.TT_EOF.

If the token is a / it is potentially a comment, so eatComments( ) is called to deal with it. The only other situation we’re interested in here is if it’s a word, of which there are some special cases.

If the word is class or interface then the next token represents a class or interface name, and it is put into classes and classMap. If the word is import or package, then we don’t want the rest of the line. Anything else must be an identifier (which we’re interested in) or a keyword (which we’re not, but they’re all lowercase anyway so it won’t spoil things to put those in). These are added to identMap.

The discardLine( ) method is a simple tool that looks for the end of a line. Note that any time you get a new token, you must check for the end of the file.

The eatComments( ) method is called whenever a forward slash is encountered in the main parsing loop. However, that doesn’t necessarily mean a comment has been found, so the next token must be extracted to see if it’s another forward slash (in which case the line is discarded) or an asterisk. But if it’s neither of those, it means the token you’ve just pulled out is needed back in the main parsing loop! Fortunately, the pushBack( ) method allows you to “push back” the current token onto the input stream so that when the main parsing loop calls nextToken( ) it will get the one you just pushed back.

For convenience, the classNames( ) method produces an array of all the names in the classes container. This method is not used in the program but is helpful for debugging.

The next two methods are the ones in which the actual checking takes place. In checkClassNames( ), the class names are extracted from the classMap (which, remember, contains only the names in this directory, organized by file name so the file name can be printed along with the errant class name). This is accomplished by pulling each associated ArrayList and stepping through that, looking to see if the first character is lowercase. If so, the appropriate error message is printed.

In checkIdentNames( ), a similar approach is taken: each identifier name is extracted from identMap. If the name is not in the classes list, it’s assumed to be an identifier or keyword. A special case is checked: if the identifier length is 3 or more and all the characters are uppercase, this identifier is ignored because it’s probably a static final value such as TT_EOF. Of course, this is not a perfect algorithm, but it assumes that you’ll eventually notice any all-uppercase identifiers that are out of place.

Instead of reporting every identifier that starts with an uppercase character, this method keeps track of which ones have already been reported in an ArrayList called reportSet( ). This treats the ArrayList as a “set” that tells you whether an item is already in the set. The item is produced by concatenating the file name and identifier. If the element isn’t in the set, it’s added and then the report is made.

The rest of the listing is comprised of main( ), which busies itself by handling the command line arguments and figuring out whether you’re building a repository of class names from the standard Java library or checking the validity of code you’ve written. In both cases it makes a ClassScanner object.

Whether you’re building a repository or using one, you must try to open the existing repository. By making a File object and testing for existence, you can decide whether to open the file and load( ) the Properties list classes inside ClassScanner. (The classes from the repository add to, rather than overwrite, the classes found by the ClassScanner constructor.) If you provide only one command-line argument it means that you want to perform a check of the class names and identifier names, but if you provide two arguments (the second being “-a”) you’re building a class name repository. In this case, an output file is opened and the method Properties.save( ) is used to write the list into a file, along with a string that provides header file information.

Summary

The Java IO stream library does seem to satisfy the basic requirements: you can perform reading and writing with the console, a file, a block of memory, or even across the Internet (as you will see in Chapter 15). It’s possible (by inheriting from InputStream and OutputStream) to create new types of input and output objects. And you can even add a simple extensibility to the kinds of objects a stream will accept by redefining the toString( ) method that’s automatically called when you pass an object to a method that’s expecting a String (Java’s limited “automatic type conversion”).

There are questions left unanswered by the documentation and design of the IO stream library. For example, it would have been nice if you could say that you want an exception thrown if you try to overwrite a file when opening it for output—some programming systems allow you to specify that you want to open an output file, but only if it doesn’t already exist. In Java, it appears that you are supposed to use a File object to determine whether a file exists, because if you open it as an FileOutputStream or FileWriter it will always get overwritten. By representing both files and directory paths, the File class also suggests poor design by violating the maxim “Don’t try to do too much in a single class.”

The IO stream library brings up mixed feelings. It does much of the job and it’s portable. But if you don’t already understand the decorator pattern, the design is non-intuitive, so there’s extra overhead in learning and teaching it. It’s also incomplete: there’s no support for the kind of output formatting that almost every other language’s IO package supports. (This was not remedied in Java 1.1, which missed the opportunity to change the library design completely, and instead added even more special cases and complexity.) The Java 1.1 changes to the IO library haven’t been replacements, but rather additions, and it seems that the library designers couldn’t quite get straight which features are deprecated and which are preferred, resulting in annoying deprecation messages that show up the contradictions in the library design.

However, once you do understand the decorator pattern and begin using the library in situations that require its flexibility, you can begin to benefit from this design, at which point its cost in extra lines of code may not bother you as much.

Exercises

  1. Open a text file so that you can read the file one line at a time. Read each line as a String and place that String object into a LinkedList. Print all of the lines in the LinkedList in reverse order.
  2. Modify Exercise 1 so that the name of the file you read is provided as a command-line argument.
  3. Modify Exercise 2 to also open a text file so you can write text into it. Write the lines in the ArrayList, along with line numbers, out to the file.
  4. Modify Exercise 2 to force all the lines in the ArrayList to upper case and send the results to System.out.
  5. Modify Exercise 2 to take additional arguments of words to find in the file. Print any lines in which the words match.
  6. Create a class called SortedDirList with a constructor that takes file path information and builds a sorted directory list from the files at that path. Create two overloaded list( ) methods that will either produce the whole list or a subset of the list based on an argument. Add a size( ) method that takes a file name and produces the size of that file.
  7. In Blips.java, copy the file and rename it to BlipCheck.java and rename the class Blip2 to BlipCheck (making it public and removing the public scope from the class Blips in the process). Remove the //! marks in the file and execute the program including the offending lines. Next, comment out the default constructor for BlipCheck. Run it and explain why it works. Note that after compiling, you must execute the program with ‘java Blips’ because the main( ) method is still in class Blips.
  8. In Blip3.java, comment out the two lines after the phrases “You must do this:” and run the program. Explain the result and why it differs from when the two lines are in the program.
  9. Modify the SortedWordCount.java program so that it produces an alphabetic sort instead, using the tool from Chapter 9.
  10. Modify SortedWordCount.java so that it uses a class containing a String and a count value to store each different word, and a Set of these objects to maintain the list of words.
  11. Convert the SortedWordCount.java program to use the Java 1.1 IO Streams.
  12. Repair the program CADState.java as described in the text.
  13. (Intermediate) In Chapter 8, locate the GreenhouseControls.java example, which consists of three files. In GreenhouseControls.java, the Restart( ) inner class has a hard-coded set of events. Change the program so that it reads the events and their relative times from a text file. (Challenging: Use a factory method from to build the events—see Thinking in Patterns with Java, downloadable at www.BruceEckel.com.)

[56] Design Patterns, Erich Gamma et al., Addison-Wesley 1995.

[57] Perhaps by the time you read this, the bug will be fixed.

[ Previous Chapter ] [ Short TOC ] [ Table of Contents ] [ Index ] [ Next Chapter ]
Last Update:03/13/2000