Saturday, June 26, 2010

Java IO - Busting Buffered Streams Myth

Java I/O is at the core of the Java API, and every Java developer should understand how input and output streams work. Streams support many different kinds of data, including simple bytes, primitive data types, localized characters, and objects. Some streams simply pass data on; others manipulate and transform it in useful ways.

At the core of Java I/O are the abstract classes InputStream and OutputStream. Their most basic implementations are FileInputStream and FileOutputStream, which read and write files respectively. The Java community, however, recommends the buffered versions, BufferedInputStream and BufferedOutputStream, for their performance benefits: internally, each buffered stream creates a default buffer of 8192 bytes, and BufferedOutputStream does not write to the underlying stream until that buffer is full (or the stream is flushed or closed). Recently, while writing a small Java program to copy some movies on the file system, I ran into serious performance problems with buffered streams. That made me look into the source code of these classes, and the results were unexpected. Like other developers, I had been using buffered streams without question, but I found that plain input and output streams performed better in comparison.
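The buffering behaviour described above can be observed without touching the file system. The following is a minimal sketch, not part of the original test programs: the class name BufferFlushDemo and the helper sinkSizesAroundFlush are made up for illustration, and an in-memory ByteArrayOutputStream stands in for the file.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class BufferFlushDemo {

    // Hypothetical helper: writes bytesToWrite single bytes through a
    // BufferedOutputStream with the default buffer size and reports how many
    // bytes reached the underlying sink before and after an explicit flush().
    static int[] sinkSizesAroundFlush(int bytesToWrite) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        BufferedOutputStream out = new BufferedOutputStream(sink);
        for (int i = 0; i < bytesToWrite; i++) {
            out.write(i); // only the low-order 8 bits of i are written
        }
        int before = sink.size();
        out.flush();
        int after = sink.size();
        return new int[] { before, after };
    }

    public static void main(String[] args) throws IOException {
        // Fewer bytes than the buffer holds: nothing reaches the sink until flush()
        int[] small = sinkSizesAroundFlush(100);
        System.out.println(small[0] + " -> " + small[1]);

        // More bytes than the buffer holds: one full buffer has already been written out
        int[] large = sinkSizesAroundFlush(10000);
        System.out.println(large[0] + " -> " + large[1]);
    }
}
```

With the default buffer size, the first call should report that nothing was written before the flush, and the second that exactly one full buffer was.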

Why is that so? The reason is synchronization. Astonishingly, the Javadoc for BufferedInputStream and BufferedOutputStream does not mention that the read and write methods are synchronized. In my tests, the cost of acquiring and releasing the lock was such that plain streams performed at least 3 times better than buffered streams. Note that the buffered test below calls read() and write() once per byte, so it pays that cost for every byte, whereas the plain test makes one call per 8192-byte block.
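Since a synchronized method takes the lock once per call, the number of calls is a rough proxy for the locking cost. The sketch below only counts calls; it does not measure time. CountingInputStream and ReadCallCounter are hypothetical names introduced for illustration, and the data lives in memory rather than in a file.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadCallCounter {

    // Hypothetical wrapper that counts how often a caller invokes a read method.
    static class CountingInputStream extends FilterInputStream {
        long calls = 0;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        // read(byte[]) in FilterInputStream delegates here, so it is counted once
        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    // Reads the whole payload one byte at a time and returns the number of calls.
    static long singleByteCalls(int size) throws IOException {
        CountingInputStream in = new CountingInputStream(new ByteArrayInputStream(new byte[size]));
        while (in.read() != -1) {
            // discard the byte; only the call count matters here
        }
        return in.calls;
    }

    // Reads the whole payload in blocks and returns the number of calls.
    static long blockCalls(int size, int blockSize) throws IOException {
        CountingInputStream in = new CountingInputStream(new ByteArrayInputStream(new byte[size]));
        byte[] block = new byte[blockSize];
        while (in.read(block) != -1) {
            // discard the block; only the call count matters here
        }
        return in.calls;
    }

    public static void main(String[] args) throws IOException {
        int oneMiB = 1024 * 1024;
        System.out.println("single-byte calls: " + singleByteCalls(oneMiB));
        System.out.println("8192-byte block calls: " + blockCalls(oneMiB, 8192));
    }
}
```

For a 1 MiB payload the byte-at-a-time loop makes over a million calls, while the block loop makes 129, including the final call that returns -1.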

Here are the two programs I used to test the performance of the streams. You can try them out and look at the difference.

Buffered: Average - 170000 ms

package work.filemaker;

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class BufferedReaderTest {

    /**
     * Copies a file one byte at a time through buffered streams
     * and prints the elapsed time in milliseconds.
     *
     * @param args not used
     * @throws IOException if the file cannot be read or written
     */
    public static void main(String[] args) throws IOException {

        long startMilliSec = System.currentTimeMillis();

        // try-with-resources closes both streams, even if the copy fails
        try (InputStream inputStream = new BufferedInputStream(new FileInputStream("D:/3idiots.avi"));
             OutputStream os = new BufferedOutputStream(new FileOutputStream("D:/copy3idiots.avi"))) {
            int c;
            // One read() call and one write() call per byte
            while ((c = inputStream.read()) != -1) {
                os.write(c);
            }
        }

        long endMilliSec = System.currentTimeMillis();

        System.out.println(endMilliSec - startMilliSec);
    }
}

Plain Streams: Average - 52000 ms

package work.filemaker;

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class FileReaderTest {

    /**
     * Copies a file in 8192-byte blocks through plain file streams
     * and prints the elapsed time in milliseconds.
     *
     * @param args not used
     * @throws IOException if the file cannot be read or written
     */
    public static void main(String[] args) throws IOException {

        long startMilliSec = System.currentTimeMillis();

        // try-with-resources closes both streams, even if the copy fails
        try (InputStream inputStream = new FileInputStream("D:/3idiots.avi");
             OutputStream os = new FileOutputStream("D:/copy3idiots.avi")) {
            byte[] b = new byte[8192];
            int c;
            while ((c = inputStream.read(b)) != -1) {
                // Write only the c bytes actually read; writing the whole
                // array would append stale bytes after a short read
                os.write(b, 0, c);
            }
        }

        long endMilliSec = System.currentTimeMillis();

        System.out.println(endMilliSec - startMilliSec);
    }
}

Hence, think twice before using buffered streams, as they don't seem to improve performance. One open question is why the read and write methods in buffered streams are synchronized. In conclusion, I would prefer to create my own buffers, as in the plain-streams example above, and use the plain I/O streams unless somebody can convince me otherwise.
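The "own buffer" approach can be pulled into a small reusable method. This is a minimal sketch under my own assumptions: the class name StreamCopy, the method copy and its bufferSize parameter are hypothetical, and the demo copies in-memory data so that it runs without the movie file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;

public class StreamCopy {

    // Hypothetical helper: copies everything from in to out using a
    // caller-sized byte array and returns the number of bytes copied.
    // The caller remains responsible for closing both streams.
    static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n); // write only the bytes actually read
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[20000];
        Arrays.fill(data, (byte) 7);

        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(data), sink, 8192);

        System.out.println("copied " + copied + " bytes, identical: "
                + Arrays.equals(data, sink.toByteArray()));
    }
}
```

For a file copy, the same method could be called with a FileInputStream and a FileOutputStream opened in a try-with-resources block.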

User comments are most welcome....