Wednesday, November 9, 2011

Java file flushing performance

There are many situations where you must ensure that data has actually been written to disk, and the write must also be fast. The most common cases are databases, journalling, and the like. It is also often necessary to update some random position in a file. I specifically want to emphasise random access here, because the rest of this post covers only approaches that support it; I'm not going to discuss OutputStream.flush() and related topics. I simply haven't tried them, since they weren't relevant to my case at the time.

There are several ways of flushing data to disk in Java. These options can differ considerably in how they are implemented internally and in their performance. Here is the list of what you can do:
  • FileChannel.force()
  • 'rws' or 'rwd' mode of RandomAccessFile, which 'works much like the force(boolean) method of the FileChannel class' (from javadoc).
  • MappedByteBuffer.force()
  • RandomAccessFile.getFD().sync()
  • any close() method. By this I mean seeking and closing the stream every time access is required. In my tests I didn't actually seek, since I was updating data at offset zero.
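As a rough illustration of the first four options, here is a minimal sketch that writes 8 bytes at offset zero and then flushes them in each of the ways listed. The class name FlushSketch, the method names and the temporary file are all made up for this example, and the choice of force(false) rather than force(true) is an assumption; this is not the code used for the measurements in this post.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, for illustration only.
class FlushSketch {

    // Encode a long as 8 big-endian bytes so every variant writes the same payload.
    static byte[] toBytes(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    // FileChannel.force(false): asks for file content (not necessarily metadata) to be flushed.
    static void channelForce(Path file, long value) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw");
             FileChannel ch = raf.getChannel()) {
            ByteBuffer buf = ByteBuffer.wrap(toBytes(value));
            while (buf.hasRemaining()) {
                ch.write(buf, buf.position()); // absolute write, starting at file offset 0
            }
            ch.force(false);
        }
    }

    // "rwd" mode: each write to the file content is performed synchronously.
    static void rwdMode(Path file, long value) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rwd")) {
            raf.seek(0);
            raf.write(toBytes(value));
        }
    }

    // MappedByteBuffer.force(): flush changes made through a memory mapping.
    static void mappedForce(Path file, long value) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw");
             FileChannel ch = raf.getChannel()) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_WRITE, 0, Long.BYTES);
            map.putLong(0, value);
            map.force();
        }
    }

    // RandomAccessFile.getFD().sync(): flush via the underlying file descriptor.
    static void fdSync(Path file, long value) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            raf.seek(0);
            raf.write(toBytes(value));
            raf.getFD().sync();
        }
    }

    // Read the first 8 bytes of the file back as a long.
    static long readBack(Path file) throws IOException {
        return ByteBuffer.wrap(Files.readAllBytes(file)).getLong();
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("flush-sketch", ".bin");
        file.toFile().deleteOnExit();
        channelForce(file, 1L);
        rwdMode(file, 2L);
        mappedForce(file, 3L);
        fdSync(file, 4L);
        System.out.println("last value read back: " + readBack(file));
    }
}
```

Each method opens and closes the file only to keep the sketch self-contained; a real application would normally keep the file open between writes.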
Surprisingly (the only unsurprising exception is close()), all these methods give very different performance, and it varies almost randomly across operating systems and file systems. It is worth noting that hardware can also affect the performance of any of these methods. I also have a strong feeling that performance may vary even with a minor change in OS or JVM version. Here is a table with the time it takes to flush 8 bytes (keep in mind that the real amount of flushed data depends on the size of the caches and is going to be much more than 8 bytes), just to give a flavour of how much the results differ:

Method                            Windows      Linux
RandomAccessFile.getFD().sync()   0.2818 ms    0.5354 ms
RandomAccessFile, rwd mode        0.0125 ms    0.5144 ms
MappedByteBuffer.force()          0.007 ms     0.4663 ms
FileChannel.force()               0.139 ms     0.0093 ms

Please do not treat these numbers as a meaningful benchmark. They are here only to illustrate how much these things can vary.

So, what is the conclusion? If you need to write a high-performance application that does lots of IO, you really need to test the different approaches on different OSes, on different file systems and, preferably, on different JVMs. Do not expect something to be fast on a Linux (Solaris, AIX, etc.) production box just because it is fast on your Windows (Linux, etc.) workstation, and vice versa. As can be seen, the difference can be orders of magnitude.
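If you want to run such a comparison on your own boxes, something along these lines could serve as a starting point. It is only a sketch: the class FlushTiming, the FlushAction interface, the warm-up and iteration counts, the use of force(false) and the decision to keep each file open across iterations are all my assumptions here, not a description of how the numbers above were obtained. It times wall-clock duration only and says nothing about whether the data really reached the disk.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical timing harness, for illustration only.
class FlushTiming {

    /** One write-and-flush of 8 bytes at offset 0. */
    interface FlushAction {
        void run(long value) throws IOException;
    }

    static byte[] toBytes(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    /** Average wall-clock milliseconds per call, measured after a warm-up phase. */
    static double averageMillis(FlushAction action, int warmup, int iterations) throws IOException {
        for (int i = 0; i < warmup; i++) action.run(i);
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) action.run(i);
        return (System.nanoTime() - start) / 1_000_000.0 / iterations;
    }

    static Path newFile(Path dir, String prefix) throws IOException {
        Path f = Files.createTempFile(dir, prefix, ".bin");
        f.toFile().deleteOnExit(); // best effort; a live mapping may prevent deletion on Windows
        return f;
    }

    /** Times each approach on a fresh file created in the given directory. */
    static Map<String, Double> measureAll(Path dir, int warmup, int iterations) throws IOException {
        Map<String, Double> results = new LinkedHashMap<>();

        try (RandomAccessFile raf = new RandomAccessFile(newFile(dir, "sync").toFile(), "rw")) {
            results.put("RandomAccessFile.getFD().sync()", averageMillis(v -> {
                raf.seek(0);
                raf.write(toBytes(v));
                raf.getFD().sync();
            }, warmup, iterations));
        }
        try (RandomAccessFile raf = new RandomAccessFile(newFile(dir, "rwd").toFile(), "rwd")) {
            results.put("RandomAccessFile, rwd mode", averageMillis(v -> {
                raf.seek(0);
                raf.write(toBytes(v)); // synchronous because of the "rwd" mode
            }, warmup, iterations));
        }
        try (RandomAccessFile raf = new RandomAccessFile(newFile(dir, "map").toFile(), "rw");
             FileChannel ch = raf.getChannel()) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_WRITE, 0, Long.BYTES);
            results.put("MappedByteBuffer.force()", averageMillis(v -> {
                map.putLong(0, v);
                map.force();
            }, warmup, iterations));
        }
        try (RandomAccessFile raf = new RandomAccessFile(newFile(dir, "chan").toFile(), "rw");
             FileChannel ch = raf.getChannel()) {
            ByteBuffer buf = ByteBuffer.allocate(Long.BYTES);
            results.put("FileChannel.force()", averageMillis(v -> {
                buf.clear();
                buf.putLong(v).flip();
                while (buf.hasRemaining()) ch.write(buf, buf.position());
                ch.force(false);
            }, warmup, iterations));
        }
        return results;
    }

    public static void main(String[] args) throws IOException {
        // Pass a directory on the file system you want to test; defaults to the temp directory.
        Path dir = args.length > 0 ? Paths.get(args[0]) : Paths.get(System.getProperty("java.io.tmpdir"));
        measureAll(dir, 100, 1000)
                .forEach((name, ms) -> System.out.printf("%-34s %.4f ms%n", name, ms));
    }
}
```

Running it with a directory argument on each OS, file system and JVM you care about gives one row of numbers per environment, which is exactly the kind of comparison suggested above.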

1 comment:

Cd MaN said...

But are all these methods equivalent? As in: when they return, can the machine crash in the next moment and (not counting HDDs lying to the OS and stuff like that) still be sure that the data is on the disk on restart? It seems to me that if they were, there shouldn't be such a big difference (especially the MappedByteBuffer.force under Windows looks very fishy...)