This is the first in a series of articles comparing ODF, in particular the OpenOffice implementation, with Microsoft Office and its various data formats across several measures. This article covers the efficiency of the .odt, .doc, and .xml formats, with particular attention to native and compressed file sizes.
My Windows test cases were generated using the following software:
- Microsoft Windows XP Professional 2002, SP2
- Microsoft Word 2003 (11.6368.6368) SP2
- OpenOffice 2.0.3
- Adobe Acrobat Standard 7.0.8 5/16/2006.
My Linux test cases were produced with the following software:
- SuSE Linux 10.1
- OpenOffice 22.214.171.124.1
- Adobe Reader 7.0.8 05/22/2006
I needed a fairly large chunk of text for my test, so I decided on the November draft of the ISO/IEC C++ Standard, located at http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1905.pdf (copy here). This is a very large document, so I decided to use only the first seven chapters for my test case. To produce the target documents, I selected the contents from the beginning of the document through chapter 7 and copied them to the clipboard. I then pasted the clipboard into Microsoft Word under Windows and into OpenOffice Writer under both Windows and Linux. For Microsoft Word, I saved the document as a native .doc and as .xml. For OpenOffice, I saved the document as a native .odt and exported it as .doc. I also saved the content as .txt with Notepad under Windows as a reference point. For archival purposes, I have mirrored all documents referred to in this article in the 3monkey wiki download area.
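For anyone who wants to repeat the measurement, a minimal sketch of collecting native file sizes might look like the following. The function name `file_sizes` and the default extension list are my own choices for illustration, and the script assumes the saved test documents all sit in one directory.

```python
import os

def file_sizes(directory, extensions=(".doc", ".xml", ".odt", ".txt")):
    """Return {file name: size in bytes} for files with the given extensions.

    The extension list is an assumption based on the formats saved above.
    """
    sizes = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # Skip subdirectories and files in other formats (e.g. the source PDF).
        if os.path.isfile(path) and name.lower().endswith(extensions):
            sizes[name] = os.path.getsize(path)
    return sizes

if __name__ == "__main__":
    # Print the sizes of the test documents in the current directory.
    for name, size in file_sizes(".").items():
        print(f"{name:40s} {size:>12,d}")
```

`os.path.getsize` reports the size on disk in bytes, which should match what the file manager shows for each saved document.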
|File Type||Size (bytes)|
|Microsoft Office .doc||921,088|
|Microsoft Office .xml||6,475,669|
|OpenOffice (XP) .odt||154,892|
|OpenOffice (XP) .doc||1,335,296|
|OpenOffice (Linux) .odt||160,045|
|OpenOffice (Linux) .doc||1,338,368|
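To make the differences easier to compare, the sizes in the table above can be expressed relative to the smallest file. This is only a quick sketch; the sizes are copied from the table, and the helper name `relative_sizes` is mine.

```python
# Sizes in bytes, copied from the table above.
sizes = {
    "Microsoft Office .doc": 921_088,
    "Microsoft Office .xml": 6_475_669,
    "OpenOffice (XP) .odt": 154_892,
    "OpenOffice (XP) .doc": 1_335_296,
    "OpenOffice (Linux) .odt": 160_045,
    "OpenOffice (Linux) .doc": 1_338_368,
}

def relative_sizes(sizes):
    """Return each size as a multiple of the smallest size in the mapping."""
    baseline = min(sizes.values())
    return {name: size / baseline for name, size in sizes.items()}

# Print the files from smallest to largest with their ratio to the smallest.
for name, ratio in sorted(relative_sizes(sizes).items(), key=lambda kv: kv[1]):
    print(f"{name:26s} {ratio:6.2f}x")
```

By this measure the Microsoft Word .doc is roughly six times the size of the Windows .odt, and the Word .xml is roughly 42 times the size.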
My first observation was that the Linux OpenOffice implementation created slightly larger files than the Windows implementation. This was probably due to the differing versions. I will revisit this in a later article if it is merited.
My next observation was that the OpenOffice .doc file was significantly larger than the Microsoft Word version. This is likely due to Microsoft's access to the complete .doc specification, and thus a better understanding of how to optimize the file content and size. For grins, I loaded the OpenOffice .doc with Microsoft Word and saved it natively. I also loaded the Microsoft Word .doc with OpenOffice and saved it both as a .doc and as an .odt. The results of these tests are below.
|Test||Size (bytes)|
|OO .doc loaded/saved in MS||808,960|
|MS .doc loaded/saved in OO||1,277,952|
|MS .doc loaded/saved as .odt in OO||155,113|
|File Type||Original Size||Compressed Size|
From this limited data sample, I have to declare OpenOffice Writer the champion of round one. Perhaps if Microsoft Word employed a compressed output format, the outcome might have been different. It is actually a little strange that the OpenOffice format, which is based on pure text (XML), is compressed into a binary zip file, while the Microsoft Word format, which is proprietary and binary, is not.
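Because an .odt file is a zip archive of XML, it can be inspected with any zip library. The sketch below is one way to look inside an .odt and to estimate how well an uncompressed file such as a .doc would shrink. The function names are hypothetical, and using deflate through Python's `zipfile` module is my assumption; it is not necessarily the tool used to produce the compressed sizes in this article.

```python
import io
import zipfile

def odt_entry_sizes(path):
    """Return [(entry name, uncompressed bytes, compressed bytes)] for a zip-based file."""
    with zipfile.ZipFile(path) as archive:
        return [(info.filename, info.file_size, info.compress_size)
                for info in archive.infolist()]

def deflated_size(path):
    """Return the size in bytes of a single-entry zip archive holding the file at path.

    The archive is built in memory, so nothing is written to disk.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.write(path, arcname="document")
    return len(buffer.getvalue())
```

Calling `odt_entry_sizes` on one of the test .odt files would list each XML part along with how much space it takes before and after compression, while `deflated_size` on a .doc gives a rough idea of what Word's output would weigh if it were zipped the same way.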
What Is Up Next?
For the most part, these test cases did not contain much formatting or style information, nor did they include elements such as tables and graphs. I will investigate how these affect efficiency in a later article. But before I do that, I will need to explain more of how ODF works. Therefore, the next few articles in this series will be a primer on the ODF specification.
Until next time…