OpenOffice ODF/.odt compared to Microsoft Word .doc
Overview
This is the first in a series of articles comparing ODF, and in particular the OpenOffice implementation, with Microsoft Office and its various data formats across a range of measures. This article covers the efficiency of the .odt, .doc and .xml formats, with particular attention to native and compressed file sizes.
Methodology
My Windows test cases were generated using the following software:
- Microsoft Windows XP Professional 2002, SP2
- Microsoft Word 2003 (11.6368.6368) SP2
- OpenOffice 2.0.3
- Adobe Acrobat Standard 7.0.8 5/16/2006.
My Linux test cases were produced with the following software:
- SuSE Linux 10.1
- OpenOffice 2.0.2.7.1
- Adobe Reader 7.0.8 05/22/2006
I needed a fairly large chunk of text for my test, and I decided on the November draft of the ISO/IEC C++ Standard, located at http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1905.pdf (copy here). This is a very large document, so I used only the first seven chapters for my test case. To produce the target documents, I selected the contents from the beginning of the document through chapter 7 and copied them to the clipboard. I then pasted the clipboard contents into Microsoft Word under Windows and into OpenOffice Writer under both Windows and Linux. For Microsoft Word, I saved the document as a native .doc and as .xml. For OpenOffice, I saved the document as a native .odt and exported it as .doc. I also saved the content as .txt with Notepad under Windows as a reference point. For archival purposes, I have mirrored all documents referred to in this article on the 3monkey wiki download area.
Raw Results
| File | Size |
|---|---|
| Microsoft Office .doc | 921,088 |
| Microsoft Office .xml | 6,475,669 |
| OpenOffice (XP) .odt | 154,892 |
| OpenOffice (XP) .doc | 1,335,296 |
| OpenOffice (Linux) .odt | 160,045 |
| OpenOffice (Linux) .doc | 1,338,368 |
| Notepad | 417,549 |
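To put these numbers on a common scale, one option is to express each file as a multiple of the plain-text baseline. The sketch below is only an illustration: the byte counts are copied from the table above, while the choice of the Notepad file as the baseline and the helper name `relative_to_baseline` are my own assumptions.

```python
# Sizes in bytes, copied from the Raw Results table.
sizes = {
    "Microsoft Office .doc": 921_088,
    "Microsoft Office .xml": 6_475_669,
    "OpenOffice (XP) .odt": 154_892,
    "OpenOffice (XP) .doc": 1_335_296,
    "OpenOffice (Linux) .odt": 160_045,
    "OpenOffice (Linux) .doc": 1_338_368,
    "Notepad": 417_549,
}

# Assumption: the Notepad .txt file serves as the "content only" reference.
baseline = sizes["Notepad"]


def relative_to_baseline(size, base=baseline):
    """Return the file size as a multiple of the plain-text size."""
    return size / base


for name, size in sizes.items():
    print(f"{name:<26} {size:>10,} bytes  {relative_to_baseline(size):6.2f}x")
```

Read this way, the .odt files come in at well under half the size of the raw text, while every .doc and .xml file is larger than the text it contains.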
Observations
My first observation was that the Linux OpenOffice implementation created slightly larger files than the Windows implementation. This was probably due to the differing versions (2.0.2.7.1 on Linux versus 2.0.3 on Windows). I will revisit this in a later article if it is merited.
My next observation was that the OpenOffice .doc file was significantly larger than the Microsoft Word version. This is likely due to Microsoft’s access to the complete .doc specification, and thus a better understanding of how to optimize the file content and size. For grins, I loaded the OpenOffice .doc with Microsoft Word and saved it natively. I also loaded the Microsoft Word .doc with OpenOffice and saved it both as a .doc and as an .odt. The results of these tests are below.
| File | Size |
|---|---|
| OO .doc loaded/saved in MS | 808,960 |
| MS .doc loaded/saved in OO | 1,277,952 |
| MS .doc loaded/saved as .odt in OO | 155,113 |
Original and compressed sizes by file type:
| File Type | Original Size | Compressed Size |
|---|---|---|
| .doc | 921,088 | 179,648 |
| .xml | 6,475,669 | 228,497 |
| .odt | 154,892 | 153,456 |
| .txt | 417,549 | 104,236 |
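The effect of compression is easier to compare as a percentage saved per format. The following is a minimal sketch that works only from the byte counts in the table above; the compressor used to produce them is not stated here, and the function name `savings_percent` is a label of my own.

```python
# (original, compressed) sizes in bytes, copied from the table above.
rows = {
    ".doc": (921_088, 179_648),
    ".xml": (6_475_669, 228_497),
    ".odt": (154_892, 153_456),
    ".txt": (417_549, 104_236),
}


def savings_percent(original, compressed):
    """Percentage of the original size removed by compression."""
    return 100.0 * (1 - compressed / original)


for ext, (original, compressed) in rows.items():
    saved = savings_percent(original, compressed)
    print(f"{ext:<5} {original:>10,} -> {compressed:>8,} bytes  ({saved:5.1f}% saved)")

# Format with the smallest compressed size.
smallest = min(rows, key=lambda ext: rows[ext][1])
print("smallest after compression:", smallest)
```

The .odt barely shrinks, which is what one would expect from a file that is already compressed, whereas the .xml loses nearly all of its bulk.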
Conclusion
From this limited data sample, I have to declare OpenOffice Writer the champion of round one. Perhaps if Microsoft Word employed a compressed output format, the outcome might have been different. It is actually a little strange that the OpenOffice format, which is based on pure text (XML), is compressed into a binary zip file, while the Microsoft Word format, which is proprietary and binary, is not.
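The "XML inside a zip" idea can be illustrated with Python's standard `zipfile` module. This is only a sketch under stated assumptions: the payload is a made-up, highly repetitive XML string rather than real ODF markup, the package built here is not a valid .odt, and `build_package` is a hypothetical helper name.

```python
import io
import zipfile

# Hypothetical stand-in for a document body: repetitive XML, not real ODF markup.
fake_xml = ("<p>The quick brown fox jumps over the lazy dog.</p>\n" * 2000).encode("utf-8")


def build_package(payload, method):
    """Write the payload as content.xml into an in-memory zip archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=method) as zf:
        zf.writestr("content.xml", payload)
    return buf.getvalue()


deflated = build_package(fake_xml, zipfile.ZIP_DEFLATED)  # compressed entry
stored = build_package(fake_xml, zipfile.ZIP_STORED)      # uncompressed entry

# Inspect the entry the way an unzip tool would list it.
with zipfile.ZipFile(io.BytesIO(deflated)) as zf:
    info = zf.getinfo("content.xml")
    print(info.filename, "raw:", info.file_size, "in archive:", info.compress_size)

print("raw XML bytes:     ", len(fake_xml))
print("zip, stored:       ", len(stored))
print("zip, deflated:     ", len(deflated))
```

The stored archive is slightly larger than the raw XML because of the zip headers, while the deflated one is far smaller. Real documents are much less repetitive than this payload, so the ratio here says nothing about the ratios measured above.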
What Is Up Next?
For the most part, these test cases did not contain much formatting or style information, nor did they include such elements as tables and graphs. I will investigate how these affect efficiency in a later article. But before I do that, I will need to expose more of how ODF works. Therefore, the next few articles in this series will be a primer for the ODF specification.
Until next time…
-3Monkeys
December 29th, 2006 at 1:56 pm
OpenOffice odt versus Microsoft doc…
A comparison, in English, of the ODT and DOC formats…
January 12th, 2007 at 10:20 pm
[...] In the first article in this series, OpenOffice ODF/.odt compared to Microsoft Word .doc, I compared various file types for size efficiency. Of particular interest was the fact that OpenOffice Write stores .odts in a zip format, an implementation of PKZip to be exact. With this knowledge and the Open Document Format standard, we can investigate how certain elements of a document effect its size and overall efficiency. [...]
July 13th, 2007 at 3:40 pm
In all fairness, you should probably be comparing ODF against Microsoft’s new XML format, which is less apples-to-oranges.
August 14th, 2007 at 1:37 am
Which is better shouldn’t be judged by size.
August 15th, 2007 at 6:51 am
Krall…
Useful, thank you!…
January 16th, 2008 at 12:59 am
I think you meant “regardless” in the third paragraph under “Observations.” Irregardless means “with regard.”
January 17th, 2008 at 2:11 am
[...] I think I should share this. But because of the copyright issue, I apologize and am giving it in English. It is enough just to look through the tables and understand them (respectively, File Types and [...]
May 27th, 2008 at 10:48 pm
Kevin: I cannot find a single dictionary that says irregardless means with regard. My print dictionary says see: Regardless. Merriam-Webster online says “nonstandard : regardless”. All of these definitions: http://dictionary.reference.com/browse/irregardless say that it means regardless.
It is nonstandard perhaps, but it does oddly mean regardless nonetheless.
September 12th, 2008 at 8:23 pm
I discovered this with a file that saved to 300k or thereabouts in ODT, and almost 900 in TXT.
Mind boggling.
December 7th, 2008 at 10:28 am
The origin of irregardless is not known for certain, but the consensus among references is that it is a blend of irrespective and regardless, both of which are commonly accepted standard English words. By blending these words, an illogical word is created. “Since the prefix ir- means ‘not’ (as it does with irrespective), and the suffix -less means ‘without,’ irregardless is a double negative.”[1]
That's from wiki.
Basically irregardless is a bastardization of the English language
derived from a lazy crunch of two words when one isn't sure which to use.
And according to my father lol, it's like nails on a chalkboard.
February 14th, 2009 at 4:08 am
LOL! That comment about irregardless reminds me of when my wife used to say “could you refrain from not doing that?” So I comply. I keep right on doing it. It used to really get her mad just because I listened to her! She hasn’t done that in a long time so I guess she got the hint.
January 23rd, 2010 at 2:53 pm
Thanks for the information!
I too would not necessarily say that OpenOffice wins round one. Compressing the documents is obviously not a priority for Microsoft and I think for good reason. I use OpenOffice almost exclusively but I have noticed significantly longer load times. I did not understand why until I read your article. If anyone is emailing either doc or odt they’re idiots. Fortunately, OO makes it very easy to generate a pdf! Now, is there a way to tell OO not to compress the odt? I would guess not.