OpenOffice .odt Opened Up - Part 1: Overview
Overview
In the first article in this series, "OpenOffice ODF/.odt compared to Microsoft Word .doc", I compared various file types for size efficiency. Of particular interest was the fact that OpenOffice Writer stores .odt files in ZIP format, an implementation of PKZip to be exact. With this knowledge and the Open Document Format standard, we can investigate how certain elements of a document affect its size and overall efficiency.
My test cases were produced with the following software:
- SuSE Linux 10.1
- OpenOffice 2.0.2.7.1
- zip 2.31 (March 8th 2005)
Starting Out
As we previously observed, .odt documents are stored in ZIP format. It is possible to store the document as a single XML file that conforms to the OpenOffice.org document type definition (DTD). It is also possible to store the document as several subdocuments, each with a different document root that represents a particular aspect of the document, such as content or style.
Quoting the Open Document Format for Office Applications (OpenDocument) v1.0 (Second Edition), (ODF Specification):
The OpenDocument format supports the following two ways of document representation:
- As a single XML document.
- As a collection of several subdocuments within a package (see section 17), each of which stores part of the complete document. Each subdocument has a different document root and stores a particular aspect of the XML document. For example, one subdocument contains the style information and another subdocument contains the content of the document. All types of documents, for example, text and spreadsheet documents, use the same document and subdocuments definitions.
There are four types of subdocuments, each with different root elements. Additionally, the single XML document has its own root element, for a total of five different supported root elements. The root elements are summarized in the following table:
| Root Element | Subdocument Content | Subdoc. Name in Package |
| --- | --- | --- |
| office:document | Complete office document in a single XML document. | n/a |
| office:document-content | Document content and automatic styles used in the content. | content.xml |
| office:document-styles | Styles used in the document content and automatic styles used in the styles themselves. | styles.xml |
| office:document-meta | Document meta information, such as the author or the time of the last save action. | meta.xml |
| office:document-settings | Application-specific settings, such as the window size or printer information. | settings.xml |
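The table can be checked against a real package. The following is a minimal sketch, assuming Python's standard zipfile and xml.etree.ElementTree modules; the function name check_roots and the EXPECTED_ROOTS mapping are illustrative names of my own, not part of the specification.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace URI bound to the "office:" prefix in OpenDocument files.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"

# Subdocument name in the package -> expected root element (local name),
# taken from the table above.
EXPECTED_ROOTS = {
    "content.xml": "document-content",
    "styles.xml": "document-styles",
    "meta.xml": "document-meta",
    "settings.xml": "document-settings",
}


def check_roots(odt_path):
    """Map each subdocument name to True if its root element matches the table.

    A subdocument that is missing from the package is reported as False.
    odt_path may be a file name or a file-like object.
    """
    results = {}
    with zipfile.ZipFile(odt_path) as package:
        names = set(package.namelist())
        for member, local_name in EXPECTED_ROOTS.items():
            if member not in names:
                results[member] = False
                continue
            root = ET.fromstring(package.read(member))
            # ElementTree spells qualified names as "{namespace}local".
            results[member] = root.tag == f"{{{OFFICE_NS}}}{local_name}"
    return results
```

Calling check_roots("oo_part1.odt") on the reference document should report whether each of the four subdocuments carries the root element the table assigns to it.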
So, what is in our reference .odt? We will use the Linux-produced document from a prior article (oo_part1.odt) with XML compression disabled. We've done this so that the XML is more human-readable. After we unzip the file using the Linux utility unzip, we have the raw files that make up the package.
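The same inspection can be done without unpacking anything to disk. This is a minimal sketch, assuming Python's standard zipfile module; list_package is a hypothetical helper name. It returns each member's name together with its uncompressed and compressed sizes, which is useful when looking at how individual files contribute to the size of the document.

```python
import zipfile


def list_package(odt_path):
    """Return (name, uncompressed_size, compressed_size) for each package member.

    odt_path may be a file name or a file-like object, since an .odt file
    is an ordinary ZIP archive.
    """
    with zipfile.ZipFile(odt_path) as package:
        return [
            (info.filename, info.file_size, info.compress_size)
            for info in package.infolist()
        ]
```

For example, iterating over list_package("oo_part1.odt") and printing each tuple gives a listing comparable to the output of unzip -l, with the compressed size added.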

As you can see, all four subdocuments named in the specification are present, as well as several other files. In particular, META-INF/manifest.xml lists the contents of the package, including information such as the full path and media type of each file.
The file Thumbnails/thumbnail.png, although part of the package, is not part of the document. The thumbnail image should conform to the Thumbnail Managing Standard (TMS) at www.freedesktop.org, and therefore should be a 24-bit, non-interlaced PNG image with full alpha transparency. The required size for the thumbnails is 128×128 pixels.
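To see what the manifest records, the file entries can be read programmatically. This is a minimal sketch, assuming Python's standard library; read_manifest is a hypothetical helper name, while the manifest namespace URI and the full-path and media-type attributes are those used in OpenDocument packages.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace URI bound to the "manifest:" prefix in META-INF/manifest.xml.
MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"


def read_manifest(odt_path):
    """Return a list of (full_path, media_type) pairs from the package manifest."""
    with zipfile.ZipFile(odt_path) as package:
        root = ET.fromstring(package.read("META-INF/manifest.xml"))
    entries = []
    for entry in root.iter(f"{{{MANIFEST_NS}}}file-entry"):
        # Attributes are namespace-qualified as well.
        full_path = entry.get(f"{{{MANIFEST_NS}}}full-path")
        media_type = entry.get(f"{{{MANIFEST_NS}}}media-type")
        entries.append((full_path, media_type))
    return entries
```

The entry whose full path is "/" describes the package as a whole; for a text document its media type is application/vnd.oasis.opendocument.text.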
Here is the thumbnail from our reference document.
![]()
Having the thumbnail available in the package allows other applications, such as file managers, to preview the document for the user. With a little creative programming, sites such as Google, Yahoo or Ask could extract this thumbnail and preview the document for users with little difficulty.
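As an illustration of how little programming is needed, here is a minimal sketch, assuming Python's standard library, that reads the thumbnail from the package and decodes the PNG header to report its dimensions and format. The function name thumbnail_info is hypothetical; the header layout (8-byte signature followed by the IHDR chunk) comes from the PNG format itself.

```python
import struct
import zipfile

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def thumbnail_info(odt_path, member="Thumbnails/thumbnail.png"):
    """Read the thumbnail from the package and decode its PNG IHDR chunk."""
    with zipfile.ZipFile(odt_path) as package:
        data = package.read(member)
    # Layout: signature (8 bytes), chunk length (4), chunk type "IHDR" (4),
    # then 13 bytes of header data.
    if len(data) < 29 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("thumbnail is not a PNG image")
    width, height, bit_depth, color_type, _compression, _filter, interlace = (
        struct.unpack(">IIBBBBB", data[16:29])
    )
    return {
        "width": width,
        "height": height,
        "bit_depth": bit_depth,
        "color_type": color_type,  # 6 means truecolor with alpha (RGBA)
        "interlaced": interlace != 0,
    }
```

A thumbnail that follows the requirements quoted above would be expected to report 128 for both width and height, a color type of 6 and no interlacing. Writing the bytes returned by package.read() to a file is all it takes to extract the image for a preview.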
Document Elements
The office:document may contain any of the document elements listed below.
- office:document-attrs
- office:document-common-attrs
- office:meta
- office:settings
- office:scripts
- office:font-face-decls
- office:styles
- office:automatic-styles
- office:master-styles
- office:body
When the subdocument method is used, however, elements are restricted to certain subdocuments.
Elements in content.xml
- office:document-content (subdocument root)
- office:document-common-attrs
- office:scripts
- office:font-face-decls
- office:automatic-styles
- office:body
Elements in styles.xml
- office:document-styles (subdocument root)
- office:document-attrs
- office:document-common-attrs
- office:font-face-decls
- office:styles
- office:automatic-styles
- office:master-styles
Elements in meta.xml
- office:document-meta (subdocument root)
- office:document-common-attrs
- office:meta
Elements in settings.xml
- office:document-settings (subdocument root)
- office:document-common-attrs
- office:settings
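The lists above can be compared with what a given package actually contains. This is a minimal sketch, assuming Python's standard library; top_level_elements is a hypothetical helper name. It returns the direct children of a subdocument's root element, written with the office: prefix where they belong to the office namespace.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace URI bound to the "office:" prefix in OpenDocument files.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"


def top_level_elements(odt_path, member):
    """Return the names of the direct children of a subdocument's root element."""
    with zipfile.ZipFile(odt_path) as package:
        root = ET.fromstring(package.read(member))
    names = []
    for child in root:
        tag = child.tag
        if tag.startswith("{"):
            # ElementTree spells qualified names as "{namespace}local".
            namespace, _, local = tag[1:].partition("}")
            if namespace == OFFICE_NS:
                names.append("office:" + local)
            else:
                names.append(tag)  # keep other namespaces fully qualified
        else:
            names.append(tag)
    return names
```

For example, top_level_elements("oo_part1.odt", "content.xml") lists the child elements found in the content subdocument of the reference file, which can then be compared with the list for content.xml above.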
What’s Up Next?
At this point, we have a clear understanding of the subdocument method that OpenOffice applies in its ODF implementation, and we know which top-level elements are handled by each subdocument.
In the next article, we will ease into the subdocument elements by exploring the office:document-meta and office:document-settings elements. These two elements are rather simple and will not require as much review as office:document-content or office:document-styles.
Until next time.
-3Monkeys
January 15th, 2007 at 9:28 pm
Nice article that brings order to something I partially learned some years ago because of a problem I had with an OO document. I was working on a very important OO document when my PC suddenly rebooted for no reason, and the file was corrupted afterwards. A friend told me that OO documents were just XML stored in ZIP format, so I could easily fix the file myself just by editing the XML.
If you use OO for important files, read this article (and the first one). It can save your life.
Good work 3monkeys