In the first article in this series, OpenOffice ODF/.odt compared to Microsoft Word .doc, I compared various file types for size efficiency. Of particular interest was the fact that OpenOffice Writer stores .odt files in ZIP format, an implementation of PKZip to be exact. With this knowledge and the Open Document Format standard, we can investigate how certain elements of a document affect its size and overall efficiency.
My test cases were produced with the following software:
- SuSE Linux 10.1
- OpenOffice 220.127.116.11.1
- zip 2.31 (March 8th 2005)
As we previously observed, .odt documents are stored in ZIP format. A document can be stored as a single XML file that conforms to the OpenOffice.org document type definition (DTD). It can also be stored as several subdocuments, each with a different document root that represents a particular aspect of the document, such as content or style.
Quoting the Open Document Format for Office Applications (OpenDocument) v1.0 (Second Edition), (ODF Specification):
The OpenDocument format supports the following two ways of document representation:
- As a single XML document.
- As a collection of several subdocuments within a package (see section 17), each of which stores part of the complete document. Each subdocument has a different document root and stores a particular aspect of the XML document. For example, one subdocument contains the style information and another subdocument contains the content of the document. All types of documents, for example, text and spreadsheet documents, use the same document and subdocuments definitions.
There are four types of subdocuments, each with different root elements. Additionally, the single XML document has its own root element, for a total of five different supported root elements. The root elements are summarized in the following table:
| Root Element | Subdocument Content | Subdoc. Name in Package |
| --- | --- | --- |
| office:document | Complete office document in a single XML document. | n/a |
| office:document-content | Document content and automatic styles used in the content. | content.xml |
| office:document-styles | Styles used in the document content and automatic styles used in the styles themselves. | styles.xml |
| office:document-meta | Document meta information, such as the author or the time of the last save action. | meta.xml |
| office:document-settings | Application-specific settings, such as the window size or printer information. | settings.xml |
So, what is in our reference .odt? We will use the Linux-produced document from a prior article (oo_part1.odt) with XML compression disabled. We’ve done this so that the XML is more human readable. After we unzip the file using the Linux utility unzip, we have the raw files as shown below.
As you can see, all four subdocuments named in the specification are present, along with several other files. In particular, META-INF/manifest.xml lists the contents of the package, including information such as the full path and type of each file.
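A comparable listing can be produced from Python as well. The following is a minimal sketch using only the standard zipfile module; the function names list_package and format_listing are my own, not part of any OpenOffice tooling. It reports the uncompressed and compressed size of each file in the package, which is the information we care about when judging size efficiency.

```python
import zipfile


def list_package(source):
    """Return (name, uncompressed_size, compressed_size) for each package member.

    `source` may be a path to an .odt file or a binary file-like object.
    """
    with zipfile.ZipFile(source) as package:
        return [
            (info.filename, info.file_size, info.compress_size)
            for info in package.infolist()
        ]


def format_listing(entries):
    """Render the entries as aligned text lines, one per package member."""
    lines = []
    for name, raw_size, packed_size in entries:
        lines.append(f"{raw_size:>10} {packed_size:>10}  {name}")
    return "\n".join(lines)
```

Calling print(format_listing(list_package("oo_part1.odt"))) against the reference document should list the same files that unzip extracts.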
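To see what the manifest records, we can read it straight out of the package. This is a sketch rather than a complete manifest reader: it assumes the manifest uses the standard OpenDocument manifest namespace and only looks at the full-path and media-type attributes of each file-entry element. The helper name read_manifest is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by META-INF/manifest.xml in OpenDocument packages.
MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"


def read_manifest(source):
    """Return {full-path: media-type} for every file-entry in the manifest.

    `source` may be a path to an .odt file or a binary file-like object.
    """
    with zipfile.ZipFile(source) as package:
        root = ET.fromstring(package.read("META-INF/manifest.xml"))

    entries = {}
    for entry in root.iter(f"{{{MANIFEST_NS}}}file-entry"):
        path = entry.get(f"{{{MANIFEST_NS}}}full-path")
        entries[path] = entry.get(f"{{{MANIFEST_NS}}}media-type")
    return entries
```

Comparing the keys of read_manifest("oo_part1.odt") with the files that unzip extracted is a quick way to confirm that the manifest and the package agree.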
The file Thumbnails/thumbnail.png, although part of the package, is not part of the document. The thumbnail image should conform to the Thumbnail Managing Standard (TMS) at www.freedesktop.org, and therefore should be a 24-bit, non-interlaced PNG image with full alpha transparency. The required size for the thumbnails is 128×128 pixels.
Here is the thumbnail from our reference document.
Having the thumbnail available in the package allows other applications, such as file managers, to preview the document for the user. With a little creative programming, sites such as Google, Yahoo or Ask could extract this thumbnail and preview the document for users with little difficulty.
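Extracting the thumbnail and checking it against those requirements takes very little code. The sketch below reads the PNG header (the IHDR chunk) directly rather than relying on an imaging library. The function names png_header and thumbnail_header are my own, and the check is only as deep as the header: it reports the dimensions, the color type (6 means truecolor with alpha in the PNG format) and whether the image is interlaced.

```python
import struct
import zipfile

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_header(data):
    """Parse the IHDR chunk that follows the PNG signature."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("missing IHDR chunk")
    # IHDR layout: width, height, bit depth, color type,
    # compression method, filter method, interlace method.
    width, height, bit_depth, color_type, _comp, _filt, interlace = struct.unpack(
        ">IIBBBBB", data[16:29]
    )
    return {
        "width": width,
        "height": height,
        "bit_depth": bit_depth,
        "color_type": color_type,
        "interlaced": interlace != 0,
    }


def thumbnail_header(source):
    """Read Thumbnails/thumbnail.png from an .odt package and parse its header."""
    with zipfile.ZipFile(source) as package:
        return png_header(package.read("Thumbnails/thumbnail.png"))
```

For a conforming thumbnail, thumbnail_header("oo_part1.odt") would be expected to report a width and height of 128 and interlaced as False.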
The office:document may contain any of the document elements listed below.
When the subdocument method is used, however, elements are restricted to certain subdocuments.
Elements in content.xml
- office:document-content (subdocument root)
Elements in styles.xml
- office:document-styles (subdocument root)
Elements in meta.xml
- office:document-meta (subdocument root)
Elements in settings.xml
- office:document-settings (subdocument root)
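The pairing of file names and root elements above is easy to verify against a real package. Here is a small sketch that parses each of the four subdocuments and compares its root element with the one the specification calls for. It assumes the standard OpenDocument office namespace; the names EXPECTED_ROOTS and check_subdocument_roots are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace bound to the "office:" prefix in OpenDocument files.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"

# Subdocument name in the package -> local name of its expected root element.
EXPECTED_ROOTS = {
    "content.xml": "document-content",
    "styles.xml": "document-styles",
    "meta.xml": "document-meta",
    "settings.xml": "document-settings",
}


def check_subdocument_roots(source):
    """Return {member: True, False, or None} for each expected subdocument.

    True means the root element matches, False means it does not,
    and None means the member is missing from the package.
    """
    results = {}
    with zipfile.ZipFile(source) as package:
        names = set(package.namelist())
        for member, local_name in EXPECTED_ROOTS.items():
            if member not in names:
                results[member] = None
                continue
            root = ET.fromstring(package.read(member))
            results[member] = root.tag == f"{{{OFFICE_NS}}}{local_name}"
    return results
```

Run against the reference document, check_subdocument_roots("oo_part1.odt") should report True for all four subdocuments if the package follows the specification.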
What’s Up Next?
At this point, we have a clear understanding of the subdocument method that OpenOffice applies to its ODF implementation, and we know which top-level elements are handled by each subdocument.
In the next article, we will ease into the subdocument elements by exploring the office:document-meta and office:document-settings elements. These two elements are rather simple and will not require as much review as office:document-content or office:document-styles.
Until next time.