Archive for the 'Open Source' Category

Microsoft, Office Open XML and A Lie

Tuesday, January 23rd, 2007

A few days ago I posted an article, The Open XML Lie. I mistakenly assumed that my arguments, along with those of Rob Weir and Bob Sutor, were so self-evident that they would at minimum be understood. I was wrong, and I blame myself for not providing a clearer definition of the problem. First, I never really defined what the “lie” was. The “lie” is that OOXML, or Open XML, is an open standard. An open standard is defined by Wikipedia as follows:

Open standards are publicly available and implementable standards. By allowing anyone to obtain and implement the standard, they can increase compatibility between various hardware and software components, since anyone with the necessary technical know-how and resources can build products that work together with those of the other vendors that base their designs on the standard. Many technical specifications that are sometimes considered standards are proprietary rather than being open, and are only available under restrictive contract terms (if they can be obtained at all) from the organization that owns the copyright for the specification.

Notice that the definition points out that some technical specifications referred to as standards are proprietary rather than open. One prominent example that comes to mind is Flash. Such specifications are not open standards but rather proprietary ones.

Open XML fails this definition on the following point: “anyone with the necessary technical know-how and resources can build products that work together with those of the other vendors that base their designs on the standard”. The problem is that Microsoft is in a unique position to understand and implement such elements of the standard as “2.15.3.6 autoSpaceLikeWord95 (Emulate Word 95 Full-Width Character Spacing)“. This is akin to asking Bob Uecker to hit a baseball like Babe Ruth. No matter how hard Uecker tries, only Ruth could hit like Ruth.

Rick Jelliffe recently received an offer from Microsoft to edit Wikipedia entries for pay. From his post:

Just scanning quickly the Wikipedia entry for OOXML, I see one example straight away: The OOXML specification requires conforming implementations to accept and understand various legacy office applications. But the conformance section to the ISO standard (which is only about page four) specifies conformance in terms of being able to accept the grammar, use the standard semantics for the bits you implement, and document where you do something different. The bits you don’t implement are no-one’s business.

While technically Mr. Jelliffe is correct, any competent organization would, and should, strive to fully implement a standard. While the specification does not require an understanding of various legacy office applications, it certainly limits the number of organizations that could fully implement the specification to exactly one — Microsoft.

ODF proponents are responding to these claims. For contradictions and objections, Grokdoc is hosting the EOOXML Objections. It also seems that Mr. Jelliffe’s colleagues at O’Reilly are supporting ODF over EOOXML. Jean Hollis Weber writes:

Do we need two standards? I think not, and many people (with a lot more technical knowledge than I have) also think not.

Do we need two standards? No. Competing (open) standards offer nothing to the consumer and are simply an extra headache for developers. What is Microsoft’s motivation behind EOOXML? Why would they not adopt a community-supported standard such as ODF? One thought is that by adopting ODF, Microsoft would lose sales of its Office suite applications. However, if they succeed in standardizing a format of their own design, they can retain sales and lock in users.

Groklaw is questioning Microsoft’s motivation as well. The article Searching for Openness in Microsoft’s OOXML and Finding Contradictions discusses the Novell-Microsoft deal and its effect on interoperability, reviews Microsoft’s past and ongoing record, and provides the details of the ISO standardization process.

See? “Only Novell” can do this. That isn’t interoperability, in the sense that you’d expect from a standard, is it? It’s just another Microsoft partner, maintaining the Microsoft unwillingness to share technical information with real competitors, to my eyes. Why would you even need the interoperability work between Novell and Microsoft, if Microsoft planned to offer a standard the whole world could use equally? Isn’t that what a standard is supposed to mean?

Perhaps the most telling statement against EOOXML being a truly open standard comes from the specification itself. Page 10 states the goal of the EOOXML standard.

The goal is to enable the implementation of the Office Open XML formats by the widest set of tools and platforms, fostering interoperability across office productivity applications and line-of-business systems, as well as to support and strengthen document archival and preservation, all in a way that is fully compatible with the large existing investments in Microsoft Office documents.

The ISO’s deadline for objections and contradictions is February 5th; I’m certain more will be reported as the deadline approaches.
Until next time-

-3Monkeys


The Open XML Lie

Wednesday, January 17th, 2007

Rob Weir recently posted “How to hire Guillaume Portes“, which also appeared on Slashdot; both are great resources for additional comments and debate. The basic premise of Rob’s article was that the Microsoft Open XML Specification is similar to a job description written so that only one respondent could qualify. Such a job description might read as follows:

  • 5 years experience with Java, J2EE and web development, PHP, XSLT
  • Fluency in French and Corsican
  • Experience with the Llama farming industry
  • Mole on left shoulder
  • Sister named Bridgette

While the analogy is perhaps a little extreme, he goes on to show that the Open XML Specification is indeed written to accommodate Microsoft products. I will not bore you with all of his examples, but here are a few worth inspecting.

2.15.3.6 autoSpaceLikeWord95 (Emulate Word 95 Full-Width Character Spacing)

This element specifies that applications shall emulate the behavior of a previously existing word processing application (Microsoft Word 95) when determining the spacing between full-width East Asian characters in a document’s content.

[Guidance: To faithfully replicate this behavior, applications must imitate the behavior of that application, which involves many possible behaviors and cannot be faithfully placed into narrative for this Office Open XML Standard. If applications wish to match this behavior, they must utilize and duplicate the output of those applications. It is recommended that applications not intentionally replicate this behavior as it was deprecated due to issues with its output, and is maintained only for compatibility with existing documents from that application. end guidance]

and

2.15.3.51 suppressTopSpacingWP (Emulate WordPerfect 5.x Line Spacing)

This element specifies that applications shall emulate the behavior of a previously existing word processing application (WordPerfect 5.x) when determining the resulting spacing between lines in a paragraph using the spacing element (§2.3.1.33). This emulation typically results in line spacing which is reduced from its normal size.

[Guidance: To faithfully replicate this behavior, applications must imitate the behavior of that application, which involves many possible behaviors and cannot be faithfully placed into narrative for this Office Open XML Standard. If applications wish to match this behavior, they must utilize and duplicate the output of those applications. It is recommended that applications not intentionally replicate this behavior as it was deprecated due to issues with its output, and is maintained only for compatibility with existing documents from that application. end guidance]

This gluttony is further illustrated by the sheer complexity of the specification. As many 3Monkey readers know, I’m conducting a series of articles comparing the ODT and DOC formats. With Microsoft Office due to hit consumer shelves at the end of January, I thought I would get a jump on things and download the OOXML specification. To my surprise, the Open XML specification comes in 5 different PDF files with 6 accompanying electronic annexes, in excess of 43 megabytes. For comparison, the ODF specification is a single 11 megabyte PDF with 3 separate XML schemas. The ODF specification weighs in at a mere 722 pages, whereas the largest PDF in the Open XML specification alone is 5219 pages long.

While I have to wonder at Microsoft’s motivation for producing the Open XML standard, I do not have to guess at the motivation for ODF. Started as early as 1999, ODF was designed as an open and implementation-neutral file format. The open specification process started in 2000 with the foundation of the OpenOffice.org open-source project. An even higher level of openness was established in 2002 with the creation of the OASIS Open Office Technical Committee (TC). The format was fully adopted by its early adopters, including OpenOffice.org 1.0 and StarOffice 6, both introduced in May of 2002, and KOffice, which adopted the ODF format in August of 2003.

IBM has provided the one voice of reason in this travesty. IBM voted against the certification of Microsoft Office document formats (Open XML) as an international standard at a general assembly of Ecma International in early December 2006. Bob Sutor, IBM’s vice president of standards and open source, confirms Mr. Weir’s sentiment that the ODF standard is of superior quality, versus Open XML which he considers to be “a vendor-dictated spec that documents proprietary products via XML“.

Open XML has been submitted to the ISO for standardization. I encourage each and every reader to oppose this standardization effort. Further details will be outlined on this blog as they become available.

Until next time-

-3Monkeys


OpenOffice .odt Opened Up – Part 2: Meta and Settings

Monday, January 15th, 2007

Overview

In my last article, OpenOffice .odt Opened Up – Part 1: Overview, I discussed the overall package scheme for ODT documents and pointed out that OpenOffice uses the subdocument form. In this article, I will take a closer look at two of the simpler top-level subdocuments of the four included in the specification: the office:document-meta and office:document-settings elements.

As before, my test cases were produced with the following software:

  • SuSE Linux 10.1
  • OpenOffice 2.0.2.7.1
  • zip 2.31 (March 8th 2005)

The original source document can be downloaded as oo_part1.odt, and the two subdocuments under observation as meta.xml and settings.xml.

The office:document-meta element

The office:document-meta element provides metadata about the document, such as author, creation time and editing time, among other data. The metadata elements can be either pre-defined or user-defined. Pre-defined elements should be respected and updated by the editing application. User-defined elements provide a more generic way of storing and using metadata. Each user-defined metadata element is composed of a name, a type and a value. Supporting applications can access this information and display it to the user based on its type. Both pre-defined and user-defined elements should be referenceable through appropriate document text fields.

The pre-defined metadata elements are largely based upon the metadata standards developed by the Dublin Core Metadata Initiative (http://www.dublincore.org), thus many of the elements use the dc namespace.

There are 18 pre-defined metadata elements, listed below:

  • meta:generator
  • dc:title
  • dc:description
  • dc:subject
  • meta:keyword – Can appear multiple times
  • meta:initial-creator
  • dc:creator – Last modifier
  • meta:printed-by
  • meta:creation-date – Format YYYY-MM-DDThh:mm:ss
  • dc:date – Last modification date, format YYYY-MM-DDThh:mm:ss
  • meta:print-date
  • meta:template
  • meta:auto-reload
  • meta:hyperlink-behaviour
  • dc:language – As defined by RFC3066, with ISO 639 language code and ISO 3166 country code
  • meta:editing-cycles
  • meta:editing-duration – Format PnYnMnDTnHnMnS
  • meta:document-statistic – Can appear multiple times, ODT attributes below
    • meta:page-count
    • meta:table-count
    • meta:draw-count
    • meta:image-count
    • meta:ole-object-count
    • meta:paragraph-count
    • meta:word-count
    • meta:character-count
    • meta:row-count
    • meta:frame-count
    • meta:sentence-count
    • meta:syllable-count
    • meta:non-whitespace-character-count
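The PnYnMnDTnHnMnS format used by meta:editing-duration is easier to work with once it is converted into a real duration value. The following is a minimal Python sketch, not part of the original meta.pl. The function name parse_editing_duration is hypothetical, and the sketch assumes that editing durations only use the day, hour, minute and second fields; years and months have no fixed length, so it rejects them rather than guessing.

```python
import re
from datetime import timedelta

# Matches durations such as "PT1H2M3S" or "P1DT0H5M12S".
# Assumption: year and month designators are not supported here.
_DURATION = re.compile(
    r"^P(?:(\d+)D)?"
    r"(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?)?$"
)

def parse_editing_duration(text):
    """Convert a day/time duration string into a timedelta."""
    match = _DURATION.match(text.strip())
    if match is None:
        raise ValueError("unsupported duration: %r" % text)
    days, hours, minutes, seconds = match.groups()
    return timedelta(
        days=int(days or 0),
        hours=int(hours or 0),
        minutes=int(minutes or 0),
        seconds=float(seconds or 0),
    )
```

A call such as parse_editing_duration("PT1H2M3S") yields a timedelta of one hour, two minutes and three seconds, which can then be formatted or summed like any other duration.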

As I suggested regarding the thumbnail image in a prior article, this information could easily be extracted and displayed to users of popular search engines such as Google, Yahoo and Ask. Additionally these services could allow the user to narrow their search based on certain criteria found in the metadata.

Provided here, meta.pl, is an example written in Perl using the XML::Simple package that extracts the last editor, the modification date, and the page and word counts. If anyone would like to contribute ports of this to another language, feel free. If there is significant interest, I will cover XML::Simple or other XML packages or utilities.
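As a starting point for such a port, here is a minimal Python sketch that pulls the same four values out of meta.xml using only the standard library. It is not a translation of meta.pl, whose source is not reproduced here. The function names parse_meta and read_meta and the SAMPLE_META document are hypothetical, and the sample values are made up for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace URIs used by ODF 1.0 metadata.
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "meta": "urn:oasis:names:tc:opendocument:xmlns:meta:1.0",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hand-written sample for illustration only; not taken from oo_part1.odt.
SAMPLE_META = """
<office:document-meta
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <office:meta>
    <dc:creator>3Monkeys</dc:creator>
    <dc:date>2007-01-15T10:30:00</dc:date>
    <meta:document-statistic meta:page-count="2" meta:word-count="345"/>
  </office:meta>
</office:document-meta>
"""

def parse_meta(xml_data):
    """Extract last editor, modification date, page and word count."""
    root = ET.fromstring(xml_data)
    meta = root.find("office:meta", NS)
    stats = meta.find("meta:document-statistic", NS)

    def count(name):
        # Statistics are attributes in the meta namespace; any may be absent.
        if stats is None:
            return None
        value = stats.get("{%s}%s" % (NS["meta"], name))
        return int(value) if value is not None else None

    return {
        "last_editor": meta.findtext("dc:creator", default=None, namespaces=NS),
        "modified": meta.findtext("dc:date", default=None, namespaces=NS),
        "page_count": count("page-count"),
        "word_count": count("word-count"),
    }

def read_meta(odt):
    """Read meta.xml out of an .odt package (a path or a file-like object)."""
    with zipfile.ZipFile(odt) as package:
        return parse_meta(package.read("meta.xml"))
```

Calling read_meta("oo_part1.odt") on a downloaded copy of the test document should return a dictionary with the four fields. Any field the editing application did not write comes back as None.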

The office:document-settings element

Next we take a look at the office:document-settings element. This element contains application settings that may impact the document. It does not contain a complete set of application settings. Because these are application settings, no particular entries are defined in the ODF Specification. An office:document-settings element will contain one or more config:config-item-set elements; these elements will in turn contain config:config-item, config:config-item-set, config:config-item-map-named or config:config-item-map-indexed elements. The discovery of how each of these elements works with a particular application, such as OpenOffice.org Writer, is left as an exercise to the reader, since they do not directly affect our goal of understanding ODF as it relates to Microsoft’s .doc format. Suffice it to say, the office:document-settings element is of little interest to all but an application developer.
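For readers who do want to poke at the nesting described above, the following minimal Python sketch flattens a settings document into a list of named items. The function name walk_settings and the SAMPLE_SETTINGS document are hypothetical; the sample is hand-written for illustration and is not taken from the settings.xml of the test document. Wrapper elements that carry no config:name attribute are assumed to add nothing to the path.

```python
import xml.etree.ElementTree as ET

CONFIG_NS = "urn:oasis:names:tc:opendocument:xmlns:config:1.0"

# Hand-written sample for illustration only.
SAMPLE_SETTINGS = """
<office:document-settings
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:config="urn:oasis:names:tc:opendocument:xmlns:config:1.0">
  <office:settings>
    <config:config-item-set config:name="view-settings">
      <config:config-item config:name="ViewAreaTop" config:type="int">0</config:config-item>
      <config:config-item-map-indexed config:name="Views">
        <config:config-item-map-entry>
          <config:config-item config:name="ZoomFactor" config:type="short">100</config:config-item>
        </config:config-item-map-entry>
      </config:config-item-map-indexed>
    </config:config-item-set>
  </office:settings>
</office:document-settings>
"""

def walk_settings(element, path=()):
    """Yield (path, type, value) for every config:config-item below element."""
    for child in element:
        name = child.get("{%s}name" % CONFIG_NS)
        if child.tag == "{%s}config-item" % CONFIG_NS:
            value = (child.text or "").strip()
            yield path + (name,), child.get("{%s}type" % CONFIG_NS), value
        else:
            # Item sets, maps and map entries: descend, extending the path
            # only when the container carries a name.
            next_path = path + (name,) if name else path
            yield from walk_settings(child, next_path)

def list_settings(xml_data):
    """Return all config items of a settings document as a list."""
    return list(walk_settings(ET.fromstring(xml_data)))
```

Each result is a tuple of the item's path through the named containers, its declared type, and its text value, which is usually enough to see which settings an application wrote.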

What’s up next?

Next up we will investigate the significantly more interesting office:document-styles element. We will also learn some optimization techniques that we can apply to this element and perhaps discover a little of how it relates to the office:document-content element.

Until next time,

-3Monkeys
