Archive for the 'XML' Category

OpenOffice .odt Opened Up – Part 3a: Styles/font-face-decls

Wednesday, April 4th, 2007

Overview

In my last article, OpenOffice .odt Opened Up – Part 2: Meta and Settings, I discussed two of the four top-level subdocument elements, office:document-meta and office:document-settings. In this article, I will take a closer look at the office:document-styles element, in particular the office:font-face-decls sub-element. As before, my test cases were produced with the following software:

  • SuSE Linux 10.1
  • OpenOffice 2.0.2.7.1
  • zip 2.31 (March 8th 2005)

The RELAX NG schema language is used to define elements of the specification. The original source document can be downloaded here: oo_part1.odt. The subdocument under observation can be downloaded here: styles.xml.

The office:document-styles element

The office:document-styles root element contains all font face declarations, named styles, automatic styles and master styles needed by the document.

office:document-styles schema

<define name="office-document-styles">
  <element name="office:document-styles">
    <ref name="office-document-common-attrs" />
    <ref name="office-font-face-decls" />
    <ref name="office-styles" />
    <ref name="office-automatic-styles" />
    <ref name="office-master-styles" />
  </element>
</define>
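To see this structure in a real file, here is a short Python sketch that opens an .odt (which is a zip archive), parses styles.xml with the standard library, and lists the direct children of the office:document-styles root. It assumes the ODF 1.0 office namespace URI; the function names are only for illustration.

```python
from __future__ import annotations

import zipfile
import xml.etree.ElementTree as ET
from typing import IO, Union

# Namespace URI bound to the "office" prefix in ODF 1.0 documents.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"


def prefixed(tag: str) -> str:
    """Turn '{office URI}font-face-decls' back into 'office:font-face-decls'."""
    prefix = "{%s}" % OFFICE_NS
    if tag.startswith(prefix):
        return "office:" + tag[len(prefix):]
    return tag  # elements from other namespaces stay in {uri}local form


def styles_children(odt: Union[str, IO[bytes]]) -> list[str]:
    """List the direct children of office:document-styles in styles.xml."""
    # An .odt file is a zip archive; styles.xml is one of its members.
    with zipfile.ZipFile(odt) as archive:
        root = ET.fromstring(archive.read("styles.xml"))
    if root.tag != "{%s}document-styles" % OFFICE_NS:
        raise ValueError("styles.xml has no office:document-styles root")
    return [prefixed(child.tag) for child in root]
```

Calling styles_children("oo_part1.odt") should print the four children in the order the schema gives them, assuming the file follows the usual layout.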

Next let us explore the office:font-face-decls sub-element.

The office:font-face-decls element

This element is actually duplicated in the top-level office:document-content element. A few simple tests indicate that, where the two sub-elements differ, declarations missing entirely from one are supplied by the other, and where the same declaration differs in content, the definition in office:document-styles takes precedence. This behavior is not explicitly defined in the specification.
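The precedence I observed can be expressed as a small merge. This is a sketch of the behavior seen in my tests, not of anything the specification mandates: declarations are keyed on style:name, those from content.xml are loaded first, and those from styles.xml overwrite any with the same name. The helper names are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
NAME = "{%s}name" % STYLE_NS


def font_faces(xml_text: str | bytes) -> dict[str, dict[str, str]]:
    """Map style:name to the attributes of each style:font-face in a subdocument."""
    root = ET.fromstring(xml_text)
    decls = root.find("{%s}font-face-decls" % OFFICE_NS)
    if decls is None:  # the whole element may be omitted
        return {}
    return {face.get(NAME): dict(face.attrib)
            for face in decls.findall("{%s}font-face" % STYLE_NS)}


def effective_font_faces(content_xml: str | bytes,
                         styles_xml: str | bytes) -> dict[str, dict[str, str]]:
    """Merge both lists as observed in my tests: styles.xml wins on a name clash."""
    merged = font_faces(content_xml)       # declarations from office:document-content
    merged.update(font_faces(styles_xml))  # office:document-styles overrides them
    return merged
```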

The office:font-face-decls element consists of style:font-face elements. If you remember, we generated our test document by selecting text from a PDF and pasting that text into an .odt. This produced the following style:font-face elements:

<style:font-face style:name="EIDQUI+CMSLTT10"
                 svg:font-family="EIDQUI+CMSLTT10"/>

<style:font-face style:name="FFWLFJ+CMR10"
                 svg:font-family="FFWLFJ+CMR10"/>

<style:font-face style:name="GRVNVC+CMTT9"
                 svg:font-family="GRVNVC+CMTT9"/>

<style:font-face style:name="HJCZVV+CMTT8"
                 svg:font-family="HJCZVV+CMTT8"/>

<style:font-face style:name="Lucidasans1"
                 svg:font-family="Lucidasans"/>

With the exception of the last element, this looks pretty ugly. The following is a sample of style:font-face elements taken from a newly created document.

<style:font-face style:name="HG Mincho Light J"
                 svg:font-family="&apos;HG Mincho Light J&apos;"
                 style:font-pitch="variable"/>

<style:font-face style:name="Lucidasans"
                 svg:font-family="Lucidasans"
                 style:font-pitch="variable"/>

<style:font-face style:name="Thorndale AMT"
                 svg:font-family="&apos;Thorndale AMT&apos;"
                 style:font-family-generic="roman"
                 style:font-pitch="variable"/>

<style:font-face style:name="Albany AMT"
                 svg:font-family="&apos;Albany AMT&apos;"
                 style:font-family-generic="swiss" />

The reason for this is that OpenDocument font face declarations correspond directly to the @font-face font description of CSS2 and the <font-face> element of SVG, but with two extensions:

  1. OpenDocument font face declarations optionally may have a unique name. This name can be used inside styles as the value of the style:font-name attribute to select a font face declaration immediately. If a font face declaration is referenced this way, the steps described in the CSS2 font matching algorithm for selecting a font declaration based on the font-family, font-style, font-variant, font-weight and font-size descriptors do not take place; the referenced font face declaration is used directly.
  2. Some additional font descriptor attributes may exist.

In other words, svg:font-family="EIDQUI+CMSLTT10" is handled by the SVG font matching algorithm rather than by selecting a named font. SVG is beyond the scope of this article. Reference material for SVG font declarations can be found here.
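Because a named declaration is used directly, resolving a style's font reference is a plain lookup on style:name, with no matching step involved. The following sketch, with hypothetical function names, indexes the declarations in a subdocument and resolves every style:font-name, style:font-name-asian and style:font-name-complex attribute it finds, collecting any names that have no declaration. It deliberately does not attempt the CSS2/SVG matching algorithm.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
SVG_NS = "urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0"

# Attributes through which a style can name a font face declaration.
FONT_NAME_ATTRS = tuple("{%s}%s" % (STYLE_NS, local) for local in
                        ("font-name", "font-name-asian", "font-name-complex"))


def resolve_font_names(xml_text: str | bytes) -> tuple[dict[str, str], set[str]]:
    """Resolve every font-name reference by direct lookup on style:name.

    Returns (referenced name -> svg:font-family, names with no declaration).
    No CSS2/SVG font matching is attempted.
    """
    root = ET.fromstring(xml_text)
    families = {face.get("{%s}name" % STYLE_NS): face.get("{%s}font-family" % SVG_NS)
                for face in root.iter("{%s}font-face" % STYLE_NS)}
    resolved: dict[str, str] = {}
    missing: set[str] = set()
    for elem in root.iter():
        for attr in FONT_NAME_ATTRS:
            ref = elem.get(attr)
            if ref is None:
                continue
            if ref in families:
                resolved[ref] = families[ref]
            else:
                missing.add(ref)
    return resolved, missing
```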

Back to the bigger picture. The benefit we can observe from this is that a predefined set of fonts can be applied to an .odt. By doing this we can ensure that documents contain a consistent set of fonts and eliminate potential redundancy or functional overlap. If a style:font-face is replaced, care must be taken to examine and replace all style:font-name, style:font-name-complex and style:font-name-asian attributes that refer to it as well. While the potential size gains are arguably minimal, the gains in consistent look and output are immeasurable.
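The replacement step can be sketched as follows. The function and its behavior are my own illustration, not an OpenOffice feature: it renames a style:font-face (or drops it if the target name is already declared) and then rewrites every style:font-name, style:font-name-asian and style:font-name-complex attribute that pointed at the old name. Since office:font-face-decls also appears in content.xml, the same function would need to be run on that tree too.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
SVG_NS = "urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0"
NAME = "{%s}name" % STYLE_NS
FAMILY = "{%s}font-family" % SVG_NS
FONT_NAME_ATTRS = tuple("{%s}%s" % (STYLE_NS, local) for local in
                        ("font-name", "font-name-asian", "font-name-complex"))


def replace_font_face(root: ET.Element, old_name: str,
                      new_name: str, new_family: str) -> int:
    """Hypothetical helper: repoint old_name to new_name; return references changed."""
    if old_name == new_name:
        raise ValueError("old_name and new_name must differ")
    # Step 1: fix the declaration itself (list() so removal cannot upset iteration).
    for decls in list(root.iter("{%s}font-face-decls" % OFFICE_NS)):
        faces = decls.findall("{%s}font-face" % STYLE_NS)
        declared = {face.get(NAME) for face in faces}
        for face in faces:
            if face.get(NAME) != old_name:
                continue
            if new_name in declared:
                decls.remove(face)        # target already declared: drop the duplicate
            else:
                face.set(NAME, new_name)  # otherwise rename it in place
                face.set(FAMILY, new_family)
                declared.add(new_name)
    # Step 2: repoint every style attribute that referenced the old declaration.
    changed = 0
    for elem in root.iter():
        for attr in FONT_NAME_ATTRS:
            if elem.get(attr) == old_name:
                elem.set(attr, new_name)
                changed += 1
    return changed
```

For example, replace_font_face(root, "FFWLFJ+CMR10", "Thorndale AMT", "'Thorndale AMT'") would fold one of the PDF-derived faces into the existing Thorndale AMT declaration. Writing the tree back into the .odt (with ET.register_namespace so the office:, style: and svg: prefixes survive) is left out here.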

One option OpenOffice gives the user to tackle this issue is font replacement. Simply choose Tools -> Options, then OpenOffice.org -> Fonts. You should see a dialog similar to the following:

Font Replacement Dialog


The OpenOffice user can simply select which fonts to replace with which others, applying each replacement either Always or Screen only. This is not always a complete solution, though. A more complete solution will be provided in the final installment, OpenOffice .odt Opened Up – Part 3: Styles, in which I will provide an application that optimizes all aspects of the office:document-styles element. Up next is the office:styles element.

Until next time,

-3monkeys

Microsoft Caught Trying to Change Wikipedia Entries

Wednesday, January 24th, 2007

Imagine my surprise when a story I happened to cover yesterday splashed up on my Google Homepage. Google News reports over 200 references to the story. In Microsoft, Office Open XML and A Lie, I reported on Mr. Jelliffe’s offer and blog entry. I’m pleased to see that the story is getting national, top-tier coverage. Here is some of the coverage:

Perhaps this will spark more debate on Microsoft’s motivation for the EOOXML standard.

Update: TechCrunch reported on this today, with more insight than the original AP story.

Until next time-

-3Monkeys

Microsoft, Office Open XML and A Lie

Tuesday, January 23rd, 2007

A few days ago I posted an article, The Open XML Lie. I was misguided in thinking that my arguments, along with those of Rob Weir and Bob Sutor, were so self-evident that at minimum they would be understood. I was wrong, and I blame myself for not providing a clearer definition of the problem. First, I never really defined the “lie”. The “lie” is that OOXML, or Open XML, is an open standard. Wikipedia defines an open standard as follows:

Open standards are publicly available and implementable standards. By allowing anyone to obtain and implement the standard, they can increase compatibility between various hardware and software components, since anyone with the necessary technical know-how and resources can build products that work together with those of the other vendors that base their designs on the standard. Many technical specifications that are sometimes considered standards are proprietary rather than being open, and are only available under restrictive contract terms (if they can be obtained at all) from the organization that owns the copyright for the specification.

Notice that the definition points out that some technical specifications that are sometimes considered standards are proprietary rather than open. One prominent example that comes to mind is the Flash format. Such specifications are not open standards but rather proprietary ones.

Where Open XML fails this definition is the following: “anyone with the necessary technical know-how and resources can build products that work together with those of the other vendors that base their designs on the standard”. The problem lies in the fact that Microsoft is in a unique position to understand and implement such elements of the standard as “2.15.3.6 autoSpaceLikeWord95 (Emulate Word 95 Full-Width Character Spacing)”. This is akin to asking Bob Uecker to hit a baseball like Babe Ruth. No matter how hard Uecker tries, only Ruth could hit like Ruth.

Rick Jelliffe recently received an offer from Microsoft to edit Wikipedia entries for pay. From his post:

Just scanning quickly the Wikipedia entry for OOXML, I see one example straight away: The OOXML specification requires conforming implementations to accept and understand various legacy office applications. But the conformance section to the ISO standard (which is only about page four) specifies conformance in terms of being able to accept the grammar, use the standard semantics for the bits you implement, and document where you do something different. The bits you don’t implement are no-one’s business.

While technically Mr. Jelliffe is correct, any competent organization would, and should, strive to fully implement a standard. While the specification does not require an understanding of various legacy office applications, it certainly limits the number of organizations that could fully implement the specification to exactly one — Microsoft.

ODF proponents are responding to these claims. For contradictions and objections, Grokdoc is hosting the EOOXML Objections. It also seems that Mr. Jelliffe’s colleagues at O’Reilly are supporting ODF over EOOXML. Jean Hollis Weber writes:

Do we need two standards? I think not, and many people (with a lot more technical knowledge than I have) also think not.

Do we need two standards? No. Competing (open) standards offer nothing to the consumer and are simply an extra headache for developers. What is Microsoft’s motivation behind EOOXML? Why would they not adopt a community-supported standard such as ODF? One thought is that by adopting ODF, Microsoft would lose sales of its Office suite. However, if they succeed in standardizing their own format, they can retain sales and lock in users.

Groklaw is questioning Microsoft’s motivation as well. Its article, Searching for Openness in Microsoft’s OOXML and Finding Contradictions, discusses the Novell-Microsoft deal and its effect on interoperability and Microsoft’s past and ongoing record, and provides the details of the ISO standardization process.

See? “Only Novell” can do this. That isn’t interoperability, in the sense that you’d expect from a standard, is it? It’s just another Microsoft partner, maintaining the Microsoft unwillingness to share technical information with real competitors, to my eyes. Why would you even need the interoperability work between Novell and Microsoft, if Microsoft planned to offer a standard the whole world could use equally? Isn’t that what a standard is supposed to mean?

Perhaps the most telling statement against EOOXML being a truly open standard comes from the specification itself. Page 10 states the goal of the EOOXML standard.

The goal is to enable the implementation of the Office Open XML formats by the widest set of tools and platforms, fostering interoperability across office productivity applications and line-of-business systems, as well as to support and strengthen document archival and preservation, all in a way that is fully compatible with the large existing investments in Microsoft Office documents.

The ISO’s deadline for objections and contradictions is February 5th; I’m certain more will be reported as the deadline approaches.
Until next time-

-3Monkeys