Archive for January, 2007

OpenOffice .odt Opened Up – Part 1: Overview

Friday, January 12th, 2007

Overview

In the first article in this series, OpenOffice ODF/.odt compared to Microsoft Word .doc, I compared various file types for size efficiency. Of particular interest was the fact that OpenOffice Writer stores .odt files in a zip format, an implementation of PKZip to be exact. With this knowledge and the Open Document Format standard, we can investigate how certain elements of a document affect its size and overall efficiency.

My test cases were produced with the following software:

  • SuSE Linux 10.1
  • OpenOffice 2.0.2.7.1
  • zip 2.31 (March 8th 2005)

Starting Out

As we previously observed, .odt documents are stored in ZIP format. It is possible to store the document as a single XML file that conforms to the OpenOffice.org document type definition (DTD). It is also possible to store the document as several subdocuments, each with a different document root that represents a particular aspect of the document, such as content or style.

Quoting the Open Document Format for Office Applications (OpenDocument) v1.0 (Second Edition) (ODF Specification):

The OpenDocument format supports the following two ways of document representation:

  • As a single XML document.
  • As a collection of several subdocuments within a package (see section 17), each of which stores part of the complete document. Each subdocument has a different document root and stores a particular aspect of the XML document. For example, one subdocument contains the style information and another subdocument contains the content of the document. All types of documents, for example, text and spreadsheet documents, use the same document and subdocuments definitions.

There are four types of subdocuments, each with different root elements. Additionally, the single XML document has its own root element, for a total of five different supported root elements. The root elements are summarized in the following table:

Root Element | Subdocument Content | Subdoc. Name in Package
office:document | Complete office document in a single XML document. | n/a
office:document-content | Document content and automatic styles used in the content. | content.xml
office:document-styles | Styles used in the document content and automatic styles used in the styles themselves. | styles.xml
office:document-meta | Document meta information, such as the author or the time of the last save action. | meta.xml
office:document-settings | Application-specific settings, such as the window size or printer information. | settings.xml

So, what is in our reference .odt? We will use the Linux-produced document from a prior article (oo_part1.odt) with XML compression disabled, so that the XML is more human readable. After we unzip the file using the Linux utility unzip, we have the raw files as shown below.

.odt unzipped directory tree
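
If you would rather not unzip the package by hand, a similar listing can be produced from a short script. The following is a minimal sketch using Python's standard zipfile module. The function name list_package is my own invention, and oo_part1.odt is simply the reference document mentioned above.

```python
import zipfile


def list_package(odt):
    """Return (name, uncompressed_size, compressed_size) for each package member.

    `odt` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(odt) as package:
        return [
            (info.filename, info.file_size, info.compress_size)
            for info in package.infolist()
        ]
```

Calling list_package("oo_part1.odt") returns one tuple per file in the package. Comparing the two sizes gives a rough idea of how well each file compressed.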

As you can see, all four subdocuments described in the specification are present, as well as several other files. In particular, META-INF/manifest.xml lists the contents of the package, including information such as the full path and type of each file.
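
To see what the manifest records, we can read it straight out of the package. This is a sketch only: read_manifest is a hypothetical helper name, and it assumes the manifest uses the standard OpenDocument manifest namespace with manifest:file-entry elements carrying manifest:full-path and manifest:media-type attributes.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by the elements and attributes in META-INF/manifest.xml.
MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"


def read_manifest(odt):
    """Return a list of (full_path, media_type) pairs from the package manifest."""
    with zipfile.ZipFile(odt) as package:
        root = ET.fromstring(package.read("META-INF/manifest.xml"))
    entries = []
    for entry in root.iter(f"{{{MANIFEST_NS}}}file-entry"):
        full_path = entry.get(f"{{{MANIFEST_NS}}}full-path")
        media_type = entry.get(f"{{{MANIFEST_NS}}}media-type")
        entries.append((full_path, media_type))
    return entries
```

Each pair returned by read_manifest("oo_part1.odt") corresponds to one entry in the manifest, in the order the entries appear in the file.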

The file Thumbnails/thumbnail.png, although part of the package, is not part of the document. The thumbnail image should conform to the Thumbnail Managing Standard (TMS) at www.freedesktop.org, and therefore should be a 24-bit, non-interlaced PNG image with full alpha transparency. The required size for the thumbnail is 128×128 pixels.

Here is the thumbnail from our reference document.

thumbnail.png

Having the thumbnail available in the package allows other applications, such as file managers, to preview the document to the user. With a little creative programming, sites such as Google, Yahoo or Ask could extract this thumbnail and preview the document for users with little difficulty.
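
As a rough illustration of how little programming that takes, the sketch below pulls the thumbnail out of the package and reads the size and format fields from the PNG header, without any imaging library. The helper name thumbnail_info is hypothetical, and the path Thumbnails/thumbnail.png is the one found in our reference document; other producers may omit the thumbnail.

```python
import struct
import zipfile

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def thumbnail_info(odt):
    """Return (png_bytes, header_fields) for the thumbnail stored in the package."""
    with zipfile.ZipFile(odt) as package:
        data = package.read("Thumbnails/thumbnail.png")
    # A PNG starts with an 8-byte signature followed by the IHDR chunk:
    # 4-byte length, 4-byte type, then 13 bytes of header fields.
    if len(data) < 29 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("thumbnail is not a PNG image")
    width, height, bit_depth, color_type, _comp, _filt, interlace = struct.unpack(
        ">IIBBBBB", data[16:29]
    )
    fields = {
        "width": width,
        "height": height,
        "bit_depth": bit_depth,      # bits per channel
        "color_type": color_type,    # 6 means truecolour with alpha
        "interlaced": interlace != 0,
    }
    return data, fields
```

The returned bytes can be written to disk or served as-is, and the header fields can be checked against the 128×128, non-interlaced requirement quoted above.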

Document Elements

The office:document element may contain any of the document elements listed below.

  • office:document-attrs
  • office:document-common-attrs
  • office:meta
  • office:settings
  • office:scripts
  • office:font-face-decls
  • office:styles
  • office:automatic-styles
  • office:master-styles
  • office:body

When the subdocument method is used, however, elements are restricted to certain subdocuments.

Elements in content.xml

  • office:document-content (subdocument root)
  • office:document-common-attrs
  • office:scripts
  • office:font-face-decls
  • office:automatic-styles
  • office:body

Elements in styles.xml

  • office:document-styles (subdocument root)
  • office:document-attrs
  • office:document-common-attrs
  • office:font-face-decls
  • office:styles
  • office:automatic-styles
  • office:master-styles

Elements in meta.xml

  • office:document-meta (subdocument root)
  • office:document-common-attrs
  • office:meta

Elements in settings.xml

  • office:document-settings (subdocument root)
  • office:document-common-attrs
  • office:settings
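
The lists above can be checked against a real package. The sketch below reports the root element and the top-level child elements of each subdocument. The names top_level_elements and _label are my own, and the code assumes the standard OpenDocument office namespace. As far as I can tell, entries such as office:document-attrs and office:document-common-attrs describe attributes on the root rather than child elements, so do not expect them to appear in the output.

```python
import zipfile
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
SUBDOCUMENTS = ("content.xml", "styles.xml", "meta.xml", "settings.xml")


def _label(tag):
    """Turn '{namespace}local' into 'office:local' for office-namespace tags."""
    prefix = f"{{{OFFICE_NS}}}"
    return "office:" + tag[len(prefix):] if tag.startswith(prefix) else tag


def top_level_elements(odt):
    """Map each subdocument present in the package to (root, [children])."""
    result = {}
    with zipfile.ZipFile(odt) as package:
        present = set(package.namelist())
        for name in SUBDOCUMENTS:
            if name not in present:
                continue  # not every package has all four subdocuments
            root = ET.fromstring(package.read(name))
            children = [_label(child.tag) for child in root]
            result[name] = (_label(root.tag), children)
    return result
```

For example, top_level_elements("oo_part1.odt")["meta.xml"] gives the root element of meta.xml along with the elements directly beneath it.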

What’s Up Next?

At this point we have a clear understanding of the subdocument method that OpenOffice applies to its ODF implementation, and we know which top-level elements are handled by each subdocument.

In the next article, we will ease into the subdocument elements by exploring the office:document-meta and office:document-settings elements. These two elements are rather simple and will not require as much review as office:document-content or office:document-styles.

Until next time.

-3Monkeys

Digg Duplicates, A Fundamental Flaw Exposed

Tuesday, January 9th, 2007

A while back I wrote Observations on Digg’s Quality. Today I found another fundamental flaw with digg’s quality. Here’s how it happened.

I was taking a quick break from the day job earlier and was checking my RSS feeds and saw this:

Apple Phone Feed

Three stories about the iPhone had made the Technology front page. However, upon opening the Technology front page, only one of them was listed; the other two had apparently been buried. Well, I think to myself, this is great, the community is policing itself well. Then I open the front page story to read it. What do I find? A link to a flickr photo, and a lousy picture at that.

That got me wondering: what were the other two posts? One was a blog post that contained only a picture (a better picture, mind you), and the other was an engadget article containing not only some prose on the subject but also 50 separate pictures. Considering that the engadget post had both prose and many more pictures than the other two, why was it buried rather than the single poor-quality post?

Were the digg administrators closing front page stories that were duplicates? Were digg users mass burying without reading? I can’t begin to guess, but this certainly points out a severe flaw in how digg handles duplicates. Perhaps more interesting was the timeline of the submissions.

  1. http://digg.com/apple/Apple_Announces_iPhone
    (original of the front page article with same title, engadget, never made front page)
  2. http://digg.com/apple/Apple_Releases_iPhone
    (1 minute later, engadget, made front page but buried)
  3. http://digg.com/apple/Apple_Announces_iPhone_2
    (3 minutes later, poor flickr picture, front page story +6000 diggs)
  4. http://digg.com/apple/Apple_iPhone_Announced
    (4 minutes later, blog post, single picture, made front page but buried)

As far as quality and timeliness are concerned, either of the first two articles should be the front page post. So what gives, digg? Kevin? Jay? Can the digg staff offer any explanation for this?

Until next time.

-3Monkeys

Longest Legitimate Reply on Digg? “Bad, Vista!”

Sunday, January 7th, 2007

A few days ago I submitted a story on digg, Linux: Introducing The Data Corruption Bug, from KernelTrap. It took a few people by surprise that I would post an article critical of Linux, but as any self-respecting journalist would (as if there were many self-respecting journalists), I set aside my personal bias and submitted the article. One friend, Roy Schestowitz, was particularly interested in letting me know that I had stumbled over to the Dark Side. I asked Roy if I could share his thoughts with my readers, and he graciously agreed.

I have to admit that the comment, along with its associated links, made for a much more interesting read.

It all starts from the first comment on the story from DocWhoWho.

buggy linux crap

Then Roy makes one response — one long response. Note: The following has been formatted to be more visually appealing to the reader.

True. All software has some bugs. Except Vista. It’s perfect. Microsoft says so.

Vista Bug re-appears

Nice… minus one million, three hundred eight thousand, two hundred fifty nine bytes… how is that any where near or mathematically altered to 2.11 MB?

http://jadeallen.com/toms/index.asp?DoAction=ReadDay&ID=359

Windows Vista’s Hideous Wakeup Support

One thing we just can’t wrap our mind about is the terrible, broken, and completely pitiful support for waking Vista up from a Deep Sleep or hibernation.’ Any time you attempt to wake Vista up from Hibernation or “Deep Sleep” (S3-induced sleep mode), it dies. It’s either a BSOD, or a driver error, or a broken network, no DWM, lack of sound… the list goes on, and on. So much for an operating system to “power” the future! (No pun intended!) That’s with properly-signed drivers and no buggy software on multiple PCs…

http://neosmart.net/blog/archives/299

TI won’t rush into Microsoft Vista readily

Corporate American, no, make that corporate everywhere is treating Vista like a dead animal found in the woods, they will poke it with a stick, but there is no way they will take it home. Take TI for instance, it is not going to touch the wonder OS for another two years or so.

http://www.theinquirer.net/default.aspx?article=36259

Installing A PS/2 Mouse Turns Off Firewall, Huh!

I plugged in the PS/2 mouse, rebooted the PC and on restart XP found the PS/2 mouse and suggested rebooting again, which I did. When it restarted the Microsoft Firewall had been switched off. How do I know, because it told me I had no firewall.

http://crunchysoftware.wordpress.com/2006/12/09/ps2-mouse-turns-off-firewall/

Microsoft Internet Explorer (IE) 7.0 Sucks!

I know Microsoft’s release of IE 7.0 is in beta, but I still didn’t expect it to be such a big piece of crap and cause hours of misery.
[...]
What Microsoft hoped would help it win back Firefox “switchers” has done nothing but add one more to their growing ranks.

http://www.marketingpilgrim.com/2006/02/microsoft-ie-70-sucks.html

Just One Question For Vista: Does It Simply Work — Like An Apple?

Jim Allchin responds over on his blog regarding a recent news report quoting him as saying that he would have bought a Mac if he weren’t working for Microsoft in an email to Steve Ballmer and Bill Gates.

[...]

So the challenge for Microsoft, and in some ways to an even larger extent Allchin himself personally, is to ship Vista in a state that is as bug free as the Mac.
Vista is Allchin’s final legacy at Microsoft. After a long successful career this is his last hurah and in order to be successful it simply must be perfect.

http://biz.yahoo.com/seekingalpha/061213/22341_id.html?.v=1

Life with Vista – Is this dogfood really for the dogs?

I’m an absolute freak when it comes to .NET technologies. My blog is called the “.NET Addict”, so it should be pretty obvious that the day Vista’s RTM build came out, I downloaded it and installed it on every Vista-capable machine I had in my possession. I’ve been using Vista for several weeks now and I’ve come to a couple of conclusions that I think might startle and shock some of you.

http://dotnetaddict.dotnetdevelopersjournal.com/vista_dogfood.htm

Vista Breaks Applications

The big secret at Redmond is that existing applications and new products will not work with Vista.
Microsoft really doesn’t want you to know this, but many of your existing applications won’t work with Vista. In fact, some brand new products won’t work with Vista.

http://www.eweek.com/article2/0,1895,2062318,00.asp

Vista… Don’t Try to Copy and Paste

Vista gave me the following when trying to copy a file:
Error 0x800705AD.

Ohh, well of course… I should have known error 0x800705AD means the user tried to COPY AND PASTE. Are you shitting me?! Insufficient quota? WTF are you talking about Vista. First let’s check that I have enough disk space:

Ok, now by my calculations 413 MBs < 22.8 GBs. I can normally figure out almost any computer related problem.

http://buckwheats.wordpress.com/2006/12/12/vista%e2%80%a6-don%e2%80%99t-try-to-copy-and-paste-o/

Vista: Behind the scenes

Some of the glitches were already known. Many were things that have already been fixed, and a few were too new and need investigating. None appeared to be a show-stopper.

http://news.zdnet.com/2100-9590_22-6133491.html

Analysts: Microsoft Changes Meaning Of ‘Release Candidate’

Two industry watchers say Microsoft is corrupting the term, leading to major confusion among customers and others about whether the operating system is truly ready to evaluate.

Two analysts Thursday accused Microsoft Corp. of changing the meaning of “release candidate” by pushing out a version of Windows Vista that still needs major work.

[...]
Joe Wilcox, an analyst with JupiterResearch, said that Microsoft’s corrupted the term.

http://www.informationweek.com/story/showArticle.jhtml?articleID=192700055&cid=RSSfeed_IWK_News

Now folks, I don’t know if Roy was drinking a lot of coffee that morning, but this has to qualify as one of the longest replies in digg history.

Update:

I was asked to give a word and character count for the comment: 841 words, 5364 characters.

Until next time.

-3Monkeys
