Blunder Dome Sighting  

Hangout for experimental confirmation and demonstration of software, computing, and networking. The exercises don't always work out. The professor is a bumbler and the laboratory assistant is a skanky dufus.

Magical Thinking and the Universal Document Elixir

As long as we’re sitting here by the campfire telling ghost stories about the great OpenDocument vs. Microsoft Office XML FUDwrestle, it is appropriate to discuss the really great idea that the OpenDocument format is designed to be a universal file format such that, according to one commenter, “all the information in any file format should be able to be stored in ODF without loss.”  It is appealing to then conclude that “this would allow it to be use[d] as the native format in many applications and, most importantly, a universal translation method between any two different formats.”

What a wonderful straw man!  What a beautiful dream.  A universal format that serves as a universal document model that all formats can be translated through.  Douglas Engelbart will be very happy to know that this knotty problem is solved and he can get on with the OHS and other projects dear to his heart.

Microsoft: Damned If You Do, Damned If You Don’t

What’s really great about this is how it makes such a cool mouse trap to use on the folks at Microsoft.  Here’s where another comment took it:

“If the MSXML binary key and software bindings do not exist, then Microsoft (and everyone else for that matter) should be able to provide the marketplace with clean clear transformation filters enabling easy conversions from MSXML to ODF and back? If they did this, then their software would meet the Massachusetts requirements. But they don't!”

Let me see, we’re supposed to assume that the magical binary key must exist because if it didn’t, there would be transformation filters between MSXML (I am not sure which XML that is, but let’s suppose it’s the existing WordML just for clarity) and ODF.  But there aren’t, so the magical invisible binary key must exist?  Well, maybe there aren’t because “should be able to” is actually a really hard problem?

There are two difficulties here.  One problem is that the commenter is quoting Gary Edwards again and I’d really like to hear from someone else who can speak authoritatively about OpenDocument.  I’d like some sense of who else is drinking the same Kool-Aid and, most of all, who is willing to provide some technical evidence for all of these weird claims.  The other difficulty, and the one I really want to talk about, is the presumption of universal translatability, if that is what is really meant (e.g., easy conversions over and back).

Is There a Universal Document Format?

I have my doubts whether a universal document format is even possible.  I am willing to consider that some practical level of this might be accomplished for a selected set of cases and document models that can be conformed somehow.  We’ve barely gotten to that level with programming languages (thanks to the .NET CLI, actually) after a quest of almost 50 years, and programming languages are easier (unless a human has to understand the result, and then it might be harder).

So what I’m looking for is not some vague claim of a dream fulfilled but a simple demonstration of how, and at what level, a universal transformation layer has actually been accomplished.  What is the model and what was concluded about the conditions under which inter-translation works?  What are/were the metrics?

How’d This Become the Terms of Debate?

The basis for this claim is the interview with Edwards (sorry) in which he is reported to have said

“When the Open Document Technical Committee talks about legacy systems, we're talking about at least 30 years of legacy information systems that cross an incredible spectrum of information and file format types. Boeing is an excellent example, and ODF TC member Doug Alberg was a most important driver in the first 18 months of ODF TC work, a period I always refer to as the “universal transformation layer” period because interoperability with legacy information systems was our primary concern.”

The interview goes on to reaffirm the universal transformation layer with

“The first 18 months of the Open Document project were to perfect the Open Document XML as a transformation layer, where all of these legacy systems could be connected to the transformation layer. Once it's in the common transformation layer, then you can pick and choose which publishing and content management system you would want.”

In the cited examples of publishing and content-management systems, nothing from Microsoft is mentioned.  I also don’t see mention of TeX, PDF, DocBook (or SGML generally) or a contemporaneous ISO specification, the Open Document Architecture (ODA).  Since these last are well- and fully-specified, I would think they’d make great tests for successful universal transformation.

What You See Is All You Get

Besides Doug Alberg of Boeing, Edwards also gives great credit to “legendary Daniel Vogelheim” (co-architect of the XML file format and a Sun Software Engineer) for this period of the work.  Vogelheim is more conservative in his stance, according to Eric van der Vlist writing in <?xmlhack?>.  It seems that Vogelheim takes “transformability” to mean that the format is usable outside of the office application, something which should be pretty much true of any XML format for a document and which is the point of the examples that Brian Jones posts about integrating/blending WordML and Excel XML formats with business applications.

The full abstract for Vogelheim’s XML2002 talk expands on this notion.  It is clear that extraction and repurposing are intended.  Nowhere is there any claim for universal transformation between document formats, something Edwards appears to mean and that everyone else picks up on.  This also appears to be the basis for whatever logic has people believing that all Microsoft has to do is adopt the OpenDocument format.

I’m willing to believe that Edwards is serious about this when he makes comments on Bob Sutor’s blog like, “The magic transformation qualities of ODF on the other hand are legendary, and it's only five years old!”  I just can’t see anywhere that this has actually been handled.

Show Me the Elixir

Here is where I end up with this.  If there were indeed a charge to ensure some degree of universal translation with ODF as an intermediary, there is no evidence of it in the OASIS Specification.  I did a search through the PDF for every occurrence of “transformation” in the document.  The greatest number of occurrences have to do with transformation as used in presentation systems (such as Adobe PostScript) for transformations of drawing geometries.  There are a few cases where design and feature changes are described in terms of making transformation of documents via XSLT a little easier.

The key example, to my mind, is the design goal of having it be possible for any elements below the paragraph to be ignored (that is, the tags are dropped) and the remaining content be appropriate for text extraction.  This is nothing like preserving formatting and document models and whatever else as part of a translation with ODF as a document lingua franca.  [It also appears to capture hidden text.]  Most of these features are described in terms of how they should make such transformation easier.  None of them seems to be about preserving the document in going from/to ODF.  I also see this principle as a barrier to the successful translation of non-ODF document architectures to ODF, when that architecture depends on sub-paragraph elements with content that is not intended to be part of the text content at all.  (Whether or not that was a good idea, the question is how does one get into ODF with it.)
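To make the tag-dropping goal concrete, here is a minimal sketch in Python of what "ignore everything below the paragraph" amounts to.  It is my own illustration, not anything taken from the OASIS Specification: the sample paragraph, the extract_text helper, and the second, non-ODF fragment (with its urn:example:hypothetical namespace and x:code element) are all assumptions made up for the demonstration.

```python
import xml.etree.ElementTree as ET

# The ODF text namespace; the sample paragraphs below are invented.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# An ODF-style paragraph with two sub-paragraph (span) elements.
SAMPLE = (
    f'<text:p xmlns:text="{TEXT_NS}">'
    'Plain, <text:span text:style-name="Bold">bold</text:span>, and '
    '<text:span text:style-name="Italic">styled</text:span> text.'
    '</text:p>'
)

# A HYPOTHETICAL non-ODF paragraph whose child element carries an
# instruction, not reader-visible text. Namespace and element are made up.
FOREIGN = (
    '<p xmlns:x="urn:example:hypothetical">'
    'Total: <x:code>SUM(A1:A3)</x:code>42'
    '</p>'
)


def extract_text(paragraph_xml: str) -> str:
    """Drop every tag below the paragraph and keep only character data."""
    paragraph = ET.fromstring(paragraph_xml)
    # itertext() yields the text of the element and all its descendants
    # in document order, which is exactly "tags dropped, content kept".
    return "".join(paragraph.itertext())


print(extract_text(SAMPLE))   # Plain, bold, and styled text.
print(extract_text(FOREIGN))  # Total: SUM(A1:A3)42
```

The first result is clean text with all formatting gone, which is extraction and not translation.  The second shows the barrier: content that the source architecture never meant as text leaks into the paragraph once the tags are dropped.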

Now if translation were part of the charter and charge of the Open Document Technical Committee (if you can find it let me know), and some kind of universal document model were achieved, I would expect that

  • It would be shouted from the rooftops by a wide community of experts.
  • There would be serious technical excitement about the prospect.
  • There would be substantial content in the specification, as well as some non-normative appendices, explaining the model, accounting for the benchmarks that were used and reporting how well they were approached.

I find nothing like that.  Anywhere.

The universal doc comment was amusing. ODF can't even transform the Office XML formats w/o loss of information which is probably the primary value of the Office XML formats (being able to move legacy binary formats to fully documented XML formats w/o data loss).

I'm still waiting to see what hoops MA is going to jump through to get complex legacy documents into ODF without loss of data and/or time and money that could've been avoided if they allowed Office XML. I see another Munich Linux migration-styled mishap.

I do think MS should consider formalizing the Office XML and XPS Reach formats with ISO after they finalize the specs. I personally have no issue with the current licensing, but going to ISO would allow them to have a base standard everyone could use as a common denominator for interchange while allowing them to continue making further enhancements to their formats for future releases, much like they're doing currently with .NET and the CLI, C#, C++/CLI standards and Adobe w/ PDF. Maybe this has already been considered though, and they are waiting for final specs before submission.

I have been reading about ODF and Open XML for a while and haven't yet stumbled upon the "universal translation layer" concept, but it is not surprising, I guess, that people would jump on it. The idea that any format is even aiming for universal is ridiculous, and ODF certainly doesn't normally make that claim. All it really claims to be is open, and the question with Open XML is how open it really is. I honestly think it is a great leap forward for Microsoft to add a true XML storage format and document it, as it will greatly enhance the ability for third parties to integrate with Microsoft documents, but that is not really the same thing as an open standard.

The problem is that we have entered the silly season. Some of the ODF supporters are making crazy claims for how universal it is, and Microsoft is making crazy claims about how Open XML is an "open standard", which it pretty much is in name only. Neither move really lessens the value of the formats - ODF is still a valuable move toward a general standard for office formats, and Open XML is still a valuable move to exposing what has been a proprietary format and ensuring that it will not be changed without some forewarning. If either party could be satisfied with that, everything would be fine and the two formats could interact and coexist comfortably with slightly different purposes, but neither party is likely to be satisfied with anything but complete victory. Silly, really.
