Professor von Clueless in the Blunder Dome
Hangout for experimental confirmation and demonstration of software, computing, and networking. The exercises don't always work out. The professor is a bumbler and the laboratory assistant is a skanky dufus.
Monday, October 17, 2005

Magical Thinking and the Universal Document Elixir
As long as we’re sitting here by the campfire telling ghost stories about the great OpenDocument vs. Microsoft Office XML FUDwrestle, it is appropriate to discuss the really great idea that the OpenDocument format is designed to be a universal file format such that, according to one commenter, “all the information in any file format should be able to be stored in ODF without loss.” It is appealing to then conclude that “this would allow it to be use[d] as the native format in many applications and, most importantly, a universal translation method between any two different formats.”

What a wonderful straw man! What a beautiful dream. A universal format that serves as a universal document model that all formats can be translated through. Douglas Engelbart will be very happy to know that this knotty problem is solved and he can get on with the OHS and other projects dear to his heart.

Microsoft: Damned If You Do, Damned If You Don’t

What’s really great about this is how it makes such a cool mouse trap to use on the folks at Microsoft. Here’s where another comment took it:
Let me see, we’re supposed to assume that the magical binary key must exist because if it didn’t, there would be transformation filters between MSXML (I am not sure which XML that is, but let’s suppose it’s the existing WordML just for clarity) and ODF. But there aren’t, so the magical invisible binary key must exist? Well, maybe there aren’t because “should be able to” is actually a really hard problem?

There are two difficulties here. One problem is that the commentator is quoting Gary Edwards again and I’d really like to hear from someone else who can speak authoritatively about OpenDocument. I’d like some sense of who else is drinking the same Kool-Aid and, most of all, who is willing to provide some technical evidence for all of these weird claims. The other thing, and that is what I really want to talk about, is the presumption of universal translatability, if that is what is really meant (e.g., easy conversions over and back).

Is There a Universal Document Format?

I have my doubts whether a universal document format is even possible. I am willing to consider that some practical level of this might be accomplished for a selected set of cases and document models that can be conformed somehow. We’ve barely gotten to that level with programming languages (thanks to the .NET CLI, actually) after a quest of almost 50 years, and programming languages are easier (unless a human has to understand the result, and then it might be harder).

So what I’m looking for is not some vague claim of a dream fulfilled but a simple demonstration of how a universal transformation layer has actually been accomplished, and to what level. What is the model and what was concluded about the conditions under which inter-translation works? What are/were the metrics?

How’d This Become the Terms of Debate?

The basis for this claim is that interview of Edwards (sorry) where he is reported to have said
The interview goes on to reaffirm the universal transformation layer with
In the cited examples of publishing and content-management systems, nothing from Microsoft is mentioned. I also don’t see mention of TeX, PDF, DocBook (or SGML generally) or a contemporaneous ISO specification, the Open Document Architecture (ODA). Since these last are well- and fully-specified, I would think they’d make great tests for successful universal transformation.

What You See Is All You Get

Besides Doug Alberg of Boeing, Edwards also gives great credit to “legendary Daniel Vogelheim” (co-architect of the OpenOffice.org XML file format and a Sun Software Engineer) for this period of the work. Vogelheim is more conservative in his stance, according to Eric van der Vlist writing in <?xmlhack?>. It seems that Vogelheim takes “transformability” to mean that the format is usable outside of the office application, something which should be pretty much true of any XML format for a document and the point of examples that Brian Jones posts about integrating/blending WordML and Excel XML formats with business applications.

The full abstract for Vogelheim’s XML2002 talk expands on this notion. It is clear that extraction and repurposing is intended. Nowhere is there any claim for universal transformation between document formats, something Edwards appears to mean and that everyone else picks up on. This also appears to be the basis for whatever logic has people believe that all Microsoft has to do is adopt the OpenDocument format.

I’m willing to believe that Edwards is serious about this when he makes comments on Bob Sutor’s blog like, “The magic transformation qualities of ODF on the other hand are legendary, and it's only five years old!” I just can’t see anywhere that has been handled.

Show Me the Elixir

Here is where I end up with this. If there were indeed a charge to ensure some degree of universal translation with ODF as an intermediary, there is no evidence of it in the OASIS Specification.
I did a search through the PDF for every occurrence of “transformation” in the document. The greatest number of occurrences have to do with transformation as used in presentation systems (such as Adobe PostScript) for transformations of drawing geometries. There are a few cases where design and feature changes are described in terms of making transformation of documents via XSLT a little easier.

The key example, to my mind, is the design goal of having it be possible for any elements below the paragraph to be ignored (that is, the tags are dropped) and the remaining content be appropriate for text extraction. This is nothing like preserving formatting and document models and whatever else as part of a translation with ODF as a document lingua franca. [It also appears to capture hidden text.]

Most of these features are described in terms of how they should make such transformation easier. None of them seem to be about preserving the document in going from/to ODF. I also see this principle as a barrier to the successful translation of non-ODF document architectures to ODF, when that architecture depends on sub-paragraph elements with content that is not intended to be part of the text content at all. (Whether or not that was a good idea, the question is how does one get into ODF with it.)

Now if translation were part of the charter and charge of the Open Document Technical Committee (if you can find it let me know), and some kind of universal document model were achieved, I would expect that
I find nothing like that. Anywhere.

Comments:

The universal doc comment was amusing. ODF can't even transform the Office XML formats w/o loss of information which is probably the primary value of the Office XML formats (being able to move legacy binary formats to fully documented XML formats w/o data loss). I'm still waiting to see what hoops MA is going to jump through to get complex legacy documents into ODF without loss of data and/or time and money that could've been avoided if they allowed Office XML. I see another Munich Linux migration-styled mishap. I do think MS should consider formalizing the Office XML and XPS Reach formats with ISO after they finalize the specs. I personally have no issue with the current licensing, but going to ISO would allow them to have a base standard everyone could use as a common denominator for interchange while allowing them to continue making further enhancements to their formats for future releases, much like they're doing currently with .NET and the CLI, C#, C++/CLI standards and Adobe w/ PDF. Maybe this has already been considered though, and they are waiting for final specs before submission.

I have been reading about ODF and Open XML for a while and haven't yet stumbled upon the "universal translation layer" concept, but it is not surprising, I guess, that people would jump on it. The idea that any format is even aiming for universal is ridiculous, and ODF certainly doesn't normally make that claim. All it really claims to be is open, and the question with Open XML is how open it really is. I honestly think it is a great leap forward for Microsoft to add a true XML storage format and document it, as it will greatly enhance the ability for third parties to integrate with Microsoft documents, but that is not really the same thing as an open standard. The problem is that we have entered the silly season. Some of the ODF supporters are making crazy claims for how universal it is, and Microsoft is making crazy claims about how Open XML is an "open standard", which it pretty much is in name only. Neither move really lessens the value of the formats - ODF is still a valuable move toward a general standard for office formats, and Open XML is still a valuable move to exposing what has been a proprietary format and ensuring that it will not be changed without some forewarning. If either party could be satisfied with that, everything would be fine and the two formats could interact and coexist comfortably with slightly different purposes, but neither party is likely to be satisfied with anything but complete victory. Silly, really.

My FUD is FUDDier than your FUD, so FUD this!
I have been paying attention to the posturing that goes on around OASIS Open Document Format (ODF) and the Microsoft Office XML Reference Schemas (supported now in Office 2003 components) and the Microsoft Office Open XML (OX) that will be used as the new default format for Word, Excel, and PowerPoint in the next version of Microsoft Office. Sometimes people I think are quite senior and knowledgeable seem to take leave of their senses in proclaiming things that years of experience in their organizations should suggest are not quite so bare-faced nor so simply accomplished. Then there is the stuff that comes up when someone’s FUD detector has the gain too high and it goes into feedback because someone sneezed in the parking lot.

The Binary Key That Everybody Knows About

The funniest examples, if they weren’t so irritating to me, are the ones that are passed around as technical facts that “everybody knows” and quoted and referenced gleefully but never fact-checked. One that really gets me is one that some posters have been asking Brian Jones to explain and, when he does, suggesting that the truth of the matter is dependent on believing Brian or a comment on Groklaw (no kidding), when there is a simple, confirmable technical fact in dispute.

Here’s what I mean. Gary Edwards is one of the editors of the OASIS Open Document specification. He was interviewed by Christian Einfeldt in an article published on Mad Penguin. According to the article, Gary said this:
[Update: The quote is apparently accurate. One of the other places this story is told is in a comment on Bob Sutor’s IBM blog. It seems to be Gary Edwards again. So far, I haven’t found any source for this that doesn’t end up being based on a statement credited to Gary Edwards.]

I keep asking people to show me that key that is so well-known and appears in the header of every Microsoft XML document. Just show me the binary key.

Uh, So How Come It’s Not Here?

I went looking for confirmation. My abandoned M.Sc. dissertation draft is a Word 2003 document. So I saved it as XML, then opened it in FrontPage as an XML document. I used FrontPage to pretty-print it so all of the tags line up, the entities are indented, and so on. In the snippet below, I also used Notepad to add further line breaks and indentations to make the tags and elements easier to comprehend. Here’s the beginning of the file:
and it goes on like that. There is binary content later on, in Base64 encoding. Most of it is for images that I created outside of Word and then included in the document. I gave it that binary. There is also something called <w:fldData> that is scattered throughout my document and its short content is also in what looks like Base64 encoding.

Then I thought that maybe it is the use of a UUID as the URI of a namespace to be used with prefix dt:. I don’t know what that is, and I couldn’t find any actual use of the namespace so I deleted that namespace declaration. When I loaded the XML document in Word, there was no discernible difference. It doesn’t seem to matter.

So, where is this binary key that is so well-known and such a terrible barrier to conversion of Microsoft XML documents to ODF? Where the FUD is it? If it’s so well known and in every Microsoft XML document, where is it?

You Mean to Tell Me Exchange Is Doing It?

I checked the Groklaw post that is supposed to be informative on the matter. It’s apparently from Gary Edwards and it doesn’t say anything about where the key is or what it is in the documents. It says something about how Exchange Server and IE6 are apparently in an act with Word involving a secret transformation to XML and back. I can’t figure out what that’s about and I marvel that this is so well-known, whatever it is. I don’t have Exchange, so there’s no way I can figure out how to test that or even care what some transient XML usage is about. When I ask for XML I don’t get any magical key. That’s all I know.

The comment then goes on to speculate about all of the evils that the existence of this key is evidence for. Then the comment goes off about an XSL/XSLT style sheet, XML2FO.xsl, that Microsoft developers came up with that apparently doesn’t work real great and this is tied back to the mystery key by arguing about whose experts are more expert. I still can’t find the mystery key.
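
The hand inspection described under “Uh, So How Come It’s Not Here?” (looking for Base64 content and for a namespace declaration that nothing uses) can be approximated with a short script. What follows is only a sketch in Python under stated assumptions, not the procedure used for the post: the names survey, SAMPLE and min_len are made up for illustration, and SAMPLE is a hypothetical miniature stand-in for a Word 2003 XML file, with a <w:binData> element assumed as the place where an embedded image would sit. The Base64 test is a heuristic; ordinary text that happens to decode cleanly would be reported too.

```python
import base64
import binascii
import io
import xml.etree.ElementTree as ET

# Hypothetical miniature stand-in for a Word 2003 XML file (not a real document).
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<w:wordDocument
    xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
    xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882">
  <w:body>
    <w:p><w:r><w:t>Plain paragraph text</w:t></w:r></w:p>
    <w:p><w:r><w:pict>
      <w:binData w:name="wordml://01.png">aGVsbG8gd29ybGQhIQ==</w:binData>
    </w:pict></w:r></w:p>
  </w:body>
</w:wordDocument>
"""


def survey(xml_text, min_len=16):
    """Return (unused namespace prefixes, [(element name, decoded byte count)])."""
    declared = {}  # prefix -> namespace URI, as declared in the document
    used = set()   # namespace URIs that appear on some element or attribute name
    blobs = []     # elements whose text decodes cleanly as Base64
    stream = io.BytesIO(xml_text.encode("utf-8"))
    for event, payload in ET.iterparse(stream, events=("start-ns", "end")):
        if event == "start-ns":
            prefix, uri = payload
            declared[prefix] = uri
            continue
        elem = payload
        # ElementTree spells qualified names as "{uri}local".
        for name in [elem.tag, *elem.attrib]:
            if name.startswith("{"):
                used.add(name[1:].split("}", 1)[0])
        # Strip all whitespace, since Base64 runs are usually line-wrapped.
        text = "".join((elem.text or "").split())
        if len(text) >= min_len:
            try:
                raw = base64.b64decode(text, validate=True)
            except (binascii.Error, ValueError):
                continue  # not Base64, just ordinary text
            blobs.append((elem.tag.split("}", 1)[-1], len(raw)))
    unused = sorted(p for p, uri in declared.items() if uri not in used)
    return unused, blobs


if __name__ == "__main__":
    unused, blobs = survey(SAMPLE)
    print("declared but unused prefixes:", unused)
    print("Base64-looking elements:", blobs)
```

On the sample, the script reports dt as a declared but unused prefix and one Base64-looking element, binData, holding 13 decoded bytes. Run against a real file, the same survey would list every place where binary content actually sits, which is the confirmable fact in dispute.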
template created 2004-06-17-20:01 -0700 (pdt)
by orcmid |