Hangout for experimental confirmation and demonstration of software, computing, and networking. The exercises don't always work out. The professor is a bumbler and the laboratory assistant is a skanky dufus.
2005-10-17

Magical Thinking and the Universal Document Elixir

As long as we’re sitting here by the campfire telling ghost stories about the great OpenDocument vs. Microsoft Office XML FUDwrestle, it is appropriate to discuss the really great idea that the OpenDocument format is designed to be a universal file format such that, according to one commenter, “all the information in any file format should be able to be stored in ODF without loss.” It is appealing to then conclude that “this would allow it to be use[d] as the native format in many applications and, most importantly, a universal translation method between any two different formats.”

What a wonderful straw man! What a beautiful dream. A universal format that serves as a universal document model that all formats can be translated through. Douglas Engelbart will be very happy to know that this knotty problem is solved and he can get on with the OHS and other projects dear to his heart.

Microsoft: Damned If You Do, Damned If You Don’t

What’s really great about this is how it makes such a cool mouse trap to use on the folks at Microsoft. Here’s where another comment took it:
Let me see, we’re supposed to assume that the magical binary key must exist because if it didn’t, there would be transformation filters between MSXML (I am not sure which XML that is, but let’s suppose it’s the existing WordML just for clarity) and ODF. But there aren’t, so the magical invisible binary key must exist? Well, maybe there aren’t because “should be able to” is actually a really hard problem?

There are two difficulties here. One problem is that the commentator is quoting Gary Edwards again and I’d really like to hear from someone else who can speak authoritatively about OpenDocument. I’d like some sense for who else is drinking the same Kool-Aid and, most of all, who is willing to provide some technical evidence for all of these weird claims.

The other thing, and that is what I really want to talk about, is the presumption of universal translatability, if that is what is really meant (e.g., easy conversions over and back).

Is There a Universal Document Format?

I have my doubts whether a universal document format is even possible. I am willing to consider that some practical level of this might be accomplished for a selected set of cases and document models that can be conformed somehow. We’ve barely gotten to that level with programming languages (thanks to the .NET CLI, actually) after a quest of almost 50 years, and programming languages are easier (unless a human has to understand the result, and then it might be harder).

So what I’m looking for is not some vague claim of a dream fulfilled but a simple demonstration of how, and at what level, a universal transformation layer has actually been accomplished. What is the model and what was concluded about the conditions under which inter-translation works? What are/were the metrics?

How’d This Become the Terms of Debate?

The basis for this claim is that interview of Edwards (sorry) where he is reported to have said
The interview goes on to reaffirm the universal transformation layer with
In the cited examples of publishing and content-management systems, nothing from Microsoft is mentioned. I also don’t see mention of TeX, PDF, DocBook (or SGML generally) or a contemporaneous ISO specification, the Open Document Architecture (ODA). Since these last are well- and fully-specified, I would think they’d make great tests for successful universal transformation.

What You See Is All You Get

Besides Doug Alberg of Boeing, Edwards also gives great credit to “legendary Daniel Vogelheim” (co-architect of the OpenOffice.org XML file format and a Sun Software Engineer) for this period of the work. Vogelheim is more conservative in his stance, according to Eric van der Vlist writing in <?xmlhack?>. It seems that Vogelheim takes “transformability” to mean that the format is usable outside of the office application, something which should be pretty much true of any XML format for a document and which is the point of the examples that Brian Jones posts about integrating/blending WordML and Excel XML formats with business applications.

The full abstract for Vogelheim’s XML2002 talk expands on this notion. It is clear that extraction and repurposing is intended. Nowhere is there any claim for universal transformation between document formats, something Edwards appears to mean and that everyone else picks up on. This also appears to be the basis for whatever logic has people believe that all Microsoft has to do is adopt the OpenDocument format.

I’m willing to believe that Edwards is serious about this when he makes comments on Bob Sutor’s blog like, “The magic transformation qualities of ODF on the other hand are legendary, and it's only five years old!” I just can’t see anywhere that this has been handled.

Show Me the Elixir

Here is where I end up with this. If there were indeed a charge to ensure some degree of universal translation with ODF as an intermediary, there is no evidence of it in the OASIS Specification.
I did a search through the PDF for every occurrence of “transformation” in the document. The greatest number of occurrences have to do with transformation as used in presentation systems (such as Adobe PostScript) for transformations of drawing geometries. There are a few cases where design and feature changes are described in terms of making transformation of documents via XSLT a little easier.

The key example, to my mind, is the design goal of having it be possible for any elements below the paragraph to be ignored (that is, the tags are dropped) and the remaining content be appropriate for text extraction. This is nothing like preserving formatting and document models and whatever else as part of a translation with ODF as a document lingua franca. [It also appears to capture hidden text.]

Most of these features are described in terms of how they should make such transformation easier. None of them seem to be about preserving the document in going from/to ODF. I also see this principle as a barrier to the successful translation of non-ODF document architectures to ODF, when that architecture depends on sub-paragraph elements with content that is not intended to be part of the text content at all. (Whether or not that was a good idea, the question is how does one get into ODF with it.)

Now if translation were part of the charter and charge of the Open Document Technical Committee (if you can find it let me know), and some kind of universal document model were achieved, I would expect that
I find nothing like that. Anywhere.

My FUD is FUDDier than your FUD, so FUD this!

I have been paying attention to the posturing that goes on around OASIS Open Document Format (ODF) and the Microsoft Office XML Reference Schemas (supported now in Office 2003 components) and the Microsoft Office Open XML (OX) that will be used as the new default format for Word, Excel, and PowerPoint in the next version of Microsoft Office. Sometimes people I think are quite senior and knowledgeable seem to take leave of their senses in proclaiming things that years of experience in their organizations should suggest are neither so bare-faced nor so simply accomplished. Then there is the stuff that comes up when someone’s FUD detector has the gain too high and it goes into feedback because someone sneezed in the parking lot.

The Binary Key That Everybody Knows About

The funniest examples, if they weren’t so irritating to me, are the ones that are passed around as technical facts that “everybody knows” and quoted and referenced gleefully but never fact-checked. One that really gets me is one that some posters have been asking Brian Jones to explain. When he does, they suggest that the truth of the matter depends on whether you believe Brian or a comment on Groklaw (no kidding), when there is a simple, confirmable technical fact in dispute.

Here’s what I mean. Gary Edwards is one of the editors of the OASIS Open Document specification. He was interviewed by Christian Einfeldt in an article published on Mad Penguin. According to the article, Gary said this:
[Update: The quote is apparently accurate. One of the other places this story is told is in a comment on Bob Sutor’s IBM blog. It seems to be Gary Edwards again. So far, I haven’t found any source for this that doesn’t end up being based on a statement credited to Gary Edwards.]

I keep asking people to show me that key that is so well-known and appears in the header of every Microsoft XML document. Just show me the binary key.

Uh, So How Come It’s Not Here?

I went looking for confirmation. My abandoned M.Sc dissertation draft is a Word 2003 document. So I saved it as XML, then opened it in FrontPage as an XML document. I used FrontPage to pretty-print it so all of the tags line up, the entities are indented, and so on. In the snippet below, I also used Notepad to add further line breaks and indentations to make the tags and elements easier to comprehend. Here’s the beginning of the file:
and it goes on like that. There is binary content later on, in Base64 encoding. Most of it is for images that I created outside of Word and then included in the document. I gave it that binary. There is also something called <w:fldData> that is scattered throughout my document and its short content is also in what looks like Base64 encoding.

Then I thought that maybe it is the use of a UUID as the URI of a namespace to be used with prefix dt:. I don’t know what that is, and I couldn’t find any actual use of the namespace so I deleted that namespace declaration. When I loaded the XML document in Word, there was no discernible difference. It doesn’t seem to matter.

So, where is this binary key that is so well-known and such a terrible barrier to conversion of Microsoft XML documents to ODF? Where the FUD is it? If it’s so well known and in every Microsoft XML document, where is it?

You Mean to Tell Me Exchange Is Doing It?

I checked the Groklaw post that is supposed to be informative on the matter. It’s apparently from Gary Edwards and it doesn’t say anything about where the key is or what it is in the documents. It says something about how Exchange Server and IE6 are apparently in an act with Word involving a secret transformation to XML and back. I can’t figure out what that’s about and I marvel that this is so well-known, whatever it is. I don’t have Exchange, so there’s no way I can figure out how to test that or even care what some transient XML usage is about. When I ask for XML I don’t get any magical key. That’s all I know.

The comment then goes on to speculate about all of the evils that the existence of this key is evidence for. Then the comment goes off about an XSL/XSLT style sheet, XML2FO.xsl, that Microsoft developers came up with that apparently doesn’t work real great and this is tied back to the mystery key by arguing about whose experts are more expert. I still can’t find the mystery key.
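For anyone who wants to repeat the check rather than take my word for it, here is a minimal sketch in Python of the kind of inspection described above: list the namespace declarations, flag any that no element or attribute uses, and report elements whose content looks like Base64. The function names, the sample document, and its namespace URIs are all invented for illustration; they are not taken from a real Word file. The Base64 test is only a heuristic, and a namespace that looks unused could still be referenced from inside attribute values.

```python
import base64
import binascii
import io
import xml.etree.ElementTree as ET

# Invented stand-in for a saved-as-XML word-processing file. The namespace
# URIs and element names are placeholders, not the real WordML ones.
SAMPLE = b"""<?xml version="1.0"?>
<w:doc xmlns:w="urn:example:wordml"
       xmlns:dt="uuid:00000000-0000-0000-0000-000000000000">
  <w:p><w:t>Plain text.</w:t></w:p>
  <w:binData w:name="picture.png">iVBORw0KGgoAAAANSUhEUgAAAAEAAAAB</w:binData>
</w:doc>"""


def looks_like_base64(text, min_len=16):
    """Heuristic: long enough and decodes as strict Base64.

    A long run of plain letters can pass too, so treat hits as candidates.
    """
    compact = "".join(text.split())  # drop line wrapping and indentation
    if len(compact) < min_len:
        return False
    try:
        base64.b64decode(compact, validate=True)
    except (binascii.Error, ValueError):
        return False
    return True


def namespace_of(name):
    """Extract the URI from ElementTree's '{uri}local' form ('' if none)."""
    return name[1:].split("}", 1)[0] if name.startswith("{") else ""


def scan_xml(xml_bytes):
    """Return (declared namespaces, Base64-looking elements, unused prefixes)."""
    namespaces = []  # (prefix, uri) pairs as declared
    suspects = []    # (qualified tag, decoded byte count)
    used = set()     # namespace URIs seen on element or attribute names
    events = ET.iterparse(io.BytesIO(xml_bytes), events=("start-ns", "end"))
    for event, payload in events:
        if event == "start-ns":
            namespaces.append(payload)
            continue
        used.add(namespace_of(payload.tag))
        used.update(namespace_of(key) for key in payload.attrib)
        if payload.text and looks_like_base64(payload.text):
            compact = "".join(payload.text.split())
            suspects.append((payload.tag, len(base64.b64decode(compact))))
    unused = [prefix for prefix, uri in namespaces if uri not in used]
    return namespaces, suspects, unused


if __name__ == "__main__":
    declared, binary, unused = scan_xml(SAMPLE)
    print("declared namespaces:", declared)
    print("Base64-looking content:", binary)
    print("declared but unused prefixes:", unused)
```

To try this on a real document, read the saved XML file in binary mode and pass its bytes to `scan_xml` in place of `SAMPLE`. Anything that turns up can then be accounted for one item at a time, the way the embedded images and <w:fldData> are accounted for above.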
You are navigating Orcmid's Lair.
template
created 2004-06-17-20:01 -0700 (pdt)
by orcmid