Welcome to Orcmid's Lair, the playground for family connections, pastimes, and scholarly vocation -- the collected professional and recreational work of Dennis E. Hamilton
OOX-ODF: The Danger of Finding Only What You're Looking For
One problem that scientific investigators must always be cautious about is finding what we are looking for because that is all we are able to see. This has tripped up many an “objective” person. I confess to getting that particular pie in my face often enough to be very careful. I’ll not be surprised when it happens again. I do promise to correct whatever is necessary.
This human condition is amply demonstrated when we are making abstract attributions and speculations about the actions of others. It is easy to see all conduct as evidence that justifies the attitude we already bring to the party. Our spectacles are already smeared with the stain of our own prejudgments.
It is easy to see how this arises in the way everything Microsoft does or doesn’t do is constantly explained by the corporation’s malevolent intentions. This applies to the simplest acts and to some silly and sometimes completely stupid moves. But the indignant proclamations of further evidence of evil intent continue. Microsoft was not even “convicted” of being a monopoly (that is not illegal), yet the claim is often made to justify every suspicion.
Similarly, to regard the behavior of IBM as a corporation, and of a few visible IBM employees, in opposing the promulgation of Office Open XML (OOX) as a considered, calculated business maneuver is, to my mind, giving IBM far too much credit. There is considerable risk of potential embarrassment (and worse) were such machinations substantiated by non-repudiable facts. There are too many other explanations that fit the observed behavior, especially given all of the years the IBM internal echo chamber has had to establish the received wisdom that Microsoft did them wrong (out IBM-ing IBM, more-or-less, for those of us with long memories). I’m willing to believe that it is really personal. I am not prepared to go farther than that. There are lots of agendas here. I doubt that one can isolate and confirm a single one to fit all of the conduct that we see. The same goes for Microsoft when one employee or another lashes out in some particular way.
The so-called evidence for contradiction in OOX that has been compiled at Groklaw is another example of the lengths we can go to when we are too happy with our findings and are careless with over-reaching interpretations of facts that are open for anyone to inspect more carefully. There may be a pony in there, but the readiness to accept blatant nonsense tends to smear the pony with manure. I’ll pick the first example that came to my attention, because it is so clear-cut. (It is not my purpose to engage in an extensive analysis of complaints about OOX; I’m looking at where attitude leads to credulity.) After that I’ll turn to a more-recent sequence of extrapolations that are even sillier, especially considering how easy it is to check.
You Say Bright Green, I say Chartreuse
On January 28, Sam Hiser blogged about a problem with the way colors are handled in OOX. The summary statement in his blog feed was pretty direct:
On the blog page there’s a little table with two different sets of color mappings for a single set of color names (dark blue, dark cyan, etc.). I first thought the difference might have to do with the choice of standard web color codes, but the OOX choices don’t completely line up with those either.
After looking at Yoon Kit Hasan Saidin’s article, I concluded two things (in a long-winded comment), and Yoon Kit has replied. What seems to have gotten lost in the discussion, and I am certainly at fault for not writing more crisply, is this:
Well, where did those widely different color mappings in the blog-post table come from? They arose by taking the codes for oranges and lining them up with codes for apples (sorry). The codes shown for OOX mappings have nothing to do with the SVG names or the use of colors in DrawingML. The OOX ones being compared with the identified SVG colors are the OOX colors for highlighting over text. OOX has a limited set of highlighting colors for use in WordprocessingML and only in WordprocessingML. They are also named by fixed, rigidly-defined English-language text strings in attribute values. The red-green-blue levels that those attribute values correspond to are also rigorously and clearly defined. Some happen to be different mappings than the ones used for similarly-spelled (but different) attribute values in DrawingML.
The highlight-color attributes are different attributes with different use, and there is no apparent intention to correlate them with the same names when used for the SVG and DrawingML colors. These never show up in DrawingML. Similarly, implementing them correctly is trivial when following the specification. And, in fact, not even Microsoft Office Word 2007 uses all of the same names in its English (US) interface for selection of highlight colors.
The attribute values are for technical coding of colors. They are not about what users see or what the colors might be called by different users. OOX doesn’t prescribe any of that. You might have done this differently, and I might have also, given a blank sheet of paper to start with. It doesn’t matter.
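To make the apples-and-oranges point concrete, here is a minimal Python sketch of two independent lookup tables, one per attribute context. The function name `resolve` and the context labels are hypothetical. The SVG values are the standard SVG named colors; the highlight values are given here as an assumption about what the WordprocessingML highlight list specifies, and should be checked against Part 4 before anyone relies on them.

```python
# Illustrative only: each attribute context has its own, separately
# defined table. Similar spellings do not imply a shared definition.

# Standard SVG named colors (lower-case keywords).
SVG_NAMED = {
    "darkblue": "00008B",
    "darkcyan": "008B8B",
    "darkgreen": "006400",
    "green": "008000",
}

# ASSUMED WordprocessingML highlight values; verify against the specification.
WML_HIGHLIGHT = {
    "darkBlue": "000080",
    "darkCyan": "008080",
    "darkGreen": "008000",
    "green": "00FF00",
}


def resolve(context, value):
    """Return the RGB hex string for a color name in one attribute context."""
    tables = {"svg": SVG_NAMED, "highlight": WML_HIGHLIGHT}
    return tables[context][value]


if __name__ == "__main__":
    print(resolve("svg", "darkblue"))
    print(resolve("highlight", "darkBlue"))
```

An implementer reads the table that belongs to the attribute being processed and never needs to reconcile one table with the other, which is why the differing values cause no conflict.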
This is not rocket science. Yet the declaration of Yoon Kit, parroted by Sam, is “MSOOXML contradicts W3C SVG Colour definitions,” and the comparison in the table between SVG colors and the highlight colors in OOX WordprocessingML is simply bogus.
What I say: No harm, no foul. Interesting way to learn more about how OOX is specified.
The Mysterious Document Updates
[updated 2007-02-12T07:15Z to smooth out some bumps and account for a third flavor, the single 5-parts-in-one PDF that was apparently submitted to ISO.]
On February 8, Rob Weir posted about “Here Today, Gone Tomorrow.” Here’s the gist of it:
Of course there are hilariously speculative comments to go with the full post. And Microsoft’s Brian Jones is now archly pestered by some messenger of joy leaving innuendos on his blog and challenging him to explain what happened.
Let’s take the section number issue first. It is true that the ECMA specification is in 5
I have a different beef with the section numbering in the OOX specification. The tables of content
Now, about the repagination. Rob is a smart guy, and he has gone through the big
I’m going out on a limb here, because I have no idea what documents were physically delivered to ISO JTC1 as the ECMA submission. What I have in my possession are the files for the December 2006 ECMA-376 as they were on February 9, 2007 when I downloaded them. Of these, the Part 4 DOCX (in a Zip file with other material) version is corrupted (a problem I sometimes have with some documents from some sites), so I couldn’t use Word 2007 to compare it with any earlier edition. I also have the final TC45 Drafts that were created in October for comparison. Here is the result of my explorations.
Differences in the Part 4, Markup Language Reference PDF. This is the big honker that you dive into when you want to see the precise details of all of the attributes and formats of OOX. Here’s where those who are looking for it find buried treasure and smoking guns. So what’s the difference? Well, the PDF that was created on February 1 has one more physical page in it than the PDF that was created on October 6, 2006. That is the blank page after the front title page. The page is counted in the page numbering, it just wasn’t physically present in the October 6 version. I can’t do a line by line or word for word comparison, but I can tell you that the numberings on the pages are identical and the table of contents (and the pages referenced in the table of contents) are identical. When I obtain the DOCX version of the latest downloads, I will have Word 2007 compare them and show me the changes. Meanwhile, if page references are to the numbers on the pages (and not the sequential page-count positions displayed by PDF), there seems to be no problem. And even if the PDF page-count numbers were used, the new document’s PDF page-count numbers for the main section are simply greater by 1 for the same numbered page. [updated 2007-02-11T11:33Z in a poor attempt to distinguish between the numberings PDF shows by counting the pages that are there, in sequence, and the numbers that are printed on the pages.]
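The distinction between the number printed on a page and the position a PDF viewer reports can be written as simple arithmetic. The sketch below is only an illustration: the function name and the front-matter counts are hypothetical, chosen to show that one extra physical page before the main text shifts every viewer position by exactly 1 while the printed numbers stay put.

```python
def pdf_position(printed_page, front_matter_pages):
    """Sequential position a PDF viewer shows for a main-text page.

    printed_page: the arabic number printed on the page itself.
    front_matter_pages: physical pages that precede main-text page 1.
    """
    return front_matter_pages + printed_page


if __name__ == "__main__":
    # Hypothetical front-matter counts; the later file has one more
    # physical page (the blank page after the title page).
    earlier_front, later_front = 10, 11
    for printed in (1, 250, 4000):
        shift = pdf_position(printed, later_front) - pdf_position(printed, earlier_front)
        print(printed, shift)
```

A citation by printed page number is unaffected by the extra page; a citation by viewer position needs only the constant offset.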
Differences in Part 1, Fundamentals. I know that older versions of Word can compare two documents and synthesize what are the additions and deletions between one and the other. Versions since at least Office 2003 also provide sidebar annotations that explain changes, making it easy to scroll through all of them. There are other ways to navigate from difference to difference as well. I’d seen a post that suggested that Office 2007 does this even better. So what a great test: doing some document forensics on the OOX specification itself using the current closest implementation.
I started with Part 1 for no good reason other than it was first. Using the .docx files now, the final committee draft is dated October 9 and it has 173 pages in the file. These are pages i (un-numbered cover) through viii (end of the Introduction), followed by 166 pages of the main document text.
The latest download (dated January 25, 2007 in the file itself) has 178 pages in the file. These are page i (un-numbered cover) through xii (end of Introduction), followed by 166 pages of the main document content. The tables of contents are the same, and the numbered pages of the main text are the same. I mean the same. There is no material difference in content. The pagination on the pages themselves has not changed at all. (These are all formatted for 8.5” by 11” paper, as are the PDFs, by the way.)
The difference in the front matter is the addition of two new cover sheets (with blank backs) and modification of the original Part 1 title to simply introduce the part and remove text that applied only in the TC45 draft. There are other differences. They are immaterial. They seem to be entirely involved with styles in the tables of contents, fields that produce titles over the pages, and formatting of bulleted lists. The result seems to be indistinguishable, but Word 2007 says there was a change. I saw nothing different outside of the 4 new pages of front matter and the edits to the original title page.
This is not an exhaustive comparison. Anyone who thinks they might find something significant is welcome to do more.
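For anyone who wants to do more without Word 2007, a rough text-level check can be scripted. The following is a minimal Python sketch, not the comparison described above: it assumes each .docx is a Zip package whose main text lives in `word/document.xml` under the WordprocessingML namespace shown, pulls out the paragraph text, and lists paragraphs that differ. The function names are hypothetical, and the sketch ignores styles, fields, and formatting, so it would not report the immaterial differences Word flags.

```python
import difflib
import xml.etree.ElementTree as ET
import zipfile

# Assumed main WordprocessingML namespace; adjust if a draft uses another.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_paragraphs(path_or_file):
    """Return the plain text of each paragraph in a .docx package."""
    with zipfile.ZipFile(path_or_file) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # Join the text runs that make up this paragraph.
        text = "".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t"))
        paragraphs.append(text)
    return paragraphs


def changed_paragraphs(old, new):
    """List removed ('- ') and added ('+ ') paragraphs between two versions."""
    return [line for line in difflib.ndiff(old, new)
            if line.startswith(("- ", "+ "))]
```

Usage would be along the lines of `changed_paragraphs(docx_paragraphs("october.docx"), docx_paragraphs("january.docx"))`, where both file names are placeholders. An empty result means the paragraph text is the same, which is a weaker claim than the documents being identical.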
Maybe ECMA could use stronger document controls and account for re-issues better. But there appears to be no difference and the material is on the same-numbered pages that it has been since October. I suspect ECMA might feel they are insulted, though. And rightly so.
My provisional assessment: No harm. No foul. Waste of time. On further reflection: some lessons about how to cross-reference documents and also on how to provide some document engineering controls so people know what’s what.