Welcome to Orcmid's Lair, the playground for family connections, pastimes, and scholarly vocation -- the collected professional and recreational work of Dennis E. Hamilton
Microsoft ODF Interoperability Workshop
Technorati Tags: interoperability, Microsoft, OpenDocument Format, Microsoft Office 2007, ODF-OOXML Harmonization, ODF, OOXML, Document Interoperability Initiative
On Wednesday, July 30, a full-day workshop explored Microsoft's approach to adding Open Document Format support directly into Office 2007. Microsoft's first built-in support will arrive with the Office 2007 SP2 service pack expected mid-2009-ish. Attendees kicked the tires on the current pre-beta implementation (well before initial-beta availability sometime this-year-ish). The workshop provided interaction with the ODF community on technical approach, the challenges being faced, and the balancing act that Office-ODF interoperability requires. By the first break, observing all of the lively conversations among the attendees, I concluded that the meeting was already a success. The additional sessions and the evening dinner reinforced that conclusion. I saw no down side.
In addition to providing a face to the Microsoft Office developers working on ODF implementation, the meeting provided face-to-face acquaintance among people who had only known each other through their presence on the web. That was a special reward for me. Here are my impressions.
I am cherry-picking aspects of the workshop that fall in my areas of concern. This is out of balance with the full range of topics and the discussion. I invite your consultation of additional posts on the overall event.
[update 2008-08-06T17:31Z There are a few other posts added to the list of links, with adjustments of the text where availability of the other information is pertinent. I corrected a couple of grammar slips at the same time.]
On May 21, 2008, Microsoft announced initial built-in support of OpenDocument Format as part of the 2009 release of the Microsoft Office 2007 SP2 service pack. Some interpreted this as favoring ODF over the updating of existing OOXML provisions to align with differences introduced in IS 29500, changes not expected until the next version of Microsoft Office. [In my reality, inherent limitations of the current ODF specifications prevent anything close to parity with the already-substantial IS 29500 support in Office 2003-2007 until a future version of Office. Support for evolution of ODF (standards and implementations), of OOXML (standards and implementations), and of down-up-level compatibility in the same integrated office-productivity suites will involve some fascinating and instructive evolution of ODF and OOXML support under the same roof.]
One month later (June 22, 2008), OASIS Open Document TC members were invited to the July workshop by their new member, Microsoft's Doug Mahugh. After ensuring that all ODF TC members who desired to come had a place, Mahugh issued a general invitation on July 9. The July 23 welcome kit provided the essential parameters:
Along with briefings and hands-on usage, the agenda included three full-group roundtable discussions on topics that Microsoft was grappling with. A fourth would be added, along with free-wheeling discussions held throughout the day. The emphasis on this being the first such workshop and the absence of non-disclosure agreements are heartening indications that a serious, open conversation is beginning.
2. General Approach to Document Interoperability for Microsoft Office
Following the initial welcome, Paul Lorimer (Group Program Manager, Office Interoperability) sketched his organization's responsibility for standards engagement across Microsoft Office, including Doug Mahugh's participation at the ODF TC and other standards bodies. Lorimer pointed to the significant amounts of interoperability documentation, including for the binary formats and Office-related protocols, that have been produced. Lorimer sketched the progression of the work from the start of aggressive licensing in 2003 to the Interoperability Principles and Document Interoperability Initiative of 2008.
For ODF specifically, Peter Amstein (Development Manager for Microsoft Word and ODF-implementation architect for Office) described the five guiding principles that govern ODF support in Word, Excel, and PowerPoint. These set the priorities that apply in making trade-offs:
Along with this prioritization, there is balancing of different interests: standards groups, corporations, institutions, government agencies, regulatory bodies, and general users.
Doug Mahugh explained that ODF 1.1 (instead of the ISO 26300 standard for ODF 1.0) is chosen because of the accessibility additions and because current non-beta implementations are overwhelmingly for ODF 1.1. Amstein added what I take as another important reason for starting at this level: where ODF 1.1 is ambiguous or incomplete, the Office implementation can be guided by current practice in OpenOffice.org, mainly, and other implementations including KOffice and AbiWord.
Peter Amstein and the Microsoft Office team are reluctant to make liberal use of extension mechanisms, even though provided in ODF 1.1. They want to avoid all appearance of an embrace-extend attempt.
3. The Devil Is in the Details
Balancing of competing considerations is not trivial. To illustrate the tensions involved, there's a significant challenge for Excel support of ODF spreadsheet documents: there is no way to incorporate spreadsheet formulas in ODF files without relying on an extension mechanism. It is, of course, not a meaningful option to omit support for spreadsheet formulas in ODF spreadsheet documents.
All ODF-conformant spreadsheet implementations, including that of OpenOffice.org, must use extension mechanisms to implement their particular spreadsheet formulas. That is the common practice in this case. Accordingly, Excel uses formula extension to preserve OOXML-defined Excel formulas in ODF spreadsheets. Excel identifies the formulas as employing its extension and accepts them back from ODF input files. In the pre-beta implementation, Excel drops any formulas based on different "foreign" extensions. I presume the current thinking is to eventually converge on OpenFormula rather than attempt to map to any other foreign extensions, even OO.o's, in the interim.
The irony is that current OpenOffice.org implementations fail to check whether a formula conforms to its own extension or not. OpenOffice will inadvertently but successfully accept some formulas produced by Excel's ODF implementation as if they are OO.o's. [Afterthought: This happenstance is a doubtful blessing in terms of the potential for user confusion and it may fail the predictability criterion even though this is not Microsoft's problem to solve.] When an Excel formula is successfully "OOo-injected" this way, it will be saved with identification as an OO.o-extension formula and Excel will ignore that unrecognized ODF extension on return in an ODF. [update: Florian Reuter has posted a defect report to OpenOffice.org based on this and other information from the workshop.]
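The round trip described above can be sketched in a few lines of Python. This is a toy model, not either product's code: the prefix strings `msoxl` and `oooc`, and every function name, are assumptions introduced for illustration, and the sketch ignores the real syntax differences that limit the accident to "some formulas."

```python
# Toy model of the formula round trip. All names and prefix strings here are
# hypothetical and chosen for illustration; they are not taken from Office
# or OpenOffice.org source code.
EXCEL_PREFIX = "msoxl"  # assumed marker for Excel's formula extension
OOO_PREFIX = "oooc"     # assumed marker for OpenOffice.org's formula extension


def split_formula(attr):
    """Split 'prefix:=EXPR' at the first colon into (prefix, '=EXPR')."""
    prefix, _, expr = attr.partition(":")
    return prefix, expr


def excel_save(expr):
    """Excel identifies the formula as using its own extension."""
    return f"{EXCEL_PREFIX}:{expr}"


def excel_load(attr):
    """Excel accepts its own extension back and drops foreign ones (None)."""
    prefix, expr = split_formula(attr)
    return expr if prefix == EXCEL_PREFIX else None


def ooo_load(attr):
    """OpenOffice.org, as described above, does not check the extension."""
    _, expr = split_formula(attr)
    return expr


def ooo_save(expr):
    """OpenOffice.org saves the formula identified as its own extension."""
    return f"{OOO_PREFIX}:{expr}"


def round_trip(expr):
    """Excel -> ODF -> OpenOffice.org -> ODF -> Excel."""
    in_ooo = ooo_load(excel_save(expr))
    return excel_load(ooo_save(in_ooo))
```

Under these assumptions, `excel_load(excel_save("=SUM(A1:A3)"))` hands the formula back unchanged, while `round_trip("=SUM(A1:A3)")` returns `None`: the formula survives the visit to OpenOffice.org only to be dropped when Excel sees the foreign marker.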
This provisional, hopefully-interim Excel-formula approach is an extreme case of extension conflicts; it didn't arouse much concern at the workshop. There were other situations where the Office team takes the opposite approach and avoids extensions. This triggered greater concern from some participants.
4. Impact of the Office Processing Model
In my unofficial even-simpler version (fig.1), the central feature of typical document processing software is the internal, in-memory representation of the software's document model.
The User Interface (UI) provides the creation, manipulation, and viewing/presentation features that the human operator observes and controls. It is important to recognize that all of the user-interactions with the document features involve the internal document representation. It is that representation that supports the UI-selectable features and the visible results that are displayed.
At a lower level there are operations that can be selected by the user for transfer between the document representation and external elements (printers, scanners, and persistent storage, typically). In the diagram, I've featured those document-processing services that transfer between internal representation and persistent storage in a variety of formats.
Depending on constraints of the internal document representation and the strength of the architectural boundary for import-export of persistent forms, the internal document representation might not reflect anything about the persistent forms that are supported. Architecturally, this is a widely-used approach to maintaining editing performance and isolating file-format treatment in document load and save operations.
Topics I Didn't Think to Raise. I didn't notice the implications of this for ODF interoperability in Office (and OOXML interoperability elsewhere) until ruminating the next day. The problem arises as soon as there are multiple formats that must be coherently supported in a single implementation. If UI features work against the internal document representation, they may have little capacity for reflecting differences in capabilities with respect to the persistent formats that are accommodated. Typically, disparities between the internal representation's capabilities and the features of persistent forms are resolvable only when there is a transfer (either input or output) between the internal document representation and an external representation.
The first impact is that features may be lost on input. The second impact is that features may be lost on output. This seems straightforward until we realize that the user may rely on features that succeed with the internal representation, only to have them be degraded or lost entirely in the chosen external representation. You can use all of the software's features while editing and end up losing some of them when saving the document.
An added complication is that users might not need to commit to any particular external representation before editing. It is not unusual to save a single internal document in more than one format, from crudely-formatted plain-text to HTML to whatever the richest "native" external format is. To the extent that an office-processing model maintains internal ignorance of the external formats, there may be difficult-to-mitigate user-experience consequences.
Speaking from Ignorance. I have no knowledge of Microsoft Office System internals. I can't speculate what the specific limitations might be, if any. However, as we move toward increased interoperability where multiple productivity-software formats are supported as fully as possible in the same product, learning about degradation only at input and output will become unhelpful (fig.2). This consideration applies not just to Microsoft Office; it applies to all products that rely on a similar separation of concerns and have an integrated internal document representation. It will be challenging to smooth out movement among the formats without discouraging and alienating users. [I don't believe that limiting users to a greatest-common-denominator internal representation is an option for mainstream productivity suites, even though I advocate exploring that option for cases where interoperable fidelity trumps all else.]
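To make the concern concrete, here is a minimal Python sketch of the separation described above. It rests entirely on my own assumptions: the feature names, the format table, and the functions are hypothetical, and nothing here reflects how Office or any other suite is actually built. The editing step knows nothing about formats, so loss can only be detected by comparing the internal document against a chosen format, either at save time or in an earlier check.

```python
from dataclasses import dataclass, field

# Hypothetical capabilities of each persistent format, for illustration only.
FORMAT_FEATURES = {
    "native": {"text", "tables", "conditional-formatting", "master-hierarchy"},
    "odf": {"text", "tables"},
    "plain-text": {"text"},
}


@dataclass
class Document:
    """Stand-in for the internal, in-memory document representation."""
    used_features: set = field(default_factory=set)

    def use(self, feature):
        # The UI works against the internal representation only;
        # no persistent format is consulted while editing.
        self.used_features.add(feature)


def features_lost_on_save(doc, fmt):
    """Check, before saving, which features the format cannot carry."""
    return doc.used_features - FORMAT_FEATURES[fmt]


def save(doc, fmt):
    """Return the features that survive in the persistent form."""
    return doc.used_features & FORMAT_FEATURES[fmt]
```

For a document that uses `"text"` and `"conditional-formatting"`, `features_lost_on_save(doc, "odf")` yields `{"conditional-formatting"}`, while the same check against `"native"` yields an empty set. Running such a check before the save, rather than discovering the loss afterward, is one possible mitigation; whether any real product can afford it is a separate question.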
More Flavors of Import/Export. We did not dig into the model at the workshop. After providing a basic sketch of the overall model, Peter Amstein continued with an explanation of the different ways external formats are employed in the main Office 2007 applications (fig.1):
There were brief examples applying guiding principles to a number of specific cases other than those I've identified here. [update: Doug Mahugh has provided an extended account of the principles and of the Model-View-Controller processing model.]
5. Demonstrations and Discussions
5.1 Microsoft Word 2007 ODF Support
Microsoft Word Program Manager Amani Ahmed provided a quick demonstration of the ODF support for Word documents in its pre-beta form.
The first example involved opening the ODF 1.1 specification, in its ODF-format version, in Microsoft Word. The document opened directly and relatively quickly (especially in comparison with current translator plug-in solutions). Ahmed added text to the title page and saved the modified document. Opening the document in OpenOffice.org Writer, she scrolled around enough to demonstrate that her editing of the document had been fully preserved when saved back out to ODF by Word. (A sharp-eyed observer noticed that the saved document was noticeably but not frightfully larger than the original. The difference is apparently related to how the ODF styles are mapped into the Word 2007 internal representation and then brought out again in the pre-beta implementation.)
The second example consisted of a richly-formatted .docx with a number of interesting features, including defined value-entry fields. This document also transferred to ODF with preservation of its features.
Different Strokes for Different Folks. Chatting before the start of the workshop, ODF TC Editor Patrick Durusau reported that he often takes advantage of OpenOffice.org's ability to open Office binary-format documents. Durusau finds the OO.o interface leaner, more intuitive, and more appealing to use. We had been commiserating on being lost in the Word 2007 UI and not being adept enough to know the still-working keyboard shortcuts. Watching Ahmed use Word 2007 to navigate around the ODF specification, I realized that Word 2007 is a better viewer for my specification-review work. Using OpenOffice.org and the new Adobe Reader 9, I am frustrated that I can follow cross-references and table-of-content links but cannot backtrack (although I just now found and enabled the backtrack option in Reader 9). The document map, thumbnails, and screen-reading views of Word 2007 are what I have overlooked as aids to my specification-review efforts. Although I may simply have failed to find the desired features in different products, I find it encouraging that multiple implementations for standard formats will also expand opportunities for people to match their ways-of-working and, in particular, their individual ways of discovering the available functionality.
5.2 Microsoft Excel 2007 ODF Support
Microsoft Excel Program Manager Eric Patterson started with an .xlsx document that was saved as an ODF .ods spreadsheet document. Opening in OpenOffice.org Calc preserved a variety of features but not all conditional formatting cases. A simple formula transferred correctly (by accident) and the returned formula was dropped by Excel, as already discussed (section 3).
The current proposal is to allow all Excel capabilities to be used with the internal representation even though saving the document as ODF will lose some of the features. The thinking is that a number of the special formatting and presentation capabilities in Excel 2007 are valuable to have available even though they are not preserved in the saved ODF spreadsheet. I suppose this would be useful when preparing a printed document or when keeping it privately in .xlsx and circulating the ODF otherwise. I'm not so sure about such niceties if they have to be re-introduced each time the ODF version returns or is re-opened. [Afterthought: Another way to have more preservation of some advanced formatting and logic would be to paste or embed the richer Excel version into an OpenOffice.org-produced ODF document. But such cases are already available without requiring ODF support in Office 2007. Added 2008-08-06: Oddly, the OLE degradation cases (missing application, missing linked file, missing OLE support) work more smoothly and round-trip restoration of original functionality works too.]
5.3 Microsoft PowerPoint 2007 ODF Support
Microsoft PowerPoint Program Manager Alan Huang demonstrated two interesting aspects of saving PowerPoint 2007 presentations as ODF presentations. His example included a two-level master hierarchy (themes to masters to instances), transitions, tables, slide notes, and changed templates on some slides.
The presentation appearance was preserved very well, although the master hierarchy is lost in the ODF and tables lose their "tableness." There are mismatches in coloring and gradients, and some layout-preservation bugs are being looked into.
This is one of the places where pixel-for-pixel fidelity can be expected, and where, as John Head ably advocated, users likely won't care about standards if conformance prevents that. The cases are complicated, especially where Microsoft Office features don't have a safe representation in ODF and where ODF features don't map well into the Office model. I imagine there are also concerns about self-compatibility as the support for ODF evolves with future releases.
5.4 Office Graphics Support in ODF
Megan Bates, working on shared graphics and objects across the Office suite, demonstrated how graphics are being mapped to and from ODF. Some of the features that are new in Office 2007 do not map well into ODF. In some cases, it is proposed that loss of shape and color fidelity be tolerated in order to preserve editability in ODF applications.
These graphics arise in Word, Excel, and PowerPoint documents and feature discrepancies will be apparent to those who rely heavily on the most-advanced aspects.
I'm not sure there was any clear proposal whether OLE-embedded objects would be moved through ODF, although it is provided for in the specification and it is also supported by OpenOffice.org. There is also the difference between interoperating with ODF of your own origination and with that of others (a version of the spreadsheet formula situation) because the OLE binaries are necessarily introduced via ODF-tolerated extension. [Added 2008-08-06: Oddly enough, reliance on OLE embeddings can, in these cases, provide better predictability and more gradual degradation of fidelity in the absence of corresponding support in a different application configuration. This has to be balanced against the prospect of creating a covert channel for leakage of embedded data.]
5.5 Roundtable Discussions
The roundtables covered four topics and then some general discussion.
These discussions did not arrive at conclusions. The purpose was to air different approaches and concerns, giving us and the Microsoft team much to think about. The purpose was fulfilled. I also saw some points that may be worth expanding into separate web posts and discussions.
5.6 Meeting the People
It was delightful to finally meet so many individuals whom I knew of; all but two I had never met in person:
It was quite a day.
6. What Others Are Saying
The following posts provide the varied perspectives of blogging attendees and other observers: