

Microsoft ODF Interoperability Workshop

Photo: DII ODF Event 2008-07-30, Roundtable Discussions.  Front row of the squared roundtable (John Head far left, Peter Amstein second from right).

On Wednesday, July 30, a full-day workshop explored Microsoft's approach to adding Open Document Format support directly into Office 2007.  Microsoft's first built-in support will arrive with the Office 2007 SP2 service pack expected mid-2009-ish.  Attendees kicked the tires on the current pre-beta implementation (well before initial-beta availability sometime this-year-ish).  The workshop provided interaction with the ODF community on the technical approach, the challenges being faced, and the balancing act that Office-ODF interoperability requires.  By the first break, observing all of the lively conversations among the attendees, I concluded that the meeting was already a success.  The additional sessions and the evening dinner reinforced that conclusion.  I saw no downside.

In addition to providing a face to the Microsoft Office developers working on ODF implementation, the meeting provided face-to-face acquaintance among people who had only known each other through their presence on the web.  That was a special reward for me.  Here are my impressions.

1. Context
2. General Approach to Document Interoperability for Microsoft Office
3. The Devil Is in the Details
4. Impact of the Office Processing Model
5. Demonstrations and Discussions
    5.1 Microsoft Word 2007 ODF Support
    5.2 Microsoft Excel 2007 ODF Support
    5.3 Microsoft PowerPoint 2007 ODF Support
    5.4 Office Graphics Support in ODF
    5.5 Roundtable Discussions
    5.6 Meeting the People
6. What Others Are Saying

I am cherry-picking aspects of the workshop that fall in my areas of concern.  This is out of balance with the full range of topics and the discussion.  I invite you to consult additional posts on the overall event.

[update 2008-08-06T17:31Z There are a few other posts added to the list of links, with adjustments of the text where availability of the other information is pertinent. I corrected a couple of grammar slips at the same time.]

1. Context

On May 21, 2008, Microsoft announced initial built-in support of OpenDocument Format as part of the 2009 release of the Microsoft Office 2007 SP2 service pack.  Some interpreted this as favoring ODF over the updating of existing OOXML provisions to align with differences introduced in IS 29500, changes not expected until the next version of Microsoft Office.  [In my reality, inherent limitations of the current ODF specifications prevent anything close to parity with the already-substantial IS 29500 support in Office 2003-2007 until a future version of Office.  Support for evolution of ODF (standards and implementations), of OOXML (standards and implementations), and of down-up-level compatibility in the same integrated office-productivity suites will involve some fascinating and instructive evolution of ODF and OOXML support under the same roof.]

One month later (June 22, 2008), OASIS Open Document TC members were invited to the July workshop by their new member, Microsoft's Doug Mahugh.  After ensuring that all ODF TC members who desired to come had a place, Mahugh issued a general invitation on July 9.  The July 23 welcome kit provided the essential parameters:

"Welcome to the first DII [Document Interoperability Initiative] Workshop focused on our ODF implementation. ... We will be using this workshop to preview the work we've done implementing ODF ..., to give you an opportunity to try it out for yourselves, and to get your feedback on the challenges and opportunities surrounding interoperability and ODF."

Along with briefings and hands-on usage, the agenda included three full-group roundtable discussions on topics that Microsoft was grappling with.  A fourth would be added, along with free-wheeling discussions held throughout the day.  The emphasis on this being the first such workshop and the absence of non-disclosure agreements are heartening indications that a serious, open conversation is beginning.

2. General Approach to Document Interoperability for Microsoft Office

Following the initial welcome, Paul Lorimer (Group Program Manager, Office Interoperability) sketched his organization's responsibility for standards engagement across Microsoft Office, including Doug Mahugh's participation at the ODF TC and other standards bodies.  Lorimer pointed to the significant amounts of interoperability documentation, including for the binary formats and Office-related protocols, that have been produced.  Lorimer sketched the progression of the work from the start of aggressive licensing in 2003 to the Interoperability Principles and Document Interoperability Initiative of 2008.

For ODF specifically, Peter Amstein (Development Manager for Microsoft Word and ODF-implementation architect for Office) described the five guiding principles that govern ODF support in Word, Excel, and PowerPoint.  These set the priorities that apply in making trade-offs:

    1. Adhere to ODF 1.1
    2. Be Predictable
    3. Preserve User Intent
    4. Preserve Editability
    5. Preserve Visual Fidelity

Along with this prioritization, there is balancing of different interests: standards groups, corporations, institutions, government agencies, regulatory bodies, and general users.
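One way to read such an ordered list is as a lexicographic preference: a higher-ranked principle wins before a lower-ranked one is even consulted.  The sketch below is only my illustration of that reading, not anything shown at the workshop.  The option names and the strict-ordering rule are assumptions; the example trade-off (keeping a graphic editable at the cost of exact appearance) is the one described for graphics in section 5.4.

```python
# The five principles, highest priority first (from the list above).
PRINCIPLES = [
    "adheres_to_odf_1_1",
    "predictable",
    "preserves_user_intent",
    "preserves_editability",
    "preserves_visual_fidelity",
]

def rank(option):
    """Return a tuple of booleans, one per principle, in priority order."""
    return tuple(bool(option.get(p, False)) for p in PRINCIPLES)

def choose(options):
    """Pick the option whose rank tuple is lexicographically greatest.

    Assumption: priorities are strict, so satisfying a higher principle
    always outweighs any number of lower ones.
    """
    return max(options, key=rank)

# Hypothetical candidates for saving an Office 2007 graphic to ODF.
options = [
    {"name": "flatten to picture",
     "adheres_to_odf_1_1": True, "predictable": True,
     "preserves_user_intent": True,
     "preserves_editability": False, "preserves_visual_fidelity": True},
    {"name": "editable shape",
     "adheres_to_odf_1_1": True, "predictable": True,
     "preserves_user_intent": True,
     "preserves_editability": True, "preserves_visual_fidelity": False},
]

chosen = choose(options)
```

Under these assumed scores, the editable shape is chosen because editability outranks visual fidelity.  Real decisions were described as a balancing act, so a strict ordering like this is at best a first approximation.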

Doug Mahugh explained that ODF 1.1 (instead of the ISO 26300 standard for ODF 1.0) was chosen because of the accessibility additions and because current non-beta implementations are overwhelmingly for ODF 1.1.  Amstein added what I take as another important reason for starting at this level: where ODF 1.1 is ambiguous or incomplete, the Office implementation can be guided by current practice, mainly in OpenOffice.org but also in other implementations including KOffice and AbiWord.

Peter Amstein and the Microsoft Office team are reluctant to make liberal use of extension mechanisms, even though provided in ODF 1.1.  They want to avoid all appearance of an embrace-extend attempt. 

3. The Devil Is in the Details

Balancing of competing considerations is not trivial.  To illustrate the tensions involved, there's a significant challenge for Excel support of ODF spreadsheet documents: there is no way to incorporate spreadsheet formulas in ODF files without relying on an extension mechanism.  It is, of course, not a meaningful option to omit support for spreadsheet formulas in ODF spreadsheet documents.

All ODF-conformant spreadsheet implementations, including that of OpenOffice.org, must use extension mechanisms to implement their particular spreadsheet formulas.  That is the common practice in this case.  Accordingly, Excel uses a formula extension to preserve OOXML-defined Excel formulas in ODF spreadsheets.  Excel identifies the formulas as employing its extension and accepts them back from ODF input files.  In the pre-beta implementation, Excel drops any formulas based on different "foreign" extensions.  I presume the current thinking is to eventually converge on OpenFormula rather than attempt to map to any other foreign extensions, even OO.o's, in the interim.

The irony is that current OpenOffice.org implementations fail to check whether a formula conforms to its own extension or not.  OpenOffice.org will inadvertently but successfully accept some formulas produced by Excel's ODF implementation as if they were OO.o's.  [Afterthought: This happenstance is a doubtful blessing in terms of the potential for user confusion and it may fail the predictability criterion even though this is not Microsoft's problem to solve.]  When an Excel formula is successfully "OOo-injected" this way, it will be saved with identification as an OO.o-extension formula and Excel will ignore that unrecognized ODF extension when the ODF file returns.  [update: Florian Reuter has posted a defect report to OpenOffice.org based on this and other information from the workshop.]
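The round trip described above can be sketched in a few lines.  This is a toy model under stated assumptions, not the code of either product: the prefixes `msoxl:` and `oooc:` stand in for whatever identification each producer attaches to its formula extension, and every function name is hypothetical.

```python
# Assumed markers identifying each producer's formula extension.
EXCEL_PREFIX = "msoxl:"
OOO_PREFIX = "oooc:"

def split_prefix(formula):
    """Split 'prefix:=expr' into (prefix, expr); prefix is '' if absent."""
    head, sep, tail = formula.partition(":=")
    if sep and head.isalnum():
        return head + ":", "=" + tail
    return "", formula

def strict_import(formula, own_prefix):
    """Keep only formulas marked with our own extension; drop the rest."""
    prefix, expr = split_prefix(formula)
    return expr if prefix == own_prefix else None

def lenient_import(formula):
    """Accept the expression without checking whose extension it claims."""
    _, expr = split_prefix(formula)
    return expr

def export(expr, own_prefix):
    """Save a formula, marking it with the saving application's extension."""
    return own_prefix + expr

# Excel saves; the lenient consumer opens and re-saves; Excel re-opens.
saved_by_excel = export("=SUM(A1:A3)", EXCEL_PREFIX)
opened_leniently = lenient_import(saved_by_excel)
resaved = export(opened_leniently, OOO_PREFIX)
back_in_excel = strict_import(resaved, EXCEL_PREFIX)
```

In this model `back_in_excel` is `None`: the formula survived the lenient import only to be relabeled on re-save and then dropped by the strict importer.  A formula that never leaves the strict application (`strict_import(saved_by_excel, EXCEL_PREFIX)`) comes back intact.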

This provisional, hopefully-interim Excel-formula approach is an extreme case of extension conflicts; it didn't arouse much concern at the workshop.  There were other situations where the Office team takes the opposite approach and avoids extensions.  This triggered greater concern from some participants.

4. Impact of the Office Processing Model

Figure 1: Highly-schematic office-application software architecture
Peter Amstein also provided a diagram of the Office processing model, apologizing for lack of details that this audience might have wanted.  On reflection, I think his diagram was just right: it illustrates a common difficulty when productivity software supports multiple formats having different feature sets.

In my unofficial even-simpler version (fig.1), the central feature of typical document processing software is the internal, in-memory representation of the software's document model.

The User Interface (UI) provides the creation, manipulation, and viewing/presentation features that the human operator observes and controls.  It is important to recognize that all of the user-interactions with the document features involve the internal document representation.  It is that representation that supports the UI-selectable features and the visible results that are displayed.

At a lower level there are operations that can be selected by the user for transfer between the document representation and external elements (printers, scanners, and persistent storage, typically).  In the diagram, I've featured those document-processing services that transfer between internal representation and persistent storage in a variety of formats.

Depending on constraints of the internal document representation and the strength of the architectural boundary for import-export of persistent forms, the internal document representation might not reflect anything about the persistent forms that are supported.  Architecturally, this is a widely-used approach to maintaining editing performance and isolating file-format treatment in document load and save operations.

Topics I Didn't Think to Raise.  I didn't notice the implications of this for ODF interoperability in Office (and OOXML interoperability elsewhere) until ruminating the next day.  The problem arises as soon as there are multiple formats that must be coherently supported in a single implementation.  If UI features work against the internal document representation, they may have little capacity for reflecting differences in capabilities with respect to the persistent formats that are accommodated.  Typically, disparities between the internal representation's capabilities and the features of persistent forms are resolvable only when there is a transfer (either input or output) between the internal document representation and an external representation.

The first impact is that features may be lost on input.  The second impact is that features may be lost on output.  This seems straightforward until we realize that the user may rely on features that succeed with the internal representation, only to have them be degraded or lost entirely in the chosen external representation.  You can use all of the software's features while editing and end up losing some of them when saving the document.

An added complication is that users might not need to commit to any particular external representation before editing.  It is not unusual to save a single internal document in more than one format, from crudely-formatted plain-text to HTML to whatever the richest "native" external format is.  To the extent that an office-processing model maintains internal ignorance of the external formats, there may be difficult-to-mitigate user-experience consequences.
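To make the output-side loss concrete, here is a minimal sketch of an internal model that is ignorant of formats while editing and only discovers the mismatch at save time.  Everything in it is an assumption for illustration: the feature names, the capability table, and the function are hypothetical and describe no real product.

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    """Internal, in-memory model: records which features the user has used."""
    features: set = field(default_factory=set)

# Hypothetical capability table: features each persistent format can carry.
FORMAT_FEATURES = {
    "native": {"text", "styles", "tables", "conditional-formatting"},
    "odf": {"text", "styles", "tables"},
    "txt": {"text"},
}

def features_lost_on_save(doc, fmt):
    """List the features in use that the target format cannot represent."""
    return sorted(doc.features - FORMAT_FEATURES[fmt])

# The user edits freely, then saves the same document in several formats.
doc = Document({"text", "tables", "conditional-formatting"})
lost_native = features_lost_on_save(doc, "native")
lost_odf = features_lost_on_save(doc, "odf")
lost_txt = features_lost_on_save(doc, "txt")
```

Nothing is lost in the assumed native format, one feature is lost in the assumed ODF profile, and two are lost in plain text.  The same check could in principle run while editing, once a target format is known, which is the kind of earlier, more specific notice that a generic save-time warning does not give.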

Figure 2: Failure to match features between internal representation and persistent form: (top) degradation of down-level acceptance for later format, a typical import problem (OOXML into down-level Word 2003); (bottom) generic warning when exporting to a different format, a typical export problem (saving to OOXML from OO.o 2.4, Novell edition)

Speaking from Ignorance.  I have no knowledge of Microsoft Office System internals.  I can't speculate what the specific limitations might be, if any.  However, as we move toward increased interoperability where multiple productivity-software formats are supported as fully as possible in the same product, learning about degradation only at input and output will become unhelpful (fig.2).  This consideration applies not just to Microsoft Office; it applies to all products that rely on a similar separation of concerns and have an integrated internal document representation.  It will be challenging to smooth out movement among the formats without discouraging and alienating users.  [I don't believe that limiting users to a greatest-common-denominator internal representation is an option for mainstream productivity suites, even though I advocate exploring that option for cases where interoperable fidelity trumps all else.]

More Flavors of Import/Export.  We did not dig into the model at the workshop.  After providing a basic sketch of the overall model, Peter Amstein continued with an explanation of the different ways external formats are employed in the main Office 2007 applications (fig.1):

  • (bin) Originally, there was a strong correspondence between the internal representation and external representations that were essentially serializations of the internal data structures; the Office binary formats (.doc in the diagram above) are the descendants of that approach.  [This is now a legacy format, essentially tied forever to the Office 97-2003 internal model and its provisions for down-level recognition of newer features.]
  • (rtf) The RTF format was available as an interchange format.  Conversion plug-ins can operate at an interface for receiving and producing RTF.  [This is apparently the main avenue for addition of plug-ins also compatible with previous versions of Office.]
  • (markup) Transfer to and from markup languages is also supported, with OOXML and ODF fitting that case.  Office 2007 SP2 will improve the ability to add plug-ins of this kind and also choose any of them as the default format.

There were brief examples applying guiding principles to a number of specific cases other than those I've identified here.  [update: Doug Mahugh has provided an extended account of the principles and of the Model-View-Controller processing model.]

5. Demonstrations and Discussions

Here are a few highlights.  Jesper Lund Stocholm has provided more information in his sketch of the proposed approaches and the roundtable discussion.

5.1 Microsoft Word 2007 ODF Support

Microsoft Word Program Manager Amani Ahmed provided a quick demonstration of the ODF support for Word documents in its pre-beta form. 

The first example involved opening the ODF 1.1 specification, in its ODF-format version, in Microsoft Word.  The document opened directly and relatively quickly (especially in comparison with current translator plug-in solutions).  Ahmed added text to the title page and saved the modified document.  Opening the document in OpenOffice.org Writer, she scrolled around enough to demonstrate that her editing of the document had been fully preserved when saved back out to ODF by Word.  (A sharp-eyed observer noticed that the saved document was noticeably but not frightfully larger than the original.  The difference is apparently related to how the ODF styles are mapped into the Word 2007 internal representation and then brought out again in the pre-beta implementation.)

The second example consisted of a richly-formatted .docx with a number of interesting features, including defined value-entry fields.  This document also transferred to ODF with preservation of its features.

Different Strokes for Different Folks.  Chatting before the start of the workshop, ODF TC Editor Patrick Durusau reported that he often takes advantage of OpenOffice.org's ability to open Office binary-format documents.  Durusau finds the OO.o interface leaner, more intuitive, and more appealing to use.  We had been commiserating on being lost in the Word 2007 UI and not being adept enough to know the still-working keyboard shortcuts.  Watching Ahmed use Word 2007 to navigate around the ODF specification, I realized that Word 2007 is a better viewer for my specification-review work.  Using OpenOffice.org and the new Adobe Reader 9, I am frustrated by being able to follow cross-references and table-of-content links but not being able to backtrack (although I just now found and enabled the backtrack option in Reader 9).  The document map, thumbnails, and screen-reading views of Word 2007 are what I have overlooked as aids to my specification-review efforts.  Acknowledging that I may simply have failed to find the desired features in different products, it is encouraging that multiple implementations for standard formats will also expand opportunities for people to match their ways-of-working and, in particular, their individual ways of discovering the available functionality.

5.2 Microsoft Excel 2007 ODF Support

Microsoft Excel Program Manager Eric Patterson started with an .xlsx document that was saved as an ODF .ods spreadsheet document.  Opening in OpenOffice.org Calc preserved a variety of features but not all conditional formatting cases.  A simple formula transferred correctly (by accident) and the returned formula was dropped by Excel, as already discussed (section 3).

The current proposal is to allow all Excel capabilities to be used with the internal representation even though saving the document as ODF will lose some of the features.  The thinking is that a number of the special formatting and presentation capabilities in Excel 2007 are valuable to have available even though they are not preserved in the saved ODF spreadsheet.  I suppose this would be useful when preparing a printed document or when keeping it privately in .xlsx and circulating the ODF otherwise.  I'm not so sure about such niceties if they have to be re-introduced each time the ODF version returns or is re-opened.  [Afterthought: Another way to have more preservation of some advanced formatting and logic would be to paste or embed the richer Excel version into an OpenOffice.org-produced ODF document.  But such cases are already available without requiring ODF support in Office 2007.  Added 2008-08-06: Oddly, the OLE degradation cases (missing application, missing linked file, missing OLE support) work more smoothly and round-trip restoration of original functionality works too.]

5.3 Microsoft PowerPoint 2007 ODF Support

Microsoft PowerPoint Program Manager Alan Huang demonstrated two interesting aspects to the saving of PowerPoint 2007 presentations as ODF presentations.  His example included a two-level master hierarchy (themes to masters to instances), transitions, tables, slide notes, and changed templates on some slides.

The presentation appearance was preserved very well, although the master hierarchy is lost in the ODF and tables lose their "tableness."  There are mismatches in coloring and gradients, and some layout-preservation bugs are being looked into.

This is one of the places where pixel-for-pixel fidelity can be expected, and where, as John Head ably advocated, users likely won't care about standards if conformance prevents that.  The cases are complicated, especially where Microsoft Office features don't have a safe representation in ODF and where ODF features don't map well into the Office model.  I imagine there are also concerns about self-compatibility as the support for ODF evolves with future releases.

5.4 Office Graphics Support in ODF

Megan Bates, working on shared graphics and objects across the Office suite, demonstrated how graphics are being mapped to and from ODF.  Some of the features that are new in Office 2007 do not map well into ODF.  In some cases, it is proposed that loss of shape and color fidelity be tolerated in order to preserve editability in ODF applications. 

These graphics arise in Word, Excel, and PowerPoint documents and feature discrepancies will be apparent to those who rely heavily on the most-advanced aspects.

I'm not sure there was any clear proposal whether OLE-embedded objects would be moved through ODF, although it is provided for in the specification and it is also supported by OpenOffice.org.   There is also the difference between interoperating with ODF of your own origination and not that of others (a version of the spreadsheet formula situation) because the OLE binaries are necessarily introduced via ODF-tolerated extension.  [Added 2008-08-06: Oddly enough, reliance on OLE embeddings can, in these cases, provide better predictability and more gradual degradation of fidelity in the absence of corresponding support in a different application configuration.  This has to be balanced against the prospect of creating a covert channel for leakage of embedded data.]

5.5 Roundtable Discussions

The roundtables covered four topics and then some general discussion.

  • De facto vs. Written Standards
    The examples introducing this topic involved practices that operate in places where available specifications do not connect the dots well enough (e.g., in some specifics around the use of Zip, itself a mostly-documented de facto standard).
    This reminded me of the reason why WS-I is needed, although there are also problems with ill-connected dots, bugs in specs and deviations in products, and how this all gets sorted out over time.  I'm also in favor of quickly-developed implementer agreements that are published as interim/informational supplements and worked into future versions of standards when the opportunity arrives.
    There was some interesting discussion among Florian Reuter, Doug Mahugh, Patrick Durusau, Peter Amstein, Jesper Lund Stocholm, and others.  Some wondered/speculated how Symphony handles some cases in contrast to OpenOffice.org and proposed Microsoft Office operation.
  • Extensions vs. Creative Support vs. No Support
    This discussion ran the gamut between purism, finding creative ways to accomplish an odd case within provisions of the specification, or making extensions.  I don't think anyone was considering extensions that didn't fit the extension provisions of ODF.  Florian Reuter argued an interesting gracefully-failing extension technique that, on reflection, I believe could avoid breaking changes against earlier ODF specifications of namespaces, their elements and their attributes too.
  • Fundamental Application Differences
    We started out in terms of basic differences, such as differences in capacity (e.g., spreadsheet columns or rows), layout engines, what can occur inside what, and application feature sets that don't correspond between implementations or come across quite differently.  One factor that might inhibit application-software feature parity is intellectual property protection that is separate from implementation of the document format.  John Head reported that concern in regard to Excel 2007 pivot-chart functionality.
    There was some strong concern about the kinds of messages (fig.2) that come up at the end of the user's work and for which the consequences are completely beyond the ken of the user at that point.  I made one of my best-ever public growls in favor of that sentiment and against the uninformative and not-understandably-actionable messages.   
  • Usability and General Concerns
    John Head had expressed a very strong position about what happens at the level of the user and how users can understand the consequences of feature use and the level at which office software is used, not in the bowels of formats and their processing.
    There were some potential "use case" problems, such as an application being quite friendly with its own ODF but not with that of other significant implementations, because of the ways that features are sliced, extensions are introduced and recognized, and different format features are confusingly used for the same purposes between implementations.
    With regard to preserving not-understood content from another application, Peter Amstein pointed out that privacy concerns and policies of trustworthy computing make it generally undesirable to preserve content that is not understood by the recipient.  [Afterthought: It's more than privacy; tolerating an unknown payload may permit creation of a deliberate covert channel, not just accidental ones.]  Others are concerned about this reluctance interfering with successful interoperability at the user level, especially considering that there are so many ways to sneak unknown material into office-application document formats anyhow.

These discussions did not arrive at conclusions.  The purpose was to air different approaches and concerns, giving us and the Microsoft team much to think about.  The purpose was fulfilled.  I also saw some points that may be worth expanding into separate web posts and discussions.

5.6 Meeting the People

It was delightful to finally meet a large number of individuals whom I knew of; all but two I had never met in person:

  • Standards activists Patrick Durusau, Florian Reuter, and Jesper Lund Stocholm
  • Microsoft interoperability and standards folk along with our hosts Paul Lorimer and Doug Mahugh: Paul Cotton, Gray Knowlton, Jean Paoli, and Vijay Rajagopalan
  • Microsoft technology officers Oliver Bell, Vijay Kapur, and Stephen McGibbon
  • The generous Microsoft Office developers who presented, took our complaints, discussed their work, and those who hung out at the back of the room, including, besides those already mentioned, Brian Jones and Jeffrey Murray
  • John Head, Geordie Keuber, John Peltonen, Dirk Vollmar, and many other attendees that I didn't meet long enough to capture their names.

It was quite a day.

6. What Others Are Saying

The following posts provide the varied perspectives of blogging attendees and other observers:
