Orcmid's Lair status 
privacy 
 
about 
contact 

2008-05-19

OOXML: Excel2007 versus IS 29500? The Fun Begins

This is a hasty post.  I haven't done the research to determine whose bug this is about.  Added: I'm beginning to think it "bug" is too simple for this kind of discrepancy.

[update 2008-05-19T23:30Z: Rob Weir has been picking apart this situation, along with the benefit of having a proof copy of IS 29500.  His analysis is much deeper into the workings of Excel date-related functions and how DIS 29500final is mucked up with regard to the same functions.  His analysis is important, completely apart from his observations about the ISO and BRM processes.  My sense is that the bar has been set too low and sloppy for spreadsheet functions.  The bar must be raised and maybe the OpenFormula effort is a place to come together on that.  (My recommendation is that OpenFormula not attempt to match the Excel/IS 29500 flavors at all, avoiding even use of the same names, with rapprochement if and when IS 29500, or a sequel, are repaired.)

update 2008-05-19T22:56Z: On reflection, this is a point case of a very complex and too-subtle situation.  Here's a way to look at it apart from the details of the specific however-illuminating case:

  1. There are spreadsheet functions that seem to have obvious behavior that, in practice, are non-obvious and not well-specified.
      
  2. There's a terrific "me too" among spreadsheet implementation and naming of functions, although it is not clear that the implementations match and they are mostly (or more-so) under-specified.
      
  3. The behavior of spreadsheet functions can change from one version of a product to the next.
      
  4. Some spreadsheet functions relate to specific practices and disciplines (e.g., accounting and finance) that software developers should not be making up their own, independent definitions for.
      
  5. Some of those practices might be ill-defined in their discipline of origin too.  Whether they are or not, it is not up to software developers to come up with unilateral repairs.
      
  6. We're in a continuous learning and correction process when it comes to perfecting the definitions of spreadsheet functions (not to mention practically every human discipline that we attempt to computerize).
      
  7. Change and correction is assured.  It takes great care and patience to work that out in a non-disruptive way in a world where digital artifacts are disseminated and preserved everywhere with no way to force migration.
      
  8. The Microsoft Excel, ECMA-376, and IS 29500 (?) YEARFRAC functions provide exemplary demonstration of all of this.
      
  9. We can't stand under-specified and deviating functions in an interoperable world.  We're just learning how to remedy the situation.  David Wheeler and the others working toward an OpenFormula specification are the pathfinders.

   I'd like to say much more about this.  It will have to be added to my already-mountainous backlog.]


David Wheeler, Chair of the OpenDocument TC's OpenFormula subcommittee, has noticed a breaking change between Excel 2007 and OOXML.

There's a problem, though:

  1. Excel 2007 was shipped in 2007.  Office 2007 SP1 was shipped in December 2007.
      
  2. Wheeler is comparing Excel behavior with "Office Open Extensible Mark-up Language (OOXML), final draft ISO/IEC 29500-1:2008(E) as of 2008-05-01" which I presume is DIS 29500final or the much-awaited proof copy of IS 29500.  This document did not exist before April, 2008, and reflects changes agreed in the DIS 29500 Ballot Resolution meeting.  There were a lot of changes around the handling of calendar dates in ECMA-376 (the original DIS 29500 basis).

To be fair, one should test Excel 2007 against the December 2006 ECMA-376 (part 4), the best specification that Excel 2007 files and functions could possibly be in conformance with. 

Unfortunately, all I can establish there is that there's a bug in ECMA-376, part 4, section 3.17.7.348 YEARFRAC, where an extraneous day-count-basis argument is listed but not used.  The function is seriously under-specified concerning the ranges and acceptable separations of parameters.  In addition, there is no identified source of authoritative definitions for the methods identified by the basis argument. 

I could find no ECMA-376 description of the basis rules to compare with the one (PDF file) Wheeler has found, but ECMA-376 has some non-normative examples that indirectly answer one of Wheeler's questions.  Excel 2007 produces the results shown in the ECMA-376 examples, but it is clear from Wheeler's better edge cases that, although the ECMA-376 examples are in accord with the IS 29500 definitions, Excel does not arrive at its results by the method defined in IS 29500.  So we appear to have a bug in Excel (that may have been there for some time), a bug in the definition added to IS 29500, or both.  [This affirms for me something that I've always felt: that the apparent lack of precision and rigor in the published definitions of Excel functions (and other people's too) is a terrible thing.  I have been grateful to the OOXML standardization effort for creating conditions where that has to be remedied.]

I'm a bit confused about the ISO/IEC 29500 document that Wheeler mentions.  First, I would like to know where it can be obtained and what its status is so I and others could check this and other matters of importance.  It may be that there is a defect in the editing of this document that should be prepared, although it doesn't sound like it.  Also, there is a serious domain-knowledge question (concerning financial practices and what the agreed definitions of those basis cases are, and there seem to be contradictory definitions around) that must be answered to figure out what the methods should be.  I understand Wheeler's haste, if there is meant to be a compatible YEARFRAC in OpenFormula.

Meanwhile, this adventure and others like it are going to provide some great examples of how breaking changes from one version of a standard to another are navigated by implementations.  Along with that, we get to see what can be responsibly done in changes from one version (standard or non-standard or out-and-out defect) of an implementation feature to another while still interoperating up-level and down-level with all of the unchanged versions out there.  It has long been impossible to throw a switch and suddenly have all programs and documents aligned with a particular version of a specification.  How standards are crafted to support this reality is important. 

As Wheeler observes in his detailed documentation (PDF file) of this problem, there are many lessons to be gained here.  Wheeler also provides further details (start with the README) on discovering what the Excel implementation is, when it changed, and what some think the correct answers are.

 
Comments:
 
You might also like (or not) to read Rob Weir's further analysis of David's findings...
 
 
Thanks. My feed reader had not picked up that post yet. I find it valuable and have added a link at the top of this post, along with some other ammendations of mine.
 
Post a Comment
 
Construction Zone (Hard Hat Area) You are navigating Orcmid's Lair.

template created 2002-10-28-07:25 -0800 (pst) by orcmid
$$Author: Orcmid $
$$Date: 08-10-07 13:22 $
$$Revision: 1 $