Welcome to Orcmid's Lair, the playground for family connections, pastimes, and scholarly vocation -- the collected professional and recreational work of Dennis E. Hamilton
OOXML: Excel2007 versus IS 29500? The Fun Begins
This is a hasty post. I haven't done the research to determine whose bug this is about. Added: I'm beginning to think it "bug" is too simple for this kind of discrepancy.
[update 2008-05-19T23:30Z: Rob Weir has been picking apart this situation, along with the benefit of having a proof copy of IS 29500. His analysis is much deeper into the workings of Excel date-related functions and how DIS 29500final is mucked up with regard to the same functions. His analysis is important, completely apart from his observations about the ISO and BRM processes. My sense is that the bar has been set too low and sloppy for spreadsheet functions. The bar must be raised and maybe the OpenFormula effort is a place to come together on that. (My recommendation is that OpenFormula not attempt to match the Excel/IS 29500 flavors at all, avoiding even use of the same names, with rapprochement if and when IS 29500, or a sequel, are repaired.)
update 2008-05-19T22:56Z: On reflection, this is a point case of a very complex and too-subtle situation. Here's a way to look at it apart from the details of the specific however-illuminating case:
I'd like to say much more about this. It will have to be added to my already-mountainous backlog.]
David Wheeler, Chair of the OpenDocument TC's OpenFormula subcommittee, has noticed a breaking change between Excel 2007 and OOXML.
There's a problem, though:
To be fair, one should test Excel 2007 against the December 2006 ECMA-376 (part 4), the best specification that Excel 2007 files and functions could possibly be in conformance with.
Unfortunately, all I can establish there is that there's a bug in ECMA-376, part 4, section 18.104.22.1688 YEARFRAC, where an extraneous day-count-basis argument is listed but not used. The function is seriously under-specified concerning the ranges and acceptable separations of parameters. In addition, there is no identified source of authoritative definitions for the methods identified by the basis argument.
I could find no ECMA-376 description of the basis rules to compare with the one (PDF file) Wheeler has found, but ECMA-376 has some non-normative examples that indirectly answer one of Wheeler's questions. Excel 2007 produces the results shown in the ECMA-376 examples, but it is clear from Wheeler's better edge cases that, although the ECMA-376 examples are in accord with the IS 29500 definitions, Excel does not arrive at its results by the method defined in IS 29500. So we appear to have a bug in Excel (that may have been there for some time), a bug in the definition added to IS 29500, or both. [This affirms for me something that I've always felt: that the apparent lack of precision and rigor in the published definitions of Excel functions (and other people's too) is a terrible thing. I have been grateful to the OOXML standardization effort for creating conditions where that has to be remedied.]
I'm a bit confused about the ISO/IEC 29500 document that Wheeler mentions. First, I would like to know where it can be obtained and what its status is so I and others could check this and other matters of importance. It may be that there is a defect in the editing of this document that should be prepared, although it doesn't sound like it. Also, there is a serious domain-knowledge question (concerning financial practices and what the agreed definitions of those basis cases are, and there seem to be contradictory definitions around) that must be answered to figure out what the methods should be. I understand Wheeler's haste, if there is meant to be a compatible YEARFRAC in OpenFormula.
Meanwhile, this adventure and others like it are going to provide some great examples of how breaking changes from one version of a standard to another are navigated by implementations. Along with that, we get to see what can be responsibly done in changes from one version (standard or non-standard or out-and-out defect) of an implementation feature to another while still interoperating up-level and down-level with all of the unchanged versions out there. It has long been impossible to throw a switch and suddenly have all programs and documents aligned with a particular version of a specification. How standards are crafted to support this reality is important.
As Wheeler observes in his detailed documentation (PDF file) of this problem, there are many lessons to be gained here. Wheeler also provides further details (start with the README) on discovering what the Excel implementation is, when it changed, and what some think the correct answers are.
Thanks. My feed reader had not picked up that post yet. I find it valuable and have added a link at the top of this post, along with some other ammendations of mine.
|You are navigating Orcmid's Lair.|