Welcome to Orcmid's Lair, the playground for family connections, pastimes, and scholarly vocation -- the collected professional and recreational work of Dennis E. Hamilton
2008-03-05 Document Interchange: OOXML without Macros
Rob Weir threw out a curve ball yesterday with his post on OOXML, Macros and Security. Weir points out that although ECMA-376 (and the DIS 29500 text put before ISO/IEC JTC1) identifies where user-supplied macros may be invoked in a document, it provides no guidance whatsoever on how macros are incorporated in OOXML document packages, how a processor is to recognize the macros it is being supplied, what the model for macro behavior within a document is, or how a macro expresses access to anything in the document.

1. They Meant It!

It is very clear that the omission of anything about the implementation and behavior of macros is intentional. In fact, the following sort of statement is replicated with boring regularity in the relevant passages of ECMA-376 (in this case, in Part 4 section 2.16.5 entryMacro, with duplicate words and all):
Also,
And (Part 4, section 2.18.58 ST_MacroName),
2. Instant Design, Anyone?

It is very easy to go off into instant design on all of this. First, the various passages (including those for attributes such as "vbProcedure") are not mutually consistent, and some are under-specified. Weir focuses on MACROBUTTON, a document-automation field that has two arguments (ECMA-376 Part 4 section 2.16.5.41):
In this case, it is pretty clear what field-argument-2 is about. However, there is no amplification of field-argument-1's form and usage: there is no reference to ST_MacroName, and it is not clear what "or command" means. Evidently, this lack of definiteness was noted in the balloting on DIS 29500. According to Weir, the amendment approved at the BRM (Ballot Resolution Meeting) adds the following:
which is perfectly consistent with the treatment of macros and scripts elsewhere in ECMA-376. It could use a little more wordsmithing, but the intention is clear enough.

3. Should Scripting be Interoperable?

Clearly, there is no interoperable scripting provided under ECMA-376. Rob Weir evangelizes for the importance of interoperable scripting, proclaiming:
I'd say that is accurate, if what you want is to assure interoperability among scripted documents. (I noted long ago that ODF 1.0 did not take up this challenge, although it has a nice feature in the way it avoided it, described below.) If what you want is to make sure that documents are interoperable across a variety of processors, at different degrees of conformance and implementation of the specification, I'm not so sure that macros and scripting belong in the minimum set.

Personally, I had completely overlooked that ECMA-376 leaves scripting and macros entirely up to document-processor implementations. I assumed that the documents we are most concerned about would be static (and secure from malware), neither active nor dynamic. It's just the way I have been thinking. I don't use macros; I wouldn't use macros in a document intended for broad interchange and preservation; I never looked for the macro capabilities in OOXML.

Musing on this some more, I also think it was wise, as a matter of architectural principle, to deal with scripting in this way. I have no idea how ECMA TC45 arrived at this position (although I can craft a likely story). I think it is fortunate that macros and scripting are not defined. For one thing, there is no macro capability to argue about and to want to replace with something more tolerable from a wide-interoperability perspective. For another, there is room to add specific macro and scripting support after the stakeholders in the industry and major adopters of open-standard formats take a serious look at the implications and come to some agreement on the best way to go about it. As Weir notes, there are security implications, and I would add that there are also implications for levels of conformance and for long-term preservation.

I do agree this leaves a gap that many will want filled. I think it should be figured out in a supplemental activity.
It might even be an opportunity for an ecumenical liaison between OASIS and Ecma International (if not under the direct umbrella of ISO/IEC JTC1 SC34).

4. Well, Instant Placeholder Then?

Here's a problem I have with the way macros and scripts are left undefined in ECMA-376. Suppose you build document-processing software that does support macros in an implementation of OOXML. When receiving a document, how is it possible to know that your processor's macro scheme is the one being used? There is no clean way to reconcile disconnects and collisions among different approaches to macros and scripting that may show up in received documents. Also, if you would like to accept the scripting schemes of other processors, at least as alternative forms, how do you recognize them?

It would be very much in the spirit of the Interoperability Principles to borrow a page from the ODF playbook and require unique identification of a particular implementation's approach to its application-defined macros and scripts. The way ODF 1.0 finessed spreadsheet formulas and all manner of scripting would seem to be useful here too (ODF 1.0 section 2.5, Scripts):
And (ODF 1.0 section 2.5.1, Script):
That's it. That is the provision for scripting in ODF 1.0. There is no definition of a Document Object Model (DOM) for ODF, although it is possible to make a successful guess at what it is (confirmable by playing around with OpenOffice.org). And notice the polite admission of compiled (binary, we would say) code. There is nothing more about scripts except for identifying some places where they may be used and how events might trigger them. Or, as Rob Weir adds in a comment to his post,
My mileage to "far beyond" is apparently much greater than Weir's. But I do hold that
That strikes me as a minimal interoperability principle for such cases. So, in the case of ODF 1.0, I would have wanted it to say "the name must be preceded by a namespace prefix." (This, by the way, appeals to a very old principle in some OSI standards produced under ISO.)

It is not too late for OOXML implementers to band together and simply agree to use such a scheme for ST_MacroName and the other places (e.g., user-defined functions in spreadsheets) where it may be difficult to reconcile an application-dependent feature in an interchange and interoperability setting. This could be done by simple supplemental agreement until ECMA-376 (and DIS 29500, if approved) goes to a maintenance revision. This would be a useful topic for Microsoft's forthcoming Document Interoperability Initiative, too.

5. About the Security Thing

I remember that, when Office 2007 was announced, a lot was said about how the Microsoft Office filename extensions had been separated for macro-free documents and macro-laden documents. For Office 2007, the .docx extension signifies a document package that doesn't use any macros, period. If macros are found, some sort of public hanging will be carried out by Office 2007. [The mechanism does not hinge completely on the filename. See the update at the end of this section.]
Guess what? ECMA-376 does not say anything about the naming of files. There are some examples involving file names where .docx names appear, but there is no explanation for the names and no requirement for any naming convention whatsoever. That's also true for template documents (with or without macros).

Now that I think about it, it makes sense. Other implementations of OOXML might want to use different filenames because a different processor is intended, even though there is some degree of interoperable format usage. There could be limited implementations that one would not want to confuse with full-up OOXML support. It is also convenient that Office 2007 (and its predecessors) do not rely completely on the filename extensions. The operating system uses them to launch the correct application, but the Office System also sniffs inside files to confirm which kind and form of document is being presented. (A bug in this procedure, corrected in Office 2007 SP1, prevented ODF plug-ins from being able to open their files. The sniffing didn't go far enough to realize it was not OOXML.)

I am still surprised. It makes sense, but I didn't expect it. For comparison, ODF 1.0 does specify the filename extensions used with ODF-formatted documents, whether or not they are designed to be interchangeable between two (let alone all) different ODF processors. You names your poison, you takes your chances.

Update 2008-03-06T04:53Z: Thanks to a clarification from Mauricio Ordoñez (comments below), I learned that the file extension has nothing to do with Office 2007 behavior concerning macros. The differentiation is the use of a private (that is, implementation-specific) content type for macro-enabled documents. This content type is not part of the ECMA-376 set and is not among the Office Open XML content types. What this means for ECMA-376-compliant documents is that Microsoft Office 2007 will never treat them as macro-enabled. Startling, huh?
After I thought about it for a while, I realized that it is an effective, if not altogether pleasant, way to deal with the "is it one of ours or not?" question. The private content type is a way for Microsoft Office 2007 software to know that all of those application-specific macro choices are ones designed for and implemented by it. It is a lot more ham-handed than the ODF 1.0 escape mechanism, and not very, uh, ecumenical. I could say a lot more, but I think it is best left as an object lesson for "interoperability by design" and the Interoperability Principles, going forward. The more-disturbing realization is that the .docx versus .docm distinction

Comments:

Hi Dennis, There is a lot to absorb in this article, so rather than comment on the entire piece, I just wanted to point out a fact regarding #5. You said: "Given that Office 2007 uses the file names of OOXML documents as an enforced guarantee that macros are not involved" Office does not rely on file names. There is a better way. Non-macro and macro-enabled documents use different content types for the root content part (aka the start part). Using word processing documents as an example: regular documents use 'application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml' while macro-enabled documents use 'application/vnd.ms-word.document.macroEnabled.main+xml'. Renaming file extensions has no effect at all because the contents of Open XML are sufficiently self-descriptive. The key is the Open Packaging Convention. In fact, if you create a .docm, rename it to .docx, and try to open it, Word 2007 will determine that "something fishy" is going on and will not open the file. [Note: To identify the start part in an Open XML package, you search the root for the part with a relationship type of 'http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument', not by URI]

Thanks Mauricio, That is very good to know, and I will modify the body of the post.
I couldn't remember where start parts were specified in ECMA-376, and I abandoned my search for the information.

Mauricio, Well, it is not so simple. It turns out that ECMA-376 does not have a separate content type for macro-enabled documents. I had to create an Office Word 2007 document to see it. The content type Word 2007 uses is application/vnd.ms-word.document.macroEnabled.main+xml, so I will have to find a way to reword this. There are two problems, of course. One is related to ECMA-376. The other has to do with whether or not malicious .docx documents can be easily created by malware authors (the answer seems to be trivially yes).

Nice analysis. I'm not expecting to see interoperable OOXML macros anytime soon. We have it on our agenda for ODF, but even that is in the indeterminate future. But I think interoperability at the bulk level, being able to locate, retrieve, move, rename, or delete executable code, is essential. Even if you cannot understand the internals of the code, you need to be able to do these types of bulk manipulations. Otherwise, preserving the referential integrity of a document is impossible under some pretty basic scenarios, like splitting or merging content. The other aspect of this is that the BRM passed a change intended to prevent a common kind of spoofing attack. The specific text that is added is this: "It is a requirement of this standard that dynamic extension mechanisms, such as scripting languages and macro mechanisms, shall use, for the executable parts, the correct content types, and shall not use any of the content types already defined in this standard." This targets the spoofing problem, where a macro file is given a jpg extension, for example, and then some innocent-looking script attempts to execute that.
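The BRM requirement quoted in that last comment suggests a simple consumer-side rule: decide whether a part may be handed to a script engine from its declared content type, never from its name. Here is a minimal Python sketch under that assumption. The content-type resolution (an Override for the exact part name, else the Default for the extension) follows the Open Packaging Conventions; the macro content type application/vnd.example.macro, the part names, and the function names are hypothetical stand-ins, not anything defined by ECMA-376 or used by a particular product.

```python
import posixpath
import xml.etree.ElementTree as ET

CT_NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"

# Hypothetical content type that some implementation's macro engine claims as its own.
EXAMPLE_MACRO_CT = "application/vnd.example.macro"

# A hypothetical [Content_Types].xml: .jpg parts default to image/jpeg, and one
# executable part is declared explicitly with the engine's own content type.
SAMPLE_CONTENT_TYPES = (
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="jpg" ContentType="image/jpeg"/>'
    '<Override PartName="/word/macros.bin" ContentType="application/vnd.example.macro"/>'
    "</Types>"
)


def resolve_content_type(content_types_xml: str, part_name: str):
    """Return the declared content type of a part, or None if nothing matches."""
    types = ET.fromstring(content_types_xml)
    # An Override naming this exact part takes precedence (names compare case-insensitively).
    for override in types.iter(CT_NS + "Override"):
        if override.get("PartName", "").lower() == part_name.lower():
            return override.get("ContentType")
    # Otherwise use the Default declared for the part's extension.
    ext = posixpath.splitext(part_name)[1].lstrip(".").lower()
    for default in types.iter(CT_NS + "Default"):
        if default.get("Extension", "").lower() == ext:
            return default.get("ContentType")
    return None


def may_execute(content_types_xml: str, part_name: str,
                engine_ct: str = EXAMPLE_MACRO_CT) -> bool:
    """Only a part declared with the engine's own content type is eligible to run."""
    return resolve_content_type(content_types_xml, part_name) == engine_ct
```

With this rule, code smuggled in as /word/media/photo.jpg resolves to image/jpeg and is refused, while /word/macros.bin is accepted. The check is only as good as the package's own declarations; it does not inspect the bytes of any part, so it complements rather than replaces other defenses.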