Welcome to Orcmid's Lair, the playground for family connections, pastimes, and scholarly vocation -- the collected professional and recreational work of Dennis E. Hamilton
2007-02-20
OOX-ODF: Is that an XML I See Before Me?

Stephane Rodriguez and I disagree about the way Word 2007 provides password protection over DOCX files, the ones in Office Open XML format. It’s a simple disagreement, although it is valuable for users and custodians of office documents to understand what it is about.

Stephane also sees the approach for Word 2007 as a falsehood about the openness of DOCX and whether Office Open XML format has been fully disclosed. We disagree about that too. I see this as a separate issue, as I explain at the end.

How Much Word Should a Password Lock if a Password Could Lock Word?

If you have a means for password protecting a document, and you can either lock the document or not, how much of the document should the software protect? Given a binary choice like that, I say the best answer is “all of it.” If this is all the information that is available, the most-secure interpretation of a user request to password-protect a document is to protect it completely.

That’s exactly what Microsoft Office Word 2007 does. If you select Office Button | Prepare | Encrypt Document, you can password protect the entire file, even though it is saved with a .docx extension. The result is completely encrypted: the file cannot be accessed as a Zip, and it does not reveal any Office Open XML structure or content. I think that’s fine.

Apparently the password encryption of the previous binary formats was “leaky,” in that the DocFile structure, and metadata within it, was still available to inspection by those who know how to examine such structures.

In OpenOffice.org’s implementation of Open Document Format, that is also the case. Encryption of the ODF document does not encrypt the entire file. The ODF file continues to be viewable as a Zip file and, although the content is not viewable directly as XML (even though the parts have .xml extensions), the manifest and the metadata are in the clear. I think that’s a bug.
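
The ODF behavior described above can be checked by reading the package manifest. The following is a minimal sketch, not OpenOffice.org's own tooling: it assumes the package is a Zip with a META-INF/manifest.xml in the standard ODF manifest namespace, and that encrypted parts are marked by an encryption-data child element. The function name `encrypted_parts` is hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by META-INF/manifest.xml in ODF packages.
MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"


def encrypted_parts(odf_bytes: bytes) -> dict[str, bool]:
    """Map each part named in the manifest to True if it is declared encrypted.

    Hypothetical helper for illustration: it only reports what the
    manifest says and makes no attempt to decrypt anything.
    """
    with zipfile.ZipFile(io.BytesIO(odf_bytes)) as package:
        manifest = ET.fromstring(package.read("META-INF/manifest.xml"))

    result = {}
    for entry in manifest.iter(f"{{{MANIFEST_NS}}}file-entry"):
        path = entry.get(f"{{{MANIFEST_NS}}}full-path")
        # An encrypted part carries a manifest:encryption-data child element.
        declared = entry.find(f"{{{MANIFEST_NS}}}encryption-data") is not None
        result[path] = declared
    return result
```

Run against the bytes of a password-protected ODF file, any part that maps to False is stored unencrypted, and the manifest itself has to be readable for the check to work at all.
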
Exposing metadata may leak important information that the person who locks the file is unaware of and does not want to become known. Exposing the manifest may also be helpful for someone who is attempting to crack the file.

Now, there’s still information to be gained simply by knowing the filename of the document and knowing its location in a file system or other repository. Sometimes even the existence of the document is too much information. Safeguards for that are unrelated to the capabilities of Microsoft Word 2007 and its default document format.

I don’t accept the use-case and requirement that Stephane puts forward. I don’t think, given a binary (locked or not locked) protection scheme, that any content management system should have access to anything in the file, and it should certainly not be able to manipulate material or make substitutions.

So, that’s the disagreement. I think the Word 2007 approach is entirely appropriate, and I think any information exposure from the stored form of the document is a bad idea. The prospect of undetected alteration is a notion that I find terrible, since it completely undermines the use of encryption as a simple way to establish the provenance of a document.

I am also not claiming any right-or-wrong about this. I favor the Microsoft Word 2007 approach (ignoring any technicalities about whether the encryption is any good or not and the general risks of relying on passwords) over one that allows information to be disclosed to a knowledgeable but casual snooper.

So Is this Office Open XML or Not?

Stephane claims this particular behavior of Word 2007 as evidence that Microsoft has not fully disclosed the default Office Open XML file format. This appears to be based on the fact that the file is still saved with a .docx extension, but it can’t be accessed as Office Open XML in the encrypted form.
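
One way to see the mismatch between the .docx extension and the stored form is to look at the first bytes of the file. The sketch below is an illustration only. It assumes that an unencrypted .docx begins with the usual Zip local-file-header signature, and that a password-encrypted Word 2007 file is wrapped in an OLE compound file. The second point is an assumption about the container that this post does not itself establish. The function name `sniff_container` is hypothetical.

```python
# Signature that opens a Zip archive containing at least one entry.
ZIP_MAGIC = b"PK\x03\x04"
# Signature that opens an OLE compound file (the older "DocFile" container).
CFB_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def sniff_container(data: bytes) -> str:
    """Classify a file by its leading bytes, ignoring the file-name extension.

    Hypothetical helper: returns "zip", "compound-file", or "unknown".
    """
    if data.startswith(ZIP_MAGIC):
        return "zip"
    if data.startswith(CFB_MAGIC):
        return "compound-file"
    return "unknown"
```

Passing in the first few bytes of a file is enough, for example `sniff_container(open(path, "rb").read(8))`. A result other than "zip" for a file named .docx means a Zip tool will not be able to open it, whatever the extension suggests.
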
I suppose they could have invented yet another file-name extension or added a modifier to the extension (the way some Unix-based compression programs operate). [In passing, I notice that when OpenOffice.org encrypts an ODF document, the encrypted parts having .xml extensions are not XML any longer. Their standard names have not been changed, although they are not readable as XML in their encrypted form.]

It’s difficult for me to be excited about this. When interchanging and interoperating with documents, especially as part of eGovernment operations, I don’t think anyone will be encrypting and password protecting those unless they’re not meant to be public or usable by other than their originators.

I also signed a Word 2007 document with a self-issued digital signature. It worked fine. The .docx can be opened as an Office Open XML Zip package and the digital signature sections are visible in the package structure. This is a specified part of ECMA-376. It serves the important purpose of ensuring that any alteration invalidates the signature, while also giving non-repudiable assurance of the provenance of the document file.

Digital signatures do represent a challenge for translations between signed OOX and signed ODF, since translation can’t preserve the original signatures. This leads to interesting provenance-preservation challenges under format conversion. We can worry about that (and possible encryption of parts, allowed in ODF but not specifically provided in OOX) later on.

Stephane’s latest comments are on a post of mine that discusses a different set of topics. The remarks are a continuation of a comment exchange that occurred on Craig Kitterman’s blog. For Stephane’s interest in my education and proper upbringing around matters of document systems, content management, and the importance of metadata there is this initial sally. I apparently invited this upon myself with this comment and others on a post at Brian Jones’s blog.
Dunce that I am, I have failed to absorb Stephane’s tutelage. In particular, this achievement did not occur: “It took me three blog comments to … make you realize you are backing liars.” I have not been made to realize that. Sorry.

Comments:

"I don’t think, given a binary (locked or not locked) protection scheme, that any content management system should have access"

Pure rhetoric. Let me ask you a question, are you a major vendor selling a CMS? If no, please accept the fact you are just venting off.

Accessing the file's metadata is not the same as accessing the actual file's content. Not only is it not the same, it so happens CMSes (including NTFS, which is sort of built-in CMS in hard drives) have been doing that for a number of decades. And all of a sudden Microsoft decides it should not happen anymore. At the very least, they are breaking a feature. I'd love to see this breaking change posted in MS blogs out there. What I see, instead, is blogs like yours which not only keep moving the argument out of context, and backing off Microsoft all along the way. Again, let me repeat that to you, Microsoft NEEDS you, they are the little guys, and their influence and impact is very small.

As for the value of what you have been posting for so much time, the first time I heard of you was through a link you added (that bad self-promotion behavior) on Brian Jones's blog almost two years ago, where you claimed to have analyzed the covenant not to sue. Well, kilometers worth of crap after that, not a human being could make the slightest sense out of what you said. I have to ask you a question at this point: are you a lawyer? If the answer is no, please accept my apologies. But if you are one, just accept the fact that you live on a different planet.

Stephane's comment reminds me that this material is seriously out-of-date. The publication of ECMA-376 and the application of the Microsoft Open Specification Promise makes comparison much simpler now.
Now I think attention is best placed on conformance and compliance conditions, the degree of underspecification in any of the formats, and how software implementations will be certified for interoperable use in particular application settings.

Isn't it funny how the nastiest comments are always anonymous? Glad to see it works this way on your blog, too. :-)

The metadata debate comes up in a slightly different context when you look at tracked changes. Should a document's change history (who changed what when) be available to future consumers of the document? There are many scenarios with varying answers to that question, and Open XML provides for two approaches: track WHAT without tracking who or when, or track all of the above. By default, Office does the former, and if you turn change-tracking on you get the latter.

But back to the topic at hand: what exactly is the breaking change you'd like for MS bloggers to blog about, Mr. Anonymous? I just returned from a visit to a major CMS vendor who mentioned how much easier it is to access document metadata in Open XML, so perhaps I could roll their perspective and yours into a post. Seriously -- I'm looking for ideas.

Regards, Doug

Doug, I think Stephane's complaint is specifically about encrypted Word 2007 .docx documents and how there is no exposure of embedded metadata (or indeed, any of the OPC structure).
template
created 2002-10-28-07:25 -0800 (pst)
by orcmid |