Blunder Dome Sighting  
Hangout for experimental confirmation and demonstration of software, computing, and networking. The exercises don't always work out. The professor is a bumbler and the laboratory assistant is a skanky dufus.




2006-03-29

 

Modeling the Office Open XML Packaging Conventions

OpenXML Developer - Modeling OOX Packages. I've been waiting for an open forum for discussion of the ECMA TC45 Office Open XML Document Interchange Specification (now at draft 1), and especially the OOX package conventions, apparently a subset of the Microsoft Open Packaging Conventions. Now there's openXMLdeveloper.org (I hate the name, since XML is already open, but love the place). I just posted there about my interest in the packaging model and thought it would be useful to share that more widely.


I'm a protocols-and-formats standards junkie, and I started to look at how to describe a conceptual model for the Office Open XML packages. I notice that there seem to be at least three levels of abstraction involved, and while I mull on that some more I was wondering if anyone with a similar interest had any observations to share.

Here are the at-least three:

  1. Package Conceptual Model - this is the highest level of what a generic OOX package is, what its essentials are and what it carries. This does not deal with the specific OOX content, just the package itself. Getting to the carrying of office documents is a much bigger deal, with more levels of abstraction. The relationship and content-type items might not exist independently of parts at this level.
        
  2. Package Logical Model - this is in terms of the abstracted items. I'm waffling, here, on whether the [Content_Types].xml part is reflected or whether every component simply has a content-type attribute. I'm pretty certain that parts and relationships are reflected here, as is the hierarchic structure. This provides a logical, navigational representation for the conceptual OOX packages. It is independent of any particular method of storage and any particular technology for access and manipulation.
        
  3. Persisted/Serialized Storage Model(s) - this is in terms of storage-system and data-stream formats. Carrying a package in a hierarchical file system (e.g., before Zip packaging or after Zip extraction) applies, as do other possible storage abstractions. The Zip format as a serialized storage structure or data stream is another case. This level makes use of the Zip model (itself an abstraction) as a carrier; taking it all the way down to the bits can be handled below that. An important characteristic at this level is that the models all have a way to be transformed into and out of the Zip serialization. I find a constrained hierarchical-storage model easier to visualize and explain, even though the Zip serialization is the key to interchange.
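The three levels can be made a little more concrete with a small Python sketch, offered as an illustration and not as anything the specification defines. The `Package`, `Part`, and `Relationship` classes are hypothetical stand-ins for the logical model (level 2), taking the simpler of the two options in item 2: every part just carries a content-type attribute. `to_zip` and `from_zip` are the transformations into and out of the Zip serialization (level 3). The namespace URIs and the `.rels` content type are assumed from the later published ECMA-376 Open Packaging Conventions and may not match draft 1.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from xml.sax.saxutils import quoteattr

# Assumed from the published ECMA-376 OPC; draft 1 may have used other URIs.
CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
RELS_CT = "application/vnd.openxmlformats-package.relationships+xml"


@dataclass
class Relationship:
    rel_id: str    # e.g. "rId1"
    rel_type: str  # URI naming the kind of relationship
    target: str    # target part, relative to the package root


@dataclass
class Part:
    name: str          # part name, e.g. "/word/document.xml"
    content_type: str  # level-2 view: each part simply has a content type
    data: bytes        # opaque at this level


@dataclass
class Package:
    """Hypothetical level-2 (logical) view: parts plus package-level rels."""
    parts: dict = field(default_factory=dict)  # part name -> Part
    rels: list = field(default_factory=list)   # list of Relationship

    def to_zip(self) -> bytes:
        """Level 2 -> level 3: serialize into a Zip byte stream."""
        types = [f'<Types xmlns="{CT_NS}">',
                 f'<Default Extension="rels" ContentType="{RELS_CT}"/>']
        types += [f"<Override PartName={quoteattr(p.name)} "
                  f"ContentType={quoteattr(p.content_type)}/>"
                  for p in self.parts.values()]
        types.append("</Types>")
        rels = [f'<Relationships xmlns="{REL_NS}">']
        rels += [f"<Relationship Id={quoteattr(r.rel_id)} "
                 f"Type={quoteattr(r.rel_type)} Target={quoteattr(r.target)}/>"
                 for r in self.rels]
        rels.append("</Relationships>")
        buf = io.BytesIO()
        with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z:
            z.writestr("[Content_Types].xml", "".join(types))
            z.writestr("_rels/.rels", "".join(rels))
            for p in self.parts.values():
                # Zip item names carry no leading "/".
                z.writestr(p.name.lstrip("/"), p.data)
        return buf.getvalue()

    @classmethod
    def from_zip(cls, blob: bytes) -> "Package":
        """Level 3 -> level 2: recover parts, content types, relationships."""
        pkg = cls()
        with zipfile.ZipFile(io.BytesIO(blob)) as z:
            types = ET.fromstring(z.read("[Content_Types].xml"))
            for o in types.findall(f"{{{CT_NS}}}Override"):
                name = o.get("PartName")
                pkg.parts[name] = Part(name, o.get("ContentType"),
                                       z.read(name.lstrip("/")))
            for r in ET.fromstring(z.read("_rels/.rels")):
                pkg.rels.append(Relationship(r.get("Id"), r.get("Type"),
                                             r.get("Target")))
        return pkg


pkg = Package()
pkg.parts["/word/document.xml"] = Part(
    "/word/document.xml",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml",
    b"<document/>")  # placeholder bytes; level 2 never looks inside
pkg.rels.append(Relationship(
    "rId1",
    "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument",
    "word/document.xml"))
print(Package.from_zip(pkg.to_zip()) == pkg)  # True: the round trip is lossless
```

Keeping content types on the parts and generating `[Content_Types].xml` only at serialization time is one answer to the waffling in item 2; the alternative would make the content-type map a first-class item of the logical model.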

So that's what I've been thinking about as I tinker with diagrammatic ways of coming to grips with the OOX Package conventions. The reason I'm doing this is that document processing is a pet interest of mine. I'm interested in explaining and demonstrating how we use abstract levels like this to ultimately accomplish useful processing of digital documents. I think OOX Packages (and OPC even more so in some respects) are ideal choices because these open formats are going to be of great practical value as well as useful objects of study.

Dennis E. Hamilton
AIIM DMware Technical Coordinator
http://odma.info http://DMware.info

The package model is a very interesting application of Zip files as containers for more-complex related components that may carry references to each other and to material that is elsewhere.  The package allows the essential relationships (and the nature of the package components) to be determined without having to know how to process the individual parts themselves.  That makes for some interesting opportunities in document processing and in the interchange and preservation of digital assets.
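The claim that a package's essential relationships, and the nature of its components, can be determined without processing the parts themselves is easy to check. This Python sketch reads only `[Content_Types].xml` and `_rels/.rels` and reports each package-level relationship with the content type of its target; the target part's bytes are never opened. The names `describe_package` and `content_type_of` are hypothetical, as is the second (external) relationship type. The namespace URIs and the case-insensitive matching are assumed from the published ECMA-376 OPC.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET

# Namespace URIs as published in ECMA-376 OPC (assumed; draft 1 may differ).
CT = "{http://schemas.openxmlformats.org/package/2006/content-types}"
REL = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def content_type_of(types_root, part_name):
    """An Override for the part wins; otherwise use the Default for its extension."""
    for o in types_root.findall(CT + "Override"):
        if o.get("PartName").lower() == part_name.lower():
            return o.get("ContentType")
    ext = posixpath.splitext(part_name)[1].lstrip(".").lower()
    for d in types_root.findall(CT + "Default"):
        if d.get("Extension").lower() == ext:
            return d.get("ContentType")
    return None


def describe_package(blob):
    """(target, relationship type, content type) for each package-level
    relationship, reading only [Content_Types].xml and _rels/.rels."""
    with zipfile.ZipFile(io.BytesIO(blob)) as z:
        types = ET.fromstring(z.read("[Content_Types].xml"))
        rels = ET.fromstring(z.read("_rels/.rels"))
    rows = []
    for r in rels.findall(REL + "Relationship"):
        if r.get("TargetMode") == "External":
            # Material elsewhere: nothing inside the package describes it.
            rows.append((r.get("Target"), r.get("Type"), None))
            continue
        # Package-level targets resolve against the package root "/".
        name = posixpath.normpath(posixpath.join("/", r.get("Target")))
        rows.append((name, r.get("Type"), content_type_of(types, name)))
    return rows


# A tiny in-memory package; the "see-also" relationship type is made up.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("[Content_Types].xml",
               '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
               '<Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
               '<Default Extension="xml" ContentType="application/xml"/>'
               '<Override PartName="/word/document.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
               '</Types>')
    z.writestr("_rels/.rels",
               '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
               '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>'
               '<Relationship Id="rId2" Type="http://example.com/hypothetical/see-also" Target="http://DMware.info" TargetMode="External"/>'
               '</Relationships>')
    z.writestr("word/document.xml", b"opaque bytes that are never parsed")

for row in describe_package(buf.getvalue()):
    print(row)
```

Because the word-processing part is only ever named and typed, the same inspection works on a package whose content vocabulary the reader knows nothing about.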

 
Dennis, I'm with you (I think) on the three-part breakdown, but I'm confused a bit by the choice of words, especially the use of "logical". I know it's common practice to create an abstract model; I think it's required. But I'm just wondering why the abstract model is called "logical". It is about structure and relationships, and, possibly, as you suggest, some metadata. Perhaps, as we've discussed elsewhere, I'm just having a metaphor confusion. But I'm having trouble seeing, say, hierarchy as logic.

I also wonder about the persisted model. Isn't serialization just one way to persist stuff? For the sake of parallel construction isn't it better for the persistence model to be named "Package Storage Model"?
 
 
Great questions. I'm going to ponder these and make a follow-up post, but here is my initial thinking:

"Logical" is from old practice in database work and even computer architecture ("logical design"). I could just say navigational [data] model or [navigational] data model (which are still abstract and include notions of hierarchy.) I think it goes back to a logical-physical distinction and is not really logical in the sense of "logical inference." (And then there are "logical operations" and "logical data types"). I will take another look to see if there is something sharper to be done here.

I don't think I mean "Storage Model," so I need to think very hard about that one. Serialization may be ephemeral (e.g., for transmission, whether by RPC or something else) and not about recorded digital media. Also, the reference model at this level is in terms of (abstract) Zip packaging. I'll look at this one again, too, and see what the openXMLdeveloper.org folk have to say, although I think most there are concerned with using OOX and OPC, not conceptualizing it.
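The point that serialization need not mean recorded media, and that a hierarchical file system is just another carrier, can be sketched in Python. The hypothetical `explode` turns a Zip serialization into one file per item under a directory; `implode` rebuilds a Zip stream entirely in memory, as one might for transmission. The part contents are placeholders, not real OOX markup.

```python
import io
import os
import tempfile
import zipfile


def explode(blob, root):
    """Zip serialization -> hierarchical storage: one file per Zip item."""
    with zipfile.ZipFile(io.BytesIO(blob)) as z:
        z.extractall(root)


def implode(root):
    """Hierarchical storage -> Zip serialization, as an in-memory byte stream."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z:
        for dirpath, _, files in os.walk(root):
            for f in sorted(files):
                path = os.path.join(dirpath, f)
                # Zip item names always use "/" regardless of the host OS.
                arcname = os.path.relpath(path, root).replace(os.sep, "/")
                z.write(path, arcname)
    return buf.getvalue()


def items(blob):
    """The named items, {name: bytes}: what every carrier must preserve."""
    with zipfile.ZipFile(io.BytesIO(blob)) as z:
        return {n: z.read(n) for n in z.namelist()}


src = io.BytesIO()
with zipfile.ZipFile(src, "w") as z:
    z.writestr("[Content_Types].xml", "<Types/>")  # placeholder contents only
    z.writestr("_rels/.rels", "<Relationships/>")
    z.writestr("word/document.xml", "<document/>")

with tempfile.TemporaryDirectory() as d:
    explode(src.getvalue(), d)
    rebuilt = implode(d)

print(items(rebuilt) == items(src.getvalue()))  # True
```

The two Zip byte streams need not be identical (timestamps, item order, and compression can all differ); what the round trip must preserve is the set of named items, which is why the comparison is made at that level rather than on the bytes.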
 
 
Dennis, you are correct in the older usage of logical models for datasets, etc. I conveniently put that knowledge aside so I could just ask the question about the choice of words. It may be a good word to use, since it likely has a usable understanding in the software field.

The serialization question is an interesting one. Now that I read this again, I'm thinking that the logical model exists as a representation of the conceptual one. And finally, on the computers, we have a concrete realization of the logical representation. I think it would be good to, somehow, allow the physical, embodied datasets to have some kind of self-describing patch sewn on their shirts that helps convey the model they realize.

Maybe that's off base here. But for permanent usability, I'm thinking that some model info has to accompany the stored bits. At some point in the future the bits may be all that's available. (I almost said "readable" there.)
 
 
Bill, I like what you say here. And I like the last little touch. I suppose it is what is intended by providing metadata with/in our digital artifacts and it is certainly what some see XML as a means to. I always liked the OSI ASN.1 OID method of appealing to some external authoritative source. XML Namespaces can be thought of as a distributed-authority approach to the same problem, given an appropriate community praxis. Heh.
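The namespace-as-patch idea can be shown with Python's standard `xml.etree.ElementTree`, which exposes an element's namespace URI as part of its tag (`{uri}local`). This sketch reads that label off the root element and looks it up in a local table; the table and the `vocabulary_of` name are hypothetical, and the two OPC URIs are assumed from the published ECMA-376 standard. As with an ASN.1 OID, the label is only an identifier; its authority comes from whoever controls the URI, not from a registry the reader has to consult.

```python
import xml.etree.ElementTree as ET

# Hypothetical local knowledge; each URI is minted by its own authority.
KNOWN_VOCABULARIES = {
    "http://schemas.openxmlformats.org/package/2006/content-types":
        "OPC content-type map",
    "http://schemas.openxmlformats.org/package/2006/relationships":
        "OPC relationships",
}


def vocabulary_of(xml_bytes):
    """Return (namespace URI or None, description) for the root element."""
    root = ET.fromstring(xml_bytes)
    if root.tag.startswith("{"):
        uri = root.tag[1:].split("}", 1)[0]
    else:
        uri = None  # no namespace: the bits carry no model label at all
    return uri, KNOWN_VOCABULARIES.get(uri, "unknown vocabulary")


print(vocabulary_of(b'<Relationships xmlns='
                    b'"http://schemas.openxmlformats.org/package/2006/relationships"/>'))
print(vocabulary_of(b'<r:x xmlns:r="urn:example:unregistered"/>'))
print(vocabulary_of(b"<plain/>"))
```

An unrecognized URI still tells a future reader which model the bits claim to realize, even when the local table has nothing to say about it; a document with no namespace offers no such clue.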
 

 