Blunder Dome Sighting  
 
 
 

Hangout for experimental confirmation and demonstration of software, computing, and networking. The exercises don't always work out. The professor is a bumbler and the laboratory assistant is a skanky dufus.




2008-09-12

 

Cybersmith: IE 8.0 Mitigation #1: Site-wide Compatibility

I have been experimenting with Internet Explorer 8.0 beta 2 enough to realize that all of my own web sites are best viewed in compatibility mode, not standards mode.  I find it interesting that other browsers, such as Google Chrome, apparently apply that approach automatically, suggesting to me that the IE 8.0 standards mode is going to cause tremors across the web.

The first step to obtaining immediate, successful viewing under IE 8.0, as well as older and different browsers, is to simply mark all of my sites as requiring compatibility mode.  That is the least activity that can possibly work.  It provides tremendous breathing room for being more selective, followed eventually by substitution of fully-standard versions of new and heavily-visited web pages on my sites. 

Some pages may remain perpetually under compatibility mode, especially since the convergence of web browsers around HTML 5 support will apparently preserve accommodations for legacy pages designed against non-standard browser behaviors.

This post narrates my effort to accomplish site-wide selection of compatibility mode by making simple changes to web-server parameters, not touching any of the web pages at all.

  0.  The Story So Far
  1.  The Simplest First Step That Can Possibly Work
  2.  Satisfying the Prerequisites
  3.  Experimental Approach for Confirming mod_headers Operation
  4.  Web Deployment Approach
  5.  Authoring the .htaccess File
  6.  Deploying the .htaccess File
  7.  Confirming .htaccess Success
  8.  Shampoo, Rinse, Repeat
  9.  Tools and Resources

See also:
2008-08-30 Orcmid's Lair: Interoperability: The IE 8.0 Disruption for the situation and the basic approach
[undated] MSDN Library: Defining Document Compatibility, Internet Explorer 8.0 beta, preliminary
[undated] MSDN Library: Implementing the META Switch on Apache, preliminary
2008-08-28 Hanu Kommalapati: Apache httpd configuration for IE7 standard mode rendering in IE8 (via Bruce Kyle)

0. The Story So Far

On installing Internet Explorer 8.0 beta 2, I confirmed that none of my web sites render properly using the default standards-mode rendering.  However, my sites render as designed for the past nine years if I view them in compatibility mode.

Although I want to move my high-usage pages to standards-mode over time, I don't want users of Internet Explorer 8.0 to have to manually select compatibility mode when visiting my sites and their blog pages.

What I want now is the simplest step that will advertise to browsers that my pages are all to be viewed in compatibility mode.  This will direct the same presentation in IE8.0 as provided by older versions of Internet Explorer and by current browsers (such as Google Chrome) that don't have the IE8.0 standards mode.   I can then look at a gradual migration toward having new and high-activity pages be designed for standards-mode viewing while other pages may continue to require compatibility mode indefinitely.

1. The Simplest First Step That Can Possibly Work

There are ways to have web sites define the required document compatibility without having to touch the existing web pages at all.  If I am able to accomplish that, I will have achieved an easy first step:

  1. Have a complete site automatically set to be browsed in compatibility mode (EmulateIE7, in my case), buying time to provide finer grain solutions.

This is accomplished by convincing the web server for my sites to insert the following custom-header line in the headers of every HTTP response that the server makes:

X-UA-Compatible: IE=EmulateIE7

The HTTP-response lines precede the web page that the web server returns.  The browser recognizes all lines before the first empty line as headers.  Everything following that empty line is the source for the web page.  The browser processes the headers it is designed to recognize and ignores any others.
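
The split between headers and page source can be sketched in a few lines of Python.  This is a minimal illustration, not how any browser is implemented: the function name `split_response` and the sample response text are made up for the example, and real responses have complications (repeated headers, binary bodies) that are ignored here.

```python
def split_response(raw: str):
    """Split a raw HTTP response into (status_line, headers, body)."""
    # Everything before the first empty line is header material.
    # HTTP lines end in CRLF, so the empty line shows up as CRLF CRLF.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    status_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalize them for lookup.
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


# A made-up response, for illustration only.
raw = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "X-UA-Compatible: IE=EmulateIE7\r\n"
    "\r\n"
    "<html><body>Hello</body></html>"
)

status, headers, body = split_response(raw)
print(status)
print(headers["x-ua-compatible"])
```

A client that does not know the `X-UA-Compatible` header simply never looks it up, which is why the extra line is harmless to other browsers.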

You can see the headers returned as part of an HTTP request by using utilities such as cURL and WFetch.  Here are the headers from my primary web site using a show-headers-only request via the command-line tool, cURL:

Headers from http://nfoCentrale.com/

2. Satisfying the Prerequisites

The MSDN article on Defining Document Compatibility describes site-wide compatibility control for two web servers: Microsoft Internet Information Services (IIS) and Apache HTTP Server 1.3, 2.0, and 2.2.  For the Apache case, these are the prerequisites and where I stand on each:

  • Have direct access to the folders of web pages on my web server.
    Satisfied:
    I have FTP access to the complete set of directories that hold web pages for my sites.
      
  • Determine that my Apache web server is one of those discussed.
    Satisfied:
    My web-site hosting service uses Apache HTTP Server on Linux.  When I check the administrator control panel provided by the hosting service, I discover that I am on Linux kernel 2.6.9 and Apache version 2.2.9 (Unix).  The Apache 2.2 requirement is satisfied.
      
  • Determine that I can set server-level response headers.
    Not Satisfied:
    My sites (nfoCentrale.com and all domains that are added-on and shared under it) are on a shared server.  I do not have access to the overall server nor to the httpd.conf configuration file.

The Apache 2.2 Module mod_headers documentation and Hanu Kommalapati's example describe how directory-level response headers can be specified instead.

  • Determine that I can set directory-level response headers.
    Not Clear:
    I know that I can create .htaccess files everywhere on my sites.  I don't know whether mod_headers is installed in my server configuration.  This list suggests that the option is not available, and there might not even be dynamic loading of extensions (mod_so).   The easiest way to confirm the actual state of affairs is by experiment.
      
    To Be Verified: I also submitted a support ticket; the hosting company responded immediately.  The answer is, yes, mod_headers is supported on my server.  Great!  Now to see it working.

3. Experimental Approach for Confirming mod_headers Operation

  1. I want to introduce an .htaccess file that has the following content:

    <IfModule headers_module>
    Header set X-UA-Compatible: IE=EmulateIE7
    </IfModule>
      
  2. I will try it out on a site that is a placeholder with no meaningful content and no visitors other than myself: eoware.org.
     
  3. The current state of the site has the following default result when I direct IE8.0 beta2 to http://eoware.org:
     
    Unaltered eoware.org site as seen by IE8.0 beta 2 (default standards-mode) 
      
    This portion of the default page is shown in default standards-mode view.  Notice the broken-page button to the left of the refresh button.  This indicates that a compatibility-mode view is available.
      
    This rendering is not in agreement with how the page was designed (and not touched since 2006).  Notice the color of the horizontal line, and the (added) spacing in the right-justified revision-history information lines.
       
  4. When compatibility view is selected, the page appears as it was designed (non-standard as it is):
      
    Unaltered eoware.org site rendered in IE8.0 beta 2 compatibility view
        
    The compatibility button is depressed, the horizontal line is the proper deep-blue color, and the spacing of the right-justified revision-history information is as expected.
      
  5. The desired state is as in (4) but with no action needed from the user and with the compatibility-view button not presented.  This will confirm that, for this site, the EmulateIE7 mode has been specified and automatically honored by the browser.  Other browsers (such as the initial Google Chrome release and older versions of Internet Explorer) will default to this view regardless; the custom HTTP header is harmlessly ignored by older browsers.
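
For readers without an Apache server handy, the effect of the header can be tried against a local stand-in.  The sketch below is an assumption-laden substitute, not the Apache setup used in this post: it relies on Python's standard `http.server` module, and the names `CompatHandler` and `make_server` and the placeholder page are hypothetical.  It adds the same `X-UA-Compatible` header to every response, which is what the .htaccess directive asks mod_headers to do.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class CompatHandler(BaseHTTPRequestHandler):
    """Serve one fixed page, adding the compatibility header to every response."""

    PAGE = b"<html><body>placeholder page</body></html>"

    def _send_head(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(self.PAGE)))
        # The same header line that the .htaccess file asks Apache to add.
        self.send_header("X-UA-Compatible", "IE=EmulateIE7")
        self.end_headers()

    def do_HEAD(self):
        # Headers only, as in a show-headers-only request.
        self._send_head()

    def do_GET(self):
        self._send_head()
        self.wfile.write(self.PAGE)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def make_server(port: int = 0) -> HTTPServer:
    """Bind to localhost only; port 0 lets the operating system pick a free port."""
    return HTTPServer(("127.0.0.1", port), CompatHandler)
```

Calling `make_server(8000).serve_forever()` serves the page at http://127.0.0.1:8000/ until interrupted; port 8000 is an arbitrary choice.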

4. Web Deployment Approach

Having direct FTP access to the web-page folders on my server, along with an out-of-the-way place to try out the change, makes a direct approach relatively safe.   Since I am placing an .htaccess file where there presently is none, it is feasible to (1) upload the file, (2) see if it works, and (3) quickly delete it if there is any failure.  Having succeeded in introducing .htaccess files for other purposes, I'm confident I can make the change correctly.

I'll not do it that way.  Instead, I will rely on my web-deployment-safety model and take advantage of the safety net it affords, even though I could do without it if all I wanted was experimental confirmation.  This is a cybersmith post, and I want to illustrate a disciplined approach that has more flexibility in the long run.  To see the result that could have been attained by using the direct approach, you can peek ahead to section 6, below, and the image just above it.

Here is the structure of safeguards that I employ to control updates to my sites, keep them backed up, and also have a way to restore/move some or all of the sites.  I can also roll-back changes that are incorrect or damaging.  I can repair a corrupted site too (and I have had to do that in the past).

  1. The hosted-site web-server pages are maintained using FTP between the server and a private development server which holds a complete image of the hosted-site content.  This mirror is in the file system of the development server (an old lap-top running Windows XP and connected on my household LAN for duty as a light-weight server).   Synchronization between the site and the mirror is accomplished by FTP in either direction, depending on where new content first appears (on the server for Blogger web logs, on the development server for manually-authored pages and their updates).
      
    Basic Site Deployment Setup.  The hosted-site is kept synchronized by FTP with a file-system image (left) on a private development server. The content of http://nfocentrale.com is mapped to the folder structure starting at public_html. The Visual SourceSafe project $/A2HostingWeb provides source-code management, versioning, and backup of the public_html image (right).
    Left: The hosted-site is completely mirrored in the file system of the development server.
    Right: VSS Project $/A2HostingWeb provides source control over the hosted-mirror public_html directory.

        
  2. In addition to being separately backed-up, the hosted-site image corresponding to http://nfocentrale.com is also the working folder of a Visual SourceSafe project, $/A2HostingWeb.  VSS holds the current and also older versions of site material.  VSS is also the source of newly-authored material that is ready to be added to the image and published to the hosted-site itself.
       
  3. At the web-site, the eoware folder under public_html (the http://nfocentrale.com top level) has also been defined as the top level of the eoware.org domain.  This is accomplished by use of the hosting-service administrative control panel to create an add-on domain (of an already-leased domain name) at a particular sub-folder of my main site.  This feature is a key factor in my choice of the particular hosting service.   In addition, the .htaccess file that I already have in public_html causes any access directly to the eoware/ folder to be redirected to use the http://eoware.org URL.  This forces the correct address in the browser for use in book-marking and in search results.  These two cURL requests demonstrate the mapping:
      
    The nfocentrale.com/eoware folder is the home location of eoware.org; access to the folder automatically switches to the eoware.org domain.
      
  4. To use this deployment path, I need to author the desired .htaccess file and have it checked-into VSS under the $/A2HostingWeb/eoware/ project.  To deploy the new material, the content is exported (using Get Latest Version ...) to the file-system site-image location on the development server computer: c:\publicca\A2hosting\public_html\eoware\.  Transfer to the web site is by FTP synchronization of the site-image eoware\ folder and the corresponding hosted-site folder.
      
  5. Although this is a roundabout way to make this individual page, it maintains the practice of not ever authoring directly in the deployment path except under serious emergency.  In the absence of an emergency, all development-site authoring occurs elsewhere and is brought under VSS management first.  New and changed material flows from VSS to the site image to the web site.

    Some new material and changes do originate on the hosted site first.  Typical site-first materials include blog posts originated from Blogger and Windows Live Writer, wiki pages (whenever that day comes), and data gathered from web-page forms.  Those materials are also periodically synchronized from the hosted-site to the site image and then checked into VSS, keeping a complete site image on the development server under VSS management and for backup.

5. Authoring the .htaccess File

If I were simply making an .htaccess page, I would create it directly in a text editor, producing a file such as that created in step 4, below.  That page would be saved at a convenient local-machine location and then transferred to the hosted-site using FTP, leading to the result in section 6.

To preserve my development and deployment model, I require more steps. 

  1. Although I also have a development web site on the development server, it is an IIS server, not an Apache HTTP Server.  The IIS server, FrontPage 2003, and the FrontPage Server Extensions that I use to perform web-page authoring do not permit editing of Apache .htaccess pages and their automatic inclusion under source-code management. 
       
    I cling to this web-page authoring model not merely because I have used it since 1998, although that's reason enough.  I place high value on the development web-site pages being kept under automatic source-code management using Visual SourceSafe.  I use the sharing feature of VSS to share the development-site web page content into the $/A2HostingWeb project too.  This permits staged synchronization between the development site, the hosted-site image, and indirectly the hosted site itself, and vice versa.  (For those wondering what's on the quiz, I call this a hybrid Microsoft Site Server model, because it works in both directions to also capture authored material that arises initially on the hosted site.)
      
  2. Because .htaccess files and other Apache-site administration files need to be introduced under source-code management some other way than via FrontPage and my development web site, there's an administrative skeleton project, $/A2HostingAccount, that mimics the folder structure of the hosted-site image just enough to carry any administrative files. 
      
    Working Project for Authoring Administrative Files. Project $/A2HostingAccount holds administrative pages that are shared to the hosted-site via $/A2HostingWeb (left). Working folders under the local c:\MyProjects\A2HostingAccount directory provide check-out and editing of the .htaccess file (right).
    Left: VSS $/A2HostingAccount with the eoware/.htaccess addition.
    Right: MyProjects\A2HostingAccount Working Folder with eoware\.htaccess for authoring.

      
    The $/A2HostingWeb folder structure is replicated in $/A2HostingAccount only to the levels where .htaccess files appear.  The same .htaccess file is shared between a folder of $/A2HostingAccount and the same folder of $/A2HostingWeb.  In this way, $/A2HostingWeb has source-code management of the entire hosted-site image and $/A2HostingAccount provides a view that is limited to the administrative material.
      
  3. Because I already have some .htaccess pages for my server, I have lifted an existing one (from nfoworks) by sharing it to eoware, branching it so that changes don't reflect back to nfoworks, and sharing the newly-branched one to $/A2HostingWeb/eoware/ (so that changes will also be seen in that project).  This pattern will be re-used to quickly make more .htaccess pages for the root folders of my remaining web sites that need one as part of IE8.0 mitigation. 
      
  4. The customization of the new .htaccess file is accomplished by performing a VSS Get Latest Version ... operation of the eoware project to the working folders on my local machine (on the right, above).
      
    The next step is to open the local .htaccess copy in a VSS-aware editor and check out the file (or check it out beforehand by some other means).  The checked-out version is edited to customize it for the new destination, adding the IE8.0-mitigating mod_headers instructions.
      
    After editing is complete, the changed file is saved to disk and checked-in via VSS control.
      
    This is the completed result:
      
    The eoware.org .htaccess after editing and check-in

6. Deploying the .htaccess File

Having edited the .htaccess file on my development machine and checked it into VSS (on the development server), the next steps are all conducted on the development server:

  1. Using VSS on the development server, I perform a Get Latest Version ... operation on the $/A2HostingWeb/eoware project.  This delivers the newly-customized .htaccess file to the hosted-site image, the working folders for $/A2HostingWeb there.
      
  2. Using WS_FTP Pro on the development server, I connect to the hosted site.
     
  3. After connection, I drag the hosted-site image c:\publicca\A2Hosting\public_html\eoware\ folder to the connected-site /public_html folder in the WS_FTP connecting- and connected-site panes.  The eoware/ folder of the connected site is updated with any new or more-recent files from the hosted-site image.  I see that the .htaccess file is now there.

This is the same deployment procedure for updating of any of my individual sites under the hosted-site.  A script for it would be useful.  This is on my someday-not-now list.  Scripted or not, this is the basic procedure.
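
A starting point for such a script might look like the following sketch.  It covers only the decision about which files to transfer (new or more recent ones), under the assumption that modification times are available for both sides.  The function names and the dictionary-based listings are hypothetical, and both fetching the remote listing and performing the FTP transfer are left out.

```python
from pathlib import Path


def local_listing(root: Path) -> dict:
    """Map each file's path relative to root (POSIX-style) to its modification time."""
    return {
        p.relative_to(root).as_posix(): p.stat().st_mtime
        for p in root.rglob("*")  # includes dot-files such as .htaccess
        if p.is_file()
    }


def files_to_upload(local: dict, remote: dict) -> list:
    """Return the paths that are missing remotely or newer locally, sorted."""
    return sorted(
        path
        for path, mtime in local.items()
        if path not in remote or mtime > remote[path]
    )


# Made-up listings: the new .htaccess exists only in the site image.
image = {".htaccess": 200.0, "index.htm": 100.0}
hosted = {"index.htm": 100.0}
print(files_to_upload(image, hosted))
```

Comparing by modification time mirrors the "new or more-recent" rule described in step 3; a more careful script would probably also compare sizes or checksums.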

7. Confirming .htaccess Success

Assuming that the .htaccess introduction has not derailed the server, confirmation of the parameters and their success is straightforward:

  1. Access to eoware.org with the cURL utility should reveal the custom header in the response.  There it is:
     
    cURL request for eoWare.org headers confirms EmulateIE7
      
  2. Next, access to http://eoware.org/ with IE 8.0 beta 2 should provide a different experience.  And here that is:
      
    EmulateIE7 removes the Compatibility-View button and provides the compatible rendering
      
    This access automatically provides the compatibility view and there is no Compatibility View button.  (Compare with the second result in section 3.)
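
The header check in step 1 can also be approximated in Python.  This is a minimal sketch, assuming the standard `http.client` module and plain HTTP on port 80; the helper names `fetch_headers` and `emulates_ie7` are made up here, and the comparison is deliberately strict about the one value this post uses.

```python
import http.client


def fetch_headers(host: str, path: str = "/", port: int = 80) -> dict:
    """Send a HEAD request and return the response headers, names lowercased."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        return {name.lower(): value for name, value in response.getheaders()}
    finally:
        conn.close()


def emulates_ie7(headers: dict) -> bool:
    """True when the headers ask for the EmulateIE7 compatibility mode."""
    value = headers.get("x-ua-compatible", "")
    # Ignore spacing and letter case when comparing the header value.
    return value.replace(" ", "").lower() == "ie=emulateie7"
```

For example, `emulates_ie7(fetch_headers("eoware.org"))` reports whether the site returns the header at the time of the request; what a live site answers today may differ from what it returned in 2008.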

8. Shampoo, Rinse, Repeat

We've demonstrated that the .htaccess customization works correctly on my web site and provides the desired result on a little-used URL that is a placeholder for work yet to come.  After this cautious effort, it will be straightforward to add similar .htaccess files to each of the individual sites implemented on the hosted-site.  

Before that, I will first add the custom-header response to the .htaccess file that is already at public_html, the root of the main site, http://nfocentrale.com.   This provides the custom header for all access. 

Once all site access returns the custom HTTP header, I can then take my time determining how to work toward migrating sections of web sites to pages that view properly in IE 8.0 standards mode.  That will be accounted for as additional mitigation steps.

9. Tools and Resources

The following tools were used in this mitigation step:

  • cmd.exe, the Microsoft Windows XP command-line console shell
      
  • HyperSnap 6, for screen-shot diary and demonstration of the mitigation step
      
  • cURL 7.15.4, command-line HTTP request and response capture tool
      
  • Apache HTTP Server 2.2.9 (Unix), running as a shared server on a Linux web-hosting system
      
  • Windows XP computer for desktop development-site authoring
     
  • Windows XP computer providing the development web server (IIS), a co-located source-control database, and the hosted-site image for FTP synchronization with the hosting-service site
      
  • Visual SourceSafe 6.0d database on the same computer as the development web server
     
  • WS_FTP to synchronize directories without transferring unchanged material
      
  • jEdit 4.3 for editing the .htaccess page.  Any basic text editor can be used.  I use jEdit because (1) I already use it for other things, (2) it integrates with Visual SourceSafe, and (3) it provides syntax highlighting for .htaccess files.


2008-09-10

 

DMware: OK, What's CMIS Exactly?

There's a nice flurry of interoperability news today, announcing the Content Management Interoperability Services (CMIS) Specification sponsored by EMC, IBM, and Microsoft, with the participation of other content-management vendors, including Open Text. [update: There is extensive coverage on Cover Pages. I recommend that as the comprehensive source.]

Content/Document-Management Integration/Middleware Scheme for This Century?

The stratospheric view from Josh Brodkin suggests that CMIS is a means for cross-over between different content-management regimes as well as bridging from content-aware applications to content-management systems.

The SharePoint Team describes CMIS as an adapter and integration model for access from content-aware applications in a CMS-neutral way, relying on distributed services via SOAP, REST, and Atom protocols.

The 0.5 draft specification (2008-08-28 6.64MB Zip File download) provides a core data model for expression of managed repository entities, with a loosely-coupled interface for application access to repositories via that model:

"The CMIS interface is designed to be layered on top of existing Content Management systems and their existing programmatic interfaces. It is not intended to prescribe how specific features should be implemented within those CM systems, nor to exhaustively expose all of the CM system’s capabilities through the CMIS interfaces. Rather, it is intended to define a generic/universal set of capabilities provided by a CM system and a set of services for working with those capabilities."

It appears that a wide variety of service integrations are possible, although the basic diagram has the familiar shape of an adapter-supported integration on the model of ODBC (and TWAIN and ODMA). Although that's the model, the integration approach is decidedly this-century, relying on relatively-straightforward HTTP-carried protocols rather than client-side integration. Clients must rely on the Service-Oriented Interface, and there is room for provision of client-side adapters to encapsulate that. Either way, this strikes me as timely and very welcome.

CMIS Integration Model featuring Service-Oriented Interface [via EMC]

Next Steps

The authors have been working for two years to arrive at the draft that will now be submitted to OASIS, estimating that it will take another year to finalize a 1.0 version. [Such a Committee Draft would then go through some rounds of review before promulgation as an OASIS Standard.]

I thought, at first, that this was some form of off-shoot from the AIIM Interoperable ECM (iECM) Standards Project, yet there is no hint of that in the CMIS materials nor on the iECM project and wiki pages.

Announcement of the Proposed TC has just appeared at OASIS [afternoon, September 10]. OASIS Members will make any comments on the proposed charter by September 24, after which there will be a call for participation and then an initial meeting. OASIS members who want to participate in the TC can sign up after the call for participation. The initial meeting is provisionally targeted for a November 10 teleconference. The first face-to-face meeting is planned for three days of mid-January in Redmond. You can follow the charter-discuss list here to see whether there are any questions about the charter, scope, and overlaps with other efforts.

Announcements, Commentary, and Resources


[update 2008-09-12T08:09Z I am having trouble getting Blogger to push updates through FTP to my site. This repost is an attempt to get the previous changes posted.
update 2008-09-11T19:12Z Well, added some interesting links as deeper analysis and pontification arise. I don't expect to add more unless Dare or Tim Bray chime in.
update 2008-09-11T15:43Z Use full-size CMIS diagram from EMC (via Cover Pages).
update 2008-09-11T15:35Z Repair handling of images and attract attention to the Service-Oriented Interface notion employed in the CMIS diagram.
update 2008-09-11T15:17Z Add link to comprehensive Cover Pages compilation. I'm also praying that Blogger's FTP update succeeds sometime in the proximity to my submitting the post.
update 2008-09-11T02:37Z Add links to additional resources at EMC and add images/videos to the page.
update 2008-09-11T01:15Z Added more links and information about the OASIS Proposed Charter for the CMIS TC.]


2008-09-07

 

Document Interoperability: The Web Lesson

[update 2008-09-08T00:24Z Cross-posted from Pursuing Harmony because of the overlap with convergence of HTML, web standards, and the IE 8.0 mitigation that is touched on here.]

"are there alternatives to google groups search for searching old USENET messages? because groups date fielded search is teh broken."

-- Richard Akerman on Twitter, 2008-08-31 

Be prepared for a dramatic shift in the reality of web-site browsing and the honoring of web-page standards.   The pending release of Microsoft Internet Explorer 8 is going to put the reality of web standards and their loose adherence in our faces.  Although Internet Explorer is indicted as the archetypical contributor to disharmony on the web, Internet Explorer 8 is going to challenge all of us to deal with the reality of our mutual contribution to the current state of affairs.

Here is a lesson, probably many lessons, for document interoperability and the way that standards for document formats evolve and harmonize, or not, over time.

The Web as Clinical Science

The movement from loosely-standard pages and their browsing to strictly-standard pages and standards-mode browsing will illustrate every aspect of the same challenge for office-productivity documents and the office suites that process them. 

Web pages are the experimental drosophilae of digital documents.  All aspects of dynamic convergence on standards, themselves evolving, and the forces of divergence, are demonstrated clearly and rapidly.  I expect it to take Internet generations for significant convergence, with no static level of standards adherence anywhere in sight.  It took us almost 20 years to get to this point on the Web; I figure it will take at least five more to dig out of it far enough to claim that there is a standards-based web in existence and in practice.  I'm optimistic, considering that HTML 5, the great stabilization, is not expected to achieve W3C Recommendation status until 2012.

No document-interoperability convergence effort is anywhere close to the promising situation of the web as Internet Explorer 8, HTML5 implementations, and other compatibility-savvy browsers roll out over the next several years.  It is useful to use that situation to calibrate how convergence and interoperability could work for document interoperability.  There are significant technical barriers.  The non-technical barriers are the most daunting.  That should be no surprise.

Versioning in Document Use

I've written on Orcmid's Lair about the IE 8.0 Disruption.  This involves changes in Internet Explorer 8.0 by which web pages are rendered in standards-mode on the assumption that pages are conformant with applicable web standards.  In the past, it was presumed that pages were loosely-standard and browsers, also loosely-standard, made a kind of best effort to present the page.  The consequences have been explained marvelously in Joel Spolsky's post on Martian Headsets.

We are similarly relying on document-format standards as a way to provide for many-to-many interchange and interoperability between different (implementations of versions of) document-format standards and different (implementations of versions of) processors of those digital documents.  That means we have a version of the loosely-standard documents with loosely-standard processing problem.  We can't be strictly standard because the standards can't (and definitely don't) have strict implementations at the moment; and there are many ways that specifications and implementations have been kept loose by design.  Accompanying that looseness by design is the simple fact of immaturity among the contending document-format standards for office applications, particularly as vehicles for interoperable applications.

For office-productivity documents as we know and love them, there are five, count 'em five "official standards." 

The "Official" Public Standards of Office Documents

For Office Open XML Format (OOXML), there is the ECMA-376 specification of December 2006.  There is also the ISO/IEC 29500:2008 Office Open XML File Formats standard once it is made available.  IS 29500 will have some substantive differences from ECMA-376.  We won't have a solid calibration of the differences until the IS 29500 specifications are available and subject to extensive review.

For the OpenDocument Format, there is the Open Document Format for Office Applications (OpenDocument) v1.0 OASIS Standard issued 1 May 2005.  There is also the ISO/IEC 26300:2006 Open Document For Office Applications (OpenDocument) v1.0 standard (also on the publicly-available listing).  IS 26300 is for the same format as the OASIS v1.0 standard, but it is on a completely-separate standards progression.  Appendix E.3 accounts for the differences of IS 26300 from the text of the May 2005 OASIS Standard.  The first page of the IS 26300:2006 document (page 5 of the PDF) identifies its source as Open Document Format for Office Applications (OpenDocument) v1.0 (Second Edition) Committee Specification 1, dated 19 July 2006, derived from document file OpenDocument-v1.0ed2-cs1.odt; this is not another OASIS Standard, however.

The second and latest OASIS Standard for ODF is Open Document Format for Office Applications (OpenDocument) v1.1 issued 2 February 2007.  This document is derived from OpenDocument v1.0 (Second Edition) Committee Specification 1, the same specification that is the source of content for ISO/IEC 26300:2006.  The changes made to arrive at ODF v1.1 from the v1.0 (Second Edition) committee specification are detailed in Appendix G.4.  There are some mildly-breaking changes from ODF v1.0 to ODF v1.1, mostly of a clarification or correction nature.  There are a few additional features that have no down-level counterparts in ODF v1.0.

A third OASIS Standard, ODF v1.2, is under development.  The current drafts, using a very different organization from v1.1, are available as public documents of the OASIS Open Document TC.

We can expect to see more versions of ODF and of OOXML at their various standards venues.  We'll be watching here on nfoWorks as the situation becomes even more chaotic.  Notice that this diversity ignores the variety of divergent implementations of the various specifications.

Format Versions that Live Forever

It is possible for one document-format specification to officially supplant another, with the older specification deprecated.  That has not been done so far with any of the five-and-growing document-format specifications, any more than it has been done for most of the versions of HTML specifications that have been recommendations of the W3C (and IETF before the development track moved entirely to W3C). 

For example, the last full-up specification for HTML, the HTML 4.01 W3C Recommendation of 24 December 1999, has this to say about its immediate predecessor: "This document obsoletes previous versions of HTML 4.0, although W3C will continue to make those specifications and their DTDs available at the W3C Web site."  This was possible because HTML 4.0 was young and there were important defects that 4.01 cured.

The HTML 4.01 specification continues with the following recommendation: "W3C recommends that user agents and authors (and in particular, authoring tools) produce HTML 4.01 documents rather than HTML 4.0 documents. W3C recommends that authors produce HTML 4 documents instead of HTML 3.2 documents. For reasons of backward compatibility, W3C also recommends that tools interpreting HTML 4 continue to support HTML 3.2 [W3C Recommendation 14 January 1997] and HTML 2.0 [IETF rfc1866 November 1995 and the IETF-obsoleting rfc2854 June 2000] as well." 

The XHTML branch of specifications, originally derived from HTML 4.01, was intended as the basis for a future generation.

Meanwhile, there has been work toward both XHTML 2 and HTML 5.0.

HTML 5.0 is currently intended to exist alongside XHTML 1.x and its newer arrangements while also absorbing XHTML 1.x to some degree (by having an XML form).  The current HTML 5.0 draft specifies legacy processing (in its HTML-syntax form) for variations of over 60 HTML DOCTYPE DTD flavors, extending back to HTML 1.0 and other variants.  The intention is to converge HTML and XHTML 1.x under a consistent HTML 5 processing model with only no-quirks, some-quirks, and quirks modes.  This is also intended to end the variation and extension of HTML (not XHTML) by capturing <!DOCTYPE HTML> for its own and having a concrete HTML syntax that is fully-divorced from both SGML and XML.  It is important to point out that HTML 5 is not going to eliminate the divergence that browser (user-agent) plug-in models, plug-in implementations and scripting systems (especially client side) bring to the mix.

Document-format versions are not easily abandoned.  Even if production of a format is deprecated, consumption of the format may need to continue into the indefinite future, and certainly so long as emitters of deprecated formats have significant usage.  The W3C progression of HTML is at a point where that is fully-recognized and being honored in reaching toward an HTML 5 plateau sometime in the next decade.

Considering this promising stabilization, when would I manage to change all of my web sites and blogs to clean HTML 5 pages?  Not until I know that only a small fraction of visits to those sites come from Internet Explorer versions prior to IE8 (or maybe IE9) and other browsers lacking full-up standards-mode processing.  Fortunately, the HTML 5 specification-effort promises to show me exactly how to do that in a mechanical way.  I am looking forward to automated assistance.  In my case, I'll also have the benefit of my IE 8.0 mitigation effort.  Other web sites may require other approaches, and user browser choice will involve important trade-offs for some time.

I am surprised by the number of people who operate multiple browsers.  Although I operate multiple products for office applications these days, that's mostly to explore their interoperable use, not to ensure ability to interchange documents (well, not until I joined OASIS and the ODF TC).  I've been a serial adopter of Internet Explorer versions since IE 2.0.  As a typical late-adopter, I may finally branch out now just to have a better calibration of the migration to standards-based sites and browsers for them.

This is an important lesson for the management of the expanding variety of specifications of formats for office-application documents, formats of which HTML packagings are sometimes one of the flavors.

Reconciling office-application document-format versions does not promise to be so easy as the current effort to stabilize HTML for the web.

The Looseness of Document Specifications

Of course, OOXML and ODF are not close dialects off a single family tree, as HTML variants might be treated (and HTML 5 demonstrates, if successful).  In addition, the current specifications are not for same-conformance, interchangeable-everywhere documents:

  • There are weak conformance requirements.  It is not necessary to implement any particular amount of the specified format: OOXML or ODF.  This is by design.  I don't expect that to change.  There is also no way to indicate how much or how little is accepted and/or produced.  Well, you could look to see what software produced the document, using ODF as our example:

<office:document-meta
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0"
    office:version="1.2">
  <office:meta>
    <meta:generator>
        OpenOffice.org/3.0_Beta$Win32 OpenOffice.org_project/300m3$Build-9328
    </meta:generator>
  </office:meta>
</office:document-meta>

This strikes me as even less appealing than the challenge of sites adjusting for browsers and browsers adjusting to HTML DOCTYPE declarations (and their absence).
   
It is not encouraging that the office:version attribute and <meta:generator> element are both optional.  It is unfortunate that the office:version attribute is generally uninformative about the processing requirements for the document file in hand, serving merely as an automatic claim of one specification the document conforms to.  The document is also likely to conform to earlier versions and probably also later versions, although it is unclear how we can determine that easily for a given document representation.
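For anyone who wants to try the look-and-see approach, here is a minimal sketch in Python, using only the standard library.  It assumes the metadata lives in the package part named meta.xml; the function name describe_producer is my own invention for illustration.  Because both items are optional, either result may come back empty.

```python
import zipfile
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
META_NS = "urn:oasis:names:tc:opendocument:xmlns:meta:1.0"


def describe_producer(package):
    """Return (office:version, meta:generator) for an ODF package.

    `package` is a path or a file-like object.  Either value is None
    when the document omits it, which the specification permits.
    (Hypothetical helper; the name is not from any ODF library.)
    """
    with zipfile.ZipFile(package) as odf:
        try:
            data = odf.read("meta.xml")  # assumed part name
        except KeyError:
            return None, None  # no metadata part at all

    root = ET.fromstring(data)
    version = root.get(f"{{{OFFICE_NS}}}version")
    generator = root.findtext(f".//{{{META_NS}}}generator")
    if generator is not None:
        # Generator strings are often padded with whitespace.
        generator = generator.strip() or None
    return version, generator
```

A consumer could branch on the returned pair, but notice how little it settles: the version is only a claim, and the generator string is free-form text that each producer formats as it pleases.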

  • Arbitrary "foreign" elements are allowed.  I'm not clear how IS 29500 for OOXML will allow for this kind of thing, but the ODF specifications are justly-notorious for this provision (ODF 1.1, section 1.5):
      
    "Documents that conform to the OpenDocument specification may contain elements and attributes not specified within the OpenDocument schema. Such elements and attributes must not be part of a namespace that is defined within this specification and are called foreign elements and attributes.
      
    "Conforming applications either shall read documents that are valid against the OpenDocument schema if all foreign elements and attributes are removed before validation takes place, or shall write documents that are valid against the OpenDocument schema if all foreign elements and attributes are removed before validation takes place.
      
    "Conforming applications that read and write documents may preserve foreign elements and attributes."
      

    There are some further wrinkles and this proviso:
      
    "Foreign elements may have an office:process-content attribute attached that has the value true or false. If the attribute's value is true, or if the attribute does not exist, the element's content should be processed by conforming applications. Otherwise conforming applications should not process the element's content, but may only preserve its content. If the element's content should be processed, the document itself shall be valid against the OpenDocument schema if the unknown element is replaced with its content only."
      

    As a developer, I love gimmicks like this.  But, basically, this only works with processors that re-encounter document files that they themselves produced.  Anything more coherent requires that the implementers of different processors form some sort of out-of-band, separate-from-the-standard interoperability agreement on particular foreign elements and handling of office:process-content attributes.  Users, confident that their software is "standard," will have frustrating and inexplicable interchange experiences (unless the usual thing is done and everyone agrees to lock in on the same software [version], surprise, surprise).

    OOXML has a versioning scheme that might provide controlled extensions that degrade usefully when processed by implementations of down-level specification versions.  It is unclear at this point whether this is just a more complicated way to end up with the same interoperability problems.

  • Some features require foreign content.   Both OOXML and ODF have features where content is represented by a binary-data part elsewhere in the package.  There is little (OOXML) or no (ODF) indication of what the format of the binary element is and what MIME types are allowed for such document components.  All use of those features and any interchange agreements about them are beyond the current provisions of the relevant document-format standards.
      
    There are other places where implementation-defined values are expected and are expected to be preserved by other implementations.

  • Some values and default selections are implementation-specific.  I was mining in the ODF specification the other day.  I did not expect to find attributes having text on these patterns:
       
    "The value of this attribute is implementation [or application] specific."
      
    "If this attribute is not present, the application might or might not display [whatever]."

    These are relatively minor considering the amount of variability from the other conditions already mentioned.  What's curious about these is the elevation of particular implementation-specific features as specification-favored. In the case of implementation-specific attribute values, there is also the interesting problem of a processor determining whether such a value is intended to have its implementation-specific interpretation or not.  It appears that the related features will only be useful under tightly-restricted interchange conditions.
      
    I will not be surprised to find similar looseness in the OOXML specification, IS 29500.
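To make the foreign-element provision quoted above more concrete, here is a rough Python sketch of the "remove foreign elements and attributes before validation" step, including the office:process-content wrinkle.  It is a simplification under stated assumptions: it decides what is "foreign" by namespace alone rather than by consulting the schema, the list of non-OASIS namespaces is illustrative and not exhaustive, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
PROCESS_CONTENT = f"{{{OFFICE_NS}}}process-content"

# Assumption: a name is "known" if its namespace is one of these.
ODF_PREFIX = "urn:oasis:names:tc:opendocument:xmlns:"
OTHER_KNOWN = {
    "http://purl.org/dc/elements/1.1/",
    "http://www.w3.org/1999/xlink",
    "http://www.w3.org/1998/Math/MathML",
    "http://www.w3.org/2002/xforms",
    "http://www.w3.org/XML/1998/namespace",
}


def is_foreign(name):
    """True if an ElementTree '{ns}local' name lies outside the known set.
    Unqualified names count as foreign here, which is a simplification."""
    ns = name[1:].split("}", 1)[0] if name.startswith("{") else ""
    return not (ns.startswith(ODF_PREFIX) or ns in OTHER_KNOWN)


def _append_text(parent, text):
    # Attach text after whatever content parent already holds.
    if not text:
        return
    if len(parent):
        parent[-1].tail = (parent[-1].tail or "") + text
    else:
        parent.text = (parent.text or "") + text


def _known_attrs(elem):
    return {k: v for k, v in elem.attrib.items() if not is_foreign(k)}


def _copy_content(src, dst):
    _append_text(dst, src.text)
    for child in src:
        if is_foreign(child.tag):
            # Absent or "true": replace the element with its content.
            # Otherwise the element and its content are dropped.
            if child.get(PROCESS_CONTENT, "true") == "true":
                _copy_content(child, dst)
        else:
            kept = ET.SubElement(dst, child.tag, _known_attrs(child))
            _copy_content(child, kept)
        _append_text(dst, child.tail)


def strip_foreign(root):
    """Return a copy of root (assumed not itself foreign) with foreign
    attributes removed and foreign elements unwrapped or dropped."""
    clean = ET.Element(root.tag, _known_attrs(root))
    _copy_content(root, clean)
    return clean
```

The result is what a validator would be handed.  What the sketch cannot tell you is what any given foreign element was supposed to mean, and that is exactly the part that needs an out-of-band agreement among implementers.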

Prospects for Interoperable Convergence

We already face difficulties with interoperable convergence within the progression of a single standard and across its variety of implementations.  This makes the prospect of harmonization between different standard formats rather murky.

Desktop office-application software has more promise with regard to application of Postel's Law, to be liberal in what is accepted and conservative in what is produced.  Unfortunately, the current specifications do not require conservative, interoperable implementations; the current specifications are arguably antagonistic to such an achievement.

I suspect that this is an unintended consequence mixed with some inattention to what it takes for interoperability to be achievable. 

It remains to be seen how our experience and understanding mature.  We are at the beginning, not the finish.  The journey may seem endless.


The process of IE 8.0 mitigation and preparation for a standards-mode approach to web browsing impacts this site and blog as well as every other web page I have ever posted (somewhere over 120MB worth and climbing).

I'm not going to say anything more about IE 8.0 mitigation and HTML harmonization here.  The overall effort will be tracked in that category of Professor von Clueless posts; that's the place to follow along.  The lesson for document interoperability is something that is definitely appropriate for Pursuing Harmony; there'll be much more to say about that.


 