Blunder Dome Sighting  

Hangout for experimental confirmation and demonstration of software, computing, and networking. The exercises don't always work out. The professor is a bumbler and the laboratory assistant is a skanky dufus.





2007-02-12

 

Getting to Unicode: The Least That Could Possibly Work

I’m in the process of stabilizing the first beta release of a project.  I’m doing mini-drops of patches that move from 0.50beta (the first beta achieved) to 0.60beta.  Getting from 0.52 to 0.54 involves adding code-page sensitivity to conversion from some native Windows interfaces that are hard-wired for single-byte codes.  I must produce Unicode for use in Java and any other wrapper layers that must work in internationalized settings.
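The shape of the problem can be sketched outside the native layer (a hypothetical Python illustration; the project's actual code is native Windows C++ beneath Java wrappers, and the code page and byte values here are my assumptions, not the project's):

```python
# A single-byte Windows string must be decoded with the right code page
# before it reaches Unicode-only layers such as Java.

raw = b"Caf\xe9"  # 0xE9 is "é" in Windows code page 1252 (hypothetical sample)

# A conversion that assumes plain ASCII breaks on any byte above 0x7F.
try:
    raw.decode("ascii")
except UnicodeDecodeError:
    pass  # exactly the failure that internationalized settings expose

# A code-page-sensitive conversion yields the correct Unicode string.
text = raw.decode("cp1252")
print(text)  # Café
```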


In considering this update, I looked at four solutions.  The first leaves the single-byte codes exposed, delivering them into the buffers of whatever wrapper surrounds my lowest-level native Windows layer.  Solution #1 basically punts the entire problem of correct conversion to all higher levels.   I have a long list of reasons why that is unsavory and puts the job in the wrong place.   Launching myself into architecture orbit, I considered three other solutions.  The fourth completely encapsulates the conversion to Unicode at my deepest integration layer, making it a general solution for whatever kind of wrapper sits above me, whether it interfaces Java, plain C++, .NET, or who knows what.  Naturally, I fell in love with solution #4.

Last night, I went to sleep with one last concern on my mind: all of the current unit and regression tests for the bottom layer will no longer work.  They will have to be completely redone for Unicode: all of my tests, their displays and results, filenames, everything that is now conveyed in single-byte code.

This morning, I found the trump card.  With solution #1, the code-page-sensitive conversion to Unicode happens in exactly the place where I am already converting to Unicode without code-page sensitivity.  So no black-box tests have to change.  They simply become regression tests and demonstrations that single-byte codes outside the basic ASCII set are coming through properly, something that really matters for the European ISV that is using the result of this work.
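The trump card can be stated as a property of the conversion itself (again a hypothetical Python sketch, with cp1252 standing in for whatever code page applies): for pure-ASCII input the code-page-aware and code-page-blind conversions agree, so existing black-box tests pass unchanged; only bytes above 0x7F behave differently, and those are precisely the new cases to demonstrate.

```python
# ASCII-only data converts identically either way, so no existing
# black-box test has to change.
ascii_name = b"report.txt"
assert ascii_name.decode("ascii") == ascii_name.decode("cp1252")

# Bytes outside basic ASCII are where code-page sensitivity shows up;
# these become the new regression demonstrations.
accented_name = b"r\xe9sum\xe9.txt"
assert accented_name.decode("cp1252") == "résumé.txt"
print("ASCII agrees both ways; non-ASCII now decodes correctly")
```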

So, I am back to solution #1 and its winning qualities:  It is the least change that can possibly work.  It puts running code in the hands of an integrator as early as possible, with the least possible destabilization.  It requires additional testing to introduce interesting character codes into the test cases, but all existing regression-test code works without change.
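That additional testing might look like the following (a sketch, assuming a unittest-style suite and cp1252 as the ISV's code page; the names and sample filenames are invented):

```python
import unittest

CODE_PAGE = "cp1252"  # assumption: the ISV's installations use Windows-1252

class FilenameConversionTests(unittest.TestCase):
    # The original ASCII cases keep passing without change ...
    def test_ascii_filename(self):
        self.assertEqual(b"readme.txt".decode(CODE_PAGE), "readme.txt")

    # ... and new cases introduce the "interesting" single-byte codes.
    def test_accented_filename(self):
        self.assertEqual(b"\xdcbersicht.txt".decode(CODE_PAGE), "Übersicht.txt")

if __name__ == "__main__":
    unittest.main()
```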

I wasted a week figuring this out.  I wonder whether my hesitancy came from some nagging sense that I was going down a dangerous path?


I will, at a more convenient later time, refactor the lower and intermediate layers of my code as part of hardening, getting as much of the work as possible done at the native, high-performance layer.  This will be at a point where my top-level component interfaces are locked down and no refactoring will be visible to applications that use the components.   It’ll still be risky to make those changes, but I’ll have painfully solid regression tests by then.  At that point, I’ll look at approach #4 once again.  I’ll let you know what happens.

 

template created 2004-06-17-20:01 -0700 (pdt) by orcmid
$$Author: Orcmid $
$$Date: 10-04-30 22:33 $
$$Revision: 21 $