Blunder Dome Sighting  

Hangout for experimental confirmation and demonstration of software, computing, and networking. The exercises don't always work out. The professor is a bumbler and the laboratory assistant is a skanky dufus.

Getting to Unicode: The Least That Could Possibly Work

I’m in the process of stabilizing the first beta release of a project.  I’m doing mini-drops of patches that move from 0.50beta (the first beta achieved) to 0.60beta.  Getting from 0.52 to 0.54 involves adding code-page sensitivity to conversion from some native Windows interfaces that are hard-wired for single-byte codes.  I must produce Unicode for use in Java and any other wrapper layers that must work in internationalized settings.

In considering this update, I looked at four solutions.  The first solution leaves the single-byte codes exposed, delivering them into the buffers of whatever wrapper surrounds my lowest-level native Windows layer.  Solution #1 basically punts the entire problem of correct conversion to all higher levels.  I have a long list of reasons why that is unsavory and puts the job in the wrong place.  Launching myself into architecture orbit, I considered three other solutions.  The fourth completely encapsulates the conversion to Unicode at my deepest integration layer, making it a general solution for whatever kind of wrapper sits above me, whether it interfaces to Java, plain C++, .NET, or who knows what.  Naturally, I am in love with solution #4.

Last night, I went to sleep with one last concern on my mind: with solution #4, all of the current unit and regression tests for the bottom layer will no longer work.  They will have to be completely redone for Unicode: all of my tests, their displays and results, filenames, everything that is now conveyed in single-byte code.

This morning, I found the trump card.  With solution #1, the conversion to Unicode with code-page sensitivity happens in exactly the place where I am currently converting to Unicode without code-page sensitivity.  So no black-box tests have to change.  They simply become regression tests and demonstrations that the single-byte codes outside the basic ASCII set are coming through properly, something that really matters for the European ISV that is using the result of this work.
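To make the difference concrete, here is a minimal Java sketch of converting single-byte codes to Unicode with and without code-page sensitivity.  It is not the project's code: the class and method names are hypothetical, and windows-1252 is only an assumed example of a code page that a European integrator might be using.

```java
import java.nio.charset.Charset;

public class CodePageDemo {

    // Conversion WITHOUT code-page sensitivity: each byte is widened
    // directly to the Unicode code point with the same value (0..255).
    public static String widenNaively(byte[] singleByte) {
        StringBuilder sb = new StringBuilder(singleByte.length);
        for (byte b : singleByte) {
            sb.append((char) (b & 0xFF));
        }
        return sb.toString();
    }

    // Conversion WITH code-page sensitivity: the bytes are decoded
    // according to the named code page (an assumption of this sketch).
    public static String decodeWithCodePage(byte[] singleByte, String codePage) {
        return new String(singleByte, Charset.forName(codePage));
    }

    public static void main(String[] args) {
        // "caf", then 0xE9 (e-acute in windows-1252), a space,
        // then 0x80 (the euro sign in windows-1252).
        byte[] raw = { 0x63, 0x61, 0x66, (byte) 0xE9, 0x20, (byte) 0x80 };

        String naive = widenNaively(raw);
        String aware = decodeWithCodePage(raw, "windows-1252");

        // Print code points rather than glyphs so the result does not
        // depend on the console's own encoding.
        for (int i = 0; i < raw.length; i++) {
            System.out.printf("byte 0x%02X  naive U+%04X  aware U+%04X%n",
                    raw[i] & 0xFF, (int) naive.charAt(i), (int) aware.charAt(i));
        }
    }
}
```

The ASCII bytes and 0xE9 come out the same both ways, which is why tests confined to basic ASCII cannot tell the two conversions apart.  Byte 0x80 is the interesting case: naive widening yields U+0080, while decoding as windows-1252 yields U+20AC.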

So, I am back to solution #1 and its winning qualities:  It is the least change that can possibly work.  It provides running code in the hands of an integrator as early as possible with the least possible destabilization.  It requires additional testing to introduce interesting character codes into the test cases, but all regression-test code works without change.

I wasted a week figuring this out.  I wonder if my hesitancy was because of some nagging sense that I was going down a dangerous path?

I will, at a more convenient later time, be refactoring the lower and intermediate layers of my code as part of hardening and getting as much of the work as possible done at the native, high-performance layer.  This will be at a point where my top-level component interfaces will be locked down and no refactoring will be visible to applications that use the components.   It’ll still be risky to make those changes, but I’ll have painfully-solid regression tests by then.  At that point, I’ll look at approach #4 once again.  I’ll let you know what happens.

template created 2004-06-17-20:01 -0700 (pdt) by orcmid
$$Author: Orcmid $
$$Date: 10-04-30 22:33 $
$$Revision: 21 $