Blunder Dome Sighting  

Hangout for experimental confirmation and demonstration of software, computing, and networking. The exercises don't always work out. The professor is a bumbler and the laboratory assistant is a skanky dufus.





2007-02-12

 

Getting to Unicode: The Least That Could Possibly Work

I’m in the process of stabilizing the first beta release of a project.  I’m doing mini-drops of patches that move from 0.50beta (the first beta achieved) to 0.60beta.  Getting from 0.52 to 0.54 involves adding code-page sensitivity to conversion from some native Windows interfaces that are hard-wired for single-byte codes.  I must produce Unicode for use in Java and any other wrapper layers that must work in internationalized settings.
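The shape of the problem can be sketched outside the native layer (a hypothetical Python illustration; the project's actual code is native Windows C++ beneath Java wrappers, and the code page and byte values here are my assumptions, not the project's):

```python
# A single-byte Windows string must be decoded with the right code page
# before it reaches Unicode-only layers such as Java.

raw = b"Caf\xe9"  # 0xE9 is "é" in Windows code page 1252 (hypothetical sample)

# A conversion that assumes plain ASCII breaks on any byte above 0x7F.
try:
    raw.decode("ascii")
except UnicodeDecodeError:
    pass  # exactly the failure that internationalized settings expose

# A code-page-sensitive conversion yields the correct Unicode string.
text = raw.decode("cp1252")
print(text)  # Café
```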


In considering this update, I looked at four solutions.  The first leaves the single-byte codes exposed, delivering them into the buffers of whatever wrapper surrounds my lowest-level native Windows layer.  Solution #1 basically punts the entire problem of correct conversion to all higher levels.   I have a long list of reasons why that is unsavory and puts the job in the wrong place.   Launching myself into architecture orbit, I considered three other solutions.  The fourth completely encapsulates the conversion to Unicode at my deepest integration layer, making it a general solution for whatever kind of wrapper sits above me, whether it interfaces Java, plain C++, .NET, or who knows what.  Naturally, I fell in love with solution #4.

Last night, I went to sleep with one last concern on my mind: all of the current unit and regression tests for the bottom layer will no longer work.  They will have to be completely redone for Unicode: all of my tests, their displays and results, filenames, everything that is now conveyed in single-byte code.

This morning, I found the trump card.  With solution #1, the code-page-sensitive conversion to Unicode happens in exactly the place where I am already converting to Unicode without code-page sensitivity.  So no black-box tests have to change.  They simply become regression tests and demonstrations that single-byte codes outside the basic ASCII set are coming through properly, something that really matters for the European ISV that is using the result of this work.
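The trump card can be stated as a property of the conversion itself (again a hypothetical Python sketch, with cp1252 standing in for whatever code page applies): for pure-ASCII input the code-page-aware and code-page-blind conversions agree, so existing black-box tests pass unchanged; only bytes above 0x7F behave differently, and those are precisely the new cases to demonstrate.

```python
# ASCII-only data converts identically either way, so no existing
# black-box test has to change.
ascii_name = b"report.txt"
assert ascii_name.decode("ascii") == ascii_name.decode("cp1252")

# Bytes outside basic ASCII are where code-page sensitivity shows up;
# these become the new regression demonstrations.
accented_name = b"r\xe9sum\xe9.txt"
assert accented_name.decode("cp1252") == "résumé.txt"
print("ASCII agrees both ways; non-ASCII now decodes correctly")
```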

So, I am back to solution #1 and its winning qualities:  It is the least change that can possibly work.  It puts running code in the hands of an integrator as early as possible, with the least possible destabilization.  It requires additional testing to introduce interesting character codes into the test cases, but all existing regression-test code works without change.
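That additional testing might look like the following (a sketch, assuming a unittest-style suite and cp1252 as the ISV's code page; the names and sample filenames are invented):

```python
import unittest

CODE_PAGE = "cp1252"  # assumption: the ISV's installations use Windows-1252

class FilenameConversionTests(unittest.TestCase):
    # The original ASCII cases keep passing without change ...
    def test_ascii_filename(self):
        self.assertEqual(b"readme.txt".decode(CODE_PAGE), "readme.txt")

    # ... and new cases introduce the "interesting" single-byte codes.
    def test_accented_filename(self):
        self.assertEqual(b"\xdcbersicht.txt".decode(CODE_PAGE), "Übersicht.txt")

if __name__ == "__main__":
    unittest.main()
```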

I wasted a week figuring this out.  I wonder whether my hesitancy came from some nagging sense that I was going down a dangerous path?


I will, at a more convenient later time, refactor the lower and intermediate layers of my code as part of hardening, getting as much of the work as possible done at the native, high-performance layer.  This will be at a point where my top-level component interfaces are locked down and no refactoring will be visible to applications that use the components.   It’ll still be risky to make those changes, but I’ll have painfully solid regression tests by then.  At that point, I’ll look at approach #4 once again.  I’ll let you know what happens.

 

template created 2004-06-17-20:01 -0700 (pdt) by orcmid
$$Author: Orcmid $
$$Date: 10-04-30 22:33 $
$$Revision: 21 $