2007-02-12

Getting to Unicode: The Least That Could Possibly Work

I’m in the process of stabilizing the first beta release of a project. I’m doing mini-drops of patches that move from 0.50beta (the first beta achieved) to 0.60beta. Getting from 0.52 to 0.54 involves adding code-page sensitivity to the conversion from some native Windows interfaces that are hard-wired for single-byte codes. I must produce Unicode for use in Java and any other wrapper layers that must work in internationalized settings. {tags: orcmid software engineering software testing evolutionary development}

In considering this update, I looked at four solutions. The first solution leaves the single-byte codes exposed, delivering them into buffers of whatever wrapper surrounds my lowest-level native Windows layer. Solution #1 basically punts the entire problem of correct conversion to all higher levels. I have a long list of reasons why that is unsavory and puts the job in the wrong place.

Launching myself into architecture orbit, I considered three other solutions. The fourth completely encapsulates the conversion to Unicode at my deepest integration layer, making it a general solution for whatever kind of wrapper sits above me, whether it interfaces Java, plain C++, .NET, who knows. Naturally, I am in love with solution #4.

Last night, I went to sleep with one last concern on my mind: all of the current unit and regression tests for the bottom layer would no longer work. They would have to be completely redone for Unicode: all of my tests, their displays and results, filenames, everything that is now conveyed in single-byte code.

This morning, I found the trump card. With solution #1, the conversion to Unicode with code-page sensitivity happens in exactly the place where I am already converting to Unicode without code-page sensitivity (sketched at the end of this entry). So no black-box tests have to change. They simply become regression tests and demonstrations that single-byte codes outside of the basic ASCII set are coming through properly, something that really matters for the European ISV that is using the result of this work.

So I am back to solution #1 and its winning qualities:

- It is the least change that can possibly work.
- It puts running code in the hands of an integrator as early as possible with the least possible destabilization.
- It requires additional testing that introduces interesting character codes into the test cases, but all regression-test code works without change.

I wasted a week figuring this out. I wonder whether my hesitancy came from some nagging sense that I was going down a dangerous path.

I will, at a more convenient later time, be refactoring the lower and intermediate layers of my code as part of hardening and getting as much of the work as possible done at the native, high-performance layer. That will happen once my top-level component interfaces are locked down and no refactoring is visible to applications that use the components. It’ll still be risky to make those changes, but I’ll have painfully solid regression tests by then. At that point, I’ll look at approach #4 once again. I’ll let you know what happens.
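To make the shape of that change concrete, here is a minimal sketch of the kind of code-page-sensitive conversion involved, assuming the wrapper’s conversion point is (or can be) a call to the Win32 MultiByteToWideChar API. The helper name toUnicode, the default of CP_ACP (the active ANSI code page), and the use of std::string/std::wstring are my own illustrative choices, not the project’s actual code.

    #include <windows.h>
    #include <string>
    #include <vector>
    #include <stdexcept>

    // Convert a single-byte, code-page-encoded native string to UTF-16.
    // codePage names the Windows code page the bytes use; CP_ACP selects
    // the system's active ANSI code page.  (Hypothetical helper, for
    // illustration only.)
    std::wstring toUnicode(const std::string &narrow, UINT codePage = CP_ACP)
    {
        if (narrow.empty())
            return std::wstring();

        // First call: ask how many UTF-16 code units the result needs.
        int needed = MultiByteToWideChar(codePage, 0,
                                         narrow.data(), (int)narrow.size(),
                                         NULL, 0);
        if (needed == 0)
            throw std::runtime_error("MultiByteToWideChar sizing failed");

        // Second call: perform the conversion into a buffer of that size.
        std::vector<wchar_t> buffer(needed);
        MultiByteToWideChar(codePage, 0,
                            narrow.data(), (int)narrow.size(),
                            &buffer[0], needed);

        return std::wstring(buffer.begin(), buffer.end());
    }

The point of solution #1 is that a conversion of this shape already sits at the existing spot in the wrapper layer; adding code-page sensitivity means supplying the right code page there rather than converting without regard to the code page, plus new test cases that exercise characters outside the basic ASCII set.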