Tech in the 603, The Granite State Hacker

SSIS: Unit Testing

I’ve spent the past couple of days putting together unit tests for SSIS packages. It’s not as easy as writing unit and integration tests for, say, a typical C# project.

SSIS data flows can be really complex. Worse, you really can’t execute portions of a single data flow separately and get meaningful results.

Further, one of the key features of SSIS is that the built-in data flow toolbox items amount to framework functionality, and there’s not much value in unit testing the framework.

Excuses come easy, but really, unit testing in SSIS is not impossible…

So meaningful unit testing of SSIS packages really comes down to testing executables in a control flow, particularly executables with a high degree of programmability. The two most significant control flow executable types are Script Task executables and Data Flow executables.

Ultimately, the solution to SSIS unit testing becomes package execution automation.

There are a few things you have to do before you can start writing C# to test your scripts and data flows, though. I’ll go through my experience with it so far.

In order to automate SSIS package execution for unit testing, you must have Visual Studio 2005 (or greater) with the language of your choice installed (I chose C#).

Interestingly, while you can develop and debug SSIS in the Business Intelligence Development Studio (BIDS, a subset of Visual Studio), you cannot execute SSIS packages from C# without SQL Server 2005 Developer or Enterprise edition installed (“go Microsoft!”).

Another important caveat… while you CAN have your unit test project in the same solution as your SSIS project, due to the excessive design-time validation of SSIS packages, you can’t effectively execute the packages from your unit test code if the SSIS project is loaded at the same time. I’ve found that the only way I can safely run my unit tests is to “Unload Project” on the SSIS project before attempting to execute the unit test host app. Even then, Visual Studio occasionally holds locks on files that force me to close and re-open Visual Studio in order to release them.

Anyway, I chose to use a console application as the host app. There’s some info out there on the ‘net about configuring a .config file borrowed from dtexec.exe.config (the config file for dtexec.exe, the SSIS command-line utility), but I didn’t see anything special in there that I had to include.

The only reference you need to add to your project is Microsoft.SqlServer.ManagedDTS. The core namespace you’ll need is:

using Microsoft.SqlServer.Dts.Runtime;
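
With that reference in place, package execution automation boils down to loading the .dtsx file and executing it. Here’s a minimal sketch of such a console host, assuming a placeholder package path and no event listener (my illustration, not the project’s actual setup):

using System;
using Microsoft.SqlServer.Dts.Runtime;

class SsisTestHost
{
    static void Main()
    {
        // Hypothetical package path; substitute the .dtsx file under test.
        string packageFilePath = @"C:\Projects\MyEtl\MyPackage.dtsx";

        Application dtsApp = new Application();

        // Load the package from the file system; the second argument is an
        // optional IDTSEvents listener (more on that below).
        using (Package thePackage = dtsApp.LoadPackage(packageFilePath, null))
        {
            DTSExecResult result = thePackage.Execute();
            Console.WriteLine("Package execution result: {0}", result);
        }
    }
}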

In my first case, most of my unit testing is variations on a single input file. The package validates the input and produces three outputs: a table that contains source records which have passed validation, a flat output file that contains source records that failed validation, and a target table that contains transformed results.

What I ended up doing was creating a very small framework that allowed me to declare a test and some metadata about it. The metadata associates, by a common URN, a group of resources that includes a test input and the three baseline outputs. Once I have my input and baselines established, I can circumvent downloading the “real” source file, inject my test source into the process, and compare the results with my baselines.
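
The TestInfo attribute you’ll see in the example below is part of that small framework. As a rough, hypothetical sketch (the names come from the usage shown; this isn’t a standard library type), it might look something like this:

// Hypothetical framework attribute: tags a test method with a display name
// and the URN that ties the test to its input and baseline resources.
[AttributeUsage(AttributeTargets.Method)]
public class TestInfoAttribute : Attribute
{
    private string _name;
    private string _testUrn;

    public string Name
    {
        get { return _name; }
        set { _name = value; }
    }

    public string TestURN
    {
        get { return _testUrn; }
        set { _testUrn = value; }
    }
}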

Here’s an example Unit test of a Validation executable within my SSIS package:

[TestInfo(Name = "Unit: Validate Source, duplicated line in source", TestURN = "Dupes")]
public void ValidationUnitDupeLineTest()
{
    using (Package thePackage = _dtsApp.LoadPackage(packageFilePath, this))
    {
        thePackage.DelayValidation = true;
        DisableAllExecutables(thePackage);
        EnableValidationExecutable(thePackage);
        InjectBaselineSource(GetBaselineResource("Stage_1_Source_" + TestURN), thePackage.Variables["SourceFilePath"]);

        thePackage.Execute(null, null, this, null, null);

        string errorFilePath = thePackage.Variables["ErrorLogFilePath"].Value as string;
        //throw new AbortTestingException();
        AssertPackageExecutionResult(thePackage, DTSExecResult.Failure);
        AssertBaselineAdjustSource(TestURN);
        AssertBaselineFile(GetBaselineResourceString("Baseline_Stage1_" + TestURN), errorFilePath);
    }
}

Here’s the code that does some of the SSIS Package manipulation referenced above:


#region Utilities
protected virtual void DisableAllExecutables(Package thePackage)
{
    Sequence aContainer = thePackage.Executables["Adjustments, Stage 1"] as Sequence;
    (aContainer.Executables["Download Source From SharePoint"] as TaskHost).Disable = true;
    (aContainer.Executables["Prep Target Tables"] as TaskHost).Disable = true;
    (aContainer.Executables["Validate Source Data"] as TaskHost).Disable = true;
    (aContainer.Executables["Process Source Data"] as TaskHost).Disable = true;
    (aContainer.Executables["Source Validation Failure Sequence"] as Sequence).Disable = true;
    (aContainer.Executables["Execute Report Subscription"] as TaskHost).Disable = true;
    (thePackage.Executables["Package Success Sequence"] as Sequence).Disable = true;
    (thePackage.Executables["Package Failure Sequence"] as Sequence).Disable = true;
}

protected virtual void DisableDownloadExecutable(Package thePackage)
{
    Sequence aContainer = thePackage.Executables["Adjustments, Stage 1"] as Sequence;
    TaskHost dLScriptTask = aContainer.Executables["Download Source From SharePoint"] as TaskHost;
    dLScriptTask.Disable = true;
}

protected virtual void EnableValidationExecutable(Package thePackage)
{
    Sequence aContainer = thePackage.Executables["Adjustments, Stage 1"] as Sequence;
    TaskHost validationFlow = aContainer.Executables["Validate Source Data"] as TaskHost;
    validationFlow.Disable = false;
}


Another really handy thing to be aware of…

IDTSEvents

I highly recommend you implement this interface and pass it into your packages. In each event handler, implement code to send reasonable information to an output stream. Notice the call to thePackage.Execute, way up in the first code snippet… the class that contains that method implements the interface, so I can control (when necessary) how certain events are handled.
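
In the test code above, the test class itself plays that role (hence passing “this” into Execute). As an alternative sketch, assuming you derive from DefaultEvents so you only override the handlers you care about (the logging choices here are mine), it might look something like this:

// Minimal event listener sketch. DefaultEvents supplies no-op
// implementations of IDTSEvents, so only the interesting handlers are overridden.
class ConsolePackageEvents : DefaultEvents
{
    public override bool OnError(DtsObject source, int errorCode, string subComponent,
        string description, string helpFile, int helpContext, string idofInterfaceWithError)
    {
        Console.WriteLine("ERROR {0} in {1}: {2}", errorCode, subComponent, description);
        return false; // see the IDTSEvents.OnError docs for the exact cancel semantics of the return value
    }

    public override void OnWarning(DtsObject source, int warningCode, string subComponent,
        string description, string helpFile, int helpContext)
    {
        Console.WriteLine("WARN  {0} in {1}: {2}", warningCode, subComponent, description);
    }

    public override void OnInformation(DtsObject source, int informationCode, string subComponent,
        string description, string helpFile, int helpContext, ref bool fireAgain)
    {
        Console.WriteLine("INFO  {0} in {1}: {2}", informationCode, subComponent, description);
    }
}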

Interestingly, I haven’t needed to do anything fancy with that so far, but I can imagine that functionality being very important in future unit tests that I write.

Here’s a visual on all the resources… the image shows SSMS over VS, with both database tables and project resources with common URNs to relate them.

I won’t get into the details of the framework functionality, but I found it useful to be able to do things like set a flag to rebuild baseline resources from current outputs, and such.

I modeled some of my framework functionality (very loosely) on Visual Studio Team System Edition for Testers, which we used on the TWM ISIS project.

Another interesting lesson learned: I can see that the folks who built SSIS were not avid unit testers themselves. SSIS Executables have a “Validate()” method. I encountered lots of problems when I tried to use it. Hangs, intermittent errors, all that stuff that testing should have ironed out.


Enterprise Reprogramming

I found an interview with a Satyam senior VP, listed on LinkedIn, relatively interesting (link below).

This senior VP talks about how Satyam and the IT industry are responding to new challenges.

One thing that stands out to me is the statement that they are moving from services to solutions. The implication is that they are rebuilding or reprogramming businesses at the workflow/process level. They appear to be successfully applying technology build-out as a commodity service while implementing their solutions… it sounds like they’re treating the enterprise as a sort of programmable platform, like SOA/BPM on a grand scale.

From the article:
“A solutions provider transforms business. The difference in business will happen when we change those business processes as well. That is where we are bringing in business transformation solutions — process optimisation, process reengineering, etc.”

My intuition doesn’t quite square with Satyam’s vision.

Lots of things have been pointing towards more innovation in the top layers of applications, built on a very stable technology base. To me, it still feels like there’s an unspoken motivation for that: business leadership wants IT folks to make ruggedized app dev tools and hand them over to power users (and/or process owners). Business leaders want IT to get the C# off their business processes.

All of that is sorta where I started cooking up the hypothesis of metaware.

I’m curious to know how Satyam’s vision is really working. I guess we’ll know in a few years.

‘Moving towards IP-led revenues’


Artless Programming

So maybe I am strange… I’ve actually printed snips of source code and UML diagrams and hung them on my office wall because I found them inspirational.

Reminds me of a quote from The Matrix movies…
Cypher [to Neo]: “I don’t even see the code. All I see is blonde, brunette, red-head.” 🙂

It’s not quite like that, but you get the point. There’s gotta be a back-story behind the witty writing. I suspect it has something to do with a programmer appreciating particularly elegant solutions.

One of the hard parts about knowing that programming is an artful craft is being forced to write artless code. It happens all the time. Risks get in the way… a risk of going over budget, blowing the schedule, adding complexity, breaking something else.

It all builds up. The reality is, as much as we software implementers really want application development to be an art, our business sponsors really want it to be a defined process.

The good news for programmers is that every application is a custom application.

It really sucks, though, when you’re surgically injecting a single new business rule into an existing, ancient system.

This is the case with one of my current clients. At every corner, there’s a constraint limiting me. One false move, and whole subsystems could fail… I have such limited visibility into those subsystems that I won’t know until after I deploy to their QA systems and they discover it. If I ask for more visibility, we risk scope creep. The risks pile up, force my hand, and I end up pushed into a very tightly confined implementation. The end result is awkward, at best, and arguably even less maintainable.

These are the types of projects that remind me to appreciate those snips of inspirational code.

Don’t get me wrong. I’m happy there’s a fitting solution within scope at all. I’m very happy that the client’s happy… the project’s under budget and ahead of schedule.

The “fun” in this case has been facing the Class 5 rapids and finding that one navigable path to a solution.

See also:
politechnosis: Art & Science


Art & Science

Fire & Ice… Day & Night…

This question, art vs. science, has come up a million times in software development circles. Reading Paul Johnson’s (Paul’s Pontifications) blog post, in conjunction with a discussion in the Tech Mill at Edgewater (thanks, Jason!), I have come to see that art and science are not as opposed as I once viewed them to be.

What hit me was that Paul makes the statement that there’s no process to implementing software. I still disagree. There are many processes.

The number of processes that an implementer can choose from to write his/her code is often vast, and depends on the problem set. A problem set includes many things: requirements, tools, target platform, development platform, existing code, and even the implementer’s mood and frame of mind. That is what makes implementing code, like painting or creating a recipe, an art.

Within a common implementation problem set, there can be a large number of processes which can be applied to derive valid solutions. In fact, there are so many, that some distinct processes may actually render the very same code. So, to be more clear, it’s not that there’s no process… it’s that there’s no single valid process.

Knowing that there’s no one single valid process doesn’t mean that we can’t pick a needle from the haystack… if the process produces a solution within the problem set, it’s good.

Now consider what happens when you start to narrow a problem set. There are lots of things you can do. Frameworks, platforms, clear and specific requirements, best practices, coding standards, well-structured architectures… these are all factors that limit the problem set. By narrowing a problem set, you narrow the number of valid processes. By narrowing the number of valid processes that a developer can choose from, lots of interesting things start to happen. You achieve more predictable results, you are more likely to achieve repeatable schedules, and you reduce overall project risk.

This is what’s so interesting about contemporary trends in software development, such as Ruby on Rails… use of these tools narrows problem sets that developers face. This means the implementer can spend less time figuring out where the blanks are, and more time filling them.

Now let’s take this further. What happens when you reduce the problem set dramatically…? Take a single, relatively well known problem, on a very specific platform, using a very small set of unambiguous expressions. You get a very tightly defined process. By doing this, you wring the art out of creating something, to the point where it becomes machinable. The process becomes realized as a factory.

So to answer the question… Art or Science?

It’s a trick question… art and science are not exclusive opposites. Art is about the freedom to choose your creative process. Science is about knowing what processes are available, and the pros and cons of each. So programming, like all creative activities, is usually art (except in single-process cases), and usually science (except in cases of serendipity and true miracles).

Paul’s Pontifications: An Under-Appreciated Fact: We Don’t Know How We Program


Metaware

There’s been a fair amount of buzz in the IT world about IT-business alignment lately. The complaint seems to be that IT produces solutions that are simply too expensive. Proposed solutions range from “Agile” methodologies to dissolving the contemporary IT group into the rest of the enterprise.

I think there’s another piece that the industry is failing to fully explore.

I think what I’ve observed is that the most expensive part of application development is actually the communications overhead. It seems to me that the number one reason for bad apps, delays, and outright project failures is firmly grounded in communications issues. Getting it “right” is always expensive. (Getting it wrong is dramatically worse.) In the current IT industry, getting it right typically means teaching analysts, technical writers, developers, QA, and the help desk significant aspects of the problem domain, along with all the underlying technologies they need to know.

In the early days of “application development”, software-based applications were most often developed by savvy business users with tools like Lotus 1-2-3. The really savvy types dug in on dBase. We all know why this didn’t work, and the ultimate response was client-server technology. Unfortunately, client-server application development methodologies also entrenched this broad knowledge-sharing requirement.

So how do you smooth out this wrinkle? I believe Business Analytics, SOA/BPM, Semantic web, portals/portlets… they’re providing hints.

There have been a few times in my career where I was asked to provide rather nebulous functionality to customers. Specifically, I can think of two early client-server projects where the users wanted to be able to query a database in uncertain terms of their problem domain. In both of these cases, I built application UIs that allowed the user to express query details in easy, domain-specific terms. User expressions were translated dynamically by software into SQL. All of the technical jargon was hidden away from the user. I was even able to allow users to save favorite queries and share them with co-workers. The apps enabled the users to look at all their information in ways that no one, not even I, had considered beforehand. They worked without giving up the advances of client-server technology, and without forcing the user into technical learning curves. These projects were both delivered on time and on budget. As importantly, they were considered great successes.
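
To make the idea concrete, here’s a toy sketch of that kind of translation (entirely illustrative; the field names, operators, and schema are made up, not code from those projects):

using System;
using System.Collections.Generic;

// Toy sketch: maps business-friendly field names and operators onto SQL so
// the user expresses a query in domain terms and never sees SQL syntax.
class DomainQueryTranslator
{
    // Hypothetical mappings from domain vocabulary to the physical schema.
    static readonly Dictionary<string, string> FieldMap = new Dictionary<string, string>
    {
        { "Customer Name", "customer_name" },
        { "Order Total",   "total_amount" }
    };

    static readonly Dictionary<string, string> OperatorMap = new Dictionary<string, string>
    {
        { "is",          "=" },
        { "is at least", ">=" },
        { "contains",    "LIKE" }
    };

    // Builds a parameterized WHERE clause from a domain-level expression;
    // the actual value is bound separately as @p0 to avoid SQL injection.
    public static string BuildWhereClause(string field, string op)
    {
        return string.Format("WHERE {0} {1} @p0", FieldMap[field], OperatorMap[op]);
    }

    static void Main()
    {
        // "Order Total is at least <value>"  ->  "WHERE total_amount >= @p0"
        Console.WriteLine(BuildWhereClause("Order Total", "is at least"));
    }
}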

In more recent times, certain trends have caught my attention: the popularity of BI (especially cube analysis), and portals/portlets. Of all the tools and technologies out there, these are the ones actively demanded by business end-users. At the same time, classic software application development seems to be in relatively reduced demand.

Pulling it all together, it seems like IT departments have tied business innovation into the rigors of client-server software application development. By doing this, all the communications overhead that goes with doing it right is implied.

It seems like we need a new abstraction on top of software… a layer that splits technology out of the problem domain, allowing business users to develop their own applications.

I’ve hijacked the word “metaware” as a way of thinking about the edge between business users as process actors (wetware) and software. Of course, it’s derived from metadata application concepts. At first, it seems hard to grasp, but the more I use it, the more it seems to make sense to me.

Here is how I approach the term…
[Diagram: Application Space. The diagram shows the surface from IT to user domains across technology and business-process space; it covers hardware, software, metaware, and wetware, including where these ’wares touch.]
As I’ve mentioned in the past, I think of people’s roles in business systems as “wetware“. Wikipedia has an entry for wetware that describes its use in various domains. Wetware is great at problem solving.

Why don’t we implement all solutions using wetware?

It’s not fast, reliable, or consistent enough for modern business needs. Frankly, wetware doesn’t scale well.

Hardware, of course, is easy to grasp… it’s the physical machine. It tends to be responsible for physical storage and high-speed signal transmission, as well as providing the calculation iron and general processing brains for the software. It’s lightning fast and extremely reliable. Hardware is perfect for the physical world… if you intend to produce physical products, you need hardware. Hardware applications extend all the way out to wetware, typically in the form of human interfaces. (The term HID tends to neglect output such as displays. I think that’s an oversight… just because monitors don’t connect to USB ports doesn’t mean they’re not human interface devices.)

Why do we not use hardware to implement all solutions?

Because hardware is very expensive to manipulate, and it takes lots of specialized tools and engineering know-how to implement relatively small details. The turnaround time on changes makes it impractical, from a risk-management standpoint, for general-purpose business application development.

Software in the contemporary sense is also easy to grasp. It is most often thought to provide a layer on top of a general purpose hardware platform to integrate hardware and create applications with semantics in a particular domain. Software is also used to smooth out differences between hardware components and even other software components. It even smooths over differences in wetware by making localization, configuration, and personalization easier. Software is the concept that enabled the modern computing era.

When is software the wrong choice for an application?

Application software becomes a problem when it doesn’t respect separation of concerns between integration points. The most critical “integration point” abstraction that gets flubbed is between the business process and the underlying technology. Typically, general-purpose application development tools are still too technical for user-domain developers, so quite a bit of communications overhead is required even for small changes. This communications overhead becomes expensive, and it is complicated by generally difficult deployment issues. While significant efforts have been made to reduce the communications overhead, these tend to attempt to eliminate artifacts that are necessary for the continued maintenance and development of the system.

Enter metaware. Metaware is similar in nature to software. It runs entirely on a software-supported client-server platform. Most software engineers would think of it as process owners’ expressions interpreted dynamically by software. It’s the culmination of SOA/BPM… for example, BPMN (Business Process Modeling Notation) rendered as a business application by an enterprise software system.

While some might dismiss the idea of metaware as buzz, it suggests important changes to the way IT departments might write software. Respecting the metaware layer will affect the way I design software in the future. Further, respecting metaware concepts suggests important changes in the relationship between IT and the rest of the enterprise.

Ultimately it cuts costs in application development by restoring separation of concerns… IT focuses on building and managing technology to enable business users to express their artifacts in a technologically safe framework. Business users can then get back to innovating without IT in their hair.