Moved

Moved. See https://slott56.github.io. All new content goes to the new site. This is a legacy site, and will likely be dropped five years after the last post, in Jan 2023.

Showing posts with label database administration.

Tuesday, September 27, 2016

Database Schema Migration

Some thoughts: http://workingwithdevs.com/delivering-databases-migrations-vs-state/

This covers a lot of ground on the Declarative vs. Procedural question. It explains a lot of the considerations that lead to choosing a procedural schema evolution vs. a declarative schema with an implied change sequence to migrate to each new declared state.

The article calls the declarative approach "state-based" and the procedural approach "migration-based".

My 2¢ are focused on this point: 
When using a state-based solution you will most often be using a diff tool like those provided by Redgate or Visual Studio to examine the differences and generate an upgrade script. While this is a very efficient solution for most changes, with table renames and a few other types of table refactoring they can do bad things, ...
This point about table refactoring is, for me, the show-stopper. Relational theory tells me that I can map any schema to any other schema using selection, projection, and join. I can denormalize data and I can normalize again via group-by clauses. I can reduce the original schema to a sequence of object-attribute-value triples, and restructure this into any desired new schema. 
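As a sketch of that reduction (invented table and column names, SQLite via Python; this illustrates the theory, it is not a migration tool):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
        INSERT INTO person VALUES (1, 'Ann', 'ann@example.com');

        -- Reduce the original schema to object-attribute-value triples.
        CREATE TABLE triple (id INTEGER, attr TEXT, value TEXT);
        INSERT INTO triple
            SELECT id, 'name', name FROM person
            UNION ALL
            SELECT id, 'email', email FROM person;

        -- Restructure the triples into a new target schema via group-by.
        CREATE TABLE contact AS
            SELECT id,
                   MAX(CASE WHEN attr = 'name' THEN value END) AS full_name,
                   MAX(CASE WHEN attr = 'email' THEN value END) AS email
            FROM triple
            GROUP BY id;
    """)
    print(conn.execute("SELECT * FROM contact").fetchall())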

Given enough time, a change-tracking tool should be able to find a minimal-cost transformation from schema to schema. This might involve a complex search over a large state space, and it certainly involves assigning costs to each alternative query plan.

Pragmatically, I'm not sold on this being a good idea. And I'm rarely sure I even want to get involved in a fully automated solution. While a tool might be able to detect and automate a variety of simple changes, I think that developers must always vet those change scripts.

In particular, the search space is emphatically not limited to select, project, and join. There are also database unload-reload operations, and index creation and dropping. There are even more complex operations, like creating intermediate results which aren't part of the final database structure. With proper indices, these might actually be beneficial.

In some cases, the continuous operation requirements are such that we might have two copies of a database: one being used and the other being transformed. A logger tracks transactions in the older copy and a synchronizer replicates those transactions in the new copy. After the data is moved, the customer access is moved via a feature toggle from the old database to the new database.
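A minimal sketch of that two-copy arrangement (all names invented; a real system would lean on the database's own replication or change-data-capture machinery rather than hand-rolled statement replay):

    import sqlite3

    def replay_log(old_db, new_db, last_applied):
        """Re-apply transactions logged on the old copy to the new copy."""
        rows = old_db.execute(
            "SELECT seq, stmt FROM txn_log WHERE seq > ? ORDER BY seq",
            (last_applied,)).fetchall()
        for seq, stmt in rows:
            new_db.execute(stmt)
            last_applied = seq
        new_db.commit()
        return last_applied

    # The feature toggle: customer access moves only after the new copy
    # has caught up.
    USE_NEW_DATABASE = False

    def get_connection():
        name = "new.db" if USE_NEW_DATABASE else "old.db"
        return sqlite3.connect(name)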

Semantic Drift

Also important is the issue of semantic drift. When we're making structural changes where the "before" column names match the "after" column names, then there's little chance for semantic drift. There's still some possibility, though. We can (and sometimes do) repurpose columns, preserving the original name. In some cases, we might change a database constraint without renaming the column.

In the larger case, of course, it doesn't require "‘hot-fix’ changes to QA or even production databases" to create profound semantic changes. All it takes is an app developer deciding that a column should be repurposed. There may be no structural change to the schema overall.

A non-structural change in some past release could have implications for structural change in a future release. Imagine three columns in three tables with the same names. Two started out life as simple foreign keys to the third. But one became optional, and now the semantics don't match but the names do. Automated tools are unlikely to discern the intent here. 
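In DDL terms, the scenario looks something like this (invented names, SQLite syntax):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE account (account_id INTEGER PRIMARY KEY);
        CREATE TABLE invoice (
            invoice_id INTEGER PRIMARY KEY,
            -- still a simple foreign key, as originally designed
            account_id INTEGER NOT NULL REFERENCES account
        );
        CREATE TABLE note (
            note_id INTEGER PRIMARY KEY,
            -- same name, but quietly made optional in a past release;
            -- the names match, the semantics no longer do
            account_id INTEGER REFERENCES account
        );
    """)

A diff tool sees three identically named columns; only the use cases reveal that one of them now means something different.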

Conclusion?

It's all procedural migration. I'm not sure declarative ("state") tools can be trusted beyond discerning the changes and suggesting a possible migration.

Thursday, July 25, 2013

Database Conversion or Schema Migration

I was told a horror story about a bad database schema migration. Let's call the author of the horror story HPL.

HPL related a tale of woeful conversion problems from a mismanaged schema migration.

While I could feel HPL's pain, the reasons given for their pain were wrong. They didn't quite get the lessons the experience should have taught. Consequently, HPL sounded like someone doomed to repeat the mistake, or, worse, to unlearn the lessons entirely.

Here's HPL's most distressing comment.
"we can't migrate over the weekend and be done w/ it."
Apparently, the horror of a weekend migration was somehow desirable to HPL. Who wants a lost weekend? And who wants to put all of the eggs in a single basket?

Anyone who's done more than one "lost weekend migration"—and who's also honest—knows that they don't go well. There are always subsets of data that (a) don't get converted properly and (b) have to get swept under the carpet in order to claim to meet the schedule.

It's a standard situation to have less than 100% of the data successfully converted and still call the effort a success. If 100% was not required, why lose a weekend over it?

Good Plans and Bad Plans

From far wiser people than me, I learned an important lesson in schema migration.

These Wiser Heads ran a "conversion" business. They moved data and applications from platform to platform. They knew a lot about database schema migrations. A lot.

Their standard plan was to build a schema migration script (usually a sequence of apps) that could be run to convert the database (or files or whatever) from old to new schema as often as was necessary.

I'll repeat that.

As often as was necessary.

They debugged the script to get to an acceptable level of conversion. The data conversion (or schema migration) was perfectly repeatable. Of course, they longed for 100% conversion; but pragmatically, the legacy software had bad data. So some fraction would not convert. And once that fraction was found, the schema migration applications could be modified to treat the non-convertible data in some intelligent way.
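A minimal sketch of such a repeatable conversion pass (invented tables and rules; the essential features are that it starts from a clean slate and quarantines, rather than hides, the bad data):

    import sqlite3

    VALID_STATES = {"NY", "NJ", "CT"}  # placeholder conversion rule

    def convert(legacy, target):
        """One complete, repeatable conversion pass."""
        target.execute("DELETE FROM customer")  # rerun from a clean slate
        target.execute("DELETE FROM reject")
        rows = legacy.execute("SELECT cust_id, name, state FROM cust_file")
        for cust_id, name, state in rows:
            if name and state in VALID_STATES:
                target.execute("INSERT INTO customer VALUES (?, ?, ?)",
                               (cust_id, name, state))
            else:
                # Quarantined, not swept under the carpet: each reject is
                # a candidate for a smarter rule in the next pass.
                target.execute("INSERT INTO reject VALUES (?, ?, ?)",
                               (cust_id, name, state))
        target.commit()

Each run reports a new, smaller reject population; the script is debugged until that population is acceptable.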

Their stated goal was to convert data and run parallel testing with that converted data as often as necessary to create confidence that the new data was as perfect a conversion as was possible. At some point, the confidence became certainty and the parallel testing was deemed complete. Since they were parallel testing with live data, the decision amounted to a formalized "commissioning" of the new application. And by then, the new application was already being used.

There are bad ways to do schema migration, of course. HPL listed many.

Horrible Mistakes

The horror story from HPL included this:
"For the migrated tables, create views in the old system and create instead of triggers on those views to ship data to the new system."
It appears that they used views and triggers to create a new system "facade" over the legacy system. Apparently, they wanted both suites of application software to coexist. Not a good approach to schema migration. It appeared that they were trying to share one database with two application schemas.
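For concreteness, the facade HPL describes looks roughly like this (SQLite syntax, invented names); every write through the new-system view must be trapped and re-routed into the legacy tables, and this multiplies with every migrated table:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE legacy_customer (cust_id INTEGER, cust_name TEXT);

        -- A "new system" view over the legacy table...
        CREATE VIEW new_customer AS
            SELECT cust_id, cust_name FROM legacy_customer;

        -- ...with an INSTEAD OF trigger shipping writes back to the
        -- legacy table.
        CREATE TRIGGER new_customer_insert
        INSTEAD OF INSERT ON new_customer
        BEGIN
            INSERT INTO legacy_customer (cust_id, cust_name)
            VALUES (NEW.cust_id, NEW.cust_name);
        END;
    """)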

This seems like it's doomed. Unless they're all geniuses.

Wiser Heads would have strongly suggested that the new system use an extract of the old system's data.

HPL goes on to complain,
"Sometimes we can take over a column or 2 and sometimes we can only take over some of the data in the table".  
HPL emphasizes this point with "This is not that far fetched". I'm not sure why the emphasis was needed.

This is not "far fetched". It doesn't need emphasis. It's not really much of a problem, either. It's a standard part of schema migration. Extracting a copy of the data makes this quite easy. Triggers and views to create some kind of active SQL-based Facade is what created the complexity. Not the number of columns involved.

HPL summarizes,
"So you end up w/ [many] tables/views triggers all moving data back and forth and faking stuff out"
Back and forth. A fundamental mistake. A copy can be much easier to manage. One-way data movement: Legacy to New.

HPL concludes with a litany of errors of various types: performance, change management, file system issues, error logging and auditing. Blah blah blah. Yes, it was a nightmare. I feel their pain.

What About Coexistence? 

It appears that HPL was involved in a project where the entire old and new applications were supposed to somehow coexist during the conversion.

It appeared that they failed to do any kind of partitioning.

Coexistence is not a trivial exercise. Nor is it a monolith where the entire legacy application suite must coexist with the entire new schema and the entire new application suite.

Pragmatically, coexistence usually means that some portion of the legacy must be kept running while some other portion is modernized. This means the coexistence requires that the application portfolio be partitioned.

Step 1: Some suite of functionality is migrated. That means data from the legacy database/file system is copied to new. That also means some data from new is copied back into the legacy database/file system. Copied.

Step 2: Some other suite of functionality is migrated. As functionality is moved, less and less data is copied back to the legacy.

At some point, this copying back is of no value and can be discontinued.
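A sketch of that partitioned copy-back (invented partition names; each partition assumed, for brevity, to be a two-column table). The key structures are the partition sets: as functionality moves, the set of data the legacy side still needs shrinks until the copy-back can be discontinued.

    import sqlite3

    # Partitions already owned by the new system, and the (shrinking)
    # subset the remaining legacy components still consume.
    MIGRATED = {"orders", "catalog"}
    STILL_NEEDED_BY_LEGACY = {"orders"}

    def copy_back(new_db, legacy_db):
        """One-way copy: new system back to the unconverted legacy side."""
        for partition in MIGRATED & STILL_NEEDED_BY_LEGACY:
            rows = new_db.execute(f"SELECT id, payload FROM {partition}")
            for row in rows:
                # assumes two columns per partition table, for brevity
                legacy_db.execute(
                    f"INSERT OR REPLACE INTO {partition} VALUES (?, ?)", row)
        legacy_db.commit()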

What About Timing?

This copying clearly requires some coordination. It's not done haphazardly.

Does it require "real time" data movement? i.e. triggers and views?

Rarely is real time movement required. This is the point behind partitioning wisely. Partitioning includes timing considerations as well as data quality and functionality considerations.

It's remotely possible that timing and partitioning are so pathological that data is required in both legacy and new applications concurrently. This is no reason to throw the baby out with the bathwater. This is nothing more than an indication that the data is being copied back to the legacy application close to real time.

This also means performance must be part of the test plan. As well as error handling and diagnostic logging. None of this is particularly difficult. It simply requires care.

Lessons Learned

HPL appeared to make the claim that schema migration is super hard. Or maybe that coexistence is really hard.

Worse, HPL's horror story may be designed to indicate that a horrifying lost weekend is the only way to do schema migration.

Any or all of these are the wrong lessons to learn.

I think there are several more valuable lessons here.

  1. Schema migration can and should be done incrementally. It's ideally tackled as an Agile project using Scrum techniques. It's okay to have release cycles that are just days apart as each phase of the conversion is run in parallel and tested to the user's satisfaction.
  2. Coexistence requires partitioning to copy any data back to unconverted legacy components. Triggers and views and coexistence of entire suites of software make a difficult problem harder.
  3. The conversion script is just another first-class application. The same quality features apply to the conversion as to every other component of the app suite.
  4. The conversion must be trivially repeatable. It must be the kind of thing that can be run as often as necessary to move legacy data to the new schema; the repeatability test sketched after this list is one way to enforce that.
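Treating the conversion as a first-class application means it gets first-class tests. One example, sketched pytest-style with invented fixtures and reusing the convert() sketch above: running the conversion twice must leave the target in exactly the same state as running it once.

    def test_conversion_is_repeatable(legacy_db, fresh_target):
        convert(legacy_db, fresh_target)
        first = fresh_target.execute(
            "SELECT * FROM customer ORDER BY cust_id").fetchall()
        convert(legacy_db, fresh_target)  # run it again
        second = fresh_target.execute(
            "SELECT * FROM customer ORDER BY cust_id").fetchall()
        assert first == second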

Sunday, December 20, 2009

The Data Cartel and "Users"

I work with a CIO who calls the DBAs "The Data Cartel". They control the data. Working with some DBAs always seems to turn into hostage negotiation sessions.

The worst problems seem to arise when we get out of the DBA comfort zone and start to talk about how the data is actually going to be used by actual human beings.

The Users Won't Mind

I had one customer where the DBA demanded we use some Oracle-supplied job -- running in crontab -- for the LDAP-to-database synchronization. I was writing a J2EE application; we had direct access to the database and the LDAP server. But to the data cartel, their SQL script had magical properties that seemed essential.

Sadly, a crontab job introduces a mandatory delay while the user waits for the job to run and finish. This creates either a long transaction or a multi-step transaction where the user gets emails or checks back or something.

The DBA claimed that the delays and the complex workflow were perfectly acceptable to the users. The users wouldn't mind the delay. Further, spawning a background process (which could lead to multiple concurrent jobs) was unacceptable.

This kind of DBA decision-making occurs in a weird vacuum: a claim about the users' needs, made without asking any user. Since the DBA controls the data, we're forced to agree. And if we don't agree, what? A file "accidentally" gets deleted?

The good news is that the crontab-based script could not be made to work in their environment in time to meet the schedule, so I had to fall back to the simpler solution of reading the LDAP entries directly and providing (1) immediate feedback to the user and (2) a 1-step workflow.
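The simpler solution, sketched here in Python with the ldap3 package (the actual project was J2EE; the server, base DN, and attribute names are invented):

    from ldap3 import Server, Connection, ALL

    def lookup_user(uid):
        """Read the LDAP entry directly: immediate feedback, 1-step workflow."""
        server = Server("ldap://ldap.example.com", get_info=ALL)
        with Connection(server, auto_bind=True) as conn:
            conn.search("ou=people,dc=example,dc=com",
                        f"(uid={uid})",
                        attributes=["cn", "mail"])
            # No waiting on a cron cycle: results are available right here.
            return [entry.entry_attributes_as_dict for entry in conn.entries]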

We wasted time because the data cartel insisted (without any factual evidence) that the users wouldn't mind the delays and complexity.

[The same DBA turned all the conversations on security into a nightmare by repeating the catch-phrase "we don't know what we don't know." That was another hostage negotiation situation: they wouldn't agree to anything until we paid for a security audit that illustrated all the shabby security practices. The OWASP list wasn't good enough.]

The Users Shouldn't Learn

Recent conversations occurred in a similarly vacuous environment.

It's not clear what's going on -- the story from the data cartel is often sketchy and missing details. But the gaps in the story indicate how uncomfortable DBAs are with people using their precious data.

It appears that a reporting data model has a number of many-to-many associations. Periodically, a new association arrives on the scene, and the DBAs create a many-to-many association table. (The DBA makes it sound like a daily occurrence.)

Someone -- it's not clear who -- claimed this was silly. The DBA claims the product owner said that incremental requirements causing incremental database changes was silly. I think the DBA is simply too lazy to create the required many-to-many association tables. It's a table with two FK references. A real nightmare of labor. But there were 3 or maybe 4 instances of this. And no end in sight.
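For the record, here is the entire "nightmare of labor" (invented names, SQLite syntax): a new many-to-many association table is two foreign keys and a primary key.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE campaign (campaign_id INTEGER PRIMARY KEY);
        CREATE TABLE customer (customer_id INTEGER PRIMARY KEY);

        -- The new association: two FK references, nothing more.
        CREATE TABLE campaign_customer (
            campaign_id INTEGER NOT NULL REFERENCES campaign,
            customer_id INTEGER NOT NULL REFERENCES customer,
            PRIMARY KEY (campaign_id, customer_id)
        );
    """)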

It appears that the worst part was that the data model requirements didn't arrive all at once. Instead, they had the temerity to trickle in through incremental evolution. This incremental design became a "problem" that needed a "solution".

Two Layers of Hated User Interaction

First, users are a problem because they're always touching the data. Some DBAs do not want to know why users are always touching the data. Users exist on the other side of some bulkhead. What the users are doing on their side is none of our concern as DBA.

Second, users are a problem because they're fickle. Learning -- and the evolution of requirements that is a consequence of learning -- is a problem that we need to solve. Someone should monitor this bulkhead, collect all of the requirements and pass them through the bulkhead just once. No more. What the users are learning on their side is none of our concern as DBA.

What's Missing?

What's missing from the above story? Use Cases.

According to the DBA, the product owner is the source of an endless sequence of demands for data model features. Apparently, adding features incrementally is silly. Further, there's no rhyme or reason behind these requests. To the DBA they appear random.

The DBA wanted some magical OO design feature that would make it possible to avoid all the work involved in adding each new many-to-many association table.

I asked for use cases. After some back and forth, I got something that made no sense.

It turns out that the data model involves "customers". The DBA started out describing the customer-centric features of the data model. After all, the "actor" in a use case is a person, and the database contains information on people. That's as far as the DBA was willing to go: repeating the data model elements that involved people.

If It Weren't For the Users

The DBA could not name a user of the application, or provide a use case for the application. They actually refused to articulate one reason why people put data in or took data out. They sent an angry email saying they could not find a reason why anyone would need these many-to-many association tables.

I responded that if there's no user putting data in or getting data out then there's no system. Nothing to build. Stop asking me for help with your design if no person will ever use it.

To the DBA, this was an exercise in pure data: there was no purpose behind it. Seriously. Why else would they tell me that there were no use cases for the application?

Just Write Down What's Supposed to Happen

So I demanded that the DBA write down some sequence of interactions between actual real-world end-user and system that created something of value to the organization. (My idea was to slide past the "use case" buzzword and get past that objection.)

The DBA wrote down a 34-step sequence of steps. 34 steps! While it's a dreadful use case, it's a start: far better than what we had before, which was nothing. We had a grudging acknowledgement that actual people actually used the database for something.

We're moving on to do simplistic noun analysis of the use case to try and determine what's really going on with the many-to-many associations. My approach is to try and step outside of "pure data" and focus on what the users are doing with all those many-to-many associations.
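The "simplistic noun analysis" is nothing more elaborate than this sketch (the noun list here is invented; the real pass just underlines the nouns in the use case text and tallies them as candidate entities and associations):

    from collections import Counter

    CANDIDATE_NOUNS = {"customer", "account", "campaign", "report", "region"}

    def noun_tally(use_case_text):
        """Count candidate-entity nouns in a use case narrative."""
        words = (w.strip(".,;:") for w in use_case_text.lower().split())
        return Counter(w for w in words if w in CANDIDATE_NOUNS)

Recurring nouns suggest the entities; nouns that keep appearing together in a step suggest the many-to-many associations in question.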

That didn't go well. The data cartel, it appears, doesn't like end-users.

The Final Response

Here's what the DBA said. "The ideal case is to find a person that is actually trying to do something and solve a real end user problem. Unfortunately, I don't have this situation. Instead, my situation is to describe how a system responds to inputs and the desired end state of the system."

Bottom line. No requirements for the data model. No actors. No use case. No reality. Just pure abstract data modeling.

Absent requirements, this approach will turn into endless hypothetical "what if" scenarios. New, fanciful "features" will inevitably spring out of the woodwork randomly when there are no actual requirements grounded in reality. Design exists to solve problems. But the DBA has twice refused to discuss the problem that they're trying to solve by designing additional tables.