Wednesday, November 12, 2008

Two Mistakes

I was challenged by Andy Warren to write a post about two mistakes I have made in my career since he was sure I had made many. I'm sure I have made many, but only one SQL Server mistake sticks out in my mind although I compounded it while trying to fix it.My second mistake does not relate to SQL Server, but did have a negative impact on the business I was working for.
  1. In the first year of my career my colleagues and I had written an order entry system for the paper mill we worked at using SQL Server and classic asp. Being an agile shop, even before agile was really popular, we rolled it out and made upgrades/fixes on about a weekly basis. One evening I was tasked with rolling out an update to the database. We had our development/QA database up to date with the latest data and I had a dts package that I had developed to migrate the changes. Well, I started the package, and I realized I had made a mistake and stopped the package. THAT was my REAL mistake as the first step of the package DROPPED the existing tables and I stopped the package after tables had been dropped, but BEFORE they had been recreated. I had not put the package ina transaction (another mistake) so there was no rollback. Ouch! Unfortunately, I was young and not paranoid, like any good DBA would be, so I had NOT taken a full database backup before starting the process. I did have the previous night's backup and all the transaction log backups from the day so I was able to restore the database and then re-do the upgrade. Of course it took me "forever" to do the restore because first I forgot to add WITH NORECOVERY to the first restore so I had to do the full restore twice! So what should have taken me an hour took 5 hours. What did I learn? Always, always, always take a full backup BEFORE doing any maintenance or upgrades on a database or server.
  2. My second memorable mistake (I am sure co-workers have others, I have blocked the rest) does not have anything to do with SQL Server, but was a pretty big one. Our mill ran on an AS400 based system that tracked, labeled, wrapped, and shipped the product. When a roll of paper (not like a toilet paper roll, but a large roll of 500+ lbs) was weighed, labeled, and wrapped the system compared the actual weight with an estimated weight to make sure the roll was the one that was expected. Humans would be a barcode on the end of the roll that was scanned as it was weighed and, believe it or not, occasionally they would put the wrong barcode tag on. Obviously, this process would take the scanned ID and read the DB2 database to get the estomated weight. Well, a younger co-worker was on call and had a problem that he could not solve and I was the only person he could get in touch with to help. I was not and am not an AS400 or DB2 expert, but I "knew" where to look for this problem. We checked for locks on the "rolls" table and, sure enough, a process had a lock on roll that it had not released. The precsribed treatment for this diagnosis is to "kill" the process that had not released the lock. This was done fairly regularly and all that happened was that user just needed to re-enter the program they were in, no big deal. Of course, I did not notice that the process I was killing was a system process, not a user process. So I killed the wrapper program and what was already the bottleneck in our productin line was SHUT DOWN! Needless to say the production folks were NOT happy and neither were the other IT guys who had to be called at 2 AM to first figure out what I had done, and then to fix it! I did learn how to determine what was a system process on an AS400 from that mistake.
I'll pass along the challenge to a couple of SQLServerCentral friends, Grant Fritchey and R. Barry Young.

No comments:

Post a Comment

So what do you think I am?