The Art of Being a Successful DBA: Paranoid DBA Best Practices
Ever look at a screen’s output and get that puckered feeling in the pit of your stomach? If you have been working in this profession for any amount of time, you know the feeling I’m talking about. The feeling that makes you think you would rather be living in Montana making woodcarvings at a roadside stand than being a DBA. I’ll be taking a somewhat lighthearted look at the perils of our profession and discussing ways to reduce problem occurrences.
The Perils of our Profession
One of the common challenges that all DBAs face, no matter what vendor’s database they work on, is the absolute attention to detail our profession demands. Switch a couple of characters in a script, forget to set your SID, set the wrong flag at the wrong time and the end result usually isn’t pretty. Many commands we issue on a regular basis are destructive by their very nature. This is the reason why I have a great respect for all technicians who have selected database administration as their chosen profession.
I know they have all experienced that uncontrolled “eye-twitching” at 2 AM when they are ready to hit the final enter key to execute the command. You know what command I’m talking about too. It’s that one command that you really, really, really hope is going to come back with a successful return code and ultimately end with a database that is finally usable. Whether it’s a recovery, a file fix or corrupt data is immaterial; it’s the wait that we are talking about.
There is no longer wait in the DBA profession than waiting for the message below after a database recovery:
Database opened.
Time always seems to stand still. The longer the recovery, the messier the recovery, the more critical the database – the longer you wait. You stare at the screen hoping beyond hope that the above message will appear. It’s the ritual cross your fingers, spin around three times, face towards Oracle headquarters and pray to everything that is Larry Ellison wait. I don’t care how sure you are of your capabilities, or how much of an Oracle “Ace” you are – you know the anticipation I’m talking about. You then either breathe a sigh of relief or you are in absolute disgust when you see an Oracle error message appear.
Not only must we try to prevent our own mistakes, we must safeguard our environments against the mistakes of others. Operating system administrators, disk storage technicians and application developers are just like us. We are all part of the human community that makes mistakes from time to time.
If you never make mistakes, send me a resume. I’m always looking for a “Patron Saint of Oracle” here at RDX. It will also save us on travel costs because I’m sure you’ll be able to spread your wings and fly here on your own.
As my old boss used to tell me (when I was a VERY junior DBA), “It really doesn’t make a difference who broke the database. You are the technician who is ultimately responsible for fixing it. The buck stops with you. If you can’t protect your environments, you aren’t doing your job.” We all know he’s absolutely correct.
Then there are the software glitches. The problems that pop up out of the blue and make you go:
“WHAT THE? – How did THAT happen? I’ve done this 317 times in a row and it worked every time.”
For you math majors, here’s my calculation for this:
CLOSER YOU ARE TO PRODUCTION TURNOVER
+ THE GREATER THE VISIBILITY OF THE PROJECT
= THE MORE LIKELY A PREVIOUSLY UNKNOWN SOFTWARE GLITCH WILL OCCUR
I don’t care what software you are using, you will run into the “only occurs on this release, on this version of the operating system, using this particular feature on the third Tuesday of the sixth month when it’s cloudy outside” BUG. Be sure to expect management to stop by and ask “Well, why didn’t you test this on the third Tuesday of the sixth month when it was cloudy outside?”
The more complex the database ecosystem, the more paranoid I become. Which is why I’m not a follower of the “the database is getting so easy – we won’t need DBAs” mantra that mindless industry pundits profess on a seemingly endless basis.
So now we know that our jobs are somewhat unforgiving and we do make a mistake from time to time. What can we do to reduce the chance of an error occurring?
Poka-Yoke for DBAs!
Poka-Yoke is a Japanese term that means “fail-safing” or “mistake-proofing.” Wikipedia’s definition of Poka-Yoke is: “Its purpose is to eliminate product defects by preventing, correcting or drawing attention to human errors as they occur.”
Since I’m a car nut, here’s a couple of automotive Poka-Yoke examples. You can’t take the keys out of most modern cars until the car is in park. In addition, most cars won’t allow you to shift out of park until the key is in the “ON” position. How about gas caps that have the little tether that prevents us from driving off without the cap? Most gas caps are also attached using a ratchet assembly that ensures proper tightness and prevents over tightening.
Take a look around and you’ll see dozens of Poka-Yokes during your daily activities:
- The little holes in bathroom sinks that prevent overflows
- Microwaves will stop when the door is opened
- Dryers will also stop when the door is opened
- Lawn mowers that have a safety bar that must be depressed before they will run
- Disk brakes that begin to make a noise before they are completely ground down
- Rumble strips on roads
The list really is endless. We have applied the Poka-Yoke process to our daily activities here at RDX. We have checklists, process documentation, best practices, sign-off sheets – the works.
I’d be very interested to learn your Poka-Yoke ideas! If you have a Poka-Yoke idea, please respond and we’ll be glad to discuss it. Here’s some general ones that I recommend.
The Second Set of Eyes
As I have stated in previous blogs, I have over 20 years of experience using Oracle and have done my fair share of database backups and recoveries. During my career as an Oracle instructor, I have assisted in hundreds of database recoveries in Oracle’s classroom environments. During later stages of my career, I still had others review my recovery strategy and recovery steps before I began the recovery process. I used backup and recovery just as an example. Whatever the process is you are performing, a second opinion may prevent you from making a mistake. A review from a fellow DBA has saved me more than once. I may be described as having an ego (I have no idea where they get that opinion), but it doesn’t prevent me from asking for help from others.
A while back, a few RDX DBAs were correcting a very poor third-party utility backup script that was created by a customer’s previous database support vendor. The third-party backup storage utility was overly complex, but it was the product the customer standardized on years ago. The customer described this particular environment as an “if it goes down, we lose our ability to make money” application. After the massive set of changes was complete, two DBAs went line-by-line verifying each line of the backup script. At the end of each script they asked each other, “Are you OK with this?” Only then did they move on to the next one. I don’t care how much time you have “in the seat” using Oracle, you need to put your ego aside at times and have someone check your work on critical activities. Our next step for this customer was, you guessed it, High Availability implementation.
Concentration
I used to work for a shop that subscribed to the “everybody in one big room” philosophy. I guess it was supposed to allow everyone to work together as a team and become as “one with each other.” It may have achieved that purpose, but it sure didn’t allow you to concentrate on your work very well. You could hear so many different conversations they had to pump in white noise. The constant “whhhsssssshhhssshhh” noise made me feel like I was a crew member of the Starship Enterprise.
Like all DBA units, our particular area was often populated with various developers and O/S technicians. Many different conversations were occurring, some that could be described as somewhat animated. The environment did not allow you to concentrate on the task at hand. We often had to go into small conference rooms to work on critical tasks.
The point I’m trying to make is that no matter what type of environment you work in: if you can concentrate, OK. However, if you are like me and you can’t, find a spot where you can. Block off some time, send questions to other DBAs and concentrate on the task at hand. Don’t attempt to answer questions and code a complex script at the same time. This may seem obvious, but throughout my career, I have personally watched numerous DBAs attempt to multitask when they are working on a critical process. It’s a recipe for a problem. Once you are done, follow rule number one and have someone review your work.
What Database Are You Working IN?
Ever work in the wrong database? Working in the wrong database is a common problem for database experts as well as their less experienced counterparts. How many times have YOU found yourself running statements in the wrong environment? Feel free to include me in that not-so-select group. The operating system command SET can be used in Windows systems to display environment variables. The env command (lowercase, since UNIX commands are case-sensitive) does the same in UNIX. Many seasoned database administrators change their UNIX shell prompt in their profile to display the current Oracle SID. Displaying the current Oracle SID in the shell’s prompt provides a continuous reminder to the DBA of the database they are working in. Google it – you’ll find dozens of scripts by your fellow DBAs.
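Here is a minimal sketch of the SID-in-the-prompt idea, assuming a bash login shell. The function name sid_prompt and the NO_SID_SET placeholder are my own inventions for illustration, not part of any Oracle tooling, so adapt them to your shop’s standards.

```shell
# Hypothetical helper: build a prompt string that shows the current ORACLE_SID.
# Falls back to a loud placeholder when ORACLE_SID is unset or empty.
sid_prompt() {
    printf '[%s] $ ' "${ORACLE_SID:-NO_SID_SET}"
}

# Assumption: bash. PROMPT_COMMAND runs before each prompt is drawn, so the
# prompt follows ORACLE_SID even if you change it in the middle of a session.
PROMPT_COMMAND='PS1="$(sid_prompt)"'
```

Drop something like this into your profile and every command line starts with the SID you are about to run against. Rebuilding the prompt each time, rather than setting it once at login, matters because the SID you logged in with is not always the SID you end up working in.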
Saving Time VS Creating a Problem
At a large manufacturing firm, I once watched a fellow DBA perform a rather complex set of administrative tasks to solve a problem. He was rapidly flipping back and forth between at least 15 active screens, copying and pasting and editing and copying and pasting and editing… I describe this particular activity as “Multiple Screen Syndrome.” He also had several other screens open that were connected to other databases. He was multi-tasking to its highest degree. Take a break, take a breath and look at what you are doing.
How about the rm -r /u0*/ora*/prod*/*/*.* command in UNIX? It’s the command that drops multiple databases in multiple directories, all in one painful swoop. How many times have you heard of commands like this causing mass mayhem? When you make a mistake like this, you become immortalized in conversations for years to come. Get a few technicians together after work and ultimately the conversation will include, “Remember when Bob so-and-so ran that big rm -r command by mistake and wiped out the entire O/S on our production web server?” You can’t tell me you haven’t heard stories like this.
My opinion is that I would rather you take your time than showcase your multi-tasking and time-saving skills. The more complex and critical the activity, the more basic you should become in your plan of attack. Trust me when I say I won’t be impressed with your time-saving “cut and paste” and wildcard expertise if I think it can even remotely be dangerous.
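One basic way to slow down with wildcards is to look at what the shell expands them to before any destructive command sees them. The sketch below is only an illustration; preview_targets is a hypothetical helper name, and it prints paths without deleting anything.

```shell
# Hypothetical helper: list what a wildcard expands to instead of deleting it.
# The shell expands the pattern before the function is called, so "$@" holds
# exactly the paths a destructive command would have received.
preview_targets() {
    for target in "$@"; do
        # Skip patterns that matched nothing and were passed through literally.
        [ -e "$target" ] || continue
        printf '%s\n' "$target"
    done
}
```

Run preview_targets with the same pattern you were about to hand to rm -r, read the list, and only then decide whether the real command is safe. If the list is longer than you expected, you just saved yourself from becoming the next “Bob so-and-so.”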
Safety First Mindset
You need to think “Safety First” when you are performing any particularly complex or critical activity. Take the time and put one or two safeguards in place.
Other DBAs may call you paranoid; I’ll call you an experienced DBA who would rather be safe than sorry.
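As one example of a small safeguard, here is a sketch of a guard that refuses to continue unless the current ORACLE_SID is the one you meant to work in. The name require_sid is hypothetical and the check is deliberately simple; treat it as a starting point, not a finished tool.

```shell
# Hypothetical guard: succeed only when ORACLE_SID matches the expected SID.
require_sid() {
    expected="$1"
    if [ "${ORACLE_SID:-}" != "$expected" ]; then
        # Report the mismatch on stderr and signal failure to the caller.
        printf 'Refusing to run: ORACLE_SID is "%s", expected "%s"\n' \
            "${ORACLE_SID:-unset}" "$expected" >&2
        return 1
    fi
}
```

Put a call such as require_sid TEST1 || exit 1 at the top of a script that should only ever touch TEST1 (an assumed SID name), and the script stops before doing any damage when it is launched from the wrong environment.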
Wrapup
The intent of this blog post was not to provide you with a laundry list of recommendations; it was intended to help jump-start your creative juices to think about different methods to protect yourself against problems. If you have any helpful hints, please feel free to respond to this blog with your Safety First Tips and Tricks.
Thanks for reading.
