clckwrk Quick Guide: Oracle DBA – Managing the Alert Log on AWS
At clckwrk we manage Oracle on the AWS Cloud. The first rule of being a DBA: check the alert log.
The alert log is the file where the database records its core activity, along with the messages a DBA needs in order to understand what the database is doing, what tuning is needed at the database level, and what errors the database is producing. The errors are the critical part for us, as they can be a key indicator of the root cause of problems that users are experiencing.
Checking the alert log manually is a slow way to work, and AWS gives us a unified monitoring system in CloudWatch. This post shows how we can optimise for the cloud and use the CloudWatch Logs Agent to monitor the alert log. In this example we have a self-managed Oracle database on an AWS EC2 instance running Oracle Enterprise Linux.
Step 1: Set up CloudWatch Logs Agent on the server
CloudWatch Logs Agent is installed like any other package into the operating system – the commands below download and execute the setup package:
>wget https://s3.amazonaws.com/aws-cloudwatch/downloads/latest/awslogs-agent-setup.py
>python ./awslogs-agent-setup.py --region us-east-1
When the installation runs, a handy wizard sets up the service to capture the logs. The screenshot shows the wizard collecting our key information and the location of the alert log. We’re asked to name the log group and the log stream; we can then repeat the process to capture additional logs, or exit.
The wizard simply writes our answers into the configuration files for the Logs Agent. The main file is /var/awslogs/etc/awslogs.conf, and the log file configuration is written to the end of it (see the screenshot below).
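As a rough illustration of what the wizard appends, the sketch below writes a log stanza by hand. The keys (file, log_group_name, log_stream_name, initial_position) are standard Logs Agent settings, but the function name and every value passed to it are assumptions for illustration, not the values used in this setup.

```shell
#!/bin/sh
# Hypothetical helper: append one log stanza to an awslogs-style config file.
# Nothing here is taken from the original wizard session; pass your own values.
append_alert_log_stanza() {
    conf_file="$1"    # e.g. /var/awslogs/etc/awslogs.conf
    alert_log="$2"    # full path to the Oracle alert log (site-specific)
    log_group="$3"    # CloudWatch log group name
    log_stream="$4"   # CloudWatch log stream name

    # The section name is the log file path, followed by the agent settings.
    cat >> "$conf_file" <<EOF

[$alert_log]
file = $alert_log
log_group_name = $log_group
log_stream_name = $log_stream
initial_position = start_of_file
EOF
}
```

After editing the file by hand, the agent needs a restart to pick up the change (on this style of install, typically `sudo service awslogs restart`).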
Step 2: Verify the setup in the console
When we set up the Logs Agent we provided our key information and some details about the log group and stream. These have been used to set up the console for us. Connect to the console and navigate to Services > CloudWatch. Within the CloudWatch console we can open Logs, select our log group, select our stream, and view the log contents that the agent has collected.
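The same check can be made from the command line. This is a minimal sketch, assuming the AWS CLI is installed and configured with credentials that can read CloudWatch Logs; the function names are hypothetical, and the group and stream names are whatever you entered in the wizard.

```shell
#!/bin/sh
# List the streams in a log group, most recently active first.
list_alert_log_streams() {
    log_group="$1"
    aws logs describe-log-streams \
        --log-group-name "$log_group" \
        --order-by LastEventTime \
        --descending
}

# Print the message text of up to 20 events from one stream.
show_alert_log_events() {
    log_group="$1"
    log_stream="$2"
    aws logs get-log-events \
        --log-group-name "$log_group" \
        --log-stream-name "$log_stream" \
        --limit 20 \
        --query 'events[].message' \
        --output text
}
```

If `show_alert_log_events` prints lines from the alert log, the agent is shipping data and the console view should match.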
Step 3: Remove the undifferentiated heavy lifting (UHL)
In AWS speak, UHL covers the tasks that we have to repeat many times but that add very little value: typically work that is best done by a machine and automated to the point where an administrator is needed to make a decision only when there is an exception. In this case we don’t want to scan the alert log manually, so we create filters and alarms in CloudWatch to do the work for us.
Back in the Logs console you’ll see that you can select the log group and create a filter. In production we’d be primarily interested in errors starting with ‘ORA-’, but as this is a brand-new, zero-errors database we don’t have any of those, so we’ve filtered on the word ‘parameter’, which we know will occur a few times in the log at startup.
Once the filter is created you can create an alarm that fires based on the filter. In this case you can see that when the metric is >1 (so the filter has found more than one occurrence of the word ‘parameter’) the alarm fires and sends an email to our test address.
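For a production-style ‘ORA-’ filter, the console steps could be scripted roughly as follows. This is a sketch under stated assumptions: the AWS CLI is configured, an SNS topic with an email subscription already exists, and the filter, metric, namespace and alarm names are all invented for illustration. Unlike the console example, the alarm here uses a threshold of >= 1 so that a single match fires it.

```shell
#!/bin/sh
# Local sanity check: count the lines in an alert log that contain "ORA-".
count_ora_errors() {
    grep -c 'ORA-' "$1"
}

# Create a metric filter that emits 1 for every log event containing "ORA-".
# The term is double-quoted inside the pattern because it contains a hyphen.
create_ora_metric_filter() {
    log_group="$1"
    aws logs put-metric-filter \
        --log-group-name "$log_group" \
        --filter-name ora-errors \
        --filter-pattern '"ORA-"' \
        --metric-transformations \
            metricName=OraErrorCount,metricNamespace=OracleAlertLog,metricValue=1
}

# Alarm when at least one match is seen in a 5-minute period,
# notifying the SNS topic whose ARN is passed in.
create_ora_alarm() {
    sns_topic_arn="$1"
    aws cloudwatch put-metric-alarm \
        --alarm-name oracle-alert-log-ora-errors \
        --namespace OracleAlertLog \
        --metric-name OraErrorCount \
        --statistic Sum \
        --period 300 \
        --evaluation-periods 1 \
        --threshold 1 \
        --comparison-operator GreaterThanOrEqualToThreshold \
        --treat-missing-data notBreaching \
        --alarm-actions "$sns_topic_arn"
}
```

Treating missing data as not breaching keeps the alarm in the OK state during quiet periods, since the filter publishes no data points when nothing matches.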
Of course this is AWS, so sending an alert email to the admin team is only the start. You can perform actions on the instance, scale the application tier, or run some fix code through Lambda: whatever is going to work for your application.
At clckwrk we migrate Oracle databases and applications to the cloud to lower your costs, improve performance and give you bulletproof availability. Contact us if you’re planning your own migration.