{"id":1302,"date":"2017-05-02T22:29:35","date_gmt":"2017-05-02T22:29:35","guid":{"rendered":"https:\/\/w2.cleardb.net\/?p=1302"},"modified":"2022-09-22T14:41:42","modified_gmt":"2022-09-22T14:41:42","slug":"best-practices-for-monitoring-and-measuring-data-center-performance","status":"publish","type":"post","link":"https:\/\/www.navisite.com\/blog\/best-practices-for-monitoring-and-measuring-data-center-performance\/","title":{"rendered":"Best Practices for Monitoring and Measuring Data Center Performance"},"content":{"rendered":"

IT professionals are acutely aware of just how closely their data center infrastructure performance is tied to their business performance in our digitally driven world. Technology consumers (*employees, suppliers, customers, and prospects*) expect highly available, fast, and responsive interactions from every system they touch. As a result, IT professionals play critical roles in enabling the strategic success and tactical effectiveness of many businesses today. Accordingly, it is vital for IT to know which hardware and software metrics to monitor, and to understand how those metrics relate to each other. This knowledge enables IT to continuously optimize the infrastructure that empowers the business to achieve its goals and objectives.
In addition to knowing which metrics to monitor, cloud administrators often conduct before/after and A/B tests on pre-optimized resources, comparing their metrics with metrics from the production infrastructure. These tests gauge the effectiveness of tuning strategies and performance solutions. In public clouds, provisioning such testing resources is simple and cost-effective.
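To make such a before/after comparison concrete, here is a minimal Python sketch; the latency samples and the p95 helper are purely illustrative assumptions, not output from any particular tool:

```python
# A minimal sketch of a before/after latency comparison, assuming two lists
# of response-time samples collected from a baseline and a tuned environment.
import statistics

def p95(samples):
    """Return an approximate 95th-percentile value of a list of samples."""
    ordered = sorted(samples)
    index = max(0, int(round(0.95 * len(ordered))) - 1)
    return ordered[index]

def compare(baseline_ms, tuned_ms):
    """Print mean and p95 latency for each environment, plus the change."""
    for label, samples in (("baseline", baseline_ms), ("tuned", tuned_ms)):
        print(f"{label}: mean={statistics.mean(samples):.1f} ms, "
              f"p95={p95(samples):.1f} ms")
    delta = statistics.mean(tuned_ms) - statistics.mean(baseline_ms)
    print(f"mean latency change: {delta:+.1f} ms")

# Hypothetical samples from a load test against each environment.
compare([120, 135, 128, 300, 142], [95, 101, 98, 180, 104])
```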
The tests and metrics used to monitor the productivity of IT infrastructure generally fall into three categories: quantity measures, quality measures, and responsiveness measures. These categories apply to every layer of the IT infrastructure stack, from operating systems, CPUs, storage tiers, and networks to the efficiency and effectiveness of application code, computing services, and databases. (A combined sketch of all three categories follows the list below.)

1. **Quantity measures** track the amount of work being done by some component of the infrastructure stack. These measures are referred to as "throughput" metrics, and they are usually represented as an absolute number per unit of time. For an application, throughput is generally measured by the number of concurrent processes managed per minute or second, whereas throughput for a database server is often represented by the number of queries executed per second. For a web server, the number of client requests successfully processed per second is a common measure of throughput.
2. **Quality measures** look at the success or failure of process and application (workload) operations. Success metrics represent the percentage of total work that is processed correctly. Error metrics, in comparison, capture the number of failed or erroneous results; they are commonly expressed as an error rate per unit of time, or they are normalized by the process's throughput to yield the number of errors per unit of work.
3. **Responsiveness measures** quantify how efficiently an infrastructure component completes its work: in essence, the speed of an end-to-end operation. Such measures are generally referred to as "latency" metrics, and they are usually expressed as an average or as a percentile of processing time. Latency might measure the time from when a client issues a transaction until it receives a response, or from when a database receives a request until it queues its response. As an example, latency is often shown as the percentage of operations completed within a unit of time, such as "*97% returned within 0.3 seconds*."
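As a concrete illustration of all three categories, the following minimal Python sketch derives quantity, quality, and responsiveness metrics from a single stream of operation records; the record format and the sample values are hypothetical:

```python
# A minimal sketch, assuming each completed operation is recorded as a
# (succeeded, duration_seconds) pair over a fixed measurement window.
def summarize(operations, window_seconds):
    """Compute throughput, quality, and responsiveness from raw records."""
    total = len(operations)
    successes = sum(1 for ok, _ in operations if ok)
    durations = sorted(d for _, d in operations)

    throughput = total / window_seconds                 # quantity: ops/sec
    success_rate = 100.0 * successes / total            # quality: % successful
    error_rate = (total - successes) / window_seconds   # quality: errors/sec
    p95 = durations[max(0, int(round(0.95 * total)) - 1)]  # responsiveness

    return {
        "throughput_ops_per_sec": throughput,
        "success_rate_pct": success_rate,
        "errors_per_sec": error_rate,
        "p95_latency_sec": p95,
    }

# Hypothetical records from a 60-second window: (succeeded, seconds taken).
sample = [(True, 0.12), (True, 0.30), (False, 1.20), (True, 0.18), (True, 0.25)]
print(summarize(sample, window_seconds=60))
```

In practice, a monitoring agent accumulates these records continuously and reports the summaries at regular intervals rather than over a single window.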

The challenge in monitoring these metrics is that the performance of multiple infrastructure components is interrelated. Network capacity and speed, the number and power of CPU cores, the efficiency of application code, the level of contention for shared computing resources, and the various configurations of hypervisors, databases, and other computing services can all affect performance. As a result, focusing on just one layer of the data center infrastructure stack, without considering the multi-dimensional impact it has on the others, can negate the effectiveness of performance solutions and tuning strategies. Accordingly, multiple metrics are monitored from each category.
It is therefore very helpful to use application and system monitoring tools to stay ahead of potential issues. These tools raise alerts about application and hardware problems, often before end users notice them. Lists of various monitoring tools can be found here and here.
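As a rough illustration of what such tools do under the hood, here is a minimal polling sketch using the third-party psutil library; the 90% threshold and 30-second interval are assumed example values, and real monitoring suites (e.g., Nagios or Zabbix) implement far richer alerting logic:

```python
# A minimal threshold-alert sketch; threshold and interval are assumptions.
import time
import psutil

CPU_ALERT_THRESHOLD = 90.0  # percent; an example value, not a recommendation

def watch(poll_interval_sec=30):
    """Poll CPU usage and flag problems before end users notice them."""
    while True:
        cpu = psutil.cpu_percent(interval=1)  # utilization sampled over 1 sec
        if cpu > CPU_ALERT_THRESHOLD:
            print(f"ALERT: CPU at {cpu:.0f}% exceeds "
                  f"{CPU_ALERT_THRESHOLD:.0f}% threshold")
        time.sleep(poll_interval_sec)
```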
So, what are these tools measuring and monitoring?
As you know, computer systems have several types of physical resources (*CPU, volatile memory, network, and persistent storage*) that collectively affect data center performance. Those resources also shape application performance, and it is the level of application performance that determines how the data center is judged against its strategic goals and objectives: *a data center with low operating costs and efficient power usage is still considered a failure if it cannot protect its data or meet its applications' quantity, quality, and responsiveness targets*.
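A minimal sketch of how those four resource types can be sampled, again assuming the third-party psutil library is available:

```python
# One point-in-time reading for each physical resource type named above.
import psutil

def resource_snapshot():
    """Return a single snapshot of CPU, memory, network, and disk activity."""
    return {
        "cpu_percent": psutil.cpu_percent(interval=1),
        "memory_percent": psutil.virtual_memory().percent,
        "net_bytes_sent": psutil.net_io_counters().bytes_sent,
        "net_bytes_recv": psutil.net_io_counters().bytes_recv,
        "disk_read_bytes": psutil.disk_io_counters().read_bytes,
        "disk_write_bytes": psutil.disk_io_counters().write_bytes,
    }

print(resource_snapshot())
```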
Consequently, monitoring tools continually measure the data center's: