Monitoring the working environment of IT systems not only reduces the cost of failures but also helps optimize the cost of maintaining the entire infrastructure.
Environmental monitoring of facilities where data is processed is one of the most important aspects of disaster prevention. Today, information systems based on ICT infrastructure support the operation of entire organizations, from manufacturing to management. Any interruption of IT systems means an interruption of the business and thus leads to measurable financial losses.
However, research shows that 84% of infrastructure managers prefer not to think about the possibility of a failure, and 91% of them have experienced an unplanned outage in the last 24 months.
Therefore, let us try to explain what environmental monitoring is and why it is so important for ensuring the continuity of IT operations in enterprises.
To present the topic of monitoring, we will try to answer a few fundamental questions:
- what is environmental monitoring,
- why should we monitor the environment,
- where should environmental monitoring be used,
- what parameters should be monitored,
- how should we monitor.
What is environmental monitoring?
Loosely translating definitions found on the Internet, monitoring is a system of continuous or systematically repeated measurements and observations of the state of selected features and properties of the environment, based on a network of measurement points equipped with measurement and control apparatus, whose purpose is to provide information about the current state of the environment and the trends of its changes under the influence of external factors.
The definition looks intricate, but it highlights several features of a monitoring system:
- it is a system, so we need tools that automate the taking and storage of measurements,
- the system performs measurements, that is, actions aimed at obtaining the value (usually a number or a state) of the measured feature; the result of a measurement should not be a subjective description such as hot, cold, good or bad,
- the measurements are made continuously or at regular intervals, which implies that the entire monitoring process must be automated,
- we observe the state of selected features and properties of the environment, which means that we know which parameters we monitor,
- we get information about the current state of the environment, because there are always external factors that may affect it.
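The features listed above can be illustrated with a minimal data model: each measurement is a numeric value tied to a known parameter, a measurement point and a time, and every reading is kept so that both the current state and trends can be derived later. This is a hypothetical sketch, not the API of any particular monitoring product; the class names `Measurement` and `MeasurementStore` and the sensor identifiers are invented for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class Measurement:
    """One numeric reading of a known parameter at a known measurement point."""
    sensor_id: str       # hypothetical measurement point, e.g. "rack-07-front-bottom"
    parameter: str       # the monitored feature, e.g. "temperature"
    value: float         # a number, never a label such as "hot" or "good"
    unit: str            # e.g. "°C" or "%RH"
    timestamp: datetime


@dataclass
class MeasurementStore:
    """Keeps every reading so that current state and trends can be derived."""
    readings: List[Measurement] = field(default_factory=list)

    def record(self, m: Measurement) -> None:
        # Reject non-numeric results: a measurement must yield a value.
        if not isinstance(m.value, (int, float)):
            raise TypeError("measurements must be numeric")
        self.readings.append(m)

    def latest(self, sensor_id: str) -> Optional[Measurement]:
        """Most recent reading for a sensor (the 'current state'), or None."""
        matches = [m for m in self.readings if m.sensor_id == sensor_id]
        return max(matches, key=lambda m: m.timestamp, default=None)

    def history(self, sensor_id: str) -> List[Measurement]:
        """All readings for a sensor in time order (input for trend analysis)."""
        return sorted((m for m in self.readings if m.sensor_id == sensor_id),
                      key=lambda m: m.timestamp)


store = MeasurementStore()
t0 = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
store.record(Measurement("rack-07-front-bottom", "temperature", 22.5, "°C", t0))
```

In a real deployment the `record` calls would be driven by a scheduler polling the sensors, which is what makes the process continuous and automatic.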
Why should we monitor the environment?
According to the definition, the purpose of monitoring is to provide information on the current state and on trends of change. These two kinds of information show that an environmental monitoring system can serve two purposes:
- alerting when measured parameters exceed their safe values,
- analyzing data to find trends and information that help optimize the operation of the infrastructure.
Exceeding the safe values of the measured parameters usually results in a failure of the information systems; environmental monitoring is therefore part of a proactive approach to preventing failures in the IT environment.
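At its core, alerting on exceeded safe values means comparing each reading with a configured range. The sketch below is a minimal illustration under assumed names (`SafeRange`, `check_reading` and the parameter keys are hypothetical); the example limits reuse the supply-air and humidity ranges quoted later in this article and would need to be set for each installation.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class SafeRange:
    """Safe operating range for one parameter (limits are example assumptions)."""
    low: float
    high: float


def check_reading(parameter: str, value: float,
                  limits: Dict[str, SafeRange]) -> Optional[str]:
    """Return an alert message if the value is outside its safe range, else None."""
    rng = limits.get(parameter)
    if rng is None:
        return None  # parameter not configured for alerting
    if value < rng.low:
        return f"{parameter} {value} below safe minimum {rng.low}"
    if value > rng.high:
        return f"{parameter} {value} above safe maximum {rng.high}"
    return None


# Example limits, assumed for illustration only.
LIMITS = {
    "supply_air_temperature": SafeRange(18.0, 27.0),  # °C
    "relative_humidity": SafeRange(40.0, 60.0),       # %RH
}
```

A real system would route the returned message to e-mail, SMS or an operator console; that delivery layer is left out here.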
The ability to analyze collected data is an additional feature of the monitoring system resulting from the ability to store and aggregate measurement information and is an important element in the process of optimizing infrastructure and IT environments.
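Storing and aggregating measurements can be as simple as grouping raw samples into time buckets. This minimal sketch assumes samples arrive as `(timestamp, value)` pairs and summarizes them per hour as minimum, mean and maximum; the hourly bucket and the function name `aggregate_hourly` are illustrative choices, not part of any specific system.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean
from typing import Dict, List, Tuple


def aggregate_hourly(
    samples: List[Tuple[datetime, float]],
) -> Dict[datetime, Tuple[float, float, float]]:
    """Group raw (timestamp, value) samples by hour; return (min, mean, max) per hour."""
    buckets = defaultdict(list)
    for ts, value in samples:
        hour = ts.replace(minute=0, second=0, microsecond=0)  # start of the hour
        buckets[hour].append(value)
    return {hour: (min(v), mean(v), max(v)) for hour, v in sorted(buckets.items())}
```

Keeping the minimum and maximum alongside the mean matters because short peaks, which often precede failures, disappear in an average.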
Monitoring system as a tool to prevent outages
It is estimated that in 2014 the global data center market will be worth more than 6 billion dollars. In 2013 there were over 500,000 data centers worldwide.
It is estimated that if all the data centers in the world stopped operating at the same time for one hour, the losses would reach 69 trillion dollars.
Due to the rapid growth in the volume of data processed by data centers, the cost of an outage in such facilities has increased significantly. The average cost of one minute of data center downtime in 2013 was 41% higher than in 2010.
Outages in data centers, in most cases, are caused by:
- failures in cooling systems,
- failures of emergency power systems,
- failures of IT equipment.
In the majority of cases, failures of cooling systems are caused by failures of chillers and CRAC (computer room air conditioning) units and by leaks in the chilled-water installation. Any failure of the air conditioning system directly degrades the environmental conditions in the server room, in particular temperature and humidity.
In the vast majority of cases, failures of emergency power systems are due to loss of UPS battery capacity. Battery capacity naturally declines over time, but keeping the batteries in unsuitable environmental conditions drastically shortens their life.
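To put "drastically shortens" in numbers, a commonly cited rule of thumb for valve-regulated lead-acid (VRLA) UPS batteries is that expected service life roughly halves for every 8 to 10°C of sustained operation above 25°C. The sketch below applies that rule as an assumption: the 25°C reference and the 8.3°C halving step are parameters, and real values should come from the battery manufacturer's datasheet. The function name is hypothetical.

```python
def estimated_battery_life(rated_life_years: float, ambient_c: float,
                           reference_c: float = 25.0,
                           halving_step_c: float = 8.3) -> float:
    """Estimate battery service life at a sustained ambient temperature.

    Assumes (rule of thumb, not a datasheet value) that life halves for every
    `halving_step_c` degrees above `reference_c`; at or below the reference
    temperature the rated life is returned unchanged.
    """
    excess = max(0.0, ambient_c - reference_c)
    return rated_life_years * 0.5 ** (excess / halving_step_c)


for temp in (25.0, 30.0, 35.0):
    print(f"{temp:.0f}°C: about {estimated_battery_life(10.0, temp):.1f} years")
```

Under this assumption, a battery rated for 10 years but kept at about 33°C would be expected to last only around 5 years, which is why battery rooms deserve their own temperature sensors.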
Failures of IT equipment are not caused only by the wear of components in servers or storage systems. IT equipment that works in an environment where parameters are not properly maintained has a significantly reduced lifetime. Incorrect values of parameters such as temperature and humidity in the data center can be caused both by infrastructure failures and by mistakes in device configuration.
It should also be borne in mind that a monitoring system can prevent failures even in environments that have implemented fault-tolerant infrastructure solutions. One example is a server room with a redundant air conditioning system in which, as it turned out, the outdoor units of both installations had not been fitted with heater modules. The result was a simultaneous failure of both installations during a cold winter, leading to overheating of the server infrastructure and a consequent interruption of IT services.
Another example demonstrating the need for an environmental monitoring system is a server room whose server and network infrastructure was powered by a UPS system with additional batteries. The additional batteries allowed the entire system to run for hours without mains power. When this solution was planned, it was overlooked that during a power failure the air conditioning system, which was not backed up by the UPS, would shut down. The consequence of a multi-hour power outage (power lines broken by summer storms) was overheating of the server systems, their automatic shutdown, and the failure of the storage system.
Monitoring system as an optimization tool
The largest part of the operating costs of data centers is the cost of electricity. Analyzing the energy consumption of data centers shows that, on average, electricity use breaks down as follows:
- 10%: energy losses in power distribution,
- 50%: consumption by IT equipment,
- nearly 40%: consumption by cooling systems,
- below 1%: other systems, such as lighting and BMS.
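A breakdown like the one above is often summarized with Power Usage Effectiveness (PUE), a metric defined by The Green Grid as total facility energy divided by IT equipment energy. PUE is not mentioned in the original text and is added here only as a worked illustration: with IT equipment at about 50% of the total, PUE ≈ 1 / 0.5 = 2.0, meaning every kilowatt-hour delivered to IT costs roughly another kilowatt-hour for distribution losses, cooling and other systems. The function names are illustrative.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power Usage Effectiveness = total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


def overhead_share(pue_value: float) -> float:
    """Fraction of total energy not consumed by IT equipment (cooling, losses, other)."""
    return 1.0 - 1.0 / pue_value


# Approximate shares from the breakdown above: IT equipment is about 50% of the total.
example = pue(total_facility_kwh=100.0, it_equipment_kwh=50.0)
print(f"PUE ≈ {example:.1f}, non-IT share ≈ {overhead_share(example):.0%}")
```

Lowering the cooling share, for example by raising set points within safe limits, is what moves PUE toward 1.0.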
The data presented show how important it is to maintain optimal environmental conditions, such as temperature and airflow, in data centers and server rooms. Maintaining low temperatures leads to a significant increase in energy consumption, while maintaining high temperatures significantly accelerates the wear of IT equipment and leads to failures. Proper management of air conditioning systems allows a significant reduction of electricity costs while guaranteeing safe values of environmental parameters.
A properly implemented monitoring system also makes it possible to create thermal profiles of server cabinets and of the devices installed in them. Observing, among other things, the temperature difference between the inlet and outlet of server racks allows you to identify devices that generate more heat than comparable equipment. Correlating such information with the load on the IT systems running on those devices can help justify replacing them with newer-generation devices with lower power consumption. Given that electricity prices are likely to rise in the long term, such a replacement can reduce the cost of maintaining the IT infrastructure.
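The inlet/outlet comparison described above can be sketched as follows. For each rack, the temperature rise (outlet minus inlet) is computed, and racks whose rise exceeds the median across all racks by more than a margin are flagged for closer inspection. The median-plus-margin heuristic, the 5°C default margin and the rack names are assumptions for illustration, not a method prescribed by the text.

```python
from statistics import median
from typing import Dict, List, Tuple


def flag_hot_racks(readings: Dict[str, Tuple[float, float]],
                   margin_c: float = 5.0) -> List[str]:
    """Return racks whose outlet-minus-inlet rise exceeds the median rise by margin_c.

    `readings` maps a rack name to (inlet_c, outlet_c).
    """
    if not readings:
        return []
    delta_t = {rack: outlet - inlet for rack, (inlet, outlet) in readings.items()}
    typical = median(delta_t.values())
    return sorted(rack for rack, dt in delta_t.items() if dt > typical + margin_c)


racks = {"A01": (22.0, 32.0), "A02": (22.0, 33.0), "A03": (21.0, 40.0)}
print(flag_hot_racks(racks))
```

Using the median rather than the mean keeps a single very hot rack from raising the baseline it is compared against. A flagged rack is a candidate for correlation with its IT load, not proof that its equipment is inefficient.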
If you have a properly implemented monitoring system that stores and processes the collected information, you can optimize the infrastructure based on the collected data. However, in many places where older monitoring systems are in use, the collected measurements cannot be stored. In such cases, additional devices (data loggers) are installed to collect, store and process data. It is worth noting that such tools lack the other functions of a monitoring system. It is therefore worth calculating whether it is more cost-effective to implement a modern monitoring system that works both as a failure prevention system and as a tool for collecting and analyzing measurement data.
Keep in mind that monitoring of environmental parameters should be a continuous process of collecting measured values and analyzing them. Servers running in server rooms or data centers continually change their load, depending, for example, on the load generated by users. IT infrastructure also undergoes continuous modernization and replacement, which changes the operating characteristics of each facility. As a result, it is impossible to choose in advance a single representative time window in which to take measurements, analyze them and draw conclusions about, for example, the possibility of reducing power consumption.
Recently, computational fluid dynamics (CFD) methods have been increasingly used in the design of server rooms and data centers. These tools make it possible to simulate the operation of a server room for given parameters, e.g. the shape of the room, the number and type of servers, and the type and power of the air conditioners. However, such models have fixed input parameters, such as the installed IT equipment or the capacity of the air conditioning equipment, and do not take into account the dynamics of IT systems during normal operation. Hence, monitoring of the actual environmental parameters is essential for obtaining real knowledge about the current state of the infrastructure.
Where to use environmental monitoring?
IT infrastructure supports the execution of business processes, and any interruption in its operation disrupts part or all of the enterprise. Therefore, environmental monitoring of IT infrastructure should be implemented in all locations where network infrastructure is installed.
It should be noted that environmental monitoring should not be limited to the systems in the company's main server room. Monitoring should be carried out at all locations where the company's IT infrastructure is installed, such as branches, subsidiaries and access points/nodes. The monitoring system should therefore support deployment in both single-site and distributed infrastructures.
What parameters should be monitored?
Generally speaking, all parameters that may affect the continuity of operation of the server room and IT infrastructure should be taken into account.
The best-known environmental parameters measured in server rooms and data centers are:
- temperature,
- humidity.
Temperature monitoring is one of the basic elements of environmental monitoring in data centers and server rooms. Maintaining the correct temperature prevents overheating and ensures normal operating conditions for the equipment. It also helps optimize the cost of the electricity used by air conditioning systems to cool the infrastructure.
In data centers and server rooms, humidity monitoring is as important as temperature monitoring. Maintaining adequate humidity levels helps prevent electrostatic discharge (ESD) at low humidity, as well as condensation and corrosion of components at excessive humidity.
In data center facilities, the standard monitored parameters also include:
- the load on the power lines supplying the server racks,
- fluid leaks,
- the presence of smoke,
- the fuel level in the generator tank,
- the opening of server cabinet doors.
Measuring the current flowing in power lines is one of the most important aspects of planning the continuity of operation of server rooms and data centers. Overloading a power line can interrupt the power supply to the information systems and, as a consequence, interrupt the provision of services. In most cases, measurements are made at the power distribution units (PDUs) and at the main electrical switchboard. However, do not forget to monitor the load of power lines in lower-level switchboards and busbar trunking.
Some facilities additionally use environmental monitoring solutions based on:
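A simple way to act on these current measurements is to compare each line's measured current with its rated (breaker) current and warn before the rating is reached. The 80% warning level below is an assumption borrowed from the common practice of keeping continuous loads at or below 80% of a circuit's rating; actual limits depend on local electrical codes and the equipment. The line names and the function `line_load_status` are hypothetical.

```python
def line_load_status(measured_a: float, rated_a: float, warn_ratio: float = 0.8) -> str:
    """Classify a power line's load as 'ok', 'warning' or 'overload'."""
    if rated_a <= 0:
        raise ValueError("rated current must be positive")
    ratio = measured_a / rated_a
    if ratio >= 1.0:
        return "overload"
    if ratio >= warn_ratio:
        return "warning"
    return "ok"


# Hypothetical feeds: (measured current, breaker rating) in amperes.
lines = {"PDU-A1": (9.5, 16.0), "PDU-A2": (13.2, 16.0), "busbar-B": (250.0, 250.0)}
for name, (measured, rated) in lines.items():
    print(name, line_load_status(measured, rated))
```

In redundant A/B power feeds, each line should also be checked against the load it would carry if its partner failed, since that is when an overload actually interrupts service.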
- air flow sensors,
- shock sensors,
- sound level sensors.
Monitoring the airflow in a server room or data center allows for optimal configuration of the air conditioning units and for proper placement of equipment with high cooling requirements. As a result, active airflow monitoring makes it possible to optimize the total cost of the electricity used to cool information systems.
Shock sensors detect movement of, or interference with, equipment inside a server rack. In addition, if the server room is located near heavy-industry production halls, these sensors can detect vibrations near sensitive equipment such as storage devices.
In most cases, IT devices signal problems or breakdowns with audible alarms and/or by increasing fan speed. Such abnormalities can be detected by sound level sensors, which allows a much faster response to potential equipment damage.
How should we monitor?
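One way to turn sound level readings into alerts is to compare each new reading with a baseline built from recent normal readings. The sketch below uses the median of a window of past readings and a 6 dB margin; the class name, the window size and the margin are hypothetical tuning parameters, not values from this article.

```python
from collections import deque
from statistics import median


class SoundLevelMonitor:
    """Flags readings that are noticeably louder than the recent baseline."""

    def __init__(self, window: int = 60, margin_db: float = 6.0) -> None:
        self.history = deque(maxlen=window)  # recent readings considered normal
        self.margin_db = margin_db

    def update(self, level_db: float) -> bool:
        """Add a reading in dB; return True if it exceeds baseline + margin."""
        if self.history:
            abnormal = level_db > median(self.history) + self.margin_db
        else:
            abnormal = False  # no baseline yet
        if not abnormal:
            self.history.append(level_db)  # only normal readings update the baseline
        return abnormal
```

Excluding abnormal readings from the baseline is a deliberate choice: otherwise a fan that has ramped up and stays loud would gradually become the new "normal" and stop raising alerts.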
Not so long ago, environmental monitoring systems were not part of the server room or data center infrastructure but were tied to the building design of the server room. As a result, the installed system could not provide reliable information about the state of the server infrastructure. It also made it much harder to upgrade the monitoring system later to adapt it to the changing IT infrastructure.
Because any change requires additional cabling or other work within the server room, in most cases the measurement points are not chosen to obtain the most reliable and necessary information, but result from the technical constraints of the monitoring systems used.
Modern monitoring systems based on wireless technologies allow for the installation of monitoring in places where the most valuable information about the environment of a server room or data center can be obtained.
In a simplified view, a server room or data center can be divided into three zones:
- the server rack zone,
- the server room zone,
- the raised floor zone.
Zone of server racks and enclosed aisles
The zone of racks and enclosed aisles covers the environment closest to the IT infrastructure. Threat detection in this zone is supported by the following sensors:
- temperature,
- humidity,
- sound level,
- shocks.
In accordance with the recommendations of the American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE), every rack should be monitored with at least 6 temperature sensors (top, middle and bottom, at the front and back of the rack) to ensure adequate safety for the equipment installed in it.
In typical implementations, it is recommended to monitor the temperature of each rack at three points:
- at the front bottom of the rack, to check the temperature of the cold air arriving at the rack,
- at the front top of the rack, to check whether cold air reaches the top of the rack,
- at the back top of the rack, to check the maximum temperature of the air heated by the equipment mounted in the rack.
It is widely accepted that the temperature of the air supplied to the rack (cold aisle) should be between 18°C and 27°C, and that the temperature of the air leaving the rack should not exceed the supply air temperature plus 20°C.
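The three measurement points and the two rules above can be combined into a per-rack check. In this sketch, the front-bottom reading is assumed to represent the supply air temperature, both front readings are checked against the 18 to 27°C range, and the back-top reading is checked against the supply temperature plus 20°C. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RackTemperatures:
    """Readings (°C) from the three typical measurement points of one rack."""
    front_bottom: float  # cold air arriving at the rack
    front_top: float     # whether cold air reaches the top of the rack
    back_top: float      # hottest air leaving the rack


def check_rack(t: RackTemperatures, supply_min: float = 18.0,
               supply_max: float = 27.0, max_rise: float = 20.0) -> List[str]:
    """Apply the supply-air range and the 'supply + 20°C' exhaust rule to one rack."""
    problems = []
    for name, value in (("front_bottom", t.front_bottom), ("front_top", t.front_top)):
        if not supply_min <= value <= supply_max:
            problems.append(f"{name} supply air {value}°C outside {supply_min}-{supply_max}°C")
    # Assumption: the front-bottom sensor measures the rack's supply air temperature.
    supply = t.front_bottom
    if t.back_top > supply + max_rise:
        problems.append(f"back_top exhaust {t.back_top}°C exceeds supply + {max_rise}°C")
    return problems
```

A front-top reading that is much warmer than the front-bottom one usually means cold air is not reaching the top of the rack, for example because of hot air recirculation over the rack.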
Server room zone
The server room zone covers the environmental conditions throughout the server room. The key parameters to be monitored in this zone are:
- temperature,
- humidity,
- airflow,
- smoke.
As recommended by ASHRAE, the relative humidity in a server room should be between 40% and 60% RH. Air that is too dry promotes electrostatic discharge (ESD), while air that is too humid causes water vapor to condense and starts the corrosion of components. Another parameter that should be monitored is the dew point, the temperature at which water vapor contained in the air begins to condense on surfaces such as equipment. ASHRAE defines the lower dew point limit as 5°C and the upper limit as 15°C, with a maximum relative humidity of 60%.
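Because the dew point depends on both temperature and relative humidity, it is usually calculated from those two measurements rather than measured directly. A common way to do this is the Magnus approximation, γ = ln(RH/100) + b·T/(c + T) and T_d = c·γ/(b − γ); the coefficients below (b = 17.62, c = 243.12°C) are one widely used set and are an assumption of this sketch. The limits checked are the ones quoted above, and the function names are illustrative.

```python
import math

MAGNUS_B = 17.62
MAGNUS_C = 243.12  # °C


def dew_point_c(temp_c: float, rh_percent: float) -> float:
    """Approximate dew point (°C) from air temperature and relative humidity (Magnus formula)."""
    if not 0 < rh_percent <= 100:
        raise ValueError("relative humidity must be in (0, 100]")
    gamma = math.log(rh_percent / 100.0) + MAGNUS_B * temp_c / (MAGNUS_C + temp_c)
    return MAGNUS_C * gamma / (MAGNUS_B - gamma)


def humidity_ok(temp_c: float, rh_percent: float) -> bool:
    """Check RH 40-60% and dew point 5-15°C, the limits quoted in the text."""
    dp = dew_point_c(temp_c, rh_percent)
    return 40.0 <= rh_percent <= 60.0 and 5.0 <= dp <= 15.0
```

The two checks are not redundant: at 30°C and 60% RH the relative humidity is within range, but the dew point is about 21°C, above the 15°C limit.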
The air temperature in the server room depends strongly on whether the data center uses aisle containment. If the data center uses cold aisle containment, the air temperature in the server room can reach 37°C. However, such high temperatures leave little margin in the event of an air conditioning failure.
Therefore, monitoring the temperatures in the enclosed aisles and in the server room, together with their trends, is necessary to ensure appropriate working conditions for the IT infrastructure.
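Trend monitoring is what turns a high but still acceptable reading into an early warning. A minimal way to do this is to fit a straight line to recent readings by least squares and extrapolate when the temperature would cross a limit. This sketch assumes roughly linear warming over the fitted window, which a real room may not follow after a cooling failure; the 37°C limit in the example simply reuses the figure mentioned above, and the function names are illustrative.

```python
from typing import List, Optional, Tuple


def linear_trend(times_min: List[float], temps_c: List[float]) -> Tuple[float, float]:
    """Least-squares fit temps = slope * t + intercept; return (slope °C/min, intercept)."""
    n = len(times_min)
    if n < 2 or n != len(temps_c):
        raise ValueError("need at least two paired samples")
    mean_t = sum(times_min) / n
    mean_y = sum(temps_c) / n
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    if sxx == 0:
        raise ValueError("timestamps must not all be equal")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_min, temps_c))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_t


def minutes_until_limit(times_min: List[float], temps_c: List[float],
                        limit_c: float) -> Optional[float]:
    """Minutes from the last sample until the fitted trend reaches limit_c; None if not rising."""
    slope, intercept = linear_trend(times_min, temps_c)
    if slope <= 0:
        return None
    t_hit = (limit_c - intercept) / slope
    return max(0.0, t_hit - times_min[-1])


print(minutes_until_limit([0, 1, 2, 3], [30.0, 31.0, 32.0, 33.0], 37.0))  # → 4.0
```

Alerting on the predicted time to limit, rather than only on the limit itself, gives staff the margin that high operating temperatures otherwise take away.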
Zone of raised floor and other devices
The raised floor area usually houses equipment and systems that directly support the operation of the data center infrastructure. The key parameters to be monitored in this zone are:
- fluid leaks,
- the load on power lines.
Flooding is probably the best-known threat to the infrastructure of data centers and server rooms, so monitoring fluid leaks is one of the most important aspects of their daily operation. Leaks can be caused by many different factors, such as a leaking roof, a leaky air conditioning system or burst water pipes. Since most server rooms have raised floors, such leaks often remain undetected for some time, which can cause significant damage to the equipment. Water detection sensors, if properly installed close to potential sources of leakage, can alert staff to even the smallest leaks and thus prevent unnecessary damage to the infrastructure.
Is it worth it?
An environmental monitoring system should be implemented in every IT environment.
It is a very important part of the business continuity procedures for IT environments in enterprises.
In the long run, implementing a modern system for monitoring the working environment of IT systems not only reduces the cost of failures, but also makes it possible to optimize the cost of maintaining the entire infrastructure.