Listly by edureka.co
Nagios is one of the most widely used tools for Continuous Monitoring. Organizations now release software more frequently than ever, so there is a pressing need for a tool that can monitor how that software is functioning and give teams relevant feedback. This need is one of the reasons Continuous Monitoring came into the picture, and it makes Nagios a very important tool for implementing DevOps. Below is a list of Nagios interview questions. I have collected these questions after a lot of research and after discussions with experts who are directly involved in the hiring process.
I will advise you to follow the below explanation for this answer:
Begin this answer by defining Plugins.
Plugins are scripts (Perl scripts, Shell scripts, etc.) that can run from a command line to check the status of a host or service. Nagios uses the results from the plugins to determine the current status of hosts and services on your network.
Once you have defined Plugins, I suggest you explain why we need them.
Nagios will execute a Plugin whenever there is a need to check the status of a host or service. The plugin performs the check and then simply returns the result to Nagios. Nagios processes the results that it receives from the Plugin and takes any necessary actions.
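To make the plugin idea concrete, here is a minimal sketch of a plugin written in Python. It relies on the standard plugin convention that the exit code carries the status (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN) and that the first line printed is the status text. The function names and the threshold values (4.0 and 8.0) are my own assumptions, chosen only for illustration.

```python
import os

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate_load(load, warn=4.0, crit=8.0):
    """Map a load value to (exit_code, status_line).

    The warn/crit thresholds are illustrative assumptions.
    """
    if load >= crit:
        return CRITICAL, f"CRITICAL - load is {load:.2f}"
    if load >= warn:
        return WARNING, f"WARNING - load is {load:.2f}"
    return OK, f"OK - load is {load:.2f}"


def main():
    """Read the 1-minute load average, print one status line, return the exit code."""
    try:
        load = os.getloadavg()[0]
    except (OSError, AttributeError):
        # The load average is not available on this platform.
        print("UNKNOWN - could not read load average")
        return UNKNOWN
    code, line = evaluate_load(load)
    print(line)
    return code

# A real plugin script would finish with: sys.exit(main())
```

Nagios only looks at the exit code and the printed text, so the same pattern applies to a plugin written in Perl or Shell.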
For this answer first give a small definition of NRPE.
The NRPE addon is designed to allow you to execute Nagios plugins on remote Linux/Unix machines. The main reason for doing this is to allow Nagios to monitor “local” resources (like CPU load, memory usage, etc.) on remote machines. Since these local resources are not usually exposed to external machines, an agent like NRPE must be installed on the remote Linux/Unix machines.
Now I advise you to explain the NRPE architecture on the basis of the diagram shown below.
The NRPE addon consists of two pieces:
The check_nrpe plugin, which resides on the local monitoring machine.
The NRPE daemon, which runs on the remote Linux/Unix machine.
There is an SSL (Secure Sockets Layer) connection between the monitoring host and the remote host, as shown in the diagram.
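If you want to show how the two pieces talk to each other, a small sketch like the one below can help. It builds the command line that the monitoring host would use to call check_nrpe, which then asks the NRPE daemon on the remote machine to run a named command. The plugin path, the host address, and the command name check_load are assumptions for illustration; the command name must match one defined in the remote daemon's configuration.

```python
import subprocess


def build_check_nrpe_argv(host, command,
                          plugin_path="/usr/local/nagios/libexec/check_nrpe",
                          timeout=None):
    """Build the argument list for check_nrpe.

    plugin_path is an assumed install location; adjust it to your setup.
    """
    argv = [plugin_path, "-H", host, "-c", command]
    if timeout is not None:
        argv += ["-t", str(timeout)]
    return argv


def run_remote_check(host, command):
    """Run check_nrpe and return (exit_code, first_line_of_output).

    Requires check_nrpe locally and a reachable NRPE daemon on the host.
    """
    result = subprocess.run(build_check_nrpe_argv(host, command),
                            capture_output=True, text=True)
    lines = result.stdout.splitlines()
    return result.returncode, (lines[0] if lines else "")
```

For example, `build_check_nrpe_argv("192.0.2.10", "check_load")` produces the arguments for asking the remote daemon to run its check_load command. The exit code that comes back follows the same plugin convention as a local check.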
Both configuration and logs can be stored in a backend. Configurations are stored in a backend using NagiosQL, and historical data is stored using NDOUtils. In addition, you also have nagdb and opdb.
Passive checks are initiated and performed by external applications/processes, and the passive check results are submitted to Nagios for processing.
Now I advise you to explain the need for passive checks.
Passive checks are useful for monitoring services that are asynchronous in nature and cannot be monitored effectively by polling their status on a regularly scheduled basis. They can also be used for monitoring services that are located behind a firewall and cannot be checked actively from the monitoring host.
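A short sketch can show what "submitting a passive check result" looks like in practice. An external application formats a PROCESS_SERVICE_CHECK_RESULT external command and writes it to the Nagios external command file. The host name, service description, and command file path used here are assumptions for illustration, and the function names are hypothetical.

```python
import time


def format_passive_service_result(host, service, return_code, output,
                                  timestamp=None):
    """Format one passive service check result as an external command line."""
    if return_code not in (0, 1, 2, 3):
        raise ValueError("return_code must be 0, 1, 2 or 3")
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return (f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;"
            f"{host};{service};{return_code};{output}\n")


def submit_result(line, command_file="/usr/local/nagios/var/rw/nagios.cmd"):
    """Write the command to the external command file (a named pipe).

    The default path is an assumption; it depends on the installation.
    """
    with open(command_file, "w") as pipe:
        pipe.write(line)
```

A backup job, for instance, could call `format_passive_service_result("web01", "Backup Job", 0, "OK - backup finished")` when it completes and pass the result to `submit_result`, so Nagios never has to poll it.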
Make sure that you stick to the question during your explanation; I advise you to follow the flow below:
Nagios checks for external commands under the following conditions:
At regular intervals specified by the command_check_interval option in the main configuration file or,
Immediately after event handlers are executed. This is in addition to the regular cycle of external command checks and is done to provide immediate action if an event handler submits commands to Nagios.
For this answer, first point out the basic difference between Active and Passive checks.
The major difference between Active and Passive checks is that Active checks are initiated and performed by Nagios, while passive checks are performed by external applications.
If your interviewer does not look convinced by the above explanation, I suggest you also mention some key features of both Active and Passive checks:
Passive checks are useful for monitoring services that are:
Asynchronous in nature and cannot be monitored effectively by polling their status on a regularly scheduled basis.
Located behind a firewall and cannot be checked actively from the monitoring host.
The main features of Active checks are as follows:
Active checks are initiated by the Nagios process.
Active checks are run on a regularly scheduled basis.
The interviewer is expecting an answer related to the distributed architecture of Nagios, so I suggest you answer in the format below:
With Nagios you can monitor your whole enterprise by using a distributed monitoring scheme in which local slave instances of Nagios perform monitoring tasks and report the results back to a single master. You manage all configuration, notification, and reporting from the master, while the slaves do all the work. This design takes advantage of Nagios’s ability to utilize passive checks i.e. external applications or processes that send results back to Nagios. In a distributed configuration, these external applications are other instances of Nagios.
I suggest you first mention what the main configuration file contains and what its function is.
The main configuration file contains a number of directives that affect how the Nagios daemon operates. This config file is read by both the Nagios daemon and the CGIs (the CGI configuration file specifies the location of your main configuration file).
Now you can tell where it is present and how it is created.
A sample main configuration file is created in the base directory of the Nagios distribution when you run the configure script. The default name of the main configuration file is nagios.cfg. It is usually placed in the etc/ subdirectory of your Nagios installation (i.e. /usr/local/nagios/etc/).
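To illustrate what "a number of directives" means, here is a minimal sketch that reads the directive=value lines of a main configuration file. It assumes the usual layout where lines starting with # are comments and where a directive such as cfg_file may appear more than once. The sample text and paths are made up for illustration, and this is not how Nagios itself parses the file.

```python
def parse_main_config(text):
    """Collect directive=value pairs, keeping repeated directives in order."""
    directives = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, sep, value = line.partition("=")
        if not sep:
            continue  # not a directive line
        directives.setdefault(name.strip(), []).append(value.strip())
    return directives


# Illustrative sample only; the paths are assumptions.
SAMPLE = """\
# Main Nagios configuration (sample)
log_file=/usr/local/nagios/var/nagios.log
cfg_file=/usr/local/nagios/etc/objects/commands.cfg
cfg_file=/usr/local/nagios/etc/objects/localhost.cfg
"""

config = parse_main_config(SAMPLE)
```

Every value is kept in a list because directives like cfg_file are typically repeated, once per object configuration file.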
I advise you to explain Flapping first.
Flapping occurs when a service or host changes state too frequently, which results in a large number of problem and recovery notifications.
Once you have defined Flapping explain how Nagios detects Flapping.
Whenever Nagios checks the status of a host or service, it will check to see if it has started or stopped flapping. Nagios follows the procedure below to do that:
Storing the results of the last 21 checks of the host or service.
Analyzing the historical check results to determine where state changes/transitions occur.
Using the state transitions to determine a percent state change value (a measure of change) for the host or service.
Comparing the percent state change value against low and high flapping thresholds.
A host or service is determined to have started flapping when its percent state change first exceeds a high flapping threshold.
A host or service is determined to have stopped flapping when its percent state change goes below a low flapping threshold.
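The procedure above can be sketched in a few lines of Python. This is a simplified illustration, not the Nagios implementation: it counts every state change equally, whereas Nagios, as far as I know, gives newer state changes more weight than older ones. The class name and the threshold defaults (5% low, 20% high) are assumptions for illustration.

```python
from collections import deque


class FlapDetector:
    """Simplified flap detection over the last 21 check results."""

    def __init__(self, low_threshold=5.0, high_threshold=20.0, history_size=21):
        self.history = deque(maxlen=history_size)  # oldest results fall off
        self.low = low_threshold
        self.high = high_threshold
        self.flapping = False

    def percent_state_change(self):
        """Unweighted share of the possible transitions that were state changes."""
        states = list(self.history)
        changes = sum(1 for a, b in zip(states, states[1:]) if a != b)
        possible = self.history.maxlen - 1  # 21 results allow 20 transitions
        return 100.0 * changes / possible

    def record(self, state):
        """Store a new check result and update the flapping flag."""
        self.history.append(state)
        pct = self.percent_state_change()
        if not self.flapping and pct > self.high:
            self.flapping = True   # started flapping
        elif self.flapping and pct < self.low:
            self.flapping = False  # stopped flapping
        return self.flapping
```

Using two thresholds rather than one means a host or service that hovers around a single cut-off value does not keep switching in and out of the flapping state.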