Service Engine Status Check (Service Check)
  • 03 May 2023

Service Check Category: Netreo Checks

Passive: No

Description

This service check is automatically added to a newly created service engine by Netreo's internal logic (that is, not through a device template or auto-configuration rules), and is generally not intended to be configured manually by a user.

This service check monitors multiple aspects of a service engine. These are listed below, then explained further down.

  • Minimum resource sizing requirements
  • Essential running processes
  • Database corruption

These aspects are checked in the order listed, on the schedule configured for the service check itself. If any aspect fails, the check returns as failed and does not check the remaining aspects. This first failure becomes the subject of the alarm.
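
The ordered, stop-at-first-failure behavior described above can be sketched as follows. This is a minimal illustration, not Netreo's actual implementation: the function names, the `Aspect` type, and the sample failure message are all hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# Each aspect is a (name, check_function) pair. A check function returns
# None on success or a failure message on failure. All names are hypothetical.
Aspect = Tuple[str, Callable[[], Optional[str]]]


def run_status_check(aspects: List[Aspect]) -> Tuple[bool, Optional[str]]:
    """Run the aspects in order and stop at the first one that fails."""
    for name, check in aspects:
        failure = check()
        if failure is not None:
            # The first failure becomes the subject of the alarm;
            # later aspects are not evaluated.
            return False, f"{name}: {failure}"
    return True, None


calls = []  # Records which aspects actually ran.


def sizing() -> Optional[str]:
    calls.append("sizing")
    return None


def processes() -> Optional[str]:
    calls.append("processes")
    return "pollmaster not running"  # Simulated failure for illustration.


def database() -> Optional[str]:
    calls.append("database")
    return None


ok, alarm = run_status_check([
    ("Minimum resource sizing requirements", sizing),
    ("Essential running processes", processes),
    ("Database corruption", database),
])
print(ok, alarm)
print(calls)
```

In this sketch the simulated process failure stops the run, so the database aspect is never evaluated.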

Minimum resource sizing requirements

If the resources available to a service engine appliance are not adequate, the service engine may not function properly. These minimum requirements are based on the number of devices being monitored by a given service engine.

  • 500 or more devices - 8 CPU cores and 16 GB RAM minimum
  • 1000 or more devices - 16 CPU cores and 32 GB RAM minimum
  • 5000 or more devices - 32 CPU cores and 32 GB RAM minimum

If the resource minimums for a service engine under the given load are not met, the check will return as failed.
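
As a rough illustration of how the tiers above map a device count to a minimum, here is a small lookup sketch. The function names are hypothetical, and treating a service engine with fewer than 500 devices as passing is an assumption, since no minimum is listed for that range.

```python
from typing import Optional, Tuple

# (minimum device count, CPU cores, RAM in GB), largest tier first,
# taken from the list above.
SIZING_TIERS = [
    (5000, 32, 32),
    (1000, 16, 32),
    (500, 8, 16),
]


def minimum_resources(device_count: int) -> Optional[Tuple[int, int]]:
    """Return (cpu_cores, ram_gb) for the highest tier the count reaches."""
    for threshold, cores, ram_gb in SIZING_TIERS:
        if device_count >= threshold:
            return cores, ram_gb
    return None  # No minimum is listed below 500 devices.


def sizing_ok(device_count: int, cpu_cores: int, ram_gb: int) -> bool:
    """Check available resources against the tier for this device count."""
    required = minimum_resources(device_count)
    if required is None:
        return True  # Assumption: no listed minimum means nothing to fail.
    need_cores, need_ram = required
    return cpu_cores >= need_cores and ram_gb >= need_ram


print(minimum_resources(1200))
print(sizing_ok(1200, 8, 16))
```

For example, a service engine monitoring 1200 devices falls in the 1000-device tier, so an appliance with 8 cores and 16 GB RAM would not meet the minimum.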

Essential running processes

There are several processes that must be active on a service engine for it to operate properly. The required processes are different depending on which types of services are running on the service engine. (See Service Engine for more information on what services are available to be run on a service engine.)

  • For the remote collector/remote poller service
    • pollmaster
    • oam
    • mysql
  • For the traffic collector service
    • nf_worker
    • nf_listen
    • nf_result
    • nf_cache
    • mysql
  • For the log collector service
    • snmptr_syslog
    • syslog_listen
    • eventlog_worker.pl

If any of the processes required for the services assigned to a service engine are not running, the check will return as failed. (Service engines running more than one service require all applicable processes for all running services to be active.)
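
The rule that a service engine running several services needs the combined set of processes can be sketched as a set union. The process names come from the list above; the dictionary keys, function names, and the way the running process list is obtained are assumptions made for illustration.

```python
from typing import Iterable, List, Set

# Required processes per service, as listed above.
REQUIRED_PROCESSES = {
    "remote collector/remote poller": ["pollmaster", "oam", "mysql"],
    "traffic collector": [
        "nf_worker", "nf_listen", "nf_result", "nf_cache", "mysql",
    ],
    "log collector": ["snmptr_syslog", "syslog_listen", "eventlog_worker.pl"],
}


def required_processes(services: Iterable[str]) -> Set[str]:
    """Union of the processes required by every assigned service."""
    required: Set[str] = set()
    for service in services:
        required.update(REQUIRED_PROCESSES[service])
    return required


def missing_processes(services: Iterable[str],
                      running: Iterable[str]) -> List[str]:
    """Required processes that are not in the running list, sorted by name."""
    return sorted(required_processes(services) - set(running))


missing = missing_processes(
    ["remote collector/remote poller", "traffic collector"],
    ["pollmaster", "oam", "mysql", "nf_worker", "nf_listen"],
)
print(missing)
```

A non-empty result corresponds to the failed state described above. Note that mysql is shared by two services and appears only once in the union.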

CRITICAL SOFT alarms for failed processes
These monitored processes occasionally restart. A restarting process may show in the UI as a CRITICAL SOFT state alarm on the Services tab of the Device Dashboard and may safely be ignored while in the SOFT state. Restarting processes generally finish starting before the check advances to the CRITICAL HARD state (which is when an alert notification is typically sent), so a CRITICAL HARD state alarm for a process is generally indicative of a problem. See Service Check for more information on CRITICAL SOFT and CRITICAL HARD states for service checks.

In addition to the running processes check, a port status check on TCP/443 is made to confirm that the port required for service engine communication is open and available.
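
To verify TCP/443 reachability by hand, a basic connection test such as the following can help. This is a generic sketch using Python's standard library, not the check Netreo runs internally; the function name, the default timeout, and the example host are assumptions.

```python
import socket


def tcp_port_open(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the host and attempts a TCP connect;
        # the context manager closes the socket again immediately.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False
```

For example, `tcp_port_open("service-engine.example.com")` (a placeholder hostname) would return `False` if the port is blocked, closed, or the host cannot be resolved.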

Database corruption

If any tables in the local databases of the service engine are found to be corrupt, the check will return as failed and will not continue to look for additional corrupt tables. The name of the corrupt table will be indicated in the alarm.
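
The stop-at-first-corrupt-table behavior can be sketched as a simple scan. How corruption is actually detected is not described here, so the `is_corrupt` callable below is a hypothetical stand-in, as are the function and table names.

```python
from typing import Callable, Iterable, Optional


def first_corrupt_table(tables: Iterable[str],
                        is_corrupt: Callable[[str], bool]) -> Optional[str]:
    """Return the first corrupt table name, or None if all tables pass."""
    for table in tables:
        if is_corrupt(table):
            # Stop here: later tables are not examined, and this
            # name is what the alarm would report.
            return table
    return None


checked = []  # Records which tables were actually examined.


def fake_is_corrupt(table: str) -> bool:
    checked.append(table)
    return table in {"table_b", "table_c"}  # Simulated corruption.


result = first_corrupt_table(["table_a", "table_b", "table_c"],
                             fake_is_corrupt)
print(result, checked)
```

Because the scan stops at the first hit, repairing the reported table may reveal another corrupt table on the next run of the check.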

Check-specific Fields

(See Service Check for configuration parameters common to all service checks.)

  • DESCRIPTION
    (Required) This field specifies a name for this check. The name must be unique among the service check names on the host it is added to (the same name may be used again only on a different host). It identifies this specific check among the other service checks added to the same host.
