Is Monitoring Killing Your Performance? Real-World Cases You Need to Know

June 6, 2025 | by dbsnoop

Monitoring and observability tools are essential for ensuring system health and performance — but what happens when they become the very cause of slowness, locks, or even downtime?

Yes, it happens. More often than you might think.

This article presents real-world examples of when monitoring became the offender in the very systems it was meant to protect — especially in database environments — and how you can avoid falling into the same trap.

The Problem: Intrusive Monitoring

Many tools collect data using:

  • Aggressive polling
  • Queries on heavy views
  • Agents that consume excessive resources (CPU, I/O, locks)
  • Scraping at short intervals without adaptive control

In mission-critical environments, any additional resource consumption can be costly — causing exactly the symptoms you’re trying to avoid: slowness, deadlocks, I/O wait, resource escalation, and false alarms.

Many tools cannot identify the best windows for heavier data collection, nor are they smart enough to collect in batches, pause when resource usage spikes, or otherwise avoid harming the very asset they monitor.
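
That kind of intelligence is not hard to sketch. Below is a minimal, hedged example in Python, assuming the psutil library is installed; collect_batch() is a hypothetical stand-in for your actual metric queries:

```python
# Minimal sketch of an adaptive collector. Assumptions: psutil is installed,
# and collect_batch() is a hypothetical wrapper around your metric queries.
import time
import psutil

CPU_CEILING = 70.0    # back off above this host CPU percentage (assumed threshold)
BASE_INTERVAL = 60    # seconds between collection rounds
BACKOFF_FACTOR = 2    # multiply the wait each time the host is under pressure

def collect_batch():
    """Hypothetical placeholder: run your metric queries in one batch here."""
    pass

interval = BASE_INTERVAL
while True:
    if psutil.cpu_percent(interval=1) > CPU_CEILING:
        # The monitored host is busy: skip this round and back off, capped at 15 min.
        interval = min(interval * BACKOFF_FACTOR, 15 * 60)
    else:
        collect_batch()
        interval = BASE_INTERVAL
    time.sleep(interval)
```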

Documented Real-World Cases

1. Datadog + PostgreSQL / MySQL

  • The postgres.d and mysql.d integrations (Datadog Agent) run frequent queries against pg_stat_activity, pg_stat_statements, information_schema.tables, etc.
  • Real impact: increased latency in OLTP queries, unwanted locks, and elevated CPU usage on underprovisioned servers.

Source: GitHub Issues and Datadog Docs
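
If you suspect this on your own systems, one quick check is to look for the agent's sessions in pg_stat_activity. A sketch, assuming psycopg2 and that the agent sets a recognizable application_name (the '%datadog%' filter is an assumption you may need to adjust):

```python
# Quick check: list the monitoring agent's own sessions in PostgreSQL.
# Assumptions: psycopg2 is installed, the DSN below fits your environment,
# and the agent identifies itself via application_name.
import psycopg2

conn = psycopg2.connect("dbname=postgres user=postgres")  # adjust DSN
with conn.cursor() as cur:
    cur.execute("""
        SELECT application_name, state, now() - query_start AS running_for, query
        FROM pg_stat_activity
        WHERE application_name ILIKE %s
        ORDER BY query_start;
    """, ('%datadog%',))
    for row in cur.fetchall():
        print(row)
conn.close()
```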

2. Zabbix Monitoring MySQL in Production

  • Zabbix default templates execute SHOW FULL PROCESSLIST, SHOW ENGINE INNODB STATUS, and SELECT COUNT(*) queries on large tables.
  • During peak hours, these caused critical slowdowns in databases handling thousands of transactions per minute.
  • Reports from DBAs in the MySQL Brazil community and events like Percona Live confirm this pattern.
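
The SELECT COUNT(*) part is the easiest to fix: on InnoDB it scans the table, while information_schema already holds the optimizer's row estimate. A sketch, assuming pymysql:

```python
# Read the optimizer's row estimate instead of running SELECT COUNT(*),
# which scans the whole table on InnoDB. Assumption: pymysql is installed
# and the credentials below are placeholders.
import pymysql

conn = pymysql.connect(host="localhost", user="monitor", password="...")
with conn.cursor() as cur:
    cur.execute("""
        SELECT TABLE_NAME, TABLE_ROWS
        FROM information_schema.TABLES
        WHERE TABLE_SCHEMA = %s
    """, ("your_db",))
    for table, approx_rows in cur.fetchall():
        print(table, approx_rows)  # TABLE_ROWS is an estimate, not an exact count
conn.close()
```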

3. Filebeat + Metricbeat in High Log Rotation Environments

  • In applications with intensive logging (e.g., databases with binlog enabled), Beats generate continuous disk reads and compress data locally, consuming CPU and increasing I/O wait.
  • This caused automatic scaling of instances and higher costs without visible benefits.
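
Before blaming the database, it is worth quantifying how much the collector itself reads from disk. A sketch, assuming psutil on Linux (io_counters() is not available on every platform and may require privileges):

```python
# Measure a collector's own disk reads over one minute.
# Assumptions: psutil is installed, we are on Linux, and the process
# is named 'filebeat'.
import time
import psutil

def find_proc(name):
    for p in psutil.process_iter(["name"]):
        if p.info["name"] == name:
            return p
    return None

beat = find_proc("filebeat")
if beat:
    before = beat.io_counters().read_bytes
    time.sleep(60)
    after = beat.io_counters().read_bytes
    print(f"filebeat read {(after - before) / 1024 / 1024:.1f} MiB in 60s")
```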

4. Oracle OEM (Enterprise Manager)

  • Even in enterprise solutions like Oracle OEM, collection against the DBA_HIST_* history views and Active Session History (ASH) impacted backup routines and nightly jobs.
  • In some environments, DBAs moved data collection to replicas or reduced its frequency to avoid interference.
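
Reducing frequency can be as simple as gating heavy collection on a time window so it never overlaps the backup schedule. A sketch (the 01:00-04:00 window and the helper are assumptions):

```python
# Skip heavy collection during an assumed nightly backup window.
from datetime import datetime

def in_backup_window(now=None):
    """True during the assumed 01:00-04:00 nightly backup window."""
    hour = (now or datetime.now()).hour
    return 1 <= hour < 4

def collect_history_metrics():
    """Hypothetical placeholder for heavy history/ASH-style collection."""
    pass

if not in_backup_window():
    collect_history_metrics()
```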

5. Prometheus + mysqld_exporter / postgres_exporter

  • These exporters collect detailed metrics (e.g., lock histograms, query latency) every 15 seconds, with no adaptive throttling.
  • This caused overload on Prometheus and on the monitored instances themselves in companies running dozens of databases simultaneously.

Source: GitHub Issues of the Exporters
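
Prometheus at least makes the cost visible: it attaches a scrape_duration_seconds metric to every target. A sketch that asks the HTTP API for the slowest scrapes, assuming a server at localhost:9090 and the requests library:

```python
# Find the targets whose scrapes take longest, using Prometheus's
# auto-generated scrape_duration_seconds metric. Assumptions: requests is
# installed and the server URL below is a placeholder.
import requests

PROM = "http://localhost:9090"
resp = requests.get(f"{PROM}/api/v1/query",
                    params={"query": "topk(10, scrape_duration_seconds)"})
for result in resp.json()["data"]["result"]:
    labels, (_, seconds) = result["metric"], result["value"]
    print(f'{labels.get("instance", "?")}: {float(seconds):.2f}s per scrape')
```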

But why does this happen?

Most of these tools don’t differentiate between production and test environments. They start from a generic approach, with:

  • Focus on collecting the maximum amount of data,
  • Short scraping intervals,
  • No real-time workload awareness,
  • And often running queries directly on the primary instance, without using replicas or derived metrics.
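
That last point is cheap to guard against: in PostgreSQL, a collector can ask pg_is_in_recovery() and refuse to run heavy queries on the primary. A sketch, assuming psycopg2 (the DSN and helper are placeholders):

```python
# Run heavy analytical collection only when connected to a replica.
# Assumptions: psycopg2 is installed; the DSN and helper are placeholders.
import psycopg2

def run_heavy_collection(conn):
    """Hypothetical placeholder for your expensive analytical queries."""
    pass

conn = psycopg2.connect("host=replica-1 dbname=postgres user=monitor")
with conn.cursor() as cur:
    cur.execute("SELECT pg_is_in_recovery();")
    (is_replica,) = cur.fetchone()

if is_replica:
    run_heavy_collection(conn)
else:
    print("Connected to the primary; skipping heavy collection.")
conn.close()
```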

The consequences:

  • Additional I/O competing with user queries.
  • Locks on internal views and tables (information_schema, performance_schema, pg_stat_*).
  • Memory consumption by the collection agent, interfering with the database buffer pool.
  • False alarms triggered due to the monitoring tool’s own impact.
  • DBAs or SREs spending hours hunting a “culprit” that, in the end, is the monitoring tool itself.

Best practices:

  • Use asynchronous data collection whenever possible.
  • Rely on native, non-intrusive metrics (performance_schema, pg_stat_statements, cached system views).
  • Avoid agents with a high footprint. Prefer lightweight collection with adaptive throttling.
  • Leverage secondary replicas for collecting analytical data.
  • Implement monitoring of your monitoring system. Yes, exactly: track how much CPU/RAM/disk your observability stack is consuming.
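
That last practice is easy to start with. A minimal sketch using psutil that flags observability agents exceeding a CPU or memory budget (the process names and thresholds are assumptions; adjust them to your stack):

```python
# 'Monitoring your monitoring': warn when observability agents exceed a
# CPU or memory budget. Assumptions: psutil is installed, and the process
# names and thresholds below fit your stack.
import psutil

AGENT_NAMES = {"datadog-agent", "zabbix_agentd", "filebeat", "node_exporter"}
CPU_BUDGET_PCT = 10.0
RSS_BUDGET_MB = 512

for p in psutil.process_iter(["name", "memory_info"]):
    if p.info["name"] in AGENT_NAMES:
        cpu = p.cpu_percent(interval=1)
        rss_mb = p.info["memory_info"].rss / 1024 / 1024
        if cpu > CPU_BUDGET_PCT or rss_mb > RSS_BUDGET_MB:
            print(f"WARNING: {p.info['name']} using {cpu:.0f}% CPU, {rss_mb:.0f} MiB RSS")
```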

Thought of the day: observability that hinders is not observability.

If the monitoring tool is heavier than the actual workload, you have a new problem — not a solution.

If you want to find out whether your current stack is causing this kind of side effect, talk to the Flightdeck team. Built for high-criticality environments, it provides real visibility without creating significant load on the monitored database — and without promising flashy dashboards at the expense of performance.

Visit our YouTube channel to learn about the platform and watch tutorials.

Schedule a demo here.

Learn more about Flightdeck!

Learn about database monitoring with advanced tools here.
