A deep technical guide for DBAs, SREs, and Data Architects.


A Senior Oracle Database Administrator (DBA) lives in a routine of high-pressure troubleshooting. The day rarely starts calmly: it begins with the analysis of a 50-page AWR (Automatic Workload Repository) report, a dive into raw ASH (Active Session History) data, or a tense navigation through Oracle Enterprise Manager screens while the phone (or, these days, the chat) won't stop ringing.
Unfortunately, Oracle Database performance is not a static property. It demands constant fine-tuning: a ceaseless hunt for cryptic wait events like gc buffer busy acquire or log file sync, and a running battle against the Query Optimizer, which decided to change an execution plan at the worst possible time.
An Oracle professional’s expertise is measured by their ability to interpret dense data and turn it into action. However, the fundamental problem with this traditional approach is that it is intrinsically reactive: you analyze the AWR after the CPU has exploded or check the ASH after the application has crashed.
This article contains a complete Oracle performance guide. We will dissect the advanced memory architecture (Keep/Recycle Pools), the time-based tuning methodology (Wait Interface), the complexity of Oracle Grid/RAC, and finally, how Artificial Intelligence (AI) and the dbsnOOp platform are transforming performance management from a manual and reactive art into a predictive and automated science.
1. Advanced Memory Architecture
Memory management in Oracle Database is a subject often misunderstood. Many DBAs blindly trust AMM (Automatic Memory Management) or ASMM (Automatic Shared Memory Management), defining a MEMORY_TARGET and hoping for the best. Although useful, these mechanisms treat memory as a unitary block, which can be disastrous for mixed workloads (OLTP + Batch).
LRU and Buffer Cache Pollution
The most critical component of the SGA is the Buffer Cache. By default, it operates with an LRU (Least Recently Used) algorithm. The idea is simple: keep the most accessed data in RAM and discard the old.
However, in practice we encounter a problem: a single report query or backup that performs a Full Table Scan on a 500GB table can read thousands of blocks, mark them as "recently used," and evict from memory dozens of small lookup tables (e.g., Cities, Status, Settings) that are vital for transactional performance. The result is a sudden increase in physical I/O (db file sequential read) for operations that should be instant.
Solution: Memory Segregation (Keep, Recycle, and Default)
True optimization begins when you stop treating the Buffer Cache as a single pool. Oracle allows you to subdivide the cache into three distinct areas to protect your critical data.
- DEFAULT Pool: Compartment where data lives by default. Subject to standard LRU behavior.
- KEEP Pool: The “VIP Area”. Designed for small, frequently accessed objects that must always remain in RAM, immune to the pollution of massive scans.
- RECYCLE Pool: The “Express Exit Queue”. Designed for giant tables read sporadically. The goal is to discard these blocks from memory immediately after use, preventing them from stealing space from useful data.
Practical Implementation
Use the query below to find small objects that keep a large number of buffers in the cache, a practical proxy for frequent logical access. (Note: V$BH shows one row per cached buffer, so the segment size must be pre-aggregated in a subquery; summing it per buffer row would inflate the result.)
-- Identifies candidates for the KEEP pool
-- Criteria: objects < 100MB occupying more than 10,000 buffers in the cache
SELECT
    o.owner,
    o.object_name,
    o.object_type,
    COUNT(*) AS cached_buffers,
    ROUND(seg.bytes / 1024 / 1024, 2) AS size_mb
FROM v$bh bh
JOIN dba_objects o
  ON bh.objd = o.data_object_id
JOIN (SELECT owner, segment_name, SUM(bytes) AS bytes
        FROM dba_segments
       GROUP BY owner, segment_name) seg
  ON o.owner = seg.owner
 AND o.object_name = seg.segment_name
WHERE o.owner NOT IN ('SYS', 'SYSTEM')
  AND seg.bytes < 100 * 1024 * 1024   -- Size filter
GROUP BY o.owner, o.object_name, o.object_type, seg.bytes
HAVING COUNT(*) > 10000               -- Frequency filter
ORDER BY cached_buffers DESC;
Once identified, configure the memory and alter the objects. Remember: memory allocated to KEEP/RECYCLE is subtracted from DEFAULT.
-- 1. Configure pool sizes (Requires ALTER SYSTEM privilege)
ALTER SYSTEM SET DB_KEEP_CACHE_SIZE = 512M SCOPE = BOTH;
ALTER SYSTEM SET DB_RECYCLE_CACHE_SIZE = 2G SCOPE = BOTH;
-- 2. Move the objects (online change; already-cached blocks age out
--    of the old pool, new reads go to the assigned pool)
-- Critical table for KEEP
ALTER TABLE app_core.status_codes STORAGE (BUFFER_POOL KEEP);
-- Gigantic Logs table for RECYCLE
ALTER TABLE app_logs.audit_trail STORAGE (BUFFER_POOL RECYCLE);
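To confirm the assignments took effect, a quick check against the data dictionary is enough; DBA_SEGMENTS exposes the buffer pool of each segment:

```sql
-- Verify which segments were assigned to non-default buffer pools
SELECT owner, segment_name, segment_type, buffer_pool
FROM dba_segments
WHERE buffer_pool <> 'DEFAULT'
ORDER BY owner, segment_name;
```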
2. Time-Based Tuning (Oracle Wait Interface)
For decades, DBAs aimed for a 99% Buffer Cache Hit Ratio. However, this metric should not dominate your attention: you can have a 99% hit ratio and a frozen database, because the Hit Ratio says nothing about where time is actually being spent.
The equation most suited to Oracle performance is:
Response Time = Service Time + Wait Time
When a query is slow, it is either burning CPU (calculation, parsing, logic) or stopped waiting for something (disk, network, a lock). The Oracle Wait Interface (OWI) is the mechanism that instruments these waits.
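This split can be read directly from the instance's time model. A minimal sketch (values are cumulative since instance startup, in microseconds):

```sql
-- Decompose time: Response Time = Service Time (DB CPU) + Wait Time
SELECT stat_name,
       ROUND(value / 1000000, 1) AS seconds
FROM v$sys_time_model
WHERE stat_name IN ('DB time', 'DB CPU');
-- Wait Time is approximately DB time minus DB CPU
```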
Main Wait Events
An expert must look at the predominant wait events in V$SESSION_WAIT or V$SYSTEM_EVENT.
Read I/O (The Disk Bottleneck)
- db file sequential read: Single block read. Usually associated with Index Seeks.
- Cause: Fragmented indexes, outdated statistics, or ironically, excessive use of indexes on tables that should be scanned via Full Scan.
- db file scattered read: Multi-block read. Classic sign of Full Table Scan.
- Cause: Missing indexes, non-sargable predicates (e.g., WHERE function(col) = val) or small tables where the scan is legitimate.
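To put names to these I/O waits, a common starting point is to rank cursors in the shared pool by physical reads (the FETCH FIRST syntax assumes 12c or later):

```sql
-- Top SQL by physical reads: the usual suspects behind
-- db file scattered read / db file sequential read
SELECT sql_id,
       executions,
       disk_reads,
       ROUND(disk_reads / NULLIF(executions, 0)) AS reads_per_exec
FROM v$sql
ORDER BY disk_reads DESC
FETCH FIRST 10 ROWS ONLY;
```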

Concurrency and Application
- enq: TX - row lock contention: One session wants to update a row already locked by another.
- Diagnosis: Not a database problem, it’s an application problem. Long transactions, lack of commits, or conflicting business logic.
- log file sync: The session committed but is waiting for the LGWR (Log Writer) to write to disk.
- Cause: Slow Redo Log disks or “Excessive Commits” (committing inside a loop instead of in batches).
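When enq: TX shows up, the first question is always "who is blocking whom". V$SESSION answers it directly via the BLOCKING_SESSION column:

```sql
-- Waiters and their blockers, right now
SELECT s.sid,
       s.serial#,
       s.username,
       s.event,
       s.blocking_session,
       s.seconds_in_wait
FROM v$session s
WHERE s.blocking_session IS NOT NULL
ORDER BY s.seconds_in_wait DESC;
```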
Diagnosis via SQL
To find what is killing the database right now, forget the slow Enterprise Manager and go straight to the source:
-- Real-time Top Wait Events (excluding idle waits)
-- Note: V$SESSION_WAIT already exposes the event name directly;
-- no join to V$EVENT_NAME is needed
SELECT
    w.event,
    COUNT(*) AS session_count,
    SUM(w.seconds_in_wait) AS total_seconds_waiting
FROM v$session_wait w
WHERE w.wait_class <> 'Idle'
GROUP BY w.event
ORDER BY session_count DESC;
3. Oracle Grid and RAC
Oracle Grid Infrastructure is the foundation that allows distributed computing. It is the “operating system” underneath Oracle RAC (Real Application Clusters) and ASM (Automatic Storage Management).
In large corporations, the Grid is essential for High Availability. If a node fails, the Grid moves services (VIPs) to another surviving node. However, RAC introduces a new dimension of latency: the Global Cache.
The Myth of Linear Scalability
Adding nodes to a RAC does not guarantee linear performance. Nodes need to exchange data blocks via Interconnect (Private Network) to maintain cache consistency (Cache Fusion).
This generates RAC-specific wait events, such as:
- gc buffer busy acquire: A node wants a block that is being modified on another node.
- gc cr request: Consistent read request across the network.
Managing Oracle Grid requires monitoring not just the database, but private network latency, the voting disk, and the OCR. Diagnostic complexity triples in RAC environments.
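In RAC, the same wait analysis must be done cluster-wide. A sketch against GV$SYSTEM_EVENT isolates the Global Cache events per instance (12c+ FETCH FIRST syntax):

```sql
-- Global Cache waits per instance (cumulative since startup)
SELECT inst_id,
       event,
       total_waits,
       ROUND(time_waited_micro / 1000000, 1) AS seconds_waited
FROM gv$system_event
WHERE event LIKE 'gc %'
ORDER BY time_waited_micro DESC
FETCH FIRST 10 ROWS ONLY;
```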
4. Human Limitations
So far, we’ve seen powerful tools: Keep Pools, Wait Interface, Grid Control. The problem? All require manual intervention and deep expertise.
- You cannot reconfigure the Keep Pool every hour as the load changes.
- You cannot manually correlate a gc buffer busy spike with a specific query in real-time during a 2 AM incident.
- The AWR is an autopsy; it tells you why the patient died, not how to save them.
Modern management requires Predictive Observability, and this is where dbsnOOp enters as a game-changer.
5. dbsnOOp: The Virtual DBA
dbsnOOp is an intelligence platform that acts as a cognitive layer over your Oracle Database.
Unified View
dbsnOOp centralizes the view of your monitored instances. The tool uses Machine Learning algorithms to create dynamic baselines and learns that it is normal for CPU to rise at 02:00 (backup), but if it rises at 10:00, it is an anomaly requiring an alert, even if it doesn’t reach the critical threshold.
- Space Prediction (Capacity Planning): Instead of alerting when the Tablespace is 95% full, the AI analyzes the growth derivative.
- dbsnOOp Alert: “Tablespace USERS will run out of space in 4 days if the current ingestion rate (2GB/hour) continues. Recommendation: Add a datafile or purge old data.”
- SQL Regression Detection: Oracle changes execution plans (Plan Flip). dbsnOOp detects this instantly.
- dbsnOOp Alert: “Critical query XPTO_123 changed plan. I/O cost rose 400%. Previous plan used Index; new one does Full Scan.”
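The "growth derivative" idea can be approximated by hand from AWR history. A rough sketch (note: DBA_HIST_* views require the Diagnostics Pack license, sizes are reported in database blocks, and MIN/MAX is only a crude proxy for start/end usage over the retention window):

```sql
-- Rough tablespace growth over the AWR retention window
SELECT ts.name AS tablespace_name,
       MIN(u.tablespace_usedsize) AS min_used_blocks,
       MAX(u.tablespace_usedsize) AS max_used_blocks,
       MAX(u.tablespace_usedsize) - MIN(u.tablespace_usedsize) AS growth_blocks
FROM dba_hist_tbspc_space_usage u
JOIN v$tablespace ts
  ON u.tablespace_id = ts.ts#
GROUP BY ts.name
ORDER BY growth_blocks DESC;
```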
Predictive Analysis and RCA (Root Cause Analysis)
Instead of a vague alert, dbsnOOp correlates events.
- Problem: High latency in Sales API.
- AI Diagnosis: “Latency increased due to a spike in enq: TX – row lock contention wait on the INVENTORY table. The root cause is session SID 123, Serial# 9876 (user JOB_STOCK) which has been holding an exclusive lock (TX Mode 6) for 45 seconds, blocking other transactions.”
- Suggested Solution: ALTER SYSTEM KILL SESSION '123,9876' IMMEDIATE; and a recommendation to review the job's query.
Query Performance
The platform understands the context of the engine's metadata and translates it into deep analysis, allowing professionals at different levels to make quick decisions without relying on complex system commands.
Text-to-SQL
Allows writing questions in natural language, which are converted into SQL queries and executed on the system of your choice. The result arrives as a table within the platform itself, so access to data across diverse technologies is no longer limited to specialists.
Database Engineering Optimized for Oracle Database
The complexity of Oracle Grid, the volatility of execution plans, and the requirement for 24/7 availability make the manual approach unsustainable.
Adopting an AI-driven strategy with dbsnOOp means arming your DBA with the information they need. By automating collection, correlation, and preliminary analysis, AI frees the engineer to focus on architecture and strategy, while the platform takes care of predictive surveillance.
- Stop optimizing the Buffer Cache blindly. Use access data to configure Keep Pools.
- Stop looking at Hit Ratios. Use the Wait Interface to find real bottlenecks.
- Stop reacting to failures. Use AI to predict and prevent incidents.
Transform your Oracle from an unpredictable black box into a high-precision engine.
Schedule a demo here.
Learn more about dbsnOOp!
Learn about database monitoring with advanced tools here.
Visit our YouTube channel to learn about the platform and watch tutorials.

Recommended Reading
- Monitoring and Observability: A Holistic Approach: Understand the crucial difference between monitoring metrics and achieving true observability, a fundamental concept for the predictive management of complex databases like Oracle.
- 5 Database Monitoring Fundamentals to Boost Your Performance: Review the essential pillars of monitoring that serve as the foundation for any fine-tuning strategy, whether manual or automated with Artificial Intelligence.
- Text-to-SQL in Practice: How dbsnOOp Democratizes Complex Database Operations: See in practice how the ability to generate complex diagnostic queries using natural language can drastically accelerate incident response in an Oracle environment.