Monitoring System

Production-hardened monitoring with tiered thresholds, sliding windows, and ratio-based domain protection

1. Tiered Mailbox Thresholds

We use a two-tier system to catch issues before they cause irreversible damage:

⚠️

WARNING State

3 bounces / 60 sends

~5% bounce rate

Action: Mailbox transitions to warning state. Logging intensifies. Operators alerted.

🛑

PAUSE State

5 bounces / 100 sends

5% bounce rate threshold

Action: Mailbox paused immediately. No new emails sent. Cooldown period begins.

Why Two Tiers?

Early warning at 3/60 gives operators time to investigate before the mailbox is paused at 5/100. This prevents surprises and allows manual intervention.

1b. Minimum Volume Requirement

Enforcement thresholds only activate after a mailbox reaches a minimum send volume. This prevents the system from overreacting to statistically insignificant data.

📊

Why minimum volume matters

A mailbox that sent 4 emails and received 1 bounce has a 25% bounce rate, but a sample that small is not statistically meaningful. One bounce at low volume does not indicate a deliverability problem, and pausing mailboxes based on tiny sample sizes would constantly churn accounts during early warmup.

The three enforcement triggers:

Trigger        | Condition                         | Purpose
Percentage     | 3%+ bounce rate AND 60+ sends     | Catches sustained problems after sufficient data
Absolute       | 5+ bounces in sliding window      | Safety net regardless of send volume
Early warning  | 3+ bounces within first 60 sends  | Flags risk before percentage trigger activates

Example: Low Volume vs High Volume

4 sends, 1 bounce (25%)

Below 60-send minimum. No enforcement action. The system monitors but waits for more data before making decisions.

Result: Mailbox stays active

80 sends, 3 bounces (3.75%)

Above 60-send minimum and above 3% threshold. The bounce pattern is statistically significant.

Result: Mailbox paused, removed from campaigns, enters healing
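
The three triggers and the two examples above can be sketched as a single check. This is a minimal illustration, not the system's actual code: the function name `evaluate_mailbox`, the constant names, the order in which the triggers are checked, and the reading of "within first 60 sends" as `sends <= 60` are all assumptions.

```python
# Hypothetical constants mirroring the trigger table; names are illustrative.
MIN_SENDS = 60          # minimum volume before the percentage trigger applies
PAUSE_RATE_PERCENT = 3  # percentage trigger: 3%+ bounce rate
PAUSE_BOUNCES = 5       # absolute trigger: 5+ bounces in the window
WARN_BOUNCES = 3        # early warning: 3+ bounces within the first 60 sends


def evaluate_mailbox(sends: int, bounces: int) -> str:
    """Return 'healthy', 'warning', or 'paused' for the current window."""
    # Absolute trigger: safety net regardless of send volume.
    if bounces >= PAUSE_BOUNCES:
        return "paused"
    # Percentage trigger: only meaningful once the minimum volume is reached.
    # Integer arithmetic avoids floating-point edge cases at exactly 3%.
    if sends >= MIN_SENDS and bounces * 100 >= sends * PAUSE_RATE_PERCENT:
        return "paused"
    # Early warning: flags risk before the percentage trigger can activate.
    if bounces >= WARN_BOUNCES and sends <= MIN_SENDS:
        return "warning"
    return "healthy"


print(evaluate_mailbox(4, 1))   # low-volume example: stays active
print(evaluate_mailbox(80, 3))  # high-volume example: paused
```

Checking the pause conditions first means a mailbox that qualifies for both a warning and a pause is always paused.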

2. Sliding Window Logic

Instead of hard-resetting stats to 0/0 after 100 sends, we use a sliding window that halves both counters, keeping 50% of past data.

Old Behavior (Hard Reset)

100 sent, 6 bounces → reset → 0 sent, 0 bounces
Problem: Volatility patterns erased

New Behavior (Sliding Window)

100 sent, 6 bounces → slide → 50 sent, 3 bounces
Benefit: Volatility preserved, reputation context maintained

Impact

A mailbox with a history of bounces won't suddenly appear "clean" after 100 sends. The sliding window ensures reputation tracking reflects reality.
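
A minimal sketch of the slide step follows, assuming the window slides once a mailbox reaches 100 sends and that both counters are halved with integer division. The function name `slide_window`, the constant `WINDOW_SIZE`, and the rounding behavior for odd counts are assumptions; the source only gives the 100/6 to 50/3 example.

```python
WINDOW_SIZE = 100  # sends at which the window slides (from the example above)


def slide_window(sends: int, bounces: int) -> tuple[int, int]:
    """Keep 50% of the window's history instead of resetting it to 0/0."""
    if sends < WINDOW_SIZE:
        # Window not full yet: leave the counters untouched.
        return sends, bounces
    # Halve both counters; odd values round down (an assumption).
    return sends // 2, bounces // 2


print(slide_window(100, 6))  # (50, 3)
```

Because the bounce count is carried forward in proportion to the send count, the bounce rate before and after the slide stays the same for even counts (6/100 and 3/50 are both 6%).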

3. Ratio-Based Domain Protection

Domain health is calculated from the percentage of unhealthy mailboxes, not from absolute counts. This allows the system to scale from small teams (3 mailboxes) to large agencies (200+ mailboxes).

Threshold  | Percentage     | Action
WARNING    | 30% unhealthy  | Domain enters warning
PAUSE      | 50% unhealthy  | Domain paused, all mailboxes blocked

Scaling Example

Total Mailboxes  | Unhealthy  | Percentage  | Status
3                | 1          | 33%         | ⚠️ Warning
10               | 2          | 20%         | ✅ Healthy
10               | 5          | 50%         | 🛑 Paused
30               | 10         | 33%         | ⚠️ Warning
200              | 110        | 55%         | 🛑 Paused

Why Ratios?

Absolute thresholds don't scale:

  • With 3 mailboxes, losing 2 is catastrophic (67% failure)
  • With 30 mailboxes, losing 2 is negligible (7% failure)
  • Ratio-based logic adapts automatically as infrastructure grows
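
The ratio check can be sketched as below, using the 30% and 50% thresholds from the table above. The function name `domain_status`, the constant names, and the treatment of a domain with no mailboxes as healthy are assumptions made for illustration.

```python
WARNING_PERCENT = 30  # domain enters warning at 30% unhealthy
PAUSE_PERCENT = 50    # domain paused at 50% unhealthy


def domain_status(total: int, unhealthy: int) -> str:
    """Classify a domain by its share of unhealthy mailboxes."""
    if total <= 0:
        # No mailboxes to evaluate; treated as healthy (an assumption).
        return "healthy"
    # Integer arithmetic keeps the comparison exact at the thresholds.
    if unhealthy * 100 >= total * PAUSE_PERCENT:
        return "paused"
    if unhealthy * 100 >= total * WARNING_PERCENT:
        return "warning"
    return "healthy"


# Rows from the scaling example: (total mailboxes, unhealthy mailboxes)
for total, unhealthy in [(3, 1), (10, 2), (10, 5), (30, 10), (200, 110)]:
    print(total, unhealthy, domain_status(total, unhealthy))
```

The same function handles a 3-mailbox team and a 200-mailbox agency without any per-size configuration, which is the point of using ratios.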

Monitoring Dashboard

The monitoring dashboard provides real-time visibility into:

Mailbox Metrics

  • Current status (healthy, warning, paused)
  • Window bounce count (e.g., 3/60)
  • Total sends and bounces
  • Cooldown expiry time

Domain Aggregations

  • Total mailboxes vs unhealthy count
  • Unhealthy percentage
  • Domain status (healthy, warning, paused)
  • Average risk score across mailboxes

🎯 Production-Hardened

These monitoring refinements are based on real-world outbound operations:

  • Tiered thresholds prevent surprise pauses
  • Sliding windows maintain reputation context
  • Ratio-based domain thresholds scale with infrastructure
  • All thresholds are tunable in Configuration