Written by: Nimesh Chakravarthi, Co-founder & CTO, Struct
Key Takeaways
- Alert fatigue trains teams to ignore alerts, which increases outage risk, and a single incident can cost up to $800K. PagerDuty’s noise-reduction features can cut noise by 70–98%, depending on configuration.
- Deduplication keys, event orchestration grouping, and clear severity policies can cut incident volume by 30–50% and improve resolution times by up to 40%.
- PagerDuty AIOps features such as auto-pause, suppression, tuned escalation policies, and Response Plays improve on-call coverage while reducing unnecessary interruptions.
- Struct adds proactive AI investigation on top of PagerDuty, reducing triage time by about 80% through automated root cause analysis from logs and metrics.
- Successful teams see around 50% noise reduction and resolution-time improvements that often match the 40% gain from standardized severity frameworks. See how Struct eliminates alert fatigue with automated investigation.
Before You Begin: Assess Your Alert Fatigue Baseline
Start with a clear picture of your current alert load. Open PagerDuty’s analytics dashboard (Analytics > Incidents) and review weekly alert volume, false positive rates, and mean time to acknowledgment (MTTA). To illustrate the scale of the problem, a VP of Engineering at a healthcare SaaS company reported that their on-call engineers received more than 200 pages per week, of which perhaps 5 were real incidents.
Identify your noisiest services by examining incident frequency patterns. Look for flapping alerts that trigger and resolve repeatedly within short timeframes, duplicate alerts from the same root cause, and low-severity alerts that rarely require immediate action. Document which services generate the most false positives, and involve your software engineering leadership in prioritizing which alerts to tune first. Once you have your baseline and priority list, you are ready to apply the seven configurations below.
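The baseline described above can be sketched in a few lines of Python. This is a minimal illustration, assuming you have exported incident records into tuples of (service, created time, acknowledged time, whether the alert was actionable). The record layout, the function names, and the ranking formula are assumptions for illustration, not a PagerDuty export format.

```python
from datetime import datetime, timedelta
from statistics import mean


def baseline(incidents):
    """Summarize alert count, false positive rate, and MTTA (minutes) per service.

    `incidents` is a hypothetical list of tuples:
    (service, created_at, acknowledged_at or None, was_actionable).
    """
    by_service = {}
    for service, created, acked, actionable in incidents:
        by_service.setdefault(service, []).append((created, acked, actionable))

    report = {}
    for service, rows in by_service.items():
        # MTTA only counts incidents that were actually acknowledged.
        ack_minutes = [(a - c).total_seconds() / 60 for c, a, _ in rows if a is not None]
        false_positives = sum(1 for _, _, ok in rows if not ok)
        report[service] = {
            "alerts": len(rows),
            "false_positive_rate": false_positives / len(rows),
            "mtta_minutes": mean(ack_minutes) if ack_minutes else None,
        }
    return report


def noisiest(report, top=3):
    """Rank services by estimated false positive count (alerts x false positive rate)."""
    score = lambda s: report[s]["alerts"] * report[s]["false_positive_rate"]
    return sorted(report, key=score, reverse=True)[:top]


# Example with made-up data.
t0 = datetime(2024, 1, 1, 9, 0)
sample = [
    ("checkout", t0, t0 + timedelta(minutes=4), True),
    ("checkout", t0, t0 + timedelta(minutes=6), False),
    ("search", t0, None, False),
    ("search", t0, t0 + timedelta(minutes=2), False),
]
print(noisiest(baseline(sample), top=1))
```

Ranking by estimated false positive count rather than raw volume keeps a busy but accurate service from crowding out a quieter service that pages people for nothing.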
Implementation Guidelines and Common Pitfalls
Protect critical coverage while you reduce noise. Avoid over-suppression that might hide serious issues from your team. Start with conservative deduplication and grouping rules, then expand them as you observe stable patterns. Never suppress alerts for core business functions without thorough testing and explicit stakeholder approval.
Roll out changes gradually so you can see the impact of each adjustment. Test new configurations on non-critical services first, then promote successful patterns to high-value systems. Maintain audit trails for suppression and grouping decisions, and document your reasoning so future team members understand why certain alerts behave differently.
Modern teams benefit from a hybrid approach. PagerDuty’s native AIOps features reduce raw alert volume, while intelligent automation platforms like Struct handle proactive investigation. Together they provide both noise reduction and faster incident understanding for engineering teams.
7 Step-by-Step Ways to Reduce Alert Fatigue in PagerDuty
1. Implement Deduplication with dedup_key
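Deduplication stops multiple alerts for the same issue from creating separate incidents. PagerDuty matches events on dedup_key, which is also a top-level field in Events API v2 payloads, so events that arrive on the same integration with the same key attach to the existing open alert instead of opening a new one. In PagerDuty, navigate to Services > [Your Service] > Edit > Deduplication. Configure the dedup_key field using event properties such as {{event.summary}} or {{event.source}}:{{event.component}}. Test your deduplication rules in a sandbox environment before you deploy them to production.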
Broad deduplication keys can merge unrelated incidents into a single ticket and hide the true scope of problems. Start with specific patterns, for example matching on exact service names and error codes, and then expand rules only after you see which alerts consistently share a root cause. Well-tuned deduplication often reduces incident volume by 30–50% for services that generate repetitive alerts.
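As a rough sketch of a specific key, the function below builds an Events API v2 trigger payload whose dedup_key combines source, component, and error code. The top-level fields (routing_key, event_action, dedup_key, payload) follow the Events API v2 event format. The function name, the key layout, and the sample values are assumptions for illustration, and sending the event is left out.

```python
def build_trigger_event(routing_key, source, component, error_code, summary,
                        severity="error"):
    """Build an Events API v2 trigger payload with a narrow dedup_key.

    The key layout (source:component:error_code) is an illustrative choice.
    It deliberately excludes the summary text, which often contains
    timestamps or request IDs that would defeat deduplication.
    """
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        "dedup_key": f"{source}:{component}:{error_code}",
        "payload": {
            "summary": summary,
            "source": source,
            "severity": severity,  # one of: critical, error, warning, info
            "component": component,
        },
    }


# Two events for the same failure share a key even though their summaries differ.
first = build_trigger_event("YOUR_ROUTING_KEY", "checkout-api", "postgres",
                            "ECONNREFUSED", "Connection refused at 09:00:01")
repeat = build_trigger_event("YOUR_ROUTING_KEY", "checkout-api", "postgres",
                             "ECONNREFUSED", "Connection refused at 09:00:07")
print(first["dedup_key"] == repeat["dedup_key"])
```

Widening the key later is easy (drop the error code, for example), whereas untangling incidents that were merged too aggressively is not, which is why the sketch starts narrow.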
2. Set Up Event Orchestration Grouping
Event Orchestration groups related alerts before incidents are created, which further reduces noise after deduplication is in place. Access Event Orchestration from your PagerDuty dashboard and create grouping rules based on cluster_key, severity levels, or service dependencies. Configure time windows for grouping related events, typically 5–10 minutes for most environments.
Group alerts by logical service boundaries rather than by individual technical components. For example, group all database connection alerts for a specific application instead of separating them by each database instance. This structure gives on-call engineers clearer incident context and reduces the number of separate incidents they must open.
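To see how grouping by application and time window changes incident counts, the simulation below groups alerts locally. It is only a sketch for reasoning about window size and grouping boundaries before you configure rules. It is not PagerDuty’s grouping algorithm, and the alert fields ("application", "instance", "time") are hypothetical.

```python
from datetime import datetime, timedelta


def group_alerts(alerts, window=timedelta(minutes=5)):
    """Group alerts that share an application and arrive within `window`
    of the first alert in that application's current group."""
    groups = []
    current = {}  # application -> index of its most recent group
    for alert in sorted(alerts, key=lambda a: a["time"]):
        app = alert["application"]
        idx = current.get(app)
        if idx is not None and alert["time"] - groups[idx]["started"] <= window:
            groups[idx]["alerts"].append(alert)
        else:
            groups.append({"application": app, "started": alert["time"],
                           "alerts": [alert]})
            current[app] = len(groups) - 1
    return groups


# Three database instances fail for the same application within minutes.
t0 = datetime(2024, 1, 1, 9, 0)
alerts = [
    {"application": "billing", "instance": "db-1", "time": t0},
    {"application": "billing", "instance": "db-2", "time": t0 + timedelta(minutes=1)},
    {"application": "billing", "instance": "db-3", "time": t0 + timedelta(minutes=3)},
    {"application": "search", "instance": "db-7", "time": t0 + timedelta(minutes=2)},
]
for group in group_alerts(alerts):
    print(group["application"], len(group["alerts"]))
```

Grouping on the instance name instead of the application would have produced four separate incidents here, which is the pattern the paragraph above warns against.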
3. Configure Urgency and Severity Policies
Clear severity classification improves response efficiency and reduces unnecessary wake-ups. PagerDuty states that standardized severity frameworks can improve resolution times by as much as 40%. Navigate to Services > Escalation Policies and map severity levels to specific notification methods.
Configure SEV-1 incidents for immediate phone calls and SMS because they represent complete outages that demand urgent action. Use push notifications for SEV-2 incidents that degrade functionality but do not fully stop service. Reserve email-only alerts during business hours for SEV-3 and lower issues that usually involve minor performance problems. Link each severity level to concrete business impact thresholds, such as percentage of users affected or revenue at risk, so your team classifies incidents consistently.
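One way to keep classification consistent is to write the mapping down as data. The sketch below encodes the severity-to-notification mapping described above and classifies an incident from its business impact. The percentage thresholds and all names are hypothetical placeholders that you would replace with your own impact criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeverityPolicy:
    name: str
    channels: tuple
    business_hours_only: bool


# Mapping taken from the text: phone and SMS for SEV-1, push for SEV-2,
# email during business hours for SEV-3 and lower.
POLICIES = {
    "SEV-1": SeverityPolicy("SEV-1", ("phone", "sms"), False),
    "SEV-2": SeverityPolicy("SEV-2", ("push",), False),
    "SEV-3": SeverityPolicy("SEV-3", ("email",), True),
}


def classify(users_affected_pct, service_down=False):
    """Pick a severity from business impact.

    The 50% and 5% thresholds are illustrative assumptions, not PagerDuty
    defaults or values from this article.
    """
    if service_down or users_affected_pct >= 50:
        return POLICIES["SEV-1"]
    if users_affected_pct >= 5:
        return POLICIES["SEV-2"]
    return POLICIES["SEV-3"]


policy = classify(users_affected_pct=12)
print(policy.name, policy.channels)
```

Keeping the thresholds in one reviewed place makes it harder for two responders to classify the same incident differently at 3 a.m.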
4. Enable AIOps Auto-Pause and Suppression
PagerDuty’s AI-powered alert grouping uses intelligent pattern recognition to significantly reduce alert noise. Open Event Intelligence from your PagerDuty dashboard and enable auto-pause for flapping alerts. Configure suppression rules for known transient issues such as scheduled maintenance windows or deployment-related alerts.
Start with conservative thresholds so you do not hide important signals. For example, suppress alerts only after they have flapped three or more times within 30 minutes. Review suppression performance weekly and adjust thresholds based on the balance between missed critical alerts and the noise reduction you achieve.
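The conservative threshold above (three or more flaps within 30 minutes) can be prototyped offline against historical alert data before you enable anything in production. The class below is a minimal sliding-window sketch. It is not PagerDuty’s auto-pause implementation, and the class and method names are assumptions.

```python
from collections import deque
from datetime import datetime, timedelta


class FlapDetector:
    """Flag an alert key once it has flapped `threshold` times within `window`."""

    def __init__(self, threshold=3, window=timedelta(minutes=30)):
        self.threshold = threshold
        self.window = window
        self._flaps = {}  # alert key -> timestamps of recent flaps

    def record_flap(self, key, when):
        """Record one trigger-and-resolve cycle and return True if the key
        now meets the suppression threshold."""
        times = self._flaps.setdefault(key, deque())
        times.append(when)
        # Drop flaps that have fallen out of the sliding window.
        while when - times[0] > self.window:
            times.popleft()
        return len(times) >= self.threshold


detector = FlapDetector()
t0 = datetime(2024, 1, 1, 9, 0)
for offset in (0, 10, 20):
    suppress = detector.record_flap("disk-latency:db-1", t0 + timedelta(minutes=offset))
print(suppress)
```

Replaying a few weeks of alert history through a detector like this shows how many pages a given threshold would have suppressed, and whether any of them were real incidents.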
5. Tune Escalation Policies
Well-tuned escalation policies protect coverage while limiting unnecessary interruptions. Configure primary on-call with a 5-minute timeout, secondary on-call at 10 minutes, and manager escalation at 20 minutes. Use PagerDuty’s schedule override features for planned absences so you always have a clear backup.
Align escalation with severity. Critical SEV-1 incidents should escalate quickly through the chain, while SEV-3 issues can wait until business hours or use longer timeout periods. Adjust these intervals based on team size, regional coverage, and how often incidents truly require leadership involvement.
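To sanity-check an escalation chain before configuring it, it helps to compute when each level would actually be paged. The sketch below assumes the 5, 10, and 20 minute figures are per-level timeouts, meaning how long each level has to acknowledge before the incident moves on. That reading is an assumption, as are the longer SEV-3 timeouts.

```python
from itertools import accumulate

LEVELS = ["primary on-call", "secondary on-call", "manager"]

# Minutes each level has to acknowledge before escalation continues.
# SEV-1 uses the figures from the text; SEV-3 values are hypothetical.
TIMEOUTS = {
    "SEV-1": [5, 10, 20],
    "SEV-3": [30, 60, 120],
}


def notification_schedule(severity):
    """Return (level, minutes after trigger) pairs for an unacknowledged incident."""
    timeouts = TIMEOUTS[severity]
    # Level 1 is paged at minute 0; each later level is paged once the
    # timeouts of all earlier levels have elapsed.
    starts = [0] + list(accumulate(timeouts))[:-1]
    return list(zip(LEVELS, starts))


for level, minute in notification_schedule("SEV-1"):
    print(f"{minute:>3} min: page {level}")
```

Under this reading, an ignored SEV-1 reaches the manager 15 minutes after it fires, which is a useful number to compare against your SLA before you lengthen any timeout.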
6. Use Response Plays and Analytics
Response Plays standardize common incident actions and reduce cognitive load during stressful moments. Create plays for frequent incident types that automatically add responders, open conference bridges, or update status pages. Use PagerDuty’s Responder Insights to track interruption frequency, acknowledgment rates, and time on-call.
Review analytics every week to spot patterns that contribute to alert fatigue. PagerDuty’s Responder Insights tracks interruptions by time of day, categorizing them into business hours, off hours, and sleep hours. Use this data to refine alert thresholds, escalation timing, and which incidents truly require waking someone at night. These six PagerDuty-native configurations address alert volume, but they do not remove the investigation work that follows each alert.
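If you export page timestamps for your own review, the time-of-day breakdown described above can be approximated as follows. The bucket boundaries (business hours 08:00–18:00 on weekdays, sleep hours 22:00–08:00) are assumptions for this sketch, so confirm the definitions your PagerDuty reports use before comparing numbers.

```python
from collections import Counter
from datetime import datetime


def interruption_bucket(ts):
    """Classify a responder-local timestamp as 'business', 'sleep', or 'off'.

    Boundaries are illustrative assumptions: weekdays 08:00-18:00 are
    business hours, 22:00-08:00 on any day is sleep, and everything else
    (weekday evenings, weekend daytime) is off hours.
    """
    if ts.weekday() < 5 and 8 <= ts.hour < 18:
        return "business"
    if ts.hour >= 22 or ts.hour < 8:
        return "sleep"
    return "off"


pages = [
    datetime(2024, 1, 3, 10, 15),  # Wednesday morning
    datetime(2024, 1, 3, 19, 40),  # Wednesday evening
    datetime(2024, 1, 3, 23, 5),   # Wednesday night
    datetime(2024, 1, 6, 10, 0),   # Saturday morning
]
print(Counter(interruption_bucket(p) for p in pages))
```

Tracking the sleep-hour count per responder week over week is a simple way to check whether tuning is reducing the interruptions that hurt most.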
7. Integrate AI Automation with Struct
Struct automates the investigation phase so your team spends less time digging through dashboards. To move beyond reactive alert management, integrate Struct with your PagerDuty and Slack channels. Setup takes under 10 minutes, and Struct automatically pulls logs from Datadog, Sentry, and AWS CloudWatch to assemble likely root causes before you even open your laptop.
Struct proactively investigates every alert as it fires, correlating signals and building a clear incident timeline. This approach cuts triage time by about 80% compared with traditional manual investigation. Book a demo to see proactive incident resolution in action.
Supercharge Incident Response with Struct–PagerDuty Integration
Struct listens to your configured PagerDuty channels and investigates incidents in real time. When an alert fires, Struct queries your observability stack, correlates logs and metrics, and generates a concise timeline with likely root cause within minutes.
Companies using Struct report an 80% reduction in triage time, often turning 45-minute manual investigations into 5-minute reviews that confirm AI findings. The platform integrates with existing PagerDuty workflows and provides composable runbooks tailored to your infrastructure.
A Series A fintech company with strict SLAs adopted Struct and stayed compliant while enabling junior engineers to handle on-call with confidence. AI-generated dashboards give every responder a strong starting point and reduce reliance on the tribal knowledge that usually limits who can take the pager.
Struct maintains SOC 2 and HIPAA compliance, which supports sensitive and regulated environments. Unlike generic AI tools that wait for manual prompts during outages, Struct pushes findings directly into your Slack channels where your team already collaborates.
After you implement these PagerDuty configurations and optional Struct integration, establish a measurement framework to validate results and guide ongoing improvements.
Measure Success and Iterate
Track key metrics to confirm that your alert fatigue strategy works. Monitor weekly incident volume and aim for at least a 50% reduction in noise alerts. Measure MTTA and MTTR, and expect significant gains that often match the 40% resolution-time improvement associated with strong severity frameworks.
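The targets above reduce to simple percentage arithmetic. The sketch below compares a baseline week with a current week against the 50% noise and 40% resolution-time figures. The function names, dictionary keys, and sample numbers are made up for illustration.

```python
def pct_reduction(before, after):
    """Percentage drop from `before` to `after` (negative if the value rose)."""
    if before <= 0:
        raise ValueError("baseline must be positive")
    return (before - after) * 100 / before


def check_targets(baseline, current, noise_target=50.0, mttr_target=40.0):
    """Compare two weekly snapshots against the targets named in the text.

    Each snapshot is a dict with hypothetical keys 'noise_alerts' and
    'mttr_minutes'.
    """
    noise = pct_reduction(baseline["noise_alerts"], current["noise_alerts"])
    mttr = pct_reduction(baseline["mttr_minutes"], current["mttr_minutes"])
    return {
        "noise_reduction_pct": noise,
        "mttr_reduction_pct": mttr,
        "noise_target_met": noise >= noise_target,
        "mttr_target_met": mttr >= mttr_target,
    }


before = {"noise_alerts": 200, "mttr_minutes": 60}
after = {"noise_alerts": 90, "mttr_minutes": 45}
print(check_targets(before, after))
```

Comparing like-for-like weeks, for example avoiding a release-freeze week as the baseline, keeps the percentages from overstating or understating progress.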
Use PagerDuty’s analytics to monitor responder interruption patterns and verify that sleep-hour alerts drop over time. As these practices take hold, the share of actionable alerts should rise until most notifications that reach engineers require real action.
Schedule weekly reviews with your on-call team to collect qualitative feedback. Ask about alert relevance, investigation time, and overall stress levels. Continuously refine your PagerDuty configurations based on this input. If your team still spends large amounts of time on manual investigation, explore automated investigation tools such as Struct to remove the remaining toil.
FAQ
How do you suppress an alert in PagerDuty?
For a single incident, open it and click “Snooze” to pause notifications temporarily, or “Acknowledge” to stop escalation while keeping the incident active. Neither action suppresses future alerts. For ongoing suppression, use Event Orchestration rules to filter specific event patterns, or configure auto-pause in Event Intelligence for flapping alerts. Always document suppression decisions and review them regularly so critical issues are not hidden.
What’s the difference between PagerDuty AIOps and Struct?
PagerDuty AIOps focuses on reactive alert management by grouping, deduplicating, and suppressing alerts after they fire. Struct focuses on proactive investigation by automatically analyzing every alert and producing root cause analysis and dashboards within minutes. AIOps reduces noise, while Struct removes most of the manual investigation work that follows, which cuts triage time by about 80% compared with traditional reactive approaches.
How long does it take to set up effective alert fatigue reduction?
Basic PagerDuty configurations such as deduplication and grouping usually take 2–3 hours to implement. Comprehensive alert fatigue reduction that includes AIOps features often requires 1–2 weeks of iterative tuning. Struct integration is quick to set up, as mentioned earlier, and provides immediate investigation automation once connected to PagerDuty, Slack, and your observability tools.
Is this approach secure for compliance-sensitive environments?
Yes, both PagerDuty and Struct support enterprise-grade security standards. PagerDuty offers SOC 2 compliance and supports SAML and SSO integration. Struct is SOC 2 and HIPAA compliant and processes logs ephemerally without persistent storage. If your organization requires on-premise deployment, check each platform’s deployment options and security features against your compliance requirements.
What if our logging and observability data quality is poor?
Alert fatigue reduction depends heavily on reliable telemetry data. If your logging lacks correlation IDs, clear severity levels, or consistent formatting, first improve your observability foundation. Struct can enhance investigation, but it still needs basic logging, trace IDs, and monitoring alerts to work effectively. Start by instrumenting core services, then expand coverage to supporting systems.
Conclusion
These seven PagerDuty configurations significantly reduce alert fatigue, and pairing them with AI-powered automation creates a smoother on-call experience. Schedule a demo and transform your incident response from reactive firefighting to proactive problem-solving.