Trust Architecture for Agentic AI

The Problem

Sarah's Monday morning

Sarah is a field service engineer (FSE) with 7 years of experience. Her shift starts at 8 AM. She opens her inbox to 47 alerts that fired overnight. She doesn't know which are critical, which are noise, or which five are the same root cause repeating. She spends 45 minutes manually triaging before she even walks to a tool.

Meanwhile, production waits. That gap between having data and knowing what to do with it was what I set out to solve.

217 · Equipment across 4 zones

1,116 · Interrupts / week from one tool

85% · Alarm ignore rate (ISA-18.2 flags >10%)

$1.2M · Annual waste from inaction

Discovery

Going forensic

12 interviews across four roles, a week embedded in the fab, 468,000 alarm records analyzed, and a fragmented tool landscape audited.

Three users, one platform

Field Service Engineer

Sarah Chen

"Which tool do I fix first?"

Diagnose Agent: tool-level investigation with evidence

Fleet Manager

Rene Schmidt

"Are we meeting SLA across zones?"

Monitor Agent: fleet grouping and zone SLA

Apps Engineer

James Okafor

"Where is the system failing and why?"

Learning Agent: override patterns, accuracy, KB health

V1 Foundation

Building the data layer first

V1 was a unified fleet monitoring platform with health-gradient tiles, a Query Builder replacing a 4-6 hour data team cycle with self-serve access, and drill-down from Tool to Error. Shipped to four enterprise customers.

6 mo → 2-3 days · Defect resolution

4-6 hrs → <5 min · Query time

30% · Efficiency improvement

25% · Pre-sales conversion

V1 solved data access. But it didn't solve the harder problem.

The Pivot

V1 shipped. It wasn't enough.

The results were real, but GlobalFoundries Dresden was still seeing 210 interrupts per week. Engineers had visibility. They were still drowning. The problem wasn't data access. It was cognitive overload.

The design question: How do you bring AI into a workflow where a wrong answer costs millions in damaged silicon, without undermining the engineer's expertise? Our PM wanted full automation. I pushed back: 8 of 12 interviewees explicitly rejected autonomous decision-making. They wanted an advisor, not an autopilot. I built advisor prototypes, customers validated them, and leadership invested.

The System

Three agents, nine screens, three role views

I led the end-to-end design: research, interaction design, prototyping, and validation. The AI/ML engineers built the models; I defined what they should optimize for and how their outputs surface to users.

The three agents are intentionally separated so that each agent's reasoning is independently traceable. The design principle: the AI's authority scales with its confidence, and human overrides strengthen the model rather than bypass it.

The full product: 9 screens, 3 workflows

Fleet Intelligence · Micron F10 Singapore

Agent active · Ask Agent

Fleet Intelligence

Onto Innovation

📊Fleet Overview

🔔Alerts · 47

🔬Diagnose

📋Service Requests

👥Assignment

🔧Maintenance

📦Parts & Inventory

📈Query Builder

📖Knowledge Base

📊Reports

➕Onboard Equipment

⚙Configuration

SC

Sarah Chen

FSE · Micron F10

My Assigned: 3 (of 217 fleet)

Active Diagnoses: 2 (1 high · 1 medium)

Fleet Alerts: 47 (+18 vs yesterday)

Fleet Uptime: 94.2% (-1.3pp this week)

Fleet MTBI: 127h (+11h vs Q4)

Parts Pending: 1 (Xe Lamp · S-C17)

Monitor Agent · 50 errors overnight · 13 tools · 3 priorities

P1 · L-A01 · Down · 4.2h · Trigger board lockup

P2 · S-C17 · Critical · 4.0h · Lamp thermal degrade

P3 · L-B09 · Warning · 2.1h · Turret position drift

Tool Summary · 30 of 217 need attention

L-A01 · Litho-A · DOWN

L-A02 · Litho-A · DOWN

S-C17 · SE-C · CRITICAL

L-A17 · Litho-A · ALARM

L-B09 · Litho-B · WARNING

L-A04 · Litho-A · CAUTION

L-B06 · Litho-B · FAIR

S-C03 · SE-C · GOOD

L-B22 · Litho-B · HEALTHY

Root Cause Summary

1. Trigger board lockup · 19 alerts

2. Turret position error · 13 alerts

3. Xe lamp thermal degradation · 4 alerts

FSE Dashboard: 12 navigation items across 4 workflow groups. Only the 30 problem tools are shown; the 187 healthy tools are hidden by design.

Role-based views from one dropdown

The same platform serves all three personas through a role dropdown in the context bar. Same data, different information architecture:

Fleet Overview · Fleet Manager View

ED

Ellen Dong · Fleet Manager

Zone Performance vs SLA

Litho-A · 91.2% · Target: 95% · ▼ 3.8pp

Litho-B · 95.4% · Target: 95% · ▲ 0.4pp

SE-C · 92.8% · Target: 95% · ▼ 2.2pp

BTF-D · 98.7% · Target: 95% · ▲ 3.7pp

Technician Utilization

Sarah Chen · 3 active

Kevin Wong · 1 active

Ya Ching Chang · 1 active

Escalation Tracker

Trigger Board Cluster · L3 recommended

Turret FW Fleet-wide · Awaiting approval

Fleet Manager view: zone SLA, technician utilization, 8D stage distribution, and an escalation tracker, with no tool-level detail.

Fleet Overview · Apps Engineer View

JO

James Okafor · Apps Eng

Agent Performance · 30 Days

Accuracy: 94% (+3% vs prior)

FSE Overrides: 23 (this period)

Fleet Updates: 12 (from Learning Agent)

KB Cases: 1,250 (total captured)

Override Analysis

Root cause mismatch · 12 / 23

Severity overestimate · 6 / 23

Missing secondary cause · 3 / 23

Historical case mismatch · 2 / 23

Knowledge Base Coverage

Lamp · 92%

Trigger Board · 78%

Turret · 71%

Stage · 54%

Network · 38%

Apps Engineer view: this is effectively an alignment-monitoring dashboard. Override analysis categorizes WHY the AI fails. KB coverage shows WHERE it has gaps.

Service Requests: 8D lifecycle

Service Requests follow the 8D methodology (the manufacturing standard for root-cause problem solving). The agent pre-populates the problem statement from the Diagnose screen. D7 (Prevent) triggers the Learning Agent's fleet-wide update.

SR #151204 · Xe Lamp Thermal Degradation · S-C17

D1 Team → D2 Problem → D3 Contain → D4 Root Cause → D5 Corrective → D6 Validate → D7 Prevent → D8 Review

8D lifecycle tracker: D2 is auto-populated from Diagnose, D7 triggers the Learning Agent's fleet-wide update, and D8 creates a Knowledge Base entry.
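The stage-to-side-effect mapping can be sketched as a small lookup. This is an illustrative sketch only: the `Stage` enum, `STAGE_HOOKS`, and `advance` are hypothetical names, and it assumes a hook fires when its stage is entered, which the text does not specify.

```python
from enum import Enum
from typing import Optional, Tuple


class Stage(Enum):
    D1 = "Team"
    D2 = "Problem"
    D3 = "Contain"
    D4 = "Root Cause"
    D5 = "Corrective"
    D6 = "Validate"
    D7 = "Prevent"
    D8 = "Review"


# Side effects named in the text, keyed by the stage that triggers them.
STAGE_HOOKS = {
    Stage.D2: "prefill problem statement from Diagnose",
    Stage.D7: "trigger Learning Agent fleet-wide update",
    Stage.D8: "create Knowledge Base entry",
}


def advance(current: Stage) -> Tuple[Stage, Optional[str]]:
    """Move to the next 8D stage and report the hook (if any) that fires.

    Assumption: a hook fires when its stage is entered.
    """
    stages = list(Stage)
    index = stages.index(current)
    if index == len(stages) - 1:
        raise ValueError("D8 is the final stage")
    following = stages[index + 1]
    return following, STAGE_HOOKS.get(following)
```

Keeping the hooks in one table makes it easy to audit which stages have automated consequences and which are purely human steps.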

Configuration: AI governance controls

What should be configurable versus fixed is a design position. I designed the Configuration screen to give operators control over agent behavior without requiring engineering changes:

Monitor Agent Settings

Alarm rationalization

Grouping sensitivity · Medium (45-min correlation window)

Predicted alert horizon · 72 hours

Diagnose Agent Settings

Minimum confidence threshold · 65%

Cross-fab case matching

Learning Agent Settings

Fleet-wide updates

Override review period · 24h review

KB auto-capture

Configuration: each agent's behavior is tunable. The 65% confidence threshold determines when State 3 (no hypothesis) activates. The 24h override review period is a safety valve for the Learning Agent.

Design decision: the 65% threshold. Below it, no hypothesis is shown (State 3). Above it, the diagnosis appears with uncertainty markers. The threshold is configurable per subsystem because a lamp failure and a wafer-in-chamber situation have different risk profiles.
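A minimal sketch of the per-subsystem threshold check follows. Only the 65% default comes from the Configuration screen; the 80% value for `wafer_in_chamber`, the function names, and the choice to treat a score exactly at the threshold as showable are all assumptions made for illustration.

```python
# Hypothetical per-subsystem minimum-confidence thresholds. Only the 65%
# default comes from the text; the other value is a placeholder.
DEFAULT_THRESHOLD = 0.65
SUBSYSTEM_THRESHOLDS = {
    "wafer_in_chamber": 0.80,  # assumed: higher-risk situations demand more confidence
}


def threshold_for(subsystem: str) -> float:
    """Return the configured threshold, falling back to the default."""
    return SUBSYSTEM_THRESHOLDS.get(subsystem, DEFAULT_THRESHOLD)


def shows_hypothesis(confidence: float, subsystem: str) -> bool:
    """True when a diagnosis may be shown; False means State 3 (no hypothesis).

    Assumption: a score exactly at the threshold is shown.
    """
    return confidence >= threshold_for(subsystem)
```

With these placeholder values, a 70% match would surface a lamp diagnosis but would be withheld for a wafer-in-chamber situation.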

Alert Intelligence

From 400+ alarms to 3 priorities

The Monitor Agent uses temporal correlation in line with ISA-18.2 (the alarm management standard) to compress raw signals.

The FSE at shift start needs one thing: "what happened overnight, what's urgent, where do I start?" The Monitor Agent answers that in 10 seconds with a structured briefing and 24-hour timeline.
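A minimal sketch of window-based alarm grouping, assuming alarms are keyed by tool and signal and that an alarm joins an existing group when it arrives within the correlation window of that group's most recent alarm. The 45-minute default mirrors the Medium grouping sensitivity on the Configuration screen; the `Alarm` type and `group_alarms` function are hypothetical, and the production Monitor Agent may correlate differently.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Alarm:
    tool: str
    signal: str
    at: datetime


def group_alarms(
    alarms: List[Alarm],
    window: timedelta = timedelta(minutes=45),
) -> List[List[Alarm]]:
    """Chain alarms from the same tool and signal into time-correlated groups."""
    groups: List[List[Alarm]] = []
    latest: Dict[Tuple[str, str], List[Alarm]] = {}
    for alarm in sorted(alarms, key=lambda a: a.at):
        key = (alarm.tool, alarm.signal)
        current = latest.get(key)
        if current is not None and alarm.at - current[-1].at <= window:
            current.append(alarm)  # still inside the window: same situation
        else:
            current = [alarm]      # gap too large (or first alarm): new situation
            latest[key] = current
            groups.append(current)
    return groups
```

Each returned group is one candidate "situation"; ranking those situations into priorities is a separate step not shown here.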

Alerts Priority Order

Agent View

Trigger 4

Turret 9

Lamp 3

Focus 2

Priority Order · 19 tools

L-A01 · Down · 4.2h · Trigger

L-A02 · Down · 3.8h · Trigger

S-C17 · Critical · 4.0h · Lamp

L-B04 · Warning · 2.6h · Trigger

L-B09 · Warning · 2.1h · Turret

Monitor Agent Briefing

5 Tools down · 23.4h Downtime · 12 Degraded · 3 Predicted

Recommendation: Start with L-A01 (trigger board, 4.2h down). Highest production impact. Historical match at 94%.

8 Root Cause Situations

Trigger board lockup · 19 alerts

Turret position error · 13 alerts

Xe lamp thermal · 4 alerts

+ 5 more situations

Agent Accuracy: 94% over last 30 days

Alerts screen: a split panel with filter pills (root cause groups), a priority-sorted tool list, and the Monitor Agent briefing with production-impact KPIs and a situation summary.

Designing for Uncertainty

Five confidence states

In the Toshiba repair logs, every incident follows the same sequence: occurrence → response → repair start → repair complete → return to normal. I mapped these stages to five confidence states, each requiring a fundamentally different UI.

State 1 · High Confidence

Shows hypothesis + evidence cascade

94%

Confidence · High · 6 of 7 cases matched

Xe Lamp Thermal Degradation

Lamp hours at 4,012h (threshold: 4,000h). Spectral intensity dropped 12% in 48h. Matches Case #1247 (97%).

Accept Diagnosis · Override

"Accept" isn't the default. The FSE must scroll through the evidence cascade first. Acceptance is informed, not automatic.

Design time: 2 days

State 3 · Insufficient Data · My most important design decision

Shows NO hypothesis. Prevents anchoring bias.

⚠ Insufficient Pattern Match · FSE Assessment Required

Agent has insufficient data to form a hypothesis. Closest match: 31% (below threshold). Manual assessment recommended.

Raw Signals · Unfiltered

No first-out alarm identified

Closest match: 31% ●

FSE Assessment

Describe what you observe at the tool...

Submit Assessment

If the agent showed a 30% guess, the engineer would anchor to it. By showing nothing, the engineer approaches fresh. A wrong diagnosis means replacing the wrong part while the actual failure continues damaging wafers.

Design time: 2 weeks

State 4 · Data Blocked

Diagnosis blocked · missing data channels

☁ Diagnosis Blocked: Missing Data

Autotest · Active

FDC · Stale · 6h behind

Health Index · Active

Metrology · Disconnected

Request Data Sync · Escalate to IT

Two of four data channels are unavailable. "Accept" is disabled. The FSE sees exactly which channels need restoration.
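The channel gate can be sketched as below. The blocking rule is an assumption: the text only shows that a Stale and a Disconnected channel together disable "Accept", so this sketch treats any channel that is not Active as blocking. `ChannelStatus`, `blocked_channels`, and `accept_enabled` are hypothetical names.

```python
from enum import Enum
from typing import Dict, List


class ChannelStatus(Enum):
    ACTIVE = "active"
    STALE = "stale"
    DISCONNECTED = "disconnected"


def blocked_channels(channels: Dict[str, ChannelStatus]) -> List[str]:
    """List the channels the FSE needs restored (assumed: anything not Active)."""
    return [name for name, status in channels.items()
            if status is not ChannelStatus.ACTIVE]


def accept_enabled(channels: Dict[str, ChannelStatus]) -> bool:
    """'Accept' is available only when no channel is blocking."""
    return not blocked_channels(channels)


# The State 4 example from the screen above.
example = {
    "Autotest": ChannelStatus.ACTIVE,
    "FDC": ChannelStatus.STALE,
    "Health Index": ChannelStatus.ACTIVE,
    "Metrology": ChannelStatus.DISCONNECTED,
}
```

Returning the list of blocking channels, not just a boolean, is what lets the UI name exactly which channels need restoration.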

State 5 · Override

Agent was wrong · structured correction feeds Learning Agent

✏ Override Agent Diagnosis

Your correction updates the KB and improves diagnoses across 47 similar tools.

Actual Root Cause: Xe Lamp Failure ▾

Resolution Applied: Replaced Xe lamp assembly and recalibrated spectral baseline.

Repair Time: 42 min

Parts Used: Xe Lamp Assembly (1)

Why Was the Agent Wrong? Root cause mismatch ▾

Submit Correction → Knowledge Base

Not "disagree" or "provide feedback." Structured fields: 15 root cause options, resolution, repair time, parts, and 7 categories for why the agent was wrong. Data the Learning Agent can act on.
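To illustrate why structured fields matter, here is a sketch of the override as a validated record. The `OverrideRecord` type and its field names are hypothetical. The category set contains only the four labels visible in the Override Analysis panel; the text mentions seven, and the remaining three are not listed, so they are left out here.

```python
from dataclasses import dataclass
from typing import Tuple

# The four "why wrong" categories visible in the Override Analysis panel.
# The text mentions seven in total; the other three are not shown.
WHY_WRONG_CATEGORIES = {
    "Root cause mismatch",
    "Severity overestimate",
    "Missing secondary cause",
    "Historical case mismatch",
}


@dataclass(frozen=True)
class OverrideRecord:
    tool: str
    actual_root_cause: str
    resolution: str
    repair_minutes: int
    parts_used: Tuple[str, ...]
    why_wrong: str
    note: str = ""  # optional note for future FSEs

    def __post_init__(self) -> None:
        if not self.actual_root_cause.strip():
            raise ValueError("actual root cause is required")
        if self.why_wrong not in WHY_WRONG_CATEGORIES:
            raise ValueError(f"unknown category: {self.why_wrong!r}")
        if self.repair_minutes <= 0:
            raise ValueError("repair time must be positive")


# The State 5 example, expressed as a record (tool ID is illustrative).
record = OverrideRecord(
    tool="S-C17",
    actual_root_cause="Xe Lamp Failure",
    resolution="Replaced Xe lamp assembly and recalibrated spectral baseline.",
    repair_minutes=42,
    parts_used=("Xe Lamp Assembly",),
    why_wrong="Root cause mismatch",
)
```

Because every field is typed and validated at submission, corrections can be counted, grouped, and compared without parsing free text.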

"State 1 took 2 days. State 3 took 2 weeks. The edge cases aren't exceptions to the design; they are the design."

Try it live:
1. You're viewing State 1 · High Confidence (94%). Click Accept Diagnosis to see the Plan of Action.
2. Click State 3 · Low and notice that no hypothesis is shown (prevents anchoring bias).
3. Click State 4 · Blocked to see the channel stoplights (Active/Stale/Disconnected).
4. Click State 5 · Correction to see the structured override form (15 root causes, 7 "why wrong" categories).
5. Click Run Autotest (right panel) and watch the 12-point diagnostic run live.

Diagnose Screen Interactive Prototype

5 confidence states · clickable

S-C17

Atlas II · Xe Lamp Failure Investigation

SE Zone C · Micron F10 · Assigned: Sarah Chen · SR #151204

FSE

Apps Eng

TPS

L3

Unscheduled Down

94%

Confidence · High

6 of 7 historical cases matched

Xe Lamp Thermal Degradation

Lamp at 3,890h of 4,000h rated life. SE_CLTC_TEMP rising steadily over 7 days, autotest_intensity declining. Pattern matches 6 historical cases, all resolved by lamp replacement. Expected repair: 35 to 45 min.


Plan of Action · Pre-populated from 6 matched cases

1. Pause tool · complete current wafer step

2. Replace Xe Lamp · P/N 4710-SE · 3 in stock, Micron F10 cage

3. Run 12-point Autotest · verify SE_CLTC_TEMP below 31°C

4. Run 3 QC wafers · confirm measurement within spec

5. Return to production · update Stoplight to green · close SR #151204

Evidence · First-Out Alarm + Cascade

Primary trigger identified · 7 downstream

View all 654 signals →

First-Out Alarm · Primary Trigger

SE_CLTC_TEMP · 34.2°C · baseline 30.0°C · +4.2°C ▲ · 8.4σ

Crossed 3σ four days ago · Accelerating

SE_CLTC_TEMP · 7-day trend · Baseline: 30.0°C

DOWNSTREAM

├─ autotest_intensity · 82% · base 95% · −13pp ▼ · consequence

├─ sw_log_warnings · 23/day · base 2/day · +21 ▲ · consequence

├─ Focus_drift · 0.8µm · base 0.2µm · +0.6µm ▲ · downstream

├─ Stage_Wedge_Z · 0.02mm · base 0.01mm · +0.01 ▲ · secondary

UNRELATED · Normal range

● Network_Latency · 12ms · base 12ms · 0 ● · normal

Dashed line = baseline · FDC real-time + Autotest daily

Historical Precedent · 6 of 7 Matched

Ranked by embedding similarity

97%

SR #150422 · S-C17 · Lamp replacement at 3,200h

Kevin Wong · 38 min · Mar 2, 2026 · Same tool, same failure mode

Signal

Error codes

Tool model

94%

SR #151089 · S-C06 · Lamp thermal at 3,920h

Ya Ching Chang · 42 min · Jan 2026 · Different tool, same model Atlas II

91%

SR #149066 · QATSL11 · IDE controller + lamp overheat

TPS escalation · 1,339h downtime · Feb 2019 · Toshiba Y5 · Required IDE sandwich replacement

89%

SR #130344 · QATSL05 · Halogen lamp focus error

FSE heard noise from Y-stage · 72.5h downtime · Mar 2018 · Toshiba Y5

86%

GF-275562 · AMI1400 · Pressure alarm after lamp thermal event

Rene Schmidt · WW19-24 · GF Dresden · Shielded cable replacement resolved

82%

QATSG05 · Halogen lamp QC fail

Toshiba Y5 · 78h downtime · Atlas II+ · Lamp replacement + optics cleaning

Diagnose Agent

High confidence. SE_CLTC_TEMP is the first-out alarm, all 7 downstream signals trace to lamp thermal degradation. Replaced twice before on S-C17 at similar hours. Avg resolution across 6 cases: 42 min. Not an ANALYSISENGEER software crash. Recommend lamp replacement.

Stoplight Chart · S-C17

Daily tracking · Owner: Rene Schmidt

View Fleet Stoplight →

Xe Lamp replacement

Open

SR #151204 · Pending FSE action · Opened today

D1 · D3 · D8

Auto Focus optics cal

SR #151089 · D4 confirmed · Ya Ching Chang

On track

Stage alignment

SR #150891 · D8 closed · Sarah Chen

Closed

Remote Actions

Lamp subsystem · Atlas II · S-C17

Connected to S-C17 via PLC · Tool state: Down · Safe to execute

Run Autotest

12-point diagnostic · ~15 min

Starting...0/12

Pause Tool

Complete current wafer step, then idle

Trigger Calibration

Optics + stage recal · Tool must be idle

Restart Lamp Controller

Soft restart lamp subsystem · No wafer impact

Tool Context

Lamp Hours: 3,890h / 4,000h

Last PM: Apr 10, 2026

MTBI (30d): 142h

Fleet MTBI (4wk avg): 123h

Last Lamp Replace: Mar 2, 2026 at 3,200h

Total SRs (Quarter): 3

Data Freshness: FDC real-time

MTBI · 4-week rolling avg · S-C17 vs Fleet

FSE Notes

Document your observations

68%

Confidence · Medium

3 of 7 historical cases matched

Turret Position Calibration Drift

DFLY3174 turret position error during wafer exchange. 3 cases match but with two different root causes: mechanical wear vs. firmware. Run turret diagnostic before committing.

Evidence · First-Out Alarm + Cascade

Primary trigger: DFLY3174 turret position

First-Out Alarm · Primary Trigger

TURRET_POS_ERR · 0.15° · baseline 0.02° · +0.13° ▲

DOWNSTREAM

├─ TURRET_MOTOR_I · 2.1A · base 1.8A · +0.3A ▲ · consequence

└─ WAFER_XFER_TIME · 4.2s · base 3.8s · +0.4s ▲ · secondary

Historical Precedent · 3 Matched

Two different root causes in history

73%

GF Slide 6 · DFLY3174 LP1 abort, Robot exchange

GF Dresden · WW19-24 2023 · Robot exchanged, rare issues persisted · FW installation pending

68%

GF Slide 13 · Turret Issues, FW improvements

GF Dresden · Engineering task force · Daniel F. owner · New firmware tested on AMI821 WW19

62%

GF Slide 6 · Edge top plate replacement

GF Dresden · Pending due to wrong delivered part · Mechanical root cause

Diagnose Agent

Medium confidence. GF Dresden resolved DFLY3174 two ways: firmware update (25 min) or robot exchange (180 min). Motor current suggests mechanical but FW fix at AMI821 resolved identical symptoms. Run turret diagnostic to differentiate.

Stoplight Chart · L-B09

No active POA for this tool

No open corrective actions. This is a new issue.

Remote Actions

Gather data to increase confidence

Run Turret Diagnostic

Motor current + position accuracy · ~8 min

Run Autotest

Full 12-point diagnostic · ~15 min

FSE Notes

Insufficient Pattern Match · FSE Assessment Required

Agent confidence below threshold. No hypothesis shown to prevent anchoring. Review raw signals and document your assessment.

Raw Signals · Unfiltered

No first-out alarm identified · Assess independently

IDE_VACUUM_PRESS · 2.3 Pa · base 1.8 Pa · +0.5 Pa ▲ · mild

CHUCK_TEMP_VAR · ±0.8°C · base ±0.3°C · +0.5°C ▲ · oscillating

STAGE_VIBRATION · 0.04g · base 0.03g · +0.01g · within spec

Network_Latency · 11ms · base 12ms · −1ms ● · normal

Weak Match · 1 Case at 31%

Below confidence threshold, shown for reference only

31%

SR #149066 · QATSL11 · IDE isolation pad contamination

TPS escalation · 707 min MTTR · Feb 2019 · Toshiba Y5 · Required L3 engineering + IDE sandwich replacement

Diagnose Agent

Insufficient data for diagnosis. Mild deviations, no pattern match. The one weak case (QATSL11) required L3 escalation and 1,339h downtime. If you suspect similar, escalate early. Run IDE Leak Test and Autotest first.

Remote Actions

Gather more data before diagnosing

Run Autotest

Full 12-point diagnostic · ~15 min

Pause Tool

Wafers in progress complete current step

Run IDE Leak Test

Vacuum integrity check · ~8 min

FSE Assessment

Your findings become the diagnosis


Override Agent Diagnosis

Your correction updates the Knowledge Base and improves diagnoses across 47 similar tools.

What did you find?

Actual Root Cause (required)

Resolution (what fixed it, follows 8D D5 format)

Repair Time

Parts Used

Why was the agent wrong? (helps improve the model)

Note for Future FSEs (optional, becomes part of Knowledge Base)

Agent's Original Diagnosis

For reference. You are overriding this

Xe Lamp Thermal Degradation (94%)

Lamp at 3,890h of 4,000h rated life. Pattern matched 6 historical cases. Agent recommended lamp replacement.

Impact of This Correction

Tools with similar config: 47

Active cases affected: 3

Knowledge Base cases: 1,249 → 1,250

Stoplight Chart: Will update to amber

Learning Agent

Your correction updates the diagnostic model. Future cases with this signal pattern will include your finding. At Toshiba Y5, the ANALYSISENGEER correction (44 identical fixes) trained the model to auto-resolve.

The Learning Loop

One correction improves 47 tools

In every KB system I studied, corrections arrive as unstructured feedback, so the system doesn't learn. I designed the override as input.

FSE overrides diagnosis → Learning Agent captures structured correction → KB + thresholds updated fleet-wide → Fewer overrides over time

Real example: From patterns across overrides, the Learning Agent adjusted the lamp threshold from 4,000h to 3,800h across all 47 tools. One FSE's correction improved preventive maintenance for the entire fleet.

Guardrails: Three safeguards prevent bad corrections from cascading fleet-wide: concordance thresholds, configurable staging windows, and contradiction detection.
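One way the three safeguards could combine into a single gate is sketched below. Every specific here is an assumption: the text does not define how concordance is measured, so this sketch counts distinct engineers agreeing on a root cause (with a placeholder minimum of 3), uses a 24-hour staging window borrowed from the override review period on the Configuration screen, and treats any disagreement on root cause as a contradiction. `Correction` and `ready_for_fleet` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class Correction:
    root_cause: str
    engineer: str
    submitted_at: datetime


def ready_for_fleet(
    corrections: List[Correction],
    now: datetime,
    min_concordant: int = 3,                 # placeholder concordance threshold
    staging: timedelta = timedelta(hours=24),  # assumed staging window
) -> bool:
    """Decide whether corrections for one signal pattern may propagate fleet-wide.

    Assumes every correction in the list refers to the same signal pattern.
    """
    if not corrections:
        return False
    # Contradiction detection: engineers disagree on the root cause.
    if len({c.root_cause for c in corrections}) > 1:
        return False
    # Concordance threshold: enough independent engineers agree.
    if len({c.engineer for c in corrections}) < min_concordant:
        return False
    # Staging window: the newest correction has been reviewable long enough.
    newest = max(c.submitted_at for c in corrections)
    return now - newest >= staging
```

The checks are ordered so that the cheapest disqualifier (a contradiction) is evaluated first; none of them mutate state, so the gate can be re-run whenever a new correction arrives.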

Query Builder

Query Builder V2: natural language meets structured editing

In V1, engineers manually constructed boolean queries across four data channels (Autotest, FDC, Health Index, Metrology). In V2, the engineer types a natural-language question. The agent translates it into structured, editable field chips, with each parameter individually adjustable. A "View SQL" toggle shows the raw query. One sentence replaces four manual conditions.
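The chip-to-SQL step can be sketched as follows. The table name `fdc_readings`, the column names, and the `Chip` type are hypothetical; the natural-language parsing that produces the chips is out of scope. The sketch uses `?` placeholders and an allow-list so that edited chip values never end up inside the SQL string itself.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical schema: these names are illustrative, not the product's.
TABLE = "fdc_readings"
ALLOWED_COLUMNS = {"tool_family", "signal", "value_deg", "age_days"}
ALLOWED_OPS = {"=", ">", "<", ">=", "<="}


@dataclass(frozen=True)
class Chip:
    column: str
    op: str
    value: object


def to_sql(chips: List[Chip]) -> Tuple[str, list]:
    """Turn editable field chips into a parameterized query and its values."""
    if not chips:
        raise ValueError("at least one chip is required")
    clauses, params = [], []
    for chip in chips:
        if chip.column not in ALLOWED_COLUMNS or chip.op not in ALLOWED_OPS:
            raise ValueError(f"unsupported chip: {chip}")
        clauses.append(f"{chip.column} {chip.op} ?")
        params.append(chip.value)
    sql = f"SELECT tool, zone, value_deg FROM {TABLE} WHERE " + " AND ".join(clauses)
    return sql, params


# "Litho tools with turret position drift above 0.05° in the last 14 days"
chips = [
    Chip("tool_family", "=", "Litho"),
    Chip("signal", "=", "TURRET_POS_ERR"),
    Chip("value_deg", ">", 0.05),
    Chip("age_days", "<=", 14),
]
sql, params = to_sql(chips)
```

Editing a chip (say, changing 0.05 to 0.08) changes only one entry in `params`, which is what makes each parameter individually adjustable without regenerating the whole query.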

Try it live:
1. Click Run Query and watch the agent reason through your question step by step.
2. Click View SQL to see the raw query the agent generated.
3. Expand a row (click › on L-B09) to drill into readings and the sparkline chart.
4. Switch to the Chart tab for the SPC trace, bar comparison, and configurable chart playground.
5. Toggle the V1: Boolean Builder tab to see what the same query looked like before AI.

Query Builder

4 data channels · 12,847 signals · 217 tools

Query Agent

● Reading query... identifying "Litho tools", "turret position drift", "above 0.05°", "14 days"

● Selecting channels: FDC (turret position), Health Index (tool model), Autotest (calibration)

● Building query... 9 tools matched across Litho-A and Litho-B

9 of 217 tools · 14d


Tool	Zone	Model	Current (°)	14d Ago	Δ Rate	Threshold	Last Cal	Health Idx	Status
L-B09	Litho-B	DFLY-400	0.150	0.031	+0.009/d	0.050	Apr 2	34%	Degraded
L-A03	Litho-A	DFLY-400	0.120	0.028	+0.007/d	0.050	Mar 28	41%	Degraded
L-A14	Litho-A	DFLY-400	0.100	0.024	+0.005/d	0.050	Apr 5	48%	Degraded
L-B06	Litho-B	DFLY-400	0.080	0.022	+0.004/d	0.050	Apr 1	55%	Warning
L-A07	Litho-A	DFLY-400	0.072	0.020	+0.004/d	0.050	Apr 8	58%	Warning
L-B02	Litho-B	DFLY-400	0.065	0.018	+0.003/d	0.050	Apr 10	62%	Warning
L-A09	Litho-A	DFLY-400	0.061	0.019	+0.003/d	0.050	Apr 6	64%	Warning
L-B11	Litho-B	DFLY-400	0.058	0.016	+0.003/d	0.050	Apr 12	66%	Warning
L-A18	Litho-A	DFLY-400	0.052	0.015	+0.003/d	0.050	Apr 14	69%	Threshold

Expanded row · L-B09
Last 5 readings: Apr 22 5:48 AM · 0.150° | Apr 21 5:30 AM · 0.141° | Apr 20 5:15 AM · 0.132° | Apr 19 5:22 AM · 0.123° | Apr 18 5:18 AM · 0.115°
Related signals: STAGE_VIBRATION · 0.8g (normal) | WAFER_CHUCK_TEMP · 23.1°C (normal) | MOTOR_CURRENT · 2.4A (normal)
Sparkline: TURRET_POS_ERR · L-B09 · 14 days · Progressive drift from 0.031° to 0.150° (+384%)

Expanded row · L-A03
Last 5 readings: Apr 22 5:47 AM · 0.120° | Apr 21 5:32 AM · 0.113° | Apr 20 5:18 AM · 0.106° | Apr 19 5:25 AM · 0.099° | Apr 18 5:20 AM · 0.092°
Related signals: STAGE_VIBRATION · 0.6g (normal) | WAFER_CHUCK_TEMP · 23.0°C (normal)
Sparkline: TURRET_POS_ERR · L-A03 · 14 days · Progressive drift from 0.028° to 0.120° (+329%)

Expanded row · L-A14
Last 5 readings: Apr 22 5:46 AM · 0.100° | Apr 21 5:28 AM · 0.095° | Apr 20 5:12 AM · 0.090° | Apr 19 5:19 AM · 0.085° | Apr 18 5:16 AM · 0.080°
Related signals: STAGE_VIBRATION · 0.7g (normal) | WAFER_CHUCK_TEMP · 22.9°C (normal)
Sparkline: TURRET_POS_ERR · L-A14 · 14 days · Progressive drift from 0.024° to 0.100° (+317%)

Showing 1 to 9 of 9 results


Results

That $1.2M cost of inaction: this is the response.

Metric	Before		After
Defect resolution	6 months	→	2-3 days
Triage time	45 minutes	→	Under 2 minutes
GF Dresden interrupts	210 / week	→	50 / week
Efficiency	Baseline	→	30% improvement
Agent accuracy	N/A	→	90%+ top-1 precision
Alarm fatigue	85% ignore rate	→	Eliminated
Pre-sales impact		→	25% conversion · 4 customers

Tested with 8 FSEs and 2 PMs: 80% positive. Key refinement: the override path was streamlined to be accessible from any state.

Sarah's Monday morning now starts with 3 priorities instead of 400+ alarms. She resolves two before walking to the fab floor.

Methodology: Top-1 precision against 200+ resolved SRs. We tracked precision over recall because a withheld diagnosis (State 3) is a designed outcome, not a failure.
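The metric choice can be made concrete with a short sketch. It assumes each resolved SR is represented as a pair of (agent's top diagnosis or `None` if withheld, confirmed root cause); the function names and the sample data are illustrative, not the evaluation set described above.

```python
from typing import List, Optional, Tuple

Case = Tuple[Optional[str], str]  # (agent's top-1 diagnosis or None, confirmed cause)


def top1_precision(cases: List[Case]) -> float:
    """Share of shown diagnoses that were correct; withheld cases are excluded."""
    shown = [(pred, actual) for pred, actual in cases if pred is not None]
    if not shown:
        raise ValueError("no diagnoses were shown")
    correct = sum(1 for pred, actual in shown if pred == actual)
    return correct / len(shown)


def coverage(cases: List[Case]) -> float:
    """Share of cases where the agent showed any diagnosis at all."""
    if not cases:
        raise ValueError("no cases")
    return sum(1 for pred, _ in cases if pred is not None) / len(cases)


# Illustrative data only: three correct, one wrong, one withheld (State 3).
sample = [
    ("lamp", "lamp"),
    ("lamp", "lamp"),
    ("turret", "firmware"),
    (None, "vacuum"),
    ("lamp", "lamp"),
]
```

On this sample, precision is 3 of 4 shown diagnoses and coverage is 4 of 5 cases. Reporting coverage alongside precision keeps the withheld cases visible without counting them as errors.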

Failure Modes

What happens when the AI is wrong

Designing for failure shaped more of this product than designing for success. Each failure mode was stress-tested during shadow deployment before any recommendation surfaced to FSEs.

Diagnose Agent: confidently wrong at 94%

What if it shows high confidence for the wrong root cause?

The evidence cascade shows first-out alarm, downstream signals, and match percentages. The confidence score is context, not a command. Override is always accessible.

Cross-agent failure: cascading errors

What if Monitor groups incorrectly, causing Diagnose to match the wrong pattern?

The three-agent separation makes this traceable. Each agent logs independently; the Apps Engineer can audit the full chain.

Two additional failure modes (Monitor suppression, Learning propagation) were stress-tested with corresponding detection metrics.

Reflections

"Designing for AI failure is harder than designing for AI success."

What I'd do differently

Suppressed alarm transparency

The platform reduces 400+ alarms to 47, but the FSE has no visibility into what was filtered. I'd add a "353 alarms rationalized" view. Transparency about what the AI removed is as critical as what it shows.

Scalability beyond 217 tools

At 5,000+ tools across 12 fabs, the flat tile grid breaks down. I'd move to a fab → zone → bay hierarchy with aggregated health scores.

Accessibility in a fab environment

Designed and validated for cleanroom constraints: WCAG AA contrast throughout, color-blind safe encoding (text labels and directional arrows alongside color, never color alone), 44px touch targets for gloved interaction, ARIA semantics validated with the accessibility team, and monospace signal names sized for arm's length readability.

Design principles

Trust through transparency

Five states acknowledge the agent isn't always right. Override gives the FSE authority. The agent recommends; it never commands.

Override as input, not feedback

Structured corrections enable retraining. A comment field gives text. Structured fields give data the Learning Agent can act on.

AI features feel native

Agent cards use identical styling to every other card. No glowing borders. The AI is a tool, not a feature demo.

The edge cases are the design

State 3 prevents anchoring. State 4 prevents premature commitment. State 5 captures knowledge. The happy path is obvious; the edge cases are where decisions matter.