Risk matrix
| Risk ID | Risk | Failure pattern | Required control |
|---|---|---|---|
| ACJ-001 | Context collapse | The classifier cannot distinguish threat from quote, sarcasm, research, or fiction. | Require context windows, source preservation, and review before sanction. |
| ACJ-002 | Dialect bias | Speech in minority dialects receives elevated toxicity or risk labels. | Measure false-positive rate disparities across dialect groups and tune thresholds per documented context. |
| ACJ-003 | Person judgment | Event labels become character judgments. | Ban morality, loyalty, trustworthiness, stability, and future-danger scores. |
| ACJ-004 | Reporting overcapture | Ambiguous conversations become external reports. | Escalate only on narrow statutory or emergency triggers. |
| ACJ-005 | Source mutation | The original conversation is rewritten or summarized into institutional language. | Keep the source immutable and record how each derivative varies from it. |
| ACJ-006 | Review theater | An appeal cannot inspect or change the decision. | Provide independent human review and a restoration remedy. |
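
The ACJ-002 control asks for measured false-positive disparities. The sketch below shows one possible way to compute them. It assumes each reviewed item carries a dialect group label, the classifier's flag, and a human ground-truth judgment. The function names, the record layout, and the sample data are all hypothetical and are not part of the matrix.

```python
from collections import defaultdict


def false_positive_rates(records):
    """Return the false-positive rate per group.

    records: iterable of (group, flagged, harmful) tuples, where
    `flagged` is the classifier output and `harmful` is the reviewed
    ground truth. Both field names are assumptions for this sketch.
    """
    benign = defaultdict(int)          # benign items seen per group
    flagged_benign = defaultdict(int)  # benign items wrongly flagged per group
    for group, flagged, harmful in records:
        if not harmful:
            benign[group] += 1
            if flagged:
                flagged_benign[group] += 1
    # Groups with no benign items are omitted rather than divided by zero.
    return {group: flagged_benign[group] / count for group, count in benign.items()}


def fpr_disparity(rates):
    """Gap between the highest and lowest group false-positive rate."""
    return max(rates.values()) - min(rates.values())


# Illustrative data only: (group, flagged, harmful)
sample = [
    ("dialect_a", True, False),
    ("dialect_a", False, False),
    ("dialect_a", False, False),
    ("dialect_a", False, False),
    ("dialect_b", True, False),
    ("dialect_b", True, False),
    ("dialect_b", False, False),
    ("dialect_b", False, False),
    ("dialect_b", True, True),  # true positive, excluded from the FPR
]

rates = false_positive_rates(sample)
gap = fpr_disparity(rates)
```

A max-minus-min gap is only one of several disparity measures. A ratio of rates or a per-group confidence interval may suit small samples better, and what size of gap should trigger re-tuning is a policy decision the matrix leaves open.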

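The ACJ-005 control pairs an immutable source with a record of how each derivative varies from it. A minimal sketch of that pairing follows. It assumes the source is sealed with a SHA-256 digest and that "variance" can be approximated by a text similarity ratio. The class names, the field names, and the choice of metric are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass
from difflib import SequenceMatcher


def _digest(text):
    """SHA-256 hex digest of the UTF-8 encoded text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class SourceRecord:
    """Original conversation text, frozen so fields cannot be reassigned."""
    text: str
    sha256: str


@dataclass(frozen=True)
class DerivativeRecord:
    """A summary or rewrite, linked to its source by digest."""
    source_sha256: str
    text: str
    similarity: float  # 1.0 means character-identical to the source


def seal_source(text):
    """Store the original text together with its digest."""
    return SourceRecord(text=text, sha256=_digest(text))


def verify_source(source):
    """Check that the stored text still matches its recorded digest."""
    return _digest(source.text) == source.sha256


def derive(source, derivative_text):
    """Record a derivative and how far it drifts from the source text."""
    ratio = SequenceMatcher(None, source.text, derivative_text).ratio()
    return DerivativeRecord(source.sha256, derivative_text, ratio)


# Illustrative usage with made-up text
original = seal_source("I was quoting the film, not threatening anyone.")
summary = derive(original, "Subject made a threatening statement.")
verbatim = derive(original, original.text)
```

A frozen dataclass only blocks reassignment inside the running process. Real immutability would need write-once storage or an append-only log, which this sketch does not attempt. A character-level ratio also cannot tell whether a summary changed the meaning, so it would only flag derivatives for human comparison against the source.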