Documentation Audit Engine · Air-Gapped · Evidence-Grade

Catch audit risks before auditors do.

STM checks your clinical notes against treatment plans, catches copy-paste, flags unusual changes, and generates evidence you can hand to an auditor—without sending patient data to the cloud.

Deployment: Docker container / VPC / on-prem
Outputs: HTML evidence reports + JSON metrics
What it checks: Protocol fidelity · Copy-paste · Drift · Stagnation
Deploy inside your VPC
# No external calls. No data egress.
# Absolute host path: older Docker releases reject relative bind mounts.
docker run -p 8000:8000 \
  -v "$(pwd)/data:/app/data" \
  stm-engine:latest

What STM Catches

Six checks that surface audit risks, training needs, and documentation problems—before they become liabilities.

Protocol Drift

Staff gradually modify procedures until what's delivered doesn't match the written BIP/ISP. STM compares notes against authorized plans and flags divergence.

  • Unauthorized intervention strategies
  • Prompt hierarchy deviations
  • Billing risk from unapproved procedures

Copy-Paste Detection

Identical notes across multiple sessions are an audit red flag. STM detects templated documentation and suspiciously similar text.

  • Clone detection across sessions
  • Templated note patterns
  • Documentation integrity alerts
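
A minimal sketch of clone detection across sessions, for illustration only. It uses character-level similarity from Python's standard library as a stand-in for STM's own scoring, which this page does not specify. The function names, session IDs, sample notes, and the 0.95 threshold are all assumptions.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial edits don't hide a clone."""
    return " ".join(text.lower().split())


def find_clones(notes: dict[str, str], threshold: float = 0.95) -> list[tuple[str, str, float]]:
    """Return (session_a, session_b, similarity) for every pair at or above threshold."""
    hits = []
    for (id_a, text_a), (id_b, text_b) in combinations(notes.items(), 2):
        ratio = SequenceMatcher(None, normalize(text_a), normalize(text_b)).ratio()
        if ratio >= threshold:
            hits.append((id_a, id_b, round(ratio, 3)))
    return hits


# Hypothetical notes: sess_32 differs from sess_31 only by an extra space.
notes = {
    "sess_31": "Client completed 8/10 trials. No problem behavior observed.",
    "sess_32": "Client completed 8/10 trials.  No problem behavior observed.",
    "sess_33": "Client refused first task, then completed 6/10 trials after a break.",
}
print(find_clones(notes))  # [('sess_31', 'sess_32', 1.0)]
```

Pairwise comparison is quadratic in the number of sessions, which is fine for one client's notes but would need batching for a whole caseload.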

Stagnation

Notes that never change suggest either copy-paste or lack of clinical progress. STM flags documentation that shows no meaningful variation over time.

  • Zero-change alerts
  • Maintenance phase justification
  • Distinguish stability from laziness

RBT-BCBA Alignment

Compare what RBTs document against what BCBAs supervise. Misalignment indicates training gaps or communication breakdowns.

  • Vocabulary mismatch detection
  • Procedure interpretation gaps
  • Supervision quality signals

Unusual Changes

Sudden shifts in documentation content may indicate clinical events, staff changes, or problems. STM flags sessions that deviate significantly from baseline.

  • New behavior emergence
  • Crisis indicators
  • Staff transition artifacts

AI Summary Validation

If you use AI to help write notes, STM can verify summaries against source content—catching hallucinations or omissions before they're signed.

  • Source-to-summary comparison
  • Missing detail detection
  • Hallucination flags
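
A rough sketch of source-to-summary comparison, assuming a simple token-level check rather than whatever STM actually does internally. Tokens that appear in the summary but not in the source are candidate hallucinations. Tokens that appear only in the source are candidate omissions. The stopword list, function names, and sample text are hypothetical.

```python
import re

# Tiny illustrative stopword list; a real check would need a fuller one.
STOPWORDS = {"a", "an", "the", "and", "of", "at", "to", "in", "was"}


def content_tokens(text: str) -> set[str]:
    """Lowercased word and number tokens (keeps ratios like 8/10), minus stopwords."""
    return set(re.findall(r"[a-z0-9/]+", text.lower())) - STOPWORDS


def validate_summary(source: str, summary: str) -> dict[str, list[str]]:
    """Flag summary tokens with no support in the source, and source tokens left out."""
    src, summ = content_tokens(source), content_tokens(summary)
    return {
        "unsupported": sorted(summ - src),  # possible hallucinations
        "omitted": sorted(src - summ),      # possible missing details
    }


source = "Client completed 8/10 trials. One instance of elopement at 2pm."
summary = "Client completed 9/10 trials."
result = validate_summary(source, summary)
print(result["unsupported"])  # ['9/10']
print(result["omitted"])
```

A token check like this only surfaces candidates. A reviewer still has to decide whether a flagged difference matters.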

Why Not Just Search Keywords?

Keyword search misses context. Sentiment analysis misses function. Hosted LLMs can hallucinate and require data egress. STM measures what actually changes in your documentation over time.

The Data Problem

Clinical signal lives in session notes. "Client was upset" and "emotional dysregulation following tangible denial" describe the same event—but keyword search treats them as unrelated.

  • Context-dependent phrasing
  • Staff vocabulary differences
  • Real change vs. noise
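
To make the keyword problem concrete, here is a small sketch that measures word overlap between the two phrasings quoted above. The helper names are made up for this example.

```python
import re


def keywords(text: str) -> set[str]:
    """Lowercased alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Shared keywords divided by total distinct keywords."""
    ka, kb = keywords(a), keywords(b)
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0


overlap = jaccard(
    "Client was upset",
    "emotional dysregulation following tangible denial",
)
print(overlap)  # 0.0
```

The two notes share no words at all, so any keyword-based match scores them as unrelated.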

The Compliance Problem

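Sending PHI to a third-party API such as OpenAI or Anthropic raises HIPAA questions: business associate agreements, data retention, and egress controls. STM runs entirely inside your own environment (on-prem or VPC) with no external API calls.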

  • Air-gapped deployment
  • Zero vendor dependencies
  • Configurable retention

The Audit Problem

Auditors look for stagnation, copy-paste, and protocol drift. If you can't prove documentation integrity, you lose revenue.

  • Evidence-grade artifacts
  • Reproducible analysis
  • Defensible reports

How It Works

Export notes from your EHR. Run STM. Get a report showing what to review.

Your Notes
"RBT note: Client was yelling, so I told him to use his inside voice."
STM Engine
Compare against BIP · Check for copy-paste · Flag anomalies
Audit Report
Protocol mismatch · Billing risk: high · 3 sessions to review

Protocol Fidelity

Compares each session note against the authorized treatment plan. Flags when documentation describes procedures that weren't approved.

Change Detection

Measures how documentation changes over time. A high change rate may indicate a crisis or a breakthrough. Zero change suggests stagnation or copy-paste.
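
A minimal sketch of this idea, assuming each session note has already been turned into a vector. The toy 2-D vectors, the thresholds, and the label names are illustrative assumptions, not STM's actual values.

```python
import math


def cosine_distance(u: list[float], v: list[float]) -> float:
    """1 - cosine similarity; 0.0 means the vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        raise ValueError("zero-length vector")
    return 1.0 - dot / norm


def classify_changes(vectors: list[list[float]], low: float = 0.01, high: float = 0.5) -> list[str]:
    """Label each session-to-session step by how far the note moved."""
    labels = []
    for prev, curr in zip(vectors, vectors[1:]):
        change = cosine_distance(prev, curr)
        if change <= low:
            labels.append("stagnation_or_copy_paste")
        elif change >= high:
            labels.append("unusual_change")
        else:
            labels.append("normal")
    return labels


# Toy session vectors standing in for real embeddings.
sessions = [[1.0, 0.0], [1.0, 0.0], [0.8, 0.6], [-0.6, 0.8]]
labels = classify_changes(sessions)
print(labels)  # ['stagnation_or_copy_paste', 'normal', 'unusual_change']
```

Fixed thresholds are the simplest choice. A per-client baseline would likely be more robust, but the page does not say how STM sets its baseline.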

Staff Alignment

Compares RBT session notes against BCBA supervision notes. Flags gaps that indicate training needs or miscommunication.
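
The Technical Architecture section names Hungarian assignment for token-level comparison. The sketch below finds the same minimum-cost one-to-one matching by brute force, which is only practical for tiny inputs. A real pipeline would use a proper Hungarian implementation such as `scipy.optimize.linear_sum_assignment`. The terms and cost values are hand-written for illustration and are not model output.

```python
from itertools import permutations


def optimal_assignment(cost: list[list[float]]) -> tuple[list[tuple[int, int]], float]:
    """Minimum-cost one-to-one matching of rows to columns (requires rows <= columns).

    Brute force over all column choices: same answer as the Hungarian
    algorithm, but factorial time, so keep inputs tiny.
    """
    n_rows, n_cols = len(cost), len(cost[0])
    if n_rows > n_cols:
        raise ValueError("need at least as many columns as rows")
    best_cols, best_total = None, float("inf")
    for cols in permutations(range(n_cols), n_rows):
        total = sum(cost[row][col] for row, col in enumerate(cols))
        if total < best_total:
            best_cols, best_total = cols, total
    return list(enumerate(best_cols)), best_total


# Hypothetical vocabulary and distances (lower = closer in meaning).
rbt_terms = ["upset", "yelling"]
bcba_terms = ["vocal stereotypy", "emotional dysregulation", "tangible denial"]
cost = [
    [0.7, 0.2, 0.6],  # "upset" vs each BCBA term
    [0.1, 0.5, 0.8],  # "yelling" vs each BCBA term
]
pairs, total = optimal_assignment(cost)
for row, col in pairs:
    print(f"{rbt_terms[row]!r} <-> {bcba_terms[col]!r} (cost {cost[row][col]})")
```

Pairs with a high matched cost are the ones worth a look, since they are terms one author uses that have no close counterpart in the other's notes.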

Technical Architecture

For engineering teams: STM uses sentence embeddings to represent clinical text as vectors, then measures distances and trajectories over time. No LLM generation—just deterministic, reproducible analytics.

Core Components

  • Embedding model: Local sentence-transformers (MiniLM, MPNet) or API
  • Dual resolution: Session-level + token-level embeddings
  • Drift metrics: Cosine distance, trajectory velocity, acceleration
  • Alignment: Hungarian assignment for token-level comparison
  • Protocol anchors: Embed BIP/ISP text as reference vectors
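
A sketch of how the drift metrics above could be computed once notes and the protocol are embedded. The definitions here are assumptions: velocity is cosine distance between consecutive sessions per elapsed day, and acceleration is the change in velocity. The 3-D vectors are toy stand-ins for real sentence embeddings.

```python
import math
from datetime import date


def cosine_distance(u: list[float], v: list[float]) -> float:
    """1 - cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        raise ValueError("zero-length vector")
    return 1.0 - dot / norm


def drift_metrics(dates: list[date], vectors: list[list[float]], anchor: list[float]) -> dict:
    """Distance from the protocol anchor, plus velocity and acceleration over time."""
    anchor_distance = [cosine_distance(v, anchor) for v in vectors]
    velocity = []
    for i in range(1, len(vectors)):
        days = max((dates[i] - dates[i - 1]).days, 1)  # guard against same-day sessions
        velocity.append(cosine_distance(vectors[i - 1], vectors[i]) / days)
    acceleration = [later - earlier for earlier, later in zip(velocity, velocity[1:])]
    return {
        "anchor_distance": anchor_distance,
        "velocity": velocity,
        "acceleration": acceleration,
    }


anchor = [1.0, 0.0, 0.0]  # stands in for the embedded BIP/ISP text
dates = [date(2024, 12, 18), date(2024, 12, 19), date(2024, 12, 20)]
vectors = [[1.0, 0.0, 0.0], [0.8, 0.6, 0.0], [0.0, 1.0, 0.0]]
metrics = drift_metrics(dates, vectors, anchor)
print(metrics)
```

In this toy run the notes start on the anchor and move steadily away from it, with positive acceleration, which is the pattern the Protocol Drift check describes.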

Deployment

  • Container: Single Docker image, ~2GB with model
  • Inputs: CSV/JSON export from any EHR
  • Outputs: HTML reports + JSON metrics + manifest
  • Storage: Mounted volume or object store

API

POST /v1/analysis/audit
POST /v1/analysis/alignment
POST /v1/analysis/drift
GET  /v1/artifacts/{id}

Job-style endpoints. Returns artifact URLs + metrics JSON.

Example Request/Response

// POST /v1/analysis/audit
{
  "client_id": "3920-AC",
  "protocol": {
    "text": "Implement extinction for vocal stereotypy. Do not provide verbal attention.",
    "source": "BIP v2.1"
  },
  "sessions": [
    {
      "session_id": "sess_892",
      "date": "2024-12-20",
      "text": "Client was yelling. I told him to use his inside voice."
    }
  ]
}

// Response
{
  "alerts": [
    {
      "type": "protocol_mismatch",
      "severity": "high",
      "session": "sess_892",
      "finding": "Note describes verbal attention during extinction protocol",
      "protocol_says": "Do not provide verbal attention",
      "note_says": "I told him to use his inside voice",
      "billing_risk": "Session may not be billable as written"
    }
  ],
  "artifacts": {
    "report_url": "/artifacts/3920-AC/audit_report.html"
  }
}
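
A short sketch of consuming a response like the one above: sort alerts by severity and list the sessions to review. Only the "high" severity appears in the example, so the "medium" and "low" values and the function name are assumptions.

```python
import json

# Only "high" is shown in the sample response; the other levels are assumed.
SEVERITY_ORDER = {"high": 0, "medium": 1, "low": 2}


def sessions_to_review(response: dict) -> list[tuple[str, str, str]]:
    """Return (session, alert type, severity), most severe first."""
    alerts = sorted(
        response.get("alerts", []),
        key=lambda alert: SEVERITY_ORDER.get(alert.get("severity"), 99),
    )
    return [(a["session"], a["type"], a["severity"]) for a in alerts]


raw = """
{
  "alerts": [
    {"type": "protocol_mismatch", "severity": "high", "session": "sess_892",
     "finding": "Note describes verbal attention during extinction protocol"}
  ],
  "artifacts": {"report_url": "/artifacts/3920-AC/audit_report.html"}
}
"""
queue = sessions_to_review(json.loads(raw))
print(queue)  # [('sess_892', 'protocol_mismatch', 'high')]
```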
Design Principle
Evidence presenter, not decision maker. STM surfaces what to review. Humans decide what to do about it.

Air-Gapped. On-Prem. Production-Ready.

Deploy inside your infrastructure. No data leaves your environment. No vendor API calls.

Deployment
# Deploy in your VPC. No external calls.
# Absolute host path: older Docker releases reject relative bind mounts.
docker run -p 8000:8000 \
  -v "$(pwd)/data:/app/data" \
  stm-engine:latest

# Run an audit report
curl -X POST http://localhost:8000/v1/analysis/audit \
  -H "Content-Type: application/json" \
  -d @./data/sessions.json

Single container + mounted volume. Wire it to your EHR export pipeline.
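
If the export pipeline is in Python, the curl call above can be mirrored with the standard library. This is a sketch: the endpoint path and port come from this page, while the function names and timeout are assumptions. Nothing is sent until `submit` is called.

```python
import json
from urllib.request import Request, urlopen


def build_audit_request(base_url: str, payload: dict) -> Request:
    """Prepare a POST to the audit endpoint without sending it."""
    return Request(
        url=f"{base_url.rstrip('/')}/v1/analysis/audit",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit(request: Request, timeout: float = 30.0) -> dict:
    """Send the request to the local container and decode the JSON reply."""
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


payload = {
    "client_id": "3920-AC",
    "protocol": {"text": "Do not provide verbal attention.", "source": "BIP v2.1"},
    "sessions": [
        {"session_id": "sess_892", "date": "2024-12-20", "text": "Client was yelling."}
    ],
}
request = build_audit_request("http://localhost:8000", payload)
print(request.get_method(), request.full_url)
# result = submit(request)  # uncomment once the container is running
```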

Local Processing

Embedding model runs locally. No OpenAI. No Anthropic. No data egress.

offline · VPC/on-prem · PHI-safe

Standard Inputs

Accepts CSV/JSON exports from any EHR or practice management system.

CSV · JSON · any EHR
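
A sketch of turning a CSV export into the request body shown in the Example Request/Response section. The column names (`session_id`, `date`, `text`) are an assumption about the export. A real EHR export would need its own column mapping.

```python
import csv
import io
import json


def csv_to_audit_payload(csv_text: str, client_id: str, protocol_text: str, protocol_source: str) -> dict:
    """Build an audit request body from CSV rows (assumed columns: session_id, date, text)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    sessions = [
        {
            "session_id": row["session_id"],
            "date": row["date"],
            "text": row["text"].strip(),
        }
        for row in reader
    ]
    return {
        "client_id": client_id,
        "protocol": {"text": protocol_text, "source": protocol_source},
        "sessions": sessions,
    }


export = (
    "session_id,date,text\n"
    'sess_892,2024-12-20,"Client was yelling. I told him to use his inside voice."\n'
)
payload = csv_to_audit_payload(
    export,
    client_id="3920-AC",
    protocol_text="Implement extinction for vocal stereotypy. Do not provide verbal attention.",
    protocol_source="BIP v2.1",
)
print(json.dumps(payload, indent=2))
```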

Evidence Outputs

HTML reports for clinical review. JSON metrics for dashboards. Manifest for reproducibility.

HTML · JSON · audit trail
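
This page does not describe the manifest format. As one plausible shape, a manifest could record a hash of each input plus the model and parameters used, so a rerun can be checked against the original. Every field name below is hypothetical.

```python
import hashlib
import json


def build_manifest(inputs: dict[str, bytes], model_name: str, params: dict) -> dict:
    """Record what went into an analysis run so it can be verified later."""
    return {
        "model": model_name,
        "params": params,
        "inputs": {
            name: hashlib.sha256(data).hexdigest()
            for name, data in sorted(inputs.items())
        },
    }


inputs = {
    "sessions.json": b'{"sessions": []}',
    "bip_v2.1.txt": b"Do not provide verbal attention.",
}
manifest = build_manifest(inputs, model_name="MiniLM", params={"threshold": 0.95})
print(json.dumps(manifest, indent=2, sort_keys=True))
```

Identical inputs and settings give an identical manifest, which makes it easy to show an auditor that a report was not regenerated from different data.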

Configurable Retention

Set retention policies. Delete on schedule. Meet your compliance requirements.

retention rules · auto-delete · HIPAA

Sample Reports

STM outputs are designed for QA teams, clinical directors, and auditors. Show the problem. Preserve the evidence.

protocol_fidelity_report.html · Protocol Mismatch
BIP says: "Implement extinction for attention-seeking behavior; do not reprimand."
RBT wrote: "Client was yelling, so I told him to use his inside voice."
Red = documented procedure contradicts authorized protocol.
Protocol fidelity report: shows where session documentation diverges from the treatment plan.

drift_report.html · 3 Sessions to Review
Copy-paste detected
Sessions 31-33 contain identical text across 3 consecutive days.
Clone score: 0.0
Sessions: 3
Risk: High
Audit alert report: a prioritized list of sessions that need review before an audit.
See it on your data
I'll run STM on a sample of your documentation and deliver a full audit report.
Request Pilot

Limitations

STM is good at some things and bad at others. Here's what to know.

Negation is hard

"Did not hit" and "hit" can look similar to embedding models. STM mitigates this with preprocessing and by keeping evidence visible for human review.
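
The page does not say which preprocessing STM applies. One common approach, shown here purely as an assumption, is to mark the words that follow a negation cue so that "did not hit" and "hit" no longer share the same token. The cue list and the two-word scope are illustrative choices.

```python
import re

NEGATION_CUES = {"not", "no", "never", "without", "denied"}
PUNCTUATION = set(".,;!?")


def mark_negation(text: str, scope: int = 2) -> str:
    """Prefix up to `scope` words after a negation cue with NOT_; punctuation ends the scope."""
    tokens = re.findall(r"[a-z']+|[.,;!?]", text.lower())
    marked, remaining = [], 0
    for token in tokens:
        if token in PUNCTUATION:
            remaining = 0
            marked.append(token)
        elif token in NEGATION_CUES or token.endswith("n't"):
            remaining = scope
            marked.append(token)
        elif remaining > 0:
            marked.append("NOT_" + token)
            remaining -= 1
        else:
            marked.append(token)
    return " ".join(marked)


print(mark_negation("Client did not hit staff."))  # client did not NOT_hit NOT_staff .
print(mark_negation("Client hit staff."))          # client hit staff .
```

A fixed word window is crude and will mislabel some sentences, which is one reason the evidence stays visible for human review.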

Vocabulary varies

Different staff use different words for the same thing. STM shows where vocabulary diverges—but humans decide if it matters clinically.

Context matters

Notes can change because the setting changed, not because the client changed. STM supports context conditioning but can't read minds.

Not a decision maker

STM surfaces what to look at. It doesn't tell you what to do. Human judgment remains the decision boundary.