Sprint Four: Plan and Activate

Return & Assessment

Brent returns after the gap period. Assess what transferred: the independent pilot Septapod ran without Brent, check-in call themes, champion network activity, and capability transfer evidence. The gap period is a deliberate design feature, not dead time.

Module info: Weeks 1-2 | 11 hrs facilitator | 1.5 hrs CEO | 0.5 hrs per exec | 3 hrs internal owner

When: Weeks 1-2 of Sprint Four. Assessment draws on evidence from the gap period. Brent returns with observations from regular check-in calls and a point of view on what transferred.

Time per person

Facilitator (Brent): 11 hrs. Return assessment facilitation, independent pilot review, capability transfer evaluation, outcome metrics comparison

CEO: 1.5 hrs. Assessment session (60 min), gap-period debrief (30 min)

Each senior exec: 0.5 hrs. Champion network assessment, gap-period observations

Internal owner: 3 hrs. Independent pilot debrief, capability transfer scorecard, champion network assessment

What actually happens

Brent returns and assesses what the gap period produced. The independent pilot Septapod ran without Brent is evaluated against the same criteria used for the Sprint Three pilots. Outcome metrics (rework rate, cycle time, error-catch rate) are compared between the two Sprint Three pilots and the independent pilot. Check-in call themes are reviewed for patterns. The capability transfer scorecard measures whether the per-pilot method, the distributed governance model, and the signal watch-list functioned without Brent, whether shadow-AI behavior changed from the Sprint One baseline, and whether the champion network provided peer-to-peer support.

Through-line

Generates: Independent pilot evaluation. Outcome metrics comparison (two pilots side by side). Check-in call theme inventory. Champion network assessment. Capability transfer scorecard with five indicators.
Value: Makes the gap period visible as evidence, not as a gap. Reveals whether the capability transfer model worked: did Septapod use the tools Brent taught them? Where did it break down? This is the data that makes Sprint Four's scaling decisions evidence-based.
How Septapod uses it: The assessment drives what Sprint Four prioritizes. Strong transfer means Sprint Four formalizes and scales. Weak transfer means Sprint Four diagnoses what broke and adjusts the approach before scaling.
Next step uses: Step 2 (Scenario Planning) uses pilot evidence and outcome metrics to ground scaling decisions. Step 3 (Annual AI Plan) uses the capability transfer scorecard to shape governance model updates.

Sprint Three Outputs (carried forward)

Data from Sprint Three's pilots and synthesis, loaded from your browser. Read-only.


Sprint Three Pilot Summary

The two pilots that ran during Sprint Three: one co-led with Brent and one advised. Read-only, loaded from Sprint Three data. Displayed alongside the independent pilot for comparison.


Independent Pilot (Gap Period)

Evidence from the pilot Septapod ran independently during the gap period. Same field structure as Sprint Three's pilot template for direct comparability. This is the test of capability transfer: did Septapod use the per-pilot method (Anthropic patterns, eval-driven loop, PAIR design) without Brent?

Use case

Architecture pattern used

From Anthropic's "Building Effective Agents." Same six patterns used in Sprint Three. Which pattern did the independent pilot use? If the team did not formally select one, choose "Unknown" at the bottom.

Prompt Chaining

Sequential steps where each step's output feeds the next as input. The workflow decomposes a complex task into a fixed sequence of simpler LLM calls, with optional validation gates between steps.

When to use: Tasks that naturally decompose into ordered steps with clear handoff points. Works well when you can verify intermediate outputs before proceeding.

Strengths: Predictable flow. Each step can be tuned independently. Easy to debug because failures localize to a specific step. Low risk of runaway behavior.

Limitations: Rigid. Cannot adapt if one step produces unexpected output that requires a different path. Latency accumulates across steps.

CU example: New account opening workflow where each step (identity verification, product eligibility check, disclosures generation, account provisioning) feeds the next with validation between.
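A minimal Python sketch of the prompt-chaining shape, assuming each step is an ordinary function standing in for an LLM call. The step names (`verify_identity`, `check_eligibility`, `generate_disclosures`) and their gate checks are hypothetical illustrations of the account-opening example, not a real core-system API. What it shows is the pattern: a fixed order, a validation gate after each step, and a failure that names the step where it happened.

```python
from typing import Callable

Step = Callable[[dict], dict]   # takes the previous step's output, returns the next
Gate = Callable[[dict], bool]   # validation check run after a step


def run_chain(data: dict, steps: list[tuple[str, Step, Gate]]) -> dict:
    """Run steps in a fixed order, stopping at the first gate that fails."""
    for name, step, gate in steps:
        data = step(data)
        if not gate(data):
            # The failure is localized to one named step, which eases debugging.
            raise ValueError(f"gate failed after step: {name}")
    return data


# Hypothetical stand-ins for LLM calls in the account-opening example.
def verify_identity(d: dict) -> dict:
    return {**d, "identity_ok": bool(d.get("id_document"))}


def check_eligibility(d: dict) -> dict:
    products = ["share_savings"] if d["identity_ok"] else []
    return {**d, "eligible_products": products}


def generate_disclosures(d: dict) -> dict:
    return {**d, "disclosures": [f"Disclosure for {p}" for p in d["eligible_products"]]}


ACCOUNT_OPENING = [
    ("verify_identity", verify_identity, lambda d: d["identity_ok"]),
    ("check_eligibility", check_eligibility, lambda d: bool(d["eligible_products"])),
    ("generate_disclosures", generate_disclosures, lambda d: bool(d["disclosures"])),
]

result = run_chain({"id_document": "DL-123"}, ACCOUNT_OPENING)
print(result["disclosures"])  # ['Disclosure for share_savings']
```

Passing `{}` instead raises `ValueError` naming `verify_identity`, so a broken run points straight at the step to fix.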

Routing

Classifies incoming input and directs it to the appropriate specialized handler. A single classifier decides which downstream path to take, then hands off to a purpose-built prompt or tool for that category.

When to use: When inputs vary widely and different types need fundamentally different handling. The classifier is the critical component.

Strengths: Each handler can be optimized for its specific case. Keeps individual prompts focused. Easy to add new categories without changing existing handlers.

Limitations: Only as good as the classifier. Misrouted inputs get poor results. Requires enough training examples to build an accurate classifier.

CU example: Member inquiry routing where the classifier categorizes incoming requests (account question, loan inquiry, dispute, general info) and directs each to a handler with the right context and tools.
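A minimal sketch of routing, assuming a keyword lookup as a stand-in for the classifier; a production router would more likely use an LLM or a trained model. The categories mirror the member-inquiry example, and the keyword lists and handler outputs are hypothetical placeholders.

```python
from typing import Callable

# Stand-in classifier: keyword matching, checked in insertion order.
KEYWORDS = {
    "dispute": ["dispute", "unauthorized", "chargeback"],
    "loan_inquiry": ["loan", "mortgage", "refinance"],
    "account_question": ["balance", "statement", "account"],
}


def classify(inquiry: str) -> str:
    text = inquiry.lower()
    for category, words in KEYWORDS.items():
        if any(word in text for word in words):
            return category
    return "general_info"  # fallback when nothing matches


# One focused handler per category (hypothetical placeholders).
HANDLERS: dict[str, Callable[[str], str]] = {
    "dispute": lambda q: f"disputes handler: {q}",
    "loan_inquiry": lambda q: f"lending handler: {q}",
    "account_question": lambda q: f"accounts handler: {q}",
    "general_info": lambda q: f"general info handler: {q}",
}


def route(inquiry: str) -> tuple[str, str]:
    """Classify once, then hand off to the matching handler."""
    category = classify(inquiry)
    return category, HANDLERS[category](inquiry)


print(route("Question about my mortgage"))
# ('loan_inquiry', 'lending handler: Question about my mortgage')
```

Adding a category means one new classifier entry and one new handler, with existing handlers untouched. A misclassified inquiry still lands in the wrong handler, which is why the classifier is the critical component.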

Parallelization

Runs multiple LLM operations simultaneously, then aggregates results. Two variants: sectioning (splitting a task into independent parallel subtasks) and voting (running the same task multiple times for consensus).

When to use: When subtasks are independent and can run concurrently, or when you need reliability through redundancy. Speeds up workflows where sequential processing is unnecessarily slow.

Strengths: Faster than sequential for independent work. Voting variant improves accuracy on judgment calls. Scales naturally.

Limitations: Only works when subtasks are genuinely independent. Aggregation logic can be complex. Higher compute cost than sequential.

CU example: Quarterly compliance review where policy checks across lending, BSA/AML, and fair lending run in parallel rather than sequentially, with results aggregated into one report.
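Both variants can be sketched with Python's standard `concurrent.futures`. The three policy checks are hypothetical stubs standing in for independent LLM calls; `sectioned_review` shows sectioning (different checks on the same document, aggregated in order), and `vote` shows voting (the same judgment run several times, majority kept).

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Check = Callable[[str], str]


# Hypothetical policy checks standing in for independent LLM calls.
def check_lending(doc: str) -> str:
    return "lending: ok"


def check_bsa_aml(doc: str) -> str:
    return "bsa_aml: review" if "cash" in doc else "bsa_aml: ok"


def check_fair_lending(doc: str) -> str:
    return "fair_lending: ok"


def sectioned_review(doc: str, checks: list[Check]) -> list[str]:
    """Sectioning: run independent checks concurrently, then aggregate."""
    with ThreadPoolExecutor(max_workers=max(1, len(checks))) as pool:
        # map() returns results in input order, so the report is deterministic.
        return list(pool.map(lambda check: check(doc), checks))


def vote(judge: Check, item: str, runs: int = 5) -> str:
    """Voting: run the same judgment several times and keep the majority answer."""
    with ThreadPoolExecutor(max_workers=runs) as pool:
        answers = list(pool.map(lambda _: judge(item), range(runs)))
    return Counter(answers).most_common(1)[0][0]


print(sectioned_review("quarterly memo", [check_lending, check_bsa_aml, check_fair_lending]))
# ['lending: ok', 'bsa_aml: ok', 'fair_lending: ok']
```

An odd number of voting runs avoids ties on yes/no judgments. Sectioning only helps when the checks genuinely do not depend on each other's output.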

Orchestrator-Worker

A coordinator agent breaks a complex task into subtasks, delegates each to specialized worker agents, then synthesizes the results. The orchestrator decides what work to do and in what order.

When to use: Complex tasks where the required subtasks are not predictable in advance. The orchestrator adapts the plan based on intermediate results.

Strengths: Flexible. Handles tasks where the full scope is not known at the start. Workers can be specialized for different domains.

Limitations: More complex to build and debug. The orchestrator is a single point of failure. Requires clear contracts between orchestrator and workers.

CU example: Loan application processing where an orchestrator manages income verification, credit analysis, collateral assessment, and compliance checks, adapting the workflow based on what each worker finds.
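A minimal sketch of the orchestrator-worker split, assuming simple functions as workers and a rule-based planner as a stand-in for the orchestrating model. Worker names, application fields, and the thin-file rule are hypothetical. The key property it illustrates: the orchestrator does not know the full task list up front; a collateral assessment is added only when credit analysis finds a thin file.

```python
from typing import Callable

Worker = Callable[[dict], dict]


# Hypothetical workers for the loan-application example; each returns findings.
def verify_income(app: dict) -> dict:
    return {"income_verified": app.get("income", 0) > 0}


def analyze_credit(app: dict) -> dict:
    return {"thin_file": app.get("credit_lines", 0) < 3}


def assess_collateral(app: dict) -> dict:
    return {"collateral_ok": app.get("collateral_value", 0) >= app.get("amount", 0)}


def compliance_check(app: dict) -> dict:
    return {"compliance_reviewed": True}


WORKERS: dict[str, Worker] = {
    "verify_income": verify_income,
    "analyze_credit": analyze_credit,
    "assess_collateral": assess_collateral,
    "compliance_check": compliance_check,
}


def plan_next(findings: dict, done: set[str]) -> list[str]:
    """Orchestrator: decide the next subtasks from what has been learned so far."""
    if "verify_income" not in done:
        return ["verify_income", "analyze_credit"]  # first round runs both
    if findings.get("thin_file") and "assess_collateral" not in done:
        return ["assess_collateral"]  # only a thin credit file adds this step
    if "compliance_check" not in done:
        return ["compliance_check"]
    return []  # nothing left to delegate


def orchestrate(app: dict, max_rounds: int = 10) -> dict:
    findings: dict = {}
    done: set[str] = set()
    for _ in range(max_rounds):  # round limit guards against a runaway plan
        tasks = plan_next(findings, done)
        if not tasks:
            break
        for name in tasks:
            findings.update(WORKERS[name](app))
            done.add(name)
    findings["tasks_run"] = sorted(done)  # synthesis: one combined record
    return findings


thin = orchestrate({"income": 5200, "credit_lines": 1, "collateral_value": 20000, "amount": 15000})
print(thin["tasks_run"])
# ['analyze_credit', 'assess_collateral', 'compliance_check', 'verify_income']
```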

Evaluator-Optimizer

One LLM generates output, another evaluates it against criteria, and the generator refines based on feedback. The loop continues until quality thresholds are met or iteration limits are reached.

When to use: When output quality matters more than speed and clear evaluation criteria exist. The evaluator needs well-defined standards to judge against.

Strengths: Produces higher-quality output than single-pass generation. The evaluator catches errors the generator misses. Self-improving within a session.

Limitations: Slower and more expensive. Requires clear evaluation criteria. Can get stuck in loops if criteria are ambiguous.

CU example: Compliance document review where a generator drafts regulatory responses and an evaluator checks against NCUA examination standards, iterating until the response addresses every finding.
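A minimal sketch of the evaluator-optimizer loop, assuming the evaluator returns a list of unmet criteria (empty means the draft passes). The findings and the stub generator, which fixes one flagged item per round, are hypothetical stand-ins for the regulatory-response example; a real generator and evaluator would be LLM calls.

```python
from typing import Callable

Drafter = Callable[[str, list[str]], str]   # (task, feedback) -> draft
Critic = Callable[[str], list[str]]         # draft -> unmet criteria


def evaluator_optimizer(generate: Drafter, evaluate: Critic, task: str,
                        max_iterations: int = 5) -> tuple[str, int, list[str]]:
    """Generate, evaluate, refine; stop when nothing is unmet or the limit is hit."""
    feedback: list[str] = []
    draft = ""
    for iteration in range(1, max_iterations + 1):
        draft = generate(task, feedback)
        feedback = evaluate(draft)
        if not feedback:
            return draft, iteration, []
    # Iteration limit reached: return the last draft with the unresolved items.
    return draft, max_iterations, feedback


# Hypothetical examination findings the response must address.
EXAM_FINDINGS = ["finding 1", "finding 2", "finding 3"]


def check_against_findings(draft: str) -> list[str]:
    return [f for f in EXAM_FINDINGS if f not in draft]


def make_stub_generator() -> Drafter:
    """Stand-in for an LLM generator that fixes one flagged item per round."""
    addressed: list[str] = []

    def generate(task: str, feedback: list[str]) -> str:
        if feedback:
            addressed.append(feedback[0])
        return f"{task}: " + "; ".join(addressed)

    return generate


draft, rounds, open_items = evaluator_optimizer(
    make_stub_generator(), check_against_findings, "Exam response")
print(rounds, open_items)  # 4 []
```

The iteration cap is what keeps an ambiguous criterion from looping forever: with `max_iterations=2` the function returns the unresolved findings instead of spinning.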

Autonomous Agents

Self-directed agents that plan their own approach, execute actions using tools, observe results, and adjust. They operate in an open-ended loop with access to external tools and data sources.

When to use: Open-ended problems where the solution path is not known in advance and the agent needs to explore. Requires strong guardrails and monitoring.

Strengths: Can handle genuinely novel situations. Adapts to unexpected findings. Closest to human-like problem solving.

Limitations: Hardest to control. Risk of compounding errors. Highest cost. Requires robust safety boundaries, logging, and human oversight. Not appropriate for early pilots without strong governance.

CU example: Research assistant for strategic planning that autonomously gathers peer CU data, NCUA call report trends, and market analysis, then synthesizes recommendations. Appropriate only after governance is mature.
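A minimal sketch of the guardrails the text calls for around an autonomous loop: an allow-list of tools, a hard step limit, a log of every action, and escalation to a human when a limit is hit. The scripted policy is a hypothetical stand-in for the model's own planning, and the tool names are illustrative, not a real research API.

```python
from typing import Callable

LogEntry = tuple[str, str, str]  # (tool, argument, observation)
Policy = Callable[[str, list[LogEntry]], tuple[str, str]]


def run_agent(goal: str, policy: Policy, tools: dict[str, Callable[[str], str]],
              max_steps: int = 5) -> tuple[str, list[LogEntry]]:
    """Plan-act-observe loop with an allow-list, a step cap, and a full log."""
    log: list[LogEntry] = []
    for _ in range(max_steps):
        tool, arg = policy(goal, log)  # the agent chooses its own next action
        if tool == "finish":
            return arg, log
        if tool not in tools:  # guardrail: only allow-listed tools run
            log.append((tool, arg, "BLOCKED: tool not allowed"))
            return "stopped: disallowed tool requested; escalate to a human", log
        log.append((tool, arg, tools[tool](arg)))
    return "stopped: step limit reached; escalate to a human", log


# Hypothetical read-only research tools.
TOOLS = {
    "search_peer_data": lambda q: f"peer data for {q}",
    "fetch_call_report": lambda q: f"call report trends for {q}",
}


def scripted_policy(goal: str, log: list[LogEntry]) -> tuple[str, str]:
    """Stand-in for the model's planning step."""
    if not log:
        return "search_peer_data", "peer credit unions"
    if len(log) == 1:
        return "fetch_call_report", "latest quarter"
    return "finish", f"Synthesis for '{goal}' from {len(log)} observations"


answer, log = run_agent("strategic plan", scripted_policy, TOOLS)
print(answer)  # Synthesis for 'strategic plan' from 2 observations
```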

Unknown / Not Formally Selected

The team did not formally select an architecture pattern from the Anthropic framework. The pilot was built without reference to the pattern catalog. This is diagnostic evidence about whether the per-pilot method transferred.

Eval outcomes

Iterations summary

Scale / retire decision

Scale

Evidence supports broader deployment. Rework rate below 15% baseline, cycle time improved, error-catch rate stable. Ready to expand to additional teams or workflows.

Refine

Shows promise but needs more iteration before scaling. Specific improvements identified. Continue in current scope with targeted changes.

Retire

Evidence does not support continued investment. Document what was learned and redirect resources to higher-value opportunities. Retirement is a success of the evaluation method.

No Decision

No formal scale/retire decision was made during the gap period. The pilot may still be running, stalled, or abandoned without a documented conclusion.
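The Scale option combines three conditions. A minimal sketch of checking them, assuming the pilot records before/after cycle times and the error-catch rate at the start and end of the pilot. The 5-point tolerance used to define "stable" is a hypothetical choice, not a figure from Superadditive. Refine and Retire remain judgment calls, so the sketch only reports which Scale conditions are unmet.

```python
from dataclasses import dataclass

REWORK_BASELINE = 0.15       # approximate industry baseline from Outcome Metrics
STABILITY_TOLERANCE = 0.05   # hypothetical: largest acceptable drop in error-catch rate


@dataclass
class PilotMetrics:
    rework_rate: float             # share of AI-assisted work needing correction (0-1)
    cycle_time_before_hrs: float   # typical workflow duration before AI
    cycle_time_after_hrs: float    # typical workflow duration with AI
    error_catch_start: float       # share of AI errors caught in review, early in pilot
    error_catch_end: float         # same measure at the end of the pilot


def unmet_scale_criteria(m: PilotMetrics) -> list[str]:
    """Return the Scale conditions the evidence does not yet meet (empty = all met)."""
    gaps = []
    if m.rework_rate >= REWORK_BASELINE:
        gaps.append("rework rate not below 15% baseline")
    if m.cycle_time_after_hrs >= m.cycle_time_before_hrs:
        gaps.append("cycle time not improved")
    if m.error_catch_start - m.error_catch_end > STABILITY_TOLERANCE:
        gaps.append("error-catch rate not stable")
    return gaps


print(unmet_scale_criteria(PilotMetrics(0.08, 10.0, 7.0, 0.90, 0.88)))  # []
```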

What surprised them

Scenario probes (independent pilot)

What would have to change for this pilot to fail? What external shift would make this use case irrelevant?

If the independent pilot did not happen

The gap period was designed to test whether Septapod could run a pilot independently. If no independent pilot was attempted, that is itself significant evidence about the capability transfer model. Sprint Four proceeds differently: instead of comparing two pilots side by side, the assessment focuses on why the pilot did not happen and what Sprint Four needs to change.

Why the independent pilot did not happen

The independent pilot did not happen during the gap period

Outcome Metrics (Superadditive)

Three metrics from Superadditive's "Metrics and Meat Shields." Compared side by side for the two Sprint Three pilots and the independent gap-period pilot. Do not measure token spend, prompts per day, tool usage hours, or percent of AI-generated content. Those reward the wrong behavior.

Rework Rate

Percentage of AI-assisted work that requires human correction or redo. Industry baseline is approximately 15%. Above this threshold, the AI is creating net additional work rather than reducing it.

Co-led pilot

Independent pilot
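A minimal sketch of the rework-rate arithmetic and the 15% threshold check. The function names and the item counts in the example are hypothetical; what counts as a "work item" is whatever unit the pilot already tracks.

```python
REWORK_BASELINE = 0.15  # approximate industry baseline cited above


def rework_rate(items_completed: int, items_reworked: int) -> float:
    """Share of AI-assisted work items that needed human correction or redo."""
    if items_completed <= 0:
        raise ValueError("need at least one completed item")
    if not 0 <= items_reworked <= items_completed:
        raise ValueError("reworked items must be between 0 and items completed")
    return items_reworked / items_completed


def creates_net_work(rate: float) -> bool:
    """Above the baseline, the AI is adding work rather than removing it."""
    return rate > REWORK_BASELINE


co_led = rework_rate(120, 12)        # hypothetical counts
independent = rework_rate(80, 16)    # hypothetical counts
print(co_led, creates_net_work(co_led))              # 0.1 False
print(independent, creates_net_work(independent))    # 0.2 True
```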

Cycle Time

How long the workflow takes with AI compared to before, measured as a before/after comparison of the same workflow. A pilot that increases cycle time has not yet found its shape.

Co-led pilot

Independent pilot
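A minimal sketch of the before/after comparison, assuming the team logs individual cycle times for the same workflow in both periods. Comparing medians rather than means is a design choice made here so one unusually slow case does not dominate the result; the sample hours are hypothetical.

```python
from statistics import median


def cycle_time_change(before_hrs: list[float], after_hrs: list[float]) -> float:
    """Relative change in median cycle time; negative means the workflow got faster."""
    baseline = median(before_hrs)
    if baseline <= 0:
        raise ValueError("baseline cycle time must be positive")
    return (median(after_hrs) - baseline) / baseline


# Hypothetical cycle times (hours) for the same workflow before and with AI.
change = cycle_time_change([10, 12, 14], [8, 9, 30])
print(f"{change:+.0%}")  # -25%
```

A positive result is the signal the text describes: the pilot has made the workflow slower and has not yet found its shape.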

Error-Catch Rate

Percentage of AI errors that human reviewers catch before the output leaves the workflow. A rate of zero means reviewers stopped looking, not that the AI stopped making errors. Track this to detect automation complacency.

Co-led pilot

Independent pilot
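A minimal sketch of the error-catch calculation plus a complacency flag, assuming errors that escape review are later found downstream (audits, member complaints) and counted. `MIN_REVIEW_VOLUME` is a hypothetical threshold for when zero catches becomes suspicious rather than just a small sample.

```python
from typing import Optional

MIN_REVIEW_VOLUME = 50  # hypothetical: volume at which zero catches is suspicious


def error_catch_rate(caught_in_review: int, found_downstream: int) -> Optional[float]:
    """Share of known AI errors that reviewers caught before output left the workflow."""
    known_errors = caught_in_review + found_downstream
    return None if known_errors == 0 else caught_in_review / known_errors


def complacency_warning(caught_in_review: int, items_reviewed: int) -> bool:
    """Zero catches across real volume suggests reviewers stopped looking."""
    return items_reviewed >= MIN_REVIEW_VOLUME and caught_in_review == 0


print(error_catch_rate(18, 2))        # 0.9
print(complacency_warning(0, 200))    # True
```

Returning `None` when no errors are known keeps "no data" distinct from a rate of zero, which the text treats as a warning sign rather than good news.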

Check-In Call Summary

Themes from the regular check-in calls during the gap period. Not a transcript. Patterns: what came up repeatedly, what escalated, what was never mentioned. Each row captures a theme, not a meeting.

Theme 1

Theme name

Frequency

Notes
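Rolling meeting notes up into one row per theme can be sketched as follows, assuming each call's notes are tagged with theme names and any escalated themes. The `expected` set is a hypothetical addition: themes the team expected to hear about (for example, the governance slots), so that what was never mentioned shows up explicitly.

```python
from collections import Counter


def theme_inventory(calls: list[dict], expected: set[str]) -> tuple[Counter, list[str], list[str]]:
    """Roll per-call theme tags up into one row per theme."""
    # set() counts a theme once per call, so frequency = number of calls it came up in.
    frequency = Counter(t for call in calls for t in set(call["themes"]))
    escalated = sorted({t for call in calls for t in call.get("escalated", [])})
    never_mentioned = sorted(expected - set(frequency))
    return frequency, escalated, never_mentioned


calls = [  # hypothetical tags from three check-in calls
    {"themes": ["vendor pricing", "review backlog"]},
    {"themes": ["review backlog"], "escalated": ["review backlog"]},
    {"themes": ["review backlog", "review backlog"]},
]
freq, escalated, silent = theme_inventory(calls, {"review backlog", "board reporting"})
print(freq.most_common(), escalated, silent)
# [('review backlog', 3), ('vendor pricing', 1)] ['review backlog'] ['board reporting']
```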

Champion Network Assessment

The champion network is 3-5 people across departments who provide peer-to-peer support for AI adoption. Assess gap-period activity: who stayed active, who went quiet, and which departments engaged. This assessment maps to the 35% staff-function resistance finding.

No champion network data carried forward from Sprint Three. Champions may be a new Sprint Four activity.

Champion 1

Name

Department

Activity during gap

Type of peer support provided

Champion network summary

Capability Transfer Scorecard

Five indicators of whether Septapod built internal capability during the gap period. Each indicator is rated yes, partial, or no (indicator 5 also allows N/A), with space for evidence. The scorecard tests the core premise of the engagement: that Septapod can operate independently after Sprint Four.

1. Did the team use the per-pilot method?

Anthropic agent patterns, eval-driven loop, PAIR human-side design. Did they follow the method or improvise?

Yes

Followed the full method: selected an Anthropic architecture pattern, used eval-driven development loop, applied PAIR guidelines for human-side design.

Partial

Some elements used, but not the full sequence. Common gaps: skipped evals, chose architecture by instinct rather than framework, or omitted PAIR design review.

No

Improvised without reference to the method. Indicates that the per-pilot method did not transfer during Sprint Three.

2. Did the governance model function without Brent?

The distributed accountability model (four slots: pilot oversight, vendor evaluation, annual plan refresh, board reporting). Did the named people carry their accountabilities?

Yes

All four governance slots functioned: pilot oversight, vendor evaluation, annual plan refresh, and board reporting were carried by their named accountable people.

Partial

Some slots functioned, others did not. Identify which slots worked and which went vacant. Common pattern: pilot oversight worked but board reporting lapsed.

No

The governance model did not function without Brent. Accountabilities were not carried. Indicates the model was facilitator-dependent, not internally owned.

3. Was the signal watch-list used?

Did the people assigned to watch specific signals (market shifts, regulatory changes, technology capabilities) check them on cadence? Were any trigger conditions met during the gap?

Yes

Named watchers monitored their assigned signals on the established cadence. Triggered signals were flagged and discussed. The watch-list operated as designed.

Partial

Some signals were monitored, others were neglected. Common pattern: watchers checked early then stopped, or watched easy signals while ignoring harder ones.

No

The watch-list was not used during the gap period. No signals were monitored on cadence. The list existed on paper but not in practice.

4. Did shadow-AI behavior change from the Sprint One baseline?

Sprint One's shadow-AI audit (Step 3) established a baseline. Compare the current state to that baseline. Improvement means the employee communication cadence and sanctioned-use policies took hold.

Sprint One shadow-AI baseline will load automatically.

Yes, meaningful change

Shadow-AI usage decreased measurably from the Sprint One baseline. Employee communication cadence and sanctioned-use policies took hold. People shifted to approved tools.

Some change

Directional improvement but not a clean shift. Some departments adopted sanctioned tools; others continued shadow usage. Uneven uptake across the organization.

No change

Shadow-AI behavior is unchanged from the Sprint One baseline. The employee communication cadence and policies did not alter behavior. Root cause needs investigation.

5. Did the champion network provide peer-to-peer support?

Champions are the distributed support layer. Did they help colleagues adopt AI tools, answer questions, and reduce resistance?

Yes

Champions actively helped colleagues adopt AI tools, answered questions, and reduced resistance. Peer-to-peer support was visible across departments.

Partial

Some champions provided support, others went quiet. Support was concentrated in certain departments or around specific tools rather than broad adoption.

No

Champions did not provide meaningful peer support during the gap. The network existed in name but not in practice.

N/A

No champion network was established during Sprint Three. This indicator does not apply. Sprint Four may need to build this from scratch.
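A minimal sketch of tallying the five ratings, assuming indicator 4's labels are normalized ("Yes, meaningful change" to yes, "Some change" to partial, "No change" to no) and that only indicator 5 may be N/A. The numeric weighting is a hypothetical convenience and does not replace the narrative assessment; the lists of "no" and "partial" indicators are the more useful output, since they show where Sprint Four should diagnose before scaling.

```python
SCORES = {"yes": 1.0, "partial": 0.5, "no": 0.0}  # hypothetical weighting

INDICATORS = (
    "per-pilot method",
    "governance without Brent",
    "signal watch-list",
    "shadow-AI change",        # normalized from the "meaningful / some / no change" labels
    "champion peer support",   # the only indicator that allows "n/a"
)


def summarize_scorecard(ratings: dict[str, str]) -> dict:
    missing = [i for i in INDICATORS if i not in ratings]
    if missing:
        raise ValueError(f"unrated indicators: {missing}")
    for name, rating in ratings.items():
        allowed = set(SCORES) | ({"n/a"} if name == "champion peer support" else set())
        if rating not in allowed:
            raise ValueError(f"{name}: invalid rating {rating!r}")
    scored = [i for i in INDICATORS if ratings[i] != "n/a"]
    return {
        "score": sum(SCORES[ratings[i]] for i in scored) / len(scored),
        "no": [i for i in scored if ratings[i] == "no"],
        "partial": [i for i in scored if ratings[i] == "partial"],
    }


summary = summarize_scorecard({
    "per-pilot method": "partial",
    "governance without Brent": "yes",
    "signal watch-list": "no",
    "shadow-AI change": "partial",
    "champion peer support": "n/a",
})
print(summary)
# {'score': 0.5, 'no': ['signal watch-list'], 'partial': ['per-pilot method', 'shadow-AI change']}
```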

Overall capability transfer assessment

Scenario Planning & Scaling

Build coherent futures from scenario probes accumulated during Sprint Three. Classify pilots by coupling tier. Decide what scales and what gets retired, with evidence thresholds and the 6-8 week stability constraint.

Module info: Weeks 2-4 | 11 hrs facilitator | 1.5 hrs CEO | 1.5 hrs per exec | 2 hrs internal owner

When: Weeks 2-4. Builds on the assessment from Step 1. Scenario planning and scaling decisions happen in facilitated sessions with the CEO and the executive team. Scenario probes accumulated during Sprint Three provide the raw material.

Time per person

Facilitator (Brent): 11 hrs. Scenario facilitation, scaling framework sessions, coupling tier classification, what-scales / what-retires decisions

CEO: 1.5 hrs. Scenario exercise participation, scaling decision sign-off

Each senior exec: 1.5 hrs. Scenario construction, 20-min portfolio trap scan, coupling tier input, scaling decisions for their domains

Internal owner: 2 hrs. Probe inventory curation, scenario session prep with Brent, scaling decision documentation

Each board member: Not involved in this step. Scenario planning and scaling are internal team work. Board engagement returns in Step 3 with the Annual AI Plan co-creation.

What actually happens

Brent facilitates a scenario exercise using probes accumulated from Sprint Three pilots and the independent pilot. The team constructs 2-3 coherent futures and stress-tests current AI commitments against them. Before scaling, the team runs a 20-minute portfolio trap scan to distinguish weak AI fit from deeper operating problems, shared costs, bottlenecks, and work that survives only because nobody formally ended it. Each pilot is classified by coupling tier (low/medium/high) to determine governance requirements for scaling. Scaling decisions are made with evidence thresholds and the 6-8 week stability constraint. What gets retired is documented with rationale.

Through-line

Generates: Scenario probe inventory. 2-3 constructed scenarios. Portfolio trap scan with evidence and corrective action. Per-pilot coupling tier classification. Scaling decisions with evidence and governance requirements. What-scales and what-retires lists.
Value: Scaling decisions are grounded in evidence and scenario-tested, not optimistic projections. The trap scan prevents the credit union from scaling a tool that masks an integration problem, feeds a bottleneck, or shifts costs into IT, Compliance, and frontline review. Coupling tiers determine governance intensity: low-coupling workflows scale with logging, high-coupling workflows need human-in-the-loop. The stability constraint protects staff from change fatigue.
How Septapod uses it: The what-scales list becomes the operational roadmap for the next 6-12 months. The what-retires list frees up capacity and attention. Trap responses name root-cause work and shared-cost owners that sit outside the AI portfolio. Coupling tiers determine which governance requirements apply to each scaled workflow.
Next step uses: Step 3 (Annual AI Plan) incorporates scaling decisions, scenario stress-test results, trap responses, and coupling tier classifications into the formal plan. The governance model v2.0 reflects coupling-tier governance requirements and ownership of shared costs.

Scenario Probe Inventory

Probes accumulated from Sprint Three pilots and the independent pilot. Each probe captures a condition under which current assumptions break. Probes are the raw material for constructing coherent scenarios.

No scenario probes found in Sprint Three data. Add probes manually below.

Additional probes from the gap period

Identity-Based Scenario Framework (Tighe)

Self-contained reference for running the scenario exercise with the executive team. This framework connects AI strategy decisions to organizational identity, which matters for a mission-driven credit union where values constrain what AI can do.

Framework Overview

Identity-based scenario planning starts from the premise that an organization's response to external change depends on who it believes itself to be. For Septapod, the cooperative charter and the member-ownership model are identity anchors that constrain AI strategy in ways that a conventional bank's identity does not.

Four Scenario Dimensions

Technology Trajectory

How does the AI capability landscape change? What becomes possible in 12-24 months that is not possible today? Where does the cost curve move?

Regulatory Environment

How do NCUA, state regulators, federal AI policy, and international rules such as the EU AI Act (Article 14 human-oversight requirements, August 2, 2026) change the compliance requirements for AI in financial services?

Competitive Landscape

What do peer credit unions, fintechs, and large banks do with AI? Where does competitive pressure come from? What does member expectation look like in 2-3 years?

Member Behavior

How does member interaction with AI-powered services evolve? Where does trust erode or deepen? How does the membership profile shape acceptable AI use?

Facilitator Guidance

Run the exercise in a 90-minute facilitated session with the CEO and the executive team. Start with the probe inventory from Sprint Three. Ask each exec to pick the two probes most likely to materialize in 12-18 months. Cluster the responses. Build 2-3 coherent futures from the clusters. For each future, ask: what does Septapod's current AI plan look like in that world? Where does it hold? Where does it break?
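The pick-and-cluster step can be sketched as a tally, assuming each probe is tagged with one primary scenario dimension and that clustering means grouping picks by that tag. In the live session the facilitator clusters by judgment, so this is only a starting arrangement. Executive labels, probe wording, and tags are hypothetical.

```python
from collections import Counter, defaultdict


def cluster_picks(picks: dict[str, list[str]], dimension_of: dict[str, str],
                  max_futures: int = 3) -> list[tuple[str, list[str]]]:
    """Group executives' probe picks into candidate futures, strongest cluster first."""
    for executive, chosen in picks.items():
        if len(chosen) != 2:
            raise ValueError(f"{executive} must pick exactly two probes")
    votes = Counter(p for chosen in picks.values() for p in chosen)
    clusters: dict[str, list[str]] = defaultdict(list)
    for probe in votes:
        clusters[dimension_of[probe]].append(probe)
    ranked = sorted(clusters.items(),
                    key=lambda kv: sum(votes[p] for p in kv[1]), reverse=True)
    # Within each cluster, list the most-picked probes first.
    return [(dim, sorted(probes, key=lambda p: -votes[p]))
            for dim, probes in ranked[:max_futures]]


dimension_of = {  # hypothetical probes tagged with one scenario dimension each
    "NCUA issues AI guidance": "regulatory",
    "state AI disclosure law": "regulatory",
    "core vendor ships agent APIs": "technology",
    "fintechs offer instant lending": "competitive",
}
picks = {
    "Exec A": ["NCUA issues AI guidance", "state AI disclosure law"],
    "Exec B": ["NCUA issues AI guidance", "core vendor ships agent APIs"],
    "Exec C": ["fintechs offer instant lending", "NCUA issues AI guidance"],
}
for dimension, probes in cluster_picks(picks, dimension_of):
    print(dimension, probes)
# regulatory ['NCUA issues AI guidance', 'state AI disclosure law']
# technology ['core vendor ships agent APIs']
# competitive ['fintechs offer instant lending']
```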

Scenario Construction Workspace

Build 2-3 coherent futures from the accumulated probes. Each scenario describes what changes across the four dimensions and what that means for Septapod's AI commitments.

Scenario 1

Scenario name

Technology changes

Regulatory changes

Competitive landscape

Member behavior

What this means for Septapod's AI commitments

Portfolio Trap Review

Activity: 20 minutes inside the existing scaling session. Format: an eight-trap scan that records one evidence note and one corrective decision.

AI can help a team escape an operating trap, or let the organization deepen it faster. Review the portfolio before deciding what scales. Check only traps with concrete evidence from the pilots, task mapping, vendor audit, or operating data.

Build trap

Output volume increased while user contact, verification, or learning signal fell.

Premature scaling

The organization is expanding before the workflow, review practice, or governance has stabilized.

Limits to growth

More AI work feeds a bottleneck such as core integration, data quality, review capacity, or decision authority.

Success trap

A visible winner absorbs attention and investment after its learning value has flattened.

Permanent refinance

Modernization continues indefinitely without a named operational benefit or finish condition.

Zombie portfolio

Pilots, licenses, or integrations remain alive without an owner, decision, use, or evidence of value.

Commons trap

A local gain creates shared costs for IT, Compliance, data teams, frontline staff, or members.

Shifting the burden

AI masks the root problem instead of resolving the workflow, policy, staffing, or integration issue underneath it.

Evidence for the traps checked

Corrective decision before scaling

Why this is here

Source: Adapted from John Cutler's "Learning Faster Than the Hype" presentation.

Use: The scan distinguishes AI opportunities from integration debt, buy-then-underuse patterns, local efficiency gains that create shared costs, and tools that mask a more important operating or member problem.

Time discipline: Twenty minutes inside the existing scaling session. It sharpens the scale-or-retire decision rather than creating a separate review meeting.
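A minimal sketch of the scan's two rules, assuming the session records a mapping from each checked trap to its evidence note plus one corrective decision: a trap is checked only with concrete evidence, and a corrective decision exists before the scaling discussion moves on. The trap names come from the list above; the function name and the example entry are hypothetical.

```python
TRAPS = {
    "build trap", "premature scaling", "limits to growth", "success trap",
    "permanent refinance", "zombie portfolio", "commons trap", "shifting the burden",
}


def review_trap_scan(checked: dict[str, str], corrective_decision: str) -> list[str]:
    """Return problems with the scan record; an empty list means it is complete."""
    problems = []
    for trap, evidence in checked.items():
        if trap not in TRAPS:
            problems.append(f"unknown trap: {trap}")
        elif not evidence.strip():
            problems.append(f"{trap}: checked without evidence")
    if checked and not corrective_decision.strip():
        problems.append("traps checked but no corrective decision recorded")
    return problems


scan = {"commons trap": "Compliance review queue grew during the pilot"}  # hypothetical
print(review_trap_scan(scan, "Fund compliance review capacity before expanding"))  # []
```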

Scaling Decision Framework

Per-pilot scaling assessment with coupling tier classification. Evidence thresholds are explicit: scaling requires pilot evidence, not optimism.

Coupling Tier Reference (Superadditive, "The Undo Button")

Low

Action affects one record. Other systems read it on their own schedule. Scales with logging.

Diagnostic: What else happens? Nothing immediate. Who sees results first? Only the user. Detection vs. cascade? Detection is instant, no cascade.

Medium

Action triggers downstream behavior within minutes. Agent recommends, human approves most cases.

Diagnostic: What else happens? Downstream actions trigger. Who sees results first? Multiple people or systems. Detection vs. cascade? Detection within minutes, cascade contained.

High

Real-time cascades before internal detection. Agent surfaces the decision but does not make it.

Diagnostic: What else happens? Immediate cascades. Who sees results first? External parties. Detection vs. cascade? Cascade outpaces detection.
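The three diagnostic questions can be turned into a small lookup. A minimal sketch, assuming each answer is recorded as one of the phrases above. Taking the highest tier implied by any answer is a conservative assumption made here for cases where answers disagree, not a rule stated in "The Undo Button." Identifiers are hypothetical.

```python
from enum import IntEnum


class Tier(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


WHAT_ELSE_HAPPENS = {
    "nothing immediate": Tier.LOW,
    "downstream actions trigger": Tier.MEDIUM,
    "immediate cascades": Tier.HIGH,
}
WHO_SEES_FIRST = {
    "only the user": Tier.LOW,
    "multiple people or systems": Tier.MEDIUM,
    "external parties": Tier.HIGH,
}
DETECTION_VS_CASCADE = {
    "detection instant, no cascade": Tier.LOW,
    "detection within minutes, cascade contained": Tier.MEDIUM,
    "cascade outpaces detection": Tier.HIGH,
}
OVERSIGHT = {
    Tier.LOW: "scales with logging",
    Tier.MEDIUM: "agent recommends, human approves most cases",
    Tier.HIGH: "agent surfaces the decision but does not make it",
}


def classify_coupling(what_else: str, who_first: str, detection: str) -> Tier:
    """Take the highest tier implied by any answer (conservative when answers disagree)."""
    return max(WHAT_ELSE_HAPPENS[what_else], WHO_SEES_FIRST[who_first],
               DETECTION_VS_CASCADE[detection])


tier = classify_coupling("downstream actions trigger", "external parties",
                         "detection within minutes, cascade contained")
print(tier.name, "->", OVERSIGHT[tier])  # HIGH -> agent surfaces the decision but does not make it
```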

People can absorb substantial redesign when they get 6-8 weeks of stability between changes. Sequence expansions so that each change is followed by that much stability before the next one reaches the same people.
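A minimal sketch of sequencing under the stability constraint, assuming each expansion lists the teams whose work it changes and that the 6-week lower bound of the range is the minimum gap between changes for the same team. Expansion names, team names, and dates are hypothetical. Expansions that touch different teams can go live together; a team's next change waits for its stability window.

```python
from datetime import date, timedelta

MIN_STABILITY = timedelta(weeks=6)  # lower bound of the 6-8 week range


def sequence_expansions(expansions: list[tuple[str, set[str]]], start: date,
                        stability: timedelta = MIN_STABILITY) -> list[tuple[str, date]]:
    """Assign go-live dates in priority order, keeping each team's changes spaced."""
    last_change: dict[str, date] = {}
    schedule = []
    for name, teams in expansions:
        ready = [last_change[t] + stability for t in teams if t in last_change]
        go_live = max([start, *ready])
        for t in teams:
            last_change[t] = go_live
        schedule.append((name, go_live))
    return schedule


EXPANSIONS = [
    ("inquiry routing, branches", {"branch ops"}),
    ("loan document review", {"lending"}),
    ("inquiry routing, contact center", {"branch ops", "contact center"}),
]
plan = sequence_expansions(EXPANSIONS, start=date(2025, 1, 6))
for name, go_live in plan:
    print(go_live, name)
# 2025-01-06 inquiry routing, branches
# 2025-01-06 loan document review
# 2025-02-17 inquiry routing, contact center
```

Using the upper bound instead (`stability=timedelta(weeks=8)`) moves the third go-live to 2025-03-03.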

Brent-Led Pilot

Pilot data loads from Sprint Three.

Outcome metrics load from Step 1.

Coupling tier (Superadditive, "The Undo Button")

Low

Output is reviewed and edited before it reaches anyone outside the team. If this AI disappeared tomorrow, the workflow reverts to manual with minor inconvenience. Governance implied: Team-level oversight, quarterly review.

Medium

Output feeds into processes that reach members or regulators. Human review exists, but volume or speed makes thorough review of every output impractical. Removing the AI requires process redesign. Governance implied: Manager-level oversight, monthly review, error monitoring dashboard.

High

AI is in the critical path of member-facing or compliance-critical workflows. Removing it causes operational disruption. Output is difficult to fully review before reaching its audience. Governance implied: Executive-level oversight, continuous monitoring, compliance review, documented fallback plan.

Scaling decision

Scale

Evidence supports broader deployment. Rework rate below baseline, cycle time improved, error-catch rate stable. Deploy to additional teams or workflows.

Refine

Shows promise but needs more iteration. Specific improvements identified. Continue in current scope with targeted changes before expanding.

Retire

Evidence does not support continued investment. Document what was learned and redirect resources. Retirement is a success of the evaluation method.

Evidence summary

Independent Pilot

Independent pilot data loads from Step 1.

Coupling tier (Superadditive, "The Undo Button")

Low

Medium

High

Scaling decision

Scale

Evidence supports broader deployment. Rework rate below baseline, cycle time improved, error-catch rate stable. Deploy to additional teams or workflows.

Refine

Shows promise but needs more iteration. Specific improvements identified. Continue in current scope with targeted changes before expanding.

Retire

Evidence does not support continued investment. Document what was learned and redirect resources. Retirement is a success of the evaluation method.

Evidence summary

What Scales

Workflows and pilots approved for broader deployment. Each entry includes the coupling tier, conditions for expansion, resources required, and governance requirements implied by the tier.

Item 1

Workflow / pilot name

Coupling tier

Low

Output reviewed before reaching anyone outside the team. Removable with minor inconvenience.

Medium

Feeds member-facing or regulatory processes. Thorough review impractical at volume. Removal requires process redesign.

High

Critical path for member-facing or compliance workflows. Removal causes operational disruption.

Conditions for expansion

Resources required

Governance requirements (from coupling tier)

What Gets Retired

Pilots or approaches that are retired based on evidence. Retirement is a success of the method, not a failure: it means the evaluation criteria worked and prevented bad investments from scaling.

Item 1

Pilot or approach

Reason

Insufficient Evidence

The pilot ran but did not generate enough data for a confident scale decision. More time or a different measurement approach would be needed, but continued investment is not justified.

Scenario Fragility

Works under current conditions but breaks under plausible future scenarios identified in the scenario exercise. Too fragile to bet on for the next 12 months.

Cost Exceeds Value

The measurable value (rework rate reduction, cycle time improvement, error-catch improvement) does not justify the ongoing cost of operation, monitoring, and governance.

Compliance / Fair-Lending Risk

Compliance review identified risks that cannot be mitigated within the pilot's current design. Particularly relevant for any member-facing AI with fair-lending obligations.

Superseded

A better approach emerged from another pilot, vendor improvement, or internal capability change that makes this pilot redundant. The learning still counts.

Other

A reason not captured by the categories above. Document the reason in the notes field below so it can inform future retirement decisions.

Notes

Annual AI Plan & Board

Formalize the Annual AI Plan from accumulated entries. Update the governance model with gap-period evidence. Design the board co-creation session. Formalize the champion network. Build the annual refresh mechanism. The engagement's handoff is already in progress, not a final-week task.

Module info: Weeks 3-7 | 18 hrs facilitator | 1 hr CEO | 1 hr per exec | 2.5 hrs per board member | 4 hrs internal owner

When: Weeks 3-7. The Annual AI Plan formalizes entries that accumulated during Sprint Three plus Sprint Four scaling decisions. The board co-creation session is the culminating facilitated event.

Time per person

Facilitator (Brent): 18 hrs. Plan formalization, governance model update, board session facilitation, engagement wrap, pre-close readiness check (1 hr to assemble evidence and complete the four-prereq check)

CEO: 1 hr. Plan review, governance sign-off, board session participation

Each senior exec: 1 hr. Annual Plan draft review for their domain, board co-creation session participation

Each board member: 2.5 hrs. Pre-read material review, board co-creation session participation

Internal owner: 4 hrs. Refresh mechanism ownership, champion network formalization, ongoing governance, pre-close readiness evidence (1 hr providing artifacts for the four-prereq check)

What actually happens

The Annual AI Plan has been assembling continuously since Sprint Three. Sprint Four formalizes what already exists rather than writing from scratch. The governance model is updated with gap-period evidence. The board co-creation session brings the board from signal-watchers to governance participants. The champion network is formalized with roles and mandates. The annual refresh mechanism ensures Septapod can run the cycle without Brent.

Through-line

Generates: Formal Annual AI Plan (board-ready). Governance model v2.0. Signal watch-list refresh. Board co-creation session design. Formalized champion network. Annual refresh mechanism. Engagement completion summary.
Value: Septapod has a complete, board-approved AI plan that is refresh-ready. The governance model is evidence-tested, not theoretical. The engagement succeeds if there is no follow-on engagement.
How Septapod uses it: The Annual AI Plan guides AI decisions for the next 12 months. The refresh mechanism triggers the annual update. The governance model operates independently. The board has ownership over AI governance, not just oversight.
Next step uses: There is no next step. The engagement ends with Septapod operating independently. The refresh mechanism is the successor to the engagement.

Annual AI Plan Assembly

Aggregated from entries accumulated during Sprint Three and from Sprint Four scaling decisions. Seven sections cover the full plan. Each section has a text area for the narrative Brent writes during facilitation.


1. What AI Septapod Uses and Why

2. What Septapod Does Not Use and Why

3. Scaling Roadmap

4. Resource Requirements

5. Governance Model

6. Risk Posture

7. Refresh Triggers

Governance Model v2.0

Updated from Sprint Three's governance model with gap-period evidence. Each accountability slot shows the Sprint Three original alongside the updated version.

If no governance model was carried forward from Sprint Three, build a new one here or complete Sprint Three first.

Pilot Oversight

Gap-period performance

Load-bearing

Worked as designed. The named person carried their pilot oversight responsibilities without prompting during the gap period.

Needs Adjustment

The accountability existed but did not function as designed. The person, cadence, or scope needs modification based on gap-period evidence.

Not Tested

The gap period did not produce situations that tested pilot oversight. Effectiveness remains unvalidated. Not a failure, but not evidence of success.

Updated assignee Updated accountability Evidence from gap period

Vendor Evaluation

Gap-period performance

Load-bearing

Worked as designed. The named person carried vendor evaluation responsibilities, including applying the evaluation criteria to real decisions.

Needs Adjustment

The accountability existed but did not function as designed. Vendor decisions may have been made without the evaluation framework or by the wrong person.

Not Tested

No vendor decisions arose during the gap period. The vendor evaluation accountability was not exercised. Effectiveness remains unvalidated.

Updated assignee Updated accountability Evidence from gap period

Annual Plan Refresh

Gap-period performance

Load-bearing

Worked as designed. The named person maintained the annual plan refresh process and kept it current during the gap period.

Needs Adjustment

The accountability existed but the plan was not refreshed on schedule, or the refresh was superficial rather than evidence-based.

Not Tested

The gap period was too short to trigger a planned refresh cycle. The annual plan refresh accountability was not exercised.

Updated assignee Updated accountability Evidence from gap period

Board Reporting

Gap-period performance

Load-bearing

Worked as designed. Board received AI updates on the established cadence. Board members engaged with the material rather than just receiving it.

Needs Adjustment

Board reporting happened but was perfunctory, off-cadence, or did not generate meaningful board engagement. Format or content needs rethinking.

Not Tested

No board meeting occurred during the gap period that included AI reporting, or the reporting slot was deferred. Board engagement with AI remains unvalidated.

Updated assignee Updated accountability Evidence from gap period

Fair-lending review status

Governance model narrative

Agent Identity and Oversight

A governance checklist for the point where AI stops drafting and starts acting on member accounts. As agents move into production, each one needs to be a known, accountable actor rather than an anonymous process running on a shared login. Bring this into the governance model and revisit it at each refresh.

Its own identity. Each agent has a distinct identity, separate from any employee or shared service account, so its actions trace back to it alone.
Scoped, least-privilege access. The agent can reach only the systems and data its task requires, with short-lived credentials instead of standing keys.
A tamper-evident log. Every action the agent takes is recorded in a log that cannot be altered without detection, so an examiner or auditor can reconstruct what happened.
A named human owner. One person is accountable for each agent, the way a manager is accountable for a team member.
A way to revoke it. Access can be cut in one place, quickly, if the agent misbehaves or is no longer needed.
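To make the five properties concrete, here is a minimal sketch of what an agent registry enforcing them might look like. It is hypothetical throughout: the names (`AgentRecord`, `Credential`, `AgentRegistry`, the scope strings, the 15-minute default lifetime) are illustrative assumptions, not any vendor's API. In practice a credit union would get these controls from its core and identity vendors rather than build them; the sketch only shows what "distinct identity, least-privilege scope, named owner, short-lived credentials, one-place revocation" means mechanically. Logging is left out here.

```python
from __future__ import annotations

import secrets
import time
from dataclasses import dataclass


@dataclass
class AgentRecord:
    """Hypothetical registry entry: one per agent, never shared with an employee."""
    agent_id: str             # distinct identity for this agent alone
    owner: str                # the one named, accountable human
    scopes: frozenset[str]    # the only systems/data the task requires
    revoked: bool = False


@dataclass
class Credential:
    agent_id: str
    token: str
    scopes: frozenset[str]
    expires_at: float         # short-lived: no standing keys


class AgentRegistry:
    def __init__(self, ttl_seconds: float = 900.0) -> None:
        self.ttl_seconds = ttl_seconds
        self._agents: dict[str, AgentRecord] = {}

    def register(self, agent_id: str, owner: str, scopes: set[str]) -> AgentRecord:
        # Refuse to reuse an identity, so actions trace back to one agent only.
        if agent_id in self._agents:
            raise ValueError(f"agent id already in use: {agent_id}")
        record = AgentRecord(agent_id, owner, frozenset(scopes))
        self._agents[agent_id] = record
        return record

    def issue_credential(self, agent_id: str, now: float | None = None) -> Credential:
        record = self._agents[agent_id]
        if record.revoked:
            raise PermissionError(f"{agent_id} has been revoked")
        now = time.time() if now is None else now
        return Credential(agent_id, secrets.token_hex(16), record.scopes, now + self.ttl_seconds)

    def authorize(self, cred: Credential, scope: str, now: float | None = None) -> bool:
        """Allow an action only if the agent is live, the credential is fresh, and the scope was granted."""
        now = time.time() if now is None else now
        record = self._agents.get(cred.agent_id)
        return (
            record is not None
            and not record.revoked
            and now < cred.expires_at
            and scope in cred.scopes
        )

    def revoke(self, agent_id: str) -> None:
        """One place to cut access: every credential issued to this agent stops working."""
        self._agents[agent_id].revoked = True


if __name__ == "__main__":
    registry = AgentRegistry(ttl_seconds=900)
    registry.register("loan-doc-agent-01", owner="Lending operations manager", scopes={"loan_docs:read"})
    cred = registry.issue_credential("loan-doc-agent-01", now=0)
    print(registry.authorize(cred, "loan_docs:read", now=60))          # True
    print(registry.authorize(cred, "member_accounts:write", now=60))   # False: outside scope
    print(registry.authorize(cred, "loan_docs:read", now=1000))        # False: expired
    registry.revoke("loan-doc-agent-01")
    print(registry.authorize(cred, "loan_docs:read", now=60))          # False: revoked
```

The governance question maps onto the sketch directly: for each agent, someone at the credit union should be able to point to where the vendor's equivalent of `register`, `authorize`, and `revoke` lives, and who the `owner` is.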

The practical path for a credit union is to get these from its core and identity vendors, not to build them. The governance question is whether the vendor provides them and who at the credit union confirms it. Okta for AI Agents and Microsoft Entra Agent ID are building this into the identity stack; open-source work such as Abaxx Labs' agents library shows the same building blocks (a cryptographic ID, scoped credentials, a signed audit trail) in the open.
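The "tamper-evident log" property is usually built as a hash chain: each entry stores the hash of the entry before it, so altering or removing an earlier entry breaks every later link. The sketch below uses only Python's standard library (`hashlib`, `json`); the class and field names are hypothetical. It is deliberately minimal: a real audit trail would also sign entries (the "signed audit trail" above) and anchor the latest hash somewhere the agent cannot write, because a hash chain on its own cannot detect entries silently dropped from the end.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import dataclass

GENESIS = "0" * 64  # stand-in "previous hash" for the first entry


@dataclass(frozen=True)
class LogEntry:
    agent_id: str
    action: str
    prev_hash: str
    entry_hash: str


def _digest(agent_id: str, action: str, prev_hash: str) -> str:
    # Canonical JSON (sorted keys) so identical content always hashes identically.
    payload = json.dumps(
        {"agent_id": agent_id, "action": action, "prev": prev_hash},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class AuditLog:
    """Append-only, hash-chained log of agent actions (hypothetical sketch)."""

    def __init__(self) -> None:
        self.entries: list[LogEntry] = []

    def append(self, agent_id: str, action: str) -> LogEntry:
        prev = self.entries[-1].entry_hash if self.entries else GENESIS
        entry = LogEntry(agent_id, action, prev, _digest(agent_id, action, prev))
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every link. An edited, reordered, or removed earlier entry breaks
        the chain; truncating the tail needs an externally anchored latest hash to catch."""
        prev = GENESIS
        for e in self.entries:
            if e.prev_hash != prev or e.entry_hash != _digest(e.agent_id, e.action, prev):
                return False
            prev = e.entry_hash
        return True
```

An examiner reconstructing events would run the equivalent of `verify()` first, then read the entries in order; if it fails, the log itself is the finding.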

Signal Watch-List Refresh

Carried forward from Sprint Three. For each signal: was it useful during the gap? Keep, modify, or drop. New signals from the scenario exercise can be added.

Signal 1

Signal name

Gap usefulness

Decision

Who watches

Frequency

Source

Board Co-Creation Session Design

The board has been receiving signals to watch since Sprint Three. They have enough exposure to participate in governance decisions, not just receive updates. This card designs the facilitated session, not a form the board clicks through.

Pre-Read Material

Annual AI Plan draft
Governance model v2.0 summary
Scaling decisions summary
Scenario overview

Session Structure

Block 1

Time

Activity

What the board provides

Facilitator notes

Governance decisions the board weighs in on

Total session time

Champion Network Formalization

Formalizes champion roles and mandates based on the gap-period assessment from Step 1. Connects to the governance model: champions are the distributed support layer.

If Step 1 produced no champion assessment data, complete Step 1's champion network assessment first, or build the network here.

Champion 1

Name

Department

Gap-period activity (from Step 1)

Formal role in ongoing governance

Reporting relationship

Champion network mandate

Annual Refresh Mechanism

How Septapod runs the AI Plan refresh without Brent. The engagement succeeds if there is no follow-on engagement.

Who owns the refresh

Refresh cadence

What triggers a mid-year update

Refresh input checklist

Signal watch-list review
Pilot performance data
Vendor landscape changes
Regulatory updates
Member feedback

What the board sees at each refresh

Connection to Septapod's strategic planning calendar

Vendor Assumption Sheet (Superadditive Reference)

Six evaluation questions from Superadditive's "Japanophilia and Greenhouses" for AI platform procurement. Reference material for vendor decisions that arise during scaling.

1. Coordination Philosophy

Does the vendor assume centralized or distributed coordination? A platform that assumes all AI work routes through a single team conflicts with a distributed governance model.

2. Unit of Work

What does the vendor consider a single "job"? If the unit is a chat turn, the platform is optimized for conversational AI. If it is a workflow, it fits process automation. Any mismatch between the vendor's unit and the credit union's unit creates friction.

3. Destination Assumption

Where does the vendor assume AI output goes? Into a human review queue, directly into a system of record, into a member-facing channel? The destination assumption determines governance requirements.

4. Relational Persistence

Does the platform remember context across interactions? A platform with no memory requires rebuilding context each session. A platform with persistent memory raises data governance and privacy questions.

5. Naming Metaphor

What does the vendor call its AI? "Assistant," "agent," "copilot," "advisor." The naming metaphor reveals the vendor's mental model of the human-AI relationship and shapes user expectations.

6. Exit Cost

What happens if Septapod stops using this vendor? Are prompts, workflows, and training data portable? A platform that locks in institutional knowledge creates dependency that conflicts with the annual refresh model.

Pre-Close Readiness Check

Four prerequisites must be in place before the engagement closes. Each was implied by the design; this card makes them explicit so the engagement does not close with Septapod set up to fail the capability transfer test it took in Step 1. If any prerequisite is not met, Sprint Four needs to address it before closing.

Why these four

Sprint Four's capability transfer scorecard tests whether Septapod can run AI work independently. It measured five indicators: per-pilot method used, governance functioning, signal watch-list active, shadow-AI behavior changed, and champion network operating. For those indicators to show as "Yes" rather than "Partial" or "No," four things have to be in place when the engagement closes. They are named explicitly here because anything that is only implied will not survive the gap period.

1. Named internal owner with calendar commitment

Without a single named owner who has documented time on their calendar for post-engagement work, the Annual AI Plan refresh becomes everyone's job and therefore no one's. The refresh mechanism card above identifies the owner; this prerequisite confirms the calendar commitment is real.

Internal owner named (matches the refresh owner in the Annual Refresh Mechanism card)
Calendar commitment documented (a specific weekly or monthly time allocation for AI work)
Authority and reporting line confirmed (who they escalate to, who they have authority over)

Evidence and notes

2. Governance cadence ran 2+ cycles before the gap

The Distributed Governance model (Sprint Three Step 1) and its v2.0 update (Sprint Four Step 3) remain paper constructs until the cadence runs. Two completed cycles under Brent's eye are the minimum needed to know the model handles real decisions. One cycle is not enough; the second cycle reveals what the first missed.

First governance cycle completed during Sprint Three (with documented decisions)
Second governance cycle completed during late Sprint Three or Sprint Four (with documented decisions)
Cadence is now a recurring meeting on Septapod's calendar, not an ad-hoc convening

Evidence and notes

3. Champion network of 3-5 people with documented support activities

The Champion Network Formalization card above names the people and roles. This prerequisite confirms the support activities are real: each champion has done something visible during the engagement, not just received the badge.

3-5 champions named (matches the Champion Network Formalization card above)
Each champion has at least one documented support activity from Sprint Three or Sprint Four
Mandate is written down and signed off by the AI Taskforce or governance lead

Evidence and notes

4. Signal watch-list with named watchers who completed one monitoring cycle

The Signal Watch-List Refresh card above keeps the list current. This prerequisite confirms the watchers are practiced: each named watcher has run one monitoring cycle (checked their signal at the assigned frequency, reported what they saw, and recorded whether any action was taken). A watcher who has never reported has not actually been a watcher.

Signal watch-list has named watchers for every active signal
Every watcher has completed at least one monitoring cycle with a documented report
Watch-list review is scheduled into the refresh cadence (not just "annual")

Evidence and notes

Overall pre-close readiness

Notes on the overall assessment
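The four prerequisites reduce to a few yes/no checks plus two thresholds (at least two governance cycles; three to five champions). A minimal sketch, assuming the evidence has already been gathered into a simple record: the field names are hypothetical stand-ins for what the card captures, and the function only reports which prerequisites are unmet. It does not replace the facilitator's judgment on whether the evidence itself is credible.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Champion:
    name: str
    documented_activities: int  # support activities from Sprint Three or Sprint Four


@dataclass
class Watcher:
    signal: str
    name: str | None            # None: the signal has no named watcher
    completed_cycles: int       # monitoring cycles with a documented report


@dataclass
class ReadinessEvidence:
    owner_named: bool
    calendar_commitment_documented: bool
    authority_confirmed: bool
    governance_cycles_completed: int
    cadence_is_recurring: bool
    champions: list[Champion] = field(default_factory=list)
    mandate_signed_off: bool = False
    watchers: list[Watcher] = field(default_factory=list)  # one per active signal
    watchlist_review_scheduled: bool = False


def unmet_prerequisites(ev: ReadinessEvidence) -> list[str]:
    """Return the prerequisites not yet met; an empty list means ready to close."""
    gaps: list[str] = []
    if not (ev.owner_named and ev.calendar_commitment_documented and ev.authority_confirmed):
        gaps.append("1. Named internal owner with calendar commitment")
    if ev.governance_cycles_completed < 2 or not ev.cadence_is_recurring:
        gaps.append("2. Governance cadence ran 2+ cycles before the gap")
    if not (
        3 <= len(ev.champions) <= 5
        and all(c.documented_activities >= 1 for c in ev.champions)
        and ev.mandate_signed_off
    ):
        gaps.append("3. Champion network of 3-5 people with documented support activities")
    if not (
        ev.watchers
        and all(w.name and w.completed_cycles >= 1 for w in ev.watchers)
        and ev.watchlist_review_scheduled
    ):
        gaps.append("4. Signal watch-list with named watchers who completed one monitoring cycle")
    return gaps
```

Anything the function returns is, in the card's terms, something Sprint Four needs to address before closing.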

Engagement Completion Summary

What the full engagement produced across all four sprints. What Septapod now owns and operates independently. Conditions under which Brent would return.

What the engagement produced

What Septapod now owns and operates independently

Conditions under which Brent would return

Sprint Four Summary
