Jonathan Hawkins/Builder of AI-native products
I turn frontier model capabilities into products people can actually use.
20+ years shipping games. Two-time Game of the Year winner on God of War and Eclipse: Edge of Light. Now solo-founding Aligned Tools at the edge of agentic AI.
01—Featured work
Four case studies.
Projects that show the through-line: take a frontier capability, ship a product around it, fast.
Aligned Tools
Your company's brain: listens to every meeting, remembers what your team decides, files the tickets they confirm.
Aligned is built to get smarter with age. It holds the patterns no one person can (recurring blockers, scattered expertise, decisions that keep resurfacing), so the longer a team uses it, the more their company remembers.
- Problem
- Engineering managers lose 30 to 60 minutes per meeting turning decisions into tickets, then spend the week reconciling the same task across Jira, Linear, Asana, Notion, and Monday. Action items slip, ownership is ambiguous, and the same decisions get re-litigated three sprints later because nobody remembers.
- Constraint
- Solo founder, B2B SaaS, in active enterprise sales. Must clear enterprise security review and slot into the customer's existing stack. Switching tools is off the table.
- Move
Aligned listens across the surfaces work actually happens on (Zoom, Meet, Teams, Slack, Gmail, Calendar), then runs every transcript through an AI pipeline that extracts decisions, commitments, and ownership into a diff-style review UI where every human edit is captured.
The shape of the system:
- Company Brain. A pgvector memory layer that reads across meetings, so Aligned can say “you decided this three sprints ago” before the debate starts over.
- Agent frameworks in production. A CrewAI sprint planner for capacity-aware prioritization, rebuilt on a user-scoped Mem0 memory layer: CrewAI's default memory is process-local with no tenant partitioning, which leaks one customer's retrieval context into the next on shared FastAPI workers. A CI check greps every commit so nobody switches back to the default memory. Alongside it, a streaming email-triage pipeline and computer-use agents (OpenClaw + VNC) that drive full Ubuntu desktops: terminal, files, any GUI app.
- Real-time voice agent. A LiveKit speech-to-speech agent (xAI Grok primary, OpenAI fallback) exposes ~60 backend tools, with an optional LemonSlice video-avatar integration. A fail-closed, multi-layer HMAC auth chain with nonce replay protection and a heartbeat watchdog (covering 75 of 86 voice routes) means a leaked service token alone can't impersonate a user.
- Bidirectional ticket sync. Jira, Linear, Asana, Monday, Notion, and GitHub: per-provider mapping for project, status, priority, and issue-type, and a status that only reads “synced” when zero errors fired. Round-trip sync is 5× the work of write-only, but the customer keeps their tool of choice and Aligned becomes the connective tissue underneath.
- SOC 2 CC7-style controls (not a certification). SSO-only, per-tenant isolation, AES-256-GCM versioned token encryption with key rotation, AWS Secrets Manager for per-tenant webhook secrets, hash-chain audit logs, fail-closed rate limiting, and dual TS+Python environment validators that block startup on missing or malformed config.
- Outcome
- MSA and two SOWs in negotiation with a name-brand AAA game studio after passing CIO-level security review on the first pass. In live tenants, meetings become human-confirmed Jira issues (name-to-ID resolution, undo, rollback) that trace back to the transcript that produced them, and tasks sync both ways across six providers. The WebSpatial build of Aligned won Best Technical Excellence at TechWeek 2026.
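To illustrate the hash-chain audit logs named in the controls above, here is a minimal sketch. It is not Aligned's implementation. The function names, the entry layout, and the all-zero genesis value are assumptions made for the example. Each entry's hash covers the previous entry's hash plus the event, so editing any earlier record breaks every hash after it.

```python
import hashlib
import json

# Hypothetical starting value used as the "previous hash" of the first entry.
GENESIS_HASH = "0" * 64


def entry_hash(prev_hash: str, event: dict) -> str:
    """Hash the previous entry's hash together with the event's canonical JSON."""
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_event(log: list, event: dict) -> None:
    """Append an event, linking it to the hash of the entry before it."""
    prev_hash = log[-1]["hash"] if log else GENESIS_HASH
    log.append(
        {"event": event, "prev_hash": prev_hash, "hash": entry_hash(prev_hash, event)}
    )


def verify_chain(log: list) -> bool:
    """Recompute every hash from the start; any edited entry makes this False."""
    prev_hash = GENESIS_HASH
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```

A verifier only needs the log itself: it replays the chain from the genesis value and compares hashes, so tampering is detectable without trusting whoever stored the records.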
VibeView
Spatial multi-agent orchestration for Claude Code
- Problem
- Run five Claude Code agents in parallel and the terminal turns to soup: output interleaves, work in progress is invisible until something breaks, and there's no shared view of who is doing what against the plan.
- Constraint
- 28 hours, solo, at the SenseAI Hackademy. Cross-platform spatial UI from scratch.
- Move
- VibeView turns your editor into a room. Each Claude Code agent gets a floating glass window with live tmux output and parsed model/token/cost telemetry; a shared spatial kanban above them is fed by Claude Code's own ~/.claude/tasks/ files. Hold-to-talk voice routes commands to any agent, and when an agent goes idle a Python bridge auto-summarizes its terminal with GPT-5-mini and speaks the summary back through a per-agent ElevenLabs voice. The hard call was WebSpatial. It ships to Apple Vision Pro, desktop browsers, and PICO 4 Ultra from one bundle, but the docs are sparse, so I had to reverse-engineer the renderer from runtime behavior.
- Outcome
- Shipped end-to-end in 28 hours, demoed live driving a 5-agent swarm against a real codebase. Won the WebSpatial track at SenseAI Hackademy. Open-sourced post-event.
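The idle-triggered summary described in the Move can be pictured as edge detection on each agent's state. The sketch below is an assumption about the general shape, not VibeView's bridge code. The class name, the "busy"/"idle" state strings, and the callback are hypothetical. In a real bridge, the callback would be where summarization and speech are triggered.

```python
from typing import Callable, Dict


class IdleWatcher:
    """Fires a callback once per busy -> idle transition, tracked per agent."""

    def __init__(self, on_idle: Callable[[str], None]) -> None:
        self._on_idle = on_idle
        self._last_state: Dict[str, str] = {}

    def update(self, agent_id: str, state: str) -> None:
        """Record the agent's latest state and fire only on a busy -> idle edge."""
        previous = self._last_state.get(agent_id)
        self._last_state[agent_id] = state
        if state == "idle" and previous == "busy":
            self._on_idle(agent_id)
```

Reacting to the transition rather than to the idle state itself means an agent that stays idle is summarized once, not on every poll.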
SkillVault Desktop
Making Claude Code's invisible config visible
- Problem
- Claude Code's power lives in env vars, hooks, slash commands, and skills, but they're scattered across files, undocumented in-product, and impossible to compare across projects. New users hit a wall, and power users rebuild the same scaffolding for every project.
- Constraint
- Consumer-facing desktop app, must feel native on macOS, ship a marketplace from day one.
- Move
- SkillVault Desktop is a single pane that reads your Claude Code config across every project, surfaces what is actually wired up, and lets you install vetted skills/hooks/commands from a marketplace with one click. The part I sweated most was the install diff: when you install a skill, you see exactly which files change, with rollback. I built it on Tauri so it stays a 12 MB binary instead of a 200 MB Electron app, and the marketplace is just signed manifests in a public repo so anyone can publish without a backend.
- Outcome
- Public beta launched with 40+ community-contributed skills in the first week. Average user activates 6 skills in their first session.
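The install diff with rollback described in the Move can be sketched in a few functions. This is an illustration in Python for brevity (the app itself is built on Tauri), not SkillVault's code. The function names and the file-content mapping are hypothetical. The idea is to compute the change set before writing anything, then snapshot the old contents as the changes are applied.

```python
from pathlib import Path
from typing import Dict, Optional


def plan_install(root: Path, new_files: Dict[str, str]) -> Dict[str, str]:
    """Return {relative_path: 'add' | 'modify'} for files the install would change."""
    plan = {}
    for rel_path, content in new_files.items():
        target = root / rel_path
        if not target.exists():
            plan[rel_path] = "add"
        elif target.read_text() != content:
            plan[rel_path] = "modify"
    return plan


def apply_install(root: Path, new_files: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Write only the changed files; return the prior contents for rollback."""
    snapshot: Dict[str, Optional[str]] = {}
    for rel_path in plan_install(root, new_files):
        target = root / rel_path
        # None marks a file that did not exist before the install.
        snapshot[rel_path] = target.read_text() if target.exists() else None
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(new_files[rel_path])
    return snapshot


def rollback(root: Path, snapshot: Dict[str, Optional[str]]) -> None:
    """Restore modified files and delete added ones (empty directories are left behind)."""
    for rel_path, old_content in snapshot.items():
        target = root / rel_path
        if old_content is None:
            target.unlink(missing_ok=True)
        else:
            target.write_text(old_content)
```

Because `plan_install` has no side effects, the same result can back both the preview shown to the user and the write step.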
Glassbox
A glass cockpit for a coding agent swarm, with scores it can't fake.
- Problem
- Agent swarms have two problems: you can't watch them work, and you can't trust what they tell you. Parallel output interleaves into noise, and when an agent claims its code is correct, usually nothing is checking.
- Constraint
- WeaveHacks 4: one weekend, solo, empty repo. One rule: no theater. The agents write the code themselves, every step shows on the board as it happens, and the score comes from building and running the thing. Nothing gated, nothing hardcoded.
- Move
Glassbox puts a fixed crew (planner, coordinator, four workers, validator, improver) on a live tldraw board fed by a Redis event stream, so every decompose, handoff, and verify lands on screen the moment it happens. The same crew runs two ways: as live Claude Code or Codex sessions you supervise from the command center, or headless against a graded benchmark.
The shape of the system:
- One engine, eight loop shapes. The engine never changes (decompose, dispatch, verify). A loop is that engine plus a stop condition: Land stops when the goal verifies, Climb when a metric plateaus, Sweep when a backlog drains, Race when a judge picks a winner.
- No fake progress. Workers author each edit with W&B Inference against the validator's failing cases and keep a change only if the built artifact scores higher. Agent Mail carries the handoffs, Beads tracks the dependency graph, and nothing advances on a timer.
- It improves itself. The benchmark ports a BPE tokenizer to Rust, scored by an exact token-ID diff against tiktoken. Between versions, the improver reads the failing cases back from Weave and rewrites the planner skill, so accuracy climbs without anyone touching the swarm code.
- Point it at your own repo. A task is just a goal, a workspace, and a checkable evaluator. Give Glassbox a repo and a test command and it clones into a disposable sandbox, finds the failing tests, and fixes them. Your source is never touched.
- Outcome
- Built in a weekend from an empty repo. The graded Climb took the Rust tokenizer from 0.17 to a perfect 1.00 token-ID match against tiktoken, then a separate Python library task from 0.52 to 1.00, with zero swarm code changed between the two. On the live side, a Sweep drained a four-file backlog in about eight minutes and tore itself down, and a Climb cut tokenizer latency from 269 to 141 ms.
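The "one engine plus a stop condition" idea from the Move can be sketched as a single loop that takes a stop predicate. This is a simplified illustration, not Glassbox's engine. The function names, the `patience` parameter, and the `max_rounds` safety cap are assumptions. Here `step` stands in for one full decompose, dispatch, verify round that returns a score.

```python
from typing import Callable, List

StopCondition = Callable[[List[float]], bool]


def run_loop(
    step: Callable[[], float], should_stop: StopCondition, max_rounds: int = 100
) -> List[float]:
    """Run rounds until the stop condition fires; max_rounds is an assumed safety cap."""
    scores: List[float] = []
    for _ in range(max_rounds):
        scores.append(step())
        if should_stop(scores):
            break
    return scores


def land(target: float = 1.0) -> StopCondition:
    """Land-style stop: finish as soon as the latest score reaches the target."""
    return lambda scores: scores[-1] >= target


def climb(patience: int = 2) -> StopCondition:
    """Climb-style stop: finish when the last `patience` rounds bring no new best."""

    def stop(scores: List[float]) -> bool:
        if len(scores) <= patience:
            return False
        return max(scores[-patience:]) <= max(scores[:-patience])

    return stop
```

Under this framing a new loop shape is only a new predicate. A Sweep, for example, could stop when a backlog count reaches zero without any change to `run_loop`.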
02—Supporting work
More things I've shipped.
Side projects, weekend builds, and open-source experiments. Most ship a real product, even the small ones.
- 01
LabFork
A GitHub for agentic research. Fork labs, run agents on papers.
Agents, Research, Next.js
- 02
voxherd
Voice-first multi-instance Claude Code controller for iOS.
iOS, Voice, Claude Code
- 03
TinkerSchool
Open-source AI homeschool platform for K-6 with M5StickC.
Education, Hardware, Open source
- 04
Patina
A C++ → Rust conversion of the Godot engine.
Rust, Game engine, Godot
03—About
A little context.
20+ years in games. Started as an intern at Sony Santa Monica and rose to Lead Level Designer over a decade, shipping God of War 1, 2, and 3 along the way. Was one of four hand-picked leads on a new unannounced AAA IP, managing 15+ designers.
In 2014 I founded White Elk Studios and recruited a small team of God of War franchise veterans to build Eclipse: Edge of Light, a VR adventure that won three Mobile VR Game of the Year awards (UploadVR, Daydream District, VR Fest 2018) and shipped to 8 platforms including PS VR, Oculus Quest, Steam, and Nintendo Switch.
In 2011 I founded GameDevDrinkUp, a monthly mixer that virally scaled from LA to 20+ cities worldwide, sponsored by Twitch.
Now solo-founding Aligned Tools. I take frontier model capabilities the week they ship and turn them into products non-engineers can use the same week.