Shipgate / Product Handbook / Internal

Every pull request is a risk decision. We make it fast.

This handbook is the foundation for everyone joining Shipgate. It explains what we are building, why it matters, who it serves, and how it works. Engineering, marketing, sales, and design should all start here before anything else.

Shipgate is a runtime-aware, schema-aware, security-aware merge gate. It understands the entire repository, runs your code changes in a sandboxed environment, validates database migrations on an isolated database branch, catches security vulnerabilities, and detects low-quality AI-generated code. Every pull request gets a verdict before any human reviewer opens the diff.

Why This Matters Now
$32B developer tooling market by 2028
77% of developers now use AI coding assistants
1.7x more bugs found in AI-generated code than in human-written code, per CodeRabbit's own research
0 competitors that run your code, validate your migrations, and detect AI-generated code quality issues in a single check
01

The Problem

Why the way we review code today is fundamentally broken

For most of software history, code review was a human-scale problem. A developer writes a change, a colleague reads it, and together they decide whether it is safe to ship. This works when teams are small and the pace of changes is manageable. It breaks down fast when both of those things stop being true.

AI coding tools have fundamentally changed the equation. Developers using tools like Cursor, GitHub Copilot, or Lovable can now produce ten times more code than before. That sounds like a win. The problem is that the reviewers are still human. The volume of code going out has exploded, but the human capacity to carefully evaluate it has not moved at all.

Meanwhile, the tools that are supposed to catch problems were built for a different era. They check for syntax errors (whether your code is grammatically correct) and style violations (whether you used the right formatting). They do not understand what a change actually does to a running system, how far its effects reach, or whether the AI that generated it made up a function that does not actually exist anywhere.

Code volume is exploding

AI tools mean developers can propose code changes far faster than before. More PRs per day, more files changed per PR, and more surface area for mistakes to hide. Human reviewers simply cannot keep up with the pace. Something has to give, and it is usually the thoroughness of the review.

PR = Pull Request. This is the package of code changes a developer proposes to add to the shared codebase. Think of it as a formal request to merge your edits into the main project.

Reviews focus on style, not risk

Most reviewers, human or AI, spend their attention on formatting, variable naming, and minor style preferences. The real question is whether this change could break something important or open a security vulnerability. That question rarely gets a serious answer, because answering it requires understanding the entire codebase, not just the few lines that changed.


Security tools run too late

Traditional security scanners are typically run after code is already merged into the main branch and deployed to production, meaning it is live on servers that real users are hitting. By the time they find a problem, the vulnerability is already in the wild. Fixing it then requires pulling engineers away from other work, writing a patch, going through the entire release process again, and hoping no one found the hole in the meantime. Catching the same issue at review time costs a tiny fraction of that.

Merged = approved and permanently added to the shared codebase. Deployed = shipped to live production servers that real users interact with.

AI-generated code brings new failure modes

AI coding assistants produce code that looks correct on the surface but frequently is not. They confidently invent the names of libraries and functions that do not exist anywhere. They copy patterns from their training data without understanding whether those patterns fit the specific codebase they are writing for. They introduce subtle logic errors that every syntax checker will happily pass. And no existing review tool was built to detect any of this, because they all predate the era of AI-generated code.

AI slop = code produced by AI tools that looks plausible but contains hidden errors, invented references to things that do not exist, or patterns copied without understanding the context they are being pasted into.

Small changes can have large, invisible consequences

A two-line change to an authentication function can silently break ten other parts of the system that depend on it. A reviewer looking only at the changed lines has no way to know that without a comprehensive map of the entire codebase. No human reviewer consistently holds that map in their head for every file in every project they review.

Authentication = the part of a software system responsible for verifying who a user is before letting them in. Vulnerabilities here are among the most serious in any application.

Engineering leaders have no visibility into risk

CTOs and team leads have no consistent, data-driven view of how risky their incoming code changes are, which contributors are introducing the most problems, or whether quality is trending up or down across their repositories over time. Every decision they make is based on instinct and individual memory rather than reliable, systematic data.

The Core Gap No One Has Filled

There is no fast, PR-level system that answers the question engineering teams actually need answered: "If this change is wrong, what breaks, how far does the damage spread, and how risky is it to ship right now?" That is exactly what Shipgate answers, automatically, on every single pull request.

02

The Solution

What Shipgate does and why the approach is different

Shipgate installs on your GitHub repository and becomes a required check on every pull request. When a developer opens a PR, Shipgate runs automatically. It analyses the proposed change against the full context of the codebase, measures how far the change's effects reach, scans for security vulnerabilities, checks for the telltale signs of low-quality AI-generated code, and produces a clear risk verdict before any human reviewer has even opened the diff.

The critical framing here is risk engine. Shipgate is not a linter (a tool that enforces formatting rules). It is not a spellchecker for code. It exists to answer one question: is this change safe to ship? Everything it does flows from that single goal.

The result for reviewers is a structured risk summary that replaces 40 minutes of uncertain manual analysis with 5 minutes of confident, data-backed decision-making. They see exactly what the change does, how far it reaches, what security issues were found, and a clear recommendation on whether to block or allow the merge.

Capability 01

Full Codebase Understanding

Most review tools only read the lines you changed. Shipgate first reads and maps the entire repository so it understands what those changed lines affect across the whole system. It knows how every file relates to every other file before it evaluates anything.

Capability 02

Blast Radius Analysis

Before you merge, Shipgate maps exactly how far the change ripples outward: which other files depend on it, which services are affected, and whether any critical parts of the system sit in the blast zone. This is the core differentiator no competitor offers.

Capability 03

Security Scanning

Runs security checks on every PR against the OWASP Top 10, which is the global standard checklist of the ten most critical and commonly exploited security risks in software. Catches SQL injection, exposed passwords, vulnerable dependencies, and more before they reach users.

Capability 04

AI Slop Detection

Identifies the distinctive failure patterns of AI-generated code: invented library names that do not exist, meaningless placeholder variables, excessive copy-pasted boilerplate, and code that is structurally inconsistent with how the rest of the codebase was written. No other product does this.

Capability 05

Merge Enforcement

Integrates as a required check on GitHub. When a PR crosses a risk threshold, the merge button stays disabled until the issues are resolved. Security becomes a non-negotiable gate, not a guideline that gets skipped when deadlines tighten.

Capability 06

Regression Detection

Catches behavioral regressions that no syntax checker can find: a condition that was accidentally inverted, a transaction boundary that was quietly removed, an exported function whose signature changed in a way that silently breaks every caller. These are the bugs that only show up in production.

Capability 07

Execution Validation

Spins up a real, sandboxed copy of the application using Daytona, runs the existing test suite against the PR branch, and compares the results against the base branch. If the PR introduces test failures that were not there before, the merge is blocked before a single human reviewer has to read a line of code.

Capability 08

Migration Safety

Spins up an ephemeral database branch using Neon, applies the PR's database migration, inserts synthetic test data, replays real query patterns from the codebase, and checks whether any data was lost, truncated, or corrupted. Catches data-loss migrations before they ever touch a production database row.

03

About the Product

How Shipgate works from first pull request to final verdict

Shipgate is installed as a GitHub App. A GitHub App is a piece of software that you connect to your repository in a single click, the same way you might install a browser extension. Once installed, it automatically receives a notification every time a developer opens, updates, or closes a pull request on that repository. No developer needs to change their workflow. No new commands to run. Shipgate operates silently in the background and posts its findings directly onto the PR.

Here is the complete step-by-step of what happens from the moment a PR is opened to the moment a reviewer sees the results.

01

Developer opens a Pull Request

A developer proposes a code change by opening a pull request on GitHub. This action immediately triggers a webhook notification to Shipgate. A webhook is an automatic notification sent by GitHub the instant a specific event occurs, without anyone having to press a button or run a command. Think of it as a doorbell that rings automatically when a package arrives.
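As a rough illustration of what receiving that notification involves, the sketch below checks the signature GitHub attaches to each webhook delivery (the `X-Hub-Signature-256` header, an HMAC-SHA256 of the raw request body) and decides whether the event should start an analysis. The function names and the set of actions that trigger analysis are assumptions made for this example, not a description of Shipgate's internals.

```python
import hashlib
import hmac


def signature_is_valid(secret: bytes, body: bytes, header_value: str) -> bool:
    """Check GitHub's X-Hub-Signature-256 header against the raw request body."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking timing information during the comparison.
    return hmac.compare_digest(expected, header_value)


def should_analyse(event: str, payload: dict) -> bool:
    """Return True for pull request events that change the code under review."""
    # Assumption for this sketch: only these pull_request actions start a run.
    # "synchronize" is the action GitHub sends when new commits are pushed.
    return event == "pull_request" and payload.get("action") in {
        "opened",
        "synchronize",
        "reopened",
    }
```

Rejecting deliveries with a bad signature before doing any work keeps anyone who discovers the webhook URL from triggering analysis runs with forged payloads.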

02

Shipgate indexes the repository

Shipgate reads the full codebase and builds an internal map of how everything connects. It identifies which files import (use code from) which other files, parses the code with tree-sitter (a parsing library that, like the front end of a compiler, turns source text into a syntax tree instead of treating it as raw text), and assembles a dependency graph showing how every part of the system is connected to every other part.

This map is kept continuously updated, not rebuilt from scratch every time. When new code is pushed, only the changed portions of the map are refreshed, which is what keeps analysis fast even on large codebases.

Plain English: Imagine drawing a detailed map of every road in a city and marking which roads depend on which bridges. If a bridge closes, you instantly know every road that is affected. That is exactly what Shipgate builds for your codebase, and it consults that map before evaluating any proposed change.
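To make the idea of a dependency map concrete, here is a minimal sketch that builds a "who imports whom" graph for a tiny in-memory repository. It uses Python's built-in `ast` module as a stand-in for tree-sitter, and the module names and the `build_reverse_graph` helper are invented for illustration.

```python
import ast
from collections import defaultdict


def imported_modules(source: str) -> set[str]:
    """Return the top-level module names a Python source string imports."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            found.add(node.module.split(".")[0])
    return found


def build_reverse_graph(files: dict[str, str]) -> dict[str, set[str]]:
    """Map each in-repo module to the set of in-repo modules that import it."""
    dependents = defaultdict(set)
    for name, source in files.items():
        for target in imported_modules(source):
            # Ignore third-party and standard-library imports in this sketch.
            if target in files and target != name:
                dependents[target].add(name)
    return dict(dependents)


# Hypothetical three-module repository: module name -> source text.
repo = {
    "auth": "import hashlib\n",
    "billing": "from auth import check\nimport json\n",
    "api": "import billing\nimport auth\n",
}
graph = build_reverse_graph(repo)
```

Storing the graph in reverse (from a module to its dependents) is a convenient shape for the next step, because the question asked about a changed file is "who depends on this?".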
03

Blast Radius is calculated

Shipgate traces the full downstream impact of the proposed changes. It identifies every file, function, and service that is directly or indirectly affected by what the PR modifies. It classifies the change by type: feature addition, refactor, security patch, configuration change, or dependency update. It specifically flags if any sensitive areas of the system are in the blast zone, including authentication (the login and access control system), payment processing, database schema (the structure of how data is stored), or public-facing API endpoints (the connection points that other systems or users call from outside).

All of this feeds into a blast radius risk level of Low, Medium, High, or Critical. This level is used as a multiplier in the unified risk score, where it scales each dimension score, including the Security Score.

Plain English: If you change the lock on a front door, Shipgate tells you how many rooms that door leads into, whether any of them are vaults, and whether the new lock is weaker than the old one. A change in a rarely-visited back room is a very different risk than the same change at the main entrance. Blast Radius captures that distinction automatically.
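Tracing downstream impact can be pictured as a breadth-first walk over the dependency map, starting from the changed files. The sketch below shows that walk plus a toy classification step. The thresholds and the rule that any sensitive module in the blast zone means Critical are assumptions chosen for the example, not Shipgate's actual rules.

```python
from collections import deque


def downstream(dependents: dict[str, set[str]], changed: set[str]) -> set[str]:
    """Return every module that directly or indirectly depends on a changed one."""
    seen = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    # Report only the affected modules, not the changed ones themselves.
    return seen - changed


def risk_level(affected: set[str], sensitive: set[str]) -> str:
    """Classify blast radius. Thresholds are illustrative only."""
    if affected & sensitive:
        return "Critical"
    if len(affected) >= 20:
        return "High"
    if len(affected) >= 5:
        return "Medium"
    return "Low"


# Hypothetical graph: module -> modules that import it.
dependents = {"auth": {"billing", "api"}, "billing": {"api"}, "utils": {"auth"}}
affected = downstream(dependents, {"utils"})
```

In this toy graph a change to `utils` reaches `auth`, `billing`, and `api`, even though only `auth` imports it directly.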
04

Security engine runs in parallel

Three categories of security checks run simultaneously on the changed code. First, static analysis using a tool called Semgrep scans the code for patterns that match the OWASP Top 10, the globally accepted list of the ten most critical and most commonly exploited security vulnerabilities in web software. This covers SQL injection (tricking a database with malicious commands), cross-site scripting (injecting code that runs in another user's browser), broken authentication (flaws in the login and session management system), and seven other equally serious categories.

Second, secrets detection scans every line for accidentally committed credentials: API keys (passwords used by software to access external services), database connection strings (the address and password for connecting to a database), and private keys (cryptographic credentials that, if exposed, give an attacker the ability to impersonate the server).

Third, dependency scanning checks every new or updated package in the project against OSV, an open-source vulnerability database maintained by Google and continuously updated with newly discovered security problems in popular libraries.

Plain English: SQL injection is when an attacker types something like "DROP TABLE users;" into a search box and your database obeys. It sounds simple because it is, and it is one of the most common ways real systems get breached. Cross-site scripting is when an attacker gets their own code to execute inside another logged-in user's browser. Shipgate catches both of these, and eight other equally serious risk categories, before they ever reach production.
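The difference between an injectable query and a safe one is easiest to see side by side. The sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table and the malicious input are invented for the example, and it uses an always-true `OR` clause because `sqlite3` executes only one statement per `execute` call.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])

malicious = "nobody' OR '1'='1"

# Vulnerable: the input is pasted into the SQL text, so the quote in the
# input closes the string literal and the OR clause matches every row.
unsafe_rows = conn.execute(
    f"SELECT name FROM users WHERE name = '{malicious}'"
).fetchall()

# Safe: the driver binds the input as a value, never as SQL syntax,
# so the query looks for a user literally named "nobody' OR '1'='1".
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (malicious,)
).fetchall()
```

The vulnerable query returns both users, while the parameterised one returns nothing. This is the kind of fix the sample report's "Replace with parameterised query" suggestion refers to.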
05

AI slop detection runs

Shipgate compares the code in the PR against the established conventions, style, and patterns of the rest of the repository. Several specific checks run in parallel: hallucinated import detection cross-references every library and function name in the changed code against public package registries to verify they actually exist. Placeholder detection flags variables with meaningless names like "data", "temp", "result", or "handler" that suggest the code was generated rather than written with intent. Style deviation analysis flags sections of code that are structurally inconsistent with how the rest of the project was written.

Plain English: When you ask an AI coding tool to write a function, it sometimes confidently invents the name of a library that does not exist anywhere. The code it writes looks completely plausible, passes every syntax check, and then crashes at runtime because it references a dependency that was never real. Shipgate catches this before it merges because it checks every single import against the actual registries of available packages.
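A simplified version of hallucinated import detection looks like the sketch below: collect every absolute import in the changed code and report the ones that are not in a set of known packages. The `KNOWN` set is a hypothetical stand-in for a real registry lookup combined with the repository's own modules.

```python
import ast


def unknown_imports(source: str, known_packages: set[str]) -> list[tuple[str, int]]:
    """Return (module, line) pairs for absolute imports not in known_packages."""
    missing = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]
        else:
            continue
        for name in names:
            top_level = name.split(".")[0]
            if top_level not in known_packages:
                missing.append((top_level, node.lineno))
    return sorted(missing, key=lambda item: item[1])


# Hypothetical allow-list standing in for registry and in-repo lookups.
KNOWN = {"os", "json", "stripe", "billing"}

pr_source = "import json\nimport stripe\nfrom stripe_helpers_v2 import retry\n"
findings = unknown_imports(pr_source, KNOWN)
```

Here the only finding is `stripe_helpers_v2` on line 3, the same kind of result shown under the SLOP finding in the sample report.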
06

Regression detection analyses code structure and behaviour

Shipgate analyses the structural changes in the PR for patterns that historically cause silent breakage. It checks whether any function signatures changed in ways that would break callers (a caller is any piece of code that uses a function, even in a completely different file). It checks whether any previously exported symbols (shared pieces of code that other parts of the system depend on) have been removed. It checks whether any conditional logic was inverted, which is the class of bug where a condition that previously evaluated to true now evaluates to false, causing previously-passing cases to fail silently. It checks whether transaction boundaries were removed, which is when changes to a database are no longer wrapped in a unit that either succeeds completely or fails completely, leaving the database in a partially-written state.

It also checks for contract safety violations: whether a public API endpoint now returns a different response shape than callers expect, whether required fields were removed from a response, whether authentication middleware was removed from a route that was previously protected, and whether HTTP status codes changed in ways that would break clients relying on them.

Plain English: A conditional inversion is exactly what it sounds like: you had "if user is authenticated, allow access" and someone changed it to "if user is NOT authenticated, allow access." Every syntax checker passes it. Every linter ignores it. It only becomes visible in production when users who should be blocked suddenly have access to things they should not. Regression detection catches this class of bug by analysing whether the logic changed in dangerous ways, not just whether the code is grammatically correct.
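Two of the structural checks described in this step, changed function signatures and removed functions, can be sketched by comparing the base and PR versions of a file. This example again uses Python's `ast` module as a stand-in for Shipgate's parser, covers only top-level functions and positional parameters, and uses invented function names.

```python
import ast


def signatures(source: str) -> dict[str, list[str]]:
    """Map each top-level function name to its positional parameter names."""
    tree = ast.parse(source)
    return {
        node.name: [arg.arg for arg in node.args.args]
        for node in tree.body
        if isinstance(node, ast.FunctionDef)
    }


def breaking_changes(base_source: str, pr_source: str) -> list[str]:
    """List functions that were removed or whose parameters changed."""
    base, pr = signatures(base_source), signatures(pr_source)
    problems = []
    for name, params in base.items():
        if name not in pr:
            problems.append(f"removed: {name}")
        elif pr[name] != params:
            old = ", ".join(params)
            new = ", ".join(pr[name])
            problems.append(f"signature changed: {name}({old}) -> {name}({new})")
    return problems


# Hypothetical before/after versions of one file.
base_code = "def charge(user, amount):\n    pass\n\ndef refund(user):\n    pass\n"
pr_code = "def charge(user, amount, currency):\n    pass\n"
problems = breaking_changes(base_code, pr_code)
```

A fuller check would then use the dependency map to find every caller of `charge` and `refund`, since a signature change only matters if something still calls the old shape.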
07

Execution validation runs the code in a real sandbox

For PRs above a configured risk threshold, Shipgate spins up a fully isolated, ephemeral copy of the application using Daytona. Ephemeral means the environment exists only for the duration of the analysis and is permanently destroyed afterward. Isolated means it has no access to the real network, real databases, or any production systems. It is a self-contained copy of the application that exists only to run the test suite.

Shipgate checks out the base branch (the version of the code before the PR's changes), installs all dependencies, builds the project, and runs the full test suite. It then repeats the exact same process on the PR branch. The two results are compared. Any test that was passing on the base and is now failing on the PR branch is flagged as a regression introduced by this specific change. Tests that were already failing before the PR are excluded, because Shipgate is measuring what the PR changed, not the pre-existing state of the codebase.

For low-risk PRs, Shipgate skips full execution and runs only the tests that touch files within the blast radius, which is significantly faster. The depth of execution scales with the risk level so that thorough validation is concentrated where it matters most.

Plain English: Most code review tools read code. Shipgate runs it. There is a large category of bugs that look completely fine on paper but break immediately the moment the code executes. The only way to find those bugs before they reach production is to actually run the code in a controlled environment. The sandbox ensures none of this touches any real system or data.
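The base-versus-PR comparison comes down to a small piece of set logic, sketched below. Test results are modelled as a mapping from test name to pass or fail, and the test names are hypothetical.

```python
def new_failures(base: dict[str, bool], pr: dict[str, bool]) -> list[str]:
    """Return tests that passed on the base branch but fail on the PR branch."""
    return sorted(
        test
        for test, passed_on_base in base.items()
        # Tests already failing on base are excluded. Tests missing from the
        # PR run return None here and are not counted as failures in this sketch.
        if passed_on_base and pr.get(test) is False
    )


# Hypothetical results: test name -> True if the test passed.
base_results = {"test_charge": True, "test_auth": True, "test_flaky": False}
pr_results = {"test_charge": False, "test_auth": True, "test_flaky": False}
regressions = new_failures(base_results, pr_results)
```

Only `test_charge` is reported. `test_flaky` fails on both branches, so it says nothing about what this PR changed.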
08

Database migration safety validation runs on an ephemeral Neon branch

If the PR contains database migrations, a separate validation process runs in parallel with the execution engine. A database migration is a script that changes the structure of the database: adding or removing tables, changing column types, removing constraints, or altering how data is indexed. Migrations are among the highest-risk changes in any codebase because they modify the permanent structure of the data store, and some changes cannot be reversed once applied.

Shipgate creates a temporary, isolated database branch using Neon. A Neon branch is a copy-on-write snapshot of the database that can be created in seconds, is completely independent from production, and is destroyed automatically when the analysis is complete. Shipgate applies all existing migrations to bring the branch to the current state, then applies the PR's new migration on top. If the migration fails to apply, that is an immediate block on the merge.

If the migration applies successfully, Shipgate takes a schema snapshot before and after, and produces a precise diff showing exactly what changed: which tables were dropped, which columns were removed, which column types were narrowed, which indexes were removed, and which constraints were relaxed. After the structural check, Shipgate inserts synthetic data rows that match the pre-migration schema, applies the migration, and checks whether those rows survived. If rows were deleted, truncated, or corrupted, that is flagged as a data loss risk. Finally, Shipgate replays a representative sample of the actual SQL queries used in the codebase against the post-migration schema to verify they still execute correctly.

Plain English: A database migration that drops a column is permanent. If 40 places in the codebase reference that column in their queries, every single one of them will start throwing errors the moment the migration runs in production. Finding this takes seconds in a Neon branch. Finding it after deployment means a production outage, an emergency rollback, and potentially data that cannot be recovered. Neon makes it possible to test the full migration against a real database with real query patterns in an isolated copy, on every single PR that touches the schema.
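The snapshot, migrate, and replay sequence can be tried on a small scale with an in-memory SQLite database standing in for a Neon branch. Everything in this sketch is an assumption made for illustration: the table, the synthetic row, the migration (written as a table rebuild so it runs on older SQLite versions), and the two replayed queries.

```python
import sqlite3


def columns(conn: sqlite3.Connection, table: str) -> set[str]:
    """Snapshot the column names of a table."""
    # Each PRAGMA table_info row is (cid, name, type, notnull, default, pk).
    return {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}


def failing_queries(conn: sqlite3.Connection, queries: list[str]) -> list[str]:
    """Replay queries and return the ones the current schema rejects."""
    failed = []
    for query in queries:
        try:
            conn.execute(query).fetchall()
        except sqlite3.OperationalError:
            failed.append(query)
    return failed


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT, payment_method_token TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'a@example.com', 'tok_1')")  # synthetic row

before = columns(conn, "users")
# Hypothetical migration that removes a column by rebuilding the table.
conn.executescript(
    """
    CREATE TABLE users_new AS SELECT id, email FROM users;
    DROP TABLE users;
    ALTER TABLE users_new RENAME TO users;
    """
)
after = columns(conn, "users")

dropped = before - after
broken = failing_queries(conn, [
    "SELECT email FROM users",
    "SELECT payment_method_token FROM users",
])
```

The schema diff reports the dropped column, and the replay shows which query no longer runs, which together mirror the "Data test" and "Query test" lines in the sample report.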
09

A unified risk score is produced across all dimensions

Shipgate combines inputs from every engine into a set of dimension-specific scores and a single final risk score from 0 to 100. The security score reflects raw findings weighted by exposure level (a vulnerability on a public-facing payment endpoint is weighted far more heavily than the same vulnerability in an internal admin script). The regression score reflects the severity and breadth of any behavioral changes detected. The execution score reflects whether the PR introduced new test failures. The migration score reflects whether any data-loss or query-breaking changes were found in the schema diff or SQL replay.

The blast radius multiplier scales every dimension score based on how widely the affected code is used across the system. A Critical finding in a file that nothing else depends on is a different situation from the same finding in a file that 60 other services call. The final risk level is reported as Low, Medium, High, or Critical, and is what determines whether the merge button is blocked.

Plain English: Two PRs could have the exact same vulnerability. One has it in a function called by 40 other parts of the system, on a public endpoint that millions of users hit. The other has it in an isolated script that runs once a month for an internal report. Those are not equally dangerous. The unified risk score reflects this. Risk is not just about what is broken. It is about what breaks if someone finds and exploits it.
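One way to picture how a blast radius multiplier changes the outcome is the sketch below. The weights, multipliers, and level thresholds are invented for illustration, and the real scoring formula may combine the dimensions differently.

```python
# Hypothetical multipliers, weights, and thresholds, chosen only for this example.
MULTIPLIER = {"Low": 1.0, "Medium": 1.2, "High": 1.5, "Critical": 2.0}
WEIGHTS = {"security": 0.4, "regression": 0.25, "execution": 0.15, "migration": 0.2}


def final_score(raw: dict[str, float], blast_level: str) -> float:
    """Scale each dimension by blast radius, cap at 100, then take a weighted sum."""
    factor = MULTIPLIER[blast_level]
    scaled = {name: min(100.0, value * factor) for name, value in raw.items()}
    return round(sum(WEIGHTS[name] * scaled[name] for name in WEIGHTS), 1)


def level(score: float) -> str:
    """Map a 0 to 100 score to a risk level."""
    if score >= 80:
        return "Critical"
    if score >= 60:
        return "High"
    if score >= 30:
        return "Medium"
    return "Low"


# The same raw findings, scored in an isolated script and in a widely used module.
raw = {"security": 40, "regression": 30, "execution": 20, "migration": 10}
isolated = final_score(raw, "Low")
widely_used = final_score(raw, "Critical")
```

With these made-up numbers the same findings score 28.5 in the isolated case and 57.0 in the widely used one, which moves the result from Low to Medium.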
10

Results are posted to the PR with inline fixes

Shipgate posts a structured summary comment directly on the pull request. The reviewer sees the final risk level, a breakdown of the security, regression, execution, and migration scores, every finding with the precise file name and line number, a plain-English explanation of why the finding matters, and a suggested code fix formatted as a GitHub suggestion block (meaning the contributor can apply the fix with a single click without leaving the browser). The GitHub check status updates simultaneously, which is what controls whether the merge button is enabled or blocked.
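For readers who have not seen one, a GitHub suggestion is a fenced block tagged `suggestion` inside a review comment on a changed line, which GitHub renders with an apply button. The sketch below formats one finding that way. The layout and the replacement line are hypothetical, not the exact comment Shipgate produces.

```python
FENCE = "`" * 3  # built at runtime so this example can sit inside a fenced block


def format_finding(severity: str, path: str, line: int, why: str, fix: str) -> str:
    """Render one finding as Markdown ending in a GitHub suggestion block."""
    return "\n".join([
        f"**[{severity}]** `{path}`, line {line}",
        "",
        why,
        "",
        f"{FENCE}suggestion",
        fix,
        FENCE,
    ])


comment = format_finding(
    "CRITICAL / SECURITY",
    "webhook_handler.py",
    94,
    "User input is interpolated directly into a SQL string.",
    # Hypothetical replacement line, not taken from a real file.
    'cursor.execute("SELECT * FROM events WHERE id = %s", (event_id,))',
)
```

The text inside the suggestion fence replaces the commented line when the contributor accepts it, so it has to be the complete corrected line, not a fragment.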

This is what reviewers see on every PR:

shipgate / pr-analysis / pr #247 / add payment webhook handler
## Shipgate Analysis Report
────────────────────────────────────────────────────────────

PR: Add payment webhook handler
Author: @contributor
Files changed: 7
Lines changed: +312 / -48

── Impact Assessment ────────────────────────────────────────

Impact Risk: CRITICAL
Blast radius: 23 downstream files affected by this change
Scopes affected: Billing, Auth, Notifications
Public API: Yes. 3 public-facing endpoints modified.
Sensitive paths: Payment processing, Authentication layer

── Scores ───────────────────────────────────────────────────

Security Score: 78 / 100 (High Risk)
Regression Score: 62 / 100 (High Risk)
Execution Score: 41 / 100 (Medium — 2 new test failures)
Migration Score: 89 / 100 (Critical — column drop detected)
Final Risk Score: 84 / 100 (CRITICAL)

── Findings (6 issues found) ────────────────────────────────

[CRITICAL / SECURITY] webhook_handler.py, line 94
Type: SQL Injection vulnerability (OWASP A03:2021)
What it is: User input is pasted directly into a database query with no filtering
The risk: An attacker can read, modify, or delete your entire database through a form field
Fix: Replace with parameterised query (one-click suggestion attached below)

[HIGH / SECURITY] config.py, line 12
Type: Exposed Secret (OWASP A02:2021)
What it is: Stripe secret API key written directly in the source code file
The risk: Anyone with read access to this repository can charge to your Stripe account
Fix: Move to environment variable (one-click suggestion attached below)

[CRITICAL / REGRESSION] auth/middleware.py, line 31
Type: Auth middleware removed from /api/payments route
What it is: The payment endpoint was previously behind an authentication check. That check was removed in this PR.
The risk: Any unauthenticated user can now call the payment endpoint directly

[HIGH / REGRESSION] billing/service.py, line 148
Type: Transaction boundary removed from charge() function
What it is: The charge() function previously wrapped its database writes in a transaction. If any write failed, all writes were rolled back. That transaction was removed.
The risk: A failed charge can now leave the database in a partially-written state

[CRITICAL / MIGRATION] migrations/0042_drop_user_payment_method.sql
Type: Column drop — users.payment_method_token (irreversible)
What it is: This migration permanently drops the payment_method_token column from the users table
Data test: 12 synthetic rows lost after migration. Data cannot be recovered once this runs.
Query test: 3 queries referencing users.payment_method_token failed after migration

[SLOP] utils.py, line 203
Type: Hallucinated Import
What it is: Code imports "stripe_helpers_v2" which does not exist in any public package registry
The risk: This library does not exist. The application will crash immediately at runtime.

── Execution Results ────────────────────────────────────────

Base branch: 47 tests passed / 0 failed
PR branch: 45 tests passed / 2 failed
FAIL tests/billing/test_charge.py::test_idempotent_charge
FAIL tests/auth/test_payment_route.py::test_requires_auth

── Verdict ──────────────────────────────────────────────────

Decision: MERGE BLOCKED
Reasons: Critical SQL injection in a public billing endpoint.
Auth middleware removed from payment route.
Irreversible column drop with confirmed data loss.
Resolve the 3 critical findings above to unlock the merge button.
04

What Shipgate Does

Every major capability, explained without jargon

Shipgate has six core capabilities that work together as a unified risk system. Each one addresses a specific gap that no other tool in the market adequately fills. They are not independent features bolted together. They compound: blast radius makes security scoring more precise, slop detection improves review accuracy, and contributor intelligence makes the whole system smarter over time.

Repository Intelligence

Understands your entire codebase before reviewing anything

Shipgate does not just read the lines you changed. Before reviewing any PR, it builds a complete internal model of the repository: every file, every function, every dependency relationship, every framework in use, and every sensitive area of the codebase including authentication, payment flows, admin routes, and database access patterns.

It uses a parsing library called tree-sitter, which reads code much as the front end of a compiler does. A compiler is the software that translates code from human-readable form into instructions a computer can execute. Tree-sitter reads the logical structure of the code as a tree of relationships rather than treating it as raw text. This means Shipgate understands what code does, not just what it says. The model is kept current through incremental updates every time new code is pushed, so it always reflects the live state of the project.

Think of it as the difference between reading a full map of a city and just looking at street signs. The street signs tell you where you are. The map tells you where everything connects, which roads share the same bridge, and which detours matter when something breaks. Shipgate reads the map.
Blast Radius Engine

Measures exactly how far a change reaches

Every code change creates a ripple effect through the system. A change to a shared utility function might be used in 40 other places across the codebase. A change to the database schema (the structure that defines how data is stored and organised) might affect every query in every service that reads from or writes to that database. Blast Radius is Shipgate's measurement of how wide that ripple spreads.

Shipgate counts the downstream files, services, and public interfaces that are affected by what the PR modifies. It produces a risk level of Low, Medium, High, or Critical. This level is then used as a multiplier in the unified risk scoring, because the same vulnerability in a widely-used, critical component is orders of magnitude more dangerous than the same vulnerability sitting in an isolated corner of the codebase.

A crack in a load-bearing beam is more dangerous than a crack in a decorative tile, even when the cracks look the same size. Blast Radius tells you which kind you are looking at, automatically, before every merge.
Security Engine

Catches vulnerabilities before they reach production

Three parallel security checks run on every PR. Static analysis via Semgrep scans the changed code against the OWASP Top 10: the definitive list of the ten most critical and most commonly exploited security risks in software. These include SQL injection, broken authentication, cross-site scripting, path traversal, insecure deserialization, and five other equally serious categories that account for the vast majority of real-world security breaches.

Secrets detection scans every line for accidentally committed credentials: API keys, database passwords, private cryptographic keys. A leaked Stripe API key means an attacker can charge your customers. A leaked AWS key means they can spin up servers on your bill. This happens constantly, and most teams only discover it after the damage is done.
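At its simplest, secrets detection is pattern matching over each line of the change. The sketch below uses three illustrative regular expressions based on well-known credential formats (AWS access key IDs start with `AKIA`, Stripe live secret keys start with `sk_live_`). Production scanners use much larger rule sets, and the patterns and sample text here are assumptions for the example.

```python
import re

# Illustrative patterns only, not a complete or authoritative rule set.
PATTERNS = {
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "Stripe live secret key": re.compile(r"\bsk_live_[0-9a-zA-Z]{16,}\b"),
    "Private key block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def find_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line number, kind) for every line that matches a pattern."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((number, label))
    return findings


# Fake values: a padded placeholder and the example key ID from AWS documentation.
sample = (
    'API_KEY = "' + "sk_live_" + "x" * 24 + '"\n'
    'region = "eu-west-1"\n'
    'aws = "AKIAIOSFODNN7EXAMPLE"\n'
)
hits = find_secrets(sample)
```

Reporting the line number with each hit is what allows a finding like the `config.py, line 12` entry in the sample report to point at the exact location.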

Dependency scanning checks every new or updated library against OSV, a continuously maintained database of known vulnerabilities in open-source code. Using a library with a known vulnerability is one of the most common ways production systems get compromised.

OWASP Top 10 is the security equivalent of a mandatory building safety checklist. If your code fails any item on that list, a known and documented attack technique can be used against it. Shipgate checks every PR against the full list, automatically, before it can merge.
AI Slop Detection

Identifies the specific ways AI coding tools fail

AI coding assistants produce a set of distinctive failure patterns that traditional review tools were never designed to detect, because those tools all predate the era of AI-generated code. Hallucinated import detection cross-references every library and function name in the PR against public package registries to verify they actually exist. In languages that resolve imports at runtime, such as Python or JavaScript, code that references a non-existent library passes every syntax check, then crashes the moment it runs.

Placeholder variable detection flags code using meaningless names like "data", "result", "temp", or "handler" that suggest the code was generated to look functional rather than written to solve a specific problem. Style deviation analysis compares the structure and patterns of the new code against the established conventions of the rest of the repository, flagging sections that appear inconsistent with how the project was written over time.
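Placeholder detection can be approximated by walking the syntax tree and flagging assignments to names on a watch list. The sketch below uses the four names quoted in the paragraph above and Python's `ast` module. Treating an exact name match as a signal is a simplification, and a real check would presumably weigh it against the naming habits of the rest of the repository.

```python
import ast

# The example names quoted in the text. A real list would be configurable.
PLACEHOLDER_NAMES = {"data", "temp", "result", "handler"}


def placeholder_assignments(source: str) -> list[tuple[str, int]]:
    """Return (name, line) for each assignment to a placeholder-style variable."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        # ast.Store marks a name that is being assigned to, not read.
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            if node.id in PLACEHOLDER_NAMES:
                hits.append((node.id, node.lineno))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical snippet from a PR.
snippet = "data = load()\ninvoice_total = sum(data)\ntemp = invoice_total * 2\n"
flags = placeholder_assignments(snippet)
```

The read of `data` on line 2 is not flagged, only the two assignments, which keeps one vague name from producing a finding on every line that uses it.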

No other product on the market does this. CodeRabbit uses AI to review code but has no detection for how AI coding tools specifically fail. cubic.dev offers AI analysis but no slop detection. This is the most defensible technical differentiation Shipgate has.

Regression Detection

Catches the silent breakages that syntax checkers miss entirely

There is a whole class of bugs that look completely correct on paper: the code compiles, the linter passes, the syntax is valid, and nothing complains. They only break things when the code actually runs. Regression detection analyses the logical structure of what changed rather than just whether it is grammatically correct.

It checks for inverted conditions (a check that previously allowed access now blocks it, or vice versa), removed transaction boundaries (database writes that were previously atomic are now partial), changed function signatures (the inputs or outputs a function expects are different from what all its callers provide), removed exports (shared pieces of code that other parts of the system depend on are gone), and contract violations (a public API endpoint now returns a different shape than callers expect, or authentication was removed from a protected route).
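Two of these checks, changed function signatures and removed exports, can be sketched by comparing the syntax trees of the base and PR versions of a file. This is a simplified illustration with assumed function names; it covers only top-level Python functions and positional parameters.

```python
import ast


def public_signatures(source):
    """Map each top-level public function name to its positional parameter names."""
    signatures = {}
    for node in ast.parse(source).body:
        is_function = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        if is_function and not node.name.startswith("_"):
            signatures[node.name] = [arg.arg for arg in node.args.args]
    return signatures


def signature_regressions(base_source, pr_source):
    """Report public functions that were removed or whose parameters changed."""
    base = public_signatures(base_source)
    pr = public_signatures(pr_source)
    removed = sorted(set(base) - set(pr))
    changed = sorted(name for name in base.keys() & pr.keys() if base[name] != pr[name])
    return {"removed": removed, "changed": changed}
```

A finding here is only a candidate regression: it becomes a real one when some caller elsewhere in the repository still uses the old name or the old parameters.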

A developer removes the "require authentication" check from a payment API route. The code compiles. The tests pass (unless there was a specific test for that check). The change looks like a one-line cleanup. In production, it means any anonymous request can now call the payment endpoint directly. Regression detection catches this at the PR stage by analysing whether the logic changed in dangerous ways, not just whether it is syntactically valid.

Execution Validation

Actually runs the code and compares results to the base branch

For PRs above a configured risk threshold, Shipgate spins up a fully isolated, ephemeral application environment using Daytona. Ephemeral means the environment is permanently destroyed after the analysis completes. Isolated means it has no access to production systems, real databases, or any live network services. It is a self-contained sandbox that exists only to run the test suite.

Shipgate checks out the base branch (the codebase before the PR's changes), installs all dependencies, builds the project, and runs the full test suite. It captures which tests pass and which fail. It then repeats the same process on the PR branch and compares the two results. Any test that was passing before the PR and is failing after it is a regression introduced by this specific change. Tests that were already failing are excluded because Shipgate is measuring what the PR changed, not the pre-existing state of the codebase.
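The comparison itself is simple set logic. In the sketch below, each run is assumed to be a mapping from test identifier to pass or fail; that data shape and the function name are illustrative assumptions.

```python
def find_regressions(base_results, pr_results):
    """Return tests that passed on the base branch but failed on the PR branch.

    Both arguments map a test identifier to True (passed) or False (failed).
    """
    regressions = []
    for test_id, passed_on_base in base_results.items():
        if not passed_on_base:
            continue  # already failing before the PR, so not attributable to it
        if pr_results.get(test_id) is False:
            regressions.append(test_id)
    return sorted(regressions)
```

A test that is missing from the PR run is not reported here. Whether a deleted test should count as a regression is a policy choice this sketch leaves open.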

For low-risk PRs, execution is scoped to only the tests that cover files within the blast radius, which is significantly faster. The depth of validation scales with the risk level so that compute is concentrated where it matters most.
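Scoping can be sketched as follows, assuming a coverage map from each test to the files it exercises and a numeric risk score compared against a threshold. Both of those, and the function names, are hypothetical.

```python
def select_tests(coverage_map, blast_radius):
    """Pick the tests whose covered files overlap the blast radius."""
    affected = set(blast_radius)
    return sorted(
        test for test, files in coverage_map.items() if affected & set(files)
    )


def plan_test_run(risk_score, threshold, coverage_map, blast_radius):
    """Run everything for high-risk PRs, only the affected tests otherwise."""
    if risk_score > threshold:
        return sorted(coverage_map)
    return select_tests(coverage_map, blast_radius)
```

The coverage map has to be reasonably fresh for this to be safe: a test that is missing from the map is never selected for a low-risk PR.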

Most code review tools read code. Shipgate runs it. There is a large category of bugs that look completely fine on paper but break immediately when the code executes. The only reliable way to find them before production is to actually run the code. The ephemeral sandbox ensures none of this ever touches a real system or any real data.

Migration Safety

Validates database changes against real data and real queries before any row is touched

Database migrations are among the highest-risk changes in any codebase because they permanently modify the structure of the data store. Dropping a column, narrowing a type, or removing a constraint can silently break queries throughout the codebase and may cause data loss that cannot be reversed. Shipgate validates every migration in an isolated copy of the database before it ever gets near production.

Shipgate creates an ephemeral database branch using Neon. A Neon branch is a copy-on-write snapshot of the database that is created in seconds, is completely independent from production, and is automatically destroyed when the analysis completes. Shipgate applies all existing migrations to bring the branch to the current state, then applies the PR's new migration. If it fails to apply, the merge is blocked immediately.

If the migration applies, Shipgate produces a schema diff showing exactly what changed: dropped tables, removed columns, narrowed types, removed indexes, relaxed constraints. To test for data loss, it inserts synthetic data rows matching the pre-migration schema, applies the migration, and checks whether those rows survived. If rows were deleted, truncated, or corrupted by the migration, that is flagged as a data loss risk. Finally, Shipgate extracts a representative sample of the actual SQL queries used in the codebase and replays them against the post-migration schema to verify they still execute correctly.
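The three checks (schema diff, row survival, query replay) can be illustrated end to end with an in-memory SQLite database standing in for an ephemeral branch. SQLite is used here only so the sketch runs anywhere; the function names and report shape are assumptions, not Shipgate's implementation.

```python
import sqlite3


def table_columns(conn, table):
    """Return the column names of a table via SQLite's table_info pragma."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


def validate_migration(base_schema, seed_rows_sql, migration, table, queries):
    """Apply a migration to a throwaway database and report what it broke."""
    conn = sqlite3.connect(":memory:")  # stand-in for an ephemeral branch
    conn.executescript(base_schema)
    conn.executescript(seed_rows_sql)  # synthetic rows matching the old schema
    columns_before = table_columns(conn, table)
    rows_before = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

    conn.executescript(migration)  # raises if the migration cannot be applied

    columns_after = table_columns(conn, table)
    rows_after = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

    broken = []
    for query in queries:  # replay codebase queries against the new schema
        try:
            conn.execute(query).fetchall()
        except sqlite3.Error as error:
            broken.append((query, str(error)))
    conn.close()
    return {
        "dropped_columns": [c for c in columns_before if c not in columns_after],
        "rows_lost": rows_before - rows_after,
        "broken_queries": broken,
    }
```

An exception from the migration step corresponds to the "fails to apply" case, where the merge is blocked outright. A non-empty `dropped_columns`, a positive `rows_lost`, or any entry in `broken_queries` would each be surfaced as a finding.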

A migration that drops a column is permanent. If 40 queries across the codebase reference that column, every one of them will throw an error the moment the migration runs in production. Finding this in a Neon branch takes seconds. Finding it after deployment means a production outage, an emergency rollback, and potentially data that cannot be recovered. Shipgate runs this validation on every PR that touches the schema, automatically.

Contributor Intelligence

Visibility into who contributes what quality over time

Shipgate builds a running quality profile for every contributor to a repository. It tracks slop rate (the percentage of that contributor's PRs that contain AI-generated code quality issues), security issue introduction rate (how frequently their changes introduce vulnerabilities), rework rate (how often a PR requires multiple rounds of revision before it can merge), and review-to-merge time (how long a PR takes to get from first review to merge). This data gives engineering leaders something they have never had before: reliable, systematic evidence for decisions about who to trust with sensitive parts of the codebase and where to invest review attention.
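As a sketch of how such a profile could be computed, the code below turns a list of per-PR records into the four metrics. The record fields, and the choice to count a PR as rework when it needs more than one revision round, are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class PullRequestRecord:
    """Hypothetical per-PR record; field names are assumptions."""
    has_slop_findings: bool
    introduced_vulnerability: bool
    revision_rounds: int
    hours_from_review_to_merge: float


def contributor_profile(records):
    """Summarise a contributor's PR history; rates are fractions between 0 and 1."""
    total = len(records)
    if total == 0:
        return None  # no history yet, so no profile
    return {
        "slop_rate": sum(r.has_slop_findings for r in records) / total,
        "security_issue_rate": sum(r.introduced_vulnerability for r in records) / total,
        "rework_rate": sum(r.revision_rounds > 1 for r in records) / total,
        "avg_review_to_merge_hours": sum(
            r.hours_from_review_to_merge for r in records
        ) / total,
    }
```

Rates computed over a handful of PRs are noisy, so a real profile would likely need a minimum sample size before it is shown to anyone.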

For enterprise customers, this data can be formatted into exportable security posture reports structured specifically for vendor procurement reviews, SOC 2 audits (an independent audit of security controls that many large companies require before buying software), ISO 27001 assessments, and PCI-DSS compliance checks (the security standard that applies to any organisation handling credit card payments).

If one contributor's PRs consistently introduce security vulnerabilities, that is a coaching opportunity. If another contributor's PRs consistently pass clean, that person should be fast-tracked. Before Shipgate, these patterns existed but nobody had the data to act on them. Now they do.

The combination is the product. Any individual capability above exists in some form somewhere in the market. The thing no competitor offers is all of them operating together as a unified risk engine: static analysis, behavioral regression detection, live execution validation in a real sandbox, database migration safety against real query patterns, AI slop detection, and blast-radius-weighted scoring, all running in parallel on every pull request before a human reviewer opens the diff. Shipgate is the only platform that catches problems at all four layers simultaneously: in the code, in the running application, in the database schema, and in the data itself.

05

Market Size

The scale and shape of the opportunity we are building into

Shipgate operates at the intersection of two large, fast-growing markets: developer tooling and application security testing. The shift to AI-assisted development is creating a new sub-category within both markets that did not meaningfully exist three years ago. The companies that define that sub-category now will own it for a long time. That is the window Shipgate is building into.

$32B
Developer tooling market size projected by 2028
Growing at approximately 18% per year
$21B
Application security testing market by 2027
Growing at approximately 24% per year, accelerated by AI adoption
100M+
Developers worldwide, 77% of whom now use AI coding tools
Every team generating AI code is a potential Shipgate customer

The most important market dynamic is timing. AI coding tools crossed mainstream developer adoption in 2023. The trust gap, the moment where teams realise they are routinely shipping AI-generated code they cannot properly evaluate, is becoming acutely painful in 2025 and 2026. The tools built to address that gap are being defined right now. Shipgate is among the first to build specifically for this problem.

The paying customers are not individual developers. They are the engineering teams, platform teams, and organisations responsible for the quality and security of the codebases those developers contribute to. A single enterprise customer deploying Shipgate across 50 repositories is worth dramatically more than 50 individual subscribers. The business model targets the team and organisation tier from day one.

Capability | SonarQube / CodeClimate | CodeRabbit | cubic.dev | Shipgate
Static security analysis (code scanning) | Yes | Yes, via integrations | Partial | Yes, OWASP-aligned
Dependency vulnerability scanning | Yes | Yes | Partial | Yes, via OSV database
Secrets and credential detection | Some | Yes | Partial | Yes
Blast Radius / system-wide impact analysis | No | No | No | Yes, core feature
AI slop and hallucination detection | No | No | No | Yes, core feature
Behavioral regression detection | No | No | No | Yes, inverted conditions, removed auth, contract violations
Execution validation in sandboxed environment | No | No | No | Yes, via Daytona ephemeral environments
Database migration safety and SQL replay | No | No | No | Yes, via Neon ephemeral branches
Native merge blocking enforcement | Via CI pipeline only | No | No | Yes, native
Multi-platform (GitHub, GitLab, Bitbucket) | Yes | Yes | GitHub only | Yes
Vibe coding platform integrations (Lovable, Bolt, Cursor) | No | No | No | Yes, purpose-built
Enterprise security reports for compliance and procurement | Limited | No | No | Yes, exportable PDF

On the competition: CodeRabbit raised $88M at a $550M valuation in September 2025 and is growing at 20% month-over-month. This confirms the market exists and that budgets are allocated for this category. It does not mean the category is decided. Independent benchmarks score CodeRabbit 1 out of 5 on review completeness. In January 2025, CodeRabbit's own AI system flagged a malicious instruction in a PR comment and then executed it, exposing over one million repositories to potential attack. cubic.dev is GitHub-only, has three employees, and no independently verified accuracy data. Neither competitor runs your code, validates your database migrations, or detects behavioral regressions. Shipgate is the only platform that combines all four: static analysis, live execution, schema safety, and AI-generated code detection.

06

Why Now

The timing argument for building Shipgate today and not two years from now

There have always been code review tools. There have always been security scanners. The question worth asking clearly is why a new entrant focused specifically on AI-generated code risk has a meaningful window right now. The answer is a convergence of three things happening simultaneously: the pain is established and real, the budget is allocated in engineering and security teams, and the standards for what a solution looks like are still being written by the market. When all three are true at the same time, that is the window.

2022

AI coding tools enter the mainstream for the first time

GitHub Copilot launches publicly. ChatGPT arrives in November. For the first time in history, AI-generated code becomes a realistic part of everyday developer workflow at meaningful scale. The tools are exciting. Engineers begin using them for the productivity gains. The specific failure modes that emerge at scale are not yet understood, and nobody is thinking about what happens when AI-generated code starts reaching production in volume.

2023 to 2024

Adoption accelerates and reaches developers at every skill level

Cursor, Lovable, Bolt, and Replit launch and gain significant traction. These vibe coding platforms (platforms that let users describe what they want in plain language and generate entire working applications from that description) lower the barrier to contributing code so far that non-engineers begin pushing code directly to repositories. AI-assisted development goes from a productivity trick used by experienced engineers to the default workflow for developers at every skill level. The proportion of AI-generated code in production codebases grows rapidly, and most engineering teams have no reliable way to measure it or review it differently.

2025

The trust gap becomes painful and well-funded

Engineering teams begin experiencing the failure modes at scale. AI-generated code introduces subtle bugs that pass code review. Security incidents linked to AI-generated vulnerabilities start appearing in the industry. In January 2025, CodeRabbit, the largest AI code review platform, is found to have a critical vulnerability where its own AI flagged a malicious instruction embedded in a PR review comment and then executed it, exposing over one million connected repositories. CodeRabbit raises $88M at a $550M valuation in September 2025. The market is actively looking for a more serious solution and capital is available to buy it.

2025 to 2026

The category is being defined right now, in real time

This is the window. The pain is established. Engineering budgets for developer tooling and security tooling are approved. The buying criteria for what "AI-aware code review" actually means are still being written by customers making their first purchasing decisions. The companies that build the right solution during this window will set the standard that everyone else has to compete against for the next five years. First-mover advantage in a category that is still forming is one of the most durable positions in enterprise software.

2026 onward

Vibe coding platforms become a primary source of production code

Lovable, Bolt, Cursor, and Replit will account for a large and growing share of new code contributions to both private and open source repositories within 18 months. Any review platform that lacks specific detection capabilities for what these tools produce will be operating blind on its fastest-growing input stream. Shipgate's integrations with vibe coding platforms, built now while the market is forming, will become table stakes that competitors will scramble to build from a trailing position.

07

Who It's For

The people who buy Shipgate and the problems they are trying to solve

Shipgate is bought by engineering teams and organisations, not individual developers. The person who installs it is rarely the person writing the code. They are the person responsible for what happens when that code reaches production. This distinction is fundamental to how we design the product, how we write about it, and which features we build first.

Every feature decision at Shipgate should be evaluated through the lens of the person paying the invoice. That person owns a codebase they need to protect. They manage a team they need to keep safe. They have a security posture to maintain in front of customers and auditors. They are not looking for a smarter autocomplete tool. They are looking for a system that catches what humans miss, at scale, without requiring their team to slow down.

Primary Buyer

The Engineering Team Lead or CTO

Responsible for everything the team ships. Developers are using AI tools and moving fast. Manual reviews are getting harder to do thoroughly because the volume is high and AI-generated code is structurally harder to evaluate than code a human wrote with specific intent. They need a system they can trust to catch what the team misses, and they need clear visibility into where risk is accumulating across their repositories before something goes wrong.

What keeps them up at night
A PR with a SQL injection vulnerability slips through review and gets exploited in production. The postmortem asks why nobody caught it at review time, and there is no good answer.
Half the team is using Cursor and Lovable to generate code. Nobody knows how much of the new code is AI-generated or whether it requires a different kind of review attention to be safe.
There are 40 open PRs and no reliable way to prioritise which three genuinely need careful human attention and which 37 are low-risk enough to approve quickly.
A senior engineer with deep knowledge of the codebase is leaving. When they go, all the institutional understanding of how the system is wired together leaves with them.

Primary Buyer

The Open Source Maintainer

Maintains a public repository with dozens or hundreds of external contributors they do not know personally. Receiving more PRs than ever before, many generated by AI tools with widely varying quality and intent. No budget to hire dedicated reviewers. Needs automation that is smart enough to separate the small number of PRs requiring genuine attention from the large number that are safe to approve quickly, and to block the ones that are clearly not safe to merge at all.

What keeps them up at night
The majority of incoming PRs are AI-generated boilerplate that adds no meaningful value and consumes hours of review time every week that the maintainer does not have to spare.
A well-meaning contributor submitted a PR with a vulnerable dependency and it merged without anyone catching the CVE. A security researcher filed a report six months later.
No way to distinguish which contributors are consistently reliable from those who are generating noise. Every PR gets the same level of manual review by default, which is not sustainable.
Security vulnerabilities are being discovered in the project by external researchers that a pre-merge automated scan would have caught before they ever shipped.

Secondary Buyer

The Security or DevSecOps Engineer

Responsible for the security posture of engineering output across the entire organisation. Currently running security scans after deployment, which means finding vulnerabilities in code that is already live on production servers being accessed by real users. Looking for a way to shift security left, which is the industry term for moving security checks earlier in the development process, to the PR review stage rather than after code has shipped. Also responsible for producing evidence of security controls for compliance audits.

What they need from Shipgate
Automatic, consistent security checks on every PR without requiring developers to run anything manually or change their workflow.
Trend data showing security posture improvement over time, formatted for reporting upward to executives and to the board.
Exportable security reports they can submit during SOC 2 audits, ISO 27001 assessments, and enterprise procurement security reviews without spending days assembling the data manually.

Secondary Buyer

The Startup CTO Moving Fast

Running a small team using every AI tool available to ship features at maximum speed. Understands the team is accumulating technical debt and security risk but cannot afford to slow down for thorough manual review on every change. Needs a safety net that reliably catches the genuinely dangerous issues without adding friction to the parts of the development process that are working well and moving fast.

What they need from Shipgate
Analysis that completes in under 10 seconds on changed files. Any slower and engineers find ways to route around the check to meet their deadlines.
Low noise output. Surface only the issues that genuinely matter. A false positive rate above 10 percent erodes trust in the tool and leads to the findings being ignored.
Actionable, specific fixes attached to every finding. Not "this is a security problem" but "here is the exact code change to fix it, click to apply."

Who Shipgate is not for: Individual developers looking for a personal productivity tool or AI autocomplete. Teams whose primary pain is formatting inconsistency or style enforcement rather than security and risk. There are many tools for those use cases. Shipgate exists to protect codebases at the organisational level, for the people who own and are responsible for what ships, not the people who submit the changes.

08

Product Principles

The values that shape every decision, feature, and tradeoff

These are not aspirational values on a poster. They are active constraints. When a feature decision is unclear or two priorities conflict, these principles are what resolve the disagreement. Every person on the Shipgate team should be able to cite them and apply them to their own work.

Principle 01

Fast or useless

If PR analysis takes longer than 10 seconds on changed files, developers route around it. Speed is not a feature. It is the precondition for everything else we do. We run analysis only on changed files. We parallelise every check that can be parallelised. We cache the repository index aggressively. We do not trade speed against thoroughness: we are committed to delivering both.

Principle 02

High signal, low noise

A tool that raises too many false alarms gets turned off quickly. Every finding Shipgate surfaces must be a genuine, confirmed issue with a clear explanation of why it matters and a suggested path to fixing it. A false positive rate above 10 percent is a product failure, not an acceptable tradeoff. We would rather surface 5 real issues than 50 uncertain ones.

Principle 03

Actionable always

Every finding Shipgate raises includes what the problem is, why it matters in plain language a non-security expert can act on, and exactly how to fix it with a code suggestion attached. Surfacing a problem without a path to resolution is not a helpful feature. It is a source of frustration that erodes trust and leads reviewers to dismiss findings. We do not ship findings without fixes.

Principle 04

Enforce, do not suggest

A security check that can be ignored when a sprint deadline arrives is not a security check. It is advice. Shipgate integrates as a required status check and blocks the merge button. This is intentional and non-negotiable. Organisations pay Shipgate for enforcement. The value disappears the moment it becomes optional.

Principle 05

Built from scratch for the AI era

Every other product in this space was designed before AI-generated code was a meaningful proportion of what ships. We are not retrofitting slop detection onto a legacy architecture. The reality that a significant portion of submitted code was generated by an AI tool is the starting assumption of every design decision we make. This is not a feature we added. It is the lens through which the entire product was conceived.

Principle 06

The maintainer is the customer

We build for the people who own and protect codebases, not the people who contribute to them. Features that reduce maintainer burden ship first. Features that only benefit contributors ship when they also make maintainers' lives easier. When any decision is unclear, ask the question: does this make the person responsible for the codebase more effective at their job?

A note for every new team member: This handbook will evolve as the product evolves and the market evolves. What will not change is the foundational insight that created Shipgate: AI coding tools are making it far easier to push code that looks correct but is not, and nobody has built a serious, purpose-built system for catching that at the pull request layer before it reaches production. Everything Shipgate does flows from solving that problem completely, for the people whose job it is to ensure what ships is safe. Welcome to the team.