
How We Catch Performance Regressions Before They Hit Production

4 min read

The Problem

In performance-critical systems, ideally every source code change would be validated to make sure it introduces no performance regressions. Functional tests handle correctness, but performance is trickier: how do you know a change won't slow things down?

We worked with the OrioleDB maintainers to build a CI pipeline that runs TPC-C benchmarks on every PR and flags regressions before merge.

OrioleDB is a PostgreSQL extension that combines the advantages of both on-disk and in-memory storage engines, using PostgreSQL's pluggable storage to improve performance and reduce costs.

How It Works

Stroppy Action

The best place to detect regressions is where changes are introduced: the Pull Request.

Since OrioleDB lives on GitHub, we built stroppy-action to run Stroppy directly from GitHub Actions:

- name: Run TPC-C benchmark
  uses: stroppy-io/stroppy-action@main
  with:
    preset: tpcc
    driver-url: ...
    k6-args: ...
    stroppy-args: ...

The Pipeline

We created .github/workflows/perf.yml that runs on every PR:

jobs:
  setup:
    name: Generate run matrix
    runs-on: ubuntu-latest
    steps:
      - id: gen
        env:
          RUNS: ${{ inputs.bench_runs || '1' }}
          WAREHOUSES: ${{ inputs.warehouses || '200' }}
        run: |
          echo "run-matrix=$(python3 -c "
          import json, os
          runs = int(os.environ['RUNS'])
          whs = [s.strip() for s in os.environ['WAREHOUSES'].split(',')]
          print(json.dumps({'run': list(range(1, runs+1)), 'warehouses': whs}))
          ")" >> $GITHUB_OUTPUT
    ...
  bench-base:
    name: "Bench base ${{ matrix.warehouses }}W #${{ matrix.run }}"
    needs: setup
    runs-on: perf-runner
    steps:
      - name: Run TPC-C benchmark
        uses: stroppy-io/stroppy-action@main
        with:
          preset: tpcc
          driver-url: postgres://${{ env.PGUSER }}@localhost:5432/postgres?sslmode=disable
          artifact-name: perf-results-base-${{ matrix.warehouses }}W-${{ matrix.run }}
          k6-args: '--out opentelemetry'
          stroppy-args: ${{ env.DB_CACHED == 'true' && '--steps workload' || '' }}
    ...
  bench-head:
    name: "Bench head ${{ matrix.warehouses }}W #${{ matrix.run }}"
    needs: setup
    runs-on: perf-runner
    steps:
      - name: Run TPC-C benchmark
        uses: stroppy-io/stroppy-action@main
        with:
          preset: tpcc
          driver-url: postgres://${{ env.PGUSER }}@localhost:5432/postgres?sslmode=disable
          artifact-name: perf-results-head-${{ matrix.warehouses }}W-${{ matrix.run }}
          k6-args: '--out opentelemetry'
          stroppy-args: ${{ env.DB_CACHED == 'true' && '--steps workload' || '' }}
    ...
  compare:
    name: Compare results
    ...
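The excerpt above omits the trigger block. Since the matrix step reads inputs.bench_runs and inputs.warehouses, the workflow presumably exposes manual inputs alongside the PR trigger. A minimal sketch, with the input names taken from the excerpt and the descriptions and defaults assumed:

on:
  pull_request:
  workflow_dispatch:
    inputs:
      bench_runs:
        description: Benchmark repetitions per warehouse count  # wording assumed
        default: '1'
      warehouses:
        description: Comma-separated list of warehouse counts  # wording assumed
        default: '200'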

The pipeline benchmarks base and head, compares results, and posts a comment with the outcome.
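The compare job itself is elided above. Here is a hypothetical sketch of what such a job can look like, assuming each benchmark artifact contains a summary.json with a tpmC value; the real workflow's artifact layout, metric names, and threshold may differ:

compare:
  name: Compare results
  needs: [bench-base, bench-head]
  runs-on: ubuntu-latest
  steps:
    # v4 downloads every artifact of the run into its own subdirectory.
    - uses: actions/download-artifact@v4
      with:
        path: results
    - name: Flag regressions
      env:
        THRESHOLD_PCT: '5'  # assumed tolerance
      run: |
        python3 - <<'EOF'
        import json, os, pathlib, statistics, sys

        def median_tpmc(prefix):
            # Aggregate the repeated runs to damp run-to-run noise.
            files = pathlib.Path('results').glob(f'perf-results-{prefix}-*/summary.json')
            return statistics.median(json.loads(f.read_text())['tpmC'] for f in files)

        base, head = median_tpmc('base'), median_tpmc('head')
        delta = (head - base) / base * 100
        print(f'base={base:.0f} head={head:.0f} delta={delta:+.1f}%')
        if delta < -float(os.environ['THRESHOLD_PCT']):
            sys.exit('performance regression detected')
        EOF

The real job additionally posts the outcome back to the PR, as in the comment screenshot below.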

Example workflow with performance tests running

Full source: orioledb/perf-test.yml.

Example PR comment with performance testing results

Runners

Performance testing is useless if the results aren't stable. We use two identical dedicated self-hosted runners to guarantee identical conditions for every run.

Self-hosted perf-runners in GitHub Actions

Monitoring

For each runner, we collect everything (a scrape config sketch follows the list):

  • Full node_exporter and postgres_exporter metric sets
  • K6 and Stroppy metrics via OTEL
  • PostgreSQL logs
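The exporter metrics are in Prometheus format, so any Prometheus-compatible collector can scrape them. A minimal prometheus.yml sketch, assuming the exporters' default ports (9100 and 9187) and hypothetical runner hostnames; k6 and Stroppy metrics arrive separately through the OTLP output enabled in the workflow:

scrape_configs:
  - job_name: node
    static_configs:
      - targets: ['perf-runner-1:9100', 'perf-runner-2:9100']
  - job_name: postgres
    static_configs:
      - targets: ['perf-runner-1:9187', 'perf-runner-2:9187']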

This lets us see how metrics change during tests and dig into regressions when they happen.

Stroppy benchmark dashboard in Grafana

Node exporter dashboard: CPU, memory, disk, and network during the benchmark

Keeping It Fast

Small datasets don't give you meaningful performance data. We settled on the following parameters (see the sketch after the list for how they map onto the action):

  • 200 warehouses (~15GB)
  • 25 minutes test duration
  • 99 virtual users
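In stroppy-action terms, the duration and VU count would travel through k6-args (--vus and --duration are standard k6 flags); the warehouses flag on the Stroppy side is a guess here, since the real workflow drives the warehouse count through the matrix shown earlier:

with:
  preset: tpcc
  k6-args: '--vus 99 --duration 25m --out opentelemetry'
  stroppy-args: '--warehouses 200'  # hypothetical flag name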

Loading 200 warehouses takes ~30 minutes, so we cache the loaded data on disk, which cuts load time to ~5 minutes. With parallel execution on two machines, the full pipeline runs in ~30 minutes.
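The workflow excerpt already hints at the wiring: when DB_CACHED is true, stroppy-args becomes --steps workload so Stroppy skips the load phase. A hypothetical sketch of the step that sets the flag; the snapshot path is made up, and on dedicated runners a persistent local directory is all it takes:

- name: Check for cached TPC-C data
  run: |
    # Detect a previously loaded 200-warehouse snapshot on local disk
    # and expose the result to later steps.
    if [ -d /mnt/cache/tpcc-200w ]; then
      echo "DB_CACHED=true" >> "$GITHUB_ENV"
    else
      echo "DB_CACHED=false" >> "$GITHUB_ENV"
    fi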

Try It Yourself