Skip to main content
Version: 5.1.2

TPC-H

TPC-H is Stroppy's analytical workload. It bulk-loads the eight TPC-H tables and runs the 22 decision-support queries once.

It simulates a decision-support system that scans orders, line items, customers, suppliers, parts, and geography data to answer business questions. The workload is read-heavy after loading and stresses query planning, joins, aggregation, sorting, filtering, and date/range predicates.

Use TPC-H to evaluate:

  • Bulk load throughput for large relational datasets.
  • Scan bandwidth and predicate filtering efficiency.
  • Join planning and join execution across fact and dimension tables.
  • Aggregation, grouping, sorting, and top-N performance.
  • Index usefulness for analytical SQL.
  • Memory pressure, temporary files, and spill behavior on large queries.
  • Planner stability across a varied 22-query suite.

Script

TPC-H has one script: tpch/tx. It supports PostgreSQL, MySQL, Picodata, and YDB through dialect-specific SQL files.

stroppy run tpch/tx -d pg -D url=postgres://user:pass@host:5432/bench -e scale_factor=0.01
stroppy run tpch/tx -d mysql -D url=root:pass@tcp(host:3306)/bench -e scale_factor=0.01
stroppy run tpch/tx -d pg -e scale_factor=1

Use SCALE_FACTOR=0.01 for smoke tests. Use SCALE_FACTOR=1 when you want PostgreSQL answer validation against the embedded SF=1 reference answers.

Load Type

TPC-H is a bulk-load plus query-suite workload, not a sustained per-iteration transaction loop. The setup phase:

  1. Creates schema.
  2. Loads all tables through InsertSpec.
  3. Creates indexes and finalizes derived totals.
  4. Executes Q1 through Q22 once.
  5. Validates answers only for PostgreSQL at SCALE_FACTOR=1.

The exported default() function is intentionally empty; the work happens in setup steps.

This makes TPC-H a single-pass analytical benchmark rather than a concurrency benchmark. Increase SCALE_FACTOR to increase data volume, and use database-side monitoring to inspect CPU, I/O, memory, temp-space, and parallel worker behavior during the query phase.

Data Model

For scale factor SF, Stroppy loads:

TableRows
region5
nation25
partfloor(200,000 * SF), minimum 1
supplierfloor(10,000 * SF), minimum 1
partsupp4 * part
customerfloor(150,000 * SF), minimum 1
ordersfloor(1,500,000 * SF), minimum 1
lineitemUniform 1..7 rows per order, about 4 * orders

orders.o_orderkey follows the TPC-H sparse-key scheme. orders.o_totalprice is recomputed after lineitems are loaded because it depends on lineitem totals.

Steps

StepDescription
drop_schemaRemoves existing TPC-H tables so the next run starts from a clean dataset.
create_schemaExecutes the dialect DDL and creates region, nation, part, supplier, partsupp, customer, orders, and lineitem. PostgreSQL creates these as UNLOGGED first for faster bulk load.
load_dataBulk-loads all eight tables through InsertSpec. The generated orders.o_totalprice is a placeholder at this point because the final value depends on lineitems that are loaded later.
set_loggedPostgreSQL-only durability and planner-prep step: changes loaded tables from UNLOGGED to LOGGED and runs ANALYZE. Other dialects leave this step empty or absent.
create_indexesCreates query-support indexes from the selected SQL file, including join keys and date/order keys used by the 22 query templates.
finalize_totalsRecomputes orders.o_totalprice after lineitem exists using the TPC-H formula: sum of l_extendedprice * (1 + l_tax) * (1 - l_discount) per order. This must happen after load; Picodata keeps this as a no-op because its SQL layer does not support the required correlated update shape.
queriesExecutes Q1 through Q22 once with pinned TPC-H parameters, logs per-query duration, and continues through the suite even if an individual query reports an error.
validate_answersRuns correctness comparison only for PostgreSQL at SCALE_FACTOR=1, where embedded reference answers are available. Other scale factors and drivers log a skip instead of comparing incompatible result sets.
workloadMarks the k6 workload phase. TPC-H has an intentionally empty default() function, so no additional query loop runs after setup steps finish.

Examples:

stroppy run tpch/tx --steps drop_schema,create_schema,load_data
stroppy run tpch/tx --no-steps drop_schema,create_schema,load_data

SQL Files

When SQL_FILE is not set, tpch/tx chooses a dialect file from the active driver type:

Driver typeSQL file
postgrestpch/pg.sql
mysqltpch/mysql.sql
picodatatpch/pico.sql
ydbtpch/ydb.sql

Pass an explicit SQL file to force a variant:

stroppy run tpch/tx tpch/pg -d pg
stroppy run tpch/tx tpch/ydb -d ydb

Parameters

Workload parameters are passed as environment variables with -e KEY=VALUE. Keys are case-insensitive in the CLI.

ParameterDefaultDescription
POOL_SIZE50Sets both minConns and maxConns for the primary driver pool.
SCALE_FACTOR1TPC-H scale factor. Must be a positive number. 0.01 is supported for smoke tests.
LOAD_WORKERS0Parallel InsertSpec workers during load_data. 0 lets the framework choose.
SQL_FILEautoSQL file path. Defaults to a dialect file based on driverType.
TX_ISOLATIONautoIsolation override kept for driver symmetry. Defaults are read_committed for PostgreSQL/MySQL, none for Picodata, and serializable for YDB.
TPCH_QUERY_WARN_MS60000During answer validation, logs a warning when a query takes longer than this many milliseconds.

Common runner controls also apply: --steps, --no-steps, and -d/-D driver options.

Metrics

TPC-H does not define custom k6 Trend or Counter metrics per query. Query timings are printed to stdout as each query finishes:

[tpch] q1: ok in 1234ms

For PostgreSQL at SCALE_FACTOR=1, validate_answers prints a validation summary against answers_sf1.json, including OK, DIFF, SKIP, and ERROR rows per query.

Stroppy still emits common driver metrics during load and query execution:

MetricTypeDescription
insert_durationTrendInsertSpec/bulk-load duration samples. Useful for load throughput analysis.
insert_error_rateRateInsert failures during data loading.
run_query_countCounterSQL statements executed by the driver.
run_query_durationTrendSQL execution latency samples. Query samples are tagged by the active step.
run_query_error_rateRateQuery execution failure rate.

Use the per-query stdout timings for Q1..Q22 latency comparisons. Use the common driver metrics for aggregate load/query behavior and error rates.

Thresholds

TPC-H does not define k6 thresholds by default. The run is not auto-failed by a slow query threshold.

TPCH_QUERY_WARN_MS is a warning limit used during answer validation. If a validation query takes longer than the configured number of milliseconds, Stroppy logs a slow-query warning; it does not by itself fail the run.

Answer validation is only enabled for PostgreSQL at SCALE_FACTOR=1. Mismatches and query errors are reported in the validation summary so you can compare correctness across runs and dialect changes.

Query Parameters

TPC-H query parameters are pinned in tpch/tx; they are not exposed as environment overrides.

QueryPinned parameters
Q1delta=90
Q2size=15, type=BRASS, region=EUROPE
Q3segment=BUILDING, date=1995-03-15
Q4date=1993-07-01
Q5region=ASIA, date=1994-01-01
Q6date=1994-01-01, discount=0.06, quantity=24
Q7nation1=FRANCE, nation2=GERMANY
Q8region=AMERICA, nation=BRAZIL, type=ECONOMY ANODIZED STEEL
Q9color=green
Q10date=1993-10-01
Q11nation=GERMANY, fraction=0.0001 / SCALE_FACTOR
Q12shipmode1=MAIL, shipmode2=SHIP, date=1994-01-01
Q13word1=special, word2=requests
Q14date=1995-09-01
Q15date=1996-01-01
Q16brand=Brand#45, type_prefix=MEDIUM POLISHED, sizes 49,14,23,45,19,3,36,9
Q17brand=Brand#23, container=MED BOX
Q18quantity=300
Q19brand1=Brand#12, brand2=Brand#23, brand3=Brand#34, quantities 1,10,20
Q20color=forest, nation=CANADA, date=1994-01-01
Q21nation=SAUDI ARABIA
Q22country codes 13,31,23,29,30,18,17