Version: 5.1.2

TPC-H

TPC-H is Stroppy's analytical workload. It bulk-loads the eight TPC-H tables and runs the 22 decision-support queries once.

It simulates a decision-support system that scans orders, line items, customers, suppliers, parts, and geography data to answer business questions. The workload is read-heavy after loading and stresses query planning, joins, aggregation, sorting, filtering, and date/range predicates.

Use TPC-H to evaluate:

Bulk load throughput for large relational datasets.
Scan bandwidth and predicate filtering efficiency.
Join planning and join execution across fact and dimension tables.
Aggregation, grouping, sorting, and top-N performance.
Index usefulness for analytical SQL.
Memory pressure, temporary files, and spill behavior on large queries.
Planner stability across a varied 22-query suite.

Script

TPC-H has one script: tpch/tx. It supports PostgreSQL, MySQL, Picodata, and YDB through dialect-specific SQL files.

stroppy run tpch/tx -d pg -D url=postgres://user:pass@host:5432/bench -e scale_factor=0.01
stroppy run tpch/tx -d mysql -D url=root:pass@tcp(host:3306)/bench -e scale_factor=0.01
stroppy run tpch/tx -d pg -e scale_factor=1

Use SCALE_FACTOR=0.01 for smoke tests. Use SCALE_FACTOR=1 when you want PostgreSQL answer validation against the embedded SF=1 reference answers.

Load Type

TPC-H is a bulk-load plus query-suite workload, not a sustained per-iteration transaction loop. The setup phase:

Creates schema.
Loads all tables through InsertSpec.
Creates indexes and finalizes derived totals.
Executes Q1 through Q22 once.
Validates answers only for PostgreSQL at SCALE_FACTOR=1.

The exported default() function is intentionally empty; the work happens in setup steps.

This makes TPC-H a single-pass analytical benchmark rather than a concurrency benchmark. Increase SCALE_FACTOR to increase data volume, and use database-side monitoring to inspect CPU, I/O, memory, temp-space, and parallel worker behavior during the query phase.

Data Model

For scale factor SF, Stroppy loads:

Table	Rows
`region`	`5`
`nation`	`25`
`part`	`floor(200,000 * SF)`, minimum 1
`supplier`	`floor(10,000 * SF)`, minimum 1
`partsupp`	`4 * part`
`customer`	`floor(150,000 * SF)`, minimum 1
`orders`	`floor(1,500,000 * SF)`, minimum 1
`lineitem`	Uniform 1..7 rows per order, about `4 * orders`

orders.o_orderkey follows the TPC-H sparse-key scheme. orders.o_totalprice is recomputed after lineitems are loaded because it depends on lineitem totals.

Steps

Step	Description
`drop_schema`	Removes existing TPC-H tables so the next run starts from a clean dataset.
`create_schema`	Executes the dialect DDL and creates `region`, `nation`, `part`, `supplier`, `partsupp`, `customer`, `orders`, and `lineitem`. PostgreSQL creates these as `UNLOGGED` first for faster bulk load.
`load_data`	Bulk-loads all eight tables through InsertSpec. The generated `orders.o_totalprice` is a placeholder at this point because the final value depends on lineitems that are loaded later.
`set_logged`	PostgreSQL-only durability and planner-prep step: changes loaded tables from `UNLOGGED` to `LOGGED` and runs `ANALYZE`. Other dialects leave this step empty or absent.
`create_indexes`	Creates query-support indexes from the selected SQL file, including join keys and date/order keys used by the 22 query templates.
`finalize_totals`	Recomputes `orders.o_totalprice` after `lineitem` exists using the TPC-H formula: sum of `l_extendedprice * (1 + l_tax) * (1 - l_discount)` per order. This must happen after load; Picodata keeps this as a no-op because its SQL layer does not support the required correlated update shape.
`queries`	Executes Q1 through Q22 once with pinned TPC-H parameters, logs per-query duration, and continues through the suite even if an individual query reports an error.
`validate_answers`	Runs correctness comparison only for PostgreSQL at `SCALE_FACTOR=1`, where embedded reference answers are available. Other scale factors and drivers log a skip instead of comparing incompatible result sets.
`workload`	Marks the k6 workload phase. TPC-H has an intentionally empty `default()` function, so no additional query loop runs after setup steps finish.

Examples:

stroppy run tpch/tx --steps drop_schema,create_schema,load_data
stroppy run tpch/tx --no-steps drop_schema,create_schema,load_data

SQL Files

When SQL_FILE is not set, tpch/tx chooses a dialect file from the active driver type:

Driver type	SQL file
`postgres`	`tpch/pg.sql`
`mysql`	`tpch/mysql.sql`
`picodata`	`tpch/pico.sql`
`ydb`	`tpch/ydb.sql`

Pass an explicit SQL file to force a variant:

stroppy run tpch/tx tpch/pg -d pg
stroppy run tpch/tx tpch/ydb -d ydb

Parameters

Workload parameters are passed as environment variables with -e KEY=VALUE. Keys are case-insensitive in the CLI.

Parameter	Default	Description
`POOL_SIZE`	`50`	Sets both `minConns` and `maxConns` for the primary driver pool.
`SCALE_FACTOR`	`1`	TPC-H scale factor. Must be a positive number. `0.01` is supported for smoke tests.
`LOAD_WORKERS`	`0`	Parallel InsertSpec workers during `load_data`. `0` lets the framework choose.
`SQL_FILE`	auto	SQL file path. Defaults to a dialect file based on `driverType`.
`TX_ISOLATION`	auto	Isolation override kept for driver symmetry. Defaults are `read_committed` for PostgreSQL/MySQL, `none` for Picodata, and `serializable` for YDB.
`TPCH_QUERY_WARN_MS`	`60000`	During answer validation, logs a warning when a query takes longer than this many milliseconds.

Common runner controls also apply: --steps, --no-steps, and -d/-D driver options.

Metrics

TPC-H does not define custom k6 Trend or Counter metrics per query. Query timings are printed to stdout as each query finishes:

[tpch] q1: ok in 1234ms

For PostgreSQL at SCALE_FACTOR=1, validate_answers prints a validation summary against answers_sf1.json, including OK, DIFF, SKIP, and ERROR rows per query.

Stroppy still emits common driver metrics during load and query execution:

Metric	Type	Description
`insert_duration`	Trend	InsertSpec/bulk-load duration samples. Useful for load throughput analysis.
`insert_error_rate`	Rate	Insert failures during data loading.
`run_query_count`	Counter	SQL statements executed by the driver.
`run_query_duration`	Trend	SQL execution latency samples. Query samples are tagged by the active step.
`run_query_error_rate`	Rate	Query execution failure rate.

Use the per-query stdout timings for Q1..Q22 latency comparisons. Use the common driver metrics for aggregate load/query behavior and error rates.

Thresholds

TPC-H does not define k6 thresholds by default. The run is not auto-failed by a slow query threshold.

TPCH_QUERY_WARN_MS is a warning limit used during answer validation. If a validation query takes longer than the configured number of milliseconds, Stroppy logs a slow-query warning; it does not by itself fail the run.

Answer validation is only enabled for PostgreSQL at SCALE_FACTOR=1. Mismatches and query errors are reported in the validation summary so you can compare correctness across runs and dialect changes.

Query Parameters

TPC-H query parameters are pinned in tpch/tx; they are not exposed as environment overrides.

Query	Pinned parameters
Q1	`delta=90`
Q2	`size=15`, `type=BRASS`, `region=EUROPE`
Q3	`segment=BUILDING`, `date=1995-03-15`
Q4	`date=1993-07-01`
Q5	`region=ASIA`, `date=1994-01-01`
Q6	`date=1994-01-01`, `discount=0.06`, `quantity=24`
Q7	`nation1=FRANCE`, `nation2=GERMANY`
Q8	`region=AMERICA`, `nation=BRAZIL`, `type=ECONOMY ANODIZED STEEL`
Q9	`color=green`
Q10	`date=1993-10-01`
Q11	`nation=GERMANY`, `fraction=0.0001 / SCALE_FACTOR`
Q12	`shipmode1=MAIL`, `shipmode2=SHIP`, `date=1994-01-01`
Q13	`word1=special`, `word2=requests`
Q14	`date=1995-09-01`
Q15	`date=1996-01-01`
Q16	`brand=Brand#45`, `type_prefix=MEDIUM POLISHED`, sizes `49,14,23,45,19,3,36,9`
Q17	`brand=Brand#23`, `container=MED BOX`
Q18	`quantity=300`
Q19	`brand1=Brand#12`, `brand2=Brand#23`, `brand3=Brand#34`, quantities `1,10,20`
Q20	`color=forest`, `nation=CANADA`, `date=1994-01-01`
Q21	`nation=SAUDI ARABIA`
Q22	country codes `13,31,23,29,30,18,17`

Script​

Load Type​

Data Model​

Steps​

SQL Files​

Parameters​

Metrics​

Thresholds​

Query Parameters​