The benchmarks published on this site follow a specific methodology designed to produce numbers you can actually reason about. This note documents the approach so you can evaluate the results in context and apply similar practices to your own performance testing.

Test Environment

All benchmarks run on dedicated hardware. No shared cloud instances. No other workloads during test execution. The specific hardware configuration is documented alongside each result set, but the principle is consistent: eliminate environmental variance so that timing differences reflect processor and stylesheet behavior rather than resource contention.

JVM-based processors run on a pinned JDK version. We currently standardize on OpenJDK 17.0.9. Changing JDK versions can alter performance by 10-20%, so pinning is essential for reproducible results.

Warmup Protocol

JVM-based XSLT processors exhibit significantly different performance during initial executions compared to steady state. Class loading, JIT compilation, and memory allocation patterns all stabilize after a warmup period.

Our standard warmup protocol: run 50 transformation iterations and discard the results. Begin measurement on iteration 51. This is conservative. For most workloads, the JVM stabilizes within 20-30 iterations. The extra margin ensures that even complex stylesheet compilation and optimization are complete before measurement begins.
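The protocol can be sketched in Python (a minimal illustration; `run_transform` is a hypothetical stand-in for a single transform call with input and stylesheet already loaded in memory):

```python
import time

WARMUP_ITERATIONS = 50     # discarded; lets class loading and JIT compilation settle
MEASURED_ITERATIONS = 200  # measurement begins on iteration 51

def benchmark(run_transform):
    """Run the warmup protocol, then collect per-iteration wall-clock timings.

    `run_transform` is a hypothetical zero-argument callable that performs
    one transformation; I/O is excluded because inputs are pre-loaded.
    """
    for _ in range(WARMUP_ITERATIONS):
        run_transform()  # results discarded

    samples = []
    for _ in range(MEASURED_ITERATIONS):
        start = time.perf_counter()
        run_transform()
        samples.append(time.perf_counter() - start)
    return samples
```

The same harness shape works for the browser case by lowering `WARMUP_ITERATIONS` to 10.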

For browser-based benchmarks, warmup is shorter (10 iterations) because browser JavaScript engines stabilize faster for XSLT workloads.

Measurement and Reporting

Each measured run consists of 200 iterations. We record the transformation time for each iteration (wall clock time for the transform call only, excluding I/O).

We report three statistics:

  • Median: The middle value. Less sensitive to outliers than the mean.
  • P95: The 95th percentile. 95% of iterations completed within this time.
  • P99: The 99th percentile. Relevant for latency-sensitive applications where tail behavior matters.

We do not report mean values because garbage collection pauses and occasional JIT recompilation produce outliers that inflate the mean beyond what is representative of typical execution.
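A minimal sketch of how these statistics can be computed from the recorded samples. The nearest-rank percentile method shown here is an assumption for illustration; the exact interpolation method is not specified in this note:

```python
import math
import statistics

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample value such that
    at least p percent of samples are less than or equal to it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

def summarize(samples):
    """Report median, P95, and P99; the mean is deliberately omitted
    because GC pauses and JIT recompilation skew it upward."""
    return {
        "median": statistics.median(samples),
        "p95": percentile(samples, 95),
        "p99": percentile(samples, 99),
    }
```

For example, over samples of 1 through 100, `summarize` yields a median of 50.5, a P95 of 95, and a P99 of 99.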

Stylesheet Categories

Benchmarks use three stylesheet complexity categories to cover the range of real-world workloads:

Simple. An identity transform with minor element filtering. 5 template rules. Tests baseline processing overhead.

Moderate. Document restructuring with conditional output, namespace handling, sorting, and multiple output modes. 20 template rules. Representative of typical production workloads.

Complex. Multi-stage transformation with recursive processing, complex XPath predicates, grouping, and output formatting. 60+ template rules. Representative of demanding workloads like publishing or UBL invoice rendering.
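As an illustration of the Simple category's shape (not the exact stylesheet used in the benchmarks), an identity transform plus one filtering rule looks like this; the `internal-note` element name is a hypothetical example:

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Identity rule: copy every node and attribute through unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Filtering rule: drop a hypothetical internal-note element -->
  <xsl:template match="internal-note"/>
</xsl:stylesheet>
```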

Input Document Categories

Three size categories:

  • Small: 2-5 KB. Flat structure, minimal nesting.
  • Medium: 20-80 KB. Moderate nesting, mixed content.
  • Large: 500 KB-2 MB. Deep nesting, extensive namespace usage.

Input documents are synthetic but structurally representative of real document patterns encountered in XML processing workflows.
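A minimal sketch of how a synthetic input document in these size categories can be generated with the Python standard library. The element names, record counts, and nesting depths here are illustrative assumptions, not the generator behind these benchmarks:

```python
import xml.etree.ElementTree as ET

def make_document(records, depth):
    """Build a synthetic XML document: `records` sibling elements,
    each nested `depth` levels deep, with a small text payload."""
    root = ET.Element("catalog")
    for i in range(records):
        node = ET.SubElement(root, "record", id=str(i))
        for level in range(depth):
            node = ET.SubElement(node, "group", level=str(level))
        node.text = f"payload-{i}"
    return ET.ElementTree(root)

# Small category: flat structure, minimal nesting
small = make_document(records=20, depth=1)
# Large category: many records, deep nesting
large = make_document(records=500, depth=8)
```

Scaling `records` and `depth` (and adding namespace declarations) moves a document between the Small, Medium, and Large categories.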

Limitations

These benchmarks measure transformation engine performance in isolation. Real-world performance includes I/O, validation, serialization, and pipeline orchestration overhead that is not captured here. Use these numbers for engine comparison and optimization targeting, not for capacity planning.

For detailed results, see the performance results page. For interpretation guidance, see the benchmarks overview.