Gregor XSLT

A compiler-oriented XSLT tool designed to make transformation pipelines faster, more predictable, and easier to debug at scale.

Gregor originated from a straightforward problem that turns up in any organization processing XML at volume: interpreted XSLT execution is too slow for batch workloads, too opaque for debugging, and too fragile for pipelines that need deterministic behavior across environments. In this document I explain what Gregor addresses, how it fits into transformation architectures, and where its design decisions reflect lessons from years of working with XSLT processors that were not built for the workloads they eventually inherited. For related context, the XSLT transformation workflows reference covers the broader landscape, and the performance results present comparative timing data.

The Problem Gregor Addresses

Most XSLT processors are interpreters. They read a stylesheet, parse it into an internal representation, and walk the input document tree evaluating template rules at runtime. This approach is flexible and standards-compliant. It is also slow when you need to run the same stylesheet against thousands of documents per hour.

The performance cost comes from repeated work. Each execution re-evaluates template match priorities, re-resolves XPath expressions, and re-allocates intermediate data structures. For a single document, the overhead is negligible. For batch processing, it accumulates.

Gregor takes a compiler approach. Instead of interpreting the stylesheet at runtime, it analyzes the template structure ahead of time and produces an optimized execution plan. Template matching decisions are resolved during compilation. XPath expressions are simplified where possible. The result is a transformation process that does less redundant work on each execution.

This is not a new idea. The XSLTC compiler explored similar territory by compiling stylesheets to Java bytecode. Gregor builds on that lineage with additional focus on debugging transparency and integration flexibility.

Architecture and Workflow

Gregor’s processing model has three stages:

Analysis. The input stylesheet is parsed and analyzed. Template match patterns are catalogued. Import precedence is resolved. Default template rules are identified. XPath expressions are examined for common optimization opportunities such as constant folding, dead branch elimination, and predicate simplification.

Compilation. The analyzed stylesheet is converted into an optimized execution plan. This is not source-to-source translation. The compilation output is an internal representation that Gregor’s runtime engine consumes directly. The compilation step is designed to be a one-time cost that amortizes over many transformation executions.

Execution. The compiled stylesheet is applied to input documents. Template matching uses the pre-resolved priority tables from the compilation step. XPath evaluation uses the simplified expressions. The execution engine is deliberately simpler than a full XSLT interpreter because the complexity has been pushed into the compilation step.
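To make the analysis-stage optimizations more concrete, here is a minimal sketch of constant folding over a toy expression tree. It is an illustration only, not Gregor's analyzer: the `Num`, `Path`, and `BinOp` node types and the `fold` function are hypothetical names, and the tree stands in for a parsed XPath expression rather than implementing XPath.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Num:
    value: float


@dataclass(frozen=True)
class Path:
    # A location path such as "price"; its value depends on the input document.
    text: str


@dataclass(frozen=True)
class BinOp:
    op: str  # one of "+", "-", "*"
    left: "Expr"
    right: "Expr"


Expr = Union[Num, Path, BinOp]

_OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}


def fold(expr: Expr) -> Expr:
    """Fold constant subexpressions bottom-up; leave document-dependent parts alone."""
    if isinstance(expr, BinOp):
        left, right = fold(expr.left), fold(expr.right)
        if isinstance(left, Num) and isinstance(right, Num):
            # Both operands are known at compile time, so compute the result now.
            return Num(_OPS[expr.op](left.value, right.value))
        return BinOp(expr.op, left, right)
    return expr


# Toy equivalent of the expression: price * (60 * 60)
expr = BinOp("*", Path("price"), BinOp("*", Num(60), Num(60)))
folded = fold(expr)
print(folded)
```

The constant subtree is evaluated once during compilation, so each execution only performs the multiplication that actually depends on the document.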

This separation matters for batch processing. You compile once, then execute as many times as needed with minimal per-document overhead. For interactive or single-document scenarios, the compilation cost may not be worthwhile. The benchmarks page discusses where the crossover point typically falls.
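The crossover point can be estimated with simple arithmetic. The sketch below assumes a fixed one-time compile cost and constant per-document times. The function name and the sample figures are illustrative assumptions, not measured Gregor numbers.

```python
def break_even_documents(compile_ms: int, interp_per_doc_ms: int, compiled_per_doc_ms: int):
    """Return the smallest document count at which compile-once is strictly faster.

    Returns None when compiled execution saves nothing per document.
    """
    saving = interp_per_doc_ms - compiled_per_doc_ms
    if saving <= 0:
        return None
    # Compiled total:    compile_ms + n * compiled_per_doc_ms
    # Interpreted total: n * interp_per_doc_ms
    # Compiled wins when n > compile_ms / saving.
    return compile_ms // saving + 1


# Hypothetical figures: 800 ms to compile, 20 ms vs 4 ms per document.
n = break_even_documents(800, 20, 4)
print(n)
```

With these assumed figures, the compile step pays for itself after a few dozen documents. With no per-document saving, it never does.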

Architecture Note: The compilation step is intentionally conservative. Gregor will not apply an optimization unless it can prove the result is semantically identical to interpreted execution. Correctness takes precedence over speed in every case.

Performance and Implementation Notes

Compilation produces the largest gains on stylesheets with complex template matching logic. When a stylesheet has dozens of template rules with overlapping match patterns, the interpreted approach must evaluate priority rules at each node during tree traversal. The compiled approach resolves these priorities once and produces direct dispatch logic.
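The idea of resolving priorities once can be sketched as a precomputed dispatch table. This is a simplified model, not Gregor's internal representation: patterns are limited to a plain element name or `*`, import precedence is ignored, and `Template`, `build_dispatch`, and `dispatch` are hypothetical names. The default priorities used (0 for a name test, -0.5 for `*`) follow the XSLT 1.0 conflict resolution rules.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Template:
    match: str                 # toy pattern: an element name or "*"
    priority: Optional[float]  # explicit priority attribute, if any
    position: int              # document order within the stylesheet
    label: str


def effective_priority(t: Template) -> float:
    if t.priority is not None:
        return t.priority
    # XSLT 1.0 default priorities: a plain name test is 0, "*" is -0.5.
    return -0.5 if t.match == "*" else 0.0


def rank(t: Template):
    # Higher priority wins; on a tie, prefer the later template (the
    # recovery behaviour XSLT 1.0 permits for conflicting rules).
    return (effective_priority(t), t.position)


def build_dispatch(templates):
    """Resolve the winning template per element name once, ahead of execution."""
    wildcard = [t for t in templates if t.match == "*"]
    names = {t.match for t in templates if t.match != "*"}
    table = {}
    for name in names:
        candidates = [t for t in templates if t.match == name] + wildcard
        table[name] = max(candidates, key=rank)
    fallback = max(wildcard, key=rank) if wildcard else None
    return table, fallback


def dispatch(table, fallback, element_name):
    # Execution-time lookup: one dictionary access, no priority evaluation.
    return table.get(element_name, fallback)


templates = [
    Template("*", None, 0, "generic"),
    Template("item", None, 1, "item-default"),
    Template("item", 2.0, 2, "item-special"),
    Template("*", 1.0, 3, "catch-all-high"),
    Template("note", None, 4, "note"),
]
table, fallback = build_dispatch(templates)
print(dispatch(table, fallback, "item").label)
```

All comparisons between overlapping rules happen in `build_dispatch`. The per-node work during execution is reduced to a lookup.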

For simple stylesheets with a handful of templates, the gain is modest. The per-document overhead of interpretation is already low, and compilation cannot reduce it by much. As a rough practical threshold: if your stylesheet has more than 15 template rules and processes documents larger than 20 KB, compilation will show measurable improvement. Below that threshold, the one-time cost of compilation may outweigh the gains.

Memory usage during compilation is higher than during interpretation because the analyzer must hold the complete stylesheet representation in memory along with the optimization structures. For extremely large stylesheets (thousands of template rules), this can be significant. Execution memory usage is comparable to interpreted processing.

Error handling deserves mention. When a compiled transform encounters an error, the diagnostic information maps back to the original stylesheet source locations. This was a deliberate design requirement. Debugging compiled transforms should not be harder than debugging interpreted ones.
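One way to picture this requirement is a compiled plan in which every instruction keeps a reference to the stylesheet location it came from. The sketch below is a toy model under that assumption. `SourceLoc`, `Instruction`, `TransformError`, `execute`, and the file name `invoice.xsl` are hypothetical and do not describe Gregor's actual data structures.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SourceLoc:
    stylesheet: str
    line: int


@dataclass(frozen=True)
class Instruction:
    run: Callable[[dict], str]  # compiled operation over a toy context
    origin: SourceLoc           # where it came from in the stylesheet


class TransformError(Exception):
    def __init__(self, message: str, origin: SourceLoc):
        super().__init__(f"{origin.stylesheet}:{origin.line}: {message}")
        self.origin = origin


def execute(plan, context):
    out = []
    for instr in plan:
        try:
            out.append(instr.run(context))
        except Exception as exc:
            # Report the stylesheet location, not an engine-internal one.
            raise TransformError(str(exc), instr.origin) from exc
    return "".join(out)


plan = [
    Instruction(lambda ctx: ctx["title"], SourceLoc("invoice.xsl", 12)),
    Instruction(lambda ctx: str(1 / ctx["count"]), SourceLoc("invoice.xsl", 27)),
]

try:
    execute(plan, {"title": "Q3", "count": 0})
except TransformError as err:
    message = str(err)
    failed_line = err.origin.line
    print(message)
```

Because the location travels with the compiled instruction, the error report points at the stylesheet line without any reverse mapping at failure time.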

Performance Data: In benchmark runs against medium-complexity stylesheets processing 50-100 KB documents, Gregor's compiled execution shows a 3x to 7x throughput improvement over interpreted execution with the same stylesheet. The range depends on template matching complexity and XPath expression depth. See the performance results for detailed data.

Integration Patterns

Gregor is designed to integrate into existing XML processing pipelines rather than replace them. Common integration patterns include:

Batch preprocessing. Compile the stylesheet once at pipeline startup. Feed documents through the compiled transform in a processing loop. This is the highest-throughput pattern and the primary use case.

Hybrid pipelines. Use Gregor for the performance-critical transformation stages and a standard XSLT interpreter for stages where flexibility matters more than speed. Schema validation, enrichment, and post-processing steps often do not need compiled performance.

Development and production split. Develop and debug stylesheets with an interpreted processor for immediate feedback. Deploy with Gregor’s compiled execution for production throughput. The semantic equivalence guarantee means stylesheets that work correctly in an interpreter will produce identical output through Gregor.
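For the development and production split, teams may want a regression check that compares interpreter output with compiled output for the same input. The sketch below uses Python's standard `xml.etree.ElementTree.canonicalize` (C14N 2.0, Python 3.8+) to compare two result documents. It assumes both results are already available as strings, and the `same_output` helper is a hypothetical name. Canonical comparison ignores attribute order and quoting style, so it is slightly looser than a byte-for-byte check.

```python
from xml.etree.ElementTree import canonicalize


def same_output(interpreted_xml: str, compiled_xml: str) -> bool:
    """Compare two result documents after C14N 2.0 canonicalization."""
    return canonicalize(interpreted_xml) == canonicalize(compiled_xml)


# Sample results standing in for the output of the two processors.
a = '<order id="7" status="open"><total>12.50</total></order>'
b = "<order status='open' id='7'><total>12.50</total></order>"
c = '<order id="7" status="open"><total>12.5</total></order>'

print(same_output(a, b))
print(same_output(a, c))
```

A check like this fits naturally into a test suite that runs each stylesheet through both processors on a fixed set of sample documents.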

Relationship to XSLTC

Gregor and XSLTC share a philosophical lineage. Both recognize that XSLT interpretation imposes unnecessary overhead for repetitive processing. The key differences are in scope and transparency.

XSLTC compiles to JVM bytecode, which means the compiled artifact is opaque. Debugging requires mapping bytecode execution back to stylesheet source, which is non-trivial. Gregor compiles to an internal representation that maintains explicit connections to the source stylesheet, making diagnostics more accessible.

XSLTC targets XSLT 1.0. Gregor’s design accommodates a wider range of XSLT features, though coverage of XSLT 2.0 and 3.0 constructs varies by release.

The XSLTC Story provides the historical and technical context for the compiler approach that Gregor extends.

Frequently Asked Questions

Does Gregor replace Saxon or Xalan?

No. Gregor is a compiled transformation engine that can complement interpreted processors. It is best suited for high-throughput batch scenarios where the same stylesheet is applied to many documents.

Which XSLT version does Gregor support?

Gregor's core compilation engine targets XSLT 1.0 patterns with selective support for 2.0 features. Full XSLT 3.0 support is outside the current scope. Check the documentation for specific feature coverage.

How does compilation time scale with stylesheet size?

Compilation time grows approximately linearly with the number of template rules. A stylesheet with 50 templates compiles in under a second on typical hardware. Stylesheets with several hundred templates may take a few seconds.

Can I use Gregor for single-document transformations?

You can, but the compilation overhead may not be justified. Gregor's advantage appears when the same compiled stylesheet processes many documents. For single-document use, an interpreted processor is more appropriate.
