API Reference

xmldiffreport is a small, typed library. The high-level entry point is diff:

from xmldiffreport import diff

result = diff(["old.xml", "new.xml"], recipe="sitemap")   # a file, files, or dir(s)
print(result.render())          # Markdown — or result.render("html")
result.units                    # list[NodeDiff] — what differs
bool(result)                    # True if anything differs (handy for exit codes)

Lower-level pieces are also re-exported: load_recipe, parse_xml, gather_files, diff_sources, validate_recipe. To pick a format by name, use the renderer factory get_renderer; list_formats returns the names that are registered.

High-level

xmldiffreport.diff(paths, recipe='generic')

Diff a file, multiple files, and/or directories.

paths is a path or an iterable of paths (files and/or directories; directories are scanned recursively for *.xml). recipe is a built-in recipe name, a path to a .toml file, or an already-loaded recipe dict. Returns a DiffReport: call .render() on it, or iterate over .units.

Engine

xmldiffreport.core

N-way, recursive, recipe-driven structural diff engine.

Model
  • Every XML element is a node with attributes, optional text, and children.
  • A recipe declares, per tag: the key (natural identity), whether the tag is inline (its children become pseudo-attributes instead of opening a new level), and which attributes to ignore.
  • N sources are compared at once, matching nodes by identity (order-independent). Only differences end up in the result.
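
The matching rule in the model above can be sketched with the standard library alone. This is a minimal illustration, not the engine's code: the KEYS mapping and the group_by_identity helper are hypothetical names, only direct children of the root are considered, and a node missing from one source is not reported.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical recipe fragment: tag -> attribute that acts as the natural key.
KEYS = {"job": "name"}

def group_by_identity(sources):
    """Group the children of each root by (tag, key value), ignoring order."""
    groups = defaultdict(dict)
    for label, root in sources:
        for child in root:
            key_attr = KEYS.get(child.tag)
            ident = (child.tag, child.get(key_attr) if key_attr else None)
            groups[ident][label] = child
    return groups

old = ET.fromstring('<jobs><job name="a" host="h1"/><job name="b" host="h2"/></jobs>')
new = ET.fromstring('<jobs><job name="b" host="h2"/><job name="a" host="h9"/></jobs>')

groups = group_by_identity([("old", old), ("new", new)])

# Keep only the identities whose attributes are not the same in every source.
changed = sorted(
    ident
    for ident, nodes in groups.items()
    if len({tuple(sorted(n.attrib.items())) for n in nodes.values()}) > 1
)
print(changed)  # [('job', 'a')]
```

Job "b" moved to a different position but is unchanged, so it does not appear; job "a" has a different host and is the only identity kept.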

Performance

Each file is parsed once into an in-memory tree (ElementTree); the diff cost is roughly linear in the number of nodes. For typical Control-M exports (a few MB) the diff is effectively instant, and inputs up to the order of tens of MB remain practical. It is not designed for gigabyte-scale files: the code deliberately favours simplicity and maintainability over incremental/streaming parsing.

NodeDiff dataclass

gather_files(paths)

Resolve files and/or directories into (label, path) sources.

A file is taken as-is; a directory is scanned recursively for *.xml. The label is the file path — the engine knows nothing about "environments".
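
The resolution rule described here can be approximated as follows. This is a sketch under stated assumptions: gather_xml_sources is a hypothetical stand-in, not the library's function, and both the sorted order and the acceptance of a single path are choices made for the example.

```python
import tempfile
from pathlib import Path

def gather_xml_sources(paths):
    """Hypothetical stand-in: expand files and directories to (label, path) pairs."""
    if isinstance(paths, (str, Path)):
        paths = [paths]                      # assumption: a single path is accepted too
    sources = []
    for raw in paths:
        p = Path(raw)
        if p.is_dir():
            files = sorted(p.rglob("*.xml"))  # recursive scan of the directory
        else:
            files = [p]                       # a file is taken as-is
        # The label is simply the file path rendered as a string.
        sources.extend((str(f), f) for f in files)
    return sources

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "nested").mkdir()
    (root / "a.xml").write_text("<r/>")
    (root / "nested" / "b.xml").write_text("<r/>")
    (root / "notes.txt").write_text("not XML, so it is skipped")
    found = gather_xml_sources(root)
    names = sorted(Path(label).name for label, _ in found)

print(names)  # ['a.xml', 'b.xml']
```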

load_recipe(name_or_path)

Load a built-in TOML recipe (by name) or one from a path.

parse_xml(path, strip_ns=True)
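
The reference does not describe strip_ns. Assuming it removes namespace qualifiers so that recipes can refer to plain tag names, the effect would resemble the sketch below; parse_without_namespaces is a hypothetical helper, and the real behaviour may differ.

```python
import io
import xml.etree.ElementTree as ET

def parse_without_namespaces(source):
    """Parse XML, then drop the '{uri}' prefix ElementTree adds to qualified names."""
    root = ET.parse(source).getroot()
    for el in root.iter():
        if isinstance(el.tag, str) and el.tag.startswith("{"):
            el.tag = el.tag.split("}", 1)[1]
        # Qualified attribute names carry the same '{uri}' prefix.
        for name in list(el.attrib):
            if name.startswith("{"):
                el.attrib[name.split("}", 1)[1]] = el.attrib.pop(name)
    return root

SAMPLE = b'<s:urlset xmlns:s="http://example.com/ns"><s:url s:id="1"/></s:urlset>'
root = parse_without_namespaces(io.BytesIO(SAMPLE))
print(root.tag, root[0].tag, root[0].attrib)  # urlset url {'id': '1'}
```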

identity(recipe, tag, el)

Identity key of an element among its siblings, per the recipe (with fallbacks).

value_attrs(recipe, tag, el, ignore)

Comparable attributes of a leaf/inline node (excludes identity and volatile).

diff_group(recipe, tag, ident, nodes, ignore)

N-way diff of a group of nodes (one per source) sharing the same identity.
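
To make the N-way comparison concrete, the sketch below compares the attributes of one identity across three sources. It is an illustration only: diff_attribute_group and its dict-based input are hypothetical, whereas the real function takes a recipe and element nodes.

```python
def diff_attribute_group(nodes, identity_attrs=(), ignore=()):
    """nodes maps a source label to the attribute dict of one identity.

    Returns {attribute: {label: value}} for every attribute whose value is not
    the same in all sources; an attribute absent from a source shows as None.
    """
    skip = set(identity_attrs) | set(ignore)
    names = sorted({name for attrs in nodes.values() for name in attrs} - skip)
    differing = {}
    for name in names:
        per_source = {label: attrs.get(name) for label, attrs in nodes.items()}
        if len(set(per_source.values())) > 1:
            differing[name] = per_source
    return differing

group = {
    "a.xml": {"name": "backup", "host": "h1", "modified": "2024-01-01"},
    "b.xml": {"name": "backup", "host": "h2", "modified": "2024-02-02"},
    "c.xml": {"name": "backup", "host": "h2", "modified": "2024-03-03", "owner": "ops"},
}
result = diff_attribute_group(group, identity_attrs=["name"], ignore=["modified"])
print(sorted(result))  # ['host', 'owner']
```

The identity attribute and the ignored attribute never appear in the result, even though modified differs in every source.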

diff_sources(recipe, sources)

Diff N sources.

sources is a list of (label, root_element). Returns the list of units (NodeDiff) that differ across two or more sources. The label is just a display name (typically the file path) — no other meaning is attached.

validate_recipe(data)

Validate a parsed recipe dict; return a list of problems (empty = valid).
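
Returning a list of problems, rather than raising on the first one, lets a caller report everything at once. The sketch below shows that pattern; the recipe layout (a "tags" table with key, inline and ignore entries) and the check_recipe name are assumptions made for illustration, not the library's schema.

```python
def check_recipe(data):
    """Hypothetical validator: collect every problem instead of raising."""
    problems = []
    tags = data.get("tags", {})
    if not isinstance(tags, dict):
        return ["'tags' must be a table"]
    for tag, spec in tags.items():
        if not isinstance(spec, dict):
            problems.append(f"{tag}: expected a table")
            continue
        if "key" in spec and not isinstance(spec["key"], str):
            problems.append(f"{tag}: 'key' must be a string")
        if not isinstance(spec.get("inline", False), bool):
            problems.append(f"{tag}: 'inline' must be a boolean")
        ignore = spec.get("ignore", [])
        if not isinstance(ignore, list) or not all(isinstance(a, str) for a in ignore):
            problems.append(f"{tag}: 'ignore' must be a list of strings")
    return problems

good = {"tags": {"job": {"key": "name", "ignore": ["modified"]}}}
bad = {"tags": {"job": {"key": 7, "inline": "yes"}}}

print(check_recipe(good))  # []
print(check_recipe(bad))   # ["job: 'key' must be a string", "job: 'inline' must be a boolean"]
```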

Report

xmldiffreport.report.base

Renderer strategy + registry (factory) for diff reports.

Adding a new output format is a single class:

from .base import DiffReport, Renderer, register

@register
class JsonRenderer(Renderer):
    format = "json"
    file_extension = "json"
    def render(self, report: DiffReport) -> str:
        ...

DiffReport dataclass

The result of a diff and everything a renderer needs to format it.

units are the NodeDiff objects that differ; sources are the labels (file paths) that were compared. source_display maps each label to the compact name shown in the report.

__bool__()

True if any unit differs (handy for exit codes).
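
The exit-code use can be sketched with a stand-in class. ReportSketch and exit_code are hypothetical names, and the convention of returning 1 when differences exist follows the Unix diff tool rather than anything stated in this reference.

```python
from dataclasses import dataclass, field

@dataclass
class ReportSketch:
    """Hypothetical stand-in for DiffReport that carries only the units."""
    units: list = field(default_factory=list)

    def __bool__(self):
        # Truthy only when at least one unit differs.
        return bool(self.units)

def exit_code(report):
    """Map a report to a process exit status: 0 = identical, 1 = differences."""
    return 1 if report else 0

print(exit_code(ReportSketch()))                  # 0
print(exit_code(ReportSketch(units=["a unit"])))  # 1
```

A script would then end with sys.exit(exit_code(result)).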

render(fmt='md')

Render this report in the given format (default Markdown).

Renderer

Bases: ABC

Strategy: turn a DiffReport into a document string.

register(cls)

Class decorator that registers a renderer by its format name.

get_renderer(fmt)

Instantiate the renderer registered for fmt (the factory).

list_formats()

All registered format names, sorted.
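
The way register, get_renderer and list_formats fit together can be shown with a stand-alone re-implementation of the pattern. This is a sketch, not the module's source: the _REGISTRY dict, the ValueError raised for unknown formats, the two example renderers, and the use of a plain list as the report are all assumptions.

```python
from abc import ABC, abstractmethod

_REGISTRY = {}  # format name -> renderer class (hypothetical storage)

class Renderer(ABC):
    format = ""

    @abstractmethod
    def render(self, report):
        """Turn a report into a document string."""

def register(cls):
    """Class decorator: file the renderer under its format name."""
    _REGISTRY[cls.format] = cls
    return cls

def list_formats():
    return sorted(_REGISTRY)

def get_renderer(fmt):
    """Factory: instantiate the renderer registered for fmt."""
    try:
        return _REGISTRY[fmt]()
    except KeyError:
        # Assumption: the error type for unknown formats is not documented.
        raise ValueError(f"unknown format {fmt!r}; known: {list_formats()}") from None

@register
class TextRenderer(Renderer):
    format = "txt"

    def render(self, report):
        return f"{len(report)} unit(s) differ"

@register
class CountRenderer(Renderer):
    format = "count"

    def render(self, report):
        return str(len(report))

print(list_formats())                          # ['count', 'txt']
print(get_renderer("txt").render(["a", "b"]))  # 2 unit(s) differ
```

Because registration happens at class-definition time, importing the module that defines a renderer is enough to make its format available.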