API Reference
xmldiffreport is a small, typed library. The high-level entry point is
diff:
from xmldiffreport import diff
result = diff(["old.xml", "new.xml"], recipe="sitemap")  # one file, several, or dir(s)
print(result.render())  # Markdown, or result.render("html")
result.units  # list[NodeDiff]: what differs
bool(result)  # True if anything differs (useful for exit codes)
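Because the result is truthy when something differs, it can be mapped directly to a process exit status. The following is a minimal sketch that assumes only the truthiness behaviour described above; `StubReport` and `exit_code` are hypothetical names, introduced so the example runs without the library installed.

```python
from dataclasses import dataclass, field


@dataclass
class StubReport:
    """Hypothetical stand-in for the object returned by diff()."""
    units: list = field(default_factory=list)

    def __bool__(self) -> bool:
        # Assumption: truthiness means "at least one differing unit".
        return bool(self.units)


def exit_code(result) -> int:
    """Return 1 when the report contains differences, 0 otherwise."""
    return 1 if result else 0
```

In a script, the real result would be passed instead of the stub, for example `sys.exit(exit_code(result))`.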
Low-level pieces are also re-exported: load_recipe, parse_xml,
gather_files, diff_sources, validate_recipe. To pick a format by name,
use the factory functions get_renderer / list_formats.
High level
xmldiffreport.diff(paths, recipe='generic')
Diff a file, multiple files, and/or directories.
paths is a path or an iterable of paths (files and/or directories;
directories are scanned recursively for *.xml). recipe is a built-in
recipe name, a path to a .toml, or an already-loaded recipe dict.
Returns a DiffReport; call .render() on it, or iterate over
.units.
Engine
xmldiffreport.core
N-way, recursive, recipe-driven structural diff engine.
Model
- Every XML element is a node with attributes, optional text, and children.
- A recipe declares, per tag: the key (natural identity), whether the tag is inline (its children become pseudo-attributes instead of opening a new level), and which attributes to ignore.
- N sources are compared at once, matching nodes by identity (order-independent). Only differences end up in the result.
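The model above can be illustrated with a small, self-contained sketch. It is not the engine's code: `RECIPE_KEYS`, `group_by_identity`, and `differing` are hypothetical names, the recipe is reduced to a tag-to-key-attribute mapping, and only the direct children of each root are compared.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, simplified recipe: tag -> attribute that acts as the key.
RECIPE_KEYS = {"job": "name"}


def group_by_identity(sources):
    """Group the direct children of each root by (tag, key value).

    sources is a list of (label, root_element) pairs.
    """
    groups = defaultdict(dict)
    for label, root in sources:
        for child in root:
            key_attr = RECIPE_KEYS.get(child.tag)
            ident = (child.tag, child.get(key_attr) if key_attr else None)
            groups[ident][label] = child
    return groups


def differing(groups, labels):
    """Keep identities that are missing somewhere or whose attributes differ."""
    out = {}
    for ident, by_label in groups.items():
        attrs = [dict(by_label[l].attrib) if l in by_label else None for l in labels]
        if any(a != attrs[0] for a in attrs[1:]):
            out[ident] = dict(zip(labels, attrs))
    return out


# Same two jobs in a different order; only job "a" changes its host.
old = ET.fromstring('<jobs><job name="a" host="x"/><job name="b" host="y"/></jobs>')
new = ET.fromstring('<jobs><job name="b" host="y"/><job name="a" host="z"/></jobs>')
sources = [("old.xml", old), ("new.xml", new)]
result = differing(group_by_identity(sources), ["old.xml", "new.xml"])
```

Because nodes are matched by key rather than by position, the reordering of the two jobs does not show up as a difference; only job "a" is kept.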
Performance
Each file is parsed once into an in-memory tree (ElementTree); the diff cost is roughly linear in the number of nodes. For typical Control-M exports (a few MB) it is instant, and it is fine up to the order of tens of MB. It is not designed for gigabyte-scale files — we deliberately favour simple, maintainable code over incremental/streaming parsing.
NodeDiff
dataclass
gather_files(paths)
Resolve files and/or directories into (label, path) sources.
A file is taken as-is; a directory is scanned recursively for *.xml.
The label is the file path — the engine knows nothing about "environments".
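The resolution rule described above can be sketched with pathlib alone. `gather_xml` is a hypothetical name; sorting the files found inside a directory is an assumption made here for reproducible output and is not something the documentation states.

```python
from pathlib import Path


def gather_xml(paths):
    """Resolve files and directories into (label, path) pairs."""
    if isinstance(paths, (str, Path)):
        paths = [paths]  # accept a single path as well as an iterable
    found = []
    for raw in paths:
        p = Path(raw)
        if p.is_dir():
            # Directories are scanned recursively for *.xml
            # (sorted here by assumption, for stable output).
            found.extend(sorted(p.rglob("*.xml")))
        else:
            # Files are taken as-is, whatever their extension.
            found.append(p)
    # The label is simply the path rendered as a string.
    return [(str(f), f) for f in found]
```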
load_recipe(name_or_path)
Load a built-in TOML recipe (by name) or one from a path.
parse_xml(path, strip_ns=True)
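The signature carries no description, so the following is only a guess at what a `strip_ns` flag typically means: dropping the `{uri}` prefix that ElementTree puts on namespaced tags. `parse_without_namespaces` is a hypothetical name, and this sketch leaves namespaced attribute names untouched.

```python
import io
import xml.etree.ElementTree as ET


def parse_without_namespaces(source, strip_ns=True):
    """Parse XML from a path or file object; optionally drop tag namespaces."""
    root = ET.parse(source).getroot()
    if strip_ns:
        for el in root.iter():
            # ElementTree stores qualified names as "{uri}local".
            if isinstance(el.tag, str) and el.tag.startswith("{"):
                el.tag = el.tag.split("}", 1)[1]
    return root


doc = io.BytesIO(b'<s:urlset xmlns:s="http://example.com/ns"><s:url/></s:urlset>')
root = parse_without_namespaces(doc)
```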
identity(recipe, tag, el)
Identity key of an element among its siblings, per the recipe (with fallbacks).
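One way to picture an identity lookup with fallbacks is sketched below. The recipe layout (`{tag: {"key": ...}}`) and both fallbacks are assumptions made for illustration; the documentation does not say which fallbacks the engine actually applies. `identity_of` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def identity_of(recipe, tag, el):
    """Return a hashable identity for el among its siblings."""
    key = recipe.get(tag, {}).get("key")  # hypothetical recipe layout
    if key and el.get(key) is not None:
        return (tag, el.get(key))
    # Fallback 1 (assumed): a conventional "name" or "id" attribute.
    for candidate in ("name", "id"):
        if el.get(candidate) is not None:
            return (tag, el.get(candidate))
    # Fallback 2 (assumed): the tag alone, so singleton children still match.
    return (tag, None)
```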
value_attrs(recipe, tag, el, ignore)
Comparable attributes of a leaf/inline node (excludes identity and volatile).
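As a rough illustration, filtering out the identity attribute and the ignored ones could look like this. `comparable_attrs` is a hypothetical name and the recipe layout is the same assumed `{tag: {"key": ...}}` shape; inline children, which the engine also treats as pseudo-attributes, are not handled.

```python
import xml.etree.ElementTree as ET


def comparable_attrs(recipe, tag, el, ignore):
    """Attributes of el worth comparing: all except the key and ignored names."""
    key = recipe.get(tag, {}).get("key")  # hypothetical recipe layout
    skip = set(ignore)
    if key:
        skip.add(key)
    return {name: value for name, value in el.attrib.items() if name not in skip}
```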
diff_group(recipe, tag, ident, nodes, ignore)
N-way diff of a group of nodes (one per source) sharing the same identity.
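An N-way comparison of one node can be sketched as follows. This is an assumption-laden simplification, not the engine's algorithm: `diff_attr_group` is a hypothetical name, nodes are reduced to plain attribute dicts, and a source where the node is absent is represented by `None`.

```python
def diff_attr_group(values_by_source):
    """Compare one node's attributes across N sources.

    values_by_source maps each source label to the node's attribute dict,
    or to None when the node is missing from that source.
    Returns {attribute: {label: value}} for attributes whose values are
    not identical in every source.
    """
    names = set()
    for attrs in values_by_source.values():
        names.update(attrs or {})
    changed = {}
    for name in sorted(names):
        per_source = {
            label: (attrs or {}).get(name)
            for label, attrs in values_by_source.items()
        }
        if len(set(per_source.values())) > 1:
            changed[name] = per_source
    return changed
```

A node that is absent from one source and has no attributes elsewhere would go unnoticed by this sketch; a fuller version would track presence separately.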
diff_sources(recipe, sources)
Diff N sources.
sources is a list of (label, root_element). Returns the list of
units (NodeDiff) that differ across two or more sources. The label is
just a display name (typically the file path) — no other meaning is attached.
validate_recipe(data)
Validate a parsed recipe dict; return a list of problems (empty = valid).
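The "list of problems" contract can be sketched as below. The recipe schema used here (per-tag tables with optional `key`, `inline`, and `ignore` entries) is inferred from the model description and is an assumption; the real schema and messages may differ. `check_recipe` is a hypothetical name.

```python
def check_recipe(data):
    """Return a list of human-readable problems; an empty list means valid."""
    if not isinstance(data, dict):
        return ["recipe must be a table (dict)"]
    problems = []
    for tag, rule in data.items():
        if not isinstance(rule, dict):
            problems.append(f"{tag}: rule must be a table")
            continue
        if "key" in rule and not isinstance(rule["key"], str):
            problems.append(f"{tag}: 'key' must be a string")
        if "inline" in rule and not isinstance(rule["inline"], bool):
            problems.append(f"{tag}: 'inline' must be a boolean")
        ignore = rule.get("ignore", [])
        if not (isinstance(ignore, list) and all(isinstance(i, str) for i in ignore)):
            problems.append(f"{tag}: 'ignore' must be a list of strings")
    return problems
```

Returning all problems at once, rather than raising on the first, lets a caller show the full list to whoever is editing the recipe.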
Report
xmldiffreport.report.base
Renderer strategy + registry (factory) for diff reports.
Adding a new output format is a single class:
from .base import DiffReport, Renderer, register

@register
class JsonRenderer(Renderer):
    format = "json"
    file_extension = "json"

    def render(self, report: DiffReport) -> str:
        ...
DiffReport
dataclass
The result of a diff and everything a renderer needs to format it.
units are the NodeDiff objects that differ; sources are the
labels (file paths) that were compared. source_display maps each label
to the compact name shown in the report.
Renderer
Bases: ABC
Strategy: turn a DiffReport into a document string.
register(cls)
Class decorator that registers a renderer by its format name.
get_renderer(fmt)
Instantiate the renderer registered for fmt (the factory).
list_formats()
All registered format names, sorted.
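The strategy-plus-registry pattern behind register, get_renderer, and list_formats can be sketched in a few lines. The names mirror the documented ones for readability, but the bodies are assumptions and not the library's source; in particular, raising `ValueError` for an unknown format and the `TextRenderer` class are hypothetical.

```python
from abc import ABC, abstractmethod

_REGISTRY = {}  # format name -> renderer class


class Renderer(ABC):
    """Strategy interface: one subclass per output format."""
    format = ""

    @abstractmethod
    def render(self, report) -> str:
        ...


def register(cls):
    """Class decorator: store the class under its format name and return it."""
    _REGISTRY[cls.format] = cls
    return cls


def get_renderer(fmt):
    """Instantiate the renderer registered for fmt."""
    try:
        return _REGISTRY[fmt]()
    except KeyError:
        # Assumed error type; the library may signal this differently.
        raise ValueError(f"unknown format {fmt!r}; known: {list_formats()}") from None


def list_formats():
    """All registered format names, sorted."""
    return sorted(_REGISTRY)


@register
class TextRenderer(Renderer):
    """Hypothetical renderer that only reports a count."""
    format = "text"

    def render(self, report) -> str:
        return f"{len(report)} differing unit(s)"
```

With this shape, supporting a new format only requires defining and decorating one class; callers that select a renderer by name do not change.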