json_delta.py
This is the main “library” for json_delta’s functionality. If it gets
any bigger, I’ll probably split it out into at least four separate
modules. As it is, the functionality divides into four main headings:
diff, patch, udiff and upatch, plus entry points for the above,
utility functions common to all four, and basic shell functionality as
documented below.
The library is kept as one module for ease of distribution: if you
want to use it from another program, this file is all you need. The
full package includes three CLI scripts that offer more versatile
access to the functionality this module makes available. The
json_diff and json_patch scripts have syntax and options that mimic
standard diff(1) and patch(1) where it is appropriate to do so. Also,
there is json_cat, a script I wrote for testing purposes which may
prove useful.
Requires Python 2.7 or newer (including Python 3).
Main entry points
-
json_delta.diff(left_struc, right_struc, minimal=True, verbose=True, key=None)
Compose a sequence of diff stanzas sufficient to convert the
structure left_struc into the structure right_struc. (The
goal is to add ‘necessary and’ to ‘sufficient’ above!).
- Flags:
verbose: if this is set True (the default), a line of
compression statistics will be printed to stderr.
minimal: if True, the function will try harder to find
the diff that encodes as the shortest possible JSON string, at
the expense of a possible performance penalty (as alternatives
are computed and compared).
The parameter key is present because this function is mutually
recursive with needle_diff() and keyset_diff().
If set to a list, it will be prefixed to every keypath in the
output.
-
json_delta.patch(struc, diff, in_place=True)
Apply the sequence of diff stanzas diff to the structure
struc.
By default, this function modifies struc in place; set
in_place to False to return a patched copy of struc
instead.
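The difference between the two in_place modes can be sketched without the library. The helper below is hypothetical and is not the implementation of patch(). It assumes a stanza of the form [keypath, value] sets a node, a one-element stanza [keypath] deletes one, and setting the index one past the end of a list appends to it.

```python
import copy

def apply_stanzas_sketch(struc, diff, in_place=True):
    """Hypothetical stand-in for patch(), for illustration only.

    Assumes [keypath, value] sets a node and [keypath] deletes it.
    """
    if not in_place:
        struc = copy.deepcopy(struc)  # leave the caller's object untouched
    for stanza in diff:
        keypath = stanza[0]
        if not keypath:
            # An empty key-path addresses the whole structure; only
            # replacement (not deletion) of the root is modelled here.
            struc = stanza[1]
            continue
        parent = struc
        for key in keypath[:-1]:
            parent = parent[key]
        last = keypath[-1]
        if len(stanza) == 1:
            del parent[last]
        elif isinstance(parent, list) and last == len(parent):
            parent.append(stanza[1])  # assumed: one past the end appends
        else:
            parent[last] = stanza[1]
    return struc

original = {"foo": [1, 2], "bar": None}
stanzas = [[["foo", 2], 3], [["bar"]]]
copied = apply_stanzas_sketch(original, stanzas, in_place=False)
```

With in_place=False the caller's structure is left alone and only the returned copy carries the changes; with the default, the same object is mutated and returned.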
-
json_delta.udiff(left, right, patch=None, indent=0, entry=True)
Render the difference between the structures left and
right as a string in a fashion inspired by diff -u.
Generating a udiff is strictly slower than generating a normal
diff, since the udiff is computed on the basis of a normal diff
between left and right. If such a diff has already been
computed (e.g. by calling diff()), pass it as the
patch parameter.
-
json_delta.upatch(struc, udiff, reverse=False, in_place=True)
Apply a patch of the form output by udiff() to the structure
struc.
By default, this function modifies struc in place; set
in_place to False to return a patched copy of struc
instead.
Utility functions
-
json_delta.compact_json_dumps(obj)
Compute the most compact possible JSON representation of the
serializable object obj.
>>> test = {
... 'foo': 'bar',
... 'baz':
... ['quux', 'spam',
... 'eggs']
... }
>>> compact_json_dumps(test)
'{"foo":"bar","baz":["quux","spam","eggs"]}'
>>>
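A comparable result can be obtained from the standard library alone. This is a sketch under the assumption that compactness only means dropping optional whitespace; compact_dumps_sketch is a hypothetical name, and the library's own function may set further options.

```python
import json

def compact_dumps_sketch(obj):
    # separators=(',', ':') removes the spaces json.dumps adds by default
    # after item and key separators.
    return json.dumps(obj, separators=(",", ":"))

test = {"foo": "bar", "baz": ["quux", "spam", "eggs"]}
compact = compact_dumps_sketch(test)
```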
-
json_delta.all_paths(struc)
Generate key-paths to every node in struc.
Both terminal and non-terminal nodes are visited, like so:
>>> paths = [x for x in all_paths({'foo': None, 'bar': ['baz', 'quux']})]
>>> [] in paths # ([] is the path to struc itself.)
True
>>> ['foo'] in paths
True
>>> ['bar'] in paths
True
>>> ['bar', 0] in paths
True
>>> ['bar', 1] in paths
True
>>> len(paths)
5
-
json_delta.follow_path(struc, path)
Return the value found at the key-path path within struc.
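Following a key-path amounts to indexing one level at a time. The function below is a hypothetical sketch of that idea, not the library's code.

```python
def follow_path_sketch(struc, path):
    """Walk down struc one key at a time and return the node reached."""
    node = struc
    for key in path:
        node = node[key]  # dict lookup or list index, depending on node
    return node

doc = {"foo": None, "bar": ["baz", "quux"]}
second_bar = follow_path_sketch(doc, ["bar", 1])
```

The empty path [] leads back to the structure itself, which matches the convention shown for all_paths().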
-
json_delta.in_one_level(diff, key)
Return the subset of diff whose key-paths begin with
key, expressed relative to the structure at [key]
(i.e. with the first element of each key-path stripped off).
>>> diff = [ [['bar'], None],
... [['baz', 3], 'cheese'],
... [['baz', 4, 'quux'], 'foo'] ]
>>> in_one_level(diff, 'baz')
[[[3], 'cheese'], [[4, 'quux'], 'foo']]
-
json_delta.check_diff_structure(diff)
Return diff (or True) if it is structured as a sequence
of diff stanzas. Otherwise return False.
[] is a valid diff but is false in a Boolean context, so when it is
passed to this function the return value is True instead. The return
value is therefore always true in a Boolean context if diff is valid.
>>> check_diff_structure([])
True
>>> check_diff_structure([None])
False
>>> check_diff_structure([[["foo", 6, 12815316313, "bar"], None]])
[[['foo', 6, 12815316313, 'bar'], None]]
>>> check_diff_structure([[["foo", 6, 12815316313, "bar"], None],
... [["foo", False], True]])
False
Functions for computing normal diffs.
-
json_delta.commonality(left_struc, right_struc)
Return a float between 0.0 and 1.0 representing the amount
that the structures left_struc and right_struc have in common.
If left_struc and right_struc are of the same type, this is
computed as the fraction (elements in common) / (total elements).
Otherwise, commonality is 0.0.
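The description leaves open how "elements in common" are counted. The sketch below shows one plausible reading and should not be taken as the library's method: for dicts it counts shared keys over all keys, for lists it counts left elements that also occur in right over the longer length, and it treats terminals as having nothing in common. The name commonality_sketch is hypothetical.

```python
def commonality_sketch(left, right):
    """One possible reading of (elements in common) / (total elements)."""
    if type(left) is not type(right):
        return 0.0
    if isinstance(left, dict):
        shared = set(left) & set(right)
        total = set(left) | set(right)
        return float(len(shared)) / len(total) if total else 0.0
    if isinstance(left, list):
        shared = sum(1 for elem in left if elem in right)
        total = max(len(left), len(right))
        return float(shared) / total if total else 0.0
    return 0.0  # assumption: terminals (strings, numbers, None) score 0.0

half = commonality_sketch([1, 2, 3, 4], [1, 2])
third = commonality_sketch({"a": 1, "b": 2}, {"b": 3, "c": 4})
```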
-
json_delta.compute_keysets(left_seq, right_seq)
Compare the keys of left_seq vs. right_seq.
Determines which keys left_seq and right_seq have in
common, and which are unique to each of the structures. Arguments
should be instances of the same basic type, which must be a
non-terminal: i.e. list or dict. If they are lists, the keys
compared will be integer indices.
- Returns:
- Return value is a 3-tuple of sets ({overlap}, {left_only},
{right_only}). As their names suggest, overlap is the set
of keys left_seq and right_seq have in common, left_only holds
keys only found in left_seq, and right_only holds keys
only found in right_seq.
- Raises:
- AssertionError if left_seq is not an instance of
type(right_seq), or if they are not of a non-terminal
type.
>>> compute_keysets({'foo': None}, {'bar': None})
(set([]), set(['foo']), set(['bar']))
>>> compute_keysets({'foo': None, 'baz': None}, {'bar': None, 'baz': None})
(set(['baz']), set(['foo']), set(['bar']))
>>> compute_keysets(['foo', 'baz'], ['bar', 'baz'])
(set([0, 1]), set([]), set([]))
>>> compute_keysets(['foo'], ['bar', 'baz'])
(set([0]), set([]), set([1]))
>>> compute_keysets([], ['bar', 'baz'])
(set([]), set([]), set([0, 1]))
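The same three sets can be derived with ordinary set operations. This is a hypothetical sketch rather than the library's code. Note that the doctests above show Python 2 set reprs; under Python 3 the same values print as, for example, ({0}, set(), {1}).

```python
def compute_keysets_sketch(left_seq, right_seq):
    """Return (overlap, left_only, right_only) for two dicts or two lists."""
    assert isinstance(left_seq, type(right_seq))
    assert isinstance(left_seq, (list, dict))
    if isinstance(left_seq, dict):
        left_keys, right_keys = set(left_seq), set(right_seq)
    else:
        # For lists, the keys are the integer indices.
        left_keys = set(range(len(left_seq)))
        right_keys = set(range(len(right_seq)))
    return (left_keys & right_keys,
            left_keys - right_keys,
            right_keys - left_keys)

keysets = compute_keysets_sketch(["foo"], ["bar", "baz"])
```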
-
json_delta.keyset_diff(left_struc, right_struc, key, minimal=True)
Return a diff between left_struc and right_struc.
It is assumed that left_struc and right_struc are both
non-terminal types (serializable as arrays or objects). Sequences
are treated just like mappings by this function, so the diffs will
be correct but not necessarily minimal. For a minimal diff
between two sequences, use needle_diff().
This function probably shouldn’t be called directly. Instead, use
diff(), which will call keyset_diff() if appropriate
anyway.
-
json_delta.needle_penalty(left, right)
Penalty function for Needleman-Wunsch alignment.
-
json_delta.needle_diff(left_struc, right_struc, key, minimal=True)
Returns a diff between left_struc and right_struc.
If left_struc and right_struc are both serializable as
arrays, this function will use Needleman-Wunsch sequence alignment
to find a minimal diff between them. Otherwise, returns the same
result as keyset_diff().
This function probably shouldn’t be called directly. Instead, use
diff(), which will call needle_diff() if appropriate
anyway.
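For readers unfamiliar with Needleman-Wunsch alignment, a minimal textbook version for two lists looks roughly like this. The scoring (match +1, mismatch -1, gap -1) is an assumption chosen for illustration; the penalties used by needle_penalty() are not described here, and needleman_wunsch_sketch is a hypothetical name.

```python
def needleman_wunsch_sketch(left, right, match=1, mismatch=-1, gap=-1):
    """Globally align two lists; None marks a gap in the returned pairs.

    (A list that itself contains None would need a different gap marker.)
    """
    rows, cols = len(left) + 1, len(right) + 1
    # score[i][j] is the best score for aligning left[:i] with right[:j].
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if left[i - 1] == right[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + step,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one alignment.
    pairs = []
    i, j = len(left), len(right)
    while i > 0 or j > 0:
        step = None
        if i > 0 and j > 0:
            step = match if left[i - 1] == right[j - 1] else mismatch
        if step is not None and score[i][j] == score[i - 1][j - 1] + step:
            pairs.append((left[i - 1], right[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((left[i - 1], None))   # element only in left
            i -= 1
        else:
            pairs.append((None, right[j - 1]))  # element only in right
            j -= 1
    pairs.reverse()
    return pairs

alignment = needleman_wunsch_sketch(["foo", "bar", "baz"], ["foo", "baz"])
```

Pairs with a None on the right correspond to deletions and pairs with a None on the left to insertions, which is the information a sequence diff needs.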
-
json_delta.this_level_diff(left_struc, right_struc, key=None, common=None)
Return a sequence of diff stanzas between the structures
left_struc and right_struc, assuming that they are each at the
key-path key within the overall structure.
-
json_delta.split_deletions(diff)
Return a tuple of length 3, of which the first element is a
list of stanzas from diff that modify objects (dicts when
deserialized), the second is a list of stanzas that add or change
elements in sequences, and the third is a list of stanzas which
delete elements from sequences.
-
json_delta.sort_stanzas(diff)
Sort the stanzas in diff: node changes can occur in any
order, but deletions from sequences have to happen
last-node-first: ['foo', 'bar', 'baz'] -> ['foo', 'bar'] -> ['foo'] ->
[], and additions to sequences have to happen
leftmost-node-first: [] -> ['foo'] -> ['foo', 'bar'] ->
['foo', 'bar', 'baz'].
Note that this will also sort changes to objects (dicts) so that
they occur first of all, then modifications/additions on
sequences, followed by deletions from sequences.
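Why deletions must run from the end of a sequence backwards can be seen with plain lists. The helper below is purely illustrative and uses no json_delta code: the indices refer to positions in the original list, so deleting a lower index first shifts everything after it.

```python
def delete_indices(seq, indices):
    """Delete the given positions from a copy of seq, in the order given."""
    result = list(seq)
    for index in indices:
        del result[index]
    return result

items = ["a", "b", "c", "d", "e"]
# Goal: remove the elements originally at positions 1 ("b") and 3 ("d").
last_first = delete_indices(items, [3, 1])   # highest index first
first_first = delete_indices(items, [1, 3])  # lowest index first
```

Only the last-first order removes the intended elements; in the other order the second deletion hits "e", because "d" has already moved down to position 2.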
Functions for applying normal patches
-
json_delta.patch_stanza(struc, diff)
Applies the diff stanza diff to the structure struc as
a patch.
Note that this function modifies struc in place, turning it into
the target of diff.
Functions for computing udiffs.
The data structure representing a udiff that these functions all
manipulate is a pair of lists of iterators (left_lines,
right_lines). These lists are expected (principally by
generate_udiff_lines(), which processes them) to be of the
same length. A pair of iterators (left_lines[i], right_lines[i])
may yield exactly the same sequence of output lines, each with ' '
as the first character (representing parts of the structure the input
and output have in common). Alternatively, they may each yield zero
or more lines (referring to parts of the structure that are unique to
the inputs they represent). In this case, all lines yielded by
left_lines[i] should begin with '-', and all lines yielded by
right_lines[i] should begin with '+'.
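To make the shape of this data structure concrete, here is a hypothetical consumer of such a pair of lists. It is not generate_udiff_lines(); in particular, emitting the '-' lines before the '+' lines of a differing pair is an assumption borrowed from diff -u.

```python
def merge_bands_sketch(left_lines, right_lines):
    """Yield udiff lines from two equal-length lists of line iterators."""
    assert len(left_lines) == len(right_lines)
    for left_band, right_band in zip(left_lines, right_lines):
        left_band, right_band = list(left_band), list(right_band)
        if left_band == right_band:
            # Shared material: both sides yield the same ' ' lines,
            # so they are emitted once.
            for line in left_band:
                yield line
        else:
            # Differing material: '-' lines, then '+' lines (assumed order).
            for line in left_band:
                yield line
            for line in right_band:
                yield line

left = [iter([" {"]), iter(['-  "a": 1']), iter([" }"])]
right = [iter([" {"]), iter(['+  "a": 2']), iter([" }"])]
merged = list(merge_bands_sketch(left, right))
```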
-
json_delta.patch_bands(indent, material, sigil=u' ')
Generate appropriately indented patch bands, with sigil as
the first character.
-
json_delta.single_patch_band(indent, line, sigil=u' ')
Convenience function returning an iterable that generates a
single patch band.
-
json_delta.commafy(gen, comma=True)
Yields every result of gen, with a comma concatenated onto
the final result iff comma is True.
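Appending to only the final result of a generator needs one element of lookahead. The sketch below shows one way to do that, assuming the results are strings; commafy_sketch is a hypothetical name and the library's implementation may differ.

```python
def commafy_sketch(gen, comma=True):
    """Yield everything from gen, adding ',' to the last item if comma."""
    iterator = iter(gen)
    try:
        held = next(iterator)
    except StopIteration:
        return  # gen was empty, so there is nothing to yield
    for item in iterator:
        yield held   # held is not the last item, pass it through as is
        held = item
    yield (held + ",") if comma else held

with_comma = list(commafy_sketch(iter(['"foo": 1', '"bar": 2'])))
```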
-
json_delta.add_matter(seq, matter, indent)
Add matter to seq, treating it appropriately for its
type.
matter may be an iterator, in which case it is appended to
seq. If it is a sequence, it is assumed to be a sequence of
iterators, and the whole sequence is concatenated onto seq. If
matter is a string, it is turned into a patch band using
single_patch_band(), which is appended. Finally, if
matter is None, an empty iterable is appended to seq.
This function is a udiff-forming primitive, called by more
specific functions defined within udiff_dict() and
udiff_list().
-
json_delta.udiff_dict(left, right, diff, indent=0)
Construct a pair of iterator sequences representing diff as
a delta between dictionaries left and right.
This function probably shouldn’t be called directly. Instead, use
udiff() with the same arguments. udiff()
and udiff_dict() are mutually recursive, anyway.
-
json_delta.reconstruct_alignment(left, right, diff)
Reconstruct the sequence alignment between the lists left
and right implied by diff.
-
json_delta.udiff_list(left, right, diff, indent=0)
Construct a pair of iterator sequences representing diff as
a delta between the lists left and right.
This function probably shouldn’t be called directly. Instead, use
udiff() with the same arguments. udiff()
and udiff_list() are mutually recursive, anyway.
-
json_delta.generate_udiff_lines(left, right)
Combine the diff lines from left and right, and
generate the lines of the resulting udiff.
Functions for applying udiffs as patches.
-
json_delta.extended_json_decoder()
Return a decoder for the superset of JSON understood by the
upatch function.
The exact nature of this superset is documented in the manpage for
json_patch(1). Briefly, the string ... is accepted where
standard JSON expects a {<property name>}: {<object>}
construction, and the string ..., optionally followed by a
number in parentheses, is accepted where standard JSON expects an
array element.
The superset is implemented by parsing ... in JSON objects as
a key/value pair Ellipsis: True in the resulting dictionary,
and ...({num}){?} as a subsequence of {num} Ellipsis
objects, or one Ellipsis object if ({num}) is not present.
Examples:
>>> dec = extended_json_decoder()
>>> (dec.decode('{"foo": "bar",...,"baz": "quux"}') ==
... {"foo": "bar", "baz": "quux", Ellipsis: True})
True
>>> dec.decode('[...]')
[Ellipsis]
>>> dec.decode('["foo",...(3),"bar"]')
[u'foo', Ellipsis, Ellipsis, Ellipsis, u'bar']
-
json_delta.fill_in_ellipses(struc, left_part, right_part)
Replace Ellipsis objects parsed out of a udiff with elements
from struc.
-
json_delta.split_udiff_parts(udiff)
Split udiff into parts representing subsets of the left and
right structures that were used to create it.
The return value is a 2-tuple (left_json, right_json). If
udiff is a string representing a valid udiff, then each member of
the tuple will be a structure formatted in the superset of JSON that
can be interpreted using extended_json_decoder().
The function itself works by stripping off the first character of
every line in the udiff, and composing left_json out of those
lines which begin with ' ' or '-', and right_json out
of those lines which begin with ' ' or '+'.
>>> udiff = """--- <stdin>[0]
... +++ <stdin>[1]
... {
... "foo": "bar"
... ...
... "baz":
... - "quux",
... + "quordle",
...
... - "bar": null
...
... + "quux": false
... }"""
>>> left_json = """{
... "foo": "bar"
... ...
... "baz":
... "quux",
...
... "bar": null
...
... }"""
>>> right_json = """{
... "foo": "bar"
... ...
... "baz":
... "quordle",
...
...
... "quux": false
... }"""
>>> split_udiff_parts(udiff) == (left_json, right_json)
True
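The line-splitting rule described above can be sketched in a few lines. This is a hypothetical helper, not split_udiff_parts() itself: it assumes that header lines are exactly those beginning with --- or +++ and that they are dropped, and it ignores lines with no sigil at all.

```python
import json

def split_udiff_sketch(udiff):
    """Split a udiff string into (left_text, right_text) by sigil."""
    left, right = [], []
    for line in udiff.split("\n"):
        if line.startswith("---") or line.startswith("+++"):
            continue  # assumed header lines naming the two inputs
        sigil, rest = line[:1], line[1:]
        if sigil in (" ", "-"):
            left.append(rest)
        if sigil in (" ", "+"):
            right.append(rest)
    return "\n".join(left), "\n".join(right)

sample = "\n".join([
    "--- left",
    "+++ right",
    " {",
    '-  "baz": "quux"',
    '+  "baz": "quordle"',
    " }",
])
left_text, right_text = split_udiff_sketch(sample)
left_struc = json.loads(left_text)
right_struc = json.loads(right_text)
```

This small sample contains no ... elisions, so plain json.loads can read both halves; a real udiff would generally need the extended decoder.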
Basic script functionality
json_delta.py can be run as a script. The usage syntax is
json_delta.py (diff | patch) [-u], and the script then runs as a
filter: standard input is expected to contain a JSON structure
[<arg1>, <arg2>]. The two arguments are fed to one of the
functions diff(), patch(), udiff() or
upatch(), as determined by the command-line arguments, and
the result is written to standard output.
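The input the filter expects is a single JSON array holding both arguments. A sketch of preparing that input for the diff mode follows; the sample structures are made up, and the exact form of the output is not shown because it depends on the script.

```python
import json

left = {"foo": "bar", "baz": [1, 2]}
right = {"foo": "bar", "baz": [1, 2, 3]}

# The filter reads one JSON structure of the form [<arg1>, <arg2>].
payload = json.dumps([left, right])

# The payload would then be piped to the script's standard input, e.g.
#   json_delta.py diff      (or: json_delta.py diff -u)
```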
The package available from PyPI also installs the scripts
json_diff(1) and json_patch(1), which make available a more
versatile and useful command-line interface to json_delta. Their
manpages are included in this documentation.