json_delta.py

This is the main “library” for json_delta’s functionality. If it gets any bigger, I’ll probably split it out into at least four separate modules. As it is, the functionality divides into four main headings: diff, patch, udiff and upatch, plus entry points for the above, utility functions common to all four, and basic shell functionality as documented below.

The library is kept as one module for ease of distribution: if you want to use it from another program, this file is all you need. The full package includes three CLI scripts that offer more versatile access to the functionality this module makes available. The json_diff and json_patch scripts have syntax and options that mimic standard diff(1) and patch(1) where it is appropriate to do so. Also, there is json_cat, a script I wrote for testing purposes which may prove useful.

Requires Python 2.7 or newer (including Python 3).

Main entry points

json_delta.diff(left_struc, right_struc, minimal=True, verbose=True, key=None)

Compose a sequence of diff stanzas sufficient to convert the structure left_struc into the structure right_struc. (The goal is to add ‘necessary and’ to ‘sufficient’ above!).

Flags:

verbose: if this is set to True (the default), a line of compression statistics will be printed to stderr.

minimal: if True, the function will try harder to find the diff that encodes as the shortest possible JSON string, at the expense of a possible performance penalty (as alternatives are computed and compared).

The parameter key is present because this function is mutually recursive with needle_diff() and keyset_diff(). If set to a list, it will be prefixed to every keypath in the output.
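The two flags can be made concrete. The minimal flag amounts to computing alternative diffs and keeping the one with the shortest compact JSON encoding. The key parameter amounts to prefixing every key-path. The sketch below assumes the stanza convention suggested by the examples in this document: [keypath, value] sets a node, and [] is the key-path of the whole structure. encoded_length, pick_minimal and prefix_keypaths are hypothetical helpers, not json_delta functions.

```python
import json

def encoded_length(diff):
    # Length of the diff's compact JSON encoding (no spaces after separators).
    return len(json.dumps(diff, separators=(',', ':')))

def pick_minimal(candidates):
    # Among candidate diffs that all turn left into right, keep the shortest.
    return min(candidates, key=encoded_length)

def prefix_keypaths(diff, key):
    # What the key parameter does: prepend key to every stanza's key-path.
    return [[key + stanza[0]] + stanza[1:] for stanza in diff]

left = {'foo': 'bar', 'baz': [1, 2, 3]}
right = {'foo': 'bar', 'baz': [1, 2, 4]}

replace_all = [[[], right]]    # replace the whole structure at key-path []
piecemeal = [[['baz', 2], 4]]  # change only the element that differs

best = pick_minimal([replace_all, piecemeal])
```

Here best is piecemeal. A diff computed on the 'baz' list alone, [[[2], 4]], is lifted back to [[['baz', 2], 4]] by prefixing ['baz'], which is the role the key parameter plays in the recursion.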

json_delta.patch(struc, diff, in_place=True)

Apply the sequence of diff stanzas diff to the structure struc.

By default, this function modifies struc in place; set in_place to False to return a patched copy of struc instead.
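The following is a minimal sketch of what applying a diff involves. It assumes the stanza convention [keypath, value] for setting a node and a one-element [keypath] for deleting it. It also assumes the stanzas are already in a safe order (see sort_stanzas()). apply_stanza and patch_sketch are hypothetical names, and the real patch() may differ in details such as error handling.

```python
import copy

def apply_stanza(struc, stanza):
    """Apply one stanza to struc and return the (possibly new) root."""
    keypath = stanza[0]
    if not keypath:
        # An empty key-path addresses the root, which is replaced outright
        # (this sketch assumes a two-element stanza here).
        return stanza[1]
    parent = struc
    for key in keypath[:-1]:
        parent = parent[key]
    last = keypath[-1]
    if len(stanza) == 1:
        del parent[last]              # one-element stanza: delete the node
    elif isinstance(parent, list) and last == len(parent):
        parent.append(stanza[1])      # one past the end of a list: append
    else:
        parent[last] = stanza[1]      # add or change the node
    return struc

def patch_sketch(struc, diff, in_place=True):
    if not in_place:
        struc = copy.deepcopy(struc)  # leave the caller's structure untouched
    for stanza in diff:
        struc = apply_stanza(struc, stanza)
    return struc
```

A deep copy is used when in_place is False so that nested lists and dicts of the input are not shared with the result. A stanza with an empty key-path replaces the root, so callers should use the return value even when patching in place.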

json_delta.udiff(left, right, patch=None, indent=0, entry=True)

Render the difference between the structures left and right as a string in a fashion inspired by diff -u.

Generating a udiff is strictly slower than generating a normal diff, since the udiff is computed on the basis of a normal diff between left and right. If such a diff has already been computed (e.g. by calling diff()), pass it as the patch parameter.

json_delta.upatch(struc, udiff, reverse=False, in_place=True)

Apply a patch of the form output by udiff() to the structure struc.

By default, this function modifies struc in place; set in_place to False to return a patched copy of struc instead.

Utility functions

json_delta.compact_json_dumps(obj)

Compute the most compact possible JSON representation of the serializable object obj.

>>> test = {
...     'foo': 'bar',
...     'baz': ['quux', 'spam', 'eggs'],
... }
>>> compact_json_dumps(test)
'{"foo":"bar","baz":["quux","spam","eggs"]}'
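You can approximate the effect described here with the standard library's json.dumps by removing the spaces it inserts after separators. This is an assumed equivalent, not json_delta's own code. compact_json_dumps_sketch is a hypothetical name.

```python
import json

def compact_json_dumps_sketch(obj):
    # By default json.dumps separates with ', ' and ': '; drop the spaces.
    return json.dumps(obj, separators=(',', ':'))
```

Passing ensure_ascii=False as well would shorten strings containing non-ASCII characters. This section does not say whether compact_json_dumps() does so.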
json_delta.all_paths(struc)

Generate key-paths to every node in struc.

Both terminal and non-terminal nodes are visited, like so:

>>> paths = [x for x in all_paths({'foo': None, 'bar': ['baz', 'quux']})]
>>> [] in paths # ([] is the path to struc itself.)
True
>>> ['foo'] in paths
True
>>> ['bar'] in paths
True
>>> ['bar', 0] in paths
True
>>> ['bar', 1] in paths
True
>>> len(paths)
5
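The following recursive generator reproduces the behaviour shown in the doctest above, assuming only dicts and lists are non-terminal nodes. all_paths_sketch is a hypothetical name, and the order in which it produces paths may differ from all_paths().

```python
def all_paths_sketch(struc):
    """Yield the key-path of every node in struc, starting with [] for struc itself."""
    yield []
    if isinstance(struc, dict):
        children = struc.items()
    elif isinstance(struc, list):
        children = enumerate(struc)
    else:
        return  # terminal node: no children
    for key, child in children:
        for subpath in all_paths_sketch(child):
            yield [key] + subpath
```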
json_delta.follow_path(struc, path)

Return the value found at the key-path path within struc.
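follow_path() amounts to indexing into struc once per key. Here is a one-line sketch using functools.reduce. follow_path_sketch is a hypothetical name.

```python
import operator
from functools import reduce

def follow_path_sketch(struc, path):
    # struc[path[0]][path[1]]...; an empty path returns struc itself.
    return reduce(operator.getitem, path, struc)
```

In this sketch a missing key raises KeyError or IndexError. This section does not document how follow_path() itself reacts to a bad path.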

json_delta.in_one_level(diff, key)

Return the subset of diff whose key-paths begin with key, expressed relative to the structure at [key] (i.e. with the first element of each key-path stripped off).

>>> diff = [ [['bar'], None],
...          [['baz', 3], 'cheese'],
...          [['baz', 4, 'quux'], 'foo'] ]
>>> in_one_level(diff, 'baz')
[[[3], 'cheese'], [[4, 'quux'], 'foo']]
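The following list-comprehension sketch of in_one_level() assumes stanzas are lists whose first element is the key-path. It reproduces the doctest above. in_one_level_sketch is a hypothetical name.

```python
def in_one_level_sketch(diff, key):
    # Keep stanzas whose key-path starts with key, then drop that first element.
    return [[stanza[0][1:]] + stanza[1:]
            for stanza in diff
            if stanza[0] and stanza[0][0] == key]
```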
json_delta.check_diff_structure(diff)

Return diff (or True) if it is structured as a sequence of diff stanzas. Otherwise return False.

[] is a valid diff, but it is falsy, so if it is passed to this function the return value is True instead. This way, the return value is always true in a Boolean context whenever diff is valid.

>>> check_diff_structure([])
True
>>> check_diff_structure([None])
False
>>> check_diff_structure([[["foo", 6, 12815316313, "bar"], None]])
[[['foo', 6, 12815316313, 'bar'], None]]
>>> check_diff_structure([[["foo", 6, 12815316313, "bar"], None],
...                       [["foo", False], True]])
False
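The following sketch captures the validation rules implied by the doctests above. It rests on two assumptions: a stanza is a list of one or two elements, and every key in a key-path is a str (object key) or an int (array index). check_diff_structure_sketch is hypothetical and written for Python 3.

```python
def check_diff_structure_sketch(diff):
    """Return diff (or True for []) if diff is a list of well-formed stanzas, else False."""
    if not isinstance(diff, list):
        return False
    for stanza in diff:
        if not isinstance(stanza, list) or len(stanza) not in (1, 2):
            return False
        keypath = stanza[0]
        if not isinstance(keypath, list):
            return False
        for key in keypath:
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(key, bool) or not isinstance(key, (int, str)):
                return False
    return diff if diff else True
```

The explicit bool check is what rejects the key-path ["foo", False].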

Functions for computing normal diffs

json_delta.commonality(left_struc, right_struc)

Return a float between 0.0 and 1.0 representing the amount that the structures left_struc and right_struc have in common.

If left_struc and right_struc are of the same type, this is computed as the fraction (elements in common) / (total elements). Otherwise, commonality is 0.0.
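Here is one plausible reading of the (elements in common) / (total elements) fraction. A key counts as common when both structures hold equal values under it, and the total is the size of the union of keys. json_delta's actual formula may differ, for instance in how it compares list elements. commonality_sketch and its handling of terminals and empty containers are assumptions.

```python
def commonality_sketch(left_struc, right_struc):
    """Fraction of keys (or indices) under which both structures hold equal values."""
    if type(left_struc) is not type(right_struc):
        return 0.0
    if isinstance(left_struc, dict):
        left_keys, right_keys = set(left_struc), set(right_struc)
    elif isinstance(left_struc, list):
        left_keys = set(range(len(left_struc)))
        right_keys = set(range(len(right_struc)))
    else:
        # Terminals: all or nothing (an assumption of this sketch).
        return 1.0 if left_struc == right_struc else 0.0
    total = len(left_keys | right_keys)
    if total == 0:
        return 1.0  # two empty containers (an assumption of this sketch)
    common = sum(1 for k in left_keys & right_keys
                 if left_struc[k] == right_struc[k])
    return common / total
```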

json_delta.compute_keysets(left_seq, right_seq)

Compare the keys of left_seq vs. right_seq.

Determines which keys left_seq and right_seq have in common, and which are unique to each of the structures. Arguments should be instances of the same basic type, which must be a non-terminal: i.e. list or dict. If they are lists, the keys compared will be integer indices.

Returns:
Return value is a 3-tuple of sets ({overlap}, {left_only}, {right_only}). As their names suggest, overlap is a set of keys that left_seq and right_seq have in common, left_only represents keys only found in left_seq, and right_only holds keys only found in right_seq.
Raises:
AssertionError if left_seq is not an instance of type(right_seq), or if they are not of a non-terminal type.
>>> compute_keysets({'foo': None}, {'bar': None})
(set([]), set(['foo']), set(['bar']))
>>> compute_keysets({'foo': None, 'baz': None}, {'bar': None, 'baz': None})
(set(['baz']), set(['foo']), set(['bar']))
>>> compute_keysets(['foo', 'baz'], ['bar', 'baz'])
(set([0, 1]), set([]), set([]))
>>> compute_keysets(['foo'], ['bar', 'baz'])
(set([0]), set([]), set([1]))
>>> compute_keysets([], ['bar', 'baz'])
(set([]), set([]), set([0, 1]))
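The following direct sketch of compute_keysets() uses Python set operations. compute_keysets_sketch is hypothetical. It returns the same sets as the doctests above, though Python 3 prints them as set() and {...} rather than set([...]).

```python
def compute_keysets_sketch(left_seq, right_seq):
    """Return (overlap, left_only, right_only) sets of keys or indices."""
    assert isinstance(left_seq, type(right_seq)), (left_seq, right_seq)
    assert isinstance(left_seq, (list, dict)), left_seq
    if isinstance(left_seq, dict):
        left_keys, right_keys = set(left_seq), set(right_seq)
    else:
        # Lists are compared by their integer indices.
        left_keys = set(range(len(left_seq)))
        right_keys = set(range(len(right_seq)))
    return (left_keys & right_keys,
            left_keys - right_keys,
            right_keys - left_keys)
```

Using assert matches the documented AssertionError. Note, however, that Python skips assertions when run with -O.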
json_delta.keyset_diff(left_struc, right_struc, key, minimal=True)

Return a diff between left_struc and right_struc.

It is assumed that left_struc and right_struc are both non-terminal types (serializable as arrays or objects). Sequences are treated just like mappings by this function, so the diffs will be correct but not necessarily minimal. For a minimal diff between two sequences, use needle_diff().

This function probably shouldn’t be called directly. Instead, use diff(), which will call keyset_diff() if appropriate anyway.
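The following naive recursive diff works in the spirit of keyset_diff(). It treats lists as mappings from index to value and reuses the key parameter as a key-path prefix. It assumes the [keypath, value] / [keypath] stanza convention. keyset_diff_sketch is a hypothetical name, and its output order is a choice made here: deletions run from the highest index down so that earlier indices stay valid.

```python
def keyset_diff_sketch(left_struc, right_struc, key=None):
    """Diff two structures key by key, treating lists like index-keyed mappings."""
    key = [] if key is None else key
    if (type(left_struc) is not type(right_struc)
            or not isinstance(left_struc, (dict, list))):
        # Terminals or mismatched types: replace the node if it differs.
        return [] if left_struc == right_struc else [[key, right_struc]]
    if isinstance(left_struc, dict):
        left_keys, right_keys = set(left_struc), set(right_struc)
    else:
        left_keys = set(range(len(left_struc)))
        right_keys = set(range(len(right_struc)))
    stanzas = []
    for k in sorted(left_keys & right_keys):
        # Shared keys: recurse, passing the extended key-path as the prefix.
        stanzas.extend(keyset_diff_sketch(left_struc[k], right_struc[k], key + [k]))
    for k in sorted(right_keys - left_keys):
        stanzas.append([key + [k], right_struc[k]])   # additions
    for k in sorted(left_keys - right_keys, reverse=True):
        stanzas.append([key + [k]])                   # deletions, last first
    return stanzas
```

For lists this is correct but not minimal. Inserting one element at the front of a list shows up as a change at every index, which is the case needle_diff() is meant to handle.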

json_delta.needle_penalty(left, right)

Penalty function for Needleman-Wunsch alignment.

json_delta.needle_diff(left_struc, right_struc, key, minimal=True)

Return a diff between left_struc and right_struc.

If left_struc and right_struc are both serializable as arrays, this function uses Needleman-Wunsch sequence alignment to find a minimal diff between them. Otherwise, it returns the same result as keyset_diff().

This function probably shouldn’t be called directly. Instead, use diff(), which will call needle_diff() if appropriate anyway.
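Sequence alignment of the Needleman-Wunsch kind underlies needle_diff(). The sketch below computes a minimal-cost global alignment between two lists, formulated as cost minimization, with a dynamic-programming table and a traceback. unit_penalty and GAP are hypothetical stand-ins for needle_penalty() and whatever gap cost json_delta uses. Turning the alignment into diff stanzas is not shown.

```python
def unit_penalty(left, right):
    # Hypothetical stand-in for needle_penalty(): free match, unit mismatch.
    return 0 if left == right else 1

GAP = 1  # hypothetical cost of aligning an element against a gap

def needleman_wunsch(left, right, penalty=unit_penalty, gap=GAP):
    """Return (left_index, right_index) pairs of a minimal-cost alignment; None marks a gap."""
    n, m = len(left), len(right)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * gap
    for j in range(1, m + 1):
        cost[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(cost[i - 1][j - 1] + penalty(left[i - 1], right[j - 1]),
                             cost[i - 1][j] + gap,   # left[i-1] aligned to a gap
                             cost[i][j - 1] + gap)   # right[j-1] aligned to a gap
    # Trace back from the bottom-right corner to recover one optimal alignment.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + penalty(left[i - 1], right[j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + gap:
            pairs.append((i - 1, None))
            i -= 1
        else:
            pairs.append((None, j - 1))
            j -= 1
    pairs.reverse()
    return pairs
```

For example, aligning ['a', 'b', 'c'] with ['a', 'c'] pairs 'a' with 'a' and 'c' with 'c', and aligns 'b' against a gap, which corresponds to a single deletion.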

json_delta.this_level_diff(left_struc, right_struc, key=None, common=None)

Return a sequence of diff stanzas between the structures left_struc and right_struc, assuming that they are each at the key-path key within the overall structure.

json_delta.split_deletions(diff)

Return a tuple of length 3, of which the first element is a list of stanzas from diff that modify objects (dicts when deserialized), the second is a list of stanzas that add or change elements in sequences, and the third is a list of stanzas which delete elements from sequences.

json_delta.sort_stanzas(diff)

Sort the stanzas in diff: node changes can occur in any order, but deletions from sequences have to happen last node first: [‘foo’, ‘bar’, ‘baz’] -> [‘foo’, ‘bar’] -> [‘foo’] -> [] and additions to sequences have to happen leftmost-node-first: [] -> [‘foo’] -> [‘foo’, ‘bar’] -> [‘foo’, ‘bar’, ‘baz’].

Note that this will also sort changes to objects (dicts) so that they occur first of all, then modifications/additions on sequences, followed by deletions from sequences.
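The following sketch covers split_deletions() and sort_stanzas(). It assumes a stanza targets a sequence when the last element of its key-path is an int, and that it is a deletion when it has only one element. Both function names are hypothetical.

```python
def split_deletions_sketch(diff):
    """Split diff into (object changes, sequence additions/changes, sequence deletions)."""
    objs, seq_adds, seq_dels = [], [], []
    for stanza in diff:
        keypath = stanza[0]
        if keypath and isinstance(keypath[-1], int):
            (seq_dels if len(stanza) == 1 else seq_adds).append(stanza)
        else:
            objs.append(stanza)
    return objs, seq_adds, seq_dels

def sort_stanzas_sketch(diff):
    objs, seq_adds, seq_dels = split_deletions_sketch(diff)
    seq_adds.sort(key=lambda stanza: stanza[0])                # leftmost first
    seq_dels.sort(key=lambda stanza: stanza[0], reverse=True)  # last first
    return objs + seq_adds + seq_dels
```

Sorting key-paths as Python lists works here because, within one structure, the keys at a given position under a shared prefix are all str or all int.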

Functions for applying normal patches

json_delta.patch_stanza(struc, diff)

Applies the diff stanza diff to the structure struc as a patch.

Note that this function modifies struc in-place into the target of diff.

Functions for computing udiffs

The data structure representing a udiff that these functions all manipulate is a pair of lists of iterators (left_lines, right_lines). These lists are expected to be of the same length, principally by generate_udiff_lines(), which processes them. A pair of iterators (left_lines[i], right_lines[i]) may yield exactly the same sequence of output lines, each with ' ' as the first character (representing parts of the structure the input and output have in common). Alternatively, they may each yield zero or more lines referring to parts of the structure that are unique to the inputs they represent. In this case, all lines yielded by left_lines[i] should begin with '-', and all lines yielded by right_lines[i] should begin with '+'.
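The following sketch shows how such a pair of lists can be flattened into udiff lines, in the manner described for generate_udiff_lines(). Common bands are emitted once, and differing bands emit their '-' lines followed by their '+' lines. merge_bands is a hypothetical name, and it produces no header lines such as --- and +++.

```python
def merge_bands(left_lines, right_lines):
    """Yield udiff lines from two equal-length lists of band iterators."""
    assert len(left_lines) == len(right_lines)
    for left_band, right_band in zip(left_lines, right_lines):
        left_band, right_band = list(left_band), list(right_band)
        if left_band == right_band:
            # Common material: both sides yield the same ' '-prefixed lines.
            yield from left_band
        else:
            yield from left_band   # lines unique to the left ('-')
            yield from right_band  # lines unique to the right ('+')
```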

json_delta.patch_bands(indent, material, sigil=u' ')

Generate appropriately indented patch bands, with sigil as the first character.

json_delta.single_patch_band(indent, line, sigil=u' ')

Convenience function returning an iterable that generates a single patch band.

json_delta.commafy(gen, comma=True)

Yields every result of gen, with a comma concatenated onto the final result iff comma is True.
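The three band helpers can be sketched as follows. The line layout (sigil, then indent spaces, then the text) is an assumption based on the sample udiff further down this page, and the *_sketch names are hypothetical. commafy_sketch uses one item of lookahead so that it knows which item is last.

```python
def patch_bands_sketch(indent, material, sigil=' '):
    # Assumed layout: sigil, then `indent` spaces, then the text of each line.
    for line in material:
        yield sigil + ' ' * indent + line

def single_patch_band_sketch(indent, line, sigil=' '):
    # An iterable producing exactly one band line.
    return patch_bands_sketch(indent, [line], sigil)

def commafy_sketch(gen, comma=True):
    """Yield every item of gen, appending ',' to the last item iff comma is True."""
    gen = iter(gen)
    try:
        previous = next(gen)
    except StopIteration:
        return  # empty input: nothing to yield
    for item in gen:
        yield previous
        previous = item
    yield previous + ',' if comma else previous
```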

json_delta.add_matter(seq, matter, indent)

Add material to seq, treating it appropriately for its type.

matter may be an iterator, in which case it is appended to seq. If it is a sequence, it is assumed to be a sequence of iterators and is concatenated onto seq. If matter is a string, it is turned into a patch band using single_patch_band(), which is appended. Finally, if matter is None, an empty iterable is appended to seq.

This function is a udiff-forming primitive, called by more specific functions defined within udiff_dict() and udiff_list().
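The following sketch implements the type dispatch that add_matter() describes. Strings are tested before general sequences because a Python str is itself a sequence. add_matter_sketch and single_band are hypothetical. single_band stands in for single_patch_band() under an assumed line layout.

```python
from collections.abc import Iterator

def single_band(indent, line, sigil=' '):
    # Hypothetical stand-in for single_patch_band(), with an assumed layout.
    return iter([sigil + ' ' * indent + line])

def add_matter_sketch(seq, matter, indent):
    """Append or concatenate matter onto seq according to its type."""
    if isinstance(matter, Iterator):
        seq.append(matter)                       # a single band
    elif isinstance(matter, str):
        seq.append(single_band(indent, matter))  # must precede the sequence case
    elif matter is None:
        seq.append(iter(()))                     # an empty band
    else:
        seq.extend(matter)                       # a sequence of bands
```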

json_delta.udiff_dict(left, right, diff, indent=0)

Construct a pair of iterator sequences representing diff as a delta between dictionaries left and right.

This function probably shouldn’t be called directly. Instead, use udiff() with the same arguments. udiff() and udiff_dict() are mutually recursive, anyway.

json_delta.reconstruct_alignment(left, right, diff)

Reconstruct the sequence alignment between the lists left and right implied by diff.

json_delta.udiff_list(left, right, diff, indent=0)

Construct a pair of iterator sequences representing diff as a delta between the lists left and right.

This function probably shouldn’t be called directly. Instead, use udiff() with the same arguments. udiff() and udiff_list() are mutually recursive, anyway.

json_delta.generate_udiff_lines(left, right)

Combine the diff lines from left and right, and generate the lines of the resulting udiff.

Functions for applying udiffs as patches

json_delta.extended_json_decoder()

Return a decoder for the superset of JSON understood by the upatch function.

The exact nature of this superset is documented in the manpage for json_patch(1). Briefly, the string ... is accepted where standard JSON expects a <property name>: <object> construction, and the string ..., optionally followed by a number in parentheses, is accepted where standard JSON expects an array element.

The superset is implemented by parsing ... in JSON objects as a key/value pair Ellipsis: True in the resulting dictionary, and ...(num) in arrays, where the parenthesized number is optional, as a subsequence of num Ellipsis objects, or a single Ellipsis object if (num) is not present.

Examples:

>>> dec = extended_json_decoder()
>>> (dec.decode('{"foo": "bar",...,"baz": "quux"}') ==
...  {"foo": "bar", "baz": "quux", Ellipsis: True})
True
>>> dec.decode('[...]')
[Ellipsis]
>>> dec.decode('["foo",...(3),"bar"]')
[u'foo', Ellipsis, Ellipsis, Ellipsis, u'bar']
json_delta.fill_in_ellipses(struc, left_part, right_part)

Replace Ellipsis objects parsed out of a udiff with elements from struc.

json_delta.split_udiff_parts(udiff)

Split udiff into parts representing subsets of the left and right structures that were used to create it.

The return value is a 2-tuple (left_json, right_json). If udiff is a string representing a valid udiff, then each member of the tuple will be a structure formatted in the superset of JSON that can be interpreted using extended_json_decoder().

The function itself works by stripping off the first character of every line in the udiff, and composing left_json out of those lines which begin with ' ' or '-', and right_json out of those lines which begin with ' ' or '+'.

>>> udiff = """--- <stdin>[0]
... +++ <stdin>[1]
...  {
...   "foo": "bar"
...   ...
...   "baz": 
... -  "quux",
... +  "quordle",
...  
... - "bar": null
...  
... + "quux": false
...  }"""
>>> left_json = """{
...  "foo": "bar"
...  ...
...  "baz": 
...   "quux",
...
...  "bar": null
...
... }"""
>>> right_json = """{
...  "foo": "bar"
...  ...
...  "baz": 
...   "quordle",
...
...
...  "quux": false
... }"""
>>> split_udiff_parts(udiff) == (left_json, right_json)
True
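The following sketch implements the splitting rule described above. It drops the --- and +++ header lines, strips each line's first character, and routes the remainder to the left part, the right part or both. split_udiff_parts_sketch is a hypothetical name, and dropping only leading header lines is an assumption.

```python
def split_udiff_parts_sketch(udiff):
    """Return (left_json, right_json) reconstructed from a udiff string."""
    lines = udiff.split('\n')
    # Drop the "--- ..." and "+++ ..." header lines, if present.
    if lines and lines[0].startswith('---'):
        lines = lines[1:]
    if lines and lines[0].startswith('+++'):
        lines = lines[1:]
    left, right = [], []
    for line in lines:
        sigil, body = line[:1], line[1:]
        if sigil in (' ', '-'):
            left.append(body)   # common or left-only material
        if sigil in (' ', '+'):
            right.append(body)  # common or right-only material
    return '\n'.join(left), '\n'.join(right)
```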

Basic script functionality

json_delta.py can be run as a script. The usage syntax is json_delta.py (diff | patch) [-u], and the script then runs as a filter: standard input is expected to contain a JSON array [<arg1>, <arg2>]. The two elements are passed to whichever of the functions diff(), patch(), udiff() or upatch() the command-line arguments select, and the result is written to standard output.

The package available from PyPI also installs the scripts json_diff and json_patch, which make available a more versatile and useful command-line interface to json_delta. Their manpages, json_diff(1) and json_patch(1), are included in this documentation.