librsync  2.3.0
format.md
# File formats {#page_formats}

## Generalities

There are two file formats used by `librsync` and `rdiff`: the
*signature* file, which summarizes a data file, and the *delta* file,
which describes the edits from one data file to another.

librsync does not know or care about any formats in the data files.

All integers are big-endian.

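Because every integer is stored most-significant byte first, a reader can assemble values byte by byte rather than depend on the host's byte order. The following is a minimal sketch; the helper names `get_be32` and `put_be32` are hypothetical and are not part of the librsync API.

```c
#include <stdint.h>

/* Decode a big-endian 32-bit integer from four bytes.
 * Hypothetical helper for illustration, not a librsync function. */
static uint32_t get_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Encode a 32-bit integer as four big-endian bytes. */
static void put_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}
```

Shifting explicitly like this gives the same result on little-endian and big-endian hosts, which a `memcpy` into a `uint32_t` would not.
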
## Magic numbers

All librsync files start with a `uint32` magic number identifying them.
These are declared in `librsync.h`:

```
/** A delta file. At present, there's only one delta format. **/
RS_DELTA_MAGIC = 0x72730236, /* r s \2 6 */

/**
 * A signature file with MD4 signatures. Backward compatible with
 * librsync < 1.0, but strongly deprecated because it creates a security
 * vulnerability on files containing partly untrusted data. See
 * <https://github.com/librsync/librsync/issues/5>.
 **/
RS_MD4_SIG_MAGIC = 0x72730136, /* r s \1 6 */

/**
 * A signature file using the BLAKE2 hash. Supported from librsync 1.0.
 **/
RS_BLAKE2_SIG_MAGIC = 0x72730137 /* r s \1 7 */
```

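As an illustration of how the magic number can be used, the sketch below classifies a buffer by its first four bytes. The `file_kind` type, the `MAGIC_*` macros and `classify_magic` are hypothetical names introduced here; a program linked against librsync would normally include `librsync.h` and use the constants from the listing above instead of redefining them.

```c
#include <stddef.h>
#include <stdint.h>

/* Values copied from the listing above so the sketch stays self-contained. */
#define MAGIC_DELTA      0x72730236u
#define MAGIC_MD4_SIG    0x72730136u
#define MAGIC_BLAKE2_SIG 0x72730137u

/* Hypothetical result type, not part of librsync. */
typedef enum {
    FILE_KIND_UNKNOWN,
    FILE_KIND_DELTA,
    FILE_KIND_MD4_SIG,
    FILE_KIND_BLAKE2_SIG
} file_kind;

/* Classify a buffer by its leading big-endian u32 magic number. */
static file_kind classify_magic(const unsigned char *buf, size_t len)
{
    uint32_t magic;

    if (len < 4)
        return FILE_KIND_UNKNOWN;   /* too short to hold a magic number */

    magic = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
            ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];

    switch (magic) {
    case MAGIC_DELTA:      return FILE_KIND_DELTA;
    case MAGIC_MD4_SIG:    return FILE_KIND_MD4_SIG;
    case MAGIC_BLAKE2_SIG: return FILE_KIND_BLAKE2_SIG;
    default:               return FILE_KIND_UNKNOWN;
    }
}
```

A caller that accepts signatures would probably still want to warn on `FILE_KIND_MD4_SIG`, given the deprecation noted in the listing.
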
## Signatures

Signatures consist of a header followed by a number of block
signatures.

Each block signature gives signature hashes for one block of
`block_len` bytes from the input data file. The final data block
may be shorter. The number of blocks in the signature is therefore

    ceil(input_len/block_len)

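The ceiling can be computed in integer arithmetic. The sketch below uses a hypothetical helper, `sig_block_count`, which is not a librsync function and assumes `block_len` is non-zero. For example, a 10000-byte input with a `block_len` of 2048 yields five block signatures, the last covering the remaining 1808 bytes.

```c
#include <stdint.h>

/* ceil(input_len / block_len) without floating point, and without the
 * overflow risk of computing (input_len + block_len - 1) first.
 * block_len must be non-zero. Hypothetical helper, not a librsync function. */
static uint64_t sig_block_count(uint64_t input_len, uint32_t block_len)
{
    return input_len / block_len + (input_len % block_len != 0);
}
```
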
The signature header is (see `rs_sig_s_header`):

    u32 magic;          // either RS_MD4_SIG_MAGIC or RS_BLAKE2_SIG_MAGIC
    u32 block_len;      // bytes per block
    u32 strong_sum_len; // bytes per strong sum in each block

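A reader of this 12-byte header might look like the following sketch. The `sig_header` struct and `parse_sig_header` function are hypothetical names, not part of librsync, and rejecting a zero `block_len` is a sanity check chosen for this example rather than something the format description requires.

```c
#include <stddef.h>
#include <stdint.h>

#define SIG_HEADER_LEN 12   /* three big-endian u32 fields */

/* In-memory form of the signature header (hypothetical type). */
typedef struct {
    uint32_t magic;
    uint32_t block_len;
    uint32_t strong_sum_len;
} sig_header;

static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Returns 0 on success, or -1 if the buffer is too short, the magic is not
 * one of the two signature magics, or block_len is zero. */
static int parse_sig_header(const unsigned char *buf, size_t len,
                            sig_header *out)
{
    if (len < SIG_HEADER_LEN)
        return -1;

    out->magic = be32(buf);
    out->block_len = be32(buf + 4);
    out->strong_sum_len = be32(buf + 8);

    if (out->magic != 0x72730136u && out->magic != 0x72730137u)
        return -1;              /* neither MD4 nor BLAKE2 signature magic */
    if (out->block_len == 0)
        return -1;              /* assumption: a zero block length is invalid */
    return 0;
}
```
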
Each block signature contains a rolling (weak) checksum, used to find
moved data, and a strong hash, used to confirm that a candidate match is
correct. The weak checksum is computed as in `rollsum.c`. The strong hash is
either MD4 or BLAKE2, depending on the magic number.

To make signatures smaller, at the cost of a greater chance of collisions,
the strong sum can be truncated after computation: only its first
`strong_sum_len` bytes, as given in the header, are stored.

The format of each block signature is (see `rs_sig_do_block`):

    u32 weak_sum;
    u8[strong_sum_len] strong_sum;

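Since the header and every block signature have fixed sizes, the position of any block signature and the total size of a signature file follow directly from the layout above. The helpers below, `sig_block_offset` and `sig_file_size`, are hypothetical names for a minimal sketch; it assumes a non-zero `block_len` and ignores 64-bit overflow for extremely large inputs.

```c
#include <stdint.h>

#define SIG_HEADER_LEN 12u  /* magic + block_len + strong_sum_len */
#define WEAK_SUM_LEN    4u  /* u32 weak_sum */

/* Byte offset of block signature `index` (zero-based) in a signature file. */
static uint64_t sig_block_offset(uint64_t index, uint32_t strong_sum_len)
{
    return SIG_HEADER_LEN + index * ((uint64_t)WEAK_SUM_LEN + strong_sum_len);
}

/* Total signature size: the offset one past the last block signature. */
static uint64_t sig_file_size(uint64_t input_len, uint32_t block_len,
                              uint32_t strong_sum_len)
{
    uint64_t blocks = input_len / block_len + (input_len % block_len != 0);
    return sig_block_offset(blocks, strong_sum_len);
}
```

With a `block_len` of 2048 and a `strong_sum_len` of 32, a 10000-byte input gives 12 + 5 × 36 = 192 bytes of signature.
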
## Delta files

Deltas consist of the delta magic constant `RS_DELTA_MAGIC` followed by a
series of commands. Commands tell the patch logic how to construct the result
file (new version) from the basis file (old version).

There are three kinds of commands: the literal command, the copy command, and
the end command. A command consists of a single byte followed by zero or more
arguments. The number and size of the arguments are defined in `prototab.c`.

A literal command describes data not present in the basis file. It has one
argument: `length`. The format is:

    u8 command;         // in the range 0x41 through 0x44 inclusive
    u8[arg1_len] length;
    u8[length] data;    // new data to append

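To show how a literal command might be written, the sketch below encodes one into a buffer. It rests on an assumption that this document does not state and that should be checked against `prototab.c`: that commands 0x41, 0x42, 0x43 and 0x44 carry a `length` argument of 1, 2, 4 and 8 bytes respectively. The function name `encode_literal` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encode a literal command into `out`, which must have room for
 * 1 + 8 + len bytes. Returns the number of bytes written.
 * ASSUMPTION (verify against prototab.c): commands 0x41..0x44 take a
 * length argument of 1, 2, 4 and 8 bytes respectively. */
static size_t encode_literal(unsigned char *out, const unsigned char *data,
                             size_t len)
{
    uint64_t v = len;
    unsigned char op;
    int width, i;

    /* Pick the narrowest length argument that can hold len. */
    if (len <= 0xffu)            { op = 0x41; width = 1; }
    else if (len <= 0xffffu)     { op = 0x42; width = 2; }
    else if (len <= 0xffffffffu) { op = 0x43; width = 4; }
    else                         { op = 0x44; width = 8; }

    out[0] = op;
    for (i = width; i >= 1; i--) {  /* big-endian: fill from the last byte */
        out[i] = (unsigned char)(v & 0xff);
        v >>= 8;
    }
    memcpy(out + 1 + width, data, len);
    return 1 + (size_t)width + len;
}
```
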
A copy command describes a range of data in the basis file. It has two
arguments: `start` and `length`. The format is:

    u8 command;          // in the range 0x45 through 0x54 inclusive
    u8[arg1_len] start;  // offset in the basis to begin copying data
    u8[arg2_len] length; // number of bytes to copy from the basis

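The sixteen copy commands can be read as a 4 × 4 table of argument widths. The sketch below decodes a command byte under an assumption that this document does not spell out and that should be confirmed in `prototab.c`: that the widths are drawn from 1, 2, 4 and 8 bytes, with the width of `start` varying slowest. The function name `copy_arg_widths` is hypothetical.

```c
/* Split a copy command byte into the byte widths of its two arguments.
 * Returns 0 on success, or -1 if op is outside 0x45..0x54.
 * ASSUMPTION (verify against prototab.c): op - 0x45 indexes a 4 x 4 table
 * of widths {1, 2, 4, 8}, with the start width as the row. */
static int copy_arg_widths(unsigned char op, int *start_len, int *length_len)
{
    static const int widths[4] = {1, 2, 4, 8};
    unsigned idx;

    if (op < 0x45 || op > 0x54)
        return -1;

    idx = op - 0x45u;
    *start_len = widths[idx / 4];   /* row: width of the start argument */
    *length_len = widths[idx % 4];  /* column: width of the length argument */
    return 0;
}
```
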
The end command indicates the end of the delta file. It consists of a single
null byte and has no arguments.
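
Putting the three commands together, a patch loop over in-memory buffers could be sketched as follows. This is an illustration of the format, not librsync's patch implementation. `apply_delta` and `get_be` are hypothetical names, and the argument widths are an assumption to be checked against `prototab.c`: 1, 2, 4 or 8 bytes, selected by the command byte. Any command byte outside the ranges described in this document is rejected.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a big-endian integer of `width` bytes. */
static uint64_t get_be(const unsigned char *p, int width)
{
    uint64_t v = 0;
    int i;
    for (i = 0; i < width; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Rebuild the new file in `out` from `basis` and `delta`.
 * Returns 0 and sets *out_len on success, -1 on any malformed input.
 * ASSUMPTION: argument widths follow the {1, 2, 4, 8} scheme. */
static int apply_delta(const unsigned char *delta, size_t delta_len,
                       const unsigned char *basis, size_t basis_len,
                       unsigned char *out, size_t out_cap, size_t *out_len)
{
    static const int widths[4] = {1, 2, 4, 8};
    size_t pos = 4, n = 0;

    if (delta_len < 4 || get_be(delta, 4) != 0x72730236u)
        return -1;                          /* not RS_DELTA_MAGIC */

    while (pos < delta_len) {
        unsigned char op = delta[pos++];
        int w1, w2 = 0;
        uint64_t a1, a2 = 0;

        if (op == 0x00) {                   /* end command */
            *out_len = n;
            return 0;
        } else if (op >= 0x41 && op <= 0x44) {
            w1 = widths[op - 0x41];         /* literal: one argument */
        } else if (op >= 0x45 && op <= 0x54) {
            w1 = widths[(op - 0x45) / 4];   /* copy: start width */
            w2 = widths[(op - 0x45) % 4];   /* copy: length width */
        } else {
            return -1;                      /* not described in this document */
        }

        if (delta_len - pos < (size_t)(w1 + w2))
            return -1;                      /* truncated arguments */
        a1 = get_be(delta + pos, w1);
        pos += (size_t)w1;
        if (w2 != 0) {
            a2 = get_be(delta + pos, w2);
            pos += (size_t)w2;
        }

        if (w2 == 0) {                      /* literal: a1 bytes of data follow */
            if (a1 > delta_len - pos || a1 > out_cap - n)
                return -1;
            memcpy(out + n, delta + pos, (size_t)a1);
            pos += (size_t)a1;
            n += (size_t)a1;
        } else {                            /* copy a2 bytes from basis at a1 */
            if (a1 > basis_len || a2 > basis_len - a1 || a2 > out_cap - n)
                return -1;
            memcpy(out + n, basis + (size_t)a1, (size_t)a2);
            n += (size_t)a2;
        }
    }
    return -1;                              /* input ended without an end command */
}
```

For instance, against the basis `Hello, world!`, a delta that copies 7 bytes from offset 0, appends the literal `rsync`, copies 1 byte from offset 12, and then ends reconstructs `Hello, rsync!`. Every length is bounds-checked before use because a delta file may come from an untrusted source.
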