librsync
2.3.0
# File formats {#page_formats}

## Generalities

There are two file formats used by `librsync` and `rdiff`: the
*signature* file, which summarizes a data file, and the *delta* file,
which describes the edits from one data file to another.

librsync treats data files as opaque byte streams and does not interpret
any format within them.

All integers are big-endian.

## Magic numbers

All librsync files start with a `uint32` magic number identifying them.
These are declared in `librsync.h`:

```
/** A delta file. At present, there's only one delta format. **/
RS_DELTA_MAGIC = 0x72730236,      /* r s \2 6 */

/**
 * A signature file with MD4 signatures. Backward compatible with
 * librsync < 1.0, but strongly deprecated because it creates a security
 * vulnerability on files containing partly untrusted data. See
 * <https://github.com/librsync/librsync/issues/5>.
 **/
RS_MD4_SIG_MAGIC = 0x72730136,    /* r s \1 6 */

/**
 * A signature file using the BLAKE2 hash. Supported from librsync 1.0.
 **/
RS_BLAKE2_SIG_MAGIC = 0x72730137  /* r s \1 7 */
```

## Signatures

Signatures consist of a header followed by a number of block
signatures.

Each block signature gives signature hashes for one block of
`block_len` bytes from the input data file. The final data block
may be shorter. The number of blocks in the signature is therefore

    ceil(input_len/block_len)

The signature header is (see `rs_sig_s_header`):

    u32 magic;          // either RS_MD4_SIG_MAGIC or RS_BLAKE2_SIG_MAGIC
    u32 block_len;      // bytes per block
    u32 strong_sum_len; // bytes per strong sum in each block

The block signature contains a rolling or weak checksum used to find
moved data, and a strong hash used to confirm that the match is correct.

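
As an illustration of the header layout and block count described above, here is a minimal Python sketch. The function names `parse_sig_header` and `block_count` are hypothetical and not part of librsync; the only facts taken from this page are the three big-endian `u32` fields and the two signature magic values.

```python
import struct

# Magic values copied from the librsync.h excerpt on this page.
RS_MD4_SIG_MAGIC = 0x72730136
RS_BLAKE2_SIG_MAGIC = 0x72730137


def parse_sig_header(buf):
    """Decode the 12-byte signature header (hypothetical helper).

    Returns (hash_name, block_len, strong_sum_len).
    """
    if len(buf) < 12:
        raise ValueError("signature header needs 12 bytes")
    # Three big-endian unsigned 32-bit integers.
    magic, block_len, strong_sum_len = struct.unpack(">III", buf[:12])
    if magic == RS_MD4_SIG_MAGIC:
        hash_name = "md4"
    elif magic == RS_BLAKE2_SIG_MAGIC:
        hash_name = "blake2"
    else:
        raise ValueError(f"not a signature magic: {magic:#010x}")
    return hash_name, block_len, strong_sum_len


def block_count(input_len, block_len):
    """ceil(input_len / block_len) in integer arithmetic (hypothetical helper)."""
    return (input_len + block_len - 1) // block_len
```

For instance, a 5000-byte input with `block_len` 2048 gives three blocks, the last one holding only 904 bytes. The sketch does not guard against a `block_len` of zero.
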
The weak checksum is computed as in `rollsum.c`. The strong hash is
either MD4 or BLAKE2 depending on the magic number.

To make the signatures smaller at the cost of a greater chance of collisions,
the `strong_sum_len` in the header can cause the strong sum to be truncated
after computation, keeping only its first `strong_sum_len` bytes.

The format of each block signature is (see `rs_sig_do_block`):

    u32 weak_sum;
    u8[strong_sum_len] strong_sum;

## Delta files

Deltas consist of the delta magic constant `RS_DELTA_MAGIC` followed by a
series of commands. Commands tell the patch logic how to construct the result
file (new version) from the basis file (old version).

There are three kinds of commands: the literal command, the copy command, and
the end command. A command consists of a single byte followed by zero or more
arguments. The number and size of the arguments are defined in `prototab.c`.

A literal command describes data not present in the basis file. It has one
argument: `length`. The format is:

    u8 command;          // in the range 0x41 through 0x44 inclusive
    u8[arg1_len] length;
    u8[length] data;     // new data to append

A copy command describes a range of data in the basis file. It has two
arguments: `start` and `length`. The format is:

    u8 command;          // in the range 0x45 through 0x54 inclusive
    u8[arg1_len] start;  // offset in the basis to begin copying data
    u8[arg2_len] length; // number of bytes to copy from the basis

The end command indicates the end of the delta file. It consists of a single
null byte and has no arguments.
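
The command stream described above can be walked with a short Python sketch. This is not librsync code: `parse_delta` and `apply_delta` are hypothetical names, and the mapping from command byte to argument widths is an assumption (widths of 1, 2, 4 and 8 bytes, assigned in ascending command order, with the copy commands enumerating start width first and length width second). The authoritative table is `prototab.c`.

```python
import struct

RS_DELTA_MAGIC = 0x72730236
# Assumed argument widths in bytes; check prototab.c for the real table.
WIDTHS = (1, 2, 4, 8)


def parse_delta(buf):
    """Decode a delta into ("literal", data) and ("copy", start, length) tuples."""
    if len(buf) < 4:
        raise ValueError("delta is shorter than its magic number")
    (magic,) = struct.unpack(">I", buf[:4])
    if magic != RS_DELTA_MAGIC:
        raise ValueError(f"not a delta magic: {magic:#010x}")
    pos = 4
    commands = []

    def read_int(width):
        # Read one big-endian unsigned integer argument of the given width.
        nonlocal pos
        if pos + width > len(buf):
            raise ValueError("truncated command argument")
        value = int.from_bytes(buf[pos:pos + width], "big")
        pos += width
        return value

    while True:
        if pos >= len(buf):
            raise ValueError("delta ended without an end command")
        cmd = buf[pos]
        pos += 1
        if cmd == 0x00:  # end command
            return commands
        if 0x41 <= cmd <= 0x44:  # literal: length, then that many data bytes
            length = read_int(WIDTHS[cmd - 0x41])
            if pos + length > len(buf):
                raise ValueError("truncated literal data")
            commands.append(("literal", bytes(buf[pos:pos + length])))
            pos += length
        elif 0x45 <= cmd <= 0x54:  # copy: start offset, then length
            start_idx, len_idx = divmod(cmd - 0x45, 4)
            start = read_int(WIDTHS[start_idx])
            length = read_int(WIDTHS[len_idx])
            commands.append(("copy", start, length))
        else:
            raise ValueError(f"unsupported command byte {cmd:#04x}")


def apply_delta(basis, commands):
    """Rebuild the new file in memory by appending each command's bytes in order."""
    out = bytearray()
    for command in commands:
        if command[0] == "literal":
            out += command[1]
        else:
            _, start, length = command
            out += basis[start:start + length]
    return bytes(out)
```

Under these assumptions, the bytes `72 73 02 36 41 03 61 62 63 45 02 04 00` decode to a three-byte literal `abc` followed by a copy of 4 bytes starting at basis offset 2, then the end command. Any command byte outside the three ranges listed on this page is rejected rather than guessed at.
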