# Streaming jobs {#api_streaming}

A key design requirement for librsync is that it should handle data as
and when the hosting application requires it. librsync can be used
inside applications that do non-blocking IO or filtering of network
streams, because it never does IO directly and never needs to block
waiting for IO.

Arbitrary-length input and output buffers are passed to the library
in an ::rs_buffers_t structure. The library proceeds as far as it can,
and returns an ::rs_result value indicating whether it needs more data
or space.

All the state needed by the library to resume processing when more
data is available is kept in the ::rs_job_t object.

After creation of a job, repeated calls to rs_job_iter() in between
filling and emptying the buffers keep data flowing through the
stream. The ::rs_result values returned may indicate

- ::RS_DONE: processing is complete
- ::RS_BLOCKED: processing has blocked pending more data
- one of various possible errors in processing (see ::rs_result).

These can be converted to a human-readable string by rs_strerror().
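
The fill, iterate, drain cycle described above can be sketched with a toy model. The C below does not call librsync: `stream_result`, `stream_buffers`, `stream_iter()` and `stream_run()` are hypothetical stand-ins for ::rs_result, ::rs_buffers_t, rs_job_iter() and the application's loop, and the "job" merely copies bytes. It is meant only to show how a caller keeps data flowing when each call can stop for lack of input or output space.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-ins for the library's result codes and buffer struct. */
typedef enum { STREAM_DONE, STREAM_BLOCKED } stream_result;

typedef struct {
    const char *next_in;  /* next unread input byte */
    size_t avail_in;      /* input bytes remaining */
    int eof_in;           /* set once no more input will arrive */
    char *next_out;       /* where the next output byte goes */
    size_t avail_out;     /* free space left in the output buffer */
} stream_buffers;

/* Copy as much as fits, then report whether more data or space is needed. */
static stream_result stream_iter(stream_buffers *b)
{
    size_t n = b->avail_in < b->avail_out ? b->avail_in : b->avail_out;
    if (n > 0) {
        memcpy(b->next_out, b->next_in, n);
        b->next_in += n;
        b->avail_in -= n;
        b->next_out += n;
        b->avail_out -= n;
    }
    return (b->avail_in == 0 && b->eof_in) ? STREAM_DONE : STREAM_BLOCKED;
}

/* Application loop: refill input in chunk-sized pieces and drain a small
 * output buffer after every call. Returns the number of iter calls made,
 * or -1 if dst is too small. */
static int stream_run(const char *src, size_t len, size_t chunk,
                      char *dst, size_t dst_cap, size_t *dst_len)
{
    char out[4];
    stream_buffers b = {0};
    size_t fed = 0;
    int calls = 0;
    stream_result r;

    assert(chunk > 0);
    *dst_len = 0;
    do {
        if (b.avail_in == 0 && fed < len) {       /* fill the input buffer */
            size_t n = len - fed < chunk ? len - fed : chunk;
            b.next_in = src + fed;
            b.avail_in = n;
            fed += n;
        }
        b.eof_in = (fed == len);
        b.next_out = out;                         /* offer empty output space */
        b.avail_out = sizeof out;

        r = stream_iter(&b);
        calls++;

        size_t produced = sizeof out - b.avail_out;
        if (*dst_len + produced > dst_cap)
            return -1;
        memcpy(dst + *dst_len, out, produced);    /* drain the output buffer */
        *dst_len += produced;
    } while (r != STREAM_DONE);
    return calls;
}
```

In this sketch the deliberately tiny 4-byte output buffer forces several calls per input chunk, which is the overhead the buffer-size note in this section warns about.
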

\note Smaller buffers have high relative handling costs. Application
performance will be improved by using buffers of at least 32 KB or so.

\sa \ref api_whole - Simpler but more limited interface than the streaming
interface.

\sa \ref api_pull - Intermediate-complexity callback interface.

\sa \ref api_callbacks - for reading from the basis file
when doing a "patch" operation.

All streaming librsync jobs are initiated using a `_begin`
function to create a ::rs_job_t object, passing in any necessary
initialization parameters. The various jobs available are:
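
- rs_sig_begin(): Generate the signature of a file.
- rs_loadsig_begin(): Read a signature from a file into memory.
- rs_delta_begin(): Calculate the delta between a signature and a new
  file.
- rs_patch_begin(): Apply a delta to a basis file to recreate the new
  file.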

The patch job accepts the patch as input, and uses a callback to look up
blocks within the basis file.

You must configure read, write and basis callbacks after creating the
job but before it is run.

You can set job->sig_file_bytes to the signature file size, or
job->estimated_signature_count to the number of blocks, before running
the job if either value is known in advance. If both are set,
estimated_signature_count is used. This preallocates the memory needed
for the signature sums instead of calling realloc for each block.
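
To illustrate why a size hint helps, here is a minimal sketch of a growable table that can be given an estimated entry count up front. It is not librsync's signature code: `sums_table`, `sums_init()`, `sums_add()` and `sums_load()` are hypothetical names, and growing by one slot at a time is an assumption chosen to mirror the "realloc for each block" case described above.

```c
#include <stddef.h>
#include <stdlib.h>

/* Toy table of per-block sums; a stand-in, not librsync's signature type. */
typedef struct {
    unsigned *sums;
    size_t count;     /* entries in use */
    size_t capacity;  /* entries allocated */
    int reallocs;     /* how many times the array had to grow */
} sums_table;

/* Reserve room up front when the caller has an estimate (0 means unknown). */
static int sums_init(sums_table *t, size_t estimated_count)
{
    t->sums = NULL;
    t->count = 0;
    t->capacity = 0;
    t->reallocs = 0;
    if (estimated_count > 0) {
        t->sums = malloc(estimated_count * sizeof *t->sums);
        if (!t->sums)
            return -1;
        t->capacity = estimated_count;
    }
    return 0;
}

/* Append one sum, growing by a single slot when the table is full. */
static int sums_add(sums_table *t, unsigned sum)
{
    if (t->count == t->capacity) {
        unsigned *p = realloc(t->sums, (t->capacity + 1) * sizeof *p);
        if (!p)
            return -1;
        t->sums = p;
        t->capacity++;
        t->reallocs++;
    }
    t->sums[t->count++] = sum;
    return 0;
}

/* Load n sums and report how many reallocations that cost, or -1 on error. */
static int sums_load(size_t n, size_t estimated_count)
{
    sums_table t;
    int reallocs;

    if (sums_init(&t, estimated_count) != 0)
        return -1;
    for (size_t i = 0; i < n; i++) {
        if (sums_add(&t, (unsigned)i) != 0) {
            free(t.sums);
            return -1;
        }
    }
    reallocs = t.reallocs;
    free(t.sums);
    return reallocs;
}
```

With an accurate estimate the sketch never reallocates; with an underestimate it pays only for the entries beyond the hint.
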

The work of the operation is done when the application calls
rs_job_iter(). This includes reading from input files via the callback,
running the rsync algorithms, and writing output.

The IO callbacks are only called from inside rs_job_iter(). If any of
them returns an error, rs_job_iter() will generally return the same error.

When librsync needs to do input or output, it calls one of the callback
functions. rs_job_iter() returns when the operation has completed or
failed, or when one of the IO callbacks has blocked.
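
The callback behaviour described above can be modelled in a few lines. This is a sketch under stated assumptions, not librsync's implementation: `cb_result`, `cb_read_fn`, `cb_job` and `cb_job_iter()` are hypothetical, and the convention that a callback returns `CB_RUNNING` after delivering data is invented for the example. It shows the iterate function stopping as soon as a callback blocks, and passing a callback's error straight back to the caller.

```c
#include <stddef.h>

/* Toy result codes; stand-ins for ::rs_result, not the real enum. */
typedef enum { CB_DONE, CB_BLOCKED, CB_RUNNING, CB_IO_ERROR } cb_result;

/* Hypothetical read callback: stores up to cap bytes in buf, sets *got. */
typedef cb_result cb_read_fn(void *arg, char *buf, size_t cap, size_t *got);

typedef struct {
    cb_read_fn *read_cb;
    void *read_arg;
    size_t total;        /* bytes received from the callback so far */
} cb_job;

/* Callbacks run only from inside this function. Whatever stops the loop
 * (end of input, a blocked callback, or an error) is returned unchanged. */
static cb_result cb_job_iter(cb_job *job)
{
    char buf[8];
    cb_result r;

    do {
        size_t got = 0;
        r = job->read_cb(job->read_arg, buf, sizeof buf, &got);
        job->total += got;
    } while (r == CB_RUNNING);
    return r;
}

/* Scripted input source for demonstration: each call yields the next
 * scripted result, delivering 3 bytes whenever that result is CB_RUNNING. */
typedef struct {
    const cb_result *steps;
    size_t n, pos;
} cb_script;

static cb_result cb_scripted_read(void *arg, char *buf, size_t cap, size_t *got)
{
    cb_script *s = arg;
    cb_result r = s->pos < s->n ? s->steps[s->pos++] : CB_DONE;

    if (r == CB_RUNNING) {
        size_t n = cap < 3 ? cap : 3;
        for (size_t i = 0; i < n; i++)
            buf[i] = 'x';
        *got = n;
    }
    return r;
}
```

After a blocked return, calling `cb_job_iter()` again simply resumes reading; nothing in the job has to be rebuilt.
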

rs_job_iter() will usually be called in a loop, perhaps alternating
librsync processing with other application functions.

A job is deleted and its memory freed using rs_job_free().

This is typically called when the job has completed or failed. It can be
called earlier if the application decides it wants to cancel
processing.

rs_job_free() does not delete the output of the job, such as the sumset
loaded into memory. It does delete the job's statistics.
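
The ownership split described above, where freeing a job leaves its output alive, can be sketched as follows. All names here (`life_output`, `life_job`, `life_begin()`, `life_run()`, `life_job_free()`, `life_output_free()`) are hypothetical and the "output" is just an array of squares; the point is only that the job publishes its result through a pointer the caller owns, so the two lifetimes are independent.

```c
#include <stddef.h>
#include <stdlib.h>

/* Toy result object, standing in for something like a loaded sumset. */
typedef struct {
    int *values;
    size_t count;
} life_output;

/* Toy job: a statistic plus the location where the output is published. */
typedef struct {
    life_output **out;
    int stats_steps;
} life_job;

/* Create a job that will publish its result through *out. */
static life_job *life_begin(life_output **out)
{
    life_job *job = calloc(1, sizeof *job);
    if (job) {
        job->out = out;
        *out = NULL;
    }
    return job;
}

/* Run to completion: build the output and hand it to the caller. */
static int life_run(life_job *job, size_t n)
{
    life_output *o;

    if (n == 0)
        return -1;
    o = malloc(sizeof *o);
    if (!o)
        return -1;
    o->values = malloc(n * sizeof *o->values);
    if (!o->values) {
        free(o);
        return -1;
    }
    for (size_t i = 0; i < n; i++) {
        o->values[i] = (int)(i * i);
        job->stats_steps++;
    }
    o->count = n;
    *job->out = o;
    return 0;
}

/* Free the job and its statistics; the published output is left alone. */
static void life_job_free(life_job *job)
{
    free(job);
}

/* The caller releases the output separately, once it is finished with it. */
static void life_output_free(life_output *o)
{
    if (o) {
        free(o->values);
        free(o);
    }
}
```

Any statistics the application wants to keep would have to be copied out of the job before `life_job_free()` is called, since they live inside the job itself.
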

## State Machine Internals

Internally, the operations are implemented as state machines that move
through various states as input and output buffers become available.

All computers and programs are state machines. So why is the
representation as a state machine a little more explicit (and perhaps
verbose) in librsync than in other places? Because we need to be able to
let the real computer go off and do something else, like waiting for
network traffic, while still remembering where it was in the librsync
state machine.

librsync will never block waiting for IO, unless the callbacks do.

The current state is represented by the private field
::rs_job_t::statefn, which points to a function with a name like
`rs_OPERATION_s_STATE`. Every time librsync tries to make progress,
it will call this function.

The state function returns one of the ::rs_result values. The
most important values are

* ::RS_DONE: Completed successfully.

* ::RS_BLOCKED: Cannot make further progress at this point.

* ::RS_RUNNING: The state function has neither completed nor blocked but
  wants to be called again. **XXX**: Perhaps this should be removed?

States need to correspond to suspension points. The only place the
job can resume after blocking is at the entry to a state function.

Therefore states must be "all or nothing" in that they can either
complete, or restart without losing information.

Basically every state needs to work from one input buffer to one
output buffer.

States should generally never return ::RS_DONE directly. Instead, they
should call rs__job_done(), which sets the state function to
rs__s_done(). This makes sure that any pending output is flushed out
before ::RS_DONE is returned to the application.
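
The rules above (a state function pointer, all-or-nothing states, and a final "done" state that flushes pending output) can be illustrated with a small self-contained model. This is not librsync's source: `sm_job`, `sm_iter()`, `sm_job_done()` and the `sm_s_*` states are hypothetical names, the two-byte "header" and one-byte "body" are invented, and having `sm_job_done()` return `SM_RUNNING` is an assumption made so the driver goes on to call the done state.

```c
#include <stddef.h>
#include <string.h>

typedef enum { SM_DONE, SM_BLOCKED, SM_RUNNING } sm_result;

typedef struct sm_job sm_job;
typedef sm_result sm_statefn(sm_job *job);

struct sm_job {
    sm_statefn *statefn;   /* current state: the only resume point */
    const char *in;        /* caller-supplied input */
    size_t avail_in;
    char pending[4];       /* output produced but not yet handed over */
    size_t npending;
    char *out;             /* caller-supplied output space */
    size_t avail_out;
};

/* Final state: flush pending output, and only then report completion. */
static sm_result sm_s_done(sm_job *job)
{
    size_t n = job->npending < job->avail_out ? job->npending : job->avail_out;
    if (n > 0) {
        memcpy(job->out, job->pending, n);
        memmove(job->pending, job->pending + n, job->npending - n);
        job->out += n;
        job->avail_out -= n;
        job->npending -= n;
    }
    return job->npending ? SM_BLOCKED : SM_DONE;
}

/* States call this instead of returning SM_DONE themselves. */
static sm_result sm_job_done(sm_job *job)
{
    job->statefn = sm_s_done;
    return SM_RUNNING;
}

/* Body state: needs one input byte, then the job is finished. */
static sm_result sm_s_body(sm_job *job)
{
    if (job->avail_in < 1)
        return SM_BLOCKED;
    job->pending[job->npending++] = *job->in++;
    job->avail_in--;
    return sm_job_done(job);
}

/* Header state: all or nothing. With fewer than two bytes available it
 * consumes nothing, so it can simply be re-entered later. */
static sm_result sm_s_header(sm_job *job)
{
    if (job->avail_in < 2)
        return SM_BLOCKED;
    job->pending[job->npending++] = *job->in++;
    job->pending[job->npending++] = *job->in++;
    job->avail_in -= 2;
    job->statefn = sm_s_body;
    return SM_RUNNING;
}

/* Driver: keep calling the current state while it asks to run again. */
static sm_result sm_iter(sm_job *job)
{
    sm_result r;
    do {
        r = job->statefn(job);
    } while (r == SM_RUNNING);
    return r;
}
```

A caller would set `statefn` to `sm_s_header`, then call `sm_iter()` repeatedly, topping up input and output space between calls, until it returns `SM_DONE`.
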