2 * Copyright (c) 2009, 2010 Zmanda, Inc. All Rights Reserved.
4 * This program is free software; you can redistribute it and/or modify it
5 * under the terms of the GNU General Public License version 2 as published
6 * by the Free Software Foundation.
8 * This program is distributed in the hope that it will be useful, but
9 * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
10 * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
13 * You should have received a copy of the GNU General Public License along
14 * with this program; if not, write to the Free Software Foundation, Inc.,
15 * 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
17 * Contact information: Zmanda Inc., 465 S. Mathilda Ave., Suite 300
18 * Sunnyvale, CA 94085, USA, or: http://www.zmanda.com
25 Amanda::MainLoop - Perl interface to the Glib MainLoop
31 my $to = Amanda::MainLoop::timeout_source(2000);
32 $to->set_callback(sub {
34 $to->remove(); # don't re-queue this timeout
35 Amanda::MainLoop::quit(); # return from Amanda::MainLoop::run
38 Amanda::MainLoop::run();
40 Note that all functions in this module are individually available for
43 use Amanda::MainLoop qw(run quit);
47 The main event loop of an application is a tight loop which waits for
48 events, and calls functions to respond to those events. This design
49 allows an IO-bound application to multitask within a single thread, by
50 responding to IO events as they occur instead of blocking on
51 particular IO operations.
53 The Amanda security API, transfer API, and other components rely on
54 the event loop to allow them to respond to their own events in a
57 The overall structure of an application, then, is to initialize its
58 state, register callbacks for some events, and begin looping. In each
59 iteration, the loop waits for interesting events to occur (data
60 available for reading or writing, timeouts, etc.), and then calls
61 functions to handle those interesting things. Thus, the application
62 spends most of its time waiting. When some application-defined state
63 is reached, the loop is terminated and the application cleans up and
66 The Glib main loop takes place within a call to
67 C<Amanda::MainLoop::run()>. This function executes until a call to
68 C<Amanda::MainLoop::quit()> occurs, at which point C<run()> returns.
69 You can check whether the loop is running with
70 C<Amanda::MainLoop::is_running()>.
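
The relationship between C<run()>, C<quit()>, and C<is_running()> can be modelled in a few lines of plain Perl. This is only a sketch under stated assumptions: the C<ToyLoop> package and its callback queue are hypothetical, and say nothing about how Amanda::MainLoop or Glib are actually implemented. A real loop blocks while waiting for events, whereas this toy version stops when its queue is empty.

```perl
use strict;
use warnings;

# Hypothetical toy model of an event loop; NOT Amanda::MainLoop's implementation.
package ToyLoop;

my @queue;          # pending [ $callback, @args ] entries
my $running = 0;

sub call_later { my ($cb, @args) = @_; push @queue, [ $cb, @args ]; return }
sub quit       { $running = 0; return }
sub is_running { return $running }

sub run {
    $running = 1;
    # A real loop would block waiting for IO or timeouts; this toy version
    # simply stops when nothing is left to do or when quit() was called.
    while ($running && @queue) {
        my ($cb, @args) = @{ shift @queue };
        $cb->(@args);
    }
    $running = 0;
    return;
}

package main;

my @log;
ToyLoop::call_later(sub {
    push @log, "started, running=" . ToyLoop::is_running();
    ToyLoop::call_later(sub { push @log, "quitting"; ToyLoop::quit() });
});
ToyLoop::run();      # returns once quit() has been called
push @log, "after run, running=" . ToyLoop::is_running();
print "$_\n" for @log;
```

The application-level code only registers callbacks and calls C<run()>; everything else happens inside callbacks, which is the structure described above.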
72 =head1 HIGH-LEVEL INTERFACE
74 The functions in this section are intended to make asynchronous
75 programming as simple as possible. They are implemented on top of the
76 interfaces described in the LOW-LEVEL INTERFACE section.
80 In most cases, a callback does not need to be invoked immediately. In
81 fact, because Perl does not do tail-call optimization, a long chain of
82 callbacks may cause the Perl stack to grow unnecessarily.
84 The solution is to queue the callback for execution on the next
85 iteration of the main loop, and C<call_later($cb, @args)> does exactly
90 if (can_do_it_now()) {
92 Amanda::MainLoop::call_later($cb, $result)
98 When starting the main loop, an application usually has a sub that
99 should run after the loop has started. C<call_later> works in this
104 Amanda::MainLoop::quit();
106 Amanda::MainLoop::call_later($main);
108 Amanda::MainLoop::run();
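
The stack-growth argument can be checked with a small experiment. The sketch below does not use Amanda::MainLoop at all: C<toy_call_later> and the C<@queue> array are hypothetical stand-ins for the main loop, and C<stack_depth> is an illustrative helper built on Perl's C<caller>. It compares a chain of callbacks that invoke each other directly with a chain that queues each successor.

```perl
use strict;
use warnings;

# Count the Perl call frames currently on the stack.
sub stack_depth {
    my $depth = 0;
    $depth++ while caller($depth);
    return $depth;
}

my @queue;    # toy stand-in for the main loop's queue of deferred calls
sub toy_call_later { my ($cb, @args) = @_; push @queue, [ $cb, @args ]; return }

my ($direct_max, $deferred_max) = (0, 0);

# Direct style: each callback invokes the next, so frames pile up.
my $direct;
$direct = sub {
    my ($n) = @_;
    my $d = stack_depth();
    $direct_max = $d if $d > $direct_max;
    $direct->($n - 1) if $n > 0;
    return;
};
$direct->(50);

# Deferred style: each callback queues the next one and returns first.
my $deferred;
$deferred = sub {
    my ($n) = @_;
    my $d = stack_depth();
    $deferred_max = $d if $d > $deferred_max;
    toy_call_later($deferred, $n - 1) if $n > 0;
    return;
};
toy_call_later($deferred, 50);
while (my $item = shift @queue) {
    my ($cb, @args) = @$item;
    $cb->(@args);
}

print "direct max depth: $direct_max, deferred max depth: $deferred_max\n";
```

In the direct style the maximum depth grows with the length of the chain, while in the deferred style it stays constant, because every callback returns to the loop before the next one starts.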
112 As an optimization, C<make_cb> wraps a sub with a call to C<call_later>
113 while also naming the sub (using C<Sub::Name>, if available):
115 my $fetched_cb = make_cb(fetched_cb => sub {
119 In general, C<make_cb> should be used whenever a callback is passed to
120 some other library. For example, the Changer API (see
121 L<Amanda::Changer>) might be invoked like this:
123 my $reset_finished_cb = make_cb(reset_finished_cb => sub {
125 die "while resetting: $err" if $err;
129 Be careful I<not> to use C<make_cb> in cases where some action must
130 take place before the next iteration of the main loop. In practice,
131 this means C<make_cb> should be avoided with file-descriptor
132 callbacks, which will trigger repeatedly until the descriptors' needs
135 C<make_cb> is exported automatically.
139 Sometimes you need the MainLoop equivalent of C<sleep()>. That comes
140 in the form of C<call_after($delay, $cb, @args)>, which takes a delay
141 (in milliseconds), a sub, and an arbitrary number of arguments. The
142 sub is called with the arguments after the delay has elapsed.
149 Amanda::MainLoop::call_after(1000, $counter, $i-1);
155 The function returns the underlying event source (see below), enabling
156 the caller to cancel the pending call:
158 my $tosrc = Amanda::MainLoop::call_after(15000, $timeout_cb);
159 # ...data arrives before timeout...
162 =head3 call_on_child_termination
164 To monitor a child process for termination, give its pid to
165 C<call_on_child_termination($pid, $cb, @args)>. When the child exits
166 for any reason, this will collect its exit status (via C<waitpid>) and
169 $cb->($exitstatus, @args);
171 Like C<call_after>, this function returns the event source to allow
172 early cancellation if desired.
178 size => $size, # optional, default 0
179 async_read_cb => $async_read_cb,
180 args => [ .. ]); # optional
182 This function will read C<$size> bytes when they are available from
183 file descriptor C<$fd>, and invoke the callback with the results:
185 $async_read_cb->($err, $buf, @args);
187 If C<$size> is zero, then the callback will get whatever data is
188 available as soon as it is available, up to an arbitrary buffer size.
189 If C<$size> is nonzero, then a short read may still occur if C<$size>
190 bytes do not become available simultaneously. On EOF, C<$buf> will be
191 the empty string. It is the caller's responsibility to set C<$fd> to
192 non-blocking mode. Note that not all operating systems generate errors
193 that might be reported here. For example, on Solaris an invalid file
194 descriptor will be silently ignored.
196 The return value is an event source, and calling its C<remove> method
197 will cancel the read. It is an error to have more than one
198 C<async_read> operation on a single file descriptor at any time; doing
199 so will lead to unpredictable results.
201 This function adds a new FdSource every time it is invoked, so it is
202 not well-suited to processing large amounts of data. For that
203 purpose, consider using the low-level interface or, better, the
204 transfer architecture (see L<Amanda::Xfer>).
211 async_write_cb => $async_write_cb,
212 args => [ .. ]); # optional
214 This function will write C<$data> to file descriptor C<$fd> and invoke
215 the callback with the number of bytes written:
217 $async_write_cb->($err, $bytes_written, @args);
219 If C<$bytes_written> is less than the length of C<$data>, then an
220 error occurred, and is given in C<$err>. As for C<async_read>, the
221 caller should set C<$fd> to non-blocking mode. Multiple parallel
222 invocations of this function for the same file descriptor are allowed
223 and will be serialized in the order the calls were made:
225 async_write($fd, "HELLO!\n",
226 async_write_cb => make_cb(wrote_hello => sub {
227 print "wrote 'HELLO!'\n";
229 async_write($fd, "GOODBYE!\n",
230 async_write_cb => make_cb(wrote_goodbye => sub {
231 print "wrote 'GOODBYE!'\n";
234 In this case, the two strings are guaranteed to be written in the same
235 order, and the callbacks will be called in the correct order.
237 Like C<async_read>, this function may add a new FdSource every time it is
238 invoked, so it is not well-suited to processing large amounts of data.
242 Java has the notion of a "synchronized" method, which can only execute in one
243 thread at any time. This is a particular application of a lock, in which the
244 lock is acquired when the method begins, and released when it finishes.
246 With C<Amanda::MainLoop>, this functionality is generally not needed because
247 there is no unexpected preemption. However, if you break up a long-running
248 operation (that doesn't allow concurrency) into several callbacks, you'll need
249 to ensure that at most one of those operations is going on at a time. The
250 C<synchronized> function manages that for you.
252 The function takes a C<$lock> argument, which should be initialized to an empty
253 arrayref (C<[]>). It is used like this:
255 use Amanda::MainLoop 'synchronized';
259 my ($arg1, $arg2, $dump_cb) = @_;
261 synchronized($self->{'lock'}, $dump_cb, sub {
262 my ($dump_cb) = @_; # IMPORTANT! See below
263 $self->do_dump_data($arg1, $arg2, $dump_cb);
267 Here, C<do_dump_data> may take a long time to complete (perhaps it starts
268 a long-running data transfer) but only one such operation is allowed at any
269 time and other C<Amanda::MainLoop> callbacks may occur (e.g. a timeout).
270 When the critical operation is complete, it calls C<$dump_cb> which will
271 release the lock before transferring control to the caller.
273 Note that the C<$dump_cb> in the inner C<sub> shadows that in
274 C<dump_data>. This is intentional: a call to the inner
275 C<$dump_cb> is how C<synchronized> knows that the operation has completed.
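
The locking behaviour can be illustrated with a small self-contained model. Everything in this sketch is hypothetical: C<toy_synchronized>, C<toy_call_later>, and C<toy_run> are illustrative names, and the code is not claimed to match the real C<synchronized> implementation. It only assumes what the text states, namely that the lock starts as an empty arrayref and that calling the wrapped callback releases it.

```perl
use strict;
use warnings;

my @queue;    # toy stand-in for the main loop
sub toy_call_later { my ($cb, @args) = @_; push @queue, [ $cb, @args ]; return }
sub toy_run {
    while (my $item = shift @queue) {
        my ($cb, @args) = @$item;
        $cb->(@args);
    }
    return;
}

# Hypothetical sketch: the lock is an arrayref holding the waiting operations.
sub toy_synchronized {
    my ($lock, $orig_cb, $sub) = @_;

    my $release_cb = sub {
        my @args = @_;
        shift @$lock;                                   # this operation is finished
        toy_call_later($lock->[0]{start}) if @$lock;    # start the next waiter
        return $orig_cb->(@args);                       # then return to the caller
    };

    push @$lock, { start => sub { $sub->($release_cb) } };
    $lock->[0]{start}->() if @$lock == 1;               # lock was free: start now
    return;
}

my $lock = [];
my @events;

for my $name (qw(first second)) {
    my $done_cb = sub { push @events, "$name done" };
    toy_synchronized($lock, $done_cb, sub {
        my ($done_cb) = @_;            # the wrapped callback, shadowing the outer one
        push @events, "$name start";
        toy_call_later($done_cb);      # pretend the long operation finishes later
    });
}
toy_run();
print join(", ", @events), "\n";
```

The second operation does not start until the first has called its wrapped callback, even though both were requested before the loop ran.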
277 Several methods may be synchronized with one another by simply sharing the same
280 =head1 ASYNCHRONOUS STYLE
282 When writing asynchronous code, it's easy to write code that is I<very>
283 difficult to read or debug. The suggestions in this section will help
284 write code that is more readable, and also ensure that all asynchronous
285 code in Amanda uses similar, common idioms.
287 =head2 USING CALLBACKS
289 Most often, callbacks are short, and can be specified as anonymous
290 subs. They should be specified with C<make_cb>, like this:
292 some_async_function(make_cb(foo_cb => sub {
297 If a callback is more than about two lines, specify it in a named
298 variable, rather than directly in the function call:
300 my $foo_cb = make_cb(foo_cb => sub {
306 some_async_function($foo_cb);
308 When using callbacks from an object-oriented package, it is often
309 useful to treat a method as a callback. This requires an anonymous
310 sub "wrapper", which can be written on one line:
312 some_async_function(sub { $self->foo_cb(@_) });
316 The single most important factor in readability is linearity. If a function
317 performs operations A, B, and C in that order, then the code for A, B, and
318 C should appear in that order in the source file. This seems obvious, but it's
319 all too easy to write
322 my $do_c = sub { .. };
323 my $do_b = sub { .. $do_c->() .. };
324 my $do_a = sub { .. $do_b->() .. };
328 Which isn't very readable. Be readable.
330 =head2 SINGLE ENTRY AND EXIT
332 Amanda's use of callbacks emulates continuation-passing style. As such, when a
333 function finishes -- whether successfully or with an error -- it should call a
334 single callback. This ensures that the function has a simple control
335 interface: perform the operation and call the callback.
337 =head2 MULTIPLE STEPS
339 Some operations require a long sequence of asynchronous operations. For
340 example, often the results of one operation are required to initiate
341 another. The I<step> syntax is useful to make this much more readable, and
342 also eliminate some nasty reference-counting bugs. The idea is that each "step"
343 in the process gets its own sub, and then each step calls the next step. The
344 first step defined will be called automatically.
347 my ($hostname, $port, $data, $sendfile_cb) = @_;
348 my ($addr, $socket); # shared lexical variables
349 my $steps = define_steps
350 cb_ref => \$sendfile_cb;
351 step lookup_addr => sub {
352 return async_gethostbyname(hostname => $hostname,
353 ghbn_cb => $steps->{'ghbn_cb'});
355 step ghbn_cb => sub {
356 my ($err, $hostinfo) = @_;
358 $addr = $hostinfo->{'ipaddr'};
359 return $steps->{'connect'}->();
361 step connect => sub {
362 return async_connect(
365 connect_cb => $steps->{'connect_cb'},
368 step connect_cb => sub {
369 my ($err, $conn_sock) = @_;
371 $socket = $conn_sock;
372 return $steps->{'write_block'}->();
377 The C<define_steps> function sets the stage. It is given a reference to the
378 callback for this function (recall there is only one exit point!), and
379 "patches" that reference to free C<$steps>, which otherwise forms a reference
382 WARNING: if the function or method needs to do any kind of setup before its
383 first step, that setup should be done either in a C<setup> step or I<before>
384 the C<define_steps> invocation. Do not write any statements other than step
385 declarations after the C<define_steps> call.
387 Note that there are more steps in this example than are strictly necessary: the
388 body of C<connect> could be appended to C<ghbn_cb>. The extra steps make the
389 overall operation more readable by adding "punctuation" to separate the task of
390 handling a callback (C<ghbn_cb>) from starting the next operation (C<connect>).
392 Also note that the enclosing scope contains some lexical (C<my>)
393 variables which are shared by several of the callbacks.
395 All of the steps are wrapped by C<make_cb>, so each step will be executed on a
396 separate iteration of the MainLoop. This generally has the effect of making
397 asynchronous functions share CPU time more fairly. Sometimes, especially when
398 using the low-level interface, a callback must be called immediately. To
399 achieve this for all callbacks, add C<< immediate => 1 >> to the C<define_steps>
402 my $steps = define_steps
403 cb_ref => \$finished_cb,
406 To do the same for a single step, add the same keyword to the C<step> invocation:
409 connect => sub { .. };
411 In some cases, you may want to run some code when the steps finish. This can
412 be done by passing a C<finalize> sub to C<define_steps>:
414 my $steps = define_steps
415 cb_ref => \$finished_cb,
416 finalize => sub { .. };
418 =head2 JOINING ASYNCHRONOUS "THREADS"
420 With slow operations, it is often useful to perform multiple operations
421 simultaneously. As an example, the following code might run two system
422 commands simultaneously and capture their output:
424 sub run_two_commands {
425 my ($finished_cb) = @_;
426 my $running_commands = 0;
427 my ($result1, $result2);
428 my $steps = define_steps
429 cb_ref => \$finished_cb;
432 run_command($command1,
433 run_cb => $steps->{'command1_done'});
435 run_command($command2,
436 run_cb => $steps->{'command2_done'});
438 step command1_done => sub {
440 $steps->{'maybe_done'}->();
442 step command2_done => sub {
444 $steps->{'maybe_done'}->();
446 step maybe_done => sub {
447 return if --$running_commands; # not done yet
448 $finished_cb->($result1, $result2);
452 It is tempting to optimize out the C<$running_commands> with something like:
454 step maybe_done => sub { ## BAD!
455 return unless defined $result1 and defined $result2;
456 $finished_cb->($result1, $result2);
459 However, this can lead to trouble. Remember that C<define_steps> automatically
460 applies C<make_cb> to each step, so C<maybe_done> is not invoked immediately
461 by C<command1_done> and C<command2_done>. Instead, C<maybe_done> is scheduled
462 for invocation in the next iteration of the main loop (via C<call_later>). If both
463 commands finish before C<maybe_done> is invoked, C<call_later> will be called
464 I<twice>, and C<$result1> and C<$result2> will both be defined on each of the two
465 resulting invocations. The result is that C<$finished_cb> is called twice, and mayhem ensues.
467 This is a complex case, but worth understanding if you want to be able to debug
468 difficult MainLoop bugs.
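
The difference between the two versions of C<maybe_done> can be reproduced without Amanda. In the sketch below, C<toy_call_later>, C<toy_run>, and C<count_finishes> are hypothetical names, and the queue is a simplified stand-in for the main loop. Both completion handlers defer C<maybe_done>, as C<make_cb> would, and the function reports how many times the final callback fires.

```perl
use strict;
use warnings;

my @queue;    # toy stand-in for the main loop's deferred-call queue
sub toy_call_later { my ($cb, @args) = @_; push @queue, [ $cb, @args ]; return }
sub toy_run {
    while (my $item = shift @queue) {
        my ($cb, @args) = @$item;
        $cb->(@args);
    }
    return;
}

# Run two "commands" that both finish before maybe_done gets a chance to
# run, and return how many times the final callback fired.
sub count_finishes {
    my ($use_counter) = @_;
    my $finished = 0;
    my $running  = 2;
    my ($result1, $result2);

    my $maybe_done = sub {
        if ($use_counter) {
            return if --$running;                                 # counter version
        } else {
            return unless defined $result1 and defined $result2;  # BAD version
        }
        $finished++;
        return;
    };

    # Each completion handler defers maybe_done instead of calling it directly.
    toy_call_later(sub { $result1 = "out1"; toy_call_later($maybe_done) });
    toy_call_later(sub { $result2 = "out2"; toy_call_later($maybe_done) });
    toy_run();
    return $finished;
}

printf "counter: %d call(s), defined-check: %d call(s)\n",
    count_finishes(1), count_finishes(0);
```

With the counter, only the second deferred invocation reaches the final callback. With the C<defined> check, both invocations see two defined results, so the final callback fires twice.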
470 =head2 WRITING ASYNCHRONOUS INTERFACES
472 When designing a library or interface that will accept and invoke
473 callbacks, follow these guidelines so that users of the interface will
474 not need to remember special rules.
476 Each callback signature within a package should always have the same
477 name, ending with C<_cb>. For example, a hypothetical
478 C<Amanda::Estimate> module might provide its estimates through a
479 callback with four parameters. This callback should be referred to as
480 C<estimate_cb> throughout the package, and its parameters should be
481 clearly defined in the package's documentation. It should take
482 positional parameters only. If error conditions must also be
483 communicated via the callback, then the first parameter should be an
484 C<$error> parameter, which is undefined when no error has occurred.
485 The Changer API's C<res_cb> is typical of such a callback signature.
487 A caller can only know that an operation is complete by the invocation
488 of the callback, so it is important that a callback be invoked
489 I<exactly once> in all circumstances. Even in an error condition, the
490 caller needs to know that the operation has failed. Also beware of
491 bugs that might cause a callback to be invoked twice.
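
While debugging, a double invocation can be made visible with a small guard. The C<exactly_once> helper below is hypothetical and is not part of Amanda::MainLoop; it is a sketch of one way to turn a silent double call into an immediate error. It cannot detect a callback that is never invoked at all.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap a callback so that a second invocation is
# reported loudly instead of silently corrupting state.
sub exactly_once {
    my ($name, $cb) = @_;
    my $calls = 0;
    return sub {
        die "$name invoked more than once\n" if $calls++;
        return $cb->(@_);
    };
}

my @seen;
my $estimate_cb = exactly_once(estimate_cb => sub { push @seen, [@_] });

$estimate_cb->(undef, 1024, 0);                        # first call goes through
my $ok = eval { $estimate_cb->(undef, 2048, 1); 1 };   # second call dies
print "second call rejected: $@" unless $ok;
```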
493 Functions or methods taking callbacks as arguments should either take
494 only a callback (like C<call_later>), or take hash-key parameters,
495 where the callback's key is the signature name. For example, the
496 C<Amanda::Estimate> package might define a function like
497 C<perform_estimate>, invoked something like this:
499 my $estimate_cb = make_cb(estimate_cb => sub {
500 my ($err, $size, $level) = @_;
504 Amanda::Estimate::perform_estimate(
507 estimate_cb => $estimate_cb,
510 When invoking a user-supplied callback within the library, there is no
511 need to wrap it in a C<call_later> invocation, as the user already
512 supplied that wrapper via C<make_cb>, or is not interested in using
515 Callbacks are a form of continuation
516 (L<http://en.wikipedia.org/wiki/Continuations>), and as such should
517 only be called at the I<end> of a function. Do not do anything after
518 invoking a callback, as you cannot know what processing has gone on in
523 $self->{'estimate_cb'}->(undef, $size, $level);
524 $self->{'estimate_in_progress'} = 0; # BUG!!
527 In this case, the C<estimate_cb> invocation may have called
528 C<perform_estimate> again, setting C<estimate_in_progress> back to 1.
529 A technique to avoid this pitfall is to always C<return> a callback's
530 result, even though that result is not important. This makes the bug
535 return $self->{'estimate_cb'}->(undef, $size, $level);
536 $self->{'estimate_in_progress'} = 0; # BUG (this just looks silly)
539 =head1 LOW-LEVEL INTERFACE
541 MainLoop events are generated by event sources. A source may produce
542 multiple events over its lifetime. The higher-level methods in the
543 previous section provide a more Perlish abstraction of event sources,
544 but for efficiency it is sometimes necessary to use event sources
547 The method C<< $src->set_callback(\&cb) >> sets the function that will
548 be called for a given source, and "attaches" the source to the main
549 loop so that it will begin generating events. The arguments to the
550 callback depend on the event source, but the first argument is always
551 the source itself. Unless specified, no other arguments are provided.
553 Event sources persist until they are removed with
554 C<< $src->remove() >>, even if the source itself is no longer accessible from Perl.
555 Although Glib supports it, there is no provision for "automatically"
556 removing an event source. Also, calling C<< $src->remove() >> more than
557 once is a potentially-fatal error. As an example:
561 Amanda::MainLoop::timeout_source(200)->set_callback(sub {
566 Amanda::MainLoop::quit();
571 Amanda::MainLoop::run();
573 There is no means in place to specify extra arguments to be provided
574 to a source callback when it is set. If the callback needs access to
575 other data, it should use a Perl closure in the form of lexically
576 scoped variables and an anonymous sub. In fact, this is exactly what
577 the higher-level functions (described above) do.
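
The closure technique looks roughly like this. The C<ToySource> class is a hypothetical stand-in for an event source, and C<with_args> is an illustrative helper; neither exists in Amanda::MainLoop. The sketch assumes only what is stated above: the source passes itself as the first argument, and any other data must come from the closure.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an event source: it stores one callback and
# passes itself as the first argument when fired.
package ToySource;
sub new          { my ($class) = @_; return bless { cb => undef }, $class }
sub set_callback { my ($self, $cb) = @_; $self->{cb} = $cb; return }
sub fire         { my ($self) = @_; return $self->{cb}->($self) }

package main;

# Build a source callback that carries extra arguments in a closure.
sub with_args {
    my ($cb, @args) = @_;
    return sub {
        my ($src) = @_;
        return $cb->($src, @args);
    };
}

my @received;
my $src = ToySource->new();
$src->set_callback(with_args(sub { @received = @_ }, "volume-7", 42));
$src->fire();
print "extra args: @received[1 .. $#received]\n";
```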
581 my $src = Amanda::MainLoop::timeout_source(10000);
583 A timeout source will create events at a fixed interval,
584 specified in milliseconds (thousandths of a second). The events will
585 continue until the source is destroyed.
589 my $src = Amanda::MainLoop::idle_source(2);
591 An idle source will create events continuously except when a
592 higher-priority source is emitting events. Priorities are generally
593 small positive integers, with larger integers denoting lower
594 priorities. The events will continue until the source is destroyed.
598 my $src = Amanda::MainLoop::child_watch_source($pid);
600 A child watch source will issue an event when the process with the
601 given PID dies. To avoid race conditions, it will issue an event even
602 if the process dies before the source is created. The callback is
603 called with three arguments: the event source, the PID, and the
606 Note that this source is totally incompatible with anything that
607 would cause Perl to change the SIGCHLD handler. If the SIGCHLD handler is
608 changed, under some circumstances the module will recognize this,
609 add a warning to the debug log, and continue operating.
610 However, it is impossible to catch all possible situations.
612 =head2 File Descriptor
614 my $src = Amanda::MainLoop::fd_source($fd, $G_IO_IN);
616 This source will issue an event whenever one of the given conditions
617 is true for the given file (a file handle or integer file descriptor).
618 The conditions are from Glib's GIOCondition, and are C<$G_IO_IN>,
619 C<$G_IO_OUT>, C<$G_IO_PRI>, C<$G_IO_ERR>, C<$G_IO_HUP>, and
620 C<$G_IO_NVAL>. These constants are available with the import tag
623 Generally, when reading from a file descriptor, use
624 C<$G_IO_IN|$G_IO_HUP|$G_IO_ERR> to ensure that an EOF triggers an
625 event as well. Writing to a file descriptor can simply use
626 C<$G_IO_OUT|$G_IO_ERR>.
628 The callback attached to an FdSource should read from or write to the
629 underlying file descriptor before returning, or it will be called
630 again in the next iteration of the main loop, which can lead to
631 unexpected results. Do I<not> use C<make_cb> here!
633 =head2 Combining Event Sources
635 Event sources are often set up in groups, e.g., a long-term operation
636 and a timeout. When this is the case, be careful that all sources are
637 removed when the operation is complete. The easiest way to accomplish
638 this is to include all sources in a lexical scope and remove them at
639 the appropriate times:
642 my $op_src = long_operation_src();
643 my $timeout_src = Amanda::MainLoop::timeout_source($timeout);
647 $timeout_src->remove();
650 $op_src->set_callback(sub {
651 print "Operation complete\n";
655 $timeout_src->set_callback(sub {
656 print "Operation timed out\n";
661 =head2 Relationship to Glib
663 Glib's main event loop is described in the Glib manual:
664 L<http://library.gnome.org/devel/glib/stable/glib-The-Main-Event-Loop.html>.
665 Note that Amanda depends only on the functionality available in
666 Glib-2.2.0, so many functions described in that document are not
667 available in Amanda. This module provides a much-simplified interface
668 to the Glib library, and is not intended as a generic wrapper for it:
669 Amanda's Perl-accessible main loop only runs a single C<GMainContext>
670 and always runs in the main thread, and (aside from idle sources)
671 event priorities are not accessible from Perl.