 * Copyright (c) 2009, 2010 Zmanda, Inc. All Rights Reserved.
 *
 * This program is free software; you can redistribute it and/or modify it
 * under the terms of the GNU General Public License version 2 as published
 * by the Free Software Foundation.
 *
 * This program is distributed in the hope that it will be useful, but
 * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
 * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
 * for more details.
 *
 * You should have received a copy of the GNU General Public License along
 * with this program; if not, write to the Free Software Foundation, Inc.,
 * 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
 *
 * Contact information: Zmanda Inc., 465 S. Mathilda Ave., Suite 300
 * Sunnyvale, CA 94085, USA, or: http://www.zmanda.com

Amanda::MainLoop - Perl interface to the Glib MainLoop

  my $to = Amanda::MainLoop::timeout_source(2000);
  $to->set_callback(sub {
      $to->remove();             # don't re-queue this timeout
      Amanda::MainLoop::quit();  # return from Amanda::MainLoop::run
  });
  Amanda::MainLoop::run();

Note that all functions in this module are individually available for
export:

  use Amanda::MainLoop qw(run quit);

The main event loop of an application is a tight loop which waits for
events, and calls functions to respond to those events. This design
allows an IO-bound application to multitask within a single thread, by
responding to IO events as they occur instead of blocking on
particular IO operations.

The Amanda security API, transfer API, and other components rely on
the event loop to allow them to respond to their own events in a
timely fashion.

The overall structure of an application, then, is to initialize its
state, register callbacks for some events, and begin looping. In each
iteration, the loop waits for interesting events to occur (data
available for reading or writing, timeouts, etc.), and then calls
functions to handle those events. Thus, the application
spends most of its time waiting. When some application-defined state
is reached, the loop is terminated and the application cleans up and
exits.

The Glib main loop runs within a call to
C<Amanda::MainLoop::run()>. This function executes until a call to
C<Amanda::MainLoop::quit()> occurs, at which point C<run()> returns.
You can check whether the loop is running with
C<Amanda::MainLoop::is_running()>.

=head1 HIGH-LEVEL INTERFACE

The functions in this section are intended to make asynchronous
programming as simple as possible. They are implemented on top of the
interfaces described in the LOW-LEVEL INTERFACE section.

In most cases, a callback does not need to be invoked immediately. In
fact, because Perl does not do tail-call optimization, a long chain of
callbacks may cause the Perl stack to grow unnecessarily.

The solution is to queue the callback for execution on the next
iteration of the main loop, and C<call_later($cb, @args)> does exactly
that.

  if (can_do_it_now()) {
      # ..
      Amanda::MainLoop::call_later($cb, $result);
  }

When starting the main loop, an application usually has a sub that
should run after the loop has started. C<call_later> works in this
case too:

  my $main = sub {
      # ..
      Amanda::MainLoop::quit();
  };
  Amanda::MainLoop::call_later($main);
  Amanda::MainLoop::run();
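
The stack-growth argument can be illustrated with a toy model in plain
Perl. This is only a sketch: it does not use C<Amanda::MainLoop>, and
C<toy_call_later>, C<toy_run>, and C<countdown> are hypothetical
stand-ins invented for the illustration. It compares how deeply
callbacks nest when each one calls the next directly versus when each
one queues the next.

```perl
use strict;
use warnings;

# Toy stand-ins (hypothetical, not part of Amanda::MainLoop).
my @queue;          # pending [callback, @args] entries
my $depth = 0;      # current nesting depth of callbacks
my $max_depth = 0;  # deepest nesting observed

# Queue a callback instead of invoking it.
sub toy_call_later {
    my ($cb, @args) = @_;
    push @queue, [ $cb, @args ];
}

# Run queued callbacks one at a time, each from the top of the stack.
sub toy_run {
    while (my $item = shift @queue) {
        my ($cb, @args) = @$item;
        $cb->(@args);
    }
}

# Count down from $n; each step either calls the next one directly
# or queues it for a later turn of the toy loop.
sub countdown {
    my ($n, $defer) = @_;
    $depth++;
    $max_depth = $depth if $depth > $max_depth;
    if ($n > 0) {
        if ($defer) {
            toy_call_later(\&countdown, $n - 1, $defer);
        } else {
            countdown($n - 1, $defer);
        }
    }
    $depth--;
}

# Direct chaining: every step nests inside the previous one.
countdown(50, 0);
my $direct_depth = $max_depth;

# Deferred chaining: every step runs at the same shallow depth.
($depth, $max_depth) = (0, 0);
toy_call_later(\&countdown, 50, 1);
toy_run();
my $deferred_depth = $max_depth;

print "direct: $direct_depth, deferred: $deferred_depth\n";
```

With direct calls the nesting depth grows with the length of the chain,
while the queued version stays flat no matter how long the chain is.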

As an optimization, C<make_cb> wraps a sub with a call to C<call_later>
while also naming the sub (using C<Sub::Name>, if available):

  my $fetched_cb = make_cb(fetched_cb => sub {
      # ..
  });

In general, C<make_cb> should be used whenever a callback is passed to
some other library. For example, the Changer API (see
L<Amanda::Changer>) might be invoked like this:

  my $reset_finished_cb = make_cb(reset_finished_cb => sub {
      my ($err) = @_;
      die "while resetting: $err" if $err;
      # ..
  });

Be careful I<not> to use C<make_cb> in cases where some action must
take place before the next iteration of the main loop. In practice,
this means C<make_cb> should be avoided with file-descriptor
callbacks, which will trigger repeatedly until the descriptors' needs
are addressed.

C<make_cb> is exported automatically.

Sometimes you need the MainLoop equivalent of C<sleep()>. That comes
in the form of C<call_after($delay, $cb, @args)>, which takes a delay
(in milliseconds), a sub, and an arbitrary number of arguments. The
sub is called with the arguments after the delay has elapsed.

  Amanda::MainLoop::call_after(1000, $counter, $i-1);

The function returns the underlying event source (see below), enabling
the caller to cancel the pending call:

  my $tosrc = Amanda::MainLoop::call_after(15000, $timeout_cb);
  # ...data arrives before timeout...
  $tosrc->remove();

=head3 call_on_child_termination

To monitor a child process for termination, give its pid to
C<call_on_child_termination($pid, $cb, @args)>. When the child exits
for any reason, this will collect its exit status (via C<waitpid>) and
call

  $cb->($exitstatus, @args);

Like C<call_after>, this function returns the event source to allow
early cancellation if desired.

      size => $size,                    # optional, default 0
      async_read_cb => $async_read_cb,
      args => [ .. ]);                  # optional

This function will read C<$size> bytes when they are available from
file descriptor C<$fd>, and invoke the callback with the results:

  $async_read_cb->($err, $buf, @args);

If C<$size> is zero, then the callback will get whatever data is
available as soon as it is available, up to an arbitrary buffer size.
If C<$size> is nonzero, then a short read may still occur if C<$size>
bytes do not become available simultaneously. On EOF, C<$buf> will be
the empty string. It is the caller's responsibility to set C<$fd> to
non-blocking mode. Note that not all operating systems generate errors
that might be reported here. For example, on Solaris an invalid file
descriptor will be silently ignored.

The return value is an event source, and calling its C<remove> method
will cancel the read. It is an error to have more than one
C<async_read> operation on a single file descriptor at any time; doing
so will lead to unpredictable results.

This function adds a new FdSource every time it is invoked, so it is
not well-suited to processing large amounts of data. For that
purpose, consider using the low-level interface or, better, the
transfer architecture (see L<Amanda::Xfer>).

      async_write_cb => $async_write_cb,
      args => [ .. ]);                  # optional

This function will write C<$data> to file descriptor C<$fd> and invoke
the callback with the number of bytes written:

  $async_write_cb->($err, $bytes_written, @args);

If C<$bytes_written> is less than the length of C<$data>, then an
error occurred, and is given in C<$err>. As for C<async_read>, the
caller should set C<$fd> to non-blocking mode. Multiple parallel
invocations of this function for the same file descriptor are allowed
and will be serialized in the order the calls were made:

  async_write($fd, "HELLO!\n",
      async_write_cb => make_cb(wrote_hello => sub {
          print "wrote 'HELLO!'\n";
      }));
  async_write($fd, "GOODBYE!\n",
      async_write_cb => make_cb(wrote_goodbye => sub {
          print "wrote 'GOODBYE!'\n";
      }));

In this case, the two strings are guaranteed to be written in the
order the calls were made, and the callbacks will be called in that
same order.

Like C<async_read>, this function may add a new FdSource every time it is
invoked, so it is not well-suited to processing large amounts of data.

Java has the notion of a "synchronized" method, which can only execute in one
thread at any time. This is a particular application of a lock, in which the
lock is acquired when the method begins, and released when it finishes.

With C<Amanda::MainLoop>, this functionality is generally not needed because
there is no unexpected preemption. However, if you break up a long-running
operation (that doesn't allow concurrency) into several callbacks, you'll need
to ensure that at most one of those operations is going on at a time. The
C<synchronized> function manages that for you.

The function takes a C<$lock> argument, which should be initialized to an empty
arrayref (C<[]>). It is used like this:

  use Amanda::MainLoop 'synchronized';
  # ..
  sub dump_data {
      my $self = shift;
      my ($arg1, $arg2, $dump_cb) = @_;

      synchronized($self->{'lock'}, $dump_cb, sub {
          my ($dump_cb) = @_; # IMPORTANT! See below
          $self->do_dump_data($arg1, $arg2, $dump_cb);
      });
  }

Here, C<do_dump_data> may take a long time to complete (perhaps it starts
a long-running data transfer). Only one such operation is allowed at any
time, but other C<Amanda::MainLoop> callbacks (e.g., a timeout) may still
occur while it runs. When the critical operation is complete, it calls
C<$dump_cb>, which will release the lock before transferring control to
the caller.

Note that the C<$dump_cb> in the inner C<sub> shadows that in
C<dump_data>. This is intentional: a call to the inner
C<$dump_cb> is how C<synchronized> knows that the operation has completed.

Several methods may be synchronized with one another by simply sharing the same
lock.
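
To make the locking behaviour concrete, the following plain-Perl sketch
models a lock as a queue of waiting operations. It is an assumption-laden
toy, not Amanda's implementation: C<toy_synchronized>, C<start_op>, and
C<@pending> are hypothetical names, and the real C<synchronized> may
schedule waiters differently.

```perl
use strict;
use warnings;

# Hypothetical re-implementation for illustration only.
# $lock is an arrayref holding one starter sub per queued operation;
# the operation at index 0 is the one currently holding the lock.
sub toy_synchronized {
    my ($lock, $orig_cb, $sub) = @_;

    # Wrapped callback handed to the operation: release the lock,
    # start the next waiter (if any), then call the original callback.
    my $wrapped_cb = sub {
        my @args = @_;
        shift @$lock;
        $lock->[0]->() if @$lock;
        return $orig_cb->(@args);
    };

    push @$lock, sub { $sub->($wrapped_cb) };
    $lock->[0]->() if @$lock == 1;   # lock was free: start right away
    return;
}

my @log;
my @pending;      # completion callbacks of operations "in flight"
my $lock = [];

sub start_op {
    my ($name) = @_;
    toy_synchronized($lock, sub { push @log, "done $name" }, sub {
        my ($done_cb) = @_;          # the wrapped callback
        push @log, "start $name";
        push @pending, $done_cb;     # pretend the work finishes later
    });
}

start_op("A");
start_op("B");                        # must wait: A holds the lock
my $started_before = scalar @log;     # only A has started so far

(shift @pending)->();                 # A finishes, which lets B start
(shift @pending)->();                 # B finishes

print join(", ", @log), "\n";
```

In this toy, the second operation does not begin until the first one
invokes the wrapped callback, which mirrors why the inner callback must
be the one passed in by the locking function.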

=head1 ASYNCHRONOUS STYLE

When writing asynchronous code, it's easy to write code that is I<very>
difficult to read or debug. The suggestions in this section will help
write code that is more readable, and also ensure that all asynchronous
code in Amanda uses similar, common idioms.

=head2 USING CALLBACKS

Most often, callbacks are short, and can be specified as anonymous
subs. They should be specified with C<make_cb>, like this:

  some_async_function(make_cb(foo_cb => sub {
      # ..
  }));

If a callback is more than about two lines, specify it in a named
variable, rather than directly in the function call:

  my $foo_cb = make_cb(foo_cb => sub {
      # ..
  });
  some_async_function($foo_cb);

When using callbacks from an object-oriented package, it is often
useful to treat a method as a callback. This requires an anonymous
sub "wrapper", which can be written on one line:

  some_async_function(sub { $self->foo_cb(@_) });

The single most important factor in readability is linearity. If a function
performs operations A, B, and C in that order, then the code for A, B, and
C should appear in that order in the source file. This seems obvious, but it's
all too easy to write

  my $do_c = sub { .. };
  my $do_b = sub { .. $do_c->() .. };
  my $do_a = sub { .. $do_b->() .. };

which isn't very readable. Be readable.

=head2 SINGLE ENTRY AND EXIT

Amanda's use of callbacks emulates continuation-passing style. As such, when a
function finishes -- whether successfully or with an error -- it should call a
single callback. This ensures that the function has a simple control
interface: perform the operation and call the callback.

=head2 MULTIPLE STEPS

Some operations require a long sequence of asynchronous operations. For
example, often the results of one operation are required to initiate
another. The I<step> syntax makes this much more readable, and
also eliminates some nasty reference-counting bugs. The idea is that each "step"
in the process gets its own sub, and then each step calls the next step. The
first step defined will be called automatically.

  my ($hostname, $port, $data, $sendfile_cb) = @_;
  my ($addr, $socket); # shared lexical variables
  my $steps = define_steps
          cb_ref => \$sendfile_cb;
  step lookup_addr => sub {
      return async_gethostbyname(hostname => $hostname,
                          ghbn_cb => $steps->{'ghbn_cb'});
  };
  step ghbn_cb => sub {
      my ($err, $hostinfo) = @_;
      # ..
      $addr = $hostinfo->{'ipaddr'};
      return $steps->{'connect'}->();
  };
  step connect => sub {
      return async_connect(
          # ..
          connect_cb => $steps->{'connect_cb'},
      );
  };
  step connect_cb => sub {
      my ($err, $conn_sock) = @_;
      # ..
      $socket = $conn_sock;
      return $steps->{'write_block'}->();
  };
  # ..

The C<define_steps> function sets the stage. It is given a reference to the
callback for this function (recall there is only one exit point!), and
"patches" that reference to free C<$steps>, which otherwise forms a reference
loop.

WARNING: if the function or method needs to do any kind of setup before its
first step, that setup should be done either in a C<setup> step or I<before>
the C<define_steps> invocation. Do not write any statements other than step
declarations after the C<define_steps> call.

Note that there are more steps in this example than are strictly necessary: the
body of C<connect> could be appended to C<ghbn_cb>. The extra steps make the
overall operation more readable by adding "punctuation" to separate the task of
handling a callback (C<ghbn_cb>) from starting the next operation (C<connect>).

Also note that the enclosing scope contains some lexical (C<my>)
variables which are shared by several of the callbacks.

All of the steps are wrapped by C<make_cb>, so each step will be executed on a
separate iteration of the MainLoop. This generally has the effect of making
asynchronous functions share CPU time more fairly. Sometimes, especially when
using the low-level interface, a callback must be called immediately. To
achieve this for all callbacks, add C<< immediate => 1 >> to the C<define_steps>
invocation:

  my $steps = define_steps
          cb_ref => \$finished_cb,
          immediate => 1;

To do the same for a single step, add the same keyword to the C<step> invocation:

  step immediate => 1,
       connect => sub { .. };

=head2 JOINING ASYNCHRONOUS "THREADS"

With slow operations, it is often useful to perform multiple operations
simultaneously. As an example, the following code might run two system
commands simultaneously and capture their output:

  sub run_two_commands {
      my ($finished_cb) = @_;
      my $running_commands = 0;
      my ($result1, $result2);
      my $steps = define_steps
              cb_ref => \$finished_cb;
      step start => sub {
          $running_commands++;
          run_command($command1,
              run_cb => $steps->{'command1_done'});
          $running_commands++;
          run_command($command2,
              run_cb => $steps->{'command2_done'});
      };
      step command1_done => sub {
          $result1 = $_[0];
          $steps->{'maybe_done'}->();
      };
      step command2_done => sub {
          $result2 = $_[0];
          $steps->{'maybe_done'}->();
      };
      step maybe_done => sub {
          return if --$running_commands; # not done yet
          $finished_cb->($result1, $result2);
      };
  }

It is tempting to optimize out the C<$running_commands> with something like:

  step maybe_done => sub { ## BAD!
      return unless defined $result1 and defined $result2;
      $finished_cb->($result1, $result2);
  };

However, this can lead to trouble. Remember that C<define_steps> automatically
applies C<make_cb> to each step, so C<maybe_done> is not invoked immediately
by C<command1_done> and C<command2_done>. Instead, C<maybe_done> is scheduled
for invocation in the next iteration of the main loop (via C<call_later>). If both
commands finish before C<maybe_done> is invoked, C<maybe_done> will have been
queued I<twice>, and it will find both C<$result1> and C<$result2> defined on
both invocations. The result is that C<$finished_cb> is called twice, and
mayhem ensues.

This is a complex case, but worth understanding if you want to be able to debug
difficult MainLoop bugs.
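
The double invocation can be reproduced with a small plain-Perl
simulation. This is a sketch under stated assumptions: the queue,
C<toy_call_later>, C<toy_run>, and C<join_two> are hypothetical stand-ins
that only mimic deferred invocation; they are not the MainLoop API. Both
"commands" complete before the join step gets its turn, which is the
scenario described above.

```perl
use strict;
use warnings;

# Toy event queue (hypothetical stand-in for the main loop).
my @queue;

sub toy_call_later {
    my ($cb, @args) = @_;
    push @queue, [ $cb, @args ];
}

sub toy_run {
    while (my $item = shift @queue) {
        my ($cb, @args) = @$item;
        $cb->(@args);
    }
}

# Run two "commands" and return how often the finish callback fired.
# $use_counter selects the counter-based join or the defined-check join.
sub join_two {
    my ($use_counter) = @_;
    my $finished_calls = 0;
    my $running = 2;
    my ($result1, $result2);

    my $maybe_done = $use_counter
        ? sub {
              return if --$running;        # not done yet
              $finished_calls++;
          }
        : sub {                            # the tempting variant
              return unless defined $result1 and defined $result2;
              $finished_calls++;
          };

    # Each completion records its result and queues maybe_done,
    # mimicking a step that is deferred rather than called directly.
    toy_call_later(sub { $result1 = "out1"; toy_call_later($maybe_done) });
    toy_call_later(sub { $result2 = "out2"; toy_call_later($maybe_done) });
    toy_run();

    return $finished_calls;
}

my $with_counter = join_two(1);
my $with_defined = join_two(0);
print "counter: $with_counter call(s), defined-check: $with_defined call(s)\n";
```

The counter fires the finish callback once, because each queued
invocation consumes exactly one completion. The defined-check variant
fires it on both queued invocations.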

=head2 WRITING ASYNCHRONOUS INTERFACES

When designing a library or interface that will accept and invoke
callbacks, follow these guidelines so that users of the interface will
not need to remember special rules.

Each callback signature within a package should always have the same
name, ending with C<_cb>. For example, a hypothetical
C<Amanda::Estimate> module might provide its estimates through a
callback with four parameters. This callback should be referred to as
C<estimate_cb> throughout the package, and its parameters should be
clearly defined in the package's documentation. It should take
positional parameters only. If error conditions must also be
communicated via the callback, then the first parameter should be an
C<$error> parameter, which is undefined when no error has occurred.
The Changer API's C<res_cb> is typical of such a callback signature.

A caller can only know that an operation is complete by the invocation
of the callback, so it is important that a callback be invoked
I<exactly once> in all circumstances. Even in an error condition, the
caller needs to know that the operation has failed. Also beware of
bugs that might cause a callback to be invoked twice.

Functions or methods taking callbacks as arguments should either take
only a callback (like C<call_later>), or take hash-key parameters,
where the callback's key is the signature name. For example, the
C<Amanda::Estimate> package might define a function like
C<perform_estimate>, invoked something like this:

  my $estimate_cb = make_cb(estimate_cb => sub {
      my ($err, $size, $level) = @_;
      # ..
  });
  Amanda::Estimate::perform_estimate(
      # ..
      estimate_cb => $estimate_cb,
  );

When invoking a user-supplied callback within the library, there is no
need to wrap it in a C<call_later> invocation, as the user already
supplied that wrapper via C<make_cb>, or is not interested in using
such a wrapper.

Callbacks are a form of continuation
(L<http://en.wikipedia.org/wiki/Continuations>), and as such should
only be called at the I<end> of a function. Do not do anything after
invoking a callback, as you cannot know what processing has gone on in
the callback.

  $self->{'estimate_cb'}->(undef, $size, $level);
  $self->{'estimate_in_progress'} = 0; # BUG!!

In this case, the C<estimate_cb> invocation may have called
C<perform_estimate> again, setting C<estimate_in_progress> back to 1.
A technique to avoid this pitfall is to always C<return> a callback's
result, even though that result is not important. This makes the bug
much more apparent:

  return $self->{'estimate_cb'}->(undef, $size, $level);
  $self->{'estimate_in_progress'} = 0; # BUG (this just looks silly)
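
The guidelines above can be seen together in a small plain-Perl sketch.
The C<parse_size> function and its C<$size_cb> signature are hypothetical,
invented only for this illustration. The function has a single callback,
puts the error in the first parameter, invokes the callback exactly once
on every path, and returns the callback's result so that nothing can run
afterwards.

```perl
use strict;
use warnings;

# Hypothetical helper: parse a size such as "64k" and report the
# outcome through $size_cb, whose signature is ($err, $bytes).
sub parse_size {
    my ($text, $size_cb) = @_;

    if ($text !~ /^(\d+)k$/) {
        # Error path: the first parameter carries the error.
        return $size_cb->("bad size '$text'", undef);
    }

    # Success path: $err is undef. Returning the callback's result
    # guarantees that no statement follows the callback invocation.
    return $size_cb->(undef, $1 * 1024);
}

my @seen;
my $size_cb = sub {
    my ($err, $bytes) = @_;
    push @seen, defined $err ? "error: $err" : "ok: $bytes";
};

parse_size("64k", $size_cb);
parse_size("lots", $size_cb);
print "$_\n" for @seen;
```

Each call produces exactly one entry in C<@seen>, whether it succeeds or
fails, so the caller always learns the outcome.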

=head1 LOW-LEVEL INTERFACE

MainLoop events are generated by event sources. A source may produce
multiple events over its lifetime. The higher-level methods in the
previous section provide a more Perlish abstraction of event sources,
but for efficiency it is sometimes necessary to use event sources
directly.

The method C<< $src->set_callback(\&cb) >> sets the function that will
be called for a given source, and "attaches" the source to the main
loop so that it will begin generating events. The arguments to the
callback depend on the event source, but the first argument is always
the source itself. Unless specified, no other arguments are provided.

Event sources persist until they are removed with
C<< $src->remove() >>, even if the source itself is no longer accessible from Perl.
Although Glib supports it, there is no provision for "automatically"
removing an event source. Also, calling C<< $src->remove() >> more than
once is a potentially fatal error. As an example:

  Amanda::MainLoop::timeout_source(200)->set_callback(sub {
      my ($src) = @_;
      # ..
      $src->remove();
      Amanda::MainLoop::quit();
  });
  Amanda::MainLoop::run();

There is no means in place to specify extra arguments to be provided
to a source callback when it is set. If the callback needs access to
other data, it should use a Perl closure in the form of lexically
scoped variables and an anonymous sub. In fact, this is exactly what
the higher-level functions (described above) do.
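
As a minimal sketch of the closure technique, the following plain-Perl
example builds a callback that carries its own data. The names
C<make_tick_cb>, C<$label>, and C<$limit> are hypothetical, and the
string C<"fake-source"> merely stands in for the event source that a
real source callback would receive as its first argument.

```perl
use strict;
use warnings;

# Build a callback that remembers $label, $limit, and a private counter.
# No extra arguments are passed at call time; the data lives in the
# closure's lexical variables.
sub make_tick_cb {
    my ($label, $limit) = @_;
    my $count = 0;
    return sub {
        my ($src) = @_;   # stand-in for the event source argument
        $count++;
        return "$label: tick $count of $limit";
    };
}

my $cb = make_tick_cb("backup", 3);
my @messages = map { $cb->("fake-source") } 1 .. 3;
print "$_\n" for @messages;
```

Each callback built this way keeps its own counter, so several sources
can use callbacks from the same factory without sharing state.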

  my $src = Amanda::MainLoop::timeout_source(10000);

A timeout source will create events at the specified interval,
given in milliseconds (thousandths of a second). The events will
continue until the source is destroyed.

  my $src = Amanda::MainLoop::idle_source(2);

An idle source will create events continuously except when a
higher-priority source is emitting events. Priorities are generally
small positive integers, with larger integers denoting lower
priorities. The events will continue until the source is destroyed.

  my $src = Amanda::MainLoop::child_watch_source($pid);

A child watch source will issue an event when the process with the
given PID dies. To avoid race conditions, it will issue an event even
if the process dies before the source is created. The callback is
called with three arguments: the event source, the PID, and the
exit status.

Note that this source is totally incompatible with anything that
would cause Perl to change the SIGCHLD handler. If the SIGCHLD handler
is changed, the module will in some circumstances recognize this,
add a warning to the debug log, and continue operating.
However, it is impossible to catch all possible situations.

=head2 File Descriptor

  my $src = Amanda::MainLoop::fd_source($fd, $G_IO_IN);

This source will issue an event whenever one of the given conditions
is true for the given file (a file handle or integer file descriptor).
The conditions are from Glib's GIOCondition, and are C<$G_IO_IN>,
C<$G_IO_OUT>, C<$G_IO_PRI>, C<$G_IO_ERR>, C<$G_IO_HUP>, and
C<$G_IO_NVAL>. These constants are available with the import tag

Generally, when reading from a file descriptor, use
C<$G_IO_IN|$G_IO_HUP|$G_IO_ERR> to ensure that an EOF triggers an
event as well. Writing to a file descriptor can simply use
C<$G_IO_OUT|$G_IO_ERR>.

The callback attached to an FdSource should read from or write to the
underlying file descriptor before returning, or it will be called
again in the next iteration of the main loop, which can lead to
unexpected results. Do I<not> use C<make_cb> here!

=head2 Combining Event Sources

Event sources are often set up in groups, e.g., a long-term operation
and a timeout. When this is the case, be careful that all sources are
removed when the operation is complete. The easiest way to accomplish
this is to include all sources in a lexical scope and remove them at
the appropriate times:

  my $op_src = long_operation_src();
  my $timeout_src = Amanda::MainLoop::timeout_source($timeout);

  # remove both sources, whichever event fires first
  my $finish = sub {
      $op_src->remove();
      $timeout_src->remove();
  };

  $op_src->set_callback(sub {
      print "Operation complete\n";
      $finish->();
  });

  $timeout_src->set_callback(sub {
      print "Operation timed out\n";
      $finish->();
  });

=head2 Relationship to Glib

Glib's main event loop is described in the Glib manual:
L<http://library.gnome.org/devel/glib/stable/glib-The-Main-Event-Loop.html>.
Note that Amanda depends only on the functionality available in
Glib-2.2.0, so many functions described in that document are not
available in Amanda. This module provides a much-simplified interface
to the Glib library, and is not intended as a generic wrapper for it:
Amanda's Perl-accessible main loop only runs a single C<GMainContext>,
and always runs in the main thread; and (aside from idle sources)
event priorities are not accessible from Perl.