/* * Copyright (c) 2009-2012 Zmanda, Inc. All Rights Reserved. * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public License * as published by the Free Software Foundation; either version 2 * of the License, or (at your option) any later version. * * This program is distributed in the hope that it will be useful, but * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License * for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA * * Contact information: Zmanda Inc., 465 S. Mathilda Ave., Suite 300 * Sunnyvale, CA 94085, USA, or: http://www.zmanda.com */ %perlcode %{ =head1 NAME Amanda::MainLoop - Perl interface to the Glib MainLoop =head1 SYNOPSIS use Amanda::MainLoop; my $to = Amanda::MainLoop::timeout_source(2000); $to->set_callback(sub { print "Time's Up!\n"; $to->remove(); # dont' re-queue this timeout Amanda::MainLoop::quit(); # return from Amanda::MainLoop::run }); Amanda::MainLoop::run(); Note that all functions in this module are individually available for export, e.g., use Amanda::MainLoop qw(run quit); =head1 OVERVIEW The main event loop of an application is a tight loop which waits for events, and calls functions to respond to those events. This design allows an IO-bound application to multitask within a single thread, by responding to IO events as they occur instead of blocking on particular IO operations. The Amanda security API, transfer API, and other components rely on the event loop to allow them to respond to their own events in a timely fashion. The overall structure of an application, then, is to initialize its state, register callbacks for some events, and begin looping. In each iteration, the loop waits for interesting events to occur (data available for reading or writing, timeouts, etc.), and then calls functions to handle those interesting things. Thus, the application spends most of its time waiting. When some application-defined state is reached, the loop is terminated and the application cleans up and exits. The Glib main loop takes place within a call to C. This function executes until a call to C occurs, at which point C returns. You can check whether the loop is running with C. =head1 HIGH-LEVEL INTERFACE The functions in this section are intended to make asynchronous programming as simple as possible. They are implemented on top of the interfaces described in the LOW-LEVEL INTERFACE section. =head3 call_later In most cases, a callback does not need to be invoked immediately. In fact, because Perl does not do tail-call optimization, a long chain of callbacks may cause the perl stack to grow unnecessarily. The solution is to queue the callback for execution on the next iteration of the main loop, and C does exactly this. sub might_delay { my ($cb) = @_; if (can_do_it_now()) { my $result = do_it(); Amanda::MainLoop::call_later($cb, $result) } else { # .. } } When starting the main loop, an application usually has a sub that should run after the loop has started. C works in this situation, too. my $main = sub { # .. Amanda::MainLoop::quit(); }; Amanda::MainLoop::call_later($main); # .. Amanda::MainLoop::run(); =head3 make_cb As an optimization, C wraps a sub with a call to call_later while also naming the sub (using C, if available): my $fetched_cb = make_cb(fetched_cb => sub { # .. callback body } In general, C should be used whenever a callback is passed to some other library. For example, the Changer API (see L) might be invoked like this: my $reset_finished_cb = make_cb(reset_finished_cb => sub { my ($err) = @_; die "while resetting: $err" if $err; # .. }); Be careful I to use C in cases where some action must take place before the next iteration of the main loop. In practice, this means C should be avoided with file-descriptor callbacks, which will trigger repeatedly until the descriptors' needs are addressed. C is exported automatically. =head3 call_after Sometimes you need the MainLoop equivalent of C. That comes in the form of C, which takes a delay (in milliseconds), a sub, and an arbitrary number of arguments. The sub is called with the arguments after the delay has elapsed. sub countdown { my $counter; $counter = sub { print "$i..\n"; if ($i) { Amanda::MainLoop::call_after(1000, $counter, $i-1); } } $counter->(10); } The function returns the underlying event source (see below), enabling the caller to cancel the pending call: my $tosrc = Amanda::MainLoop::call_after(15000, $timeout_cb): # ...data arrives before timeout... $tosrc->remove(); =head3 call_on_child_termination To monitor a child process for termination, give its pid to C. When the child exits for any reason, this will collect its exit status (via C) and call C<$cb> as $cb->($exitstatus, @args); Like C, this function returns the event source to allow early cancellation if desired. =head3 async_read async_read( fd => $fd, size => $size, # optional, default 0 async_read_cb => $async_read_cb, args => [ .. ]); # optional This function will read C<$size> bytes when they are available from file descriptor C<$fd>, and invoke the callback with the results: $async_read_cb->($err, $buf, @args); If C<$size> is zero, then the callback will get whatever data is available as soon as it is available, up to an arbitrary buffer size. If C<$size> is nonzero, then a short read may still occur if C<$size> bytes do not become available simultaneously. On EOF, C<$buf> will be the empty string. It is the caller's responsibility to set C<$fd> to non-blocking mode. Note that not all operating sytems generate errors that might be reported here. For example, on Solaris an invalid file descriptor will be silently ignored. The return value is an event source, and calling its C method will cancel the read. It is an error to have more than one C operation on a single file descriptor at any time, and will lead to unpredictable results. This function adds a new FdSource every time it is invoked, so it is not well-suited to processing large amounts of data. For that purpose, consider using the low-level interface or, better, the transfer architecture (see L). =head3 async_write async_write( fd => $fd, data => $data, async_write_cb => $async_write_cb, args => [ .. ]); # optional This function will write C<$data> to file descriptor C<$fd> and invoke the callback with the number of bytes written: $cb->($err, $bytes_written, @args); If C<$bytes_written> is less than then length of <$data>, then an error occurred, and is given in C<$err>. As for C, the caller should set C<$fd> to non-blocking mode. Multiple parallel invocations of this function for the same file descriptor are allowed and will be serialized in the order the calls were made: async_write($fd, "HELLO!\n", async_write_cb => make_cb(wrote_hello => sub { print "wrote 'HELLO!'\n"; })); async_write($fd, "GOODBYE!\n", async_write_cb => make_cb(wrote_goodbye => sub { print "wrote 'GOODBYE!'\n"; })); In this case, the two strings are guaranteed to be written in the same order, and the callbacks will be called in the correct order. Like async_read, this function may add a new FdSource every time it is invoked, so it is not well-suited to processing large amounts of data. =head3 synchronized Java has the notion of a "synchronized" method, which can only execute in one thread at any time. This is a particular application of a lock, in which the lock is acquired when the method begins, and released when it finishes. With C, this functionality is generally not needed because there is no unexpected preemeption. However, if you break up a long-running operation (that doesn't allow concurrency) into several callbacks, you'll need to ensure that at most one of those operations is going on at a time. The C function manages that for you. The function takes a C<$lock> argument, which should be initialized to an empty arrayref (C<[]>). It is used like this: use Amanda::MainLoop 'synchronized'; # .. sub dump_data { my $self = shift; my ($arg1, $arg2, $dump_cb) = @_; synchronized($self->{'lock'}, $dump_cb, sub { my ($dump_cb) = @_; # IMPORTANT! See below $self->do_dump_data($arg1, $arg2, $dump_cb); }; } Here, C may take a long time to complete (perhaps it starts a long-running data transfer) but only one such operation is allowed at any time and other C callbacks may occur (e.g. a timeout). When the critical operation is complete, it calls C<$dump_cb> which will release the lock before transferring control to the caller. Note that the C<$dump_cb> in the inner C shadows that in C -- this is intentional, the a call to the the inner C<$dump_cb> is how C knows that the operation has completed. Several methods may be synchronized with one another by simply sharing the same lock. =head1 ASYNCHRONOUS STYLE When writing asynchronous code, it's easy to write code that is *very* difficult to read or debug. The suggestions in this section will help write code that is more readable, and also ensure that all asynchronous code in Amanda uses similar, common idioms. =head2 USING CALLBACKS Most often, callbacks are short, and can be specified as anonymous subs. They should be specified with make_cb, like this: some_async_function(make_cb(foo_cb => sub { my ($x, $y) = @_; # ... })); If a callback is more than about two lines, specify it in a named variable, rather than directly in the function call: my $foo_cb = make_cb(foo_cb => sub { my ($src) = @_; # . # . long function # . }); some_async_function($foo_cb); When using callbacks from an object-oriented package, it is often useful to treat a method as a callback. This requires an anonymous sub "wrapper", which can be written on one line: some_async_function(sub { $self->foo_cb(@_) }); =head2 LINEARITY The single most important factor in readability is linearity. If a function that performs operations A, B, and C in that order, then the code for A, B, and C should appear in that order in the source file. This seems obvious, but it's all too easy to write sub three_ops { my $do_c = sub { .. }; my $do_b = sub { .. $do_c->() .. }; my $do_a = sub { .. $do_b->() .. }; $do_a->(); } Which isn't very readable. Be readable. =head2 SINGLE ENTRY AND EXIT Amanda's use of callbacks emulates continuation-passing style. As such, when a function finishes -- whether successfully or with an error -- it should call a single callback. This ensures that the function has a simple control interface: perform the operation and call the callback. =head2 MULTIPLE STEPS Some operations require a long squence of asynchronous operations. For example, often the results of one operation are required to initiate another. The I syntax is useful to make this much more readable, and also eliminate some nasty reference-counting bugs. The idea is that each "step" in the process gets its own sub, and then each step calls the next step. The first step defined will be called automatically. sub send_file { my ($hostname, $port, $data, $sendfile_cb) = @_; my ($addr, $socket); # shared lexical variables my $steps = define_steps cb_ref => \$sendfile_cb; step lookup_addr => sub { return async_gethostbyname(hostname => $hostname, ghbn_cb => $steps->{'got_addr'}); }; step ghbn_cb => sub { my ($err, $hostinfo) = @_; die $err if $err; $addr = $hostinfo->{'ipaddr'}; return $steps->{'connect'}->(); }; step connect => sub { return async_connect( ipaddr => $addr, port => $port, connect_cb => $steps->{'connect_cb'}, ); }; step connect_cb => sub { my ($err, $conn_sock) = @_; die $err if $err; $socket = $conn_sock; return $steps->{'write_block'}->(); }; # ... } The C function sets the stage. It is given a reference to the callback for this function (recall there is only one exit point!), and "patches" that reference to free C<$steps>, which otherwise forms a reference loop, on exit. WARNING: if the function or method needs to do any kind of setup before its first step, that setup should be done either in a C step or I the C invocation. Do not write any statements other than step declarations after the C call. Note that there are more steps in this example than are strictly necessary: the body of C could be appended to C. The extra steps make the overall operation more readable by adding "punctuation" to separate the task of handling a callback (C) from starting the next operation (C). Also note that the enclosing scope contains some lexical (C) variables which are shared by several of the callbacks. All of the steps are wrapped by C, so each step will be executed on a separate iteration of the MainLoop. This generally has the effect of making asynchronous functions share CPU time more fairly. Sometimes, especially when using the low-level interface, a callback must be called immediately. To achieve this for all callbacks, add C<< immediate => 1 >> to the C invocation: my $steps = define_steps cb_ref => \$finished_cb, immediate => 1; To do the same for a single step, add the same keyword to the C invocation: step immediate => 1, connect => sub { .. }; In some case, you want to execute some code when the step finish, it can be done by defining a finalize code in define_steps: my $steps = define_steps cb_ref => \$finished_cb, finalize => sub { .. }; =head2 JOINING ASYNCHRONOUS "THREADS" With slow operations, it is often useful to perform multiple operations simultaneously. As an example, the following code might run two system commands simultaneously and capture their output: sub run_two_commands { my ($finished_cb) = @_; my $running_commands = 0; my ($result1, $result2); my $steps = define_steps cb_ref => \$finished_cb; step start => sub { $running_commands++; run_command($command1, run_cb => $steps->{'command1_done'}); $running_commands++; run_command($command2, run_cb => $steps->{'command2_done'}); }; step command1_done => sub { $result1 = $_[0]; $steps->{'maybe_done'}->(); }; step command2_done => sub { $result2 = $_[0]; $steps->{'maybe_done'}->(); }; step maybe_done => sub { return if --$running_commands; # not done yet $finished_cb->($result1, $result2); }; } It is tempting to optimize out the C<$running_commands> with something like: step maybe_done { ## BAD! return unless defined $result1 and defined $result2; $finished_cb->($result1, $result2); } However this can lead to trouble. Remember that define_steps automatically applies C to each step, so a C is not invoked immediately by C and C - instead, C is scheduled for invocation in the next loop of the mainloop (via C). If both commands finish before C is invoked, C will be called I, with both C<$result1> and C<$result2> defined both times. The result is that C<$finished_cb> is called twice, and mayhem ensues. This is a complex case, but worth understanding if you want to be able to debug difficult MainLoop bugs. =head2 WRITING ASYNCHRONOUS INTERFACES When designing a library or interface that will accept and invoke callbacks, follow these guidelines so that users of the interface will not need to remember special rules. Each callback signature within a package should always have the same name, ending with C<_cb>. For example, a hypothetical C module might provide its estimates through a callback with four parameters. This callback should be referred to as C throughout the package, and its parameters should be clearly defined in the package's documentation. It should take positional parameters only. If error conditions must also be communicated via the callback, then the first parameter should be an C<$error> parameter, which is undefined when no error has occurred. The Changer API's C is typical of such a callback signature. A caller can only know that an operation is complete by the invocation of the callback, so it is important that a callback be invoked I in all circumstances. Even in an error condition, the caller needs to know that the operation has failed. Also beware of bugs that might cause a callback to be invoked twice. Functions or methods taking callbacks as arguments should either take only a callback (like C), or take hash-key parameters, where the callback's key is the signature name. For example, the C package might define a function like C, invoked something like this: my $estimate_cb = make_cb(estimate_cb => sub { my ($err, $size, $level) = @_; die $err if $err; # ... }); Amanda::Estimate::perform_estimate( host => $host, disk => $disk, estimate_cb => $estimate_cb, ); When invoking a user-supplied callback within the library, there is no need to wrap it in a C invocation, as the user already supplied that wrapper via C, or is not interested in using such a wrapper. Callbacks are a form of continuation (L), and as such should only be called at the I of a function. Do not do anything after invoking a callback, as you cannot know what processing has gone on in the callback. sub estimate_done { # ... $self->{'estimate_cb'}->(undef, $size, $level); $self->{'estimate_in_progress'} = 0; # BUG!! } In this case, the C invocation may have called C again, setting C back to 1. A technique to avoid this pitfall is to always C a callback's result, even though that result is not important. This makes the bug much more apparent: sub estimate_done { # ... return $self->{'estimate_cb'}->(undef, $size, $level); $self->{'estimate_in_progress'} = 0; # BUG (this just looks silly) } =head1 LOW-LEVEL INTERFACE MainLoop events are generated by event sources. A source may produce multiple events over its lifetime. The higher-level methods in the previous section provide a more Perlish abstraction of event sources, but for efficiency it is sometimes necessary to use event sources directly. The method C<< $src->set_callback(\&cb) >> sets the function that will be called for a given source, and "attaches" the source to the main loop so that it will begin generating events. The arguments to the callback depend on the event source, but the first argument is always the source itself. Unless specified, no other arguments are provided. Event sources persist until they are removed with C<< $src->remove() >>, even if the source itself is no longer accessible from Perl. Although Glib supports it, there is no provision for "automatically" removing an event source. Also, calling C<< $src->remove() >> more than once is a potentially-fatal error. As an example: sub start_timer { my ($loops) = @_; Amanda::MainLoop::timeout_source(200)->set_callback(sub { my ($src) = @_; print "timer\n"; if (--$loops <= 0) { $src->remove(); Amanda::MainLoop::quit(); } }); } start_timer(10); Amanda::MainLoop::run(); There is no means in place to specify extra arguments to be provided to a source callback when it is set. If the callback needs access to other data, it should use a Perl closure in the form of lexically scoped variables and an anonymous sub. In fact, this is exactly what the higher-level functions (described above) do. =head2 Timeout my $src = Amanda::MainLoop::timeout_source(10000); A timeout source will create events at the specified interval, specified in milliseconds (thousandths of a second). The events will continue until the source is destroyed. =head2 Idle my $src = Amanda::MainLoop::idle_source(2); An idle source will create events continuously except when a higher-priority source is emitting events. Priorities are generally small positive integers, with larger integers denoting lower priorities. The events will continue until the source is destroyed. =head2 Child Watch my $src = Amanda::MainLoop::child_watch_source($pid); A child watch source will issue an event when the process with the given PID dies. To avoid race conditions, it will issue an event even if the process dies before the source is created. The callback is called with three arguments: the event source, the PID, and the child's exit status. Note that this source is totally incompatible with any thing that would cause perl to change the SIGCHLD handler. If SIGCHLD is changed, under some circumstances the module will recognize this circumstance, add a warning to the debug log, and continue operating. However, it is impossible to catch all possible situations. =head2 File Descriptor my $src = Amanda::MainLoop::fd_source($fd, $G_IO_IN); This source will issue an event whenever one of the given conditions is true for the given file (a file handle or integer file descriptor). The conditions are from Glib's GIOCondition, and are C<$G_IO_IN>, C, C<$G_IO_PRI>, C<$G_IO_ERR>, C<$G_IO_HUP>, and C<$G_IO_NVAL>. These constants are available with the import tag C<:GIOCondition>. Generally, when reading from a file descriptor, use C<$G_IO_IN|$G_IO_HUP|$G_IO_ERR> to ensure that an EOF triggers an event as well. Writing to a file descriptor can simply use C<$G_IO_OUT|$G_IO_ERR>. The callback attached to an FdSource should read from or write to the underlying file descriptor before returning, or it will be called again in the next iteration of the main loop, which can lead to unexpected results. Do I use C here! =head2 Combining Event Sources Event sources are often set up in groups, e.g., a long-term operation and a timeout. When this is the case, be careful that all sources are removed when the operation is complete. The easiest way to accomplish this is to include all sources in a lexical scope and remove them at the appropriate times: { my $op_src = long_operation_src(); my $timeout_src = Amanda::MainLoop::timeout_source($timeout); sub finish { $op_src->remove(); $timeout_src->remove(); } $op_src->set_callback(sub { print "Operation complete\n"; finish(); }); $timeout_src->set_callback(sub { print "Operation timed out\n"; finish(); }); } =head2 Relationship to Glib Glib's main event loop is described in the Glib manual: L. Note that Amanda depends only on the functionality available in Glib-2.2.0, so many functions described in that document are not available in Amanda. This module provides a much-simplified interface to the glib library, and is not intended as a generic wrapper for it: Amanda's perl-accessible main loop only runs a single C, and always runs in the main thread; and (aside from idle sources), event priorities are not accessible from Perl. =cut %}