This week’s Perl and Raku Conference 2022 in Houston was packed with great presentations, and I humbly added to them with a five-​ish minute lightning talk on two of Perl’s more misunderstood functions: map and grep.

Sorry about the um”s and ah”s…

I’ve written much about list processing in Perl, and this talk was based on the following blog posts:

Overall I loved attending the conference, and it really invigorated my participation in the Perl community. Stay tuned as I resume regular posting!

Update for Raku

On Twitter I nudged prominent Raku hacker (and recovered Perl hacker) Elizabeth Mattijsen to write about the Raku programming language’s map and grep functionality. Check out her five-​part series on DEV.to.

plate of eggs and hash browns

This month I started a new job at Alert Logic, a cybersecurity provider with Perl (among many other things) at its beating heart. I’ve been learning a lot, and part of the process has been understanding the APIs in the code base. To that end, I’ve been writing small test scripts to tease apart data structures, using Perl array-​processing, list-​processing, and hash- (i.e., associative array)-processing functions.

I’ve covered map, grep, and friends a couple times before. Most recently, I described using List::Util’s any function to check if a condition is true for any item in a list. In the simplest case, you can use it to check to see if a given value is in the list at all:

use feature 'say';
use List::Util 'any';
my @colors =
  qw(red orange yellow green blue indigo violet);
say 'matched' if any { /^red$/ } @colors;

However, if you’re going to be doing this a lot with arbitrary strings, Perl FAQ section 4 advises turning the array into the keys of a hash and then checking for membership there. For example, here’s a simple script to check if the colors input (either from the keyboard or from files passed as arguments) are in the rainbow:

#!/usr/bin/env perl

use v5.22; # introduced <<>> for safe opening of arguments
use warnings;
 
my %in_colors = map {$_ => 1}
  qw(red orange yellow green blue indigo violet);

while (<<>>) {
  chomp;
  say "$_ is in the rainbow" if $in_colors{$_};
}

List::Util has a bunch of functions for processing lists of pairs that I’ve found useful when pawing through hashes. pairgrep, for example, acts just like grep but instead assigns $a and $b to each key and value passed in and returns the resulting pairs that match. I’ve used it as a quick way to search for hash entries matching certain value conditions:

use List::Util 'pairgrep';
my %numbers = (zero => 0, one => 1, two => 2, three => 3);
my %odds = pairgrep {$b % 2} %numbers;

Sure, you could do this by invoking a mix of plain grep, keys, and a hash slice, but it’s noisier and more repetitive:

use v5.20; # for key/value hash slice 
my %odds = %numbers{grep {$numbers{$_} % 2} keys %numbers};

pairgreps compiled C‑based XS code can also be faster, as evidenced by this Benchmark script that works through a hash made of the Unix words file (479,828 entries on my machine):

#!/usr/bin/env perl

use v5.20;
use warnings;
use List::Util 'pairgrep';
use Benchmark 'cmpthese';

my (%words, $count);
open my $fh, '<', '/usr/share/dict/words'
  or die "can't open words: $!";
while (<$fh>) {
  chomp;
  $words{$_} = $count++;
}
close $fh;

cmpthese(100, {
  grep => sub {
    my %odds = %words{grep {$words{$_} % 2} keys %words};
  },
  pairgrep => sub {
    my %odds = pairgrep {$b % 2} %words;
  },
} );

Benchmark output:

           Rate     grep pairgrep
grep     1.47/s       --     -20%
pairgrep 1.84/s      25%       --

In general, I urge you to work through the Perl documentations tutorials on references, lists of lists, the data structures cookbook, and the FAQs on array and hash manipulation. Then dip into the various list-​processing modules (especially the included List::Util and CPAN’s List::SomeUtils) for ready-​made functions for common operations. You’ll find a wealth of techniques for creating, managing, and processing the data structures that your programs need.

chocolate bar and sugar cubes on a hand
What about My::Favorite::Module?

I mentioned at the Ephemeral Miniconf last month that as soon as I write about one Perl module (or five), someone inevitably brings up another (or seven) I’ve missed. And of course, it happened again last week: no sooner had I written in passing that I was using Exception::Class than the denizens of the Libera Chat IRC #perl channel insisted I should use Throwable instead for defining my exceptions. (I’ve already blogged about various ways of catching exceptions.)

Why Throwable? Aside from Exception::Class’s author recommending it over his own work due to a nicer, more modern interface,” Throwable is a Moo role, so it’s composable into classes along with other roles instead of mucking about with multiple inheritance. This means that if your exceptions need to do something reusable in your application like logging, you can also consume a role that does that and not have so much duplicate code. (No, I’m not going to pick a favorite logging module; I’ll probably get that wrong too.)

However, since Throwable is a role instead of a class, I would have to define several additional packages in my tiny modulino script from last week, one for each exception class I want. The beauty of Exception::Class is its simple declarative nature: just use it and pass a list of desired class names along with options for attributes and whatnot. What’s needed for simple use cases like mine is a declarative syntax for defining several exception classes without the noise of multiple packages.

Enter Throwable::SugarFactory, a module that enables you to do just that by adding an exception function for declaring exception classes. (There’s also the similarly-​named Throwable::Factory; see the above discussion about never being able to cover everybody’s favorites.) The exception function takes three arguments: the name of the desired exception class as a string, a description, and an optional list of instructions Moo uses to build the class. It might look something like this:

package Local::My::Exceptions;
use Throwable::SugarFactory;

exception GenericError  => 'something bad happened';
exception DetailedError => 'something specific happened' =>
  ( has => [ message => ( is => 'ro' ) ] );

1;

Throwable::SugarFactory takes care of creating constructor functions in Perl-​style snake_case as well as functions for detecting what kind of exception is being caught, so you can use your new exception library like this:

#!/usr/bin/env perl

use experimental qw(isa);
use Feature::Compat::Try;
use JSON::MaybeXS;
use Local::My::Exceptions;

try {
    die generic_error();
}
catch ($e) {
    warn 'whoops!';
}

try {
    die detailed_error( message => 'you got me' );
}
catch ($e) {
    die encode_json( $e->to_hash )
      if $e isa DetailedError and defined $e->message;
    $e->throw if $e->does('Throwable');
    die $e;
}

The above also demonstrates a couple of other Throwable::SugarFactory features. First, you get a to_hash method that returns a hash reference of all exception data, suitable for serializing to JSON. Second, you get all of Throwable’s methods, including throw for re-​throwing exceptions. 

So where does this leave last week’s FOAAS.com modulino client demonstration of object mocking tests? With a little bit of rewriting to define and then use our sweeter exception library, it looks like this. You can review for a description of the rest of its workings.

#!/usr/bin/env perl

package Local::CallFOAAS::Exceptions;
use Throwable::SugarFactory;

BEGIN {
    exception NoMethodError =>
      'no matching WebService::FOAAS method' =>
      ( has => [ method => ( is => 'ro' ) ] );
    exception ServiceError =>
      'error from WebService::FOAAS' =>
      ( has => [ message => ( is => 'ro' ) ] );
}

package Local::CallFOAAS;  # this is a modulino
use Test2::V0;             # enables strict, warnings, utf8

# declare all the new stuff we're using
use feature qw(say state);
use experimental qw(isa postderef signatures);
use Feature::Compat::Try;
use Syntax::Construct qw(non-destructive-substitution);

use WebService::FOAAS ();
use Package::Stash;
BEGIN { Local::CallFOAAS::Exceptions->import() }

my $foaas = Package::Stash->new('WebService::FOAAS');

my $run_as =
    !!$ENV{CPANTEST}       ? 'test'
  : !defined scalar caller ? 'run'
  :                          undef;
__PACKAGE__->$run_as(@ARGV) if defined $run_as;

sub run ( $class, @args ) {
    try { say $class->call_method(@args) }
    catch ($e) {
        die 'No method ', $e->method, "\n"
          if $e isa NoMethodError;
        die 'Service error: ', $e->message, "\n"
          if $e isa ServiceError;
        die "$e\n";
    }
    return;
}

# Utilities

sub methods ($) {
    state @methods = sort map s/^foaas_(.+)/$1/r,
      grep /^foaas_/, $foaas->list_all_symbols('CODE');
    return @methods;
}

sub call_method ( $class, $method = '', @args ) {
    state %methods = map { $_ => 1 } $class->methods();
    die no_method_error( method => $method )
      unless $methods{$method};
    return do {
        try { $foaas->get_symbol("&$method")->(@args) }
        catch ($e) { die service_error( message => $e ) }
    };
}

# Testing

sub test ( $class, @ ) {
    state $stash = Package::Stash->new($class);
    state @tests = sort grep /^_test_/,
      $stash->list_all_symbols('CODE');

    for my $test (@tests) {
        subtest $test => sub {
            try { $class->$test() }
            catch ($e) { diag $e }
        };
    }
    done_testing();
    return;
}

sub _test_can ($class) {
    state @subs = qw(run call_method methods test);
    can_ok $class, \@subs, "can do: @subs";
    return;
}

sub _test_methods ($class) {
    my $mock = mock 'WebService::FOAAS' => ( track => 1 );

    for my $method ( $class->methods() ) {
        $mock->override( $method => 1 );

        ok lives { $class->call_method($method) },
          "$method lives";
        ok scalar $mock->sub_tracking->{$method}->@*,
          "$method called";
    }
    return;
}

sub _test_service_failure ($class) {
    my $mock = mock 'WebService::FOAAS';

    for my $method ( $class->methods() ) {
        $mock->override( $method => sub { die 'mocked' } );

        my $exception =
          dies { $class->call_method($method) };
        isa_ok $exception, [ServiceError],
          "$method throws ServiceError on failure";
        like $exception->message, qr/^mocked/,
          "correct error in $method exception";
    }
    return;
}

1;

[Updated, thanks to Dan Book, Karen Etheridge, and Bob Kleemann] The only goofy bit above is the need to put the exception calls in a BEGIN block and then explicitly call BEGIN { Local::CallFOAAS::Exceptions->import() }. Since the two packages are in the same file, I can’t do a use statement since the implied require would look for a corresponding file or entry in %INC. (You can get around this by messing with %INC directly or through a module like me::inlined that does that messing for you, but for a single-​purpose modulino like this it’s fine.)


happy man funny sticking tongue out

Over the past two years, I’ve gotten back into playing Dungeons & Dragons, the famous tabletop fantasy role-​playing game. As a software developer and musician, one of my favorite character classes to play is the bard, a magical and inspiring performer or wordsmith. The list of basic bardic spells includes Vicious Mockery, enchanting verbal barbs that have the power to psychically damage and disadvantage an opponent even if they don’t understand the words. (Can you see why this is so appealing to a coder?)

Mocking has a role to play in software testing as well, in the form of mock objects that simulate parts of a system that are too brittle, too slow, too complicated, or otherwise too finicky to use in reality. They enable discrete unit testing without relying on dependencies external to the code being tested. Mocks are great for databases, web services, or other network resources where the goal is to test what you wrote, not what’s out in the cloud” somewhere.

Speaking of web services and mocking, one of my favorites is the long-​running FOAAS (link has language not safe for work), a surprisingly expansive RESTful insult service. There’s a corresponding Perl client API, of course, but what I was missing was a handy Perl script to call that API from the terminal command line. So I wrote the following over Thanksgiving break, trying to keep it simple while also showing the basics of mocking such an API. It also demonstrates some newer Perl syntax and testing techniques as well as brian d foys modulino concept from Mastering Perl (second edition, 2014) that marries script and module into a self-​contained executable library.

#!/usr/bin/env perl

package Local::CallFOAAS;  # this is a modulino
use Test2::V0;             # enables strict, warnings, utf8

# declare all the new stuff we're using
use feature qw(say state);
use experimental qw(isa postderef signatures);
use Feature::Compat::Try;
use Syntax::Construct qw(non-destructive-substitution);

use WebService::FOAAS ();
use Package::Stash;
use Exception::Class (
    NoMethodException => {
        alias  => 'throw_no_method',
        fields => 'method',
    },
    ServiceException => { alias => 'throw_service' },
);

my $foaas = Package::Stash->new('WebService::FOAAS');

my $run_as =
    !!$ENV{CPANTEST}       ? 'test'
  : !defined scalar caller ? 'run'
  :                          undef;
__PACKAGE__->$run_as(@ARGV) if defined $run_as;

sub run ( $class, @args ) {
    try { say $class->call_method(@args) }
    catch ($e) {
        die 'No method ', $e->method, "\n"
          if $e isa NoMethodException;
        die 'Service error: ', $e->error, "\n"
          if $e isa ServiceException;
        die "$e\n";
    }
    return;
}

# Utilities

sub methods ($) {
    state @methods = sort map s/^foaas_(.+)/$1/r,
      grep /^foaas_/, $foaas->list_all_symbols('CODE');
    return @methods;
}

sub call_method ( $class, $method = '', @args ) {
    state %methods = map { $_ => 1 } $class->methods();
    throw_no_method( method => $method )
      unless $methods{$method};
    return do {
        try { $foaas->get_symbol("&$method")->(@args) }
        catch ($e) { throw_service( error => $e ) }
    };
}

# Testing

sub test ( $class, @ ) {
    state $stash = Package::Stash->new($class);
    state @tests = sort grep /^_test_/,
      $stash->list_all_symbols('CODE');

    for my $test (@tests) {
        subtest $test => sub {
            try { $class->$test() }
            catch ($e) { diag $e }
        };
    }
    done_testing();
    return;
}

sub _test_can ($class) {
    state @subs = qw(run call_method methods test);
    can_ok( $class, \@subs, "can do: @subs" );
    return;
}

sub _test_methods ($class) {
    my $mock = mock 'WebService::FOAAS' => ( track => 1 );

    for my $method ( $class->methods() ) {
        $mock->override( $method => 1 );

        ok lives { $class->call_method($method) },
          "$method lives";
        ok scalar $mock->sub_tracking->{$method}->@*,
          "$method called";
    }
    return;
}

sub _test_service_failure ($class) {
    my $mock = mock 'WebService::FOAAS';

    for my $method ( $class->methods() ) {
        $mock->override( $method => sub { die 'mocked' } );

        my $exception =
          dies { $class->call_method($method) };
        isa_ok $exception, ['ServiceException'],
          "$method throws ServiceException on failure";
        like $exception->error, qr/^mocked/,
          "correct error in $method exception";
    }
    return;
}

1;

Let’s walk through the code above.

Preliminaries

First, there’s a generic shebang line to indicate that Unix and Linux systems should use the perl executable found in the user’s PATH via the env command. I declare a package name (in the Local:: namespace) so as not to pollute the default main package of other scripts that might want to require this as a module. Then I use the Test2::V0 bundle from Test2::Suite since the embedded testing code uses many of its functions. This also has the side effect of enabling the strict, warnings, and utf8 pragmas, so there’s no need to explicitly use them here.

(Why Test2 instead of Test::More and its derivatives and add-​ons? Both are maintained by the same author, who recommends the former. I’m seeing more and more modules using it, so I thought this would be a great opportunity to learn.)

I then declare all the new-​ish Perl features I’d like to use that need to be explicitly enabled so as not to sacrifice backward compatibility with older versions of Perl 5. As of this writing, some of these features (the isa class instance operator, named argument subroutine signatures, and try/​catch exception handling syntax) are considered experimental, with the latter enabled in older versions of Perl via the Feature::Compat::Try module. The friendlier postfix dereferencing syntax was mainlined in Perl version 5.24, but versions 5.20 and 5.22 still need it experimental. Finally, I use Syntax::Construct to announce the /r flag for non-​destructive regular expression text substitutions introduced in version 5.14.

Next, I bring in the aforementioned FOAAS Perl API without importing any of its functions, Package::Stash to make metaprogramming easier, and a couple of exception classes so that the command line function and other consumers might better tell what caused a failure. In preparation for the methods below dynamically discovering what functions are provided by WebService::FOAAS, I gather up its symbol table (or stash) into the $foaas variable.

The next block determines how, if at all, I’m going to run the code as a script. If the CPANTEST environment variable is set, I’ll call the test class method sub, but if there’s no subroutine calling me I’ll execute the run class method. Either will receive the command line arguments from @ARGV. If neither of these conditions is true, do nothing; the rest of the code is method declarations.

Modulino methods, metaprogramming, and exceptions

The first of these is the run method. It’s a thin wrapper around the call_method class method detailed below, either outputting its result or dieing with an appropriate error depending on the class of exception thrown. Although I chose not to write tests for this output, future tests might call this method and catch these rethrown exceptions to match against them. The messages end with a \n newline character so die knows not to append the current script line number.

Next is a utility method called methods that uses Package::Stash’s list_all_symbols to retrieve the names of all named CODE blocks (i.e., subs) from WebService::FOAAS’s symbol table. Reading from right to left, these are then filtered with grep to only find those beginning in foaas_ and then transformed with map to remove that prefix. The list is then sorted and stored in a state variable and returned so it need not be initialized again.

(As an aside, although perlcritic sternly warns against it I’ve chosen the expression forms of grep and map here over their block forms for simplicity’s sake. It’s OK to bend the rules if you have a good reason.)

sub call_method is where the real action takes place. Its parameters are the class that called it, the name of a FOAAS $method (defaulted to the empty string), and an array of optional arguments in @args. I build a hash or associative array from the earlier methods method which I then use to see if the passed method name is one I know about. If not, I throw a NoMethodException using the throw_no_method alias function created when I used Exception::Class at the beginning. Using a function instead of NoMethodException->throw() means that it’s checked at compile time rather than runtime, catching typos.

I get the subroutine (denoted by a & sigil) named by $method from the $foaas stash and pass it any further received arguments from @args. If that WebService::FOAAS subroutine throws an exception it’ll be caught and re-​thrown as a ServiceException; otherwise call_method returns the result. It’s up to the caller to determine what, if anything, to do with that result or any thrown exceptions.

Testing the modulino with mocks

This is where I start using those Test2::Suite tools I mentioned at the beginning. The test class method starts by building a filtered list of all subs beginning with _test_ in the current class, much like methods did above with WebService::FOAAS. I then loop through that list of subs, running each as a subtest containing a class method with any exceptions reported as diagnostics.

The rest of the modulino is subtest methods, starting with a simple _test_can sanity check for the public methods in the class. Following that is _test_methods, which starts by mocking the WebService::FOAAS package and telling Test2::Mock I want to track any added, overridden, or set subs. I then loop through all the method names returned by the methods class method, overrideing each one to return a simple true value. I then test passing those names to call_method and use the hash reference returned by sub_tracking to check that the overridden sub was called. This seems a lot simpler than the Test::Builder-based mocking libraries I’ve tried like Test::MockModule and Test::MockObject.

_test_service_failure acts in much the same way, checking that call_method correctly throws ServiceExceptions if the wrapped WebService::FOAAS function dies. The main difference is that the mocked WebService::FOAAS subs are now overridden with a code reference (sub { die 'mocked' }), which call_method uses to populate the rethrown ServiceExceptions error field.

Wrapping up

With luck, this article has given you some ideas, whether it’s in making scripts (perhaps legacy code) testable to improve them, or writing better unit tests that mock dependencies, or delving a little into metaprogramming so you can dynamically support and test new features of said dependencies. I hope you haven’t come away too offended, at least. Let me know in the comments what you think.

woman looking at the map

Six months ago I gave an overview of Perl’s list processing fundamentals, briefly describing what lists are and then introducing the built-​in map and grep functions for transforming and filtering them. Later on, I compiled a list (how appropriate) of list processing modules available via CPAN, noting there’s some confusing duplication of effort. But you’re a busy developer, and you just want to know the Right Thing To Do™ when faced with a list processing challenge.

First, some credit is due: these are all restatements of several Perl::Critic policies which in turn codify standards described in Damian Conway’s Perl Best Practices (2005). I’ve repeatedly recommended the latter as a starting point for higher-​quality Perl development. Over the years these practices continue to be re-​evaluated (including by the author himself) and various authors release new policy modules, but perlcritic remains a great tool for ensuring you (and your team or other contributors) maintain a consistent high standard in your code.

With that said, on to the recommendations!

Don’t use grep to check if any list elements match

It might sound weird to lead off by recommending not to use grep, but sometimes it’s not the right tool for the job. If you’ve got a list and want to determine if a condition matches any item in it, you might try:

if (grep { some_condition($_) } @my_list) {
    ... # don't do this!
}

Yes, this works because (in scalar context) grep returns the number of matches found, but it’s wasteful, checking every element of @my_list (which could be lengthy) before finally providing a result. Use the standard List::Util module’s any function, which immediately returns (“short-​circuits”) on the first match:

use List::Util 1.33 qw(any);

if (any { some_condition($_) } @my_list) {
... # do something
}

Perl has included the requisite version of this module since version 5.20 in 2014; for earlier releases, you’ll need to update from CPAN. List::Util has many other great list-​reduction, key/​value pair, and other related functions you can import into your code, so check it out before you attempt to re-​invent any wheels.

As a side note for web developers, the Perl Dancer framework also includes an any keyword for declaring multiple HTTP routes, so if you’re mixing List::Util in there don’t import it. Instead, call it explicitly like this or you’ll get an error about a redefined function:

use List::Util 1.33;

if (List::Util::any { some_condition($_) } @my_list) {
... # do something
}

This recommendation is codified in the BuiltinFunctions::ProhibitBooleanGrep Perl::Critic policy, comes directly from Perl Best Practices, and is recommended by the Software Engineering Institute Computer Emergency Response Team (SEI CERT)’s Perl Coding Standard.

Don’t change $_ in map or grep

I mentioned this back in March, but it bears repeating: map and grep are intended as pure functions, not mutators with side effects. This means that the original list should remain unchanged. Yes, each element aliases in turn to the $_ special variable, but that’s for speed and can have surprising results if changed even if it’s technically allowed. If you need to modify an array in-​place use something like:

for (@my_array) {
$_ = ...; # make your changes here
}

If you want something that looks like map but won’t change the original list (and don’t mind a few CPAN dependencies), consider List::SomeUtilsapply function:

use List::SomeUtils qw(apply);

my @doubled_array = apply {$_ *= 2} @old_array;

Lastly, side effects also include things like manipulating other variables or doing input and output. Don’t use map or grep in a void context (i.e., without a resulting array or list); do something with the results or use a for or foreach loop:

map { print foo($_) } @my_array; # don't do this
print map { foo($_) } @my_array; # do this instead

map { push @new_array, foo($_) } @my_array; # don't do this
@new_array = map { foo($_) } @my_array; # do this instead

This recommendation is codified by the BuiltinFunctions::ProhibitVoidGrep, BuiltinFunctions::ProhibitVoidMap, and ControlStructures::ProhibitMutatingListFunctions Perl::Critic policies. The latter comes from Perl Best Practices and is an SEI CERT Perl Coding Standard rule.

Use blocks with map and grep, not expressions

You can call map or grep like this (parentheses are optional around built-​in functions):

my @new_array  = map foo($_), @old_array; # don't do this
my @new_array2 = grep !/^#/, @old_array; # don't do this

Or like this:

my @new_array  = map { foo($_) } @old_array;
my @new_array2 = grep {!/^#/} @old_array;

Do it the second way. It’s easier to read, especially if you’re passing in a literal list or multiple arrays, and the expression forms can conceal bugs. This recommendation is codified by the BuiltinFunctions::RequireBlockGrep and BuiltinFunctions::RequireBlockMap Perl::Critic policies and comes from Perl Best Practices.

Refactor multi-​statement maps, greps, and other list functions

map, grep, and friends should follow the Unix philosophy of Do One Thing and Do It Well.” Your readability and maintainability drop with every statement you place inside one of their blocks. Consider junior developers and future maintainers (this includes you!) and refactor anything with more than one statement into a separate subroutine or at least a for loop. This goes for list processing functions (like the aforementioned any) imported from other modules, too.

This recommendation is codified by the Perl Best Practices-inspired BuiltinFunctions::ProhibitComplexMappings and BuiltinFunctions::RequireSimpleSortBlock Perl::Critic policies, although those only cover map and sort functions, respectively.


Do you have any other suggestions for list processing best practices? Feel free to leave them in the comments or better yet, consider creating new Perl::Critic policies for them or contacting the Perl::Critic team to develop them for your organization.