Six months ago I gave an overview of Perl’s list processing fundamentals, briefly describing what lists are and then introducing the built-​in map and grep functions for transforming and filtering them. Later on, I compiled a list (how appropriate) of list processing modules available via CPAN, noting there’s some confusing duplication of effort. But you’re a busy developer, and you just want to know the Right Thing To Do™ when faced with a list processing challenge.

First, some credit is due: these are all restatements of several Perl::Critic policies which in turn codify standards described in Damian Conway’s Perl Best Practices (2005). I’ve repeatedly recommended the latter as a starting point for higher-​quality Perl development. Over the years these practices continue to be re-​evaluated (including by the author himself) and various authors release new policy modules, but perlcritic remains a great tool for ensuring you (and your team or other contributors) maintain a consistent high standard in your code.

With that said, on to the recommendations!

Don’t use grep to check if any list elements match

It might sound weird to lead off by recommending not to use grep, but sometimes it’s not the right tool for the job. If you’ve got a list and want to determine if a condition matches any item in it, you might try:

if (grep { some_condition($_) } @my_list) {
    ... # don't do this!
}

Yes, this works because (in scalar context) grep returns the number of matches found, but it’s wasteful, checking every element of @my_list (which could be lengthy) before finally providing a result. Use the standard List::Util module’s any function, which immediately returns (“short-​circuits”) on the first match:

use List::Util 1.33 qw(any);

if (any { some_condition($_) } @my_list) {
... # do something
}

Perl has included the requisite version of this module since version 5.20 in 2014; for earlier releases, you’ll need to update from CPAN. List::Util has many other great list-​reduction, key/​value pair, and other related functions you can import into your code, so check it out before you attempt to re-​invent any wheels.

As a side note for web developers, the Perl Dancer framework also includes an any keyword for declaring multiple HTTP routes, so if you’re mixing List::Util in there don’t import it. Instead, call it explicitly like this or you’ll get an error about a redefined function:

use List::Util 1.33;

if (List::Util::any { some_condition($_) } @my_list) {
... # do something
}

This recommendation is codified in the BuiltinFunctions::ProhibitBooleanGrep Perl::Critic policy, comes directly from Perl Best Practices, and is recommended by the Software Engineering Institute Computer Emergency Response Team (SEI CERT)’s Perl Coding Standard.

Don’t change $_ in map or grep

I mentioned this back in March, but it bears repeating: map and grep are intended as pure functions, not mutators with side effects. This means that the original list should remain unchanged. Yes, each element aliases in turn to the $_ special variable, but that’s for speed and can have surprising results if changed even if it’s technically allowed. If you need to modify an array in-​place use something like:

for (@my_array) {
$_ = ...; # make your changes here
}

If you want something that looks like map but won’t change the original list (and don’t mind a few CPAN dependencies), consider List::SomeUtilsapply function:

use List::SomeUtils qw(apply);

my @doubled_array = apply {$_ *= 2} @old_array;

Lastly, side effects also include things like manipulating other variables or doing input and output. Don’t use map or grep in a void context (i.e., without a resulting array or list); do something with the results or use a for or foreach loop:

map { print foo($_) } @my_array; # don't do this
print map { foo($_) } @my_array; # do this instead

map { push @new_array, foo($_) } @my_array; # don't do this
@new_array = map { foo($_) } @my_array; # do this instead

This recommendation is codified by the BuiltinFunctions::ProhibitVoidGrep, BuiltinFunctions::ProhibitVoidMap, and ControlStructures::ProhibitMutatingListFunctions Perl::Critic policies. The latter comes from Perl Best Practices and is an SEI CERT Perl Coding Standard rule.

Use blocks with map and grep, not expressions

You can call map or grep like this (parentheses are optional around built-​in functions):

my @new_array  = map foo($_), @old_array; # don't do this
my @new_array2 = grep !/^#/, @old_array; # don't do this

Or like this:

my @new_array  = map { foo($_) } @old_array;
my @new_array2 = grep {!/^#/} @old_array;

Do it the second way. It’s easier to read, especially if you’re passing in a literal list or multiple arrays, and the expression forms can conceal bugs. This recommendation is codified by the BuiltinFunctions::RequireBlockGrep and BuiltinFunctions::RequireBlockMap Perl::Critic policies and comes from Perl Best Practices.

Refactor multi-​statement maps, greps, and other list functions

map, grep, and friends should follow the Unix philosophy of Do One Thing and Do It Well.” Your readability and maintainability drop with every statement you place inside one of their blocks. Consider junior developers and future maintainers (this includes you!) and refactor anything with more than one statement into a separate subroutine or at least a for loop. This goes for list processing functions (like the aforementioned any) imported from other modules, too.

This recommendation is codified by the Perl Best Practices-inspired BuiltinFunctions::ProhibitComplexMappings and BuiltinFunctions::RequireSimpleSortBlock Perl::Critic policies, although those only cover map and sort functions, respectively.


Do you have any other suggestions for list processing best practices? Feel free to leave them in the comments or better yet, consider creating new Perl::Critic policies for them or contacting the Perl::Critic team to develop them for your organization.

4 thoughts on “Better Perl: Four list processing best practices with map, grep, and more

    • They are not equivalent and using first instead of any can lead to bugs. first will return the first element for which the condition is true; if that element is itself a false value (0, the null string, or undef), the function will evaluate to false even though the element was found.

Comments are closed.