Six months ago I gave an overview of Perl’s list pro­cess­ing fun­da­men­tals, briefly describ­ing what lists are and then intro­duc­ing the built-​in map and grep func­tions for trans­form­ing and fil­ter­ing them. Later on, I com­piled a list (how appro­pri­ate) of list pro­cess­ing mod­ules avail­able via CPAN, not­ing there’s some con­fus­ing dupli­ca­tion of effort. But you’re a busy devel­op­er, and you just want to know the Right Thing To Do™ when faced with a list pro­cess­ing challenge.

First, some cred­it is due: these are all restate­ments of sev­er­al Perl::Critic poli­cies which in turn cod­i­fy stan­dards described in Damian Conway’s Perl Best Practices (2005). I’ve repeat­ed­ly rec­om­mend­ed the lat­ter as a start­ing point for higher-​quality Perl devel­op­ment. Over the years these prac­tices con­tin­ue to be re-​evaluated (includ­ing by the author him­self) and var­i­ous authors release new pol­i­cy mod­ules, but perlcritic remains a great tool for ensur­ing you (and your team or oth­er con­trib­u­tors) main­tain a con­sis­tent high stan­dard in your code.

With that said, on to the recommendations!

Don’t use grep to check if any list elements match

It might sound weird to lead off by rec­om­mend­ing not to use grep, but some­times it’s not the right tool for the job. If you’ve got a list and want to deter­mine if a con­di­tion match­es any item in it, you might try:

if (grep { some_condition($_) } @my_list) {
    ... # don't do this!

Yes, this works because (in scalar con­text) grep returns the num­ber of match­es found, but it’s waste­ful, check­ing every ele­ment of @my_list (which could be lengthy) before final­ly pro­vid­ing a result. Use the stan­dard List::Util module’s any func­tion, which imme­di­ate­ly returns (“short-​circuits”) on the first match:

use List::Util 1.33 qw(any);

if (any { some_condition($_) } @my_list) {
... # do something

Perl has includ­ed the req­ui­site ver­sion of this mod­ule since ver­sion 5.20 in 2014; for ear­li­er releas­es, you’ll need to update from CPAN. List::Util has many oth­er great list-​reduction, key/​value pair, and oth­er relat­ed func­tions you can import into your code, so check it out before you attempt to re-​invent any wheels.

As a side note for web devel­op­ers, the Perl Dancer frame­work also includes an any key­word for declar­ing mul­ti­ple HTTP routes, so if you’re mix­ing List::Util in there don’t import it. Instead, call it explic­it­ly like this or you’ll get an error about a rede­fined function:

use List::Util 1.33;

if (List::Util::any { some_condition($_) } @my_list) {
... # do something

This rec­om­men­da­tion is cod­i­fied in the BuiltinFunctions::ProhibitBooleanGrep Perl::Critic pol­i­cy, comes direct­ly from Perl Best Practices, and is rec­om­mend­ed by the Software Engineering Institute Computer Emergency Response Team (SEI CERT)’s Perl Coding Standard.

Don’t change $_ in map or grep

I men­tioned this back in March, but it bears repeat­ing: map and grep are intend­ed as pure func­tions, not muta­tors with side effects. This means that the orig­i­nal list should remain unchanged. Yes, each ele­ment alias­es in turn to the $_ spe­cial vari­able, but that’s for speed and can have sur­pris­ing results if changed even if it’s tech­ni­cal­ly allowed. If you need to mod­i­fy an array in-​place use some­thing like:

for (@my_array) {
$_ = ...; # make your changes here

If you want some­thing that looks like map but won’t change the orig­i­nal list (and don’t mind a few CPAN depen­den­cies), con­sid­er List::SomeUtilsapply function:

use List::SomeUtils qw(apply);

my @doubled_array = apply {$_ *= 2} @old_array;

Lastly, side effects also include things like manip­u­lat­ing oth­er vari­ables or doing input and out­put. Don’t use map or grep in a void con­text (i.e., with­out a result­ing array or list); do some­thing with the results or use a for or foreach loop:

map { print foo($_) } @my_array; # don't do this
print map { foo($_) } @my_array; # do this instead

map { push @new_array, foo($_) } @my_array; # don't do this
@new_array = map { foo($_) } @my_array; # do this instead

This rec­om­men­da­tion is cod­i­fied by the BuiltinFunctions::ProhibitVoidGrep, BuiltinFunctions::ProhibitVoidMap, and ControlStructures::ProhibitMutatingListFunctions Perl::Critic poli­cies. The lat­ter comes from Perl Best Practices and is an SEI CERT Perl Coding Standard rule.

Use blocks with map and grep, not expressions

You can call map or grep like this (paren­the­ses are option­al around built-​in functions):

my @new_array  = map foo($_), @old_array; # don't do this
my @new_array2 = grep !/^#/, @old_array; # don't do this

Or like this:

my @new_array  = map { foo($_) } @old_array;
my @new_array2 = grep {!/^#/} @old_array;

Do it the sec­ond way. It’s eas­i­er to read, espe­cial­ly if you’re pass­ing in a lit­er­al list or mul­ti­ple arrays, and the expres­sion forms can con­ceal bugs. This rec­om­men­da­tion is cod­i­fied by the BuiltinFunctions::RequireBlockGrep and BuiltinFunctions::RequireBlockMap Perl::Critic poli­cies and comes from Perl Best Practices.

Refactor multi-​statement maps, greps, and other list functions

map, grep, and friends should fol­low the Unix phi­los­o­phy of Do One Thing and Do It Well.” Your read­abil­i­ty and main­tain­abil­i­ty drop with every state­ment you place inside one of their blocks. Consider junior devel­op­ers and future main­tain­ers (this includes you!) and refac­tor any­thing with more than one state­ment into a sep­a­rate sub­rou­tine or at least a for loop. This goes for list pro­cess­ing func­tions (like the afore­men­tioned any) import­ed from oth­er mod­ules, too.

This rec­om­men­da­tion is cod­i­fied by the Perl Best Practices-inspired BuiltinFunctions::ProhibitComplexMappings and BuiltinFunctions::RequireSimpleSortBlock Perl::Critic poli­cies, although those only cov­er map and sort func­tions, respectively.

Do you have any oth­er sug­ges­tions for list pro­cess­ing best prac­tices? Feel free to leave them in the com­ments or bet­ter yet, con­sid­er cre­at­ing new Perl::Critic poli­cies for them or con­tact­ing the Perl::Critic team to devel­op them for your organization.

4 thoughts on “Better Perl: Four list processing best practices with map, grep, and more

  1. Use List::Util::first instead of any to avoid name clash­es and not wor­ry about ver­sions of List::Util.

    • They are not equiv­a­lent and using first instead of any can lead to bugs. first will return the first ele­ment for which the con­di­tion is true; if that ele­ment is itself a false val­ue (0, the null string, or undef), the func­tion will eval­u­ate to false even though the ele­ment was found.

Comments are closed.