
Six months ago I gave an overview of Perl’s list pro­cess­ing fun­da­men­tals, briefly describ­ing what lists are and then intro­duc­ing the built-​in map and grep func­tions for trans­form­ing and fil­ter­ing them. Later on, I com­piled a list (how appro­pri­ate) of list pro­cess­ing mod­ules avail­able via CPAN, not­ing there’s some con­fus­ing dupli­ca­tion of effort. But you’re a busy devel­op­er, and you just want to know the Right Thing To Do™ when faced with a list pro­cess­ing challenge.

First, some cred­it is due: these are all restate­ments of sev­er­al Perl::Critic poli­cies which in turn cod­i­fy stan­dards described in Damian Conway’s Perl Best Practices (2005). I’ve repeat­ed­ly rec­om­mend­ed the lat­ter as a start­ing point for higher-​quality Perl devel­op­ment. Over the years these prac­tices con­tin­ue to be re-​evaluated (includ­ing by the author him­self) and var­i­ous authors release new pol­i­cy mod­ules, but perlcritic remains a great tool for ensur­ing you (and your team or oth­er con­trib­u­tors) main­tain a con­sis­tent high stan­dard in your code.

With that said, on to the recommendations!

Don’t use grep to check if any list elements match

It might sound weird to lead off by rec­om­mend­ing not to use grep, but some­times it’s not the right tool for the job. If you’ve got a list and want to deter­mine if a con­di­tion match­es any item in it, you might try:

if (grep { some_condition($_) } @my_list) {
    ... # don't do this!
}

Yes, this works because (in scalar con­text) grep returns the num­ber of match­es found, but it’s waste­ful, check­ing every ele­ment of @my_list (which could be lengthy) before final­ly pro­vid­ing a result. Use the stan­dard List::Util module’s any func­tion, which imme­di­ate­ly returns (“short-​circuits”) on the first match:

use List::Util 1.33 qw(any);

if (any { some_condition($_) } @my_list) {
    ... # do something
}

Perl has includ­ed the req­ui­site ver­sion of this mod­ule since ver­sion 5.20 in 2014; for ear­li­er releas­es, you’ll need to update from CPAN. List::Util has many oth­er great list-​reduction, key/​value pair, and oth­er relat­ed func­tions you can import into your code, so check it out before you attempt to re-​invent any wheels.
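To see the short-circuiting for yourself, here is a small sketch that counts how many times each function runs its block. The counters and the @numbers list are made up for illustration, and the two-statement blocks exist only to do the counting:

```perl
use strict;
use warnings;
use List::Util 1.33 qw(any);

# Hypothetical data: a long list with a match near the front.
my @numbers = (1 .. 1_000);

# Count how many times each approach evaluates its block.
my ($grep_calls, $any_calls) = (0, 0);

# grep in scalar context returns the number of matches,
# but only after visiting every element.
my $found_with_grep = grep { $grep_calls++; $_ == 3 } @numbers;

# any stops as soon as one element matches.
my $found_with_any = any { $any_calls++; $_ == 3 } @numbers;

print "grep evaluated its block $grep_calls times\n";
print "any evaluated its block $any_calls times\n";
```

Both calls report a match, but grep keeps going to the end of the list while any gives up after the third element.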

As a side note for web devel­op­ers, the Perl Dancer frame­work also includes an any key­word for declar­ing mul­ti­ple HTTP routes, so if you’re mix­ing List::Util in there don’t import it. Instead, call it explic­it­ly like this or you’ll get an error about a rede­fined function:

use List::Util 1.33;

if (List::Util::any { some_condition($_) } @my_list) {
    ... # do something
}

This rec­om­men­da­tion is cod­i­fied in the BuiltinFunctions::ProhibitBooleanGrep Perl::Critic pol­i­cy, comes direct­ly from Perl Best Practices, and is rec­om­mend­ed by the Software Engineering Institute Computer Emergency Response Team (SEI CERT)’s Perl Coding Standard.

Don’t change $_ in map or grep

I men­tioned this back in March, but it bears repeat­ing: map and grep are intend­ed as pure func­tions, not muta­tors with side effects. This means that the orig­i­nal list should remain unchanged. Yes, each ele­ment alias­es in turn to the $_ spe­cial vari­able, but that’s for speed and can have sur­pris­ing results if changed even if it’s tech­ni­cal­ly allowed. If you need to mod­i­fy an array in-​place use some­thing like:

for (@my_array) {
    $_ = ...; # make your changes here
}

If you want something that looks like map but won’t change the original list (and don’t mind a few CPAN dependencies), consider the apply function from List::SomeUtils:

use List::SomeUtils qw(apply);

my @doubled_array = apply {$_ *= 2} @old_array;

Lastly, side effects also include things like manip­u­lat­ing oth­er vari­ables or doing input and out­put. Don’t use map or grep in a void con­text (i.e., with­out a result­ing array or list); do some­thing with the results or use a for or foreach loop:

map { print foo($_) } @my_array; # don't do this
print map { foo($_) } @my_array; # do this instead

map { push @new_array, foo($_) } @my_array; # don't do this
@new_array = map { foo($_) } @my_array; # do this instead

This rec­om­men­da­tion is cod­i­fied by the BuiltinFunctions::ProhibitVoidGrep, BuiltinFunctions::ProhibitVoidMap, and ControlStructures::ProhibitMutatingListFunctions Perl::Critic poli­cies. The lat­ter comes from Perl Best Practices and is an SEI CERT Perl Coding Standard rule.
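If the aliasing behavior seems abstract, this short sketch shows what happens to the source array when a map block assigns to $_. The @prices and @original arrays are invented for the example:

```perl
use strict;
use warnings;

my @prices = (10, 20, 30);

# Mutating $_ inside map changes @prices itself, because $_ is an alias
# for each element rather than a copy.
my @doubled = map { $_ *= 2 } @prices;

# Returning a new value leaves the source array alone.
my @original = (10, 20, 30);
my @safe     = map { $_ * 2 } @original;

print "source after mutating map: @prices\n";
print "source after pure map:     @original\n";
```

The first map returns the doubled values and also quietly doubles @prices, which is the surprise these policies are meant to prevent.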

Use blocks with map and grep, not expressions

You can call map or grep like this (paren­the­ses are option­al around built-​in functions):

my @new_array  = map foo($_), @old_array; # don't do this
my @new_array2 = grep !/^#/, @old_array; # don't do this

Or like this:

my @new_array  = map { foo($_) } @old_array;
my @new_array2 = grep {!/^#/} @old_array;

Do it the sec­ond way. It’s eas­i­er to read, espe­cial­ly if you’re pass­ing in a lit­er­al list or mul­ti­ple arrays, and the expres­sion forms can con­ceal bugs. This rec­om­men­da­tion is cod­i­fied by the BuiltinFunctions::RequireBlockGrep and BuiltinFunctions::RequireBlockMap Perl::Critic poli­cies and comes from Perl Best Practices.

Refactor multi-​statement maps, greps, and other list functions

map, grep, and friends should follow the Unix philosophy of “Do One Thing and Do It Well.” Your readability and maintainability drop with every statement you place inside one of their blocks. Consider junior developers and future maintainers (this includes you!) and refactor anything with more than one statement into a separate subroutine or at least a for loop. This goes for list processing functions (like the aforementioned any) imported from other modules, too.

This rec­om­men­da­tion is cod­i­fied by the Perl Best Practices-inspired BuiltinFunctions::ProhibitComplexMappings and BuiltinFunctions::RequireSimpleSortBlock Perl::Critic poli­cies, although those only cov­er map and sort func­tions, respectively.
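As a rough illustration of that refactoring, here is a sketch where the multi-step work lives in a named subroutine and the map block stays at a single statement. The format_name helper and the sample names are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: turn "first last" in any letter case
# into "Last, First".
sub format_name {
    my ($raw) = @_;
    my ($first, $last) = split ' ', $raw;
    return sprintf '%s, %s', ucfirst(lc $last), ucfirst(lc $first);
}

my @raw_names = ('ada LOVELACE', 'GRACE hopper');

# The block does one thing: call the helper on each element.
my @formatted = map { format_name($_) } @raw_names;

print "$_\n" for @formatted;
```

The helper can now be named, documented, and tested on its own, and the map line reads as a plain transformation.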


Do you have any oth­er sug­ges­tions for list pro­cess­ing best prac­tices? Feel free to leave them in the com­ments or bet­ter yet, con­sid­er cre­at­ing new Perl::Critic poli­cies for them or con­tact­ing the Perl::Critic team to devel­op them for your organization.

The perlcritic tool is often your first defense against “awkward, hard to read, error-prone, or unconventional constructs in your code,” per its description. It’s part of a class of programs historically known as linters, so-called because like a clothes dryer machine’s lint trap, “they detect small errors with big effects.” (Another such linter is perltidy, which I’ve referenced in the past.)

You can use perlcritic at the com­mand line, inte­grat­ed with your edi­tor, as a git pre-​commit hook, or (my pref­er­ence) as part of your author tests. It’s dri­ven by poli­cies, indi­vid­ual mod­ules that check your code against a par­tic­u­lar rec­om­men­da­tion, many of them from Damian Conway’s Perl Best Practices (2005). Those poli­cies, in turn, are enabled by PPI, a library that trans­forms Perl code into doc­u­ments that can be pro­gram­mat­i­cal­ly exam­ined and manip­u­lat­ed much like the Document Object Model (DOM) is used to pro­gram­mat­i­cal­ly access web pages.

perlcritic enables the following policies by default unless you customize its configuration or install more. These are just the “gentle” (severity level 5) policies, so consider them the bare minimum in detecting bad practices. The full set of included policies goes much deeper, ratcheting up the severity to “stern,” “harsh,” “cruel,” and “brutal.” They’re further organized according to themes so that you might selectively review your code against issues like security, maintenance, complexity, and bug prevention.

My favorite above is probably ProhibitEvilModules. Aside from the colorful name, a development team can use it to steer people towards an organization’s favored solutions rather than “deprecated, buggy, unsupported, or insecure” ones. By default, it prohibits Class::ISA, Pod::Plainer, Shell, and Switch, but you should curate and configure a list within your team.

Speaking of work­ing with­in a team, although perlcritic is meant to be a vital tool to ensure good prac­tices, it’s no sub­sti­tute for man­u­al peer code review. Those reviews can lead to the cre­ation or adop­tion of new auto­mat­ed poli­cies to save time and set­tle argu­ments, but such work should be done col­lab­o­ra­tive­ly after achiev­ing some kind of con­sen­sus. This is true whether you’re a team of employ­ees work­ing on pro­pri­etary soft­ware or a group of vol­un­teers devel­op­ing open source.

Of course, reasonable people can and do disagree over any of the included policies, but as a reasonable person, you should have good reasons to disagree before you either configure perlcritic appropriately or selectively and knowingly bend the rules where required. Other CPAN authors have even provided their own additions to perlcritic, so it’s worth searching CPAN under “Perl::Critic::Policy::” for more examples. In particular, these community-inspired policies group a number of recommendations from Perl developers on Internet Relay Chat (IRC).

Personally, although I adhere to my employer’s standardized configuration when testing and reviewing code, I like to run perlcritic on the “brutal” setting before committing my own. What do you prefer? Let me know in the comments below.


Last week’s arti­cle got a great response on Hacker News, and this par­tic­u­lar com­ment caught my eye:

I think this is the real point about Perl code read­abil­i­ty: it gives you enough flex­i­bil­i­ty to do things how­ev­er you like, and as a result many pro­gram­mers are faced with a mir­ror that reflects their own bad prac­tices back at them.

orev, Hacker News

This is why Damian Conway’s Perl Best Practices (2005) is one of my favorite books and perlcritic, the code analyzer, is one of my favorite tools. (Though the former could do with an update and the latter includes policies that contradict Conway.) Point perlcritic at your code, maybe add some other policies that agree with your house style, and gradually ratchet up the severity level from “gentle” to “brutal.” All kinds of bad juju will come to light, from wastefully using grep to having too many subroutine arguments to catching private variable use from other packages. perlcritic offers a useful baseline of conduct and you can always customize its configuration to your own tastes.

The oth­er con­for­mance tool in a Perl devel­op­er’s belt is perltidy, and it too has a Conway-​compatible con­fig­u­ra­tion as well as its default Perl Style Guide set­tings. I’ve found that more than any­thing else, perltidy helps set­tle argu­ments both between devel­op­ers and between their code in help­ing to avoid exces­sive merge conflicts.

But apart from extra tools, Perl the lan­guage itself can be bent and even bro­ken to suit just about any­one’s agen­da. Those used to more bondage-​and-​discipline lan­guages (hi, Java!) might feel revul­sion at the lengths to which this has some­times been tak­en, but per the quote above this is less an indict­ment of the lan­guage and more of its less method­i­cal pro­gram­mers.

Some of this behavior can be rehabilitated with perlcritic and perltidy, but what about other sins attributed to Perl? Here are a few perennial “favorites”:

Objects and Object-​Oriented Programming

Perl has a min­i­mal­ist object sys­tem based on earlier-​available lan­guage con­cepts like data struc­tures (often hash­es, which it has in com­mon with JavaScript), pack­ages, and sub­rou­tines. Since Perl 5’s release in 1994 much ver­bose OO code has been writ­ten using these tools.
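For readers who have never seen the underlying pieces, here is a minimal sketch of a class built only from a package, a hash reference, subroutines, and bless. The Counter class and its start argument are invented for illustration:

```perl
use strict;
use warnings;

package Counter;

# Constructor: the object is just a hash reference tied to this package.
sub new {
    my ($class, %args) = @_;
    my $self = { count => $args{start} // 0 };
    return bless $self, $class;
}

# Methods are ordinary subroutines that receive the object first.
sub increment {
    my ($self) = @_;
    return ++$self->{count};
}

# Hand-written read accessor.
sub count {
    my ($self) = @_;
    return $self->{count};
}

package main;

my $counter = Counter->new(start => 5);
$counter->increment;
print 'count is ', $counter->count, "\n";
```

Even this tiny class needs a hand-rolled constructor and accessor, which hints at why plain-Perl OO code tends toward verbosity.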

The good news is that since 2007 we’ve had a sophis­ti­cat­ed metaobject-​protocol-​based lay­er on top of them called Moose, since 2010 a light­weight but forward-​compatible sys­tem called Moo, and a cou­ple of even tinier options as described in the Perl OO Tutorial. Waiting in the wings is Corinna, an effort to bring next-​generation object capa­bil­i­ties into the Perl core itself, and Object::Pad, a test­bed for some of the ideas in Corinna that you can use today in cur­rent code. (Really, please try it—the author needs feed­back!)

All this is to say that 99% of the time you nev­er need trou­ble your­self with bless, con­struc­tors, or writ­ing acces­sors for class or object attrib­ut­es. Smarter peo­ple than me have done the work for you, and you might even find a con­cept or three that you wish oth­er lan­guages had.

Contexts

There are two major ones: list and scalar. Another way to think of it is “plural” vs. “singular” in English, which is hopefully a thing you’re familiar with as you’re reading this blog.

Some functions in Perl act differently depending on whether the expected return value is a list or a scalar, and a function will provide a list or scalar context to its arguments. Mostly these act just as you would expect or would like them to, and you can find out how a function behaves by reading its documentation. Your own functions can behave like this too, but there’s usually no need as both scalars and lists are automatically interpreted into lists. Again, Perl’s DWIMmery at work.
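A short sketch may make the two contexts more concrete. The @colors array and the context_name function are made up for this example; wantarray is the built-in that lets a subroutine ask which context it was called in:

```perl
use strict;
use warnings;

my @colors = qw(red green blue);

my @copy  = @colors;    # list context: the elements themselves
my $count = @colors;    # scalar context: the number of elements

# grep returns the matches in list context and a count in scalar context.
my @with_r   = grep { /r/ } @colors;
my $how_many = grep { /r/ } @colors;

# A subroutine can check its calling context with wantarray.
sub context_name {
    return wantarray ? 'list' : 'scalar';
}

my @in_list   = context_name();
my $in_scalar = context_name();

print "count: $count, matches: @with_r ($how_many)\n";
print "called in: $in_list[0], then $in_scalar\n";
```

The same expression on the right-hand side gives different answers depending only on what the left-hand side asks for.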

Subroutine and Method Arguments

I’ve already writ­ten about this. Twice. And pre­sent­ed about it. Twice. The short ver­sion: Perl has sig­na­tures, but they’ve been con­sid­ered exper­i­men­tal for a while. In the mean­time, there are alter­na­tives on CPAN. You can even have type con­straints if you want.
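For a taste of the built-in signatures, here is a minimal sketch assuming Perl 5.20 or later. The greet subroutine and its default greeting are hypothetical, and the no warnings line silences the experimental warning on versions that still emit it:

```perl
use strict;
use warnings;
use feature 'signatures';
no warnings 'experimental::signatures';

# One mandatory parameter and one optional parameter with a default.
sub greet ($name, $greeting = 'Hello') {
    return "$greeting, $name!";
}

print greet('World'), "\n";
print greet('Perl', 'Goodbye'), "\n";
```

Calling greet with no arguments, or with more than two, dies at run time with an argument-count error instead of silently misbehaving.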


I’ll leave you with this: Over the past month, Neil Bowers of the Perl Steering Council has been col­lect­ing quirks like these from Perl devel­op­ers. The PSC is review­ing this col­lec­tion for poten­tial doc­u­men­ta­tion fix­es, bug fix­es, fur­ther dis­cus­sion, etc. I would­n’t expect to see any fun­da­men­tal changes to the lan­guage out of this effort, but it’s a good sign that poten­tial­ly con­fus­ing fea­tures are being addressed. 


As a Perl devel­op­er, you’re prob­a­bly aware of the lan­guage’s strengths as a text-​processing lan­guage and how many com­put­ing tasks can be bro­ken down into those types of tasks. You might not real­ize, though, that Perl is also a world-​class list pro­cess­ing lan­guage and that many prob­lems can be expressed in terms of lists and their transformations.

Chief among Perl’s tools for list pro­cess­ing are the func­tions map and grep. I can’t count how many times in my twenty-​five years as a devel­op­er I’ve run into code that could’ve been sim­pli­fied if only the author was famil­iar with these two func­tions. Once you under­stand map and grep, you’ll start see­ing lists every­where and the oppor­tu­ni­ty to make your code more suc­cinct and expres­sive at the same time.

What are lists?

Before we get into func­tions that manip­u­late lists, we need to under­stand what they are. A list is an ordered group of ele­ments, and those ele­ments can be any kind of data you can rep­re­sent in the lan­guage: num­bers, strings, objects, reg­u­lar expres­sions, ref­er­ences, etc., as long as they’re stored as scalars. You might think of a list as the thing that an array stores, and in fact Perl is fine with using an array where a list can go.

my @foo = (1, 2, 3);

Here we’re assign­ing the list of num­bers from 1 to 3 to the array @foo. The dif­fer­ence between the array and the list is that the list is a fixed col­lec­tion, while arrays and their ele­ments can be mod­i­fied by var­i­ous oper­a­tions. perlfaq4 has a great dis­cus­sion on the dif­fer­ences between the two.
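Here is a small sketch of that difference, reusing @foo from above. The other variable names are made up for the example:

```perl
use strict;
use warnings;

# The list (1, 2, 3) initializes the array.
my @foo = (1, 2, 3);

# Arrays can grow and have their elements reassigned.
push @foo, 4;
$foo[0] = 10;

# An array used where a list is expected supplies its elements.
my ($first, $second) = @foo;

# A literal list can be sliced, but it has no name to modify later.
my $last_of_list = (7, 8, 9)[-1];

print "array is now: @foo\n";
print "first two: $first, $second; last of list: $last_of_list\n";
```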

Lists are everywhere, man!

Ever want­ed to sort some data? You were using a list.

join a bunch of things togeth­er into a string? List again.

split a string into pieces? You got a list back (in list con­text; in scalar con­text, you got the size of the list.)

Heck, even the hum­ble print func­tion and its cousin say take a list (and an option­al file­han­dle) as argu­ments; it’s why you can treat Perl as an upscale AWK and feed it scalars to out­put with a field sep­a­ra­tor.
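The AWK comparison comes down to two special variables: $, (the output field separator) and $\ (the output record separator). This sketch prints a made-up record to an in-memory filehandle so the result is easy to inspect:

```perl
use strict;
use warnings;

# Hypothetical record to print; any list of scalars would do.
my @fields = ('alice', 42, 'admin');

# Print to an in-memory filehandle so the output can be examined.
my $output = '';
open my $fh, '>', \$output or die "Cannot open in-memory handle: $!";

{
    local $, = ',';     # output field separator, placed between list items
    local $\ = "\n";    # output record separator, appended after each print
    print {$fh} @fields;
}
close $fh;

print $output;
```

Using local keeps the separator changes confined to the enclosing block, so other print statements in the program are unaffected.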

You’re using lists all the time and may not even know it.

map: The list transformer

The map func­tion is devi­ous in its sim­plic­i­ty: It takes two inputs, an expres­sion or block of code, and a list to run it on. For every item in the list, it will alias $_ to it, and then return none, one, or many items in a list based on what hap­pens in the expres­sion or code block. You can call it like this:

my @foo = map bar($_), @list;

Or like this:

my @foo = map { bar($_) } @list;

We’re going to ignore the first way, though, because Conway (Perl Best Practices, 2005) tells us that when you specify the first argument as an expression, it’s harder to tell it apart from the remaining arguments, especially if that expression uses a built-in function where the parentheses are optional. So always use a code block!

You should always turn to map (and not, say, a for or foreach loop) when gen­er­at­ing a new list from an old list. For example:

my @lowercased = map { lc } @mixed_case;

When paired with a lookup table, map is also the most effi­cient way to tell if a mem­ber of a list equals a string, espe­cial­ly if that list is static:

use Const::Fast;

const my %IS_EXIT_WORD => map { ($_ => 1) }
  qw(q quit bye exit stop done last finish aurevoir);

...

die if $IS_EXIT_WORD{$command};

Here we’re using map’s ability to return multiple items per source element to generate a constant hash, and then testing membership in that hash.
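If you would rather avoid the CPAN dependency, the same idea works with an ordinary hash. This is only a sketch: the plain hash is not read-only the way a Const::Fast constant is, and the is_exit_word helper is hypothetical:

```perl
use strict;
use warnings;

# Each word becomes a key/value pair, so map returns two items per element.
my %is_exit_word = map { ($_ => 1) } qw(q quit bye exit stop done);

# Hypothetical helper: case-insensitive membership test via hash lookup.
sub is_exit_word {
    my ($command) = @_;
    return exists $is_exit_word{ lc($command) };
}

for my $command (qw(quit QUIT continue)) {
    printf "%-8s => %s\n", $command,
        is_exit_word($command) ? 'exit' : 'keep going';
}
```

The hash is built once, and each later check is a single lookup rather than a scan of the whole word list.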

grep: The list filter

You may recognize the word “grep” from the Unix command of the same name. It’s a tool for finding lines of text inside of other text using a regular expression describing the desired result.

Perl, of course, is real­ly good at reg­u­lar expres­sions, but its grep func­tion goes beyond and enables you to match using any expres­sion or code block. Think of it as a part­ner to map; where map uses a code block to trans­form a list, grep uses one to fil­ter it down. In fact, oth­er lan­guages typ­i­cal­ly call this func­tion filter.

You can, of course, use reg­u­lar expres­sions with grep, espe­cial­ly because a reg­exp match in Perl defaults to match­ing on the $_ vari­able and grep hap­pens to pro­vide that to its code block argu­ment. So:

my @months_with_a = grep { /[Aa]/ } qw(
  January February March
  April   May      June
  July    August   September
  October November December
);

But grep real­ly comes into its own when used for its gen­er­al fil­ter­ing capa­bil­i­ties; for instance, mak­ing sure that you don’t acci­den­tal­ly try to com­pare an unde­fined value:

say $_ > 5
  ? "$_ is bigger"
  : "$_ is equal or smaller"
  for grep { defined } @numbers;

Or when exe­cut­ing a com­pli­cat­ed func­tion that returns true or false depend­ing on its arguments:

my @results = grep { really_large_database_query($_) }
              @foo;

You might even con­sid­er chain­ing map and grep togeth­er. Here’s an exam­ple for get­ting the JPEG images out of a file list and then low­er­cas­ing the results:

my @jpeg_files = map  { lc }
                grep { /\.jpe?g$/i } @files;
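Here is the same chain as a complete sketch with some made-up file names, so you can see what each stage keeps. Reading right to left: grep filters first, then map transforms what survives:

```perl
use strict;
use warnings;

# Hypothetical directory listing with mixed-case extensions.
my @files = qw(Cat.JPG notes.txt dog.jpeg LOGO.png Sunset.Jpg);

# grep keeps names ending in .jpg or .jpeg (case-insensitively),
# then map lowercases each surviving name.
my @jpeg_files = map  { lc }
                 grep { /\.jpe?g$/i } @files;

print "$_\n" for @jpeg_files;
```

Because neither block assigns to $_, the @files array is left exactly as it was.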

“Side effects may include…” (updated)

When introducing map above I noted that it aliased $_ for every element in the list. I used that term deliberately because modifications to $_ will modify the original element itself, and that is usually an error. Programmers call that a “side effect,” and they can lead to unexpected behavior or at least difficult-to-maintain code. Consider:

my @needs_docs = grep { s/\.pm$/.pod/ && !-e }
                 @pm_files;

The intent may have been to find files end­ing in .pm that don’t have a cor­re­spond­ing .pod file, but the actu­al behav­ior is replac­ing the .pm suf­fix with .pod, then check­ing whether that file­name exists. If it does­n’t, it’s passed through to @needs_docs; regard­less, @pm_files has had its con­tents modified.

If you real­ly do need to mod­i­fy a copy of each ele­ment, assign a vari­able with­in your code block like this:

my @needs_docs = grep {
                   my $file = $_;
                   $file =~ s/\.pm$/.pod/;
                   !-e $file
                 } @pm_files;

But at that point you should prob­a­bly refac­tor your multi-​line block as a sep­a­rate function:

my @needs_docs = grep { file_without_docs($_) }
                 @pm_files;

sub file_without_docs {
    my $file = shift;
    $file =~ s/\.pm$/.pod/;
    return !-e $file;
}

In this case of using the sub­sti­tu­tion oper­a­tor s///, you could also do this when using Perl 5.14 or above to get non-​destructive sub­sti­tu­tion:

use v5.14;

my @needs_docs = grep { !-e s/\.pm$/.pod/r }
                 @pm_files;

And if you do need side effects, just use a for or foreach loop; future code main­tain­ers (i.e., you in six months) will thank you.
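To contrast the two approaches without touching the filesystem, here is a sketch that uses the /r modifier for a side-effect-free map and a for loop for a deliberate in-place change. The file names are invented:

```perl
use v5.14;
use warnings;

my @pm_files = qw(Foo.pm Bar.pm);

# Non-destructive: s///r returns the changed copy and leaves $_ alone.
my @pod_files = map { s/\.pm$/.pod/r } @pm_files;
my $after_map = "@pm_files";    # snapshot to show nothing changed

# Deliberate side effect: the loop variable aliases each element,
# so assigning to it rewrites the array in place.
for my $file (@pm_files) {
    $file = lc $file;
}

say "derived list:     @pod_files";
say "source after map: $after_map";
say "source after for: @pm_files";
```

The for loop makes the mutation obvious to anyone reading the code, which is the point of reserving side effects for it.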

Taking you higher

map and grep are exam­ples of higher-​order func­tions, since they take a func­tion (in the form of a code block) as an argu­ment. So con­grat­u­la­tions, you just sig­nif­i­cant­ly lev­eled up your knowl­edge of Perl and com­put­er sci­ence. If you’re inter­est­ed in more such pro­gram­ming tech­niques, I rec­om­mend Mark Jason Dominus’ Higher Order Perl (2005), avail­able for free online.