Six months ago I gave an overview of Perl’s list pro­cess­ing fun­da­men­tals, briefly describ­ing what lists are and then intro­duc­ing the built-​in map and grep func­tions for trans­form­ing and fil­ter­ing them. Later on, I com­piled a list (how appro­pri­ate) of list pro­cess­ing mod­ules avail­able via CPAN, not­ing there’s some con­fus­ing dupli­ca­tion of effort. But you’re a busy devel­op­er, and you just want to know the Right Thing To Do™ when faced with a list pro­cess­ing challenge.

First, some cred­it is due: these are all restate­ments of sev­er­al Perl::Critic poli­cies which in turn cod­i­fy stan­dards described in Damian Conway’s Perl Best Practices (2005). I’ve repeat­ed­ly rec­om­mend­ed the lat­ter as a start­ing point for higher-​quality Perl devel­op­ment. Over the years these prac­tices con­tin­ue to be re-​evaluated (includ­ing by the author him­self) and var­i­ous authors release new pol­i­cy mod­ules, but perlcritic remains a great tool for ensur­ing you (and your team or oth­er con­trib­u­tors) main­tain a con­sis­tent high stan­dard in your code.

With that said, on to the recommendations!

Don’t use grep to check if any list elements match

It might sound weird to lead off by rec­om­mend­ing not to use grep, but some­times it’s not the right tool for the job. If you’ve got a list and want to deter­mine if a con­di­tion match­es any item in it, you might try:

if (grep { some_condition($_) } @my_list) {
    ... # don't do this!
}

Yes, this works because (in scalar con­text) grep returns the num­ber of match­es found, but it’s waste­ful, check­ing every ele­ment of @my_list (which could be lengthy) before final­ly pro­vid­ing a result. Use the stan­dard List::Util module’s any func­tion, which imme­di­ate­ly returns (“short-​circuits”) on the first match:

use List::Util 1.33 qw(any);

if (any { some_condition($_) } @my_list) {
    ... # do something
}

Perl has includ­ed the req­ui­site ver­sion of this mod­ule since ver­sion 5.20 in 2014; for ear­li­er releas­es, you’ll need to update from CPAN. List::Util has many oth­er great list-​reduction, key/​value pair, and oth­er relat­ed func­tions you can import into your code, so check it out before you attempt to re-​invent any wheels.
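To make the short-circuiting visible, here is a minimal sketch that counts how many times the condition runs under each approach. The is_even predicate and the @numbers list are made up for illustration; only any and grep come from Perl itself.

```perl
use strict;
use warnings;
use List::Util 1.33 qw(any);

my @numbers = (1 .. 1000);

# Hypothetical predicate that also counts how often it is called.
my $calls = 0;
sub is_even { $calls++; return $_[0] % 2 == 0 }

$calls = 0;
my $grep_found = grep { is_even($_) } @numbers;   # scalar context: match count
my $grep_calls = $calls;                          # every element was tested

$calls = 0;
my $any_found = any { is_even($_) } @numbers;     # true as soon as 2 is seen
my $any_calls = $calls;                           # stopped after two tests

print "grep: $grep_calls calls, any: $any_calls calls\n";
# prints "grep: 1000 calls, any: 2 calls"
```

The gap grows with the length of the list and the cost of the condition, which is why the boolean grep is worth flagging even when it happens to work.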

As a side note for web devel­op­ers, the Perl Dancer frame­work also includes an any key­word for declar­ing mul­ti­ple HTTP routes, so if you’re mix­ing List::Util in there don’t import it. Instead, call it explic­it­ly like this or you’ll get an error about a rede­fined function:

use List::Util 1.33;

if (List::Util::any { some_condition($_) } @my_list) {
    ... # do something
}

This rec­om­men­da­tion is cod­i­fied in the BuiltinFunctions::ProhibitBooleanGrep Perl::Critic pol­i­cy, comes direct­ly from Perl Best Practices, and is rec­om­mend­ed by the Software Engineering Institute Computer Emergency Response Team (SEI CERT)’s Perl Coding Standard.

Don’t change $_ in map or grep

I men­tioned this back in March, but it bears repeat­ing: map and grep are intend­ed as pure func­tions, not muta­tors with side effects. This means that the orig­i­nal list should remain unchanged. Yes, each ele­ment alias­es in turn to the $_ spe­cial vari­able, but that’s for speed and can have sur­pris­ing results if changed even if it’s tech­ni­cal­ly allowed. If you need to mod­i­fy an array in-​place use some­thing like:

for (@my_array) {
    $_ = ...; # make your changes here
}

If you want something that looks like map but won’t change the original list (and don’t mind a few CPAN dependencies), consider the apply function from List::SomeUtils:

use List::SomeUtils qw(apply);

my @doubled_array = apply {$_ *= 2} @old_array;
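The aliasing hazard is easy to demonstrate with core Perl alone. This is only a sketch, and the @prices and @costs arrays are invented for the example:

```perl
use strict;
use warnings;

my @prices = (10, 20, 30);

# Anti-pattern: assigning to $_ inside map changes @prices too,
# because $_ is an alias for each element rather than a copy.
my @doubled = map { $_ *= 2 } @prices;

print "@prices\n";   # 20 40 60 (the original was silently modified)

# Safer: compute a new value and leave $_ alone.
my @costs   = (10, 20, 30);
my @tripled = map { $_ * 3 } @costs;

print "@costs\n";    # 10 20 30
```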

Lastly, side effects also include things like manip­u­lat­ing oth­er vari­ables or doing input and out­put. Don’t use map or grep in a void con­text (i.e., with­out a result­ing array or list); do some­thing with the results or use a for or foreach loop:

map { print foo($_) } @my_array; # don't do this
print map { foo($_) } @my_array; # do this instead

map { push @new_array, foo($_) } @my_array; # don't do this
@new_array = map { foo($_) } @my_array; # do this instead

This rec­om­men­da­tion is cod­i­fied by the BuiltinFunctions::ProhibitVoidGrep, BuiltinFunctions::ProhibitVoidMap, and ControlStructures::ProhibitMutatingListFunctions Perl::Critic poli­cies. The lat­ter comes from Perl Best Practices and is an SEI CERT Perl Coding Standard rule.

Use blocks with map and grep, not expressions

You can call map or grep like this (paren­the­ses are option­al around built-​in functions):

my @new_array  = map foo($_), @old_array; # don't do this
my @new_array2 = grep !/^#/, @old_array; # don't do this

Or like this:

my @new_array  = map { foo($_) } @old_array;
my @new_array2 = grep {!/^#/} @old_array;

Do it the sec­ond way. It’s eas­i­er to read, espe­cial­ly if you’re pass­ing in a lit­er­al list or mul­ti­ple arrays, and the expres­sion forms can con­ceal bugs. This rec­om­men­da­tion is cod­i­fied by the BuiltinFunctions::RequireBlockGrep and BuiltinFunctions::RequireBlockMap Perl::Critic poli­cies and comes from Perl Best Practices.

Refactor multi-​statement maps, greps, and other list functions

map, grep, and friends should follow the Unix philosophy of “Do One Thing and Do It Well.” Readability and maintainability drop with every statement you place inside one of their blocks. Consider junior developers and future maintainers (this includes you!) and refactor anything with more than one statement into a separate subroutine or at least a for loop. This goes for list processing functions (like the aforementioned any) imported from other modules, too.

This rec­om­men­da­tion is cod­i­fied by the Perl Best Practices-inspired BuiltinFunctions::ProhibitComplexMappings and BuiltinFunctions::RequireSimpleSortBlock Perl::Critic poli­cies, although those only cov­er map and sort func­tions, respectively.
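As a rough illustration of that refactoring, the sketch below moves a three-statement transformation out of the map block and into a named subroutine. The parse_record helper and the sample data are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: turn one "name,age" line into a hash reference.
sub parse_record {
    my ($line) = @_;                      # work on a copy, not the alias
    $line =~ s/^\s+|\s+$//g;              # trim surrounding whitespace
    my ($name, $age) = split /,/, $line;
    return { name => $name, age => $age };
}

my @lines   = ('  Alice,30 ', 'Bob,25', ' Carol,41');
my @records = map { parse_record($_) } @lines;   # one statement in the block

print "$_->{name} is $_->{age}\n" for @records;
```

The map now reads as a single step, and the helper can be named, documented, and tested on its own.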


Do you have any oth­er sug­ges­tions for list pro­cess­ing best prac­tices? Feel free to leave them in the com­ments or bet­ter yet, con­sid­er cre­at­ing new Perl::Critic poli­cies for them or con­tact­ing the Perl::Critic team to devel­op them for your organization.

Last week’s arti­cle got a great response on Hacker News, and this par­tic­u­lar com­ment caught my eye:

I think this is the real point about Perl code read­abil­i­ty: it gives you enough flex­i­bil­i­ty to do things how­ev­er you like, and as a result many pro­gram­mers are faced with a mir­ror that reflects their own bad prac­tices back at them.

orev, Hacker News

This is why Damian Conway’s Perl Best Practices (2005) is one of my favorite books and perlcritic, the code analyzer, is one of my favorite tools. (Though the former could do with an update and the latter includes policies that contradict Conway.) Point perlcritic at your code, maybe add some other policies that agree with your house style, and gradually ratchet up the severity level from “gentle” to “brutal.” All kinds of bad juju will come to light, from wastefully using grep to having too many subroutine arguments to catching private variable use from other packages. perlcritic offers a useful baseline of conduct and you can always customize its configuration to your own tastes.

The oth­er con­for­mance tool in a Perl devel­op­er’s belt is perltidy, and it too has a Conway-​compatible con­fig­u­ra­tion as well as its default Perl Style Guide set­tings. I’ve found that more than any­thing else, perltidy helps set­tle argu­ments both between devel­op­ers and between their code in help­ing to avoid exces­sive merge conflicts.

But apart from extra tools, Perl the lan­guage itself can be bent and even bro­ken to suit just about any­one’s agen­da. Those used to more bondage-​and-​discipline lan­guages (hi, Java!) might feel revul­sion at the lengths to which this has some­times been tak­en, but per the quote above this is less an indict­ment of the lan­guage and more of its less method­i­cal pro­gram­mers.

Some of this behavior can be rehabilitated with perlcritic and perltidy, but what about other sins attributed to Perl? Here are a few perennial “favorites”:

Objects and Object-​Oriented Programming

Perl has a min­i­mal­ist object sys­tem based on earlier-​available lan­guage con­cepts like data struc­tures (often hash­es, which it has in com­mon with JavaScript), pack­ages, and sub­rou­tines. Since Perl 5’s release in 1994 much ver­bose OO code has been writ­ten using these tools.
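For readers who have not seen the core object system, here is a small sketch of what that hand-written style looks like. The Counter class is invented for illustration; it shows the package, hash, and subroutine pieces mentioned above working together:

```perl
use strict;
use warnings;

package Counter;   # a package serves as the class

sub new {          # a constructor is just a subroutine...
    my ($class, %args) = @_;
    my $self = { count => $args{start} // 0 };   # ...the object is a hash...
    return bless $self, $class;                  # ...associated with the package by bless
}

sub count { $_[0]{count} }                       # hand-written accessor

sub increment {
    my ($self) = @_;
    return ++$self->{count};
}

package main;

my $counter = Counter->new(start => 5);
$counter->increment;
print $counter->count, "\n";   # 6
```

Every attribute needs its own accessor and every class its own constructor, which is where the verbosity comes from.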

The good news is that since 2007 we’ve had a sophis­ti­cat­ed metaobject-​protocol-​based lay­er on top of them called Moose, since 2010 a light­weight but forward-​compatible sys­tem called Moo, and a cou­ple of even tinier options as described in the Perl OO Tutorial. Waiting in the wings is Corinna, an effort to bring next-​generation object capa­bil­i­ties into the Perl core itself, and Object::Pad, a test­bed for some of the ideas in Corinna that you can use today in cur­rent code. (Really, please try it — the author needs feed­back!)

All this is to say that 99% of the time you nev­er need trou­ble your­self with bless, con­struc­tors, or writ­ing acces­sors for class or object attrib­ut­es. Smarter peo­ple than me have done the work for you, and you might even find a con­cept or three that you wish oth­er lan­guages had.

Contexts

There are two major ones: list and scalar. Another way to think of it is “plural” vs. “singular” in English, which is hopefully a thing you’re familiar with as you’re reading this blog.

Some functions in Perl act differently depending on whether the expected return value is a list or a scalar, and a function will provide a list or scalar context to its arguments. Mostly these act just as you would expect or would like them to, and you can find out how a function behaves by reading its documentation. Your own functions can behave like this too, but there’s usually no need as both scalars and lists are automatically interpreted into “lists.” Again, Perl’s DWIMmery at work.
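If you do want one of your own functions to be context-sensitive, the built-in wantarray reports which context the caller supplied. The following is a minimal sketch with a made-up long_words function:

```perl
use strict;
use warnings;

# Hypothetical function: returns the matches in list context
# and a short summary string in scalar context.
sub long_words {
    my @found = grep { length($_) > 4 } @_;
    return wantarray ? @found : scalar(@found) . ' long words';
}

my @list   = long_words(qw(map grep reverse scalar));   # list context
my $scalar = long_words(qw(map grep reverse scalar));   # scalar context

print "@list\n";     # reverse scalar
print "$scalar\n";   # 2 long words
```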

Subroutine and Method Arguments

I’ve already writ­ten about this. Twice. And pre­sent­ed about it. Twice. The short ver­sion: Perl has sig­na­tures, but they’ve been con­sid­ered exper­i­men­tal for a while. In the mean­time, there are alter­na­tives on CPAN. You can even have type con­straints if you want.
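For a taste of the built-in syntax, here is a small sketch. The greet function is hypothetical, and the code assumes Perl 5.20 or later, where the signatures feature is available:

```perl
use strict;
use warnings;
use feature 'signatures';
no warnings 'experimental::signatures';   # silences the warning on Perls before 5.36

# Hypothetical function with one required and one defaulted parameter.
sub greet ($name, $greeting = 'Hello') {
    return "$greeting, $name!";
}

my $default = greet('World');
my $custom  = greet('Perl', 'Goodbye');

print "$default\n";   # Hello, World!
print "$custom\n";    # Goodbye, Perl!
```

Calling greet with no arguments, or with more than two, dies at runtime with an argument-count error instead of silently misbehaving.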


I’ll leave you with this: Over the past month, Neil Bowers of the Perl Steering Council has been col­lect­ing quirks like these from Perl devel­op­ers. The PSC is review­ing this col­lec­tion for poten­tial doc­u­men­ta­tion fix­es, bug fix­es, fur­ther dis­cus­sion, etc. I would­n’t expect to see any fun­da­men­tal changes to the lan­guage out of this effort, but it’s a good sign that poten­tial­ly con­fus­ing fea­tures are being addressed. 

Perl is said (sometimes frustratingly) to be a do-what-I-mean programming language. Many of its statements and constructions are designed to be forgiving or have analogies to natural languages. Still others are said to be “magic,” behaving differently depending on how they’re used. Adept use of Perl asks you to not only understand this magic, but to embrace it and the expressiveness it enables. Here, then, are five ways you can bring some magic to your code.

$_

Perl has many special variables, and first among them (literally, it’s the first documented) is $_. It is also spelled $ARG if you use the English module, and the documentation describes it as “the default input and pattern-matching space.” Many, many functions and statements will assume it as the default or implicit argument; you can find the full list in the documentation. Here’s an example that uses it implicitly to output the numbers from 1 to 5:

say for 1 .. 5;

Output:

1
2
3
4
5

Where some languages require an iterator variable in a for or foreach loop, in the absence of one Perl aliases each element in turn to $_.

Statement modifiers

We then use our sec­ond trick; where some oth­er lan­guages require a block to enclose every loop or con­di­tion­al (whether denot­ed by braces { } or inden­ta­tion), Perl allows you to put said loop­ing or con­di­tion­al state­ment after a sin­gle oth­er state­ment, in this case the say which prints its argument(s) fol­lowed by a newline.

However, above we have no argu­ments passed to say and so once again the default $_ is used, now con­tain­ing a num­ber from 1 to 5 which is then print­ed out. It’s a very pow­er­ful and expres­sive idiom, enabling both the writer and read­er of code to con­cen­trate on the impor­tant thing that’s hap­pen­ing. It’s also entire­ly option­al. You can just as eas­i­ly type:

for my $foo (1..5) {
    say $foo;
}

But where’s the mag­ic in that?
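The for modifier is not the only one. As a short sketch (the @queue, $verbose, and $n variables are invented for the example), the same trailing style works with if, unless, and while:

```perl
use strict;
use warnings;
use feature 'say';

my @queue = (3, 8, 15, 4);

my @big = grep { $_ > 5 } @queue;
say "big: $_" for @big;              # for modifier, $_ set implicitly

my $verbose = 0;
say 'starting up' if $verbose;       # if modifier: skipped here
say 'quiet mode'  unless $verbose;   # unless modifier: runs here

my $n = 3;
$n-- while $n > 0;                   # while modifier: counts $n down to 0
```

A modifier applies to exactly one statement, so anything longer still needs a block.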

Magic variables and use English

We men­tioned the $_ vari­able above, and that it could also be spelled $ARG if you add use English to your code. It can be hard to read code with large amounts of punc­tu­a­tion, though, and even hard­er to remem­ber what each vari­able does. Thankfully the English mod­ule pro­vides alias­es, and the per­l­var man page lists them in order. It’s much eas­i­er to read and write things like $LIST_SEPARATOR, $PROCESS_ID, or $MATCH rather than $", $$, and $&, and goes a long way towards reduc­ing Perl’s rep­u­ta­tion as a write-​only language.
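Here is a brief sketch of those aliases in use; the @colors array is made up, and the variable names are the ones the English module exports:

```perl
use strict;
use warnings;
use English;

my @colors = qw(red green blue);

$LIST_SEPARATOR = ', ';      # same variable as $"
my $joined = "@colors";      # interpolation now joins with ', '

print "$joined\n";                           # red, green, blue
print "process $PROCESS_ID on $OSNAME\n";    # same as $$ and $^O
```

The long names are plain aliases, so code that uses them interoperates with code that still uses the punctuation forms.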

List and scalar contexts

Like natural languages, Perl has a concept of “context” in which words mean different things depending on their surroundings. In Perl’s case, expressions may behave differently depending on whether they expect to produce a list of values or a single value, called a scalar. Here’s a trivial example:

my @foo = (1, 2, 3); # list context, @foo contains the list
my $bar = (1, 2, 3); # scalar context, $bar contains 3

In the first line, we assign the list of num­bers (1, 2, 3) to the array @foo. But in the sec­ond line, we’re assign­ing to the scalar vari­able $bar, which now con­tains the last item in the list.

Here’s anoth­er exam­ple, using the reverse function:

my @foo = ('one', 'two', 'three');
my @bar = reverse @foo; # @bar contains ('three', 'two', 'one')
my $baz = reverse @foo; # $baz contains 'eerhtowteno'

In list con­text, reverse takes its argu­ments and returns them in the oppo­site order. But in scalar con­text, it con­cate­nates all of the argu­ments togeth­er and returns a string with the char­ac­ters in oppo­site order.

“In general, there is no general rule for deducing a function’s behavior in scalar context from its behavior in list context.” (Dominus 1998) You’ll just have to look up the function to determine what it does, though it usually does what you want. If you want to force scalar context, use the scalar operator:

my @foo = ('aa', 'aab', 'bbc');
my @bar = scalar grep /aa/, @foo; # returns a list (2), counting the number of matches
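Most of the time the left-hand side of an assignment is enough to pick the context. The sketch below reuses the same @foo data; the other variable names are only illustrative:

```perl
use strict;
use warnings;

my @foo = ('aa', 'aab', 'bbc');

my $count   = @foo;                  # scalar context: number of elements (3)
my ($first) = @foo;                  # list context: first element ('aa')
my $matches = grep { /aa/ } @foo;    # scalar context: match count (2)
my ($hit)   = grep { /aa/ } @foo;    # list context: first match ('aa')

print "$count $first $matches $hit\n";   # 3 aa 2 aa
```

The parentheses around the variable on the left are what switch the assignment into list context.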

Hash slices

One of Perl’s three built-in data types is the hash, also known as an associative array. It’s an unordered collection of scalars indexed by string, rather than the numbers used by normal arrays. It’s a useful construct, and you can develop complicated data structures using just scalars, arrays, and hashes. What’s not widely known is that you can access several elements of a hash using a hash slice, using syntax that’s similar to array slices. Here’s an example:

my ($who, $home) = @ENV{'USER', 'HOME'};

It works the oth­er way, too: you can assign to a slice.

@colors{'red', 'green', 'blue'} = (0xff0000, 0x00ff00, 0x0000ff);

I use this a lot when assign­ing argu­ments received from func­tions or meth­ods (see my pre­vi­ous arti­cle on sub­rou­tine sig­na­tures):

use v5.24; # for postfix dereferencing
use Types::Standard qw(Str Int);
use Type::Params 'compile_named';

foo(param1 => 'hello', param2 => 42); # compile_named expects name => value pairs

sub foo {
    state $check = compile_named(
        param1 => Str,
        param2 => Int, {optional => 1},
    );
    my ($param1, $param2) =
        $check->(@_)->@{'param1', 'param2'};

    say $param1, $param2;
}

In the example above, $check->(@_) returns the type-checked arguments to the foo() function courtesy of the compile_named() function from Type::Params. It’s returned as a hash reference, and since hashes are unordered, we specify the order in which we want the values by dereferencing and then slicing the resulting hash. The postfix dereferencing syntax was added in Perl 5.20 and made a default feature in 5.24, and reduces the number of nested brackets and braces we have to deal with.
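If you would like to experiment with slices without installing anything from CPAN, the following sketch uses only core Perl 5.24 or later. The %colors data mirrors the earlier example, and the remaining variable names are invented:

```perl
use v5.24;      # enables strict, say, and postfix dereferencing
use warnings;

my %colors = (red => 0xff0000, green => 0x00ff00, blue => 0x0000ff);

my ($red, $blue) = @colors{'red', 'blue'};   # slice of a hash
my $ref          = \%colors;
my ($g1)         = @{$ref}{'green'};         # classic slice of a hash reference
my ($g2)         = $ref->@{'green'};         # same slice, postfix style
my %warm         = %colors{'red'};           # key/value slice (Perl 5.20+)

delete @colors{'green', 'blue'};             # slices work with delete, too

say join ',', sort keys %colors;   # red
```

Note the sigils: @ returns just the values, while % returns key/value pairs.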

Conclusion

I hope this arti­cle has giv­en you a taste of some of the mag­ic avail­able in the Perl lan­guage. It’s these sort of fea­tures that make pro­gram­ming in it a bit more joy­ful. As always, check the doc­u­men­ta­tion for com­plete infor­ma­tion on these and oth­er top­ics, or look for answers and ask ques­tions on PerlMonks or Stack Overflow.