
Six months ago I gave an overview of Perl’s list processing fundamentals, briefly describing what lists are and then introducing the built-​in map and grep functions for transforming and filtering them. Later on, I compiled a list (how appropriate) of list processing modules available via CPAN, noting there’s some confusing duplication of effort. But you’re a busy developer, and you just want to know the Right Thing To Do™ when faced with a list processing challenge.

First, some credit is due: these are all restatements of several Perl::Critic policies which in turn codify standards described in Damian Conway’s Perl Best Practices (2005). I’ve repeatedly recommended the latter as a starting point for higher-​quality Perl development. Over the years these practices continue to be re-​evaluated (including by the author himself) and various authors release new policy modules, but perlcritic remains a great tool for ensuring you (and your team or other contributors) maintain a consistent high standard in your code.

With that said, on to the recommendations!

Don’t use grep to check if any list elements match

It might sound weird to lead off by recommending not to use grep, but sometimes it’s not the right tool for the job. If you’ve got a list and want to determine if a condition matches any item in it, you might try:

if (grep { some_condition($_) } @my_list) {
    ... # don't do this!
}

Yes, this works because (in scalar context) grep returns the number of matches found, but it’s wasteful, checking every element of @my_list (which could be lengthy) before finally providing a result. Use the standard List::Util module’s any function, which immediately returns (“short-​circuits”) on the first match:

use List::Util 1.33 qw(any);

if (any { some_condition($_) } @my_list) {
    ... # do something
}

Perl has included the requisite version of this module since version 5.20 in 2014; for earlier releases, you’ll need to update from CPAN. List::Util has many other great list-​reduction, key/​value pair, and other related functions you can import into your code, so check it out before you attempt to re-​invent any wheels.
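
To make the difference concrete, here is a minimal sketch that counts how many elements each approach tests. The list contents and counter variables are made up for illustration, and the two-statement blocks exist only to do the counting:

```perl
use strict;
use warnings;
use List::Util 1.33 qw(any);

my @my_list = (1 .. 1000);    # illustrative data
my ($grep_tests, $any_tests) = (0, 0);

# grep visits every element, even though the match is near the front
my $grep_found = grep { $grep_tests++; $_ == 3 } @my_list;

# any stops as soon as one element matches
my $any_found = any { $any_tests++; $_ == 3 } @my_list;

print "grep tested $grep_tests elements\n";
print "any tested $any_tests elements\n";
```

With the matching element third in the list, grep still runs its block 1,000 times while any runs its block three times.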

As a side note for web developers, the Perl Dancer framework also includes an any keyword for declaring multiple HTTP routes, so if you’re mixing List::Util in there don’t import it. Instead, call it explicitly like this or you’ll get an error about a redefined function:

use List::Util 1.33;

if (List::Util::any { some_condition($_) } @my_list) {
    ... # do something
}

This recommendation is codified in the BuiltinFunctions::ProhibitBooleanGrep Perl::Critic policy, comes directly from Perl Best Practices, and is recommended by the Software Engineering Institute Computer Emergency Response Team (SEI CERT)’s Perl Coding Standard.

Don’t change $_ in map or grep

I mentioned this back in March, but it bears repeating: map and grep are intended as pure functions, not mutators with side effects. This means that the original list should remain unchanged. Yes, each element aliases in turn to the $_ special variable, but that’s for speed and can have surprising results if changed even if it’s technically allowed. If you need to modify an array in-​place use something like:

for (@my_array) {
    $_ = ...; # make your changes here
}
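
As one possible illustration of that loop, here is a sketch that trims whitespace from every element in place. The sample strings are invented for the example:

```perl
use strict;
use warnings;

my @my_array = ('  alpha ', 'beta  ', ' gamma');    # illustrative data

for (@my_array) {
    # $_ aliases each element, so this edits @my_array itself
    s/^\s+|\s+$//g;
}

print join('|', @my_array), "\n";
```

Because the intent to modify is spelled out in a loop, nobody reading it later has to wonder whether the change to @my_array was deliberate.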

If you want something that looks like map but won’t change the original list (and don’t mind a few CPAN dependencies), consider the apply function from List::SomeUtils:

use List::SomeUtils qw(apply);

my @doubled_array = apply {$_ *= 2} @old_array;

Lastly, side effects also include things like manipulating other variables or doing input and output. Don’t use map or grep in a void context (i.e., without a resulting array or list); do something with the results or use a for or foreach loop:

map { print foo($_) } @my_array; # don't do this
print map { foo($_) } @my_array; # do this instead

map { push @new_array, foo($_) } @my_array; # don't do this
@new_array = map { foo($_) } @my_array; # do this instead
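
If the side effect really is the point, a loop says so plainly. In this sketch foo is a hypothetical stand-in that uppercases its argument, and @log is just somewhere to put the side effects:

```perl
use strict;
use warnings;

# Hypothetical stand-in for whatever foo() does in your code
sub foo { return uc $_[0] }

my @my_array = qw(a b c);
my @log;

# Side effects (here, appending to @log) belong in a loop
for my $item (@my_array) {
    push @log, "processing $item";
}

# Transformations belong in a map whose result you keep
my @new_array = map { foo($_) } @my_array;

print "@new_array\n";
```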

This recommendation is codified by the BuiltinFunctions::ProhibitVoidGrep, BuiltinFunctions::ProhibitVoidMap, and ControlStructures::ProhibitMutatingListFunctions Perl::Critic policies. The last of these comes from Perl Best Practices and is an SEI CERT Perl Coding Standard rule.

Use blocks with map and grep, not expressions

You can call map or grep like this (parentheses are optional around built-​in functions):

my @new_array  = map foo($_), @old_array; # don't do this
my @new_array2 = grep !/^#/, @old_array; # don't do this

Or like this:

my @new_array  = map { foo($_) } @old_array;
my @new_array2 = grep {!/^#/} @old_array;

Do it the second way. It’s easier to read, especially if you’re passing in a literal list or multiple arrays, and the expression forms can conceal bugs. This recommendation is codified by the BuiltinFunctions::RequireBlockGrep and BuiltinFunctions::RequireBlockMap Perl::Critic policies and comes from Perl Best Practices.
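
Here is a small sketch of the readability problem when more than one array follows. Both calls produce the same result, but only the block form shows at a glance where the code ends and the list begins. The array names are made up for the example:

```perl
use strict;
use warnings;

my @evens = (2, 4);
my @odds  = (1, 3);

# Expression form: the first comma ends the expression and
# everything after it is the list, which is easy to misread
my @doubled_expr = map $_ * 2, @evens, @odds;

# Block form: the braces mark exactly where the code stops
my @doubled_block = map { $_ * 2 } @evens, @odds;

print "@doubled_expr\n";
print "@doubled_block\n";
```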

Refactor multi-​statement maps, greps, and other list functions

map, grep, and friends should follow the Unix philosophy of “Do One Thing and Do It Well.” Your readability and maintainability drop with every statement you place inside one of their blocks. Consider junior developers and future maintainers (this includes you!) and refactor anything with more than one statement into a separate subroutine or at least a for loop. This goes for list processing functions (like the aforementioned any) imported from other modules, too.
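
As a sketch of what that refactoring might look like, here is a three-statement map block and the same logic moved into a named subroutine. The user_label name and the sample data are hypothetical:

```perl
use strict;
use warnings;

my @names = ('Ada Lovelace', 'Grace Hopper');    # illustrative data

# Before: three statements crammed into the block
my @labels_before = map {
    my $name = lc $_;
    $name =~ s/\s+/_/g;
    "user_$name";
} @names;

# After: the block does one thing, and the logic has a name
my @labels_after = map { user_label($_) } @names;

sub user_label {
    my ($name) = @_;
    $name = lc $name;
    $name =~ s/\s+/_/g;
    return "user_$name";
}

print "$_\n" for @labels_after;
```

The subroutine can also be tested and reused on its own, which the inline block cannot.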

This recommendation is codified by the Perl Best Practices-inspired BuiltinFunctions::ProhibitComplexMappings and BuiltinFunctions::RequireSimpleSortBlock Perl::Critic policies, although those only cover map and sort functions, respectively.


Do you have any other suggestions for list processing best practices? Feel free to leave them in the comments or better yet, consider creating new Perl::Critic policies for them or contacting the Perl::Critic team to develop them for your organization.

The perlcritic tool is often your first defense against “awkward, hard to read, error-prone, or unconventional constructs in your code,” per its description. It’s part of a class of programs historically known as linters, so-called because like a clothes dryer machine’s lint trap, they “detect small errors with big effects.” (Another such linter is perltidy, which I’ve referenced in the past.)

You can use perlcritic at the command line, integrated with your editor, as a git pre-​commit hook, or (my preference) as part of your author tests. It’s driven by policies, individual modules that check your code against a particular recommendation, many of them from Damian Conway’s Perl Best Practices (2005). Those policies, in turn, are enabled by PPI, a library that transforms Perl code into documents that can be programmatically examined and manipulated much like the Document Object Model (DOM) is used to programmatically access web pages.

perlcritic enables the following policies by default unless you customize its configuration or install more. These are just the “gentle” (severity level 5) policies, so consider them the bare minimum in detecting bad practices. The full set of included policies goes much deeper, ratcheting up the severity to “stern,” “harsh,” “cruel,” and “brutal.” They’re further organized according to themes so that you might selectively review your code against issues like security, maintenance, complexity, and bug prevention.

My favorite above is probably ProhibitEvilModules. Aside from the colorful name, a development team can use it to steer people towards an organization’s favored solutions rather than “deprecated, buggy, unsupported, or insecure” ones. By default, it prohibits Class::ISA, Pod::Plainer, Shell, and Switch, but you should curate and configure a list within your team.

Speaking of working within a team, although perlcritic is meant to be a vital tool to ensure good practices, it’s no substitute for manual peer code review. Those reviews can lead to the creation or adoption of new automated policies to save time and settle arguments, but such work should be done collaboratively after achieving some kind of consensus. This is true whether you’re a team of employees working on proprietary software or a group of volunteers developing open source.

Of course, reasonable people can and do disagree over any of the included policies, but as a reasonable person, you should have good reasons to disagree before you either configure perlcritic appropriately or selectively and knowingly bend the rules where required. Other CPAN authors have even provided their own additions to perlcritic, so it’s worth searching CPAN under “Perl::Critic::Policy::” for more examples. In particular, these community-inspired policies group a number of recommendations from Perl developers on Internet Relay Chat (IRC).

Personally, although I adhere to my employer’s standardized configuration when testing and reviewing code, I like to run perlcritic on the “brutal” setting before committing my own. What do you prefer? Let me know in the comments below.


Last week’s article got a great response on Hacker News, and this particular comment caught my eye:

I think this is the real point about Perl code readability: it gives you enough flexibility to do things however you like, and as a result many programmers are faced with a mirror that reflects their own bad practices back at them.

orev, Hacker News

This is why Damian Conway’s Perl Best Practices (2005) is one of my favorite books and perlcritic, the code analyzer, is one of my favorite tools. (Though the former could do with an update and the latter includes policies that contradict Conway.) Point perlcritic at your code, maybe add some other policies that agree with your house style, and gradually ratchet up the severity level from “gentle” to “brutal.” All kinds of bad juju will come to light, from wastefully using grep to having too many subroutine arguments to catching private variable use from other packages. perlcritic offers a useful baseline of conduct and you can always customize its configuration to your own tastes.

The other conformance tool in a Perl developer’s belt is perltidy, and it too has a Conway-​compatible configuration as well as its default Perl Style Guide settings. I’ve found that more than anything else, perltidy helps settle arguments both between developers and between their code in helping to avoid excessive merge conflicts.

But apart from extra tools, Perl the language itself can be bent and even broken to suit just about anyone’s agenda. Those used to more bondage-​and-​discipline languages (hi, Java!) might feel revulsion at the lengths to which this has sometimes been taken, but per the quote above this is less an indictment of the language and more of its less methodical programmers.

Some of this behavior can be rehabilitated with perlcritic and perltidy, but what about other sins attributed to Perl? Here are a few perennial “favorites”:

Objects and Object-​Oriented Programming

Perl has a minimalist object system based on earlier-​available language concepts like data structures (often hashes, which it has in common with JavaScript), packages, and subroutines. Since Perl 5’s release in 1994 much verbose OO code has been written using these tools.
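
For a sense of what that hand-rolled style looks like, here is a minimal sketch of a class built from nothing but a package, a blessed hash, and subroutines. The Counter class and its methods are hypothetical, and the package block syntax assumes Perl 5.14 or later:

```perl
use strict;
use warnings;

package Counter {
    # Constructor: a plain subroutine that blesses a hash into the class
    sub new {
        my ($class, %args) = @_;
        my $self = { count => $args{start} // 0 };
        return bless $self, $class;
    }

    # Hand-written read accessor
    sub count {
        my ($self) = @_;
        return $self->{count};
    }

    # Returns the object so that calls can be chained
    sub increment {
        my ($self) = @_;
        $self->{count}++;
        return $self;
    }
}

my $counter = Counter->new(start => 5);
$counter->increment->increment;
print "count: ", $counter->count, "\n";
```

Every attribute needs this kind of boilerplate, which is exactly what the higher-level object systems take off your hands.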

The good news is that since 2007 we’ve had a sophisticated metaobject-​protocol-​based layer on top of them called Moose, since 2010 a lightweight but forward-​compatible system called Moo, and a couple of even tinier options as described in the Perl OO Tutorial. Waiting in the wings is Corinna, an effort to bring next-​generation object capabilities into the Perl core itself, and Object::Pad, a testbed for some of the ideas in Corinna that you can use today in current code. (Really, please try it—the author needs feedback!)

All this is to say that 99% of the time you never need trouble yourself with bless, constructors, or writing accessors for class or object attributes. Smarter people than me have done the work for you, and you might even find a concept or three that you wish other languages had.

Contexts

There are two major ones: list and scalar. Another way to think of it is “plural” vs. “singular” in English, which is hopefully a thing you’re familiar with as you’re reading this blog.

Some functions in Perl act differently depending on whether the expected return value is a list or a scalar, and a function will provide a list or scalar context to its arguments. Mostly these act just as you would expect or would like them to, and you can find out how a function behaves by reading its documentation. Your own functions can behave like this too, but there’s usually no need as “both scalars and lists are automatically interpreted into lists.” Again, Perl’s DWIMmery at work.
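
A short sketch of context in action, using an array, the built-in localtime, and a hypothetical context subroutine that asks wantarray how it was called:

```perl
use strict;
use warnings;

my @items = qw(apple banana cherry);    # illustrative data

my $count   = @items;    # scalar context: the number of elements
my ($first) = @items;    # list context: the first element

my $now_string = localtime;    # scalar context: a human-readable string
my @now_fields = localtime;    # list context: seconds, minutes, hours, ...

# Your own subroutines can ask which context they were called in
sub context { return wantarray ? 'list' : 'scalar' }

my @in_list   = context();
my $in_scalar = context();

print "$count items, first is $first\n";
print "called in $in_list[0] context, then in $in_scalar context\n";
```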

Subroutine and Method Arguments

I’ve already written about this. Twice. And presented about it. Twice. The short version: Perl has signatures, but they’ve been considered experimental for a while. In the meantime, there are alternatives on CPAN. You can even have type constraints if you want.
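
For reference, a minimal sketch of the core signatures feature, assuming Perl 5.20 or later. The greet subroutine is hypothetical, and the no warnings line silences the experimental warning on releases that still emit it:

```perl
use v5.20;
use warnings;
use feature 'signatures';
no warnings 'experimental::signatures';

# One required parameter and one with a default value
sub greet ($name, $greeting = 'Hello') {
    return "$greeting, $name!";
}

say greet('World');
say greet('World', 'Goodbye');
```

Calling greet with no arguments, or with more than two, dies with an argument-count error instead of silently misbehaving.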


I’ll leave you with this: Over the past month, Neil Bowers of the Perl Steering Council has been collecting quirks like these from Perl developers. The PSC is reviewing this collection for potential documentation fixes, bug fixes, further discussion, etc. I wouldn’t expect to see any fundamental changes to the language out of this effort, but it’s a good sign that potentially confusing features are being addressed. 


As a Perl developer, you’re probably aware of the language’s strengths as a text-​processing language and how many computing tasks can be broken down into those types of tasks. You might not realize, though, that Perl is also a world-​class list processing language and that many problems can be expressed in terms of lists and their transformations.

Chief among Perl’s tools for list processing are the functions map and grep. I can’t count how many times in my twenty-five years as a developer I’ve run into code that could’ve been simplified if only the author had been familiar with these two functions. Once you understand map and grep, you’ll start seeing lists everywhere and the opportunity to make your code more succinct and expressive at the same time.

What are lists?

Before we get into functions that manipulate lists, we need to understand what they are. A list is an ordered group of elements, and those elements can be any kind of data you can represent in the language: numbers, strings, objects, regular expressions, references, etc., as long as they’re stored as scalars. You might think of a list as the thing that an array stores, and in fact Perl is fine with using an array where a list can go.

my @foo = (1, 2, 3);

Here we’re assigning the list of numbers from 1 to 3 to the array @foo. The difference between the array and the list is that the list is a fixed collection, while arrays and their elements can be modified by various operations. perlfaq4 has a great discussion on the differences between the two.
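
A quick sketch of that difference: the array is a container you can grow and change, while a list is just a sequence of values you can read from. The extra values are made up for the example:

```perl
use strict;
use warnings;

my @foo = (1, 2, 3);

push @foo, 4;        # arrays can grow
$foo[0] = 10;        # and their elements can be reassigned

my $count = @foo;    # an array in scalar context gives its length

# A list can be sliced for a value, but there is no container to push onto
my $second = (7, 8, 9)[1];

print "@foo has $count elements\n";
print "second list value is $second\n";
```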

Lists are everywhere, man!

Ever wanted to sort some data? You were using a list.

join a bunch of things together into a string? List again.

split a string into pieces? You got a list back (in list context; in scalar context, you got the size of the list).

Heck, even the humble print function and its cousin say take a list (and an optional filehandle) as arguments; it’s why you can treat Perl as an upscale AWK and feed it scalars to output with a field separator.

You’re using lists all the time and may not even know it.
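
Here is a sketch that pulls those everyday list users together. The sample string is invented, and the last part sets the output field and record separators to imitate AWK-style output:

```perl
use strict;
use warnings;

my $csv = 'pear,apple,fig';    # illustrative data

my @fruits   = split /,/, $csv;    # list context: the pieces
my $how_many = split /,/, $csv;    # scalar context: how many pieces

my @sorted = sort @fruits;         # sort takes a list and returns one
my $line   = join ' ', @sorted;    # join takes a separator and a list

{
    # print takes a list; $, goes between items and $\ after the last one
    local $, = ':';
    local $\ = "\n";
    print @sorted;
}

print "$how_many fruits: $line\n";
```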

map: The list transformer

The map function is devious in its simplicity: It takes two inputs, an expression or block of code, and a list to run it on. For every item in the list, it will alias $_ to it, and then return none, one, or many items in a list based on what happens in the expression or code block. You can call it like this:

my @foo = map bar($_), @list;

Or like this:

my @foo = map { bar($_) } @list;

We’re going to ignore the first way, though, because Conway (Perl Best Practices, 2005) tells us that when you specify the first argument as an expression, it’s harder to tell it apart from the remaining arguments, especially if that expression uses a built-in function where the parentheses are optional. So always use a code block!

You should always turn to map (and not, say, a for or foreach loop) when generating a new list from an old list. For example:

my @lowercased = map { lc } @mixed_case;
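
To illustrate the “none, one, or many” behavior mentioned earlier, here is a sketch with three maps over the same made-up numbers:

```perl
use strict;
use warnings;

my @numbers = (1, 2, 3);    # illustrative data

# One item out for each item in
my @squares = map { $_ * $_ } @numbers;

# Many items out for each item in
my @pairs = map { ($_, $_ * 10) } @numbers;

# No items out for some: an empty list adds nothing to the result
my @odds_doubled = map { $_ % 2 ? $_ * 2 : () } @numbers;

print "@squares\n";
print "@pairs\n";
print "@odds_doubled\n";
```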

When paired with a lookup table, map also gives you an efficient way to tell if a string is a member of a list, because each test becomes a single hash lookup instead of a scan; this pays off especially if that list is static:

use Const::Fast;

const my %IS_EXIT_WORD => map { ($_ => 1) }
  qw(q quit bye exit stop done last finish aurevoir);

...

die if $IS_EXIT_WORD{$command};

Here we’re using map’s ability to return multiple items per source element to generate a constant hash, and then testing membership in that hash.

grep: The list filter

You may recognize the word “grep” from the Unix command of the same name. It’s a tool for finding lines of text inside of other text using a regular expression describing the desired result.

Perl, of course, is really good at regular expressions, but its grep function goes beyond and enables you to match using any expression or code block. Think of it as a partner to map; where map uses a code block to transform a list, grep uses one to filter it down. In fact, other languages typically call this function filter.

You can, of course, use regular expressions with grep, especially because a regexp match in Perl defaults to matching on the $_ variable and grep happens to provide that to its code block argument. So:

my @months_with_a = grep { /[Aa]/ } qw(
  January February March
  April   May      June
  July    August   September
  October November December
);

But grep really comes into its own when used for its general filtering capabilities; for instance, making sure that you don’t accidentally try to compare an undefined value:

use feature 'say'; # say needs Perl 5.10 or above

say $_ > 5
  ? "$_ is bigger"
  : "$_ is equal or smaller"
  for grep { defined } @numbers;

Or when executing a complicated function that returns true or false depending on its arguments:

my @results = grep { really_large_database_query($_) }
              @foo;

You might even consider chaining map and grep together. Here’s an example for getting the JPEG images out of a file list and then lowercasing the results:

my @jpeg_files = map  { lc }
                grep { /\.jpe?g$/i } @files;

“Side effects may include…” (updated)

When introducing map above I noted that it aliased $_ for every element in the list. I used that term deliberately because modifications to $_ will modify the original element itself, and that is usually an error. Programmers call that a “side effect,” and they can lead to unexpected behavior or at least difficult-to-maintain code. Consider:

my @needs_docs = grep { s/\.pm$/.pod/ && !-e }
                 @pm_files;

The intent may have been to find files ending in .pm that don’t have a corresponding .pod file, but the actual behavior is replacing the .pm suffix with .pod, then checking whether that filename exists. If it doesn’t, it’s passed through to @needs_docs; regardless, @pm_files has had its contents modified.
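
You can watch the mutation happen without touching the filesystem. This sketch drops the file test and keeps only the substitution, using made-up filenames:

```perl
use v5.14;
use warnings;

my @pm_files = ('Foo.pm', 'Bar.pm');    # illustrative data

# The substitution runs on $_, which aliases each element of @pm_files
my @matched = grep { s/\.pm$/.pod/ } @pm_files;

say "matched:  @matched";
say "original: @pm_files";
```

Both lines print the .pod names, because the “original” array was rewritten as grep walked over it.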

If you really do need to modify a copy of each element, assign a variable within your code block like this:

my @needs_docs = grep {
                   my $file = $_;
                   $file =~ s/\.pm$/.pod/;
                   !-e $file
                 } @pm_files;

But at that point you should probably refactor your multi-​line block as a separate function:

my @needs_docs = grep { file_without_docs($_) }
                 @pm_files;

sub file_without_docs {
    my $file = shift;
    $file =~ s/\.pm$/.pod/;
    return !-e $file;
}

In this case of using the substitution operator s///, you could also do this when using Perl 5.14 or above to get non-​destructive substitution:

use v5.14;

my @needs_docs = grep { !-e s/\.pm$/.pod/r }
                 @pm_files;

And if you do need side effects, just use a for or foreach loop; future code maintainers (i.e., you in six months) will thank you.

Taking you higher

map and grep are examples of higher-​order functions, since they take a function (in the form of a code block) as an argument. So congratulations, you just significantly leveled up your knowledge of Perl and computer science. If you’re interested in more such programming techniques, I recommend Mark Jason Dominus’ Higher Order Perl (2005), available for free online.
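
You can write your own higher-order functions too. As a sketch, here is a hypothetical apply_twice that uses the &@ prototype so it can be called with a leading block, just like map and grep. Unlike those built-ins, it passes each item as an argument rather than aliasing $_:

```perl
use strict;
use warnings;

# The (&@) prototype lets callers pass a bare block as the first argument
sub apply_twice(&@) {
    my ($code, @items) = @_;
    return map { $code->( $code->($_) ) } @items;
}

# Inside the block, the current item arrives as $_[0]
my @result = apply_twice { $_[0] + 1 } 1, 2, 3;

print "@result\n";
```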