As a Perl developer, you’re probably aware of the language’s strengths in text processing and of how many computing tasks can be broken down into text-processing problems. You might not realize, though, that Perl is also a world-class list-processing language, and that many problems can be expressed in terms of lists and their transformations.
Chief among Perl’s tools for list processing are the functions map and grep. I can’t count how many times in my twenty-five years as a developer I’ve run into code that could have been simplified if only the author had been familiar with these two functions. Once you understand map and grep, you’ll start seeing lists everywhere, along with opportunities to make your code more succinct and expressive at the same time.
What are lists?
Before we get into functions that manipulate lists, we need to understand what they are. A list is an ordered group of elements, and those elements can be any kind of data you can represent in the language: numbers, strings, objects, regular expressions, references, etc., as long as they’re stored as scalars. You might think of a list as the thing that an array stores, and in fact Perl is fine with using an array where a list can go.
my @foo = (1, 2, 3);
Here we’re assigning the list of numbers from 1 to 3 to the array @foo. The difference between the array and the list is that the list is a fixed collection, while arrays and their elements can be modified by various operations. perlfaq4 has a great discussion on the differences between the two.
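To make the distinction concrete, here is a small sketch (the values and variable names are only illustrative): the array can be changed after it is created, while a list is just the sequence of values being assigned or indexed.

```perl
use strict;
use warnings;

# An array is a variable: it can grow and its elements can change.
my @foo = (1, 2, 3);
push @foo, 4;      # @foo now holds 1, 2, 3, 4
$foo[0] = 10;      # @foo now holds 10, 2, 3, 4

# A list is only a sequence of values; a list slice can pick one out.
my $second = (10, 20, 30)[1];    # 20

# An array in scalar context reports how many elements it stores.
my $count = @foo;                # 4
```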
Lists are everywhere, man!
Ever wanted to sort some data? You were using a list.
Called join to turn a bunch of things into a string? List again.
Called split to break a string into pieces? You got a list back (in list context; in scalar context, you got the number of pieces).
Heck, even the humble print function and its cousin say take a list (and an optional filehandle) as arguments; it’s why you can treat Perl as an upscale AWK and feed it scalars to output with a field separator.
You’re using lists all the time and may not even know it.
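As a rough illustration of the functions just mentioned, the following sketch runs a small list through sort, join, split, and print. The fruit names are made up, and the scalar-context split assumes a reasonably modern Perl (5.12 or later).

```perl
use strict;
use warnings;

my @sorted = sort qw(pear apple fig);    # sort takes a list and returns a list
my $joined = join ', ', @sorted;         # join takes a list and returns a string
my @pieces = split /,\s*/, $joined;      # split returns a list in list context
my $count  = split /,\s*/, $joined;      # and the number of pieces in scalar context

{
    # The output field separator goes between the items print receives;
    # the output record separator is appended after the last one.
    local $, = "\t";
    local $\ = "\n";
    print @pieces;    # the three words, separated by tabs
}
```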
map: The list transformer
The map function is devious in its simplicity: it takes two inputs, an expression or block of code and a list to run it on. For every item in the list, it aliases $_ to that item, then returns zero, one, or many items in a list based on what happens in the expression or code block. You can call it like this:
my @foo = map bar($_), @list;
Or like this:
my @foo = map { bar($_) } @list;
We’re going to ignore the first way, though, because Conway (Perl Best Practices, 2005) tells us that when you specify the first argument as an expression, it’s harder to tell it apart from the remaining arguments, especially if that expression uses a built-in function where the parentheses are optional. So always use a code block!
You should always turn to map (and not, say, a for or foreach loop) when generating a new list from an old list. For example:
my @lowercased = map { lc } @mixed_case;
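The lowercasing example returns exactly one item per input element. A minimal sketch of the other two cases mentioned earlier, many items or none, might look like this; the word list is invented for illustration.

```perl
use strict;
use warnings;

my @words = qw(apple Bob cherry);

# Many: return two items (the word and its length) for each element.
my @pairs = map { ($_, length($_)) } @words;
# @pairs is ('apple', 5, 'Bob', 3, 'cherry', 6)

# None or one: return an empty list to drop an element entirely.
my @short_upper = map { length($_) < 6 ? uc($_) : () } @words;
# @short_upper is ('APPLE', 'BOB')
```

Returning an empty list from the block is what lets map shrink a list as well as grow it.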
When paired with a lookup table, map also gives you an efficient way to tell whether a string equals any member of a list, especially if that list is static and you test against it more than once:
use Const::Fast;
const my %IS_EXIT_WORD => map { ($_ => 1) }
    qw(q quit bye exit stop done last finish aurevoir);
...
die if $IS_EXIT_WORD{$command};
Here we’re using map’s ability to return multiple items per source element to generate a constant hash, and then testing membership in that hash.
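If you don’t have Const::Fast installed, the same idea can be sketched with a plain lexical hash. This version is only an approximation of the example above: the hash is not read-only, and the @commands list is hypothetical input standing in for whatever the user typed.

```perl
use strict;
use warnings;

# Build the lookup table once; each word becomes a key with a true value.
my %is_exit_word = map { ($_ => 1) } qw(q quit bye exit);

# Hypothetical input, for illustration only.
my @commands = qw(list quit help bye);

my @seen_exit_words;
for my $command (@commands) {
    # Each membership test is a single hash lookup.
    push @seen_exit_words, $command if $is_exit_word{$command};
}
# @seen_exit_words is ('quit', 'bye')
```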
grep: The list filter
You may recognize the word “grep” from the Unix command of the same name. It’s a tool for finding lines of text inside of other text using a regular expression describing the desired result.
Perl, of course, is really good at regular expressions, but its grep function goes further and enables you to match using any expression or code block. Think of it as a partner to map; where map uses a code block to transform a list, grep uses one to filter it down. In fact, other languages typically call this function filter.
You can, of course, use regular expressions with grep, especially because a regexp match in Perl defaults to matching on the $_ variable and grep happens to provide that to its code block argument. So:
my @months_with_a = grep { /[Aa]/ } qw(
    January February March
    April   May      June
    July    August   September
    October November December
);
But grep really comes into its own when used for its general filtering capabilities; for instance, making sure that you don’t accidentally try to compare an undefined value:
say $_ > 5
    ? "$_ is bigger"
    : "$_ is equal or smaller"
    for grep { defined } @numbers;
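Here is a self-contained variation on that snippet, with made-up sample data, that collects the messages in an array instead of printing them so the effect of the filter is easy to inspect.

```perl
use strict;
use warnings;

# Sample data (an assumption for this sketch), including one undefined value.
my @numbers = (3, undef, 8, 5);

# Keep only the defined elements; the undef never reaches the comparison.
my @defined = grep { defined } @numbers;

my @report;
for (@defined) {
    push @report, $_ > 5 ? "$_ is bigger" : "$_ is equal or smaller";
}
# @report is ('3 is equal or smaller', '8 is bigger', '5 is equal or smaller')
```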
Or when executing a complicated function that returns true or false depending on its arguments:
my @results = grep { really_large_database_query($_) }
    @foo;
You might even consider chaining map and grep together. Here’s an example for getting the JPEG images out of a file list and then lowercasing the results:
my @jpeg_files = map { lc }
    grep { /\.jpe?g$/i } @files;
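To see the chain at work, here is the same pipeline run over a hypothetical file list. Read it from right to left: grep filters first, then map transforms what is left.

```perl
use strict;
use warnings;

# Hypothetical file names, for illustration only.
my @files = qw(Cat.JPG notes.txt dog.jpeg README logo.Jpg.bak);

my @jpeg_files = map { lc }
    grep { /\.jpe?g$/i } @files;
# @jpeg_files is ('cat.jpg', 'dog.jpeg')
```

Note that logo.Jpg.bak is filtered out because the pattern is anchored to the end of the name.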
“Side effects may include…” (updated)
When introducing map above, I noted that it aliased $_ for every element in the list; grep does the same. I used that term deliberately because modifications to $_ will modify the original element itself, and that is usually an error. Programmers call that a “side effect,” and side effects can lead to unexpected behavior or at least difficult-to-maintain code. Consider:
my @needs_docs = grep { s/\.pm$/.pod/ && !-e }
    @pm_files;
The intent may have been to find files ending in .pm that don’t have a corresponding .pod file, but the actual behavior is replacing the .pm suffix with .pod, then checking whether that filename exists. If it doesn’t, it’s passed through to @needs_docs; regardless, @pm_files has had its contents modified.
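The aliasing is easy to observe without touching the filesystem. In this stripped-down sketch (the file names are invented and the existence check is left out), the substitution inside grep rewrites the source array:

```perl
use strict;
use warnings;

my @pm_files = qw(Foo.pm Bar.pm README);

# s/// succeeds on the two .pm entries, so grep keeps them, but it also
# edits them in place because $_ is an alias to each array element.
my @renamed = grep { s/\.pm$/.pod/ } @pm_files;

# @renamed  is ('Foo.pod', 'Bar.pod')
# @pm_files is ('Foo.pod', 'Bar.pod', 'README')
```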
If you really do need to modify a copy of each element, assign a variable within your code block like this:
my @needs_docs = grep {
    my $file = $_;
    $file =~ s/\.pm$/.pod/;
    !-e $file
} @pm_files;
But at that point you should probably refactor your multi-line block as a separate function:
my @needs_docs = grep { file_without_docs($_) }
    @pm_files;
sub file_without_docs {
    my $file = shift;
    $file =~ s/\.pm$/.pod/;
    return !-e $file;
}
Because this case uses the substitution operator s///, on Perl 5.14 or above you could also add the /r flag to get a non-destructive substitution, which returns the modified string and leaves $_ alone:
use v5.14;
my @needs_docs = grep { !-e s/\.pm$/.pod/r }
    @pm_files;
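A quick way to convince yourself that the /r flag leaves the source untouched is to use it in a map over some invented file names and then compare the two arrays:

```perl
use v5.14;
use warnings;

my @pm_files = qw(Foo.pm Bar.pm);

# With /r, s/// returns the changed copy instead of editing $_ in place.
my @pod_names = map { s/\.pm$/.pod/r } @pm_files;

# @pod_names is ('Foo.pod', 'Bar.pod')
# @pm_files  is still ('Foo.pm', 'Bar.pm')
```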
And if you do need side effects, just use a for or foreach loop; future code maintainers (i.e., you in six months) will thank you.
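For example, if the goal really is to change the array in place, a statement-modifier for loop says so directly. This sketch trims whitespace from some made-up names:

```perl
use strict;
use warnings;

my @names = ('  alice ', 'bob   ', ' carol');

# for aliases $_ to each element too, so this edits @names itself,
# and the loop form makes that intent obvious to the reader.
s/^\s+|\s+$//g for @names;

# @names is ('alice', 'bob', 'carol')
```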
Taking you higher
map and grep are examples of higher-order functions, since they take a function (in the form of a code block) as an argument. So congratulations, you’ve just significantly leveled up your knowledge of Perl and computer science. If you’re interested in more such programming techniques, I recommend Mark Jason Dominus’ Higher-Order Perl (2005), available for free online.