Better Perl: Using map and grep

As a Perl devel­op­er, you’re prob­a­bly aware of the lan­guage’s strengths as a text-​processing lan­guage and how many com­put­ing tasks can be bro­ken down into those types of tasks. You might not real­ize, though, that Perl is also a world-​class list pro­cess­ing lan­guage and that many prob­lems can be expressed in terms of lists and their transformations.

Chief among Perl’s tools for list pro­cess­ing are the func­tions map and grep. I can’t count how many times in my twenty-​five years as a devel­op­er I’ve run into code that could’ve been sim­pli­fied if only the author was famil­iar with these two func­tions. Once you under­stand map and grep, you’ll start see­ing lists every­where and the oppor­tu­ni­ty to make your code more suc­cinct and expres­sive at the same time.

What are lists?

Before we get into func­tions that manip­u­late lists, we need to under­stand what they are. A list is an ordered group of ele­ments, and those ele­ments can be any kind of data you can rep­re­sent in the lan­guage: num­bers, strings, objects, reg­u­lar expres­sions, ref­er­ences, etc., as long as they’re stored as scalars. You might think of a list as the thing that an array stores, and in fact Perl is fine with using an array where a list can go.

my @foo = (1, 2, 3);

Here we’re assign­ing the list of num­bers from 1 to 3 to the array @foo. The dif­fer­ence between the array and the list is that the list is a fixed col­lec­tion, while arrays and their ele­ments can be mod­i­fied by var­i­ous oper­a­tions. perlfaq4 has a great dis­cus­sion on the dif­fer­ences between the two.

Lists are everywhere, man!

Ever want­ed to sort some data? You were using a list.

join a bunch of things togeth­er into a string? List again.

split a string into pieces? You got a list back (in list con­text; in scalar con­text, you got the size of the list.)

Heck, even the hum­ble print func­tion and its cousin say take a list (and an option­al file­han­dle) as argu­ments; it’s why you can treat Perl as an upscale AWK and feed it scalars to out­put with a field sep­a­ra­tor.

You’re using lists all the time and may not even know it.

map: The list transformer

The map func­tion is devi­ous in its sim­plic­i­ty: It takes two inputs, an expres­sion or block of code, and a list to run it on. For every item in the list, it will alias $_ to it, and then return none, one, or many items in a list based on what hap­pens in the expres­sion or code block. You can call it like this:

my @foo = map bar($_), @list;

Or like this:

my @foo = map { bar($_) } @list;

We’re going to ignore the first way, though because Conway (Perl Best Practices, 2005) tells us that when you spec­i­fy the first argu­ment as an expres­sion, it’s hard­er to tell it apart from the remain­ing argu­ments, espe­cial­ly if that expres­sion uses a built-​in func­tion where the paren­the­ses are option­al. So always use a code block!

You should always turn to map (and not, say, a for or foreach loop) when gen­er­at­ing a new list from an old list. For example:

my @lowercased = map { lc } @mixed_case;

When paired with a lookup table, map is also the most effi­cient way to tell if a mem­ber of a list equals a string, espe­cial­ly if that list is static:

use Const::Fast;

const my %IS_EXIT_WORD => map { ($_ => 1) }
  qw(q quit bye exit stop done last finish aurevoir);

...

die if $IS_EXIT_WORD{$command};

Here we’re using maps abil­i­ty to return mul­ti­ple items per source ele­ment to gen­er­ate a con­stant hash, and then test­ing mem­ber­ship in that hash.

grep: The list filter

You may rec­og­nize the word grep” from the Unix com­mand of the same name. It’s a tool for find­ing lines of text inside of oth­er text using a reg­u­lar expres­sion describ­ing the desired result.

Perl, of course, is real­ly good at reg­u­lar expres­sions, but its grep func­tion goes beyond and enables you to match using any expres­sion or code block. Think of it as a part­ner to map; where map uses a code block to trans­form a list, grep uses one to fil­ter it down. In fact, oth­er lan­guages typ­i­cal­ly call this func­tion filter.

You can, of course, use reg­u­lar expres­sions with grep, espe­cial­ly because a reg­exp match in Perl defaults to match­ing on the $_ vari­able and grep hap­pens to pro­vide that to its code block argu­ment. So:

my @months_with_a = grep { /[Aa]/ } qw(
  January February March
  April   May      June
  July    August   September
  October November December
);

But grep real­ly comes into its own when used for its gen­er­al fil­ter­ing capa­bil­i­ties; for instance, mak­ing sure that you don’t acci­den­tal­ly try to com­pare an unde­fined value:

say $_ > 5
  ? "$_ is bigger"
  : "$_ is equal or smaller"
  for grep { defined } @numbers;

Or when exe­cut­ing a com­pli­cat­ed func­tion that returns true or false depend­ing on its arguments:

my @results = grep { really_large_database_query($_) }
              @foo;

You might even con­sid­er chain­ing map and grep togeth­er. Here’s an exam­ple for get­ting the JPEG images out of a file list and then low­er­cas­ing the results:

my @jpeg_files = map  { lc }
                grep { /\.jpe?g$/i } @files;

Side effects may include…” (updated)

When intro­duc­ing map above I not­ed that it aliased $_ for every ele­ment in the list. I used that term delib­er­ate­ly because mod­i­fi­ca­tions to $_ will mod­i­fy the orig­i­nal ele­ment itself, and that is usu­al­ly an error. Programmers call that a side effect,” and they can lead to unex­pect­ed behav­ior or at least difficult-​to-​maintain code. Consider:

my @needs_docs = grep { s/\.pm$/.pod/ && !-e }
                 @pm_files;

The intent may have been to find files end­ing in .pm that don’t have a cor­re­spond­ing .pod file, but the actu­al behav­ior is replac­ing the .pm suf­fix with .pod, then check­ing whether that file­name exists. If it does­n’t, it’s passed through to @needs_docs; regard­less, @pm_files has had its con­tents modified.

If you real­ly do need to mod­i­fy a copy of each ele­ment, assign a vari­able with­in your code block like this:

my @needs_docs = grep {
                   my $file = $_;
                   $file =~ s/\.pm$/.pod/;
                   !-e $file
                 } @pm_files;

But at that point you should prob­a­bly refac­tor your multi-​line block as a sep­a­rate function:

my @needs_docs = grep { file_without_docs($_) }
                 @pm_files;

sub file_without_docs {
    my $file = shift;
    $file =~ s/\.pm$/.pod/;
    return !-e $file;
}

In this case of using the sub­sti­tu­tion oper­a­tor s///, you could also do this when using Perl 5.14 or above to get non-​destructive sub­sti­tu­tion:

use v5.14;

my @needs_docs = grep { !-e s/\.pm$/.pod/r }
                 @pm_files;

And if you do need side effects, just use a for or foreach loop; future code main­tain­ers (i.e., you in six months) will thank you.

Taking you higher

map and grep are exam­ples of higher-​order func­tions, since they take a func­tion (in the form of a code block) as an argu­ment. So con­grat­u­la­tions, you just sig­nif­i­cant­ly lev­eled up your knowl­edge of Perl and com­put­er sci­ence. If you’re inter­est­ed in more such pro­gram­ming tech­niques, I rec­om­mend Mark Jason Dominus’ Higher Order Perl (2005), avail­able for free online.


Discover more from The Phoenix Trap

Subscribe to get the latest posts sent to your email.

Mark Gardner Avatar

Hi, I’m Mark.

Hi, I’m Mark Gard­ner, and this is my personal blog. I show software developers how to level up by building production-ready things that work. Clear code, real projects, lessons learned.

Comments

2 responses to “Better Perl: Using map and grep”

  1. James Smith Avatar

    Using mod­ern perl and the r” mod­i­fied for s/​/​/​you can rewrite the last exam­ple as:

    my @needs_docs = grep { !-e s{[.]pm}{.pod}r } @pm_files;

    The /​r mod­i­fi­er is one of the best things to come into Perl in recent years…

    1. Mark Gardner Avatar

      See, this is what Neil Bowers meant when he recent­ly wrote, People aren’t sure which fea­tures came in which ver­sion of perl, or whether they have a guard.” I had com­plete­ly for­got­ten this was a thing and which ver­sion I’d need to spec­i­fy. I’ll work on an update.

To respond on your own website, enter the URL of your response which should contain a link to this post's permalink URL. Your response will then appear (possibly after moderation) on this page. Want to update or remove your response? Update or delete your post and re-enter your post's URL again. (Find out more about Webmentions.)