Lister Platz

As pre­vi­ous­ly writ­ten, I like list pro­cess­ing. Many com­put­ing prob­lems can be bro­ken down into trans­form­ing and fil­ter­ing lists, and Perl has got the fun­da­men­tals cov­ered with func­tions like map, grep, and sort. There is so much more you might want to do, though, and CPAN has a pletho­ra of list and array pro­cess­ing modules.

However, due to the vicis­si­tudes of Perl mod­ule main­te­nance, we have a sit­u­a­tion where it’s not clear at a glance where to turn when you’ve got a list that needs pro­cess­ing. So here’s anoth­er list: the list mod­ules of CPAN. Click through to dis­cov­er what func­tions they provide.

  • We’ve got List::Util which has been released as part of Perl since ver­sion 5.7.3.
  • We’ve got List::MoreUtils which has some func­tions which are named the same as Util but behave differently.
  • We’ve got List::SomeUtils which dupli­cates MoreUtils but with few­er dependencies.
  • We’ve got List::UtilsBy which MoreUtils has also cribbed some func­tions from.
  • We’ve got List::AllUtils which attempts to con­sol­i­date Util, SomeUtils, and ListBy but has some excep­tions to called mod­ules because of the afore­men­tioned dupli­ca­tion between Util and SomeUtils.
  • We’ve got List::Util::MaybeXS which helps with pure Perl fall­backs in case your ver­sion of Util is too old to have a cer­tain function.
  • We’ve got List::MoreUtils::XS which pro­vides (some?) faster ver­sions of MoreUtils’ func­tions (but you still have to use MoreUtils).
  • And last­ly, we have Util::Any which lets you import func­tions from Util, MoreUtils, and just for good mea­sure Scalar::Util, Hash::Util, String::Util, String::CamelCase, List::Pairwise, and Data::Dumper. But it has­n’t been updat­ed since 2016, so it does­n’t nec­es­sar­i­ly export the func­tions added to those mod­ules since then.

Am I miss­ing any­thing? Probably! But these are the ones most asso­ci­at­ed with being upstream on the CPAN River, so they (or the mod­ules they con­sol­i­date) have more projects depend­ing on them.

As a Perl devel­op­er, you’re prob­a­bly aware of the lan­guage’s strengths as a text-​processing lan­guage and how many com­put­ing tasks can be bro­ken down into those types of tasks. You might not real­ize, though, that Perl is also a world-​class list pro­cess­ing lan­guage and that many prob­lems can be expressed in terms of lists and their transformations.

Chief among Perl’s tools for list pro­cess­ing are the func­tions map and grep. I can’t count how many times in my twenty-​five years as a devel­op­er I’ve run into code that could’ve been sim­pli­fied if only the author was famil­iar with these two func­tions. Once you under­stand map and grep, you’ll start see­ing lists every­where and the oppor­tu­ni­ty to make your code more suc­cinct and expres­sive at the same time.

What are lists?

Before we get into func­tions that manip­u­late lists, we need to under­stand what they are. A list is an ordered group of ele­ments, and those ele­ments can be any kind of data you can rep­re­sent in the lan­guage: num­bers, strings, objects, reg­u­lar expres­sions, ref­er­ences, etc., as long as they’re stored as scalars. You might think of a list as the thing that an array stores, and in fact Perl is fine with using an array where a list can go.

my @foo = (1, 2, 3);

Here we’re assign­ing the list of num­bers from 1 to 3 to the array @foo. The dif­fer­ence between the array and the list is that the list is a fixed col­lec­tion, while arrays and their ele­ments can be mod­i­fied by var­i­ous oper­a­tions. perlfaq4 has a great dis­cus­sion on the dif­fer­ences between the two.

Lists are everywhere, man!

Ever want­ed to sort some data? You were using a list.

join a bunch of things togeth­er into a string? List again.

split a string into pieces? You got a list back (in list con­text; in scalar con­text, you got the size of the list.)

Heck, even the hum­ble print func­tion and its cousin say take a list (and an option­al file­han­dle) as argu­ments; it’s why you can treat Perl as an upscale AWK and feed it scalars to out­put with a field sep­a­ra­tor.

You’re using lists all the time and may not even know it.

map: The list transformer

The map func­tion is devi­ous in its sim­plic­i­ty: It takes two inputs, an expres­sion or block of code, and a list to run it on. For every item in the list, it will alias $_ to it, and then return none, one, or many items in a list based on what hap­pens in the expres­sion or code block. You can call it like this:

my @foo = map bar($_), @list;

Or like this:

my @foo = map { bar($_) } @list;

We’re going to ignore the first way, though because Conway (Perl Best Practices, 2005) tells us that when you spec­i­fy the first argu­ment as an expres­sion, it’s hard­er to tell it apart from the remain­ing argu­ments, espe­cial­ly if that expres­sion uses a built-​in func­tion where the paren­the­ses are option­al. So always use a code block!

You should always turn to map (and not, say, a for or foreach loop) when gen­er­at­ing a new list from an old list. For example:

my @lowercased = map { lc } @mixed_case;

When paired with a lookup table, map is also the most effi­cient way to tell if a mem­ber of a list equals a string, espe­cial­ly if that list is static:

use Const::Fast;

const my %IS_EXIT_WORD => map { ($_ => 1) }
  qw(q quit bye exit stop done last finish aurevoir);

...

die if $IS_EXIT_WORD{$command};

Here we’re using maps abil­i­ty to return mul­ti­ple items per source ele­ment to gen­er­ate a con­stant hash, and then test­ing mem­ber­ship in that hash.

grep: The list filter

You may rec­og­nize the word grep” from the Unix com­mand of the same name. It’s a tool for find­ing lines of text inside of oth­er text using a reg­u­lar expres­sion describ­ing the desired result.

Perl, of course, is real­ly good at reg­u­lar expres­sions, but its grep func­tion goes beyond and enables you to match using any expres­sion or code block. Think of it as a part­ner to map; where map uses a code block to trans­form a list, grep uses one to fil­ter it down. In fact, oth­er lan­guages typ­i­cal­ly call this func­tion filter.

You can, of course, use reg­u­lar expres­sions with grep, espe­cial­ly because a reg­exp match in Perl defaults to match­ing on the $_ vari­able and grep hap­pens to pro­vide that to its code block argu­ment. So:

my @months_with_a = grep { /[Aa]/ } qw(
  January February March
  April   May      June
  July    August   September
  October November December
);

But grep real­ly comes into its own when used for its gen­er­al fil­ter­ing capa­bil­i­ties; for instance, mak­ing sure that you don’t acci­den­tal­ly try to com­pare an unde­fined value:

say $_ > 5
  ? "$_ is bigger"
  : "$_ is equal or smaller"
  for grep { defined } @numbers;

Or when exe­cut­ing a com­pli­cat­ed func­tion that returns true or false depend­ing on its arguments:

my @results = grep { really_large_database_query($_) }
              @foo;

You might even con­sid­er chain­ing map and grep togeth­er. Here’s an exam­ple for get­ting the JPEG images out of a file list and then low­er­cas­ing the results:

my @jpeg_files = map  { lc }
                grep { /\.jpe?g$/i } @files;

Side effects may include…” (updated)

When intro­duc­ing map above I not­ed that it aliased $_ for every ele­ment in the list. I used that term delib­er­ate­ly because mod­i­fi­ca­tions to $_ will mod­i­fy the orig­i­nal ele­ment itself, and that is usu­al­ly an error. Programmers call that a side effect,” and they can lead to unex­pect­ed behav­ior or at least difficult-​to-​maintain code. Consider:

my @needs_docs = grep { s/\.pm$/.pod/ && !-e }
                 @pm_files;

The intent may have been to find files end­ing in .pm that don’t have a cor­re­spond­ing .pod file, but the actu­al behav­ior is replac­ing the .pm suf­fix with .pod, then check­ing whether that file­name exists. If it does­n’t, it’s passed through to @needs_docs; regard­less, @pm_files has had its con­tents modified.

If you real­ly do need to mod­i­fy a copy of each ele­ment, assign a vari­able with­in your code block like this:

my @needs_docs = grep {
                   my $file = $_;
                   $file =~ s/\.pm$/.pod/;
                   !-e $file
                 } @pm_files;

But at that point you should prob­a­bly refac­tor your multi-​line block as a sep­a­rate function:

my @needs_docs = grep { file_without_docs($_) }
                 @pm_files;

sub file_without_docs {
    my $file = shift;
    $file =~ s/\.pm$/.pod/;
    return !-e $file;
}

In this case of using the sub­sti­tu­tion oper­a­tor s///, you could also do this when using Perl 5.14 or above to get non-​destructive sub­sti­tu­tion:

use v5.14;

my @needs_docs = grep { !-e s/\.pm$/.pod/r }
                 @pm_files;

And if you do need side effects, just use a for or foreach loop; future code main­tain­ers (i.e., you in six months) will thank you.

Taking you higher

map and grep are exam­ples of higher-​order func­tions, since they take a func­tion (in the form of a code block) as an argu­ment. So con­grat­u­la­tions, you just sig­nif­i­cant­ly lev­eled up your knowl­edge of Perl and com­put­er sci­ence. If you’re inter­est­ed in more such pro­gram­ming tech­niques, I rec­om­mend Mark Jason Dominus’ Higher Order Perl (2005), avail­able for free online.