plate of eggs and hash browns

This month I start­ed a new job at Alert Logic, a cyber­se­cu­ri­ty provider with Perl (among many oth­er things) at its beat­ing heart. I’ve been learn­ing a lot, and part of the process has been under­stand­ing the APIs in the code base. To that end, I’ve been writ­ing small test scripts to tease apart data struc­tures, using Perl array-​processing, list-​processing, and hash- (i.e., asso­cia­tive array)-processing func­tions.

I’ve cov­ered map, grep, and friends a cou­ple times before. Most recent­ly, I described using List::Util’s any func­tion to check if a con­di­tion is true for any item in a list. In the sim­plest case, you can use it to check to see if a giv­en val­ue is in the list at all:

use feature 'say';
use List::Util 'any';
my @colors =
  qw(red orange yellow green blue indigo violet);
say 'matched' if any { /^red$/ } @colors;

However, if you’re going to be doing this a lot with arbi­trary strings, Perl FAQ sec­tion 4 advis­es turn­ing the array into the keys of a hash and then check­ing for mem­ber­ship there. For exam­ple, here’s a sim­ple script to check if the col­ors input (either from the key­board or from files passed as argu­ments) are in the rainbow:

#!/usr/bin/env perl

use v5.22; # introduced <<>> for safe opening of arguments
use warnings;
 
my %in_colors = map {$_ => 1}
  qw(red orange yellow green blue indigo violet);

while (<<>>) {
  chomp;
  say "$_ is in the rainbow" if $in_colors{$_};
}

List::Util has a bunch of func­tions for pro­cess­ing lists of pairs that I’ve found use­ful when paw­ing through hash­es. pairgrep, for exam­ple, acts just like grep but instead assigns $a and $b to each key and val­ue passed in and returns the result­ing pairs that match. I’ve used it as a quick way to search for hash entries match­ing cer­tain val­ue conditions:

use List::Util 'pairgrep';
my %numbers = (zero => 0, one => 1, two => 2, three => 3);
my %odds = pairgrep {$b % 2} %numbers;

Sure, you could do this by invok­ing a mix of plain grep, keys, and a hash slice, but it’s nois­i­er and more repetitive:

use v5.20; # for key/value hash slice 
my %odds = %numbers{grep {$numbers{$_} % 2} keys %numbers};

pairgreps com­piled C‑based XS code can also be faster, as evi­denced by this Benchmark script that works through a hash made of the Unix words file (479,828 entries on my machine):

#!/usr/bin/env perl

use v5.20;
use warnings;
use List::Util 'pairgrep';
use Benchmark 'cmpthese';

my (%words, $count);
open my $fh, '<', '/usr/share/dict/words'
  or die "can't open words: $!";
while (<$fh>) {
  chomp;
  $words{$_} = $count++;
}
close $fh;

cmpthese(100, {
  grep => sub {
    my %odds = %words{grep {$words{$_} % 2} keys %words};
  },
  pairgrep => sub {
    my %odds = pairgrep {$b % 2} %words;
  },
} );

Benchmark out­put:

           Rate     grep pairgrep
grep     1.47/s       --     -20%
pairgrep 1.84/s      25%       --

In gen­er­al, I urge you to work through the Perl doc­u­men­ta­tions tuto­ri­als on ref­er­ences, lists of lists, the data struc­tures cook­book, and the FAQs on array and hash manip­u­la­tion. Then dip into the var­i­ous list-​processing mod­ules (espe­cial­ly the includ­ed List::Util and CPAN’s List::SomeUtils) for ready-​made func­tions for com­mon oper­a­tions. You’ll find a wealth of tech­niques for cre­at­ing, man­ag­ing, and pro­cess­ing the data struc­tures that your pro­grams need.