Friday, December 17, 2021, marked the thirty-fourth birthday of the Perl programming language, and coincidentally this year saw the release of version 5.34. There are plenty of Perl developers out there who haven’t kept up with recent (and not-so-recent) improvements to the language and its ecosystem, so I thought I might list a batch. (You may have seen some of these before in May’s post “Perl can do that now!”)

The feature pragma

Perl v5.10 was released in December 2007, and with it came feature, a way of enabling new syn­tax with­out break­ing back­ward com­pat­i­bil­i­ty. You can enable indi­vid­ual fea­tures by name (e.g., use feature qw(say fc); for the say and fc key­words), or by using a fea­ture bun­dle based on the Perl ver­sion that intro­duced them. For exam­ple, the following:

use feature ':5.34';

…gives you the equiv­a­lent of:

use feature qw(bareword_filehandles bitwise current_sub evalbytes fc indirect multidimensional postderef_qq say state switch unicode_eval unicode_strings);

Boy, that’s a mouth­ful. Feature bun­dles are good. The cor­re­spond­ing bun­dle also gets implic­it­ly loaded if you spec­i­fy a min­i­mum required Perl ver­sion, e.g., with use v5.32;. If you use v5.12; or high­er, strict mode is enabled for free. So just say:

use v5.34;

And last­ly, one-​liners can use the -E switch instead of -e to enable all fea­tures for that ver­sion of Perl, so you can say the fol­low­ing on the com­mand line:

perl -E 'say "Hello world!"'

Instead of:

perl -e 'print "Hello world!\n"'

Which is great when you’re try­ing to save some typing.

The experimental pragma

Sometimes new Perl fea­tures need to be dri­ven a cou­ple of releas­es around the block before their behav­ior set­tles. Those exper­i­ments are doc­u­ment­ed in the per­l­ex­per­i­ment page, and usu­al­ly, you need both a use feature (see above) and no warnings state­ment to safe­ly enable them. Or you can sim­ply pass a list to use experimental of the fea­tures you want, e.g.:

use experimental qw(isa postderef signatures);
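As a rough sketch of the difference, here are the long-hand and short-hand ways to enable signatures side by side. The add and multiply subs are hypothetical names used only for illustration:

```perl
use strict;
use warnings;

{
    # Long-hand: enable the feature, then silence its experimental warning
    use feature 'signatures';
    no warnings 'experimental::signatures';

    sub add ( $x, $y ) { return $x + $y }
}

{
    # Short-hand: one line does both
    use experimental 'signatures';

    sub multiply ( $x, $y ) { return $x * $y }
}

print add( 2, 3 ), "\n";
print multiply( 2, 3 ), "\n";
```

Both pragmas are lexically scoped, which is why each block above has to enable signatures for itself.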

Ever-​expanding warnings categories

March 2000 saw the release of Perl 5.6, and with it, the expansion of the -w command-line switch to a system of fine-grained controls for warning against “dubious constructs” that can be turned on and off depending on the lexical scope. What started as 26 main and 20 subcategories has expanded into 31 main and 43 subcategories, including warnings for the aforementioned experimental features.

As the relevant Perl::Critic policy says, “Using warnings, and paying attention to what they say, is probably the single most effective way to improve the quality of your code.” If you must violate warnings (perhaps because you’re rehabilitating some legacy code), you can isolate such violations to a small scope and individual categories. Check out the strictures module on CPAN if you’d like to go further and make a safe subset of these categories fatal during development.
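Isolating a violation might look something like the following minimal sketch, where the %count hash and its keys are made up for the example. Only one warning category is switched off, and only inside the loop body:

```perl
use strict;
use warnings;

my %count = ( apples => 3 );    # hypothetical data; 'pears' is deliberately missing
my $total = 0;

for my $fruit (qw(apples pears)) {
    # Only this block tolerates undef values, and only for this one category
    no warnings 'uninitialized';
    $total += $count{$fruit};
}

print "Total: $total\n";
```

Outside the loop, adding an undefined value would warn as usual.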

Document other recently-​introduced syntax with Syntax::Construct

Not every new bit of Perl syntax is enabled with a feature guard. For the rest, there’s E. Choroba’s Syntax::Construct module on CPAN. Rather than having to remember which version of Perl introduced what, Syntax::Construct lets you declare only what you use and provides a helpful error message if someone tries to run your code on an older unsupported version. Between it and the feature pragma, you can prevent many head-scratching moments and give your users a chance to either upgrade or work around the problem.

Make built-​in functions throw exceptions with autodie

Many of Perl’s built-​in func­tions only return false on fail­ure, requir­ing the devel­op­er to check every time whether a file can be opened or a system com­mand exe­cut­ed. The lex­i­cal autodie prag­ma replaces them with ver­sions that raise an excep­tion with an object that can be inter­ro­gat­ed for fur­ther details. No mat­ter how many func­tions or meth­ods deep a prob­lem occurs, you can choose to catch it and respond appro­pri­ate­ly. This leads us to…
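Here is a minimal sketch of interrogating such an exception, using a plain block eval so it runs on older Perls too. The path is hypothetical and assumed not to exist:

```perl
use strict;
use warnings;

my $path = '/no/such/directory/file.txt';    # hypothetical, assumed missing

my $error;
eval {
    use autodie;    # lexically scoped: only this block gets the throwing open
    open my $fh, '<', $path;
    close $fh;
    1;
} or $error = $@;

if ( ref $error and $error->isa('autodie::exception') ) {
    # matches() tests which built-in failed; line() and errno() give details
    printf "open failed on line %d: %s\n", $error->line, $error->errno
      if $error->matches('open');
}
```

The exact error text comes from the operating system, so it will vary between platforms.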

try/​catch exception handling and Feature::Compat::Try

This year’s Perl v5.34 release introduced experimental try/catch syntax for exception handling that should look more familiar to users of other languages while handling the issues surrounding using block eval and testing of the special $@ variable. If you need to remain compatible with older versions of Perl (back to v5.14), just use the Feature::Compat::Try module from CPAN to automatically select either v5.34’s native try/catch or a subset of the functionality provided by Syntax::Keyword::Try.
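A small sketch of the native syntax follows, assuming Perl v5.34 or later. The parse_port sub and the fallback value are hypothetical; swapping the two feature lines for use Feature::Compat::Try should give the same behavior on older Perls:

```perl
use v5.34;
use warnings;
use feature 'try';
no warnings 'experimental::try';

# Hypothetical helper that throws on bad input
sub parse_port {
    my ($text) = @_;
    die "not a port number: $text\n" unless $text =~ /^\d+$/;
    return 0 + $text;
}

my $port;
try {
    $port = parse_port('eighty');
}
catch ($e) {
    chomp $e;
    say "Falling back to default ($e)";
    $port = 8080;
}

say "Using port $port";
```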

Pluggable keywords

The abovementioned Syntax::Keyword::Try was made possible by the introduction of a pluggable keyword mechanism in 2010’s Perl v5.12. So was the Future::AsyncAwait asynchronous programming library and the Object::Pad testbed for new object-oriented Perl syntax. If you’re handy with C and Perl’s XS glue language, check out Paul “LeoNerd” Evans’ XS::Parse::Keyword module to get a leg up on developing your own syntax module.

Define packages with versions and blocks

Perl v5.12 also helped reduce clut­ter by enabling a package name­space dec­la­ra­tion to also include a ver­sion num­ber, instead of requir­ing a sep­a­rate our $VERSION = ...; v5.14 fur­ther refined packages to be spec­i­fied in code blocks, so a name­space dec­la­ra­tion can be the same as a lex­i­cal scope. Putting the two togeth­er gives you:

package Local::NewHotness v1.2.3 {
    ...
}

Instead of:

{
    package Local::OldAndBusted;
    use version 0.77; our $VERSION = version->declare("v1.2.3");
    ...
}

I know which I’d rather do. (Though you may want to also use Syntax::Construct qw(package-version package-block); to help along with old­er instal­la­tions as described above.)

The // defined-​or operator

This is an easy win from Perl v5.10:

defined $foo ? $foo : $bar  # replace this
$foo // $bar                # with this

And:

$foo = $bar unless defined $foo  # replace this
$foo //= $bar                    # with this

Perfect for assign­ing defaults to variables.

state variables only initialize once

Speaking of vari­ables, ever want one to keep its old val­ue the next time a scope is entered, like in a sub? Declare it with state instead of my. Before Perl v5.10, you need­ed to use a clo­sure instead.
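For instance, a simple counter might be sketched like this (next_id is a hypothetical name):

```perl
use strict;
use warnings;
use feature 'state';

sub next_id {
    state $id = 0;    # initialized only the first time the sub runs
    return ++$id;
}

my @ids = map { next_id() } 1 .. 3;
print "@ids\n";    # prints "1 2 3"
```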

Save some typing with say

Perl v5.10’s bumper crop of enhance­ments also includ­ed the say func­tion, which han­dles the com­mon use case of printing a string or list of strings with a new­line. It’s less noise in your code and saves you four char­ac­ters. What’s not to love?

Note unimplemented code with ...

The ... ellipsis statement (colloquially “yada-yada”) gives you an easy placeholder for yet-to-be-implemented code. It parses OK but will throw an exception if executed. Hopefully, your test coverage (or at least static analysis) will catch it before your users do.
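A quick sketch of what that looks like in practice, with a hypothetical export_report sub that hasn’t been written yet:

```perl
use strict;
use warnings;

sub export_report {
    ...    # compiles fine, but dies with "Unimplemented" when called
}

my $ok      = eval { export_report(); 1 };
my $message = $ok ? "no error\n" : $@;
print "Caught: $message";
```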

Loop and enumerate arrays with each, keys, and values

The each, keys, and values func­tions have always been able to oper­ate on hash­es. Perl v5.12 and above make them work on arrays, too. The lat­ter two are main­ly for con­sis­ten­cy, but you can use each to iter­ate over an array’s indices and val­ues at the same time:

while (my ($index, $value) = each @array) {
    ...
}

This can be prob­lem­at­ic in non-​trivial loops, but I’ve found it help­ful in quick scripts and one-liners.

delete local hash (and array) entries

Ever needed to delete an entry from a hash (e.g., an environment variable from %ENV or a signal handler from %SIG) just inside a block? Perl v5.12 lets you do that with delete local.
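Here’s a minimal sketch, assuming a DEBUG environment variable that I set myself purely for the demonstration:

```perl
use strict;
use warnings;

$ENV{DEBUG} = 1;    # hypothetical variable, set only for this example

my $inside;
{
    delete local $ENV{DEBUG};    # gone, but only until the block ends
    $inside = exists $ENV{DEBUG} ? 'set' : 'unset';
}
my $outside = exists $ENV{DEBUG} ? 'set' : 'unset';

print "inside: $inside, outside: $outside\n";
```

Any child process started inside the block would not see DEBUG, while code after the block gets the original value back.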

Paired hash slices

Jumping for­ward to 2014’s Perl v5.20, the new %foo{'bar', 'baz'} syn­tax enables you to slice a sub­set of a hash with its keys and val­ues intact. Very help­ful for cherry-​picking or aggre­gat­ing many hash­es into one. For example:

my %args = (
    verbose => 1,
    name    => 'Mark',
    extra   => 'pizza',
);
# don't frob the pizza
$my_object->frob( %args{ qw(verbose name) } );

Paired array slices

Not to be left out, you can also slice arrays in the same way, in this case return­ing indices and values:

my @letters = 'a' .. 'z';
my @subset_kv = %letters[16, 5, 18, 12];
# @subset_kv is now (16, 'p', 5, 'e', 18, 'r', 12, 'l')

More readable dereferencing

Perl v5.20 intro­duced and v5.24 de-​experimentalized a more read­able post­fix deref­er­enc­ing syn­tax for nav­i­gat­ing nest­ed data struc­tures. Instead of using {braces} or smoosh­ing sig­ils to the left of iden­ti­fiers, you can use a post­fixed sigil-and-star:

push @$array_ref,    1, 2, 3;  # noisy
push @{$array_ref},  1, 2, 3;  # a little easier
push $array_ref->@*, 1, 2, 3;  # read from left to right

So much of web devel­op­ment is sling­ing around and pick­ing apart com­pli­cat­ed data struc­tures via JSON, so I wel­come any­thing like this to reduce the cog­ni­tive load.

when as a statement modifier

Starting in Perl v5.12, you can use the experimental switch feature’s when keyword as a postfix modifier. For example:

for ($foo) {
    $a =  1 when /^abc/;
    $a = 42 when /^dna/;
    ...
}

But I don’t recommend when, given, or given’s smartmatch operations, as they were retconned as experiments in 2013’s Perl v5.18 and have remained so due to their tricky behavior. I wrote about some alternatives using stable syntax back in February.

Simple class inheritance with use parent

Sometimes in old­er object-​oriented Perl code, you’ll see use base as a prag­ma to estab­lish inher­i­tance from anoth­er class. Older still is the direct manip­u­la­tion of the package’s spe­cial @ISA array. In most cas­es, both should be avoid­ed in favor of use parent, which was added to core in Perl v5.10.1.
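As a rough illustration, here are two hypothetical classes in one file. The -norequire flag is only needed because the parent isn’t in a separate .pm file:

```perl
use strict;
use warnings;

package Local::Animal {
    sub new   { my ( $class, %args ) = @_; return bless {%args}, $class }
    sub speak { my $self = shift; return ref($self) . ' makes a sound' }
}

package Local::Dog {
    # -norequire because the parent class lives in this same file
    use parent -norequire, 'Local::Animal';

    sub speak { my $self = shift; return $self->SUPER::speak() . ': Woof!' }
}

my $sound = Local::Dog->new->speak;
print "$sound\n";
```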

Mind you, if you’re following the Perl object-oriented tutorial’s advice and have selected an OO system from CPAN, use its subclassing mechanism if it has one. Moose, Moo, and Class::Accessor’s “antlers” mode all provide an extends function; Object::Pad provides an :isa attribute on its class keyword.

Test for class membership with the isa operator

As an alter­na­tive to the isa() method pro­vid­ed to all Perl objects, Perl v5.32 intro­duced the exper­i­men­tal isa infix oper­a­tor:

$my_object->isa('Local::MyClass')
# or
$my_object isa Local::MyClass

The latter can take either a bareword class name or string expression, but more importantly, it’s safer as it also returns false if the left argument is undefined or isn’t a blessed object reference. The older isa() method will throw an exception in the former case and might return true if called as a class method when $my_object is actually a string of a class name that’s the same as or inherits from isa()’s argument.

Lexical subroutines

Introduced in Perl v5.18 and de-​experimentalized in 2017’s Perl v5.26, you can now pre­cede sub dec­la­ra­tions with my, state, or our. One use of the first two is tru­ly pri­vate func­tions and meth­ods, as described in this 2018 Dave Jacoby blog and as part of Neil Bowers’ 2014 sur­vey of pri­vate func­tion techniques.
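A minimal sketch of a truly private helper, assuming Perl v5.26 or later. The package and sub names are hypothetical:

```perl
use v5.26;    # lexical subs are stable from this version on
use warnings;

package Local::Temperature {
    # Visible only inside this block; never entered into the symbol table
    my sub to_celsius { return ( $_[0] - 32 ) * 5 / 9 }

    sub describe {
        my ( $class, $fahrenheit ) = @_;
        return sprintf '%.1f C', to_celsius($fahrenheit);
    }
}

my $boiling    = Local::Temperature->describe(212);
my $visibility = Local::Temperature->can('to_celsius') ? 'public' : 'private';
say "$boiling ($visibility helper)";
```

Because to_celsius isn’t a package sub, outside code can’t reach it through method resolution or a fully-qualified name.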

Subroutine signatures

I’ve writ­ten and pre­sent­ed exten­sive­ly about sig­na­tures and alter­na­tives over the past year, so I won’t repeat that here. I’ll just add that the Perl 5 Porters devel­op­ment mail­ing list has been mak­ing a con­cert­ed effort over the past month to hash out the remain­ing issues towards ren­der­ing this fea­ture non-​experimental. The pop­u­lar Mojolicious real-​time web frame­work also pro­vides a short­cut for enabling sig­na­tures and uses them exten­sive­ly in examples.

Indented here-​documents with <<~

Perl has had shell-style “here-document” syntax for embedding multi-line strings of quoted text for a long time. Starting with Perl v5.26, you can precede the delimiting string with a ~ character and Perl will both allow the ending delimiter to be indented as well as strip indentation from the embedded text. This allows for much more readable embedded code such as runs of HTML and SQL. For example:

if ($do_query) {
    my $rows_deleted = $dbh->do(<<~'END_SQL', undef, 42);
      DELETE FROM table
      WHERE status = ?
      END_SQL
    say "$rows_deleted rows were deleted."; 
}

More readable chained comparisons

When I learned math in school, my teach­ers and text­books would often describe mul­ti­ple com­par­isons and inequal­i­ties as a sin­gle expres­sion. Unfortunately, when it came time to learn pro­gram­ming every com­put­er lan­guage I saw required them to be bro­ken up with a series of and (or &&) oper­a­tors. With Perl v5.32, this is no more:

if ( $x < $y && $y <= $z ) { ... }  # old way
if ( $x < $y <= $z )       { ... }  # new way

It’s more con­cise, less noisy, and more like what reg­u­lar math looks like.

Self-​documenting named regular expression captures

Perl’s expres­sive reg­u­lar expres­sion match­ing and text-​processing prowess are leg­endary, although overuse and poor use of read­abil­i­ty enhance­ments often turn peo­ple away from them (and Perl in gen­er­al). We often use reg­ex­ps for extract­ing data from a matched pat­tern. For example:

if ( /Time: (..):(..):(..)/ ) {  # parse out values
    say "$1 hours, $2 minutes, $3 seconds";
}

Named cap­ture groups, intro­duced in Perl v5.10, make both the pat­tern more obvi­ous and retrieval of its data less cryptic:

if ( /Time: (?<hours>..):(?<minutes>..):(?<seconds>..)/ ) {
    say "$+{hours} hours, $+{minutes} minutes, $+{seconds} seconds";
}

More readable regexp character classes

The /x reg­u­lar expres­sion mod­i­fi­er already enables bet­ter read­abil­i­ty by telling the pars­er to ignore most white­space, allow­ing you to break up com­pli­cat­ed pat­terns into spaced-​out groups and mul­ti­ple lines with code com­ments. With Perl v5.26 you can spec­i­fy /xx to also ignore spaces and tabs inside [brack­et­ed] char­ac­ter class­es, turn­ing this:

/[d-eg-i3-7]/
/[!@"#$%^&*()=?<>']/

…into this:

/ [d-e g-i 3-7]/xx
/[ ! @ " # $ % ^ & * () = ? <> ' ]/xx

Set default regexp flags with the re pragma

Beginning with Perl v5.14, writ­ing use re '/xms'; (or any com­bi­na­tion of reg­u­lar expres­sion mod­i­fi­er flags) will turn on those flags until the end of that lex­i­cal scope, sav­ing you the trou­ble of remem­ber­ing them every time.
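Something like the following sketch shows the effect; the sample text is made up. Note that the pattern itself carries no flags:

```perl
use strict;
use warnings;
use re '/xms';    # applies to every regexp from here to the end of this scope

my $text = "first line\nTime: 12:34:56\n";    # hypothetical input

# /x ignores the spaces in the pattern; /m lets ^ match after the newline
my ( $h, $m, $s ) = $text =~ / ^ Time: \s+ (\d+) : (\d+) : (\d+) /;

print "$h-$m-$s\n";
```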

Non-​destructive substitution with s///r and tr///r

The s/// sub­sti­tu­tion and tr/// translit­er­a­tion oper­a­tors typ­i­cal­ly change their input direct­ly, often in con­junc­tion with the =~ bind­ing oper­a­tor:

s/foo/bar/;  # changes the first foo to bar in $_
$baz =~ s/foo/bar/;  # the same but in $baz

But what if you want to leave the orig­i­nal untouched, such as when pro­cess­ing an array of strings with a map? With Perl v5.14 and above, add the /r flag, which makes the sub­sti­tu­tion on a copy and returns the result:

my @changed = map { s/foo/bar/r } @original;

Unicode case-​folding with fc for better string comparisons

Unicode and char­ac­ter encod­ing in gen­er­al are com­pli­cat­ed beasts. Perl has han­dled Unicode since v5.6 and has kept pace with fix­es and sup­port for updat­ed stan­dards in the inter­ven­ing decades. If you need to test if two strings are equal regard­less of case, use the fc func­tion intro­duced in Perl v5.16.
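To see why lowercasing alone isn’t enough, consider this small sketch comparing two spellings of the German word for “street.” I’ve written the sharp s as an escape so the example doesn’t depend on the source file’s encoding:

```perl
use strict;
use warnings;
use feature qw(fc unicode_strings);

my $upper = 'STRASSE';
my $lower = "stra\N{U+DF}e";    # spelled with LATIN SMALL LETTER SHARP S

my $lc_equal = lc($upper) eq lc($lower) ? 'yes' : 'no';
my $fc_equal = fc($upper) eq fc($lower) ? 'yes' : 'no';

print "lc: $lc_equal, fc: $fc_equal\n";
```

Case-folding expands the sharp s to “ss,” so only the fc comparison treats the two as equal.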

Safer processing of file arguments with <<>>

The <> null filehandle or “diamond operator” is often used in while loops to process input per line coming either from standard input (e.g., piped from another program) or from a list of files on the command line. Unfortunately, it uses a form of Perl’s open function that interprets special characters such as pipes (|) that would allow it to insecurely run external commands. Using the <<>> “double diamond” operator introduced in Perl v5.22 forces open to treat all command-line arguments as file names only. For older Perls, the perlop documentation recommends the ARGV::readonly CPAN module.
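A typical loop might be sketched as follows, assuming Perl v5.22 or later. So that the example is self-contained, a temporary file stands in for a name given on the command line:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A throwaway file stands in for a real command-line argument
my ( $out, $filename ) = tempfile( UNLINK => 1 );
print {$out} "alpha\nbeta\n";
close $out or die "close failed: $!";

@ARGV = ($filename);

my @lines;
while ( my $line = <<>> ) {    # each argument is opened as a plain file name
    chomp $line;
    push @lines, $line;
}

print scalar(@lines), " lines read\n";
```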

Safer loading of Perl libraries and modules from @INC

Perl v5.26 removed the ability for all programs to load modules by default from the current directory, closing a security vulnerability originally identified and fixed as CVE-2016-1238 in previous versions’ included scripts. If your code relied on this unsafe behavior, the v5.26 release notes include steps on how to adapt.

HTTP::Tiny simple HTTP/1.1 client included

To boot­strap access to CPAN on the web in the pos­si­ble absence of exter­nal tools like curl or wget, Perl v5.14 began includ­ing the HTTP::Tiny mod­ule. You can also use it in your pro­grams if you need a sim­ple web client with no dependencies.

Test2: The next generation of Perl testing frameworks

Forked and refac­tored from the ven­er­a­ble Test::Builder (the basis for the Test::More library that many are famil­iar with), Test2 was includ­ed in the core mod­ule library begin­ning with Perl v5.26. I’ve exper­i­ment­ed recent­ly with using the Test2::Suite CPAN library instead of Test::More and it looks pret­ty good. I’m also intrigued by Test2::Harness’ sup­port for thread­ing, fork­ing, and pre­load­ing mod­ules to reduce test run times.

Task::Kensho: Where to start for recommended Perl modules

This last item may not be includ­ed when you install Perl, but it’s where I turn for a col­lec­tion of well-​regarded CPAN mod­ules for accom­plish­ing a wide vari­ety of com­mon tasks span­ning from asyn­chro­nous pro­gram­ming to XML. Use it as a start­ing point or inter­ac­tive­ly select the mix of libraries appro­pri­ate to your project.


And there you have it: a selec­tion of 34 fea­tures, enhance­ments, and improve­ments for the first 34 years of Perl. What’s your favorite? Did I miss any­thing? Let me know in the comments.

What about My::Favorite::Module?

I men­tioned at the Ephemeral Miniconf last month that as soon as I write about one Perl mod­ule (or five), some­one inevitably brings up anoth­er (or sev­en) I’ve missed. And of course, it hap­pened again last week: no soon­er had I writ­ten in pass­ing that I was using Exception::Class than the denizens of the Libera Chat IRC #perl chan­nel insist­ed I should use Throwable instead for defin­ing my excep­tions. (I’ve already blogged about var­i­ous ways of catch­ing excep­tions.)

Why Throwable? Aside from Exception::Class’s author recommending it over his own work due to “a nicer, more modern interface,” Throwable is a Moo role, so it’s composable into classes along with other roles instead of mucking about with multiple inheritance. This means that if your exceptions need to do something reusable in your application like logging, you can also consume a role that does that and not have so much duplicate code. (No, I’m not going to pick a favorite logging module; I’ll probably get that wrong too.)

However, since Throwable is a role instead of a class, I would have to define sev­er­al addi­tion­al packages in my tiny mod­uli­no script from last week, one for each excep­tion class I want. The beau­ty of Exception::Class is its sim­ple declar­a­tive nature: just use it and pass a list of desired class names along with options for attrib­ut­es and what­not. What’s need­ed for sim­ple use cas­es like mine is a declar­a­tive syn­tax for defin­ing sev­er­al excep­tion class­es with­out the noise of mul­ti­ple packages.

Enter Throwable::SugarFactory, a mod­ule that enables you to do just that by adding an exception func­tion for declar­ing excep­tion class­es. (There’s also the similarly-​named Throwable::Factory; see the above dis­cus­sion about nev­er being able to cov­er everybody’s favorites.) The exception func­tion takes three argu­ments: the name of the desired excep­tion class as a string, a descrip­tion, and an option­al list of instruc­tions Moo uses to build the class. It might look some­thing like this:

package Local::My::Exceptions;
use Throwable::SugarFactory;

exception GenericError  => 'something bad happened';
exception DetailedError => 'something specific happened' =>
  ( has => [ message => ( is => 'ro' ) ] );

1;

Throwable::SugarFactory takes care of cre­at­ing con­struc­tor func­tions in Perl-​style snake_case as well as func­tions for detect­ing what kind of excep­tion is being caught, so you can use your new excep­tion library like this:

#!/usr/bin/env perl

use experimental qw(isa);
use Feature::Compat::Try;
use JSON::MaybeXS;
use Local::My::Exceptions;

try {
    die generic_error();
}
catch ($e) {
    warn 'whoops!';
}

try {
    die detailed_error( message => 'you got me' );
}
catch ($e) {
    die encode_json( $e->to_hash )
      if $e isa DetailedError and defined $e->message;
    $e->throw if $e->does('Throwable');
    die $e;
}

The above also demon­strates a cou­ple of oth­er Throwable::SugarFactory fea­tures. First, you get a to_hash method that returns a hash ref­er­ence of all excep­tion data, suit­able for seri­al­iz­ing to JSON. Second, you get all of Throwable’s meth­ods, includ­ing throw for re-​throwing exceptions. 

So where does this leave last week’s FOAAS.com modulino client demonstration of object mocking tests? With a little bit of rewriting to define and then use our sweeter exception library, it looks like this. You can review last week’s post for a description of the rest of its workings.

#!/usr/bin/env perl

package Local::CallFOAAS::Exceptions;
use Throwable::SugarFactory;

BEGIN {
    exception NoMethodError =>
      'no matching WebService::FOAAS method' =>
      ( has => [ method => ( is => 'ro' ) ] );
    exception ServiceError =>
      'error from WebService::FOAAS' =>
      ( has => [ message => ( is => 'ro' ) ] );
}

package Local::CallFOAAS;  # this is a modulino
use Test2::V0;             # enables strict, warnings, utf8

# declare all the new stuff we're using
use feature qw(say state);
use experimental qw(isa postderef signatures);
use Feature::Compat::Try;
use Syntax::Construct qw(non-destructive-substitution);

use WebService::FOAAS ();
use Package::Stash;
BEGIN { Local::CallFOAAS::Exceptions->import() }

my $foaas = Package::Stash->new('WebService::FOAAS');

my $run_as =
    !!$ENV{CPANTEST}       ? 'test'
  : !defined scalar caller ? 'run'
  :                          undef;
__PACKAGE__->$run_as(@ARGV) if defined $run_as;

sub run ( $class, @args ) {
    try { say $class->call_method(@args) }
    catch ($e) {
        die 'No method ', $e->method, "\n"
          if $e isa NoMethodError;
        die 'Service error: ', $e->message, "\n"
          if $e isa ServiceError;
        die "$e\n";
    }
    return;
}

# Utilities

sub methods ($) {
    state @methods = sort map s/^foaas_(.+)/$1/r,
      grep /^foaas_/, $foaas->list_all_symbols('CODE');
    return @methods;
}

sub call_method ( $class, $method = '', @args ) {
    state %methods = map { $_ => 1 } $class->methods();
    die no_method_error( method => $method )
      unless $methods{$method};
    return do {
        try { $foaas->get_symbol("&$method")->(@args) }
        catch ($e) { die service_error( message => $e ) }
    };
}

# Testing

sub test ( $class, @ ) {
    state $stash = Package::Stash->new($class);
    state @tests = sort grep /^_test_/,
      $stash->list_all_symbols('CODE');

    for my $test (@tests) {
        subtest $test => sub {
            try { $class->$test() }
            catch ($e) { diag $e }
        };
    }
    done_testing();
    return;
}

sub _test_can ($class) {
    state @subs = qw(run call_method methods test);
    can_ok $class, \@subs, "can do: @subs";
    return;
}

sub _test_methods ($class) {
    my $mock = mock 'WebService::FOAAS' => ( track => 1 );

    for my $method ( $class->methods() ) {
        $mock->override( $method => 1 );

        ok lives { $class->call_method($method) },
          "$method lives";
        ok scalar $mock->sub_tracking->{$method}->@*,
          "$method called";
    }
    return;
}

sub _test_service_failure ($class) {
    my $mock = mock 'WebService::FOAAS';

    for my $method ( $class->methods() ) {
        $mock->override( $method => sub { die 'mocked' } );

        my $exception =
          dies { $class->call_method($method) };
        isa_ok $exception, [ServiceError],
          "$method throws ServiceError on failure";
        like $exception->message, qr/^mocked/,
          "correct error in $method exception";
    }
    return;
}

1;

[Updated, thanks to Dan Book, Karen Etheridge, and Bob Kleemann] The only goofy bit above is the need to put the exception calls in a BEGIN block and then explic­it­ly call BEGIN { Local::CallFOAAS::Exceptions->import() }. Since the two pack­ages are in the same file, I can’t do a use state­ment since the implied require would look for a cor­re­spond­ing file or entry in %INC. (You can get around this by mess­ing with %INC direct­ly or through a mod­ule like me::inlined that does that mess­ing for you, but for a single-​purpose mod­uli­no like this it’s fine.)
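For the curious, the %INC workaround mentioned above might be sketched like this using only core modules. Local::Inline::Greeter and greet are hypothetical names, and Exporter stands in for whatever import mechanism the inlined package really uses:

```perl
use strict;
use warnings;

package Local::Inline::Greeter {    # hypothetical package in this same file
    use Exporter 'import';

    BEGIN {
        our @EXPORT_OK = ('greet');

        # Pretend the file was already loaded so "use" skips the disk lookup
        $INC{'Local/Inline/Greeter.pm'} = __FILE__;
    }

    sub greet { return "Hello, $_[0]!" }
}

package main;
use Local::Inline::Greeter qw(greet);    # works: require finds the %INC entry

print greet('modulino'), "\n";
```

Everything the later use statement relies on has to happen at compile time, hence the BEGIN block around both the export list and the %INC entry.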



Over the past two years, I’ve got­ten back into play­ing Dungeons & Dragons, the famous table­top fan­ta­sy role-​playing game. As a soft­ware devel­op­er and musi­cian, one of my favorite char­ac­ter class­es to play is the bard, a mag­i­cal and inspir­ing per­former or word­smith. The list of basic bardic spells includes Vicious Mockery, enchant­i­ng ver­bal barbs that have the pow­er to psy­chi­cal­ly dam­age and dis­ad­van­tage an oppo­nent even if they don’t under­stand the words. (Can you see why this is so appeal­ing to a coder?)

Mocking has a role to play in software testing as well, in the form of mock objects that simulate parts of a system that are too brittle, too slow, too complicated, or otherwise too finicky to use in reality. They enable discrete unit testing without relying on dependencies external to the code being tested. Mocks are great for databases, web services, or other network resources where the goal is to test what you wrote, not what’s out in “the cloud” somewhere.

Speaking of web services and mocking, one of my favorites is the long-running FOAAS (link has language not safe for work), a surprisingly expansive RESTful insult service. There’s a corresponding Perl client API, of course, but what I was missing was a handy Perl script to call that API from the terminal command line. So I wrote the following over Thanksgiving break, trying to keep it simple while also showing the basics of mocking such an API. It also demonstrates some newer Perl syntax and testing techniques as well as brian d foy’s modulino concept from Mastering Perl (second edition, 2014) that marries script and module into a self-contained executable library.

#!/usr/bin/env perl

package Local::CallFOAAS;  # this is a modulino
use Test2::V0;             # enables strict, warnings, utf8

# declare all the new stuff we're using
use feature qw(say state);
use experimental qw(isa postderef signatures);
use Feature::Compat::Try;
use Syntax::Construct qw(non-destructive-substitution);

use WebService::FOAAS ();
use Package::Stash;
use Exception::Class (
    NoMethodException => {
        alias  => 'throw_no_method',
        fields => 'method',
    },
    ServiceException => { alias => 'throw_service' },
);

my $foaas = Package::Stash->new('WebService::FOAAS');

my $run_as =
    !!$ENV{CPANTEST}       ? 'test'
  : !defined scalar caller ? 'run'
  :                          undef;
__PACKAGE__->$run_as(@ARGV) if defined $run_as;

sub run ( $class, @args ) {
    try { say $class->call_method(@args) }
    catch ($e) {
        die 'No method ', $e->method, "\n"
          if $e isa NoMethodException;
        die 'Service error: ', $e->error, "\n"
          if $e isa ServiceException;
        die "$e\n";
    }
    return;
}

# Utilities

sub methods ($) {
    state @methods = sort map s/^foaas_(.+)/$1/r,
      grep /^foaas_/, $foaas->list_all_symbols('CODE');
    return @methods;
}

sub call_method ( $class, $method = '', @args ) {
    state %methods = map { $_ => 1 } $class->methods();
    throw_no_method( method => $method )
      unless $methods{$method};
    return do {
        try { $foaas->get_symbol("&$method")->(@args) }
        catch ($e) { throw_service( error => $e ) }
    };
}

# Testing

sub test ( $class, @ ) {
    state $stash = Package::Stash->new($class);
    state @tests = sort grep /^_test_/,
      $stash->list_all_symbols('CODE');

    for my $test (@tests) {
        subtest $test => sub {
            try { $class->$test() }
            catch ($e) { diag $e }
        };
    }
    done_testing();
    return;
}

sub _test_can ($class) {
    state @subs = qw(run call_method methods test);
    can_ok( $class, \@subs, "can do: @subs" );
    return;
}

sub _test_methods ($class) {
    my $mock = mock 'WebService::FOAAS' => ( track => 1 );

    for my $method ( $class->methods() ) {
        $mock->override( $method => 1 );

        ok lives { $class->call_method($method) },
          "$method lives";
        ok scalar $mock->sub_tracking->{$method}->@*,
          "$method called";
    }
    return;
}

sub _test_service_failure ($class) {
    my $mock = mock 'WebService::FOAAS';

    for my $method ( $class->methods() ) {
        $mock->override( $method => sub { die 'mocked' } );

        my $exception =
          dies { $class->call_method($method) };
        isa_ok $exception, ['ServiceException'],
          "$method throws ServiceException on failure";
        like $exception->error, qr/^mocked/,
          "correct error in $method exception";
    }
    return;
}

1;

Let’s walk through the code above.

Preliminaries

First, there’s a gener­ic she­bang line to indi­cate that Unix and Linux sys­tems should use the perl exe­cutable found in the user’s PATH via the env com­mand. I declare a pack­age name (in the Local:: name­space) so as not to pol­lute the default main pack­age of oth­er scripts that might want to require this as a mod­ule. Then I use the Test2::V0 bun­dle from Test2::Suite since the embed­ded test­ing code uses many of its func­tions. This also has the side effect of enabling the strict, warn­ings, and utf8 prag­mas, so there’s no need to explic­it­ly use them here.

(Why Test2 instead of Test::More and its deriv­a­tives and add-​ons? Both are main­tained by the same author, who rec­om­mends the for­mer. I’m see­ing more and more mod­ules using it, so I thought this would be a great oppor­tu­ni­ty to learn.)

I then declare all the new-​ish Perl fea­tures I’d like to use that need to be explic­it­ly enabled so as not to sac­ri­fice back­ward com­pat­i­bil­i­ty with old­er ver­sions of Perl 5. As of this writ­ing, some of these fea­tures (the isa class instance oper­a­tor, named argu­ment sub­rou­tine sig­na­tures, and try/​catch excep­tion han­dling syn­tax) are con­sid­ered experimental, with the lat­ter enabled in old­er ver­sions of Perl via the Feature::Compat::Try mod­ule. The friend­lier post­fix deref­er­enc­ing syn­tax was main­lined in Perl ver­sion 5.24, but ver­sions 5.20 and 5.22 still need it exper­i­men­tal. Finally, I use Syntax::Construct to announce the /r flag for non-​destructive reg­u­lar expres­sion text sub­sti­tu­tions intro­duced in ver­sion 5.14.

Next, I bring in the afore­men­tioned FOAAS Perl API with­out import­ing any of its func­tions, Package::Stash to make metapro­gram­ming eas­i­er, and a cou­ple of excep­tion class­es so that the com­mand line func­tion and oth­er con­sumers might bet­ter tell what caused a fail­ure. In prepa­ra­tion for the meth­ods below dynam­i­cal­ly dis­cov­er­ing what func­tions are pro­vid­ed by WebService::FOAAS, I gath­er up its sym­bol table (or stash) into the $foaas variable.

The next block deter­mines how, if at all, I’m going to run the code as a script. If the CPANTEST envi­ron­ment vari­able is set, I’ll call the test class method sub, but if there’s no sub­rou­tine call­ing me I’ll exe­cute the run class method. Either will receive the com­mand line argu­ments from @ARGV. If nei­ther of these con­di­tions is true, do noth­ing; the rest of the code is method declarations.
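Boiled down, that dispatch logic might look something like the following sketch. The Local::Greeter package name and the bodies of run and test are placeholders invented for illustration, not the modulino's actual code.

```perl
#!/usr/bin/env perl
package Local::Greeter;    # hypothetical package name

use strict;
use warnings;

# Run the tests if CPANTEST is set; otherwise act as a script only when
# nothing is calling us. If another file loads this one with require or
# use, caller() is true and neither branch runs.
if ( $ENV{CPANTEST} ) {
    __PACKAGE__->test(@ARGV);
}
elsif ( !caller ) {
    __PACKAGE__->run(@ARGV);
}

sub run {
    my ( $class, @args ) = @_;
    print "Hello, @args\n";
    return scalar @args;
}

sub test {
    my ( $class, @args ) = @_;
    print "Tests would run here\n";
    return 1;
}

1;
```

Because the subs are compiled before the if statement executes, it's fine for the dispatch to appear above their definitions.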

Modulino methods, metaprogramming, and exceptions

The first of these is the run method. It’s a thin wrapper around the call_method class method detailed below, either outputting its result or calling die with an appropriate error depending on the class of exception thrown. Although I chose not to write tests for this output, future tests might call this method and catch these rethrown exceptions to match against them. The messages end with a \n newline character so die knows not to append the current script line number.

Next is a util­i­ty method called methods that uses Package::Stash’s list_all_symbols to retrieve the names of all named CODE blocks (i.e., subs) from WebService::FOAAS’s sym­bol table. Reading from right to left, these are then fil­tered with grep to only find those begin­ning in foaas_ and then trans­formed with map to remove that pre­fix. The list is then sorted and stored in a state vari­able and returned so it need not be ini­tial­ized again.

(As an aside, although perlcritic stern­ly warns against it I’ve cho­sen the expres­sion forms of grep and map here over their block forms for sim­plic­i­ty’s sake. It’s OK to bend the rules if you have a good reason.)
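To make the grep/map/state combination concrete, here's a rough sketch that swaps Package::Stash for Perl's built-in symbol table access so it runs with core Perl only. Local::Service and its subs are made-up stand-ins for WebService::FOAAS.

```perl
use strict;
use warnings;
use feature 'state';

package Local::Service {    # hypothetical stand-in for WebService::FOAAS
    sub foaas_thanks  { return 'thanks' }
    sub foaas_awesome { return 'awesome' }
    sub helper        { return 'not part of the API' }
}

sub methods {
    # The do block runs on the first call only; later calls reuse the list.
    state $methods = do {
        no strict 'refs';    # needed to look up the stash and subs by name
        [
            sort
            map  s/^foaas_//r,
            grep /^foaas_/ && defined &{"Local::Service::$_"},
            keys %{'Local::Service::'}
        ];
    };
    return @$methods;
}

print join( ', ', methods() ), "\n";
```

The state variable holds an array reference rather than an array because list initialization of state arrays isn't available before Perl 5.28.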

sub call_method is where the real action takes place. Its para­me­ters are the class that called it, the name of a FOAAS $method (default­ed to the emp­ty string), and an array of option­al argu­ments in @args. I build a hash or asso­cia­tive array from the ear­li­er methods method which I then use to see if the passed method name is one I know about. If not, I throw a NoMethodException using the throw_no_method alias func­tion cre­at­ed when I used Exception::Class at the begin­ning. Using a func­tion instead of NoMethodException->throw() means that it’s checked at com­pile time rather than run­time, catch­ing typos.

I get the sub­rou­tine (denot­ed by a & sig­il) named by $method from the $foaas stash and pass it any fur­ther received argu­ments from @args. If that WebService::FOAAS sub­rou­tine throws an excep­tion it’ll be caught and re-​thrown as a ServiceException; oth­er­wise call_method returns the result. It’s up to the caller to deter­mine what, if any­thing, to do with that result or any thrown exceptions.
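The look-up, call, and rethrow flow can be sketched without any CPAN modules. Everything named Local:: below is invented for this example; the hand-rolled exception classes only approximate what Exception::Class generates, and can() stands in for fetching the sub from a Package::Stash object.

```perl
use strict;
use warnings;

# Hypothetical minimal exception classes
package Local::NoMethodException {
    sub throw { my ( $class, %fields ) = @_; die bless {%fields}, $class }
    sub error { return $_[0]{error} }
}
package Local::ServiceException { our @ISA = ('Local::NoMethodException') }

package Local::Service {    # stand-in for the wrapped module
    sub foaas_ok   { return "ok: @_" }
    sub foaas_boom { die "service down\n" }
}

sub call_method {
    my ( $method, @args ) = @_;
    $method = '' unless defined $method;

    # Unknown names fail here, before anything is called
    my $code = Local::Service->can("foaas_$method")
      or Local::NoMethodException->throw( error => "No method $method" );

    # Anything the wrapped sub dies with is rethrown as a ServiceException
    my $result;
    eval { $result = $code->(@args); 1 }
      or Local::ServiceException->throw( error => $@ );
    return $result;
}

print call_method( 'ok', 1, 2 ), "\n";
```

A caller can then tell the two failure modes apart by checking the class of whatever ends up in $@.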

Testing the modulino with mocks

This is where I start using those Test2::Suite tools I men­tioned at the begin­ning. The test class method starts by build­ing a fil­tered list of all subs begin­ning with _test_ in the cur­rent class, much like methods did above with WebService::FOAAS. I then loop through that list of subs, run­ning each as a subtest con­tain­ing a class method with any excep­tions report­ed as diag­nos­tics.

The rest of the modulino is subtest methods, starting with a simple _test_can sanity check for the public methods in the class. Following that is _test_methods, which starts by mocking the WebService::FOAAS package and telling Test2::Mock I want to track any added, overridden, or set subs. I then loop through all the method names returned by the methods class method, overriding each one to return a simple true value. I then test passing those names to call_method and use the hash reference returned by sub_tracking to check that the overridden sub was called. This seems a lot simpler than the Test::Builder-based mocking libraries I’ve tried like Test::MockModule and Test::MockObject.

_test_service_failure acts in much the same way, checking that call_method correctly throws ServiceExceptions if the wrapped WebService::FOAAS function dies. The main difference is that the mocked WebService::FOAAS subs are now overridden with a code reference (sub { die 'mocked' }), which call_method uses to populate the rethrown ServiceException’s error field.

Wrapping up

With luck, this arti­cle has giv­en you some ideas, whether it’s in mak­ing scripts (per­haps lega­cy code) testable to improve them, or writ­ing bet­ter unit tests that mock depen­den­cies, or delv­ing a lit­tle into metapro­gram­ming so you can dynam­i­cal­ly sup­port and test new fea­tures of said depen­den­cies. I hope you haven’t come away too offend­ed, at least. Let me know in the com­ments what you think.

Look, I get it. You don’t like the Perl programming language or have otherwise disregarded it as “dead.” (Or perhaps you haven’t, in which case please check out my other blog posts!) It has weird noisy syntax, mixing regular expressions, sigils on variable names, various braces and brackets for data structures, and a menagerie of cryptic special variables. It’s old: 34 years in December, with a history of (sometimes amateur) developers that have used and abused that syntax to ship code of questionable quality. Maybe you grudgingly accept its utility but think it should die gracefully, maintained only to run legacy applications.

But you know what? Perl’s still going. It’s had a steady cadence of year­ly releas­es for the past decade, intro­duc­ing new fea­tures and fenc­ing in bad behav­ior while main­tain­ing an admirable lev­el of back­ward com­pat­i­bil­i­ty. Yes, there was a too-​long adven­ture devel­op­ing what start­ed as Perl 6, but that lan­guage now has its own iden­ti­ty as Raku and even has facil­i­ties for mix­ing Perl with its native code or vice versa.

And then there’s CPAN, the Comprehensive Perl Archive Network: a continually-​updated col­lec­tion of over 200,000 open-​source mod­ules writ­ten by over 14,000 authors, the best of which are well-​tested and ‑doc­u­ment­ed (apply­ing peer pres­sure to those that fall short), pre­sent­ed through a search engine and front-​end built by scores of con­trib­u­tors. Through CPAN you can find dis­tri­b­u­tions for things like:

All of this is avail­able through a mature instal­la­tion tool­chain that doesn’t break from month to month.

Finally and most impor­tant­ly, there’s the glob­al Perl com­mu­ni­ty. The COVID-​19 pan­dem­ic has put a damper on the hun­dreds of glob­al Perl Mongers groups’ mee­tups, but that hasn’t stopped the year­ly Perl and Raku Conference from meet­ing vir­tu­al­ly. (In the past there have also been year­ly European and Asian con­fer­ences, occa­sion­al for­ays into South America and Russia, as well as hackathons and work­shops world­wide.) There are IRC servers and chan­nels for chat, mail­ing lists galore, blogs (yes, apart from this one), and a quirky social net­work that pre­dates Facebook and Twitter.

So no, Perl isn’t dead or even dying, but if you don’t like it and favor something newer, that’s OK! Technologies can coexist on their own merits and advocates of one don’t have to beat down their contemporaries to be successful. Perl happens to be battle-tested (to borrow a term from my friend Curtis “Ovid” Poe), it runs large parts of the Web (speaking from direct and ongoing experience in the hosting business here), and it’s still evolving to meet the needs of its users.

A recent Lobsters post laud­ing the virtues of AWK remind­ed me that although the lan­guage is pow­er­ful and lightning-​fast, I usu­al­ly find myself exceed­ing its capa­bil­i­ties and reach­ing for Perl instead. One such appli­ca­tion is ana­lyz­ing volu­mi­nous log files such as the ones gen­er­at­ed by this blog. Yes, WordPress has stats, but I’ve nev­er let rein­ven­tion of the wheel get in the way of a good pro­gram­ming exercise.

So I whipped this script up on Sunday night while watch­ing RuPaul’s Drag Race reruns. It pars­es my Apache web serv­er log files and reports on hits from week to week.

#!/usr/bin/env perl

use strict;
use warnings;
use Syntax::Construct 'operator-double-diamond';
use Regexp::Log::Common;
use DateTime::Format::HTTP;
use List::Util 1.33 'any';
use Number::Format 'format_number';

my $parser = Regexp::Log::Common->new(
    format  => ':extended',
    capture => [qw<req ts status>],
);
my @fields      = $parser->capture;
my $compiled_re = $parser->regexp;

my @skip_uri_patterns = qw<
  ^/+robots\.txt
  [-\w]*sitemap[-\w]*\.xml
  ^/+wp-
  /feed/?$
  ^/+\?rest_route=
>;

my ( %count, %week_of );
while ( <<>> ) {
    my %log;
    @log{@fields} = /$compiled_re/;

    # only interested in successful or cached requests
    next unless $log{status} =~ /^2/ or $log{status} == 304;

    my ( $method, $uri, $protocol ) = split ' ', $log{req};
    next unless $method eq 'GET';
    next if any { $uri =~ $_ } @skip_uri_patterns;

    my $dt  = DateTime::Format::HTTP->parse_datetime( $log{ts} );
    my $key = sprintf '%u-%02u', $dt->week;

    # get first date of each week
    $week_of{$key} ||= $dt->date;
    $count{$key}++;
}

printf "Week of %s: % 10s\n", $week_of{$_}, format_number( $count{$_} )
  for sort keys %count;

Here’s some sam­ple output:

Week of 2021-07-31:      2,672
Week of 2021-08-02:     16,222
Week of 2021-08-09:     12,609
Week of 2021-08-16:     17,714
Week of 2021-08-23:     14,462
Week of 2021-08-30:     11,758
Week of 2021-09-06:     14,811
Week of 2021-09-13:        407

I first started prototyping this on the command line as if it were an awk one-liner by using the perl -n and -a flags. The former wraps code in a while loop over the <> “diamond operator”, processing each line from standard input or files passed as arguments. The latter splits the fields of the line into an array named @F. It looked something like this while I was listing URIs (locations on the website):

gunzip -c ~/logs/phoenixtrap.com-ssl_log-*.gz | \
perl -anE 'say $F[6]'
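For anyone less familiar with those switches, here's roughly what that one-liner expands to, written out as an ordinary script. The two log lines are invented sample data, and the loop reads from an in-memory filehandle instead of standard input so the sketch is self-contained.

```perl
use strict;
use warnings;
use feature 'say';

# Invented sample lines in Apache's combined log layout
my $sample = <<'END';
203.0.113.9 - - [02/Aug/2021:10:15:32 -0500] "GET /2021/08/02/post/ HTTP/1.1" 200 5120
203.0.113.9 - - [02/Aug/2021:10:15:40 -0500] "GET /feed/ HTTP/1.1" 304 0
END
open my $in, '<', \$sample or die "Can't open in-memory file: $!";

my @uris;
while (<$in>) {             # -n supplies this loop (over <> in real use)
    my @F = split ' ';      # -a supplies this split of $_ on whitespace
    push @uris, $F[6];
    say $F[6];              # the code given to -E
}
```

Counting from zero, the seventh whitespace-separated field is the request URI because the timestamp and its UTC offset occupy two fields and the request method a third.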

But once I real­ized I’d need to fil­ter out a bunch of URI pat­terns and do some aggre­ga­tion by date, I turned it into a script and turned to CPAN.

There I found Regexp::Log::Common and DateTime::Format::HTTP, which let me pull apart the Apache log for­mat and its time­stamp strings with­out hav­ing to write even more com­pli­cat­ed reg­u­lar expres­sions myself. (As not­ed above, this was already a wheel-​reinvention exer­cise; no need to com­pound that further.)

Regexp::Log::Common builds a com­piled reg­u­lar expres­sion based on the log for­mat and fields you’re inter­est­ed in, so that’s the con­struc­tor on lines 11 through 14. The expres­sion then returns those fields as a list, which I’m assign­ing to a hash slice with those field names as keys in line 29. I then skip over requests that aren’t suc­cess­ful or brows­er cache hits, skip over requests that don’t GET web pages or oth­er assets (e.g., POSTs to forms or updat­ing oth­er resources), and skip over the URI pat­terns men­tioned earlier.
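The hash slice assignment is the part most likely to trip up newcomers, so here it is in isolation. This sketch uses a simplified, hand-written pattern as a stand-in for the one Regexp::Log::Common generates, along with an invented log line; the field names and their order are assumptions made for the example.

```perl
use strict;
use warnings;

my @fields = qw(ts req status);

# Simplified stand-in pattern: bracketed timestamp, quoted request, status
my $re = qr/\[([^\]]+)\] \s+ "([^"]*)" \s+ (\d{3})/x;

my $line =
  '203.0.113.9 - - [02/Aug/2021:10:15:32 -0500] "GET /about/ HTTP/1.1" 200 5120';

# A match in list context returns its captures, which the hash slice
# assigns to the keys in @fields, in order.
my %log;
@log{@fields} = $line =~ $re;

my ( $method, $uri, $protocol ) = split ' ', $log{req};
print "$log{status} $method $uri\n";
```

If the line doesn't match, the match returns an empty list and every key in the slice ends up undefined, so a production version might want to skip such lines explicitly.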

(Those pat­terns are worth a men­tion: they include the robots.txt and sitemap XML files used by search engine index­ers, WordPress admin­is­tra­tion pages, files used by RSS news­read­ers sub­scribed to my blog, and routes used by the Jetpack WordPress add-​on. If you’re adapt­ing this for your site you might need to cus­tomize this list based on what soft­ware you use to run it.)

Lines 38 and 39 parse the timestamp from the log into a DateTime object using DateTime::Format::HTTP and then build the key used to store the per-week hit count. The last lines of the loop then grab the first date of each new week (assuming the log is in chronological order) and increment the count. Once finished, lines 46 and 47 provide a report sorted by week, displaying it as a friendly “Week of date” and the hit counts aligned to the right with printf. Number::Format’s format_number function displays the totals with thousands separators.

Update: After this was initially published, astute reader Chris McGowan noted that I had a bug where $log{status} was assigned the value 304 with the = operator rather than compared with ==. He also suggested I use the double-diamond <<>> operator introduced in Perl v5.22.0 to avoid maliciously-named files. Thanks, Chris!

Room for improvement

DateTime is a very pow­er­ful mod­ule but this comes at a price of speed and mem­o­ry. Something sim­pler like Date::WeekNumber should yield per­for­mance improve­ments, espe­cial­ly as my logs grow (here’s hop­ing). It requires a bit more man­u­al mas­sag­ing of the log dates to con­vert them into some­thing the mod­ule can use, though:

#!/usr/bin/env perl

use strict;
use warnings;
use Syntax::Construct qw<
  operator-double-diamond
  regex-named-capture-group
>;
use Regexp::Log::Common;
use Date::WeekNumber 'iso_week_number';
use List::Util 1.33 'any';
use Number::Format 'format_number';

my $parser = Regexp::Log::Common->new(
    format  => ':extended',
    capture => [qw<req ts status>],
);
my @fields      = $parser->capture;
my $compiled_re = $parser->regexp;

my @skip_uri_patterns = qw<
  ^/+robots\.txt
  [-\w]*sitemap[-\w]*\.xml
  ^/+wp-
  /feed/?$
  ^/+\?rest_route=
>;

my %month = (
    Jan => '01',
    Feb => '02',
    Mar => '03',
    Apr => '04',
    May => '05',
    Jun => '06',
    Jul => '07',
    Aug => '08',
    Sep => '09',
    Oct => '10',
    Nov => '11',
    Dec => '12',
);

my ( %count, %week_of );
while ( <<>> ) {
    my %log;
    @log{@fields} = /$compiled_re/;

    # only interested in successful or cached requests
    next unless $log{status} =~ /^2/ or $log{status} == 304;

    my ( $method, $uri, $protocol ) = split ' ', $log{req};
    next unless $method eq 'GET';
    next if any { $uri =~ $_ } @skip_uri_patterns;

    # convert log timestamp to YYYY-MM-DD
    # for Date::WeekNumber
    $log{ts} =~ m!^
      (?<day>\d\d) /
      (?<month>...) /
      (?<year>\d{4}) : !x;
    my $date = "$+{year}-$month{ $+{month} }-$+{day}";

    my $week = iso_week_number($date);
    $week_of{$week} ||= $date;
    $count{$week}++;
}

printf "Week of %s: % 10s\n", $week_of{$_}, format_number( $count{$_} )
  for sort keys %count;

It looks almost the same as the first ver­sion, with the addi­tion of a hash to con­vert month names to num­bers and the actu­al con­ver­sion (using named reg­u­lar expres­sion cap­ture groups for read­abil­i­ty, using Syntax::Construct to check for that fea­ture). On my serv­er, this results in a ten- to eleven-​second sav­ings when pro­cess­ing two months of com­pressed logs.

What’s next? Pretty graphs? Drilling down to spe­cif­ic blog posts? Database stor­age for fur­ther queries and analy­sis? Perl and CPAN make it pos­si­ble to go far beyond what you can do with AWK. What would you add or change? Let me know in the comments.

I publish Perl stories on this blog once a week, and it seems every time there’s at least one response on social media that amounts to, “I hate Perl because of its weird syntax.” Or, “It looks like line noise.” (Perl seems to have outlasted that one; when’s the last time you used an acoustic modem?) Or the quote attributed to Keith Bostic: “The only language that looks the same before and after RSA encryption.”

So let’s address, con­front, and demys­ti­fy this hate. What are these objec­tion­able syn­tac­ti­cal, noisy, pos­si­bly encrypt­ed bits? And why does Perl have them?

Regular expressions

Regular expressions, or regexps, are not unique to Perl. JavaScript has them. Java has them. Python has them as well as another module that adds even more features. It’s hard to find a language that doesn’t have them, either natively or through the use of a library. It’s common to want to search text using some kind of pattern, and regexps provide a fairly standardized if terse mini-language for doing so. There’s even a C-based library called PCRE, or “Perl Compatible Regular Expressions,” enabling many other pieces of software to embed a regexp engine that’s inspired by (though not quite compatible with) Perl’s syntax.

Being itself inspired by Unix tools like grep, sed, and awk, Perl incor­po­rat­ed reg­u­lar expres­sions into the lan­guage as few oth­er lan­guages have, with bind­ing oper­a­tors of =~ and !~ enabling easy match­ing and sub­sti­tu­tions against expres­sions, and pre-​compilation of reg­ex­ps into their own type of val­ue. Perl then added the abil­i­ty to sep­a­rate reg­ex­ps by white­space to improve read­abil­i­ty, use dif­fer­ent delim­iters to avoid the leaning-​toothpick syn­drome of escap­ing slash (/) char­ac­ters with back­slash­es (\), and name your cap­ture groups and back­ref­er­ences when sub­sti­tut­ing or extract­ing strings.
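Here's a small sketch pulling several of those features together: a precompiled regexp, whitespace and comments via the /x flag, alternate delimiters, and named captures. The path being matched is made up for the example.

```perl
use strict;
use warnings;

# A precompiled regexp stored in a scalar. The /x flag permits the
# whitespace and comments; braces as delimiters mean the slashes in
# the pattern need no backslashes.
my $date_re = qr{
    (?<year>  \d{4} ) /    # four-digit year
    (?<month> \d{2} ) /    # two-digit month
    (?<day>   \d{2} )      # two-digit day
}x;

my $path = '/archive/2021/12/17/index.html';

if ( $path =~ $date_re ) {
    # Named captures are available through the %+ hash
    print "$+{year}-$+{month}-$+{day}\n";
}

print "no date found\n" if '/about/' !~ $date_re;
```

Compare that with the same pattern written as /(\d{4})\/(\d{2})\/(\d{2})/ and numbered captures, and the leaning-toothpick problem is easy to see.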

All this is to say that Perl reg­u­lar expres­sions can be some of the most read­able and robust when used to their full poten­tial. Early on this helped cement Perl’s rep­u­ta­tion as a text-​processing pow­er­house, though the core of reg­ex­ps’ suc­cinct syn­tax can result in difficult-​to-​read code. Such inscrutable exam­ples can be found in any lan­guage that imple­ments reg­u­lar expres­sions; at least Perl offers the enhance­ments men­tioned above.

Sigils

Perl has three built-​in data types that enable you to build all oth­er data struc­tures no mat­ter how com­plex. Its vari­able names are always pre­ced­ed by a sig­il, which is just a fan­cy term for a sym­bol or punc­tu­a­tion mark.

  • A scalar con­tains a string of char­ac­ters, a num­ber, or a ref­er­ence to some­thing, and is pre­ced­ed with a $ (dol­lar sign).
  • An array is an ordered list of scalars begin­ning with an ele­ment num­bered 0 and is pre­ced­ed with a @ (at sign). 
  • A hash, or asso­cia­tive array, is an unordered col­lec­tion of scalars indexed by string keys and is pre­ced­ed with a % (per­cent sign).

So vari­able names $look @like %this. Individual ele­ments of arrays or hash­es are scalars, so they $look[0] $like{'this'}. (That’s the first ele­ment of the @look array count­ing from zero, and the ele­ment in the %like hash with a key of 'this'.)

Perl also has a concept of slices, or selected parts of an array or hash. A slice of an array looks like @this[1, 2, 3], and a slice of a hash looks like @that{'one', 'two', 'three'}. You could write it out long-hand like ($this[1], $this[2], $this[3]) and ($that{'one'}, $that{'two'}, $that{'three'}), but slices are much easier. Plus you can even specify one or more ranges of elements with the .. operator, so @this[0 .. 9] would give you the first ten elements of @this, or @this[0 .. 4, 6 .. 9] would give you nine with the one at index 5 missing. Handy, that.
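A quick runnable version of those slices, with made-up contents for @this and %that:

```perl
use strict;
use warnings;

my @this = ( 'a' .. 'j' );    # ten elements, indexes 0 through 9
my %that = ( one => 1, two => 2, three => 3 );

my @middle  = @this[ 1, 2, 3 ];                  # array slice
my @numbers = @that{ 'one', 'two', 'three' };    # hash slice
my @skipped = @this[ 0 .. 4, 6 .. 9 ];           # two ranges, omitting index 5

print "@middle | @numbers | @skipped\n";
```

Note that both kinds of slice start with @ because each produces a list of values, even when the values come out of a hash.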

In oth­er words, the sig­il always tells you what you’re going to get. If it’s a sin­gle scalar val­ue, it’s pre­ced­ed with a $; if it’s a list of val­ues, it’s pre­ced­ed with a @; and if it’s a hash of key-​value pairs, it’s pre­ced­ed with a %. You nev­er have to be con­fused about the con­tents of a vari­able because the name will tell you what’s inside.

Data structures, anonymous values, and dereferencing

I men­tioned ear­li­er that you can build com­plex data struc­tures from Perl’s three built-​in data types. Constructing them with­out a lot of inter­me­di­ate vari­ables requires you to use things like:

  • lists, denot­ed between ( paren­the­ses )
  • anony­mous arrays, denot­ed between [ square brack­ets ]
  • and anony­mous hash­es, denot­ed between { curly braces }.

Given these tools you could build, say, a scalar ref­er­enc­ing an array of street address­es, each address being an anony­mous hash:

$addresses = [
  { 'name'    => 'John Doe',
    'address' => '123 Any Street',
    'city'    => 'Anytown',
    'state'   => 'TX',
  },
  { 'name'    => 'Mary Smith',
    'address' => '100 Other Avenue',
    'city'    => 'Whateverville',
    'state'   => 'PA',
  },
];

(The => is just a way to show cor­re­spon­dence between a hash key and its val­ue, and is just a fun­ny way to write a com­ma (,). And like some oth­er pro­gram­ming lan­guages, it’s OK to have trail­ing com­mas in a list as we do for the 'state' entries above; it makes it eas­i­er to add more entries later.)

Although I’ve nice­ly spaced out my exam­ple above, you can imag­ine a less socia­ble devel­op­er might cram every­thing togeth­er with­out any spaces or new­lines. Further, to extract a spe­cif­ic val­ue from this struc­ture this same per­son might write the fol­low­ing, mak­ing you count dol­lar signs one after anoth­er while read­ing right-​to-​left then left-to-right:

say $$addresses[1]{'name'};

We don’t have to do that, though; we can use arrows that look like -> to deref­er­ence our array and hash elements:

say $addresses->[1]->{'name'};

We can even use postfix dereferencing to pull a slice out of this structure, which is just a fancy way of saying “always reading left to right”:

say for $addresses->[1]->@{'name', 'city'};

Which prints out:

Mary Smith
Whateverville

Like I said above, the sig­il always tells you what you’re going to get. In this case, we got:

  • a sliced list of val­ues with the keys 'name' and 'city' out of…
  • an anony­mous hash that was itself the sec­ond ele­ment (count­ing from zero, so index of 1) ref­er­enced in…
  • an anony­mous array which was itself ref­er­enced by…
  • the scalar named $addresses.

That’s a mouth­ful, but com­pli­cat­ed data struc­tures often are. That’s why Perl pro­vides a Data Structures Cookbook as the perldsc doc­u­men­ta­tion page, a ref­er­ences tuto­r­i­al as the perlreftut page, and final­ly a detailed guide to ref­er­ences and nest­ed data struc­tures as the perlref page.

Special variables

Perl was also inspired by Unix com­mand shell lan­guages like the Bourne shell (sh) or Bourne-​again shell (bash), so it has many spe­cial vari­able names using punc­tu­a­tion. There’s @_ for the array of argu­ments passed to a sub­rou­tine, $$ for the process num­ber the cur­rent pro­gram is using in the oper­at­ing sys­tem, and so on. Some of these are so com­mon in Perl pro­grams they are writ­ten with­out com­men­tary, but for the oth­ers there is always the English mod­ule, enabling you to sub­sti­tute in friend­ly (or at least more awk-like) names.

With use English; at the top of your pro­gram, you can say:

All of these pre­de­fined vari­ables, punc­tu­a­tion and English names alike, are doc­u­ment­ed on the perlvar doc­u­men­ta­tion page.
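As a small illustration, here are three of those English names next to their punctuation equivalents. The show_args sub is a throwaway helper invented for the example.

```perl
use strict;
use warnings;
use English;

sub show_args {
    return join ', ', @ARG;    # @ARG is the English name for @_
}

print "Process ID: $PROCESS_ID\n";    # same value as $$
print show_args(qw(a b c)), "\n";

eval { die "oops\n" };
print "Error: $EVAL_ERROR";           # same value as $@
```

The English names are aliases rather than copies, so reading or assigning through either spelling affects the same underlying variable.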

The choice to use punctuation variables or their English equivalents is up to the developer, and some have more familiarity with and assume their readers understand the punctuation variety. Other less-friendly developers engage in “code golf,” attempting to express their programs in as few keystrokes as possible.

To combat these and other unsociable tendencies, the perlstyle documentation page admonishes, “Perl is designed to give you several ways to do anything, so consider picking the most readable one.” Developers can (and should) also use the perlcritic tool and its included policies to encourage best practices, such as prohibiting all but a few common punctuation variables.

Conclusion: Do you still hate Perl?

There are only two kinds of lan­guages: the ones peo­ple com­plain about and the ones nobody uses.

Bjarne Stroustrup, design­er of the C++ pro­gram­ming language

It’s easy to hate what you don’t understand. I hope that reading this article has helped you decipher some of Perl’s “noisy” quirks as well as its features for increased readability. Let me know in the comments if you’re having trouble grasping any other aspects of the language or its ecosystem, and I’ll do my best to address them in future posts.

Last week saw the release of Perl 5.34.0 (you can get it here), and with it comes a year’s worth of new fea­tures, per­for­mance enhance­ments, bug fix­es, and oth­er improve­ments. It seems like a good time to high­light some of my favorite changes over the past decade and a half, espe­cial­ly for those with more dat­ed knowl­edge of Perl. You can always click on the head­ers below for the full releas­es’ perldelta pages.

Perl 5.10 (2007)

This was a big release, com­ing as it did over five years after the pre­vi­ous major 5.8 release. Not that Perl devel­op­ers were idle — but it would­n’t be until ver­sion 5.14 that the lan­guage would adopt a steady year­ly release cadence.

Due to the build-​up time, many core enhance­ments were made but the most impor­tant was arguably the feature prag­ma, enabling the addi­tion of new syn­tax that would oth­er­wise break Perl’s back­ward com­pat­i­bil­i­ty. 5.10 also intro­duced the defined-​or oper­a­tor (//), state vari­ables that per­sist their pre­vi­ous val­ue, the say func­tion for auto­mat­i­cal­ly append­ing a new­line on out­put (so much saved typ­ing), and a large col­lec­tion of improve­ments to reg­u­lar expres­sions. In addi­tion, this release intro­duced smart match­ing (~~), though ver­sion 5.18 would even­tu­al­ly rel­e­gate it to exper­i­men­tal sta­tus.
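Three of those 5.10 additions fit in a few lines. The next_id sub and the %opt hash are invented for this sketch.

```perl
use v5.10;
use strict;
use warnings;

sub next_id {
    state $id = 0;    # initialized once; keeps its value between calls
    return ++$id;
}

my %opt = ( verbose => 0 );

# Defined-or only falls back when the left side is undefined, so a
# deliberate 0 survives where the older || operator would discard it.
my $verbose = $opt{verbose} // 1;
my $color   = $opt{color}   // 'auto';

next_id() for 1 .. 2;
say "verbose=$verbose color=$color id=", next_id();
```

Before state variables, the usual workaround was a lexical declared in an enclosing block around the sub.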

Perl 5.12 (2010)

This release also saw many new fea­tures added, but if I had to pick one mar­quee item it would be exper­i­men­tal sup­port for plug­gable key­words, which enabled authors to extend the lan­guage itself with­out mod­i­fy­ing the core. Previously one would either use plain func­tions, hacky source fil­ters, or the dep­re­cat­ed Devel::Declare mod­ule to sim­u­late this func­tion­al­i­ty. CPAN authors would go on to cre­ate all kinds of new syn­tax, some­times pro­to­typ­ing fea­tures that would even­tu­al­ly make their way into core.

Perl 5.14 (2011)

5.14 had a big list of enhance­ments, includ­ing Unicode 6.0 sup­port and a gag­gle of reg­u­lar expres­sion fea­tures. My favorite of these was the /r switch for non-​destructive sub­sti­tu­tions.
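For anyone who hasn't met /r yet, a minimal sketch with made-up strings: the substitution returns the changed copy and leaves the original alone, which also makes it usable inside map.

```perl
use v5.14;
use warnings;

my $title = 'Perl can do that now';

# With /r the substitution returns a new string instead of modifying
# its target, so it works on the result of lc() directly.
my $slug = lc($title) =~ s/\s+/-/gr;

my @raw     = ( '  apple ', 'banana  ' );
my @trimmed = map { s/^\s+|\s+$//gr } @raw;

say $slug;
say join '|', @trimmed;
```

Without /r, the map block would both modify @raw in place and return substitution counts rather than strings.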

But as the first year­ly cadence release, the changes in pol­i­cy took cen­ter stage. The Perl 5 Porters (p5p) explic­it­ly com­mit­ted to sup­port­ing the two most recent sta­ble release series, pro­vid­ing secu­ri­ty patch­es only for release series occur­ring in the past three years. They also defined an explic­it com­pat­i­bil­i­ty and dep­re­ca­tion pol­i­cy, with def­i­n­i­tions for fea­tures that may be exper­i­men­tal, dep­re­cat­ed, dis­cour­aged, and removed.

Perl 5.16 (2012)

Another year, anoth­er ver­sion bump. This time the core enhance­ments were all over the map (although no enhance­ments to the map function 😀 ).

May I highlight another documentation change, though? The perlootut Object-Oriented Programming in Perl Tutorial replaced the old perltoot, perltooc, perlboot, and perlbot pages, providing an introduction to object-oriented design concepts before strongly recommending the use of one of the OO systems from CPAN. Mentioned are Moose, its alternative Mouse, Class::Accessor, Object::Tiny, and Role::Tiny’s usage with the latter two. Later versions of perlootut would recommend Moo rather than Mouse.

Perl 5.18 (2013)

As men­tioned ear­li­er, Perl 5.18 ren­dered smart­match exper­i­men­tal, as well as lex­i­cal use of the $_ vari­able. With these came a new cat­e­go­ry of warn­ings for exper­i­men­tal fea­tures and a method for over­rid­ing such warn­ings feature-​by-​feature. Fitting in with the secu­ri­ty and safe­ty theme, hash­es were over­hauled to ran­dom­ize key/​value order, increas­ing their resis­tance to algo­rith­mic com­plex­i­ty attacks.

But it was­n’t all fenc­ing in bad behav­ior. Lexical sub­rou­tines made their first (exper­i­men­tal) appear­ance, and although I con­fess I haven’t had much call for them in my work, oth­ers have come up with some inter­est­ing uses. Four years lat­er they became non-​experimental.

Perl 5.20 (2014)

Three new syntax features arrived in 2014: experimental subroutine signatures (which I’ve written more about here), key/value hash slices and index/value array slices, and experimental postfix dereferencing. This last enables cleaner left-to-right syntax when dereferencing variables:

  • @{ $array_ref } becomes $array_ref->@*
  • %{ $hash_ref } becomes $hash_ref->%*
  • Etc.

Postfix deref­er­enc­ing became non-​experimental in Perl 5.24, and vig­or­ous dis­cus­sion con­tin­ues on sub­rou­tine sig­na­tures’ future.
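Here's a short sketch of the two slice types alongside postfix dereferencing, using invented data. It asks for Perl 5.24 so that postfix dereferencing needs no feature or experimental declarations.

```perl
use v5.24;
use warnings;

my %address = ( name => 'Mary Smith', city => 'Whateverville', state => 'PA' );
my @letters = qw(a b c d);

# Key/value hash slice: a % sigil returns the keys along with the values
my %subset = %address{ 'name', 'city' };

# Index/value array slice: pairs of index and element
my %by_index = %letters[ 1, 3 ];

my $array_ref = \@letters;
my $hash_ref  = \%address;
my $count     = scalar $array_ref->@*;      # same as scalar @{ $array_ref }
my @keys      = sort keys $hash_ref->%*;    # same as sort keys %{ $hash_ref }

say join ', ', map { "$_=$subset{$_}" } sort keys %subset;
say "$count letters; keys: @keys";
```

The % sigil on these slices follows the same rule as elsewhere: it tells you that you're getting key/value pairs back.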

Perl 5.22 (2015)

Speaking of sub­rou­tine sig­na­tures, their loca­tion moved to between the sub­rou­tine name (if any) and the attribute list (if any). Previously they appeared after attrib­ut­es. The les­son? Remain con­scious of exper­i­men­tal fea­tures in your code, and be pre­pared to make changes when upgrading.

In addition to the enhancements, security updates, performance fixes, and deprecations, developers removed the historically notable CGI module from core. First added to core in 1997 in recognition of its critical role in enabling web development, it’s been supplanted by better alternatives on CPAN.

Perl 5.24 (2016)

Perl 5.20’s postfix dereferencing was no longer experimental, and developers removed both lexical $_ and autodereferencing on calls to push, pop, shift, unshift, splice, keys, values, and each.

Perl 5.26 (2017)

The incor­po­ra­tion of exper­i­men­tal fea­tures con­tin­ued, with lex­i­cal sub­rou­tines mov­ing into full sup­port. I like the added read­abil­i­ty enhance­ments, though: indent­ed here-​documents; the /xx reg­u­lar expres­sion mod­i­fi­er for tabs and spaces in char­ac­ter class­es; and @{^CAPTURE}, %{^CAPTURE}, and %{^CAPTURE_ALL} for reg­exp match­es with a lit­tle more self-documentation.
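A brief sketch of those three readability features follows. The usage text, the pattern, and the string being matched are all invented for the example.

```perl
use v5.26;
use warnings;

sub usage {
    # Indented here-document: <<~ strips the common leading whitespace,
    # so the text can line up with the surrounding code.
    return <<~'END';
        Usage: report [FILE ...]
          Reads Apache logs and counts hits per week.
        END
}

# /xx also ignores spaces and tabs inside bracketed character classes
my $re = qr/ ( [ a-f 0-9 ]+ ) - ( \d+ ) /xx;

# @{^CAPTURE} holds every capture group from the last successful match
my @captures;
@captures = @{^CAPTURE} if 'cafe42-2021' =~ $re;

say "@captures";
print usage();
```

The second line of the usage text keeps its extra two spaces of indentation because only the whitespace common to every line is removed.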

Perl 5.28 (2018)

Experimental sub­rou­tine sig­na­ture and attribute order­ing flipped back to its Perl 5.20 sequence of attributes-​then-​signature. Bit of a roller­coast­er ride on this one. You could do worse than using some­thing like Type::Params until this set­tles and get a wide vari­ety of type con­straints in the bargain.

Perl 5.30 (2019)

Pour one out for AWK and Fortran programmers migrating to Perl: the $[ variable for setting the lower bound of arrays could no longer be set to anything other than zero. This had a long deprecation cycle starting in Perl 5.12.

Perl 5.32 (2020)

In 2020 Perl’s devel­op­ment moved to GitHub. And once again, I’m going to high­light read­abil­i­ty enhance­ments: the exper­i­men­tal isa oper­a­tor could be used to say:

if ( $obj isa Some::Class ) { ... }

instead of

use Scalar::Util 'blessed';
if ( blessed($obj) and $obj->isa('Some::Class') ) { ... }

You could also chain com­par­i­son oper­a­tors, lead­ing to the more math­e­mat­i­cal­ly con­cise if ( $x < $y <= $z ) {...} rather than if ( $x < $y and $y <= $z ) {...}.
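A tiny runnable sketch of a chained comparison, using a made-up in_range helper:

```perl
use v5.32;
use warnings;

sub in_range {
    my ( $low, $value, $high ) = @_;

    # Behaves like $low <= $value and $value < $high
    return $low <= $value < $high;
}

say in_range( 1, 5,  10 ) ? 'inside' : 'outside';
say in_range( 1, 10, 10 ) ? 'inside' : 'outside';
```

On Perls older than 5.32 the chained form is a syntax error, so the use v5.32 line doubles as documentation of that requirement.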

Perl 5.34 (2021)

Finally, we come to last week’s release and its introduction of experimental try/catch exception handling syntax. If you need to support earlier versions of Perl back to 5.14, you can use Feature::Compat::Try. Earlier this year I interviewed the feature and module’s author, Paul “LeoNerd” Evans, for Perl.com. This year also marked the debut of Perl’s new governance model with the appointment of a Core Team and a three-member Steering Council.

What are some of your favorite Perl improve­ments over the years? Check out the perlhist doc­u­ment for a detailed chronol­o­gy and refresh­er with the var­i­ous perldelta pages and leave me a com­ment below.