Friday, December 17, 2021, marked the thirty-fourth birthday of the Perl programming language, and coincidentally this year saw the release of version 5.34. There are plenty of Perl developers out there who haven’t kept up with recent (and not-so-recent) improvements to the language and its ecosystem, so I thought I might list a batch. (You may have seen some of these before in May’s post “Perl can do that now!”)

The feature pragma

Perl v5.10 was released in December 2007, and with it came feature, a way of enabling new syn­tax with­out break­ing back­ward com­pat­i­bil­i­ty. You can enable indi­vid­ual fea­tures by name (e.g., use feature qw(say fc); for the say and fc key­words), or by using a fea­ture bun­dle based on the Perl ver­sion that intro­duced them. For exam­ple, the following:

use feature ':5.34';

…gives you the equiv­a­lent of:

use feature qw(bareword_filehandles bitwise current_sub evalbytes fc indirect multidimensional postderef_qq say state switch unicode_eval unicode_strings);

Boy, that’s a mouth­ful. Feature bun­dles are good. The cor­re­spond­ing bun­dle also gets implic­it­ly loaded if you spec­i­fy a min­i­mum required Perl ver­sion, e.g., with use v5.32;. If you use v5.12; or high­er, strict mode is enabled for free. So just say:

use v5.34;

And last­ly, one-​liners can use the -E switch instead of -e to enable all fea­tures for that ver­sion of Perl, so you can say the fol­low­ing on the com­mand line:

perl -E 'say "Hello world!"'

Instead of:

perl -e 'print "Hello world!\n"'

Which is great when you’re try­ing to save some typing.

The experimental pragma

Sometimes new Perl fea­tures need to be dri­ven a cou­ple of releas­es around the block before their behav­ior set­tles. Those exper­i­ments are doc­u­ment­ed in the per­l­ex­per­i­ment page, and usu­al­ly, you need both a use feature (see above) and no warnings state­ment to safe­ly enable them. Or you can sim­ply pass a list to use experimental of the fea­tures you want, e.g.:

use experimental qw(isa postderef signatures);

Ever-​expanding warnings categories

March 2000 saw the release of Perl 5.6, and with it, the expansion of the -w command-line switch to a system of fine-grained controls for warning against “dubious constructs” that can be turned on and off depending on the lexical scope. What started as 26 main and 20 subcategories has expanded into 31 main and 43 subcategories, including warnings for the aforementioned experimental features.

As the relevant Perl::Critic policy says, “Using warnings, and paying attention to what they say, is probably the single most effective way to improve the quality of your code.” If you must violate warnings (perhaps because you’re rehabilitating some legacy code), you can isolate such violations to a small scope and individual categories. Check out the strictures module on CPAN if you’d like to go further and make a safe subset of these categories fatal during development.
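As a minimal sketch of isolating a violation, the following silences only the uninitialized category, and only inside one sub. The sub name and data are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical legacy helper: joins fields, some of which may be undef
sub join_fields {
    my @fields = @_;

    # Silence only "uninitialized" warnings, and only within this sub
    no warnings 'uninitialized';
    return join ',', @fields;
}

my $line = join_fields( 'a', undef, 'c' );
print "$line\n";
```

Everything outside join_fields still gets the full set of warnings.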

Document other recently-​introduced syntax with Syntax::Construct

Not every new bit of Perl syntax is enabled with a feature guard. For the rest, there’s E. Choroba’s Syntax::Construct module on CPAN. Rather than having to remember which version of Perl introduced what, Syntax::Construct lets you declare only what you use and provides a helpful error message if someone tries to run your code on an older unsupported version. Between it and the feature pragma, you can prevent many head-scratching moments and give your users a chance to either upgrade or work around the problem.

Make built-​in functions throw exceptions with autodie

Many of Perl’s built-​in func­tions only return false on fail­ure, requir­ing the devel­op­er to check every time whether a file can be opened or a system com­mand exe­cut­ed. The lex­i­cal autodie prag­ma replaces them with ver­sions that raise an excep­tion with an object that can be inter­ro­gat­ed for fur­ther details. No mat­ter how many func­tions or meth­ods deep a prob­lem occurs, you can choose to catch it and respond appro­pri­ate­ly. This leads us to…
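Here is a rough sketch of what that looks like in practice, using plain block eval to catch the exception. The file path is an assumption (it just needs to not exist); the methods called on the exception object are from autodie::exception.

```perl
use strict;
use warnings;

my $path = '/nonexistent/dir/example.txt';    # assumed not to exist

my $error;
{
    use autodie;    # lexically scoped: only affects this block

    # No "or die ..." needed; a failed open now throws an exception
    eval { open my $fh, '<', $path; 1 } or $error = $@;
}

if ( ref $error and $error->isa('autodie::exception') ) {
    # Interrogate the exception object for details
    printf "%s failed at line %d: %s\n",
        $error->function, $error->line, $error->errno;
}
```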

try/​catch exception handling and Feature::Compat::Try

This year’s Perl v5.34 release introduced experimental try/catch syntax for exception handling that should look more familiar to users of other languages while handling the issues surrounding using block eval and testing of the special $@ variable. If you need to remain compatible with older versions of Perl (back to v5.14), just use the Feature::Compat::Try module from CPAN to automatically select either v5.34’s native try/catch or a subset of the functionality provided by Syntax::Keyword::Try.
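A small sketch of the native syntax, assuming Perl v5.34 or later; the risky sub is a made-up stand-in for anything that might die:

```perl
use v5.34;
use warnings;
use feature 'try';
no warnings 'experimental::try';

sub risky { die "something broke\n" }    # hypothetical failing routine

my $message;
try {
    risky();
    $message = 'all good';
}
catch ($e) {
    chomp $e;                      # $e holds what would have been in $@
    $message = "caught: $e";
}
say $message;
```

On older Perls, swapping the two feature lines for use Feature::Compat::Try; should leave the rest unchanged.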

Pluggable keywords

The abovementioned Syntax::Keyword::Try was made possible by the introduction of a pluggable keyword mechanism in 2010’s Perl v5.12. So were the Future::AsyncAwait asynchronous programming library and the Object::Pad testbed for new object-oriented Perl syntax. If you’re handy with C and Perl’s XS glue language, check out Paul “LeoNerd” Evans’ XS::Parse::Keyword module to get a leg up on developing your own syntax module.

Define packages with versions and blocks

Perl v5.12 also helped reduce clutter by enabling a package namespace declaration to also include a version number, instead of requiring a separate our $VERSION = ...; statement. Perl v5.14 further refined packages so they can be specified in code blocks, so a namespace declaration can be the same as a lexical scope. Putting the two together gives you:

package Local::NewHotness v1.2.3 {
    ...
}

Instead of:

{
    package Local::OldAndBusted;
    use version 0.77; our $VERSION = version->declare("v1.2.3");
    ...
}

I know which I’d rather do. (Though you may want to also use Syntax::Construct qw(package-version package-block); to help along with old­er instal­la­tions as described above.)

The // defined-​or operator

This is an easy win from Perl v5.10:

defined $foo ? $foo : $bar  # replace this
$foo // $bar                # with this

And:

$foo = $bar unless defined $foo  # replace this
$foo //= $bar                    # with this

Perfect for assign­ing defaults to variables.

state variables only initialize once

Speaking of vari­ables, ever want one to keep its old val­ue the next time a scope is entered, like in a sub? Declare it with state instead of my. Before Perl v5.10, you need­ed to use a clo­sure instead.
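For instance, a counter that remembers its value between calls might be sketched like this (next_id is a made-up name):

```perl
use v5.10;    # enables the state feature
use strict;
use warnings;

sub next_id {
    state $id = 0;    # initialized only on the first call
    return ++$id;
}

my @ids = map { next_id() } 1 .. 3;
say "@ids";    # prints "1 2 3"
```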

Save some typing with say

Perl v5.10’s bumper crop of enhance­ments also includ­ed the say func­tion, which han­dles the com­mon use case of printing a string or list of strings with a new­line. It’s less noise in your code and saves you four char­ac­ters. What’s not to love?

Note unimplemented code with ...

The ... ellipsis statement (colloquially “yada-yada”) gives you an easy placeholder for yet-to-be-implemented code. It parses OK but will throw an exception if executed. Hopefully, your test coverage (or at least static analysis) will catch it before your users do.
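A quick sketch of both halves of that behavior, with a hypothetical stub named export_report:

```perl
use strict;
use warnings;

sub export_report { ... }    # hypothetical stub; compiles without complaint

# Calling the stub throws an "Unimplemented" exception
my $ok    = eval { export_report(); 1 };
my $error = $@;

print "Not done yet: $error" unless $ok;
```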

Loop and enumerate arrays with each, keys, and values

The each, keys, and values func­tions have always been able to oper­ate on hash­es. Perl v5.12 and above make them work on arrays, too. The lat­ter two are main­ly for con­sis­ten­cy, but you can use each to iter­ate over an array’s indices and val­ues at the same time:

while (my ($index, $value) = each @array) {
    ...
}

This can be prob­lem­at­ic in non-​trivial loops, but I’ve found it help­ful in quick scripts and one-liners.

delete local hash (and array) entries

Ever needed to delete an entry from a hash (e.g., an environment variable from %ENV or a signal handler from %SIG) just inside a block? Perl v5.12 lets you do that with delete local.
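A minimal sketch, using a made-up environment variable named EXAMPLE_TOKEN:

```perl
use v5.12;
use warnings;

$ENV{EXAMPLE_TOKEN} = 'secret';    # hypothetical variable for illustration

my $inside;
{
    delete local $ENV{EXAMPLE_TOKEN};    # gone only within this block
    $inside = exists $ENV{EXAMPLE_TOKEN} ? 'present' : 'absent';
}
my $outside = exists $ENV{EXAMPLE_TOKEN} ? 'present' : 'absent';

say "inside: $inside, outside: $outside";
```

The entry comes back with its old value as soon as the block exits, which is handy when spawning a child process that shouldn’t see it.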

Paired hash slices

Jumping for­ward to 2014’s Perl v5.20, the new %foo{'bar', 'baz'} syn­tax enables you to slice a sub­set of a hash with its keys and val­ues intact. Very help­ful for cherry-​picking or aggre­gat­ing many hash­es into one. For example:

my %args = (
    verbose => 1,
    name    => 'Mark',
    extra   => 'pizza',
);
# don't frob the pizza
$my_object->frob( %args{ qw(verbose name) } );

Paired array slices

Not to be left out, you can also slice arrays in the same way, in this case return­ing indices and values:

my @letters = 'a' .. 'z';
my @subset_kv = %letters[15, 4, 17, 11];
# @subset_kv is now (15, 'p', 4, 'e', 17, 'r', 11, 'l')

More readable dereferencing

Perl v5.20 intro­duced and v5.24 de-​experimentalized a more read­able post­fix deref­er­enc­ing syn­tax for nav­i­gat­ing nest­ed data struc­tures. Instead of using {braces} or smoosh­ing sig­ils to the left of iden­ti­fiers, you can use a post­fixed sigil-and-star:

push @$array_ref,    1, 2, 3;  # noisy
push @{$array_ref},  1, 2, 3;  # a little easier
push $array_ref->@*, 1, 2, 3;  # read from left to right

So much of web devel­op­ment is sling­ing around and pick­ing apart com­pli­cat­ed data struc­tures via JSON, so I wel­come any­thing like this to reduce the cog­ni­tive load.

when as a statement modifier

Starting in Perl v5.12, you can use the experimental switch feature’s when keyword as a postfix modifier. For example:

for ($foo) {
    $a =  1 when /^abc/;
    $a = 42 when /^dna/;
    ...
}

But I don’t recommend when, given, or given’s smartmatch operations as they were retconned as experiments in 2013’s Perl v5.18 and have remained so due to their tricky behavior. I wrote about some alternatives using stable syntax back in February.

Simple class inheritance with use parent

Sometimes in old­er object-​oriented Perl code, you’ll see use base as a prag­ma to estab­lish inher­i­tance from anoth­er class. Older still is the direct manip­u­la­tion of the package’s spe­cial @ISA array. In most cas­es, both should be avoid­ed in favor of use parent, which was added to core in Perl v5.10.1.

Mind you, if you’re following the Perl object-oriented tutorial’s advice and have selected an OO system from CPAN, use its subclassing mechanism if it has one. Moose, Moo, and Class::Accessor’s “antlers” mode all provide an extends function; Object::Pad provides an :isa attribute on its class keyword.
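For plain core Perl without an OO framework, a use parent sketch might look like the following. Both class names are invented for illustration, and -norequire is only needed because the parent lives in the same file.

```perl
use strict;
use warnings;

package Local::Animal {
    sub new   { my ( $class, %args ) = @_; return bless {%args}, $class }
    sub speak { my $self = shift; return ref($self) . ' makes a sound' }
}

package Local::Dog {
    # -norequire because the parent is defined above, not in its own .pm file
    use parent -norequire, 'Local::Animal';

    sub speak { my $self = shift; return $self->SUPER::speak() . ': Woof!' }
}

my $sound = Local::Dog->new( name => 'Rex' )->speak;
print "$sound\n";
```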

Test for class membership with the isa operator

As an alter­na­tive to the isa() method pro­vid­ed to all Perl objects, Perl v5.32 intro­duced the exper­i­men­tal isa infix oper­a­tor:

$my_object->isa('Local::MyClass')
# or
$my_object isa Local::MyClass

The latter can take either a bareword class name or string expression, but more importantly, it’s safer as it also returns false if the left argument is undefined or isn’t a blessed object reference. The older isa() method will throw an exception in the former case and might return true if called as a class method when $my_object is actually a string of a class name that’s the same as or inherits from isa()’s argument.
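A sketch of that difference, assuming Perl v5.32 or later and a throwaway class defined inline:

```perl
use v5.32;
use warnings;
use feature 'isa';
no warnings 'experimental::isa';

package Local::MyClass { sub new { return bless {}, shift } }

# A real object, undef, a class-name string, and an unblessed reference
my @candidates = ( Local::MyClass->new, undef, 'Local::MyClass', {} );

my @results;
for my $candidate (@candidates) {
    # Only the blessed object passes; the rest are false, with no exception
    push @results, ( $candidate isa Local::MyClass ) ? 'yes' : 'no';
}
say "@results";
```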

Lexical subroutines

Introduced in Perl v5.18 and de-​experimentalized in 2017’s Perl v5.26, you can now pre­cede sub dec­la­ra­tions with my, state, or our. One use of the first two is tru­ly pri­vate func­tions and meth­ods, as described in this 2018 Dave Jacoby blog and as part of Neil Bowers’ 2014 sur­vey of pri­vate func­tion techniques.
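As a rough sketch of the private-function use case (the package and sub names are invented, and this assumes Perl v5.26 or later):

```perl
use v5.26;
use warnings;

package Local::Greeter {
    # Visible only inside this block; never entered into the symbol table
    my sub _decorate { my $text = shift; return "*** $text ***" }

    sub greet { my ( $class, $name ) = @_; return _decorate("Hello, $name") }
}

my $greeting   = Local::Greeter->greet('world');
my $visibility = Local::Greeter->can('_decorate') ? 'exposed' : 'private';

say $greeting;
say "_decorate is $visibility";
```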

Subroutine signatures

I’ve writ­ten and pre­sent­ed exten­sive­ly about sig­na­tures and alter­na­tives over the past year, so I won’t repeat that here. I’ll just add that the Perl 5 Porters devel­op­ment mail­ing list has been mak­ing a con­cert­ed effort over the past month to hash out the remain­ing issues towards ren­der­ing this fea­ture non-​experimental. The pop­u­lar Mojolicious real-​time web frame­work also pro­vides a short­cut for enabling sig­na­tures and uses them exten­sive­ly in examples.

Indented here-​documents with <<~

Perl has had shell-style “here-document” syntax for embedding multi-line strings of quoted text for a long time. Starting with Perl v5.26, you can precede the delimiting string with a ~ character and Perl will both allow the ending delimiter to be indented as well as strip indentation from the embedded text. This allows for much more readable embedded code such as runs of HTML and SQL. For example:

if ($do_query) {
    my $rows_deleted = $dbh->do(<<~'END_SQL', undef, 42);
      DELETE FROM table
      WHERE status = ?
      END_SQL
    say "$rows_deleted rows were deleted."; 
}

More readable chained comparisons

When I learned math in school, my teach­ers and text­books would often describe mul­ti­ple com­par­isons and inequal­i­ties as a sin­gle expres­sion. Unfortunately, when it came time to learn pro­gram­ming every com­put­er lan­guage I saw required them to be bro­ken up with a series of and (or &&) oper­a­tors. With Perl v5.32, this is no more:

if ( $x < $y && $y <= $z ) { ... }  # old way
if ( $x < $y <= $z )       { ... }  # new way

It’s more con­cise, less noisy, and more like what reg­u­lar math looks like.

Self-​documenting named regular expression captures

Perl’s expres­sive reg­u­lar expres­sion match­ing and text-​processing prowess are leg­endary, although overuse and poor use of read­abil­i­ty enhance­ments often turn peo­ple away from them (and Perl in gen­er­al). We often use reg­ex­ps for extract­ing data from a matched pat­tern. For example:

if ( /Time: (..):(..):(..)/ ) {  # parse out values
    say "$1 hours, $2 minutes, $3 seconds";
}

Named cap­ture groups, intro­duced in Perl v5.10, make both the pat­tern more obvi­ous and retrieval of its data less cryptic:

if ( /Time: (?<hours>..):(?<minutes>..):(?<seconds>..)/ ) {
    say "$+{hours} hours, $+{minutes} minutes, $+{seconds} seconds";
}

More readable regexp character classes

The /x reg­u­lar expres­sion mod­i­fi­er already enables bet­ter read­abil­i­ty by telling the pars­er to ignore most white­space, allow­ing you to break up com­pli­cat­ed pat­terns into spaced-​out groups and mul­ti­ple lines with code com­ments. With Perl v5.26 you can spec­i­fy /xx to also ignore spaces and tabs inside [brack­et­ed] char­ac­ter class­es, turn­ing this:

/[d-eg-i3-7]/
/[!@"#$%^&*()=?<>']/

…into this:

/ [d-e g-i 3-7]/xx
/[ ! @ " # $ % ^ & * () = ? <> ' ]/xx

Set default regexp flags with the re pragma

Beginning with Perl v5.14, writ­ing use re '/xms'; (or any com­bi­na­tion of reg­u­lar expres­sion mod­i­fi­er flags) will turn on those flags until the end of that lex­i­cal scope, sav­ing you the trou­ble of remem­ber­ing them every time.
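A small sketch with just the /x flag and a made-up input string:

```perl
use v5.14;
use warnings;

my $summary = 'no match';
{
    use re '/x';    # every regexp in this block ignores pattern whitespace

    my $stamp = 'Time: 12:34:56';    # sample input
    if ( $stamp =~ / Time: \s+ (\d\d) : (\d\d) : (\d\d) / ) {
        $summary = "$1 hours, $2 minutes, $3 seconds";
    }
}
say $summary;
```

Outside the block, patterns go back to treating spaces literally.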

Non-​destructive substitution with s///r and tr///r

The s/// sub­sti­tu­tion and tr/// translit­er­a­tion oper­a­tors typ­i­cal­ly change their input direct­ly, often in con­junc­tion with the =~ bind­ing oper­a­tor:

s/foo/bar/;  # changes the first foo to bar in $_
$baz =~ s/foo/bar/;  # the same but in $baz

But what if you want to leave the orig­i­nal untouched, such as when pro­cess­ing an array of strings with a map? With Perl v5.14 and above, add the /r flag, which makes the sub­sti­tu­tion on a copy and returns the result:

my @changed = map { s/foo/bar/r } @original;
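The same flag works for transliteration. A quick sketch with a made-up string:

```perl
use v5.14;
use warnings;

my $original = 'hello, world';
my $shouted  = $original =~ tr/a-z/A-Z/r;    # returns a changed copy

say $original;    # still lowercase
say $shouted;
```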

Unicode case-​folding with fc for better string comparisons

Unicode and char­ac­ter encod­ing in gen­er­al are com­pli­cat­ed beasts. Perl has han­dled Unicode since v5.6 and has kept pace with fix­es and sup­port for updat­ed stan­dards in the inter­ven­ing decades. If you need to test if two strings are equal regard­less of case, use the fc func­tion intro­duced in Perl v5.16.
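To see why lc alone isn’t enough, here is a sketch comparing the two on the German sharp s, written as an escape so the source file’s encoding doesn’t matter:

```perl
use v5.16;    # enables the fc and unicode_strings features
use warnings;

my $german = "Stra\x{DF}e";    # "Strasse" spelled with a sharp s
my $upper  = 'STRASSE';

# lc leaves the sharp s alone; fc folds it to "ss"
my $lc_result = lc($german) eq lc($upper) ? 'equal' : 'different';
my $fc_result = fc($german) eq fc($upper) ? 'equal' : 'different';

say "lc: $lc_result, fc: $fc_result";
```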

Safer processing of file arguments with <<>>

The <> null filehandle or “diamond operator” is often used in while loops to process input per line coming either from standard input (e.g., piped from another program) or from a list of files on the command line. Unfortunately, it uses a form of Perl’s open function that interprets special characters such as pipes (|) that would allow it to insecurely run external commands. Using the <<>> “double diamond” operator introduced in Perl v5.22 forces open to treat all command-line arguments as file names only. For older Perls, the perlop documentation recommends the ARGV::readonly CPAN module.

Safer loading of Perl libraries and modules from @INC

Perl v5.26 removed the ability for all programs to load modules by default from the current directory, closing a security vulnerability originally identified and fixed as CVE-2016-1238 in previous versions’ included scripts. If your code relied on this unsafe behavior, the v5.26 release notes include steps on how to adapt.

HTTP::Tiny simple HTTP/1.1 client included

To boot­strap access to CPAN on the web in the pos­si­ble absence of exter­nal tools like curl or wget, Perl v5.14 began includ­ing the HTTP::Tiny mod­ule. You can also use it in your pro­grams if you need a sim­ple web client with no dependencies.

Test2: The next generation of Perl testing frameworks

Forked and refac­tored from the ven­er­a­ble Test::Builder (the basis for the Test::More library that many are famil­iar with), Test2 was includ­ed in the core mod­ule library begin­ning with Perl v5.26. I’ve exper­i­ment­ed recent­ly with using the Test2::Suite CPAN library instead of Test::More and it looks pret­ty good. I’m also intrigued by Test2::Harness’ sup­port for thread­ing, fork­ing, and pre­load­ing mod­ules to reduce test run times.

Task::Kensho: Where to start for recommended Perl modules

This last item may not be includ­ed when you install Perl, but it’s where I turn for a col­lec­tion of well-​regarded CPAN mod­ules for accom­plish­ing a wide vari­ety of com­mon tasks span­ning from asyn­chro­nous pro­gram­ming to XML. Use it as a start­ing point or inter­ac­tive­ly select the mix of libraries appro­pri­ate to your project.


And there you have it: a selec­tion of 34 fea­tures, enhance­ments, and improve­ments for the first 34 years of Perl. What’s your favorite? Did I miss any­thing? Let me know in the comments.


A recent Lobsters post laud­ing the virtues of AWK remind­ed me that although the lan­guage is pow­er­ful and lightning-​fast, I usu­al­ly find myself exceed­ing its capa­bil­i­ties and reach­ing for Perl instead. One such appli­ca­tion is ana­lyz­ing volu­mi­nous log files such as the ones gen­er­at­ed by this blog. Yes, WordPress has stats, but I’ve nev­er let rein­ven­tion of the wheel get in the way of a good pro­gram­ming exercise.

So I whipped this script up on Sunday night while watch­ing RuPaul’s Drag Race reruns. It pars­es my Apache web serv­er log files and reports on hits from week to week.

#!/usr/bin/env perl

use strict;
use warnings;
use Syntax::Construct 'operator-double-diamond';
use Regexp::Log::Common;
use DateTime::Format::HTTP;
use List::Util 1.33 'any';
use Number::Format 'format_number';

my $parser = Regexp::Log::Common->new(
    format  => ':extended',
    capture => [qw<req ts status>],
);
my @fields      = $parser->capture;
my $compiled_re = $parser->regexp;

my @skip_uri_patterns = qw<
  ^/+robots\.txt
  [-\w]*sitemap[-\w]*\.xml
  ^/+wp-
  /feed/?$
  ^/+\?rest_route=
>;

my ( %count, %week_of );
while ( <<>> ) {
    my %log;
    @log{@fields} = /$compiled_re/;

    # only interested in successful or cached requests
    next unless $log{status} =~ /^2/ or $log{status} == 304;

    my ( $method, $uri, $protocol ) = split ' ', $log{req};
    next unless $method eq 'GET';
    next if any { $uri =~ $_ } @skip_uri_patterns;

    my $dt  = DateTime::Format::HTTP->parse_datetime( $log{ts} );
    my $key = sprintf '%u-%02u', $dt->week;

    # get first date of each week
    $week_of{$key} ||= $dt->date;
    $count{$key}++;
}

printf "Week of %s: % 10s\n", $week_of{$_}, format_number( $count{$_} )
  for sort keys %count;

Here’s some sam­ple output:

Week of 2021-07-31:      2,672
Week of 2021-08-02:     16,222
Week of 2021-08-09:     12,609
Week of 2021-08-16:     17,714
Week of 2021-08-23:     14,462
Week of 2021-08-30:     11,758
Week of 2021-09-06:     14,811
Week of 2021-09-13:        407

I first started prototyping this on the command line as if it were an awk one-liner by using the perl -n and -a flags. The former wraps code in a while loop over the <> “diamond operator”, processing each line from standard input or files passed as arguments. The latter splits the fields of the line into an array named @F. It looked something like this while I was listing URIs (locations on the website):

gunzip -c ~/logs/phoenixtrap.com-ssl_log-*.gz | \
perl -anE 'say $F[6]'
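For anyone unfamiliar with those switches, here’s roughly what they expand to, sketched as a standalone script. The log line is made up and read from an in-memory filehandle rather than standard input so the example is self-contained.

```perl
use v5.10;
use strict;
use warnings;

# One made-up log line standing in for the piped-in log files
my $sample = qq{203.0.113.7 - - [02/Aug/2021:10:00:00 +0000] }
           . qq{"GET /about/ HTTP/1.1" 200 1234\n};
open my $in, '<', \$sample or die "Can't open in-memory file: $!";

my @uris;
while ( my $line = <$in> ) {      # -n: loop over each line of input
    my @F = split ' ', $line;     # -a: split the line on whitespace into @F
    push @uris, $F[6];            # the one-liner's body, collecting not printing
}
say for @uris;
```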

But once I real­ized I’d need to fil­ter out a bunch of URI pat­terns and do some aggre­ga­tion by date, I turned it into a script and turned to CPAN.

There I found Regexp::Log::Common and DateTime::Format::HTTP, which let me pull apart the Apache log for­mat and its time­stamp strings with­out hav­ing to write even more com­pli­cat­ed reg­u­lar expres­sions myself. (As not­ed above, this was already a wheel-​reinvention exer­cise; no need to com­pound that further.)

Regexp::Log::Common builds a com­piled reg­u­lar expres­sion based on the log for­mat and fields you’re inter­est­ed in, so that’s the con­struc­tor on lines 11 through 14. The expres­sion then returns those fields as a list, which I’m assign­ing to a hash slice with those field names as keys in line 29. I then skip over requests that aren’t suc­cess­ful or brows­er cache hits, skip over requests that don’t GET web pages or oth­er assets (e.g., POSTs to forms or updat­ing oth­er resources), and skip over the URI pat­terns men­tioned earlier.
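The hash slice assignment is the densest part, so here it is in isolation as a sketch. A hand-written pattern and a simplified made-up log line stand in for what Regexp::Log::Common would generate.

```perl
use strict;
use warnings;

my @fields = qw(host status);      # hypothetical field names
my $line   = '203.0.113.7 200';    # made-up, simplified log line

my %log;

# In list context a match returns its captures, one per field name
@log{@fields} = $line =~ /^(\S+) \s+ (\d{3})$/x;

print "$log{host} returned $log{status}\n";
```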

(Those pat­terns are worth a men­tion: they include the robots.txt and sitemap XML files used by search engine index­ers, WordPress admin­is­tra­tion pages, files used by RSS news­read­ers sub­scribed to my blog, and routes used by the Jetpack WordPress add-​on. If you’re adapt­ing this for your site you might need to cus­tomize this list based on what soft­ware you use to run it.)

Lines 38 and 39 parse the timestamp from the log into a DateTime object using DateTime::Format::HTTP and then build the key used to store the per-week hit count. The last lines of the loop then grab the first date of each new week (assuming the log is in chronological order) and increment the count. Once finished, lines 46 and 47 provide a report sorted by week, displaying it as a friendly “Week of date” and the hit counts aligned to the right with printf. Number::Format’s format_number function displays the totals with thousands separators.

Update: After this was initially published, astute reader Chris McGowan noted that I had a bug where $log{status} was assigned the value 304 with the = operator rather than compared with ==. He also suggested I use the double-diamond <<>> operator introduced in Perl v5.22.0 to avoid maliciously-named files. Thanks, Chris!

Room for improvement

DateTime is a very pow­er­ful mod­ule but this comes at a price of speed and mem­o­ry. Something sim­pler like Date::WeekNumber should yield per­for­mance improve­ments, espe­cial­ly as my logs grow (here’s hop­ing). It requires a bit more man­u­al mas­sag­ing of the log dates to con­vert them into some­thing the mod­ule can use, though:

#!/usr/bin/env perl

use strict;
use warnings;
use Syntax::Construct qw<
  operator-double-diamond
  regex-named-capture-group
>;
use Regexp::Log::Common;
use Date::WeekNumber 'iso_week_number';
use List::Util 1.33 'any';
use Number::Format 'format_number';

my $parser = Regexp::Log::Common->new(
    format  => ':extended',
    capture => [qw<req ts status>],
);
my @fields      = $parser->capture;
my $compiled_re = $parser->regexp;

my @skip_uri_patterns = qw<
  ^/+robots\.txt
  [-\w]*sitemap[-\w]*\.xml
  ^/+wp-
  /feed/?$
  ^/+\?rest_route=
>;

my %month = (
    Jan => '01',
    Feb => '02',
    Mar => '03',
    Apr => '04',
    May => '05',
    Jun => '06',
    Jul => '07',
    Aug => '08',
    Sep => '09',
    Oct => '10',
    Nov => '11',
    Dec => '12',
);

my ( %count, %week_of );
while ( <<>> ) {
    my %log;
    @log{@fields} = /$compiled_re/;

    # only interested in successful or cached requests
    next unless $log{status} =~ /^2/ or $log{status} == 304;

    my ( $method, $uri, $protocol ) = split ' ', $log{req};
    next unless $method eq 'GET';
    next if any { $uri =~ $_ } @skip_uri_patterns;

    # convert log timestamp to YYYY-MM-DD
    # for Date::WeekNumber
    $log{ts} =~ m!^
      (?<day>\d\d) /
      (?<month>...) /
      (?<year>\d{4}) : !x;
    my $date = "$+{year}-$month{ $+{month} }-$+{day}";

    my $week = iso_week_number($date);
    $week_of{$week} ||= $date;
    $count{$week}++;
}

printf "Week of %s: % 10s\n", $week_of{$_}, format_number( $count{$_} )
  for sort keys %count;

It looks almost the same as the first ver­sion, with the addi­tion of a hash to con­vert month names to num­bers and the actu­al con­ver­sion (using named reg­u­lar expres­sion cap­ture groups for read­abil­i­ty, using Syntax::Construct to check for that fea­ture). On my serv­er, this results in a ten- to eleven-​second sav­ings when pro­cess­ing two months of com­pressed logs.

What’s next? Pretty graphs? Drilling down to spe­cif­ic blog posts? Database stor­age for fur­ther queries and analy­sis? Perl and CPAN make it pos­si­ble to go far beyond what you can do with AWK. What would you add or change? Let me know in the comments.

Last week I explored using the Inline::Perl5 module to port a short Perl script to Raku while still keeping its Perl dependencies. Over at the Dev.to community, Dave Cross pointed out that I could get a bit more bang for my buck by letting his Feed::Find do the heavy lifting instead of WWW::Mechanize’s more general-purpose parsing.

A lit­tle more MetaCPAN inves­ti­ga­tion yield­ed XML::Feed, also main­tained by Dave, and it had the added ben­e­fit of obvi­at­ing my need for XML::RSS by not only dis­cov­er­ing feeds but also retriev­ing and pars­ing them. It also han­dles the Atom syn­di­ca­tion for­mat as well as RSS (hi dax­im!). Putting it all togeth­er pro­duces the fol­low­ing much short­er and clear­er Perl:

#!/usr/bin/env perl

use v5.12; # for strict and say
use warnings;
use XML::Feed;
use URI;

my $url = shift @ARGV || 'https://phoenixtrap.com';

my @feeds = XML::Feed->find_feeds($url);
my $feed  = XML::Feed->parse( URI->new( $feeds[0] ) )
    or die "Couldn't find a feed at $url\n";

binmode STDOUT, ':encoding(UTF-8)';
say $_->title, "\t", $_->link for $feed->entries;

And here’s the Raku version:

#!/usr/bin/env raku

use XML::Feed:from<Perl5>;
use URI:from<Perl5>;

sub MAIN($url = 'https://phoenixtrap.com') {
    my @feeds = XML::Feed.find_feeds($url);
    my $feed  = XML::Feed.parse( URI.new( @feeds.first ) )
        or exit note "Couldn't find a feed at $url";

    put .title, "\t", .link for $feed.entries;
}

It’s even closer to Perl now, though it’s using the first routine rather than subscripting the @feeds array and leaving off the $_ variable name when calling methods on it; less punctuation noise often aids readability. I also took a suggested exit idiom from Raku developer Elizabeth Mattijsen on Reddit to simplify the contortions I was going through to exit with a simple message and error code.

There are a cou­ple of lessons here:

  • A lit­tle more effort in mod­ule shop­ping pays div­i­dends in sim­pler code.
  • Get feed­back from far and wide to help improve your code. If it’s for work and you can’t release as open-​source, make sure your code review process cov­ers read­abil­i­ty and maintainability.

The Perl and Raku pro­gram­ming lan­guages have a com­pli­cat­ed his­to­ry togeth­er. The lat­ter was envi­sioned in the year 2000 as Perl 6, a com­plete redesign and rewrite of Perl to solve its prob­lems of dif­fi­cult main­te­nance and the bur­den of then-​13 years of back­ward com­pat­i­bil­i­ty. Unfortunately, the devel­op­ment effort towards a first major release dragged on for ten years, and some devel­op­ers began to believe the delay con­tributed to the decline of Perl’s market- and mind­share among pro­gram­ming languages.

In the intervening years work continued on Perl 5, and eventually, Perl 6 was positioned as “a sister language, part of the Perl family, not intended as a replacement for Perl.” Two years ago it was renamed Raku to better indicate it as a different project.

Although the two languages aren’t source-compatible, the Inline::Perl5 module does enable Raku developers to run Perl code and use Perl modules within Raku. You can even subclass Perl classes in Raku and call Raku methods from Perl code. I hadn’t realized until recently that the Perl support was so strong in Raku despite them being so different, and so I thought I’d take the opportunity to write some sample code in both languages to better understand the Raku way of doing things.

Rather than a simple “Hello World” program, I decided to write a simple syndicated news reader. The Raku modules directory didn’t appear to have anything comparable to Perl’s WWW::Mechanize and XML::RSS modules, so this seemed like a great way to test Perl-Raku interoperability.

Perl Feed Finder

First, the Perl script. I want­ed it smart enough to either direct­ly fetch a news feed or find it on a site’s HTML page.

#!/usr/bin/env perl
use v5.24;    # for strict, say, and postfix dereferencing
use warnings;
use WWW::Mechanize;
use XML::RSS;
use List::Util 1.33 qw(first none);
my @rss_types = qw<
    application/rss+xml
    application/rdf+xml
    application/xml
    text/xml
>;
my $mech = WWW::Mechanize->new;
my $rss  = XML::RSS->new;
my $url = shift @ARGV || 'https://phoenixtrap.com';
my $response = $mech->get($url);
# If we got an HTML page, find the linked RSS feed
if ( $mech->is_html
    and my @alt_links = $mech->find_all_links( rel => 'alternate' ) )
{
    for my $rss_type (@rss_types) {
        $url = ( first { $_->attrs->{type} eq $rss_type } @alt_links )->url
            and last;
    }
    $response = $mech->get($url);
}
die "$url does not have an RSS feed\n"
    if none { $_ eq $response->content_type } @rss_types;
binmode STDOUT, ':encoding(UTF-8)';    # avoid wide character warnings
my @items = $rss->parse( $mech->content )->{items}->@*;
say join "\t", $_->@{qw<title link>} for @items;

In the begin­ning, you’ll notice there’s a bit of boil­er­plate: use v5.24 (released in 2016) to enable restrict­ing unsafe code, the say func­tion, and post­fix deref­er­enc­ing to reduce the noise from nest­ed curly braces. I’m also bring­ing in the first and none list pro­cess­ing func­tions from List::Util as well as the WWW::Mechanize web page retriev­er and pars­er, and the XML::RSS feed parser.

Next is an array of pos­si­ble media (for­mer­ly MIME) types used to serve the RSS news feed for­mat on the web. Like Perl and Raku, RSS for­mats have a long and some­times con­tentious his­to­ry, so a news­read­er needs to sup­port sev­er­al dif­fer­ent ways of iden­ti­fy­ing them on a page.

The pro­gram then cre­ates new WWW::Mechanize (called a mech for short) and XML::RSS objects for use lat­er and gets a URL to browse from its command-​line argu­ment, default­ing to my blog if it has none. (My site, my rules, right?) It then retrieves that URL from the web. If mech believes that the URL con­tains an HTML page and can find link tags with rel="alternate" attrib­ut­es pos­si­bly iden­ti­fy­ing any news feeds, it then goes on to check the media types of those links against the ear­li­er list of RSS types and retrieves the first one it finds.

Next comes the only error check­ing done by this script: check­ing if the retrieved feed’s media type actu­al­ly match­es the list defined ear­li­er. This pre­vents the RSS pars­er from attempt­ing to process plain web pages. This isn’t a large and com­pli­cat­ed pro­gram, so the die func­tion is called with a trail­ing new­line char­ac­ter (\n) to sup­press report­ing the line on which the error occurred.
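The effect of that trailing newline is easy to see in isolation. In this sketch the messages are made up, and eval is used only to capture what die would have printed:

```perl
use strict;
use warnings;

eval { die "no feed found\n" };    # trailing newline: message left as-is
my $terse = $@;

eval { die "no feed found" };      # no newline: Perl appends file and line
my $verbose = $@;

print $terse;
print $verbose;
```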

Finally, it’s time to output the headlines and links, but before that happens Perl has to be told that they may contain so-called “wide characters” found in the Unicode standard but not in the plain ASCII that it normally uses. This includes things like the typographical ‘curly quotes’ that I sometimes use in my titles. The last two lines of the script loop through the parsed items in the feed, extracting their titles and links and printing them out with a tab (\t) separator between them:

Output from feed_finder.pl
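The wide character handling can also be seen in isolation. In this sketch an in-memory filehandle stands in for STDOUT so the encoded bytes can be inspected; the title string is made up.

```perl
use v5.24;
use warnings;

# A title with typographical quotes, written with escapes to stay ASCII-safe
my $title = "\x{2018}curly quotes\x{2019}";

# An in-memory filehandle stands in for STDOUT so the bytes can be inspected
open my $fh, '>:encoding(UTF-8)', \my $bytes
    or die "Cannot open in-memory filehandle: $!\n";
print {$fh} $title;
close $fh;

say 'characters: ', length $title;
say 'bytes:      ', length $bytes;
```

The byte count is larger than the character count because each curly quote is encoded as three bytes in UTF-8. Without an encoding layer on the output handle, Perl would warn about a wide character instead.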

Raku Feed Finder

Programming is often just stitch­ing libraries and APIs togeth­er, so it shouldn’t have been sur­pris­ing that the Raku ver­sion of the above would be so sim­i­lar. There are some sig­nif­i­cant (and some­times wel­come) dif­fer­ences, though, which I’ll go over now:

#!/usr/bin/env raku
use WWW::Mechanize:from<Perl5>;
use XML::RSS:from<Perl5>;
my @rss_types = qw<
  application/rss+xml
  application/rdf+xml
  application/xml
  text/xml
>;
my $mech = WWW::Mechanize.new;
my $rss  = XML::RSS.new;
sub MAIN($url = 'https://phoenixtrap.com') {
    my $response = $mech.get($url);
    # If we got an HTML page, find the linked RSS feed        
    if $mech.is_html {
        my @alt_links = $mech.find_all_links( Scalar, rel => 'alternate' );
        $response = $mech.get(
            @alt_links.first( *.attrs<type> (elem) @rss_types ).url
        );
    }
    if $response.content_type(Scalar) !(elem) @rss_types {
        # Overriding Raku's `die` stack trace is more verbose than we need
        note $mech.uri ~ ' does not have an RSS feed';
        exit 1;
    }
    my @items = $rss.parse( $mech.content ).<items>;
    put join "\t", $_<title link> for @items;
}

The first thing to notice is there’s a bit less boil­er­plate code at the begin­ning. Raku is a younger lan­guage and doesn’t have to add instruc­tions to enable less backward-​compatible fea­tures. It’s also a larg­er lan­guage with func­tions and meth­ods built-​in that Perl needs to load from mod­ules, though this feed find­er pro­gram still needs to bring in WWW::Mechanize and XML::RSS with anno­ta­tions to indi­cate they’re com­ing from the Perl5 side of the fence.

I decid­ed to wrap the major­i­ty of the pro­gram in a MAIN func­tion, which hand­i­ly gives me command-​line argu­ments as vari­ables as well as a usage mes­sage if some­one calls it with a --help option. This is a neat quality-​of-​life fea­ture for script authors that clev­er­ly reuses func­tion sig­na­tures, and I’d love to see this avail­able in Perl as an exten­sion to its sig­na­tures feature.

Raku and Perl also differ in their concept of context. In Perl, an expression may be evaluated differently depending upon whether its result is expected to be a single value (scalar) or a list of values. Inline::Perl5 calls Perl functions in list context by default, but you can add the Scalar type object as a first argument to force scalar context as I’ve done with calls to find_all_links (to return an array reference) and content_type (to return only the media type from the HTTP Content-Type header, without its parameters).
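For readers who haven’t run into context on the Perl side, here is a minimal sketch of a context-sensitive function. The content_types function and its return values are hypothetical; they only imitate the way a method can return a list in list context and a single value in scalar context.

```perl
use v5.24;
use warnings;

# A hypothetical function that behaves differently depending on context
sub content_types {
    my @types = ( 'text/html', 'charset=UTF-8' );
    return wantarray ? @types : $types[0];
}

my @all   = content_types();    # list context: every value
my $first = content_types();    # scalar context: only the first value

say "list:   @all";
say "scalar: $first";
```

This is the distinction that the Scalar type object selects when calling into Perl from Raku.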

Another interesting difference is the use of the (elem) operator to determine membership in a set. This is Raku’s ASCII way of spelling the ∈ symbol, which it can also use; !(elem) can also be spelled ∉. Both are hard to type on my keyboard so I chose the more verbose alternative, but if you want your code to more closely resemble mathematical notation it’s nice to know the option is there.

I also didn’t use Raku’s die rou­tine to exit the pro­gram with an error, main­ly because of its method of sup­press­ing the line on which the error occurred. It requires using a CATCH block and then key­ing off of the type of excep­tion thrown in order to cus­tomize its behav­ior, which seemed like overkill for such a small script. It would have looked some­thing like this:

{
    die $mech.uri ~ ' does not have an RSS feed'
        if $response.content_type(Scalar) !(elem) @rss_types;
    CATCH {
        default {
            note .message;
            exit 1;
        }
    }
}

Doubtless, this could be golfed down to reduce its ver­bosi­ty at the expense of read­abil­i­ty, but I didn’t want to resort to clever tricks when try­ing to do a one-​to-​one com­par­i­son with Perl. More expe­ri­enced Raku devel­op­ers are wel­come to set me straight in the com­ments below.

The last dif­fer­ence I’ll point out is Raku’s wel­come lack of deref­er­enc­ing oper­a­tors com­pared to Perl. This is due to the former’s con­cept of con­tain­ers, which I’m still learn­ing about. It seems to be fair­ly DWIMmy so I’m not that wor­ried, but it’s nice to know there’s an under­stand­able mech­a­nism behind it.

Overall I’m pleased with this first venture into Raku and I’ve enjoyed what I’ve learned of the language so far. It’s not as different from Perl as I anticipated, and I can foresee coding more projects as I learn more. The community on the #raku IRC channel was also very friendly and helpful, so I’ll be hanging out there as time permits.

What do you think? Can Perl and Raku bet­ter learn to coex­ist, or are they des­tined to be rivals? Leave a com­ment below.


It’s been years since I’ve had to hack on anything XML-related, but a recent project at work has me once again jumping into the waters of generating, parsing, and modifying this 90s-era document format. Most developers these days likely only know of it as part of the curiously-named XMLHttpRequest object in web browsers used to retrieve data in JSON format from servers, and as the “X” in AJAX. But here we are in 2021, and there are still plenty of APIs and documents using XML to get their work done.

In my par­tic­u­lar case, the task is to update the API calls for a new ver­sion of Virtuozzo Automator. Its API is a bit unusu­al in that it does­n’t use HTTP, but rather relies on open­ing a TLS-encrypt­ed sock­et to the serv­er and exchang­ing doc­u­ments delim­it­ed with a null char­ac­ter. The pre­vi­ous ver­sion of our code is in 1990s-​sysadmin-​style Perl, with man­u­al blessing of objects and pars­ing the XML using reg­u­lar expres­sions. I’ve decid­ed to update it to use the Moo object sys­tem and a prop­er XML pars­er. But which pars­er and mod­ule to use?

Selecting a parser

There are sev­er­al gener­ic XML mod­ules for pars­ing and gen­er­at­ing XML on CPAN, each with its own advan­tages and dis­ad­van­tages. I’d like to say that I did a com­pre­hen­sive sur­vey of each of them, but this project is pressed for time (aren’t they all?) and I did­n’t want to cre­ate too many extra depen­den­cies in our Perl stack. Luckily, XML::LibXML is already avail­able, I’ve had some pre­vi­ous expe­ri­ence with it, and it’s a good choice for per­for­mant standards-​based XML pars­ing (using either DOM or SAX) and generation.

Given more time and lee­way in adding depen­den­cies, I might use some­thing else. If the Virtuozzo API had an XML Schema or used SOAP, I would con­sid­er XML::Compile as I’ve had some suc­cess with that in oth­er projects. But even that uses XML::LibXML under the hood, so I’d still be using that. Your mileage may vary.

Generating XML

Depending on the size and com­plex­i­ty of the XML doc­u­ments to gen­er­ate, you might choose to build them up node by node using XML::LibXML::Node and XML::LibXML::Element objects. Most of the mes­sages I’m send­ing to Virtuozzo Automator are short and have easily-​interpolated val­ues, so I’m using here-​document islands of XML inside my Perl code. This also has the advan­tage of being eas­i­ly val­i­dat­ed against the exam­ples in the documentation.
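For comparison, here is roughly what the node-by-node approach looks like. It is a sketch only: the version number and the name child element are placeholders, not taken from the Virtuozzo documentation.

```perl
use v5.24;
use warnings;
use XML::LibXML;

my $doc  = XML::LibXML::Document->new( '1.0', 'UTF-8' );
my $root = $doc->createElement('packet');
$root->setAttribute( version => '4.0.0' );    # hypothetical version number
$doc->setDocumentElement($root);

# appendTextChild creates a child element and escapes its text content
$root->appendTextChild( name => 'sample & test' );

say $root->toString;
```

The main benefit over string interpolation is that special characters such as the ampersand are escaped automatically, at the cost of more code per element.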

Where the inter­po­lat­ed val­ues in the mes­sages are a lit­tle com­pli­cat­ed, I’m using this idiom inside the here-docs:

@{[ ... ]}

This allows me to put an arbi­trary expres­sion in the … part, which is then put into an anony­mous array ref­er­ence, which is then imme­di­ate­ly deref­er­enced into its string result. It’s a cheap and cheer­ful way to do min­i­mal tem­plat­ing inside Perl strings with­out load­ing a full tem­plat­ing library; I’ve also had suc­cess using this tech­nique when gen­er­at­ing SQL for data­base queries.
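Here is a small sketch of the idiom in a here-doc. The element names and values are made up for illustration. Note that the expression is evaluated in list context, so an array needs an explicit scalar to get its count.

```perl
use v5.24;
use warnings;

# Hypothetical values to interpolate
my @ids  = ( 101, 102, 103 );
my $name = 'sample';

my $xml = <<"END_XML";
<request name="@{[ uc $name ]}">
  <count>@{[ scalar @ids ]}</count>
  <ids>@{[ join ',', @ids ]}</ids>
</request>
END_XML

print $xml;
```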

Parser as an object attribute

Rather than instan­ti­ate a new XML::LibXML in every method that needs to parse a doc­u­ment, I cre­at­ed a pri­vate attribute:

package Local::API::Virtuozzo::Agent {
    use Moo;
    use XML::LibXML;
    use Types::Standard qw(InstanceOf);
    ...
    has _parser => (
        is      => 'ro',
        isa     => InstanceOf['XML::LibXML'],
        default => sub { XML::LibXML->new() },
    );
    sub foo {
        my $self = shift;
        my $send_doc = $self->_parser
          ->parse_string(<<"END_XML");
            <foo/>
END_XML
        ...
    }
...
}

Boilerplate

XML documents can be verbose, with elements that rarely change in every document. In the Virtuozzo API’s case, every document has a <packet> element containing a version attribute and an id attribute to match requests to responses. I wrote a simple function to wrap my documents in this element; it pulls the version from a constant and increments the id by one every time it’s called:

sub _wrap_packet {
    state $send_id = 1;
    return qq(<packet version="$PACKET_VERSION" id=")
      . $send_id++ . '">' . shift . '</packet>';
}

If I need to add more attributes to the <packet> element (for instance, namespaces for attributes in enclosed elements), I can always use XML::LibXML::Element::setAttribute after parsing the document string.
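A minimal sketch of that parse-then-modify step follows. The version value and the priority attribute are hypothetical, and the wrapper is restated here as a plain function so the example stands alone.

```perl
use v5.24;
use warnings;
use XML::LibXML;

my $PACKET_VERSION = '4.0.0';    # hypothetical; stands in for the real constant

sub _wrap_packet {
    my ($body) = @_;
    state $send_id = 1;
    return qq(<packet version="$PACKET_VERSION" id=")
      . $send_id++ . qq(">$body</packet>);
}

my $doc    = XML::LibXML->new->parse_string( _wrap_packet('<foo/>') );
my $packet = $doc->documentElement;

# Add an attribute the wrapper doesn't know about ("priority" is made up)
$packet->setAttribute( priority => '0' );

say $packet->toString;
```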

Parsing responses with XPath

Rather than using brit­tle reg­u­lar expres­sions to extract data from the response, I use the shared pars­er object from above and then the full pow­er of XPath:

use English;
...
sub get_sampleID {
    my ($self, $sample_name) = @_;
    ...
    # used to separate documents
    local $INPUT_RECORD_SEPARATOR = "\0";
    # $self->_sock is the IO::Socket::SSL connection
    my $get_doc = $self->_parser->parse_string(
      $self->_sock->getline(),
    );
    my $sample_id = $get_doc->findvalue(
        qq(//ns3:id[following-sibling::ns3:name="$sample_name"]),
    );
    return $sample_id;
}

This way, even if the order of elements changes or more elements are introduced, the XPath patterns will continue to find the right data.
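To try this kind of query without a server, here is a self-contained sketch. The response document, its namespace URI, and the element layout are all invented for illustration. It registers the ns3 prefix explicitly through XML::LibXML::XPathContext, which is an assumption on my part rather than what the method above does.

```perl
use v5.24;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Hypothetical response; the namespace URI and element layout are made up
my $response = <<'END_XML';
<packet xmlns:ns3="urn:example:types" version="4.0.0" id="1">
  <ns3:sample>
    <ns3:id>abc-123</ns3:id>
    <ns3:description>extra element the query ignores</ns3:description>
    <ns3:name>web01</ns3:name>
  </ns3:sample>
  <ns3:sample>
    <ns3:id>def-456</ns3:id>
    <ns3:name>web02</ns3:name>
  </ns3:sample>
</packet>
END_XML

my $doc = XML::LibXML->new->parse_string($response);

# Register the prefix so the query doesn't depend on the document's own choice
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs( ns3 => 'urn:example:types' );

my $sample_name = 'web02';
my $sample_id   = $xpc->findvalue(
    qq(//ns3:id[following-sibling::ns3:name="$sample_name"]),
);

say $sample_id;
```

The extra description element in the first sample doesn’t affect the match, since the query only cares that a matching name appears somewhere after the id among its siblings.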

Conclusion… so far

I’m only about halfway through updat­ing these API calls, and I’ve left out some non-​XML-​related details such as set­ting up the TLS sock­et con­nec­tion. Hopefully this arti­cle has giv­en you a taste of what’s involved in XML pro­cess­ing these days. Please leave me a com­ment if you have any sug­ges­tions or questions.