IKEA BLÅHAJ shark toys

IKEA’s toy BLÅHAJ shark has become a beloved Internet icon over the past several years. I thought it might be cute to write a little Perl to get info about it and even display a cuddly picture right in the terminal where I’m running the code. Maybe this will give you some ideas for your own quick web clients. Of course, you could accomplish all of these things using a pipeline of individual command-line utilities like curl, jq, and GNU coreutils’ base64. These examples focus on Perl as the glue, though.

Warning: dodgy API ahead

I haven’t found a publicly-​documented and ‑sup­port­ed offi­cial API for query­ing IKEA prod­uct infor­ma­tion but oth­ers have decon­struct­ed the company’s web site AJAX requests so we can use that instead. The alter­na­tive would be to scrape the IKEA web site direct­ly which, although pos­si­ble, would be more tedious and prone to fail­ure should their design change. An unof­fi­cial API is also unre­li­able but the sim­pler client code is eas­i­er to change should any errors surface.

Enter the Mojolicious

My orig­i­nal goal was to do this in a sin­gle line issued to the perl com­mand, and luck­i­ly the Mojolicious framework’s ojo mod­ule is tailor-​made for such things. By adding a -Mojo switch to the perl com­mand, you get over a dozen quick single-​character func­tions for spin­ning up a quick web appli­ca­tion or, in our case, mak­ing and inter­pret­ing web requests with­out a lot of cer­e­mo­ny. Here’s the start of my one-​line request to the IKEA API for infor­ma­tion on their BLÅHAJ prod­uct, using ojo’s g func­tion to per­form an HTTP GET and dis­play­ing the JSON from the response body to the terminal.

$ perl -Mojo -E 'say g("https://sik.search.blue.cdtapps.com/us/en/search-result-page", form => {types => "PRODUCT", q => "BLÅHAJ"})->body'

This cur­rent­ly returns over 2,400 lines of data, so after read­ing it over I’ll con­vert the response body JSON to a Perl data struc­ture and dump only the main prod­uct infor­ma­tion using ojo’s r func­tion:

$ perl -Mojo -E 'say r g("https://sik.search.blue.cdtapps.com/us/en/search-result-page", form => {types => "PRODUCT", q => "BLÅHAJ"})->json->{searchResultPage}{products}{main}{items}[0]{product}'
{
  "availability" => [],
  "breathTaking" => bless( do{\(my $o = 0)}, 'JSON::PP::Boolean' ),
  "colors" => [
    {
      "hex" => "0058a3",
      "id" => 10007,
      "name" => "blue"
    },
    {
      "hex" => "ffffff",
      "id" => 10156,
      "name" => "white"
    }
  ],
  "contextualImageUrl" => "https://www.ikea.com/us/en/images/products/blahaj-soft-toy-shark__0877371_pe633608_s5.jpg",
  "currencyCode" => "USD",
  "discount" => "",
  "features" => [],
  "gprDescription" => {
    "numberOfVariants" => 0,
    "variants" => []
  },
  "id" => 90373590,
  "itemMeasureReferenceText" => "39 \x{bc} \"",
  "itemNo" => 90373590,
  "itemNoGlobal" => 30373588,
  "itemType" => "ART",
  "lastChance" => $VAR1->{"breathTaking"},
  "mainImageAlt" => "BL\x{c5}HAJ Soft toy, shark, 39 \x{bc} \"",
  "mainImageUrl" => "https://www.ikea.com/us/en/images/products/blahaj-soft-toy-shark__0710175_pe727378_s5.jpg",
  "name" => "BL\x{c5}HAJ",
  "onlineSellable" => bless( do{\(my $o = 1)}, 'JSON::PP::Boolean' ),
  "pipUrl" => "https://www.ikea.com/us/en/p/blahaj-soft-toy-shark-90373590/",
  "price" => {
    "decimals" => 99,
    "isRegularCurrency" => $VAR1->{"breathTaking"},
    "prefix" => "\$",
    "separator" => ".",
    "suffix" => "",
    "wholeNumber" => 19
  },
  "priceNumeral" => "19.99",
  "quickFacts" => [],
  "tag" => "NONE",
  "typeName" => "Soft toy"
}

If I just want the price I can do:

$ perl -Mojo -E 'say g("https://sik.search.blue.cdtapps.com/us/en/search-result-page", form => {types => "PRODUCT", q => "BLÅHAJ"})->json->{searchResultPage}{products}{main}{items}[0]{product}->@{qw(currencyCode priceNumeral)}'
USD19.99

That ->@{qw(currencyCode priceNumeral)} towards the end uses the post­fix ref­er­ence slic­ing syn­tax intro­duced exper­i­men­tal­ly in Perl v5.20 and made offi­cial in v5.24. If you’re using an old­er perl, you’d say:

$ perl -Mojo -E 'say @{g("https://sik.search.blue.cdtapps.com/us/en/search-result-page", form => {types => "PRODUCT", q => "BLÅHAJ"})->json->{searchResultPage}{products}{main}{items}[0]{product}}{qw(currencyCode priceNumeral)}'
USD19.99

I pre­fer the for­mer, though, because it’s eas­i­er to read left-to-right.

But I’m not in the United States! Where’s my native currency?

You can either replace the “us/en” in the URL above or use the core I18N::LangTags::Detect module added in Perl v5.8.5 if you’re really determined to be portable across different users’ locales. This is really stretching the definition of “one-liner,” though.

$ LANG=de_DE.UTF-8 perl -Mojo -MI18N::LangTags::Detect -E 'my @lang = (split /-/, I18N::LangTags::Detect::detect)[1,0]; say g("https://sik.search.blue.cdtapps.com/" . join("/", @lang == 2 ? @lang : ("us", "en")) . "/search-result-page", form => {types => "PRODUCT", q => "BLÅHAJ"})->json->{searchResultPage}{products}{main}{items}[0]{product}->@{qw(currencyCode priceNumeral)}'
EUR27.99

Window dressing

It’s hard to envi­sion cud­dling a num­ber, but luck­i­ly the prod­uct infor­ma­tion returned above links to a JPEG file in the mainImageUrl key. My favorite ter­mi­nal app iTerm2 can dis­play images inline from either a file or Base64 encod­ed data, so adding an extra HTTP request and encod­ing from the core MIME::Base64 mod­ule yields:

$ perl -Mojo -MMIME::Base64 -E 'say "\c[]1337;File=inline=1;width=100%:", encode_base64(g(g("https://sik.search.blue.cdtapps.com/us/en/search-result-page", form => {types => "PRODUCT", q => "BLÅHAJ"})->json->{searchResultPage}{products}{main}{items}[0]{product}{mainImageUrl})->body), "\cG"'

(You could just send the image URL to iTerm2’s bun­dled imgcat util­i­ty, but where’s the fun in that?)

$ imgcat --url `perl -Mojo -E 'print g("https://sik.search.blue.cdtapps.com/us/en/search-result-page", form => {types => "PRODUCT", q => "BLÅHAJ"})->json->{searchResultPage}{products}{main}{items}[0]{product}{mainImageUrl}'`

But I don’t have iTerm2 or a Mac!

I got you. At the expense of a number of other dependencies, here’s a version that will work on any terminal that supports 256-color mode with ANSI codes using Image::Term256Color from CPAN and a Unicode font with block characters. I’ll also use Term::ReadKey to size the image for the width of your window. (Again, this stretches the definition of “one-liner.”)

$ perl -Mojo -MImage::Term256Color -MTerm::ReadKey -E 'say for Image::Term256Color::convert(g(g("https://sik.search.blue.cdtapps.com/us/en/search-result-page", form => {types => "PRODUCT", q => "BLÅHAJ"})->json->{searchResultPage}{products}{main}{items}[0]{product}{mainImageUrl})->body, {scale_x => (GetTerminalSize)[0], utf8 => 1})'

I hate Mojolicious! Can’t you just use core modules?

Fine. Here’s retriev­ing the prod­uct price using HTTP::Tiny and the pure-​Perl JSON pars­er JSON::PP, which were added to core in ver­sion 5.14.

$ perl -MHTTP::Tiny -MJSON::PP -E 'say @{decode_json(HTTP::Tiny->new->get("https://sik.search.blue.cdtapps.com/us/en/search-result-page?types=PRODUCT&q=BLÅHAJ")->{content})->{searchResultPage}{products}{main}{items}[0]{product}}{qw(currencyCode priceNumeral)}'
USD19.99

Fetching and dis­play­ing a pic­ture of the hug­gable shark using MIME::Base64 or Image::Term256Color as above is left as an exer­cise to the reader.

Friday, December 17, 2021, marked the thirty-fourth birthday of the Perl programming language, and coincidentally this year saw the release of version 5.34. There are plenty of Perl developers out there who haven’t kept up with recent (and not-so-recent) improvements to the language and its ecosystem, so I thought I might list a batch. (You may have seen some of these before in May’s post “Perl can do that now!”)

The feature pragma

Perl v5.10 was released in December 2007, and with it came feature, a way of enabling new syn­tax with­out break­ing back­ward com­pat­i­bil­i­ty. You can enable indi­vid­ual fea­tures by name (e.g., use feature qw(say fc); for the say and fc key­words), or by using a fea­ture bun­dle based on the Perl ver­sion that intro­duced them. For exam­ple, the following:

use feature ':5.34';

…gives you the equiv­a­lent of:

use feature qw(bareword_filehandles bitwise current_sub evalbytes fc indirect multidimensional postderef_qq say state switch unicode_eval unicode_strings);

Boy, that’s a mouth­ful. Feature bun­dles are good. The cor­re­spond­ing bun­dle also gets implic­it­ly loaded if you spec­i­fy a min­i­mum required Perl ver­sion, e.g., with use v5.32;. If you use v5.12; or high­er, strict mode is enabled for free. So just say:

use v5.34;

And last­ly, one-​liners can use the -E switch instead of -e to enable all fea­tures for that ver­sion of Perl, so you can say the fol­low­ing on the com­mand line:

perl -E 'say "Hello world!"'

Instead of:

perl -e 'print "Hello world!\n"'

Which is great when you’re try­ing to save some typing.

The experimental pragma

Sometimes new Perl fea­tures need to be dri­ven a cou­ple of releas­es around the block before their behav­ior set­tles. Those exper­i­ments are doc­u­ment­ed in the per­l­ex­per­i­ment page, and usu­al­ly, you need both a use feature (see above) and no warnings state­ment to safe­ly enable them. Or you can sim­ply pass a list to use experimental of the fea­tures you want, e.g.:

use experimental qw(isa postderef signatures);
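To make the saving concrete, here is a minimal sketch that enables subroutine signatures both ways. The subroutine names add_longhand and add_shorthand are made up for illustration, and I'm assuming Perl v5.20 or later, where signatures and the experimental pragma are both available.

```perl
use strict;
use warnings;

# Longhand: enable the feature, then silence its experimental warning.
{
    use feature 'signatures';
    no warnings 'experimental::signatures';
    sub add_longhand ($x, $y) { return $x + $y }
}

# Shorthand: the experimental pragma does both in one statement.
{
    use experimental 'signatures';
    sub add_shorthand ($x, $y) { return $x + $y }
}

print add_longhand(2, 3),  "\n";
print add_shorthand(4, 5), "\n";
```

Both pragmas are lexically scoped, so the bare blocks keep the experiment confined to the code that needs it.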

Ever-​expanding warnings categories

March 2000 saw the release of Perl 5.6, and with it, the expansion of the -w command-line switch to a system of fine-grained controls for “warning against dubious constructs” that can be turned on and off depending on the lexical scope. What started as 26 main and 20 subcategories has expanded into 31 main and 43 subcategories, including warnings for the aforementioned experimental features.

As the relevant Perl::Critic policy says, “Using warnings, and paying attention to what they say, is probably the single most effective way to improve the quality of your code.” If you must violate warnings (perhaps because you’re rehabilitating some legacy code), you can isolate such violations to a small scope and individual categories. Check out the strictures module on CPAN if you’d like to go further and make a safe subset of these categories fatal during development.
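Isolating a violation might look something like this sketch. The %legacy_record hash and its undefined nickname are hypothetical stand-ins for whatever your legacy code hands you.

```perl
use strict;
use warnings;

my %legacy_record = (name => 'Mark', nickname => undef);

my $label;
{
    # Hypothetical legacy expectation: undef should render as an empty string.
    # Only the "uninitialized" category is silenced, and only inside this block.
    no warnings 'uninitialized';
    $label = "$legacy_record{name} ($legacy_record{nickname})";
}

# Outside the block every warning category is active again.
print "$label\n";
```

Everything else in the file still gets the full set of warnings, so new mistakes elsewhere won't hide behind the exemption.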

Document other recently-​introduced syntax with Syntax::Construct

Not every new bit of Perl syn­tax is enabled with a feature guard. For the rest, there’s E. Choroba’s Syntax::Construct mod­ule on CPAN. Rather than hav­ing to remem­ber which ver­sion of Perl intro­duced what, Syntax::Construct lets you declare only what you use and pro­vides a help­ful error mes­sage if some­one tries to run your code on an old­er unsup­port­ed ver­sion. Between it and the feature prag­ma, you can pre­vent many head-​scratching moments and give your users a chance to either upgrade or workaround.

Make built-​in functions throw exceptions with autodie

Many of Perl’s built-​in func­tions only return false on fail­ure, requir­ing the devel­op­er to check every time whether a file can be opened or a system com­mand exe­cut­ed. The lex­i­cal autodie prag­ma replaces them with ver­sions that raise an excep­tion with an object that can be inter­ro­gat­ed for fur­ther details. No mat­ter how many func­tions or meth­ods deep a prob­lem occurs, you can choose to catch it and respond appro­pri­ate­ly. This leads us to…
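Here's a small sketch of what that looks like, assuming a file path that doesn't exist (the path is made up). It uses plain block eval to catch the exception and then asks the autodie::exception object what went wrong.

```perl
use strict;
use warnings;

my $missing_file = '/nonexistent/path/to/file.txt';  # hypothetical path

my $error;
eval {
    use autodie;  # lexical: only affects built-ins called inside this block
    open my $fh, '<', $missing_file;  # no "or die" needed
    close $fh;
    1;
} or $error = $@;

# The exception is an object we can interrogate rather than a bare string.
if (ref $error && $error->isa('autodie::exception') && $error->matches('open')) {
    printf "%s failed: %s\n", $error->function, $error->errno;
}
```

The matches method also accepts category names such as ':file' or ':io' if you'd rather handle a whole family of failures at once.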

try/​catch exception handling and Feature::Compat::Try

This year’s Perl v5.34 release introduced experimental try/catch syntax for exception handling that should look more familiar to users of other languages while handling the issues surrounding using block eval and testing of the special $@ variable. If you need to remain compatible with older versions of Perl (back to v5.14), just use the Feature::Compat::Try module from CPAN to automatically select either v5.34’s native try/catch or a subset of the functionality provided by Syntax::Keyword::Try.
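As a rough sketch of the native syntax, assuming Perl v5.34 or later and a made-up parse_quantity helper that dies on bad input:

```perl
use v5.34;
use warnings;
use feature 'try';
no warnings 'experimental::try';

# Hypothetical helper that dies on bad input.
sub parse_quantity {
    my ($text) = @_;
    die "not a number: $text\n" unless $text =~ /\A\d+\z/;
    return $text + 0;
}

my @results;
for my $input ('42', 'many') {
    try {
        push @results, parse_quantity($input);
    }
    catch ($e) {
        # $e holds the exception; there is no need to inspect $@ yourself.
        chomp $e;
        push @results, "error: $e";
    }
}

say for @results;
```

If you go the Feature::Compat::Try route instead, the try and catch blocks themselves should stay the same and only the lines that enable the syntax would change.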

Pluggable keywords

The abovementioned Syntax::Keyword::Try was made possible by the introduction of a pluggable keyword mechanism in 2010’s Perl v5.12. So was the Future::AsyncAwait asynchronous programming library and the Object::Pad testbed for new object-oriented Perl syntax. If you’re handy with C and Perl’s XS glue language, check out Paul “LeoNerd” Evans’ XS::Parse::Keyword module to get a leg up on developing your own syntax module.

Define packages with versions and blocks

Perl v5.12 also helped reduce clutter by enabling a package namespace declaration to also include a version number, instead of requiring a separate our $VERSION = ...; statement. Perl v5.14 further refined package declarations so they can take a code block, making a namespace declaration the same as a lexical scope. Putting the two together gives you:

package Local::NewHotness v1.2.3 {
    ...
}

Instead of:

{
    package Local::OldAndBusted;
    use version 0.77; our $VERSION = version->declare("v1.2.3");
    ...
}

I know which I’d rather do. (Though you may want to also use Syntax::Construct qw(package-version package-block); to help along with old­er instal­la­tions as described above.)

The // defined-​or operator

This is an easy win from Perl v5.10:

defined $foo ? $foo : $bar  # replace this
$foo // $bar                # with this

And:

$foo = $bar unless defined $foo  # replace this
$foo //= $bar                    # with this

Perfect for assign­ing defaults to variables.

state variables only initialize once

Speaking of vari­ables, ever want one to keep its old val­ue the next time a scope is entered, like in a sub? Declare it with state instead of my. Before Perl v5.10, you need­ed to use a clo­sure instead.
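A minimal sketch, using a made-up next_ticket counter:

```perl
use v5.10;
use strict;
use warnings;

sub next_ticket {
    state $counter = 0;  # initialized only the first time the sub runs
    return ++$counter;
}

# Each call sees the value left behind by the previous one.
my @tickets = map { next_ticket() } 1 .. 3;
say "@tickets";  # prints "1 2 3"
```

Had $counter been declared with my, every call would have started again from zero.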

Save some typing with say

Perl v5.10’s bumper crop of enhance­ments also includ­ed the say func­tion, which han­dles the com­mon use case of printing a string or list of strings with a new­line. It’s less noise in your code and saves you four char­ac­ters. What’s not to love?

Note unimplemented code with ...

The ... ellipsis statement (colloquially “yada-yada”) gives you an easy placeholder for yet-to-be-implemented code. It parses OK but will throw an exception if executed. Hopefully, your test coverage (or at least static analysis) will catch it before your users do.
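For instance, with a hypothetical export_report stub (the ellipsis statement arrived in Perl v5.12):

```perl
use v5.12;
use warnings;

# Hypothetical stub awaiting a real implementation.
sub export_report { ... }

# Compiles fine, but calling it dies with an "Unimplemented" message.
my $ok    = eval { export_report(); 1 };
my $error = $@;
say "Caught: $error" unless $ok;
```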

Loop and enumerate arrays with each, keys, and values

The each, keys, and values func­tions have always been able to oper­ate on hash­es. Perl v5.12 and above make them work on arrays, too. The lat­ter two are main­ly for con­sis­ten­cy, but you can use each to iter­ate over an array’s indices and val­ues at the same time:

while (my ($index, $value) = each @array) {
    ...
}

This can be prob­lem­at­ic in non-​trivial loops, but I’ve found it help­ful in quick scripts and one-liners.

delete local hash (and array) entries

Ever need­ed to delete an entry from a hash (e.g, an envi­ron­ment vari­able from %ENV or a sig­nal han­dler from %SIG) just inside a block? Perl v5.12 lets you do that with delete local.
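Here's a sketch using a made-up %settings package hash rather than %ENV, so it's safe to run anywhere:

```perl
use v5.12;
use warnings;

our %settings = (debug => 1, color => 'blue');  # hypothetical package hash

my ($inside, $outside);
{
    delete local $settings{debug};  # gone only until this block exits
    $inside = exists $settings{debug} ? 'present' : 'absent';
}
$outside = exists $settings{debug} ? 'present' : 'absent';

say "inside: $inside, outside: $outside";  # inside: absent, outside: present
```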

Paired hash slices

Jumping for­ward to 2014’s Perl v5.20, the new %foo{'bar', 'baz'} syn­tax enables you to slice a sub­set of a hash with its keys and val­ues intact. Very help­ful for cherry-​picking or aggre­gat­ing many hash­es into one. For example:

my %args = (
    verbose => 1,
    name    => 'Mark',
    extra   => 'pizza',
);
# don't frob the pizza
$my_object->frob( %args{ qw(verbose name) } );

Paired array slices

Not to be left out, you can also slice arrays in the same way, in this case return­ing indices and values:

my @letters = 'a' .. 'z';
my @subset_kv = %letters[15, 4, 17, 11];
# @subset_kv is now (15, 'p', 4, 'e', 17, 'r', 11, 'l')

More readable dereferencing

Perl v5.20 intro­duced and v5.24 de-​experimentalized a more read­able post­fix deref­er­enc­ing syn­tax for nav­i­gat­ing nest­ed data struc­tures. Instead of using {braces} or smoosh­ing sig­ils to the left of iden­ti­fiers, you can use a post­fixed sigil-and-star:

push @$array_ref,    1, 2, 3;  # noisy
push @{$array_ref},  1, 2, 3;  # a little easier
push $array_ref->@*, 1, 2, 3;  # read from left to right

So much of web devel­op­ment is sling­ing around and pick­ing apart com­pli­cat­ed data struc­tures via JSON, so I wel­come any­thing like this to reduce the cog­ni­tive load.

when as a statement modifier

Starting in Perl v5.12, you can use the experimental switch feature’s when keyword as a postfix modifier. For example:

for ($foo) {
    $a =  1 when /^abc/;
    $a = 42 when /^dna/;
    ...
}

But I don’t recommend when, given, or given’s smartmatch operations as they were retconned as experiments in 2013’s Perl v5.18 and have remained so due to their tricky behavior. I wrote about some alternatives using stable syntax back in February.

Simple class inheritance with use parent

Sometimes in old­er object-​oriented Perl code, you’ll see use base as a prag­ma to estab­lish inher­i­tance from anoth­er class. Older still is the direct manip­u­la­tion of the package’s spe­cial @ISA array. In most cas­es, both should be avoid­ed in favor of use parent, which was added to core in Perl v5.10.1.

Mind you, if you’re following the Perl object-oriented tutorial’s advice and have selected an OO system from CPAN, use its subclassing mechanism if it has one. Moose, Moo, and Class::Accessor’s “antlers” mode all provide an extends function; Object::Pad provides an :isa attribute on its class keyword.
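For plain core-only classes, use parent might look like this sketch. The Local::Animal and Local::Shark classes are invented for illustration, and -norequire is only needed because both live in the same file.

```perl
use v5.14;
use warnings;

package Local::Animal {
    sub new   { my ($class, %args) = @_; return bless {%args}, $class }
    sub speak { my ($self) = @_; return "$self->{name} makes a sound" }
}

package Local::Shark {
    # -norequire because the parent class is defined above, not in its own file
    use parent -norequire, 'Local::Animal';

    sub speak {
        my ($self) = @_;
        return $self->SUPER::speak() . ' (underwater)';
    }
}

my $shark  = Local::Shark->new(name => 'Bruce');
my $spoken = $shark->speak;
say $spoken;  # Bruce makes a sound (underwater)
```

When the parent class sits in its own .pm file, drop -norequire and use parent will load it for you.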

Test for class membership with the isa operator

As an alter­na­tive to the isa() method pro­vid­ed to all Perl objects, Perl v5.32 intro­duced the exper­i­men­tal isa infix oper­a­tor:

$my_object->isa('Local::MyClass')
# or
$my_object isa Local::MyClass

The latter can take either a bareword class name or string expression, but more importantly, it’s safer as it also returns false if the left argument is undefined or isn’t a blessed object reference. The older isa() method will throw an exception in the former case and might return true if called as a class method when $my_object is actually a string of a class name that’s the same as or inherits from isa()’s argument.
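The differences are easier to see side by side. This sketch assumes Perl v5.32 or later and uses a throwaway Local::MyClass plus three test values of my own choosing: a real object, undef, and a plain string holding the class name.

```perl
use v5.32;
use warnings;
use feature 'isa';
no warnings 'experimental::isa';

package Local::MyClass { sub new { return bless {}, shift } }

my $object    = Local::MyClass->new;
my $undefined = undef;
my $classname = 'Local::MyClass';

# The infix operator is true only for blessed objects of the class.
my $infix_object = ($object    isa Local::MyClass) ? 1 : 0;  # 1
my $infix_undef  = ($undefined isa Local::MyClass) ? 1 : 0;  # 0, no exception
my $infix_string = ($classname isa Local::MyClass) ? 1 : 0;  # 0, not an object

# The method form treats a string as a class name, and dies on undef.
my $method_string = $classname->isa('Local::MyClass') ? 1 : 0;  # 1
my $method_undef  = eval { $undefined->isa('Local::MyClass'); 1 } ? 'ok' : 'died';

say "infix:  $infix_object $infix_undef $infix_string";
say "method: $method_string $method_undef";
```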

Lexical subroutines

Introduced in Perl v5.18 and de-​experimentalized in 2017’s Perl v5.26, you can now pre­cede sub dec­la­ra­tions with my, state, or our. One use of the first two is tru­ly pri­vate func­tions and meth­ods, as described in this 2018 Dave Jacoby blog and as part of Neil Bowers’ 2014 sur­vey of pri­vate func­tion techniques.
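As a quick sketch of the private-function use case, assuming Perl v5.26 or later and a made-up Local::Greeter class:

```perl
use v5.26;
use warnings;

package Local::Greeter {
    # Visible only within this block; never installed in the symbol table.
    my sub normalize {
        my ($name) = @_;
        return ucfirst lc $name;
    }

    sub greet {
        my ($class, $name) = @_;
        return 'Hello, ' . normalize($name) . '!';
    }
}

my $greeting   = Local::Greeter->greet('mARK');
my $visibility = Local::Greeter->can('normalize') ? 'public' : 'private';

say $greeting;    # Hello, Mark!
say $visibility;  # private
```

Because normalize is lexical, outside code can't reach it through the package name, and subclasses can't accidentally override it.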

Subroutine signatures

I’ve writ­ten and pre­sent­ed exten­sive­ly about sig­na­tures and alter­na­tives over the past year, so I won’t repeat that here. I’ll just add that the Perl 5 Porters devel­op­ment mail­ing list has been mak­ing a con­cert­ed effort over the past month to hash out the remain­ing issues towards ren­der­ing this fea­ture non-​experimental. The pop­u­lar Mojolicious real-​time web frame­work also pro­vides a short­cut for enabling sig­na­tures and uses them exten­sive­ly in examples.

Indented here-​documents with <<~

Perl has had shell-style “here-document” syntax for embedding multi-line strings of quoted text for a long time. Starting with Perl v5.26, you can precede the delimiting string with a ~ character and Perl will both allow the ending delimiter to be indented as well as strip indentation from the embedded text. This allows for much more readable embedded code such as runs of HTML and SQL. For example:

if ($do_query) {
    my $rows_deleted = $dbh->do(<<~'END_SQL', undef, 42);
      DELETE FROM table
      WHERE status = ?
      END_SQL
    say "$rows_deleted rows were deleted."; 
}

More readable chained comparisons

When I learned math in school, my teach­ers and text­books would often describe mul­ti­ple com­par­isons and inequal­i­ties as a sin­gle expres­sion. Unfortunately, when it came time to learn pro­gram­ming every com­put­er lan­guage I saw required them to be bro­ken up with a series of and (or &&) oper­a­tors. With Perl v5.32, this is no more:

if ( $x < $y && $y <= $z ) { ... }  # old way
if ( $x < $y <= $z )       { ... }  # new way

It’s more con­cise, less noisy, and more like what reg­u­lar math looks like.

Self-​documenting named regular expression captures

Perl’s expres­sive reg­u­lar expres­sion match­ing and text-​processing prowess are leg­endary, although overuse and poor use of read­abil­i­ty enhance­ments often turn peo­ple away from them (and Perl in gen­er­al). We often use reg­ex­ps for extract­ing data from a matched pat­tern. For example:

if ( /Time: (..):(..):(..)/ ) {  # parse out values
    say "$1 hours, $2 minutes, $3 seconds";
}

Named cap­ture groups, intro­duced in Perl v5.10, make both the pat­tern more obvi­ous and retrieval of its data less cryptic:

if ( /Time: (?<hours>..):(?<minutes>..):(?<seconds>..)/ ) {
    say "$+{hours} hours, $+{minutes} minutes, $+{seconds} seconds";
}

More readable regexp character classes

The /x reg­u­lar expres­sion mod­i­fi­er already enables bet­ter read­abil­i­ty by telling the pars­er to ignore most white­space, allow­ing you to break up com­pli­cat­ed pat­terns into spaced-​out groups and mul­ti­ple lines with code com­ments. With Perl v5.26 you can spec­i­fy /xx to also ignore spaces and tabs inside [brack­et­ed] char­ac­ter class­es, turn­ing this:

/[d-eg-i3-7]/
/[!@"#$%^&*()=?<>']/

…into this:

/ [d-e g-i 3-7]/xx
/[ ! @ " # $ % ^ & * () = ? <> ' ]/xx

Set default regexp flags with the re pragma

Beginning with Perl v5.14, writ­ing use re '/xms'; (or any com­bi­na­tion of reg­u­lar expres­sion mod­i­fi­er flags) will turn on those flags until the end of that lex­i­cal scope, sav­ing you the trou­ble of remem­ber­ing them every time.
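A minimal sketch, with a made-up timestamp to parse:

```perl
use v5.14;
use warnings;

my $timestamp = '12:34:56';
my @parts;
{
    use re '/xms';  # every pattern in this block gets the x, m, and s flags

    @parts = $timestamp =~ /
        \A (\d\d)   # hours
        :  (\d\d)   # minutes
        :  (\d\d)   # seconds
        \z
    /;
}

say "@parts";  # prints "12 34 56"
```

Without the pragma (or an explicit /x), the whitespace and comments would be treated as part of the pattern and the match would fail.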

Non-​destructive substitution with s///r and tr///r

The s/// sub­sti­tu­tion and tr/// translit­er­a­tion oper­a­tors typ­i­cal­ly change their input direct­ly, often in con­junc­tion with the =~ bind­ing oper­a­tor:

s/foo/bar/;  # changes the first foo to bar in $_
$baz =~ s/foo/bar/;  # the same but in $baz

But what if you want to leave the orig­i­nal untouched, such as when pro­cess­ing an array of strings with a map? With Perl v5.14 and above, add the /r flag, which makes the sub­sti­tu­tion on a copy and returns the result:

my @changed = map { s/foo/bar/r } @original;

Unicode case-​folding with fc for better string comparisons

Unicode and char­ac­ter encod­ing in gen­er­al are com­pli­cat­ed beasts. Perl has han­dled Unicode since v5.6 and has kept pace with fix­es and sup­port for updat­ed stan­dards in the inter­ven­ing decades. If you need to test if two strings are equal regard­less of case, use the fc func­tion intro­duced in Perl v5.16.
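To see why fc beats the usual lc-on-both-sides trick, here's a sketch comparing two spellings of the German word for "street." The strings are my own example; \x{DF} is the sharp s (ß), written as an escape so the snippet doesn't depend on the file's encoding.

```perl
use v5.16;   # enables both the fc and unicode_strings features
use warnings;

my ($left, $right) = ("Stra\x{DF}e", 'STRASSE');

# lc leaves the sharp s alone, so the lowercased strings still differ.
my $with_lc = lc($left) eq lc($right) ? 'equal' : 'different';

# fc applies full Unicode case folding, which maps the sharp s to "ss".
my $with_fc = fc($left) eq fc($right) ? 'equal' : 'different';

say "lc: $with_lc";  # lc: different
say "fc: $with_fc";  # fc: equal
```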

Safer processing of file arguments with <<>>

The <> null filehandle or “diamond operator” is often used in while loops to process input per line coming either from standard input (e.g., piped from another program) or from a list of files on the command line. Unfortunately, it uses a form of Perl’s open function that interprets special characters such as pipes (|) that would allow it to insecurely run external commands. Using the <<>> “double diamond” operator introduced in Perl v5.22 forces open to treat all command-line arguments as file names only. For older Perls, the perlop documentation recommends the ARGV::readonly CPAN module.
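Here's a self-contained sketch of the double diamond in action. To keep it runnable without real command-line arguments, it writes a temporary file and temporarily places its name in @ARGV; in a real script you'd just loop over <<>> directly.

```perl
use v5.22;
use warnings;
use File::Temp qw(tempfile);

# Stand-in for a file named on the command line.
my ($fh, $filename) = tempfile(UNLINK => 1);
print {$fh} "first line\nsecond line\n";
close $fh;

my @lines;
{
    local @ARGV = ($filename);  # simulate: perl script.pl $filename
    while (my $line = <<>>) {   # each argument is opened strictly as a file name
        chomp $line;
        push @lines, $line;
    }
}

say for @lines;
```

Had an argument looked like "rm -rf important_dir |", the double diamond would try (and fail) to open a file by that literal name instead of running it.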

Safer loading of Perl libraries and modules from @INC

Perl v5.26 removed the ability for all programs to load modules by default from the current directory, closing a security vulnerability originally identified and fixed as CVE-2016-1238 in previous versions’ included scripts. If your code relied on this unsafe behavior, the v5.26 release notes include steps on how to adapt.

HTTP::Tiny simple HTTP/1.1 client included

To boot­strap access to CPAN on the web in the pos­si­ble absence of exter­nal tools like curl or wget, Perl v5.14 began includ­ing the HTTP::Tiny mod­ule. You can also use it in your pro­grams if you need a sim­ple web client with no dependencies.

Test2: The next generation of Perl testing frameworks

Forked and refac­tored from the ven­er­a­ble Test::Builder (the basis for the Test::More library that many are famil­iar with), Test2 was includ­ed in the core mod­ule library begin­ning with Perl v5.26. I’ve exper­i­ment­ed recent­ly with using the Test2::Suite CPAN library instead of Test::More and it looks pret­ty good. I’m also intrigued by Test2::Harness’ sup­port for thread­ing, fork­ing, and pre­load­ing mod­ules to reduce test run times.

Task::Kensho: Where to start for recommended Perl modules

This last item may not be includ­ed when you install Perl, but it’s where I turn for a col­lec­tion of well-​regarded CPAN mod­ules for accom­plish­ing a wide vari­ety of com­mon tasks span­ning from asyn­chro­nous pro­gram­ming to XML. Use it as a start­ing point or inter­ac­tive­ly select the mix of libraries appro­pri­ate to your project.


And there you have it: a selec­tion of 34 fea­tures, enhance­ments, and improve­ments for the first 34 years of Perl. What’s your favorite? Did I miss any­thing? Let me know in the comments.

I’m busy this week host­ing my par­ents’ first vis­it to Houston, but I didn’t want to let this Tuesday go by with­out link­ing to my talk from last week’s Ephemeral Miniconf. Thanks so much to Thibault Duponchelle for orga­niz­ing such a ter­rif­ic event, to all the oth­er speak­ers for com­ing togeth­er to present, and to every­one who attend­ed for wel­com­ing me.

Template proces­sors and engines are one of those pieces of soft­ware where it seems every devel­op­er wants to rein­vent the wheel. Goodness knows I’ve done it ear­li­er in my career. Tell me if this sounds familiar:

  1. You need to mix data into a doc­u­ment so you start with Perl’s string inter­po­la­tion in "dou­ble quotes" or sprintf for­mats. (Or maybe you inves­ti­gate formats, but the less said about them the bet­ter.)
  2. You real­ize your doc­u­ments need to dis­play things based on cer­tain con­di­tions, or you want to loop over a list or some oth­er structure.
  3. You add these fea­tures via key­word pars­ing and escape char­ac­ters, think­ing it’s OK since this is just a small bespoke project.
  4. Before you know it you’ve invent­ed anoth­er domain-​specific lan­guage (DSL) and have to sup­port it on top of the appli­ca­tion you were try­ing to deliv­er in the first place.

Stop. Just stop. Decades of oth­ers who have walked this same path have already done this for you. Especially if you’re using a web frame­work like Dancer, Mojolicious, or Catalyst, where the tem­plate proces­sor is either built-​in or plug­gable from CPAN. Even if you’re not devel­op­ing a web appli­ca­tion there are sev­er­al general-​purpose options of var­i­ous capa­bil­i­ties like Template Toolkit and Template::Mustache. Investigate the alter­na­tives and deter­mine if they have the fea­tures, per­for­mance, and sup­port you need. If you’re sure none of them tru­ly meet your unique require­ments, then maybe, maybe con­sid­er rolling your own.

Whatever you decide, real­ize that as your appli­ca­tion or web­site grows your invest­ment in that selec­tion will only deep­en. Porting to a new tem­plate proces­sor can be as chal­leng­ing as port­ing any source code to a new pro­gram­ming language.

Unfortunately, there are about as many opin­ions on how to choose a tem­plate proces­sor as there are tem­plate proces­sors. For exam­ple, in 2013 Roland Koehler wrote a good Python-​oriented arti­cle on sev­er­al con­sid­er­a­tions and the dif­fer­ent approach­es avail­able. Although he end­ed up devel­op­ing his own (quelle sur­prise), he makes a good case that a tem­plate proces­sor ought to at least pro­vide var­i­ous log­ic con­structs as well as embed­ded expres­sions, if not a full pro­gram­ming lan­guage. Koehler specif­i­cal­ly warns against the lat­ter, though, as a tem­plate devel­op­er might change an application’s data mod­el, to say noth­ing of the pos­si­bil­i­ty of exe­cut­ing arbi­trary destruc­tive code.

I can appreciate this reasoning. I’ve successfully used Perl template processors like the aforementioned Template::Toolkit (which has both logic directives and an optional facility for evaluating Perl code) and Text::Xslate (which supports several template syntaxes including a subset of Template::Toolkit’s, but without the ability to embed Perl code). We use the latter at work combined with Text::Xslate::Bridge::TT2Like’s emulation of various Template::Toolkit virtual methods and it’s served us well.

But using those mod­ules’ DSLs means more sophis­ti­cat­ed tasks need extra time and effort find­ing the cor­rect log­ic and expres­sions. This also assumes that their designer(s) have antic­i­pat­ed my needs either through built-​in fea­tures or exten­sions. I’m already writ­ing Perl; why should I switch to anoth­er, more lim­it­ed lan­guage and envi­ron­ment pro­vid­ed I can remain dis­ci­plined enough to avoid issues like those described above by Koehler?

So for my per­son­al projects, I favor tem­plate proces­sors that use the full pow­er of the Perl lan­guage like Mojolicious’ embed­ded Perl ren­der­er or the ven­er­a­ble Text::Template for non-​web appli­ca­tions. It saves me time and I’ll like­ly want more than any DSL can pro­vide. This may not apply to your sit­u­a­tion, though, and I’m open to counter-arguments.

What’s your favorite tem­plate proces­sor and why? Let me know in the comments.


Six months ago I gave an overview of Perl’s list pro­cess­ing fun­da­men­tals, briefly describ­ing what lists are and then intro­duc­ing the built-​in map and grep func­tions for trans­form­ing and fil­ter­ing them. Later on, I com­piled a list (how appro­pri­ate) of list pro­cess­ing mod­ules avail­able via CPAN, not­ing there’s some con­fus­ing dupli­ca­tion of effort. But you’re a busy devel­op­er, and you just want to know the Right Thing To Do™ when faced with a list pro­cess­ing challenge.

First, some cred­it is due: these are all restate­ments of sev­er­al Perl::Critic poli­cies which in turn cod­i­fy stan­dards described in Damian Conway’s Perl Best Practices (2005). I’ve repeat­ed­ly rec­om­mend­ed the lat­ter as a start­ing point for higher-​quality Perl devel­op­ment. Over the years these prac­tices con­tin­ue to be re-​evaluated (includ­ing by the author him­self) and var­i­ous authors release new pol­i­cy mod­ules, but perlcritic remains a great tool for ensur­ing you (and your team or oth­er con­trib­u­tors) main­tain a con­sis­tent high stan­dard in your code.

With that said, on to the recommendations!

Don’t use grep to check if any list elements match

It might sound weird to lead off by rec­om­mend­ing not to use grep, but some­times it’s not the right tool for the job. If you’ve got a list and want to deter­mine if a con­di­tion match­es any item in it, you might try:

if (grep { some_condition($_) } @my_list) {
    ... # don't do this!
}

Yes, this works because (in scalar con­text) grep returns the num­ber of match­es found, but it’s waste­ful, check­ing every ele­ment of @my_list (which could be lengthy) before final­ly pro­vid­ing a result. Use the stan­dard List::Util module’s any func­tion, which imme­di­ate­ly returns (“short-​circuits”) on the first match:

use List::Util 1.33 qw(any);

if (any { some_condition($_) } @my_list) {
    ... # do something
}

Perl has includ­ed the req­ui­site ver­sion of this mod­ule since ver­sion 5.20 in 2014; for ear­li­er releas­es, you’ll need to update from CPAN. List::Util has many oth­er great list-​reduction, key/​value pair, and oth­er relat­ed func­tions you can import into your code, so check it out before you attempt to re-​invent any wheels.

As a side note for web devel­op­ers, the Perl Dancer frame­work also includes an any key­word for declar­ing mul­ti­ple HTTP routes, so if you’re mix­ing List::Util in there don’t import it. Instead, call it explic­it­ly like this or you’ll get an error about a rede­fined function:

use List::Util 1.33;

if (List::Util::any { some_condition($_) } @my_list) {
    ... # do something
}

This rec­om­men­da­tion is cod­i­fied in the BuiltinFunctions::ProhibitBooleanGrep Perl::Critic pol­i­cy, comes direct­ly from Perl Best Practices, and is rec­om­mend­ed by the Software Engineering Institute Computer Emergency Response Team (SEI CERT)’s Perl Coding Standard.

Don’t change $_ in map or grep

I men­tioned this back in March, but it bears repeat­ing: map and grep are intend­ed as pure func­tions, not muta­tors with side effects. This means that the orig­i­nal list should remain unchanged. Yes, each ele­ment alias­es in turn to the $_ spe­cial vari­able, but that’s for speed and can have sur­pris­ing results if changed even if it’s tech­ni­cal­ly allowed. If you need to mod­i­fy an array in-​place use some­thing like:

for (@my_array) {
    $_ = ...; # make your changes here
}

If you want something that looks like map but won’t change the original list (and don’t mind a few CPAN dependencies), consider the List::SomeUtils apply function:

use List::SomeUtils qw(apply);

my @doubled_array = apply {$_ *= 2} @old_array;

Lastly, side effects also include things like manip­u­lat­ing oth­er vari­ables or doing input and out­put. Don’t use map or grep in a void con­text (i.e., with­out a result­ing array or list); do some­thing with the results or use a for or foreach loop:

map { print foo($_) } @my_array; # don't do this
print map { foo($_) } @my_array; # do this instead

map { push @new_array, foo($_) } @my_array; # don't do this
@new_array = map { foo($_) } @my_array; # do this instead

This rec­om­men­da­tion is cod­i­fied by the BuiltinFunctions::ProhibitVoidGrep, BuiltinFunctions::ProhibitVoidMap, and ControlStructures::ProhibitMutatingListFunctions Perl::Critic poli­cies. The lat­ter comes from Perl Best Practices and is an SEI CERT Perl Coding Standard rule.
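If the aliasing behavior seems abstract, this small sketch shows the surprise in action. The @prices and @originals arrays are made up for the example; it uses only core Perl.

```perl
use strict;
use warnings;

my @prices = ( 10, 20, 30 );

# Anti-pattern: assigning to $_ inside map also rewrites @prices,
# because $_ is an alias to each element rather than a copy.
my @doubled = map { $_ *= 2 } @prices;
print "after mutating map: @prices\n";    # the original changed too

# Safer: compute a new value and leave $_ alone.
my @originals = ( 10, 20, 30 );
my @safe      = map { $_ * 2 } @originals;
print "after pure map:     @originals\n";
```

Both maps return the doubled values, but only the second leaves its input untouched.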

Use blocks with map and grep, not expressions

You can call map or grep like this (paren­the­ses are option­al around built-​in functions):

my @new_array  = map foo($_), @old_array; # don't do this
my @new_array2 = grep !/^#/, @old_array; # don't do this

Or like this:

my @new_array  = map { foo($_) } @old_array;
my @new_array2 = grep {!/^#/} @old_array;

Do it the sec­ond way. It’s eas­i­er to read, espe­cial­ly if you’re pass­ing in a lit­er­al list or mul­ti­ple arrays, and the expres­sion forms can con­ceal bugs. This rec­om­men­da­tion is cod­i­fied by the BuiltinFunctions::RequireBlockGrep and BuiltinFunctions::RequireBlockMap Perl::Critic poli­cies and comes from Perl Best Practices.

Refactor multi-​statement maps, greps, and other list functions

map, grep, and friends should follow the Unix philosophy of “Do One Thing and Do It Well.” Readability and maintainability drop with every statement you place inside one of their blocks. Consider junior developers and future maintainers (this includes you!) and refactor anything with more than one statement into a separate subroutine or at least a for loop. This goes for list processing functions (like the aforementioned any) imported from other modules, too.

This rec­om­men­da­tion is cod­i­fied by the Perl Best Practices-inspired BuiltinFunctions::ProhibitComplexMappings and BuiltinFunctions::RequireSimpleSortBlock Perl::Critic poli­cies, although those only cov­er map and sort func­tions, respectively.
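As a rough before-and-after, here is a sketch of that refactoring. The sample data and the format_entry subroutine are hypothetical; the point is only that the map block shrinks to a single call with a descriptive name.

```perl
use strict;
use warnings;

my @lines = ( '  alice:42 ', 'bob:7', ' carol:19' );

# Before: several statements crammed into the map block
my @crowded = map {
    my $line = $_;
    $line =~ s/^\s+|\s+$//g;
    my ( $name, $score ) = split /:/, $line;
    sprintf '%s=%03d', ucfirst($name), $score;
} @lines;

# After: the work lives in a named, separately testable subroutine
sub format_entry {
    my ($line) = @_;    # copy, so the caller's data is not modified
    $line =~ s/^\s+|\s+$//g;
    my ( $name, $score ) = split /:/, $line;
    return sprintf '%s=%03d', ucfirst($name), $score;
}

my @tidy = map { format_entry($_) } @lines;
print "$_\n" for @tidy;
```

The two versions produce the same list, but the second one reads as a single step and gives you a subroutine you can unit test on its own.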


Do you have any oth­er sug­ges­tions for list pro­cess­ing best prac­tices? Feel free to leave them in the com­ments or bet­ter yet, con­sid­er cre­at­ing new Perl::Critic poli­cies for them or con­tact­ing the Perl::Critic team to devel­op them for your organization.

Look, I get it. You don’t like the Perl programming language or have otherwise disregarded it as “dead.” (Or perhaps you haven’t, in which case please check out my other blog posts!) It has weird noisy syntax, mixing regular expressions, sigils on variable names, various braces and brackets for data structures, and a menagerie of cryptic special variables. It’s old: 34 years in December, with a history of (sometimes amateur) developers that have used and abused that syntax to ship code of questionable quality. Maybe you grudgingly accept its utility but think it should die gracefully, maintained only to run legacy applications.

But you know what? Perl’s still going. It’s had a steady cadence of year­ly releas­es for the past decade, intro­duc­ing new fea­tures and fenc­ing in bad behav­ior while main­tain­ing an admirable lev­el of back­ward com­pat­i­bil­i­ty. Yes, there was a too-​long adven­ture devel­op­ing what start­ed as Perl 6, but that lan­guage now has its own iden­ti­ty as Raku and even has facil­i­ties for mix­ing Perl with its native code or vice versa.

And then there’s CPAN, the Comprehensive Perl Archive Network: a continually-​updated col­lec­tion of over 200,000 open-​source mod­ules writ­ten by over 14,000 authors, the best of which are well-​tested and ‑doc­u­ment­ed (apply­ing peer pres­sure to those that fall short), pre­sent­ed through a search engine and front-​end built by scores of con­trib­u­tors. Through CPAN you can find dis­tri­b­u­tions for things like:

All of this is avail­able through a mature instal­la­tion tool­chain that doesn’t break from month to month.

Finally and most impor­tant­ly, there’s the glob­al Perl com­mu­ni­ty. The COVID-​19 pan­dem­ic has put a damper on the hun­dreds of glob­al Perl Mongers groups’ mee­tups, but that hasn’t stopped the year­ly Perl and Raku Conference from meet­ing vir­tu­al­ly. (In the past there have also been year­ly European and Asian con­fer­ences, occa­sion­al for­ays into South America and Russia, as well as hackathons and work­shops world­wide.) There are IRC servers and chan­nels for chat, mail­ing lists galore, blogs (yes, apart from this one), and a quirky social net­work that pre­dates Facebook and Twitter.

So no, Perl isn’t dead or even dying, but if you don’t like it and favor something newer, that’s OK! Technologies can coexist on their own merits and advocates of one don’t have to beat down their contemporaries to be successful. Perl happens to be battle-tested (to borrow a term from my friend Curtis “Ovid” Poe), it runs large parts of the Web (speaking from direct and ongoing experience in the hosting business here), and it’s still evolving to meet the needs of its users.


Twitter recently recommended a tweet to me (all hail the algorithm) touting what the author viewed as the “top 5 web development stacks.”

JavaScript/​Node.js options dom­i­nat­ed the four-​letter acronyms as expect­ed, but the fifth one sur­prised me: LAMP, the com­bi­na­tion of the Linux oper­at­ing sys­tem, Apache web serv­er, MySQL rela­tion­al data­base, and Perl, PHP, or Python pro­gram­ming lan­guages. A quick web search for sim­i­lar lists yield­ed sim­i­lar results. Clearly, this meme (in the Dawkins sense) has out­last­ed its pop­u­lar­iza­tion by tech pub­lish­er O’Reilly in the 2000s.

I had thought that the term “LAMP,” coined in 1998 during the “dot-com” bubble, had faded among developers in the intervening decades with the rise of language-specific web frameworks for:

Certainly on the Perl side (with which I’m most famil­iar), the com­mu­ni­ty has long since rec­om­mend­ed the use of a frame­work built on the PSGI spec­i­fi­ca­tion, dep­re­cat­ing 1990s-​era CGI scripts and the mod_​perl Apache exten­sion. Although general-​purpose web servers like Apache or Nginx may be part of an over­all sys­tem, they’re typ­i­cal­ly used as prox­ies or load bal­ancers for Perl-​specific servers either pro­vid­ed by the frame­work or a third-​party mod­ule.

Granted, PHP still relies on web server-​specific mod­ules, APIs, or vari­a­tions of the FastCGI pro­to­col for inter­fac­ing with a web serv­er. And Python web appli­ca­tions typ­i­cal­ly make use of its WSGI pro­to­col either as a web serv­er exten­sion or, like the Perl exam­ples above, as a prox­ied stand­alone serv­er. But all of these are deploy­ment details and do lit­tle to describe how devel­op­ers imple­ment and extend a web application’s structure.

Note how the various four-letter JavaScript stacks (e.g., MERN, MEVN, MEAN, PERN) differentiate themselves mostly by frontend framework (e.g., Angular, React, Vue.js) and maybe by the (relational or NoSQL) database (e.g., MongoDB, MySQL, PostgreSQL). All, however, seem standardized on the Node.js runtime and Express backend web framework, which could, in theory, be replaced with non-JavaScript options like the more mature LAMP-associated languages and frameworks. (Or if you prefer languages that don’t start with “P”, there’s C#, Go, Java, Ruby, etc.)

My point is that “LAMP” as the name of a web development stack has outlived its usefulness. It’s at once too specific (about operating system and web server details that are often abstracted away for developers) and too broad (covering three separate programming languages and not the frameworks they favor). It also leaves out other non-JavaScript back-end languages and their associated frameworks.

The question is: what can replace it? I’d propose “NoJS” as reminiscent of “NoSQL,” but that inaccurately excludes JavaScript from its necessary role in the front-end. “NJSB” doesn’t exactly roll off the tongue, either, and still has the same ambiguity problem as “LAMP.”

How about pithy sort-​of-​acronyms pat­terned like database-​frontend-​backend? Here are some Perl examples:

  • MRDancer: MySQL, React, and Dancer (I use this at work. Yes, the M could also stand for MongoDB. Naming things is hard.)
  • MRMojo: MongoDB, React, and Mojolicious
  • PACat: PostgreSQL, Angular, and Catalyst
  • etc.

Ultimately it comes down to community and industry adoption. If you’re involved with back-end web development, please let me know in the comments if you agree or disagree that “LAMP” is still a useful term, and if not, what should replace it.


A recent Lobsters post laud­ing the virtues of AWK remind­ed me that although the lan­guage is pow­er­ful and lightning-​fast, I usu­al­ly find myself exceed­ing its capa­bil­i­ties and reach­ing for Perl instead. One such appli­ca­tion is ana­lyz­ing volu­mi­nous log files such as the ones gen­er­at­ed by this blog. Yes, WordPress has stats, but I’ve nev­er let rein­ven­tion of the wheel get in the way of a good pro­gram­ming exercise.

So I whipped this script up on Sunday night while watch­ing RuPaul’s Drag Race reruns. It pars­es my Apache web serv­er log files and reports on hits from week to week.

#!/usr/bin/env perl

use strict;
use warnings;
use Syntax::Construct 'operator-double-diamond';
use Regexp::Log::Common;
use DateTime::Format::HTTP;
use List::Util 1.33 'any';
use Number::Format 'format_number';

my $parser = Regexp::Log::Common->new(
    format  => ':extended',
    capture => [qw<req ts status>],
);
my @fields      = $parser->capture;
my $compiled_re = $parser->regexp;

my @skip_uri_patterns = qw<
  ^/+robots\.txt
  [-\w]*sitemap[-\w]*\.xml
  ^/+wp-
  /feed/?$
  ^/+\?rest_route=
>;

my ( %count, %week_of );
while ( <<>> ) {
    my %log;
    @log{@fields} = /$compiled_re/;

    # only interested in successful or cached requests
    next unless $log{status} =~ /^2/ or $log{status} == 304;

    my ( $method, $uri, $protocol ) = split ' ', $log{req};
    next unless $method eq 'GET';
    next if any { $uri =~ $_ } @skip_uri_patterns;

    my $dt  = DateTime::Format::HTTP->parse_datetime( $log{ts} );
    my $key = sprintf '%u-%02u', $dt->week;

    # get first date of each week
    $week_of{$key} ||= $dt->date;
    $count{$key}++;
}

printf "Week of %s: % 10s\n", $week_of{$_}, format_number( $count{$_} )
  for sort keys %count;

Here’s some sam­ple output:

Week of 2021-07-31:      2,672
Week of 2021-08-02:     16,222
Week of 2021-08-09:     12,609
Week of 2021-08-16:     17,714
Week of 2021-08-23:     14,462
Week of 2021-08-30:     11,758
Week of 2021-09-06:     14,811
Week of 2021-09-13:        407

I first started prototyping this on the command line as if it were an awk one-liner by using the perl -n and -a flags. The former wraps code in a while loop over the <> “diamond operator”, processing each line from standard input or files passed as arguments. The latter splits the fields of the line into an array named @F. It looked something like this while I was listing URIs (locations on the website):

gunzip -c ~/logs/phoenixtrap.com-ssl_log-*.gz | \
perl -anE 'say $F[6]'
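Written out longhand, those two switches amount to roughly the loop below. This is only a sketch: the sample log lines are made up, and an in-memory filehandle stands in for standard input so that it runs on its own.

```perl
use strict;
use warnings;
use feature 'say';

# Made-up sample log lines standing in for standard input
my $sample = <<'END_LOG';
203.0.113.5 - - [07/Jun/2021:12:21:55 +0000] "GET /blog/ HTTP/1.1" 200 5120
203.0.113.9 - - [07/Jun/2021:12:22:29 +0000] "GET /about/ HTTP/1.1" 200 2048
END_LOG
open my $in, '<', \$sample or die "Can't read sample: $!";

my @uris;
while ( my $line = <$in> ) {     # -n: loop over every input line
    my @F = split ' ', $line;    # -a: split the line on whitespace into @F
    push @uris, $F[6];           # the one-liner's body was `say $F[6]`
}
say for @uris;
```

Splitting on whitespace puts the request URI in the seventh field because the bracketed timestamp and the quoted request each span more than one field.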

But once I real­ized I’d need to fil­ter out a bunch of URI pat­terns and do some aggre­ga­tion by date, I turned it into a script and turned to CPAN.

There I found Regexp::Log::Common and DateTime::Format::HTTP, which let me pull apart the Apache log for­mat and its time­stamp strings with­out hav­ing to write even more com­pli­cat­ed reg­u­lar expres­sions myself. (As not­ed above, this was already a wheel-​reinvention exer­cise; no need to com­pound that further.)

Regexp::Log::Common builds a com­piled reg­u­lar expres­sion based on the log for­mat and fields you’re inter­est­ed in, so that’s the con­struc­tor on lines 11 through 14. The expres­sion then returns those fields as a list, which I’m assign­ing to a hash slice with those field names as keys in line 29. I then skip over requests that aren’t suc­cess­ful or brows­er cache hits, skip over requests that don’t GET web pages or oth­er assets (e.g., POSTs to forms or updat­ing oth­er resources), and skip over the URI pat­terns men­tioned earlier.
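The hash slice assignment is the densest line in the script, so here it is in isolation. In this sketch a hand-written pattern and field list stand in for what $parser->regexp and $parser->capture return; both are simplified assumptions, not the module’s actual output.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for $parser->capture and $parser->regexp
my @fields      = qw(ts req status);
my $compiled_re = qr/\[([^\]]+)\] "([^"]*)" (\d{3})/;

my $line = '203.0.113.5 - - [07/Jun/2021:12:21:55 +0000] "GET /blog/ HTTP/1.1" 200 5120';

# A match in list context returns its captures, which fill the slice in order
my %log;
@log{@fields} = $line =~ $compiled_re;

my ( $method, $uri, $protocol ) = split ' ', $log{req};
print "$log{status} $method $uri\n";
```

The slice only works if the field names are in the same order as the capture groups, which is why the script asks the parser for both.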

(Those pat­terns are worth a men­tion: they include the robots.txt and sitemap XML files used by search engine index­ers, WordPress admin­is­tra­tion pages, files used by RSS news­read­ers sub­scribed to my blog, and routes used by the Jetpack WordPress add-​on. If you’re adapt­ing this for your site you might need to cus­tomize this list based on what soft­ware you use to run it.)

Lines 38 and 39 parse the timestamp from the log into a DateTime object using DateTime::Format::HTTP and then build the key used to store the per-week hit count. The last lines of the loop then grab the first date of each new week (assuming the log is in chronological order) and increment the count. Once finished, lines 46 and 47 provide a report sorted by week, displaying it as a friendly “Week of date” and the hit counts aligned to the right with printf. Number::Format’s format_number function displays the totals with thousands separators.
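The aggregation at the end of the loop can be tried without any log files. This sketch assumes the hits have already been reduced to hypothetical [week key, date] pairs in chronological order, and it leaves out Number::Format to stay within core Perl.

```perl
use strict;
use warnings;

# Hypothetical pre-parsed hits as [ week key, date ] pairs, oldest first
my @hits = (
    [ '2021-31', '2021-08-02' ],
    [ '2021-31', '2021-08-03' ],
    [ '2021-31', '2021-08-08' ],
    [ '2021-32', '2021-08-09' ],
    [ '2021-32', '2021-08-10' ],
);

my ( %count, %week_of );
for my $hit (@hits) {
    my ( $key, $date ) = @{$hit};
    $week_of{$key} ||= $date;    # only the first date seen for a week is kept
    $count{$key}++;
}

# Zero-padded keys sort correctly as plain strings
printf "Week of %s: %10s\n", $week_of{$_}, $count{$_} for sort keys %count;
```

The ||= assignment is what makes the chronological-order assumption matter: if a later date arrived first, it would become the label for that week.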

Update: After this was initially published, astute reader Chris McGowan noted that I had a bug where $log{status} was assigned the value 304 with the = operator rather than compared with ==. He also suggested I use the double-diamond <<>> operator introduced in Perl v5.22.0 to avoid maliciously-named files. Thanks, Chris!

Room for improvement

DateTime is a very pow­er­ful mod­ule but this comes at a price of speed and mem­o­ry. Something sim­pler like Date::WeekNumber should yield per­for­mance improve­ments, espe­cial­ly as my logs grow (here’s hop­ing). It requires a bit more man­u­al mas­sag­ing of the log dates to con­vert them into some­thing the mod­ule can use, though:

#!/usr/bin/env perl

use strict;
use warnings;
use Syntax::Construct qw<
  operator-double-diamond
  regex-named-capture-group
>;
use Regexp::Log::Common;
use Date::WeekNumber 'iso_week_number';
use List::Util 1.33 'any';
use Number::Format 'format_number';

my $parser = Regexp::Log::Common->new(
    format  => ':extended',
    capture => [qw<req ts status>],
);
my @fields      = $parser->capture;
my $compiled_re = $parser->regexp;

my @skip_uri_patterns = qw<
  ^/+robots\.txt
  [-\w]*sitemap[-\w]*\.xml
  ^/+wp-
  /feed/?$
  ^/+\?rest_route=
>;

my %month = (
    Jan => '01',
    Feb => '02',
    Mar => '03',
    Apr => '04',
    May => '05',
    Jun => '06',
    Jul => '07',
    Aug => '08',
    Sep => '09',
    Oct => '10',
    Nov => '11',
    Dec => '12',
);

my ( %count, %week_of );
while ( <<>> ) {
    my %log;
    @log{@fields} = /$compiled_re/;

    # only interested in successful or cached requests
    next unless $log{status} =~ /^2/ or $log{status} == 304;

    my ( $method, $uri, $protocol ) = split ' ', $log{req};
    next unless $method eq 'GET';
    next if any { $uri =~ $_ } @skip_uri_patterns;

    # convert log timestamp to YYYY-MM-DD
    # for Date::WeekNumber
    $log{ts} =~ m!^
      (?<day>\d\d) /
      (?<month>...) /
      (?<year>\d{4}) : !x;
    my $date = "$+{year}-$month{ $+{month} }-$+{day}";

    my $week = iso_week_number($date);
    $week_of{$week} ||= $date;
    $count{$week}++;
}

printf "Week of %s: % 10s\n", $week_of{$_}, format_number( $count{$_} )
  for sort keys %count;

It looks almost the same as the first ver­sion, with the addi­tion of a hash to con­vert month names to num­bers and the actu­al con­ver­sion (using named reg­u­lar expres­sion cap­ture groups for read­abil­i­ty, using Syntax::Construct to check for that fea­ture). On my serv­er, this results in a ten- to eleven-​second sav­ings when pro­cess­ing two months of com­pressed logs.
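If you want to experiment with the timestamp conversion on its own, it could be pulled out along these lines. The log_ts_to_date subroutine is a hypothetical name, and the guard against malformed timestamps is an addition that the script above does not have.

```perl
use strict;
use warnings;

# Build the same month lookup with a hash slice instead of twelve literal pairs
my %month;
@month{qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)}
    = map { sprintf '%02d', $_ } 1 .. 12;

# Hypothetical helper: returns YYYY-MM-DD, or nothing if the timestamp looks wrong
sub log_ts_to_date {
    my ($ts) = @_;
    return unless $ts =~ m!^
      (?<day>\d\d) /
      (?<month>[A-Z][a-z]{2}) /
      (?<year>\d{4}) : !x;
    my $mm = $month{ $+{month} } or return;
    return sprintf '%s-%s-%s', $+{year}, $mm, $+{day};
}

print log_ts_to_date('07/Jun/2021:12:21:55 +0000'), "\n";
```

Returning nothing for a bad timestamp lets the caller skip the line with next instead of passing a malformed date along to iso_week_number.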

What’s next? Pretty graphs? Drilling down to spe­cif­ic blog posts? Database stor­age for fur­ther queries and analy­sis? Perl and CPAN make it pos­si­ble to go far beyond what you can do with AWK. What would you add or change? Let me know in the comments.


A mentee asked me over the week­end if there was a way with­in a Mojolicious web appli­ca­tion to store the routes sep­a­rate­ly from the main appli­ca­tion class. Here’s one way. These instruc­tions assume you’re using Perl 5.34 and Mojolicious 9.19 (the lat­est as of this writ­ing) via the ter­mi­nal com­mand line on a Linux, Unix, or macOS sys­tem; make the appro­pri­ate changes if this does­n’t apply to you.

First, if you haven’t already, cre­ate your Mojolicious app at your shell prompt:

$ mojo generate app Local::RouteDemo
  [mkdir] /Users/mgardner/Projects/blog/local_route_demo/script
  [write] /Users/mgardner/Projects/blog/local_route_demo/script/local_route_demo
  [chmod] /Users/mgardner/Projects/blog/local_route_demo/script/local_route_demo 744
  [mkdir] /Users/mgardner/Projects/blog/local_route_demo/lib/Local
  [write] /Users/mgardner/Projects/blog/local_route_demo/lib/Local/RouteDemo.pm
  [exist] /Users/mgardner/Projects/blog/local_route_demo
  [write] /Users/mgardner/Projects/blog/local_route_demo/local-route_demo.yml
  [mkdir] /Users/mgardner/Projects/blog/local_route_demo/lib/Local/RouteDemo/Controller
  [write] /Users/mgardner/Projects/blog/local_route_demo/lib/Local/RouteDemo/Controller/Example.pm
  [mkdir] /Users/mgardner/Projects/blog/local_route_demo/t
  [write] /Users/mgardner/Projects/blog/local_route_demo/t/basic.t
  [mkdir] /Users/mgardner/Projects/blog/local_route_demo/public
  [write] /Users/mgardner/Projects/blog/local_route_demo/public/index.html
  [mkdir] /Users/mgardner/Projects/blog/local_route_demo/templates/layouts
  [write] /Users/mgardner/Projects/blog/local_route_demo/templates/layouts/default.html.ep
  [mkdir] /Users/mgardner/Projects/blog/local_route_demo/templates/example
  [write] /Users/mgardner/Projects/blog/local_route_demo/templates/example/welcome.html.ep
$ cd local_route_demo

Create a new Perl mod­ule in your edi­tor for stor­ing your routes. Here we’re using Local::RouteDemo::Routes:

$ touch lib/Local/RouteDemo/Routes.pm
$ $EDITOR lib/Local/RouteDemo/Routes.pm

Make the mod­ule with a func­tion that will cre­ate the routes you want, giv­en a Mojolicious::Routes object. Here we’re just bring­ing over the default route cre­at­ed when we cre­at­ed our app:

package Local::RouteDemo::Routes;
use strict;
use warnings qw(all -experimental::signatures);
use feature 'signatures';
use Exporter 'import';
our @EXPORT_OK = qw(make_routes);

sub make_routes ($router) {
    $router->get('/')->to('Example#welcome');
    # add more routes here

    return;
}

1;

Adjust the appli­ca­tion class to load your new Routes mod­ule and call its export­ed function:

package Local::RouteDemo;
use Mojo::Base 'Mojolicious', -signatures;
use Local::RouteDemo::Routes 'make_routes';

# This method will run once at server start
sub startup ($self) {

    # Load configuration from config file
    my $config = $self->plugin('NotYAMLConfig');

    # Configure the application
    $self->secrets($config->{secrets});

    # Make routes
    make_routes($self->routes);

    return;
}

1;

Finally, run your tests and/​or man­u­al­ly test your routes to be sure every­thing works OK:

$ prove -vlr t
t/basic.t .. [2021-06-07 12:21:55.36917] [58779] [debug] [elVGykGVWlOt] GET "/"
[2021-06-07 12:21:55.36972] [58779] [debug] [elVGykGVWlOt] Routing to controller "Local::RouteDemo::Controller::Example" and action "welcome"
[2021-06-07 12:21:55.37137] [58779] [debug] [elVGykGVWlOt] Rendering template "example/welcome.html.ep"
[2021-06-07 12:21:55.37343] [58779] [debug] [elVGykGVWlOt] Rendering template "layouts/default.html.ep"
[2021-06-07 12:21:55.37495] [58779] [debug] [elVGykGVWlOt] 200 OK (0.005772s, 173.250/s)

ok 1 - GET /
ok 2 - 200 OK
ok 3 - content is similar
1..3
ok
All tests successful.
Files=1, Tests=3,  1 wallclock secs ( 0.02 usr  0.01 sys +  0.38 cusr  0.11 csys =  0.52 CPU)
Result: PASS
$ script/local_route_demo get /
[2021-06-07 12:22:29.55930] [58889] [debug] [f3YoaFhkwJ42] GET "/"
[2021-06-07 12:22:29.55990] [58889] [debug] [f3YoaFhkwJ42] Routing to controller "Local::RouteDemo::Controller::Example" and action "welcome"
[2021-06-07 12:22:29.56059] [58889] [debug] [f3YoaFhkwJ42] Rendering template "example/welcome.html.ep"
[2021-06-07 12:22:29.56269] [58889] [debug] [f3YoaFhkwJ42] Rendering template "layouts/default.html.ep"
[2021-06-07 12:22:29.56432] [58889] [debug] [f3YoaFhkwJ42] 200 OK (0.005004s, 199.840/s)
<!DOCTYPE html>
<html>
  <head><title>Welcome</title></head>
  <body><h2>Welcome to the Mojolicious real-time web framework!</h2>
<p>
  This page was generated from the template "templates/example/welcome.html.ep"
  and the layout "templates/layouts/default.html.ep",
  <a href="/">click here</a> to reload the page or
  <a href="/index.html">here</a> to move forward to a static page.
</p>
</body>
</html>

You can find a git repos­i­to­ry of this work on GitHub, and here’s a com­mit of all the changes made to the default Mojolicious appli­ca­tion so you can see the differences.

Update

Joel Berger from the Mojolicious project told me at The Perl and Raku Conference that it would be more idiomat­ic to use a Mojolicious plu­g­in rather than a plain mod­ule with an export, so here you go:

lib/Local/RouteDemo.pm:

package Local::RouteDemo;
use Mojo::Base 'Mojolicious', -signatures;

# This method will run once at server start
sub startup ($self) {

    # Load configuration from config file
    my $config = $self->plugin('NotYAMLConfig');

    # Configure the application
    $self->secrets($config->{secrets});

    # Add routes from plugin
    $self->plugin('Local::RouteDemo::Plugin::Routes');

    return;
}

1;

lib/Local/RouteDemo/Plugin/Routes.pm:

package Local::RouteDemo::Plugin::Routes;
use Mojo::Base 'Mojolicious::Plugin', -signatures;

sub register ($self, $app, $conf) {
    my $r = $app->routes;

    $r->get('/')->to('Example#welcome');
    # add more routes here

    return;
}

1;

This week we con­sid­ered a view­er’s pull request, added admin­is­tra­tor login, and start­ed on adding the SQLite data­base that will store the admin­is­tra­tor’s accep­tance of assign­ments. We also shored up file upload per­mis­sions for authen­ti­cat­ed users only and added a logout link, learn­ing about some more Mojolicious helpers.

You can find the whole series here.