
A recent Lobsters post laud­ing the virtues of AWK remind­ed me that although the lan­guage is pow­er­ful and lightning-​fast, I usu­al­ly find myself exceed­ing its capa­bil­i­ties and reach­ing for Perl instead. One such appli­ca­tion is ana­lyz­ing volu­mi­nous log files such as the ones gen­er­at­ed by this blog. Yes, WordPress has stats, but I’ve nev­er let rein­ven­tion of the wheel get in the way of a good pro­gram­ming exercise.

So I whipped this script up on Sunday night while watch­ing RuPaul’s Drag Race reruns. It pars­es my Apache web serv­er log files and reports on hits from week to week.

#!/usr/bin/env perl

use strict;
use warnings;
use Syntax::Construct 'operator-double-diamond';
use Regexp::Log::Common;
use DateTime::Format::HTTP;
use List::Util 1.33 'any';
use Number::Format 'format_number';

my $parser = Regexp::Log::Common->new(
    format  => ':extended',
    capture => [qw<req ts status>],
);
my @fields      = $parser->capture;
my $compiled_re = $parser->regexp;

my @skip_uri_patterns = qw<

my ( %count, %week_of );
while ( <<>> ) {
    my %log;
    @log{@fields} = /$compiled_re/;

    # only interested in successful or cached requests
    next unless $log{status} =~ /^2/ or $log{status} == 304;

    my ( $method, $uri, $protocol ) = split ' ', $log{req};
    next unless $method eq 'GET';
    next if any { $uri =~ $_ } @skip_uri_patterns;

    my $dt  = DateTime::Format::HTTP->parse_datetime( $log{ts} );
    my $key = sprintf '%u-%02u', $dt->week;

    # get first date of each week
    $week_of{$key} ||= $dt->date;
    $count{$key}++;
}

printf "Week of %s: % 10s\n", $week_of{$_}, format_number( $count{$_} )
  for sort keys %count;

Here’s some sam­ple output:

Week of 2021-07-31:      2,672
Week of 2021-08-02:     16,222
Week of 2021-08-09:     12,609
Week of 2021-08-16:     17,714
Week of 2021-08-23:     14,462
Week of 2021-08-30:     11,758
Week of 2021-09-06:     14,811
Week of 2021-09-13:        407

I first started prototyping this on the command line as if it were an awk one-liner by using the perl -n and -a flags. The former wraps code in a while loop over the <> “diamond operator”, processing each line from standard input or files passed as arguments. The latter splits the fields of the line into an array named @F. It looked something like this while I was listing URIs (locations on the website):

gunzip -c ~/logs/*.gz | \
perl -anE 'say $F[6]'
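For readers unfamiliar with those flags, here is a rough sketch of what the one-liner does when written out longhand. The uri_field helper name and the sample log line are made up for illustration, and the real -n flag loops over standard input or file arguments rather than an in-memory string.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: return the seventh whitespace-separated field
# (index 6), which holds the request URI in Apache's combined log format.
sub uri_field {
    my ($line) = @_;
    my @F = split ' ', $line;    # the split that -a performs on each line
    return $F[6];
}

# Made-up sample line standing in for the gunzip output.
my $sample = <<'END';
203.0.113.5 - - [02/Aug/2021:10:15:32 +0000] "GET /blog/ HTTP/1.1" 200 5120 "-" "ExampleBot/1.0"
END

# -n wraps the code in a loop like this one over each input line.
open my $fh, '<', \$sample or die "Cannot read sample: $!";
while ( my $line = <$fh> ) {
    say uri_field($line);
}
close $fh;
```

Counting fields by hand like this is fragile, which is part of why the full script hands the parsing to a CPAN module instead.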

But once I real­ized I’d need to fil­ter out a bunch of URI pat­terns and do some aggre­ga­tion by date, I turned it into a script and turned to CPAN.

There I found Regexp::Log::Common and DateTime::Format::HTTP, which let me pull apart the Apache log for­mat and its time­stamp strings with­out hav­ing to write even more com­pli­cat­ed reg­u­lar expres­sions myself. (As not­ed above, this was already a wheel-​reinvention exer­cise; no need to com­pound that further.)

Regexp::Log::Common builds a compiled regular expression based on the log format and fields you’re interested in, so that’s the Regexp::Log::Common->new constructor call near the top of the script. The expression then returns those fields as a list, which I’m assigning to a hash slice with those field names as keys in the @log{@fields} line inside the loop. I then skip over requests that aren’t successful or browser cache hits, skip over requests that don’t GET web pages or other assets (e.g., POSTs to forms or updating other resources), and skip over the URI patterns mentioned earlier.
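To make the hash-slice assignment and the filters concrete, here is a small sketch. It swaps in a simplified hand-written pattern for the one Regexp::Log::Common generates, so it only handles the three fields of interest. The names parse_line and is_page_hit are hypothetical.

```perl
use strict;
use warnings;

my @fields = qw<ts req status>;

# Simplified stand-in for the regexp built by Regexp::Log::Common:
# captures the bracketed timestamp, the quoted request, and the status.
my $re = qr/ \[ ( [^\]]+ ) \] \s+ " ( [^"]* ) " \s+ ( \d{3} ) /x;

# Hypothetical helper: parse one log line into a hash of fields.
sub parse_line {
    my ($line) = @_;
    my %log;
    @log{@fields} = $line =~ $re;    # list of captures into a hash slice
    return \%log;
}

# Hypothetical helper: true for successful or cached GET requests.
sub is_page_hit {
    my ($log) = @_;
    return 0 unless defined $log->{status};
    return 0 unless $log->{status} =~ /^2/ or $log->{status} == 304;
    my ($method) = split ' ', $log->{req};
    my $is_get = ( $method // '' ) eq 'GET';
    return $is_get ? 1 : 0;
}
```

A line that doesn’t match leaves every field undefined, so is_page_hit checks for that before applying the status and method filters.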

(Those pat­terns are worth a men­tion: they include the robots.txt and sitemap XML files used by search engine index­ers, WordPress admin­is­tra­tion pages, files used by RSS news­read­ers sub­scribed to my blog, and routes used by the Jetpack WordPress add-​on. If you’re adapt­ing this for your site you might need to cus­tomize this list based on what soft­ware you use to run it.)

The two lines that set $dt and $key parse the timestamp from the log into a DateTime object using DateTime::Format::HTTP and then build the key used to store the per-week hit count. The last lines of the loop then grab the first date of each new week (assuming the log is in chronological order) and increment the count. Once the loop finishes, the final printf statement provides a report sorted by week, displaying a friendly “Week of date” label with the hit counts aligned to the right. Number::Format’s format_number function displays the totals with thousands separators.
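The counting and reporting steps can be tried in isolation with a few invented hits. This sketch uses only core Perl: commify is a hypothetical stand-in for Number::Format’s format_number, and the week keys and dates are made-up sample data rather than values computed by DateTime.

```perl
use strict;
use warnings;

# Hypothetical stand-in for format_number: add thousands separators
# to a non-negative integer.
sub commify {
    my ($n) = @_;
    1 while $n =~ s/^(\d+)(\d{3})/$1,$2/;
    return $n;
}

# Build one report line in the same layout as the script's printf.
sub report_line {
    my ( $date, $hits ) = @_;
    return sprintf "Week of %s: % 10s\n", $date, commify($hits);
}

# Made-up [ week key, date ] pairs in chronological order.
my @hits = (
    [ '2021-31', '2021-08-02' ],
    [ '2021-31', '2021-08-03' ],
    [ '2021-32', '2021-08-09' ],
);

my ( %count, %week_of );
for my $hit (@hits) {
    my ( $key, $date ) = @$hit;
    $week_of{$key} ||= $date;    # only the first date seen is kept
    $count{$key}++;
}

print report_line( $week_of{$_}, $count{$_} ) for sort keys %count;
```

Because the keys are zero-padded year-week strings, a plain string sort puts the weeks in chronological order.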

Update: After this was initially published, astute reader Chris McGowan noted that I had a bug where $log{status} was assigned the value 304 with the = operator rather than compared with ==. He also suggested I use the double-diamond <<>> operator introduced in Perl v5.22.0 to avoid maliciously-named files. Thanks, Chris!

Room for improvement

DateTime is a very pow­er­ful mod­ule but this comes at a price of speed and mem­o­ry. Something sim­pler like Date::WeekNumber should yield per­for­mance improve­ments, espe­cial­ly as my logs grow (here’s hop­ing). It requires a bit more man­u­al mas­sag­ing of the log dates to con­vert them into some­thing the mod­ule can use, though:

#!/usr/bin/env perl

use strict;
use warnings;
use Syntax::Construct qw<
use Regexp::Log::Common;
use Date::WeekNumber 'iso_week_number';
use List::Util 1.33 'any';
use Number::Format 'format_number';

my $parser = Regexp::Log::Common->new(
    format  => ':extended',
    capture => [qw<req ts status>],
);
my @fields      = $parser->capture;
my $compiled_re = $parser->regexp;

my @skip_uri_patterns = qw<

my %month = (
    Jan => '01',
    Feb => '02',
    Mar => '03',
    Apr => '04',
    May => '05',
    Jun => '06',
    Jul => '07',
    Aug => '08',
    Sep => '09',
    Oct => '10',
    Nov => '11',
    Dec => '12',
);

my ( %count, %week_of );
while ( <<>> ) {
    my %log;
    @log{@fields} = /$compiled_re/;

    # only interested in successful or cached requests
    next unless $log{status} =~ /^2/ or $log{status} == 304;

    my ( $method, $uri, $protocol ) = split ' ', $log{req};
    next unless $method eq 'GET';
    next if any { $uri =~ $_ } @skip_uri_patterns;

    # convert log timestamp to YYYY-MM-DD
    # for Date::WeekNumber
    $log{ts} =~ m!^
      (?<day>\d\d) /
      (?<month>...) /
      (?<year>\d{4}) : !x;
    my $date = "$+{year}-$month{ $+{month} }-$+{day}";

    my $week = iso_week_number($date);
    $week_of{$week} ||= $date;
    $count{$week}++;
}

printf "Week of %s: % 10s\n", $week_of{$_}, format_number( $count{$_} )
  for sort keys %count;

It looks almost the same as the first ver­sion, with the addi­tion of a hash to con­vert month names to num­bers and the actu­al con­ver­sion (using named reg­u­lar expres­sion cap­ture groups for read­abil­i­ty, using Syntax::Construct to check for that fea­ture). On my serv­er, this results in a ten- to eleven-​second sav­ings when pro­cess­ing two months of com­pressed logs.

What’s next? Pretty graphs? Drilling down to spe­cif­ic blog posts? Database stor­age for fur­ther queries and analy­sis? Perl and CPAN make it pos­si­ble to go far beyond what you can do with AWK. What would you add or change? Let me know in the comments.

I publish Perl stories on this blog once a week, and it seems every time there’s at least one response on social media that amounts to, “I hate Perl because of its weird syntax.” Or, “It looks like line noise.” (Perl seems to have outlasted that one; when’s the last time you used an acoustic modem?) Or the quote attributed to Keith Bostic: “The only language that looks the same before and after RSA encryption.”

So let’s address, con­front, and demys­ti­fy this hate. What are these objec­tion­able syn­tac­ti­cal, noisy, pos­si­bly encrypt­ed bits? And why does Perl have them?

Regular expressions

Regular expressions, or regexps, are not unique to Perl. JavaScript has them. Java has them. Python has them as well as another module that adds even more features. It’s hard to find a language that doesn’t have them, either natively or through the use of a library. It’s common to want to search text using some kind of pattern, and regexps provide a fairly standardized if terse mini-language for doing so. There’s even a C-based library called PCRE, or “Perl Compatible Regular Expressions,” enabling many other pieces of software to embed a regexp engine that’s inspired by (though not quite compatible with) Perl’s syntax.

Being itself inspired by Unix tools like grep, sed, and awk, Perl incor­po­rat­ed reg­u­lar expres­sions into the lan­guage as few oth­er lan­guages have, with bind­ing oper­a­tors of =~ and !~ enabling easy match­ing and sub­sti­tu­tions against expres­sions, and pre-​compilation of reg­ex­ps into their own type of val­ue. Perl then added the abil­i­ty to sep­a­rate reg­ex­ps by white­space to improve read­abil­i­ty, use dif­fer­ent delim­iters to avoid the leaning-​toothpick syn­drome of escap­ing slash (/) char­ac­ters with back­slash­es (\), and name your cap­ture groups and back­ref­er­ences when sub­sti­tut­ing or extract­ing strings.
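Here is a short sketch of those readability features working together: the /x flag for whitespace and comments, braces as delimiters so slashes need no escaping, named capture groups, and a precompiled qr// value. The split_path and is_log_file helpers are hypothetical names chosen for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: split a Unix-style path into directory and file name.
sub split_path {
    my ($path) = @_;

    # Leaning-toothpick version for comparison: m/^(.*)\/([^\/]+)$/
    return unless $path =~ m{
        ^ (?<dir> .* ) /      # everything before the last slash
        (?<file> [^/]+ ) $    # the final path component
    }x;
    return ( $+{dir}, $+{file} );
}

# A regexp precompiled into its own kind of value with qr.
my $log_suffix = qr{ \.log \z }x;

# Hypothetical helper: bind a string against the precompiled regexp.
sub is_log_file {
    my ($name) = @_;
    return $name =~ $log_suffix ? 1 : 0;
}
```

Both versions of the path pattern match the same strings; the second one just explains itself as it goes.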

All this is to say that Perl reg­u­lar expres­sions can be some of the most read­able and robust when used to their full poten­tial. Early on this helped cement Perl’s rep­u­ta­tion as a text-​processing pow­er­house, though the core of reg­ex­ps’ suc­cinct syn­tax can result in difficult-​to-​read code. Such inscrutable exam­ples can be found in any lan­guage that imple­ments reg­u­lar expres­sions; at least Perl offers the enhance­ments men­tioned above.


Sigils

Perl has three built-in data types that enable you to build all other data structures no matter how complex. Its variable names are always preceded by a sigil, which is just a fancy term for a symbol or punctuation mark.

  • A scalar con­tains a string of char­ac­ters, a num­ber, or a ref­er­ence to some­thing, and is pre­ced­ed with a $ (dol­lar sign).
  • An array is an ordered list of scalars begin­ning with an ele­ment num­bered 0 and is pre­ced­ed with a @ (at sign). 
  • A hash, or asso­cia­tive array, is an unordered col­lec­tion of scalars indexed by string keys and is pre­ced­ed with a % (per­cent sign).

So vari­able names $look @like %this. Individual ele­ments of arrays or hash­es are scalars, so they $look[0] $like{'this'}. (That’s the first ele­ment of the @look array count­ing from zero, and the ele­ment in the %like hash with a key of 'this'.)

Perl also has a concept of slices, or selected parts of an array or hash. A slice of an array looks like @this[1, 2, 3], and a slice of a hash looks like @that{'one', 'two', 'three'}. You could write it out long-hand like ($this[1], $this[2], $this[3]) and ($that{'one'}, $that{'two'}, $that{'three'}) but slices are much easier. Plus you can even specify one or more ranges of elements with the .. operator, so @this[0 .. 9] would give you the first ten elements of @this, or @this[0 .. 4, 6 .. 9] would give you nine with the one at index 5 missing. Handy, that.

In oth­er words, the sig­il always tells you what you’re going to get. If it’s a sin­gle scalar val­ue, it’s pre­ced­ed with a $; if it’s a list of val­ues, it’s pre­ced­ed with a @; and if it’s a hash of key-​value pairs, it’s pre­ced­ed with a %. You nev­er have to be con­fused about the con­tents of a vari­able because the name will tell you what’s inside.
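As a quick illustration of elements and slices, consider this sketch. The contents of @this and %that are invented sample data, and sigil_examples is a hypothetical helper that simply bundles the results for inspection.

```perl
use strict;
use warnings;

my @this = ( 'a' .. 'j' );                        # ten elements, indexes 0 through 9
my %that = ( one => 1, two => 2, three => 3 );

# Hypothetical helper: gather a few element and slice lookups.
sub sigil_examples {
    return (
        element     => $this[0],                  # $ because we get one scalar
        value       => $that{'two'},              # $ again: one value from the hash
        array_slice => [ @this[ 1, 2, 3 ] ],      # @ because we get a list
        hash_slice  => [ @that{ 'one', 'two', 'three' } ],
        ranges      => [ @this[ 0 .. 4, 6 .. 9 ] ],    # nine elements, index 5 skipped
    );
}
```

Notice that the hash slice uses @ rather than %, because what comes back is a list of values, not key-value pairs.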

Data structures, anonymous values, and dereferencing

I men­tioned ear­li­er that you can build com­plex data struc­tures from Perl’s three built-​in data types. Constructing them with­out a lot of inter­me­di­ate vari­ables requires you to use things like:

  • lists, denot­ed between ( paren­the­ses )
  • anony­mous arrays, denot­ed between [ square brack­ets ]
  • and anony­mous hash­es, denot­ed between { curly braces }.

Given these tools you could build, say, a scalar ref­er­enc­ing an array of street address­es, each address being an anony­mous hash:

$addresses = [
  { 'name'    => 'John Doe',
    'address' => '123 Any Street',
    'city'    => 'Anytown',
    'state'   => 'TX',
  },
  { 'name'    => 'Mary Smith',
    'address' => '100 Other Avenue',
    'city'    => 'Whateverville',
    'state'   => 'PA',
  },
];

(The => is just a way to show cor­re­spon­dence between a hash key and its val­ue, and is just a fun­ny way to write a com­ma (,). And like some oth­er pro­gram­ming lan­guages, it’s OK to have trail­ing com­mas in a list as we do for the 'state' entries above; it makes it eas­i­er to add more entries later.)

Although I’ve nice­ly spaced out my exam­ple above, you can imag­ine a less socia­ble devel­op­er might cram every­thing togeth­er with­out any spaces or new­lines. Further, to extract a spe­cif­ic val­ue from this struc­ture this same per­son might write the fol­low­ing, mak­ing you count dol­lar signs one after anoth­er while read­ing right-​to-​left then left-to-right:

say $$addresses[1]{'name'};

We don’t have to do that, though; we can use arrows that look like -> to deref­er­ence our array and hash elements:

say $addresses->[1]->{'name'};

We can even use postfix dereferencing to pull a slice out of this structure, which is just a fancy way of saying “always reading left to right”:

say for $addresses->[1]->@{'name', 'city'};

Which prints out:

Mary Smith
Whateverville

Like I said above, the sig­il always tells you what you’re going to get. In this case, we got:

  • a sliced list of val­ues with the keys 'name' and 'city' out of…
  • an anony­mous hash that was itself the sec­ond ele­ment (count­ing from zero, so index of 1) ref­er­enced in…
  • an anony­mous array which was itself ref­er­enced by…
  • the scalar named $addresses.

That’s a mouth­ful, but com­pli­cat­ed data struc­tures often are. That’s why Perl pro­vides a Data Structures Cookbook as the perldsc doc­u­men­ta­tion page, a ref­er­ences tuto­r­i­al as the perlreftut page, and final­ly a detailed guide to ref­er­ences and nest­ed data struc­tures as the perlref page.
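To compare the dereferencing styles side by side, here is a self-contained sketch using a trimmed-down copy of the address list. The helper names are hypothetical, and it assumes Perl 5.24 or later so that postfix dereferencing is available without extra pragmas.

```perl
use v5.24;
use warnings;

my $addresses = [
    { name => 'John Doe',   city => 'Anytown' },
    { name => 'Mary Smith', city => 'Whateverville' },
];

# Three spellings that all fetch the same scalar.
sub second_name_variants {
    return (
        $$addresses[1]{name},       # sigil-prefixed dereference
        $addresses->[1]->{name},    # arrows everywhere
        $addresses->[1]{name},      # the arrow between subscripts is optional
    );
}

# Postfix slice: two values out of the second hash.
sub second_name_and_city { return $addresses->[1]->@{ 'name', 'city' } }

# Postfix dereference of the whole array.
sub all_names { return map { $_->{name} } $addresses->@* }
```

Which spelling to use is mostly a matter of taste, but the arrow forms spare the reader from counting dollar signs.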

Special variables

Perl was also inspired by Unix com­mand shell lan­guages like the Bourne shell (sh) or Bourne-​again shell (bash), so it has many spe­cial vari­able names using punc­tu­a­tion. There’s @_ for the array of argu­ments passed to a sub­rou­tine, $$ for the process num­ber the cur­rent pro­gram is using in the oper­at­ing sys­tem, and so on. Some of these are so com­mon in Perl pro­grams they are writ­ten with­out com­men­tary, but for the oth­ers there is always the English mod­ule, enabling you to sub­sti­tute in friend­ly (or at least more awk-like) names.

With use English; at the top of your pro­gram, you can say:
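For example, this short sketch uses a few of the longer names. The particular variables are my own picks from the ones documented in perlvar, and the helper names are hypothetical.

```perl
use strict;
use warnings;
use English;

# $ARG[0] is the English spelling of $_[0].
sub first_argument { return $ARG[0] }

# $PROCESS_ID is the English spelling of $$.
sub pid_matches { return $PROCESS_ID == $$ ? 1 : 0 }

# Run a code reference and return its error message, or '' on success.
sub error_message_from {
    my ($code) = @ARG;                     # @ARG is @_
    return '' if eval { $code->(); 1 };
    return $EVAL_ERROR;                    # $EVAL_ERROR is $@
}
```

The punctuation and English names refer to the same underlying variables, so you can mix them freely while migrating older code.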

All of these pre­de­fined vari­ables, punc­tu­a­tion and English names alike, are doc­u­ment­ed on the perlvar doc­u­men­ta­tion page.

The choice to use punctuation variables or their English equivalents is up to the developer, and some have more familiarity with and assume their readers understand the punctuation variety. Other less-friendly developers engage in “code golf,” attempting to express their programs in as few keystrokes as possible.

To combat these and other unsociable tendencies, the perlstyle documentation page admonishes, “Perl is designed to give you several ways to do anything, so consider picking the most readable one.” Developers can (and should) also use the perlcritic tool and its included policies to encourage best practices, such as prohibiting all but a few common punctuation variables.

Conclusion: Do you still hate Perl?

There are only two kinds of lan­guages: the ones peo­ple com­plain about and the ones nobody uses.

Bjarne Stroustrup, design­er of the C++ pro­gram­ming language

It’s easy to hate what you don’t understand. I hope that reading this article has helped you decipher some of Perl’s “noisy” quirks as well as its features for increased readability. Let me know in the comments if you’re having trouble grasping any other aspects of the language or its ecosystem, and I’ll do my best to address them in future posts.


Last week saw the release of Perl 5.34.0 (you can get it here), and with it comes a year’s worth of new fea­tures, per­for­mance enhance­ments, bug fix­es, and oth­er improve­ments. It seems like a good time to high­light some of my favorite changes over the past decade and a half, espe­cial­ly for those with more dat­ed knowl­edge of Perl. You can always click on the head­ers below for the full releas­es’ perldelta pages.

Perl 5.10 (2007)

This was a big release, com­ing as it did over five years after the pre­vi­ous major 5.8 release. Not that Perl devel­op­ers were idle — but it would­n’t be until ver­sion 5.14 that the lan­guage would adopt a steady year­ly release cadence.

Due to the build-​up time, many core enhance­ments were made but the most impor­tant was arguably the feature prag­ma, enabling the addi­tion of new syn­tax that would oth­er­wise break Perl’s back­ward com­pat­i­bil­i­ty. 5.10 also intro­duced the defined-​or oper­a­tor (//), state vari­ables that per­sist their pre­vi­ous val­ue, the say func­tion for auto­mat­i­cal­ly append­ing a new­line on out­put (so much saved typ­ing), and a large col­lec­tion of improve­ments to reg­u­lar expres­sions. In addi­tion, this release intro­duced smart match­ing (~~), though ver­sion 5.18 would even­tu­al­ly rel­e­gate it to exper­i­men­tal sta­tus.
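Here is a brief sketch of three of those 5.10 additions in action: defined-or, state variables, and say. The greeting and next_id subroutines are hypothetical examples, not code from the release notes.

```perl
use v5.10;
use strict;
use warnings;

sub greeting {
    my ($name) = @_;

    # // falls through only on undef, so 0 and '' are kept as names.
    return 'Hello, ' . ( $name // 'world' );
}

sub next_id {
    state $id = 0;    # initialized once, then remembered between calls
    return ++$id;
}

say greeting();
say greeting('Perl');
say next_id() for 1 .. 3;
```

Before 5.10 the same counter would have needed a closure over an outer lexical or a package variable.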

Perl 5.12 (2010)

This release also saw many new fea­tures added, but if I had to pick one mar­quee item it would be exper­i­men­tal sup­port for plug­gable key­words, which enabled authors to extend the lan­guage itself with­out mod­i­fy­ing the core. Previously one would either use plain func­tions, hacky source fil­ters, or the dep­re­cat­ed Devel::Declare mod­ule to sim­u­late this func­tion­al­i­ty. CPAN authors would go on to cre­ate all kinds of new syn­tax, some­times pro­to­typ­ing fea­tures that would even­tu­al­ly make their way into core.

Perl 5.14 (2011)

5.14 had a big list of enhance­ments, includ­ing Unicode 6.0 sup­port and a gag­gle of reg­u­lar expres­sion fea­tures. My favorite of these was the /r switch for non-​destructive sub­sti­tu­tions.
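As a small illustration of why /r is so handy, this sketch chains two non-destructive substitutions in one expression. The slugify helper is a hypothetical example.

```perl
use v5.14;
use warnings;

# Hypothetical helper: turn a title into a URL-friendly slug
# without modifying the caller's string.
sub slugify {
    my ($title) = @_;
    return lc($title) =~ s/[^a-z0-9]+/-/gr =~ s/\A-+|-+\z//gr;
}

my $title = 'Perl 5.34 Released!';
say slugify($title);
say $title;    # still the original string
```

Without /r, each substitution would need its own temporary variable to avoid clobbering the input.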

But as the first year­ly cadence release, the changes in pol­i­cy took cen­ter stage. The Perl 5 Porters (p5p) explic­it­ly com­mit­ted to sup­port­ing the two most recent sta­ble release series, pro­vid­ing secu­ri­ty patch­es only for release series occur­ring in the past three years. They also defined an explic­it com­pat­i­bil­i­ty and dep­re­ca­tion pol­i­cy, with def­i­n­i­tions for fea­tures that may be exper­i­men­tal, dep­re­cat­ed, dis­cour­aged, and removed.

Perl 5.16 (2012)

Another year, anoth­er ver­sion bump. This time the core enhance­ments were all over the map (although no enhance­ments to the map function 😀 ).

May I highlight another documentation change, though? The perlootut Object-Oriented Programming in Perl Tutorial replaced the old perltoot, perltooc, perlboot, and perlbot pages, providing an introduction to object-oriented design concepts before strongly recommending the use of one of the OO systems from CPAN. Mentioned are Moose, its alternative Mouse, Class::Accessor, Object::Tiny, and Role::Tiny’s usage with the latter two. Later versions of perlootut would recommend Moo rather than Mouse.

Perl 5.18 (2013)

As men­tioned ear­li­er, Perl 5.18 ren­dered smart­match exper­i­men­tal, as well as lex­i­cal use of the $_ vari­able. With these came a new cat­e­go­ry of warn­ings for exper­i­men­tal fea­tures and a method for over­rid­ing such warn­ings feature-​by-​feature. Fitting in with the secu­ri­ty and safe­ty theme, hash­es were over­hauled to ran­dom­ize key/​value order, increas­ing their resis­tance to algo­rith­mic com­plex­i­ty attacks.

But it was­n’t all fenc­ing in bad behav­ior. Lexical sub­rou­tines made their first (exper­i­men­tal) appear­ance, and although I con­fess I haven’t had much call for them in my work, oth­ers have come up with some inter­est­ing uses. Four years lat­er they became non-​experimental.

Perl 5.20 (2014)

Three new syntax features arrived in 2014: experimental subroutine signatures (which I’ve written more about here), key/value hash slices and index/value array slices, and experimental postfix dereferencing. This last enables cleaner left-to-right syntax when dereferencing variables:

  • @{ $array_ref } becomes $array_ref->@*
  • %{ $hash_ref } becomes $hash_ref->%*
  • Etc.

Postfix deref­er­enc­ing became non-​experimental in Perl 5.24, and vig­or­ous dis­cus­sion con­tin­ues on sub­rou­tine sig­na­tures’ future.
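All three features combine naturally, as in this sketch. It assumes Perl 5.24 or later for non-experimental postfix dereferencing and silences the experimental warning that signatures raise on older releases. The pick and index_value helpers are hypothetical.

```perl
use v5.24;
use warnings;
use feature 'signatures';
no warnings 'experimental::signatures';

# Hypothetical helper: copy only the named keys out of a hash reference.
sub pick ( $hash_ref, @keys ) {
    my %picked = $hash_ref->%{@keys};    # key/value hash slice, postfix style
    return \%picked;
}

# Hypothetical helper: pair selected indexes with their elements.
sub index_value ( $array_ref, @indexes ) {
    my %pairs = $array_ref->%[@indexes];    # index/value array slice
    return \%pairs;
}
```

The % sigil on these slices follows the usual rule: what comes back is a list of key-value (or index-value) pairs, ready to assign to a hash.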

Perl 5.22 (2015)

Speaking of sub­rou­tine sig­na­tures, their loca­tion moved to between the sub­rou­tine name (if any) and the attribute list (if any). Previously they appeared after attrib­ut­es. The les­son? Remain con­scious of exper­i­men­tal fea­tures in your code, and be pre­pared to make changes when upgrading.

In addi­tion to the enhance­ments, secu­ri­ty updates, per­for­mance fix­es, and dep­re­ca­tions, devel­op­ers removed the his­tor­i­cal­ly notable CGI mod­ule. First added to core in 1997 in recog­ni­tion of its crit­i­cal role in enabling web devel­op­ment, it’s been sup­plant­ed by bet­ter alter­na­tives on CPAN.

Perl 5.24 (2016)

Perl 5.20’s postfix dereferencing was no longer experimental, and developers removed both lexical $_ and autodereferencing on calls to push, pop, shift, unshift, splice, keys, values, and each.

Perl 5.26 (2017)

The incor­po­ra­tion of exper­i­men­tal fea­tures con­tin­ued, with lex­i­cal sub­rou­tines mov­ing into full sup­port. I like the added read­abil­i­ty enhance­ments, though: indent­ed here-​documents; the /xx reg­u­lar expres­sion mod­i­fi­er for tabs and spaces in char­ac­ter class­es; and @{^CAPTURE}, %{^CAPTURE}, and %{^CAPTURE_ALL} for reg­exp match­es with a lit­tle more self-documentation.
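Here is a short sketch exercising those three readability features. The helper names and the weekly-hits script name in the usage text are hypothetical.

```perl
use v5.26;
use warnings;

# Indented here-document: the terminator's indentation is stripped
# from every line of the text.
sub usage {
    return <<~'END';
        Usage: weekly-hits [FILE...]
          Counts successful GET requests per week.
        END
}

# @{^CAPTURE} holds all capture groups from the last successful match.
sub date_parts {
    my ($date) = @_;
    return unless $date =~ /^ (\d{4}) - (\d\d) - (\d\d) $/x;
    return @{^CAPTURE};
}

# Under /xx, spaces inside the bracketed character class are ignored too.
sub is_hex {
    my ($text) = @_;
    return $text =~ /^ [ 0-9 a-f A-F ]+ $/xx ? 1 : 0;
}
```

None of these change what the code can do, but each removes a small excuse for cramped formatting.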

Perl 5.28 (2018)

Experimental sub­rou­tine sig­na­ture and attribute order­ing flipped back to its Perl 5.20 sequence of attributes-​then-​signature. Bit of a roller­coast­er ride on this one. You could do worse than using some­thing like Type::Params until this set­tles and get a wide vari­ety of type con­straints in the bargain.

Perl 5.30 (2019)

Pour one out for AWK and Fortran programmers migrating to Perl: the $[ variable for setting the lower bound of arrays could no longer be set to anything other than zero. This had a long deprecation cycle starting in Perl 5.12.

Perl 5.32 (2020)

In 2020 Perl’s devel­op­ment moved to GitHub. And once again, I’m going to high­light read­abil­i­ty enhance­ments: the exper­i­men­tal isa oper­a­tor could be used to say:

if ( $obj isa Some::Class ) { ... }

instead of

use Scalar::Util 'blessed';
if ( blessed($obj) and $obj->isa('Some::Class') ) { ... }

You could also chain com­par­i­son oper­a­tors, lead­ing to the more math­e­mat­i­cal­ly con­cise if ( $x < $y <= $z ) {...} rather than if ( $x < $y and $y <= $z ) {...}.
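Both additions can be tried with a sketch like the following. It assumes Perl 5.32 or later and silences the experimental warning that isa raises on releases where it is still experimental. Some::Class is given a trivial constructor here purely for demonstration, and the helper names are hypothetical.

```perl
use v5.32;
use warnings;
use feature 'isa';
no warnings 'experimental::isa';

# Minimal class so there is something to test against.
package Some::Class {
    sub new { return bless {}, shift }
}

# Hypothetical helper: safe to call on undef and on plain strings.
sub is_some_class {
    my ($thing) = @_;
    return ( $thing isa Some::Class ) ? 1 : 0;
}

# Hypothetical helper: chained comparison, same as $x < $y and $y <= $z.
sub in_range {
    my ( $x, $y, $z ) = @_;
    return ( $x < $y <= $z ) ? 1 : 0;
}
```

Unlike calling the isa method directly, the operator doesn’t die when its left side isn’t an object.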

Perl 5.34 (2021)

Finally, we come to last week’s release and its introduction of experimental try/catch exception handling syntax. If you need to support earlier versions of Perl back to 5.14, you can use Feature::Compat::Try. Earlier this year I interviewed the feature and module’s author, Paul “LeoNerd” Evans. This year also marked the debut of Perl’s new governance model with the appointment of a Core Team and a three-member Steering Council.
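A minimal sketch of the new syntax looks like this. It assumes Perl 5.34 or later and silences the experimental warning; safe_divide is a hypothetical example.

```perl
use v5.34;
use warnings;
use feature 'try';
no warnings 'experimental::try';

# Hypothetical helper: return the quotient, or an error string
# if the division throws an exception.
sub safe_divide {
    my ( $numerator, $denominator ) = @_;
    try {
        return $numerator / $denominator;
    }
    catch ($error) {
        return "error: $error";
    }
}

say safe_divide( 6, 3 );
print safe_divide( 1, 0 );
```

Note that return inside the try block returns from the enclosing subroutine, which is not how eval blocks behave.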

What are some of your favorite Perl improve­ments over the years? Check out the perlhist doc­u­ment for a detailed chronol­o­gy and refresh­er with the var­i­ous perldelta pages and leave me a com­ment below.