Replied to MetaCPAN on Fastly's Fast Forward (Twitter)
We’re excited to be a part of @Fastly’s new initiative to empower everyone to build the good, open internet — Fast Forward ⏩ Fastly has served MetaCPAN with free services since 2012. Learn more about it here: https://www.fastly.com/blog/fast-forward-lets-build-the-good-internet-together

I’m tremendously grateful for Fastly’s support of the Perl MetaCPAN search engine over the past ten years, but the former’s page says they’ve only been serving “open source” for eight.
(Image: logos of Rust, Python, Ruby, Scratch, LF, ASF, and curl)

And I’m not sure how MetaCPAN is explicitly part of this initiative. I don’t see it mentioned.


Twitter recently recommended a tweet to me (all hail the algorithm) touting what the author viewed as the top 5 “web development stacks.”

JavaScript/Node.js options dominated the four-letter acronyms as expected, but the fifth one surprised me: LAMP, the combination of the Linux operating system, Apache web server, MySQL relational database, and Perl, PHP, or Python programming languages. A quick web search for similar lists yielded similar results. Clearly, this meme (in the Dawkins sense) has outlasted its popularization by tech publisher O’Reilly in the 2000s.

I had thought that the term “LAMP,” originally coined in 1998 during the “dot-com” bubble, had faded among developers in the intervening decades with the rise of language-specific web frameworks for each of its constituent languages.

Certainly on the Perl side (with which I’m most familiar), the community has long since recommended the use of a framework built on the PSGI specification, deprecating 1990s-era CGI scripts and the mod_perl Apache extension. Although general-purpose web servers like Apache or Nginx may be part of an overall system, they’re typically used as proxies or load balancers for Perl-specific servers either provided by the framework or by a third-party module.
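
In case PSGI is unfamiliar, here’s a minimal sketch of what a bare PSGI application looks like; the frameworks mentioned above all build on this same interface. (The file name app.psgi and the greeting are just placeholders; you’d serve it during development with Plack’s plackup.)

# app.psgi: a minimal PSGI application (a sketch; real apps would use a
# framework such as Dancer2 or Mojolicious, which speak PSGI underneath).
# Run it with Plack's development server: plackup app.psgi
use strict;
use warnings;

my $app = sub {
    my $env = shift;    # the PSGI environment: method, path, headers, ...
    return [
        200,                                            # HTTP status
        [ 'Content-Type' => 'text/plain' ],             # response headers
        [ "Hello! You requested $env->{PATH_INFO}\n" ], # body lines
    ];
};                       # the file's last expression is the app code ref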

Granted, PHP still relies on web server-specific modules, APIs, or variations of the FastCGI protocol for interfacing with a web server. And Python web applications typically make use of its WSGI protocol either as a web server extension or, like the Perl examples above, as a proxied standalone server. But all of these are deployment details and do little to describe how developers implement and extend a web application’s structure.

Note how the various four-letter JavaScript stacks (e.g., MERN, MEVN, MEAN, PERN) differentiate themselves mostly by frontend framework (e.g., Angular, React, Vue.js) and maybe by the (relational or NoSQL) database (e.g., MongoDB, MySQL, PostgreSQL). All, however, seem standardized on the Node.js runtime and Express backend web framework, which could, in theory, be replaced with non-JavaScript options like the more mature LAMP-associated languages and frameworks. (Or if you prefer languages that don’t start with “P”, there’s C#, Go, Java, Ruby, etc.)

My point is that “LAMP” as the name of a web development stack has outlived its usefulness. It’s at once too specific (about operating system and web server details that are often abstracted away for developers) and too broad (covering three separate programming languages and not the frameworks they favor). It also leaves out other non-JavaScript back-end languages and their associated frameworks.

The question is: what can replace it? I’d propose “NoJS” as reminiscent of “NoSQL,” but that inaccurately excludes JavaScript from its necessary role in the front-end. “NJSB” doesn’t exactly roll off the tongue, either, and still has the same ambiguity problem as “LAMP.”

How about pithy sort-of-acronyms patterned like database-frontend-backend? Here are some Perl examples:

  • MRDancer: MySQL, React, and Dancer (I use this at work. Yes, the M could also stand for MongoDB. Naming things is hard.)
  • MRMojo: MongoDB, React, and Mojolicious
  • PACat: PostgreSQL, Angular, and Catalyst
  • etc.

Ultimately it comes down to community and industry adoption. If you’re involved with back-end web development, please let me know in the comments if you agree or disagree that “LAMP” is still a useful term, and if not, what should replace it.


A recent Lobsters post lauding the virtues of AWK reminded me that although the language is powerful and lightning-fast, I usually find myself exceeding its capabilities and reaching for Perl instead. One such application is analyzing voluminous log files such as the ones generated by this blog. Yes, WordPress has stats, but I’ve never let reinvention of the wheel get in the way of a good programming exercise.

So I whipped this script up on Sunday night while watching RuPaul’s Drag Race reruns. It parses my Apache web server log files and reports on hits from week to week.

#!/usr/bin/env perl

use strict;
use warnings;
use Syntax::Construct 'operator-double-diamond';
use Regexp::Log::Common;
use DateTime::Format::HTTP;
use List::Util 1.33 'any';
use Number::Format 'format_number';

my $parser = Regexp::Log::Common->new(
    format  => ':extended',
    capture => [qw<req ts status>],
);
my @fields      = $parser->capture;
my $compiled_re = $parser->regexp;

my @skip_uri_patterns = qw<
  ^/+robots.txt
  [-\w]*sitemap[-\w]*.xml
  ^/+wp-
  /feed/?$
  ^/+?rest_route=
>;

my ( %count, %week_of );
while ( <<>> ) {
    my %log;
    @log{@fields} = /$compiled_re/;

    # only interested in successful or cached requests
    next unless $log{status} =~ /^2/ or $log{status} == 304;

    my ( $method, $uri, $protocol ) = split ' ', $log{req};
    next unless $method eq 'GET';
    next if any { $uri =~ $_ } @skip_uri_patterns;

    my $dt  = DateTime::Format::HTTP->parse_datetime( $log{ts} );
    my $key = sprintf '%u-%02u', $dt->week;

    # get first date of each week
    $week_of{$key} ||= $dt->date;
    $count{$key}++;
}

printf "Week of %s: % 10s\n", $week_of{$_}, format_number( $count{$_} )
  for sort keys %count;

Here’s some sample output:

Week of 2021-07-31:      2,672
Week of 2021-08-02:     16,222
Week of 2021-08-09:     12,609
Week of 2021-08-16:     17,714
Week of 2021-08-23:     14,462
Week of 2021-08-30:     11,758
Week of 2021-09-06:     14,811
Week of 2021-09-13:        407

I first started prototyping this on the command line as if it were an awk one-liner by using the perl -n and -a flags. The former wraps code in a while loop over the <> “diamond operator”, processing each line from standard input or files passed as arguments. The latter splits the fields of the line into an array named @F. It looked something like this while I was listing URIs (locations on the website):

gunzip -c ~/logs/phoenixtrap.com-ssl_log-*.gz | \
perl -anE 'say $F[6]'
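
For those who haven’t used those flags, the one-liner above is roughly equivalent to this spelled-out loop (an illustrative sketch, not the exact code perl generates):

#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';

# Roughly what `perl -anE 'say $F[6]'` does for each line of input
while (<>) {                # -n: implicit loop over STDIN or file arguments
    my @F = split ' ';      # -a: autosplit the line on whitespace into @F
    say $F[6];              # field 7 is the request path in Apache's
                            # combined log format
}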

But once I realized I’d need to filter out a bunch of URI patterns and do some aggregation by date, I turned it into a script and turned to CPAN.

There I found Regexp::Log::Common and DateTime::Format::HTTP, which let me pull apart the Apache log format and its timestamp strings without having to write even more complicated regular expressions myself. (As noted above, this was already a wheel-reinvention exercise; no need to compound that further.)

Regexp::Log::Common builds a compiled regular expression based on the log format and fields you’re interested in, so that’s the constructor on lines 11 through 14. The expression then returns those fields as a list, which I’m assigning to a hash slice with those field names as keys in line 29. I then skip over requests that aren’t successful or browser cache hits, skip over requests that don’t GET web pages or other assets (e.g., POSTs to forms or updates to other resources), and skip over the URI patterns mentioned earlier.
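
If the hash-slice assignment looks odd, here’s the same idiom in miniature, with made-up field names and a simplified pattern (not the actual expression Regexp::Log::Common generates):

use strict;
use warnings;

my @fields = qw< method path status >;
my $line   = 'GET /about/ 200';

# A match in list context returns its captures, and the hash slice
# @log{@fields} assigns them to the keys 'method', 'path', and 'status'.
my %log;
@log{@fields} = $line =~ m{^ (\S+) \s+ (\S+) \s+ (\d+) }x;

print "$log{method} $log{path} -> $log{status}\n";   # GET /about/ -> 200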

(Those patterns are worth a mention: they include the robots.txt and sitemap XML files used by search engine indexers, WordPress administration pages, files used by RSS newsreaders subscribed to my blog, and routes used by the Jetpack WordPress add-on. If you’re adapting this for your site, you might need to customize this list based on what software you use to run it.)

Lines 38 and 39 parse the timestamp from the log into a DateTime object using DateTime::Format::HTTP and then build the key used to store the per-week hit count. The last lines of the loop then grab the first date of each new week (assuming the log is in chronological order) and increment the count. Once finished, lines 46 and 47 provide a report sorted by week, displaying it as a friendly “Week of date” and the hit counts aligned to the right with sprintf. Number::Format’s format_number function displays the totals with thousands separators.
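
As a small worked example of that key and the report formatting (the date and hit count here are arbitrary):

use strict;
use warnings;
use DateTime;
use Number::Format 'format_number';

my $dt = DateTime->new( year => 2021, month => 9, day => 12 );

# In list context ->week returns the ISO week-numbering year and week number,
# so every date in the same week shares one bucket key.
my $key = sprintf '%u-%02u', $dt->week;    # '2021-36'

printf "Week of %s: % 10s\n", $dt->date, format_number(16_222);
# prints: Week of 2021-09-12:     16,222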

Update: After this was initially published, astute reader Chris McGowan noted that I had a bug where $log{status} was assigned the value 304 with the = operator rather than compared with ==. He also suggested I use the double-diamond <<>> operator introduced in Perl v5.22.0 to avoid maliciously named files. Thanks, Chris!
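
For the curious, here’s that bug in miniature with a made-up status value. Assignment always succeeds and returns a true value, so the buggy version of the filter never rejected anything:

use strict;
use warnings;

my %log = ( status => 404 );

# Buggy: `=` assigns 304 (a true value), so this expression is always true
# and the `next unless ...` guard in the script never skips the request.
my $buggy = ( $log{status} =~ /^2/ or $log{status} = 304 );

%log = ( status => 404 );

# Fixed: `==` compares, so a 404 correctly fails the test and gets skipped.
my $fixed = ( $log{status} =~ /^2/ or $log{status} == 304 );

print 'buggy check: ', ( $buggy ? 'keep' : 'skip' ), "\n";    # keep
print 'fixed check: ', ( $fixed ? 'keep' : 'skip' ), "\n";    # skip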

Room for improvement

DateTime is a very powerful module, but that power comes at a price in speed and memory. Something simpler like Date::WeekNumber should yield performance improvements, especially as my logs grow (here’s hoping). It requires a bit more manual massaging of the log dates to convert them into something the module can use, though:

#!/usr/bin/env perl

use strict;
use warnings;
use Syntax::Construct qw<
  operator-double-diamond
  regex-named-capture-group
>;
use Regexp::Log::Common;
use Date::WeekNumber 'iso_week_number';
use List::Util 1.33 'any';
use Number::Format 'format_number';

my $parser = Regexp::Log::Common->new(
    format  => ':extended',
    capture => [qw<req ts status>],
);
my @fields      = $parser->capture;
my $compiled_re = $parser->regexp;

my @skip_uri_patterns = qw<
  ^/+robots.txt
  [-\w]*sitemap[-\w]*.xml
  ^/+wp-
  /feed/?$
  ^/+?rest_route=
>;

my %month = (
    Jan => '01',
    Feb => '02',
    Mar => '03',
    Apr => '04',
    May => '05',
    Jun => '06',
    Jul => '07',
    Aug => '08',
    Sep => '09',
    Oct => '10',
    Nov => '11',
    Dec => '12',
);

my ( %count, %week_of );
while ( <<>> ) {
    my %log;
    @log{@fields} = /$compiled_re/;

    # only interested in successful or cached requests
    next unless $log{status} =~ /^2/ or $log{status} == 304;

    my ( $method, $uri, $protocol ) = split ' ', $log{req};
    next unless $method eq 'GET';
    next if any { $uri =~ $_ } @skip_uri_patterns;

    # convert log timestamp to YYYY-MM-DD
    # for Date::WeekNumber
    $log{ts} =~ m!^
      (?<day>\d\d) /
      (?<month>...) /
      (?<year>\d{4}) : !x;
    my $date = "$+{year}-$month{ $+{month} }-$+{day}";

    my $week = iso_week_number($date);
    $week_of{$week} ||= $date;
    $count{$week}++;
}

printf "Week of %s: % 10s\n", $week_of{$_}, format_number( $count{$_} )
  for sort keys %count;

It looks almost the same as the first version, with the addition of a hash to convert month names to numbers and the actual conversion (using named regular expression capture groups for readability, and Syntax::Construct to check for that feature). On my server, this results in a ten- to eleven-second savings when processing two months of compressed logs.
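
If you’d like to measure the difference yourself, a rough way to compare just the date handling in isolation might look like the sketch below. The timestamp, the iteration count, and the trimmed %month hash are illustrative only, and results will vary with your logs and hardware:

use strict;
use warnings;
use Benchmark 'cmpthese';
use DateTime::Format::HTTP;
use Date::WeekNumber 'iso_week_number';

my $ts    = '12/Sep/2021:10:15:30 +0000';   # an example Apache log timestamp
my %month = ( Sep => '09' );                # trimmed to the one month used here

cmpthese( 50_000, {
    datetime => sub {
        my $dt = DateTime::Format::HTTP->parse_datetime($ts);
        sprintf '%u-%02u', $dt->week;
    },
    week_number => sub {
        $ts =~ m!^ (?<day>\d\d) / (?<month>...) / (?<year>\d{4}) : !x;
        iso_week_number("$+{year}-$month{ $+{month} }-$+{day}");
    },
} );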

What’s next? Pretty graphs? Drilling down to specific blog posts? Database storage for further queries and analysis? Perl and CPAN make it possible to go far beyond what you can do with AWK. What would you add or change? Let me know in the comments.