
I’m tremendously grateful for Fastly’s support of the Perl MetaCPAN search engine over the past ten years, but Fastly’s own page says they’ve been “serving open source” for only eight.

And I’m not sure how MetaCPAN is explicitly part of this initiative. I don’t see it mentioned.


Twitter recently recommended a tweet to me (all hail the algorithm) touting what the author viewed as the “top 5 web development stacks.”

JavaScript/Node.js options dominated the four-letter acronyms as expected, but the fifth one surprised me: LAMP, the combination of the Linux operating system, Apache web server, MySQL relational database, and Perl, PHP, or Python programming languages. A quick web search for similar lists yielded similar results. Clearly, this meme (in the Dawkins sense) has outlasted its popularization by tech publisher O’Reilly in the 2000s.

I had thought that the term “LAMP,” originally coined in 1998 during the “dot-com” bubble, had faded from developers’ vocabularies in the intervening decades with the rise of language-specific web frameworks.

Certainly on the Perl side (with which I’m most familiar), the community has long since recommended the use of a framework built on the PSGI specification, deprecating 1990s-era CGI scripts and the mod_perl Apache extension. Although general-purpose web servers like Apache or Nginx may be part of an overall system, they’re typically used as proxies or load balancers for Perl-specific servers either provided by the framework or a third-party module.
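If you haven’t seen it, the PSGI interface itself is tiny. Here’s a minimal sketch (not tied to any particular framework): a .psgi file holds a code reference that receives the request environment and returns an array reference of status, headers, and body.

#!/usr/bin/env perl

use strict;
use warnings;

# The whole interface: a code reference taking the PSGI environment
# hash and returning [ status, headers, body ].
my $app = sub {
    my $env = shift;
    return [
        200,
        [ 'Content-Type' => 'text/plain' ],
        [ "Hello from $env->{PATH_INFO}\n" ],
    ];
};

$app;    # a .psgi file returns the application as its last expression

Save it as app.psgi and any PSGI-capable server can host it unchanged, e.g. with plackup app.psgi.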

Granted, PHP still relies on web server-specific modules, APIs, or variations of the FastCGI protocol for interfacing with a web server. And Python web applications typically use the language’s WSGI protocol either as a web server extension or, like the Perl examples above, as a proxied standalone server. But all of these are deployment details and do little to describe how developers implement and extend a web application’s structure.

Note how the various four-letter JavaScript stacks (e.g., MERN, MEVN, MEAN, PERN) differentiate themselves mostly by frontend framework (e.g., Angular, React, Vue.js) and maybe by the (relational or NoSQL) database (e.g., MongoDB, MySQL, PostgreSQL). All, however, seem standardized on the Node.js runtime and the Express backend web framework, which could, in theory, be replaced with non-JavaScript options like the more mature LAMP-associated languages and frameworks. (Or if you prefer languages that don’t start with “P,” there’s C#, Go, Java, Ruby, etc.)
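To make that swap concrete, here’s a sketch of a Perl backend standing in for Express: a Mojolicious::Lite app serving JSON that a React or Vue.js frontend could fetch. (The route and payload are invented for illustration.)

use Mojolicious::Lite -signatures;

# The moral equivalent of an Express route handler: return JSON
# for the JavaScript frontend to consume.
get '/api/posts' => sub ($c) {
    $c->render( json => [ { id => 1, title => 'Hello from Perl' } ] );
};

app->start;

Run it with perl app.pl daemon and the frontend is none the wiser that there’s no Node.js behind the API.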

My point is that “LAMP” as the name of a web development stack has outlived its usefulness. It’s at once too specific (about operating system and web server details that are often abstracted away for developers) and too broad (covering three separate programming languages and not the frameworks they favor). It also leaves out other non-JavaScript back-end languages and their associated frameworks.

The question is: what can replace it? I’d propose “NoJS” as reminiscent of “NoSQL,” but that inaccurately excludes JavaScript from its necessary role in the front-end. “NJSB” doesn’t exactly roll off the tongue, either, and still has the same ambiguity problem as “LAMP.”

How about pithy sort-of-acronyms patterned like database-frontend-backend? Here are some Perl examples:

  • MRDancer: MySQL, React, and Dancer (I use this at work. Yes, the M could also stand for MongoDB. Naming things is hard.)
  • MRMojo: MongoDB, React, and Mojolicious
  • PACat: PostgreSQL, Angular, and Catalyst
  • etc.

Ultimately it comes down to community and industry adoption. If you’re involved with back-end web development, please let me know in the comments if you agree or disagree that “LAMP” is still a useful term, and if not, what should replace it.


A recent Lobsters post lauding the virtues of AWK reminded me that although the language is powerful and lightning-fast, I usually find myself exceeding its capabilities and reaching for Perl instead. One such application is analyzing voluminous log files such as the ones generated by this blog. Yes, WordPress has stats, but I’ve never let reinvention of the wheel get in the way of a good programming exercise.

So I whipped this script up on Sunday night while watching RuPaul’s Drag Race reruns. It parses my Apache web server log files and reports on hits from week to week.

#!/usr/bin/env perl

use strict;
use warnings;
use Syntax::Construct 'operator-double-diamond';
use Regexp::Log::Common;
use DateTime::Format::HTTP;
use List::Util 1.33 'any';
use Number::Format 'format_number';

my $parser = Regexp::Log::Common->new(
    format  => ':extended',
    capture => [qw<req ts status>],
);
my @fields      = $parser->capture;
my $compiled_re = $parser->regexp;

my @skip_uri_patterns = qw<
  ^/+robots\.txt
  [-\w]*sitemap[-\w]*\.xml
  ^/+wp-
  /feed/?$
  ^/+\?rest_route=
>;

my ( %count, %week_of );
while ( <<>> ) {
    my %log;
    @log{@fields} = /$compiled_re/;

    # only interested in successful or cached requests
    next unless $log{status} =~ /^2/ or $log{status} == 304;

    my ( $method, $uri, $protocol ) = split ' ', $log{req};
    next unless $method eq 'GET';
    next if any { $uri =~ $_ } @skip_uri_patterns;

    my $dt  = DateTime::Format::HTTP->parse_datetime( $log{ts} );
    my $key = sprintf '%u-%02u', $dt->week;

    # get first date of each week
    $week_of{$key} ||= $dt->date;
    $count{$key}++;
}

printf "Week of %s: % 10s\n", $week_of{$_}, format_number( $count{$_} )
  for sort keys %count;

Here’s some sample output:

Week of 2021-07-31:      2,672
Week of 2021-08-02:     16,222
Week of 2021-08-09:     12,609
Week of 2021-08-16:     17,714
Week of 2021-08-23:     14,462
Week of 2021-08-30:     11,758
Week of 2021-09-06:     14,811
Week of 2021-09-13:        407

I first started prototyping this on the command line as if it were an awk one-liner by using the perl -n and -a flags. The former wraps code in a while loop over the <> “diamond operator”, processing each line from standard input or files passed as arguments. The latter splits the fields of the line into an array named @F. It looked something like this while I was listing URIs (locations on the website):

gunzip -c ~/logs/phoenixtrap.com-ssl_log-*.gz | \
perl -anE 'say $F[6]'
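Per the perlrun documentation, those two flags effectively wrap the one-liner in boilerplate like this (field 6 of a whitespace-split log line is the requested URI):

use feature 'say';    # enabled automatically by -E

# Roughly what `perl -anE 'say $F[6]'` compiles to:
LINE: while ( <> ) {
    our @F = split ' ', $_;    # -a: autosplit each line into @F
    say $F[6];
}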

But once I realized I’d need to filter out a bunch of URI patterns and do some aggregation by date, I turned it into a script and turned to CPAN.

There I found Regexp::Log::Common and DateTime::Format::HTTP, which let me pull apart the Apache log format and its timestamp strings without having to write even more complicated regular expressions myself. (As noted above, this was already a wheel-reinvention exercise; no need to compound that further.)

Regexp::Log::Common builds a compiled regular expression based on the log format and fields you’re interested in, so that’s the constructor on lines 11 through 14. The expression then returns those fields as a list, which I’m assigning to a hash slice with those field names as keys in line 29. I then skip over requests that aren’t successful or browser cache hits, skip over requests that don’t GET web pages or other assets (e.g., POSTs to forms or updating other resources), and skip over the URI patterns mentioned earlier.
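If the hash-slice assignment is unfamiliar, it assigns a list of values to several hash keys at once. In isolation (with invented sample values):

my @fields = qw< req ts status >;
my %log;

# One assignment fills in all three keys:
@log{@fields} = ( 'GET /about/ HTTP/1.1', '01/Aug/2021:12:34:56 +0000', '200' );

# %log is now:
# ( req    => 'GET /about/ HTTP/1.1',
#   ts     => '01/Aug/2021:12:34:56 +0000',
#   status => '200',
# )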

(Those patterns are worth a mention: they include the robots.txt and sitemap XML files used by search engine indexers, WordPress administration pages, files used by RSS newsreaders subscribed to my blog, and routes used by the Jetpack WordPress add-on. If you’re adapting this for your site, you might need to customize this list based on what software you use to run it.)
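If you do customize the list, it’s worth sanity-checking each new pattern before trusting it with your stats. A quick throwaway test (URIs invented):

use strict;
use warnings;
use List::Util 1.33 'any';

my @skip_uri_patterns = qw< ^/+wp- /feed/?$ >;

for my $uri (qw< /wp-admin/ /feed/ /some-blog-post/ >) {
    my $skipped = any { $uri =~ $_ } @skip_uri_patterns;
    printf "%-20s %s\n", $uri, $skipped ? 'skipped' : 'counted';
}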

Lines 38 and 39 parse the timestamp from the log into a DateTime object using DateTime::Format::HTTP and then build the key used to store the per-week hit count. The last lines of the loop then grab the first date of each new week (assuming the log is in chronological order) and increment the count. Once finished, lines 46 and 47 provide a report sorted by week, displaying it as a friendly “Week of” date and the hit counts aligned to the right with sprintf. Number::Format’s format_number function displays the totals with thousands separators.
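The key-building works because DateTime’s week method returns both the ISO week-based year and the week number when called in list context, which sprintf formats into a string that sorts chronologically:

use DateTime;

my $dt  = DateTime->new( year => 2021, month => 8, day => 2 );
my $key = sprintf '%u-%02u', $dt->week;    # ($week_year, $week_number)
print "$key\n";                            # prints "2021-31"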

Update: After this was initially published, astute reader Chris McGowan noted that I had a bug where $log{status} was assigned the value 304 with the = operator rather than compared with ==. He also suggested I use the double-diamond <<>> operator introduced in Perl v5.22.0 to avoid maliciously named files. Thanks, Chris!
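The danger with plain <> is that it uses Perl’s two-argument open on each element of @ARGV, so an argument that looks like a pipe is executed as a command; <<>> always uses three-argument open and treats every argument as a literal filename. A small demonstration (the hostile “filename” is invented):

use strict;
use warnings;
use Syntax::Construct 'operator-double-diamond';

@ARGV = ('echo pwned |');    # with <>, this would run `echo pwned`

while ( <<>> ) { print }     # with <<>>, it merely fails to open a
                             # file literally named 'echo pwned |'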

Room for improvement

DateTime is a very powerful module, but that power comes at a price in speed and memory. Something simpler like Date::WeekNumber should yield performance improvements, especially as my logs grow (here’s hoping). It requires a bit more manual massaging of the log dates to convert them into something the module can use, though:

#!/usr/bin/env perl

use strict;
use warnings;
use Syntax::Construct qw<
  operator-double-diamond
  regex-named-capture-group
>;
use Regexp::Log::Common;
use Date::WeekNumber 'iso_week_number';
use List::Util 1.33 'any';
use Number::Format 'format_number';

my $parser = Regexp::Log::Common->new(
    format  => ':extended',
    capture => [qw<req ts status>],
);
my @fields      = $parser->capture;
my $compiled_re = $parser->regexp;

my @skip_uri_patterns = qw<
  ^/+robots\.txt
  [-\w]*sitemap[-\w]*\.xml
  ^/+wp-
  /feed/?$
  ^/+\?rest_route=
>;

my %month = (
    Jan => '01',
    Feb => '02',
    Mar => '03',
    Apr => '04',
    May => '05',
    Jun => '06',
    Jul => '07',
    Aug => '08',
    Sep => '09',
    Oct => '10',
    Nov => '11',
    Dec => '12',
);

my ( %count, %week_of );
while ( <<>> ) {
    my %log;
    @log{@fields} = /$compiled_re/;

    # only interested in successful or cached requests
    next unless $log{status} =~ /^2/ or $log{status} == 304;

    my ( $method, $uri, $protocol ) = split ' ', $log{req};
    next unless $method eq 'GET';
    next if any { $uri =~ $_ } @skip_uri_patterns;

    # convert log timestamp to YYYY-MM-DD
    # for Date::WeekNumber
    $log{ts} =~ m!^
      (?<day>\d\d) /
      (?<month>...) /
      (?<year>\d{4}) : !x;
    my $date = "$+{year}-$month{ $+{month} }-$+{day}";

    my $week = iso_week_number($date);
    $week_of{$week} ||= $date;
    $count{$week}++;
}

printf "Week of %s: % 10s\n", $week_of{$_}, format_number( $count{$_} )
  for sort keys %count;

It looks almost the same as the first version, with the addition of a hash to convert month names to numbers and the actual conversion, using named regular expression capture groups for readability (and Syntax::Construct to declare that feature). On my server, this results in a ten- to eleven-second savings when processing two months of compressed logs.
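Your mileage will vary, of course. If you want to measure just the date handling in isolation, a Benchmark comparison along these lines (sample timestamp invented) is one way to check before committing to the change:

use strict;
use warnings;
use Benchmark 'cmpthese';
use DateTime::Format::HTTP;
use Date::WeekNumber 'iso_week_number';

my $ts = '01/Aug/2021:12:34:56 +0000';
my %month;
@month{qw<Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec>}
  = map { sprintf '%02u', $_ } 1 .. 12;

cmpthese( -2, {
    datetime => sub {
        my $dt = DateTime::Format::HTTP->parse_datetime($ts);
        sprintf '%u-%02u', $dt->week;
    },
    week_number => sub {
        $ts =~ m!^(?<day>\d\d)/(?<month>...)/(?<year>\d{4}):!;
        iso_week_number("$+{year}-$month{ $+{month} }-$+{day}");
    },
} );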

What’s next? Pretty graphs? Drilling down to specific blog posts? Database storage for further queries and analysis? Perl and CPAN make it possible to go far beyond what you can do with AWK. What would you add or change? Let me know in the comments.