Tag: Perl

  • Scraping the Dragon with Perl and Mojolicious

    Every extended Labor Day weekend, 80,000 fans of pop culture descend on Atlanta for Dragon Con. It’s a sprawling choose-your-own-adventure of a convention with 38 programming tracks and over 5,000 hours of events. It spans five downtown host hotels, and there is no way to see it all.

    Sadly, this year’s con is almost over. Still, I thought I’d share a little script I wrote to help me make sense of it all.

    The official mobile app is fine for searching and bookmarking events, speakers, and exhibitors. Nonetheless, it’s not suitable for scanning the whole landscape at once. I wanted a single, scrollable view of every event, before I even packed my cosplay.

    Even in the app’s tablet version, the Dragon Con Events area is a scroll-fest.

    The web version of the app gave me exactly what I needed: predictable per-day URLs and semantically marked-up HTML. That meant I could skip the API hunt, skip the manual scrolling, and go straight to scraping.

    Inspecting the HTML reveals per-day event URLs and per-event <div> blocks.

    From Chaos to Clarity in 40 lines

    We’re about to turn a messy, multi-day, multi-hotel schedule into one clean, scroll-once list. This is the forty-five-line Perl map that gets us there, aided by the Mojolicious web toolkit.

    Laying the Groundwork: Tools for the Job

    #!/usr/bin/env perl
    
    use v5.40;
    
    use Carp;
    use English;
    use Mojo::UserAgent;
    use Mojo::URL;
    use Mojo::DOM;
    use Mojo::Collection q(c);
    use Time::Piece;
    use HTML::HTML5::Entities;
    use Memoize;
    
    binmode STDOUT, ':encoding(UTF-8)'
      or croak "Couldn't encode STDOUT: $OS_ERROR";
    
    my $ua   = Mojo::UserAgent->new();
    my $site = Mojo::URL->new('https://app.core-apps.com');
    my $path = '/dragoncon25/events/view_by_day';

    What’s happening: Load the modules that will do the heavy lifting: HTTP fetches, DOM parsing, date handling, Unicode cleanup. Lock STDOUT to UTF-8 so characters like curly quotes and em-dashes don’t break the output. Point the script at the base schedule URL.

    Remembering the Days Without Re-Parsing

    my $date_from_dom = memoize( sub ($dom) {
      return content_at( $dom, 'div.section_header[class~="alt"]' );
    } );

    What’s happening: Create a memoized helper that plucks the date from a day’s HTML and caches it. That way, if we need it again, we skip another trip through the DOM and keep the pipeline fast.

    content_at is a helper function I define later.
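
    Memoize builds its cache key by joining the call’s arguments as strings, and Mojo::DOM overloads stringification to return its HTML, so in the script each day’s full page markup serves as the key. Memoize also keeps separate caches for scalar and list context by default. Here’s a minimal, core-Perl sketch of the caching itself, with a plain HTML string and a regex standing in (hypothetically) for the DOM and content_at:

    ```perl
    #!/usr/bin/env perl
    use v5.36;
    use Memoize;

    # Count how often the wrapped body really runs.
    my $calls = 0;

    # memoize() accepts a code reference and returns a caching wrapper for it.
    my $date_from_html = memoize( sub ($html) {
        $calls++;
        # Hypothetical stand-in for a DOM query: text of the first <h2>.
        my ($date) = $html =~ m{<h2>([^<]+)</h2>};
        return $date;
    } );

    # A hypothetical day page; its full text becomes the cache key.
    my $page = '<h2>Sunday, Aug 31</h2><p>Events go here</p>';

    my $first  = $date_from_html->($page);    # scalar context: computed
    my $second = $date_from_html->($page);    # same argument: served from cache

    say "$first / $second (body ran $calls time)";
    ```

    Both calls happen in scalar context, so the second one is answered from the same cache. Calling once in scalar and once in list context would run the body twice.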

    Starting Where the App Starts

    my $today_dom = Mojo::DOM->new( $ua->get("$site$path")->result->text );

    What’s happening: Fetch the “today” view, the same default the app shows, so we have a known starting point for building the full timeline.

    Collecting the Whole Timeline

    my $day_doms = c(
      $today_dom,
      $today_dom->find(qq(div.filter_box-days > a[href^="$path?day="]))
        ->map( \&dom_from_anchor )
        ->to_array->@*,
    )->sort( sub { day_epoch($a) <=> day_epoch($b) } );

    What’s happening: Grab every day link from the filter bar, fetch each day’s HTML, and sort the days chronologically. Now we’ve got the entire con’s schedule in memory, ready to process.

    dom_from_anchor and day_epoch are two more helper functions explained further down.
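
    The sort above calls day_epoch twice per comparison; the memoized helper spares the DOM lookups, but strptime still runs every time. A common alternative, not what the script does, is a Schwartzian transform that parses each date exactly once. This sketch runs it on hypothetical header strings in the same “Sunday, Aug 31” shape. Because the headers carry no year, every parsed date lands in the same default year, which sorts correctly as long as the event doesn’t straddle New Year’s Eve:

    ```perl
    #!/usr/bin/env perl
    use v5.36;
    use Time::Piece;

    # Hypothetical day headers, scraped out of order.
    my @headers = (
        'Sunday, Aug 31',
        'Friday, Aug 29',
        'Monday, Sep 1',
        'Saturday, Aug 30',
    );

    # Schwartzian transform: parse each header once into [epoch, header],
    # sort on the cached epoch, then keep only the header.
    my @sorted =
        map  { $_->[1] }
        sort { $a->[0] <=> $b->[0] }
        map  { [ Time::Piece->strptime( $_, '%A, %b %e' )->epoch, $_ ] }
        @headers;

    say for @sorted;
    ```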

    Turning HTML into a Human-Readable Schedule

    $day_doms->each( sub {    # process each day's events
      my $date = $date_from_dom->($_);
    
      $_->find('a.bookmark[data-type="events"] + a.object_link')
        ->each( sub {         # output start time + title
    
          my $time    = content_at( $_, 'div.line[class~="two"]' );
          my $title   = content_at( $_, 'div.line[class~="one"]' );
          my ($start) = split /\s*\p{Dash_Punctuation}/, $time;
    
          say "$date $start: ", decode_entities($title);
        } );
    } );

    What’s happening: For each day, find every event link and pull out the start time and title. Split the time cleanly on any dash and decode HTML entities so the output reads like a real schedule.
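
    To see why \p{Dash_Punctuation} makes a safe split point, here’s a core-Perl sketch run against hypothetical time ranges that each use a different dash character. One caveat: U+2212 MINUS SIGN belongs to the math-symbol category (Sm), not Pd, so a site that used it would slip past this pattern.

    ```perl
    #!/usr/bin/env perl
    use v5.36;

    # Hypothetical time ranges, each using a different dash character.
    my @times = (
        "11:30 AM - 12:30 PM",          # U+002D HYPHEN-MINUS
        "11:50 AM \x{2013} 12:00 PM",   # U+2013 EN DASH
        "12:00 PM\x{2014}1:00 PM",      # U+2014 EM DASH, no surrounding spaces
    );

    my @starts;
    for my $time (@times) {
        # Same pattern as the scraper: optional whitespace, then any Pd dash.
        my ($start) = split /\s*\p{Dash_Punctuation}/, $time;
        push @starts, $start;
    }

    say for @starts;    # 11:30 AM, 11:50 AM, 12:00 PM (one per line)
    ```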

    The Little Routines That Make It All Work

    sub dom_from_anchor ($dom) {    # fetch DOM for a day link
      return Mojo::DOM->new(
        $ua->get( Mojo::URL->new( $dom->attr('href') )->to_abs($site) )
          ->result->text );
    }
    
    sub day_epoch ($dom) {    # parse date into epoch
      return Time::Piece->strptime( $date_from_dom->($dom), '%A, %b %e' )
        ->epoch;
    }
    
    # extract and trim text from selector
    sub content_at ( $dom, @args ) { return trim $dom->at(@args)->content }

    What’s happening:

    1. dom_from_anchor: fetches and parses a linked day’s HTML.
    2. day_epoch: turns a date string into a sortable epoch.
    3. content_at: extracts and trims text from a DOM fragment, given a CSS selector.

    These helpers keep the main flow readable and reusable.

    The Schedule, Unlocked

    Run the script and you get a clean, UTF-8-safe list of every event, in chronological order, across all days. No swiping around, no tapping, no “what did I miss?” anxiety. (Ha, who am I kidding? There’s too much going on at Dragon Con to not end up missing something.)

    An example run of the script in my terminal. Each line is “Day, Date Time: Event Title”, sorted chronologically across the whole con.

    And here’s just a small slice of the 2,500+ lines it produces:

    Sunday, Aug 31 11:30 AM: Unmasking Sherlock: Beyond the Many Faces
    Sunday, Aug 31 11:30 AM: Weaponization of the FCC and Other Agencies to Chill Speech
    Sunday, Aug 31 11:30 AM: Where Physics Gets Weird
    . . .
    Sunday, Aug 31 11:50 AM: Photo Session: Amelia Tyler
    Sunday, Aug 31 11:50 AM: Photo Session: Cissy Jones
    Sunday, Aug 31 11:50 AM: Photo Session: Emma Gregory
    . . .
    Sunday, Aug 31 12:00 PM: Dragon Con Mashups
    Sunday, Aug 31 12:00 PM: James J. Butcher and R.R. Virdi signing at The Missing Volume booth# 1300
    Sunday, Aug 31 12:00 PM: JoeDan Worley and Eric Dontigney signing at the Shadow Alley Press Booth# 2
    . . .
    Sunday, Aug 31 12:00 PM: Photo Session: Robert Duncan McNeill
    Sunday, Aug 31 12:00 PM: Photo Session: Robert Picardo
    Sunday, Aug 31 12:00 PM: Photo Session: Tamara Taylor

    Key Techniques

    Here’s the fun part: the techniques that make this tidy, scroll-once list possible.

    CSS selectors for precision

    I used a.bookmark[data-type="events"] + a.object_link to grab only the event title links, and div.line[class~="two"] / div.line[class~="one"] for time and title, respectively. This avoids scraping unrelated elements.

    Memoization for efficiency

    memoize caches the date string for each day’s DOM so I didn’t end up querying the same HTML fragment multiple times.

    Unicode-safe splitting

    \p{Dash_Punctuation} matches any dash type (em, en, hyphen-minus, etc.), so I could split times reliably without worrying about which dash the site uses.

    Functional chaining

    Mojo::Collection’s map, sort, and each methods let me express the scrape→transform→output pipeline in a linear, readable way.

    Entity decoding at output

    HTML::HTML5::Entities’ decode_entities is applied right before printing, so entities like &amp; and &quot; become plain ampersands and quotation marks in the final output.

    A Pattern You Can Take Anywhere

    The same approach that tamed Dragon Con’s chaos works anywhere you’ve got:

    • Predictable URLs, so you can iterate without guesswork
    • Consistent HTML structure, so your selectors stay stable
    • A need to see everything at once, so you can make decisions without paging or filtering

    From fan conventions to conference schedules, from local sports fixtures to film festival line-ups, the same pattern applies. Sometimes the right tool isn’t a sprawling framework or heavyweight API client. It’s a forty-odd-line Perl script that does one thing with ruthless clarity.

    Because once you’ve tamed a schedule like this, the only lines you’ll stand in are the ones that feel like part of the show.

  • Even lighter Perl modulinos with Util::H2O::More

    A few weeks ago, I wrote about how to use the modulino pattern in Perl to create unit-testable command line tools. Fellow Houston Perl Monger Brett Estrade pointed me to a different approach on the Perl Applications & Algorithms Discord, one that trims boilerplate while keeping scripts testable.

    Brett’s Util::H2O::More module extends the lightweight class builder Util::H2O. It adds many extra functions, including command line argument processing via Perl’s core Getopt::Long module. It also promises to build its accessors with less ceremony and code than Moo.

    So let’s dive in!

    A simple script

    A script can use Util::H2O::More’s Getopt2h2o function to process command line options. It returns an object with accessors for each parameter.

    Here’s a simple example, modeled after my earlier modulino exercise:

    #!/usr/bin/env perl
    
    use v5.38;
    use Util::H2O::More v0.4.2 qw(Getopt2h2o);
    
    # name is a string, water is an array of strings
    my $o = Getopt2h2o \@ARGV, {}, qw(
        name=s
        water=s@
    );
    die "Missing --name\n" unless $o->name;
    
    printf "Good %s, %s!\n", time_of_day(), $o->name;
    
    if ( defined $o->water and $o->water->@* ) {
        say 'What kind of water would you like?';
        say "- $_" for $o->water->@*;
    }
    
    sub time_of_day {
        my %hours = (
             5 => 'morning',
            12 => 'afternoon',
            17 => 'evening',
            21 => 'night',
        );
    
        for ( sort { $b <=> $a } keys %hours ) {
            return $hours{$_} if (localtime)[2] >= $_;
        }
        return 'night';
    }

    This is short and readable. I like how Getopt::Long’s quirky parameter parsing syntax is repurposed to create accessors, even for multi-valued options.

    You can call this script like so:

    ./h2options.pl --name Aquarius --water hot --water cold

    We had to check that --name was set ourselves, because neither Util::H2O::More nor Getopt::Long enforces required options. (In a Getopt::Long spec like name=s, the = makes the option’s value mandatory, not the option itself.)

    And as Util::H2O’s documentation suggests: “You should probably switch to something like Moo instead [for advanced features].”
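
    To make that distinction concrete, here’s a core-Perl sketch that calls Getopt::Long directly with the same option specs instead of going through Getopt2h2o. The parse_args helper is hypothetical. A bare --name with no value is rejected, but leaving --name out altogether parses without complaint:

    ```perl
    #!/usr/bin/env perl
    use v5.36;
    use Getopt::Long qw(GetOptionsFromArray);

    # Hypothetical helper: parse a list of arguments with the same specs
    # the script hands to Getopt2h2o. Returns a hashref, or undef on failure.
    sub parse_args (@argv) {
        my %opt;
        GetOptionsFromArray( \@argv, \%opt, 'name=s', 'water=s@' )
            or return;
        return \%opt;
    }

    # "=s" makes the value mandatory, so a bare --name is rejected...
    my $bare = do {
        local $SIG{__WARN__} = sub { };    # silence "requires an argument"
        parse_args('--name');
    };
    say defined $bare ? 'bare --name accepted' : 'bare --name rejected';

    # ...but omitting --name entirely parses without complaint.
    my $omitted = parse_args( '--water', 'hot' );
    say exists $omitted->{name} ? 'name present' : 'name missing, check it yourself';
    ```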

    But enough about limitations: what if you wanted to use this as a Perl module for testing?

    Testing the waters

    One of the strengths of a modulino is the ability to unit test its logic without invoking it from the shell. A typical test script looks like this:

    #!/usr/bin/env perl
    
    use v5.38;
    use Test2::V0;
    use modulinh2o;
    
    plan(6);
    
    can_ok(
        'modulinh2o',
        [ 'time_of_day', 'name', 'water' ],
        'class has methods',
    );
    my $water = modulinh2o->new( name => 'Aquarius' );
    isa_ok( $water, ['modulinh2o'],
            'object is expected class' );
    can_ok(
        $water,
        [ 'time_of_day', 'name', 'water' ],
        'object has methods',
    );
    
    is( $water->name,        'Aquarius', 'name set' );
    is( $water->name('Bob'), 'Bob',      'name change' );
    is( $water->time_of_day,
        in_set( qw(
            morning
            afternoon
            evening
            night
        ) ),
        'time of day function',
    );
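
    Because time_of_day reads the system clock, the test above can only check that its answer falls in_set of the four periods. One hypothetical refactoring, not part of the original modulino or Util::H2O::More, moves the logic into a pure function (here called period_for_hour) that takes the hour explicitly and leaves time_of_day as a thin wrapper, so a test can pin exact boundaries:

    ```perl
    #!/usr/bin/env perl
    use v5.36;

    # Hypothetical pure helper: map an hour (0-23) to a period of the day.
    sub period_for_hour ($hour) {
        my %hours = (
             5 => 'morning',
            12 => 'afternoon',
            17 => 'evening',
            21 => 'night',
        );

        # Check thresholds from latest to earliest; the first one reached wins.
        for ( sort { $b <=> $a } keys %hours ) {
            return $hours{$_} if $hour >= $_;
        }
        return 'night';    # midnight until 5 AM
    }

    # Thin wrapper keeping the original behavior. It ignores its arguments,
    # so it still works when called as a method, as the test above does.
    sub time_of_day { return period_for_hour( (localtime)[2] ) }

    printf "%2d:00 %s\n", $_, period_for_hour($_) for 0, 4, 5, 11, 12, 16, 17, 20, 21, 23;
    ```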

    It isn’t difficult to adapt a modulino from our earlier simple script:

    #!/usr/bin/env perl
    
    use v5.38;
    
    package modulinh2o;
    
    use Getopt::Long qw();
    use Util::H2O::More v0.4.2 qw(h2o opt2h2o);
    
    my @opt_spec = qw(
        name=s
        water=s@
    );
    my $o = h2o -classify => __PACKAGE__, {},
            opt2h2o(@opt_spec);
    
    sub time_of_day {
        my %hours = (
             5 => 'morning',
            12 => 'afternoon',
            17 => 'evening',
            21 => 'night',
        );
    
        for ( sort { $b <=> $a } keys %hours ) {
            return $hours{$_} if (localtime)[2] >= $_;
        }
        return 'night';
    }
    
    # constructor that parses arguments w/ basic validation
    sub new_with_options ($class) {
        Getopt::Long::GetOptionsFromArray(
          \@ARGV, $o, @opt_spec
        ) or die "bad options\n";
        die "Missing --name\n" unless $o->name;
        return $o;
    }
    
    sub run ($self) {
        printf "Good %s, %s!\n",
               time_of_day(), $self->name;
    
        if ( defined $self->water and $self->water->@* ) {
            say 'What kind of water would you like?';
            say "- $_" for $self->water->@*;
        }
        return;
    }
    
    package main;
    
    main() unless caller;
    
    sub main { modulinh2o->new_with_options->run() }

    And run:

    ./modulinh2o.pm --name Aquarius \
      --water sparkling --water still

    This is much shorter than the Moo-based modulino from three weeks ago, but it also doesn’t do as much. There’s no support for using comma separators to pass multiple values to a single argument. Worse, there’s no automatic help text if one passes the wrong options.

    Both are fixable, as we’ll see in a moment. Still, you end up having to write the POD yourself and print it with various invocations of Pod::Usage’s pod2usage() function.
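
    The single line main() unless caller; is what turns these files into modulinos. At file scope, caller returns false when the file is the program perl was asked to run, and true when other code loaded it with require or use. This self-contained sketch writes a hypothetical Greeter.pm to a temporary directory, loads it once as a module and once as a script, and shows main() firing only in the second case:

    ```perl
    #!/usr/bin/env perl
    use v5.36;
    use File::Spec;
    use File::Temp qw(tempdir);

    # Write a tiny, hypothetical modulino into a throwaway directory.
    my $dir  = tempdir( CLEANUP => 1 );
    my $file = File::Spec->rel2abs("$dir/Greeter.pm");
    open my $fh, '>', $file or die "Can't write $file: $!";
    print {$fh} <<~'END_MODULINO';
        package Greeter;
        use v5.36;
        sub greet ($name) { return "Good day, $name!" }
        package main;
        # caller() is false when this file is the program being run,
        # and true when another file loads it with require or use.
        main() unless caller;
        sub main { say Greeter::greet( $ARGV[0] // 'world' ) }
        1;
        END_MODULINO
    close $fh or die "Can't close $file: $!";

    # 1. Load it as a module: main() is skipped, greet() is still callable.
    require $file;
    my $greeting = Greeter::greet('tester');
    say $greeting;

    # 2. Run the same file as a program with the current perl: main() fires.
    my $output = qx{"$^X" "$file" Aquarius};
    print $output;
    ```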

    An ounce of script is worth a gallon of documentation

    Here’s a full example that adds both --help and --man command line options, as typically provided by traditional Getopt::Long-based scripts:

    #!/usr/bin/env perl
    
    use v5.38;
    
    package modulinh2o2;
    
    use Getopt::Long qw();
    use Util::H2O::More v0.4.2 qw(h2o opt2h2o);
    use Pod::Usage;
    
    my @opt_spec = qw(
        name=s
        water=s@
    
        help
        man
    );
    my $o = h2o -classify => __PACKAGE__, {},
            opt2h2o(@opt_spec);
    
    sub time_of_day {
        my %hours = (
             5 => 'morning',
            12 => 'afternoon',
            17 => 'evening',
            21 => 'night',
        );
    
        for ( sort { $b <=> $a } keys %hours ) {
            return $hours{$_} if (localtime)[2] >= $_;
        }
        return 'night';
    }
    
    # different parameter mixes for pod2usage()
    my %pod2usage_opt = (
        cmdline => {
            -exitval  => 2,
            -verbose  => 99,
            -sections => 'USAGE/Command line',
            -message  => 'Use --help to list options',
        },
        opts => {
            -exitval  => 0,
            -verbose  => 99,
            -sections => ['USAGE/Command line', 'OPTIONS'],
        },
        man => {
            -exitval => 0,
            -verbose => 2,
        },
    );
    
    sub new_with_options ($class) {
        # pod2usage() accepts a single hashref of options...
        Getopt::Long::GetOptionsFromArray(
          \@ARGV, $o, @opt_spec
        ) or pod2usage( $pod2usage_opt{cmdline} );
    
        pod2usage( $pod2usage_opt{opts} ) if $o->help;
        pod2usage( $pod2usage_opt{man} )  if $o->man;
    
        # ...or a flat key/value list, letting -message override the default
        pod2usage( $pod2usage_opt{cmdline}->%*,
          -message => 'Missing --name',
        ) unless $o->name;
    
        # default values for the water parameter
        $o->water(
              ( defined $o->water and $o->water->@* )
            ? [ split /,/, join q{,}, $o->water->@* ]
            : [ qw(
                still
                sparkling
                tap
            ) ] );
    
        return $o;
    }
    
    sub run ($self) {
        printf "Good %s, %s!\n",
               time_of_day(), $self->name;
    
        say 'What kind of water would you like?';
        say "- $_" for $self->water->@*;
    
        return;
    }
    
    package main;
    
    main() unless caller;
    
    sub main { modulinh2o2->new_with_options->run() }
    
    # the rest below is documentation
    
    __END__
    
    =head1 NAME
    
    modulinh2o2 - demo of a modulino using Util::H2O::More
    
    =head1 USAGE
    
    =head2 Command line
    
        modulinh2o2.pm [options]
    
    =head2 Perl
    
        use modulinh2o2;
        my $water = modulinh2o2->new(
            name  => 'Aquarius',
            water => [ qw(
                sparkling
                still
                tap
            ) ],
        );
        $water->run;
    
    =head1 OPTIONS
    
    =over
    
    =item B<--name>
    
    Your name here! (required)
    
    =item B<--water>
    
    Type of water to serve. May be specified multiple times, either by repeating the option or separated by commas.
    
    Takes an arrayref when used as a method or construction parameter.
    
    Default values:
    
    =over
    
    =item still
    
    =item sparkling
    
    =item tap
    
    =back
    
    =item B<--help>
    
    Displays a brief help message.
    
    =item B<--man>
    
    Display full documentation as a manual page.
    
    =back
    
    =head1 DESCRIPTION
    
    A sample L<Util::H2O::More> modulino that prints your name, what part of the day it is, and a menu of water choices.
    
    =head1 METHODS
    
    =head2 new
    
    Constructor that takes the above L</OPTIONS> but without the preceding C<-->.
    
    =head2 new_with_options
    
    Alternate constructor that receives its parameters from C<@ARGV>.
    
    =head2 run
    
    Prints out a greeting along with a selection of refreshing drinks.
    
    =head2 ACCESSOR OPTIONS
    
    All of the L</OPTIONS> above are also available as method accessors but without the preceding C<-->.
    
    =head1 FUNCTIONS
    
    =head2 time_of_day
    
    Returns the period of the day based on local time. Possible values are:
    
    =over
    
    =item morning
    
    =item afternoon
    
    =item evening
    
    =item night
    
    =back
    
    =head1 AUTHOR
    
    Mark Gardner <[email protected]>
    
    =head1 LICENSE AND COPYRIGHT
    
    This software is copyright (c) 2025 by Mark Gardner.
    
    This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.

    Now we can run either:

    ./modulinh2o2.pm --help

    to get a quick help message, or:

    ./modulinh2o2.pm --man

    to show the full manual page.
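
    The comma handling in new_with_options is plain core Perl: Getopt::Long collects repeated --water values into an array, and a join-then-split pass breaks apart any comma-separated ones. It’s the same split/join idiom Getopt::Long’s own documentation suggests for comma-separated lists. A minimal sketch with a hard-coded, hypothetical argument list:

    ```perl
    #!/usr/bin/env perl
    use v5.36;
    use Getopt::Long qw(GetOptionsFromArray);

    # Hypothetical command line mixing both styles the POD promises.
    my @argv = ( '--water', 'sparkling,still', '--water', 'tap' );

    my %opt;
    GetOptionsFromArray( \@argv, \%opt, 'water=s@' )
        or die "bad options\n";

    # Same normalization as new_with_options: glue every value together
    # with commas, then split the whole string back apart.
    my @water = split /,/, join q{,}, $opt{water}->@*;

    say for @water;    # sparkling, still, tap (one per line)
    ```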

    Yes, over half of the line count is documentation. Even granting this, the amount of code is now comparable to the Moo-based version, and you don’t get the advantage of MooX::Options’ declarative syntax.

    To summarize, here’s how our modulinh2o evolved, stage by stage:

    Util::H2O::More modulino evolution and feature trade-offs

    • Simple script (Getopt2h2o only)
      Pros: minimal code footprint; instant accessors from CLI arguments; multi-valued options via @ syntax
      Cons: no required-argument enforcement; no auto-help; manual validation
      Complexity: Low (about 20 lines, one file)
    • Basic modulino (h2o + opt2h2o)
      Pros: testable as a module; reusable new_with_options constructor; keeps brevity vs. Moo
      Cons: still no comma-separated multi-values; no auto-help on bad arguments
      Complexity: Medium (adds package structure, test harness)
    • Full modulino with help & man (Pod::Usage integrated)
      Pros: support for --help and --man; comma-separated multi-values handled; defaults for missing --water; clearer onboarding for new users
      Cons: more boilerplate; over half the file is POD; still less declarative than MooX::Options
      Complexity: High (more moving parts, but user-friendly)

    Seen together, these stages trace a course from a bare-bones script to a well-provisioned modulino. Each step adds features at the cost of a little more complexity.

    In the end, Util::H2O::More delivers a lean, testable modulino with far less boilerplate than Moo, but also fewer built-in niceties. If you value speed from idea to working script and can live without declarative option handling, it’s a compelling choice. For more structured needs, MooX::Options still has the edge. Either way, the path from script to production-ready modulino is shorter than you think.

    The best tool is the one that flows from idea to “it works” in a single pour.

  • Kaiju Boss Battle: A Dist::Zilla Journey from Chaos to Co-Op

    Last week I wrote about developing a Perl module that lets me output log entries to macOS’ unified logging system. This weekend’s adventure involved porting that module’s manual processes, including dependency management, documentation syncing, version bumping, and release. The goal was to make everything more automated and repeatable.

    And maybe even… monstrous.

    I was already familiar with the Dist::Zilla (DZil to its friends) suite of tools and plugins. I also knew that some Perl developers view it as a huge barrier to entry, a perception that affects their willingness to contribute to others’ projects.

    So even though Log-Any-Adapter-MacOS-OSLog was a small module of interest to a limited coding audience (macOS Perl users), I thought it wise to have both a main branch and a separate build/main branch for programmers who wanted to work as though things had barely changed:

    • Entire source code with full POD-formatted documentation;
    • A Makefile.PL script to generate a portable Makefile for building, testing, and installing;
    • Plus the typical README, MANIFEST, LICENSE, and CONTRIBUTING documentation every Perl distribution should supply, which is essential if it’s meant for public consumption.

    A small distribution would also give me a model I could scale up for larger projects. At the very least, it was another learning opportunity for me.

    Boy, was it.

    Why Dist::Zilla?

    Because I know it can automate away the boilerplate and repeated information that are unfortunately necessary in a modern Perl module distribution:

    • the version numbers in every module and script
    • the README that often includes the same introductory text as the main module’s documentation
    • the naming, order, and content of POD sections (via DZil’s sister suite, Pod::Weaver), some of which repeat distribution metadata like author, version, support, copyright, license, and so on

    If you need further detail, Dan Book’s Dist::Zilla::Starter is, as its name suggests, an excellent and remarkably thorough guide to the how and why of Dist::Zilla. It even covers the basic structure of CPAN distributions and the history of Perl module authoring tools.

    Why not something else?

    Because just as in the story of Goldilocks and the three bears, everything else I looked at seemed wanting in some way:

    • Module::Build/Module::Build::Tiny: Although minimal, pure-Perl, and easy to install, Module::Build proper still requires shipping extra boilerplate and duplicate metadata. ::Tiny shaves that down further but drops whole swaths of functionality. For example, you can’t specify at setup time that users need a specific operating system, a deal-breaker for a module that requires macOS 10.12 Sierra or newer.
    • Minilla: Opinionated convention over configuration, but I don’t share its opinions. Overriding directory layout, test structure, and README/license handling would mean fighting Minilla’s defaults, DZil-config style, without DZil’s plugin ecosystem.
    • ShipIt: Simple, one-file configuration. But too simple: it’s mostly release automation, not authoring automation. Everything I wanted to tool away would still be manual.
    • No or minimal toolchain: That was how I started this module, with ExtUtils::MakeMaker. Totally manual, with every contributor, including yours truly, needing to remember all the moving parts by hand.

    Off to see the lizard!

    My DZil dist.ini configuration file started off simply enough:

    name    = Log-Any-Adapter-MacOS-OSLog
    author  = Mark Gardner <[email protected]>
    license = Perl_5
    copyright_holder = Mark Gardner
    version = 0.0.5 ; bump as appropriate
    
    [MetaResources]
    repository.type = git
    bugtracker.web  = https://codeberg.org/mjgardner/perl-Log-Any-Adapter-MacOS-OSLog/issues
    
    [Repository]
    web = https://codeberg.org/mjgardner/perl-Log-Any-Adapter-MacOS-OSLog

    That [MetaResources] section is the first sign of plugins at work. It provides a nice, tidy way to specify the distribution’s metadata, which automated tools can use to index, examine, package, or install Perl distributions.

    So far so good. But then the monster attacked.

    [@Filter]
    -bundle = @Basic
    -remove = GatherDir
    -remove = MakeMaker
    -remove = Readme
    ; License plugin still active here
    
    [GatherDir]
    include_dotfiles = 1
    exclude_filename = .gitignore
    exclude_filename = .build
    
    [ReadmeAnyFromPod]
    type = markdown
    filename = README.md
    location = build
    
    [CopyFilesFromBuild]
    copy = README.md
    
    ; various MakeMaker::*, Git::*, Pod::Weaver, etc. plugins

    I was getting errors from the dzil build command as it attempted to add the same file multiple times. These were generated files committed both to the main version control branch and to a build/main branch for non-DZil contributors.

    This brought me to a dead stop. I went down a rabbit hole examining the built files that [Run::AfterBuild] copied out with rsync(1), comparing them to the branch I had set up.

    Plus, my .perlcriticrc file was disappearing from the build artifacts despite my telling [GatherDir] to include dotfiles.

    “Let them fight.”

    LICENSE to kill (files)

    Eventually I traced the LICENSE file duplication to dueling DZil plugins. The aforementioned [GatherDir] was dutifully copying the file from my working root even as the [License] plugin was generating it. I didn’t want to give up the latter as the source of whatever license I happened to be using to distribute this code.

    And .perlcriticrc? It turns out the [PruneCruft] plugin doesn’t listen to its friend [GatherDir] and was hoovering it away.

    In the end, I had to expand my manually configured plugin roster a little to keep them from fighting over which files came from where:

    [@Filter]
    -bundle = @Basic
    -remove = GatherDir
    -remove = PruneCruft
    -remove = MakeMaker
    -remove = Readme
    
    [PruneCruft]
    except = .perlcriticrc
    
    [GatherDir]
    include_dotfiles = 1
    exclude_filename = .gitignore
    exclude_filename = .build
    exclude_filename = LICENSE
    
    [ReadmeAnyFromPod]
    type            = markdown
    filename        = README.md
    
    [CopyFilesFromBuild]
    copy = README.md
    copy = LICENSE

    With these tweaks to Log-Any-Adapter-MacOS-OSLog’s dist.ini:

    • [License] generates its file,
    • [CopyFilesFromBuild] copies it along with the README into the root for my commit,
    • And [GatherDir] doesn’t stomp on it like so many Tokyo city blocks.

    The build/main branch for non-DZil contributors would contain exactly what they and I expected: the DZil working copy plus build artifacts, git clone-able with no surprises.

    Lessons learned, and what’s next?

    I wanted to serve two styles of development and had signed myself up for a complicated authoring process. I now know more about which plugin runs in each phase of the build, which lets me decide between generated and version-controlled files.

    CPAN has a rich variety of personal Dist::Zilla::PluginBundle:: modules, each factoring an individual author’s preferences into a single place so they don’t have to copy and paste DZil configurations around. It’s past time for me to mint a bundle of my own for more consistent and well-understood Perl distributions. Then I can start spinning up new projects without revisiting the same pain.

    This weekend brought on a mental shift. I moved from fighting DZil’s defaults to making it work my way, and I found satisfaction in a predictable, minimal-effort Perl release pipeline.

    The monster was just a guy in a rubber suit all along.

  • Logging from Perl to macOS’ unified log with FFI and Log::Any

    Part 1: The elephant in the room

    A few weeks ago, I started hosting my own Mastodon instance on a Mac mini in my home office. I wanted to join the social Fediverse on my own terms, but it didn’t take long to notice ballooning disk usage. Cached media from other users’ posts was piling up fast.

    That got me thinking: how do I track this growth before it gets out of hand?

    Logging seemed like the obvious answer. On Unix and Linux systems, it’s straightforward enough. But on macOS, finding a native, maintainable solution takes more digging.

    Part 2: Feeding the Apple

    macOS is Unix-based, so you’d expect logging to be simple. You can install logrotate via Homebrew, then schedule it with cron(8). It works, but it adds layers of configuration files, permissions, and guesswork. I wanted something native. Something that felt like it belonged on a Mac.

    Turns out, macOS offers two built-in options. One is newsyslog, a BSD-style tool that rotates logs based on size or time. It’s reliable, but it requires privileged, root-owned configuration files and feels like a holdover from older Unix systems.

    The other is Apple’s unified logging system, a modern API used across macOS, iOS, and even watchOS. It’s structured, searchable, and already baked into the platform. That’s the one I decided to explore.

    Howard Oakley’s explainer on the Unified Log helped me understand Apple’s system for consolidating logs. It showed how they are stored in a compressed binary format, complete with structured metadata and privacy controls. With that foundation, I turned to Apple’s OSLog Framework documentation, which showed how to tag entries and filter them with predicates. macOS handles the rest.

    It’s elegant, but you need to use the API to write logs. Yes, reading and filtering can be done on the command line or in the Console app. But Apple seems to expect logging to be the sole province of Swift and Objective-C developers. I’d rather not have to learn a new programming language just to write logs.

    UPDATE: Howard Oakley’s blowhole utility provides a simple way to write to the unified log from the command line, but all messages come from the “co.eclecticlight.blowhole” subsystem with a “general” category. We can do better.

    Part 3: A platypus in the key of C

    I do know Perl. I also know just enough C to be dangerous. I briefly considered learning Swift or Objective-C, but I wondered whether I could bridge Perl to Apple’s unified logging system without switching languages.

    macOS exposes a C API in <os/log.h>:

    #include <os/log.h>
    
    void
    os_log(os_log_t log, const char *format, ...);
    
    void
    os_log_info(os_log_t log, const char *format, ...);
    
    void
    os_log_debug(os_log_t log, const char *format, ...);
    
    void
    os_log_error(os_log_t log, const char *format, ...);
    
    void
    os_log_fault(os_log_t log, const char *format, ...);

    Perl’s CPAN has a module called FFI::Platypus that would let me call foreign functions in C and other languages. It looked promising.
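    If you haven’t used FFI::Platypus before, here’s a minimal sketch of its basic workflow. It binds the C standard library’s strlen rather than anything from <os/log.h>, and it assumes FFI::Platypus 2.00 or later, where calling lib(undef) tells Platypus to look up symbols in the running process, which already has libc loaded.

    ```perl
    #!/usr/bin/env perl
    use v5.36;
    use FFI::Platypus 2.00;

    # api => 2 opts in to the current, recommended Platypus API
    my $ffi = FFI::Platypus->new( api => 2 );

    # undef means "search the current process", where libc is already loaded
    $ffi->lib(undef);

    # Bind C's size_t strlen(const char *) to a Perl sub named strlen
    $ffi->attach( strlen => ['string'] => 'size_t' );

    say strlen('Hello, Platypus');    # prints 15
    ```

    The same attach() call works for any exported C function with a fixed signature, which is exactly the property the os_log macros lack.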

    But there’s a catch: these logging functions are actually variadic macros, not plain functions, so there’s no symbol for FFI to call. Worse, they expand into private API calls that are unstable across OS updates and risky to rely upon.

    So I wrote a small C wrapper to turn each macro into a proper function. This makes them FFI-safe and lets me control visibility (public logging vs. private, redacted logging) using Apple’s format specifiers:

    #include <os/log.h>
    
    #define DEFINE_OSLOG_WRAPPERS(level_macro, suffix)    \
        void os_log_##suffix##_public(os_log_t log,       \
                                      const char *msg) {  \
            level_macro(log, "%{public}s", msg);          \
        }                                                 \
        void os_log_##suffix##_private(os_log_t log,      \
                                       const char *msg) { \
            level_macro(log, "%{private}s", msg);         \
        }
    
    // Generate wrappers for each log level
    DEFINE_OSLOG_WRAPPERS(os_log, default)
    DEFINE_OSLOG_WRAPPERS(os_log_info, info)
    DEFINE_OSLOG_WRAPPERS(os_log_debug, debug)
    DEFINE_OSLOG_WRAPPERS(os_log_error, error)
    DEFINE_OSLOG_WRAPPERS(os_log_fault, fault)

    This macro generates two functions per log level, one public and one private, giving downstream Perl code a choice. The result is verbose, but it’s safe, auditable, and future-proof.

    Part 4: Plugging into Log::Any

    With the wrapper library in place, I began mapping Apple’s log levels to something Perl can use. I chose Log::Any from CPAN because it’s lightweight, widely supported, and its adapters don’t lock you into a specific back-end. The same code that logs to the screen can also log to a file or, in our case, to Apple’s system.
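    As a rough sketch of that separation, the logging calls below stay the same while the application swaps back-ends. It assumes only the Stderr and File adapters that ship with Log::Any; the /tmp path is an arbitrary example.

    ```perl
    #!/usr/bin/env perl
    use v5.36;
    use Log::Any '$log';     # producer side: library code only ever sees $log
    use Log::Any::Adapter;   # consumer side: the application picks a back-end

    # Send messages to the screen (STDERR)...
    Log::Any::Adapter->set('Stderr');
    $log->info('This goes to STDERR');

    # ...then reroute the very same $log calls to a file (example path)
    Log::Any::Adapter->set( 'File', '/tmp/log-any-demo.log' );
    $log->info('This goes to /tmp/log-any-demo.log');
    ```

    A custom adapter, like the OSLog one below, plugs into that same set() call without any change to the code doing the logging.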

    Admittedly, at this point I’m no longer writing a simple logging script for my Mastodon instance. Instead, it’s a full-fledged logging module. Oh well.

    Some Log::Any levels share the same underlying Apple call: OSLog doesn’t distinguish between notice and info, or between trace and debug. That’s a little different from how Unix syslog does things, but that’s fine. The goal here is compatibility, not perfect fidelity.

    I built a simple dispatch table to route log messages by level, then used FFI::Platypus to bind each wrapper function:

    use FFI::Platypus 2.00;
    
    # OSLog severity, lowest to highest: debug, info, default, error, fault
    my %OS_LOG_MAP = (
        trace     => 'os_log_debug',
        debug     => 'os_log_debug',
        info      => 'os_log_info',
        notice    => 'os_log_info',
        warning   => 'os_log_default',
        error     => 'os_log_error',
        critical  => 'os_log_fault',
        alert     => 'os_log_fault',
        emergency => 'os_log_fault',
    );
    
    my $ffi = FFI::Platypus->new(
        api => 2,
        lib => [ './liboslogwrapper.dylib' ],
    );
    
    $ffi->attach(
        [ os_log_create => '_os_log_create' ],
        [ 'string', 'string' ],
        'opaque',
    );
    
    # attach each wrapper function
    my %UNIQUE_OS_LOG = map { $_ => 1 } values %OS_LOG_MAP;
    foreach my $function ( keys %UNIQUE_OS_LOG ) {
        for my $variant (qw(public private)) {
            my $name = "${function}_$variant";
            $ffi->attach(
                [ $name => "_$name" ],
                [ 'opaque', 'string' ],
                'void',
            );
        }
    }

    This setup gives me a clean way to log from Perl using Apple’s native system, without touching Swift, Objective‑C, or external tools. Each log level maps to a C wrapper, and the FFI layer handles the rest.

    Now I just need an init function to create the os_log_t object, plus a set of methods for logging and for detecting whether a given log level is enabled:

    use strict;
    use Carp;
    use base qw(Log::Any::Adapter::Base);
    use Log::Any::Adapter::Util qw(
      detection_methods
      numeric_level
    );
    
    sub init {
        my $self = shift;
        $self->{private} ||= 0;
        croak 'subsystem is required'
          unless defined $self->{subsystem};
    
        $self->{_os_log} = _os_log_create(
          @{$self}{qw(subsystem category)},
        );
    
        return;
    }
    
    foreach my $log_level ( keys %OS_LOG_MAP ) {
        no strict 'refs';
        *{$log_level} = sub {
            my ( $self, $message ) = @_;
    
            &{  "_$OS_LOG_MAP{$log_level}_"
                    . ( $self->{private}
                        ? 'private'
                        : 'public'
                    ) }( $self->{_os_log}, $message );
        };
    }
    
    foreach my $method ( detection_methods() ) {
        my $method_level = numeric_level( substr $method, 3 );    # drop "is_"
        no strict 'refs';
        *{$method} = sub {
            !!( $method_level <= (
              $_[0]->{log_level} // numeric_level('info')
            ) );
        };
    }

    What’s that “subsystem” bit up there? That’s the term macOS uses to label where log entries come from. Subsystem names are usually formatted in reverse DNS notation (e.g., “com.example.perl”). Once again, Howard Oakley has a great explainer on the topic.

    Also, there’s some metaprogramming going on there:

    • The first foreach loop creates functions called trace, debug, info, and so on, one for each key of %OS_LOG_MAP. Each one calls the corresponding FFI::Platypus-created function, using the private variant if the adapter’s private attribute was set.
    • The second foreach loop creates functions called is_trace, is_debug, is_info, etc., that return true if the adapter is catching that level of log message.
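    To see those two loops in isolation, here’s a self-contained sketch that runs without macOS or the C library. The Demo::Logger package and its _sink_info_* functions are hypothetical stand-ins for the adapter and its FFI-attached wrappers; the detection loop uses the same Log::Any::Adapter::Util helpers as the adapter above.

    ```perl
    #!/usr/bin/env perl
    use v5.36;

    package Demo::Logger {
        use Log::Any::Adapter::Util qw(detection_methods numeric_level);

        # Hypothetical stand-ins for the FFI-attached C wrappers
        sub _sink_info_public  ($msg) { "public: $msg" }
        sub _sink_info_private ($msg) { "private: $msg" }

        # Two Log::Any levels sharing one underlying "sink"
        my %LEVEL_MAP = ( info => 'sink_info', notice => 'sink_info' );

        # Loop 1: install info() and notice(); pick the variant at call time
        for my $level ( keys %LEVEL_MAP ) {
            no strict 'refs';
            *{$level} = sub ( $self, $msg ) {
                my $variant = $self->{private} ? 'private' : 'public';
                return &{"_$LEVEL_MAP{$level}_$variant"}($msg);
            };
        }

        # Loop 2: install is_trace(), is_debug(), ... from numeric levels
        for my $method ( detection_methods() ) {
            my $method_level = numeric_level( substr $method, 3 );
            no strict 'refs';
            *{$method} = sub ($self) {
                !!( $method_level
                    <= ( $self->{log_level} // numeric_level('info') ) );
            };
        }

        sub new ( $class, %args ) { bless {%args}, $class }
    }

    my $public  = Demo::Logger->new;
    my $private = Demo::Logger->new( private => 1 );

    say $public->info('hello');                          # public: hello
    say $private->notice('secret');                      # private: secret
    say $public->is_info  ? 'info on'  : 'info off';     # info on
    say $public->is_debug ? 'debug on' : 'debug off';    # debug off
    ```

    With no log_level set, is_debug() returns false here because Log::Any ranks debug as 7 and info as 6, and this sketch only enables levels at or below its threshold.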

    Part 5: At long last, logging… mostly

    Once this is packaged in a Perl module, how do you use it? At least that part isn’t too hard:

    use Log::Any '$log', default_adapter => [
      'MacOS::OSLog', subsystem => 'com.phoenixtrap.perl',
    ];
    use English;
    use Carp qw(longmess);
    
    $log->info('Hello from Perl!');
    $log->infof('You are using Perl %s', $PERL_VERSION);
    
    $log->trace( longmess('tracing!') );
    $log->debug(     'debugging!'     );
    $log->info(      'informing!'     );
    $log->notice(    'noticing!'      );
    $log->warning(   'warning!'       );
    $log->error(     'erring!'        );
    $log->critical(  'critiquing!'    );
    $log->alert(     'alerting!'      );
    $log->emergency( 'emerging!'      );

    And then you can run this command to stream log messages from the subsystem used above:

    % log stream --level debug \
      --predicate 'subsystem == "com.phoenixtrap.perl"'

    What happened to the trace and debug log messages that were supposed to call os_log_debug(3)? According to macOS’ log(1) manual page, you have to explicitly allow debugging output for a given subsystem:

    % sudo log config --mode "level:debug" \
      --subsystem com.phoenixtrap.perl

    Et voilà!

    Hmm, same lack of debugging messages.

    I’m still figuring this out. Any clues? Drop me a line!

    UPDATE: This is now fixed thanks to some inspiration from the source code of Log::Any::Adapter::Syslog. I’ve updated the code on Codeberg; here is the diff.

    Bonus: Fancy output

    Thanks to Log::Any::Proxy, you also get sprintf-style formatting variants of each logging method (infof, debugf, and so on):

    use English;
    $log->infof(
        'You are using Perl %s in %d',
        $PERL_VERSION, (localtime)[5] + 1900,
    );
    You are using Perl v5.40.2 in 2025

    If you output an object that overloads stringification, you get that string:

    use DateTime;
    $log->infof('It is now %s', DateTime->now);
    It is now 2025-08-10T20:16:50

    And you get single-line Data::Dumper output of complex data structures, plus undefined values replaced with the string “undef”:

    $log->info( {
        foo    => 'hello',
        bar    => 'world',
        colors => [ qw(
            red
            green
            blue
        ) ],
        null => undef,
    } );
    {bar => "world",colors => ["red","green","blue"],foo => "hello",null => undef}

    Conclusion: Build once, use everywhere

    The best tools aren’t always the ones you planned to build. They’re the ones that solve a problem cleanly, and then solve five more you hadn’t thought of yet.

    What started as a quick fix for Mastodon media monitoring became a reusable bridge between Perl and macOS’ Unified Log. Along the way, I got to explore Apple’s logging internals, write an FFI-respecting C wrapper, and integrate cleanly with Log::Any. The resulting code is modular, auditable, and, most importantly, maintainable.

    I didn’t set out to write a logging adapter. But when you care about clean ops and reproducible infrastructure, sometimes the best tools are the ones you build yourself. And if they happen to be over-engineered for the task at hand? All the better: they’ll probably outlive it.

    Try it out or contribute!

    The full adapter code is on Codeberg. If you’re logging from Perl on macOS, give it a spin. Contributions, bug reports, and real-world feedback are welcome, especially if you’re testing it in production or on older macOS versions.

    I’ll do my best to stay compatible with past and future macOS and Perl releases. Keeping the code auditable and minimal should help it stay useful without becoming a moving target.

  • Lightweight object-oriented Perl scripts: From modulinos to moodulinos

    Lightweight object-oriented Perl scripts: From modulinos to moodulinos

    Last week I found myself developing a Perl script to catalog some information for our quality assurance team. Unfortunately, as these things sometimes do, the script’s complexity and requirements started growing. I still wanted to keep it a simple script, yet it was sprouting command-line arguments that needed extra validation. I also needed to test some functions without waiting for the entire script to run.

    As with many things Perl, the basic solution is fairly old. Over twenty years ago, brian d foy popularized the modulino pattern, in which Perl scripts that you execute from the command line can also act as Perl modules. You can even use these modules in other contexts, for example testing.* A modulino seemed like the perfect solution for testing individual script functions, but writing object-oriented Perl outside of a framework (or the new Perl class syntax) can be challenging and verbose.
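    Here’s a bare-bones modulino to make the pattern concrete. It’s a hypothetical illustration (the Greeter package and its subs are made up), not code from foy’s writing; the key line is the “unless caller” guard near the bottom.

    ```perl
    #!/usr/bin/env perl
    package Greeter;
    use v5.36;

    # Reusable logic lives in ordinary subs that tests can call directly
    sub greeting ($name) { return "Hello, $name!" }

    sub run (@args) {
        say greeting( $args[0] // 'world' );
    }

    # caller() returns nothing only when this file is the main program,
    # so run() fires from the command line but not under use/require
    run(@ARGV) unless caller;

    1;
    ```

    Run it as perl Greeter.pm Ada and it prints “Hello, Ada!”; load it from a test with use lib '.'; use Greeter; and you can call Greeter::greeting() without triggering run().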

    Enter the cow (Moo)

    The Moo system of modules is billed as “a lightweight way to concisely define objects and roles with a convenient syntax that avoids the details of Perl’s object system.” It doesn’t have any XS code, so it doesn’t need a C compiler to install. Unlike its inspiration, Moose, it’s optimized for the fast startup time a command-line script needs. Sure, you don’t get a full-strength meta-object protocol for querying and manipulating classes, objects, and attributes, but those capabilities matter more to larger applications or libraries. In keeping with the lightweight theme, you can use Type::Tiny constraints for parameter validation. Additionally, there are several solutions for turning command-line arguments into object attributes. (I chose MooX::Options, mainly because it’s readily available as an Ubuntu Linux package.)
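    As a small sketch of that validation (the Point class is hypothetical, made up for illustration), a Types::Standard constraint on a Moo attribute rejects a bad value as soon as the object is constructed:

    ```perl
    #!/usr/bin/env perl
    use v5.36;

    package Point {
        use Moo;
        use Types::Standard qw(Int);

        # Type::Tiny constraint checked whenever x is set
        has x => ( is => 'ro', isa => Int, required => 1 );
    }

    my $good = Point->new( x => 3 );
    say 'x is ', $good->x;    # x is 3

    # A non-integer value makes the constructor die, so eval returns undef
    my $bad = eval { Point->new( x => 'three' ) };
    say defined $bad ? 'accepted' : 'rejected non-integer x';
    ```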

    I’m not about to dump a proprietary script here on my blog. Yet, I have worked up an illustrative example of how to incorporate Moo into a modulino. Call it a “moodulino” if you like; here’s a short-ish script to tell Perl just how you feel at this time of day:

    #!/usr/bin/env perl
    
    use v5.38;
    
    package moodulino;
    use Moo;
    use MooX::Options;
    use Types::Standard qw(ArrayRef Str);
    
    option name => (
        is       => 'ro',
        isa      => Str,
        required => 1,
        short    => 'n',
        doc      => 'your name here',
        format   => 's',
    );
    
    option moods => (
        is        => 'ro',
        isa       => ArrayRef [Str],
        predicate => 1,
        short     => 'm',
        doc       => 'a list of how you might feel',
        format    => 's@',
        autosplit => ',',
    );
    
    has time_of_day => (
        is      => 'ro',
        isa     => Str,
        builder => 1,
    );
    
    sub _build_time_of_day ($self) {
        my %hours = (
             5 => 'morning',
            12 => 'afternoon',
            17 => 'evening',
            21 => 'night',
        );
    
        for ( sort { $b <=> $a } keys %hours ) {
            return $hours{$_} if (localtime)[2] >= $_;
        }
        return 'night';
    }
    
    sub run ($self) {
        printf "Good %s, %s!\n",
          $self->time_of_day,
          $self->name;
    
        if ( $self->has_moods ) {
            say 'How are you feeling?';
            say "- $_?" for $self->moods->@*;
        }
    }
    
    package main;
    
    main() unless caller;
    
    sub main { moodulino->new_with_options->run() }

    And here’s what happens when I run it:

    % chmod a+x moodulino.pm
    % ./moodulino.pm
    name is missing
    USAGE: moodulino.pm [-h] [long options ...]
    
        -m --moods=[Strings]  a list of how you might feel
        -n --name=String      your name here
    
        --usage               show a short help message
        -h                    show a compact help message
        --help                show a long help message
        --man                 show the manual
    % ./moodulino.pm --name Mark
    Good afternoon, Mark!
    % ./moodulino.pm --name Mark --moods happy --moods sad --moods excited
    Good afternoon, Mark!
    How are you feeling?
    - happy?
    - sad?
    - excited?
    % ./moodulino.pm --name Mark --moods happy,sad,excited
    Good afternoon, Mark!
    How are you feeling?
    - happy?
    - sad?
    - excited?

    If the mood strikes, I can even write a test script for my script:

    #!/usr/bin/env perl
    
    use v5.38;
    use Test2::V0;
    use moodulino;
    
    plan(3);
    
    my $mood = moodulino->new( name => 'Bessy' );
    isa_ok( $mood, 'moodulino' );
    can_ok( $mood, 'time_of_day' );
    
    is( $mood->time_of_day,
        in_set( qw(
            morning
            afternoon
            evening
            night
        ) ) );

    And run it:

    % prove -I. t/time_of_day.t
    t/time_of_day.t .. ok
    All tests successful.
    Files=1, Tests=3,  0 wallclock secs ( 0.00 usr  0.00 sys +  0.07 cusr  0.01 csys =  0.08 CPU)
    Result: PASS

    * foy later expanded this idea into the chapter “Modules as Programs” in Mastering Perl (2007). You can also read more in his 2014 article “Rescue legacy code with modulinos”. Also explore Gábor Szabó’s articles on the topic.