Tag: webdev

  • Scraping the Dragon with Perl and Mojolicious

    Every extend­ed Labor Day week­end, 80,000 fans of pop cul­ture descend on Atlanta for Dragon Con. It’s a sprawl­ing choose-​your-​own adven­ture of a con­ven­tion with 38 pro­gram­ming tracks and over 5,000 hours of events. It spans five down­town host hotels, and there is no way to see it all.

    Sadly, this year’s con is almost over. Still, I thought I’d share a lit­tle script I wrote to help me make sense of it all.

    The offi­cial mobile app is fine for search­ing and book­mark­ing events, speak­ers, and exhibitors. Nonetheless, it’s not suit­able for scan­ning the whole land­scape at once. I want­ed a sin­gle, scrol­lable view of every event, before I even packed my cosplay.

    Even in the app’s tablet ver­sion, the Dragon Con Events area is a scroll-fest.

    The web version of the app gave me exactly what I needed: predictable per-day URLs and semantically marked-up HTML. That meant I could skip the API hunt, skip the manual scrolling, and go straight to scraping.

    Inspecting the HTML reveals per-​day event URLs and per-​event <div> blocks.

    From Chaos to Clarity in 40 lines

    We’re about to turn a messy, multi-day, multi-hotel schedule into one clean, scroll-once list. This is the forty-odd-line Perl map that gets us there, aided by the Mojolicious web toolkit.

    Laying the Groundwork: Tools for the Job

    #!/usr/bin/env perl
    
    use v5.40;
    
    use Carp;
    use English;
    use Mojo::UserAgent;
    use Mojo::URL;
    use Mojo::DOM;
    use Mojo::Collection q(c);
    use Time::Piece;
    use HTML::HTML5::Entities;
    use Memoize;
    
    binmode STDOUT, ':encoding(UTF-8)'
      or croak "Couldn't encode STDOUT: $OS_ERROR";
    
    my $ua   = Mojo::UserAgent->new();
    my $site = Mojo::URL->new('https://app.core-apps.com');
    my $path = '/dragoncon25/events/view_by_day';

    What’s happening: Load the modules that will do the heavy lifting–HTTP fetches, DOM parsing, date handling, Unicode cleanup. Lock STDOUT to UTF-8 so characters like curly quotes and em-dashes don’t break the output. Point the script at the base schedule URL.

    Remembering the Days Without Re-Parsing

    my $date_from_dom = memoize( sub ($dom) {
      return content_at( $dom, 'div.section_header[class~="alt"]' );
    } );

    What’s hap­pen­ing: Create a mem­o­ized helper that plucks the date from a day’s HTML and caches it. That way, if we need it again, we skip the DOM re-​parse and keep the pipeline fast.
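
    To see the caching on its own, here’s a minimal sketch using the core Memoize module. The slow_square function and its call counter are made up for illustration and aren’t part of the scraper; this sketch also uses the by-name form of memoize rather than the code-reference form above.

    ```perl
    use v5.26;
    use warnings;
    use Memoize;

    my $calls = 0;    # counts how often the real function body runs

    # hypothetical stand-in for an expensive lookup
    sub slow_square {
        my ($n) = @_;
        $calls++;
        return $n * $n;
    }

    memoize('slow_square');    # swap in a caching wrapper under the same name

    my @results = map { slow_square(4) } 1 .. 3;
    say "results: @results; body ran $calls time(s)";
    ```

    The second and third calls are answered from the cache, so the counter only moves once.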

    content_at is a helper func­tion I define later.

    Starting Where the App Starts

    my $today_dom = Mojo::DOM->new( $ua->get("$site$path")->result->text );

    What’s happening: Fetch the “today” view–the same default the app shows. This is so we have a known starting point for building the full timeline.

    Collecting the Whole Timeline

    my $day_doms = c(
      $today_dom,
      $today_dom->find(qq(div.filter_box-days > a[href^="$path?day="]))
        ->map( \&dom_from_anchor )
        ->to_array->@*,
    )->sort( sub { day_epoch($a) <=> day_epoch($b) } );

    What’s hap­pen­ing: Grab every day link from the fil­ter bar, fetch each day’s HTML, and sort them chrono­log­i­cal­ly. Now we’ve got the entire con’s sched­ule in mem­o­ry, ready to process.

    dom_from_anchor and day_epoch are two more helper func­tions explained fur­ther down.

    Turning HTML into a Human-​Readable Schedule

    $day_doms->each( sub {    # process each day's events
      my $date = $date_from_dom->($_);
    
      $_->find('a.bookmark[data-type="events"] + a.object_link')
        ->each( sub {         # output start time + title
    
          my $time    = content_at( $_, 'div.line[class~="two"]' );
          my $title   = content_at( $_, 'div.line[class~="one"]' );
          my ($start) = split /\s*\p{Dash_Punctuation}/, $time;
    
          say "$date $start: ", decode_entities($title);
        } );
    } );

    What’s hap­pen­ing: For each day, find every event link and pull out the start time and title. Split the time clean­ly on any dash and decode HTML enti­ties so the out­put reads like a real schedule.

    The Little Routines That Make It All Work

    sub dom_from_anchor ($dom) {    # fetch DOM for a day link
      return Mojo::DOM->new(
        $ua->get( Mojo::URL->new( $dom->attr('href') )->to_abs($site) )
          ->result->text );
    }
    
    sub day_epoch ($dom) {    # parse date into epoch
      return Time::Piece->strptime( $date_from_dom->($dom), '%A, %b %e' )
        ->epoch;
    }
    
    # extract and trim text from selector
    sub content_at ( $dom, @args ) { return trim $dom->at(@args)->content }

    What’s hap­pen­ing:

    1. dom_from_anchor: fetches and parses a linked day’s HTML.
    2. day_epoch: turns a date string into a sortable epoch.
    3. content_at: extracts and trims text from a DOM fragment, given a CSS selector.

    These helpers keep the main flow read­able and re-usable.
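
    The date-sorting trick in day_epoch can be tried without any network access. This is a rough sketch using only the core Time::Piece module; the header strings are sample values I’m assuming match the “Weekday, Mon D” shape the script parses, and header_epoch is a hypothetical stand-in that takes a plain string instead of a DOM.

    ```perl
    use v5.26;
    use warnings;
    use Time::Piece;

    # sample day headers, deliberately out of order (values assumed)
    my @headers = ( 'Monday, Sep 1', 'Friday, Aug 29', 'Sunday, Aug 31' );

    # The strings carry no year, so every date is parsed into the same
    # default year. That is enough for ordering days relative to each other.
    sub header_epoch {
        my ($header) = @_;
        return Time::Piece->strptime( $header, '%A, %b %e' )->epoch;
    }

    my @sorted = sort { header_epoch($a) <=> header_epoch($b) } @headers;
    say for @sorted;
    ```

    A schedule that crossed a year boundary would need the year added before parsing, but a single long weekend doesn’t.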

    The Schedule, Unlocked

    Run the script and you get a clean, UTF-8-safe list of every event, in chronological order, across all days. No swiping around, no tapping, no “what did I miss?” anxiety. (Ha, who am I kidding? There’s too much going on at Dragon Con to not end up missing something.)

    An example run of the script in my terminal. Each line is “Day, Date Time: Event Title”, sorted chronologically across the whole con.

    And here’s just a small slice of the 2,500+ lines it produces:

    Sunday, Aug 31 11:30 AM: Unmasking Sherlock: Beyond the Many Faces
    Sunday, Aug 31 11:30 AM: Weaponization of the FCC and Other Agencies to Chill Speech
    Sunday, Aug 31 11:30 AM: Where Physics Gets Weird
    . . .
    Sunday, Aug 31 11:50 AM: Photo Session: Amelia Tyler
    Sunday, Aug 31 11:50 AM: Photo Session: Cissy Jones
    Sunday, Aug 31 11:50 AM: Photo Session: Emma Gregory
    . . .
    Sunday, Aug 31 12:00 PM: Dragon Con Mashups
    Sunday, Aug 31 12:00 PM: James J. Butcher and R.R. Virdi signing at The Missing Volume booth# 1300
    Sunday, Aug 31 12:00 PM: JoeDan Worley and Eric Dontigney signing at the Shadow Alley Press Booth# 2
    . . .
    Sunday, Aug 31 12:00 PM: Photo Session: Robert Duncan McNeill
    Sunday, Aug 31 12:00 PM: Photo Session: Robert Picardo
    Sunday, Aug 31 12:00 PM: Photo Session: Tamara Taylor

    Key Techniques

    Here’s the fun part–the tech­niques that make this tidy, scroll-​once list possible.

    CSS selectors for precision

    I used a.bookmark[data-type="events"] + a.object_link to grab only the event title links, and div.line[class~="two"] / div.line[class~="one"] for time and title, respectively. This avoids scraping unrelated elements.

    Memoization for efficiency

    memoize caches the date string for each day’s DOM so I did­n’t end up re-​parsing the HTML frag­ment mul­ti­ple times.

    Unicode-​safe splitting

    \p{Dash_Punctuation} match­es any dash type (em, en, hyphen-​minus, etc.), so I could split times reli­ably with­out wor­ry­ing about which dash the site uses.
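
    Here’s a small sketch of that split in isolation. The time ranges are made up; I’m only assuming they look roughly like what the site shows, each with a different kind of dash.

    ```perl
    use v5.26;
    use warnings;

    binmode STDOUT, ':encoding(UTF-8)';

    # made-up time ranges, one per dash character
    my @ranges = (
        "11:30 AM - 12:30 PM",         # hyphen-minus
        "1:00 PM \x{2013} 2:30 PM",    # en dash
        "4:00 PM\x{2014}5:00 PM",      # em dash
    );

    # keep only the text before the first dash, minus trailing whitespace
    my @starts = map {
        my ($start) = split /\s*\p{Dash_Punctuation}/;
        $start;
    } @ranges;

    say for @starts;
    ```

    All three ranges go through the same pattern, so the script doesn’t need to care which dash the markup happens to use.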

    Functional chaining

    Mojo::Collection’s map, sort, and each methods let me express the scrape→transform→output pipeline in a linear, readable way.

    Entity decoding at output

    HTML::HTML5::Entities’ decode_entities is applied right before printing, so HTML entities like &amp; or &quot; are human-readable in the final output.

    A Pattern You Can Take Anywhere

    The same approach that tamed Dragon Con’s chaos works any­where you’ve got:

    • Predictable URLs–so you can iter­ate with­out guesswork
    • Consistent HTML structure–so your selec­tors stay stable
    • A need to see every­thing at once–so you can make deci­sions with­out pag­ing or filtering

    From fan con­ven­tions to con­fer­ence sched­ules, from local sports fix­tures to film fes­ti­val line‑ups–the same pat­tern applies. Sometimes the right tool isn’t a sprawl­ing frame­work or heavy­weight API client. It’s a forty‑odd‑line Perl script that does one thing with ruth­less clarity.

    Because once you’ve tamed a sched­ule like this, the only lines you’ll stand in are the ones that feel like part of the show.

  • WordPress, ActivityPub, and Friends

    I’ve also been mess­ing with the Friends and ActivityPub plu­g­ins for WordPress on my blog, and I share Shelley’s con­cerns about the for­mer bloat­ing the data­base with feed items. You can con­trol this some­what by set­ting reten­tion val­ues in days or a num­ber of posts, but you have to go into each friend’s Feeds tab and do it manually–there’s no default setting.

    After reading that post, I’m also considering disabling Friends in favor of a feed reader, especially because (as Shelley also noted) there are gaps with favorites and comment conversations bridging between WordPress and Mastodon servers. Like her, I’m not keen on installing a single-user Mastodon instance or other fediverse server that requires managing an unfamiliar programming language.

    I’m also try­ing to do this in tan­dem with a suite of IndieWeb plu­g­ins, and I’m run­ning into an issue with my friends feed page not show­ing any posts when the Post Kinds plu­g­in is acti­vat­ed. I real­ly want to keep this plu­g­in because it lets me inter­act bet­ter with oth­er IndieWeb sites as well as the Bridgy POSSE/​back­feed ser­vice con­nect­ing me to oth­er social networks.

    My ide­al is a per­son­al web­site where I write every­thing, includ­ing long-​form arti­cles, short sta­tus­es, and replies like these. Folks can then find me via a sin­gle iden­ti­fi­able address and then subscribe/​follow the entire fire­hose of con­tent or choose sub­sets accord­ing to post types, top­ics, or tags. They’d then be able to reply or react on my site or their favored plat­form, which my site would col­lect regard­less of ori­gin, with sub­se­quent replies and reac­tions get­ting pushed out to them. Oh, and it should work with both ActivityPub clients and servers, IndieWeb sites, and syndicate/​backfeed to oth­er social net­works either with or akin to the Bridgy ser­vice I men­tioned above.

    So far I haven’t seen any­thing that ticks all these box­es, and I’m get­ting itchy to write my own. Perl is my favorite pro­gram­ming lan­guage, so I’m look­ing at the Yancy CMS as a base. But I know that it would still be a hell of a project, and one of the rea­sons I chose WordPress for blog­ging was that it was well-​established and ‑sup­port­ed but still eas­i­ly exten­si­ble so that I could con­cen­trate on writ­ing instead of end­less­ly tweak­ing the engine. Unfortunately, I’m start­ing to fall into that trap anyway.

  • How much is that BLÅHAJ in the (terminal) window?

    IKEA’s toy BLÅHAJ shark has become a beloved Internet icon over the past several years. I thought it might be cute to write a little Perl to get info about it and even display a cuddly picture right in the terminal where I’m running the code. Maybe this will give you some ideas for your own quick web clients. Of course, you could accomplish all of these things using a pipeline of individual command-line utilities like curl, jq, and GNU coreutils’ base64. These examples focus on Perl as the glue, though.

    Warning: dodgy API ahead

    I haven’t found a publicly-​documented and ‑sup­port­ed offi­cial API for query­ing IKEA prod­uct infor­ma­tion but oth­ers have decon­struct­ed the company’s web site AJAX requests so we can use that instead. The alter­na­tive would be to scrape the IKEA web site direct­ly which, although pos­si­ble, would be more tedious and prone to fail­ure should their design change. An unof­fi­cial API is also unre­li­able but the sim­pler client code is eas­i­er to change should any errors surface.

    Enter the Mojolicious

    My orig­i­nal goal was to do this in a sin­gle line issued to the perl com­mand, and luck­i­ly the Mojolicious framework’s ojo mod­ule is tailor-​made for such things. By adding a -Mojo switch to the perl com­mand, you get over a dozen quick single-​character func­tions for spin­ning up a quick web appli­ca­tion or, in our case, mak­ing and inter­pret­ing web requests with­out a lot of cer­e­mo­ny. Here’s the start of my one-​line request to the IKEA API for infor­ma­tion on their BLÅHAJ prod­uct, using ojo’s g func­tion to per­form an HTTP GET and dis­play­ing the JSON from the response body to the terminal.

    $ perl -Mojo -E 'say g("https://sik.search.blue.cdtapps.com/us/en/search-result-page", form => {types => "PRODUCT", q => "BLÅHAJ"})->body'

    This cur­rent­ly returns over 2,400 lines of data, so after read­ing it over I’ll con­vert the response body JSON to a Perl data struc­ture and dump only the main prod­uct infor­ma­tion using ojo’s r func­tion:

    $ perl -Mojo -E 'say r g("https://sik.search.blue.cdtapps.com/us/en/search-result-page", form => {types => "PRODUCT", q => "BLÅHAJ"})->json->{searchResultPage}{products}{main}{items}[0]{product}'
    {
      "availability" => [],
      "breathTaking" => bless( do{\(my $o = 0)}, 'JSON::PP::Boolean' ),
      "colors" => [
        {
          "hex" => "0058a3",
          "id" => 10007,
          "name" => "blue"
        },
        {
          "hex" => "ffffff",
          "id" => 10156,
          "name" => "white"
        }
      ],
      "contextualImageUrl" => "https://www.ikea.com/us/en/images/products/blahaj-soft-toy-shark__0877371_pe633608_s5.jpg",
      "currencyCode" => "USD",
      "discount" => "",
      "features" => [],
      "gprDescription" => {
        "numberOfVariants" => 0,
        "variants" => []
      },
      "id" => 90373590,
      "itemMeasureReferenceText" => "39 \x{bc} \"",
      "itemNo" => 90373590,
      "itemNoGlobal" => 30373588,
      "itemType" => "ART",
      "lastChance" => $VAR1->{"breathTaking"},
      "mainImageAlt" => "BL\x{c5}HAJ Soft toy, shark, 39 \x{bc} \"",
      "mainImageUrl" => "https://www.ikea.com/us/en/images/products/blahaj-soft-toy-shark__0710175_pe727378_s5.jpg",
      "name" => "BL\x{c5}HAJ",
      "onlineSellable" => bless( do{\(my $o = 1)}, 'JSON::PP::Boolean' ),
      "pipUrl" => "https://www.ikea.com/us/en/p/blahaj-soft-toy-shark-90373590/",
      "price" => {
        "decimals" => 99,
        "isRegularCurrency" => $VAR1->{"breathTaking"},
        "prefix" => "\$",
        "separator" => ".",
        "suffix" => "",
        "wholeNumber" => 19
      },
      "priceNumeral" => "19.99",
      "quickFacts" => [],
      "tag" => "NONE",
      "typeName" => "Soft toy"
    }

    If I just want the price I can do:

    $ perl -Mojo -E 'say g("https://sik.search.blue.cdtapps.com/us/en/search-result-page", form => {types => "PRODUCT", q => "BLÅHAJ"})->json->{searchResultPage}{products}{main}{items}[0]{product}->@{qw(currencyCode priceNumeral)}'
    USD19.99

    That ->@{qw(currencyCode priceNumeral)} towards the end uses the post­fix ref­er­ence slic­ing syn­tax intro­duced exper­i­men­tal­ly in Perl v5.20 and made offi­cial in v5.24. If you’re using an old­er perl, you’d say:

    $ perl -Mojo -E 'say @{g("https://sik.search.blue.cdtapps.com/us/en/search-result-page", form => {types => "PRODUCT", q => "BLÅHAJ"})->json->{searchResultPage}{products}{main}{items}[0]{product}}{qw(currencyCode priceNumeral)}'
    USD19.99

    I pre­fer the for­mer, though, because it’s eas­i­er to read left-to-right.
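
    If you’d like to compare the two slice syntaxes without hitting the network, this sketch runs them against a hand-copied, cut-down version of the product hash from the dump above.

    ```perl
    use v5.24;
    use warnings;

    # a few keys copied by hand from the product data shown earlier
    my $product = {
        currencyCode => 'USD',
        priceNumeral => '19.99',
        typeName     => 'Soft toy',
    };

    my @postfix   = $product->@{qw(currencyCode priceNumeral)};    # v5.24 and up
    my @circumfix = @{$product}{qw(currencyCode priceNumeral)};    # older perls

    say @postfix;
    say @circumfix;
    ```

    Both lines print the same thing; the only difference is which direction your eyes have to travel.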

    But I’m not in the United States! Where’s my native currency?

    You can either replace the “us/en” in the URL above or use the core I18N::LangTags::Detect module added in Perl v5.8.5 if you’re really determined to be portable across different users’ locales. This is really stretching the definition of “one-liner,” though.

    $ LANG=de_DE.UTF-8 perl -Mojo -MI18N::LangTags::Detect -E 'my @lang = (split /-/, I18N::LangTags::Detect::detect)[1,0]; say g("https://sik.search.blue.cdtapps.com/" . join("/", @lang == 2 ? @lang : ("us", "en")) . "/search-result-page", form => {types => "PRODUCT", q => "BLÅHAJ"})->json->{searchResultPage}{products}{main}{items}[0]{product}->@{qw(currencyCode priceNumeral)}'
    EUR27.99

    Window dressing

    It’s hard to envi­sion cud­dling a num­ber, but luck­i­ly the prod­uct infor­ma­tion returned above links to a JPEG file in the mainImageUrl key. My favorite ter­mi­nal app iTerm2 can dis­play images inline from either a file or Base64 encod­ed data, so adding an extra HTTP request and encod­ing from the core MIME::Base64 mod­ule yields:

    $ perl -Mojo -MMIME::Base64 -E 'say "\c[]1337;File=inline=1;width=100%:", encode_base64(g(g("https://sik.search.blue.cdtapps.com/us/en/search-result-page", form => {types => "PRODUCT", q => "BLÅHAJ"})->json->{searchResultPage}{products}{main}{items}[0]{product}{mainImageUrl})->body), "\cG"'

    (You could just send the image URL to iTerm2’s bun­dled imgcat util­i­ty, but where’s the fun in that?)

    $ imgcat --url `perl -Mojo -E 'print g("https://sik.search.blue.cdtapps.com/us/en/search-result-page", form => {types => "PRODUCT", q => "BLÅHAJ"})->json->{searchResultPage}{products}{main}{items}[0]{product}{mainImageUrl}'`

    But I don’t have iTerm2 or a Mac!

    I got you. At the expense of a number of other dependencies, here’s a version that will work on any terminal that supports 256-color mode with ANSI codes using Image::Term256Color from CPAN and a Unicode font with block characters. I’ll also use Term::ReadKey to size the image for the width of your window. (Again, this stretches the definition of “one-liner.”)

    $ perl -Mojo -MImage::Term256Color -MTerm::ReadKey -E 'say for Image::Term256Color::convert(g(g("https://sik.search.blue.cdtapps.com/us/en/search-result-page", form => {types => "PRODUCT", q => "BLÅHAJ"})->json->{searchResultPage}{products}{main}{items}[0]{product}{mainImageUrl})->body, {scale_x => (GetTerminalSize)[0], utf8 => 1})'

    I hate Mojolicious! Can’t you just use core modules?

    Fine. Here’s retriev­ing the prod­uct price using HTTP::Tiny and the pure-​Perl JSON pars­er JSON::PP, which were added to core in ver­sion 5.14.

    $ perl -MHTTP::Tiny -MJSON::PP -E 'say @{decode_json(HTTP::Tiny->new->get("https://sik.search.blue.cdtapps.com/us/en/search-result-page?types=PRODUCT&q=BLÅHAJ")->{content})->{searchResultPage}{products}{main}{items}[0]{product}}{qw(currencyCode priceNumeral)}'
    USD19.99

    Fetching and dis­play­ing a pic­ture of the hug­gable shark using MIME::Base64 or Image::Term256Color as above is left as an exer­cise to the reader.

  • 34 at 34 for v5.34: Modern Perl features for Perl’s birthday

    Friday, December 17, 2021, marked the thirty-fourth birthday of the Perl programming language, and coincidentally this year saw the release of version 5.34. There are plenty of Perl developers out there who haven’t kept up with recent (and not-so-recent) improvements to the language and its ecosystem, so I thought I might list a batch. (You may have seen some of these before in May’s post “Perl can do that now!”)

    The feature pragma

    Perl v5.10 was released in December 2007, and with it came feature, a way of enabling new syn­tax with­out break­ing back­ward com­pat­i­bil­i­ty. You can enable indi­vid­ual fea­tures by name (e.g., use feature qw(say fc); for the say and fc key­words), or by using a fea­ture bun­dle based on the Perl ver­sion that intro­duced them. For exam­ple, the following:

    use feature ':5.34';

    …gives you the equiv­a­lent of:

    use feature qw(bareword_filehandles bitwise current_sub evalbytes fc indirect multidimensional postderef_qq say state switch unicode_eval unicode_strings);

    Boy, that’s a mouth­ful. Feature bun­dles are good. The cor­re­spond­ing bun­dle also gets implic­it­ly loaded if you spec­i­fy a min­i­mum required Perl ver­sion, e.g., with use v5.32;. If you use v5.12; or high­er, strict mode is enabled for free. So just say:

    use v5.34;

    And last­ly, one-​liners can use the -E switch instead of -e to enable all fea­tures for that ver­sion of Perl, so you can say the fol­low­ing on the com­mand line:

    perl -E 'say "Hello world!"'

    Instead of:

    perl -e 'print "Hello world!\n"'

    Which is great when you’re try­ing to save some typing.

    The experimental pragma

    Sometimes new Perl fea­tures need to be dri­ven a cou­ple of releas­es around the block before their behav­ior set­tles. Those exper­i­ments are doc­u­ment­ed in the per­l­ex­per­i­ment page, and usu­al­ly, you need both a use feature (see above) and no warnings state­ment to safe­ly enable them. Or you can sim­ply pass a list to use experimental of the fea­tures you want, e.g.:

    use experimental qw(isa postderef signatures);

    Ever-​expanding warnings categories

    March 2000 saw the release of Perl 5.6, and with it, the expansion of the -w command-line switch to a system of fine-grained controls for warning against “dubious constructs” that can be turned on and off depending on the lexical scope. What started as 26 main and 20 subcategories has expanded into 31 main and 43 subcategories, including warnings for the aforementioned experimental features.

    As the relevant Perl::Critic policy says, “Using warnings, and paying attention to what they say, is probably the single most effective way to improve the quality of your code.” If you must violate warnings (perhaps because you’re rehabilitating some legacy code), you can isolate such violations to a small scope and individual categories. Check out the strictures module on CPAN if you’d like to go further and make a safe subset of these categories fatal during development.
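
    As a rough illustration of isolating a violation, this sketch turns off a single category inside one block and leaves it on everywhere else. The warning handler is only there so the demo can count what was raised.

    ```perl
    use v5.26;
    use warnings;

    my @caught;
    local $SIG{__WARN__} = sub { push @caught, $_[0] };    # collect warnings for the demo

    my $name;    # deliberately left undefined

    {
        no warnings 'uninitialized';    # silenced in this block only
        my $quiet = "Hello, $name";
    }

    my $noisy = "Hello, $name";         # warns as usual out here

    say scalar(@caught), ' warning(s) caught';
    ```

    Only the interpolation outside the block ends up in @caught.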

    Document other recently-​introduced syntax with Syntax::Construct

    Not every new bit of Perl syntax is enabled with a feature guard. For the rest, there’s E. Choroba’s Syntax::Construct module on CPAN. Rather than having to remember which version of Perl introduced what, Syntax::Construct lets you declare only what you use and provides a helpful error message if someone tries to run your code on an older unsupported version. Between it and the feature pragma, you can prevent many head-scratching moments and give your users a chance to either upgrade or work around the problem.

    Make built-​in functions throw exceptions with autodie

    Many of Perl’s built-​in func­tions only return false on fail­ure, requir­ing the devel­op­er to check every time whether a file can be opened or a system com­mand exe­cut­ed. The lex­i­cal autodie prag­ma replaces them with ver­sions that raise an excep­tion with an object that can be inter­ro­gat­ed for fur­ther details. No mat­ter how many func­tions or meth­ods deep a prob­lem occurs, you can choose to catch it and respond appro­pri­ate­ly. This leads us to…

    try/​catch exception handling and Feature::Compat::Try

    This year’s Perl v5.34 release intro­duced exper­i­men­tal try/​catch syn­tax for excep­tion han­dling that should look more famil­iar to users of oth­er lan­guages while han­dling the issues sur­round­ing using block eval and test­ing of the spe­cial $@ vari­able. If you need to remain com­pat­i­ble with old­er ver­sions of Perl (back to v5.14), just use the Feature::Compat::Try mod­ule from CPAN to auto­mat­i­cal­ly select either v5.34’s native try/​catch or a sub­set of the func­tion­al­i­ty pro­vid­ed by Syntax::Keyword::Try.
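
    Here’s a minimal sketch of native try/catch working together with autodie, assuming Perl v5.34 or newer. The first_line function and the file path are hypothetical; the autodie::exception methods are the ones documented with autodie.

    ```perl
    use v5.34;
    use warnings;
    use feature 'try';
    no warnings 'experimental::try';
    use autodie;

    # hypothetical helper: return the first line of a file, or a message
    sub first_line {
        my ($path) = @_;
        try {
            open my $fh, '<', $path;    # autodie throws here on failure
            my $line = <$fh>;
            close $fh;
            return $line;               # returns from first_line itself
        }
        catch ($e) {
            return "could not read $path"
              if ref $e && $e->isa('autodie::exception') && $e->matches('open');
            die $e;                     # anything else is not ours to handle
        }
    }

    my $result = first_line('/no/such/dir/example.txt');
    say $result;
    ```

    Swapping the use feature line for use Feature::Compat::Try should give the same behavior on older perls, though I haven’t shown that here.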

    Pluggable keywords

    The abovementioned Syntax::Keyword::Try was made possible by the introduction of a pluggable keyword mechanism in 2010’s Perl v5.12. So were the Future::AsyncAwait asynchronous programming library and the Object::Pad testbed for new object-oriented Perl syntax. If you’re handy with C and Perl’s XS glue language, check out Paul “LeoNerd” Evans’ XS::Parse::Keyword module to get a leg up on developing your own syntax module.

    Define packages with versions and blocks

    Perl v5.12 also helped reduce clutter by enabling a package namespace declaration to also include a version number, instead of requiring a separate our $VERSION = ...; statement. Perl v5.14 further refined package declarations so they can be specified with code blocks, so a namespace declaration can be the same as a lexical scope. Putting the two together gives you:

    package Local::NewHotness v1.2.3 {
        ...
    }

    Instead of:

    {
        package Local::OldAndBusted;
        use version 0.77; our $VERSION = version->declare("v1.2.3");
        ...
    }

    I know which I’d rather do. (Though you may want to also use Syntax::Construct qw(package-version package-block); to help along with old­er instal­la­tions as described above.)

    The // defined-​or operator

    This is an easy win from Perl v5.10:

    defined $foo ? $foo : $bar  # replace this
    $foo // $bar                # with this

    And:

    $foo = $bar unless defined $foo  # replace this
    $foo //= $bar                    # with this

    Perfect for assign­ing defaults to variables.

    state variables only initialize once

    Speaking of vari­ables, ever want one to keep its old val­ue the next time a scope is entered, like in a sub? Declare it with state instead of my. Before Perl v5.10, you need­ed to use a clo­sure instead.
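
    A quick sketch of the difference, using a made-up ticket counter:

    ```perl
    use v5.26;
    use warnings;

    sub next_ticket {
        state $number = 0;    # initialized on the first call only
        return ++$number;
    }

    my @tickets = map { next_ticket() } 1 .. 3;
    say "@tickets";
    ```

    With my in place of state, $number would be reset to zero on every call and each ticket would come out the same.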

    Save some typing with say

    Perl v5.10’s bumper crop of enhance­ments also includ­ed the say func­tion, which han­dles the com­mon use case of printing a string or list of strings with a new­line. It’s less noise in your code and saves you four char­ac­ters. What’s not to love?

    Note unimplemented code with ...

    The ... ellipsis statement (colloquially “yada-yada”) gives you an easy placeholder for yet-to-be-implemented code. It parses OK but will throw an exception if executed. Hopefully, your test coverage (or at least static analysis) will catch it before your users do.

    Loop and enumerate arrays with each, keys, and values

    The each, keys, and values func­tions have always been able to oper­ate on hash­es. Perl v5.12 and above make them work on arrays, too. The lat­ter two are main­ly for con­sis­ten­cy, but you can use each to iter­ate over an array’s indices and val­ues at the same time:

    while (my ($index, $value) = each @array) {
        ...
    }

    This can be prob­lem­at­ic in non-​trivial loops, but I’ve found it help­ful in quick scripts and one-liners.

    delete local hash (and array) entries

    Ever needed to delete an entry from a hash (e.g., an environment variable from %ENV or a signal handler from %SIG) just inside a block? Perl v5.12 lets you do that with delete local.
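
    For example, this sketch hides an environment variable for the length of one block. DEMO_SETTING is a hypothetical name set only for the demo.

    ```perl
    use v5.26;
    use warnings;

    $ENV{DEMO_SETTING} = 'on';    # hypothetical variable, set just for this demo

    my $inside;
    {
        delete local $ENV{DEMO_SETTING};    # gone until the block exits
        $inside = exists $ENV{DEMO_SETTING} ? 'present' : 'absent';
    }
    my $outside = exists $ENV{DEMO_SETTING} ? 'present' : 'absent';

    say "inside the block: $inside, after it: $outside";
    ```

    The entry comes back with its old value as soon as the block ends, with no cleanup code required.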

    Paired hash slices

    Jumping for­ward to 2014’s Perl v5.20, the new %foo{'bar', 'baz'} syn­tax enables you to slice a sub­set of a hash with its keys and val­ues intact. Very help­ful for cherry-​picking or aggre­gat­ing many hash­es into one. For example:

    my %args = (
        verbose => 1,
        name    => 'Mark',
        extra   => 'pizza',
    );
    # don't frob the pizza
    $my_object->frob( %args{ qw(verbose name) } );

    Paired array slices

    Not to be left out, you can also slice arrays in the same way, in this case return­ing indices and values:

    my @letters = 'a' .. 'z';
    my @subset_kv = %letters[15, 4, 17, 11];
    # @subset_kv is now (15, 'p', 4, 'e', 17, 'r', 11, 'l')

    More readable dereferencing

    Perl v5.20 intro­duced and v5.24 de-​experimentalized a more read­able post­fix deref­er­enc­ing syn­tax for nav­i­gat­ing nest­ed data struc­tures. Instead of using {braces} or smoosh­ing sig­ils to the left of iden­ti­fiers, you can use a post­fixed sigil-and-star:

    push @$array_ref,    1, 2, 3;  # noisy
    push @{$array_ref},  1, 2, 3;  # a little easier
    push $array_ref->@*, 1, 2, 3;  # read from left to right

    So much of web devel­op­ment is sling­ing around and pick­ing apart com­pli­cat­ed data struc­tures via JSON, so I wel­come any­thing like this to reduce the cog­ni­tive load.

    when as a statement modifier

    Starting in Perl v5.12, you can use the experimental switch feature’s when keyword as a postfix modifier. For example:

    for ($foo) {
        $a =  1 when /^abc/;
        $a = 42 when /^dna/;
        ...
    }

    But I don’t recommend when, given, or given’s smartmatch operations as they were retconned as experiments in 2013’s Perl v5.18 and have remained so due to their tricky behavior. I wrote about some alternatives using stable syntax back in February.

    Simple class inheritance with use parent

    Sometimes in old­er object-​oriented Perl code, you’ll see use base as a prag­ma to estab­lish inher­i­tance from anoth­er class. Older still is the direct manip­u­la­tion of the package’s spe­cial @ISA array. In most cas­es, both should be avoid­ed in favor of use parent, which was added to core in Perl v5.10.1.

    Mind you, if you’re following the Perl object-oriented tutorial’s advice and have selected an OO system from CPAN, use its subclassing mechanism if it has one. Moose, Moo, and Class::Accessor’s “antlers” mode all provide an extends function; Object::Pad provides an :isa attribute on its class keyword.

    Test for class membership with the isa operator

    As an alter­na­tive to the isa() method pro­vid­ed to all Perl objects, Perl v5.32 intro­duced the exper­i­men­tal isa infix oper­a­tor:

    $my_object->isa('Local::MyClass')
    # or
    $my_object isa Local::MyClass

    The latter can take either a bareword class name or string expression, but more importantly, it’s safer as it also returns false if the left argument is undefined or isn’t a blessed object reference. The older isa() method will throw an exception in the former case and might return true if called as a class method when $my_object is actually a string of a class name that’s the same as or inherits from isa()’s argument.
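
    Here’s a small sketch of the operator handling those awkward cases, assuming Perl v5.32 or newer. The candidate values are made up to cover an object, an undefined value, a plain class-name string, and an unblessed reference.

    ```perl
    use v5.32;
    use warnings;
    use feature 'isa';
    no warnings 'experimental::isa';

    my $object = bless {}, 'Local::MyClass';
    my $nothing;    # undefined on purpose

    my @checks;
    for my $candidate ( $object, $nothing, 'Local::MyClass', {} ) {
        push @checks, ( $candidate isa Local::MyClass ) ? 'yes' : 'no';
    }

    say "@checks";
    ```

    Only the blessed object passes; none of the other three throws, they simply test false.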

    Lexical subroutines

    Introduced in Perl v5.18 and de-​experimentalized in 2017’s Perl v5.26, you can now pre­cede sub dec­la­ra­tions with my, state, or our. One use of the first two is tru­ly pri­vate func­tions and meth­ods, as described in this 2018 Dave Jacoby blog and as part of Neil Bowers’ 2014 sur­vey of pri­vate func­tion techniques.
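
    As a minimal sketch of the private-function use, here’s a hypothetical Local::Greeter package with one lexical helper. Nothing here comes from the linked posts; it’s just the smallest example I could think of.

    ```perl
    use v5.26;
    use warnings;

    package Local::Greeter {
        # visible only inside this block; never installed in the symbol table
        my sub decorate {
            my ($text) = @_;
            return "*** $text ***";
        }

        sub greet {
            my ( $class, $name ) = @_;
            return decorate("Hello, $name");
        }
    }

    say Local::Greeter->greet('Dragon Con');
    say Local::Greeter->can('decorate') ? 'decorate is public' : 'decorate is private';
    ```

    Because decorate lives in a lexical scope rather than the package, outside code can’t reach it by name or through can().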

    Subroutine signatures

    I’ve writ­ten and pre­sent­ed exten­sive­ly about sig­na­tures and alter­na­tives over the past year, so I won’t repeat that here. I’ll just add that the Perl 5 Porters devel­op­ment mail­ing list has been mak­ing a con­cert­ed effort over the past month to hash out the remain­ing issues towards ren­der­ing this fea­ture non-​experimental. The pop­u­lar Mojolicious real-​time web frame­work also pro­vides a short­cut for enabling sig­na­tures and uses them exten­sive­ly in examples.

    Indented here-​documents with <<~

    Perl has had shell-style “here-document” syntax for embedding multi-line strings of quoted text for a long time. Starting with Perl v5.26, you can precede the delimiting string with a ~ character and Perl will both allow the ending delimiter to be indented as well as strip indentation from the embedded text. This allows for much more readable embedded code such as runs of HTML and SQL. For example:

    if ($do_query) {
        my $rows_deleted = $dbh->do(<<~'END_SQL', undef, 42);
          DELETE FROM table
          WHERE status = ?
          END_SQL
        say "$rows_deleted rows were deleted."; 
    }

    More readable chained comparisons

    When I learned math in school, my teach­ers and text­books would often describe mul­ti­ple com­par­isons and inequal­i­ties as a sin­gle expres­sion. Unfortunately, when it came time to learn pro­gram­ming every com­put­er lan­guage I saw required them to be bro­ken up with a series of and (or &&) oper­a­tors. With Perl v5.32, this is no more:

    if ( $x < $y && $y <= $z ) { ... }  # old way
    if ( $x < $y <= $z )       { ... }  # new way

    It’s more con­cise, less noisy, and more like what reg­u­lar math looks like.

    Self-​documenting named regular expression captures

    Perl’s expres­sive reg­u­lar expres­sion match­ing and text-​processing prowess are leg­endary, although overuse and poor use of read­abil­i­ty enhance­ments often turn peo­ple away from them (and Perl in gen­er­al). We often use reg­ex­ps for extract­ing data from a matched pat­tern. For example:

    if ( /Time: (..):(..):(..)/ ) {  # parse out values
        say "$1 hours, $2 minutes, $3 seconds";
    }

    Named cap­ture groups, intro­duced in Perl v5.10, make both the pat­tern more obvi­ous and retrieval of its data less cryptic:

    if ( /Time: (?<hours>..):(?<minutes>..):(?<seconds>..)/ ) {
        say "$+{hours} hours, $+{minutes} minutes, $+{seconds} seconds";
    }

    More readable regexp character classes

    The /x reg­u­lar expres­sion mod­i­fi­er already enables bet­ter read­abil­i­ty by telling the pars­er to ignore most white­space, allow­ing you to break up com­pli­cat­ed pat­terns into spaced-​out groups and mul­ti­ple lines with code com­ments. With Perl v5.26 you can spec­i­fy /xx to also ignore spaces and tabs inside [brack­et­ed] char­ac­ter class­es, turn­ing this:

    /[d-eg-i3-7]/
    /[!@"#$%^&*()=?<>']/

    …into this:

    / [d-e g-i 3-7]/xx
    /[ ! @ " # $ % ^ & * () = ? <> ' ]/xx

    Set default regexp flags with the re pragma

    Beginning with Perl v5.14, writ­ing use re '/xms'; (or any com­bi­na­tion of reg­u­lar expres­sion mod­i­fi­er flags) will turn on those flags until the end of that lex­i­cal scope, sav­ing you the trou­ble of remem­ber­ing them every time.
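
    A brief sketch of the pragma in action follows. The sample text and variable names are invented for illustration; note that the match itself only adds /g, yet still gets the /x and /m behavior.

```perl
use v5.14;
use re '/xms';    # every regexp below gets /x, /m, and /s by default

my $log = "alpha 1\nbeta 22\ngamma 333";

# /x lets the pattern be spaced out and commented;
# /m lets ^ and $ match at each line, not just the whole string
my @numbers = $log =~ /
    ^ \w+      # a word at the start of a line
    \s
    (\d+) $    # digits at the end of that line
/g;

say "@numbers";
```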

    Non-​destructive substitution with s///r and tr///r

    The s/// sub­sti­tu­tion and tr/// translit­er­a­tion oper­a­tors typ­i­cal­ly change their input direct­ly, often in con­junc­tion with the =~ bind­ing oper­a­tor:

    s/foo/bar/;  # changes the first foo to bar in $_
    $baz =~ s/foo/bar/;  # the same but in $baz

    But what if you want to leave the orig­i­nal untouched, such as when pro­cess­ing an array of strings with a map? With Perl v5.14 and above, add the /r flag, which makes the sub­sti­tu­tion on a copy and returns the result:

    my @changed = map { s/foo/bar/r } @original;

    Unicode case-​folding with fc for better string comparisons

    Unicode and char­ac­ter encod­ing in gen­er­al are com­pli­cat­ed beasts. Perl has han­dled Unicode since v5.6 and has kept pace with fix­es and sup­port for updat­ed stan­dards in the inter­ven­ing decades. If you need to test if two strings are equal regard­less of case, use the fc func­tion intro­duced in Perl v5.16.
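
    To illustrate why fc is preferable to lc for this, here is a minimal sketch using the German sharp s, which folds to "ss" but has no single-character lowercase equivalent of "SS". The variable names are only for this example.

```perl
use v5.16;    # enables the fc feature

# \N{U+DF} is the German sharp s, written as an escape to keep this file ASCII
my $upper = 'STRASSE';
my $lower = "stra\N{U+DF}e";

my $lc_equal = lc($upper) eq lc($lower) ? 'equal' : 'different';
my $fc_equal = fc($upper) eq fc($lower) ? 'equal' : 'different';

say "lc: $lc_equal";
say "fc: $fc_equal";
```

    Comparing with lc reports the strings as different, while fc reports them as equal.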

    Safer processing of file arguments with <<>>

    The <> null filehandle or “diamond operator” is often used in while loops to process input line by line, coming either from standard input (e.g., piped from another program) or from a list of files on the command line. Unfortunately, it uses a form of Perl’s open function that interprets special characters such as pipes (|), which would allow it to insecurely run external commands. Using the <<>> “double diamond” operator introduced in Perl v5.22 forces open to treat all command-line arguments as file names only. For older Perls, the perlop documentation recommends the ARGV::readonly CPAN module.
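
    As a rough sketch, a line-counting loop built on the double diamond might look like this. The count_lines helper is hypothetical, and it localizes @ARGV only so the loop can be wrapped in a reusable function.

```perl
use v5.22;

# Hypothetical helper: count the lines in the named files
sub count_lines {
    my @files = @_;
    local @ARGV = @files;    # <<>> reads the files listed in @ARGV

    my $count = 0;

    # Each name is opened strictly as a file, so an argument containing
    # a pipe character is never run as a command
    while ( defined( my $line = <<>> ) ) {
        $count++;
    }
    return $count;
}

# Only read when file names were given on the command line
say count_lines(@ARGV) if @ARGV;
```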

    Safer loading of Perl libraries and modules from @INC

    Perl v5.26 removed the ability for all programs to load modules by default from the current directory, closing a security vulnerability originally identified and fixed as CVE-2016-1238 in previous versions’ included scripts. If your code relied on this unsafe behavior, the v5.26 release notes include steps on how to adapt.
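
    One common way to adapt, sketched here under the assumption that your modules live in a lib/ directory beside the script (a hypothetical layout), is to add a path relative to the script's own location using the core FindBin and lib modules:

```perl
use v5.26;
use FindBin qw($Bin);
use lib "$Bin/lib";    # hypothetical lib/ directory beside this script

# A script-relative path does not depend on the directory
# the program happens to be run from
my $found = grep { $_ eq "$Bin/lib" } @INC;
say $found ? "searching $Bin/lib" : 'lib directory not in @INC';
```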

    HTTP::Tiny simple HTTP/1.1 client included

    To boot­strap access to CPAN on the web in the pos­si­ble absence of exter­nal tools like curl or wget, Perl v5.14 began includ­ing the HTTP::Tiny mod­ule. You can also use it in your pro­grams if you need a sim­ple web client with no dependencies.
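
    A minimal sketch of such a client follows. The summarize helper and the ten-second timeout are choices made for this example, and the script only touches the network when given a URL on the command line.

```perl
use v5.14;
use HTTP::Tiny;

my $http = HTTP::Tiny->new( timeout => 10 );

# Hypothetical helper: return a one-line summary for a URL
sub summarize {
    my ($url) = @_;
    my $response = $http->get($url);

    # get() does not die on failure; check the success flag instead
    return "failed: $response->{status} $response->{reason}"
      unless $response->{success};
    return sprintf 'fetched %d bytes', length $response->{content};
}

# Only make a request when a URL is given on the command line
say summarize( $ARGV[0] ) if @ARGV;
```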

    Test2: The next generation of Perl testing frameworks

    Forked and refac­tored from the ven­er­a­ble Test::Builder (the basis for the Test::More library that many are famil­iar with), Test2 was includ­ed in the core mod­ule library begin­ning with Perl v5.26. I’ve exper­i­ment­ed recent­ly with using the Test2::Suite CPAN library instead of Test::More and it looks pret­ty good. I’m also intrigued by Test2::Harness’ sup­port for thread­ing, fork­ing, and pre­load­ing mod­ules to reduce test run times.

    Task::Kensho: Where to start for recommended Perl modules

    This last item may not be includ­ed when you install Perl, but it’s where I turn for a col­lec­tion of well-​regarded CPAN mod­ules for accom­plish­ing a wide vari­ety of com­mon tasks span­ning from asyn­chro­nous pro­gram­ming to XML. Use it as a start­ing point or inter­ac­tive­ly select the mix of libraries appro­pri­ate to your project.


    And there you have it: a selec­tion of 34 fea­tures, enhance­ments, and improve­ments for the first 34 years of Perl. What’s your favorite? Did I miss any­thing? Let me know in the comments.

  • Video: “A Year of Being Wrong on the Internet”

    I’m busy this week host­ing my par­ents’ first vis­it to Houston, but I didn’t want to let this Tuesday go by with­out link­ing to my talk from last week’s Ephemeral Miniconf. Thanks so much to Thibault Duponchelle for orga­niz­ing such a ter­rif­ic event, to all the oth­er speak­ers for com­ing togeth­er to present, and to every­one who attend­ed for wel­com­ing me.