Friday, December 17, 2021, marked the thirty-fourth birthday of the Perl programming language, and coincidentally this year saw the release of version 5.34. There are plenty of Perl developers out there who haven’t kept up with recent (and not-so-recent) improvements to the language and its ecosystem, so I thought I might list a batch. (You may have seen some of these before in May’s post “Perl can do that now!”)
Perl v5.10 was released in December 2007, and with it came feature, a way of enabling new syntax without breaking backward compatibility. You can enable individual features by name (e.g., use feature qw(say fc); for the say and fc keywords), or by using a feature bundle based on the Perl version that introduced them. For example, the following:
use feature ':5.34';
…gives you the equivalent of:
use feature qw(bareword_filehandles bitwise current_sub evalbytes fc indirect multidimensional postderef_qq say state switch unicode_eval unicode_strings);
Boy, that’s a mouthful. Feature bundles are good. The corresponding bundle also gets implicitly loaded if you specify a minimum required Perl version, e.g., with use v5.32;. If you use v5.12; or higher, strict mode is enabled for free. So just say:
use v5.34;
And lastly, one-liners can use the -E switch instead of -e to enable all features for that version of Perl, so you can say the following on the command line:
perl -E 'say "Hello world!"'
Instead of:
perl -e 'print "Hello world!\n"'
Which is great when you’re trying to save some typing.
Sometimes new Perl features need to be driven a couple of releases around the block before their behavior settles. Those experiments are documented in the perlexperiment page, and usually, you need both a use feature (see above) and no warnings statement to safely enable them. Or you can simply pass a list to use experimental of the features you want, e.g.:
As the relevant Perl::Critic policy says, “Using warnings, and paying attention to what they say, is probably the single most effective way to improve the quality of your code.” If you must violate warnings (perhaps because you’re rehabilitating some legacy code), you can isolate such violations to a small scope and individual categories. Check out the strictures module on CPAN if you’d like to go further and make a safe subset of these categories fatal during development.
Not every new bit of Perl syntax is enabled with a feature guard. For the rest, there’s E. Choroba’s Syntax::Construct module on CPAN. Rather than having to remember which version of Perl introduced what, Syntax::Construct lets you declare only what you use and provides a helpful error message if someone tries to run your code on an older unsupported version. Between it and the feature pragma, you can prevent many head-scratching moments and give your users a chance to either upgrade or workaround.
Make built-in functions throw exceptions with autodie
Many of Perl’s built-in functions only return false on failure, requiring the developer to check every time whether a file can be opened or a system command executed. The lexical autodie pragma replaces them with versions that raise an exception with an object that can be interrogated for further details. No matter how many functions or methods deep a problem occurs, you can choose to catch it and respond appropriately. This leads us to…
{
package Local::OldAndBusted;
use version 0.77; our $VERSION = version->declare("v1.2.3");
...
}
I know which I’d rather do. (Though you may want to also use Syntax::Construct qw(package-version package-block); to help along with older installations as described above.)
Speaking of variables, ever want one to keep its old value the next time a scope is entered, like in a sub? Declare it with state instead of my. Before Perl v5.10, you needed to use a closure instead.
Perl v5.10’s bumper crop of enhancements also included the say function, which handles the common use case of printing a string or list of strings with a newline. It’s less noise in your code and saves you four characters. What’s not to love?
The ... ellipsis statement (colloquially “yada-yada”) gives you an easy placeholder for yet-to-be-implemented code. It parses OK but will throw an exception if executed. Hopefully, your test coverage (or at least static analysis) will catch it before your users do.
The each, keys, and values functions have always been able to operate on hashes. Perl v5.12 and above make them work on arrays, too. The latter two are mainly for consistency, but you can use each to iterate over an array’s indices and values at the same time:
push @$array_ref, 1, 2, 3; # noisy
push @{$array_ref}, 1, 2, 3; # a little easier
push $array_ref->@*, 1, 2, 3; # read from left to right
So much of web development is slinging around and picking apart complicated data structures via JSON, so I welcome anything like this to reduce the cognitive load.
Sometimes in older object-oriented Perl code, you’ll see use base as a pragma to establish inheritance from another class. Older still is the direct manipulation of the package’s special @ISA array. In most cases, both should be avoided in favor of use parent, which was added to core in Perl v5.10.1.
$my_object->isa('Local::MyClass')
# or
$my_object isa Local::MyClass
The latter can take either a bareword class name or string expression, but more importantly, it’s safer as it also returns false if the left argument is undefined or isn’t a blessed object reference. The older isa() method will throw an exception in the former case and might return true if called as a class method when $my_object is actually a string of a class name that’s the same as or inherits from isa()’s argument.
I’ve written and presentedextensively about signatures and alternatives over the past year, so I won’t repeat that here. I’ll just add that the Perl 5 Porters development mailing list has been making a concerted effort over the past month to hash out the remaining issues towards rendering this feature non-experimental. The popular Mojolicious real-time web framework also provides a shortcut for enabling signatures and uses them extensively in examples.
Indented here-documents with <<~
Perl has had shell-style “here-document” syntax for embedding multi-line strings of quoted text for a long time. Starting with Perl v5.26, you can precede the delimiting string with a ~ character and Perl will both allow the ending delimiter to be indented as well as strip indentation from the embedded text. This allows for much more readable embedded code such as runs of HTML and SQL. For example:
if ($do_query) {
my $rows_deleted = $dbh->do(<<~'END_SQL', undef, 42);
DELETE FROM table
WHERE status = ?
END_SQL
say "$rows_deleted rows were deleted.";
}
More readable chained comparisons
When I learned math in school, my teachers and textbooks would often describe multiple comparisons and inequalities as a single expression. Unfortunately, when it came time to learn programming every computer language I saw required them to be broken up with a series of and (or &&) operators. With Perl v5.32, this is no more:
if ( $x < $y && $y <= $z ) { ... } # old way
if ( $x < $y <= $z ) { ... } # new way
It’s more concise, less noisy, and more like what regular math looks like.
Self-documenting named regular expression captures
Perl’s expressive regular expression matching and text-processing prowess are legendary, although overuse and poor use of readability enhancements often turn people away from them (and Perl in general). We often use regexps for extracting data from a matched pattern. For example:
if ( /Time: (..):(..):(..)/ ) { # parse out values
say "$1 hours, $2 minutes, $3 seconds";
}
if ( /Time: (?<hours>..):(?<minutes>..):(?<seconds>..)/ ) {
say "$+{hours} hours, $+{minutes} minutes, $+{seconds} seconds";
}
More readable regexp character classes
The /x regular expression modifier already enables better readability by telling the parser to ignore most whitespace, allowing you to break up complicated patterns into spaced-out groups and multiple lines with code comments. With Perl v5.26 you can specify /xx to also ignore spaces and tabs inside [bracketed] character classes, turning this:
s/foo/bar/; # changes the first foo to bar in $_
$baz =~ s/foo/bar/; # the same but in $baz
But what if you want to leave the original untouched, such as when processing an array of strings with a map? With Perl v5.14 and above, add the /r flag, which makes the substitution on a copy and returns the result:
Unicode and character encoding in general are complicated beasts. Perl has handled Unicode since v5.6 and has kept pace with fixes and support for updated standards in the intervening decades. If you need to test if two strings are equal regardless of case, use the fc function introduced in Perl v5.16.
Safer processing of file arguments with <<>>
The <> null filehandle or “diamond operator” is often used in while loops to process input per line coming either from standard input (e.g., piped from another program) or from a list of files on the command line. Unfortunately, it uses a form of Perl’s open function that interprets special characters such as pipes (|) that would allow it to insecurely run external commands. Using the <<>> “double diamond” operator introduced in Perl v5.22 forces open to treat all command-line arguments as file names only. For older Perls, the perlop documentation recommends the ARGV::readonly CPAN module.
Safer loading of Perl libraries and modules from @INC
To bootstrap access to CPAN on the web in the possible absence of external tools like curl or wget, Perl v5.14 began including the HTTP::Tiny module. You can also use it in your programs if you need a simple web client with no dependencies.
Test2: The next generation of Perl testing frameworks
Forked and refactored from the venerable Test::Builder (the basis for the Test::More library that many are familiar with), Test2 was included in the core module library beginning with Perl v5.26. I’ve experimented recently with using the Test2::SuiteCPAN library instead of Test::More and it looks pretty good. I’m also intrigued by Test2::Harness’ support for threading, forking, and preloading modules to reduce test run times.
Task::Kensho: Where to start for recommended Perl modules
This last item may not be included when you install Perl, but it’s where I turn for a collection of well-regarded CPAN modules for accomplishing a wide variety of common tasks spanning from asynchronous programming to XML. Use it as a starting point or interactively select the mix of libraries appropriate to your project.
And there you have it: a selection of 34 features, enhancements, and improvements for the first 34 years of Perl. What’s your favorite? Did I miss anything? Let me know in the comments.
A recent Lobsters post lauding the virtues of AWK reminded me that although the language is powerful and lightning-fast, I usually find myself exceeding its capabilities and reaching for Perl instead. One such application is analyzing voluminous log files such as the ones generated by this blog. Yes, WordPress has stats, but I’ve never let reinvention of the wheel get in the way of a good programming exercise.
#!/usr/bin/env perl
use strict;
use warnings;
use Syntax::Construct 'operator-double-diamond';
use Regexp::Log::Common;
use DateTime::Format::HTTP;
use List::Util 1.33 'any';
use Number::Format 'format_number';
my $parser = Regexp::Log::Common->new(
format => ':extended',
capture => [qw<req ts status>],
);
my @fields = $parser->capture;
my $compiled_re = $parser->regexp;
my @skip_uri_patterns = qw<
^/+robots.txt
[-\w]*sitemap[-\w]*.xml
^/+wp-
/feed/?$
^/+?rest_route=
>;
my ( %count, %week_of );
while ( <<>> ) {
my %log;
@log{@fields} = /$compiled_re/;
# only interested in successful or cached requests
next unless $log{status} =~ /^2/ or $log{status} == 304;
my ( $method, $uri, $protocol ) = split ' ', $log{req};
next unless $method eq 'GET';
next if any { $uri =~ $_ } @skip_uri_patterns;
my $dt = DateTime::Format::HTTP->parse_datetime( $log{ts} );
my $key = sprintf '%u-%02u', $dt->week;
# get first date of each week
$week_of{$key} ||= $dt->date;
$count{$key}++;
}
printf "Week of %s: % 10s\n", $week_of{$_}, format_number( $count{$_} )
for sort keys %count;
Here’s some sample output:
Week of 2021-07-31: 2,672
Week of 2021-08-02: 16,222
Week of 2021-08-09: 12,609
Week of 2021-08-16: 17,714
Week of 2021-08-23: 14,462
Week of 2021-08-30: 11,758
Week of 2021-09-06: 14,811
Week of 2021-09-13: 407
I first started prototyping this on the command line as if it were an awk one-liner by using the perl -n and -a flags. The former wraps code in a while loop over the <> “diamond operator”, processing each line from standard input or files passed as arguments. The latter splits the fields of the line into an array named @F. It looked something like this while I was listing URIs (locations on the website):
But once I realized I’d need to filter out a bunch of URI patterns and do some aggregation by date, I turned it into a script and turned to CPAN.
There I found Regexp::Log::Common and DateTime::Format::HTTP, which let me pull apart the Apache log format and its timestamp strings without having to write even more complicated regular expressions myself. (As noted above, this was already a wheel-reinvention exercise; no need to compound that further.)
Regexp::Log::Common builds a compiled regular expression based on the log format and fields you’re interested in, so that’s the constructor on lines 11 through 14. The expression then returns those fields as a list, which I’m assigning to a hash slice with those field names as keys in line 29. I then skip over requests that aren’t successful or browser cache hits, skip over requests that don’t GET web pages or other assets (e.g., POSTs to forms or updating other resources), and skip over the URI patterns mentioned earlier.
(Those patterns are worth a mention: they include the robots.txt and sitemap XML files used by search engine indexers, WordPress administration pages, files used by RSS newsreaders subscribed to my blog, and routes used by the Jetpack WordPress add-on. If you’re adapting this for your site you might need to customize this list based on what software you use to run it.)
Lines 38 and 39 parse the timestamp from the log into a DateTime object using DateTime::Format::HTTP and then build the key used to store the per-week hit count. The last lines of the loop then grab the first date of each new week (assuming the log is in chronological order) and increment the count. Once finished, lines 46 and 47 provide a report sorted by week, displaying it as a friendly “Week of date” and the hit counts aligned to the right with sprintf. Number::Format’s format_number function displays the totals with thousands separators.
Update: After this was initially published. astute reader Chris McGowan noted that I had a bug where $log{status} was assigned the value 304 with the = operator rather than compared with ==. He also suggested I use the double-diamond <<>> operator introduced in Perl v5.22.0 to avoid maliciously-named files. Thanks, Chris!
Room for improvement
DateTime is a very powerful module but this comes at a price of speed and memory. Something simpler like Date::WeekNumber should yield performance improvements, especially as my logs grow (here’s hoping). It requires a bit more manual massaging of the log dates to convert them into something the module can use, though:
#!/usr/bin/env perl
use strict;
use warnings;
use Syntax::Construct qw<
operator-double-diamond
regex-named-capture-group
>;
use Regexp::Log::Common;
use Date::WeekNumber 'iso_week_number';
use List::Util 1.33 'any';
use Number::Format 'format_number';
my $parser = Regexp::Log::Common->new(
format => ':extended',
capture => [qw<req ts status>],
);
my @fields = $parser->capture;
my $compiled_re = $parser->regexp;
my @skip_uri_patterns = qw<
^/+robots.txt
[-\w]*sitemap[-\w]*.xml
^/+wp-
/feed/?$
^/+?rest_route=
>;
my %month = (
Jan => '01',
Feb => '02',
Mar => '03',
Apr => '04',
May => '05',
Jun => '06',
Jul => '07',
Aug => '08',
Sep => '09',
Oct => '10',
Nov => '11',
Dec => '12',
);
my ( %count, %week_of );
while ( <<>> ) {
my %log;
@log{@fields} = /$compiled_re/;
# only interested in successful or cached requests
next unless $log{status} =~ /^2/ or $log{status} == 304;
my ( $method, $uri, $protocol ) = split ' ', $log{req};
next unless $method eq 'GET';
next if any { $uri =~ $_ } @skip_uri_patterns;
# convert log timestamp to YYYY-MM-DD
# for Date::WeekNumber
$log{ts} =~ m!^
(?<day>\d\d) /
(?<month>...) /
(?<year>\d{4}) : !x;
my $date = "$+{year}-$month{ $+{month} }-$+{day}";
my $week = iso_week_number($date);
$week_of{$week} ||= $date;
$count{$week}++;
}
printf "Week of %s: % 10s\n", $week_of{$_}, format_number( $count{$_} )
for sort keys %count;
It looks almost the same as the first version, with the addition of a hash to convert month names to numbers and the actual conversion (using named regular expression capture groups for readability, using Syntax::Construct to check for that feature). On my server, this results in a ten- to eleven-second savings when processing two months of compressed logs.
What’s next? Pretty graphs? Drilling down to specific blog posts? Database storage for further queries and analysis? Perl and CPAN make it possible to go far beyond what you can do with AWK. What would you add or change? Let me know in the comments.
Last week I explored using the Inline::Perl5 module to port a short Perl script to Raku while still keeping its Perl dependencies. Over at the Dev.to community, Dave Cross pointed out that I could get a bit more bang for my buck by letting his Feed::Find do the heavy lifting instead of WWW::Mechanize’s more general-purpose parsing.
A little more MetaCPAN investigation yielded XML::Feed, also maintained by Dave, and it had the added benefit of obviating my need for XML::RSS by not only discovering feeds but also retrieving and parsing them. It also handles the Atom syndication format as well as RSS(hi daxim!). Putting it all together produces the following much shorter and clearer Perl:
#!/usr/bin/env perl
use v5.12; # for strict and say
use warnings;
use XML::Feed;
use URI;
my $url = shift @ARGV || 'https://phoenixtrap.com';
my @feeds = XML::Feed->find_feeds($url);
my $feed = XML::Feed->parse( URI->new( $feeds[0] ) )
or die "Couldn't find a feed at $url\n";
binmode STDOUT, ':encoding(UTF-8)';
say $_->title, "\t", $_->link for $feed->entries;
And here’s the Raku version:
#!/usr/bin/env raku
use XML::Feed:from<Perl5>;
use URI:from<Perl5>;
sub MAIN($url = 'https://phoenixtrap.com') {
my @feeds = XML::Feed.find_feeds($url);
my $feed = XML::Feed.parse( URI.new( @feeds.first ) )
or exit note "Couldn't find a feed at $url";
put .title, "\t", .link for $feed.entries;
}
It’s even closer to Perl now, though it’s using the first routine rather than subscripting the @feeds array and leaving off the the $_ variable name when calling methods on it—less punctuation noise often aids readability. I also took a suggested exit idiom from Raku developer Elizabeth Mattijsen on Reddit to simplify the contortions I was going through to exit with a simple message and error code.
There are a couple of lessons here:
A little more effort in module shopping pays dividends in simpler code.
Get feedback from far and wide to help improve your code. If it’s for work and you can’t release as open-source, make sure your code review process covers readability and maintainability.
The Perl and Raku programming languages have a complicated history together. The latter was envisioned in the year 2000 as Perl 6, a complete redesign and rewrite of Perl to solve its problems of difficult maintenance and the burden of then-13 years of backward compatibility. Unfortunately, the development effort towards a first major release dragged on for ten years, and some developers began to believe the delay contributed to the decline of Perl’s market- and mindshare among programming languages.
In the intervening years work continued on Perl 5, and eventually, Perl 6 was positioned as “a sister language, part of the Perl family, not intended as a replacement for Perl.” Two years ago it was renamed Raku to better indicate it as a different project.
Although the two languages aren’t source-compatible, the Inline::Perl5 module does enable Raku developers to run Perl code and use Perl modules within Raku, You can even subclass Perl classes in Raku and call Raku methods from Perl code. I hadn’t realized until recently that the Perl support was so strong in Raku despite them being so different, and so I thought I’d take the opportunity to write some sample code in both languages to better understand the Raku way of doing things.
Rather than a simple “Hello World” program, I decided to write a simple syndicated news reader. The Raku modules directory didn’t appear to have anything comparable to Perl’s WWW::Mechanize and XML::RSS modules, so this seemed like a great way to test Perl-Raku interoperability.
Perl Feed Finder
First, the Perl script. I wanted it smart enough to either directly fetch a news feed or find it on a site’s HTML page.
#!/usr/bin/env perl
use v5.24; # for strict, say, and postfix dereferencing
use warnings;
use WWW::Mechanize;
use XML::RSS;
use List::Util 1.33 qw(first none);
my @rss_types = qw<
application/rss+xml
application/rdf+xml
application/xml
text/xml
>;
my $mech = WWW::Mechanize->new;
my $rss = XML::RSS->new;
my $url = shift @ARGV || 'https://phoenixtrap.com';
my $response = $mech->get($url);
# If we got an HTML page, find the linked RSS feed
if ( $mech->is_html
and my @alt_links = $mech->find_all_links( rel => 'alternate' ) )
{
for my $rss_type (@rss_types) {
$url = ( first { $_->attrs->{type} eq $rss_type } @alt_links )->url
and last;
}
$response = $mech->get($url);
}
die "$url does not have an RSS feed\n"
if none { $_ eq $response->content_type } @rss_types;
binmode STDOUT, ':encoding(UTF-8)'; # avoid wide character warnings
my @items = $rss->parse( $mech->content )->{items}->@*;
say join "\t", $_->@{qw<title link>} for @items;
The program then creates new WWW::Mechanize (called a mech for short) and XML::RSS objects for use later and gets a URL to browse from its command-line argument, defaulting to my blog if it has none. (My site, my rules, right?) It then retrieves that URL from the web. If mech believes that the URLcontains an HTML page and can find link tags with rel="alternate" attributes possibly identifying any news feeds, it then goes on to check the media types of those links against the earlier list of RSS types and retrieves the first one it finds.
Next comes the only error checking done by this script: checking if the retrieved feed’s media type actually matches the list defined earlier. This prevents the RSS parser from attempting to process plain web pages. This isn’t a large and complicated program, so the die function is called with a trailing newline character (\n) to suppress reporting the line on which the error occurred.
Finally, it’s time to output the headlines and links, but before that happens Perl has to be told that they may contain so-called “wide characters” found in the Unicode standard but not in the plain ASCII that it normally uses. This includes things like the typographical ‘curly quotes’ that I sometimes use in my titles. The last two lines of the script loop through the parsed items in the feed, extracting their titles and links and printing them out with a tab (\t) separator between them:
Output from feed_finder.pl
Raku Feed Finder
Programming is often just stitching libraries and APIs together, so it shouldn’t have been surprising that the Raku version of the above would be so similar. There are some significant (and sometimes welcome) differences, though, which I’ll go over now:
#!/usr/bin/env raku
use WWW::Mechanize:from<Perl5>;
use XML::RSS:from<Perl5>;
my @rss_types = qw<
application/rss+xml
application/rdf+xml
application/xml
text/xml
>;
my $mech = WWW::Mechanize.new;
my $rss = XML::RSS.new;
sub MAIN($url = 'https://phoenixtrap.com') {
my $response = $mech.get($url);
# If we got an HTML page, find the linked RSS feed
if $mech.is_html {
my @alt_links = $mech.find_all_links( Scalar, rel => 'alternate' );
$response = $mech.get(
@alt_links.first( *.attrs<type> (elem) @rss_types ).url
);
}
if $response.content_type(Scalar) !(elem) @rss_types {
# Overriding Raku's `die` stack trace is more verbose than we need
note $mech.uri ~ ' does not have an RSS feed';
exit 1;
}
my @items = $rss.parse( $mech.content ).<items>;
put join "\t", $_<title link> for @items;
}
The first thing to notice is there’s a bit less boilerplate code at the beginning. Raku is a younger language and doesn’t have to add instructions to enable less backward-compatible features. It’s also a larger language with functions and methods built-in that Perl needs to load from modules, though this feed finder program still needs to bring in WWW::Mechanize and XML::RSS with annotations to indicate they’re coming from the Perl5 side of the fence.
I decided to wrap the majority of the program in a MAIN function, which handily gives me command-line arguments as variables as well as a usage message if someone calls it with a --help option. This is a neat quality-of-life feature for script authors that cleverly reuses function signatures, and I’d love to see this available in Perl as an extension to its signatures feature.
Raku and Perl also differ in that the former has a different concept of context, where an expression may be evaluated differently depending upon whether its result is expected to be a single value (scalar) or a list of values. Inline::Perl5 calls Perl functions in list context by default, but you can add the Scalar type object as a first argument to force scalar context as I’ve done with calls to find_all_links (to return an array reference) and content_type (to return the first parameter of the HTTP Content-Type header).
Another interesting difference is the use of the (elem) operator to determine membership in a set. This is Raku’s ASCII way of spelling the ∈ symbol, which it can also use; !(elem) can also be spelled ∉. Both are hard to type on my keyboard so I chose the more verbose alternative, but if you want your code to more closely resemble mathematical notation it’s nice to know the option is there.
I also didn’t use Raku’s die routine to exit the program with an error, mainly because of its method of suppressing the line on which the error occurred. It requires using a CATCH block and then keying off of the type of exception thrown in order to customize its behavior, which seemed like overkill for such a small script. It would have looked something like this:
{
die $mech.uri ~ ' does not have an RSS feed'
if $response.content_type(Scalar) !(elem) @rss_types;
CATCH {
default {
note .message;
exit 1;
}
}
}
Doubtless, this could be golfed down to reduce its verbosity at the expense of readability, but I didn’t want to resort to clever tricks when trying to do a one-to-one comparison with Perl. More experienced Raku developers are welcome to set me straight in the comments below.
The last difference I’ll point out is Raku’s welcome lack of dereferencing operators compared to Perl. This is due to the former’s concept of containers, which I’m still learning about. It seems to be fairly DWIMmy so I’m not that worried, but it’s nice to know there’s an understandable mechanism behind it.
Overall I’m pleased with this first venture into Raku and I enjoyed what I’ve learned of the language so far. It’s not as different with Perl as I anticipated, and I can foresee coding more projects as I learn more. The community on the #rakuIRC channel was also very friendly and helpful, so I’ll be hanging out there as time permits.
What do you think? Can Perl and Raku better learn to coexist, or are they destined to be rivals? Leave a comment below.
It’s been years since I’ve had to hack on anything XML-related, but a recent project at work has me once again jumping into the waters of generating, parsing, and modifying this 90s-era document format. Most developers these days likely only know of it as part of the curiously-named XMLHTTPRequest object in web browsers used to retrieve data in JSON format from servers, and as the “X” in AJAX. But here we are in 2021, and there are still plenty of APIs and documents using XML to get their work done.
In my particular case, the task is to update the API calls for a new version of Virtuozzo Automator. Its API is a bit unusual in that it doesn’t use HTTP, but rather relies on opening a TLS-encrypted socket to the server and exchanging documents delimited with a null character. The previous version of our code is in 1990s-sysadmin-style Perl, with manual blessing of objects and parsing the XML using regular expressions. I’ve decided to update it to use the Moo object system and a proper XML parser. But which parser and module to use?
Selecting a parser
There are several generic XML modules for parsing and generating XML on CPAN, each with its own advantages and disadvantages. I’d like to say that I did a comprehensive survey of each of them, but this project is pressed for time (aren’t they all?) and I didn’t want to create too many extra dependencies in our Perl stack. Luckily, XML::LibXML is already available, I’ve had some previous experience with it, and it’s a good choice for performant standards-based XML parsing (using either DOM or SAX) and generation.
Given more time and leeway in adding dependencies, I might use something else. If the Virtuozzo API had an XML Schema or used SOAP, I would consider XML::Compile as I’ve had some success with that in other projects. But even that uses XML::LibXML under the hood, so I’d still be using that. Your mileage may vary.
Generating XML
Depending on the size and complexity of the XML documents to generate, you might choose to build them up node by node using XML::LibXML::Node and XML::LibXML::Element objects. Most of the messages I’m sending to Virtuozzo Automator are short and have easily-interpolated values, so I’m using here-document islands of XML inside my Perl code. This also has the advantage of being easily validated against the examples in the documentation.
Where the interpolated values in the messages are a little complicated, I’m using this idiom inside the here-docs:
@{[ ... ]}
This allows me to put an arbitrary expression in the … part, which is then put into an anonymous array reference, which is then immediately dereferenced into its string result. It’s a cheap and cheerful way to do minimal templating inside Perl strings without loading a full templating library; I’ve also had success using this technique when generating SQL for database queries.
Parser as an object attribute
Rather than instantiate a new XML::LibXML in every method that needs to parse a document, I created a private attribute:
package Local::API::Virtozzo::Agent {
use Moo;
use XML::LibXML;
use Types::Standard qw(InstanceOf);
...
has _parser => (
is => 'ro',
isa => InstanceOf['XML::LibXML'],
default => sub { XML::LibXML->new() },
);
sub foo {
my $self = shift;
my $send_doc = $self->_parser
->parse_string(<<"END_XML");
<foo/>
END_XML
...
}
...
}
Boilerplate
XML documents can be verbose, with elements that rarely change in every document. In the Virtuozzo API’s case, every document has a <packet> element containing a version attribute and an id attribute to match requests to responses. I wrote a simple function to wrap my documents in this element that pulled the version from a constant and always increased the id by one every time it’s called:
sub _wrap_packet {
state $send_id = 1;
return qq(<packet version="$PACKET_VERSION" id=")
. $send_id++ . '">' . shift . '</packet>';
}
If I need to add more attributes to the <packet> element (for instance, namespaces for attributes in enclosed elements, I can always use XML::LibXML::Element::setAttribute after parsing the document string.
Rather than using brittle regular expressions to extract data from the response, I use the shared parser object from above and then the full power of XPath:
use English;
...
sub get_sampleID {
my ($self, $sample_name) = @_;
...
# used to separate documents
local $INPUT_RECORD_SEPARATOR = "\0";
# $self->_sock is the IO::Socket::SSL connection
my $get_doc = $self->_parser( parse_string(
$self->_sock->getline(),
) );
my $sample_id = $get_doc->findvalue(
qq(//ns3:id[following-sibling::ns3:name="$sample_name"]),
);
return $sample_id;
}
This way, even if the order of elements change or more elements are introduced, the XPath patterns will continue to find the right data.
Conclusion… so far
I’m only about halfway through updating these API calls, and I’ve left out some non-XML-related details such as setting up the TLS socket connection. Hopefully this article has given you a taste of what’s involved in XML processing these days. Please leave me a comment if you have any suggestions or questions.
{"id":"3","mode":"text_link","open_style":"in_place","currency_code":"USD","currency_symbol":"$","currency_type":"decimal","blank_flag_url":"https:\/\/phoenixtrap.com\/wp-content\/plugins\/tip-jar-wp\/\/assets\/images\/flags\/blank.gif","flag_sprite_url":"https:\/\/phoenixtrap.com\/wp-content\/plugins\/tip-jar-wp\/\/assets\/images\/flags\/flags.png","default_amount":500,"top_media_type":"featured_image","featured_image_url":"https:\/\/phoenixtrap.com\/wp-content\/uploads\/2021\/02\/image-200x200.jpg","featured_embed":"","header_media":null,"file_download_attachment_data":null,"recurring_options_enabled":true,"recurring_options":{"never":{"selected":true,"after_output":"One time only"},"weekly":{"selected":false,"after_output":"Every week"},"monthly":{"selected":false,"after_output":"Every month"},"yearly":{"selected":false,"after_output":"Every year"}},"strings":{"current_user_email":"","current_user_name":"","link_text":"Do you like what you see? Leave a tip!","complete_payment_button_error_text":"Check info and try again","payment_verb":"Pay","payment_request_label":"The Phoenix Trap","form_has_an_error":"Please check and fix the errors above","general_server_error":"Something isn't working right at the moment. Please try again.","form_title":"The Phoenix Trap","form_subtitle":"Do you like what you see? Leave a one-time or recurring tip!","currency_search_text":"Country or Currency here","other_payment_option":"Other payment option","manage_payments_button_text":"Manage your payments","thank_you_message":"Thank you for being a supporter!","payment_confirmation_title":"The Phoenix Trap","receipt_title":"Your Receipt","print_receipt":"Print Receipt","email_receipt":"Email Receipt","email_receipt_sending":"Sending receipt...","email_receipt_success":"Email receipt successfully sent","email_receipt_failed":"Email receipt failed to send. Please try again.","receipt_payee":"Paid to","receipt_statement_descriptor":"This will show up on your statement as","receipt_date":"Date","receipt_transaction_id":"Transaction ID","receipt_transaction_amount":"Amount","refund_payer":"Refund from","login":"Log in to manage your payments","manage_payments":"Manage Payments","transactions_title":"Your Transactions","transaction_title":"Transaction Receipt","transaction_period":"Plan Period","arrangements_title":"Your Plans","arrangement_title":"Manage Plan","arrangement_details":"Plan Details","arrangement_id_title":"Plan ID","arrangement_payment_method_title":"Payment Method","arrangement_amount_title":"Plan Amount","arrangement_renewal_title":"Next renewal date","arrangement_action_cancel":"Cancel Plan","arrangement_action_cant_cancel":"Cancelling is currently not available.","arrangement_action_cancel_double":"Are you sure you'd like to cancel?","arrangement_cancelling":"Cancelling Plan...","arrangement_cancelled":"Plan Cancelled","arrangement_failed_to_cancel":"Failed to cancel plan","back_to_plans":"\u2190 Back to Plans","update_payment_method_verb":"Update","sca_auth_description":"Your have a pending renewal payment which requires authorization.","sca_auth_verb":"Authorize renewal payment","sca_authing_verb":"Authorizing payment","sca_authed_verb":"Payment successfully authorized!","sca_auth_failed":"Unable to authorize! Please try again.","login_button_text":"Log in","login_form_has_an_error":"Please check and fix the errors above","uppercase_search":"Search","lowercase_search":"search","uppercase_page":"Page","lowercase_page":"page","uppercase_items":"Items","lowercase_items":"items","uppercase_per":"Per","lowercase_per":"per","uppercase_of":"Of","lowercase_of":"of","back":"Back to plans","zip_code_placeholder":"Zip\/Postal Code","download_file_button_text":"Download File","input_field_instructions":{"tip_amount":{"placeholder_text":"How much would you like to tip?","initial":{"instruction_type":"normal","instruction_message":"How much would you like to tip? Choose any currency."},"empty":{"instruction_type":"error","instruction_message":"How much would you like to tip? Choose any currency."},"invalid_curency":{"instruction_type":"error","instruction_message":"Please choose a valid currency."}},"recurring":{"placeholder_text":"Recurring","initial":{"instruction_type":"normal","instruction_message":"How often would you like to give this?"},"success":{"instruction_type":"success","instruction_message":"How often would you like to give this?"},"empty":{"instruction_type":"error","instruction_message":"How often would you like to give this?"}},"name":{"placeholder_text":"Name on Credit Card","initial":{"instruction_type":"normal","instruction_message":"Enter the name on your card."},"success":{"instruction_type":"success","instruction_message":"Enter the name on your card."},"empty":{"instruction_type":"error","instruction_message":"Please enter the name on your card."}},"privacy_policy":{"terms_title":"Terms and conditions","terms_body":null,"terms_show_text":"View Terms","terms_hide_text":"Hide Terms","initial":{"instruction_type":"normal","instruction_message":"I agree to the terms."},"unchecked":{"instruction_type":"error","instruction_message":"Please agree to the terms."},"checked":{"instruction_type":"success","instruction_message":"I agree to the terms."}},"email":{"placeholder_text":"Your email address","initial":{"instruction_type":"normal","instruction_message":"Enter your email address"},"success":{"instruction_type":"success","instruction_message":"Enter your email address"},"blank":{"instruction_type":"error","instruction_message":"Enter your email address"},"not_an_email_address":{"instruction_type":"error","instruction_message":"Make sure you have entered a valid email address"}},"note_with_tip":{"placeholder_text":"Your note here...","initial":{"instruction_type":"normal","instruction_message":"Attach a note to your tip (optional)"},"empty":{"instruction_type":"normal","instruction_message":"Attach a note to your tip (optional)"},"not_empty_initial":{"instruction_type":"normal","instruction_message":"Attach a note to your tip (optional)"},"saving":{"instruction_type":"normal","instruction_message":"Saving note..."},"success":{"instruction_type":"success","instruction_message":"Note successfully saved!"},"error":{"instruction_type":"error","instruction_message":"Unable to save note note at this time. Please try again."}},"email_for_login_code":{"placeholder_text":"Your email address","initial":{"instruction_type":"normal","instruction_message":"Enter your email to log in."},"success":{"instruction_type":"success","instruction_message":"Enter your email to log in."},"blank":{"instruction_type":"error","instruction_message":"Enter your email to log in."},"empty":{"instruction_type":"error","instruction_message":"Enter your email to log in."}},"login_code":{"initial":{"instruction_type":"normal","instruction_message":"Check your email and enter the login code."},"success":{"instruction_type":"success","instruction_message":"Check your email and enter the login code."},"blank":{"instruction_type":"error","instruction_message":"Check your email and enter the login code."},"empty":{"instruction_type":"error","instruction_message":"Check your email and enter the login code."}},"stripe_all_in_one":{"initial":{"instruction_type":"normal","instruction_message":"Enter your credit card details here."},"empty":{"instruction_type":"error","instruction_message":"Enter your credit card details here."},"success":{"instruction_type":"normal","instruction_message":"Enter your credit card details here."},"invalid_number":{"instruction_type":"error","instruction_message":"The card number is not a valid credit card number."},"invalid_expiry_month":{"instruction_type":"error","instruction_message":"The card's expiration month is invalid."},"invalid_expiry_year":{"instruction_type":"error","instruction_message":"The card's expiration year is invalid."},"invalid_cvc":{"instruction_type":"error","instruction_message":"The card's security code is invalid."},"incorrect_number":{"instruction_type":"error","instruction_message":"The card number is incorrect."},"incomplete_number":{"instruction_type":"error","instruction_message":"The card number is incomplete."},"incomplete_cvc":{"instruction_type":"error","instruction_message":"The card's security code is incomplete."},"incomplete_expiry":{"instruction_type":"error","instruction_message":"The card's expiration date is incomplete."},"incomplete_zip":{"instruction_type":"error","instruction_message":"The card's zip code is incomplete."},"expired_card":{"instruction_type":"error","instruction_message":"The card has expired."},"incorrect_cvc":{"instruction_type":"error","instruction_message":"The card's security code is incorrect."},"incorrect_zip":{"instruction_type":"error","instruction_message":"The card's zip code failed validation."},"invalid_expiry_year_past":{"instruction_type":"error","instruction_message":"The card's expiration year is in the past"},"card_declined":{"instruction_type":"error","instruction_message":"The card was declined."},"missing":{"instruction_type":"error","instruction_message":"There is no card on a customer that is being charged."},"processing_error":{"instruction_type":"error","instruction_message":"An error occurred while processing the card."},"invalid_request_error":{"instruction_type":"error","instruction_message":"Unable to process this payment, please try again or use alternative method."},"invalid_sofort_country":{"instruction_type":"error","instruction_message":"The billing country is not accepted by SOFORT. Please try another country."}}}},"fetched_oembed_html":false}
{"id":"11","mode":"button","open_style":"in_modal","currency_code":"USD","currency_symbol":"$","currency_type":"decimal","blank_flag_url":"https:\/\/phoenixtrap.com\/wp-content\/plugins\/tip-jar-wp\/\/assets\/images\/flags\/blank.gif","flag_sprite_url":"https:\/\/phoenixtrap.com\/wp-content\/plugins\/tip-jar-wp\/\/assets\/images\/flags\/flags.png","default_amount":500,"top_media_type":"featured_image","featured_image_url":"https:\/\/phoenixtrap.com\/wp-content\/uploads\/2021\/02\/image-200x200.jpg","featured_embed":"","header_media":null,"file_download_attachment_data":null,"recurring_options_enabled":true,"recurring_options":{"never":{"selected":true,"after_output":"One time only"},"weekly":{"selected":false,"after_output":"Every week"},"monthly":{"selected":false,"after_output":"Every month"},"yearly":{"selected":false,"after_output":"Every year"}},"strings":{"current_user_email":"","current_user_name":"","link_text":"Leave a tip!","complete_payment_button_error_text":"Check info and try again","payment_verb":"Pay","payment_request_label":"The Phoenix Trap","form_has_an_error":"Please check and fix the errors above","general_server_error":"Something isn't working right at the moment. Please try again.","form_title":"The Phoenix Trap","form_subtitle":"Do you like what you see? Leave a one-time or recurring tip!","currency_search_text":"Country or Currency here","other_payment_option":"Other payment option","manage_payments_button_text":"Manage your payments","thank_you_message":"Thank you for being a supporter!","payment_confirmation_title":"The Phoenix Trap","receipt_title":"Your Receipt","print_receipt":"Print Receipt","email_receipt":"Email Receipt","email_receipt_sending":"Sending receipt...","email_receipt_success":"Email receipt successfully sent","email_receipt_failed":"Email receipt failed to send. Please try again.","receipt_payee":"Paid to","receipt_statement_descriptor":"This will show up on your statement as","receipt_date":"Date","receipt_transaction_id":"Transaction ID","receipt_transaction_amount":"Amount","refund_payer":"Refund from","login":"Log in to manage your payments","manage_payments":"Manage Payments","transactions_title":"Your Transactions","transaction_title":"Transaction Receipt","transaction_period":"Plan Period","arrangements_title":"Your Plans","arrangement_title":"Manage Plan","arrangement_details":"Plan Details","arrangement_id_title":"Plan ID","arrangement_payment_method_title":"Payment Method","arrangement_amount_title":"Plan Amount","arrangement_renewal_title":"Next renewal date","arrangement_action_cancel":"Cancel Plan","arrangement_action_cant_cancel":"Cancelling is currently not available.","arrangement_action_cancel_double":"Are you sure you'd like to cancel?","arrangement_cancelling":"Cancelling Plan...","arrangement_cancelled":"Plan Cancelled","arrangement_failed_to_cancel":"Failed to cancel plan","back_to_plans":"\u2190 Back to Plans","update_payment_method_verb":"Update","sca_auth_description":"Your have a pending renewal payment which requires authorization.","sca_auth_verb":"Authorize renewal payment","sca_authing_verb":"Authorizing payment","sca_authed_verb":"Payment successfully authorized!","sca_auth_failed":"Unable to authorize! Please try again.","login_button_text":"Log in","login_form_has_an_error":"Please check and fix the errors above","uppercase_search":"Search","lowercase_search":"search","uppercase_page":"Page","lowercase_page":"page","uppercase_items":"Items","lowercase_items":"items","uppercase_per":"Per","lowercase_per":"per","uppercase_of":"Of","lowercase_of":"of","back":"Back to plans","zip_code_placeholder":"Zip\/Postal Code","download_file_button_text":"Download File","input_field_instructions":{"tip_amount":{"placeholder_text":"How much would you like to tip?","initial":{"instruction_type":"normal","instruction_message":"How much would you like to tip? Choose any currency."},"empty":{"instruction_type":"error","instruction_message":"How much would you like to tip? Choose any currency."},"invalid_curency":{"instruction_type":"error","instruction_message":"Please choose a valid currency."}},"recurring":{"placeholder_text":"Recurring","initial":{"instruction_type":"normal","instruction_message":"How often would you like to give this?"},"success":{"instruction_type":"success","instruction_message":"How often would you like to give this?"},"empty":{"instruction_type":"error","instruction_message":"How often would you like to give this?"}},"name":{"placeholder_text":"Name on Credit Card","initial":{"instruction_type":"normal","instruction_message":"What is the name on your credit card?"},"success":{"instruction_type":"success","instruction_message":"Enter the name on your card."},"empty":{"instruction_type":"error","instruction_message":"Please enter the name on your card."}},"privacy_policy":{"terms_title":"Terms and conditions","terms_body":null,"terms_show_text":"View Terms","terms_hide_text":"Hide Terms","initial":{"instruction_type":"normal","instruction_message":"I agree to the terms."},"unchecked":{"instruction_type":"error","instruction_message":"Please agree to the terms."},"checked":{"instruction_type":"success","instruction_message":"I agree to the terms."}},"email":{"placeholder_text":"Your email address","initial":{"instruction_type":"normal","instruction_message":"What is your email address?"},"success":{"instruction_type":"success","instruction_message":"Enter your email address"},"blank":{"instruction_type":"error","instruction_message":"Enter your email address"},"not_an_email_address":{"instruction_type":"error","instruction_message":"Make sure you have entered a valid email address"}},"note_with_tip":{"placeholder_text":"Your note here...","initial":{"instruction_type":"normal","instruction_message":"Attach a note to your tip (optional)"},"empty":{"instruction_type":"normal","instruction_message":"Attach a note to your tip (optional)"},"not_empty_initial":{"instruction_type":"normal","instruction_message":"Attach a note to your tip (optional)"},"saving":{"instruction_type":"normal","instruction_message":"Saving note..."},"success":{"instruction_type":"success","instruction_message":"Note successfully saved!"},"error":{"instruction_type":"error","instruction_message":"Unable to save note note at this time. Please try again."}},"email_for_login_code":{"placeholder_text":"Your email address","initial":{"instruction_type":"normal","instruction_message":"Enter your email to log in."},"success":{"instruction_type":"success","instruction_message":"Enter your email to log in."},"blank":{"instruction_type":"error","instruction_message":"Enter your email to log in."},"empty":{"instruction_type":"error","instruction_message":"Enter your email to log in."}},"login_code":{"initial":{"instruction_type":"normal","instruction_message":"Check your email and enter the login code."},"success":{"instruction_type":"success","instruction_message":"Check your email and enter the login code."},"blank":{"instruction_type":"error","instruction_message":"Check your email and enter the login code."},"empty":{"instruction_type":"error","instruction_message":"Check your email and enter the login code."}},"stripe_all_in_one":{"initial":{"instruction_type":"normal","instruction_message":"Enter your credit card details here."},"empty":{"instruction_type":"error","instruction_message":"Enter your credit card details here."},"success":{"instruction_type":"normal","instruction_message":"Enter your credit card details here."},"invalid_number":{"instruction_type":"error","instruction_message":"The card number is not a valid credit card number."},"invalid_expiry_month":{"instruction_type":"error","instruction_message":"The card's expiration month is invalid."},"invalid_expiry_year":{"instruction_type":"error","instruction_message":"The card's expiration year is invalid."},"invalid_cvc":{"instruction_type":"error","instruction_message":"The card's security code is invalid."},"incorrect_number":{"instruction_type":"error","instruction_message":"The card number is incorrect."},"incomplete_number":{"instruction_type":"error","instruction_message":"The card number is incomplete."},"incomplete_cvc":{"instruction_type":"error","instruction_message":"The card's security code is incomplete."},"incomplete_expiry":{"instruction_type":"error","instruction_message":"The card's expiration date is incomplete."},"incomplete_zip":{"instruction_type":"error","instruction_message":"The card's zip code is incomplete."},"expired_card":{"instruction_type":"error","instruction_message":"The card has expired."},"incorrect_cvc":{"instruction_type":"error","instruction_message":"The card's security code is incorrect."},"incorrect_zip":{"instruction_type":"error","instruction_message":"The card's zip code failed validation."},"invalid_expiry_year_past":{"instruction_type":"error","instruction_message":"The card's expiration year is in the past"},"card_declined":{"instruction_type":"error","instruction_message":"The card was declined."},"missing":{"instruction_type":"error","instruction_message":"There is no card on a customer that is being charged."},"processing_error":{"instruction_type":"error","instruction_message":"An error occurred while processing the card."},"invalid_request_error":{"instruction_type":"error","instruction_message":"Unable to process this payment, please try again or use alternative method."},"invalid_sofort_country":{"instruction_type":"error","instruction_message":"The billing country is not accepted by SOFORT. Please try another country."}}}},"fetched_oembed_html":false}