
We have a huge code­base of over 700,000 lines of Perl spread across a cou­ple dozen Git repos­i­to­ries at work. Sometimes refac­tor­ing is easy if the class­es and meth­ods involved are con­fined to one of those repos, but last week we want­ed to rename a method that was poten­tial­ly used across many of them with­out hav­ing to QA and launch so many changes. After get­ting some help from Dan Book and Ryan Voots on the #perl libera.chat IRC chan­nel, I arrived at the fol­low­ing solution.

First, if all you want to do is alias the new method name to the old one while making the fewest changes, you can just do this:

*new_method = \&old_method;

This takes advan­tage of Perl’s type­globs by assign­ing to the new method­’s name in the sym­bol table a ref­er­ence (indi­cat­ed by the \ char­ac­ter) to the old method. Methods are just sub­rou­tines in Perl, and although you don’t need the & char­ac­ter when call­ing one, you do need it if you’re pass­ing a sub­rou­tine as an argu­ment or cre­at­ing a ref­er­ence, as we’re doing above.
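To see the alias in action, here is a minimal sketch. The Local::Greeter package and its greeting are hypothetical, made up only for illustration; the typeglob assignment is the same one shown above.

```perl
use strict;
use warnings;

package Local::Greeter {    # hypothetical class, for illustration only
    sub old_method {
        my ($class, $name) = @_;
        return "Hello, $name";
    }

    # Install the same code reference under a second name.
    *new_method = \&old_method;
}

my $old = Local::Greeter->old_method('world');
my $new = Local::Greeter->new_method('world');
print "$new\n";

# Both names resolve to the very same subroutine.
my $same = \&Local::Greeter::new_method == \&Local::Greeter::old_method;
print "same code reference\n" if $same;
```

Because both names share one code reference, there is no wrapper and no extra stack frame; the cost is that you can't tell which name a caller used.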

I want­ed to do a bit more, though. First, I want­ed to log the calls to the old method name so that I could track just how wide­ly it’s used and have a head start on renam­ing it else­where in our code­base. Also, I did­n’t want to fill our logs with those calls — we have enough noise in there already. And last­ly, I want­ed future calls to go direct­ly to the new method name with­out adding anoth­er stack frame when using caller or Carp.

With all that in mind, here’s the result:

sub old_method {
    warn 'old_method is deprecated';
    no warnings 'redefine';
    *old_method = \&new_method;
    goto &new_method;
}

sub new_method {
    # code from old_method goes here
}

Old (and not-so-old) hands at programming are probably leaping out of their seats right now yelling, “YOU’RE USING GOTO! GOTO IS CONSIDERED HARMFUL!” And they’re right, but this isn’t Dijkstra’s goto. From the Perl manual:

The goto &NAME form is quite dif­fer­ent from the oth­er forms of goto. In fact, it isn’t a goto in the nor­mal sense at all, and does­n’t have the stig­ma asso­ci­at­ed with oth­er gotos. Instead, it exits the cur­rent sub­rou­tine (los­ing any changes set by local) and imme­di­ate­ly calls in its place the named sub­rou­tine using the cur­rent val­ue of @_. […] After the goto, not even caller will be able to tell that this rou­tine was called first.

perl­func man­u­al page

Computer sci­en­tists call this tail call elim­i­na­tion. The bot­tom line is that this achieves our third goal above: imme­di­ate­ly jump­ing to the new method as if it were orig­i­nal­ly called.

The oth­er tricky bit is in the line before, when we’re redefin­ing old_method to point to new_method while we’re still inside old_method. (Yes, you can do this.) If you’re run­ning under use warnings (and we are, and you should), you first need to dis­able that warn­ing. Later calls to old_method will go straight to new_method with­out log­ging anything.
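If you want to convince yourself that this behaves as described, here is a small self-contained sketch. The Local::Widget package is hypothetical, and counting stack frames with caller is just one way I might check it, not something from our codebase.

```perl
use strict;
use warnings;

package Local::Widget {    # hypothetical class, for illustration only
    sub old_method {
        warn "old_method is deprecated\n";
        no warnings 'redefine';
        *old_method = \&new_method;
        goto &new_method;
    }

    sub new_method {
        # Count how many stack frames caller() can see from here.
        my $depth = 0;
        $depth++ while caller($depth);
        return $depth;
    }
}

# Collect warnings instead of printing them, so we can count them.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

my $direct = Local::Widget->new_method;    # baseline stack depth
my $first  = Local::Widget->old_method;    # warns once, then jumps
my $second = Local::Widget->old_method;    # now a plain alias, silent

print 'warnings seen: ', scalar(@warnings), "\n";
print "depths: direct=$direct first=$first second=$second\n";
```

The thing to look for is that all three depths match and only one warning is collected: the first call through the old name adds no frame, and the second call never reaches the wrapper at all.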

And that’s it. The next step after launch­ing this change is to add a sto­ry to our back­log to mon­i­tor our logs for calls to the old method, and grad­u­al­ly refac­tor our oth­er repos­i­to­ries. Then we can final­ly remove the old method wrapper.

This week my main task for this sprint was can­celed. While not as momen­tous as the can­cel­la­tion of an entire project (I’ve been there too), delet­ing the past week’s work still stung. This isn’t the first time, though, so I know that there are a few things to keep in mind:

You didn’t waste your time

Bottom line: Were you paid for your work? Then your employer still sees it as valuable, if only to make sure that a given line of development was sufficiently explored before determining it wasn’t worth continuing. Developing a product or service often means saying “no” to things, and sometimes that means cutting losses before the sunk cost fallacy takes hold.

You probably learned something

Over the past week’s work, I learned about man­ag­ing TLS con­nec­tions (includ­ing sup­port­ing ciphers that are no longer con­sid­ered secure), para­me­ter val­i­da­tion, and XML name­space sup­port in XPath. You prob­a­bly learned a lot more if your project extend­ed longer, and you can use that knowl­edge fur­ther on in your career. Put it on your résumé or CV, and you may get an oppor­tu­ni­ty to work on the same things in the future.

You could continue if you want

Okay, maybe you’re not going to sneak into the office for months to fin­ish things. But as long as you have the time and incli­na­tion, you could con­tin­ue to work on your project, espe­cial­ly if you think it could be valu­able to the com­pa­ny lat­er on. Consider this care­ful­ly, though — you don’t want off-​the-​books work tak­ing time and ener­gy away from your main job.

There’s no shame

Lastly, you should­n’t feel ashamed about being part of a can­celed project. They hap­pen all the time, and prob­a­bly should hap­pen more — his­to­ry is lit­tered with failed soft­ware projects that like­ly could have cost less if only their prob­lems were rec­og­nized ear­li­er. By its nature, soft­ware devel­op­ment is explorato­ry and dif­fi­cult, and not every idea pans out. As long as you can find some­thing new to work on, you’ll be fine.

It’s been years since I’ve had to hack on anything XML-related, but a recent project at work has me once again jumping into the waters of generating, parsing, and modifying this 90s-era document format. Most developers these days likely only know of it as part of the curiously-named XMLHttpRequest object in web browsers used to retrieve data in JSON format from servers, and as the “X” in AJAX. But here we are in 2021, and there are still plenty of APIs and documents using XML to get their work done.

In my par­tic­u­lar case, the task is to update the API calls for a new ver­sion of Virtuozzo Automator. Its API is a bit unusu­al in that it does­n’t use HTTP, but rather relies on open­ing a TLS-encrypt­ed sock­et to the serv­er and exchang­ing doc­u­ments delim­it­ed with a null char­ac­ter. The pre­vi­ous ver­sion of our code is in 1990s-​sysadmin-​style Perl, with man­u­al blessing of objects and pars­ing the XML using reg­u­lar expres­sions. I’ve decid­ed to update it to use the Moo object sys­tem and a prop­er XML pars­er. But which pars­er and mod­ule to use?

Selecting a parser

There are sev­er­al gener­ic XML mod­ules for pars­ing and gen­er­at­ing XML on CPAN, each with its own advan­tages and dis­ad­van­tages. I’d like to say that I did a com­pre­hen­sive sur­vey of each of them, but this project is pressed for time (aren’t they all?) and I did­n’t want to cre­ate too many extra depen­den­cies in our Perl stack. Luckily, XML::LibXML is already avail­able, I’ve had some pre­vi­ous expe­ri­ence with it, and it’s a good choice for per­for­mant standards-​based XML pars­ing (using either DOM or SAX) and generation.

Given more time and lee­way in adding depen­den­cies, I might use some­thing else. If the Virtuozzo API had an XML Schema or used SOAP, I would con­sid­er XML::Compile as I’ve had some suc­cess with that in oth­er projects. But even that uses XML::LibXML under the hood, so I’d still be using that. Your mileage may vary.

Generating XML

Depending on the size and com­plex­i­ty of the XML doc­u­ments to gen­er­ate, you might choose to build them up node by node using XML::LibXML::Node and XML::LibXML::Element objects. Most of the mes­sages I’m send­ing to Virtuozzo Automator are short and have easily-​interpolated val­ues, so I’m using here-​document islands of XML inside my Perl code. This also has the advan­tage of being eas­i­ly val­i­dat­ed against the exam­ples in the documentation.

Where the inter­po­lat­ed val­ues in the mes­sages are a lit­tle com­pli­cat­ed, I’m using this idiom inside the here-docs:

@{[ ... ]}

This allows me to put an arbi­trary expres­sion in the … part, which is then put into an anony­mous array ref­er­ence, which is then imme­di­ate­ly deref­er­enced into its string result. It’s a cheap and cheer­ful way to do min­i­mal tem­plat­ing inside Perl strings with­out load­ing a full tem­plat­ing library; I’ve also had suc­cess using this tech­nique when gen­er­at­ing SQL for data­base queries.
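Here is a minimal sketch of that idiom inside a here-doc. The %env data and the element names are made up for illustration and are not part of the Virtuozzo API.

```perl
use strict;
use warnings;

# Hypothetical data, for illustration only
my %env = (id => 101, tags => [qw(web prod)]);

my $xml = <<"END_XML";
<env id="$env{id}">
  <count>@{[ scalar @{ $env{tags} } ]}</count>
  <tags>@{[ join ',', sort @{ $env{tags} } ]}</tags>
</env>
END_XML

print $xml;
```

One thing to keep in mind: the expression is evaluated in list context, and if it returns more than one item they are joined with the list separator (a space by default), so I wrap things in scalar or join when I want a single value.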

Parser as an object attribute

Rather than instan­ti­ate a new XML::LibXML in every method that needs to parse a doc­u­ment, I cre­at­ed a pri­vate attribute:

package Local::API::Virtuozzo::Agent {
    use Moo;
    use XML::LibXML;
    use Types::Standard qw(InstanceOf);
    ...
    has _parser => (
        is      => 'ro',
        isa     => InstanceOf['XML::LibXML'],
        default => sub { XML::LibXML->new() },
    );
    sub foo {
        my $self = shift;
        my $send_doc = $self->_parser
          ->parse_string(<<"END_XML");
            <foo/>
END_XML
        ...
    }
...
}

Boilerplate

XML documents can be verbose, with elements that rarely change in every document. In the Virtuozzo API’s case, every document has a <packet> element containing a version attribute and an id attribute to match requests to responses. I wrote a simple function to wrap my documents in this element; it pulls the version from a constant and increments the id by one every time it’s called:

sub _wrap_packet {
    state $send_id = 1;
    return qq(<packet version="$PACKET_VERSION" id=")
      . $send_id++ . '">' . shift . '</packet>';
}

If I need to add more attributes to the <packet> element (for instance, namespaces for attributes in enclosed elements), I can always use XML::LibXML::Element::setAttribute after parsing the document string.

Parsing responses with XPath

Rather than using brit­tle reg­u­lar expres­sions to extract data from the response, I use the shared pars­er object from above and then the full pow­er of XPath:

use English;
...
sub get_sampleID {
    my ($self, $sample_name) = @_;
    ...
    # used to separate documents
    local $INPUT_RECORD_SEPARATOR = "\0";
    # $self->_sock is the IO::Socket::SSL connection
    my $get_doc = $self->_parser->parse_string(
      $self->_sock->getline(),
    );
    my $sample_id = $get_doc->findvalue(
        qq(//ns3:id[following-sibling::ns3:name="$sample_name"]),
    );
    return $sample_id;
}

This way, even if the order of elements changes or more elements are introduced, the XPath patterns will continue to find the right data.
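The null-delimited reading in the example above is easy to try without a server. This is only a sketch: it uses an in-memory filehandle as a stand-in for the IO::Socket::SSL connection, and the two tiny <packet> documents are invented for illustration.

```perl
use strict;
use warnings;
use English qw(-no_match_vars);

# Two made-up documents, each terminated by a null character
my $stream = qq(<packet id="1"/>\0<packet id="2"/>\0);
open my $fh, '<', \$stream
    or die "Cannot open in-memory handle: $OS_ERROR";

my @documents;
{
    # Limit the changed separator to this block only
    local $INPUT_RECORD_SEPARATOR = "\0";
    while (my $doc = <$fh>) {
        chomp $doc;    # chomp strips the current separator, here the null
        push @documents, $doc;
    }
}
close $fh;

print "$_\n" for @documents;
```

Using local keeps the separator change from leaking into other code that reads lines, which matters in a long-running process with other filehandles open.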

Conclusion… so far

I’m only about halfway through updat­ing these API calls, and I’ve left out some non-​XML-​related details such as set­ting up the TLS sock­et con­nec­tion. Hopefully this arti­cle has giv­en you a taste of what’s involved in XML pro­cess­ing these days. Please leave me a com­ment if you have any sug­ges­tions or questions.

This pro­pos­al from Dan Book seems rea­son­able to me. A ver­sion 7 fea­ture bun­dle that ren­ders sig­na­tures non-​experimental; removes the indi­rect, mul­ti­di­men­sion­al, and bare­word file­han­dle fea­tures; enables warn­ings and utf8 by default? Sure. And more impor­tant­ly, incre­ment­ing the major ver­sion every time a new fea­ture is stable.

Unfortunately we’re still on Perl 5.16.3 at work, and it would be a big push to make sure our code­base is com­pat­i­ble with a new­er ver­sion, much less adopt new fea­tures. But I’m will­ing to bet a rea­son­able release of ver­sion 7 might be just the push we need.

On the heels of my blog article and upcoming presentation comes Paul Evans’ call to de-experimentalize (is that a word?) subroutine signatures in Perl core. It’s been stable for over four years now, and the “experimental” tag has been holding back developers big and small, so I fully support this effort. Maybe it can make it into Perl 5.34? Here’s hoping.

Did you know that you could increase the read­abil­i­ty and reli­a­bil­i­ty of your Perl code with one fea­ture? I’m talk­ing about sub­rou­tine sig­na­tures: the abil­i­ty to declare what argu­ments, and in some cas­es what types of argu­ments, your func­tions and meth­ods take.

Most Perl pro­gram­mers know about the @_ vari­able (or @ARG if you use English). When a sub­rou­tine is called, @_ con­tains the para­me­ters passed. It’s an array (thus the @ sig­il) and can be treat­ed as such; it’s even the default argu­ment for pop and shift. Here’s an example:

use v5.10;
use strict;
use warnings;

sub foo {
    my $parameter = shift;
    say "You passed me $parameter";
}

Or for mul­ti­ple parameters:

use v5.10;
use strict;
use warnings;

sub foo {
    my ($parameter1, $parameter2) = @_;
    say "You passed me $parameter1 and $parameter2";
}

(What’s that use v5.10; doing there? It enables all fea­tures that were intro­duced in Perl 5.10, such as the say func­tion. We’ll assume you type it in from now on to reduce clutter.)

We can do better, though. Perl 5.20 (released in 2014; why haven’t you upgraded?) introduced the experimental signatures feature, which, as described above, allows parameters to be introduced right when you declare the subroutine. It looks like this:

use experimental 'signatures';

sub foo ($parameter1, $parameter2 = 1, @rest) {
    say "You passed me $parameter1 and $parameter2";
    say "And these:";
    say for @rest;
}

You can even set defaults for option­al para­me­ters, as seen above with the = sign, or slurp up remain­ing para­me­ters into an array, like the @rest array above. For more help­ful uses of this fea­ture, con­sult the perl­sub man­u­al page.
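To show how the default, the slurpy array, and the built-in argument count check behave, here is a small sketch. The describe() function is hypothetical; I’m enabling the feature with use feature and silencing the experimental warning directly, which should behave the same as use experimental 'signatures'.

```perl
use v5.20;
use warnings;
use feature 'signatures';
no warnings 'experimental::signatures';

# Hypothetical function: one mandatory, one optional, and a slurpy parameter
sub describe ($first, $second = 1, @rest) {
    return "first=$first second=$second rest=" . scalar @rest;
}

say describe('a');                   # default fills in $second
say describe('a', 'b', 'c', 'd');    # extras land in @rest

# Leaving out a mandatory parameter is a runtime error
my $ok = eval { describe(); 1 };
say "error: $@" unless $ok;
```

That last check is something you don’t get for free with the @_ approach, where a missing argument silently becomes undef.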

We can do better still. The Comprehensive Perl Archive Network (CPAN) contains several modules that both enable signatures and validate that parameters are of a certain type or format. (Yes, Perl can have types!) Let’s take a tour of some of them.

Params::Validate

This mod­ule adds two new func­tions, validate() and validate_pos(). validate() intro­duces named para­me­ters, which make your code more read­able by describ­ing what para­me­ters are being called at the time you call them. It looks like this:

use Params::Validate;

say foo(parameter1 => 'hello',  parameter2 => 'world');

sub foo {
    my %p = validate(@_, {
        parameter1 => 1, # mandatory
        parameter2 => 0, # optional
    } );
    return $p{parameter1}, $p{parameter2};
}

If all you want to do is val­i­date un-​named (posi­tion­al) para­me­ters, use validate_pos():

use Params::Validate;

say foo('hello', 'world');

sub foo {
    my @p = validate_pos(@_, 1, 0);
    return @p;
}

Params::Validate also has fairly deep support for type validation, enabling you to validate parameters against simple types, method interfaces (also known as “duck typing”), membership in a class, regular expression matches, and arbitrary code callbacks. As always, consult the documentation for the nitty-gritty details.

MooseX::Params::Validate

MooseX::Params::Validate adds type val­i­da­tion via the Moose object-​oriented frame­work’s type sys­tem, mean­ing that any­thing that can be defined as a Moose type can be used to val­i­date the para­me­ters passed to your func­tions or meth­ods. It adds the validated_hash(), validated_list(), and pos_validated_list() func­tions, and looks like this:

package Foo;

use Moose;
use MooseX::Params::Validate;

say __PACKAGE__->foo(parameter1 => 'Mouse');
say __PACKAGE__->bar(parameter1 => 'Mice');
say __PACKAGE__->baz('Men', 42);

sub foo {
    my ($self, %params) = validated_hash(
        \@_,
        parameter1 => { isa => 'Str', default => 'Moose' },
    );
    return $params{parameter1};
}

sub bar {
    my ($self, $param1) = validated_list(
        \@_,
        parameter1 => { isa => 'Str', default => 'Moose' },
    );
    return $param1;
}

sub baz {
    my ($self, $foo, $bar) = pos_validated_list(
        \@_,
        { isa => 'Str' },
        { isa => 'Int' },
    );
    return $foo, $bar;
}

Note that the first para­me­ter passed to each func­tion is a ref­er­ence to the @_ array, denot­ed by a backslash.

MooseX::Params::Validate has sev­er­al more things you can spec­i­fy when list­ing para­me­ters, includ­ing roles, coer­cions, and depen­den­cies. The doc­u­men­ta­tion for the mod­ule has all the details. We use this mod­ule at work a lot, and even use it with­out Moose when val­i­dat­ing para­me­ters passed to test functions.

Function::Parameters

For a dif­fer­ent take on sub­rou­tine sig­na­tures, you can use the Function::Parameters mod­ule. Rather than pro­vid­ing helper func­tions, it defines two new Perl key­words, fun and method. It looks like this:

use Function::Parameters;

say foo('hello', 'world');
say bar(param1 => 'hello');

fun foo($param1, $param2) {
    return $param1, $param2;
}

fun bar(:$param1, :$param2 = 42) {
    return $param1, $param2;
}

The colons in the bar() func­tion above indi­cate that the para­me­ters are named, and need to be spec­i­fied by name when the func­tion is called, using the => oper­a­tor as if you were spec­i­fy­ing a hash.

In addi­tion to defaults and the posi­tion­al and named para­me­ters demon­strat­ed above, Function::Parameters sup­ports type con­straints (via Type::Tiny) and Moo or Moose method mod­i­fiers. (If you don’t know what those are, the Moose and Class::Method::Modifiers doc­u­men­ta­tion are helpful.)

I’m not a fan of modules that add new syntax for common tasks like subroutines and methods, if only because there’s extra effort in updating tooling like syntax highlighters and Perl::Critic code analysis. Still, this may appeal to you, especially if you’re coming from other languages that have similar syntax.

Type::Params

Speaking of Type::Tiny, it includes its own para­me­ter val­i­da­tion library called Type::Params. I think I would favor this for new work, as it’s com­pat­i­ble with both Moo and Moose but does­n’t require them.

Type::Params has a number of functions, none of which are provided by default, so you’ll have to import them explicitly when you use the module. It also introduces a separate step for compiling your validation specification to speed up performance. It looks like this:

use Types::Standard qw(Str Int);
use Type::Params qw(compile compile_named);

say foo('hello', 42);
say bar(param1 => 'hello');

sub foo {
    state $check = compile(Str, Int);
    my ($param1, $param2) = $check->(@_);

    return $param1, $param2;
}

sub bar {
    state $check = compile_named(
        param1 => Str,
        param2 => Int, {optional => 1},
    );
    my $params_ref = $check->(@_);

    return $params_ref->{param1}, $params_ref->{param2};
}

The fea­tures of Type::Tiny and its bun­dled mod­ules are pret­ty vast, so I sug­gest once again that you con­sult the doc­u­men­ta­tion on how to use it.

Params::ValidationCompiler

At the top of the doc­u­men­ta­tion to Params::Validate, you’ll notice that the author rec­om­mends instead his Params::ValidationCompiler mod­ule for faster per­for­mance, using a com­pi­la­tion step much like Type::Params. It pro­vides two func­tions for you to import, validation_for() and source_for(). We’ll con­cen­trate on the for­mer since the lat­ter is main­ly use­ful for debugging.

It looks like this:

use Types::Standard qw(Int Str);
use Params::ValidationCompiler 'validation_for';

my $validator = validation_for(
    params => {
        param1 => {
            type    => Str,
            default => 'Perl is cool',
        },
        param2 => {
            type     => Int,
            optional => 1,
        },
    },
);

say foo(param1 => 'hello');

sub foo {
    my %params = $validator->(@_);
    return @params{'param1', 'param2'};
}

As you can see, it supports type constraints, defaults, and optional values. It can also put extra arguments in a list (it calls this feature “slurpy”), and can even return generated objects to make it easier to catch typos (since a mistyped hash key just generates that key rather than returning an error). There’s a bit more to this module, so please read the documentation to examine all its features.

Conclusion

One of Perl’s mottos is “there’s more than one way to do it,” and you’re welcome to choose whatever method you need to enable signatures and type validation. Just remember to be consistent and have good reasons for your choices, since the overall goal is to improve your code’s reliability and readability. And be sure to share your favorite techniques with others, so they too can develop better software.

This arti­cle is also pub­lished on dev.to.