This week my main task for the sprint was canceled. While not as momentous as the cancellation of an entire project (I’ve been there too), deleting the past week’s work still stung. This isn’t the first time, though, so I know that there are a few things to keep in mind:

You didn’t waste your time

Bottom line: Were you paid for your work? Then your employer still sees it as valuable, if only to make sure that a given line of development was sufficiently explored before determining it wasn’t worth continuing. Developing a product or service often means saying “no” to things, and sometimes that means cutting losses before the sunk cost fallacy takes hold.

You probably learned something

Over the past week’s work, I learned about managing TLS connections (including supporting ciphers that are no longer considered secure), parameter validation, and XML namespace support in XPath. You probably learned a lot more if your project ran longer, and you can use that knowledge later in your career. Put it on your résumé or CV, and you may get an opportunity to work on the same things in the future.

You could continue if you want

Okay, maybe you’re not going to sneak into the office for months to finish things. But as long as you have the time and inclination, you could continue to work on your project, especially if you think it could be valuable to the company later on. Consider this carefully, though: you don’t want off-the-books work taking time and energy away from your main job.

There’s no shame

Lastly, you shouldn’t feel ashamed about being part of a canceled project. They happen all the time, and probably should happen more often: history is littered with failed software projects that likely would have cost less had their problems been recognized earlier. By its nature, software development is exploratory and difficult, and not every idea pans out. As long as you can find something new to work on, you’ll be fine.


It’s been years since I’ve had to hack on anything XML-related, but a recent project at work has me once again jumping into the waters of generating, parsing, and modifying this 90s-era document format. Most developers these days likely only know of it as part of the curiously named XMLHttpRequest object that web browsers use to retrieve data (usually JSON) from servers, and as the “X” in AJAX. But here we are in 2021, and there are still plenty of APIs and documents using XML to get their work done.

In my particular case, the task is to update the API calls for a new version of Virtuozzo Automator. Its API is a bit unusual in that it doesn’t use HTTP, but rather relies on opening a TLS-encrypted socket to the server and exchanging documents delimited with a null character. The previous version of our code is in 1990s-sysadmin-style Perl, with manual blessing of objects and XML parsed using regular expressions. I’ve decided to update it to use the Moo object system and a proper XML parser. But which parser and module to use?

Selecting a parser

There are several generic XML modules for parsing and generating XML on CPAN, each with its own advantages and disadvantages. I’d like to say that I did a comprehensive survey of each of them, but this project is pressed for time (aren’t they all?) and I didn’t want to create too many extra dependencies in our Perl stack. Luckily, XML::LibXML is already available, I’ve had some previous experience with it, and it’s a good choice for performant standards-based XML parsing (using either DOM or SAX) and generation.

Given more time and leeway in adding dependencies, I might use something else. If the Virtuozzo API had an XML Schema or used SOAP, I would consider XML::Compile, as I’ve had some success with that in other projects. But even that uses XML::LibXML under the hood, so I’d still be using it. Your mileage may vary.

Generating XML

Depending on the size and complexity of the XML documents to generate, you might choose to build them up node by node using XML::LibXML::Node and XML::LibXML::Element objects. Most of the messages I’m sending to Virtuozzo Automator are short and have easily interpolated values, so I’m using here-document islands of XML inside my Perl code. This also has the advantage of being easily validated against the examples in the documentation.
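For comparison, here is a minimal sketch of the node-by-node approach. The element names, attribute values, and text are placeholders I made up for illustration; they are not taken from the Virtuozzo Automator documentation.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical message: names and values below are placeholders only.
my $doc  = XML::LibXML::Document->new( '1.0', 'UTF-8' );
my $root = $doc->createElement('packet');
$root->setAttribute( version => '1.0' );
$root->setAttribute( id      => 1 );
$doc->setDocumentElement($root);

# Build a child element by hand and attach it to the root
my $target = $doc->createElement('target');
$target->appendText('example');
$root->appendChild($target);

# appendTextChild creates a child element with text content in one call;
# special characters in the text are escaped on serialization
$root->appendTextChild( note => 'a < b & c' );

my $xml = $root->toString();
print "$xml\n";
```

The main benefit over string interpolation is that escaping is handled by the library, at the cost of more code per message.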

Where the interpolated values in the messages are a little complicated, I’m using this idiom inside the here-docs:

@{[ ... ]}

This allows me to put an arbitrary expression in the … part. The expression is evaluated in list context, its results are collected into an anonymous array reference, and that reference is immediately dereferenced and interpolated into the string (multiple values are joined with spaces by default). It’s a cheap and cheerful way to do minimal templating inside Perl strings without loading a full templating library; I’ve also had success using this technique when generating SQL for database queries.
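A small sketch of the idiom in a here-document follows. The `%request` hash and the element names are invented for the example. Note that nothing here escapes XML special characters, so this only suits values you control.

```perl
use strict;
use warnings;

# Hypothetical input data for illustration
my %request = ( names => [ 'beta', 'alpha' ], limit => 5 );

my $xml = <<"END_XML";
<query limit="@{[ $request{limit} * 2 ]}">
  <count>@{[ scalar @{ $request{names} } ]}</count>
  <names>@{[ join ',', sort @{ $request{names} } ]}</names>
</query>
END_XML

print $xml;
```

Each `@{[ ... ]}` can hold arithmetic, function calls, or anything else that produces a list; `scalar` is needed in the second one because the expression would otherwise be evaluated in list context and interpolate the names themselves.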

Parser as an object attribute

Rather than instantiate a new XML::LibXML in every method that needs to parse a document, I created a private attribute:

package Local::API::Virtozzo::Agent {
    use Moo;
    use XML::LibXML;
    use Types::Standard qw(InstanceOf);
    ...
    has _parser => (
        is      => 'ro',
        isa     => InstanceOf['XML::LibXML'],
        default => sub { XML::LibXML->new() },
    );
    sub foo {
        my $self = shift;
        my $send_doc = $self->_parser
          ->parse_string(<<"END_XML");
            <foo/>
END_XML
        ...
    }
...
}

Boilerplate

XML documents can be verbose, with elements that rarely change in every document. In the Virtuozzo API’s case, every document has a <packet> element containing a version attribute and an id attribute to match requests to responses. I wrote a simple function to wrap my documents in this element; it pulls the version from a constant and increases the id by one every time it’s called:

sub _wrap_packet {
    state $send_id = 1;
    return qq(<packet version="$PACKET_VERSION" id=")
      . $send_id++ . '">' . shift . '</packet>';
}

If I need to add more attributes to the <packet> element (for instance, namespaces for attributes in enclosed elements), I can always use XML::LibXML::Element::setAttribute after parsing the document string.
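Here is a self-contained sketch of wrapping a document and then adding an attribute after parsing. The `$PACKET_VERSION` value and the `priority` attribute are placeholders of my own, not part of the Virtuozzo API, and `state` needs the `state` feature enabled (Perl 5.10 or later).

```perl
use strict;
use warnings;
use feature 'state';
use XML::LibXML;

my $PACKET_VERSION = '1.0';    # placeholder value

sub _wrap_packet {
    my ($body) = @_;
    state $send_id = 1;        # persists across calls
    return qq(<packet version="$PACKET_VERSION" id=")
      . $send_id++ . qq(">$body</packet>);
}

my $doc    = XML::LibXML->new->parse_string( _wrap_packet('<foo/>') );
my $packet = $doc->documentElement;

# Hypothetical extra attribute added after parsing
$packet->setAttribute( priority => 'high' );

print $packet->toString(), "\n";
```

For namespace declarations specifically, XML::LibXML::Element also offers `setNamespace`, which may be a better fit than a plain attribute.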

Parsing responses with XPath

Rather than using brittle regular expressions to extract data from the response, I use the shared parser object from above and then the full power of XPath:

use English;
...
sub get_sampleID {
    my ($self, $sample_name) = @_;
    ...
    # used to separate documents
    local $INPUT_RECORD_SEPARATOR = "\0";
    # $self->_sock is the IO::Socket::SSL connection
    # call parse_string on the parser object returned by the accessor
    my $get_doc = $self->_parser->parse_string(
      $self->_sock->getline(),
    );
    my $sample_id = $get_doc->findvalue(
        qq(//ns3:id[following-sibling::ns3:name="$sample_name"]),
    );
    return $sample_id;
}

This way, even if the order of elements changes or more elements are introduced, the XPath patterns will continue to find the right data.
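To try the same pattern without a server, the sketch below reads null-delimited documents from an in-memory filehandle instead of a TLS socket. The sample documents and the `http://example.com/ns3` namespace URI are invented for the example. It also registers the `ns3` prefix explicitly through XML::LibXML::XPathContext, which is one way to avoid depending on the prefix the server happens to use.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Two hypothetical responses, each terminated by a null character
my $stream = join "\0",
  '<packet xmlns:ns3="http://example.com/ns3" id="1">'
  . '<ns3:item><ns3:id>42</ns3:id><ns3:name>demo</ns3:name></ns3:item>'
  . '<ns3:item><ns3:id>43</ns3:id><ns3:name>other</ns3:name></ns3:item>'
  . '</packet>',
  '<packet id="2"><ok/></packet>',
  '';

open my $fh, '<', \$stream or die "Cannot open in-memory handle: $!";

my $parser = XML::LibXML->new;
my @docs;
{
    local $/ = "\0";    # same as $INPUT_RECORD_SEPARATOR
    while ( defined( my $record = <$fh> ) ) {
        chomp $record;    # removes the trailing "\0"
        push @docs, $parser->parse_string($record);
    }
}

# Bind the prefix used in the XPath expression to the namespace URI
my $xpc = XML::LibXML::XPathContext->new( $docs[0] );
$xpc->registerNs( ns3 => 'http://example.com/ns3' );

my $id = $xpc->findvalue('//ns3:id[following-sibling::ns3:name="demo"]');
print "$id\n";
```

Because the XPath expression selects by sibling name rather than position, reordering the `ns3:item` elements in the sample does not change the result.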

Conclusion… so far

I’m only about halfway through updating these API calls, and I’ve left out some non-XML-related details such as setting up the TLS socket connection. Hopefully this article has given you a taste of what’s involved in XML processing these days. Please leave me a comment if you have any suggestions or questions.

This proposal from Dan Book seems reasonable to me. A version 7 feature bundle that renders signatures non-experimental; removes the indirect, multidimensional, and bareword filehandle features; and enables warnings and utf8 by default? Sure. And more importantly, incrementing the major version every time a new feature is stable.

Unfortunately we’re still on Perl 5.16.3 at work, and it would be a big push to make sure our codebase is compatible with a newer version, much less adopt new features. But I’m willing to bet a reasonable release of version 7 might be just the push we need.

On the heels of my blog article and upcoming presentation comes Paul Evans’ call to de-experimentalize (is that a word?) subroutine signatures in Perl core. The feature has been stable for over four years now, and the “experimental” tag has been holding back developers big and small, so I fully support this effort. Maybe it can make it into Perl 5.34? Here’s hoping.
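For anyone who hasn’t tried the feature yet, this is roughly what opting in looks like while the “experimental” tag is still in place. It is a minimal sketch assuming Perl 5.20 or later, and the `greet` function is just an illustration.

```perl
use strict;
use warnings;
use feature 'signatures';
no warnings 'experimental::signatures';

# Named parameters with an optional default, instead of unpacking @_
sub greet ( $name, $greeting = 'Hello' ) {
    return "$greeting, $name!";
}

print greet('world'), "\n";
print greet( 'Perl', 'Goodbye' ), "\n";
```

The `no warnings` line is the part that de-experimentalizing would make unnecessary.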