Look, I get it. You don’t like the Perl pro­gram­ming lan­guage or have oth­er­wise dis­re­gard­ed it as dead.” (Or per­haps you haven’t, in which case please check out my oth­er blog posts!) It has weird noisy syn­tax, mix­ing reg­u­lar expres­sions, sig­ils on vari­able names, var­i­ous braces and brack­ets for data struc­tures, and a menagerie of cryp­tic spe­cial vari­ables. It’s old: 34 years in December, with a his­to­ry of (some­times ama­teur) devel­op­ers that have used and abused that syn­tax to ship code of ques­tion­able qual­i­ty. Maybe you grudg­ing­ly accept its util­i­ty but think it should die grace­ful­ly, main­tained only to run lega­cy applications.

But you know what? Perl’s still going. It’s had a steady cadence of year­ly releas­es for the past decade, intro­duc­ing new fea­tures and fenc­ing in bad behav­ior while main­tain­ing an admirable lev­el of back­ward com­pat­i­bil­i­ty. Yes, there was a too-​long adven­ture devel­op­ing what start­ed as Perl 6, but that lan­guage now has its own iden­ti­ty as Raku and even has facil­i­ties for mix­ing Perl with its native code or vice versa.

And then there’s CPAN, the Comprehensive Perl Archive Network: a continually-​updated col­lec­tion of over 200,000 open-​source mod­ules writ­ten by over 14,000 authors, the best of which are well-​tested and ‑doc­u­ment­ed (apply­ing peer pres­sure to those that fall short), pre­sent­ed through a search engine and front-​end built by scores of con­trib­u­tors. Through CPAN you can find dis­tri­b­u­tions for things like:

All of this is avail­able through a mature instal­la­tion tool­chain that doesn’t break from month to month.

Finally and most impor­tant­ly, there’s the glob­al Perl com­mu­ni­ty. The COVID-​19 pan­dem­ic has put a damper on the hun­dreds of glob­al Perl Mongers groups’ mee­tups, but that hasn’t stopped the year­ly Perl and Raku Conference from meet­ing vir­tu­al­ly. (In the past there have also been year­ly European and Asian con­fer­ences, occa­sion­al for­ays into South America and Russia, as well as hackathons and work­shops world­wide.) There are IRC servers and chan­nels for chat, mail­ing lists galore, blogs (yes, apart from this one), and a quirky social net­work that pre­dates Facebook and Twitter.

So no, Perl isn’t dead or even dying, but if you don’t like it and favor some­thing new­er, that’s OK! Technologies can coex­ist on their own mer­its and advo­cates of one don’t have to beat down their con­tem­po­raries to be suc­cess­ful. Perl hap­pens to be battle-​tested (to bor­row a term from my friend Curtis Ovid” Poe), it runs large parts of the Web (speak­ing from direct and ongo­ing expe­ri­ence in the host­ing busi­ness here), and it’s still evolv­ing to meet the needs of its users.

I pub­lish Perl sto­ries on this blog once a week, and it seems every time there’s at least one response on social media that amounts to, I hate Perl because of its weird syn­tax.” Or, It looks like line noise.” (Perl seems to have out­last­ed that one—when’s the last time you used an acoustic modem?) Or the quote attrib­uted to Keith Bostic: The only lan­guage that looks the same before and after RSA encryption.”

So let’s address, con­front, and demys­ti­fy this hate. What are these objec­tion­able syn­tac­ti­cal, noisy, pos­si­bly encrypt­ed bits? And why does Perl have them?

Regular expressions

Regular expres­sions, or reg­ex­ps, are not unique to Perl. JavaScript has them. Java has them. Python has them as well as anoth­er mod­ule that adds even more fea­tures. It’s hard to find a lan­guage that does­n’t have them, either native­ly or through the use of a library. It’s com­mon to want to search text using some kind of pat­tern, and reg­ex­ps pro­vide a fair­ly stan­dard­ized if terse mini-​language for doing so. There’s even a C‑based library called PCRE, or Perl Compatible Regular Expressions,” enabling many oth­er pieces of soft­ware to embed a reg­exp engine that’s inspired by (though not quite com­pat­i­ble) with Perl’s syntax.

Being itself inspired by Unix tools like grep, sed, and awk, Perl incor­po­rat­ed reg­u­lar expres­sions into the lan­guage as few oth­er lan­guages have, with bind­ing oper­a­tors of =~ and !~ enabling easy match­ing and sub­sti­tu­tions against expres­sions, and pre-​compilation of reg­ex­ps into their own type of val­ue. Perl then added the abil­i­ty to sep­a­rate reg­ex­ps by white­space to improve read­abil­i­ty, use dif­fer­ent delim­iters to avoid the leaning-​toothpick syn­drome of escap­ing slash (/) char­ac­ters with back­slash­es (\), and name your cap­ture groups and back­ref­er­ences when sub­sti­tut­ing or extract­ing strings.

All this is to say that Perl reg­u­lar expres­sions can be some of the most read­able and robust when used to their full poten­tial. Early on this helped cement Perl’s rep­u­ta­tion as a text-​processing pow­er­house, though the core of reg­ex­ps’ suc­cinct syn­tax can result in difficult-​to-​read code. Such inscrutable exam­ples can be found in any lan­guage that imple­ments reg­u­lar expres­sions; at least Perl offers the enhance­ments men­tioned above.

Sigils

Perl has three built-​in data types that enable you to build all oth­er data struc­tures no mat­ter how com­plex. Its vari­able names are always pre­ced­ed by a sig­il, which is just a fan­cy term for a sym­bol or punc­tu­a­tion mark.

  • A scalar con­tains a string of char­ac­ters, a num­ber, or a ref­er­ence to some­thing, and is pre­ced­ed with a $ (dol­lar sign).
  • An array is an ordered list of scalars begin­ning with an ele­ment num­bered 0 and is pre­ced­ed with a @ (at sign). 
  • A hash, or asso­cia­tive array, is an unordered col­lec­tion of scalars indexed by string keys and is pre­ced­ed with a % (per­cent sign).

So vari­able names $look @like %this. Individual ele­ments of arrays or hash­es are scalars, so they $look[0] $like{'this'}. (That’s the first ele­ment of the @look array count­ing from zero, and the ele­ment in the %like hash with a key of 'this'.)

Perl also has a con­cept of slices, or select­ed parts of an array or hash. A slice of an array looks like @this[1, 2, 3], and a slice of a hash looks like @that{'one', 'two', 'three'}. You could write it out long-​hand like ($this[1], $this[2], $this[3]) and ($that{'one'}, $that{'two'}, $that{'three'} but slices are much eas­i­er. Plus you can even spec­i­fy one or more ranges of ele­ments with the .. oper­a­tor, so @this[0 .. 9] would give you the first ten ele­ments of @this, or @this[0 .. 4, 6 .. 9] would give you nine with the one at index 5 miss­ing. Handy, that.

In oth­er words, the sig­il always tells you what you’re going to get. If it’s a sin­gle scalar val­ue, it’s pre­ced­ed with a $; if it’s a list of val­ues, it’s pre­ced­ed with a @; and if it’s a hash of key-​value pairs, it’s pre­ced­ed with a %. You nev­er have to be con­fused about the con­tents of a vari­able because the name will tell you what’s inside.

Data structures, anonymous values, and dereferencing

I men­tioned ear­li­er that you can build com­plex data struc­tures from Perl’s three built-​in data types. Constructing them with­out a lot of inter­me­di­ate vari­ables requires you to use things like:

  • lists, denot­ed between ( paren­the­ses )
  • anony­mous arrays, denot­ed between [ square brack­ets ]
  • and anony­mous hash­es, denot­ed between { curly braces }.

Given these tools you could build, say, a scalar ref­er­enc­ing an array of street address­es, each address being an anony­mous hash:

$addresses = [
  { 'name'    => 'John Doe',
    'address' => '123 Any Street',
    'city'    => 'Anytown',
    'state'   => 'TX',
  },
  { 'name'    => 'Mary Smith',
    'address' => '100 Other Avenue',
    'city'    => 'Whateverville',
    'state'   => 'PA',
  },
];

(The => is just a way to show cor­re­spon­dence between a hash key and its val­ue, and is just a fun­ny way to write a com­ma (,). And like some oth­er pro­gram­ming lan­guages, it’s OK to have trail­ing com­mas in a list as we do for the 'state' entries above; it makes it eas­i­er to add more entries later.)

Although I’ve nice­ly spaced out my exam­ple above, you can imag­ine a less socia­ble devel­op­er might cram every­thing togeth­er with­out any spaces or new­lines. Further, to extract a spe­cif­ic val­ue from this struc­ture this same per­son might write the fol­low­ing, mak­ing you count dol­lar signs one after anoth­er while read­ing right-​to-​left then left-to-right:

say $$addresses[1]{'name'};

We don’t have to do that, though; we can use arrows that look like -> to deref­er­ence our array and hash elements:

say $addresses->[1]->{'name'};

We can even use post­fix deref­er­enc­ing to pull a slice out of this struc­ture, which is just a fan­cy way of say­ing always read­ing left to right”:

say for $addresses->[1]->@{'name', 'city'};

Which prints out:

Mary Smith
Whateverville

Like I said above, the sig­il always tells you what you’re going to get. In this case, we got:

  • a sliced list of val­ues with the keys 'name' and 'city' out of…
  • an anony­mous hash that was itself the sec­ond ele­ment (count­ing from zero, so index of 1) ref­er­enced in…
  • an anony­mous array which was itself ref­er­enced by…
  • the scalar named $addresses.

That’s a mouth­ful, but com­pli­cat­ed data struc­tures often are. That’s why Perl pro­vides a Data Structures Cookbook as the perldsc doc­u­men­ta­tion page, a ref­er­ences tuto­r­i­al as the perlreftut page, and final­ly a detailed guide to ref­er­ences and nest­ed data struc­tures as the perlref page.

Special variables

Perl was also inspired by Unix com­mand shell lan­guages like the Bourne shell (sh) or Bourne-​again shell (bash), so it has many spe­cial vari­able names using punc­tu­a­tion. There’s @_ for the array of argu­ments passed to a sub­rou­tine, $$ for the process num­ber the cur­rent pro­gram is using in the oper­at­ing sys­tem, and so on. Some of these are so com­mon in Perl pro­grams they are writ­ten with­out com­men­tary, but for the oth­ers there is always the English mod­ule, enabling you to sub­sti­tute in friend­ly (or at least more awk-like) names.

With use English; at the top of your pro­gram, you can say:

All of these pre­de­fined vari­ables, punc­tu­a­tion and English names alike, are doc­u­ment­ed on the perlvar doc­u­men­ta­tion page.

The choice to use punc­tu­a­tion vari­ables or their English equiv­a­lents is up to the devel­op­er, and some have more famil­iar­i­ty with and assume their read­ers under­stand the punc­tu­a­tion vari­ety. Other less-​friendly devel­op­ers engage in code golf,” attempt­ing to express their pro­grams in as few key­strokes as possible.

To com­bat these and oth­er unso­cia­ble ten­den­cies, the perlstyle doc­u­men­ta­tion page admon­ish­es, Perl is designed to give you sev­er­al ways to do any­thing, so con­sid­er pick­ing the most read­able one.” Developers can (and should) also use the perlcritic tool and its includ­ed poli­cies to encour­age best prac­tices, such as pro­hibit­ing all but a few com­mon punc­tu­a­tion vari­ables.

Conclusion: Do you still hate Perl?

There are only two kinds of lan­guages: the ones peo­ple com­plain about and the ones nobody uses.

Bjarne Stroustrup, design­er of the C++ pro­gram­ming language

It’s easy to hate what you don’t under­stand. I hope that read­ing this arti­cle has helped you deci­pher some of Perl’s noisy” quirks as well as its fea­tures for increased read­abil­i­ty. Let me know in the com­ments if you’re hav­ing trou­ble grasp­ing any oth­er aspects of the lan­guage or its ecosys­tem, and I’ll do my best to address them in future posts.