Look, I get it. You don’t like the Perl programming language or have otherwise dismissed it as “dead.” (Or perhaps you haven’t, in which case please check out my other blog posts!) It has weird, noisy syntax, mixing regular expressions, sigils on variable names, various braces and brackets for data structures, and a menagerie of cryptic special variables. It’s old: 34 years in December, with a history of (sometimes amateur) developers who have used and abused that syntax to ship code of questionable quality. Maybe you grudgingly accept its utility but think it should die gracefully, maintained only to run legacy applications.

But you know what? Perl’s still going. It’s had a steady cadence of yearly releases for the past decade, introducing new features and fencing in bad behavior while maintaining an admirable level of backward compatibility. Yes, there was a too-long adventure developing what started as Perl 6, but that language now has its own identity as Raku and even has facilities for mixing Perl with its native code or vice versa.

And then there’s CPAN, the Comprehensive Perl Archive Network: a continually updated collection of over 200,000 open-source modules written by over 14,000 authors, the best of which are well-tested and -documented (applying peer pressure to those that fall short), presented through a search engine and front-end built by scores of contributors. Through CPAN you can find distributions for things like:

All of this is available through a mature installation toolchain that doesn’t break from month to month.

Finally and most importantly, there’s the global Perl community. The COVID-19 pandemic has put a damper on the hundreds of global Perl Mongers groups’ meetups, but that hasn’t stopped the yearly Perl and Raku Conference from meeting virtually. (In the past there have also been yearly European and Asian conferences, occasional forays into South America and Russia, as well as hackathons and workshops worldwide.) There are IRC servers and channels for chat, mailing lists galore, blogs (yes, apart from this one), and a quirky social network that predates Facebook and Twitter.

So no, Perl isn’t dead or even dying, but if you don’t like it and favor something newer, that’s OK! Technologies can coexist on their own merits, and advocates of one don’t have to beat down their contemporaries to be successful. Perl happens to be battle-tested (to borrow a term from my friend Curtis “Ovid” Poe), it runs large parts of the Web (speaking from direct and ongoing experience in the hosting business here), and it’s still evolving to meet the needs of its users.

I publish Perl stories on this blog once a week, and it seems every time there’s at least one response on social media that amounts to, “I hate Perl because of its weird syntax.” Or, “It looks like line noise.” (Perl seems to have outlasted that one; when’s the last time you used an acoustic modem?) Or the quote attributed to Keith Bostic: “The only language that looks the same before and after RSA encryption.”

So let’s address, confront, and demystify this hate. What are these objectionable syntactical, noisy, possibly encrypted bits? And why does Perl have them?

Regular expressions

Regular expressions, or regexps, are not unique to Perl. JavaScript has them. Java has them. Python has them in its standard re module, and the third-party regex module adds even more features. It’s hard to find a language that doesn’t have them, either natively or through a library. It’s common to want to search text using some kind of pattern, and regexps provide a fairly standardized, if terse, mini-language for doing so. There’s even a C-based library called PCRE, or “Perl Compatible Regular Expressions,” which enables many other pieces of software to embed a regexp engine that’s inspired by (though not quite compatible with) Perl’s syntax.

Inspired by Unix tools like grep, sed, and awk, Perl incorporated regular expressions into the language as few other languages have: the binding operators =~ and !~ make it easy to match and substitute against expressions, and regexps can be pre-compiled into their own type of value. Perl then added the ability to spread regexps out with whitespace (and comments) to improve readability, to use different delimiters to avoid the “leaning-toothpick syndrome” of escaping slash (/) characters with backslashes (\), and to name your capture groups and backreferences when substituting or extracting strings.
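Here is a minimal sketch of those features working together. It assumes Perl 5.10 or later (for named captures), and the sample strings are made up for illustration:

```perl
use strict;
use warnings;
use feature 'say';

# qr// pre-compiles a regexp into its own kind of value. The /x modifier
# lets us spread the pattern out with whitespace and comments, and
# (?<name>...) gives each capture group a name.
my $date_re = qr{
    (?<year>  \d{4} ) -    # four-digit year
    (?<month> \d{2} ) -    # two-digit month
    (?<day>   \d{2} )      # two-digit day
}x;

my $log_line = 'Backup finished on 2021-06-15 without errors';

# =~ binds a string to a match; named captures land in the %+ hash.
my ( $year, $month, $day );
if ( $log_line =~ $date_re ) {
    ( $year, $month, $day ) = ( $+{year}, $+{month}, $+{day} );
    say "Year $year, month $month, day $day";
}

# !~ is true when the pattern does NOT match.
my $has_no_digits = 'no numbers here' !~ /\d/;

# Braces as delimiters avoid leaning toothpicks like s/\/usr\/local/\/opt/.
( my $path = '/usr/local/bin/perl' ) =~ s{/usr/local}{/opt};
say $path;

# A named backreference, \k<word>, finds a doubled word such as "the the".
my ($repeated) = 'this is the the example' =~ / \b (?<word> \w+ ) \s+ \k<word> \b /x;
say "Doubled word: $repeated";
```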

All this is to say that Perl regular expressions can be some of the most readable and robust when used to their full potential. Early on this helped cement Perl’s reputation as a text-processing powerhouse, though the core of regexps’ succinct syntax can result in difficult-to-read code. Such inscrutable examples can be found in any language that implements regular expressions; at least Perl offers the enhancements mentioned above.

Sigils

Perl has three built-in data types that enable you to build all other data structures no matter how complex. Its variable names are always preceded by a sigil, which is just a fancy term for a symbol or punctuation mark.

  • A scalar contains a string of characters, a number, or a reference to something, and is preceded with a $ (dollar sign).
  • An array is an ordered list of scalars beginning with an element numbered 0 and is preceded with a @ (at sign). 
  • A hash, or associative array, is an unordered collection of scalars indexed by string keys and is preceded with a % (percent sign).

So variable names $look @like %this. Individual elements of arrays or hashes are scalars, so they $look[0] $like{'this'}. (That’s the first element of the @look array counting from zero, and the element in the %like hash with a key of 'this'.)
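To see those names in running code, here is a small sketch with placeholder values. Note that $look and @look are two separate variables that merely share a name; Perl keeps scalars, arrays, and hashes apart:

```perl
use strict;
use warnings;
use feature 'say';

# The sigil tells you what kind of container each variable is.
my $look = 'one scalar value';                                      # $ : a single value
my @look = ( 'apple', 'banana', 'cherry' );                         # @ : an ordered list
my %like = ( this => 'value for this', that => 'value for that' );  # % : key/value pairs

# An individual element is a single scalar, so it is accessed with $,
# even though it lives inside an array or a hash.
my $element = $look[0];         # first element of @look (index 0)
my $value   = $like{'this'};    # value stored under the key 'this'

say $look;       # the scalar $look is unrelated to the array @look
say $element;
say $value;
```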

Perl also has a concept of slices, or selected parts of an array or hash. A slice of an array looks like @this[1, 2, 3], and a slice of a hash looks like @that{'one', 'two', 'three'}. You could write them out longhand as ($this[1], $this[2], $this[3]) and ($that{'one'}, $that{'two'}, $that{'three'}), but slices are much easier. You can even specify one or more ranges of elements with the .. operator, so @this[0 .. 9] would give you the first ten elements of @this, and @this[0 .. 4, 6 .. 9] would give you nine, with the one at index 5 missing. Handy, that.
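A short sketch of those slices, using an array of multiples of ten and a small hash invented for the example so the results are easy to check:

```perl
use strict;
use warnings;
use feature 'say';

my @this = map { $_ * 10 } 0 .. 11;    # (0, 10, 20, ..., 110)
my %that = ( one => 1, two => 2, three => 3, four => 4 );

# Slices return lists, so they take the @ sigil.
my @middle = @this[1, 2, 3];                  # (10, 20, 30)
my @values = @that{'one', 'two', 'three'};    # (1, 2, 3)

# The longhand equivalent of the array slice:
my @longhand = ( $this[1], $this[2], $this[3] );

# Ranges built with .. can be combined inside a slice.
my @first_ten = @this[0 .. 9];            # indexes 0 through 9
my @skip_five = @this[0 .. 4, 6 .. 9];    # nine elements, index 5 skipped

say "@middle";
say "@values";
say scalar @first_ten;
say "@skip_five";
```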

In other words, the sigil always tells you what you’re going to get. If it’s a single scalar value, it’s preceded with a $; if it’s a list of values, it’s preceded with a @; and if it’s a hash of key-value pairs, it’s preceded with a %. You never have to be confused about the contents of a variable because the name will tell you what’s inside.

Data structures, anonymous values, and dereferencing

I mentioned earlier that you can build complex data structures from Perl’s three built-​in data types. Constructing them without a lot of intermediate variables requires you to use things like:

  • lists, denoted between ( parentheses )
  • anonymous arrays, denoted between [ square brackets ]
  • and anonymous hashes, denoted between { curly braces }.
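The difference between a list and the two anonymous constructors is easiest to see side by side. In this sketch (placeholder data), the built-in ref reports what kind of thing a reference points to:

```perl
use strict;
use warnings;

# A list in parentheses is simply copied into the array it is assigned to.
my @numbers = ( 1, 2, 3 );

# Square brackets build an anonymous array and return a reference to it,
# which fits in a single scalar.
my $numbers_ref = [ 1, 2, 3 ];

# Curly braces build an anonymous hash and likewise return a reference.
my $person_ref = { name => 'John Doe', state => 'TX' };

print ref($numbers_ref), "\n";    # ARRAY
print ref($person_ref),  "\n";    # HASH

# Nesting works because any element of an array or hash can be a reference.
my $nested = { numbers => [@numbers], person => $person_ref };
print scalar @{ $nested->{numbers} }, "\n";    # 3
```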

Given these tools you could build, say, a scalar referencing an array of street addresses, each address being an anonymous hash:

my $addresses = [
  { 'name'    => 'John Doe',
    'address' => '123 Any Street',
    'city'    => 'Anytown',
    'state'   => 'TX',
  },
  { 'name'    => 'Mary Smith',
    'address' => '100 Other Avenue',
    'city'    => 'Whateverville',
    'state'   => 'PA',
  },
];

(The => is a way to show the correspondence between a hash key and its value, but it’s really just a funny way to write a comma (,). And as in some other programming languages, it’s OK to have trailing commas in a list, as we do after the 'state' entries above; it makes it easier to add more entries later.)
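A quick way to convince yourself that => is a comma is to build the same hash both ways. One detail worth knowing: => also quotes a simple word on its left, which is why the keys in the first hash below need no quotes. The describe helper is hypothetical, written just for this comparison:

```perl
use strict;
use warnings;

# Both hashes hold exactly the same data.
my %fat   = ( name => 'John Doe', state => 'TX', );    # trailing comma is fine
my %plain = ( 'name', 'John Doe', 'state', 'TX' );

# Hypothetical helper: flatten a hash into a sorted "key=value" string.
sub describe {
    my %hash = @_;
    return join ', ', map { "$_=$hash{$_}" } sort keys %hash;
}

my $fat_desc   = describe(%fat);
my $plain_desc = describe(%plain);

print "$fat_desc\n";
print "Same data: ", ( $fat_desc eq $plain_desc ? 'yes' : 'no' ), "\n";
```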

Although I’ve nicely spaced out my example above, you can imagine a less sociable developer might cram everything together without any spaces or newlines. Further, to extract a specific value from this structure this same person might write the following, making you count dollar signs one after another while reading right-to-left, then left-to-right:

say $$addresses[1]{'name'};

We don’t have to do that, though; we can use arrows that look like -> to dereference our array and hash elements:

say $addresses->[1]->{'name'};

We can even use postfix dereferencing to pull a slice out of this structure, which is just a fancy way of saying “always reading left to right”:

say for $addresses->[1]->@{'name', 'city'};

Which prints out:

Mary Smith
Whateverville

Like I said above, the sigil always tells you what you’re going to get. In this case, we got:

  • a sliced list of values with the keys 'name' and 'city' out of…
  • an anonymous hash that was itself the second element (counting from zero, so index of 1) referenced in…
  • an anonymous array which was itself referenced by…
  • the scalar named $addresses.
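Putting the pieces together, here is a complete, runnable sketch of the address example. It assumes Perl 5.24 or later, where postfix dereferencing is always available, and it also shows the older circumfix slice syntax for comparison:

```perl
use v5.24;       # enables strict and say; postfix dereferencing is built in
use warnings;

my $addresses = [
  { name => 'John Doe',   address => '123 Any Street',   city => 'Anytown',       state => 'TX' },
  { name => 'Mary Smith', address => '100 Other Avenue', city => 'Whateverville', state => 'PA' },
];

# Three ways to reach the same value:
my $terse   = $$addresses[1]{'name'};       # count the dollar signs
my $arrows  = $addresses->[1]->{'name'};    # explicit arrows
my $compact = $addresses->[1]{'name'};      # arrows between subscripts are optional

# Postfix slice: reads left to right, and @ says a list comes back.
my @name_and_city = $addresses->[1]->@{'name', 'city'};

# The same slice written with the older circumfix syntax.
my @circumfix = @{ $addresses->[1] }{'name', 'city'};

say $terse;
say for @name_and_city;
```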

That’s a mouthful, but complicated data structures often are. That’s why Perl provides a Data Structures Cookbook as the perldsc documentation page, a references tutorial as the perlreftut page, and finally a detailed guide to references and nested data structures as the perlref page.

Special variables

Perl was also inspired by Unix command shell languages like the Bourne shell (sh) and Bourne-again shell (bash), so it has many special variable names using punctuation. There’s @_ for the array of arguments passed to a subroutine, $$ for the process ID the operating system has assigned to the current program, and so on. Some of these are so common in Perl programs that they are written without commentary, but for the others there is always the English module, which lets you substitute friendly (or at least more awk-like) names.
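Here are the two punctuation variables just mentioned in action; greet is a made-up subroutine for the example:

```perl
use strict;
use warnings;
use feature 'say';

# @_ holds whatever arguments were passed to the subroutine.
sub greet {
    my ( $greeting, $name ) = @_;    # copy the arguments into named lexicals
    return "$greeting, $name!";
}

my $message = greet( 'Hello', 'world' );

# $$ is the process ID the operating system assigned to this program.
my $pid = $$;

say $message;
say "Running as process $pid";
```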

With use English; at the top of your program, you can write @ARG instead of @_ and $PROCESS_ID (or $PID) instead of $$, among many others.

All of these predefined variables, punctuation and English names alike, are documented on the perlvar documentation page.
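As a sketch of the English names in practice (count_args is a hypothetical subroutine), the long names below are aliases for the same underlying variables, not copies. The -no_match_vars import option skips the $PREMATCH family of aliases, which slowed down regexps on older Perls:

```perl
use strict;
use warnings;
use feature 'say';
use English qw(-no_match_vars);

# @ARG is another name for @_.
sub count_args {
    return scalar @ARG;
}

my $count = count_args( 'a', 'b', 'c' );

say "Arguments: $count";
say "Process ID: $PROCESS_ID";        # same variable as $$ (also $PID)
say "Program name: $PROGRAM_NAME";    # same variable as $0
```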

The choice to use punctuation variables or their English equivalents is up to the developer; some are more familiar with the punctuation variety and assume their readers are too. Other, less-friendly developers engage in “code golf,” attempting to express their programs in as few keystrokes as possible.

To combat these and other unsociable tendencies, the perlstyle documentation page admonishes, “Perl is designed to give you several ways to do anything, so consider picking the most readable one.” Developers can (and should) also use the perlcritic tool and its included policies to encourage best practices, such as prohibiting all but a few common punctuation variables.

Conclusion: Do you still hate Perl?

There are only two kinds of languages: the ones people complain about and the ones nobody uses.

Bjarne Stroustrup, designer of the C++ programming language

It’s easy to hate what you don’t understand. I hope that reading this article has helped you decipher some of Perl’s “noisy” quirks as well as its features for increased readability. Let me know in the comments if you’re having trouble grasping any other aspects of the language or its ecosystem, and I’ll do my best to address them in future posts.