Script Malware: When simple things turn out hard (Part 2 of 2)

Paul Ducklin

Low-level exploits and vulnerabilities that lead to spyware takeovers on mobile phones sound like hard problems to solve, and they are. In contrast, script malware sounds as though it should be easy to deal with.

After all, just how hard could a chunk of rogue code in the form of a text-based script possibly be for an experienced threat responder to find and remove?

The fact that we’ve had to split this article into two parts probably gives you a hint at the answer: not just hard, but also time-consuming.

In Part 1 of this article, we asked, “How hard can text-based scripts really be for cybersecurity responders?”

After all, searching through plain old text files for rogue commands sounds as though it should be straightforward.

If you imagine a typical Windows batch file, for instance, or think of an old-school BASIC program, and compare it to executable code created in hand-crafted assembly language, or compiled from source code written in C or any other system-level programming tools…

…it’s hard to imagine how even the gnarliest script could be harder to deal with than machine-code malware.

Last time, however, even when we went way back to the late 1980s and the super-simple early days of MS DOS batch files, we saw that script malware quickly turned into a very complex problem indeed, for numerous reasons.

Almost any text file can be turned into a working batch script, even if it generates thousands or millions of error messages when it’s run, and takes tens of seconds or minutes to finish instead of the few milliseconds that it would take in its simplest form.

But that sort of size and complexity is irrelevant to cyberattackers, especially if the rogue code runs in the background so the error messages go unnoticed.

On the other hand, detecting the embedded malware, as trivial as it might be once it’s been found and extracted from the file it’s hidden in, may take longer than defenders can really afford to spend.

A needle and a stalk of hay laid next to each other are trivial to differentiate, both by eye and with an automated detector: in most cases, a child’s toy magnet will do the job admirably.

Yet the term finding a needle in a haystack is a common metaphor for a task that is so time-consuming and fussy to carry out that it’s considered impossible, especially if you want the haystack to survive the ordeal.

As we noted last time, script malware:

  • Tilted the programming burden in favour of the cybercriminals by flattening the learning curve for malware creation and modification.
  • Tilted the complexity of detection and prevention against the cybersecurity industry, given that almost any file could, in theory, act as an infectious host for some sort of malware code.

Office scripts

It wasn’t until the second half of the 1990s, however, that script malware really showed how troublesome it could be, when cybercriminals figured out how to abuse the Visual Basic scripting engine inside Microsoft Word, soon enhanced and renamed to Visual Basic for Applications (VBA) for the Microsoft Office suite.

Word, Excel, PowerPoint and other commonly-shared files, it turned out, could be used as carriers for programs colloquially known as macros that remain a serious cybersecurity problem to this day, although they’re somewhat easier to control in the latest versions of Windows and Office.

Excitingly but dangerously, VBA programs were not only easy for anyone to write, but could also be created right inside the Office apps themselves.

Every Word and Excel user had access to a slick programming editor, an online help system for VBA, and a fully-featured code debugger, just by going to the Tools menu and opening up the Macros option, as seen here in Word 97:

[Image: the Macros option on the Tools menu in Word 97]

The Visual Basic Editor was powerful and easy to use, and VBA code was easy to get the hang of, as in this brief example of a macro to pop up a message box on the screen:

[Image: a short VBA macro that pops up a message box, shown in the Visual Basic Editor]

Even more dangerous was the fact that simply by using a range of special names for your macros, you could tell Office to run them automatically when the document was used, without needing to bother the recipient with the details of how to do so. (These automatic macro ‘features’ still exist in current versions of Word, but various protections added over the years by Microsoft make it much less likely that macros will run without warning.)

In the Word 97 example above, the macro AutoOpen() will trigger automatically, as the name suggests, whenever the document is opened:

[Image: the AutoOpen macro triggering when the document is opened in Word 97]

Macros such as AutoOpen can be considered a feature if your goal is to update content such as dates and currency exchange rates automatically every time you open the document, but they’re a massive cybersecurity risk when you’re opening files from other people, such as email attachments or files copied from a shared drive.
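
To give a feel for why those special names matter to defenders, here is a minimal sketch in JavaScript that lists the subroutines declared in a chunk of VBA source text and flags any whose names Office treats as auto-run triggers. The function name findAutoMacros and the list of names are our own illustrative assumptions; the list is deliberately incomplete, and the sketch assumes you already have the macro source as plain text, which is rarely the case in real life.

```javascript
// Illustrative, incomplete list of macro names that Office runs automatically.
const AUTO_MACRO_NAMES = new Set([
  'autoexec', 'autonew', 'autoopen', 'autoclose', 'autoexit', // Word
  'document_open', 'document_close', 'document_new',          // Word events
  'auto_open', 'auto_close', 'workbook_open',                 // Excel
]);

// Return the names of auto-run subroutines declared in VBA source text.
// VBA is case-insensitive, so names are compared in lower case.
function findAutoMacros(vbaSource) {
  const subPattern = /^\s*(?:(?:Public|Private)\s+)?Sub\s+([A-Za-z_][A-Za-z0-9_]*)/gim;
  const found = [];
  for (const match of vbaSource.matchAll(subPattern)) {
    if (AUTO_MACRO_NAMES.has(match[1].toLowerCase())) {
      found.push(match[1]);
    }
  }
  return found;
}

// Hypothetical sample: one auto-run macro and one ordinary helper.
const sampleVba = [
  'Sub AutoOpen()',
  '    MsgBox "Hello"',
  'End Sub',
  'Sub Helper()',
  'End Sub',
].join('\n');

console.log(findAutoMacros(sampleVba)); // [ 'AutoOpen' ]
```

A name match like this is only a hint that a document deserves closer inspection, not evidence that the macro is malicious.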

As you can imagine, VBA macros quickly became popular with malware creators, who used them to create self-replicating, fast-spreading viruses, often with damaging side-effects, including dangerously modifying the content of files (for example, randomly changing some appearances of the words is not to is), injecting political messages, and spamming out hundreds or thousands of infectious emails to people in your address book.

Confronting the enemy

Threat responders and cybersecurity programmers alike therefore needed to know how to examine Word documents, Excel spreadsheets and PowerPoint presentations, which are often huge files, in order to find and extract any rogue VBA code before it could cause havoc.

You might hope, as those of us involved at that time in anti-malware protection did, that Office files would contain data stored close to the start of the file to denote whether it contained macros or not, which would make it quick and easy to decide which documents needed further vetting for possible malicious scripts.

You might also hope that this ‘macros are present’ indicator, if it existed, would tell you directly where to go in the file to find the macros, and that they’d be stored in an easy-to-analyse text form, ideally exactly as our AutoOpen macro appears in the Visual Basic Editor above.

A brief glance at a DOC file instantly dashes those hopes.

Internally, DOC files are organised much like a giant floppy diskette: they have a 512-byte header, a file allocation table or FAT, a root directory, and are arranged into files and folders, though in Office file jargon these are referred to as streams and storages.

First of all, you need to understand how Office file storage works, and although Microsoft now has an online description of the so-called Compound File Binary File Format, also known as the OLE2 format, it was a proprietary secret back in the 1990s.
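
As a rough sketch of that first step, the JavaScript below reads a handful of fields from the 512-byte header, using the field offsets given in Microsoft's published description of the format. The function names parseCfbHeader and sectorOffset are our own, only a few of the header fields are decoded, and no attempt is made to cope with deliberately malformed files.

```javascript
// The 8-byte signature at the start of every Compound File Binary file.
const CFB_SIGNATURE = Buffer.from([0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1]);

// Read a few fields from the 512-byte header (all values are little-endian).
function parseCfbHeader(buf) {
  if (buf.length < 512 || !buf.subarray(0, 8).equals(CFB_SIGNATURE)) {
    return null; // not a compound file
  }
  const sectorShift = buf.readUInt16LE(30);
  return {
    majorVersion: buf.readUInt16LE(26),
    sectorSize: 1 << sectorShift,           // 512 bytes when the shift is 9
    fatSectorCount: buf.readUInt32LE(44),
    firstDirectorySector: buf.readUInt32LE(48),
    miniStreamCutoff: buf.readUInt32LE(56), // smaller streams live in the mini stream
    firstMiniFatSector: buf.readUInt32LE(60),
  };
}

// Sector 0 begins immediately after the header, so sector n starts here.
function sectorOffset(sectorNumber, sectorSize) {
  return (sectorNumber + 1) * sectorSize;
}
```

Given a file loaded into a Buffer, parseCfbHeader(buffer) returns null for anything that isn't a compound file, which at least provides a quick first filter before the more expensive work begins.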

Then, you need to figure out how and where the macro code is stored so you can extract it and examine it.

The closest we can find to the plaintext version of the AutoOpen macro code appears about 75% of the way through the file in our example:

[Image: hex dump of the DOC file, with the compressed source code tinted blue and the p-code tinted pink]

The blue-tinted bytes are the original source code, although it looks slightly garbled above because it’s been compressed to save space.
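
To show what undoing that compression involves, here is a minimal JavaScript sketch of a decompressor for the scheme that Microsoft now documents in its MS-OVBA specification. The function name decompressVba is our own, the input is assumed to be a Node.js Buffer holding just the compressed bytes, and the code makes no attempt to defend against truncated or hostile input, which a real scanner would have to do.

```javascript
// Decompress a VBA "compressed container" (sketch, not hardened).
function decompressVba(data) {
  if (data.length === 0 || data[0] !== 0x01) {
    throw new Error('missing compressed-container signature byte');
  }
  const out = [];
  let pos = 1;
  while (pos + 2 <= data.length) {
    // Each chunk starts with a 16-bit header: size in the low 12 bits,
    // and a "compressed" flag in the top bit.
    const header = data.readUInt16LE(pos);
    const chunkEnd = Math.min(pos + (header & 0x0fff) + 3, data.length);
    const isCompressed = (header & 0x8000) !== 0;
    const chunkStart = out.length; // copy offsets are relative to this chunk
    pos += 2;
    if (!isCompressed) {
      // Raw chunk: bytes are stored as-is.
      for (const byte of data.subarray(pos, chunkEnd)) out.push(byte);
      pos = chunkEnd;
      continue;
    }
    while (pos < chunkEnd) {
      const flags = data[pos++]; // one flag bit for each of the next 8 tokens
      for (let bit = 0; bit < 8 && pos < chunkEnd; bit++) {
        if ((flags & (1 << bit)) === 0) {
          out.push(data[pos++]); // literal byte
          continue;
        }
        // Copy token: the split between offset bits and length bits depends
        // on how much of the current chunk has been decompressed so far.
        const token = data.readUInt16LE(pos);
        pos += 2;
        const done = out.length - chunkStart;
        let bitCount = 4;
        while ((1 << bitCount) < done) bitCount++;
        const lengthMask = 0xffff >> bitCount;
        const length = (token & lengthMask) + 3;
        const offset = (token >> (16 - bitCount)) + 1;
        for (let i = 0; i < length; i++) {
          out.push(out[out.length - offset]); // may overlap bytes just written
        }
      }
    }
  }
  return Buffer.from(out);
}
```

Even in this stripped-down form, the decoder has to track chunk boundaries, flag bytes and variable-width copy tokens, which is a long way from simply searching a text file for suspicious words.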

The pink-tinted bytes are a pre-compiled or tokenised version of the macro source code, converted into what’s known as pseudo-code or p-code, a proprietary internal representation that varies between Windows and Mac, and from Office version to Office version.

A script storage trifecta

Annoyingly, Office ignores the source code altogether if your file is opened by a user with the same operating system type and Office version as you, to the point that if they open your code in the Visual Basic Editor, the source code they see will be reconstructed from the decompiled p-code.

But the p-code is worthless if ever you share your files with people using a different computer or Office version, and if that happens, the p-code that their computer ends up executing will be silently re-generated by Office from the compressed source code.

Two-faced file format ambiguities like this are a perennial and serious problem in cybersecurity, for reasons that will now become painfully obvious.

Crooks soon realised that different security products (and different malware analysts) variously relied either on the source code or on the p-code when doing their research, given that the two forms are supposed to be equivalent.

Cybercriminals therefore learned how to inject different macros into the source code and the p-code sections as a way of sowing cybersecurity confusion.

One code version might be innocent and the other malicious, so different malware scanners would produce conflicting results depending on whether they treated the source or the p-code as definitive.

Some users would get infected despite thinking they were protected, and others whose software blocked the threat might wonder what all the fuss was about, even inside the same network or company.

Amazingly, the full story was (and still is) even worse, because after compiling, debugging or running any macro in an Office file, a third version of that macro gets generated and stored in the file, and will be used if possible by Office in future, in preference to the p-code:

[Image: hex dump showing the p-code alongside the third stored version of the macro]

This third flavour of macro code is commonly known in the jargon as a SRP stream, pronounced ‘serp stream’, after the internal names __SRP_x that are used by Office, where x is a number that counts up from zero.

Apparently, SRP streams are a re-usable memory dump of the p-code after it’s been loaded for execution but just before it actually gets used, presumably as a cache to save a few moments next time anyone opens the file in the same version of Office.

Note how the programming comments were retained in the p-code above, so that the source code could be fully reconstructed from it, but have been discarded in the SRP stream because they’re not needed when the macro is finally executed.

Viewing the structure of the above file in the form of a directory listing, as if it were part of a regular hard disk, shows the numerous storages and streams into which the file is organised internally:

[Image: directory-style listing of the storages and streams inside the file]

Simplicity made complex

Reliably scanning an Office file for malware therefore requires at least:

  • Understanding the Compound File Binary File Format (OLE2) filing system.
  • Extracting the storages and streams (directories and files) it contains.
  • Figuring out which ones might contain macro code.
  • Deconstructing, decompressing and decompiling up to three different forms of that code, each of which might be the one that gets used next time, depending on who opens the file, and how, and where.
  • Constructing a meaningful report of what you found.
  • Doing all that in reverse, and rebuilding the file without corrupting it, for any threat remediation product that supports disinfection and cleanup.
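
The second and third steps in that list can be sketched in JavaScript as follows, assuming the directory sectors have already been located and loaded into a Buffer. The field offsets within each 128-byte directory entry follow Microsoft's published format description; the function names and the short list of 'interesting' storage names are our own illustrative assumptions, and are far from a complete test for macro content.

```javascript
// Object types used in directory entries.
const ENTRY_TYPES = { 1: 'storage', 2: 'stream', 5: 'root' };

// Decode one 128-byte directory entry starting at the given offset.
function parseDirectoryEntry(buf, offset) {
  // The name is UTF-16LE; its byte length includes a 2-byte terminator.
  const nameBytes = Math.min(buf.readUInt16LE(offset + 64), 64);
  const name = buf.toString('utf16le', offset, offset + Math.max(nameBytes - 2, 0));
  return {
    name,
    type: ENTRY_TYPES[buf[offset + 66]] || 'unused',
    startSector: buf.readUInt32LE(offset + 116),
    size: buf.readUInt32LE(offset + 120), // low 32 bits of the 64-bit size
  };
}

// Rough guess at which entries deserve a closer look for macro code.
function macroRelevance(entry) {
  if (entry.type === 'storage' && /^(VBA|Macros|_VBA_PROJECT_CUR)$/i.test(entry.name)) {
    return 'macro storage';
  }
  if (entry.type === 'stream' && /^__SRP_\d+$/.test(entry.name)) {
    return 'cached SRP stream';
  }
  return 'unknown';
}
```

Note that 'unknown' here really does mean unknown rather than safe: the streams that hold the compressed source and p-code are named after the modules themselves, so they can't be recognised by name alone.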

As an example, here’s part of the source code of the infamous Melissa virus that was unleashed by its author, David Smith, in 1999. (Smith, in case you’re wondering, went to prison for doing so.)

The code you can see here, opened in human-friendly format in the Visual Basic Editor, gives you a good idea why the virus spread so far and so fast:

[Image: part of the Melissa source code in the Visual Basic Editor]

Even if you aren’t fluent in VBA, and even without simplifying the deliberately obtuse variable names UngaDasOutlook, DasMapiName, and BreakUmOffASlice, you can probably make out that the code consists of a loop that generates an email for each of your Outlook address books, then loops round to add the first 50 entries in each address book to each email, and sends those emails out one after the other.

Any cybercriminal with the source code of Melissa in text form could trivially create a working variant of it simply by pasting the text into the Visual Basic Editor inside any copy of Word, and making any modifications they fancied, such as setting a bigger loop count, using different variable names, varying the email content, and so on.

The technical complexities of compiling to p-code, generating the __SRP_x stream, compressing the source code, and packaging the new malware contents into the needed storages and streams inside the DOC file would be silently and efficiently taken care of by Microsoft’s proprietary code in Office and Windows: no real technical skill required.

In contrast, threat responders or malware detection toolkits that wanted to detect and prevent this malware would need to do everything the hard way.

They’d need to recognise the compressed source code of this malware:

[Image: hex dump of Melissa’s compressed source code, starting at file offset 0x7429]

They’d also need to handle the equivalent p-code representation, shown below, because a file with valid p-code but corrupted, modified or deleted source code would still be infectious.

You can see from the file offsets (the hexadecimal numbers in the leftmost column that denote how far along in the file we are), where we are looking at file offset 0xA1BD (41,405), that the p-code below is some distance away from the source code in the file, which starts above at 0x7429 (29,737).

Note also, in this case, the p-code appears later on in the physical file than the compressed source, unlike the situation in our simple AutoOpen example above where the p-code immediately preceded the source code.

This apparently inconsistent layout is a complexity caused by the chunk-based, filesystem-like construction of DOC files:

[Image: hex dump of Melissa’s p-code at file offset 0xA1BD]

Threat responders and detection tools would also need to be able to make sense of the pre-generated, undocumented, ready-to-run binary version of the same code from the relevant SRP stream, because even Office files with apparently harmless macro source code and p-code could still propagate the malware from their cached version of the script.

Here, you can see that the bytes in the SRP stream that correspond to the mass-mailing part of the malware aren’t even in one chunk inside the file, because they’ve been split up between two physically separate 512-byte blocks, again thanks to the sector-like way that DOC files are organised internally (the pink lines represent 512-byte file boundaries):

[Image: hex dump of the SRP stream, split across two physically separate 512-byte blocks]
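
Stitching a fragmented stream back together means following its chain of sectors through the file allocation table, much as you would on a floppy disk. Here is a minimal JavaScript sketch of that process, in which the function name readStream and the in-memory fat array are our own assumptions. The sketch ignores the separate 'mini stream' mechanism that the format uses for small streams, so it is an illustration of the idea rather than a complete reader.

```javascript
const END_OF_CHAIN = 0xfffffffe;

// Reassemble a stream by following its sector chain through the FAT.
// `fat` is an array in which fat[n] gives the sector that follows sector n,
// or END_OF_CHAIN if sector n is the last one in its stream.
function readStream(file, fat, startSector, size, sectorSize = 512) {
  const pieces = [];
  const seen = new Set(); // guards against looping chains in corrupt files
  let sector = startSector;
  while (sector < fat.length && !seen.has(sector)) {
    seen.add(sector);
    const offset = (sector + 1) * sectorSize; // sector 0 follows the header
    pieces.push(file.subarray(offset, offset + sectorSize));
    sector = fat[sector];
  }
  // The last sector is usually only partly used, so trim to the real size.
  return Buffer.concat(pieces).subarray(0, size);
}
```

Only after this reassembly can a scanner treat the stream as one contiguous run of bytes, which is why a simple pattern search across the raw DOC file can miss content that straddles a sector boundary.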

Script detection asymmetry

Similar cybersecurity asymmetry, to use the jargon term for a technical imbalance between attackers and defenders, is a huge problem in many other widespread script languages today.

JavaScript, for example, is sent to your browser by many if not most modern websites, where your browser is expected to accept it and run it without putting your online safety or privacy at risk.

Generally speaking, the risks are better contained than they are when running Office macro scripts, because your browser is supposed to restrict what remotely-received JavaScript can do.

A Word macro, for example, can reach outside Office and look at files on your hard disk, take screenshots of other apps, or run other programs on your behalf, whereas JavaScript in your browser can’t.

Nevertheless, browser-based JavaScript can peek at and modify your keystrokes while you’re reading the webpage it came in on; can silently upload data back to other sites; can fill in forms and click ads on your behalf; and much more, so that filtering web content for suspicious or known-bad JavaScript before allowing it through is a wise precaution.

And although JavaScript often arrives in uncompiled, non-p-coded, old-school, text-based source form, unlike Office macros, that doesn’t mean it’s fast and easy to deal with.

Loosely speaking, JavaScript can be embedded almost anywhere in almost any file that’s fetched as part of any web page, so threat scanners always need to look at every byte of every file on the way through, lest they let a rogue script slip through.

Also, as we saw with MS DOS batch files in Part 1, JavaScript source code can automatically be converted by so-called script obfuscation tools from simple, human-readable source code into equivalent but illegible and incomprehensible code that your browser can run just fine, but that human and automated analysis alike can’t analyse and understand quickly.

For example, let’s take one line of JavaScript that prints a message to the console:

[Image: a one-line JavaScript program that prints a message to the console]

We can make that look a bit less conventional, and perhaps sidestep simple text matching that’s on the lookout for rogue calls to console.log(), by splitting things up like this:

[Image: the same one-liner with its text split into pieces]
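
The original listings were images, so what follows is our own reconstruction of the idea rather than the exact code. It assumes the one-liner is console.log('hello, world'), and pairs it with a deliberately naive, hypothetical scanner called naiveScan that looks for the literal text console.log( in the source.

```javascript
// Hypothetical naive scanner: flags source text containing a literal call.
function naiveScan(source) {
  return source.includes('console.log(');
}

// The plain one-liner, and an equivalent with the names split into pieces.
const plainSource = "console.log('hello, world');";
const splitSource = "console['l' + 'og']('hel' + 'lo, ' + 'wor' + 'ld');";

// Run a snippet with a stand-in console so that its output can be captured.
function runWithFakeConsole(source) {
  const lines = [];
  const fakeConsole = { log: (message) => lines.push(message) };
  new Function('console', source)(fakeConsole);
  return lines;
}

console.log(runWithFakeConsole(plainSource)); // [ 'hello, world' ]
console.log(runWithFakeConsole(splitSource)); // [ 'hello, world' ]
console.log(naiveScan(plainSource), naiveScan(splitSource)); // true false
```

Both snippets behave identically when run, but only the first one trips the text-matching check.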

Then we can introduce a function for reversing text strings so that the words hello, world and log don’t appear directly in the code:

[Image: the one-liner rewritten to use a string-reversing function]
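
As the original listing was an image, here is a minimal sketch of what such a string-reversing variant might look like. The helper name rev and the function printReversed are our own inventions, and the target parameter exists only so that the output can be redirected for testing.

```javascript
// Reverse a string: 'gol' becomes 'log', and so on.
const rev = (text) => text.split('').reverse().join('');

// Equivalent to console.log('hello, world'), but neither the method name
// nor the message appears in readable form anywhere in the source.
function printReversed(target = console) {
  target[rev('gol')](rev('dlrow ,olleh'));
}

printReversed(); // prints: hello, world
```

A scanner searching for the text log or hello in this source will come up empty, even though the decoding step is trivial for the JavaScript engine to perform at run time.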

Security by obscurity

Unfortunately, in the JavaScript world, as in many other programming languages in which software is distributed and executed in source code form (or in some sort of p-code that can reliably be converted back into human-readable text), disguising your code still further has become a mainstream activity.

Open-source and commercial code scrambling tools abound.

These tools are often pitched by their creators as cybersecurity measures that can protect your intellectual property, ‘secure’ your code from tampering and reverse engineering, and, if they also compress the code, reduce download times for your users, albeit at the cost of slower startup and higher memory usage.

Google, for example, actively encourages Android developers to use the ProGuard code scrambling tool that’s built into Android Studio, pitching the ProGuard tool as a feature under the heading, “Shrink, obfuscate, and optimize your app.”

Although the obfuscation step can make your code smaller, by changing meaningful variable names such as ReadAddressList to e567 or ax9et, its primary goal is to make decompiled code written in Java or Kotlin much harder to understand.

In other words, even code that’s clearly been modified to evade detection and analysis can’t simply be identified as suspicious and blocked for that reason.

Security tools that blocked obfuscated code on the reasonable grounds that it deliberately prevented them from vetting it would end up condemning any number of legitimate websites, server tools and mobile apps, even when that code had purposefully been obfuscated using the very same tricks that cybercriminals have embraced for hiding malware.

Well-known code packers that are just a click away on the web will happily convert our JavaScript one-liner into forms like this:

[Image: the one-liner after processing by a code packer]

Or this more comprehensively scrambled confusion:

[Image: a more comprehensively scrambled version of the one-liner]

Or even into this academic absurdity, which is legal, valid, functioning JavaScript code in which every digit and character has been recursively decomposed into a minimal set of JavaScript constructs that require just six different characters, namely square brackets (which create arrays or lists), round brackets (which call functions), exclamation points (the NOT operator) and plus signs (the ADD operator):

[Image: the one-liner encoded using only six different characters]
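
To see how this is even possible, here are a few of the basic building blocks that such an encoding relies on, written out as a short JavaScript sketch. The variable names and spacing are ours, added for readability; the expressions on the right-hand side use nothing but the six characters in question, and lean on JavaScript's automatic conversions between arrays, booleans, numbers and strings.

```javascript
// Building blocks of the six-character encoding: each expression below
// uses only the characters [ ] ( ) ! and + (plus spaces for readability).
const zero = +[];                    // 0
const one = +!![];                   // 1
const two = !![] + !![];             // 2
const falseText = ![] + [];          // "false"
const trueText = !![] + [];          // "true"
const letterF = (![] + [])[+[]];     // "f"
const letterA = (![] + [])[+!![]];   // "a"
const undefinedText = [][[]] + [];   // "undefined"

console.log(zero, one, two, falseText, trueText, letterF, letterA, undefinedText);
```

Once you can conjure up numbers and a handful of letters this way, you can index into longer and longer strings to harvest more characters, which is why the final output balloons so dramatically.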

This weird encoding isn’t useful in real life because it can produce output thousands of times larger than its input, and correspondingly slower to run.

But it nevertheless shows how dramatically text-based scripts can be re-worked while still retaining their original function, and still consisting of plain old text.

It’s also a reminder why cybersecurity is often much harder than it needs to be, especially when legitimate programmers and cybersecurity companies themselves embrace and sell tools that metaphorically cut both ways.

There’s nothing technically wrong with using any of the wide range of commercial and open-source script obfuscators on the market (they’re available for a wide range of script and script-like languages, from JavaScript and PHP to Java and C#), and they do provide the functionality that Google claims for its ProGuard tools.

But they come at a double cost.

Firstly, code obfuscators are really little more than security through obscurity, rather than security by design; secondly, this obscurity means that there are a range of handy ‘tells’ we could otherwise use to help stop malware that are no longer available for that purpose.

What to do?

As we wrote at the end of last week’s article, scanning text files for potential script malware sounds as though it ought to be easier, faster and less prone to mistakes than scanning more complex files such as compiled executables.

But even simple scripts can be converted into much more complex forms that make it much harder to tell good and bad apart, and make the process much less amenable to automation.

If you’re having trouble keeping up, why not look for help from a company that specialises in the human side of cybersecurity?

Look for a managed security service provider who will regularly and routinely work with you to review and adapt your own policies to ensure that by opening the door wide enough to admit good things from legitimate companies that look like bad things from cybercriminals…

…you aren’t inadvertently letting bad things wander in unnoticed, even though they have all the tell-tale signs of being dangerous and needing to be kept out.

More About Duck

Paul Ducklin is a respected expert with more than 30 years of experience as a programmer, reverser, researcher and educator in the cybersecurity industry. Duck, as he is known, is also a globally respected writer, presenter and podcaster with an unmatched knack for explaining even the most complex technical issues in plain English. Read, learn, enjoy!
