Dire cybersecurity warnings about QR codes are commonplace, but is the risk really as bad as some vendors are saying?
We dig into the history and use of QR codes, so you can judge the dangers for yourself.
Even if you’re not into using QR codes, you’ll have seen them all over the place, and you’ve probably scanned at least one or two of them in your time.
Colloquially, QR codes are often described as two-dimensional barcodes, which is something of a misnomer.
Data encoded into a QR code is represented as a collection of squares, known in QR jargon by the mildly peculiar name modules, exactly as wide as they are tall:
In contrast, barcodes, which most of us have lived with on product packaging for our entire lives, are considered one-dimensional, because they are scanned in one direction, typically from left to right, even though they are printed as a rectangular series of vertical stripes much taller than they are wide.
Increasing the height of each stripe doesn’t add any extra data – it’s done so that the barcode and the scanner don’t have to line up exactly, with the scanner free to pass over the stripes at a wide range of angles, as long as it does so at a largely constant speed:
Most barcodes of this sort represent a 13-digit number, using a redundant encoding method to improve reliability. (The numbers printed underneath are there for convenience. They might not be correct, and the barcode protocol ignores them whether they’re right or wrong.)
Redundant, in this context, means that there is more than one way of representing each digit, so that different encodings of the same digit can be used to convey additional information or to help detect mistakes.
The full barcode consists of 95 vertical stripes of equal width, each of which is either black for the binary digit 1, or white for 0, thus potentially representing 2^95 values, or numbers of at least 28 decimal digits.
As you can imagine, however, reliably distinguishing every possible combination, from 95 white stripes on a white background (the number 0), to an uninterrupted slab of black (the number 2^95-1 = 39,614,081,257,132,168,796,771,975,167), would be as good as impossible.
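If you'd like to check that arithmetic for yourself, here is a minimal Python sketch. The variable names are ours, chosen purely for illustration.

```python
# Count the patterns that 95 two-colour stripes could represent in theory.
STRIPES = 95
total_patterns = 2 ** STRIPES

# The largest value is the one where every stripe is black (all bits set to 1).
largest_value = total_patterns - 1

print(f"{largest_value:,}")          # the value quoted in the text
print(len(str(largest_value)))       # how many decimal digits it has
```

The largest value runs to 29 decimal digits, so every number of up to 28 digits fits with room to spare.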
Instead, product barcodes impose a pattern that scanners with the modest computing power available in the 1970s could handle easily.
The first and last three binary stripes are always 101, and the middle five are always 01010, so that the scanner can reliably detect the start, middle and end of the code.
Each digit is represented by seven stripes of its own, giving it a 7-bit binary code:
Cleverly, digits at the left hand end of the barcode are always encoded using a binary value with an odd number of bits set to 1, and those at the right hand end using values with an even number of 1 bits.
As an example, the digit 0 is encoded as 0001101 (the number 13, with three bits set) when it is at the start of a barcode, but as 1110010 (the number 114, with four bits set) when it is at the end.
That’s possible because there are plenty of different encodings to choose from for each digit, given that 2^7 (128) different 7-digit binary values are available to represent just 10 different decimal digits.
The scanner can therefore use the eleven marker bits at the ends and in the middle of the barcode, which have the symmetric pattern 101...01010...101, to get its timing right, and can use the encoding of the digit it read in first to tell which end of the barcode it started at.
In other words, the scanner can tell whether it read the barcode upside-down or not, and flip the digits round if needed.
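The guard-and-parity trick described above can be sketched in a few lines of Python. This is a simplified illustration, not a full barcode implementation: it assumes six odd-parity digits on the left and six even-parity digits on the right, and it leaves out the extra left-hand code set that real 13-digit barcodes use to carry their thirteenth digit. The function names `encode` and `decode` are our own.

```python
# Left-hand (odd-parity) 7-stripe codes for the digits 0 to 9.
L_CODES = ["0001101", "0011001", "0010011", "0111101", "0100011",
           "0110001", "0101111", "0111011", "0110111", "0001011"]
# Right-hand (even-parity) codes are the bitwise complements of the left-hand ones.
R_CODES = ["".join("1" if bit == "0" else "0" for bit in code) for code in L_CODES]

END_GUARD = "101"
MIDDLE_GUARD = "01010"


def encode(left_digits, right_digits):
    """Build the 95-stripe pattern from six left and six right digits."""
    left = "".join(L_CODES[d] for d in left_digits)
    right = "".join(R_CODES[d] for d in right_digits)
    return END_GUARD + left + MIDDLE_GUARD + right + END_GUARD


def decode(stripes):
    """Recover the twelve digits, flipping the scan if it was read backwards."""
    if len(stripes) != 95:
        raise ValueError("expected 95 stripes")
    # An even number of 1s in the first digit means we started at the wrong end.
    if stripes[3:10].count("1") % 2 == 0:
        stripes = stripes[::-1]
    if (stripes[:3], stripes[45:50], stripes[92:]) != (END_GUARD, MIDDLE_GUARD, END_GUARD):
        raise ValueError("guard patterns not found")
    left = [L_CODES.index(stripes[i:i + 7]) for i in range(3, 45, 7)]
    right = [R_CODES.index(stripes[i:i + 7]) for i in range(50, 92, 7)]
    return left + right


bars = encode([0, 1, 2, 3, 4, 5], [6, 7, 8, 9, 0, 5])
print(decode(bars) == decode(bars[::-1]))  # same digits whichever way it was scanned
```

Because the guard patterns read the same forwards and backwards, the parity of the first digit is the only clue the decoder needs to work out which way round it is.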
The great thing about conventional product barcodes is that they’re easy to print, reliable to scan with modest amounts of processing power, and have been widely accepted and globally standardised for decades.
The not-so-great thing is that they aren’t very good for general use, for example by you and me at home, or in our general marketing activities, or for promoting a new URL on a website.
After all, there are only 10^13, or ten million million different numbers that a 13-digit barcode can represent, and one of those 13 digits is a checksum, uniquely derived from the others to help detect errors.
In practice, then, there are only 10^12 (one million million, or one trillion) different barcode values available.
If we try to use the barcode value itself to represent text characters directly, we have just 40 bits available (because 10^12 is about 2^40), so even sticking to Roman alphabets and using 8 bits per character, we’re limited to a measly five characters (because 8×5 = 40).
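Here is a rough Python sketch of both calculations. The checksum rule shown is an assumption on our part, because the text above doesn't spell it out: it is the usual scheme for 13-digit product codes, in which the first twelve digits are weighted alternately by 1 and 3. The function name and the sample digits are ours.

```python
import math


def check_digit(first12: str) -> int:
    """Checksum for a 13-digit product code: weights 1,3,1,3,... then modulo 10."""
    total = sum(int(ch) * (3 if i % 2 else 1) for i, ch in enumerate(first12))
    return (10 - total % 10) % 10


# Twelve freely chosen digits give 10^12 usable values.
usable_values = 10 ** 12
bits = math.log2(usable_values)      # a shade under 40
characters = round(bits) // 8        # rounding up to 40 bits gives 5 characters

print(check_digit("400638133393"))
print(f"{bits:.2f} bits, about {characters} characters")
```

Strictly speaking, 2^40 is slightly larger than 10^12, so not every possible five-character string would fit, which makes the limit even tighter than it sounds.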
If we pick arbitrary numbers and use them as tracking codes of our own, we could create our own lookup lists, such as:
696,959,461,333 → https://solcyber.com/blog
195,275,999,132 → https://solcyber.com/foundational-coverage/
876,124,031,621 → https://solcyber.com/pricing/
But we would need some sort of central authority or number allocation protocol, such as exists for the global product barcodes that identify tins of beans, books, household appliances and so on.
This would make the codes as good as useless for short-term, everyday purposes such as web promotions, Wi-Fi passwords and the like.
If we all chose barcodes randomly instead of using a formal and bureaucratic allocation system, we’d quickly be in trouble.
As we learned in our recent study of the delightfully alarming Birthday Paradox, the chance that two people will randomly pick the same number from a selection of N possible numbers will be higher than 50% after just 1.177√N choices.
For a barcode that can represent one trillion different numbers, the 50-50 chance of hitting a collision comes at just 1.177√(10^12) = 1,177,000.
Even if you choose just 500,000 barcodes randomly, the probability of a collision is already above 10%; with 3 million or more random barcodes in circulation, at least one collision is as good as certain:
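To see where those figures come from, here is a minimal Python sketch using the standard approximation for the birthday problem, in which the chance of at least one collision among n random picks from N values is about 1 − e^(−n(n−1)/2N). The function name is ours, and the results are estimates rather than exact values.

```python
import math


def collision_probability(choices: int, space: int) -> float:
    """Approximate chance that at least two of `choices` random picks coincide."""
    return 1.0 - math.exp(-choices * (choices - 1) / (2 * space))


SPACE = 10 ** 12  # one trillion possible barcode values

for n in (500_000, 1_177_000, 3_000_000):
    print(f"{n:>9,} codes: {collision_probability(n, SPACE):.1%}")
```

The probabilities come out at roughly 12%, 50% and 99% respectively. The multiplier 1.177 is simply √(2 ln 2), the point at which the formula crosses 50%.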
The answer?
Bigger barcodes, capable of storing more, much more, data!
In the early 1990s, a two-dimensional barcode standard known as PDF417 was introduced. The curious name is short for portable data file, not to be confused with Adobe’s Portable Document Format, also known as PDF, which came a few years later.
PDF417 uses an interesting way of extending old-school product barcodes, allowing it to work with very similar scanning technology based on the reflection of a single beam of light.
These barcodes have their own start and stop patterns to the left and right, with a variable number of data blocks in between, each 17 ‘squares’ wide.
These data blocks consist of alternating black-and-white patterns, with 4 black-and-white pairs every 17 squares, thus the numbers 417 in the name of the standard.
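The shape of one such data block can be sketched in Python as eight alternating widths (four bars and four spaces) that add up to 17 squares. The widths in the example are made up for illustration and are not taken from the real PDF417 lookup tables; the function name is ours too.

```python
def expand_block(widths):
    """Turn 4 bar widths and 4 space widths into a 17-square string (1 = black)."""
    if len(widths) != 8:
        raise ValueError("need four bars and four spaces")
    if sum(widths) != 17:
        raise ValueError("widths must add up to 17 squares")
    squares = ""
    for i, width in enumerate(widths):
        # Even positions are bars (black), odd positions are spaces (white).
        squares += ("1" if i % 2 == 0 else "0") * width
    return squares


# Hypothetical widths: bar, space, bar, space, bar, space, bar, space.
print(expand_block((2, 1, 3, 4, 1, 2, 1, 3)))
```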
These horizontal, one-dimensional barcodes can be stacked together vertically, and are processed from left-to-right, top-to-bottom, thus encoding more and more data as the code area gets wider, or taller, or both.
The completed codes don’t have to be square, so they can be fitted into rectangular spaces on documents of varying sizes and shape, notably on driving licences and other types of identity document in the US and elsewhere.
However, to support line-by-line scanning for reading PDF417 codes, the designers couldn’t use square pixels on each line of barcode data, or else the barcode and the scanner would have needed to be aligned much more precisely than was viable in the field.
PDF417 therefore requires that each horizontal ‘pixel’ be at least three times taller than it is wide, thus making it at least three times less efficient in the area it occupies than a system based on square pixels.
This gives PDF417 codes a vertically stretched look, as in this example:
By the mid-1990s, an alternative approach to two-dimensional barcodes had been invented at a Japanese automotive company that wanted to solve several problems at the same time:
This invention was dubbed the QR code, short for quick response, which first appeared in 1994.
Part of the success story of the QR code system is almost certainly down to the way it was engineered.
The project wasn’t run by a large committee, no matter how well-informed; it wasn’t worked on by a crowd-sourced group of global contributors, no matter how individually competent; but was completed by a team of just two passionate and scientific experts.
Thanks to ever-increasing computing power, and the ever-improving reliability of digital cameras, the designers didn’t stick to scanning based on light beams.
Instead, they chose to work with digital image captures of the entire code and the area around it, for example from a picture snapped of the side of a box of components as it was transported across an automotive factory floor.
They first focused on how to find and extract QR images reliably, even if there was lots of other digitally printed black-and-white text surrounding it.
They decided to go with square pixels and square images, with a so-called finder module of 7×7 pixels in each of three corners, deliberately arranged as a 3×3 dark grid, surrounded by a 1-pixel light border, followed by a 1-pixel dark border:
This black-to-white ratio of 1:1:3:1:1 in the blocks used to make the QR code stand out from its surroundings was deliberately and cleverly chosen.
According to the inventors, who scanned and analysed reams of existing documentation as part of their planning and design, “it was the pattern least likely to appear on various business forms and the like.”
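The 1:1:3:1:1 ratio is easy to verify with a short Python sketch that builds the 7×7 pattern described above and measures the runs of dark and light pixels along lines through its centre. The helper names are ours, and a real decoder would of course work on camera pixels rather than on a tidy grid like this.

```python
from itertools import groupby


def finder_pattern():
    """7x7 grid: 3x3 dark core, 1-pixel light ring, 1-pixel dark ring (1 = dark)."""
    grid = []
    for row in range(7):
        # 'ring' is the distance from the centre pixel: 0 or 1 is the core,
        # 2 is the light border, 3 is the outer dark border.
        grid.append([0 if max(abs(row - 3), abs(col - 3)) == 2 else 1
                     for col in range(7)])
    return grid


def run_lengths(line):
    """Lengths of consecutive runs of identical pixels along a scan line."""
    return [len(list(group)) for _, group in groupby(line)]


grid = finder_pattern()
across = run_lengths(grid[3])                          # horizontal, through the centre
down = run_lengths([grid[r][3] for r in range(7)])     # vertical
diagonal = run_lengths([grid[i][i] for i in range(7)]) # corner to corner
print(across, down, diagonal)
```

All three scan lines give the same dark-light-dark-light-dark runs of 1, 1, 3, 1 and 1, which is why the pattern can be spotted whichever way up the code happens to be.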
As you pack more and more data into a QR code, the encoding algorithm makes the image larger, with the width (and thus obviously also the height) increasing in jumps.
QR codes have widths denoted in squares, formally known as modules.
Each module is almost certain to take up more than just one pixel on the screen or in print, though we shall use the words square, module and pixel in a loosely interchangeable way here.
There are 40 different allowable widths, confusingly referred to as versions, although they all form part of the same specification.
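The relationship between version and width can be sketched in Python, assuming the usual sizing rule in which version 1 is 21 squares wide and each later version adds 4 squares. That rule isn't stated in the text above, so treat it as our assumption; the function name is ours as well.

```python
def qr_width(version: int) -> int:
    """Width (and height) in squares of a QR code of the given version."""
    if not 1 <= version <= 40:
        raise ValueError("QR versions run from 1 to 40")
    return 17 + 4 * version


for version in (1, 10, 40):
    width = qr_width(version)
    print(f"version {version:>2}: {width}×{width} = {width * width:,} squares")
```

The smallest code has 441 squares and the largest has more than 31,000, which shows how quickly the area grows as the width goes up in steps of four.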
As the version number increases, the area covered by the image increases quadratically, so the QR standard adds a regularised square grid of so-called alignment modules of 5×5 squares to help the decoder keep track of its position.
The standard also requires alternating lines of dark-and-light dots, like pedestrian crossings, known as timing modules, that act as predictable ‘pathways’ between the finder modules.
Also, the pixel that’s a chess knight’s move away from the bottom-left finder module must always be set to the colour used as ‘dark’, typically black, though the importance of that single dark pixel in the overall encoding is unclear. (My own iPhone camera doesn’t seem to care, and doesn’t say anything, if the so-called dark module itself is white when all the other dark pixels in the image are black.)
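As a rough illustration of where these fixed pixels sit, the Python sketch below computes a timing line and the dark module for a given version. The coordinates are our reading of the QR layout, not something spelled out above: rows and columns are counted from zero at the top left, the horizontal timing line runs along row 6 (a matching vertical line runs down column 6), and the dark module sits at row 4×version+9, column 8.

```python
def timing_cells(version: int) -> list[tuple[int, int, int]]:
    """(row, column, colour) along the horizontal timing line; 1 = dark, 0 = light."""
    size = 17 + 4 * version
    # The line fills the gap between the two upper finder modules (and their
    # light surrounds), and alternates dark and light, starting with dark.
    return [(6, col, 1 - col % 2) for col in range(8, size - 8)]


def dark_module(version: int) -> tuple[int, int]:
    """(row, column) of the single always-dark pixel."""
    return (4 * version + 9, 8)


version = 1
size = 17 + 4 * version
finder_corner = (size - 7, 6)   # top-right pixel of the bottom-left finder module
row, col = dark_module(version)
print([colour for _, _, colour in timing_cells(version)])
print((row - finder_corner[0], col - finder_corner[1]))  # the knight's move
```

For the smallest code, the timing line is just five pixels long, and the dark module is one row up and two columns across from the corner of the bottom-left finder module.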
Here’s an animation showing the 40 different sizes of QR code, with the finder modules in blue, the timing modules and alignment modules in green, the solitary dark module picked out in bright red, and the parts of the image set aside for data storage in light pink:
And here’s an animation showing the consistent location of the finder, alignment and timing modules, along with data that changes every time, starting from the URL https://solcyber.com/#00000000 and incrementing to https://solcyber.com/#00000099.
The black pixels are used for additional format-related information generated in the encoding process, and used when decoding the pink-and-dark-red pixels that represent the stored data, including any needed error correction codes:
As you can imagine, QR codes had an obvious, immediate and convenient use for tracking components, packages, pallets, machinery, and other items on an automotive factory floor, in a warehouse, during shipping, and so on.
Likewise, tracking the processes and documentation that go with all of the above, especially when needed for regulatory reasons, is clearly a great use case for the flexibility of the QR code system.
As DENSO Wave, the company that created QR codes, writes in its own history:
“The QR Code was adopted by the auto industry for use in their electronic [management system, or Kanban], and it contributed greatly to making their management work efficient for a wide range of tasks from production to shipping to the issuing of transaction slips. Also, in response to a newly-emerging societal trend where people demanded that the industries’ production processes be made transparent partly to make products traceable, food, pharmaceutical and contact lens companies began to use the code to control their merchandise.
Particularly, after incidents such as the BSE problem [commonly known as “mad cow disease”] that threatened food safety, the industry had to respond to consumers’ demands that the whole processes of production and logistics for the foods that ended up on their dining tables be made completely transparent. The QR Code became an indispensable medium that could store a great deal of information on these processes.”
But what about the rest of us?
DENSO Wave patented the system, but made it free for anyone to use, as long as they followed the standards correctly, so there were no licensing fees to pay.
Nevertheless, QR codes weren’t an immediate hit, at least outside Japan, perhaps because we already had ways of doing most or all of the things they could help us with.
Also, of course, handling two-dimensional QR codes requires the reliable capture of an entire image up front, which means having a decent-quality digital camera in every scanning device, so companies couldn’t repurpose their existing barcode scanners.
Clearly, QR codes are little use on laptops.
Even if you have a built-in webcam with satisfactory resolution, which was certainly not usual in the 2000s, there’s no way to aim it at a QR code image that’s displayed on the laptop screen.
And there’s little or no incentive to dig out your laptop in the street, open it up, unlock it, and aim it clumsily backwards at a QR code on a billboard or a bus shelter to access a web link that you could simply type in.
Although feature-phone software capable of reading in QR codes was available in Japan from 2002, even today the DENSO Wave website warns you not to expect an entry-level mobile phone camera to read QR codes reliably if they are larger than version 10, which is 57×57 squares in size.
That’s enough for most URLs you might need (a version 10 QR code can store 119 bytes with high error correction, or 271 bytes at the lowest setting), but not for anything more ambitious than that.
And with mobile networks in the pre-iPhone era suffering from slow and expensive data plans, most users were happy to stick to using phones for dedicated tasks, such as keeping up with email on a BlackBerry or Windows CE device.
That left little use for a picture-based way of importing tiny snippets of data from adverts, or for accessing a web link that would be handier to visit from your laptop anyway.
QR codes had a short burst of publicity and interest in the 2000s and early 2010s, but never became as prevalent as some observers predicted (or might have liked).
Several factors, however, have combined to make QR codes both fashionable and useful again, at least in North America and Western Europe, including:
we1RD-p!assw0rds on their phones.
As with most “features” in IT, the convenience of QR codes comes with risk.
But just how serious is that risk, and how easy is it to manage?
Clearly, QR codes that scan as URLs pose at least as much risk as unknown URLs you encounter on web pages, or receive via email, or type in from brochures and posters.
Indeed, some people argue that QR codes are much more dangerous than URLs sent by more conventional means, because the codes are illegible to humans, so even well-informed users can’t peruse them in advance to look for likely tricks or obvious signs of phishing.
Phishing via QR codes has a jargon name all of its own: quishing. However, that name hasn’t caught on as much as some vendors and media writers might like. It’s often seen as just another potentially confusing term we don’t really need.
For example, a rogue QR code glued carefully on top of the official QR code on a parking payment machine in the town you’re currently visiting could lure you, without you realising it, to a look-alike payment site run by cybercriminals.
With your car already parked up, under pressure to pay promptly to avoid a fine, and on a website you’ve never used before because you’re a tourist, you can see how a reasonably careful clone of the legitimate payment site could catch you out.
Likewise, scanning a QR code at a cafe table to place your order is fast and convenient, but typically relies on a plaque or sticker that’s left unattended most of the time, and could have been substituted or modified by any previous visitor to the table or passer-by:
However, this ignores the fact that what we referred to above as “conventional” URLs can be just as sneakily – and perhaps even less obviously – obscured by cybercriminals keen to lure you into trouble.
For example, many email clients, including browser-based email, will automatically convert text that looks like a URL into a clickable link that takes you exactly where the text says, like this:
But the same email can be sent with the URL in the text already converted into a link that goes somewhere else entirely, producing an identical-looking email that actually takes you where you probably don’t expect:
Only if you hover over the link for a while before clicking it will the deception be revealed obviously:
Here, the diversion is obvious because the URLs are entirely different, but cybercriminals are adept at registering look-alike domains (often called typosquatting in the jargon) that are close enough at first glance to deceive the eye, so they can write exannple.com instead of example.com, or pay2-park.example instead of pay2park.example.
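The mismatch between what a link says and where it goes can be checked mechanically. Here is a minimal Python sketch, using only the standard library, that flags links whose visible text looks like a URL but names a different server from the real destination. The class name `LinkAuditor` and the sample HTML are our own inventions, and the check only catches visible text that includes the https:// or http:// prefix.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkAuditor(HTMLParser):
    """Collect links whose visible text names a different host from the href."""

    def __init__(self):
        super().__init__()
        self._href = None
        self._text = []
        self.mismatches = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            shown = "".join(self._text).strip()
            shown_host = urlsplit(shown).hostname   # None if the text isn't a URL
            real_host = urlsplit(self._href).hostname
            if shown_host and shown_host != real_host:
                self.mismatches.append((shown, self._href))
            self._href = None


sample = ('<p><a href="https://exannple.com/login">https://example.com/login</a> '
          '<a href="https://example.com/">https://example.com/</a></p>')
auditor = LinkAuditor()
auditor.feed(sample)
print(auditor.mismatches)
```

Only the first link is reported, because its visible text names example.com while the link itself goes to the look-alike exannple.com.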
In other words, the most-warned-about risk posed by QR codes is a risk we need to be aware of all the time anyway, as part of our routine anti-phishing awareness.
Surprisingly, perhaps, one risk posed by QR codes is not down to the codes themselves, but to the app you choose for scanning them.
The built-in camera apps in both iOS and Android can automatically detect and offer to act on QR codes when they find them in frame, and if you aren’t in a hurry, you can review the data in the code carefully before you do anything with it.
But there are plenty of pushy third-party apps out there that promise to augment and improve your QR code experience, sometimes relying on the fact that you aren’t familiar with the built-in abilities of your phone’s camera app.
Not all of these add-on apps have your best interests at heart.
Plenty of legitimate QR apps do exist, but if you’re tempted to dive in and install one, especially if it comes with a “sign up to try before you buy” subscription or comes from a secondary app market if you’re on Android, take advice from people you actually know and trust before you install it. (Don’t rely on product reviews in the App Store or Play Store: they could have been written by anyone, and probably were.)
Whichever app you choose to use, whether built into your camera or downloaded later, learn how to control its QR features, notably how to open a window to view the full data you’ll be trusting if you click through to any links or services offered by the QR code.
As always, when you’re online, don’t be in a hurry: Stop. Think. Connect.
And whenever and however you reach a web page that’s asking for personal information: If in doubt, don’t give it out.
Paul Ducklin is a respected expert with more than 30 years of experience as a programmer, reverser, researcher and educator in the cybersecurity industry. Duck, as he is known, is also a globally respected writer, presenter and podcaster with an unmatched knack for explaining even the most complex technical issues in plain English. Read, learn, enjoy!