If HTTPS provides true end-to-end encryption, how do web firewalls crack into your network traffic to do their filtering?
And if a legitimate firewall can do this when you want, what’s stopping random cybercriminals somewhere further down the line from doing it when you don’t?
In a recent article, we investigated HTTPS, the technology that puts the padlock in your browser’s address bar in order to protect you from unwanted surveillance and interference while you’re online.
HTTPS, short for HTTP with security, provides what is known in the jargon as end-to-end encryption using a protocol called TLS (cybersecurity sure loves its acronyms and initialisms!), itself short for transport layer security.
Simply put, TLS takes a network connection that would normally go out in what’s known as plaintext form, raw and unscrambled, and packages it into an encrypted ‘communications tunnel’ until it reaches the server at the other end.
At this point, the encryption is stripped off and the plaintext of the network connection is delivered to the service you’re talking to.
With HTTPS, your browser still generates text-based commands such as GET /sample.html HTTP/1.1, which is plain old unencrypted HTTP, and the website at the other end sends back text-based HTTP replies such as HTTP/1.1 200 OK followed by the page content, or HTTP/1.1 404 NOT FOUND followed by an error message, so that the underlying protocol is much the same as it has been for decades.
But this raw data is only visible on the computer at each end of the link, which is why we say that the part between the network connection endpoints enjoys end-to-end encryption.
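To make the “plain HTTP inside an encrypted tunnel” idea concrete, here is a minimal Python sketch using only the standard library. The helper names (build_get_request, parse_status_line, fetch_over_tls) are our own inventions for illustration, not part of any library, and the code deliberately skips everything a real browser does beyond the basics.

```python
import socket
import ssl


def build_get_request(host: str, path: str) -> bytes:
    """Return the plaintext HTTP/1.1 request that goes *inside* the tunnel."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(response: bytes) -> tuple[int, str]:
    """Pull the status code and reason out of a reply such as 'HTTP/1.1 200 OK'."""
    status_line = response.split(b"\r\n", 1)[0].decode("ascii")
    parts = status_line.split(" ", 2)
    reason = parts[2] if len(parts) > 2 else ""
    return int(parts[1]), reason


def fetch_over_tls(host: str, path: str = "/", port: int = 443) -> bytes:
    """Send the same plaintext request, but through a TLS-wrapped socket."""
    # The default context verifies the certificate chain and the hostname.
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as raw_sock:
        # server_hostname is what gets sent as the SNI string.
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            tls_sock.sendall(build_get_request(host, path))
            chunks = []
            while True:
                data = tls_sock.recv(4096)
                if not data:
                    break
                chunks.append(data)
    return b"".join(chunks)


# Example (needs network access, so it is left commented out):
# print(parse_status_line(fetch_over_tls("example.com", "/")))
```

Note that the request and reply are ordinary HTTP text on your computer and on the server; only the bytes that travel between the two wrapped sockets are encrypted.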
HTTPS is important because plain old HTTP, if used all the way from browser to server, sends your data across the internet in a way that is trivial to sniff out and snoop on.
The unencrypted HTTP connection below was captured from the network, after the data had left my computer but before it reached the server, using the popular open-source network analyzer Wireshark:
But introducing TLS for the across-the-internet sector of the journey, thus creating an HTTPS connection, means that snoops, spies, scammers and any other inquisitive intermediaries along the way see what is mostly just shredded digital cabbage.
There’s a modest cost in network performance terms, as you can see (4283 bytes of network data with TLS versus 537 bytes without), but the web data exchanged is now shrouded from view:
Just to reiterate what we mentioned last time: the TLS-encrypted part of an HTTPS interaction secures the transportation of the data, providing not only confidentiality against surveillance but also protection against malevolent manipulation of the content along the way.
However (and this is vital to remember), TLS isn’t there to vet or validate the original content it is sending and receiving.
In other words, if a web server sends you a legitimate app, then TLS will prevent an intermediary from casually replacing it with malware before it reaches you.
But if a web server sends you malware, then TLS will not (and isn’t supposed to, given that its job is to shield what is sent so it can’t be modified) detect and warn you about that malware, or block it, or clean it up in transit.
Likewise, if you visit a fake news story or a phishing site, then HTTPS will precisely and protectively deliver that dangerous content directly into your browser, because the S in HTTPS exists to preserve data that was already generated and sent out, not to analyze it and pass judgement on it before accepting it for transmission.
At this point, it’s worth asking the question, “How do web filters work if most of the browsing traffic passing through them is just pseudo-random scrambled data that gives little or nothing away?”
When HERE BE DRAGONS shows up as #YhBmh009~Ib and THIS IS MALWARE is intractably disguised as lF5B_h56bj=I@s1 until it reaches your computer, how can security software that isn’t running directly on your endpoint help to block unwanted content proactively?
After all, end-to-end encryption is meant to provide an impenetrable security tunnel that shields your data from inspection by anyone, without fear or favor, including cybercriminals, the government, your ISP, your VPN provider if you have one, and even your own IT department.
Indeed, TLS traffic protection isn’t just a matter of encryption, where each end agrees on a one-time encryption key for the current session.
TLS aims to reassure you not only that your traffic was encrypted and unmodified in transit, but also that it came from the site you expected in the first place. (As we mentioned earlier, this is not the same as vouching for the safety of the actual content, merely an attestation as to its origin.)
Without this sort of verification of origin, anyone could serve up encrypted data under a banner such as “Genuinely from Wellknown Corp”, and convincingly distribute modified files as the real deal.
Even worse is that any operator anywhere along your network path could mount what’s known as a manipulator-in-the-middle attack, or MitM for short, and easily masquerade as any site that you choose to visit, like this:
If web servers had no way to convince you that you really had reached the site you wanted, then encryption alone would not be enough to stop cybercriminals from setting up fake servers to impersonate them.
Explainer. Step 2 above, where you reveal which website name you want to visit in plaintext form before the encryption starts, is part of setting up a TLS connection. This text is known as SNI, short for server name indication, and is inserted because most contemporary cloud servers provide content for thousands or even millions of different customers and need to know which customer’s site will be answering the connection request. This is a bit like writing a private letter that you seal into an envelope instead of sending it openly by postcard: it won’t reach the right destination unless you put the recipient’s full address on the outside.
When you send an SNI string denoting that you want to download content from, say, example.com, the server at the other end sends what’s known as a TLS certificate that identifies itself as the example.com site.
If the SNI string and the name in the returned certificate don’t match, which could be the result of a genuine mistake, a server that has gone offline and been replaced with a placeholder, or a deliberate misdirection, your browser will helpfully warn you and refuse to go any further.
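The name check described above can be sketched in a few lines of Python. This is a simplified illustration, not the exact algorithm any browser uses: the function names are hypothetical, and we assume the only wildcard form allowed is a leading “*.” that stands in for exactly one label.

```python
def hostname_matches(requested: str, cert_name: str) -> bool:
    """Compare the name you asked for with one name listed in the certificate."""
    requested = requested.lower().rstrip(".")
    cert_name = cert_name.lower().rstrip(".")
    if cert_name.startswith("*."):
        # Assumption: a wildcard covers exactly one extra leftmost label.
        suffix = cert_name[1:]  # for example ".example.com"
        head, sep, tail = requested.partition(".")
        return bool(head) and sep == "." and "." + tail == suffix
    return requested == cert_name


def certificate_covers(sni_name: str, cert_names: list[str]) -> bool:
    """True if any name in the certificate matches the SNI string that was sent."""
    return any(hostname_matches(sni_name, name) for name in cert_names)
```

If certificate_covers() comes back False, that is the point at which a browser would show its warning instead of loading the page.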
But anyone can create a certificate in any name they like, as we show here using the programming language Lua and a popular TLS library based on OpenSSL called luaossl:
All browsers, and any well-written apps, therefore require that you haven’t merely signed the certificate yourself, but that you have had it signed in turn by a so-called certificate authority, or CA, that the app or browser itself already considers trustworthy.
This chain of digital signatures often involves three stages, not just two: you generate a certificate claiming to represent example.com; the CA signs this with what’s known as an intermediate certificate; and the intermediate is signed by a trusted, top-level certificate from the same CA, or from another CA.
Operating systems and browsers typically maintain their own lists of ‘assumed-good’ top-level CAs, known as root CA certificates, and will automatically approve any website-level certificate that is vouched for by a chain of signatures that ends in a trusted root:
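The chain-walking logic can be modeled with a toy example in Python. Be warned that this sketch checks names only: it does not verify any cryptographic signatures, which a real validator must do at every link, and the ToyCert type and all the CA names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyCert:
    subject: str  # who the certificate claims to represent
    issuer: str   # who vouches for it (toy stand-in for a real signature)


def chain_is_trusted(chain: list[ToyCert], trusted_roots: set[str]) -> bool:
    """chain[0] is the website certificate; each later entry vouches for the one before."""
    if not chain:
        return False
    for child, parent in zip(chain, chain[1:]):
        if child.issuer != parent.subject:
            return False  # broken link: the next certificate is not the issuer
    # The last certificate must be vouched for by a root we already trust.
    return chain[-1].issuer in trusted_roots


site = ToyCert("example.com", "Toy Intermediate CA")
intermediate = ToyCert("Toy Intermediate CA", "Toy Root CA")
self_signed = ToyCert("example.com", "example.com")
roots = {"Toy Root CA"}
```

With these values, chain_is_trusted([site, intermediate], roots) succeeds, while chain_is_trusted([self_signed], roots) fails because nobody on the trusted list vouches for the self-signed certificate.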
Even though the end-to-end encryption offered by HTTPS provides additional online safety and security for all of us, there are a few groups that actively seek to work around it.
Some have legitimate and ethical goals; others have evil intent.
IT departments, for example, often want to inspect HTTPS traffic with good intentions, such as blocking phishing scams and other known-bad web pages, detecting and stopping rogue downloads, and preventing the unexpected upload of data that isn’t supposed to leave the company.
Many governments are also keen on having at least some way of keeping tabs on their citizens’ browsing, some under the banner of investigating and preventing online crime when duly warranted; others with the authoritarian goal of spying on, censoring or otherwise controlling the lives of the populace.
And, as you can imagine, cybercrooks who can masquerade as legitimate sites that pass the HTTPS ‘encryption and certification test’ can lure their victims into a thoroughly false sense of security, and much more easily convince them to download and install malware, to enter personal information into phishing sites, or to read and believe fake news.
Just as importantly, cybercriminals who can crack open end-to-end TLS encryption tunnels without raising any alarms or popping up any warnings may be able to keep track of your business operations in considerable detail.
They could be reading your emails, stealing and selling on data belonging to your staff and customers, running off with your intellectual property, misdirecting payments to or from the company, and more, without implanting any active malware on your computers.
But how can they do so, given that this is exactly what TLS is supposed to prevent?
The certificate chain diagrams above suggest three annoyingly obvious ways:
Most legitimate web filtering tools rely on some version of the third trick above.
A web filtering firewall, for example, will typically generate its own clearly-labelled corporate interception certificate when it is first activated, and then rely on the company’s sysadmins to install this custom root certificate on all their users’ computers, or require remote users to download and install the certificate, before the firewall will allow them to browse beyond the company network.
This is an effective way of scanning encrypted web traffic to detect obvious violations of company policy or to head off unexpected risks, and when done sensitively with each user’s knowledge and consent, is generally considered both ethical and legal.
With an interception CA available, legitimate web filters can pull off what is effectively a MitM (manipulator-in-the-middle) attack as described above, although the description “MitM” is usually reserved for unscrupulous uses of this technique, with cybersecurity vendors preferring marketing-friendly euphemisms including keybridging, decrypt-recrypt, and middleboxing.
When performing this sort of keybridging operation, a web filter can get away with Step 4 in the MitM list above (replying to you and pretending to be the site you thought you had visited directly) because it can automatically generate a fake certificate that pretends to have been issued by the site at the other end, and sign the fake server certificate using its own interception CA.
The middlebox can’t get away with submitting the real certificate that it received from the real website at the far end of the split-in-the-middle connection, because the middlebox doesn’t have the private key for the real certificate, which is why the ‘fake certificate’ ruse is needed.
But the artificial certificate from the middlebox will work fine, without errors or warnings, because your browser trusts the interception CA, and because the middlebox has the right private key to vouch for the fake certificate to your browser.
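Here is a toy Python model of why the keybridging trick works only once the interception CA has been installed. As before, this is a names-only sketch with no real cryptography, and the CA names and helper functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyCert:
    subject: str
    issuer: str


# Hypothetical name for the firewall's own interception CA.
INTERCEPTION_CA = "Example Corp Web Filter CA"


def mint_lookalike(real_cert: ToyCert) -> ToyCert:
    """Copy the real site's name, but issue the copy under the interception CA."""
    return ToyCert(subject=real_cert.subject, issuer=INTERCEPTION_CA)


def browser_accepts(cert: ToyCert, requested_host: str, trusted_roots: set[str]) -> bool:
    """Toy browser check: the name must match and the issuer must be trusted."""
    return cert.subject == requested_host and cert.issuer in trusted_roots


real = ToyCert("example.com", "Toy Public Root CA")
fake = mint_lookalike(real)

stock_browser = {"Toy Public Root CA"}
managed_browser = stock_browser | {INTERCEPTION_CA}
```

In this model, browser_accepts(fake, "example.com", stock_browser) is False, but the same call with managed_browser is True: the only thing that changed is the list of roots the browser was told to trust.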
Unfortunately, traffic interception CAs aren’t always obvious once they’re installed: unlike security software, surveillance tools or malware, there are no background processes, hidden apps, or suspicious behavioral triggers that might give away their presence automatically.
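One rough way to go looking is to list the root certificates a program actually trusts and compare them with the names you expect to see. The Python sketch below does this for the certificates that Python’s own ssl module has loaded, which may differ from the list your browser or operating system uses, and may be incomplete on systems that keep their roots in a directory that is only read on demand. The function names and the idea of an “expected names” set are our assumptions for illustration.

```python
import ssl


def display_name(cert: dict) -> str:
    """Pick a readable name from a certificate dict as returned by get_ca_certs()."""
    fallback = ""
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                return value
            if key == "organizationName" and not fallback:
                fallback = value
    return fallback


def unexpected_roots(certs: list[dict], expected_names: set[str]) -> list[str]:
    """Return the names of loaded roots that are not on your expected list."""
    names = {display_name(cert) for cert in certs}
    return sorted(name for name in names if name and name not in expected_names)


def loaded_roots() -> list[dict]:
    """Roots currently loaded into a default Python TLS context."""
    return ssl.create_default_context().get_ca_certs()


# Example: print every loaded root, using an empty expected list.
# for name in unexpected_roots(loaded_roots(), set()):
#     print(name)
```

A check like this tells you only what is on the list, not whether an entry is benign, so an unfamiliar name is a prompt to ask your IT team rather than proof of foul play.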
Cybercriminals who can trick or coerce you or your sysadmins into installing a fraudulent interception CA can thereafter carry out exactly the same sort of MitM shenanigans, but for bad instead of for good, and theoretically trick you into thinking that their fraudulent servers are the real deal.
Similarly, a criminal who can break into or exploit a security vulnerability on a web filtering firewall that you already trust may be able to steal the private key for that middlebox’s interception certificate, which would allow them to impersonate the web filter, and therefore carry out an equally devastating attack.
Simply put, getting hold of a trusted middlebox’s private key is essentially equivalent to getting hold of the private key of every other server on the internet.
Paul Ducklin is a respected expert with more than 30 years of experience as a programmer, reverser, researcher and educator in the cybersecurity industry. Duck, as he is known, is also a globally respected writer, presenter and podcaster with an unmatched knack for explaining even the most complex technical issues in plain English. Read, learn, enjoy!
Featured image by Tobias Tullius via Unsplash.