Huon on the internet

Mechanical sympathy for QR codes: making NSW check-in better

By Huon Wilson, 13 Oct 2021

Governments here in Australia have been telling us to keep our distance from each other. Surprisingly, the same government has simultaneously put out posters that require people to get close, unnecessarily. They contain QR codes for contact-tracing check-ins that are small and dense, meaning they’re hard to scan. How could they be better?

Here’s how:

The central one labelled “works now” could be rolled out right now. The existing native app can understand that poster, so if the NSW government switched to issuing something like that, life is magically easier for everyone. The “optimised” poster shows what could’ve been, if the QR code was optimised to the extreme, when the feature was initially deployed into the app.

In this article, we’ll walk through how to get to that point, touching on:

  • the contents of the current check-in QR codes and how they function
  • the workings of QR codes in general (error correction, versions and encoding modes)
  • a walk-through of optimising the data stored in a QR code
  • the security of these QR codes

Let’s learn some general lessons for using/designing QR codes by looking at this specific example.

What are you writing about?

Almost every shopfront here in NSW and even across Australia has an A4 page with a QR code hidden in it. Customers entering need to scan to register their presence for contact tracing, which (if they’ve installed it) pushes them into the Service NSW app. The design is clever in some ways, but seemingly not so clever in other ways.

The check-in poster and the view of the Service NSW app after scanning it.

The QR code encodes a rather long URL: https://www.service.nsw.gov.au/campaign/service-nsw-mobile-app?data=eyJ0IjoiY292aWQxOV9idXNpbmVzcyIsImJpZCI6IjEyMTMyMSIsImJuYW1lIjoiVGVzdCBOU1cgR292ZXJubWVudCBRUiBjb2RlIiwiYmFkZHJlc3MiOiJCdXNpbmVzcyBhZGRyZXNzIGdvZXMgaGVyZSAifQ==.

To fit all this in, it has to be dense: it has 81 modules (little squares) along each side, meaning it’s version 16.

The eyJ0…fQ== value of the data parameter looks like it might be encoded with base64, and indeed, if we try decoding it under that assumption, we find some structured JSON:

{
  "t": "covid19_business",
  "bid": "121321",
  "bname": "Test NSW Government QR code",
  "baddress": "Business address goes here "
}
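
The decoding step can be reproduced with nothing but the Python standard library. This is a minimal sketch: the names `POSTER_URL` and `decode_checkin` are made up for illustration, and the URL is the test one quoted above.

```python
import base64
import json
from urllib.parse import parse_qs, urlsplit

# The URL stored in the test poster's QR code, split over lines for readability.
POSTER_URL = (
    "https://www.service.nsw.gov.au/campaign/service-nsw-mobile-app?data="
    "eyJ0IjoiY292aWQxOV9idXNpbmVzcyIsImJpZCI6IjEyMTMyMSIs"
    "ImJuYW1lIjoiVGVzdCBOU1cgR292ZXJubWVudCBRUiBjb2Rl"
    "IiwiYmFkZHJlc3MiOiJCdXNpbmVzcyBhZGRyZXNzIGdvZXMgaGVyZSAifQ=="
)


def decode_checkin(url: str) -> dict:
    """Pull the `data` query parameter out of the URL and decode it as base64 JSON."""
    query = parse_qs(urlsplit(url).query)
    raw = base64.b64decode(query["data"][0])
    return json.loads(raw)


checkin = decode_checkin(POSTER_URL)
for key, value in checkin.items():
    print(f"{key}: {value!r}")
```

No key or password is involved anywhere, which is worth keeping in mind for the security discussion later on.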

Putting those keys into words, we can guess that each QR code tells us:

  • t: it’s a COVID-19 check-in code
  • bid: an identifier for the business
  • bname: the business name
  • baddress: the business address (this is a test QR code, but in real QR codes it looks something like “123 Street, Suburb NSW”)

Evaluation

These QR code posters have some great attributes:

  1. They exist at all: the contact tracing situation was a mess until the Service NSW codes were rolled out and made mandatory. Some venues would require hand written entries, others would require entering personal information into random websites (of varying ease-of-use). Now, it’s a single app that remembers your details.
  2. Using a URL (rather than just raw data) means they can be usefully interpreted by any scanner, such as a phone’s camera, rather than requiring scanning with a specific app.
  3. There’s a native app that could theoretically be used to do an offline check-in when there’s no internet access, for later syncing… although this doesn’t actually work in practice, with the current app. The QR code contains enough info to still display a confirmation to the user, so they know they scanned what they expected.
  4. The use of high error correction means they’re theoretically resilient to ‘damage’, like reflections off lamination or windows, or even dirt and holes.
  5. The poster explains how to scan the code (it seems unthinkable now, but these codes were introduced into a world where QR codes were not instantly recognisable).

The Service NSW app shows the venue name when checking in, even offline.

On the other hand, they could be better:

  1. The QR code is physically tiny.
  2. Using the highest level of error correction might be taking things a little far.
  3. The URL could be formatted better to take advantage of how QR codes are constructed.
  4. There’s unnecessary data stored in the URL.
  5. The URL could be much shorter overall.

Let’s go through these, and see what difference they can make. We’ll ignore the realities of actually implementing these in the context of a real and existing app/infrastructure, which can change the appropriate technical decisions dramatically.

Bigger QR code

The QR code is small in the poster. When printed at the default A4[1], it’s only ~5cm (2 inches) on each side, which means less than 5% of the total page area is QR code, even though that’s the part people need to interact with.

Making it larger would make it much easier to scan, and doesn’t change anything about the QR code itself, so will definitely continue to work with the existing app. I imagine that most people in NSW are familiar with how to use the codes now, so the instructions could be de-emphasised.

I did some really quick shuffling and resizing of the page elements, without editing or reflowing text (just deleting the blue box around the code[2]), and it’s definitely possible to have the code be bigger. It could be even larger still with a little more editing elbow-grease applied.

Two pages placed horizontally. As above, they both contain a 'we're covid safe' logo, text like 'please check in before entering our premises.' and a QR code. The left-most one labelled 'original' has a very small and dense QR code; the right one labelled 'rearranged' has a much larger QR code with the surrounding text rearranged slightly.

When printed on an A4 page, the code in the new version is 16cm on each side, more than three times larger, and so can probably be scanned from three times further away. It now consumes ~40% of the page area.
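
The area figures are easy to sanity-check. A tiny sketch, assuming a standard 210mm × 297mm A4 page and taking the code sizes quoted above (roughly 5cm and 16cm per side); `page_fraction` is just an illustrative helper:

```python
A4_MM = (210, 297)  # width and height of an A4 page in millimetres


def page_fraction(code_side_mm: float, page_mm: tuple = A4_MM) -> float:
    """Fraction of the page area covered by a square code with the given side."""
    return code_side_mm**2 / (page_mm[0] * page_mm[1])


print(f"original:   {page_fraction(50):.0%}")   # ~5cm code
print(f"rearranged: {page_fraction(160):.0%}")  # 16cm code
```

This gives about 4% for the original poster and about 41% for the rearranged one.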

(After publishing, several people pointed out that truly huge QR codes seem to cause difficulties with scanning in practice in some cases. One would need to actually test the design in practice to validate a particular size.)

Better QR code

We’ve done the easiest step of making the QR code larger. Let’s now make the QR code within the overall poster better. Here’s the sequence of changes we’ll apply, moving from left to right and top to bottom. The codes get simpler, with larger modules (little squares), and thus become easier to scan.

Three QR codes placed horizontally. They get less dense from left to right, and are labelled: 'original' in red, 'error correction Q' in orange, 'without address' in orange.
Three QR codes placed horizontally. They get less dense from left to right, and are labelled: 'better encoding' in orange, 'short path' in green, 'remove name' in orange.

We’ll see how the “short path” QR code in green retains all of the valuable properties we identified above, including simple offline check-in support. The “remove name” code is even simpler, and is what is possible if we compromise on supporting offline check-ins nicely.

Less error correction

The QR code uses the highest error correction (EC) level: H. As explored in “QR error correction helps and hinders scanning”, there are four levels of error correction, and they correspond to how many modules can be corrupted and still have the QR code scannable. They also influence how much data can be stored:

| EC level | max damage | data storage (vs. L) |
| --- | --- | --- |
| H (high) | 30% | 43% |
| Q (quartile) | 25% | 57% |
| M (medium) | 15% | 79% |
| L (low) | 7% | 100% |

I imagine that most codes are placed in relatively non-hostile environments (indoors, or at least under cover), so the use of EC level H could be reduced, or made configurable. Dropping down one level, to Q, reduces the version from 16 to 13 (69 modules per side instead of 81), making each module (small square) about 17% wider, and thus the overall QR code easier to scan.

Dropping further makes the modules larger, although the biggest win comes from moving from H to Q, and each lower level is less resilient.

| EC level | version | module size (vs. H) | vs. previous |
| --- | --- | --- | --- |
| H | 16 | - | - |
| Q | 13 | +17% | +17% |
| M | 11 | +33% | +13% |
| L | 9 | +53% | +15% |
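
The module sizes follow directly from the versions: a version v QR code has 17 + 4v modules along each side, so version 16 has the 81 modules mentioned earlier. Here’s a small sketch of the arithmetic, assuming the code is printed at the same physical size each time and ignoring the quiet zone (`modules_per_side` and `module_growth` are illustrative names):

```python
def modules_per_side(version: int) -> int:
    """Number of modules along one side of a QR code of the given version (1-40)."""
    if not 1 <= version <= 40:
        raise ValueError("QR versions run from 1 to 40")
    return 17 + 4 * version


def module_growth(old_version: int, new_version: int) -> float:
    """Relative increase in module width when the printed code keeps the same size."""
    return modules_per_side(old_version) / modules_per_side(new_version) - 1


previous = 16
for level, version in [("Q", 13), ("M", 11), ("L", 9)]:
    vs_h = module_growth(16, version)
    vs_previous = module_growth(previous, version)
    print(f"{level}: {vs_h:+.0%} vs. H, {vs_previous:+.0%} vs. previous")
    previous = version
```

Note that these percentages are for the width of a module; the area of each module grows with the square of that.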

This also doesn’t change anything about the data that is encoded, and so should continue to work with the existing app.

We do want these codes to be reasonably resilient to damage, so let’s choose Q.

Four QR codes placed horizontally. They get less dense from left to right, and are labelled: H in red, Q in green, M in orange and L in orange.

Structure of the URL

We’ve done the easiest parts, now we need to actually think about the data encoded in the QR code. Less data means a smaller version, which means larger modules and thus easier scanning. This QR code stores a URL that contains a large base64 JSON snippet. We can optimise in a few ways, some of which seem to work with existing apps, and some which don’t. We can look at the parts of the URL independently:

  1. scheme: https://
  2. domain: www.service.nsw.gov.au
  3. path: campaign/service-nsw-mobile-app
  4. query: data=eyJ0IjoiY292aWQxOV9idXNpbmVzcyIsImJpZCI6IjEyMTMyMSIsImJuYW1lIjoiVGVzdCBOU1cgR292ZXJubWVudCBRUiBjb2RlIiwiYmFkZHJlc3MiOiJCdXNpbmVzcyBhZGRyZXNzIGdvZXMgaGVyZSAifQ==

The simplest part to look at is the scheme, because there’s two options https:// and old-school http://: we could save a character but it doesn’t seem worth dropping the encryption. Let’s leave it, and choose https://.

Less data: remove unnecessary information

The lowest hanging fruit for actual change is in 4, the query. This query is currently a single parameter, data, whose value is the base64-encoded JSON eyJ0…fQ==. The JSON includes a baddress field, but I cannot find any place in the app or website that displays this, and indeed check-ins seem to function just fine with it removed. The resulting encoded JSON is eyJ0IjoiY292aWQxOV9idXNpbmVzcyIsImJpZCI6IjEyMTMyMSIsImJuYW1lIjoiVGVzdCBOU1cgR292ZXJubWVudCBRUiBjb2RlIn0=, which is 104 characters, down from 160.

| JSON value | URL length | version at Q |
| --- | --- | --- |
| Original with baddress | 228 | 13 |
| New without baddress | 172 | 11 |

Sounds good. Let’s choose to remove the baddress field.
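
For the curious, the lengths in that table can be reproduced with a short sketch. It assumes the JSON is serialised compactly (no spaces after `:` or `,`), which is what the encoded values decode to; `encode_payload` and `PREFIX` are names invented for this example.

```python
import base64
import json

PREFIX = "https://www.service.nsw.gov.au/campaign/service-nsw-mobile-app?data="


def encode_payload(payload: dict) -> str:
    """Serialise to compact JSON, then base64-encode the result."""
    compact = json.dumps(payload, separators=(",", ":"))
    return base64.b64encode(compact.encode("utf-8")).decode("ascii")


original = {
    "t": "covid19_business",
    "bid": "121321",
    "bname": "Test NSW Government QR code",
    "baddress": "Business address goes here ",
}
# Same payload, minus the field that doesn't appear to be used anywhere.
trimmed = {key: value for key, value in original.items() if key != "baddress"}

for label, payload in [("with baddress", original), ("without baddress", trimmed)]:
    encoded = encode_payload(payload)
    print(f"{label}: {len(encoded)} base64 characters, {len(PREFIX + encoded)} URL characters")
```

Base64 turns every 3 bytes into 4 characters, so the 41 bytes of address JSON cost 56 characters in the URL.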

Three QR codes placed horizontally. The left two labelled 'original' and 'previous' have denser patterns than the right one labelled 'remove address'.

Modes: upper-casing

QR codes encode the data in one of four modes, based on the characters that the data contains:

| Mode | Permitted characters | Bits per character |
| --- | --- | --- |
| Numeric | 0123456789 | 3.33 |
| Alphanumeric | 0–9, A–Z, space, $%*+-./: | 5.5 |
| Binary | any byte | 8 |
| Kanji | Shift JIS X 0208 | 13 |

Fewer bits per character is better: it means we can fit more characters in a given space. Most data in URL QR codes will use the Binary mode, because there will typically be lowercase letters, but we can do better using the Alphanumeric one: a lot of components of a URL can be upper case without causing issues or changing the behaviour. For instance, the domain name is case-insensitive, and a lot of servers treat the path case-insensitively too. Base64 encoding is case-sensitive, so we cannot upper-case that part.

Fortunately, QR codes can encode the data in multiple segments with different modes[3], so we can still benefit from this: the first part can be encoded in Alphanumeric, while the second part with the base64 value can be Binary.
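
To put numbers on this, here’s a rough sketch of how many bits a segment costs. It assumes the usual QR layout as I understand it: a 4-bit mode indicator, a character-count field whose width depends on the mode and version range, then the data itself (digits packed three per 10 bits, alphanumeric characters two per 11 bits, bytes at 8 bits each). The function names are invented for this example, and it ignores Kanji mode and the terminator/padding bits.

```python
ALPHANUMERIC = set("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:")

# Character-count field widths in bits for versions 1-9, 10-26 and 27-40.
COUNT_BITS = {
    "numeric": (10, 12, 14),
    "alphanumeric": (9, 11, 13),
    "binary": (8, 16, 16),
}


def cheapest_mode(text: str) -> str:
    """Pick the most compact mode that can represent every character."""
    if all(c in "0123456789" for c in text):
        return "numeric"
    if all(c in ALPHANUMERIC for c in text):
        return "alphanumeric"
    return "binary"


def segment_bits(text: str, version: int, mode: str = "") -> int:
    """Bits for one segment: mode indicator + character count + data."""
    mode = mode or cheapest_mode(text)
    if mode == "numeric":
        n = len(text)
        data = 10 * (n // 3) + (0, 4, 7)[n % 3]
    elif mode == "alphanumeric":
        n = len(text)
        data = 11 * (n // 2) + 6 * (n % 2)
    else:
        data = 8 * len(text.encode("utf-8"))
    size_class = 0 if version <= 9 else 1 if version <= 26 else 2
    return 4 + COUNT_BITS[mode][size_class] + data


# The upper-cased URL without baddress, split where Alphanumeric stops working:
# "?" and "=" are not Alphanumeric characters, so the query starts the Binary part.
PATH = "HTTPS://WWW.SERVICE.NSW.GOV.AU/CAMPAIGN/SERVICE-NSW-MOBILE-APP"
QUERY = (
    "?DATA="
    "eyJ0IjoiY292aWQxOV9idXNpbmVzcyIsImJpZCI6IjEyMTMyMSIs"
    "ImJuYW1lIjoiVGVzdCBOU1cgR292ZXJubWVudCBRUiBjb2RlIn0="
)

two_segments = segment_bits(PATH, 11) + segment_bits(QUERY, 11)
one_segment = segment_bits(PATH + QUERY, 11, "binary")
print(f"Alphanumeric + Binary: {two_segments} bits")
print(f"all Binary:            {one_segment} bits")
```

At version 11 this comes to 1256 bits with two segments versus 1396 bits as a single Binary segment. The same functions cover the A1b case from the footnote on segmentation: one Binary segment is `segment_bits("A1b", 5, "binary")` = 36 bits, while three separate segments add up to 57 bits because each one pays for its own header.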

| URL | works with app | version at L | version at Q |
| --- | --- | --- | --- |
| Original: https://www.service.nsw.gov.au/campaign/service-nsw-mobile-app?data=eyJ… | yes | 8 | 11 |
| Upper-case domain and scheme: HTTPS://WWW.SERVICE.NSW.GOV.AU/campaign/service-nsw-mobile-app?data=eyJ… | yes | 8 | 11 |
| Only data lower-case: HTTPS://WWW.SERVICE.NSW.GOV.AU/CAMPAIGN/SERVICE-NSW-MOBILE-APP?DATA=eyJ… | no | 7 | 11 |

The second option (upper-casing only the domain and scheme) is as far as we can go while still (seemingly) working with existing app installations.

The upper-casing of that option doesn’t happen to make a difference for the data we have here, although it does reduce the number of bits required total, so it may make a difference for some businesses where their name/ID happens to push them just above a version boundary.

The last option—upper-casing everything except for the base64 value—also doesn’t make much difference at our chosen error correction level Q, but it does at others like L.

Since we’re at the limit, we’re now going off into the world of imagination, where bureaucracy is made up and existing users don’t matter. So let’s lock in the last one.

Less data: encode better

Our URL now looks like: HTTPS://WWW.SERVICE.NSW.GOV.AU/CAMPAIGN/SERVICE-NSW-MOBILE-APP?DATA=eyJ0IjoiY292aWQxOV9idXNpbmVzcyIsImJpZCI6IjEyMTMyMSIsImJuYW1lIjoiVGVzdCBOU1cgR292ZXJubWVudCBRUiBjb2RlIn0=.

The base64 JSON value eyJ0…In0= looks like it’s still worth thinking about: it now contains less data, but it still takes 70% of the storage in the QR code (880 bits out of 1256 total). It’s stored in the URL in a ?DATA= query parameter, and the data it contains is:

{
  "t": "covid19_business",
  "bid": "121321",
  "bname": "Test NSW Government QR code"
}

Unfortunately there’s a lot of overhead from JSON (all the {":,s), and then even more overhead from base64.

We can do better, because we’ve got simple textual data and simple values (plain strings). This means that we could pass the values in the URL directly, either via query parameters, or in the path itself, which saves us the small overhead of ?N=…. The path already marks the URL as a check-in, so we can probably drop the "t": "covid19_business", and our scheme here is very hand-crafted, so we can optimise the parameters down as much as we like.

| encoding | length | version at Q |
| --- | --- | --- |
| Original, JSON + Base64: ...?DATA=eyJ0… | 172 | 11 |
| URL parameters: ...?T=COVID19_BUSINESS&BID=121321&BNAME=name… | 126 | 8 |
| Better URL parameters: ...?I=121321&N=name… | 101 | 7 |
| ID in path: .../121321&N=name… | 99 | 7 |
| Everything in path: .../121321/name… | 97 | 7 |

This makes a huge difference. Our URL is now down to 97 characters (from 228 originally): HTTPS://WWW.SERVICE.NSW.GOV.AU/CAMPAIGN/SERVICE-NSW-MOBILE-APP/121321/Test+NSW+Government+QR+code.

Let’s lock that last option in, choosing to encode things efficiently in the path.
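
As a rough illustration of the options in that table, the candidate URLs can be built and measured like this. It’s only a sketch: `candidate_urls` is a made-up helper, and it assumes the name is escaped with `+` for spaces, as in the URL above.

```python
from urllib.parse import quote_plus

BASE = "HTTPS://WWW.SERVICE.NSW.GOV.AU/CAMPAIGN/SERVICE-NSW-MOBILE-APP"


def candidate_urls(bid: str, name: str) -> dict:
    """Build each plain-text encoding of the check-in data for comparison."""
    escaped = quote_plus(name)  # spaces become "+", other unsafe characters are %-escaped
    return {
        "URL parameters": f"{BASE}?T=COVID19_BUSINESS&BID={bid}&BNAME={escaped}",
        "Better URL parameters": f"{BASE}?I={bid}&N={escaped}",
        "ID in path": f"{BASE}/{bid}&N={escaped}",
        "Everything in path": f"{BASE}/{bid}/{escaped}",
    }


for label, url in candidate_urls("121321", "Test NSW Government QR code").items():
    print(f"{len(url):3d}  {label}")
```

A nice side effect of escaping the name is that a business name containing a `/` can’t be confused with an extra path component.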

Three QR codes placed horizontally. The left two labelled 'original' and 'previous' have denser patterns than the right one labelled 'better encoding'.

Look at how much simpler that QR code is.

Less data: a better path

Next, the initial CAMPAIGN/SERVICE-NSW-MOBILE-APP components of the path are rather long… do we really need to spell out “campaign” and “mobile app”? At least it’s not “mobile telephone application”!

Let’s just cut that down. There’s two nice options here: no path at all, or a very short one.

| path | length | version at Q |
| --- | --- | --- |
| Original: CAMPAIGN/SERVICE-NSW-MOBILE-APP | 97 | 7 |
| C | 67 | 5 |
| Empty | 65 | 5 |

There’s not much difference here, so it feels better to include the C to distinguish when a link is for a check-in. If the path is empty, the URL looks like HTTPS://WWW.SERVICE.NSW.GOV.AU/121321/..., which means the front page of HTTPS://WWW.SERVICE.NSW.GOV.AU needs to detect whether the path looks like a check-in ID and redirect to the appropriate page (when loaded in a web browser), which sounds annoying and would require relatively unusual code.

Let’s choose this short single-character path.

Three QR codes placed horizontally. The left two labelled 'original' and 'previous' have denser patterns than the right one labelled 'shorter path'.

Less data: better domain

The domain being used is long: WWW.SERVICE.NSW.GOV.AU. We definitely don’t need to be spelling that all out. There’s a variety of options here: shortenings of the current URL like Q.SERVICE.NSW.GOV.AU or S.NSW.GOV.AU; or something really short, like NSW.AU (this domain doesn’t exist, but any 6 character domain would be equivalent). The best choice probably depends on the bureaucracy and dev/sys-ops requirements with the relevant organisations.

| domain | length | version at L | version at Q |
| --- | --- | --- | --- |
| Original: WWW.SERVICE.NSW.GOV.AU | 67 | 4 | 5 |
| Q.SERVICE.NSW.GOV.AU | 65 | 4 | 5 |
| S.NSW.GOV.AU | 57 | 3 | 5 |
| NSW.AU | 51 | 3 | 4 |

Most of these choices don’t make a difference at level Q, but they do reduce the overall data (and thus may make a difference in some cases) and reduce the version at level L. Let’s choose S.NSW.GOV.AU, because being scoped within the GOV.AU second-level domain seems more trustworthy: HTTPS://S.NSW.GOV.AU/C/121321/Test+NSW+Government+QR+code.

This 57-character URL still contains all the same useful information as the original 228-character one. Here’s the four pieces of data that were in the JSON snippet:

  • "t": "covid19_business" ⇒ /C/ in the path
  • "bid": "121321" ⇒ first path parameter
  • "bname": "Test NSW Government QR code" ⇒ second path parameter
  • "baddress": "…" ⇒ removed, because it is not used
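
Going the other way, an app or web server could recover the original fields from the short URL with something like the following. This is a hypothetical sketch, not how the Service NSW app works; `parse_checkin` is an invented name and the URL scheme is the imagined one from this article.

```python
from urllib.parse import unquote_plus, urlsplit


def parse_checkin(url: str) -> dict:
    """Map a short /C/<id>/<name> check-in URL back onto the original JSON fields."""
    path = urlsplit(url).path
    segments = path.split("/")  # "/C/121321/Name" -> ["", "C", "121321", "Name"]
    if len(segments) != 4 or segments[1].upper() != "C":
        raise ValueError(f"not a check-in URL: {url}")
    _, _, bid, name = segments
    return {
        "t": "covid19_business",  # implied by the /C/ path prefix
        "bid": bid,
        "bname": unquote_plus(name),
    }


print(parse_checkin("HTTPS://S.NSW.GOV.AU/C/121321/Test+NSW+Government+QR+code"))
```

Everything needed for the offline confirmation screen is still there, with no network request required.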

Three QR codes placed horizontally. The left one labelled 'original' has denser patterns than the right two labelled 'previous' and 'better domain'.

These final two QR codes look similar because they’re both version 5; the density and size of modules hasn’t changed, but their exact arrangement has.

Reducing offline support

We’re pretty much at the limit of what is possible when the URL includes the business name to enable offline check-ins[4]. What if we drop the requirement to encode this? For instance, the app could say “you’re checking into a business” rather than show the name for confirmation, or even have the app store its own database mapping IDs to business names[5].

If we do that, the URL can be 29 characters: HTTPS://S.NSW.GOV.AU/C/121321. This fits in a version 2 QR code!

Three QR codes placed horizontally. The left two labelled 'original' and 'previous' have denser patterns than the right one labelled 'without name'.

(@dhsysusbsjsi on the orange site points out that one can even get to version 1, with some further tweaks, but these do require dropping error correction further, to L or M.)

Is this hacking? Do the changes lose security?

No. The current NSW QR codes are inherently insecure, and none of the changes we’ve walked through make them more or less secure.

There’s lots of people who will have the skills to pull them apart and generate new ones with different values for the business ID, name or address. The base64 encoding is not encryption, because there’s no secret key/password required to encode or decode it. Base64 is a very common way to store and communicate data on the internet, to the point that some people would likely even instantly recognise the three character prefix eyJ as “this is base64’d JSON”.

One way to make these secure[6] would be to cryptographically sign the URLs, adding extra information that is computed from the data using a secret key held by Service NSW. When loading the URL for a check-in, the app validates that the signature matches. If someone tries to tamper with the data, they’ll need to update the signature, but without knowing the key, they’ll need a very lucky guess.

For instance, a 200-bit BLS signature may give sufficient security. To get a sense of the overhead this might impose, we can pretend we’ve generated a signature of this size somehow and add it to the URL (this isn’t a real cryptographic protocol, don’t take my word for it). First, encode it as a 61-digit number to benefit from the Numeric QR mode; then add it as an extra query parameter …?…&S=16069…. This does make the URLs longer, and thus requires higher versions, but our optimisations have made room for this, meaning that the QR codes are still easier to scan while having better security:

| QR code | unsigned version | signed version |
| --- | --- | --- |
| original | 16 | 17 |
| optimised for app | 11 | 12 |
| fully optimised | 5 | 7 |
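
The “61 digit number” trick looks something like this in code. To be loud about it: no signature is computed here. The sketch uses a random 200-bit number as a stand-in, purely to show the encoding overhead, and `signature_to_digits`, `numeric_data_bits` and the `?S=` parameter on the short URL are all illustrative assumptions.

```python
import secrets

SIGNATURE_BITS = 200
DIGITS = len(str(2**SIGNATURE_BITS - 1))  # decimal digits needed for any 200-bit value


def signature_to_digits(signature: int) -> str:
    """Render a signature as a fixed-width decimal string for the Numeric QR mode."""
    if not 0 <= signature < 2**SIGNATURE_BITS:
        raise ValueError("signature does not fit in 200 bits")
    return str(signature).zfill(DIGITS)


def numeric_data_bits(digit_count: int) -> int:
    """Data bits used by Numeric mode: 10 per 3 digits, 7 for 2 left over, 4 for 1."""
    return 10 * (digit_count // 3) + (0, 4, 7)[digit_count % 3]


placeholder = secrets.randbits(SIGNATURE_BITS)  # stand-in value, NOT a real signature
digits = signature_to_digits(placeholder)
signed_url = "HTTPS://S.NSW.GOV.AU/C/121321/Test+NSW+Government+QR+code?S=" + digits

print(DIGITS, "digits,", numeric_data_bits(DIGITS), "data bits")
print(len(signed_url), "characters in total")
```

The 61 digits need 204 data bits, barely more than the 200 bits of the signature itself, which is why Numeric mode is attractive for this sort of payload.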

Three QR codes placed horizontally. The left two are labelled 'original' and 'insecure', while the right one is labelled 'signed'. The right one is slightly denser than the center, but much less dense than the first.

Other states

Every other state in Australia has a similar check-in process, including apps and QR codes on posters. Most of these use simpler QR codes than NSW, but they make other trade-offs:

  • South Australia and Victoria include the venue name, and thus potentially nicely support offline check-ins too. However, SA definitely requires internet to check-in, and I’m unsure about Victoria.
  • Victoria and Western Australia seem to have real security in their QR codes (for instance, WA uses a JSON Web Token, which includes a cryptographic signature; however, it was pointed out that the choice of a symmetric signature makes its value questionable).
  • ACT, Queensland, Northern Territory and Tasmania all have essentially the same app and QR codes, and use efficient URLs (with a lot of Numeric data), but have some seemingly unnecessary redundancy.
  • Most states use error correction level Q, but Victoria and Western Australia drop one level to M: it’s probably not a coincidence these two states also have the longest URLs, similar to or longer than the NSW URLs.

What did we learn?

Throughout this article we’ve stepped through a process of optimising QR codes for the real world, by looking at the check-in posters used here in NSW. We turned those posters into something that would be much easier to use, and in the process touched on:

  • the contents of the current check-in QR codes and how they function
  • the workings of QR codes in general (error correction, versions and encoding modes)
  • a walk-through of optimising the data stored in a QR code, moving from version 16 to version 5 without compromising functionality
  • the security of these QR codes, observing how it’s possible to improve security and have codes that are still easier to scan

We didn’t consider the realities of actually implementing these, within an existing app and/or support infrastructure. The appropriate technical trade-off can change dramatically.

  1. The page is nominally A4, but I’ve seen some instances where the scaling has been mixed up, and the “poster” is printed at A5 or smaller (on an A4 page). This is really hard to scan, especially when placed behind (reflective) glass in an outdoor shop window. 

  2. Deleting the blue box and particularly the region of white within it sacrifices the “quiet zone”: QR codes are meant to be surrounded by a zone of simple background four modules wide. However, this ends up being a lot of overhead, and doesn’t seem to be necessary in practice.

  3. Choosing the best segmentation is an interesting optimisation problem, because each segment has some overhead via a header, so it’s better to encode something like A1b as one Binary segment to only pay for the header once, rather than three segments (A Alphanumeric, 1 Numeric, b Binary), which require three headers. “Optimal text segmentation for QR codes” explores this and even provides a live demo.

  4. There’s probably a lot of redundancy in the business names, like many that include “Pty Ltd”, “Cafe” or even “Australia”. One could potentially see some benefits by compressing them (and then encode the resulting binary output with base64 or similar again), especially using a compression method that can use a custom dictionary, to capture all ‘unusual’ commonalities among these business names. 

  5. Having the app manage a database brings a whole variety of exciting issues since presumably it’ll need to handle updates, and it may be relatively large. These can be solved, but it’s not nearly as easy as just grabbing the name out of the URL itself. 

  6. When talking about security, one should have an understanding of what the attack is being secured against. In this case, we’re securing against creating a QR code with a changed business ID or name, to deceive people into thinking they’re checking into venue A, when they’re actually checking into venue B. This may not actually be a particular concern.
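
The dictionary-based compression idea from the footnote on business-name redundancy can be sketched with Python’s `zlib`, which accepts a preset dictionary via the `zdict` argument. The dictionary contents and the helper names here are made up for illustration; a real dictionary would be built from the actual list of business names, and the compressed bytes would still need a URL-safe text encoding afterwards.

```python
import zlib

# Hypothetical shared dictionary: fragments assumed to be common in business names.
# Both the poster generator and the app would need exactly the same bytes.
DICTIONARY = b" Australia Restaurant Cafe Pty Ltd"


def pack_name(name: str) -> bytes:
    """Compress a business name, letting zlib refer back into the shared dictionary."""
    compressor = zlib.compressobj(level=9, zdict=DICTIONARY)
    return compressor.compress(name.encode("utf-8")) + compressor.flush()


def unpack_name(packed: bytes) -> str:
    """Reverse pack_name using the same dictionary."""
    decompressor = zlib.decompressobj(zdict=DICTIONARY)
    return (decompressor.decompress(packed) + decompressor.flush()).decode("utf-8")


name = "Example Cafe Pty Ltd"
with_dictionary = pack_name(name)
without_dictionary = zlib.compress(name.encode("utf-8"), 9)
print(len(name), len(without_dictionary), len(with_dictionary))
```

For names this short, plain compression tends to make things bigger, so any benefit has to come from the dictionary. Whether it pays off overall, once the result is re-encoded as text, would need measuring on real data.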