Valuing user privacy — hCaptcha

What is hCaptcha?

Completely Automated Public Turing test to tell Computers and Humans Apart

If you’re a web developer, hCaptcha is a free service to validate that your users are human. If you’re an AI researcher, hCaptcha is a service that helps label machine learning datasets through crowdsourcing. For everyone else on the internet, hCaptcha is just another captcha like Google’s incredibly popular reCaptcha, but with one important distinction: privacy.

reCaptcha is used in 40% of the top 10,000 sites worldwide, and Google Analytics is used in over 70% of those same sites. It’s a growing concern to many people that Google is building a very detailed history of almost every internet user and is able to tie that information to a specific person through their unique device fingerprint, or even simply their Google account.

Advertising is at the core of Google’s business, and gathering detailed user data is their forte. Google’s reCaptcha v3 does such a good job of fingerprinting you that it often doesn’t even ask you to prove that you’re human. If you are signed in to your Google account in a Chrome browser, or if you have cookies from Google, reCaptcha v3 will consider you more likely to be a human. Likewise, if your mouse movements and your browsing habits on a site seem human enough, you won’t be questioned.

Being able to pass a captcha without being interrupted by a manual test is great for web users, but it comes at a cost. For reCaptcha to work optimally Google recommends including their scripts on every single page of your website to help them build an idea of what your real users look like. Google say that the data gathered by reCaptcha won’t be used for personalised advertising, and that’s great if it stays true, but we know how often Google changes their mind about things.

As well as its data harvesting potential, reCaptcha is designed to train Google’s AI so they can improve their products and services. Initially they used this free labour to transcribe words for Google Books; these days they are more focused on labelling datasets for their AI applications.

[Image: reCaptcha training used in Street View]

How is hCaptcha any different?

hCaptcha is also built with AI training in mind. Research teams can pay hCaptcha to crowdsource their dataset labelling. This is how hCaptcha makes its profit, and they have no interest in tracking your users’ activity to sell to advertisers. They even share some of this profit with you, the developer, although unless you’re serving an extremely high volume of users it’s likely worth very little (they give you the option to donate it to charity instead).

For end users there is very little visible difference, other than the fact they will have to manually complete captcha tests more frequently than with reCaptcha v3. I try to use captchas as sparingly as possible, so to me this is a worthy trade-off for the increased privacy afforded to my users.

As good as hCaptcha is where privacy is concerned, there are of course some valid criticisms. Firstly, it’s not an open source project, so we can’t outright say it’s 100% private. More importantly, accessibility is a big concern, as with most captchas.

Google’s reCaptcha is able to serve users with accessibility needs very well as long as they are not selected to manually complete the Turing test. For vision impaired users reCaptcha offers an audio fall-back. However, due to audio captchas being relatively easy for bots to pass these days, the audio is often obfuscated with static and spoken quickly to the point of being unintelligible (Give it a try yourself!).

hCaptcha takes a different approach to accessibility. Users who require a more accessible captcha must first sign up with hCaptcha in order to set a cookie in their browser which will be used to verify them automatically. The accessibility signup process requires navigating to their site, typing your email address, following a link sent to your email address, and clicking a button to set the cookie. It’s quite a simple process, but not without accessibility concerns itself. When the cookie is deleted from your browser or otherwise expires, you will be required to log in to hCaptcha to set the cookie again. This makes it tedious to use in private browsing, where you will be required to set this cookie every session.

There is also an effort from hCaptcha to support Privacy Pass, an initiative developed with cryptography researchers and Cloudflare to allow users to verify their identity through captchas they’ve solved historically, without tying that identity together across multiple sites. Cloudflare do a much better job of explaining this, with a good analogy to sealed voting ballots.

I also recommend reading Cloudflare’s blog post about why they chose to use hCaptcha over reCaptcha across their network.

How do I add it to my site?

The client side hCaptcha widget is loaded with JavaScript and can be added to your site with two lines of code in your form, but first you’ll need an API key.

<!-- Create an empty div for the widget, and enter your API key  -->
<div class="h-captcha" data-sitekey="your_site_key"></div>
<!-- Load the hCaptcha script -->
<script src="https://hcaptcha.com/1/api.js" async defer></script>

There are also a number of libraries available for front-end frameworks like React and Angular to make it easier to integrate with your existing systems. Migration from Google reCaptcha is similarly easy: the JavaScript API is even compatible with reCaptcha methods and data attributes.

If you want to hide the captcha to streamline your forms, you can try the invisible captcha option. This offers a reCaptcha v3 feel, where the test will only be shown if the user isn’t trusted by hCaptcha.

Once the form is submitted, you’ll want to validate on the server side that the captcha was completed successfully. You can do this by sending a POST request with the verification token returned from the submitted form to https://hcaptcha.com/siteverify.

curl -d "response=CLIENT-VERIFICATION-TOKEN&secret=YOUR-SECRET-KEY" -X POST https://hcaptcha.com/siteverify

This will return a JSON response containing a "success" field with either true or false.
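If you’d rather make this call from your own back-end code, the same request can be sketched in Python using only the standard library. Treat this as a rough illustration rather than an official client: the function names (build_verify_body, is_success, verify_hcaptcha) are made up for this post, and it assumes the endpoint accepts the same two form fields as the curl command above and replies with the "success" field described here.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl example above.
VERIFY_URL = "https://hcaptcha.com/siteverify"


def build_verify_body(token, secret):
    """URL-encode the two form fields used in the curl example."""
    fields = {"response": token, "secret": secret}
    return urllib.parse.urlencode(fields).encode("utf-8")


def is_success(raw_json):
    """Return True only if the JSON reply has "success" set to true."""
    try:
        payload = json.loads(raw_json)
    except (TypeError, ValueError):
        return False  # not valid JSON, so treat it as a failed check
    return isinstance(payload, dict) and payload.get("success") is True


def verify_hcaptcha(token, secret, timeout=5.0):
    """POST the token to hCaptcha and report whether it was accepted.

    Hypothetical helper: `secret` is the private key from your hCaptcha
    account, never the site key embedded in the page.
    """
    if not token:
        return False  # nothing was submitted, so skip the network call
    request = urllib.request.Request(
        VERIFY_URL,
        data=build_verify_body(token, secret),
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as reply:
            return is_success(reply.read().decode("utf-8"))
    except OSError:
        return False  # network or HTTP errors count as a failed check
```

In a form handler you would pass in the token that the widget added to the submitted form. As far as I’m aware this arrives in a field named h-captcha-response, but check the current hCaptcha documentation before relying on that name. The sketch fails closed: any network error or malformed reply is treated as an unverified user.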

Alternatively, you can use an existing package for your web framework of choice to handle this for you. I have used the wonderful django-hcaptcha in my own projects.