What is a captcha, and how does it protect web resources
We tell you why websites force users to look for bicycles and buses in pictures and why we would be worse off without them.
The word “captcha” comes from the English abbreviation CAPTCHA: Completely Automated Public Turing test to tell Computers and Humans Apart.
The idea is that a person can pass this test easily, while a computer cannot. Until recently this was the case, but the situation has changed, as we discuss below.
What is a captcha, and how does it work
It’s a small task that sometimes appears on websites: you have to recognize a picture or text, perform a mathematical operation, put together a puzzle, or confirm that you are not a robot.
The captcha reads your answer, analyzes it, and decides whether to allow you to proceed further on the site or ask the next security question. Most tasks are generated automatically: the captcha algorithm creates a task based on the previous ones, draws images, distorts them, adds noise, and sends them to the user.
Alan Turing proposed a test of this kind, for telling whether you are dealing with a person or a machine, in 1950, when he took up the question of machine intelligence. He came up with the following task: a person communicates with two unseen interlocutors, one of them a human and the other a machine. The person asks the interlocutors several questions. If the person cannot tell which of the two is the machine, the machine is considered able to think.
Until recently, no machine could pass the Turing test. With the development of neural networks, this milestone has been reached. But this does not mean the machine can think: it has only learned to imitate human thinking.
So why do we need a captcha now?
In Turing’s time, his work was more philosophical than practical. Now the situation has changed dramatically.
The Internet has become more than a means of exchanging information: some people see it as a source of easy and not always legal income. For example, using simple programs, you can register on thousands of sites and send out advertisements, links to viruses, bank details for donations to fictitious patients, and much more.
Malicious programs learned to leave comments and fill out feedback forms on their own. Many web resources, even newly created ones, filled up with spam and malicious links.
Site owners were in dire need of a tool to weed out bots. The first such tool was developed by specialists from Carnegie Mellon University, who wrote a script based on the Turing test. Before registering on the site, the user was prompted to enter characters from a “noisy” picture: a person could recognize them, but most hacker programs could not. The university also sought to register CAPTCHA as a trademark, a name that sounds close to the English word “catch”.
Yahoo was the first to use a captcha – in this way, the company tried to prevent the automatic registration of mailboxes for spam. At that time, it was not easy to cope with the captcha – to understand the task, you had to strain your eyesight and frequently refresh the page. It was especially difficult for people with disabilities.
Gradually, captcha services improved: audio versions and automatic refreshing appeared. Now recognition is fast and often invisible.
What does a captcha protect against?
Although captchas irritate users, site owners need them. Let’s take a closer look at what they protect against:
Spam. Bots send out junk ads and leave negative comments and reviews. These are difficult to clean up manually, especially on large sites. A captcha keeps such bots out.
DDoS attacks. Attackers send more requests to the site than it can handle. The web resource is overwhelmed with fake visitors, stops working normally, and may go down. A captcha helps hold back the onslaught of bots.
Attacks on online stores. During sales and promotions, bots add goods to shopping carts without intending to pay for them. As a result, real buyers cannot see the desired product in stock and cannot buy it. A captcha helps protect the store from this kind of hoarding.
Guessing logins and passwords. It is hard for a person to guess a username or password manually, but a bot can keep generating combinations until it finds the right one. A captcha prevents this.
When a captcha is shown
Some resources show a captcha when the user performs a specific action. Others show it when the user’s activity is above average and looks suspiciously similar to the actions of a bot.
- The captcha appears when a user registers on the site or leaves comments and reviews – the protection system checks whether it is a person or a bot.
- The same happens when buying a product in an online store: the captcha checks that the order is placed by a real buyer and not by a bot.
- The user replies to messages too quickly, likes a lot of posts, clicks hyperlinks often, or requests many web pages. To the protection system this looks like bot behavior, so it shows a captcha.
- The system decides that someone is trying to guess a username and password. It suspects a bot and tries to keep it out of the site. This is the strictest case: the captcha appears on every attempt.
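The "activity above average" trigger described above can be sketched as a sliding-window counter. This is only an illustration: the class name `ActivityMonitor`, the window length, and the request limit are assumptions, not values taken from any real protection system.

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 10.0  # assumed observation window
MAX_REQUESTS = 5       # assumed number of requests allowed per window


class ActivityMonitor:
    """Tracks recent request timestamps per client (hypothetical helper)."""

    def __init__(self, window=WINDOW_SECONDS, limit=MAX_REQUESTS):
        self.window = window
        self.limit = limit
        self.history = defaultdict(deque)

    def needs_captcha(self, client_id, now=None):
        """Record one request and report whether the client exceeded the limit."""
        now = time.monotonic() if now is None else now
        stamps = self.history[client_id]
        stamps.append(now)
        # Drop timestamps that have fallen out of the window.
        while stamps and now - stamps[0] > self.window:
            stamps.popleft()
        return len(stamps) > self.limit
```

A site would call `needs_captcha` on every request, keyed by something like a session identifier or IP address, and show the test only when it returns `True`.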
Types of captcha
Let’s name the types of captcha that users come across most often.
Text. The oldest type of captcha. It is easy to generate on the server and displays correctly in different browsers. The algorithm is as follows: a random set of digits and letters, often in mixed case, is generated on the server. An image is built from it and then deformed: the letters are tilted and crossed out, and color and noise filters are applied. The user has to recognize the characters in the picture and enter them in a special field. For several years, such captchas successfully protected the Internet from bots, and they are still used on some sites.
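The server-side part of this algorithm might look roughly like the sketch below. It covers only generating the random string and checking the answer. Rendering and distorting the image is left out because it needs an imaging library. The function names, the default length, and the case-sensitive comparison are assumptions made for illustration.

```python
import hmac
import secrets
import string

ALPHABET = string.ascii_letters + string.digits  # mixed-case letters plus digits


def generate_challenge(length=6):
    """Return a random string that would then be drawn as a distorted image."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def check_answer(expected, answer):
    """Compare the user's input with the stored challenge, case-sensitively."""
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected.encode(), answer.strip().encode())
```

The expected string stays on the server (for example, in the user's session), and only the picture is sent to the browser.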
Logical. The user must complete a small task:
- Assemble a puzzle.
- Solve a simple mathematical equation.
- Select a number or photo from a given sequence.
- Name a word starting with the desired letter.
A logical captcha of this kind can also be found on Facebook, where you are asked to choose the name of your friend.
Sound. Usually offered in addition to the text captcha. It is a series of letters or numbers read aloud by the computer to help the visually impaired. It can also be offered as an alternative to another type of captcha.
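The simplest of the logical tasks above, a small arithmetic problem, could be generated and checked as in the following sketch. The function names and the range of operands are assumptions chosen for illustration.

```python
import random


def make_math_task(rng=random):
    """Return (question, expected_answer) for a simple addition or subtraction."""
    a = rng.randint(1, 9)
    b = rng.randint(1, 9)
    if rng.choice("+-") == "+":
        return f"{a} + {b} = ?", a + b
    # Order the operands so the result is never negative.
    hi, lo = max(a, b), min(a, b)
    return f"{hi} - {lo} = ?", hi - lo


def check_math_answer(expected, raw_answer):
    """Accept the answer only if it parses as an integer equal to expected."""
    try:
        return int(raw_answer.strip()) == expected
    except ValueError:
        return False
```

In practice the question is usually drawn as an image as well, since a bot can trivially parse and solve plain text.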
ReCAPTCHA and artificial intelligence
Hackers quickly improved their bots and learned to solve first-generation captchas.
In 2007, Carnegie Mellon University introduced an improved version called ReCAPTCHA. It had a more complex mechanism and was more reliable than earlier captchas.
To pass ReCAPTCHA, the user had to type two words: one known to the system, and a second taken from a scan of a newspaper or book that software had failed to recognize.
The user is verified only by the word known to the system; the second word is optional. If the user does enter the second word, the system saves it as one of the candidate readings. Thus, the first version of ReCAPTCHA was used both to protect sites and to digitize archives.
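The two-word scheme can be illustrated with a small sketch. It is not the actual ReCAPTCHA implementation. The function names, the scan identifiers, the vote threshold, and the choice to record a guess only when the control word was typed correctly are all assumptions.

```python
from collections import Counter, defaultdict

# Votes collected for each unrecognized word, keyed by a hypothetical scan id.
votes = defaultdict(Counter)


def verify_two_words(control_expected, control_answer, unknown_id, unknown_answer):
    """Pass or fail on the control word only; store the other word as a vote."""
    passed = control_answer.strip().lower() == control_expected.lower()
    guess = unknown_answer.strip().lower()
    if passed and guess:
        votes[unknown_id][guess] += 1
    return passed


def best_reading(unknown_id, min_votes=3):
    """Return the most common reading once it has enough votes, else None."""
    if not votes[unknown_id]:
        return None
    word, count = votes[unknown_id].most_common(1)[0]
    return word if count >= min_votes else None
```

Once enough users agree on a reading, the word can be treated as recognized, which is how a site-protection test doubles as a digitization tool.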
Google noticed the promising technology and bought it in 2009. However, smarter bots soon learned to cope with this version of the captcha, and Google eventually shut down ReCAPTCHA v1 in 2018.
Its replacement, ReCAPTCHA v2, was introduced in 2014, and we still encounter it often. In it, you can pass the test in one click. The captcha starts with a box labeled “I am not a robot”, and the user must tick the checkbox.
You may have wondered how ticking a box can prove that you are human. The system is not interested in the checkbox itself but in the actions the user performs. The test analyzes mouse movements. In humans, they have a certain degree of randomness that is hard for bots to imitate. The algorithm also checks the reputation of the user’s IP address, approximate location, time zone, time on their computer, stored cookies, and more.
If the test results do not satisfy the captcha, the familiar grid of photographs appears, asking the user to pick out hydrants, cars, or traffic lights.
It also uses a complex algorithm that analyzes user responses and many other factors.
Google has also found a use for users’ answers: by choosing crosswalks, traffic lights, and road signs, we help train the artificial intelligence used in self-driving cars.
In 2018, the third version of ReCAPTCHA appeared. It is called invisible, or “the captcha that isn’t there.” The user no longer needs to perform any actions: the captcha uses an improved mechanism for analyzing their behavior. The system analyzes typing delays, mouse movement, page scrolling, interactions with interactive elements, and more. Google does not disclose exactly how ReCAPTCHA works, so as not to help bot developers.
Based on the analysis, the system gives each user a score from 0.0 to 1.0, where 0.0 means the visitor is probably a bot and 1.0 means the visitor is probably a person.
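How a site reacts to the score is up to its owner. The sketch below assumes the verification response has already been fetched and parsed into a dictionary with `success` and `score` fields. The function name `decide`, the two thresholds, and the three outcomes are assumptions for illustration, not recommended values.

```python
def decide(result, allow_at=0.7, challenge_at=0.3):
    """Map a parsed verification result to 'allow', 'challenge' or 'block'."""
    # A failed verification is treated as a bot regardless of any score.
    if not result.get("success", False):
        return "block"
    score = float(result.get("score", 0.0))
    if score >= allow_at:
        return "allow"
    if score >= challenge_at:
        return "challenge"  # e.g. fall back to a visible captcha
    return "block"
```

For example, `decide({"success": True, "score": 0.5})` falls into the middle band, where a site might show an additional visible test instead of rejecting the visitor outright.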
Captchas can be annoying, but they remain one of the most effective ways to deal with bots that interfere with the normal operation of sites, especially large ones. Developers of protection systems are trying to simplify the tests and make them less intrusive, even invisible. But remember that attackers are also improving their bots, so protecting sites is becoming increasingly difficult.