Features that allow the identification

Bots

The following are ordinary methods, used in practice, of distinguishing bots from people. They rely on simple metrics that follow from common-sense human limitations.

Long or even uninterrupted operation

Sporadic cases of people playing a game non-stop for several or even several dozen hours are known. Some have been fatal, like that of Seungseob Lee, who in 2005 played StarCraft for 50 hours continuously and died of a heart attack. From a statistical point of view, however, such cases are extreme outliers. Human limitations of this kind can therefore serve as one of the basic ways of distinguishing a human from a bot, which is not burdened with such biological constraints.
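As a minimal sketch of this idea, the longest stretch of uninterrupted activity can be computed from a visitor's event timestamps and compared with a limit. The names `MAX_IDLE_GAP` and `MAX_CONTINUOUS` and their values are assumptions made for illustration, not figures from the text.

```python
from datetime import datetime, timedelta

# Hypothetical parameters: tune both to the service being monitored.
MAX_IDLE_GAP = timedelta(minutes=15)   # a longer pause counts as a break
MAX_CONTINUOUS = timedelta(hours=16)   # longer uninterrupted activity is flagged


def longest_continuous_activity(timestamps, max_idle_gap=MAX_IDLE_GAP):
    """Return the longest stretch of activity with no pause above max_idle_gap."""
    ordered = sorted(timestamps)
    if not ordered:
        return timedelta(0)
    longest = timedelta(0)
    start = previous = ordered[0]
    for current in ordered[1:]:
        if current - previous > max_idle_gap:
            # The pause was long enough to end the current stretch.
            longest = max(longest, previous - start)
            start = current
        previous = current
    return max(longest, previous - start)


def looks_like_nonstop_bot(timestamps, limit=MAX_CONTINUOUS):
    """Flag a visitor whose longest uninterrupted stretch exceeds the limit."""
    return longest_continuous_activity(timestamps) > limit
```

Because extreme human outliers do exist, a flag of this kind is better treated as one signal among several than as proof.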

Extraordinary performance or effectiveness

Visit statistics are just one possibility. An alternative is to measure the number of actions performed on a website. An action is any activity performed within a given website: clicking a link, button or banner, or scrolling to a particular place on the page. It can also be the number of video playbacks, or even the number of times the cursor hovers over a banner without clicking. In each of those cases, exceeding a set limit suggests that we are dealing with an automaton, not a living person: it is physically impossible (or at least unlikely) for a human to reach values that computer programs achieve easily.

These limits depend on the specifics of the problem and on the preferences of whoever is identifying the bots. Setting the limits too high will cause some bots to be classified as people. Setting them too low will reject some users who are actually human. The choice of the optimal trade-off is subjective and depends on the purpose at hand.
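The trade-off between the two kinds of error can be illustrated with a small sketch. The visitor identifiers, action counts and the set of known bots below are made-up data, and the function names are hypothetical.

```python
def flag_by_action_count(action_counts, limit):
    """Return the set of visitor ids whose action count exceeds the limit."""
    return {visitor for visitor, count in action_counts.items() if count > limit}


def error_counts(action_counts, known_bots, limit):
    """Count humans wrongly flagged and bots missed for a given limit."""
    flagged = flag_by_action_count(action_counts, limit)
    humans_rejected = len(flagged - known_bots)
    bots_missed = len(known_bots - flagged)
    return humans_rejected, bots_missed


# Made-up actions per hour for five visitors; "c" and "e" are assumed to be bots.
sample = {"a": 40, "b": 120, "c": 300, "d": 75, "e": 5000}
bots = {"c", "e"}

for limit in (50, 200, 1000):
    print(limit, error_counts(sample, bots, limit))
```

On this sample the lowest limit rejects two humans, the highest misses one bot, and the middle one makes no errors. Real data rarely separates so cleanly, which is why the choice of limit remains a judgment call.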

Incredible rapidity of action

One of the key measures for distinguishing a living person from a machine is timing: the time that passes between entering the website and performing specific actions, the time taken to fill in a form, or even the speed at which text is entered into a field.
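A minimal sketch of such a timing check follows. The thresholds `MIN_FORM_FILL_SECONDS` and `MAX_CHARS_PER_SECOND` are assumed values that would need calibration on observed traffic, and the times are taken to be seconds measured on a common clock.

```python
# Hypothetical bounds; real values would need calibration on observed traffic.
MIN_FORM_FILL_SECONDS = 2.0
MAX_CHARS_PER_SECOND = 15.0


def typing_speed(char_count, seconds):
    """Characters per second; infinite when the text appears instantly."""
    if seconds <= 0:
        return float("inf")
    return char_count / seconds


def too_fast_for_human(page_loaded_at, form_submitted_at, typed_chars, typing_seconds):
    """Flag a submission that arrives too soon or is typed implausibly fast."""
    fill_time = form_submitted_at - page_loaded_at
    return (
        fill_time < MIN_FORM_FILL_SECONDS
        or typing_speed(typed_chars, typing_seconds) > MAX_CHARS_PER_SECOND
    )
```

Pasted text or browser autofill can also produce very high apparent typing speeds, so the typing check in particular may flag legitimate users.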

Surprising randomness

People’s behaviour is certainly not random, and in many cases it fits easily defined patterns. Some people’s activities can be predicted with a surprisingly high probability of over 90%. The operation of computer programs, by contrast, is in principle deterministic, possibly with random elements added. The two can be told apart provided that enough data is available.
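One possible way to quantify predictability, offered here only as an illustration and not as the method behind the figure quoted above, is to check how often the next action matches the most common successor of the current one. The function name and the page labels are hypothetical.

```python
import random
from collections import Counter, defaultdict


def first_order_predictability(actions):
    """Fraction of transitions that match the most common successor of each action.

    This is an in-sample score: the same data is used to learn the most
    common successor and to evaluate it, so it is an optimistic estimate.
    """
    successors = defaultdict(Counter)
    for current, following in zip(actions, actions[1:]):
        successors[current][following] += 1
    total = sum(sum(counts.values()) for counts in successors.values())
    if total == 0:
        return 0.0
    hits = sum(max(counts.values()) for counts in successors.values())
    return hits / total


# A fixed routine versus uniformly random page choices (seeded for repeatability).
routine = ["home", "news", "mail"] * 30
rng = random.Random(0)
scattered = [rng.choice(["home", "news", "mail"]) for _ in range(300)]

print(first_order_predictability(routine))
print(first_order_predictability(scattered))
```

A score close to what uniform random choice would give hints at injected randomness. A fully scripted bot would also score high, so this measure only makes sense alongside the other features.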

Abnormal variability

Most people use computers as tools and get used to particular hardware and software configurations. Whether because of an aversion to change, a desire for stability, or simply a lack of time or skills, people rarely change the settings of the equipment they use. Users whose operating system or browser changes from day to day are therefore highly suspicious, and a bot can be identified by how frequently its hardware or software configuration changes.
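The frequency of configuration changes can be sketched as the share of consecutive days on which the observed configuration differs. Representing a configuration as an (operating system, browser) pair is an assumption made for this example, as is the function name.

```python
def configuration_change_rate(daily_configs):
    """Share of consecutive days on which the (OS, browser) pair changed."""
    if len(daily_configs) < 2:
        return 0.0
    changes = sum(
        1 for previous, current in zip(daily_configs, daily_configs[1:])
        if previous != current
    )
    return changes / (len(daily_configs) - 1)


# Made-up observations, one entry per day.
stable_user = [("Windows", "Firefox")] * 7
shifting_user = [
    ("Windows", "Firefox"),
    ("Linux", "Chrome"),
    ("macOS", "Safari"),
    ("Windows", "Chrome"),
]

print(configuration_change_rate(stable_user))
print(configuration_change_rate(shifting_user))
```

A person who owns several devices will also show some changes, so a useful cut-off would probably sit well above zero.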

Specificity of human behaviour

Speed and effectiveness are among the human limitations mentioned above. Another is the relatively poor ability to work in parallel. By measuring the number of activities performed simultaneously, or even the number of open and active connections, we can determine whether a human could plausibly be responsible. Suspicious activities include, for example, watching many videos at the same time, scrolling through multiple sites at once, or filling out several forms simultaneously. Parallelisation is therefore one of the factors that allow users to be classified as bots or people.
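The number of simultaneous activities can be computed from their start and end times with a simple sweep over sorted events, as in the sketch below. The limit `HUMAN_PARALLEL_LIMIT` is an assumed value chosen for illustration.

```python
# Hypothetical limit on how many activities one person can plausibly run at once.
HUMAN_PARALLEL_LIMIT = 2


def max_concurrent(intervals):
    """Largest number of activities open at the same moment.

    Each interval is a (start, end) pair; an activity that ends at time t
    does not overlap one that starts at t.
    """
    events = []
    for start, end in intervals:
        events.append((start, 1))
        events.append((end, -1))
    # Tuple ordering puts -1 before +1 at equal times, so ends are handled first.
    events.sort()
    open_now = peak = 0
    for _, delta in events:
        open_now += delta
        peak = max(peak, open_now)
    return peak


def exceeds_human_parallelism(intervals, limit=HUMAN_PARALLEL_LIMIT):
    """Flag a visitor with more simultaneous activities than the limit allows."""
    return max_concurrent(intervals) > limit
```

For instance, four video playbacks that all overlap in time give a peak of four and would be flagged, while two playbacks watched back to back give a peak of one.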

There may be more such regularities, but identifying them is not easy and is often a real research challenge. Behaviour may be affected, e.g., by the daily rhythm in a given country (such as whether a siesta is observed) and by the attitude to holidays. Other potential signals are the statistics of sites closed before being read and the characteristics of mouse movement.