Sunday, December 25, 2022

The audio CAPTCHA

Audio CAPTCHAs are a type of CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) used to determine whether a user is a human or a bot. They are designed to be accessible to users with visual impairments, who may have difficulty reading visual CAPTCHAs. These hearing-based "I'm not a robot" tests play audio clips that read out the text or letters presented in the CAPTCHA. The recordings usually feature some form of background noise, which is intended to fool bots that try to interpret the audio. Audio CAPTCHAs used to be fairly effective, since first-generation bots struggled to interpret speech properly. CAPTCHAs have even been used for shadow crowdsourcing, making people work for others without realizing it - e.g., transcribing old newspapers and reading characters from street images for Google Maps.
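The noise-masking idea above can be sketched in a few lines: mix background noise into a clean speech clip at a chosen signal-to-noise ratio (SNR), low enough to trouble a machine listener but high enough for a human. This is only an illustrative sketch - the 440 Hz tone stands in for a real recording of spoken digits, and the Gaussian noise stands in for recorded babble:

```python
import numpy as np

SAMPLE_RATE = 16_000  # Hz, a common sample rate for speech audio


def mix_noise(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix `noise` into `clean`, scaled so the result has the requested SNR in dB."""
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    # Choose scale so clean_power / (scale**2 * noise_power) == 10 ** (snr_db / 10)
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise


# Stand-in "speech": one second of a 440 Hz tone (a real CAPTCHA would use
# recorded letters or digits), plus synthetic background noise.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
speech = 0.5 * np.sin(2 * np.pi * 440 * t)
babble = np.random.default_rng(0).normal(0.0, 0.3, SAMPLE_RATE)

# 5 dB SNR: the speech is audible, but the noise is far from negligible.
captcha_audio = mix_noise(speech, babble, snr_db=5.0)
```

The design tension is all in `snr_db`: lowering it degrades ASR transcription accuracy, but human pass rates fall with it, which is exactly the trade-off the papers below study.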

However, the effectiveness of audio CAPTCHAs has been challenged by the improving accuracy of automated speech recognition (ASR) systems, which are able to solve audio CAPTCHAs even in the presence of background noise. As a result, researchers have focused on developing new audio CAPTCHA designs that are more resistant to ASR attacks, as well as evaluating the performance of different audio CAPTCHA designs under various conditions.
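The attack side can be sketched abstractly. A bot does not need to solve every clip: it can transcribe a clip with an off-the-shelf ASR system, keep the answer only when confidence is high, and otherwise request a fresh clip. The `transcribe` callable below is a hypothetical wrapper (not a real API) standing in for any ASR service that returns a transcript and a confidence score:

```python
from typing import Callable, Optional, Tuple


def solve_audio_captcha(
    fetch_clip: Callable[[], bytes],                    # requests a fresh CAPTCHA clip
    transcribe: Callable[[bytes], Tuple[str, float]],   # hypothetical ASR wrapper
    min_confidence: float = 0.8,
    max_attempts: int = 5,
) -> Optional[str]:
    """Keep requesting fresh clips until the ASR is confident enough, then answer."""
    for _ in range(max_attempts):
        text, confidence = transcribe(fetch_clip())
        if confidence >= min_confidence:
            return text
    return None  # give up; a real bot might fall back to another channel
```

Because most sites allow unlimited CAPTCHA refreshes, even an ASR that is confident on only a fraction of clips yields a high overall success rate, which is why simply adding more noise is a weak defense.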

Some of the audio CAPTCHA papers have explored the use of different types of audio signals, such as noise and music, to create audio CAPTCHAs that are more difficult for ASR systems to solve. Other papers have examined the use of multiple audio CAPTCHAs in sequence, or the use of visual and audio CAPTCHAs in combination, in an effort to increase the overall security of the CAPTCHA system.

Both text- and image-based CAPTCHAs are becoming difficult for humans to solve. Bruce Schneier, an internationally renowned security technologist, offered this humorous example on his blog: "select all images where if you were to add a decrescendo it would add to the musicality of the piece without being interpreted as an overly heavy-handed metaphor within the context of the thematic material."

ChatGPT responded to this challenge by writing lyrics for a short musical piece incorporating a decrescendo:

Verse 1: (Moderately fast tempo, 8th note = 120 bpm)

The sun is shining bright, 

The birds are singing sweet, 

My heart is filled with joy, 

As I dance down the street. 

Chorus: (8th note = 120 bpm) 

Life is good, life is grand, 

I am free to roam this land. 

Verse 2: (Moderately fast tempo, 8th note = 120 bpm) 

But as the day wears on,

The clouds start to roll in.

My steps slow down,

As the rain begins.

Chorus: (8th note = 120 bpm)

Life is good, life is grand,

But sometimes it's hard to understand.

Bridge: (Moderately slow tempo, 8th note = 60 bpm)

The rain falls heavy now,

My mood starts to change.

I feel a sense of sadness,

As I contemplate the pain.

Outro: (Moderately slow tempo, 8th note = 60 bpm)

But even in the darkest times,

There is always a way.

The rain will eventually stop,

And a brighter day will come my way.

(Decrescendo from 8th note = 60 bpm to pianissimo)

In this musical piece, the decrescendo at the end helps to convey a sense of resolution and hopefulness, as the music gradually fades out to a pianissimo dynamic. The use of a decrescendo in this context adds to the musicality of the piece without being overly heavy-handed or interpreting the thematic material in a specific way.

It remains a challenge to develop effective CAPTCHAs that can differentiate between humans and bots while staying accessible to all users.


Alqarni M, Yu F, Raavi R, Sukhai M. An Accessible Web CAPTCHA Design for Visually Impaired Users.

Solanki S, Krishnan G, Sampath V, Polakis J. In (Cyber)Space Bots Can Hear You Speak: Breaking Audio CAPTCHAs Using OTS Speech Recognition. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, 2017 (pp. 69-80).

Reinheimer BM, Islam F, Shumailov I. Audio CAPTCHA with a Few Cocktails: It's so Noisy I Can't Hear You. In Cambridge International Workshop on Security Protocols, 2019 (pp. 20-40). Springer, Cham.

Shirali-Shahreza S, Penn G, Balakrishnan R, Ganjali Y. SeeSay and HearSay CAPTCHA for Mobile Interaction. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2013 (pp. 2147-2156).

Huang CH, Wu PH, Liu YW, Wu SH. Attacking and Defending Behind a Psychoacoustics-Based CAPTCHA. In ICASSP 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021 (pp. 895-899). IEEE.

IG: Special thanks to OpenAI's Assistant for its help with writing this article.
