Innovative Probabilistic Language-Based CAPTCHA System - Bridging the Semantic Gap for Machine Prejudice

Explore a novel probabilistic language-based CAPTCHA system designed to exploit the semantic gap between humans and machines. The presentation covers the motivation behind a perfect CAPTCHA, the limitations of existing approaches, and the role of multi-factor authentication in the AI arms race.

Uploaded by ayaa_9 on Mar 19, 2025

Presentation Transcript

A new probabilistic language-based CAPTCHA system
New technique to prejudice computers against themselves
Teslin Roys

Language and the semantic gap
Syntax is the structure, grammar, or rules of language
Semantics is the rest of language
The semantic gap is the advantage humans have over machines in discerning semantics (or meaning)

CAPTCHA: Completely Automated Public Turing test to tell Computers and Humans Apart
A system that authenticates users as human by verifying solutions to a problem that is in principle straightforward for humans to solve but difficult for current computers or algorithms
In other words, a mechanism that exploits the semantic gap
It is a Turing test administered by a machine
Used to prevent unwanted automated submissions (spam)
Also helps mitigate Denial of Service (DoS) attacks

Motivation: a perfect CAPTCHA
A perfect CAPTCHA would accurately sort 100% of respondents as human or machine
This would presuppose not just a semantic gap but a semantic chasm to be exploited: an insurmountable rather than incremental difference in abilities
There are various reasons to doubt that such a distinction exists
Given that doubt, our best approach is to guess which areas have the most pronounced gap and hope it narrows slowly

Motivation, cont'd
Some of today's most-used CAPTCHAs are eminently defeatable
  o An elliptical shape recognition attack had a 12.7% success rate against the reCAPTCHA system
  o Similar approaches have had success rates of 33% against the GIMPY system and even 92% against EZ-GIMPY
The process is an AI arms race
A perfect CAPTCHA is possibly not philosophically sound, let alone technically feasible
Multi-factor authentication

Existing approaches
Text recognition in visual clutter: Optical Character Recognition (OCR)
  o Examples: reCAPTCHA, GIMPY/EZ-GIMPY, Authorize, Captcha.net, etc.
  o Susceptible to increasingly good segmentation attacks (most deployed text-based approaches have been broken [1][2][3][4][5][6])
Image-recognition-based CAPTCHAs
  o For example: finding the correct image orientation, or identifying animals or other objects in an image based on a known database
  o Most techniques of this kind are recent [7] and so less well analysed
Language-based CAPTCHAs
  o Rarely attempted, partly because they are typically non-scalable (tied to a specific static database)

Aims of a CAPTCHA
Usability
  o Quick and painless for a human to solve
Security
  o Computationally costly for an attacker, relatively cheap to produce problems
Other qualities, e.g.
  o reCAPTCHA has users solve hard character recognition problems
  o What's Up? CAPTCHA has users tag images to identify orientation

A semantic language-based CAPTCHA with a social feedback mechanism
The system presents the user with three phrases and asks which of them were most likely written by a human
The phrases are relatively short (e.g. "the soldiers march across the border")
Of the three phrases:
  o One is known to be meaningful (selected from the match pool)
  o One is randomly generated from a dictionary (a random phrase)
  o One is a possible match (drawn from the candidate pool)
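
The challenge described on this slide can be sketched roughly as follows. This is a minimal illustration, not the author's implementation: the names `Challenge`, `random_phrase` and `make_challenge` are hypothetical, and the slides do not say how long the random phrase is or how the phrases are ordered, so matching the match phrase's word count and shuffling the display order are assumptions.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Challenge:
    phrases: tuple  # the three phrases, in the order shown to the user
    roles: tuple    # parallel to phrases: "match", "random" or "candidate"


def random_phrase(dictionary, length, rng):
    """Draw `length` words uniformly at random from a word list."""
    return " ".join(rng.choice(dictionary) for _ in range(length))


def make_challenge(match_pool, candidate_pool, dictionary, rng=None):
    """Build one three-phrase challenge and shuffle the display order."""
    rng = rng or random.Random()
    match = rng.choice(match_pool)
    candidate = rng.choice(candidate_pool)
    # Assumption: the random phrase has as many words as the match phrase.
    rand = random_phrase(dictionary, len(match.split()), rng)
    items = [(match, "match"), (rand, "random"), (candidate, "candidate")]
    rng.shuffle(items)  # hide which phrase plays which role
    phrases, roles = zip(*items)
    return Challenge(phrases, roles)
```

The pools are passed in as plain lists here; only the server would keep the `roles` tuple, while the user sees `phrases`.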

Concept
Match phrases are generated from WordNet, Princeton's semantic relational database
Users get a positive score for their closeness to the match phrase and a negative score for their closeness to the random phrase
If they fail to get a high enough score on the first try, they may attempt a second problem
A running total is kept: a user may be banned if she falls below a minimum negative threshold, and passes if she exceeds some positive threshold
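
The running-total rule can be sketched as below. The slides do not define "closeness" or give threshold values, so this sketch assumes the user rates each phrase between 0 and 1 and that the score for a problem is the rating given to the match phrase minus the rating given to the random phrase. The `Session` class and both threshold values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Session:
    pass_threshold: float = 1.0   # hypothetical value, not from the slides
    ban_threshold: float = -0.5   # hypothetical value, not from the slides
    total: float = 0.0            # running total across problems

    def submit(self, rating_match, rating_random):
        """Add one problem's score to the running total and return a verdict.

        Assumed scoring: ratings lie in [0, 1]; agreement with the match
        phrase counts positively, agreement with the random phrase negatively.
        """
        self.total += rating_match - rating_random
        if self.total > self.pass_threshold:
            return "pass"
        if self.total < self.ban_threshold:
            return "ban"
        return "retry"  # not decided yet: serve another problem
```

Because the total accumulates, a user who scores modestly on the first problem can still pass on the second, which is the "degrees of success" behaviour the slide describes.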

[Figure: the three phrase options, labeled candidate phrase, match phrase, and random phrase]

Counting feedback and generating results
Candidates are generated either from highly scored random phrases or by mutating match phrases
If enough passing users weight a candidate phrase positively, the candidate is promoted to the match pool
  o At this point we also check whether one word was substituted for another (i.e. mutated) in the phrase, and if so we note that word as a likely synonym for the original
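
The promotion and synonym check can be sketched as below, under stated assumptions: "enough passing users" is modelled as a fixed vote count (`promote_after`, a hypothetical parameter), and a mutation is detected by comparing a promoted candidate word by word with the match phrase it was derived from. `CandidateTracker` and `find_substitution` are illustrative names, not taken from the presentation.

```python
def find_substitution(original, mutated):
    """Return (old_word, new_word) if exactly one word differs, else None."""
    a, b = original.split(), mutated.split()
    if len(a) != len(b):
        return None
    diffs = [(x, y) for x, y in zip(a, b) if x != y]
    return diffs[0] if len(diffs) == 1 else None


class CandidateTracker:
    def __init__(self, promote_after=3):
        self.promote_after = promote_after  # hypothetical vote threshold
        self.votes = {}          # candidate phrase -> positive votes so far
        self.parents = {}        # mutated candidate -> its source match phrase
        self.match_pool = set()  # phrases promoted to known-meaningful
        self.synonyms = set()    # (original word, substituted word) pairs

    def add_candidate(self, phrase, parent=None):
        """Register a candidate; `parent` is set only for mutated phrases."""
        self.votes[phrase] = 0
        if parent is not None:
            self.parents[phrase] = parent

    def record_positive_vote(self, phrase):
        """Count one positive weighting from a user who passed the CAPTCHA.

        Returns True if this vote promoted the candidate to the match pool.
        """
        if phrase not in self.votes:
            return False
        self.votes[phrase] += 1
        if self.votes[phrase] < self.promote_after:
            return False
        del self.votes[phrase]
        self.match_pool.add(phrase)
        parent = self.parents.pop(phrase, None)
        if parent is not None:
            pair = find_substitution(parent, phrase)
            if pair is not None:
                self.synonyms.add(pair)  # note the likely synonym
        return True
```

Only votes from users who passed are meant to be fed in, so that failed (possibly automated) respondents cannot push phrases into the match pool.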

Human success rate: 100%

Blind attack success rate: 0.016%

Comparison with other recent (non-OCR) approaches

Conclusions
The proposed concept offers some unique advantages:
  o Sensitivity to failure and degrees of success
  o Scalable (the base database contains more than 11,569 phrases, and the match pool grows with use)
  o Capable of identifying new synonyms
Results:
  o Straightforward for human use (100% accuracy in a preliminary test)
  o Shows reasonable resistance to a blind attack: 0.016% success rate
Other avenues of attack need further exploration:
  o Search engine query based methods
  o Methods exploiting probability of word proximity
  o Machine learning approaches

References
[1] Hocevar, S. PWNtcha - Captcha Decoder web site. http://sam.zoy.org/pwntcha/.
[2] Mori, G. and Malik, J. 2003. Recognizing objects in adversarial clutter: Breaking a visual CAPTCHA. In Proc. IEEE Conf. on Computer Vision & Pattern Recognition, 2003.
[3] Moy, G., Jones, N., Harkless, C., and Potter, R. 2004. Distortion estimation techniques in solving visual CAPTCHAs. In IEEE Conf. on Computer Vision & Pattern Recognition, 2004.
[4] Chellapilla, K. and Simard, P. 2004. Using machine learning to break visual human interaction proofs. Neural Information Processing Systems (NIPS'04), MIT Press.
[5] Yan, J. and El Ahmad, A. S. 2007. Breaking Visual CAPTCHAs with naive pattern recognition algorithms. In Proc. Ann. Comp. Security Applications Conf. 2007, 279-291.
[6] Yan, J. and El Ahmad, A. S. 2008. A low-cost attack on a Microsoft CAPTCHA. In ACM CCS'2008, 543-554.
[7] Zhu, B. B., et al. 2010. Attacks and design of image recognition CAPTCHAs. In Proc. 17th ACM Conf. on Computer and Communications Security, 2010.
