Turing Number: Wikis

Encyclopedia

(Redirected to CAPTCHA article)

From Wikipedia, the free encyclopedia

Early CAPTCHAs such as these, generated by the EZ-Gimpy program, were used on Yahoo!. However, technology was developed to read this type of CAPTCHA.[1]
A modern CAPTCHA, rather than attempting to create a distorted background and high levels of warping on the text, might focus on making segmentation difficult by adding an angled line.
Another way to make segmentation difficult is to crowd symbols together. This is Yahoo!'s current CAPTCHA format. This might be difficult for some people to read, as seen in the leftmost example (is it "klopsh" or "kbpsh"?).

A CAPTCHA or Captcha (pronounced /ˈkæptʃə/) is a type of challenge-response test used in computing to ensure that the response is not generated by a computer. The process usually involves one computer (a server) asking a user to complete a simple test which the computer is able to generate and grade. Because other computers are unable to solve the CAPTCHA, any user entering a correct solution is presumed to be human. Thus, it is sometimes described as a reverse Turing test, because it is administered by a machine and targeted to a human, in contrast to the standard Turing test that is typically administered by a human and targeted to a machine. A common type of CAPTCHA requires that the user type letters or digits from a distorted image that appears on the screen.

The term "CAPTCHA" (based upon the word capture) was coined in 2000 by Luis von Ahn, Manuel Blum, Nicholas J. Hopper, and John Langford (all of Carnegie Mellon University). It is a contrived acronym for "Completely Automated Public Turing test to tell Computers and Humans Apart." Carnegie Mellon University attempted to trademark the term,[2] but the trademark application was abandoned on 21 April 2008.[3]

Characteristics

A CAPTCHA is a means of automatically generating new challenges which:

  • current software is unable to solve accurately,
  • most humans can solve, and
  • do not rely on the type of CAPTCHA being new to the attacker.

Although a checkbox "check here if you are not a bot" might serve to distinguish between humans and computers, it is not a CAPTCHA because it relies on the fact that an attacker has not spent effort to break that specific form. (Such 'check here' methods are very easy to defeat.) Instead, CAPTCHAs rely on difficult problems in artificial intelligence. In the short term, this has the benefit of distinguishing humans from computers. In the long term, it creates an incentive to advance the state of AI, which the originators of the term view as a benefit in its own right.

History

Moni Naor was the first person to theorize a list of ways to verify that a request comes from a human and not a bot.[4] Primitive CAPTCHAs seem to have been developed in 1997 by Andrei Broder, Martin Abadi, Krishna Bharat, and Mark Lillibridge to prevent bots from adding URLs to their search engine.[5] To make the images resistant to OCR (optical character recognition), the team simulated situations that scanner manuals claimed resulted in bad OCR. In 2000, Luis von Ahn, Manuel Blum, Nicholas J. Hopper, and John Langford coined the term "CAPTCHA" and improved and publicized the notion, which included any program that can distinguish humans from computers. They invented multiple examples of CAPTCHAs, including the first CAPTCHAs to be widely used, which were those adopted by Yahoo!.

Applications

CAPTCHAs are used to prevent automated software from performing actions which degrade the quality of service of a given system, whether due to abuse or resource expenditure. CAPTCHAs can be deployed to protect systems vulnerable to e-mail spam, such as the webmail services of Gmail, Hotmail, and Yahoo! Mail.

CAPTCHAs are also used to stop automated posting to blogs, forums and wikis, whether for commercial promotion or for harassment and vandalism. They also serve an important function in rate limiting: automated usage of a service might be desirable until it is done to excess and to the detriment of human users. In such a case, a CAPTCHA can enforce the automated usage policies set by the administrator once certain usage metrics exceed a given threshold. The article rating systems used by many news web sites are another example of an online facility vulnerable to manipulation by automated software.[6]
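The threshold idea can be sketched as a sliding-window counter that starts demanding a CAPTCHA only once a client exceeds a limit. This is a minimal illustration, not a description of any particular site's policy; the class name `CaptchaGate` and the parameters `max_requests` and `window_seconds` are hypothetical.

```python
import time
from collections import defaultdict, deque


class CaptchaGate:
    """Ask for a CAPTCHA once a client exceeds max_requests within window_seconds.

    Hypothetical helper for illustration; the thresholds are assumptions.
    """

    def __init__(self, max_requests=5, window_seconds=60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client_id -> timestamps of recent requests

    def needs_captcha(self, client_id, now=None):
        """Record one request and report whether the client is over the limit."""
        now = time.monotonic() if now is None else now
        hits = self._hits[client_id]
        # Forget requests that have fallen out of the sliding window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        hits.append(now)
        return len(hits) > self.max_requests


gate = CaptchaGate(max_requests=3, window_seconds=10)
burst = [gate.needs_captcha("client-a", now=t) for t in (0, 1, 2, 3)]
after_pause = gate.needs_captcha("client-a", now=20)
```

Requests under the limit pass through untouched, so well-behaved automated clients are not affected until they exceed the administrator's threshold.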

Accessibility

Because CAPTCHAs rely on visual perception, users unable to view a CAPTCHA (for example, due to a disability or because it is difficult to read) will be unable to perform the task protected by a CAPTCHA. Therefore, sites implementing CAPTCHAs may provide an audio version of the CAPTCHA in addition to the visual method. The official CAPTCHA site recommends providing an audio CAPTCHA for accessibility reasons. This combination represents the most accessible CAPTCHA currently known to exist, but it is far from universally adopted, with most websites (including Wikipedia) offering only the visual CAPTCHA, with or without providing the option of generating a new image if one is too difficult to read.

Attempts at more accessible CAPTCHAs

Even an audio and visual CAPTCHA will require manual intervention for some users, such as those who have visual disabilities and also are deaf. There have been various attempts at creating CAPTCHAs that are more accessible. Attempts include the use of JavaScript, mathematical questions ("what is 1+1"), or "common sense" questions ("what color is the sky on a clear day"). However, these approaches do not meet both of the criteria of being automatically generated and of not relying on the type of CAPTCHA being new to the attacker.

Circumvention

There are a few approaches to defeating CAPTCHAs:

  • exploiting bugs in the implementation that allow the attacker to completely bypass the CAPTCHA,
  • improving character recognition software, or
  • using cheap human labor to process the tests (see below).

Insecure implementation

Like any security system, design flaws in a system implementation can prevent the theoretical security from being realized. Many CAPTCHA implementations, especially those which have not been designed and reviewed by experts in the fields of security, are prone to common attacks.

Some CAPTCHA protection systems can be bypassed without using OCR simply by re-using the session ID of a known CAPTCHA image. A correctly designed CAPTCHA does not allow multiple solution attempts at one CAPTCHA. This prevents the reuse of a correct CAPTCHA solution or making a second guess after an incorrect OCR attempt.[7] Other CAPTCHA implementations use a hash (such as an MD5 hash) of the solution as a key passed to the client to validate the CAPTCHA. Often the CAPTCHA is of small enough size that this hash could be cracked.[8] Further, the hash could assist an OCR based attempt. A more secure scheme would use an HMAC. Finally, some implementations use only a small fixed pool of CAPTCHA images. Eventually, when enough CAPTCHA image solutions have been collected by an attacker over a period of time, the CAPTCHA can be broken by simply looking up solutions in a table, based on a hash of the challenge image.
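A minimal sketch of the two defences mentioned above, a keyed HMAC instead of a bare hash and a single attempt per challenge, might look as follows. It uses Python's standard `hmac`, `hashlib` and `secrets` modules; the names `SERVER_KEY`, `issue_challenge` and `verify`, and the in-memory set of used nonces, are assumptions made for illustration rather than any real CAPTCHA library's API.

```python
import hashlib
import hmac
import secrets

# Hypothetical server-side secret; it never leaves the server.
SERVER_KEY = secrets.token_bytes(32)

# Nonces of challenges that have already been answered (illustrative in-memory store).
_used_nonces = set()


def _tag(nonce, text):
    """Keyed digest binding a solution to one specific challenge."""
    message = f"{nonce}:{text.lower()}".encode()
    return hmac.new(SERVER_KEY, message, hashlib.sha256).hexdigest()


def issue_challenge(solution):
    """Return (nonce, tag) to send to the client alongside the image."""
    nonce = secrets.token_hex(16)
    return nonce, _tag(nonce, solution)


def verify(nonce, tag, answer):
    """Accept at most one attempt per challenge, whether right or wrong."""
    if nonce in _used_nonces:
        return False
    _used_nonces.add(nonce)
    return hmac.compare_digest(_tag(nonce, answer), tag)


nonce, tag = issue_challenge("kbpsh")
first_try = verify(nonce, tag, "KBPSH")   # correct answer, first use
replay = verify(nonce, tag, "kbpsh")      # same challenge replayed
```

Because the tag depends on a secret key, a client cannot brute-force short solutions offline as it could with an unkeyed MD5 hash, and burning the nonce on the first attempt blocks both replay and a second OCR guess.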

Computer character recognition

A number of research projects have attempted (often with success) to beat visual CAPTCHAs by creating programs that contain the following functionality:

  1. Pre-processing: Removal of background clutter and noise.
  2. Segmentation: Splitting the image into regions which each contain a single character.
  3. Classification: Identifying the character in each region.

Steps 1 and 3 are easy tasks for computers.[9] The only step where humans still outperform computers is segmentation. If the background clutter consists of shapes similar to letter shapes, and the letters are connected by this clutter, the segmentation becomes nearly impossible with current software. Hence, an effective CAPTCHA should focus on the segmentation.
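To illustrate why segmentation is the weak point for software, the sketch below applies a very simple segmentation technique (splitting on empty columns) to a toy binary image. This is an assumed, simplified method chosen for illustration, not the approach of any cited attack; the function name `segment_columns` and the text-grid image format are hypothetical.

```python
def segment_columns(rows, ink="#"):
    """Split a binary image (list of strings) into column ranges that contain ink.

    Returns a list of (start, end) pairs with end exclusive. Characters are
    assumed to be separated by at least one completely empty column.
    """
    width = max(len(row) for row in rows)
    has_ink = [
        any(col < len(row) and row[col] == ink for row in rows)
        for col in range(width)
    ]
    regions, start = [], None
    for col, filled in enumerate(has_ink):
        if filled and start is None:
            start = col                     # a new region begins
        elif not filled and start is not None:
            regions.append((start, col))    # an empty column closes the region
            start = None
    if start is not None:
        regions.append((start, width))
    return regions


clean = [
    "##..#..###",
    "#...#..#..",
    "##..#..###",
]
# The same glyphs with a line drawn through the middle row.
struck = [
    "##..#..###",
    "##########",
    "##..#..###",
]
clean_regions = segment_columns(clean)
struck_regions = segment_columns(struck)
```

On the clean image the three glyphs are separated, while a single connecting line merges everything into one region, which is the effect that angled lines and crowded symbols aim for.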

Several research projects have broken real world CAPTCHAs, including one of Yahoo's early CAPTCHAs called "EZ-Gimpy"[1] and the CAPTCHA used by popular sites such as PayPal,[10] LiveJournal, phpBB, and other open source solutions.[11][12][13] In January 2008 Network Security Research released their program for automated Yahoo! CAPTCHA recognition.[14] Windows Live Hotmail and Gmail, the other two major free email providers, were cracked shortly after.[15][16]

In February 2008 it was reported that spammers had achieved a success rate of 30% to 35%, using a bot, in responding to CAPTCHAs for Microsoft's Live Mail service[17] and a success rate of 20% against Google's Gmail CAPTCHA.[18] A Newcastle University research team has defeated the segmentation part of Microsoft's CAPTCHA with a 90% success rate, and claim that this could lead to a complete crack with a greater than 60% rate.[19]

Human solvers

CAPTCHA is vulnerable to a relay attack that uses humans to solve the puzzles. One approach involves relaying the puzzles to a group of human operators who can solve CAPTCHAs. In this scheme, a computer fills out a form and when it reaches a CAPTCHA, it gives the CAPTCHA to the human operator to solve.

Another variation of this technique involves copying the CAPTCHA images and using them as CAPTCHAs for a high-traffic site owned by the attacker. With enough traffic, the attacker can get a solution to the CAPTCHA puzzle in time to relay it back to the target site.[20] In October 2007, a piece of malware appeared in the wild which enticed users to solve CAPTCHAs in order to see progressively further into a series of striptease images.[21][22] A more recent view is that this is unlikely to work due to unavailability of high-traffic sites and competition by similar sites.[23]

These methods have been used by spammers to set up thousands of accounts on free email services such as Gmail and Yahoo!.[24] Since Gmail and Yahoo! are unlikely to be blacklisted by anti-spam systems, spam sent through these compromised accounts is less likely to be blocked.

Legal concerns

The circumvention of CAPTCHAs may violate the anti-circumvention clause of the Digital Millennium Copyright Act (DMCA) in the United States. In 2007, Ticketmaster sued software maker RMG Technologies[25] for its product which circumvented the ticket seller's CAPTCHAs on the basis that it violates the anti-circumvention clause of the DMCA. In October 2007, an injunction was issued stating that Ticketmaster would likely succeed in making its case.[26] In June 2008, Ticketmaster filed for Default Judgment against RMG. The Court granted Ticketmaster the Default and entered an $18.2M judgment in favor of Ticketmaster.

Image-recognition CAPTCHAs

Some researchers promote image recognition CAPTCHAs as a possible alternative for text-based CAPTCHAs. To date, only RapidShare, Linux Mint and Ubuntu have made use of an image based CAPTCHA. Many amateur users of the phpBB forum software (which has suffered greatly from spam) have implemented an open source image recognition CAPTCHA system in the form of an addon called KittenAuth,[27] which in its default form presents a question requiring the user to select a stated type of animal from an array of thumbnail images of assorted animals. The images (and the challenge questions) can be customized, for example to present questions and images which would be easily answered by the forum's target userbase. Furthermore, for a time, RapidShare free users had to get past a CAPTCHA in which they had to enter only the letters attached to a cat, while other letters were attached to dogs.[28] This was later removed because users had trouble entering the correct letters.

Image recognition CAPTCHAs face many potential problems which have not been fully studied. It is difficult for a small site to acquire a large dictionary of images which an attacker does not have access to and without a means of automatically acquiring new labelled images, an image based challenge does not meet the definition of a CAPTCHA. KittenAuth, by default, only had 42 images in its database.[27] Microsoft's "Asirra," which it is providing as a free web service, attempts to address this by means of Microsoft Research's partnership with Petfinder.com, which has provided it with more than three million images of cats and dogs, classified by people at thousands of US animal shelters.[29] Unfortunately for Microsoft, researchers claim to have written a program that can break the Microsoft Asirra CAPTCHA.[30]

Human solvers are a potential weakness for strategies such as Asirra. If the database of cat and dog photos can be downloaded, then paying workers $0.01 to classify each photo as either a dog or a cat means that almost the entire database of photos can be deciphered for $30,000. Photos that are subsequently added to the Asirra database are then a relatively small data set that can be classified as they first appear. Causing minor changes to images each time they appear will not prevent a computer from recognizing a repeated image as there are robust image comparator functions (e.g., image hashes, color histograms) that are insensitive to many simple image distortions. Warping an image sufficiently to fool a computer will likely also be troublesome to a human.[31]
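The cost figure and the image-comparator argument above can both be checked with a short sketch. The arithmetic uses the numbers quoted in the text (about three million images at $0.01 each); the "average hash" shown is one simple, well-known kind of image hash, chosen here as an assumption for illustration and not taken from the cited paper. The function names are hypothetical.

```python
def labelling_cost_usd(num_images, cents_per_image):
    """Cost in US dollars of paying humans to label every image once."""
    return num_images * cents_per_image / 100


def average_hash(pixels):
    """One bit per pixel: 1 if the pixel is brighter than the image mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if p > mean else 0 for p in flat)


def hamming(a, b):
    """Number of positions at which two hashes differ."""
    return sum(x != y for x, y in zip(a, b))


cost = labelling_cost_usd(3_000_000, 1)

original = [[10, 200], [220, 30]]                        # toy 2x2 grayscale image
brighter = [[p + 20 for p in row] for row in original]   # minor distortion
different = [[200, 10], [30, 220]]                       # a genuinely different image

same_distance = hamming(average_hash(original), average_hash(brighter))
other_distance = hamming(average_hash(original), average_hash(different))
```

A uniform brightness shift leaves the hash unchanged, so a lookup table keyed on such a hash would still recognize the repeated image, while a different image lands far away.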

Researchers at Google used image orientation and collaborative filtering as a CAPTCHA.[32] Generally speaking, people know which way is "up", but computers have a difficult time determining it for a broad range of images. Images were pre-screened so that their upright orientation would be difficult to detect automatically (e.g. no skies, no faces, no text). Images were also collaboratively filtered by showing a "candidate" image along with good images for the person to rotate. If there was a large variance in answers for the candidate image, it was deemed too hard for people as well and discarded. Currently, CAPTCHA creators recommend use of reCAPTCHA as the official implementation.[33] In September 2009, Google acquired reCAPTCHA to aid their book digitization efforts.[34]

CAPTCHA to advance artificial intelligence

Since CAPTCHAs are designed as tasks that only humans can perform, they can be used to collect training data to improve OCR and image recognition systems. The reCAPTCHA project advances the digitization of printed texts by using a pair of words that were difficult for an OCR system to identify as a CAPTCHA. One of the pair is a control word, whose text has already been identified by a human user, and the other is unknown. Where human responses correlate on an unknown word, it can be assumed correct for digitization purposes.
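The control-word scheme described above can be sketched as follows. This is a simplified illustration, not reCAPTCHA's actual implementation: the class name `WordPairChallenge` and the thresholds `min_votes` and `min_agreement` are assumptions introduced here.

```python
from collections import Counter, defaultdict


class WordPairChallenge:
    """Grade users on a known control word and collect votes on an unknown word.

    Illustrative sketch; the agreement thresholds are hypothetical.
    """

    def __init__(self, min_votes=3, min_agreement=0.75):
        self.min_votes = min_votes
        self.min_agreement = min_agreement
        self.votes = defaultdict(Counter)  # unknown word id -> transcription counts

    def submit(self, control_answer, control_truth, unknown_id, unknown_answer):
        """Pass or fail the user on the control word; count their vote only on a pass."""
        if control_answer.strip().lower() != control_truth.strip().lower():
            return False
        self.votes[unknown_id][unknown_answer.strip().lower()] += 1
        return True

    def consensus(self, unknown_id):
        """Return the agreed transcription, or None while responses do not yet correlate."""
        counts = self.votes.get(unknown_id)
        if not counts:
            return None
        total = sum(counts.values())
        word, top = counts.most_common(1)[0]
        if total >= self.min_votes and top / total >= self.min_agreement:
            return word
        return None


challenge = WordPairChallenge()
challenge.submit("upon", "upon", "scan-17", "morning")
challenge.submit("upon", "upon", "scan-17", "Morning")
early = challenge.consensus("scan-17")           # only two votes so far
rejected = challenge.submit("upno", "upon", "scan-17", "mourning")
challenge.submit("upon", "upon", "scan-17", "morning")
agreed = challenge.consensus("scan-17")
```

Only users who solve the control word contribute a vote, so a bot that fails the known word cannot pollute the transcription of the unknown one.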

References

  1. ^ a b Mori, Greg; Malik, Jitendra. "Breaking a Visual CAPTCHA". Simon Fraser University. http://www.cs.sfu.ca/~mori/research/gimpy/. Retrieved 2008-12-21. 
  2. ^ "Computer Literacy Tests: Are You Human?". Time (magazine). http://www.time.com/time/magazine/article/0,9171,1812084,00.html. Retrieved 2008-06-12. "The Carnegie Mellon team came back with the CAPTCHA. (It stands for "completely automated public Turing test to tell computers and humans apart"; no, the acronym doesn't really fit.) The point of the CAPTCHA is that reading those swirly letters is something that computers aren't very good at." 
  3. ^ "Latest Status of CAPTCHA Trademark Application". USPTO. 2008-04-21. http://tarr.uspto.gov/servlet/tarr?regser=serial&entry=78500434. Retrieved 2008-12-21. 
  4. ^ Moni Naor (July, 1996) (PS). Verification of a human in the loop or Identification via the Turing Test. http://www.wisdom.weizmann.ac.il/~naor/PAPERS/human.ps. Retrieved 2008-07-06. 
  5. ^ US Patent no. 6,195,698, "Method for selectively restricting access to computer systems".
  6. ^ Amrinder Arora (2007). "Statistics Hacking — Exploiting Vulnerabilities in News Websites" (PDF). International Journal of Computer Science and Network Security 7: 342–347. http://paper.ijcsns.org/07_book/200703/20070348.pdf. 
  7. ^ "Breaking CAPTCHAs Without Using OCR". Howard Yeend (pureMango.co.uk). 2005. http://www.puremango.co.uk/cm_breaking_captcha_115.php. Retrieved 2006-08-22. 
  8. ^ "Online services allow MD5 hashes to be cracked". http://milw0rm.com/cracker/list.php. Retrieved 2007-01-04. 
  9. ^ Kumar Chellapilla, Kevin Larson, Patrice Simard, Mary Czerwinski (2005) (PDF). Computers beat Humans at Single Character Recognition in Reading based Human Interaction Proofs (HIPs). Microsoft Research. http://www.ceas.cc/papers-2005/160.pdf. Retrieved 2006-08-02. 
  10. ^ Kluever, Kurt (May 12, 2008). "Breaking the PayPal CAPTCHA". Kloover.com. http://www.kloover.com/2008/05/12/breaking-the-paypalcom-captcha/. Retrieved 2008-12-21. 
  11. ^ Kluever, Kurt (February 28, 2008). "Breaking ASP Security Image Generator". Kloover.com. http://www.kloover.com/2008/02/28/breaking-the-asp-security-image-generator/. Retrieved 2008-12-21. 
  12. ^ Hocevar, Sam. "PWNtcha - captcha decoder". Sam.zoy.org. http://sam.zoy.org/pwntcha/. Retrieved 2008-12-21. 
  13. ^ Sergei, Kruglov. "Defeating of some weak CAPTCHAs". Captcha.ru. http://www.captcha.ru/en/breakings/. Retrieved 2008-12-21. 
  14. ^ "Network Security Research and AI". http://network-security-research.blogspot.com/. Retrieved 2008-12-21. 
  15. ^ Dawson (2008-04-15). "Windows Live Hotmail CAPTCHA Cracked, Exploited". Slashdot (SourceForge). http://tech.slashdot.org/article.pl?sid=08/04/15/1941236&from=rss. Retrieved 2008-04-16. 
  16. ^ Dawson (2008-02-26). "Gmail CAPTCHA Cracked". Slashdot (SourceForge). http://it.slashdot.org/article.pl?sid=08/02/27/0045242. Retrieved 2008-04-16. 
  17. ^ Keizer, Gregg (2008-02-07). "Spammers' bot cracks Microsoft's CAPTCHA: Bot beats Windows Live Mail's registration test 30% to 35% of the time, says Websense". Computerworld.
  18. ^ Prasad, Sumeet (2008-02-22). "Google’s CAPTCHA busted in recent spammer tactics". Websense. http://www.websense.com/securitylabs/blog/blog.php?BlogID=174. Retrieved 2008-12-21. 
  19. ^ Jeff Yan; Ahmad Salah El Ahmad (April 13, 2008) (PDF). A Low-cost Attack on a Microsoft CAPTCHA. School of Computing Science, Newcastle University, UK. http://homepages.cs.ncl.ac.uk/jeff.yan/msn_draft.pdf. Retrieved 2008-12-21. 
  20. ^ Doctorow, Cory (2004-01-27). "Solving and creating CAPTCHAs with free porn". Boing Boing. http://www.boingboing.net/2004/01/27/solving_and_creating.html. Retrieved 2006-08-22. 
  21. ^ Robertson, Jordan (2007-11-01). "Scams Use Striptease to Break Web Traps". San Jose, California. Archived from the original on 2007-11-06. http://web.archive.org/web/20071106170737/http://ap.google.com/article/ALeqM5jnNrQKxFzt7mPu3DZcP7_UWr8UfwD8SKE6Q80. 
  22. ^ Vaas, Lisa (2007-11-01). "Striptease Used to Recruit Help in Cracking Sites". PC Magazine. http://www.pcmag.com/article2/0,2704,2210674,00.asp. Retrieved 2008-12-21. 
  23. ^ http://www.captcha.net/
  24. ^ "Spam filtering services throttle Gmail to fight spammers". 2008-04-10. http://www.theregister.co.uk/2008/04/10/web_mail_throttled/. Retrieved 2008-04-10. 
  25. ^ Ulanoff, Lance (October 31, 2007). "Deep-Sixing CAPTCHA". PC Magazine. Ziff Davis Media. http://www.pcmag.com/article2/0,2704,2209782,00.asp. Retrieved 2007-12-12. 
  26. ^ "TicketMaster v. RMG". http://www.scribd.com/doc/404395/ticketmaster-v-rmg. 
  27. ^ a b The Cutest Human-Test: KittenAuth from ThePCSpy.com
  28. ^ David (June 04, 2008). "Attached to a Captcha". randomwire.com. http://www.randomwire.com/2008/06/04/attached-to-a-captcha/. Retrieved 2008-12-21. 
  29. ^ Asirra from Microsoft Research (PDF)
  30. ^ Golle, Philippe. Machine Learning Attacks Against the Asirra CAPTCHA. Stanford Crypto. http://crypto.stanford.edu/~pgolle/papers/dogcat.html. Retrieved 2008-12-21. 
  31. ^ Asirra: A CAPTCHA that Exploits Interest-Aligned Manual Image Categorization from Microsoft Research (PDF)
  32. ^ Gossweiler, Rich; Kamvar, Maryam; Baluja, Shumeet. "What’s Up CAPTCHA? A CAPTCHA Based On Image Orientation". WWW '09.
  33. ^ "CAPTCHA homepage". Captcha.net. http://www.captcha.net/. Retrieved 2009-12-04. 
  34. ^ "Teaching computers to read: Google acquires reCAPTCHA". 2009. http://googleblog.blogspot.com/2009/09/teaching-computers-to-read-google.html. Retrieved 2009-09-16. 

Characteristics

A CAPTCHA is a means of automatically generating challenges which are intended to:

  • Provide a problem easy enough for all humans to solve.
  • Prevent standard automated software from filling out a form, unless it is specially designed to circumvent specific CAPTCHA systems.

A check box in a form that reads "check this box please" is the simplest (and perhaps least effective) form of a CAPTCHA. CAPTCHAs do not have to rely on difficult problems in artificial intelligence, although they can.

In the short term, this has the benefit of distinguishing humans from computers. In the long term, it creates an incentive to advance the state of AI.

Applications

CAPTCHAs are used to prevent automated software from performing actions which degrade the quality of service of a given system, whether due to abuse or resource expenditure. CAPTCHAs can be deployed to protect systems vulnerable to e-mail spam, such as the webmail services of Gmail, Hotmail, and Yahoo! Mail.

CAPTCHAs are used to stop automated posting to blogs, forums and wikis, whether as a result of commercial promotion, or harassment and vandalism. CAPTCHAs also serve an important function in rate limiting. Automated usage of a service might be desirable until such usage is done to excess and to the detriment of human users. In such cases, administrators can use CAPTCHA to enforce automated usage policies based on given thresholds. The article rating systems used by many news web sites are another example of an online facility vulnerable to manipulation by automated software.[4]

As of 2010, most CAPTCHAs display distorted text that is difficult to read by character recognition software. The alternative implementations[5] may include various tests, such as identifying an object that does not belong in a particular set of objects, locating the center of a distorted image, or identifying distorted shapes.

Accessibility

Because CAPTCHAs rely on visual perception, users unable to view a CAPTCHA (for example, due to a disability or because it is difficult to read) will be unable to perform the task protected by a CAPTCHA. Therefore, sites implementing CAPTCHAs may provide an audio version of the CAPTCHA in addition to the visual method. The official CAPTCHA site recommends providing an audio CAPTCHA for accessibility reasons, but an audio CAPTCHA is not usable by deafblind people or by users of text web browsers. The combination of audio and visual CAPTCHAs is not universally adopted: most websites (including Wikipedia) offer only the visual CAPTCHA, with or without the option of generating a new image if one is too difficult to read.[citation needed]

Attempts at more accessible CAPTCHAs

Even an audio and visual CAPTCHA will require manual intervention for some users, such as those who have visual disabilities and also are deaf. There have been various attempts at creating CAPTCHAs that are more accessible. Attempts include the use of JavaScript, mathematical questions ("what is 1+1"), or "common sense" questions ("what color is the sky on a clear day"). However these types of solutions do not meet two criteria for successful CAPTCHA tests: they are not automatically generated and they do not present a new problem or test to meet each attack.

Circumvention

There are a few approaches to defeating CAPTCHAs:

  • exploiting bugs in the implementation that allow the attacker to completely bypass the CAPTCHA,
  • improving character recognition software, or
  • using cheap human labor to process the tests (see below).

Insecure implementation

Like any security system, design flaws in a system implementation can prevent the theoretical security from being realized. Many CAPTCHA implementations, especially those which have not been designed and reviewed by experts in the fields of security, are prone to common attacks.

Some CAPTCHA protection systems can be bypassed without using OCR simply by re-using the session ID of a known CAPTCHA image. A correctly designed CAPTCHA does not allow multiple solution attempts at one CAPTCHA. This prevents the reuse of a correct CAPTCHA solution or making a second guess after an incorrect OCR attempt.[6] Other CAPTCHA implementations use a hash (such as an MD5 hash) of the solution as a key passed to the client to validate the CAPTCHA. Often the CAPTCHA is of small enough size that this hash could be cracked.[7] Further, the hash could assist an OCR based attempt. A more secure scheme would use an HMAC. Finally, some implementations use only a small fixed pool of CAPTCHA images. Eventually, when enough CAPTCHA image solutions have been collected by an attacker over a period of time, the CAPTCHA can be broken by simply looking up solutions in a table, based on a hash of the challenge image.

Computer character recognition

A number of research projects have attempted (often with success) to beat visual CAPTCHAs by creating programs that contain the following functionality:

  1. Pre-processing: Removal of background clutter and noise.
  2. Segmentation: Splitting the image into regions which each contain a single character.
  3. Classification: Identifying the character in each region.

Steps 1 and 3 are easy tasks for computers.[8] The only step where humans still outperform computers is segmentation. If the background clutter consists of shapes similar to letter shapes, and the letters are connected by this clutter, the segmentation becomes nearly impossible with current software. Hence, an effective CAPTCHA should focus on the segmentation.

Several research projects have broken real world CAPTCHAs, including one of Yahoo's early CAPTCHAs called "EZ-Gimpy"[1] and the CAPTCHA used by popular sites such as PayPal,[9] LiveJournal, phpBB, and other services.[10][11][12] In January 2008 Network Security Research released their program for automated Yahoo! CAPTCHA recognition.[13] Windows Live Hotmail and Gmail, the other two major free email providers, were cracked shortly after.[14][15]

In February 2008 it was reported that spammers had achieved a success rate of 30% to 35%, using a bot, in responding to CAPTCHAs for Microsoft's Live Mail service[16] and a success rate of 20% against Google's Gmail CAPTCHA.[17] A Newcastle University research team has defeated the segmentation part of Microsoft's CAPTCHA with a 90% success rate, and claim that this could lead to a complete crack with a greater than 60% rate.[18]

Human solvers

CAPTCHA is vulnerable to a relay attack that uses humans to solve the puzzles. One approach involves relaying the puzzles to a group of human operators who can solve CAPTCHAs. In this scheme, a computer fills out a form and when it reaches a CAPTCHA, it gives the CAPTCHA to the human operator to solve.

Spammers pay about $0.80 to $1.20 for each 1,000 solved captchas to companies employing human solvers in Bangladesh, China and India.[19]

Another approach involves copying the CAPTCHA images and using them as CAPTCHAs for a high-traffic site owned by the attacker. With enough traffic, the attacker can get a solution to the CAPTCHA puzzle in time to relay it back to the target site.[20] In October 2007, a piece of malware appeared in the wild which enticed users to solve CAPTCHAs in order to see progressively further into a series of striptease images.[21][22] A more recent view is that this is unlikely to work due to unavailability of high-traffic sites and competition by similar sites.[23]

These methods have been used by spammers to set up thousands of accounts on free email services such as Gmail and Yahoo!.[24] Since Gmail and Yahoo! are unlikely to be blacklisted by anti-spam systems, spam sent through these compromised accounts is less likely to be blocked.

Legal concerns

The circumvention of CAPTCHAs may violate the anti-circumvention clause of the Digital Millennium Copyright Act (DMCA) in the United States. In 2007, Ticketmaster sued software maker RMG Technologies[25] for its product which circumvented the ticket seller's CAPTCHAs on the basis that it violated the anti-circumvention clause of the DMCA. In October 2007, an injunction was issued stating that Ticketmaster would likely succeed in making its case.[26] In June 2008, Ticketmaster filed for Default Judgment against RMG. The Court granted Ticketmaster the Default and entered an $18.2M judgment in favor of Ticketmaster.

Image-recognition CAPTCHAs

Some researchers (e.g., Professor James Z. Wang of Penn State University) promote image recognition CAPTCHAs as a possible alternative for text-based CAPTCHAs. In 2005, the Penn State research team published a research paper on their IMAGINATION CAPTCHA system. The system uses carefully designed randomized distortions of images to prevent automatic attacks based on broad-concept image recognition systems such as the ALIPR (Automatic Linguistic Indexing of Pictures - Real Time) system. The idea is that computer-based recognition algorithms require the extraction of color, texture, shape, or special point features, which cannot be correctly extracted after the designed distortions. Humans, however, can still recognize the original concept depicted in the images even with these distortions.

A recent example of image recognition CAPTCHA is to present the website visitor with a grid of random pictures and instruct the visitor to click on specific pictures to verify that they are not a bot (such as “Click on the pictures of the airplane, the boat and the clock”).
An example of an image recognition CAPTCHA from Confident Technologies

Image recognition CAPTCHAs face many potential problems which have not been fully studied. It is difficult for a small site to acquire a large dictionary of images which an attacker does not have access to and without a means of automatically acquiring new labelled images, an image based challenge does not usually meet the definition of a CAPTCHA. KittenAuth, by default, only had 42 images in its database.[27] Microsoft's "Asirra," which it is providing as a free web service, attempts to address this by means of Microsoft Research's partnership with Petfinder.com, which has provided it with more than three million images of cats and dogs, classified by people at thousands of US animal shelters.[28] Researchers claim to have written a program that can break the Microsoft Asirra CAPTCHA.[29] The IMAGINATION CAPTCHA, however, uses a sequence of randomized distortions on the original images to create the CAPTCHA images. Their original images can be made public without risking image-retrieval or image-annotation based attacks.

Human solvers are a potential weakness for strategies such as Asirra. If the database of cat and dog photos can be downloaded, then paying workers $0.01 to classify each photo as either a dog or a cat means that almost the entire database of photos can be labelled for $30,000. Photos that are subsequently added to the Asirra database then form a relatively small data set that can be classified as they first appear. Making minor changes to images each time they appear will not prevent a computer from recognizing a repeated image, because there are robust image comparator functions (e.g., image hashes, color histograms) that are insensitive to many simple image distortions. Warping an image sufficiently to fool a computer will likely also be troublesome to a human.[30]
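The two claims above can be illustrated with a small sketch. It is not the method used by any actual attacker. The function names, the 4x4 test image, and the use of a simplified "average hash" (one bit per pixel, set when the pixel is brighter than the image mean) are all assumptions made for illustration, and three million is used as a lower bound for the database size mentioned earlier.

```python
from typing import List

Image = List[List[int]]  # grayscale pixels in the range 0-255 (illustrative type)


def labeling_cost(num_images: int, price_per_label: float) -> float:
    """Total cost of paying a fixed price to label every image once."""
    return num_images * price_per_label


def average_hash(img: Image) -> int:
    """Pack one bit per pixel: 1 if the pixel is brighter than the image mean."""
    pixels = [p for row in img for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def brighten(img: Image, delta: int) -> Image:
    """A simple distortion: add a constant to every pixel, clamped to 0-255."""
    return [[max(0, min(255, p + delta)) for p in row] for row in img]


# Hypothetical 4x4 image with bright and dark quadrants.
original = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [220, 230, 30, 40],
    [225, 235, 35, 45],
]
distorted = brighten(original, 12)               # same picture, slightly brighter
different = [list(reversed(row)) for row in original]  # mirrored: a different layout

print(labeling_cost(3_000_000, 0.01))            # about 30,000 dollars
print(hamming(average_hash(original), average_hash(distorted)))
print(hamming(average_hash(original), average_hash(different)))
```

A uniform brightness shift moves every pixel and the mean by the same amount, so the hash of the distorted copy matches the original exactly, while the mirrored image differs in every bit. Real comparators work on downscaled photos and tolerate a wider range of distortions, but the principle is the same.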

Researchers at Google used image orientation and collaborative filtering as a CAPTCHA.[31] Generally speaking, people know which way is "up", but computers have difficulty determining it for a broad range of images. Images were pre-screened to select those whose upright orientation is difficult to detect automatically (e.g., no skies, no faces, no text). Images were also collaboratively filtered by showing a "candidate" image along with good images for the person to rotate. If there was a large variance in answers for the candidate image, it was deemed too hard for people as well and discarded.
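The variance check described above might look something like the following sketch. The source does not say how the variance was measured or what cutoff was used, so the circular variance statistic, the function names, and the 0.1 threshold here are assumptions chosen for illustration. A circular statistic is used because rotation angles wrap around (358 degrees and 2 degrees are nearly the same answer).

```python
import math
from typing import Sequence


def circular_variance(angles_deg: Sequence[float]) -> float:
    """Return 1 - R, where R is the mean resultant length of the angles.

    0.0 means every answer agrees; values near 1.0 mean the answers are scattered.
    """
    n = len(angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg) / n
    s = sum(math.sin(math.radians(a)) for a in angles_deg) / n
    return 1.0 - math.hypot(c, s)


def keep_candidate(angles_deg: Sequence[float], max_variance: float = 0.1) -> bool:
    """Keep a candidate image only if human answers agree closely.

    max_variance is a hypothetical threshold, not a value from the paper.
    """
    return circular_variance(angles_deg) <= max_variance


# Answers (in degrees) from people asked to rotate each image upright.
agreeing = [358, 2, 0, 5, 355]      # everyone picks roughly the same "up"
scattered = [0, 90, 180, 270, 45]   # no agreement: too hard for people

print(keep_candidate(agreeing))
print(keep_candidate(scattered))
```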

Many users of the phpBB forum software (which has suffered greatly from spam) have implemented an open source image recognition CAPTCHA system in the form of an add-on called KittenAuth.[27] In its default form, it presents a question requiring the user to select a stated type of animal from an array of thumbnail images of assorted animals. The images (and the challenge questions) can be customized, for example to present questions and images which would be easily answered by the forum's target userbase. Furthermore, for a time, RapidShare free users had to get past a CAPTCHA in which they had to enter only the letters attached to a cat, while other letters were attached to dogs.[32] This was later removed because legitimate users had trouble entering the correct letters.

Currently, CAPTCHA creators recommend use of reCAPTCHA as the official implementation.[33] In September 2009, Google acquired reCAPTCHA to aid its book digitization efforts.[34] However, in August 2010 it was reported that this CAPTCHA had been cracked with a 30% success rate.

References

  1. ^ a b Mori, Greg; Malik, Jitendra. "Breaking a Visual CAPTCHA". Simon Fraser University. http://www.cs.sfu.ca/~mori/research/gimpy/. Retrieved 2008-12-21. 
  2. ^ Grossman, Lev (2008-06-05). "Computer Literacy Tests: Are You Human?". Time (magazine). http://www.time.com/time/magazine/article/0,9171,1812084,00.html. Retrieved 2008-06-12. "The Carnegie Mellon team came back with the CAPTCHA. (It stands for "completely automated public Turing test to tell computers and humans apart"; no, the acronym doesn't really fit.) The point of the CAPTCHA is that reading those swirly letters is something that computers aren't very good at." 
  3. ^ "Latest Status of CAPTCHA Trademark Application". USPTO. 2008-04-21. http://tarr.uspto.gov/servlet/tarr?regser=serial&entry=78500434. Retrieved 2008-12-21. 
  4. ^ Amrinder Arora (2007). "Statistics Hacking — Exploiting Vulnerabilities in News Websites" (PDF). International Journal of Computer Science and Network Security 7: 342–347. http://paper.ijcsns.org/07_book/200703/20070348.pdf. 
  5. ^ Wagner N.R (2003). Verifying the Presence of Humans: Three New CAPTCHAs.
  6. ^ "Breaking CAPTCHAs Without Using OCR". Howard Yeend (pureMango.co.uk). 2005. http://www.puremango.co.uk/cm_breaking_captcha_115.php. Retrieved 2006-08-22. 
  7. ^ "Online services allow MD5 hashes to be cracked". http://milw0rm.com/cracker/list.php. Retrieved 2007-01-04. 
  8. ^ Kumar Chellapilla, Kevin Larson, Patrice Simard, Mary Czerwinski (2005) (PDF). Computers beat Humans at Single Character Recognition in Reading based Human Interaction Proofs (HIPs). Microsoft Research. Archived from the original on 2006-06-13. http://web.archive.org/web/20060613111749/http://www.ceas.cc/papers-2005/160.pdf. Retrieved 2006-08-02. 
  9. ^ Kluever, Kurt (May 12, 2008). "Breaking the PayPal CAPTCHA". Kloover.com. http://www.kloover.com/2008/05/12/breaking-the-paypalcom-captcha/. Retrieved 2008-12-21. 
  10. ^ Kluever, Kurt (February 28, 2008). "Breaking ASP Security Image Generator". Kloover.com. http://www.kloover.com/2008/02/28/breaking-the-asp-security-image-generator/. Retrieved 2008-12-21. 
  11. ^ Hocevar, Sam. "PWNtcha - captcha decoder". Sam.zoy.org. http://sam.zoy.org/pwntcha/. Retrieved 2008-12-21. 
  12. ^ Kruglov, Sergei. "Defeating of some weak CAPTCHAs". Captcha.ru. http://www.captcha.ru/en/breakings/. Retrieved 2008-12-21. 
  13. ^ "Network Security Research and AI". http://network-security-research.blogspot.com/. Retrieved 2008-12-21. 
  14. ^ Dawson (2008-04-15). "Windows Live Hotmail CAPTCHA Cracked, Exploited". Slashdot (SourceForge). http://tech.slashdot.org/article.pl?sid=08/04/15/1941236&from=rss. Retrieved 2008-04-16. 
  15. ^ Dawson (2008-02-26). "Gmail CAPTCHA Cracked". Slashdot (SourceForge). http://it.slashdot.org/article.pl?sid=08/02/27/0045242. Retrieved 2008-04-16. 
  16. ^ Keizer, Gregg (February 7, 2008). "Spammers' bot cracks Microsoft's CAPTCHA: Bot beats Windows Live Mail's registration test 30% to 35% of the time, says Websense". Computerworld. 
  17. ^ Prasad, Sumeet (2008-02-22). "Google’s CAPTCHA busted in recent spammer tactics". Websense. Archived from the original on 2008-08-22. http://web.archive.org/web/20080822032312/http://www.websense.com/securitylabs/blog/blog.php?BlogID=174. Retrieved 2008-12-21. 
  18. ^ Jeff Yan; Ahmad Salah El Ahmad (April 13, 2008) (PDF). A Low-cost Attack on a Microsoft CAPTCHA. School of Computing Science, Newcastle University, UK. http://homepages.cs.ncl.ac.uk/jeff.yan/msn_draft.pdf. Retrieved 2008-12-21. 
  19. ^ Bajaj, Vikas (April 25, 2010). "Spammers Pay Others to Answer Security Tests". The New York Times. http://www.nytimes.com/2010/04/26/technology/26captcha.html?src=me&ref=technology. Retrieved 2010-04-28. 
  20. ^ Doctorow, Cory (2004-01-27). "Solving and creating CAPTCHAs with free porn". Boing Boing. http://www.boingboing.net/2004/01/27/solving_and_creating.html. Retrieved 2006-08-22. 
  21. ^ Robertson, Jordan (2007-11-01). "Scams Use Striptease to Break Web Traps". San Jose, California. Archived from the original on 2007-11-06. http://web.archive.org/web/20071106170737/http://ap.google.com/article/ALeqM5jnNrQKxFzt7mPu3DZcP7_UWr8UfwD8SKE6Q80. 
  22. ^ Vaas, Lisa (2007-11-01). "Striptease Used to Recruit Help in Cracking Sites". PC Magazine. http://www.pcmag.com/article2/0,2704,2210674,00.asp. Retrieved 2008-12-21. 
  23. ^ Captcha.net
  24. ^ "Spam filtering services throttle Gmail to fight spammers". 2008-04-10. http://www.theregister.co.uk/2008/04/10/web_mail_throttled/. Retrieved 2008-04-10. 
  25. ^ Ulanoff, Lance (October 31, 2007). "Deep-Sixing CAPTCHA". PC Magazine. Ziff Davis Media. http://www.pcmag.com/article2/0,2704,2209782,00.asp. Retrieved 2007-12-12. 
  26. ^ "TicketMaster v. RMG". http://www.scribd.com/doc/404395/ticketmaster-v-rmg. 
  27. ^ a b The Cutest Human-Test: KittenAuth from ThePCSpy.com
  28. ^ Asirra from Microsoft Research (PDF)
  29. ^ Golle, Philippe. Machine Learning Attacks Against the Asirra CAPTCHA. Stanford Crypto. http://crypto.stanford.edu/~pgolle/papers/dogcat.html. Retrieved 2008-12-21. 
  30. ^ Asirra: A CAPTCHA that Exploits Interest-Aligned Manual Image Categorization from Microsoft Research (PDF)
  31. ^ What’s Up CAPTCHA? A CAPTCHA Based On Image Orientation from WWW'09 by Rich Gossweiler, Maryam Kamvar, and Shumeet Baluja
  32. ^ David (June 4, 2008). "Attached to a Captcha". randomwire.com. http://www.randomwire.com/2008/06/04/attached-to-a-captcha/. Retrieved 2008-12-21. 
  33. ^ "CAPTCHA homepage". Captcha.net. http://www.captcha.net/. Retrieved 2009-12-04. 
  34. ^ "Teaching computers to read: Google acquires reCAPTCHA". 2009. http://googleblog.blogspot.com/2009/09/teaching-computers-to-read-google.html. Retrieved 2009-09-16. 

Abbreviated as TN, a Turing number is a randomly generated security code, usually a series of digits, displayed as an image that users must read and copy into a form field in order to submit or validate a form online via a web browser. Turing numbers are used to ensure that a submission comes from a human user rather than an automated program (bot). They are commonly used on e-commerce, promotional, and contest web sites, or anywhere else there is a need to block automated submissions.
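As a rough sketch of the server-side half of this scheme, the code below generates a random digit string and checks a submitted value against it. The function names and the six-digit default length are assumptions for illustration. Rendering the code as a distorted image, which is the part that actually deters bots, is omitted here.

```python
import hmac
import secrets

DIGITS = "0123456789"


def generate_turing_number(length: int = 6) -> str:
    """Return a random string of digits from a cryptographically secure source."""
    return "".join(secrets.choice(DIGITS) for _ in range(length))


def is_valid_submission(expected: str, submitted: str) -> bool:
    """Compare the user's input to the stored code, ignoring surrounding spaces.

    hmac.compare_digest does a constant-time comparison; both values are
    encoded to bytes so that non-ASCII input is rejected rather than raising.
    """
    return hmac.compare_digest(expected.encode(), submitted.strip().encode())


# The server would store `code` in the user's session and show it as an image.
code = generate_turing_number()
print(len(code), code.isdigit())
print(is_valid_submission(code, " " + code + " "))
```

In practice the stored code would also be single-use and would expire after a short time, so that a solved value cannot be replayed.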
