U.S. patent application number 15/145563 was filed with the patent office on 2016-05-03 and published on 2016-08-25 for comprehensive human computation framework.
The applicant listed for this patent is MICROSOFT TECHNOLOGY LICENSING, LLC. Invention is credited to Rui Guo, Shipeng Li, Linjun Yang, Yang Yang, and Bin Benjamin Zhu.
United States Patent Application 20160247070
Kind Code: A1
Li; Shipeng; et al.
August 25, 2016
COMPREHENSIVE HUMAN COMPUTATION FRAMEWORK
Abstract
Technologies for a human computation framework suitable for answering common sense questions that are difficult for computers to answer but easy for humans to answer. The technologies support solving general common sense problems without a priori knowledge of the problems; determining whether an answer is from a bot or a human so as to screen out spurious answers from bots; distilling answers collected from human users to ensure high-quality solutions to the questions asked; and preventing malicious elements in or out of the system from attacking other system elements or contaminating the solutions produced by the system, as well as preventing users from being compensated without contributing answers.
Inventors: Li; Shipeng (Palo Alto, CA); Yang; Yang (Hefei, CN); Zhu; Bin Benjamin (Edina, MN); Guo; Rui (Beijing, CN); Yang; Linjun (Beijing, CN)

Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC (Redmond, WA, US)

Family ID: 42118468

Appl. No.: 15/145563

Filed: May 3, 2016
Related U.S. Patent Documents

Application Number | Filing Date | Patent Number | Continued By
13666814 | Nov 1, 2012 | | 15145563
12258991 | Oct 27, 2008 | 8315964 | 13666814
Current U.S. Class: 1/1

Current CPC Class: G06N 3/126 20130101; H04L 63/1416 20130101; G06N 3/12 20130101; G06F 2221/2133 20130101; G06N 5/022 20130101; G06N 5/04 20130101

International Class: G06N 5/02 20060101 G06N005/02; H04L 29/06 20060101 H04L029/06
Claims
1. A method performed on a computing device, the method comprising:
selecting, by the computing device, a common-sense problem from a
first source; receiving, by the computing device, answers to the
common-sense problem from a second source; identifying, by the
computing device, any of the received answers that are arbitrary
answers; removing, by the computing device, the identified
arbitrary answers from the received answers; and designating, by
the computing device in response to the removing, as final answers
any remaining received answers.
2. The method of claim 1 further comprising sending, in response to
the designating, the final answers to the first source.
3. The method of claim 1 where the computing device is configured
for performing the method without a priori knowledge of the
common-sense problem.
4. The method of claim 1 further comprising inhibiting compensation
to a source that does not contribute an answer to the common-sense
problem.
5. The method of claim 1 where the first source and the second
source are the same source.
6. The method of claim 1 where the second source comprises at least
one human.
7. The method of claim 1 where the identifying the arbitrary
answers is based on modeling the arbitrary answers as a uniform
distribution.
8. A computing device comprising: memory; a processor coupled to the memory and via which the computing device: orders answers according to their frequency of occurrence; determines a relative distance for each neighboring pair of the ordered answers, the relative distance based on the frequency of occurrence of each ordered answer of the neighboring pair; and designates as final answers any of the ordered answers that have a frequency of occurrence that is greater than a frequency of occurrence of an ordered answer of a neighboring pair that has a greatest relative distance of the neighboring pairs.
9. The computing device of claim 8 where the relative distance is
determined based on calculating a slope.
10. The computing device of claim 8 where the answers are directed
to labeling an image.
11. The computing device of claim 10 where the labeling comprises a
process including a plurality of refining stages.
12. The computing device of claim 11 where the plurality of
refining stages comprise collecting candidate labels.
13. The computing device of claim 12 where the plurality of
refining stages comprise further refining the candidate labels
based on multiple choices.
14. The computing device of claim 12 where the plurality of
refining stages comprise further refining based on locating an
object in the image that corresponds to at least one of the refined
candidate labels.
15. At least one computer storage device that comprises computer-executable instructions that, based on execution by a computing device, configure the computing device to perform actions comprising: ordering, by the computing device, answers according to their frequency of occurrence; determining, by the computing device, a relative distance for each neighboring pair of the ordered answers, the relative distance based on the frequency of occurrence of each ordered answer of the neighboring pair; and designating, by the computing device, as final answers any of the ordered answers that have a frequency of occurrence that is greater than a frequency of occurrence of an ordered answer of a neighboring pair that has a greatest relative distance of the neighboring pairs.
16. The at least one computer storage device of claim 15 where the
determining the relative distance comprises calculating a
slope.
17. The at least one computer storage device of claim 15 where the
answers are directed to labeling an image.
18. The at least one computer storage device of claim 17 where the
labeling comprises a process including a plurality of refining
stages.
19. The at least one computer storage device of claim 18 where the
plurality of refining stages comprise collecting candidate
labels.
20. The at least one computer storage device of claim 19 where the
plurality of refining stages comprise further refining the
candidate labels based on multiple choices, and further refining
based on locating an object in the image that corresponds to at
least one of the refined candidate labels.
Description
RELATED APPLICATION(S)
[0001] This application is a Continuation of and claims benefit
from U.S. patent application Ser. No. 13/666,814 that was filed
Nov. 1, 2012, and that is a Continuation of U.S. patent application
Ser. No. 12/258,991 (U.S. Pat. No. 8,315,964), filed Oct. 27, 2008
(issued Nov. 20, 2012), each of which is incorporated herein by
reference in its entirety.
BACKGROUND
[0002] Certain types of problems are difficult for computing
systems to solve. For example, image and video labeling is
important for computers to understand images and videos and for
image and video search. But automatically labeling images and
videos is a hard problem for computers to solve on their own. Yet it is a fairly simple task for humans based on "common sense", although such manual labeling may be tedious and costly. Thus there
may be advantages to combining humans and computers to solve
certain "common sense" problems that would otherwise be very
difficult for computers alone and very costly for humans alone.
SUMMARY
[0003] The following presents a simplified summary of the
disclosure in order to provide a basic understanding to the reader.
This summary is not an extensive overview of the disclosure and it
does not identify key/critical elements of the invention or
delineate the scope of the invention. Its sole purpose is to
present some concepts disclosed herein in a simplified form as a
prelude to the more detailed description that is presented
later.
[0004] The present examples provide technologies for a human computation framework suitable for answering common sense questions that are difficult for computers to answer but easy for humans to answer. The technologies support solving general common sense problems without a priori knowledge of the problems; determining whether an answer is from a bot or a human so as to screen out spurious answers from bots; distilling answers collected from human users to ensure high-quality solutions to the questions asked; and preventing malicious elements in or out of the system from attacking other system elements or contaminating the solutions produced by the system, as well as preventing users from being compensated without contributing answers.
[0005] Many of the attendant features will be more readily
appreciated as the same become better understood by reference to
the following detailed description considered in connection with
the accompanying drawings.
DESCRIPTION OF THE DRAWINGS
[0006] The present description will be better understood from the
following detailed description considered in connection with the
accompanying drawings, wherein:
[0007] FIG. 1 is a block diagram showing an example human
computation system typically referred to herein as HumanSense.
[0008] FIG. 2 is a block diagram showing example modules and
interactions of an example human computation system.
[0009] FIG. 3 is a block diagram showing an example method for
labeling images performed using an example human computation
system.
[0010] FIG. 4 is a block diagram showing an example computing
environment in which the technologies described herein may be
implemented.
[0011] Like reference numerals are used to designate like parts in
the accompanying drawings.
DETAILED DESCRIPTION
[0012] The detailed description provided below in connection with
the accompanying drawings is intended as a description of the
present examples and is not intended to represent the only forms in
which the present examples may be constructed or utilized. The
description sets forth at least some of the functions of the
examples and/or the sequence of steps for constructing and
operating examples. However, the same or equivalent functions and
sequences may be accomplished by different examples.
[0013] Although the present examples are described and illustrated
herein as being implemented in a computing and networking
environment, the technologies described are provided as examples
and not limitations. As those skilled in the art will appreciate,
the present examples are suitable for application in a variety of
different types of computing and networking environments.
[0014] FIG. 1 is a block diagram showing an example human
computation system ("HCS") 100 typically referred to herein as
HumanSense. An HCS such as HCS 100 typically includes four element types: problem provider 110, HumanSense server ("HSS") 120,
participating web site 130, and users 140, all typically coupled
via some network or the like. In one example, such a network may be
the Internet. One or more of each of the foregoing elements may be
included in an HCS.
[0015] A computational process that involves humans in performing
certain steps is generally called "human-based computation", or
simply "human computation". Such a system leverages differences in
abilities and costs between humans and computers to achieve
symbiotic human-computer interaction. HCS 100 is a framework that
employs human computation to solve general common sense problems
efficiently. The framework supports a range of viable business
models, and can scale up to meet the demand of a large number of
common sense problems. A hosting web site or the like can be either
large with heavy traffic or small with limited visitors so that
every user can contribute. Such a system can be deployed at the
entrance to web-based services such as web email services, software
downloading services, etc. Such a system may also support a profit
sharing ecosystem that motivates users to offer their solutions to
problems in exchange for some form of compensation. The term
"common sense problem" as used herein typically refers to a problem
that is difficult for a computer to solve, but that may be fairly
easy for a human to solve. One example of such a common sense
problem is the identification of objects in a scene, image, or video,
or the like--this can be very difficult for a computer but is
generally a simple common sense problem for a human to solve. Other
such common sense problems may include identifying sounds;
identifying human speakers or the like, or distinguishing between
speakers; identifying or classifying smells or tastes; classifying
music or the like; and so forth. Many other types of common sense
problems may also benefit from an HCS system.
[0016] The HCS 100 framework provides several technical advantages
that are novel and unique to human computation schemes, including
but not limited to: support for solving general common sense
problems without a priori knowledge of the problems or questions to
be asked (that is, the system is problem-agnostic); support for
determining whether an answer is from a bot or human so as to
screen out spurious answers from bots; support for distilling
answers collected from human users to ensure high quality solutions
to the questions asked; and support for preventing malicious
elements in or out of the system from attacking other system
elements or contaminating the solutions produced by the system, and
preventing users from being compensated without contributing
answers.
[0017] HCS 100 typically provides a general human computation
framework that binds together problem providers, web sites or the
like, and users to solve large-scale common sense problems
efficiently and economically, the binding provided by one or more
HumanSense servers. The framework addresses technical challenges
such as preventing a malicious party from attacking others,
removing answers provided by bots, and distilling human answers to
produce high-quality solutions to the problems. In one example
described in connection with FIG. 3, the HCS 100 framework is
applied to labeling images.
[0018] Problem provider 110 typically provides common sense
problems that need to be solved with HumanSense. Answers from an
HSS responsive to the provided problems are typically sent back to
the problem provider. A problem provider and/or its administrator or the like may offer some form of compensation including money, souvenirs, free services, or anything else valuable to compensate the other elements or parties of the HCS for their contribution to solving the problems. An HCS may include one or more problem providers. In one example, a problem provider is a computer,
server, web server, web service, or the like executing problem
provider software. One example of such a computer is provided in
connection with FIG. 4.
[0019] HumanSense server ("HSS") 120 typically selects problems
provided by problem provider 110 and sends the selected problems to
participating web sites, such as web site 130, fetches users'
answers, analyzes them to produce solutions to the problems, and
sends these answers back to problem provider 110. An HCS may
include one or more HumanSense Servers. In one example, an HSS is a
computer, server, web server, web service, or the like executing
HSS software. One example of such a computer is provided in
connection with FIG. 4.
[0020] Participating web sites, such as example web site 130, receive problems from HSS 120 and present each of these problems to users to answer. In one example, such web sites are various
Internet sites wherein a business relationship or the like has been
established between administrators or the like of HSS 120 and the
various Internet sites. Alternatively, web site 130 may be any
suitable user interface including a non-Internet based interface
and/or a non-browser based interface.
[0021] Users 140 typically contribute answers to the problems
presented via web site 130 or the like. Users are typically humans
that provide the answers via any suitable computing device, such as
a desktop computer, laptop, mobile phone, or any other type of
computing device. One example of such a computing device is
provided in connection with FIG. 4. A user may provide answers in
exchange for some form of compensation including money, rewards,
services, or simply for fun or the like. An alternative user may be
a bot or the like. The term "bot" as used herein typically refers
to software applications or the like that run automated tasks over
the Internet or the like. Bots are also known as Internet bots, web
robots, and WWW bots. In one example, bots can be used to automate tasks at a much faster rate than humans could perform them.
[0022] In a typical HCS it is generally assumed that only the
HumanSense server is trusted. A problem provider may be anyone who
seeks solutions to common sense problems through an HCS. A problem
provider may be malicious, attacking participating web sites or
tricking users into clicking a malicious link to go to a malicious
web site or downloading a malicious file. A user may be untrusted
too. A user may actually be a bot that provides arbitrary answers
to the problems it is presented. A participating web site may also
be malicious. It may collude with other participating web sites or users to gain compensation disproportionate to their contributions to problem solutions. In some cases it may be assumed
that human users are benign when they are compensated for their
correct answers, but they may sometimes be careless enough to
provide incorrect answers.
[0023] FIG. 2 is a block diagram showing example modules and
interactions of an example human computation system 200 such as HCS
100 of FIG. 1. Common sense problems are generally represented
using a problem manifest template. Once a particular problem is
encoded in such a template, it is generally known as a problem
manifest that embodies the particular problem, such as example
problem manifest 230. In one example, a problem manifest template
("PMT") for describing objects in an image or the like is
structured using Extensible Markup Language ("XML") or the like as
follows:
TABLE-US-00001 Example Problem Manifest Template
<problem>
  <id>1389</id>
  <resources>
    <image>1389.jpg</image>
  </resources>
  <priority>3</priority>
  <value>2</value>
  <type>ImageLabeling</type>
  <stage>MultipleChoices</stage>
  <labels>
    <label>tiger</label>
    <label>claw</label>
    <label>tail</label>
  </labels>
</problem>
[0024] Note that the above example PMT includes example values for
a hypothetical problem. Other values may be used to define other
problems; these values are provided only as examples and not as
limitations. Problems may alternatively and/or additionally be
described and/or defined using other structures or formats. Such a
template generally includes the following parts that, when
populated with specific values that represent a particular problem,
form a problem manifest:
[0025] Problem--this is typically the root element of a problem
manifest indicating that the manifest is a problem-describing
manifest. In one example, a problem manifest is maintained as a
file such as that stored on computer-readable media. Generally when
a template includes values for a particular problem, it is
considered a problem manifest for that particular problem.
[0026] ID--this field typically includes a value that is a globally unique identifier ("ID") for the particular problem, also referred to herein as the actual ID of the particular problem.
[0027] Resources--this field typically includes various resources
that may aid in the description and presentation of the problem. In
one example, such resources are files that comprise graphics,
video, audio, information, or the like associated with or related
to the particular problem. In other examples, a resource may be a
link such as a web link or the like, or some other type of
information or pointer to information related to the particular
problem.
[0028] Priority--this field typically includes a value indicating
how often the particular problem is to be presented to users
relative to problems with other priority values. In one example,
the larger the priority value the higher priority the particular
problem has over problems with smaller priority values.
[0029] Value--this field typically includes a value indicating how much the particular problem is worth, generally from a compensation
point of view. Combined with other factors such as timeliness of
answers, correctness of answers, and the like, this value is used
to calculate a "score" or monetary award for an answer to the
particular problem. Such a value may relate to any form of
compensation or the like.
[0030] Type--this field typically includes a value indicating a type classification for the particular problem. This value may be used by a HumanSense server, such as HSS 120 of FIG. 1, to determine how to process answers to the particular problem, and/or how to process and present the particular problem itself.
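By way of non-limiting illustration, the following Python sketch parses a problem manifest of the form shown in TABLE-US-00001 into a plain dictionary. The helper name parse_manifest and the default priority and value are hypothetical; a deployed HSS may represent and validate manifests differently.

TABLE-US-00002 Example Problem Manifest Parser (Python sketch)

import xml.etree.ElementTree as ET

MANIFEST = """<problem>
  <id>1389</id>
  <resources><image>1389.jpg</image></resources>
  <priority>3</priority>
  <value>2</value>
  <type>ImageLabeling</type>
  <stage>MultipleChoices</stage>
  <labels><label>tiger</label><label>claw</label><label>tail</label></labels>
</problem>"""

def parse_manifest(xml_text):
    # Convert a problem manifest into a plain dictionary keyed by field name.
    root = ET.fromstring(xml_text)
    return {
        "id": root.findtext("id"),
        "resources": [r.text for r in root.find("resources")],
        "priority": int(root.findtext("priority", default="1")),
        "value": int(root.findtext("value", default="1")),
        "type": root.findtext("type"),
        "stage": root.findtext("stage"),
        "labels": [lbl.text for lbl in root.findall("labels/label")],
    }

print(parse_manifest(MANIFEST)["labels"])  # ['tiger', 'claw', 'tail']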
[0031] Considering the interactions between modules of example
system 200, a user typically visits a participating web site or the
like. The web site acts as problem publisher 130 in this example. The participating web site 220 requests a common sense problem from HSS 120, which selects a problem from a problem
database or the like. In one example, such a problem database is
maintained on, and the problem is provided by, a problem provider
such as problem provider 110 of FIG. 1. In another example, a
plurality of problems are provided to HSS 120 by distinct problem
provider 110 and stored in a database associated with HSS 120.
[0032] Once a problem is selected, HSS 120 typically generates a
random identifier ("ID") unique to the current session between HSS
120 and problem publisher 130 and unique to the problem selected,
maps the random ID to an actual ID of the selected problem, and
sends the random ID to problem publisher 130, indicated by arrow
(1) of FIG. 2. Use of the random ID versus the actual ID of the
selected problem helps prevent a malicious participating web site
from tracking and/or logging a corresponding problem-answer pair.
HSS 120 maintains the mapping between the random ID and the actual
ID of the selected problem.
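By way of non-limiting illustration, the following Python sketch shows this random-ID bookkeeping. The class name ProblemIdMapper and the token length are hypothetical; an actual HSS would additionally scope mappings to sessions and expire them.

TABLE-US-00003 Example Random Problem ID Mapping (Python sketch)

import secrets

class ProblemIdMapper:
    # Maintains the HSS-side mapping between random IDs and actual problem IDs.
    def __init__(self):
        self._random_to_actual = {}

    def issue_random_id(self, actual_id):
        # A fresh random token per selection prevents a participating web site
        # from linking problem-answer pairs across sessions.
        random_id = secrets.token_urlsafe(16)
        self._random_to_actual[random_id] = actual_id
        return random_id

    def resolve(self, random_id):
        # Only the HSS can map the token back to the actual problem ID.
        return self._random_to_actual.get(random_id)

mapper = ProblemIdMapper()
rid = mapper.issue_random_id("1389")
assert mapper.resolve(rid) == "1389"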
[0033] Once the random ID of the selected problem is received,
problem publisher 130 typically prepares problem frame 222 for the
selected problem, as indicated by arrow (2) of FIG. 2. In one
example, the participating web site (problem publisher 130)
aggregates problem frame 222 into a web page or the like. Such
aggregation may be performed by creating an <iframe> or the
like ("problem frame") 222 into which the selected problem may be
loaded, typically using a uniform resource locator ("URL") such as http://HumanSenseServer/problem?id=randomId.
[0034] A malicious problem provider may launch phishing attacks
against a user by tricking the user to believe that problem frame
222 is from the web site, encouraging the user to input private
data such as a password into the problem frame, resulting in the
private data being secretly sent back to the malicious problem
provider through embedded scripts. To prevent such phishing attacks
the web site may wrap problem frame 222 in a different display
style to differentiate problem frame 222 from other web page
content from the web site. Further, the web site may also add a
warning to problem frame 222 to warn users that problem frame 222
is used to answer common sense problems and not for private
data.
[0035] Once the problem frame is created, HSS 120 typically
generates a problem web page for the selected problem and sends it
to problem publisher 130 for presentation in problem frame 222, as
indicated by arrow (3) of FIG. 2. Generation of the problem web
page involves several steps including those indicated by arrows
(3.1), (3.2), and (3.3) of FIG. 2 described herein below.
[0036] As indicated by arrow (3.1), problem manifest 230 is
typically modified to remove and/or modify information not needed
by users resulting in modified problem manifest 232. "Not needed"
information includes fields and field values of problem manifest
230 that do not contribute to a user's effort to understand and
answer the problem represented by problem manifest 230. In one
example, the information removed includes the unique problem ID,
priority, type, and value. Further, resource references are
replaced with a URL such as http://HumanSenseServer/resource?id=randomId&index=θ, where the index parameter θ indicates the order of the resource in
problem manifest 232 that the URL refers to. Since the HSS 120
maintains the association of the random ID with the actual problem
ID, correct resources can be retrieved by HSS 120. Web sites or
users, on the other hand, cannot tell from the resources or the
random ID if the problem has already been answered or not.
Therefore they cannot launch an attack to repeat an answer to the
same problem.
[0037] The problem provider may be allowed to select a presentation
template 240 for the selected problem, and for each problem it
provides. In general, presentation template 240 is applied to
modified problem manifest 232 resulting in problem presentation
250, as indicated by arrow (3.2). In one example, presentation
template 240 is defined using Extensible Stylesheet Language
Transformations ("XSLT") or Cascading Style Sheets ("CSS") or the
like, which is applied to modified problem manifest 232 by XSLT engine 218 or the like to convert modified problem manifest 232
into problem presentation web page 250 comprised of Hypertext
Markup Language ("HTML") and JavaScript to provide the user
interface ("UI") for presenting the selected problem and providing
for user input of answers. Further, in this example, problem
presentation 250 generally includes a JavaScript function called
"$collectAnswer" to designate how to collect answers from the
generated UI. Since, in this example, the problem is presented in
an <iframe> whose domain is different from that of the web
site, the Same Origin Policy ("SOP") guarantees that the content in
problem frame 222 does not introduce any cross-site scripting
("XSS") attacks to the web site.
[0038] Problem presentation 250 is typically modified resulting in
modified problem presentation 252, as indicated by arrow (3.3) of
FIG. 2. In one example, problem presentation 250 is a web page modified to include scripts that support cross-domain communication used to, among other things, transmit tokens from problem frame 222 to the host web page, as further described herein below.
[0039] Modified problem presentation web page 252 is typically sent
to problem publisher 130 that then presents the selected problem in
problem frame 222, as indicated by arrow (3) of FIG. 2. As a user
provides answers to the presented problem, it may be important to
determine if the user providing the answers is human or a bot. In
one example, HSS 120 adds a CAPTCHA to the problem that can be used
to determining if the answering user is likely human or not. The
term "CAPTCHA" as used herein refers to conventional
challenge-response tests used in computing to determine that a
response was not generated by a computer or bot or the like, but by
a human. If a CAPTCHA was used with the problem, then verifier 216
determines that the answers were likely provided by a human. If a
CAPTCHA is not added to the problem, then answers should be
verified to determine if they are likely from a human user or not.
Another form of CAPTCHA, generally known as reCAPTCHA, poses two
questions to a user--one the answer to which is known and the other
the answer to which is unknown. Which is which is generally not
disclosed to the user. Given answers from the user to both
questions, if the known answer is correct (e.g., the user's answer
matches the known answer) then the unknown answer is generally
accepted as the correct answer.
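By way of non-limiting illustration, the reCAPTCHA-style rule just described reduces to a simple comparison, sketched below in Python; the function name and the case-insensitive normalization are hypothetical simplifications.

TABLE-US-00005 Example reCAPTCHA-Style Acceptance Rule (Python sketch)

def accept_recaptcha_answer(known_answer, user_known, user_unknown):
    # If the user's answer to the known question matches, trust the answer
    # given to the unknown question; otherwise reject both.
    if user_known.strip().lower() == known_answer.strip().lower():
        return user_unknown
    return None  # known answer wrong: likely a bot or a careless user

assert accept_recaptcha_answer("tiger", "Tiger", "claw") == "claw"
assert accept_recaptcha_answer("tiger", "lion", "claw") is None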
[0040] When a pool of problems is small, it may be inevitable that
some of the problems are repeated even though a problem is
typically randomly selected. An HCS generally includes security mechanisms to protect against colluding attacks by bots and web sites: unless the content of a displayed problem is analyzed to extract features that are then compared with those of previously presented problems, there is no way to detect whether two displayed problems are the same. Note that the problem web page sent to a participating web
same from the web page content the problem frame receives. In
addition, multiple versions of a problem can be generated, each
copy being slightly different. For example, the presentation
content of each version of a problem may be slightly modified
without changing semantic meaning. Hence the hash values of the versions would differ, making it impossible to use hash values to determine that two problems are the same. Therefore, the only way
to find out if two variations of a problem are the same or not is
to use content analysis, which tends to be a common sense problem
itself.
[0041] When CAPTCHA or the like is not used with common sense
problems, the collected answers may contain spurious answers
provided by bots versus human users. These spurious answers should
be removed from the collected answers to ensure the quality of the
solutions produced by an HCS. Since the common sense problems
cannot be reliably answered by computers (otherwise there would be
no need to use human computation to find the answers), and it is
highly unlikely that a fixed group of users would be able to see
the same problem more than once, we can, in one example, assume that an answer provided by a bot is drawn at random from a uniform distribution, and that the answers provided by bots are independently and identically distributed ("IID"). Therefore the answers from bots can be modeled as IID draws from a uniform distribution.
[0042] In one example, answers provided by bots may be detected by verifier 216 based on the IID uniform distribution modeling. For example, suppose the i-th answer to a problem P provided by a user is $a_i$. Let $DA$ be the set of distinct answers collected for problem P, and denote the j-th member of $DA$ by $A_j$. The frequency $C_{A_j}$ at which answer $A_j$ appears in the collected answers for problem P is then

$C_{A_j} = \sum_i b_{i,j}$, where $b_{i,j} = \begin{cases} 1 & \text{if } a_i = A_j \\ 0 & \text{otherwise.} \end{cases}$
[0043] $C_{A_j}$ typically includes two parts: the contribution from human users, $C_{A_j}^h$, and the contribution from bots, $C_{A_j}^b$, so that $C_{A_j} = C_{A_j}^h + C_{A_j}^b$. Considering the distribution of $C_{A_j}^b$, suppose that the total number of answers and the number of distinct answers from bots are $T$ and $N$, respectively; note that $T \geq N$. Under the IID uniform model, each of the $T$ bot answers equals $A_j$ with probability $1/N$, so $C_{A_j}^b$ is binomially distributed, and its average and standard deviation are

$\bar{C}_{A_j}^b = T/N, \quad (1)$

$\sigma_{C_{A_j}^b} = \left( T/N - T/N^2 \right)^{1/2} \approx \left( T/N \right)^{1/2}. \quad (2)$
[0044] In this example, the following recursive procedure is applied to remove spurious answers from bots when an HSS has collected a statistically significant number of answers to problem P: [0045] 1. Initialize the set of answers from bots, $S_{bot}$, to be the set of all the answers collected for problem P. [0046] 2. Calculate the average and standard deviation of the answers in $S_{bot}$ by using Eqs. (1) and (2). [0047] 3. Any answer $A_j$ whose frequency satisfies $C_{A_j} > k \, \sigma_{C_{A_j}^b} + \bar{C}_{A_j}^b$, where $k$ is a threshold parameter, is considered a human contribution and is removed from $S_{bot}$. If there is no human contribution, this process is complete. Otherwise go back to Step 2.
[0048] All answers in the resulting S.sub.bot of the procedure are
considered answers from bots and are therefore removed from the set
of all collected answers.
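By way of non-limiting illustration, the following Python sketch implements the recursive procedure of paragraphs [0044] through [0048] under the IID uniform model of Eqs. (1) and (2); the function name, the default threshold k, and the toy data are hypothetical.

TABLE-US-00006 Example Bot-Answer Removal Procedure (Python sketch)

import math
from collections import Counter

def remove_bot_answers(answers, k=3.0):
    # `answers` is the list of collected answer strings for a problem P;
    # `k` is the threshold parameter of Step 3. Returns the answers kept
    # as human contributions.
    s_bot = Counter(answers)          # Step 1: start with all answers
    human = Counter()
    while s_bot:
        total = sum(s_bot.values())   # T: answers remaining in S_bot
        distinct = len(s_bot)         # N: distinct answers remaining
        mean = total / distinct                        # Eq. (1)
        sigma = math.sqrt(mean - total / distinct**2)  # Eq. (2)
        # Step 3: frequencies far above the uniform model are human.
        promoted = {a: c for a, c in s_bot.items() if c > k * sigma + mean}
        if not promoted:              # no human contribution found: done
            break
        human.update(promoted)
        for a in promoted:
            del s_bot[a]              # Step 2 then repeats on the reduced set
    # Whatever remains in s_bot is treated as bot noise and discarded.
    return list(human.elements())

collected = ["tiger"] * 40 + ["cat"] * 25 + ["dog", "ship", "x9", "qq", "zz"]
print(Counter(remove_bot_answers(collected)))  # keeps 'tiger' and 'cat'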
[0049] Generally, it may be assumed that human users are occasionally careless enough to provide erroneous answers. Evaluator 214
typically processes human answers, i.e., the collected answers if
CAPTCHA is used with common sense problems or the remaining answers
after the process described herein above is applied to remove
spurious answers from bots, to deduce a final answer to the
selected problem. This final answer is considered a solution to the
selected problem.
[0050] In one example of deducing the final answer, simple majority
voting is used to combine individual human answers and eliminate
erroneous answers. In this example, the human answers are listed
from high to low according to their frequencies of occurrence. The slope, i.e., the relative difference of the neighboring frequencies, is calculated. The slope at each answer is compared with the slope at the neighboring answer, starting with the answer of the highest frequency. If there is a substantial increase in slope at an answer, that answer is the separation point. All the answers with frequencies higher than the separation point are considered the final answers, while the remaining answers are discarded.
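By way of non-limiting illustration, the following Python sketch distills final answers using the slope rule just described, mirroring the ordering and relative-distance language of claims 8 and 15. The relative-difference formula (hi - lo) / hi is one plausible reading of "slope", and the helper name is hypothetical.

TABLE-US-00007 Example Slope-Based Answer Distillation (Python sketch)

from collections import Counter

def distill_final_answers(human_answers):
    # Order answers by frequency (high to low), find the neighboring pair
    # with the greatest relative drop, and keep everything above that
    # separation point.
    counts = Counter(human_answers).most_common()
    if len(counts) < 2:
        return [answer for answer, _ in counts]
    slopes = [(hi - lo) / hi for (_, hi), (_, lo) in zip(counts, counts[1:])]
    cut = slopes.index(max(slopes))   # pair with greatest relative distance
    return [answer for answer, _ in counts[: cut + 1]]

votes = ["tiger"] * 30 + ["cat"] * 24 + ["dog"] * 3 + ["boat"] * 2
print(distill_final_answers(votes))  # ['tiger', 'cat']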
[0051] FIG. 3 is a block diagram showing an example method 300 for
labeling images performed using an example HCS as described in
connection with FIGS. 1 and 2. In this example, three incremental
refinement stages 310, 320, and 330 are applied. In general, the
first stage collects candidate labels of objects in an image. The
second stage refines the candidate labels using multiple choices.
Synonymic labels may also be correlated in this stage. To prevent
bots and lazy humans from selecting all the choices, trap labels
may be generated automatically and intermixed into the candidate
labels. Semantic distance may be used to ensure that the selected
trap labels would be different enough from the candidate labels so
that human users are unlikely to incorrectly select trap labels.
The last stage typically includes asking users to locate an object
given a label from a segmented image. The results of these three stages are used to produce a solution to the problem of accurately labeling the image.
[0052] Block 310 typically indicates stage 1 of method 300, which
includes presenting common sense questions asking users to describe
objects in an image. In general, stage 1 comprises collecting raw
descriptions of objects in an image and turning the collected
descriptions into candidate labels for stage 2. The term "label" as
used herein generally refers to a word or words descriptive of the
image. Initially, all the images to be labeled are put into a pool
of first stage images. Typically, there is no prior knowledge of
the objects in an image. Users are requested to provide
descriptions of objects that they see in the presented images. As sufficient data are collected, spurious answers from bots are removed as described herein above, and human answers are evaluated as also described herein above to produce candidate labels. Once candidate labels emerge, users providing more of the same candidate labels would not increase knowledge about the image. To restrict
users from providing answers that are the same as these candidate
labels, the candidate labels may be put into a "taboo phrase list".
The "taboo phrase list" may be inserted in the problem manifest
file with the information to be displayed with the image that is
the subject of the common sense question. Users may then be
restricted from providing labels in the "taboo phrase list". With
more labels put into the "taboo phrase list", the value of the
problem may be increased. When an HCS determines there are sufficient labels in the "taboo phrase list", or when users commonly skip
labeling an image which has labels in its "taboo phrase list", the
HCS concludes that it has collected enough answers for the image.
The image is then removed from the pool of the first stage images
and put into the pool of the second stage images and method 300
typically continues at block 320.
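By way of non-limiting illustration, the following Python sketch captures this stage-1 bookkeeping; the graduation thresholds and dictionary fields are hypothetical, since the text specifies only that sufficient taboo labels or frequent skipping ends stage 1.

TABLE-US-00008 Example Stage-1 Taboo List Update (Python sketch)

def update_taboo_list(problem, candidate_labels, skip_rate,
                      max_labels=10, max_skip_rate=0.5):
    # Merge newly distilled candidate labels into the taboo phrase list and
    # raise the problem's value as the list grows. Returns True when the
    # image should graduate from the stage-1 pool to the stage-2 pool.
    problem["taboo"] = sorted(set(problem.get("taboo", [])) | set(candidate_labels))
    problem["value"] += 1
    return len(problem["taboo"]) >= max_labels or skip_rate > max_skip_rate

p = {"id": "1389", "value": 2, "taboo": ["tiger"]}
done = update_taboo_list(p, ["claw", "tail"], skip_rate=0.1)
print(p["taboo"], done)  # ['claw', 'tail', 'tiger'] False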
[0053] Block 320 typically indicates stage 2 of method 300, which
includes refining the candidate labels acquired in the first stage.
In stage 2, for each image in the second stage pool, the candidate
labels resulting from stage 1 are presented as a multiple choice
list with the image. Users are asked to choose the multiple choice
labels that are descriptive of the image. The purpose of stage 2 is
typically to further improve the quality of the labels of the
images. It is possible that labels collected from the first stage
contain synonyms. Users may also be asked to correlate the synonyms
in this stage. In some cases bots and/or lazy human users may simply choose all the labels presented, yielding no further knowledge about the image while still potentially collecting compensation for answers. To deal with this problem, random trap
labels may be intermixed with the candidate labels. These trap
labels are typically fake labels that would not reasonably appear
in the image. Selection of any trap label by a user would result in
rejection of the answer.
[0054] In one example, trap labels are selected by the HCS
automatically so as to not be semantically close to any candidate
labels obtained from the first stage. Trap labels are typically
words that may be selected from a lexical database including words
and information that can be used to determine the semantic distance
between the words. To obtain a trap label, a word is randomly selected from the lexical database, and then the semantic distance is calculated between the selected word and each of the candidate labels and other selected trap labels. If each of the distances is
greater than a preset threshold, the selected word is considered
sufficiently different--or semantically distant--from the other
labels and is selected as a trap label.
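By way of non-limiting illustration, the trap-label selection loop might look like the following Python sketch. The LEXICON list, the distance threshold, and especially semantic_distance are placeholders: the text assumes a real lexical database that supplies semantic distances between words.

TABLE-US-00009 Example Trap Label Selection (Python sketch)

import random

LEXICON = ["tiger", "claw", "tail", "carburetor", "sonnet", "glacier", "waltz"]

def semantic_distance(a, b):
    # Stand-in for a distance derived from a lexical database; this crude
    # placeholder only distinguishes identical words so the sketch runs.
    return 0.0 if a == b else 1.0

def pick_trap_labels(candidate_labels, count=2, threshold=0.8):
    # Randomly draw words and keep only those semantically far from every
    # candidate label and every previously selected trap label.
    traps = []
    words = LEXICON[:]
    random.shuffle(words)
    for word in words:
        others = candidate_labels + traps
        if all(semantic_distance(word, other) > threshold for other in others):
            traps.append(word)
        if len(traps) == count:
            break
    return traps

print(pick_trap_labels(["tiger", "claw", "tail"]))  # e.g. ['waltz', 'glacier']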
[0055] Block 330 typically indicates stage 3 of method 300, which
includes requesting users to locate objects in a segmented image
corresponding to a given label refined at the second stage. The
segmentation algorithm used may be any conventional segmentation
algorithm sufficient to indicate or allow a user to indicate a
specific portion of an image. In one example a segmented image is
displayed such that a user can select all the segments belonging to
the object represented by the given label. A user can select or deselect segments of the original image or various portions of it, so as to identify those portions described by the given label.
[0056] FIG. 4 is a block diagram showing an example computing
environment 400 in which the technologies described herein may be
implemented. A suitable computing environment may be implemented
with numerous general purpose or special purpose systems. Examples
of well known systems may include, but are not limited to, cell
phones, personal digital assistants ("PDA"), personal computers
("PC"), hand-held or laptop devices, microprocessor-based systems,
multiprocessor systems, servers, workstations, consumer electronic
devices, set-top boxes, and the like.
[0057] Computing environment 400 typically includes a
general-purpose computing system in the form of a computing device
401 coupled to various components, such as peripheral devices 402,
403, 404 and the like. System 400 may couple to various other
components, such as input devices 403, including voice recognition,
touch pads, buttons, keyboards and/or pointing devices, such as a
mouse or trackball, via one or more input/output ("I/O") interfaces
412. The components of computing device 401 may include one or more
processors (including central processing units ("CPU"), graphics
processing units ("GPU"), microprocessors (".mu.P"), and the like)
407, system memory 409, and a system bus 408 that typically couples
the various components. Processor 407 typically processes or
executes various computer-executable instructions to control the
operation of computing device 401 and to communicate with other
electronic and/or computing devices, systems or environment (not
shown) via various communications connections such as a network
connection 414 or the like. System bus 408 represents any number of
several types of bus structures, including a memory bus or memory
controller, a peripheral bus, a serial bus, an accelerated graphics
port, a processor or local bus using any of a variety of bus
architectures, and the like.
[0058] System memory 409 may include computer readable media in the
form of volatile memory, such as random access memory ("RAM"),
and/or non-volatile memory, such as read only memory ("ROM") or
flash memory ("FLASH"). A basic input/output system ("BIOS") may be
stored in non-volatile or the like. System memory 409 typically
stores data, computer-executable instructions and/or program
modules comprising computer-executable instructions that are
immediately accessible to and/or presently operated on by one or
more of the processors 407.
[0059] Mass storage devices 404 and 410 may be coupled to computing
device 401 or incorporated into computing device 401 via coupling
to the system bus. Such mass storage devices 404 and 410 may
include non-volatile RAM, a magnetic disk drive which reads from
and/or writes to a removable, non-volatile magnetic disk (e.g., a
"floppy disk") 405, and/or an optical disk drive that reads from
and/or writes to a non-volatile optical disk such as a CD ROM, DVD
ROM 406. Alternatively, a mass storage device, such as hard disk 410, may include a non-removable storage medium. Other mass storage
devices may include memory cards, memory sticks, tape storage
devices, and the like.
[0060] Any number of computer programs, files, data structures, and
the like may be stored in mass storage 410, other storage devices
404, 405, 406 and system memory 409 (typically limited by available
space) including, by way of example and not limitation, operating
systems, application programs, data files, directory structures,
computer-executable instructions, and the like.
[0061] Output components or devices, such as display device 402,
may be coupled to computing device 401, typically via an interface
such as a display adapter 411. Output device 402 may be a liquid
crystal display ("LCD"). Other example output devices may include
printers, audio outputs, voice outputs, cathode ray tube ("CRT")
displays, tactile devices or other sensory output mechanisms, or
the like. Output devices may enable computing device 401 to
interact with human operators or other machines, systems, computing
environments, or the like. A user may interface with computing
environment 400 via any number of different I/O devices 403 such as
a touch pad, buttons, keyboard, mouse, joystick, game pad, data
port, and the like. These and other I/O devices may be coupled to
processor 407 via I/O interfaces 412 which may be coupled to system
bus 408, and/or may be coupled by other interfaces and bus
structures, such as a parallel port, game port, universal serial
bus ("USB"), fire wire, infrared ("IR") port, and the like.
[0062] Computing device 401 may operate in a networked environment
via communications connections to one or more remote computing
devices through one or more cellular networks, wireless networks,
local area networks ("LAN"), wide area networks ("WAN"), storage
area networks ("SAN"), the Internet, radio links, optical links and
the like. Computing device 401 may be coupled to a network via
network adapter 413 or the like, or, alternatively, via a modem,
digital subscriber line ("DSL") link, integrated services digital
network ("ISDN") link, Internet link, wireless link, or the
like.
[0063] Communications connection 414, such as a network connection,
typically provides a coupling to communications media, such as a
network. Communications media typically provide computer-readable
and computer-executable instructions, data structures, files,
program modules and other data using a modulated data signal, such
as a carrier wave or other transport mechanism. The term "modulated
data signal" typically means a signal that has one or more of its
characteristics set or changed in such a manner as to encode
information in the signal. By way of example, and not limitation,
communications media may include wired media, such as a wired
network or direct-wired connection or the like, and wireless media,
such as acoustic, radio frequency, infrared, or other wireless
communications mechanisms.
[0064] Power source 490, such as a battery or a power supply,
typically provides power for portions or all of computing
environment 400. In the case of the computing environment 400 being
a mobile device or portable device or the like, power source 490
may be a battery. Alternatively, in the case computing environment
400 is a desktop computer or server or the like, power source 490
may be a power supply designed to connect to an alternating current
("AC") source, such as via a wall outlet.
[0065] Some mobile devices may not include many of the components
described in connection with FIG. 4. For example, an electronic
badge may be comprised of a coil of wire along with a simple
processing unit 407 or the like, the coil configured to act as
power source 490 when in proximity to a card reader device or the
like. Such a coil may also be configured to act as an antenna
coupled to the processing unit 407 or the like, the coil antenna
capable of providing a form of communication between the electronic
badge and the card reader device. Such communication may not
involve networking, but may alternatively be general or special
purpose communications via telemetry, point-to-point, RF, IR,
audio, or other means. An electronic card may not include display
402, I/O device 403, or many of the other components described in
connection with FIG. 4. Other mobile devices that may not include
many of the components described in connection with FIG. 4, by way
of example and not limitation, include electronic bracelets,
electronic tags, implantable devices, and the like.
[0066] Those skilled in the art will realize that storage devices
utilized to provide computer-readable and computer-executable
instructions and data can be distributed over a network. For
example, a remote computer or storage device may store
computer-readable and computer-executable instructions in the form
of software applications and data. A local computer may access the
remote computer or storage device via the network and download part
or all of a software application or data and may execute any
computer-executable instructions. Alternatively, the local computer
may download pieces of the software or data as needed, or
distributively process the software by executing some of the
instructions at the local computer and some at remote computers
and/or devices.
[0067] Those skilled in the art will also realize that, by
utilizing conventional techniques, all or portions of the
software's computer-executable instructions may be carried out by a
dedicated electronic circuit such as a digital signal processor
("DSP"), programmable logic array ("PLA"), discrete circuits, and
the like. The term "electronic apparatus" may include computing
devices or consumer electronic devices comprising any software,
firmware or the like, or electronic devices or circuits comprising
no software, firmware or the like.
[0068] The term "firmware" typically refers to executable
instructions, code, data, applications, programs, or the like
maintained in an electronic device such as a ROM. The term
"software" generally refers to executable instructions, code, data,
applications, programs, or the like maintained in or on any form of
computer-readable media. The term "computer-readable media"
typically refers to system memory, storage devices and their
associated media, and the like.
[0069] In view of the many possible embodiments to which the
principles of the present invention and the foregoing examples may
be applied, it should be recognized that the examples described
herein are meant to be illustrative only and should not be taken as
limiting the scope of the present invention. Therefore, the
invention as described herein contemplates all such embodiments as
may come within the scope of the following claims and any
equivalents thereto.
* * * * *