ABSTRACT

The increasing needs for conducting complex computations have motivated computational outsourcing. Many users conduct complex computations only occasionally, and buying expensive computing equipment only for these occasional tasks is a waste of resources. In the situation like this, outsourcing their computational tasks to a third party (or a group of third parties) is a more viable solution. Moreover, the increasing reach and speed of the Internet have made computational outsourcing much convenient. Although computational outsourcing frees users fromconducting the computation by themselves,

it introduces two challenging security issues. The first is privacy. Computational outsourcing usually involves sending input data to a third party. These data might contain private information. For example, if the task is to conduct data mining on a medical database to discover useful knowledge, the database often needs to be sent to the party that conducts the computation. This causes a major threat to privacy, because the patients’ private information is in the database. A number of studies have been focusing on this privacy issue [1-4]. The other issue is integrity, i.e., to guarantee that computation tasks are carried out properly.

Once computing power becomes a commodity, there will be incentives for the third parties to cheat, namely, they might not conduct the actual computation, but still claim that they have done so.

It is important to guarantee the correctness of those computations. This chapter focuses on the integrity issue. A promising computational outsourcing paradigm is a computational grid, which is a novel,

evolving infrastructure that provides unified, coordinated access to computing resources such as processor cycles, storage, etc.Wide varieties of systems, from small workstations to supercomputers, are linked to form a grid, thus presenting what is essentially a powerful virtual computer. All the complexities involved inmanaging resources of a grid are hidden from the clients, providing seamless access to computing resources. As a great advancement towards cost reduction, computational grids can be used as a replacement for supercomputers that are presently used in many computationally intensive scientific problems [5,6]. The class of problems dealt by grid computing is that which involves tremendous computations

and can be broken down into independent tasks. A general grid computing environment includes a supervisor and a group of participants who allow the idle cycles of their processors to be used for computations. The participants need know the tasks assigned to others, nor do they need to communicate with other participants. After the participants complete their tasks, they report back the results to the supervisor. The past few years have seen a tremendous growth in grid computing with its effect being felt

in the biotechnology industry (e.g., DNA sequencing and drug discovery), entertainment industry, financial industry, etc. The success of the projects like SETI@home [7], IBM smallpox research [8], GIMPS [9] has made the potential of grid computing visible. For instance, IBM’s smallpox research [8] uses grid computing to find potential drugs to counter

the smallpox virus. Its main task is to screen hundreds of thousands of molecules, a task that can take years even with supercomputers. Using grid computing, any computer can be invited to participate and assist in this important effort. By downloading and running the software, participants can add their CPUs to the global grid. Every time their computers are idle, the computing resources can be contributed to the grid, accelerating the screening process while dramatically reducing the cost of the project. The result is that rather than spending years, it will be possible to screen hundreds ofmillions ofmolecules in justmonths.Anotherhighlyprofiledgrid computingproject is SETI@home[7],which is a scientific experiment that uses Internet-connected computers in the Search for Extraterrestrial Intelligence (SETI). SETI@home has more than 4.5 million users (according to 2004 statistics) contributing their computers’ unused processing power, to form a 15 teraflops grid, faster than IBM’s powerful supercomputer ASCI White (12 teraflops). Also the cost of the SETI grid is only 500K dollars, whereas ASCI White costs 110 million dollars [7]. On December 2, 2003, Michael Shafer a volunteer in the GIMPS project discovered the largest

known prime number, 220996011 − 1. He used a 2GHz Pentium 4 DEll Dimension PC for 19 days to prove the number prime. The new number, expressed as 2 to the 20,996,011th power minus 1, has 6,320,430 decimal digits and was discovered on November 17. It is more than two million digits larger than the previous largest known prime number. Tens of thousands of people have volunteered the use of their PCs in this worldwide project that has harnessed the power of 211,000 computers, in effect creating a supercomputer capable of performing 9 trillion calculations per second. This enabled GIMPS to find the prime in just 2 years instead of the 25,000 years a single PC would have required. However, the untrusted environments in which the computations are performed tend to cast

suspicion on the veracity of the results returned by the participants. The ease with which anyone can become a part of the grid poses a big threat to the computation being performed. The true identity of the participantmay not be known to the supervisor. The supervisor does not have any control over the participants’ machines and cannot prevent them from manipulating the codes provided. Thus, the participant may not have performed the necessary computations but claim to have done so. This cheating behavior, if undetected, may render the results useless. Project managers from SETI@home have reportedly uncovered attempts by some users “to forge the amount of time they have donated in

order to move up on theWeb listings of top contributors” [10]. Yet SETI participants are volunteers who do not get paid for the cycles they contribute.When participants are paid for their contribution, they have strong incentives to cheat for maximizing their gain. Therefore, we needmethods to detect the cheating behavior in grid computing. This objective can

be formulated as the following uncheatable grid computing problem: