Design of Parallel Systems

Large-scale parallel database systems are used primarily for storing large volumes of data and for processing decision-support queries on those data, so these two tasks are the most important in a parallel database system. Parallel loading of data from external sources is also an important requirement if the system is to handle large volumes of incoming data. A large parallel database system must also address the following availability issues:

  1. Resilience to failure of some processors or disks,
  2. On-line reorganization of data and schema changes.

With a large number of processors and disks, the probability that at least one processor or disk will malfunction is significantly greater than in a single-processor system with one disk. A poorly designed parallel system will stop functioning if any component (processor or disk) fails. Assuming that the probability of failure of a single processor or disk is small, the probability of failure of the system goes up approximately linearly with the number of processors and disks. If a single processor or disk fails once every 5 years on average, a system with 100 processors would experience a failure about every 18 days.
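The arithmetic behind these figures can be checked with a short sketch. It assumes that components fail independently and at a constant rate, which the text does not state explicitly; the function names are illustrative only.

```python
def system_failure_probability(p: float, n: int) -> float:
    """Probability that at least one of n components fails in a given period.

    Assumes each component fails independently with probability p.
    For small p this is close to n * p, i.e. roughly linear in n.
    """
    return 1.0 - (1.0 - p) ** n


def mean_days_between_failures(component_mttf_years: float, n: int) -> float:
    """Approximate mean time between failures of an n-component system, in days.

    Assumes a constant, independent failure rate per component, so the
    system-wide failure rate is n times the rate of a single component.
    """
    return component_mttf_years * 365.0 / n


# A component that fails once every 5 years, in a system of 100 components:
print(mean_days_between_failures(5, 100))    # 18.25 days
# With p = 0.001 per period, the exact value stays close to n * p = 0.1:
print(system_failure_probability(0.001, 100))
```

The linear estimate `n * p` is only an approximation; it overstates the exact probability slightly, and the gap grows as `n * p` approaches 1.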

Therefore, large-scale parallel database systems, such as Tandem and Teradata machines, are designed to operate even if a processor or disk fails. Data are replicated across at least two processors. If a processor fails, the data that it stored can still be accessed from the other processors. The system keeps track of failed processors and distributes the work among functioning processors. Requests for data stored at the failed site are automatically routed to the backup sites that store a replica of the data. If all the data of a processor A are replicated at a single processor B, B will have to handle all the requests to A as well as those to itself, and B will become a bottleneck. Therefore, the replicas of the data of a processor are partitioned across multiple other processors.
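One way to picture the partitioning of replicas is the sketch below. It is not the scheme used by Tandem or Teradata; it simply assumes that each processor's data are split into numbered fragments and that backups are assigned round-robin to the other processors. The names `backup_processor` and `extra_load_after_failure` are hypothetical.

```python
from collections import Counter


def backup_processor(primary: int, fragment: int, n: int) -> int:
    """Return the processor that holds the backup of one fragment.

    Assumed placement rule: fragments of `primary` are assigned round-robin
    to the n - 1 other processors, so no backup lands on `primary` itself.
    """
    if n < 2:
        raise ValueError("replication needs at least two processors")
    offset = 1 + (fragment % (n - 1))  # offset is in 1 .. n-1, never 0
    return (primary + offset) % n


def extra_load_after_failure(failed: int, fragments: int, n: int) -> Counter:
    """Count how many fragments of the failed processor each survivor serves."""
    return Counter(backup_processor(failed, f, n) for f in range(fragments))


# Processor 0 of 5 fails while holding 8 fragments:
# each of the 4 surviving processors takes over 2 fragments.
print(extra_load_after_failure(failed=0, fragments=8, n=5))
```

With this placement, the work of a failed processor is shared evenly among the survivors instead of doubling the load on a single backup processor.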