Future effectiveness of centralized and distributed memory architectures for parallel computing systems
Chinthamani, Meemakshisundaram Ramanthan
This work is concerned with the question of how current parallel systems would need to evolve, in terms of hardware capabilities, architectures, and software control policies, if they are to continue to be useful computing platforms in the future. This question is motivated by the past and continuing increases in processor speeds, which substantially exceed the increases in performance of other hardware resources, including, for example, the networks needed for communication between processors in parallel systems. The hardware capabilities considered include processor speed, inter-processor communication network latency, inter-processor communication network bandwidth, and cache sizes. The candidate parallel system architectures considered include centralized memory architectures, in which all main memory accesses must traverse an interconnection network to reach a shared, centralized memory, and distributed memory architectures, in which the total system main memory is distributed among the processors, so that only accesses to the part of main memory allocated to another processor need traverse a global interconnection network. The software control policies considered include affinity scheduling policies, which assign computations to processors based on the likely contents of their caches and local memories, and latency tolerant scheduling policies, which permit aggregation of computation and communication into larger units. Results of scalability analyses are presented in two forms: asymptotic results and transient results. Asymptotic results are developed by studying the execution characteristics of a number of data parallel numeric and scientific applications as the system parameters (processor speed, cache sizes, and inter-processor communication network bandwidth and latency) and the application parameters are scaled, while holding the number of processors fixed, under a time-constrained scaling model.
Asymptotic results are concerned with the question of whether an application will eventually become communication bound, so that the machine becomes unsuitable as a parallel computing platform for that application. The transient results, on the other hand, are concerned with the rate at which the asymptotic results take hold. Based on the asymptotic results for centralized memory architectures, the applications considered in this work are classified into four types, termed Type I, Type II, Type III, and Type IV. Affinity scheduling techniques do not have any impact on the asymptotic results for Type I or Type III applications. However, they have a significant impact on the asymptotic results for Type II and Type IV applications: they significantly reduce the bus bandwidth scaling factor required for Type II and Type IV applications to become computation bound asymptotically, provided cache sizes scale fast enough to eventually contain the entire application data sets. For the class of data parallel near-neighbour computations, a new latency tolerant scheduling policy is proposed. It is shown to alleviate substantially the potential "latency bottleneck" in high latency parallel computing environments. At the same time, it is shown that for the class of one- and two-dimensional near-neighbour computations, the proposed scheduling policy increases the total communication volume by at most a constant factor compared to the conventional scheduling policy. It is also shown that, for arbitrary d-dimensional near-neighbour computations (d ≥ 1), the asymptotic bandwidth scaling requirements are at the same level as for conventional scheduling. The benefits of the proposed latency tolerant scheduling policy are also demonstrated through an experimental study.
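The idea of aggregating computation and communication into larger units for near-neighbour computations can be illustrated with a generic "deep halo" exchange. The sketch below is not the scheduling policy proposed in the thesis, whose details the abstract does not give; it is a minimal illustration under assumed conditions: a one-dimensional, periodic, three-point averaging stencil, with hypothetical names `step` and `simulate`. Each simulated worker fetches `halo` ghost cells per side and then advances `halo` time steps without communicating, trading some redundant computation for fewer messages.

```python
def step(cells):
    """One 3-point averaging step; the output is 2 cells shorter than the input."""
    return [(cells[i - 1] + cells[i] + cells[i + 1]) / 3.0
            for i in range(1, len(cells) - 1)]


def simulate(data, workers, steps, halo):
    """Advance a periodic 1D domain split into `workers` equal blocks.

    Each worker fetches `halo` ghost cells per side, then runs `halo` steps
    locally. halo=1 corresponds to a conventional exchange on every step.
    Returns (final data, messages sent, cells sent).
    """
    n = len(data)
    block = n // workers
    assert n % workers == 0 and steps % halo == 0 and halo <= block
    messages = cells_sent = 0
    for _ in range(steps // halo):
        new = []
        for w in range(workers):
            lo = w * block
            # Local block plus `halo` ghost cells on each side (periodic wrap).
            local = [data[(lo + j) % n] for j in range(-halo, block + halo)]
            messages += 2            # one message from each neighbour
            cells_sent += 2 * halo   # ghost cells received per exchange
            for _ in range(halo):
                local = step(local)  # valid region shrinks by 1 per side
            new.extend(local)        # exactly `block` cells remain
        data = new
    return data, messages, cells_sent


if __name__ == "__main__":
    initial = [float(i * i % 7) for i in range(16)]
    ref, msgs_1, vol_1 = simulate(initial, workers=4, steps=8, halo=1)
    agg, msgs_4, vol_4 = simulate(initial, workers=4, steps=8, halo=4)
    print("max difference:", max(abs(a - b) for a, b in zip(ref, agg)))
    print("messages:", msgs_1, "vs", msgs_4)
    print("cells sent:", vol_1, "vs", vol_4)
```

In this one-dimensional toy setting the aggregated schedule sends a quarter as many messages while moving the same number of cells, which is in the spirit of the constant-factor bound on communication volume stated above. In higher dimensions the ghost regions include corner cells, so the volume of such a scheme would not in general stay exactly equal.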
Degree: Doctor of Philosophy (Ph.D.)
Committee: Eager, Derek L.
Copyright Date: April 1999