Parallel programming : for multicore and cluster systems
Material type: Book
ISBN: 9783642378003
| Item type | Home library | Call number | Status | Date due | Barcode |
|---|---|---|---|---|---|
| Book | Biblioteca de la Facultad de Informática | D.1.3 RAU | Available | | DIF-04822 |
Translation of the German edition Parallele Programmierung, 3rd ed., Springer-Verlag, 2012. -- Includes index and bibliography.
1 Introduction -- 1.1 Classical Use of Parallelism -- 1.2 Parallelism in Today’s Hardware -- 1.3 Basic Concepts -- 1.4 Overview of the Book

2 Parallel Computer Architecture -- 2.1 Processor Architecture and Technology Trends -- 2.2 Flynn’s Taxonomy of Parallel Architectures -- 2.3 Memory Organization of Parallel Computers -- 2.3.1 Computers with Distributed Memory Organization -- 2.3.2 Computers with Shared Memory Organization -- 2.3.3 Reducing memory access times -- 2.4 Thread-Level Parallelism -- 2.4.1 Simultaneous Multithreading -- 2.4.2 Energy Consumption of Processors -- 2.4.3 Multicore Processors -- 2.4.4 Architecture of Multicore Processors -- 2.4.5 Example: Architecture of the Intel Core i7 -- 2.5 Interconnection Networks -- 2.5.1 Properties of Interconnection Networks -- 2.5.2 Direct Interconnection Networks -- 2.5.3 Embeddings -- 2.5.4 Dynamic Interconnection Networks -- 2.6 Routing and Switching -- 2.6.1 Routing Algorithms -- 2.6.2 Routing in the Omega Network -- 2.6.3 Switching -- 2.6.4 Flow control mechanisms -- 2.7 Caches and Memory Hierarchy -- 2.7.1 Characteristics of Caches -- 2.7.2 Write Policy -- 2.7.3 Cache coherency -- 2.7.4 Memory consistency -- 2.8 Example: IBM Blue Gene supercomputer -- 2.9 Exercises for Chapter 2

3 Parallel Programming Models -- 3.1 Models for parallel systems -- 3.2 Parallelization of programs -- 3.3 Levels of parallelism -- 3.3.1 Parallelism at instruction level -- 3.3.2 Data parallelism -- 3.3.3 Loop parallelism -- 3.3.4 Functional parallelism -- 3.3.5 Explicit and implicit representation of parallelism -- 3.3.6 Parallel programming patterns -- 3.4 SIMD Computations -- 3.4.1 Execution of vector operations -- 3.4.2 SIMD instructions -- 3.5 Data distributions for arrays -- 3.5.1 Data distribution for one-dimensional arrays -- 3.5.2 Data distribution for two-dimensional arrays -- 3.5.3 Parameterized data distribution -- 3.6 Information exchange -- 3.6.1 Shared variables -- 3.6.2 Communication operations -- 3.7 Parallel matrix-vector product -- 3.7.1 Parallel computation of scalar products -- 3.7.2 Parallel computation of the linear combinations -- 3.8 Processes and Threads -- 3.8.1 Processes -- 3.8.2 Threads -- 3.8.3 Synchronization mechanisms -- 3.8.4 Developing efficient and correct thread programs -- 3.9 Further parallel programming approaches -- 3.9.1 Approaches for new parallel languages -- 3.9.2 Transactional memory -- 3.10 Exercises for Chapter 3

4 Performance Analysis of Parallel Programs -- 4.1 Performance Evaluation of Computer Systems -- 4.1.1 Evaluation of CPU Performance -- 4.1.2 MIPS and MFLOPS -- 4.1.3 Performance of Processors with a Memory Hierarchy -- 4.1.4 Benchmark Programs -- 4.2 Performance Metrics for Parallel Programs -- 4.2.1 Speedup and Efficiency -- 4.2.2 Scalability of Parallel Programs -- 4.3 Asymptotic Times for Global Communication -- 4.3.1 Implementing Global Communication Operations -- 4.3.2 Communication Operations on a Hypercube -- 4.4 Analysis of Parallel Execution Times -- 4.4.1 Parallel Scalar Product -- 4.4.2 Parallel Matrix-vector Product -- 4.5 Parallel Computational Models -- 4.5.1 PRAM Model -- 4.5.2 BSP Model -- 4.5.3 LogP Model -- 4.6 Loop Scheduling and Loop Tiling -- 4.6.1 Loop Scheduling -- 4.6.2 Loop Tiling -- 4.7 Exercises for Chapter 4

5 Message-Passing Programming -- 5.1 Introduction to MPI -- 5.1.1 MPI point-to-point communication -- 5.1.2 Deadlocks with Point-to-point Communications -- 5.1.3 Nonblocking Operations and Communication Modes -- 5.1.4 Communication mode -- 5.2 Collective Communication Operations -- 5.2.1 Collective Communication in MPI -- 5.2.2 Deadlocks with Collective Communication -- 5.3 Process Groups and Communicators -- 5.3.1 Process Groups in MPI -- 5.3.2 Process Topologies -- 5.3.3 Timings and aborting processes -- 5.4 Introduction to MPI-2 -- 5.4.1 Dynamic Process Generation and Management -- 5.4.2 One-sided communication -- 5.5 Exercises for Chapter 5

6 Thread Programming -- 6.1 Programming with Pthreads -- 6.1.1 Creating and Merging Threads -- 6.1.2 Thread Coordination with Pthreads -- 6.1.3 Condition Variables -- 6.1.4 Extended Lock Mechanism -- 6.1.5 One-Time Initialization -- 6.1.6 Implementation of a Task Pool -- 6.1.7 Parallelism by Pipelining -- 6.1.8 Implementation of a Client-Server Model -- 6.1.9 Thread Attributes and Cancellation -- 6.1.10 Thread Scheduling with Pthreads -- 6.1.11 Priority Inversion -- 6.1.12 Thread-specific Data -- 6.2 Java Threads -- 6.2.1 Thread Generation in Java -- 6.2.2 Synchronization of Java Threads -- 6.2.3 Wait and Notify -- 6.2.4 Extended Synchronization Patterns -- 6.2.5 Thread Scheduling in Java -- 6.2.6 Package java.util.concurrent -- 6.3 OpenMP -- 6.3.1 Compiler directives -- 6.3.2 Execution environment routines -- 6.3.3 Coordination and synchronization of threads -- 6.4 Exercises for Chapter 6

7 General Purpose GPU Programming -- 7.1 The Architecture of GPUs -- 7.2 Introduction to CUDA Programming -- 7.3 Synchronization and Shared Memory -- 7.4 CUDA Thread Scheduling -- 7.5 Efficient Memory Access and Tiling Technique -- 7.6 Introduction to OpenCL -- 7.7 Exercises for Chapter 7

8 Algorithms for Systems of Linear Equations -- 8.1 Gaussian Elimination -- 8.1.1 Gaussian Elimination and LU Decomposition -- 8.1.2 Parallel Row-Cyclic Implementation -- 8.1.3 Parallel Implementation with Checkerboard Distribution -- 8.1.4 Analysis of the Parallel Execution Time -- 8.2 Direct Methods for Linear Systems with Banded Structure -- 8.2.1 Discretization of the Poisson Equation -- 8.2.2 Tridiagonal Systems -- 8.2.3 Generalization to Banded Matrices -- 8.2.4 Solving the Discretized Poisson Equation -- 8.3 Iterative Methods for Linear Systems -- 8.3.1 Standard Iteration Methods -- 8.3.2 Parallel implementation of the Jacobi Iteration -- 8.3.3 Parallel Implementation of the Gauss-Seidel Iteration -- 8.3.4 Gauss-Seidel Iteration for Sparse Systems -- 8.3.5 Red-black Ordering -- 8.4 Conjugate Gradient Method -- 8.4.1 Sequential CG method -- 8.4.2 Parallel CG Method -- 8.5 Cholesky Factorization for Sparse Matrices -- 8.5.1 Sequential Algorithm -- 8.5.2 Storage Scheme for Sparse Matrices -- 8.5.3 Implementation for Shared Variables -- 8.6 Exercises for Chapter 8

References -- Index