ABSTRACT

Apex-MAP
8.4 Using Apex-MAP to Assess Processor Performance
8.5 Apex-MAP Extension for Parallel Architectures
8.5.1 Modeling Communication for Remote Memory Access
8.5.2 Assessing Machine Scaling Behavior Based on Apex-MAP
8.5.3 Using Apex-MAP to Analyze Architectural Signatures
8.6 Apex-MAP as an Application Proxy
8.6.1 More Characterization Parameters
8.6.2 The Kernel Changes for Apex-MAP
8.6.3 Determining Characteristic Parameters for Kernel Approximations
8.6.4 Measuring the Quality of Approximation
8.6.5 Case Studies of Applications Modeled by Apex-MAP
8.6.5.1 Test Environment
8.6.5.2 Random Access
8.6.5.3 DGEMM
8.6.5.4 STREAM
8.6.5.5 FFT
8.6.6 Overall Results and Precision of Approximation
8.6.6.1 Coefficient of Determination R²
8.6.6.2 Percentage of Performance Difference
8.7 Limitations of Memory Access Modeling
8.8 Acknowledgment

The memory behavior of a program greatly influences the performance observed on a specific architecture. Memory references that miss in the cache incur large latencies because of the large (and increasing) disparity between processor speed and memory speed, a phenomenon often termed the “memory wall.” Indeed, the memory wall has significantly influenced processor architecture for years; for instance, most transistors on present-day microprocessors are dedicated to the cache subsystem.