CE 431: PARALLEL COMPUTER ARCHITECTURE

Class Information


Course description

This course provides a detailed study of modern computer architectures.

The course begins by explaining the need for parallelism at all levels of system design and the trend towards multi-core and heterogeneous systems, driven by the physical limitations of single-core high-performance processors.

It goes on to describe forms and patterns of parallelism, such as instruction-level (ILP), data-level (DLP), and thread-level parallelism (TLP), in modern high-performance processors. Techniques for dynamic ILP extraction and deployment, such as superscalar and out-of-order execution, speculation, and branch prediction, are covered in detail, as are techniques for static ILP extraction, such as VLIW, and the accompanying compiler optimizations (loop unrolling, software pipelining, predication, speculation, etc.). The course also covers DLP technologies such as vector processing and GPUs, and gives a brief introduction to CUDA.

The memory hierarchy is a fundamental aspect of modern CPU systems. We review cache functionality and introduce hardware- and software-level cache optimizations to reduce memory latency. We also discuss DRAM technology and design, as well as virtual memory technology.

Multi-core (or many-core) architectures that exploit thread-level (task-level) parallelism are discussed in detail. There is special emphasis on problems of multi-core systems such as memory coherence and memory consistency. The course describes hardware and software techniques to resolve these issues, such as cache coherence mechanisms and synchronization primitives, as well as more recent advances such as transactional memory and streaming architectures.

The course also covers newer technologies such as Warehouse-Scale Computing (WSC), including the design and analysis of WSCs and their efficiency and cost, with emphasis on Google datacenters. We also give special attention to Domain-Specific Architectures (DSA) as components of heterogeneous platforms. Such architectures include FPGAs and ASICs such as Google's TPU, which is used for machine learning (ML) workloads. The effect of machine learning on modern architectures will also be discussed.

The course emphasizes the practical application of all these technologies in real machines. Throughout the class, we will describe the architecture of real modern processors, such as Intel's x86 i7 microarchitecture, Intel's Itanium ISA, GPU architectures, and domain-specific architectures such as Google's TPUs, FPGAs, etc.

There will be a number of homework assignments and a final exam covering the material. There will also be weekly recitations based on the study of research papers. Finally, students will work on a term project on the configuration, simulation, and study of a multicore system.

Teaching staff

Instructor : Nikos Bellas
Office : 37 Gklavani Str. (Office B3.7)
Phone : 24210-74704
Email : nbellas at inf dot uth dot gr
Webpage : https://faculty.e-ce.uth.gr/nbellas
Office Hours : By appointment

Class schedule

Mon 16:00-18:00
Wed 16:00-18:00
The class starts on Monday, 11/02

Prerequisites

Textbooks

Advice: The Internet is a vast resource of information on computer architecture. You should use it.

Grading policy

Final Exam 30%
Labs 40%
Homeworks 30%

You need to receive at least 5 on the final exam to pass the class.
Late homework submissions will be penalized by 20% of the grade for each late day,
except in cases of health emergencies, war, nuclear meltdowns, etc.
No other exceptions, please.