ABSTRACT

This chapter looks at more advanced issues in cache coherence protocol design. One such issue is how cache coherence protocols can scale to larger systems. Broadcast and snoopy protocols hit a scalability limit relatively early, because coherence traffic and snoop frequency grow at least linearly with the number of processors, so broadcast traffic quickly saturates the available interconnect bandwidth. The chapter describes directory cache coherence protocols, which allow scalable implementations by avoiding broadcasts. It also discusses implementation issues for coherence protocols, such as how to deal with protocol races and how to use transient states. Finally, the chapter considers contemporary multicore design issues: dealing with imprecise directory information, deciding whether coherence should be tracked at a single granularity or at multiple granularities, designing coherence so that a multicore system can be partitioned, and reducing the cost of thread migration.