A modern embedded-system processor contains multiple processing cores and multiple memory modules (banks), as shown in Figure 3.1. The system as a whole may comprise several such processors, each with access to both local and remote memory. Each memory module consists of several memory units, and each processor includes dedicated hardware called Special Hardware Memory Units (SHMU). Data is exchanged across the system over a shared bus. The on-chip memory is tightly constrained in size but fast to access, while the multi-ported remote memory is larger and slower.
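
To make the memory hierarchy concrete, the following is a minimal sketch of the architecture as a data model, not an implementation taken from this work. All class names, field names, sizes, and latencies are hypothetical values chosen for illustration; the SHMU and the shared bus are not modeled. The helper `place` only illustrates the size/speed trade-off between on-chip and remote memory by picking the fastest module whose capacity fits a data block.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MemoryModule:
    """One memory module (bank) made up of several equal-sized memory units."""
    name: str
    num_units: int       # number of memory units in this module
    unit_size_kb: int    # hypothetical capacity of one unit, in KB
    access_cycles: int   # hypothetical access latency, in cycles
    on_chip: bool        # True for on-chip memory, False for remote memory

    @property
    def capacity_kb(self) -> int:
        return self.num_units * self.unit_size_kb


@dataclass
class Processor:
    """A processor with several cores and its own on-chip memory modules."""
    name: str
    num_cores: int
    local_modules: List[MemoryModule] = field(default_factory=list)


@dataclass
class EmbeddedSystem:
    """Several processors plus the remote memory they all can reach."""
    processors: List[Processor]
    remote_modules: List[MemoryModule]


def place(system: EmbeddedSystem, proc: Processor, size_kb: int) -> Optional[MemoryModule]:
    """Return the fastest module reachable from `proc` that can hold `size_kb`."""
    candidates = sorted(proc.local_modules + system.remote_modules,
                        key=lambda m: m.access_cycles)
    for module in candidates:
        if module.capacity_kb >= size_kb:
            return module
    return None  # the block fits nowhere


def build_example() -> EmbeddedSystem:
    """Build a small system with assumed sizes: small fast banks, one large slow one."""
    banks = [MemoryModule(f"bank{i}", num_units=4, unit_size_kb=16,
                          access_cycles=1, on_chip=True) for i in range(2)]
    remote = MemoryModule("remote0", num_units=8, unit_size_kb=512,
                          access_cycles=10, on_chip=False)
    return EmbeddedSystem([Processor("P0", num_cores=4, local_modules=banks)], [remote])
```

Under these assumed numbers, a 32 KB block fits in an on-chip bank (64 KB each), whereas a 512 KB block exceeds every on-chip bank and falls back to the slower remote module.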