Paradoxically, while FPGA-based configurable multi-processor platforms offer a high degree of flexibility for application development, design space exploration on these platforms is very challenging. State-of-the-art design tools rely on low-level simulation, based on register transfer level (RTL) and/or gate-level implementations of the platform, for design space exploration. These low-level simulation techniques are inefficient for exploring the hardware and software design tradeoffs offered by configurable multi-processor platforms, for two major reasons. The first reason is that low-level simulation is too time consuming for evaluating the many possible configurations of the FPGA-based multi-processor platform and the many hardware-software partitioning and implementation possibilities. RTL and gate-level simulation is especially inefficient for simulating the execution of software programs running on configurable multi-processor platforms. For the design examples shown in Section, low-level simulation using ModelSim [63] takes more than 25 minutes to simulate an FFT computation software program that runs on the MicroBlaze processor with a 1.5 msec execution time, a slowdown of roughly six orders of magnitude. Such a simulation speed is prohibitive for development on configurable multi-processor platforms, as software programs usually take minutes or hours to complete on soft processors. The second reason is that FPGA-based configurable multi-processor platforms pose a potentially vast design space and many optimization possibilities for application development. There are various hardware-software partitions of the target application and various possible mappings of the application to the multiple processors. 
For customized hardware development, there are many possible realizations of the customized hardware peripherals, depending on the algorithms, architectures, hardware bindings, etc., employed by these peripherals. The communication interfaces between the various hardware components (e.g., the topology for connecting the processors, and the communication protocols for exchanging data between the processors and the customized hardware peripherals and among the processors) also make efficient design space exploration significantly more challenging. Exploring such a large design space using time-consuming low-level simulation techniques becomes intractable.
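
To give a sense of scale for the two reasons above, the following minimal sketch works through the arithmetic. The simulation and execution times are the figures quoted in the text (25 minutes is a lower bound, since the text states "more than 25 minutes"). The counting model in `design_space_size`, its parameter values, and all function names are hypothetical and chosen only for illustration; they are not taken from the design examples.

```python
import math

# Figures quoted in the text: more than 25 minutes of ModelSim time
# to simulate 1.5 msec of execution on the MicroBlaze soft processor.
SIM_SECONDS = 25 * 60
REAL_SECONDS = 1.5e-3


def slowdown_factor(sim_seconds: float, real_seconds: float) -> float:
    """Ratio of simulation wall-clock time to simulated execution time."""
    return sim_seconds / real_seconds


def projected_simulation_seconds(real_seconds: float, factor: float) -> float:
    """Wall-clock time needed to simulate a program of the given duration."""
    return real_seconds * factor


def design_space_size(tasks: int, processors: int, hw_variants: int,
                      topologies: int, protocols: int) -> int:
    """Count design points in a simplified, hypothetical model.

    Assumption (not from the text): each task is mapped independently
    either to one of `processors` soft processors or to one of
    `hw_variants` customized hardware realizations, and one topology and
    one communication protocol are chosen for the whole platform.
    """
    return (processors + hw_variants) ** tasks * topologies * protocols


if __name__ == "__main__":
    factor = slowdown_factor(SIM_SECONDS, REAL_SECONDS)
    print(f"slowdown factor: about {factor:.1e}")

    # A program that runs for one minute on the soft processor.
    one_minute = projected_simulation_seconds(60.0, factor)
    print(f"simulating a 1-minute program: about {one_minute / 86400:.0f} days")

    # Hypothetical platform: 8 tasks, 2 processors, 3 hardware variants,
    # 3 topologies, 2 protocols.
    points = design_space_size(8, 2, 3, 3, 2)
    years = points * SIM_SECONDS / (365 * 86400)
    print(f"{points} design points, about {years:.0f} years at 25 min each")
```

Even under these modest assumed parameters the model yields more than two million design points, which at 25 minutes per point would take more than a century of sequential low-level simulation.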