The explosion of Cloud-based services has increased the demand for computing power. While this demand can be met simply by adding more resources, achieving high energy efficiency is far more complex. In recent years, to cope with this challenge, heterogeneous hardware (e.g., GPUs, FPGAs, DSPs) has become part of the standard data center ecosystem. Besides this growing adoption at the core of the Cloud, edge systems have also started to embrace heterogeneity to enable processing closer to the data sources. In this context, performance is not the only metric for comparing platforms; energy consumption is equally relevant. This chapter discusses in depth the case study of a machine learning application implemented on the Parallella platform, a low-power, low-cost embedded system. The application belongs to computer vision, a widely explored domain that leaves ample room for optimized implementations on edge-oriented devices, where limited hardware resources are the main design constraint. First, machine learning approaches are presented, with a focus on Convolutional Neural Networks. Then, different design solutions are analysed, highlighting their impact on performance and energy consumption.