This chapter discusses how video-based processing tasks differ from image-based tasks. It covers two mainstream methods for extracting information from the temporal dimension, optical flow and convolutional neural networks (CNNs), and discusses how they can work together. It also introduces human pose estimation, a popular application in video tasks, and several ways to achieve it. More importantly, it analyzes how video-based methods can be adapted to tiny devices and reach real-time performance, using techniques including quantization, residual skip, and patch embedding. Finally, the chapter provides a practice project, Mobile Human Pose, as an exercise.