Detecting anomalies in crowd scenes is an important problem, and many studies have addressed it. However, existing studies rely on handcrafted features to identify and detect anomalies. This chapter proposes a novel supervised learning framework for detecting anomalies in a variety of crowded scenes. Crowded scenes exhibit several kinds of features, including visual, motion, and energy features, which are extracted using spatiotemporal measurements. For mid-level feature representation, three convolutional machines are trained, after which a multimodal fusion model is used to deep-learn the crowd patterns. Based on the output of the multimodal fusion, a one-class support vector machine is used to locate and detect the anomalies present in a crowded scene. The proposed algorithm is evaluated on available data sets and compared with several existing algorithms. The results show that it outperforms various existing methods at detecting anomalies in 3D images and in videos.
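The final detection stage can be illustrated with a minimal sketch, which is not the chapter's implementation. It assumes NumPy and scikit-learn, uses synthetic random vectors in place of the learned visual, motion, and energy representations, and substitutes plain concatenation (the hypothetical helper `fuse`) for the multimodal fusion model. A one-class SVM is fitted on fused features from normal scenes only and then scores new samples.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)


def fuse(visual, motion, energy):
    """Hypothetical stand-in for the fusion model: concatenate per-modality features."""
    return np.concatenate([visual, motion, energy], axis=1)


# Synthetic "normal" training data: 200 samples, 4 dimensions per modality (assumed sizes).
n_train, dim = 200, 4
train = fuse(
    rng.normal(size=(n_train, dim)),
    rng.normal(size=(n_train, dim)),
    rng.normal(size=(n_train, dim)),
)

# Standardise the fused features, then fit the one-class SVM on normal data only.
scaler = StandardScaler().fit(train)
detector = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05)
detector.fit(scaler.transform(train))

# Two query samples: one at the centre of the training distribution, one far from it.
typical = fuse(np.zeros((1, dim)), np.zeros((1, dim)), np.zeros((1, dim)))
unusual = fuse(np.full((1, dim), 10.0), np.full((1, dim), 10.0), np.full((1, dim), 10.0))
queries = scaler.transform(np.vstack([typical, unusual]))

labels = detector.predict(queries)            # +1 = inlier, -1 = anomaly
scores = detector.decision_function(queries)  # lower score = more anomalous

print("labels:", labels)
print("scores:", scores)
```

The `nu` parameter bounds the fraction of training samples treated as outliers; the value 0.05 here is an arbitrary choice for illustration and would need tuning on real crowd features.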