ABSTRACT

Predicting stock market quotes has always been an attractive field of research, which explains the large number of analysis methods and software tools. Most such approaches rely on processing historical data, domain knowledge and expertise, and statistical analysis. More recently, prediction has also drawn on textual information, based on the reasonable hypothesis that the course of a stock price can also be affected by news articles. Nowadays, news is easily accessible, access to important data such as inside company information is relatively cheap, and estimates from a wide range of sources, such as economists, statisticians and journalists, are available through the internet. When data grow very large, in terms of both records and attributes, many classification algorithms face considerable difficulties, resulting in poor forecasting performance. The goal of this study is to explore a potential solution to this problem by taking an ensemble algorithm and modifying its core training stage to incorporate a Markov blanket methodology, which eliminates irrelevant features. Experimental results on real cases support our claim that the proposed approach can cope with large volumes of data and be more accurate than existing state-of-the-art classification approaches, resulting in higher profits.