ABSTRACT

Portfolio optimization is a critical tool for modern investors willing to maximize their capital utilization in the market and expected returns. As the number of decision factors and valuable data (produced by multiple automated systems) increases, it becomes more challenging to include such non-standard predictors in the optimization process. Many portfolio optimization techniques are based on different variants of constrained quadratic programming formalizations, taking as inputs expected asset returns, forms of risk metrics (like covariance matrices), and transaction costs. Next, the goal is selected (typically a maximization of return while keeping the risk level constant or minimizing the risk with varying returns), and the problem is solved either via a closed-form equation or a gradient-minimizing approach. Recent studies on DRL revealed that this family of algorithms could achieve satisfying performance results in complex, volatile, and stochastic environments. DRL originated from “classic” reinforcement learning (RL) that utilizes tabular calculation or function approximation methods. Portfolio optimization can be perceived as a Multi-Stage Decision Problem, which directly corresponds with the primary goal for DRL utilization.

The first part of this work presents a literature review of selected portfolio optimization techniques that use deep reinforcement learning algorithms. The second part presents experiments conducted on stock simulators and different assets in challenging market conditions after the COVID-19 outbreak and following pandemic events. DRL agents were compared with benchmarks (stock market indices and classic optimization strategies) in terms of Sharpe ratio, Calmar ratio, cumulative returns, and annual volatility, taking as their input multiple non-standard factors, like several technical indicators and summary statistics, beyond a typical quadratic programming setting. Presented DRL algorithms outperformed benchmarks in terms of cumulative returns and Share ratio, demonstrating the ability to optimize example portfolios properly, even in very demanding market conditions. Experimental results confirm the hypothesis that DRL is a promising set of tools for complex, stochastic, multi-stage decision problems, with various additional, potentially valuable information available. However, additional research is needed to ensure proper DRL results reproducibility, variance reduction, and interpretability.