ABSTRACT

In natural learning scenarios, people encounter qualitative, feedback-based learning more often than supervised learning. This type of learning is called "reinforcement" learning or reward-based learning: an agent takes an action in response to an environmental situation, and the interaction between the action and the environment changes the agent's state relative to the environment. The most important quantity in a reward-based learning paradigm is the estimate of the total expected future reward at any given state; the function that provides this estimate is known as the value function. In certain cases the reward is delayed, arriving only once the whole behavior is completed. The accuracy of action selection therefore depends on the accuracy of the value function, because at each state the agent takes the action with the highest total expected future reward.
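The action-selection rule described above can be sketched in a few lines. This is a minimal illustration, not the paper's method: the states, actions, and value estimates below are invented for demonstration, and the value function is represented as a simple table of estimated total expected future rewards per (state, action) pair.

```python
# Hypothetical value estimates: total expected future reward
# for each (state, action) pair. Values are illustrative only.
q_values = {
    ("s0", "left"): 0.2,
    ("s0", "right"): 0.7,
    ("s1", "left"): 0.5,
    ("s1", "right"): 0.1,
}

def best_action(state, actions=("left", "right")):
    """Greedy selection: pick the action with the highest
    estimated total expected future reward in this state."""
    return max(actions, key=lambda a: q_values[(state, a)])

print(best_action("s0"))  # prints "right": 0.7 > 0.2 in state s0
```

If the value estimates are inaccurate, the greedy choice can be wrong even though the selection rule itself is correct, which is why the abstract stresses the accuracy of the value function.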