When we had full knowledge of the environment, its states, transitions, and rewards, we could use Markov Decision Processes to find the optimal policy. When this assumption breaks down, we need to fall back on our best approximation. This is not a far stretch from how we might handle new scenarios in our own lives. When we begin a new task, we are certainly not experts. We may learn from a teacher or set off to explore on our own. As we practice and work through the seemingly endless variations of our endeavour, we begin to develop a sense of what works and what doesn’t. We may not be able to articulate the exact rules that we follow, but we can certainly tell when we are doing well or poorly.
Table of Contents

- Key Terms
- Defining Goals
- Policies and Values
- Bellman Equations
- Optimality
- Optimizing the Policy

Key Terms

- Agent: The learner or decision maker.
- Environment: The world that the agent can interact with.
- State: A representation of the agent and environment.
- Action: A choice the agent can take in the environment.
- Reward: Given to the agent based on actions taken.
- Goal: Maximize rewards earned over time.
At time \(t\), the agent observes the state of the environment \(S_t \in \mathcal{S}\) and can select an action \(A_t \in \mathcal{A}(s)\), where \(\mathcal{A}(s)\) indicates that the set of available actions depends on the current state. At time \(t + 1\), the agent receives a reward \(R_{t+1} \in \mathcal{R}\) and observes the next state \(S_{t+1}\).
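To make the notation concrete, here is a minimal Python sketch of this interaction loop. The `TinyEnvironment` class, its two states, and the random action selection are hypothetical illustrations; only the \(S_t\), \(A_t\), \(R_{t+1}\) structure comes from the formulation above.

```python
import random

# A tiny, hypothetical environment used only to illustrate the
# S_t, A_t, R_{t+1} interaction described above.
class TinyEnvironment:
    def __init__(self):
        self.state = "start"

    def actions(self, state):
        # The available actions depend on the current state: A(s).
        return ["stay", "move"] if state == "start" else ["stay"]

    def step(self, action):
        # Applying A_t yields the reward R_{t+1} and the next state S_{t+1}.
        if action == "move":
            self.state = "goal"
            return 1.0, self.state
        return 0.0, self.state


env = TinyEnvironment()
total_reward = 0.0
for t in range(5):
    state = env.state                             # observe S_t
    action = random.choice(env.actions(state))    # select A_t from A(s)
    reward, next_state = env.step(action)         # receive R_{t+1}, observe S_{t+1}
    total_reward += reward
    print(f"t={t}: S_t={state}, A_t={action}, R_(t+1)={reward}")

print("Total reward over the episode:", total_reward)
```

Here the action is chosen uniformly at random only as a stand-in; the rest of the article is concerned with replacing that choice with a policy that maximizes the rewards earned over time.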