Integrating Models of Interval Timing and Reinforcement Learning
Read:: - [ ] Petter et al. (2018) - Integrating Models of Interval Timing and Reinforcement Learning 2023-12-04 !!2 rd citation todoist Print:: Zotero Link:: Zotero Files:: attachment Reading Note:: Web Rip:: url:: https://www.sciencedirect.com/science/article/pii/S1364661318301931
TABLE without id
file.link as "Related Files",
title as "Title",
type as "type"
FROM "" AND -"Obsidian Assets"
WHERE citekey = "petterIntegratingModelsInterval2018"
SORT file.cday DESC
Abstract
We present an integrated view of interval timing and reinforcement learning (RL) in the brain. The computational goal of RL is to maximize future rewards, and this depends crucially on a representation of time. Different RL systems in the brain process time in distinct ways. A model-based system learns "what happens when", employing this internal model to generate action plans, while a model-free system learns to predict reward directly from a set of temporal basis functions. We describe how these systems are subserved by a computational division of labor between several brain regions, with a focus on the basal ganglia and the hippocampus, as well as how these regions are influenced by the neuromodulator dopamine.
Quick Reference
Top Notes
Tasks
Extracted Annotations and Comments
Page 911
A model-based system learns "what happens when", employing this internal model to generate action plans, while a model-free system learns to predict reward directly from a set of temporal basis functions.
Page 911
The goal of RL is to maximize expected cumulative future reward (i.e., value). This requires the agent to solve a "prediction problem", which requires estimating the value of each possible action, and an "optimization problem", which involves balancing exploration and exploitation to select the optimal action.
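A minimal sketch (my own illustration, not from the paper) of this prediction/optimization split, using tabular action values and an epsilon-greedy rule; all names and parameter values here are assumptions:

```python
import numpy as np

n_actions = 4
Q = np.zeros(n_actions)   # "prediction problem": estimated value of each action
epsilon = 0.1             # "optimization problem": how often to explore

def select_action(rng):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit argmax Q."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))   # explore
    return int(np.argmax(Q))                  # exploit

def update(action, reward, alpha=0.1):
    """Incrementally estimate the expected reward of the chosen action."""
    Q[action] += alpha * (reward - Q[action])

rng = np.random.default_rng(0)
a = select_action(rng)
update(a, reward=1.0)
```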
Page 912
This example illustrates how an agent needs to anticipate when the reward will be delivered to solve the RL problem. In this case, the agent must explicitly encode time intervals as part of its state representation. Thus, IT is an integral part of RL, cutting across different algorithmic solutions.
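An illustrative sketch of what "explicitly encoding time intervals as part of the state representation" could look like; the bin width, number of bins, and cue labels are assumptions, not from the paper:

```python
def time_augmented_state(cue_id, elapsed_s, bin_width_s=0.5, n_bins=10):
    """Return a (cue, time-bin) state: elapsed time since cue onset is part of the state itself."""
    time_bin = min(int(elapsed_s / bin_width_s), n_bins - 1)
    return (cue_id, time_bin)

# e.g., 1.3 s after cue 0 -> state (0, 2); a value table can then be indexed by this pair
state = time_augmented_state(cue_id=0, elapsed_s=1.3)
```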
Page 912
By contrast, different algorithms make use of time representation in different ways. Model-free algorithms use the time representation as a basis set for approximating the value function. Most commonly, this means approximating values as linear combinations of basis functions defined over time. In some instances, however, nonlinear function approximators, such as recurrent neural networks [9], may be more effective and biologically realistic representations of time.
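A sketch of the linear-basis idea: approximate value at elapsed time t as a weighted sum of temporal basis functions, V(t) = w · phi(t). The choice of Gaussian bumps and the specific centers and width are assumptions for illustration:

```python
import numpy as np

centers = np.linspace(0.0, 5.0, 20)   # basis function centers over the interval (s)
width = 0.3                            # shared width of each bump (s)

def phi(t):
    """Gaussian temporal basis: one bump per center, evaluated at elapsed time t."""
    return np.exp(-0.5 * ((t - centers) / width) ** 2)

w = np.zeros_like(centers)             # weights, e.g., learned by TD updates

def value(t):
    """Approximate value as a linear combination of the temporal basis functions."""
    return float(np.dot(w, phi(t)))
```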
Page 912
Model-based algorithms use the time representation to compute values by simulating the environmental dynamics. This is more computationally intensive than model-free algorithms, but endows model-based algorithms with more flexibility, because parameter changes in the internal model of temporal structure will immediately change the value estimates.
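An illustrative sketch of how a model-based value could be computed by simulating a learned model of "what happens when"; the learned delay, reward magnitude, and discount factor are hypothetical parameters. Note that changing the learned delay immediately changes the value, with no relearning of the value itself:

```python
gamma = 0.95   # per-step discount factor (illustrative)
dt = 0.1       # simulation time step (s)

def model_based_value(learned_delay_s, learned_reward):
    """Roll the internal model forward and discount the anticipated reward."""
    n_steps = int(learned_delay_s / dt)
    return (gamma ** n_steps) * learned_reward

v_short = model_based_value(learned_delay_s=2.0, learned_reward=1.0)
v_long = model_based_value(learned_delay_s=4.0, learned_reward=1.0)   # lower value, updated instantly
```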
Page 912
Timing in Model-Free Reinforcement Learning
Model-free algorithms directly estimate the value function by interacting with the environment. The canonical example is temporal difference (TD) learning [14], which uses the discrepancy between received and expected reward (the TD error) to update its value estimates (see Box 1 for technical details). This algorithm has been influential in neuroscience due to the correspondence between the TD error and the firing of midbrain dopamine neurons [15,16]. The value function is thought to be encoded at corticostriatal synapses, where plasticity is modulated by dopamine release similar to the circuits subserving IT [17–21]. This dopamine-dependent plasticity functions within a precise time window (i.e., 0.3–2.0 s), which has been demonstrated using optical activation of dopaminergic and glutamatergic afferents [22].
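A minimal TD(0) sketch with linear function approximation over a temporal basis (the paper's technical details are in its Box 1; everything below is my own illustrative assumption). Here delta plays the role attributed to the dopaminergic prediction-error signal, and the weights stand in loosely for corticostriatal synaptic strengths:

```python
import numpy as np

alpha, gamma = 0.05, 0.98
centers, width = np.linspace(0.0, 5.0, 20), 0.3
w = np.zeros_like(centers)                      # value weights ("corticostriatal" analogue)

def phi(t):
    """Gaussian temporal basis evaluated at elapsed time t."""
    return np.exp(-0.5 * ((t - centers) / width) ** 2)

def td_update(t, t_next, reward):
    """One TD(0) step: delta = r + gamma*V(t') - V(t); weights move along phi(t)."""
    v, v_next = np.dot(w, phi(t)), np.dot(w, phi(t_next))
    delta = reward + gamma * v_next - v         # TD (prediction) error
    w[:] = w + alpha * delta * phi(t)           # dopamine-gated plasticity analogue
    return delta
```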
Figures
