Teaching Robots to Perceive Time: A Twofold Learning Approach
Read:: - [x] Lourenco et al. (2020) - Teaching Robots to Perceive Time: A Twofold Learning Approach 🛫2023-11-28 !!2 rd citation todoist Print:: ✔ Zotero Link:: Zotero PDF:: Lourenco et al. - 2020 - Teaching Robots to Perceive Time A Twofold Learni.pdf Files:: attachment Reading Note:: Web Rip:: url:: https://ieeexplore.ieee.org/document/9278033/
```dataview
TABLE without id
file.link as "Related Files",
title as "Title",
type as "type"
FROM "" AND -"ZZ. planning"
WHERE citekey = "lourencoTeachingRobotsPerceive2020"
SORT file.cday DESC
```
Abstract
The concept of time perception is used to describe the phenomenological experience of time. There is strong evidence that dopaminergic neurons are involved in the timing mechanisms responsible for time perception. The phasic activity of these neurons resembles the behavior of the reward prediction error in temporal-difference learning models. Therefore, these models are used to replicate the neuronal behaviour of the dopamine system and corresponding timing mechanisms. However, time perception has also been shown to be shaped by time estimation mechanisms from external stimuli. In this paper we propose a framework that combines these two principles, in order to provide temporal cognition abilities to intelligent systems such as robots. A time estimator based on observed environmental stimuli is combined with a reinforcement learning approach, using a feature representation called Microstimuli to replicate dopaminergic behaviour. The elapsed time perceived by the robot is estimated by modeling sensor measurements as Gaussian processes to capture the second-order statistics of the natural environment. The proposed framework is evaluated on a simulated robot that performs a temporal discrimination task originally performed by mice. The ability of the robot to replicate the timing mechanisms of the mice is demonstrated by the fact that both exhibit the same ability to classify the duration of intervals.
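To make the "sensor measurements as Gaussian processes" idea concrete for myself, a toy sketch of one way such an estimator could look: score each candidate duration by the GP marginal likelihood of the observed sequence and keep the best. This is my own illustration, not the paper's estimator; the kernel choice and hyperparameters are assumptions.

```python
# Toy sketch (not the paper's method): pick the elapsed time whose implied
# sample times make the observed sensor sequence most likely under a
# zero-mean GP with a squared-exponential kernel. Hyperparameters are made up.
import numpy as np

def rbf_kernel(ts, length_scale=1.0, variance=1.0, noise=1e-3):
    """Squared-exponential covariance matrix over the time stamps ts."""
    d = ts[:, None] - ts[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2) + noise * np.eye(len(ts))

def gp_log_likelihood(y, ts):
    """log N(y | 0, K(ts, ts)) for a zero-mean GP."""
    K = rbf_kernel(ts)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ alpha - np.log(np.diag(L)).sum() - 0.5 * len(y) * np.log(2 * np.pi)

def estimate_elapsed_time(y, candidate_durations):
    """Return the candidate duration whose implied sampling grid best explains y."""
    scores = []
    for tau in candidate_durations:
        ts = np.linspace(0.0, tau, len(y))   # assume evenly spaced samples over [0, tau]
        scores.append(gp_log_likelihood(y, ts))
    return candidate_durations[int(np.argmax(scores))]

# Example: a fake 20-sample sensor trace, scored against four candidate durations.
y = np.sin(np.linspace(0.0, 3.0, 20)) + 0.05 * np.random.randn(20)
print(estimate_elapsed_time(y, [1.0, 2.0, 3.0, 4.0]))
```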
Quick Reference
Top Notes
- Did a quick read of the paper
- Relies on a Bayesian understanding (sensor measurements are modeled as Gaussian processes)
- Some parts get abstract
- Uses input from the environment to detect time intervals (how exactly?)
- It seems to observe a stochastic input and infer the elapsed time from its statistics
- Tries to emulate the dopaminergic reward prediction error (RPE) for TD learning (see the sketch below)
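A minimal sketch of that RPE analogy: Gaussian "microstimuli" basis functions over a decaying memory trace (in the spirit of Ludvig et al.'s microstimuli representation, which the paper adopts) feed a linear TD(λ) learner, whose TD error plays the role of the phasic dopamine signal. The parameterization and all numbers here are my assumptions, not taken from the paper.

```python
# Illustrative sketch: microstimuli features + linear TD(lambda). The TD error
# delta is the reward-prediction-error (RPE) signal that phasic dopamine
# activity resembles. Numbers are illustrative, not from the paper.
import numpy as np

def microstimuli(t, n_basis=10, decay=0.985, width=0.08):
    """Feature vector at time step t after stimulus onset."""
    trace = decay ** t                            # exponentially decaying memory trace
    centers = np.linspace(1.0, 0.0, n_basis)      # Gaussian basis centers over trace height
    return trace * np.exp(-0.5 * ((trace - centers) / width) ** 2)

def run_trial(w, reward_time, alpha=0.1, gamma=0.98, lam=0.9, horizon=60):
    """One trial of linear TD(lambda); returns the per-step TD errors (the RPE)."""
    e = np.zeros_like(w)                          # eligibility trace
    deltas = []
    for t in range(horizon):
        x, x_next = microstimuli(t), microstimuli(t + 1)
        r = 1.0 if t == reward_time else 0.0
        delta = r + gamma * (w @ x_next) - (w @ x)   # TD error ~ dopaminergic RPE
        e = gamma * lam * e + x
        w += alpha * delta * e
        deltas.append(delta)
    return deltas

w = np.zeros(10)
for _ in range(200):
    rpe = run_trial(w, reward_time=40)
# Early in training the RPE spikes at the reward; after training it shrinks there,
# because the reward has become predicted (cf. Fig. 4 of the paper).
```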
Tasks
Further Reading
Extracted Annotations and Comments
Page 6
The observations $y_t^{(i)}$ are considered to be the values of the $i$-th angle of the simulated robot's LIDAR at time $t$, collected while the robot does the "Wait" action between tones
If the observations are the value of an angle that is tied to a timed cyclic motion, isn’t that just a clock?
Figures (blue)
Figure
Page 2
Fig. 1. Framework used to replicate the biological estimation of time as a combination of external environmental stimuli and internal neuronal mechanisms as follows. ET) The agent computes an estimate of the elapsed time $\hat{\tau}$ from environmental observations $y_t$. IT) This estimate is employed in a temporal-difference learning algorithm that replicates internal timing mechanisms: based on the elapsed time estimate $\hat{\tau}$ and the state $s_t$ of the environment, the agent computes the Q-values (1) of each state-action pair, performs a corresponding action $a_t$ (2), and receives a reward $r_t$.
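For reference, the generic shape of the Q-value/TD update behind step IT, in standard tabular form (the paper's equations (1)-(2) use the microstimuli feature representation, so its actual update is parameterized differently):

$$\delta_t = r_t + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t), \qquad Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\,\delta_t .$$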
Page 5
Fig. 2. Desired state transition, obtained when the optimal action (on the top row) is chosen. After pressing the Start button, the state of the environment changes to Tone and the number of Interval states between the next Tone state is uniformly sampled from the maximum interval length, which is a design variable for each experiment. The agent estimates the number of time steps spent in the Interval state, $\hat{\tau}$, and, after the second Tone state, chooses the action Short or Long that corresponds to its estimate. If the correct action is chosen, a positive reward is given to the agent.
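A stripped-down, runnable version of this task (my own toy, not the paper's setup): the episode is collapsed to the decision at the second tone, the elapsed-time estimate is taken to be perfect, and a tabular Q-learner picks Short or Long. The interval range, reward, and hyperparameters are assumptions.

```python
# Toy temporal discrimination task in the spirit of Fig. 2: an interval of
# 1..MAX_INTERVAL steps elapses between two tones, and the agent must classify
# it as Short or Long. A perfect time estimate stands in for the ET module.
import random
from collections import defaultdict

MAX_INTERVAL = 8                 # design variable: maximum interval length (in states)
BOUNDARY = MAX_INTERVAL // 2     # assumption: "Short" means tau <= BOUNDARY

def run_trial(Q, epsilon=0.1, alpha=0.1):
    tau = random.randint(1, MAX_INTERVAL)    # number of Interval states between the tones
    tau_hat = tau                            # stand-in for the elapsed-time estimate
    state = ("second_tone", tau_hat)
    if random.random() < epsilon:            # epsilon-greedy choice of Short/Long
        action = random.choice(["Short", "Long"])
    else:
        action = max(("Short", "Long"), key=lambda a: Q[(state, a)])
    correct = "Short" if tau <= BOUNDARY else "Long"
    reward = 1.0 if action == correct else 0.0
    Q[(state, action)] += alpha * (reward - Q[(state, action)])   # terminal update
    return reward

Q = defaultdict(float)
rewards = [run_trial(Q) for _ in range(5000)]
print(sum(rewards[-500:]) / 500)   # fraction correct late in training (should approach 1)
```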
Page 5
Fig. 3. Estimated $\hat{\tau}$ for each interval duration. The mean of the estimated interval almost perfectly matches the ground truth (blue dots), while the standard deviation (faded blue) increases linearly with the interval length. Its linear regression is shown in black. This illustrates the scalar property, a trend also exhibited by humans and animals.
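The "scalar property" mentioned here is Weber's law for interval timing: the spread of the estimate grows in proportion to the true duration, so the coefficient of variation stays roughly constant (k, the Weber fraction, is the slope of the black regression line; I did not note its value):

$$\sigma(\hat{\tau}) \approx k\,\tau \quad\Longleftrightarrow\quad \frac{\sigma(\hat{\tau})}{\tau} \approx k .$$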
Page 6
Fig. 4. Analysis of the TD error in different phases of learning. In the three episodes chosen, $\tau = 2$ time steps. As learning occurs, the agent starts expecting a reward after its correct classification of the interval ($t = 4$). Therefore, the TD error at the end of the episode decreases.
Page 6
Fig. 5. Evolution of the Q-values with the interval duration, after training. The time step values correspond to the state numbers from Figure 2, and each line to the Q-value of each action from the same figure. In the top figure, $\tau = 1$ (short interval), and in the bottom one, $\tau = 8$ time steps (long interval).
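The classification itself is just the greedy readout of these Q-values at the second tone (standard form, not quoted from the paper):

$$a_T = \arg\max_{a \in \{\text{Short},\ \text{Long}\}} Q(s_T, a).$$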