Slow processes of neurons enable a biologically plausible approximation to policy gradient

About::
Read:: - [ ] Subramoney et al. () - Slow processes of neurons enable a biologically plausible approximation to policy gradient ➕2025-01-29 !!2 rd citation todoist
Print:: ❌
Zotero Link:: Zotero
Files:: attachment
Reading Note::
Web Rip::
url:: https://slideslive.com/38924016/slow-processes-of-neurons-enable-a-biologically-plausible-approximation-to-policy-gradient?ref=og-meta-tags

```dataview
TABLE without id
file.link as "Related Files",
title as "Title",
type as "Type"
FROM "" AND -"Obsidian Assets"
WHERE citekey = "subramoneySlowProcessesNeurons"
SORT file.cday DESC
```

Abstract

Recurrent neural networks underlie the astounding information processing capabilities of the brain, and play a key role in many state-of-the-art algorithms in deep reinforcement learning. But it has remained an open question how such networks could learn from rewards in a biologically plausible manner, with synaptic plasticity that is both local and online. We describe such an algorithm that approximates actor-critic policy gradient in recurrent neural networks. Building on e-prop, an approximation of backpropagation through time (BPTT), and using the equivalence between the forward and backward views in reinforcement learning (RL), we formulate a novel learning rule for RL that is both online and local, called reward-based e-prop. This learning rule uses neuroscience-inspired slow processes and top-down signals, while still being rigorously derived as an approximation to actor-critic policy gradient. To evaluate this algorithm empirically, we consider a delayed reaching task in which an arm is controlled by a recurrent network of spiking neurons. On this task, reward-based e-prop performs as well as an agent trained with actor-critic policy gradient using biologically implausible BPTT.
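
Roughly, and with notation reconstructed from the e-prop line of work rather than quoted from this paper: e-prop factorizes the gradient into a learning signal times a locally computed eligibility trace $e_{ji}^t$, and reward-based e-prop substitutes the critic's TD error for that learning signal:

$$
\Delta W_{ji} \;\propto\; \sum_t \delta_t \, \bar{e}_{ji}^{\,t},
\qquad
\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t),
$$

where $\bar{e}_{ji}^{\,t}$ is a discount-filtered version of the trace. The update then needs only quantities available locally at the synapse plus a globally broadcast scalar $\delta_t$, which is what makes it online and local.

The same TD-error-times-trace structure appears in textbook backward-view actor-critic, which is easy to run end to end. The sketch below is my own toy construction, not the paper's code or task: a linear actor and critic with eligibility traces on a 5-state random-walk chain, standing in for the paper's recurrent spiking network and delayed reaching task.

```python
import numpy as np

# Toy sketch (my construction, not the paper's code): online actor-critic
# with eligibility traces on a 5-state random-walk chain. The update
# "weights += eta * delta * trace" mirrors the TD-error-times-filtered-trace
# structure of reward-based e-prop, but with a linear actor/critic over
# one-hot states instead of a recurrent spiking network.

rng = np.random.default_rng(0)
n_states, gamma, lam = 5, 0.99, 0.9
eta_pi, eta_v = 0.1, 0.1

theta = np.zeros((n_states, 2))  # actor: logits for actions {left, right}
v = np.zeros(n_states)           # critic: state-value estimates

for episode in range(500):
    s = 2                          # start in the middle state
    z_pi = np.zeros_like(theta)    # actor eligibility trace
    z_v = np.zeros_like(v)         # critic eligibility trace
    done = False
    while not done:
        logits = theta[s]
        p = np.exp(logits - logits.max())
        p /= p.sum()
        a = rng.choice(2, p=p)
        s_next = s + (1 if a == 1 else -1)
        done = s_next in (0, n_states - 1)
        r = 1.0 if s_next == n_states - 1 else 0.0

        # TD error: a single scalar, broadcast to every "synapse"
        delta = r + (0.0 if done else gamma * v[s_next]) - v[s]

        # eligibility traces, accumulated forward in time from local info only
        grad_logp = -p
        grad_logp[a] += 1.0
        z_pi *= gamma * lam
        z_pi[s] += grad_logp
        z_v *= gamma * lam
        z_v[s] += 1.0

        # online, local updates driven by the global TD error
        theta += eta_pi * delta * z_pi
        v += eta_v * delta * z_v
        s = s_next
```

The paper's contribution, as the abstract describes it, is deriving an analogous trace for recurrent spiking neurons from an approximation to BPTT, so that the same forward-view update remains a principled approximation to actor-critic policy gradient.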

Quick Reference

Top Notes

Tasks