Lecture Title: A Causal View for Reward Redistribution in Reinforcement Learning
Speaker: Yali Du, Assistant Professor
Time: July 9, 14:30
Venue: Room 306, Building 100, North Campus
About the Speaker:
Yali Du is an Assistant Professor at King's College London. She was previously a postdoctoral researcher at the UCL Centre for Artificial Intelligence. Her main research interests are reinforcement learning and cooperative multi-agent learning. Her work has been published widely at top venues including ICLR, ICML, NeurIPS, and the Artificial Intelligence Journal. She delivered tutorials on cooperative multi-agent learning at ACML 2022 and AAAI 2023. She has served repeatedly as an editor for leading international journals and as a reviewer or program committee member for major conferences, including the AAMAS 2023 organizing committee, lead guest editor of a special issue of the Journal of Autonomous Agents and Multi-Agent Systems (JAAMAS, CCF Rank B), Associate Editor of IEEE Transactions on Artificial Intelligence, and Senior Program Committee member for AAAI 2022/2023. For her contributions to cooperative reinforcement learning, she was selected for the AAAI New Faculty Highlights program (2023), Rising Stars in AI (KAUST, 2023), the WAIC Yunfan Award (2023), and the KCL annual academic contribution award (2022). Her research is funded by the UK Engineering and Physical Sciences Research Council (UKRI EPSRC).
Abstract:
In reinforcement learning, a significant challenge lies in identifying the state-action pairs responsible for delayed future rewards. Return Decomposition addresses this challenge by redistributing rewards from observed sequences while maintaining policy invariance. However, existing approaches lack interpretability. In this talk, we propose a novel framework called Generative Return Decomposition (GRD) that explicitly models the contributions of state and action from a causal perspective. GRD utilizes causal generative models to characterize the generation of Markovian rewards and trajectory-wise long-term return. By identifying the causal relations and unobservable Markovian rewards, GRD provides a compact representation for policy optimization within the most favorable subspace of the agent's state space. Theoretical analysis confirms the identifiability of the unobservable Markovian reward function and causal structure. Experimental results demonstrate the superior performance of GRD compared to state-of-the-art methods, while the provided visualization showcases the interpretability of our approach.
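To make the core idea of return decomposition concrete, the sketch below illustrates the return-equivalence constraint the abstract refers to: a learned per-step reward model redistributes a single delayed episodic return across the trajectory, with the sum of redistributed rewards constrained to match the observed return so that the optimal policy is preserved. This is a minimal illustrative example, not the authors' GRD implementation; all network shapes, names, and hyperparameters are assumptions.

    # Minimal sketch of return decomposition: learn per-step rewards whose
    # sum over a trajectory matches the observed episodic return.
    import torch
    import torch.nn as nn

    class RewardRedistributor(nn.Module):
        """Predicts a Markovian proxy reward r_hat(s_t, a_t) for each step."""
        def __init__(self, state_dim: int, action_dim: int, hidden: int = 64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim + action_dim, hidden),
                nn.ReLU(),
                nn.Linear(hidden, 1),
            )

        def forward(self, states: torch.Tensor, actions: torch.Tensor) -> torch.Tensor:
            # states: (T, state_dim), actions: (T, action_dim) -> (T,) rewards
            return self.net(torch.cat([states, actions], dim=-1)).squeeze(-1)

    # Toy trajectory: only the trajectory-wise return is observed (hypothetical data).
    T, state_dim, action_dim = 50, 8, 2
    states = torch.randn(T, state_dim)
    actions = torch.randn(T, action_dim)
    episodic_return = torch.tensor(3.0)  # delayed reward revealed at episode end

    model = RewardRedistributor(state_dim, action_dim)
    optim = torch.optim.Adam(model.parameters(), lr=1e-3)

    for step in range(500):
        r_hat = model(states, actions)  # (T,) redistributed per-step rewards
        # Return-equivalence constraint: redistributed rewards must sum to the
        # observed return, which keeps the optimal policy unchanged.
        loss = (r_hat.sum() - episodic_return) ** 2
        optim.zero_grad()
        loss.backward()
        optim.step()

    # r_hat now provides a dense, per-step reward signal for policy optimization.

GRD goes beyond this basic scheme by modeling the generation of the unobserved Markovian rewards with causal generative models, so that the redistribution is both identifiable and interpretable rather than an arbitrary split of the return.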
Host: School of Electronic Engineering