
First Authors Give Exclusive Walkthroughs of Their OR and IEEE TAC Papers | AIRS in the AIR

November 15, 2022, 9:00–11:00
Online event (replay available)



Event Details

Recently, the AIRS Center for Machine Learning and Applications published papers in Operations Research, a top journal in operations research and management science, and in IEEE Transactions on Automatic Control, a leading international journal in automatic control.

In this Tuesday's AIRS in the AIR, we invite the two first authors, Jie Wang (王捷), a Ph.D. student in industrial engineering at the Georgia Institute of Technology, and Kun Huang (黄琨), a Ph.D. student in the School of Data Science at The Chinese University of Hong Kong, Shenzhen, to give comprehensive walkthroughs of their papers on robust algorithms for off-policy evaluation and on improving the transient times of distributed stochastic gradient methods. Join the livestream to chat with the authors in real time.

Hongyuan Zha (查宏远), Presidential Chair Professor at The Chinese University of Hong Kong, Shenzhen and Director of the AIRS Center for Machine Learning and Applications, serves as executive chair. Dandan Guo (郭丹丹), postdoctoral researcher at The Chinese University of Hong Kong, Shenzhen, serves as host.


Talk Title: Reliable Off-Policy Evaluation for Reinforcement Learning

Speaker: Jie Wang (王捷)

In a sequential decision-making problem, off-policy evaluation estimates the expected cumulative reward of a target policy using logged trajectory data generated from a different behavior policy, without executing the target policy. Reinforcement learning in high-stakes environments, such as healthcare and education, is often limited to off-policy settings due to safety or ethical concerns or the inability to explore. Hence, it is imperative to quantify the uncertainty of the off-policy estimate before deploying the target policy. In this paper, we propose a novel framework that provides robust and optimistic cumulative reward estimates using one or multiple logged trajectories. Leveraging methodologies from distributionally robust optimization, we show that with a proper choice of the size of the distributional uncertainty set, these estimates serve as confidence bounds with nonasymptotic and asymptotic guarantees under stochastic or adversarial environments. Our results also generalize to batch reinforcement learning and are supported by empirical analysis.
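For readers new to off-policy evaluation, the sketch below shows the classical per-trajectory importance-sampling estimator that OPE methods build on; the paper's framework constructs distributionally robust confidence bounds around estimates of this kind, not this exact estimator. All names and signatures here are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def importance_sampling_ope(trajectories, pi_target, pi_behavior, gamma=0.99):
    """Per-trajectory importance-sampling estimate of the target policy's
    expected discounted return, computed from logged behavior-policy data.

    trajectories : list of trajectories, each a list of (state, action, reward)
    pi_target    : callable (state, action) -> probability under the target policy
    pi_behavior  : callable (state, action) -> probability under the behavior policy
    """
    estimates = []
    for traj in trajectories:
        weight, ret = 1.0, 0.0
        for t, (s, a, r) in enumerate(traj):
            # Reweight the logged trajectory by the likelihood ratio of
            # the two policies at each step.
            weight *= pi_target(s, a) / pi_behavior(s, a)
            ret += (gamma ** t) * r
        estimates.append(weight * ret)
    return float(np.mean(estimates))
```

A DRO-style lower confidence bound would then, roughly, minimize such an estimate over a set of distributions close to the empirical one; per the abstract, the paper shows how to size that uncertainty set so the bound holds with nonasymptotic and asymptotic guarantees.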


Jie Wang received his B.Sc. in Mathematics and Applied Mathematics (Second Class Honours, Division I) from the School of Science and Engineering at The Chinese University of Hong Kong, Shenzhen in 2020. He is currently a Ph.D. student in industrial engineering at the Georgia Institute of Technology. His research interests include statistical learning, optimization theory and algorithms, and network information theory.



Talk Title: Improving the Transient Times for Distributed Stochastic Gradient Methods

Speaker: Kun Huang (黄琨)

We consider the distributed optimization problem where n agents, each possessing a local cost function, collaboratively minimize the average of the n cost functions over a connected network. Assuming stochastic gradient information is available, we study a distributed stochastic gradient algorithm, called Exact Diffusion with Adaptive Stepsizes (EDAS), adapted from the Exact Diffusion method and NIDS, and perform a non-asymptotic convergence analysis. We not only show that EDAS asymptotically achieves the same network-independent convergence rate as centralized stochastic gradient descent (SGD) for minimizing strongly convex and smooth objective functions, but also characterize the transient time needed for the algorithm to approach the asymptotic convergence rate, which behaves as K_T = O(n / (1 - λ₂)), where 1 - λ₂ denotes the spectral gap of the mixing matrix. To the best of our knowledge, EDAS achieves the shortest transient time when the average of the n cost functions is strongly convex and each cost function is smooth. Numerical simulations further corroborate and strengthen the obtained theoretical results.
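To make the setup concrete, the sketch below runs plain decentralized SGD, not EDAS itself (the exact-diffusion correction and adaptive stepsizes are omitted), on n agents coupled by a doubly stochastic mixing matrix W. The second-largest eigenvalue modulus λ₂ of W determines the spectral gap 1 - λ₂ that governs the transient time above. All function and variable names are illustrative assumptions.

```python
import numpy as np

def decentralized_sgd(W, local_grads, x0, steps=2000, lr=0.01, noise=0.1, seed=0):
    """Plain decentralized SGD over a network with mixing matrix W.

    W           : (n, n) doubly stochastic mixing matrix; its second-largest
                  eigenvalue modulus lambda_2 sets the spectral gap 1 - lambda_2
    local_grads : list of n callables, local_grads[i](x) -> gradient of agent i's cost
    x0          : (n, d) array of initial iterates, one row per agent
    """
    rng = np.random.default_rng(seed)
    x = x0.astype(float).copy()
    for _ in range(steps):
        x = W @ x  # gossip step: each agent averages its neighbors' iterates
        for i, grad in enumerate(local_grads):
            # local stochastic gradient step (additive noise models sampling error)
            x[i] -= lr * (grad(x[i]) + noise * rng.standard_normal(x[i].shape))
    return x.mean(axis=0)  # network-average iterate

# Example: n agents on a ring, each with quadratic cost ||x - b_i||^2 / 2,
# so the network-wide minimizer is the mean of the b_i.
n, d = 8, 3
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.25  # lazy ring weights, doubly stochastic
b = np.random.default_rng(1).standard_normal((n, d))
grads = [lambda x, bi=bi: x - bi for bi in b]
x_hat = decentralized_sgd(W, grads, x0=np.zeros((n, d)))
print(np.linalg.norm(x_hat - b.mean(axis=0)))  # should be a small residual
```

Shrinking the spectral gap (e.g., a larger ring) slows the gossip averaging and lengthens the transient phase before the iterates track the centralized-SGD rate, which is the regime the paper's K_T bound quantifies.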

Kun Huang received his B.Sc. in Mathematics and Applied Mathematics from the School of Mathematical Sciences at Tongji University in 2018 and his M.S. in Statistics from the University of Connecticut in 2020. He is currently a Ph.D. student in data science at the School of Data Science, The Chinese University of Hong Kong, Shenzhen. His research interests include distributed optimization.




