NeurIPS 2022 Sharing Session (a top machine learning conference) - AIRS in the AIR - Registration - AIRS Institute Events - Huodongxing

Event details

Talk 1:

Learning to Re-weight Examples with Optimal Transport for Imbalanced Classification

Deep neural networks (DNNs) have achieved remarkable success in various applications, a success that depends heavily on high-quality, large-scale datasets. Imbalanced data, however, pose challenges for deep-learning-based classification models. This paper introduces a novel automatic re-weighting method for imbalanced classification based on optimal transport (OT). The method represents the imbalanced training set as a to-be-learned distribution over its training examples, each of which is associated with a probability weight. Similarly, it views a separate balanced meta set as a balanced distribution over its examples. By minimizing the OT distance between the two distributions under a defined cost function, the learning of the weight vector is formulated as a distribution approximation problem. The proposed method bypasses the commonly used classification loss on the meta set and uses OT to learn the weights, which removes the dependence of weight learning on the classifier at each iteration. This differs from most existing re-weighting methods and may offer new ideas for future work. Experimental results on a variety of imbalanced image and text datasets validate the effectiveness and flexibility of the proposed method.
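To make the idea of learning example weights by matching a balanced meta set more concrete, here is a minimal NumPy sketch. It is not the paper's algorithm: it assumes a plain squared Euclidean cost on toy 2-D features (the abstract only says the cost function is "defined"), swaps in entropy-regularized OT solved with log-domain Sinkhorn iterations, and updates softmax-parameterized weights using the training-side dual potential as the gradient. All function names, the toy data, and the hyperparameters (`eps`, `lr`, `steps`) are hypothetical.

```python
import numpy as np

def logsumexp(z, axis):
    """Numerically stable log(sum(exp(z))) along one axis."""
    m = z.max(axis=axis, keepdims=True)
    return np.squeeze(m + np.log(np.exp(z - m).sum(axis=axis, keepdims=True)), axis=axis)

def source_potential(w, b, C, eps=0.05, n_iter=100):
    """Log-domain Sinkhorn; returns the dual potential f on the training side."""
    f, g = np.zeros_like(w), np.zeros_like(b)
    log_w, log_b = np.log(w), np.log(b)
    for _ in range(n_iter):
        g = -eps * logsumexp((f[:, None] - C) / eps + log_w[:, None], axis=0)
        f = -eps * logsumexp((g[None, :] - C) / eps + log_b[None, :], axis=1)
    return f

def softmax(theta):
    e = np.exp(theta - theta.max())
    return e / e.sum()

def learn_weights(X_train, X_meta, steps=100, lr=5.0, eps=0.05):
    """Learn a probability weight per training example (illustrative only)."""
    # Assumed cost: squared Euclidean distance, rescaled to [0, 1].
    C = ((X_train[:, None, :] - X_meta[None, :, :]) ** 2).sum(axis=-1)
    C = C / C.max()
    b = np.full(len(X_meta), 1.0 / len(X_meta))  # uniform balanced meta set
    theta = np.zeros(len(X_train))               # logits of the weights
    for _ in range(steps):
        w = softmax(theta)
        f = source_potential(w, b, C, eps)
        # The gradient of entropic OT w.r.t. w is f (up to a constant);
        # the chain rule through the softmax gives w * (f - <w, f>).
        theta -= lr * w * (f - w @ f)
    return softmax(theta)

# Toy imbalanced training set: 36 majority and 4 minority points.
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0.0, 0.5, size=(36, 2)),
                     rng.normal(4.0, 0.5, size=(4, 2))])
# Balanced meta set: 4 points per class.
X_meta = np.vstack([rng.normal(0.0, 0.5, size=(4, 2)),
                    rng.normal(4.0, 0.5, size=(4, 2))])

weights = learn_weights(X_train, X_meta)
minority_mass = weights[36:].sum()
print(f"minority mass: uniform = {4 / 40:.2f}, learned = {minority_mass:.2f}")
```

Note that no classifier appears anywhere in the sketch: the weights depend only on the two sets of examples and the cost, which is the decoupling the abstract describes. In a real pipeline the learned weights would then scale each example's training loss.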

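Speaker:

Dandan Guo (郭丹丹) received her PhD from Xidian University in 2020 and has since been a postdoctoral researcher at the Institute of Robotics and Intelligent Manufacturing (IRIM) and the School of Data Science at The Chinese University of Hong Kong, Shenzhen, under Prof. Hongyuan Zha (查宏远), Executive Dean of the university's School of Data Science and a renowned machine learning scholar. Her main research area is pattern recognition and machine learning, including probabilistic model construction and statistical inference, data-centric machine learning algorithms, and optimal transport theory. Applications include image generation and classification, text analysis, and natural language generation. She currently focuses on few-shot classification, few-shot generation, and biased training data distributions in real-world applications. Her work has been published in top international machine learning conferences and journals such as NeurIPS, ICML, ICLR, IJCV, and TNNLS. She also serves as a program committee member and reviewer for conferences and journals including ICML, NeurIPS, ICLR, JMLR, and TSP.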

Talk 2:

Learning Substructure Invariance for Out-of-Distribution Molecular Representations

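Molecular representation learning has attracted wide attention, and existing methods perform well on a variety of tasks such as molecular property prediction and target identification. However, both the model design and the experimental evaluation of existing methods assume that training and test data are independent and identically distributed. In real applications this assumption is likely to fail, because test molecules may well come from environments unseen during training, which causes severe performance degradation. In this work, motivated by the observation that the biochemical properties of molecules from different environments (e.g., molecular scaffold, molecular size) are usually stably related to certain molecular substructures, we propose a new molecular representation learning framework named MoleOOD to make molecular representation learning models more robust to such distribution shifts. Specifically, we introduce an environment inference model that identifies, in a fully data-driven manner, the latent factors that affect the data generation process, i.e., the environment variables. We also propose a new learning objective that guides the molecular encoder to exploit the substructures that are more stably related to molecular property labels across environments. Experiments on ten real-world datasets show that, even without manually annotated environment labels, our model generalizes better than existing methods across various out-of-distribution (OOD) scenarios by using environment labels inferred by the model itself.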
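As a rough illustration of what an objective that rewards stability across environments can look like, the sketch below computes the average per-environment risk plus a penalty on the variance of those risks. This is a generic risk-variance penalty used as a stand-in, not MoleOOD's actual learning objective, which the abstract does not spell out. The function name, the `lam` trade-off parameter, and the toy inputs are hypothetical; in the setting described above, `env_ids` would come from the environment inference model rather than from manual annotation.

```python
import numpy as np

def risk_variance_objective(losses, env_ids, lam=1.0):
    """Mean per-environment risk plus lam times the variance of those risks.

    losses  : per-example loss values
    env_ids : environment label for each example (assumed here to be
              inferred by a separate model, not hand-annotated)
    lam     : hypothetical trade-off weight for the stability penalty
    """
    losses = np.asarray(losses, dtype=float)
    env_ids = np.asarray(env_ids)
    # Average loss within each environment.
    env_risks = np.array([losses[env_ids == e].mean() for e in np.unique(env_ids)])
    # A predictor that does equally well in every environment pays no penalty.
    return env_risks.mean() + lam * env_risks.var()

# Two environments with risks 1.0 and 3.0: mean 2.0, variance 1.0.
print(risk_variance_objective([1, 1, 3, 3], [0, 0, 1, 1], lam=1.0))  # 3.0
```

The penalty term is zero only when every environment has the same average loss, so minimizing it pushes the encoder toward features whose relation to the label does not change between environments.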

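Speaker:

Nianzu Yang (杨念祖) is a PhD student in the Department of Computer Science at Shanghai Jiao Tong University, currently in the second year of a direct-entry PhD program. He received his bachelor's degree in 2021 from Shanghai Jiao Tong University's IEEE Pilot Class, majoring in computer science. His research interests include graph neural networks, generative models, OoD generalization, and AI for drug discovery.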
