1st MIAI Deeptails Seminar on October 28th from 3 p.m.

On October 28, 2021

From 3 p.m.
We are pleased to share with you the first MIAI Deeptails Seminar with Kazuki Irie and Robert Csordas.

The Devil is in the Detail: Simple Tricks Improve Systematic Generalization of Transformers


Strong baselines are crucial for properly measuring progress in machine learning. When revisiting configurations of existing baselines, we sometimes discover surprising results. Recently, many datasets have been proposed to test the systematic generalization ability of neural networks. The companion baseline Transformers, typically trained with default hyper-parameters from standard tasks, are shown to fail dramatically. Here we demonstrate that by revisiting model configurations as basic as scaling of embeddings, early stopping, relative positional embedding, and Universal Transformer variants, we can drastically improve the performance of Transformers on systematic generalization. We report improvements on five popular datasets: SCAN, CFQ, PCFG, COGS, and the Mathematics dataset. Our models improve accuracy from 50% to 85% on the PCFG productivity split, and from 35% to 81% on COGS. On SCAN, relative positional embedding largely mitigates the EOS decision problem (Newman et al., 2020), yielding 100% accuracy on the length split with a cutoff at 26. Importantly, performance differences between these models are typically invisible on the IID data split. This calls for proper generalization validation sets for developing neural networks that generalize systematically.


Kazuki Irie is a postdoc working under Prof. Jürgen Schmidhuber at the Swiss AI Lab IDSIA, University of Lugano (USI) and SUPSI.
He is broadly interested in computer science, machine learning, and neural networks.
He completed his PhD in Computer Science at RWTH Aachen University in May 2020 under the supervision of Prof. Dr.-Ing. Hermann Ney, where he worked on neural-network-based language modelling to improve automatic speech recognition.
He worked twice as a research intern at Google USA, in New York in 2017 and in Mountain View in 2018.
Before joining RWTH Aachen, he studied Applied Mathematics at École Centrale Paris and ENS Cachan in France, and obtained Diplôme d'Ingénieur and Master of Science degrees.

Robert Csordas is a PhD candidate at the Swiss AI Lab IDSIA, working with Prof. Jürgen Schmidhuber on systematic generalization, mainly in the context of algorithmic reasoning. This drives his research interest in network architectures (Transformers, DNC, graph networks) with inductive biases like information routing (attention, memory) and learning modular structures. His goal is to create a system that can learn generally applicable rules instead of pure pattern matching, but with minimal hardcoded structure. He considers the lack of systematic generalization to be the main obstacle to a more generally applicable artificial intelligence.

View the replay

Published on January 19, 2023