2022 in Review: Top language AI research papers + interesting papers to read

Disclaimer: Views are my own and do not reflect the opinions of my employer :)

I was reflecting on 2022 and tried to think of the top representative language AI research papers that defined the year.

This is the personal list I came up with, after consulting and gathering opinions from some folks (full list below).

It is intended as a resource if you’re new to the field, or just want a bird’s-eye view of 2022 NLP research.

The only rules I imposed were that 1) the paper had to be about language/NLP (I’m not including multimodal papers because that’s an area I’m not super familiar with) and 2) I cannot include my own first-author papers in this list.

There’s also a bonus section at the end with a somewhat curated list of noteworthy papers which folks might find interesting.

Best Language Papers of 2022

This list is not ordered.

I present a short statement and justification for each paper.

1) Training Compute-Optimal Large Language Models (“Chinchilla”)

This paper changed the way the field thinks about pretraining and data.

2) Chain of Thought Prompting Elicits Reasoning in Large Language Models

Staple of all LLM research and papers. Enough said.

3) Training Language Models to Follow Instructions with Human Feedback (“InstructGPT”)

Major impact via ChatGPT and popularized RLHF.

4) PaLM: Scaling Language Modeling with Pathways*

Current SOTA LLM for academic benchmarks and enabler of many breakthroughs such as Flan-PaLM, PaLM-SayCan, and Med-PaLM.

5) Constitutional AI: Harmlessness from AI Feedback

An idea that is already very impactful for LLM safety and also has applications beyond safety.

6) ST-MoE: Designing Stable and Transferable Sparse Expert Models

Best sparse model paper. Meticulous experiments, comprehensive evals, and the go-to reference for all sparse modeling.

7) Solving Quantitative Reasoning Problems with Language Models (“Minerva”)

Breakthrough results on math.

8) Scaling Instruction-Finetuned Language Models (“Flan2”)*

Major instruction tuning work. Best open-source models (Flan-T5).

9) Competition-Level Code Generation with AlphaCode

Breakthrough results on code.

10) LaMDA: Language Models for Dialog Applications

Breakthrough on dialog, especially on factuality and safety.

11) Emergent Abilities of Large Language Models*

Emergence in LLMs is an impactful way to reason about scaling up LLMs and their capabilities. Incredible thought leadership.

12) Multitask Prompted Training Enables Zero-Shot Task Generalization (“T0”)

Great instruction tuning paper. While T0 is no longer SOTA, this was likely the best paper from BigScience.

Honourable Mentions

1) Galactica: A Large Language Model for Science

Well-executed paper with some good ideas.

2) Beyond the Imitation Game: Quantifying and Extrapolating the Capabilities of Language Models (“BIG-Bench”)

Highly impactful benchmark.

Commentary & Thoughts:

Overall, I think this list is pretty self-explanatory and I personally think many would agree with these choices.

I also think many papers on this list are the usual suspects; e.g., CoT, Chinchilla, InstructGPT, Flan, PaLM, and AlphaCode should be undeniable entries deserving of “game-changer” status. These papers have made a tremendous impact on the field.

I personally really liked Galactica’s research paper for its execution and ideas. I also think that ST-MoE deserves more attention in general. The Constitutional AI paper from Anthropic is also one of my favourite ideas from this year.

Interesting/Good Papers of 2022

Here’s a list of 23 papers that I or people I’ve talked to really liked. Again, this list is not ordered. It is relatively “crowd-sourced” and might miss out on some interesting/good papers. Feel free to comment or email me if you think I missed something. Here are our picks!

1) What learning algorithm is in-context learning? Investigations with linear models

2) Transformers learn in-context by gradient descent

3) Diffusion-LM Improves Controllable Text Generation

4) SELF-INSTRUCT: Aligning Language Model with Self Generated Instructions

5) Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints*

6) Efficiently Modeling Long Sequences with Structured State Spaces

7) TALM: Tool Augmented Language Models

8) Efficiently Scaling Transformer Inference

9) Co-Writing Screenplays and Theatre Scripts with Language Models: An Evaluation by Industry Professionals

10) Large Language Models Encode Clinical Knowledge

11) Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?

12) Branch-Train-Merge: Embarrassingly Parallel Training of Expert Language Models

13) Deduplicating Training Data Mitigates Privacy Risks in Language Models

14) OPT: Open Pre-trained Transformer Language Models

15) GLaM: Efficient Scaling of Language Models with Mixture-of-Experts

16) Cramming: Training a Language Model on a Single GPU in One Day

17) Large Language Models Can Self-Improve

18) Self-Consistency Improves Chain-of-Thought Reasoning in Language Models

19) Recitation-Augmented Language Models*

20) Language Modeling with Pixels

21) HELM: Holistic Evaluation of Language Models

22) Data Distributional Properties Drive Emergent In-Context Learning in Transformers

23) Human-level play in the game of Diplomacy by combining language models with strategic reasoning

*Change log: 5th Jan - Updated the list with “What learning algorithm is in-context learning? Investigations with linear models”

My own papers that I really like but can’t put on this list.

Just to put it somewhere!

1) UL2: Unifying Language Learning Paradigms

2) Transcending Scaling Laws with 0.1% Extra Compute

3) Transformer Memory as a Differentiable Search Index (“DSI”)

These are likely what I’d judge to be my “best work” for this year. Some of my collaborators feel they deserve to be on the list “somewhere”, but they might just be trying to make me happy. Lol. Anyways, please do check them out too if you haven’t already!

(*) denotes that I’m a co-author on the paper. The initial rule was to only include papers that I’m not a co-author of, but this would pretty objectively ruin the list (removing PaLM, Flan-2, and Emergent Abilities). So we softened the rule to only disallow first-author papers.

With inputs & contributions from:

  1. Jason Wei (Google Brain)

  2. Hyung Won Chung (Google Brain)

  3. Aston Zhang (AWS AI)

  4. Vinh Q. Tran (Google)

  5. Sanket Mehta (CMU)

  6. Dani Yogatama (USC/Reka)

  7. Jing Yu Koh (CMU)

  8. Siamak Shakeri (Google)

  9. Mostafa Dehghani (Google Brain)


Once again, I hope everyone had a great 2022, and I’m looking forward to 2023!

If you enjoyed the post, feel free to follow me on Twitter at @YiTayML for more awesome content.

Yi Tay

Chief Scientist & Cofounder at Reka. Formerly Senior Research Scientist at Google Brain.

https://yitay.net