Course Description

The application of neural network methods, under the name Deep Learning, has led to breakthroughs in a wide range of fields, including language technologies (e.g. search, translation, text input prediction). This course provides a hands-on introduction to the use of deep learning methods for processing natural language. Methods covered include static word embeddings, feed-forward networks for text, recurrent neural networks, transformers, and pre-training and transfer learning, with applications including sentiment analysis, translation, generation, and testing linguistic theory.

Days                  Time             Location
Monday and Wednesday  10:25-11:40 AM   Hylan 307

Teaching Staff

Role        Name         Office          Office Hours
Instructor  C.M. Downey  Lattimore 507   Wednesdays 2-4pm

Recommended Textbooks

While relevant readings are posted in the schedule below, the following are very good general resources. The bracketed names are used to refer to these works in the schedule.

  • [JM] Dan Jurafsky and James H. Martin, Speech and Language Processing (3rd edition draft)
  • [YG] Yoav Goldberg, Neural Network Methods for Natural Language Processing

Prerequisites

  • Programming in Python
  • Linux/Unix commands
  • Calculus 1

Course Resources

  • More information coming soon

Policies

Homework

Students will complete 8 homeworks, comprising both written and (Python) programming assignments. Unless noted otherwise on the schedule, homeworks will be released on Wednesdays and due at 11pm on the following Wednesday. All homework will be submitted via Blackboard.

All deadlines and meeting times for this class are in Eastern Time. Please note that on Sunday, November 3, this will change from Eastern Daylight Time (EDT/UTC-4) to Eastern Standard Time (EST/UTC-5).
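For concreteness, here is a minimal Python sketch (standard library only; the year 2024 is inferred from the November 3 time change) showing how the same 11pm Eastern deadline corresponds to different UTC offsets before and after the switch:

    from datetime import datetime
    from zoneinfo import ZoneInfo  # Python 3.9+

    EASTERN = ZoneInfo("America/New_York")

    # An 11pm deadline before the switch falls in EDT (UTC-4)...
    before = datetime(2024, 10, 30, 23, 0, tzinfo=EASTERN)
    # ...while the same wall-clock deadline after the switch falls in EST (UTC-5).
    after = datetime(2024, 11, 6, 23, 0, tzinfo=EASTERN)

    print(before.isoformat())  # 2024-10-30T23:00:00-04:00
    print(after.isoformat())   # 2024-11-06T23:00:00-05:00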

Late work

All work should be submitted by 11:00pm the day it is due. Work that is received late will incur the following penalties:

  • Up to 1 hour late: 5%
  • Up to 24 hours late: 10%
  • Up to 48 hours late: 20%
  • Later than 48 hours: not graded (0 for the assignment)
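To illustrate, the schedule above can be read as a piecewise function of hours late. This is only a sketch, not an official grade calculator; the function name and the reading of each tier as inclusive of its upper bound are my own:

    def late_penalty(hours_late: float) -> float:
        """Return the fractional deduction for a late submission.

        Illustrative sketch of the schedule above; each tier is read as
        inclusive of its upper bound ("up to" N hours).
        """
        if hours_late <= 0:
            return 0.0   # on time
        elif hours_late <= 1:
            return 0.05  # up to 1 hour late: 5%
        elif hours_late <= 24:
            return 0.10  # up to 24 hours late: 10%
        elif hours_late <= 48:
            return 0.20  # up to 48 hours late: 20%
        else:
            return 1.0   # later than 48 hours: not graded (0)

    # Example: a submission 3 hours late loses 10% of its score.
    assert late_penalty(3) == 0.10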

Extensions (without penalty) may be offered if they are requested within a reasonable amount of time (relative to the reason for the extension) before the work is due. Please don't hesitate to ask for an extension if you need one.

Special topic presentations

The latter portion of the course will focus on examples of Deep Learning applied to linguistics and linguistic theory. Students will pick a scholarly paper featuring such an application and present the work in class, including leading a discussion. Depending on course enrollment, this may be completed individually or in small groups.

Final grading

  • 80%: Homework assignments
  • 15%: Special topic presentation / discussion
  • 5%: Participation / attendance
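To make the weighting concrete, here is a minimal sketch with hypothetical component scores (the numbers are invented for illustration; each component is on a 0-100 scale):

    # Final-grade weights from the breakdown above.
    WEIGHTS = {"homework": 0.80, "presentation": 0.15, "participation": 0.05}

    # Hypothetical component averages (illustration only).
    scores = {"homework": 90.0, "presentation": 85.0, "participation": 100.0}

    final = sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)
    print(final)  # 0.80*90 + 0.15*85 + 0.05*100 = 89.75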

Exceptions

Students will not be penalized for important civic, ethnic, family, or religious obligations, or university service. Whenever feasible, you will have the chance to make up, within a reasonable time, any assignment missed for these reasons. Absences for these reasons will count as excused for the sake of the participation grade. It is your responsibility, however, to inform me of any expected missed work as far in advance as possible.

Academic honesty

All assignments and activities associated with this course must be performed in accordance with the University of Rochester's Academic Honesty Policy. More information is available here. Please note: the use of Generative AI to produce any part of the written or programming assignments is not allowed. Given the subject matter of the course, I will make one exception: if you implement and train the model yourself (i.e. no use of pre-trained weights or API calls to pre-existing models), you may use it, provided you turn in the implementation with the assignment you used it on. For the sake of your time, I do not recommend this option.

Schedule


Aug 26   Introduction / Overview; History
Aug 28   Linear Algebra
         Readings: Essence of Linear Algebra, Ch. 1-8
Sep 2    Labor Day: no class
Sep 4    Word vectors; Gradient descent
         Readings: JM 5.4-5.6, 6; YG 2
         Events: hw1 released [pdf, tex], due Sep 11
Sep 9    Word2Vec
         Readings: JM 6.8-6.12
Sep 11   Computation graphs; Backpropagation
         Readings: JM 7.5.3-7.5.5; YG 5.1.1-5.1.2; Calculus on computational graphs; CS 231n notes; Yes, you should understand backprop
Sep 16   GitHub Classroom and Codespaces (Demo)
         Events: hw2 released [pdf, tex], due Sep 23
Sep 18   Neural Networks (edugrad library)
         Readings: JM 7.1-7.4; YG 4
Sep 23   Feed-forward networks for LM and classification
         Readings: JM 7.5; YG 9; A Neural Probabilistic Language Model (Bengio et al. 2003); Deep Unordered Composition Rivals Syntactic Methods for Text Classification (Iyyer et al. 2015)
         Events: hw3 released [pdf, tex], due Sep 30
Sep 25   Recurrent Neural Networks
         Readings: JM 9.1-9.5; The Unreasonable Effectiveness of Recurrent Neural Networks
Sep 30   Vanishing gradients; RNN variants
         Readings: JM 9.6; YG 15; Understanding LSTMs; Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation; On the difficulty of training recurrent neural networks
         Events: hw4 released [pdf, tex], due Oct 7
Oct 2    Sequence-to-sequence; Attention
         Readings: JM 10; Sequence to Sequence Learning with Neural Networks (original seq2seq paper); Neural Machine Translation by Jointly Learning to Align and Translate (original seq2seq + attention paper)
Oct 7    Transformers 1
         Readings: JM 9.7-9.9; Attention Is All You Need (original Transformer paper); The Annotated Transformer; The Illustrated Transformer
Oct 9    Transformers 2
         Readings: same as Oct 7
Oct 14   Fall Break: no class
Oct 16   Pre-training / fine-tuning paradigm
         Readings: JM 11; Contextual Word Representations: Putting Words into Computers; The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning)
         Events: hw5 released [pdf, tex], due Oct 23
Oct 21   Pre-training / fine-tuning paradigm (cont.)
         Readings: same as Oct 16
Oct 23   Text tokenization in language models
Oct 28   Interpretability and analysis
         Readings: Analysis Methods in Natural Language Processing; A Primer in BERTology
Oct 30   Multilingual language models
         Readings: Cross-Lingual Language Model Pretraining
         Optional / peruse if interested: Are All Languages Created Equal in Multilingual BERT?; Emerging Cross-lingual Structure in Pretrained Language Models; On the Cross-lingual Transferability of Monolingual Representations; Word Translation Without Parallel Data; Bilingual alignment transfers to multilingual alignment for unsupervised parallel text mining
         Events: hw6 released [pdf, tex], due Nov 6
Nov 4    "Large Language Models" (LLMs)
Nov 6    Questions of LLM hype and dangers
         Events: hw7 released [pdf, tex], due Nov 13
Nov 11   Instructor at conference: no class
Nov 13   Instructor at conference: no class
Nov 18   Class cancelled (illness)
         Events: Paper presentation assigned; hw8 released [pdf, tex], due Nov 25
Nov 20   EMNLP Conference Highlights
Nov 25   TBA
Nov 27   Thanksgiving Break: no class
Dec 2    Presentations 1 (TBA)
Dec 4    Presentations 2 (TBA)
Dec 9    Overflow / Summary / Review