ATCS – Selected Topics in Learning, Prediction, and Optimization (with applications in Finance)

2021 Spring

Lecturer: Jian Li ( lapordge at gmail dot com)

TA: Yitong Duan, Liang Zeng

Time: every Monday, 9:50am-12:15pm

Room: 4303, Teaching Building 4 (四教)


We intend to cover a subset of the following topics (tentative):

(1) I assume you already know the basics (convex optimization and machine learning: stochastic gradient descent, gradient boosting, deep learning basics, CNNs, RNNs; please see my undergrad course). If you don't know much machine learning (e.g., you do not yet know how to derive the dual of the SVM), please do NOT take this course. I will briefly recall some concepts when necessary.

(2) online learning/optimization, multi-armed bandit, statistical learning theory, theory of deep learning

I won't strictly follow the above order. I may skip some topics mentioned above and cover some topics not mentioned above. It is a graduate course.

I will talk about several applications of ML and optimization in finance (trading, pricing derivatives, etc.), and of course in typical CS areas such as vision, NLP, and social networks as well.

I will teach about 2/3 of the classes. For the rest, I will choose some topics and students will give class presentations.

Tentative topics for class presentations: generative models (GANs), adversarial learning and robustness, unsupervised learning (co-training, pseudo-labeling, contrastive learning), meta-learning, AutoML, and various financial applications.

Some knowledge of convex optimization may be useful. See this course (by S. Boyd) and a previous course of mine, but it is fine if you did not take those courses. Basic machine learning knowledge is a must.
See also: Andrew Ng's undergrad lecture notes

The course is a blend of theory and practice. We will cover both the underlying mathematics and interesting heuristics.

 


Grading:

  1. Homework (35 pts; one homework every two or three weeks, including 2 coding assignments this semester)
  2. 5 pts for taking notes: Each student should take notes for at least one lecture (maybe two), using LaTeX (use this template: sample.tex, algorithm2e.sty).
  3. 5 pts for the student presentation: The last 3-4 classes will be student presentations. Each student needs to present for around 30 minutes. I will provide some candidate topics/papers.
  4. Course project (55 pts: 10 pts for the mid-term report, 10 pts for the final presentation, 35 pts for the final report)
  5. No exam. 

 


Schedule:

 

1.Feb 22 Basics of Convex optimization
Strong convexity, smoothness
Convergence analysis of Gradient Descent
Basics of Online learning, regret
Online gradient descent (a minimal code sketch follows the readings below)
  1. [Book] Convex Optimization, Chapter 9.1-9.3
  2. Online convex programming and generalized infinitesimal gradient ascent. M. Zinkevich
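A minimal sketch (in Python, the course's default language) of projected online gradient descent as in the Zinkevich paper above, assuming the feasible set is a Euclidean ball of radius R and the standard step size eta_t = R/(G*sqrt(t)); the function name, the quadratic toy losses, and the constants R and G are illustrative choices, not taken from the references.

    import numpy as np

    def online_gradient_descent(grad_fns, dim, R=1.0, G=1.0):
        """Projected OGD over the Euclidean ball of radius R.

        grad_fns[t](x) returns the (sub)gradient of the loss revealed at round t;
        with eta_t = R/(G*sqrt(t)) the regret against any fixed point is O(R*G*sqrt(T))."""
        x = np.zeros(dim)
        iterates = []
        for t, grad in enumerate(grad_fns, start=1):
            iterates.append(x.copy())
            eta = R / (G * np.sqrt(t))      # standard decreasing step size
            x = x - eta * grad(x)           # gradient step
            norm = np.linalg.norm(x)
            if norm > R:                    # Euclidean projection back onto the ball
                x *= R / norm
        return iterates

    # toy usage: quadratic losses f_t(x) = ||x - z_t||^2 with random targets z_t
    rng = np.random.default_rng(0)
    targets = [rng.uniform(-1, 1, size=2) for _ in range(100)]
    grads = [lambda x, z=z: 2.0 * (x - z) for z in targets]
    print(online_gradient_descent(grads, dim=2, R=2.0, G=7.0)[-1])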

 

2.Mar 1  

The expert problem
Hedge Algorithm (or Multiplicative Weights method; a minimal code sketch follows the readings below)
Second order method: Squint
Small loss bound, Quantile bound, Stochastic Setting
Basics of stock and futures markets (part 1)

  1. Multiplicative weights method: a meta-algorithm and its applications (a survey), by Sanjeev Arora, Elad Hazan, and Satyen Kale. [pdf] The method can be used to approximately solve zero-sum games and linear programs, and is also closely related to AdaBoost.
  2. Our exposition of the expert problem follows this note
  3. The Squint paper by Koolen and Van Erven
  4. Our exposition follows this note
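A minimal sketch of the Hedge (multiplicative weights) update for the expert problem discussed above, assuming losses in [0, 1] and a learning rate of the usual order sqrt(ln N / T); the synthetic loss matrix is only for illustration.

    import numpy as np

    def hedge(loss_matrix, eta):
        """Hedge / multiplicative weights over N experts.

        loss_matrix[t, i] in [0, 1] is expert i's loss at round t; the algorithm
        plays the normalized weights and multiplies each weight by exp(-eta * loss)."""
        T, N = loss_matrix.shape
        w = np.ones(N)
        plays = []
        for t in range(T):
            p = w / w.sum()                        # distribution over experts
            plays.append(p)
            w *= np.exp(-eta * loss_matrix[t])     # exponential down-weighting
        return np.array(plays)

    # toy usage: expert 0 is slightly better on average
    rng = np.random.default_rng(1)
    losses = rng.uniform(0, 1, size=(500, 5))
    losses[:, 0] *= 0.8
    plays = hedge(losses, eta=np.sqrt(np.log(5) / 500))
    print(plays[-1])   # most of the mass should end up on expert 0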
 
3.Mar 8 Some concepts in convex optimization: Fenchel conjugate, Bregman divergence
Mirror Descent
Online Mirror Descent (a minimal code sketch follows the readings below)
Basics of stock and futures markets (part 2)
  1. Online mirror descent (see this note or this (Chapter 5))
  2. A proof of Pinsker's inequality (see this book) (note that this is equivalent to saying that the negative entropy is strongly convex w.r.t. the L1 norm)
  3. Options, Futures and Other Derivatives Chapter 3.
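A minimal sketch of online mirror descent with the negative-entropy mirror map over the probability simplex, which reduces to the exponentiated-gradient (multiplicative) update; the linear toy losses and the step size are illustrative choices.

    import numpy as np

    def omd_simplex(grad_fns, dim, eta=0.1):
        """OMD over the simplex with the negative-entropy regularizer:
        x_{t+1}(i) proportional to x_t(i) * exp(-eta * g_t(i))."""
        x = np.full(dim, 1.0 / dim)         # start at the uniform distribution
        iterates = []
        for grad in grad_fns:
            iterates.append(x.copy())
            x = x * np.exp(-eta * grad(x))  # mirror (multiplicative) step
            x /= x.sum()                    # Bregman projection back onto the simplex
        return iterates

    # toy usage: linear losses f_t(x) = <c_t, x> with random cost vectors c_t
    rng = np.random.default_rng(2)
    costs = [rng.uniform(0, 1, size=4) for _ in range(200)]
    grads = [lambda x, c=c: c for c in costs]
    print(omd_simplex(grads, dim=4, eta=0.05)[-1])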
4.Mar 15  Online Mirror Descent, Follow the Regularized Leader
Adversarial Bandit (EXP3)
Upper Confidence Bound (UCB; a minimal code sketch follows the readings below)
 
  1. Finite-time Analysis of the Multiarmed Bandit Problem

  2. The non-stochastic multi-armed bandit problem

  3. Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems

Selected Reading: OMD (this survey, Chapter 5)
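A minimal sketch of the UCB1 index policy from the Auer, Cesa-Bianchi, and Fischer paper above, assuming Bernoulli arms; the arm means and horizon are made-up toy values.

    import numpy as np

    def ucb1(arm_means, horizon, rng=None):
        """UCB1 on Bernoulli arms: play each arm once, then pull the arm maximizing
        (empirical mean) + sqrt(2 * ln t / n_i)."""
        rng = rng or np.random.default_rng()
        K = len(arm_means)
        counts = np.zeros(K)
        sums = np.zeros(K)
        for t in range(1, horizon + 1):
            if t <= K:
                arm = t - 1                                      # initial round-robin
            else:
                ucb = sums / counts + np.sqrt(2 * np.log(t) / counts)
                arm = int(np.argmax(ucb))
            reward = float(rng.random() < arm_means[arm])        # Bernoulli reward
            counts[arm] += 1
            sums[arm] += reward
        return counts

    print(ucb1([0.3, 0.5, 0.7], horizon=10_000))   # most pulls should go to the 0.7 arm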
5.Mar 22 Conjugate Prior
Thompson Sampling (a minimal code sketch follows the readings below)
Median Elimination
Contextual Bandit (EXP4)
Epoch-Greedy (Langford-Zhang)
 
  1. A Tutorial on Thompson Sampling

  2. The Epoch-Greedy Algorithm for Contextual Multi-armed Bandits

  3. Taming the monster: A fast and simple algorithm for contextual bandits
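A minimal sketch of Thompson sampling for Bernoulli bandits with the conjugate Beta prior, matching the "Conjugate Prior" and "Thompson Sampling" topics above; the arm means are illustrative.

    import numpy as np

    def thompson_bernoulli(arm_means, horizon, rng=None):
        """Beta-Bernoulli Thompson sampling: keep a Beta(a_i, b_i) posterior per arm,
        sample a mean from each posterior, and pull the argmax."""
        rng = rng or np.random.default_rng()
        K = len(arm_means)
        a = np.ones(K)   # prior Beta(1, 1); a counts successes + 1
        b = np.ones(K)   # b counts failures + 1
        for _ in range(horizon):
            theta = rng.beta(a, b)                        # one posterior sample per arm
            arm = int(np.argmax(theta))
            reward = float(rng.random() < arm_means[arm])
            a[arm] += reward                              # conjugate posterior update
            b[arm] += 1.0 - reward
        return a / (a + b)                                # posterior means per arm

    print(thompson_bernoulli([0.3, 0.5, 0.7], horizon=10_000))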

6.Mar 29 Gaussian process

Bayesian optimization (a minimal GP-UCB code sketch follows the readings below)

A sketch of the regret analysis of GP-UCB

Brownian Motion, Ito integral

  1. Practical Bayesian Optimization of Machine Learning Algorithms
  2. Gaussian process optimization in the bandit setting: No regret and experimental design

Selected reading:

[Book] Gaussian Processes for Machine Learning
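A minimal sketch of Bayesian optimization with a GP-UCB acquisition rule, assuming a zero-mean GP with an RBF kernel and a 1-D objective evaluated on a grid; the kernel hyperparameters, the beta value, and the toy objective are illustrative, and a practical implementation would normally rely on an existing GP library.

    import numpy as np

    def rbf_kernel(a, b, lengthscale=0.2):
        """k(x, y) = exp(-(x - y)^2 / (2 * lengthscale^2)) for 1-D inputs."""
        return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / lengthscale**2)

    def gp_posterior(X, y, Xstar, noise=1e-4):
        """Posterior mean and std of a zero-mean GP at the test points Xstar."""
        K = rbf_kernel(X, X) + noise * np.eye(len(X))
        Ks = rbf_kernel(X, Xstar)
        mean = Ks.T @ np.linalg.solve(K, y)
        cov = rbf_kernel(Xstar, Xstar) - Ks.T @ np.linalg.solve(K, Ks)
        return mean, np.sqrt(np.clip(np.diag(cov), 0.0, None))

    def gp_ucb(objective, grid, n_iters=20, beta=2.0, rng=None):
        """Repeatedly evaluate the grid point maximizing mean + sqrt(beta) * std."""
        rng = rng or np.random.default_rng()
        X = [grid[rng.integers(len(grid))]]
        y = [objective(X[0])]
        for _ in range(n_iters):
            mean, std = gp_posterior(np.array(X), np.array(y), grid)
            x_next = grid[int(np.argmax(mean + np.sqrt(beta) * std))]
            X.append(x_next)
            y.append(objective(x_next))
        best = int(np.argmax(y))
        return X[best], y[best]

    # toy objective with a maximum near x = 0.7
    f = lambda x: -(x - 0.7) ** 2 + 0.05 * np.sin(20 * x)
    print(gp_ucb(f, np.linspace(0, 1, 200)))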

7.Apr 5 Holiday, no class  
8.Apr 12 Ito's calculus

Introduction to options

Binomial tree, delta hedging (a minimal binomial-tree code sketch follows the readings below)

 

Corresponding chapters in [Book] Options, Futures and Other Derivatives

Optional reading:

A rigorous and thorough treatment of SDEs (including the construction of Brownian motion):

[Book] Stochastic Calculus, Filtering, and Stochastic Control
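A minimal sketch of European call pricing on a Cox-Ross-Rubinstein binomial tree with backward induction, in the spirit of the binomial-tree chapters of Hull's book above; the parameter values are illustrative.

    import numpy as np

    def crr_binomial_call(S0, K, r, sigma, T, n_steps=200):
        """European call on a recombining CRR tree: u = exp(sigma*sqrt(dt)), d = 1/u,
        risk-neutral up-probability p = (exp(r*dt) - d) / (u - d)."""
        dt = T / n_steps
        u = np.exp(sigma * np.sqrt(dt))
        d = 1.0 / u
        p = (np.exp(r * dt) - d) / (u - d)
        disc = np.exp(-r * dt)
        # terminal stock prices (indexed by the number of up moves) and payoffs
        j = np.arange(n_steps + 1)
        values = np.maximum(S0 * u**j * d**(n_steps - j) - K, 0.0)
        # backward induction: discounted risk-neutral expectation at each node
        for _ in range(n_steps):
            values = disc * (p * values[1:] + (1 - p) * values[:-1])
        return values[0]

    print(crr_binomial_call(S0=100, K=100, r=0.05, sigma=0.2, T=1.0))  # close to 10.45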

 

9.Apr 19 Black-Scholes-Merton (BSM) formula, BS formula for the European call (a minimal code sketch follows the readings below)

Risk-neutral valuation

Volatility Smile

Feynman-Kac, Fokker-Planck, Langevin dynamics

Corresponding chapters in [Book] Options, Futures and Other Derivatives

SDE and PDE: [Book] Arbitrage Theory in Continuous Time, Chapter 5

Optional reading: Fokker-Planck equation
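A minimal sketch of the Black-Scholes formula for a European call, C = S0*N(d1) - K*exp(-r*T)*N(d2) with d1 = (ln(S0/K) + (r + sigma^2/2)T) / (sigma*sqrt(T)) and d2 = d1 - sigma*sqrt(T); it uses scipy.stats.norm, and the parameter values match the binomial-tree example above, so the two prices should nearly agree.

    import numpy as np
    from scipy.stats import norm

    def bs_call(S0, K, r, sigma, T):
        """Black-Scholes price of a European call option."""
        d1 = (np.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
        d2 = d1 - sigma * np.sqrt(T)
        return S0 * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)

    print(bs_call(S0=100, K=100, r=0.05, sigma=0.2, T=1.0))  # about 10.45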

10.Apr 26 Maxima of Gaussian Process.

Subgaussian process

Dudley's theorem (chaining)

Symmetrization, Rademacher complexity

[Book] Probability in High Dimension, Chapter 5

[Book] Foundations of Machine Learning, Chapter 3

Advanced Reading: Talagrand's generic chaining, [Book] Probability in High Dimension, Chapter 6

11.May 6 VC-dimension, Pseudo-dimension, Fat-Shattering dimension

Algorithmic stability, generalization of GD (convex and strongly convex cases)

[Book] Probability in High Dimension, Chapter 5

Train faster, generalize better: Stability of stochastic gradient descent

12.May 10 PAC-Bayesian framework

generalization of SGLD

Generalization bounds based on the PAC-Bayesian framework

 

  1. Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data
  2. Generalization Bounds of SGLD for Non-convex Learning
  3. On Generalization Error Bounds of Noisy Gradient Methods for Non-Convex Learning
13.May 17 Learnability of convex functions.

A brief survey of convex optimization (in particular, the finite-sum form)

RKHS

Convex parts can be found in [Book] Understanding Machine Learning: From Theory to Algorithms 

A Primer on Reproducing Kernel Hilbert Spaces

14.May 24 RKHS, Universal Kernel

Maximum Mean Discrepancy (MMD; a minimal code sketch follows the readings below)

Equivalence between KRR and Gaussian process regression

Integral operator

Local Rademacher complexity

 

  1. A Primer on Reproducing Kernel Hilbert Spaces

  2. Universality, characteristic kernels and RKHS embedding of measures

  3. A Kernel Two-Sample Test

  4. Rademacher and Gaussian complexities: Risk bounds and structural results

  5. On Learning with Integral Operators

  6. Learning Kernels Using Local Rademacher Complexity

  7. To understand deep learning we need to understand kernel learning
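A minimal sketch of the (biased) empirical MMD^2 statistic with an RBF kernel, in the spirit of the kernel two-sample test paper above; the bandwidth and the synthetic Gaussian samples are illustrative choices.

    import numpy as np

    def rbf_gram(X, Y, bandwidth=1.0):
        """Gram matrix of k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
        d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
        return np.exp(-d2 / (2 * bandwidth**2))

    def mmd2_biased(X, Y, bandwidth=1.0):
        """Biased estimate of MMD^2 = E k(x,x') - 2 E k(x,y) + E k(y,y')."""
        return (rbf_gram(X, X, bandwidth).mean()
                - 2 * rbf_gram(X, Y, bandwidth).mean()
                + rbf_gram(Y, Y, bandwidth).mean())

    # same distribution vs. a shifted distribution: the second value should be larger
    rng = np.random.default_rng(0)
    X = rng.normal(0.0, 1.0, size=(500, 2))
    Y = rng.normal(0.5, 1.0, size=(500, 2))
    print(mmd2_biased(X[:250], X[250:]), mmd2_biased(X, Y))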

 

 
15.May 31 Margin-related bounds

Semi-supervised learning

 

 

 


References:

[Book] Introduction to Online Convex Optimization

[Book] Prediction, Learning, and Games

[Book] Options, Futures and Other Derivatives  

[Book] Advances in Financial Machine Learning

[Book] Convex Optimization

[Book] Foundations of Machine Learning

[Book] Probability in High Dimension

[Book] Understanding Machine Learning: From Theory to Algorithms

Lecture notes for STAT928: Statistical Learning and Sequential Prediction 


Python is the default programming language we will use in the course.

If you haven't used it before, don't worry. It is very easy to learn (if you know any other programming language) and is a very efficient language, especially for prototyping things related to scientific computing and numerical optimization. Python code is usually much shorter than C/C++ code (the language does a lot for you). It is also more flexible and generally faster to work with than MATLAB.

A standard combination for this class is Python + numpy (a numerics library for Python) + scipy (a scientific computing library for Python) + matplotlib (for generating nice plots).
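A tiny end-to-end example of that combination: generate synthetic data with numpy, fit a least-squares line with scipy.stats.linregress, and plot the result with matplotlib; the data and model are purely illustrative.

    import numpy as np
    from scipy import stats
    import matplotlib.pyplot as plt

    # synthetic data: y = 2x + 1 plus Gaussian noise
    rng = np.random.default_rng(0)
    x = np.linspace(0, 1, 50)
    y = 2 * x + 1 + 0.1 * rng.standard_normal(50)

    # least-squares line fit
    fit = stats.linregress(x, y)

    plt.scatter(x, y, s=10, label="data")
    plt.plot(x, fit.slope * x + fit.intercept, color="red",
             label=f"fit: {fit.slope:.2f}x + {fit.intercept:.2f}")
    plt.legend()
    plt.show()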

Another, somewhat easier, way is to install Anaconda (a free Python distribution that comes with most of the popular packages).