1. Deep Learning Optimization Paper Presentations 00:00
2. [Paper: An Investigation into Neural Net Optimization via Hessian Eigenvalue Den... 05:02
3. Overview 05:40
4. Basic Definitions 07:15
5. Hessian Computation in Deep Networks 08:40
6. Estimating the Smoothed Density 11:23
7. Algorithm Sketch 13:42
8. Accuracy 14:51
9. Let's Train a ResNet-32 16:17
10. Experiments: Initialization 16:46
11. Experiments: Further Training 18:23
12. Experiments: Reducing Learning Rate 18:55
13. Experiments: End of the Training 19:54
14. Examining the Role of Architecture 20:25
15. Experiments: Batch-Normalization 20:54
16. BN with Population Statistics 21:57
17. Q/A 22:57
18. [Paper: Differentiable Linearized ADMM by Guangcan Liu] 26:45
19. Background 27:02
20. Learning-based Optimization 27:38
21. D-LADMM: Differentiable Linearized ADMM 29:10
22. Main Assumption 30:09
23. Theoretical Result I 30:28
24. Training Approaches 30:58
25. Experiments 31:20
26. [Paper: Adaptive Stochastic Natural Gradient Method for One-Shot Neural Architec... 31:49
27. Neural Architecture 32:09
28. One-Shot Neural Architecture Search 32:59
29. Difficulties for Practitioners 34:11
30. Contributions 35:50
31. Results and Details 36:54
32. [Paper: A Quantitative Analysis of the Effect of Batch Normalization on Gradient... 37:21
33. Batch Normalization 37:26
34. Batch Normalization on Ordinary Least Squares 39:30
35. Summary of Theoretical Results 41:20
36. [Paper: The Effect of Network Width on Stochastic Gradient Descent and Generaliz... 42:38
37. Motivation 43:13
38. Main Result 43:39
39. The Normalized Noise Scale 44:02
40. Rule for Hyperparameter Selection 45:17
41. Wider Networks Require Smaller Batch Sizes 45:45
42. Bigger Networks Perform Better Due to Noise Resistance 46:14
43. [Paper: AdaGrad Stepsizes: Sharp Convergence Over Nonconvex Landscapes by Xiaoxia... 48:25
44. Motivation 49:24
45. Theory 53:33
46. Practice 56:43
47. Practice: Synthetic Data with Linear Regression 58:18
48. Practice: ResNet-50 on ImageNet 1:00:43
49. Conclusion 1:01:30
50. Q/A 1:02:13
51. [Paper: Beyond Backprop: Online Alternating Minimization with Auxiliary Variable... 1:06:25
52. What's Wrong With Backprop? 1:07:29
53. Alternatives: Prior Work 1:10:27
54. Our Approach 1:11:25
55. Online Alternating Minimization 1:11:46
56. Fully Connected Nets 1:11:56
57. Faster Initial Learning - Potential Use as a Good Init? 1:12:11
58. Summary: Contributions 1:12:24
59. [Paper: SWALP: Stochastic Weight Averaging in Low-Precision Training by Tianyi Zh... 1:12:45
60. Low-precision Computation 1:13:04
61. Problem Statement 1:13:15
62. Low-precision Training 1:13:44
63. Low-precision SGD 1:13:58
64. Weight Averaging 1:14:17
65. SWALP 1:14:38
66. Convergence Analysis 1:15:22
67. Experiments 1:16:41
68. Poster @ Pacific Ballroom #58 1:17:28
69. [Paper: Efficient Optimization of Loops and Limits with Randomized Telescoping S... 1:17:58
70. Motivation 1:18:12
71. Randomized Telescopes: Unbiased Estimation of Limits 1:19:41
72. Demonstration 1:20:35
73. Visit the Poster for... 1:21:54
74. [Paper: Self-similar Epochs: Value in Arrangement by Eliav Buchnik] 1:22:59
75. Arrangement Methods of Training Examples for Stochastic Gradient Descent (SGD) 1:23:23
76. Test Case: Matrix Factorization 1:24:19
77. Motivation: Identical Rows 1:25:21
78. Self-similar Epochs 1:26:25
79. Properties of Our Arrangement Method 1:27:00
80. Thank You! 1:27:40