CHAPTERS
1. Live from ICML 2019 in Long Beach, this session on Deep Reinforcement Learning... (00:00)
2. [Talk: Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning] (04:05)
3. Intrinsic motivation for RL (05:56)
4. What else motivates people? (06:56)
5. Social learning is incredibly important (07:20)
6. Learn socially, train independently (07:45)
7. Influence via Modeling Other Agents (MOA) (09:08)
8. Marginalize out effect of agent A on agent (10:03)
9. Computing causal influence (10:32)
10. Relationship to Mutual Information (11:22)
11. Test in Sequential Social Dilemmas (SSDs) (12:38)
12. Cooperation is hard (13:29)
13. Results - Modeling Other Agents (14:25)
14. But wait, how does it work? (15:17)
15. A moment of high influence... (15:48)
16. Emergent communication via influence (16:35)
17. Training communication via influence (16:50)
18. Previous work in emergent communication (18:07)
19. Results - Communication analysis (18:31)
20. Results - Influence for communication (18:52)
21. Being influenced gets you higher reward (19:02)
22. Influence in human communication (19:43)
23. All results, top 5 settings, 5 random seeds each (20:17)
24. Discussion (20:35)
25. Conclusions (21:33)
26. Live from ICML 2019 in Long Beach, this session on Deep Reinforcement Learning... (25:05)
27. [Talk: Maximum Entropy-Regularized Multi-Goal Reinforcement Learning] (25:48)
28. Motivation (26:22)
29. Contributions (26:48)
30. A Model Multi-Goal RL Objective Based on Weighted Entropy (27:22)
31. A Safe Surrogate Objective (27:57)
32. Maximum Entropy-based Prioritization (MEP) (28:08)
33. Entropy of achieved goals versus training epoch (29:30)
34. Summary and Take-home Message (29:49)
35. [Talk: Imitating Latent Policies from Observation] (30:34)
36. Introduction (31:06)
37. Approach (32:18)
38. Experiments: Classic Control (34:37)
39. Experiments: CoinRun (35:05)
40. Thank You! (35:17)
41. [Talk: SOLAR: Deep Structured Representations for Model-Based Reinforcement Learning] (35:38)
42. Efficient reinforcement learning from images (36:04)
43. Preliminary: LQR-FLM (37:23)
44. Our method: SOLAR (38:30)
45. Real robot results (39:25)
46. Thank you (40:21)
47. [Talk: Dimension-Wise Importance Sampling Weight Clipping for Sample-Efficient Reinforcement Learning] (40:37)
48. Contributions (41:10)
49. Proximal Policy Optimization (PPO) (41:39)
50. The Vanishing Gradient Problem (41:57)
51. Dimension-Wise Clipping (42:26)
52. Off-Policy Generalization (43:02)
53. Evaluation (43:35)
54. Conclusion (44:18)
55. Thank you (44:35)
56. [Talk: Structured agents for physical construction] (46:22)
57. Humans are a "Construction Species" (46:27)
58. Contributions (47:13)
59. Construction Tasks (47:33)
60. Action Format: Absolute vs. Relative (50:16)
61. Background: Graph Networks (52:03)
62. Graph Network Agent (GN-DQN) (53:27)
63. Baseline Architectures (54:41)
64. Key Questions (55:00)
65. What is the contribution of relative vs. absolute actions? (55:48)
66. What is the contribution of structured representations? (56:33)
67. Silhouette (57:16)
68. Connecting (57:58)
69. Covering (58:51)
70. What is the contribution of planning? (59:35)
71. Covering Hard (1:01:43)
72. Additional Results: Generalization (1:02:24)
73. Key Questions (1:03:28)
74. Live from ICML 2019 in Long Beach... (1:07:11)
75. [Talk: Learning Novel Policies For Tasks] (1:07:46)
76. Motivation (1:07:55)
77. Key Aspects (1:08:28)
78. Method Overview (1:08:57)
79. Novelty Measurement (1:09:21)
80. Task-Novelty Bisector (TNB) (1:09:43)
81. Multiple Solutions (1:10:39)
82. Deceptive Reward Problems (1:11:01)
83. Poster: Pacific Ballroom #37 (1:11:38)
84. [Talk: Taming MAML: Efficient Unbiased Meta-Reinforcement Learning] (1:11:54)
85. Problematic Gradient Estimation in MAML (1:12:32)
86. Computationally Efficient Solution: TMAML (1:13:44)
87. TMAML reduces meta-gradient variance and improves performance (1:15:16)
88. TMAML outperforms existing methods on most meta-reinforcement learning tasks (1:15:53)
89. [Talk: Self-Supervised Exploration via Disagreement] (1:16:28)
90. Exploration - a major challenge! (1:16:49)
91. Sample Inefficient (1:17:14)
92. "Stuck" in Stochastic Envs (1:17:21)
93. Why inefficient? (1:17:28)
94. Environment is "black-box" (1:18:16)
95. Deterministic Environments (1:19:06)
96. Stochastic Environments: 3D Navigation (1:19:38)
97. Differentiable Exploration (1:20:11)
98. Summary: Exploration via Disagreement (1:20:54)
99. Code Available (1:21:10)
100. [Talk: PEARL: Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables] (1:21:32)
101. Meta-Reinforcement Learning (1:22:25)
102. Meta-RL Experimental Domains (1:23:08)
103. Disentangle task inference from control (1:24:27)
104. Off-Policy Meta-Training (1:25:05)
105. Efficient exploration by posterior sampling (1:25:35)
106. Posterior sampling in action (1:26:00)