Course catalog: Sample-Based Learning Methods
Course outline:

    Sample-Based Learning Methods

Welcome to the Course!
Welcome to the second course in the Reinforcement Learning Specialization:
Sample-Based Learning Methods, brought to you by the University of Alberta,
Onlea, and Coursera.
In this pre-course module, you'll be introduced to your instructors,
and get a flavour of what the course has in store for you.
Make sure to introduce yourself to your classmates in the "Meet and Greet" section!
Monte Carlo Methods for Prediction & Control
This week you will learn how to estimate value functions and optimal policies,
using only sampled experience from the environment.
This module represents our first step toward incremental learning methods
that learn from the agent’s own interaction with the world,
rather than a model of the world.
You will learn about on-policy and off-policy methods for prediction
and control, using Monte Carlo methods---methods that use sampled returns.
You will also revisit the exploration problem,
this time in the full RL setting rather than only in bandits.
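As a concrete taste of the kind of method covered here, below is a minimal first-visit Monte Carlo prediction sketch in Python. The env.reset()/env.step() interface and the policy callable are hypothetical stand-ins for illustration, not the course's actual environment API.

```python
from collections import defaultdict

def mc_prediction(env, policy, num_episodes, gamma=0.99):
    """First-visit Monte Carlo prediction: average sampled returns per state.

    Hypothetical interface: env.reset() -> state,
    env.step(action) -> (next_state, reward, done), policy(state) -> action.
    """
    values = defaultdict(float)        # running estimate of v_pi(s)
    visit_counts = defaultdict(int)    # number of first visits per state

    for _ in range(num_episodes):
        # Generate one full episode by following the fixed policy.
        episode = []
        state, done = env.reset(), False
        while not done:
            action = policy(state)
            next_state, reward, done = env.step(action)
            episode.append((state, reward))
            state = next_state

        # Walk backwards through the episode, accumulating the return G.
        G = 0.0
        for t in reversed(range(len(episode))):
            s, r = episode[t]
            G = gamma * G + r
            if s not in (e[0] for e in episode[:t]):   # first visit to s?
                visit_counts[s] += 1
                # Incremental average of the sampled returns observed for s.
                values[s] += (G - values[s]) / visit_counts[s]
    return values
```

Note that every update here waits for the end of an episode, since the return G is only known once the episode terminates; this is the limitation the next module's TD methods remove.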
Temporal Difference Learning Methods for Prediction
This week, you will learn about one of the most fundamental concepts in reinforcement learning:
temporal difference (TD) learning.
TD learning combines some of the features of both Monte Carlo and Dynamic Programming (DP) methods.
TD methods are similar to Monte Carlo methods in that they can learn from the agent’s interaction with the world,
and do not require knowledge of the model.
TD methods are similar to DP methods in that they bootstrap,
and thus can learn online---no waiting until the end of an episode.
You will see how TD can learn more efficiently than Monte Carlo, due to bootstrapping.
For this module, we first focus on TD for prediction, and discuss TD for control in the next module.
This week, you will implement TD to estimate the value function for a fixed policy, in a simulated domain.
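For comparison with the Monte Carlo sketch above, here is a rough sketch of tabular TD(0) prediction for a fixed policy. The environment and policy interfaces are the same hypothetical ones as before, and the step size alpha is an illustrative choice.

```python
from collections import defaultdict

def td0_prediction(env, policy, num_episodes, alpha=0.1, gamma=0.99):
    """Tabular TD(0) prediction for a fixed policy (hypothetical env/policy
    interface as in the earlier sketch); values are updated online,
    one step at a time, instead of at the end of each episode."""
    V = defaultdict(float)
    for _ in range(num_episodes):
        state, done = env.reset(), False
        while not done:
            action = policy(state)
            next_state, reward, done = env.step(action)
            # Bootstrapped target: one real reward plus the discounted
            # current estimate of the next state's value.
            target = reward + (0.0 if done else gamma * V[next_state])
            V[state] += alpha * (target - V[state])
            state = next_state
    return V
```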
Temporal Difference Learning Methods for Control
This week,
you will learn about using temporal difference learning for control,
as a generalized policy iteration strategy.
You will see three different algorithms based on bootstrapping and Bellman equations for control: Sarsa,
Q-learning and Expected Sarsa. You will see some of the differences between
the methods for on-policy and off-policy control, and that Expected Sarsa is a unified algorithm for both.
You will implement Expected Sarsa and Q-learning on Cliff World.
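To make the difference between the bootstrapped targets concrete, here is a hedged sketch of the Q-learning and Expected Sarsa update rules over a tabular action-value dictionary. The helper names and the (state, action) keying are illustrative assumptions, not the assignment's actual code; Q is assumed to be something like a defaultdict(float).

```python
import random

def epsilon_greedy(Q, state, actions, epsilon):
    """Behaviour policy shared by both algorithms."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_learning_update(Q, s, a, r, s_next, actions, alpha, gamma):
    # Off-policy target: bootstrap from the greedy (max) action value.
    target = r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def expected_sarsa_update(Q, s, a, r, s_next, actions, alpha, gamma, epsilon):
    # Target is the expectation over the epsilon-greedy policy's action
    # probabilities in the next state, rather than a single sampled action.
    greedy = max(actions, key=lambda b: Q[(s_next, b)])
    probs = {b: epsilon / len(actions) for b in actions}
    probs[greedy] += 1.0 - epsilon
    expected = sum(probs[b] * Q[(s_next, b)] for b in actions)
    target = r + gamma * expected
    Q[(s, a)] += alpha * (target - Q[(s, a)])
```

With a greedy target policy, the Expected Sarsa target reduces to the Q-learning target, which is one way to see Expected Sarsa as a unified algorithm for on-policy and off-policy control.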
Planning, Learning & Acting
Up until now,
you might think that learning with and without a model are two distinct,
and in some ways, competing strategies: planning with
Dynamic Programming versus sample-based learning via TD methods.
This week we unify these two strategies with the Dyna architecture.
You will learn how to estimate the model from data and then use this model
to generate hypothetical experience (a bit like dreaming)
to dramatically improve sample efficiency compared to sample-based methods like Q-learning.
In addition, you will learn how to design learning systems that are robust to inaccurate models.
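The sketch below illustrates the core Dyna-Q loop: act, learn directly from the real transition, update a simple table-lookup model, then replay a few model-generated transitions as planning updates. The environment interface and hyperparameters are hypothetical choices for illustration, and terminal-state handling in the model is deliberately simplified.

```python
import random
from collections import defaultdict

def dyna_q(env, actions, num_steps, n_planning=10,
           alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Dyna-Q sketch (hypothetical env interface, as above).

    Each real step is followed by n_planning updates on transitions
    replayed from a learned table-lookup model."""
    Q = defaultdict(float)
    model = {}  # (state, action) -> (reward, next_state)

    state = env.reset()
    for _ in range(num_steps):
        # Act in the real environment with an epsilon-greedy policy.
        if random.random() < epsilon:
            action = random.choice(actions)
        else:
            action = max(actions, key=lambda a: Q[(state, a)])
        next_state, reward, done = env.step(action)

        # Direct RL: a Q-learning update from the real transition.
        bootstrap = 0.0 if done else max(Q[(next_state, a)] for a in actions)
        Q[(state, action)] += alpha * (
            reward + gamma * bootstrap - Q[(state, action)])

        # Model learning: remember what this state-action pair did
        # (termination is ignored here to keep the sketch short).
        model[(state, action)] = (reward, next_state)

        # Planning: replay hypothetical experience drawn from the model.
        for _ in range(n_planning):
            (s, a), (r, s_next) = random.choice(list(model.items()))
            best = max(Q[(s_next, b)] for b in actions)
            Q[(s, a)] += alpha * (r + gamma * best - Q[(s, a)])

        state = env.reset() if done else next_state
    return Q
```

Increasing n_planning trades extra computation per real step for better sample efficiency, which is the central idea this module explores.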
