Course catalog: Prediction and Control with Function Approximation Training
Course outline:

Welcome to the Course!

Welcome to the third course in the Reinforcement Learning Specialization:

Prediction and Control with Function Approximation, brought to you by the University of Alberta,

Onlea, and Coursera.

In this pre-course module, you'll be introduced to your instructors,

and get a flavour of what the course has in store for you.

Make sure to introduce yourself to your classmates in the "Meet and Greet" section!

On-policy Prediction with Approximation

This week you will learn how to estimate a value function for a given policy,

when the number of states is much larger than the memory available to the agent.

You will learn how to specify a parametric form of the value function,

how to specify an objective function, and how stochastic gradient descent can be used to estimate values from interaction with the world.
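
As a rough illustration of the idea above, here is a minimal sketch of one semi-gradient TD(0) update, assuming a linear parametric form v(s) = w . x(s). The function name, the feature vectors, and the step-size and discount values are illustrative assumptions, not taken from the course materials.

```python
import numpy as np

def semi_gradient_td0_update(w, x, reward, x_next, alpha=0.1, gamma=1.0, terminal=False):
    """One semi-gradient TD(0) step for a linear value estimate v(s) = w . x(s).

    w, x and x_next are 1-D arrays of equal length (hypothetical feature vectors).
    """
    v = w @ x
    # The value of a terminal state is defined to be zero.
    v_next = 0.0 if terminal else w @ x_next
    td_error = reward + gamma * v_next - v
    # For a linear form, the gradient of v(s) with respect to w is x(s).
    return w + alpha * td_error * x

# Example: a single transition into a terminal state with reward 1.
w = np.zeros(2)
w = semi_gradient_td0_update(w, np.array([1.0, 0.0]), 1.0, np.array([0.0, 1.0]),
                             alpha=0.5, terminal=True)
```

Only the weights of features active in the current state move, which is why the memory cost depends on the number of weights rather than the number of states.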

Constructing Features for Prediction

The features used to construct the agent’s value estimates are perhaps the most crucial part of a successful learning system.

In this module we discuss two basic strategies for constructing features: (1) fixed basis functions that form an exhaustive partition of the input,

and (2) adapting the features while the agent interacts with the world via Neural Networks and Backpropagation.
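
Strategy (1) can be sketched with state aggregation, used here as one assumed example of a fixed basis that partitions the input. The sketch maps a one-dimensional state to a one-hot vector over equal-width bins. The function name and the bin layout are hypothetical choices for illustration.

```python
import numpy as np

def aggregation_features(state, low, high, num_groups):
    """One-hot features: which of num_groups equal-width bins contains state."""
    frac = (state - low) / (high - low)
    index = int(frac * num_groups)
    # Clamp so that states at or beyond the bounds fall into the edge bins.
    index = max(0, min(index, num_groups - 1))
    x = np.zeros(num_groups)
    x[index] = 1.0
    return x

# Example: four bins over [0, 1]; the state 0.25 falls into the second bin.
x = aggregation_features(0.25, 0.0, 1.0, 4)
```

Every state activates exactly one feature, so the bins cover the input exhaustively and do not overlap.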

In this week’s graded assessment you will solve a simple but infinite-state prediction task with a Neural Network and

TD learning.

Control with Approximation

This week,

you will see that the concepts and tools introduced in modules two and three allow straightforward extension of classic

TD control methods to the function approximation setting. In particular,

you will learn how to find the optimal policy in infinite-state MDPs by simply combining semi-gradient

TD methods with generalized policy iteration, yielding classic control methods like Q-learning and Sarsa.

We conclude with a discussion of a new problem formulation for RL, average reward, which will undoubtedly

be used in many applications of RL in the future.
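
To make the combination of semi-gradient TD and generalized policy iteration concrete, the following is a minimal sketch of a semi-gradient Sarsa update with epsilon-greedy action selection. It assumes linear action values with one weight row per action. All names and hyperparameter values are illustrative assumptions, and the discounted form is shown rather than the average-reward formulation.

```python
import numpy as np

def q_values(w, x):
    """Action values for every action; w has shape (num_actions, num_features)."""
    return w @ x

def epsilon_greedy(w, x, epsilon, rng):
    """Policy improvement step: mostly greedy with respect to the current estimates."""
    if rng.random() < epsilon:
        return int(rng.integers(w.shape[0]))
    return int(np.argmax(q_values(w, x)))

def sarsa_update(w, x, a, reward, x_next, a_next, alpha=0.1, gamma=0.9, terminal=False):
    """Policy evaluation step: one semi-gradient Sarsa update, returning new weights."""
    q_next = 0.0 if terminal else w[a_next] @ x_next
    td_error = reward + gamma * q_next - w[a] @ x
    w = w.copy()
    # Only the weights of the action actually taken are adjusted.
    w[a] += alpha * td_error * x
    return w

# Example: one update, then a greedy action choice (epsilon = 0).
rng = np.random.default_rng(0)
w = sarsa_update(np.zeros((2, 2)), np.array([1.0, 0.0]), 1, 2.0,
                 np.array([0.0, 1.0]), 0, alpha=0.5)
action = epsilon_greedy(w, np.array([1.0, 0.0]), 0.0, rng)
```

Replacing `w[a_next] @ x_next` with the maximum of `q_values(w, x_next)` would give the corresponding Q-learning target.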

Policy Gradient

Every algorithm you have learned about so far estimates

a value function as an intermediate step towards the goal of finding an optimal policy.

An alternative strategy is to directly learn the parameters of the policy.

This week you will learn about these policy gradient methods and their advantages over value-function-based methods.

You will also learn how policy gradient methods can be used

to find the optimal policy in tasks with both continuous state and action spaces.
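
As a sketch of learning policy parameters directly when the action space is continuous, the example below assumes a Gaussian policy whose mean is linear in the state features and whose standard deviation is fixed. It applies a return-weighted update in the style of REINFORCE. The parameterization, the function names, and the step size are assumptions made for illustration, not the course's implementation.

```python
import numpy as np

def gaussian_log_prob_grad(theta, x, action, sigma=1.0):
    """Gradient of log pi(action | state) with respect to theta.

    Assumes pi is Gaussian with mean theta . x and fixed standard deviation sigma.
    """
    mu = theta @ x
    return (action - mu) / sigma**2 * x

def policy_gradient_step(theta, x, action, ret, alpha=0.01, sigma=1.0):
    """Move theta to make the sampled action more likely in proportion to the return."""
    return theta + alpha * ret * gaussian_log_prob_grad(theta, x, action, sigma)

# Example: one update from a single sampled (state, action, return) triple.
rng = np.random.default_rng(0)
theta = np.zeros(2)
x = np.array([1.0, 2.0])
action = rng.normal(theta @ x, 1.0)   # sample a continuous action from the policy
theta = policy_gradient_step(theta, x, action, ret=1.0)
```

No value function appears in the update itself. In practice a learned baseline or critic is often subtracted from the return to reduce variance.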
