Reinforcement Learning function approximation with Neural Networks

Question

I'm trying to implement the Episodic Semi-gradient Sarsa for estimating q* with a Neural Network as a function approximator. My question is: does the weight vector w in q(S, A, w) refer to the weights in the Neural Network?

See Sutton and Barto, pages 197-198, for the concrete algorithm.

If yes: how do I deal with the fact that there are multiple weight vectors in a multilayer Neural Network?

If no: how would I use it in the algorithm? My suggestion would be to append it to the state s and action a and feed the result into the Neural Network to get an approximation of the value of the state with the chosen action. Is this correct?

How is the dimension of the weight vector w determined?

Thanks in advance!

It is unclear where this q(S, A, w) in your question comes from. I suppose some paper or book? Please provide a link, page numbers, etc. – Dennis Soemers, Commented Mar 28, 2018 at 11:35

Dennis Soemers · Accepted Answer · 2018-03-28 13:07:07Z

The w in the pseudocode does not strictly have to be just a single weight vector. The text in the beginning of the chapter does refer to w as a "weight vector" a couple of times, but the pseudocode itself only mentions that w are the parameters of a differentiable action-value function approximator. A Neural Network perfectly fits that description.
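For reference, the two update lines in that pseudocode can be written as follows, in this thread's notation (S', A' are the next state and action, alpha the step size, gamma the discount factor). Nothing in them requires w to be the weights of a single layer; they only require q to be differentiable with respect to every component of w.

```latex
\begin{aligned}
\text{non-terminal } S':\quad & \mathbf{w} \leftarrow \mathbf{w} + \alpha \big[ R + \gamma\, q(S', A', \mathbf{w}) - q(S, A, \mathbf{w}) \big] \nabla_{\mathbf{w}} q(S, A, \mathbf{w}) \\
\text{terminal } S':\quad & \mathbf{w} \leftarrow \mathbf{w} + \alpha \big[ R - q(S, A, \mathbf{w}) \big] \nabla_{\mathbf{w}} q(S, A, \mathbf{w})
\end{aligned}
```

The gradient term has one entry per parameter of the approximator, so for a Neural Network it covers every weight in every layer at once.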

In the case of a Neural Network, you can think of w as the combination of all weight matrices (alternatively, you can view it as a really, really long vector constructed by unrolling all of the weight matrices into a single vector). You can view the lines of pseudocode performing the update on w as regular backpropagation in Neural Networks, optimizing all the parameters w to make the prediction q(S, A, w) slightly closer to the target R + gamma*q(S', A', w).
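The "one really long vector" view can be made concrete with a minimal NumPy sketch. It assumes a hypothetical network whose input is 4 state features plus a 2-way one-hot action, with one hidden layer of 8 units and a single q-value output; the helper names `flatten` and `unflatten` are illustrative, not from any library. It also shows how the dimension of w is determined: it is the total number of parameters, fixed by the architecture you choose (the sketch counts the bias vectors too, since they are parameters as well).

```python
import numpy as np

# Hypothetical architecture: 6 inputs (4 state features + 2 one-hot action
# entries) -> 8 hidden units -> 1 output (the q-value).
layer_sizes = [6, 8, 1]

rng = np.random.default_rng(0)
# Each layer contributes a weight matrix and a bias vector; together they are "w".
params = []
for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    params.append(rng.normal(scale=0.1, size=(n_out, n_in)))  # weight matrix
    params.append(np.zeros(n_out))                              # bias vector

def flatten(params):
    """Unroll every weight matrix and bias vector into one long vector w."""
    return np.concatenate([p.ravel() for p in params])

def unflatten(w, shapes):
    """Split a flat vector w back into arrays with the given shapes."""
    out, i = [], 0
    for shape in shapes:
        size = int(np.prod(shape))
        out.append(w[i:i + size].reshape(shape))
        i += size
    return out

shapes = [p.shape for p in params]
w = flatten(params)
print(w.shape)  # (65,): (6*8 + 8) + (8*1 + 1) parameters
```

Adding a layer or widening one changes the length of w accordingly; the update rule itself does not change.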

That single line of pseudocode basically summarizes the entire backpropagation procedure in the case where w is a huge vector consisting of the unrolled weight matrices of a Neural Network. In practice, it cannot be implemented in a single line of code, because the partial derivatives with respect to earlier layers of the network (components of the gradient of q with respect to w) depend on the partial derivatives in layers closer to the output layer, so they have to be computed sequentially, from the output layer backwards. That is exactly what backpropagation, as you may know it from Neural Networks, does.
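A minimal NumPy sketch of one episodic semi-gradient Sarsa update, assuming a hypothetical one-hidden-layer tanh network and hand-written backpropagation, may make this ordering visible. The names (`features`, `q`, `sarsa_update`), the layer sizes and the one-hot action encoding are illustrative assumptions, not part of the book's pseudocode. Note that the network's input is the state features plus the action, while w holds the network's weights and biases.

```python
import numpy as np

rng = np.random.default_rng(1)
N_FEATURES, N_ACTIONS, N_HIDDEN = 4, 2, 8  # hypothetical sizes

# w = every parameter of the network: two weight matrices and two bias vectors.
w = {
    "W1": rng.normal(scale=0.5, size=(N_HIDDEN, N_FEATURES + N_ACTIONS)),
    "b1": np.zeros(N_HIDDEN),
    "W2": rng.normal(scale=0.5, size=(1, N_HIDDEN)),
    "b2": np.zeros(1),
}

def features(s, a):
    """Network input: state features plus a one-hot action (w is NOT an input)."""
    one_hot = np.zeros(N_ACTIONS)
    one_hot[a] = 1.0
    return np.concatenate([s, one_hot])

def q(w, s, a):
    """Forward pass. Returns q(s, a, w) and the activations backprop needs."""
    x = features(s, a)
    h = np.tanh(w["W1"] @ x + w["b1"])
    return (w["W2"] @ h + w["b2"]).item(), (x, h)

def sarsa_update(w, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9, terminal=False):
    """One episodic semi-gradient Sarsa step applied to all parameters in w."""
    q_sa, (x, h) = q(w, s, a)
    # Semi-gradient: the target is treated as a constant, no gradient flows through it.
    target = r if terminal else r + gamma * q(w, s_next, a_next)[0]
    delta = target - q_sa
    # Output layer first: dq/dW2 = h, dq/db2 = 1, and dq/dh = W2 flows backwards.
    grads = {"W2": h[np.newaxis, :], "b2": np.ones(1)}
    dq_dh = w["W2"][0]
    # Hidden layer: the chain rule through tanh reuses dq_dh from the layer above.
    dq_dz1 = dq_dh * (1.0 - h ** 2)
    grads["W1"] = np.outer(dq_dz1, x)
    grads["b1"] = dq_dz1
    # w <- w + alpha * delta * grad q(S, A, w), for every parameter array.
    for name, g in grads.items():
        w[name] += alpha * delta * g
    return delta

# Usage: one transition (S, A=0, R=1.0, S', A'=1).
s, s_next = np.array([0.1, -0.2, 0.3, 0.0]), np.array([0.2, 0.0, -0.1, 0.4])
target = 1.0 + 0.9 * q(w, s_next, 1)[0]
before = q(w, s, 0)[0]
sarsa_update(w, s, 0, 1.0, s_next, 1, alpha=0.01)
after = q(w, s, 0)[0]
print(abs(target - before), abs(target - after))  # the second value should be smaller
```

In practice an automatic differentiation library would normally compute this gradient for you; the manual version only exists to show the output-to-input ordering.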
