I'm trying to implement Episodic Semi-gradient Sarsa for estimating q* with a neural network as the function approximator. My question is: does the weight vector w in q(S, A, w) refer to the weights of the neural network?
See: Sutton and Barto page 197/198 for a concrete algorithm.
If yes: how do I deal with the fact that a multilayer neural network has multiple weight vectors (one set per layer)?
If no: how would I use w in the algorithm? My guess would be to append it to the state s and action a and feed the result into the neural network to get an approximate value of that state with the chosen action. Is this correct?
How is the dimension of the weight vector w determined?
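To make the "yes" case concrete, here is a minimal sketch of one possible reading, not a definitive implementation: w is taken to be every weight and bias of the network flattened into a single vector, so its dimension is just the total parameter count. The layer sizes, the one-hot action encoding, the tanh hidden layer, and all function names (`init_w`, `unpack`, `q_and_grad`, `sarsa_update`) are assumptions made up for illustration; none of them come from the book.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
STATE_DIM, N_ACTIONS, HIDDEN = 4, 2, 8
IN_DIM = STATE_DIM + N_ACTIONS
# Shapes of all layer parameters, in order: W1, b1, W2, b2.
SHAPES = [(HIDDEN, IN_DIM), (HIDDEN,), (1, HIDDEN), (1,)]

def init_w(rng):
    """Build w by flattening and concatenating every layer's parameters."""
    return np.concatenate([rng.normal(0.0, 0.1, size=s).ravel() for s in SHAPES])

def unpack(w):
    """Split the flat vector w back into per-layer arrays."""
    out, i = [], 0
    for s in SHAPES:
        n = int(np.prod(s))
        out.append(w[i:i + n].reshape(s))
        i += n
    return out

def features(state, action):
    """Network input: the state concatenated with a one-hot action (assumed encoding)."""
    one_hot = np.zeros(N_ACTIONS)
    one_hot[action] = 1.0
    return np.concatenate([state, one_hot])

def q_and_grad(state, action, w):
    """Return q(S, A, w) and its gradient with respect to the flat vector w."""
    W1, b1, W2, b2 = unpack(w)
    x = features(state, action)
    h = np.tanh(W1 @ x + b1)
    q = float((W2 @ h + b2)[0])
    # Manual backprop of the scalar output q.
    dz = W2[0] * (1.0 - h ** 2)          # gradient at the hidden pre-activation
    grads = [np.outer(dz, x), dz, h[None, :], np.ones(1)]  # same order as SHAPES
    return q, np.concatenate([g.ravel() for g in grads])

def sarsa_update(w, s, a, r, s2, a2, done, alpha=0.01, gamma=1.0):
    """One semi-gradient Sarsa step: the target is treated as a constant."""
    q, grad = q_and_grad(s, a, w)
    target = r if done else r + gamma * q_and_grad(s2, a2, w)[0]
    return w + alpha * (target - q) * grad
```

Under these assumptions, `init_w(np.random.default_rng(0)).shape` is `(65,)`, since 8*6 + 8 + 8 + 1 = 65. In practice an autodiff library would compute the gradient, and the per-layer arrays would not need to be flattened explicitly; the flat vector is only a way of writing all parameters as a single w.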
Thanks in advance!
q(S, A, w) in your question comes from. I suppose some paper or book? Please provide a link / page numbers etc.