Function with unknowns: Policy Network (Actor)
- Input of the neural network: the machine's observation, represented as a vector or a matrix
- Output of the neural network: each action corresponds to one neuron in the output layer
- The scores across all actions sum to 1
- The architecture of the Policy Network is something you have to design yourself
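- The action taken depends on the scores. A common approach is to treat the scores as probabilities and sample an action from them; the benefit of sampling is that, given the same screen, the machine's action can differ slightly each time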
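
The points above can be sketched in a few lines of NumPy. This is only a minimal illustration, not a definitive implementation: the class name `PolicyNetwork`, the single hidden layer, the `tanh` activation, the layer sizes, and the use of softmax to make the scores sum to 1 are all assumptions made for the example, and the weights are random rather than trained.

```python
import numpy as np

def softmax(scores):
    # Subtract the max for numerical stability; the outputs sum to 1.
    shifted = scores - np.max(scores)
    exp = np.exp(shifted)
    return exp / exp.sum()

class PolicyNetwork:
    """Hypothetical actor: observation vector -> one probability per action."""

    def __init__(self, obs_dim, hidden_dim, n_actions, seed=0):
        self.rng = np.random.default_rng(seed)
        # Untrained random weights; in practice these are the unknowns to learn.
        self.W1 = self.rng.normal(0.0, 0.1, size=(obs_dim, hidden_dim))
        self.b1 = np.zeros(hidden_dim)
        self.W2 = self.rng.normal(0.0, 0.1, size=(hidden_dim, n_actions))
        self.b2 = np.zeros(n_actions)

    def action_probs(self, obs):
        hidden = np.tanh(obs @ self.W1 + self.b1)
        scores = hidden @ self.W2 + self.b2  # one output neuron per action
        return softmax(scores)

    def sample_action(self, obs):
        # Sample instead of taking the argmax, so the same observation
        # can lead to different actions on different calls.
        probs = self.action_probs(obs)
        return int(self.rng.choice(len(probs), p=probs))

policy = PolicyNetwork(obs_dim=4, hidden_dim=8, n_actions=3)
obs = np.array([0.1, -0.2, 0.3, 0.0])
probs = policy.action_probs(obs)
actions = [policy.sample_action(obs) for _ in range(10)]
print(probs, actions)
```

Replacing `sample_action` with `int(np.argmax(probs))` would make the actor deterministic, which removes the variation that sampling provides.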