minor tweaks in intro

NT
2021-08-28 16:52:40 +02:00
parent e24d1b3ecc
commit 8b71d57e05
3 changed files with 25 additions and 25 deletions

@@ -19,7 +19,7 @@ In its simplest form, the learning goal for reinforcement learning tasks can be
$$
\text{arg max}_{\theta} \mathbb{E}_{a \sim \pi(;s,\theta_p)} \big[ \sum_t r_t \big],
-$$ (learn-l2)
+$$ (rl-learn-l2)
where the reward at time $t$ (denoted by $r_t$ above) is the result of an action $a$ performed by an agent.
The agents choose their actions based on a neural network policy which decides via a set of given observations.