Policy Gradient

TSP From DP to Deep Learning. Episode 5: Reinforcement Learning

This is the fifth episode of the series TSP From DP to Deep Learning. In this episode, we turn to reinforcement learning, in particular a model-free policy gradient method that embeds a pointer network and learns to produce minimal tours without any supervised best-tour labels in the dataset. The full list of episodes in this series is below.

Episode 1: AC TSP on AIZU with recursive DP
Episode 2: TSP DP on a Euclidean Dataset
Episode 3: Pointer Networks in PyTorch
Episode 4: Search for Most Likely Sequence
Episode 5: Reinforcement Learning

PyTorch Implementation
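To make "learning without a best-tour label" concrete, here is a minimal sketch of the REINFORCE policy-gradient estimator in plain Python. It assumes a deliberately tiny stand-in for the pointer network: a hypothetical table of logits `theta[i][j]` scoring a move from city i to city j, decoded with a masked softmax so visited cities are never picked again. The reward is the negative tour length and the batch-mean length serves as a baseline. The names `sample_tour`, `reinforce` and all hyperparameters are illustrative assumptions, not taken from this series' PyTorch code.

```python
import math
import random


def tour_length(coords, tour):
    """Length of the closed tour, including the edge back to the start city."""
    total = 0.0
    for k in range(len(tour)):
        x1, y1 = coords[tour[k]]
        x2, y2 = coords[tour[(k + 1) % len(tour)]]
        total += math.hypot(x2 - x1, y2 - y1)
    return total


def sample_tour(theta, n, rng, greedy=False):
    """Decode a tour from city 0 using a masked softmax over theta[cur][next].

    Returns (tour, grad) where grad maps (i, j) to d log pi(tour) / d theta[i][j].
    """
    tour, visited, grad = [0], {0}, {}
    while len(tour) < n:
        cur = tour[-1]
        cand = [j for j in range(n) if j not in visited]  # mask visited cities
        logits = [theta[cur][j] for j in cand]
        top = max(logits)
        exps = [math.exp(v - top) for v in logits]  # numerically stable softmax
        z = sum(exps)
        probs = [e / z for e in exps]
        if greedy:
            idx = max(range(len(cand)), key=probs.__getitem__)
        else:
            idx = rng.choices(range(len(cand)), weights=probs)[0]
        # d log softmax / d logit_j = 1[j is chosen] - p_j (each city is "cur" once)
        for k, j in enumerate(cand):
            grad[(cur, j)] = (1.0 if k == idx else 0.0) - probs[k]
        tour.append(cand[idx])
        visited.add(cand[idx])
    return tour, grad


def reinforce(coords, steps=300, batch=16, lr=0.5, seed=0):
    """Plain REINFORCE with reward = -tour length and a batch-mean baseline."""
    n = len(coords)
    rng = random.Random(seed)
    theta = [[0.0] * n for _ in range(n)]  # hypothetical stand-in for the pointer net
    for _ in range(steps):
        samples = [sample_tour(theta, n, rng) for _ in range(batch)]
        lengths = [tour_length(coords, t) for t, _ in samples]
        baseline = sum(lengths) / batch
        for (_, grad), length in zip(samples, lengths):
            advantage = baseline - length  # > 0 when the tour beats the batch average
            for (i, j), g in grad.items():
                theta[i][j] += lr * advantage * g / batch  # gradient ascent on -length
    return theta


coords = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0), (0.5, 2.0)]
theta = reinforce(coords)
best, _ = sample_tour(theta, len(coords), random.Random(0), greedy=True)
print(best, round(tour_length(coords, best), 3))
```

No optimal tour is ever shown to the learner; only the length of its own sampled tours drives the update. In a PyTorch version, `theta` would typically be replaced by the pointer network's attention scores, and the hand-written gradient by autograd on a loss such as `((length - baseline) * log_prob).mean()`.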