Python

Combinatorial Games. Episode 3: Connect-N (Tic-Tac-Toe, Gomoku) OpenAI Gym GUI Env

In last episode, we finished up minimax strategy for Connect-N games, including Tic-Tac-Toe and Gomoku. This episode, we will implement its GUI environment based on Pygame library for human vs. human, AI vs. AI or human vs. AI plays, which is essential for self-play AlphaGo Zero reinforcement learning. The environment is further embedded into OpenAI Gym as it’s the standard in game reinforcement learning. All code in this series is in ConnectNGym github .

Combinatorial Games. Episode 2: Tic-Tac-Toe Problems in Leetcode and Solve the Game using Minimax

This episode extends last one, where Minimax and Alpha Beta Pruning algorithms are introduced. We will solve several tic-tac-toe problems in leetcode, gathering intuition and building blocks for tic-tac-toe game logic, which can be naturally extended to Connect-N game or Gomoku (N=5). Then we solve tic-tac-toe using Minimax and Alpha Beta pruning for small N and analyze their state space. In the following episodes, based on building blocks here, we will implement a Connect-N Open Gym GUI Environment, where we can play against computer visually or compare different computer algorithms.

Combinatorial Games. Episode 1: Minimax and Alpha Beta Pruning in Leetcode

This series, we deal with zero-sum turn-based board game algorithm, a sub type of combinatorial games. We start off with small search space problem, introduce classic algorithms and corresponding combinatorial gaming theory and ultimately end with modern approximating Deep RL techniques. From there, after stepping stone is laid, we are able to learn and appreciate how AlphaGo works. In this first episode, we illustrate 3 classic gaming problems in leetcode and solve them from brute force version to DP version then finally rewrite them using classic gaming algorithms, minimax and alpha beta pruning.

TSP From DP to Deep Learning. Episode 5: Reinforcement Learning

This is fifth episode of series: TSP From DP to Deep Learning. In this episode, we turn to Reinforcement Learning technology, in particular, a model-free policy gradient method that embeds pointer network to learn minimal tour without supervised best tour label in dataset. Full list of this series is listed below. Episode 1: AC TSP on AIZU with recursive DP Episode 2: TSP DP on a Euclidean Dataset Episode 3: Pointer Networks in PyTorch Episode 4: Search for Most Likely Sequence Episode 5: Reinforcement Learning PyTorch Implementation

TSP From DP to Deep Learning. Episode 4: Search for Most Likely Sequence

This is fourth episode of series: TSP From DP to Deep Learning. In this episode, we systematically compare different searching algorithms for finding most likely sequence in the context of simplied markov chain setting. These models can be further utilized in deep learning decoding stage, which will be illustrated in reinforcement learning, in the next episode. Full list of this series is listed below. Episode 1: AC TSP on AIZU with recursive DP Episode 2: TSP DP on a Euclidean Dataset Episode 3: Pointer Networks in PyTorch Episode 4: Search for Most Likely Sequence

TSP From DP to Deep Learning. Episode 3: Pointer Network

This is third episode of series: TSP From DP to Deep Learning. In this episode, we will be entering the realm of deep learning, specifically, a type of sequence-to-sequence called Pointer Networks is introduced. It is tailored to solve problems like TSP or Convex Hull. Full list of this series is listed below. Episode 1: AC TSP on AIZU with recursive DP Episode 2: TSP DP on a Euclidean Dataset Episode 3: Pointer Networks in PyTorch

TSP From DP to Deep Learning. Episode 2: DP on Euclidean Dataset

This is second episode of series: TSP From DP to Deep Learning. Episode 1: AC TSP on AIZU with recursive DP Episode 2: TSP DP on a Euclidean Dataset Episode 3: Pointer Networks in PyTorch Episode 4: Search for Most Likely Sequence Episode 5: Reinforcement Learning PyTorch Implementation AIZU TSP Bottom Up Iterative DP In last episode, we provided a top down recursive DP in Python 3 and Java 8.

TSP From DP to Deep Learning. Episode 1: DP Algorithm

Travelling salesman problem (TSP) is a classic NP hard computer algorithmic problem. In this series, we will first solve TSP problem in an exact manner by ACing TSP on aizu with dynamic programming, and then move on to train a Pointer Network with Pytorch to obtain an approximate solution with deep learning and reinforcement learning technology. Complete episodes are listed as follows: Episode 1: AC TSP on AIZU with recursive DP