Month: September 2015

Batch Gradient Descent of Linear Regression (Python)

Originated from the Coursera Machine Learning course, week 2.
Code at https://github.com/MyEncyclopedia/AlgImpl/tree/master/02_Machine_Learning/Linear_Regression

Hypothesis

The input is the vector `X = [1, x_1, x_2, x_3, …, x_n]`, where the leading 1 is the bias term `x_0`.

The linear hypothesis is

`h_theta(x) = theta_0 + theta_1*x_1 + theta_2*x_2 + … + theta_n*x_n`

or equivalently, in vector form, `h_theta(X) = theta^T X`.

Define the cost function over the `m` training examples to be

`J(theta)=1/(2m)Sigma_(i=1)^m(h_theta(x^((i)))-y^((i)))^2`

The task is to find the `theta` that minimizes `J`.
Gradient descent finds this `theta` iteratively.
The partial derivative of `J(theta)` with respect to `theta_j` is `1/m Sigma_(i=1)^m (h_theta(x^((i)))-y^((i))) * x_(j)^((i))`
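For completeness, the factor of 2 produced by the square cancels the 2 in the denominator; written in LaTeX notation:

\frac{\partial J(\theta)}{\partial \theta_j}
  = \frac{1}{2m} \sum_{i=1}^{m} 2 \big( h_\theta(x^{(i)}) - y^{(i)} \big) x_j^{(i)}
  = \frac{1}{m} \sum_{i=1}^{m} \big( h_\theta(x^{(i)}) - y^{(i)} \big) x_j^{(i)}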

Therefore, the update rule is
`theta_j = theta_j - alpha * 1/m Sigma_(i=1)^m (h_theta(x^((i)))-y^((i))) * x_(j)^((i))`
with all `theta_j` updated simultaneously.

Feature Scaling

Scale every feature into approximately the [-1, 1] range:
`x_i = (x_i - mu_i) / sigma_i` where `mu_i` is the mean of the `i`-th feature and `sigma_i` is its standard deviation.
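A minimal vectorized numpy sketch of this transformation (standalone; the class method `featureNormalize` below does the same with an explicit loop):

import numpy as np

def normalize(X):
    "X is a float array of shape N * m: one row per feature, one column per sample"
    mu = X.mean(axis=1, keepdims=True)      # per-feature mean
    sigma = X.std(axis=1, keepdims=True)    # per-feature standard deviation
    return (X - mu) / sigma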

Python Code

import numpy as np

class GradientDescent:
    "Batch Gradient Descent method of linear regression"
    alpha = 0.01
    step = 100

    def __init__(self, alpha=0.01, step=100):
        self.alpha = alpha
        self.step = step

    def distance(self, v1, v2):
        v = v1 - v2
        v = v * v
        return np.sum(v)


    def h(self, X, theta):
        "Hypothesis h_theta = theta^T X, evaluated for all samples at once"
        return np.dot(theta, X)


    def cost(self, V):
        "Cost J(theta), given the residual vector V = h(X, theta) - Y"
        m = len(V)
        return 1.0 / (2 * m) * np.sum(V * V)


    def featureNormalize(self, X):
        "Scale every feature into approximately the [-1, 1] range."
        featureN = len(X)  # X has shape N * m: one row per feature
        mu = np.mean(X, axis=1, dtype=np.float64)
        sigma = np.std(X, axis=1, dtype=np.float64)
        X_norm = X.astype(np.float64)  # work on a float copy; do not modify the caller's data
        for i in range(featureN):
            X_norm[i] = (X[i] - mu[i]) / sigma[i]
        return X_norm, mu, sigma

    def iterate(self, data, toNormalize=False):
        "data is an m * (N+1) array: N feature columns followed by the target column"
        m = len(data)
        data = np.transpose(data)
        featureN = len(data) - 1
        Y = data[featureN]  # target values, shape 1 * m
        X = data[0:featureN]  # features, shape N * m
        if toNormalize:
            X, mu, sigma = self.featureNormalize(X)
        X = np.vstack((np.ones(m), X))  # prepend x_0 = 1; shape (N+1) * m

        theta = np.zeros(featureN + 1, dtype=np.float64)

        for _ in range(self.step):
            V = self.h(X, theta) - Y  # residuals h(x) - y, shape 1 * m
            costValue = self.cost(V)  # not used for stopping; handy for debugging

            theta_temp = np.zeros(featureN + 1, dtype=np.float64)
            for i in range(featureN + 1):
                theta_temp[i] = np.sum(V * X[i])  # m times the partial derivative w.r.t. theta_i
            theta_temp *= self.alpha / m
            theta = theta - theta_temp  # simultaneous update of all theta_i
        return theta

Running the code above on the test data from the Machine Learning course homework gives:
ex1data1.txt: `theta=[-3.63029144, 1.16636235]`
ex1data2.txt: `theta=[ 338658.2492493, 103322.82942954, -474.74249522]`
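A minimal driver sketch (assuming the course data files are plain comma-separated text with the target value in the last column; the hyperparameter values below are only illustrative):

import numpy as np

data = np.loadtxt("ex1data1.txt", delimiter=",")  # m rows: feature columns, then y
gd = GradientDescent(alpha=0.01, step=1500)       # illustrative choices, not the class defaults
print(gd.iterate(data))

# the multi-feature data set benefits from normalization:
# data2 = np.loadtxt("ex1data2.txt", delimiter=",")
# print(gd.iterate(data2, toNormalize=True))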

PageRank algorithm using matrix power iteration (Python)

Algorithm Origin: Coursera: Mining of Massive Datasets week 1
Code is at https://github.com/MyEncyclopedia/AlgImpl/tree/master/01_Mining_of_Massive_Datasets/PageRank

Power Iteration Method

  1. Init: `r^((0)) = [1/N, …, 1/N]^T`
  2. Iterate: `r^((t+1)) = M * r^((t))`
  3. Stop when `|r^((t+1)) - r^((t))| < epsilon`

where `M_(ji) = 1/d_i` if `i` points to `j`, else `M_(ji) = 0`, and `d_i` is the out-degree of node `i`
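A compressed numpy sketch of steps 1-3, assuming the column-stochastic matrix `M` has already been built:

import numpy as np

def power_iteration(M, epsilon=0.0001):
    N = len(M)
    r = np.full(N, 1.0 / N)                      # step 1: uniform initial rank vector
    while True:
        r_next = M.dot(r)                        # step 2: r = M * r
        if np.sum((r_next - r) ** 2) < epsilon:  # step 3: stop when r has converged
            return r_next
        r = r_next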

Naive Implementation

import numpy as np

class PageRank_NoTeleport:
    "Power iteration but does not address Spider trap problem or Dead end problem "
    epsilon = 0.0001

    def __init__(self, beta=0.85, epsilon=0.0001):
        self.epsilon = epsilon

    def distance(self, v1, v2):
        v = v1 - v2
        v = v * v
        return np.sum(v)

    def compute(self, G):
        "G is an N*N matrix where G[i][j]==1 if j links to i, else 0"
        N = len(G)
        d = np.zeros(N)  # d[i] = out-degree of node i
        for i in range(N):
            for j in range(N):
                if G[j, i] == 1:
                    d[i] += 1

        r0 = np.zeros(N, dtype=np.float32) + 1.0 / N  # start from the uniform distribution
        # construct the column-stochastic matrix M
        M = np.zeros((N, N), dtype=np.float32)
        for i in range(N):
            for j in range(N):
                if G[j, i] == 1:
                    M[j, i] = 1.0 / d[i]
        while True:
            r1 = np.dot(M, r0)
            dist = self.distance(r1, r0)
            if dist < self.epsilon:
                break
            else:
                r0 = r1

        return r1

The naive version works when `M` is a normal column-stochastic matrix, as in the graph below (no dead ends, no spider traps):
[figure: a regular three-node graph]
`G=((0,1,1),(1,1,0),(1,0,0)) => M=((0,0.5,1),(0.5,0.5,0),(0.5,0,0))`
The result is [0.40293884, 0.39887747, 0.1981837].
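For reference, a minimal sketch that reproduces this number using the class above:

import numpy as np

G = np.array([[0, 1, 1],
              [1, 1, 0],
              [1, 0, 0]])
pr = PageRank_NoTeleport()
print(pr.compute(G))  # approximately [0.40293884, 0.39887747, 0.1981837]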

But it cannot address the Dead end problem (see the figure below) or the Spider trap problem:
[figure: three-node graph with a dead end]
`G=((0,1,0),(1,1,0),(1,0,0)) => M=((0,0.5,0),(0.5,0.5,0),(0.5,0,0))`
With the naive implementation, rank mass leaks out through the dead end and we get the wrong result [0.01896159, 0.03068034, 0.01171875], which no longer sums to 1.

Teleport Version

The Google Matrix `A` is `A = beta*M + (1-beta)*(1/N) * e * e^T`, where `e` is the vector of all 1s.
The iteration becomes `r^((t+1)) = A * r^((t))`.

import numpy as np

class PageRank_Matrix:
    "Power iteration with random teleports that addresses Spider trap problem or Dead end problem "
    beta = 0.85
    epsilon = 0.0001

    def __init__(self, beta=0.85, epsilon=0.0001):
        self.beta = beta
        self.epsilon = epsilon

    def distance(self, v1, v2):
        v = v1 - v2
        v = v * v
        return np.sum(v)

    def compute(self, G):
        "G is N*N matrix where if j links to i then G[i][j]==1, else G[i][j]==0"
        N = len(G)
        d = np.zeros(N)  # d[i] = out-degree of node i
        for i in range(N):
            for j in range(N):
                if (G[j, i] == 1):
                    d[i] += 1
            if d[i]==0:   # i is dead end, teleport always
                d[i] = N

        r0 = np.zeros(N, dtype=np.float32) + 1.0 / N
        # construct stochastic M
        M = np.zeros((N, N), dtype=np.float32)
        for i in range(N):
            if (d[i]==N):  # i is dead end
                for j in range(N):
                    M[j, i] = 1.0 / d[i]
            else:
                for j in range(N):
                    if G[j, i] == 1:
                        M[j, i] = 1.0 / d[i]

        T = (1.0 - self.beta) * (1.0 / N) * (np.zeros((N, N), dtype=np.float32) + 1.0)  # teleport term: (1-beta)/N in every entry
        A = self.beta * M + T  # the Google matrix
        while True:
            r1 = np.dot(A, r0)
            dist = self.distance(r1, r0)
            if dist < self.epsilon:
                break
            else:
                r0 = r1

        return r1

In this version, `beta` is the probability of following an out-link, so `1-beta` is the probability of teleporting to a random node. In addition, matrix `M` is preprocessed so that dead-end columns distribute their rank uniformly, keeping `M` column-stochastic. The result for the dead-end graph above is [0.30623639, 0.43920633, 0.25455755].
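For reference, a minimal sketch that runs the teleport version on the dead-end graph from above:

import numpy as np

G = np.array([[0, 1, 0],
              [1, 1, 0],
              [1, 0, 0]])
pr = PageRank_Matrix(beta=0.85)
print(pr.compute(G))  # approximately [0.30623639, 0.43920633, 0.25455755]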

Make ME-Mydoc unlimited Evernote

ME-Mydoc is a desktop application that runs on Windows, Linux and Mac. Compared to other personal knowledge management apps such as Evernote, it has no built-in support for online data sharing. So are we restricted to single-machine usage? Of course not! Let's turn it into an online data-sharing application!

  1. Install Google Drive/SkyDrive/Dropbox/BaiduYun Sync Drive (in fact, any online storage service that maps to a local directory will do).
  2. Install ME-Mydoc and configure it so that all data paths point to the local folder mapped by the online storage app. See the screenshot below.
    [screenshot: ME-Mydoc data path configuration]
  3. Yes! You can now use ME-Mydoc to access your data from any Windows, Mac or Linux machine, as long as the online storage client is available on that platform.
  4. Note that you must pause synchronization while ME-Mydoc is in use to avoid data corruption.
