Notes originating from Coursera's Machine Learning course, week 2.

Code at https://github.com/MyEncyclopedia/AlgImpl/tree/master/02_Machine_Learning/Linear_Regression

# Hypothesis

The input is the vector `X = [1, x_1, x_2, x_3, …, x_n]`, where the leading `1` is the bias term.

The linear hypothesis is

`h_theta(x) = theta_0 + theta_1(x_1) + theta_2(x_2) + … + theta_n(x_n)`

Define the cost function over `m` training examples as

`J(theta)=1/(2m)Sigma_(i=1)^m(h_theta(x^((i)))-y^((i)))^2`
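As a quick sanity check, the cost can be evaluated directly on a toy dataset (a minimal sketch; the data values below are made up for illustration):

```python
import numpy as np

# Toy data: one feature, three examples (illustrative values).
X = np.array([[1.0, 1.0, 1.0],   # bias row x_0 = 1
              [1.0, 2.0, 3.0]])  # feature x_1
y = np.array([2.0, 3.0, 4.0])    # targets satisfy y = 1 + x_1 exactly
theta = np.array([1.0, 1.0])     # theta_0 = theta_1 = 1

m = len(y)
residuals = theta @ X - y              # h_theta(x^(i)) - y^(i) for every example
J = np.sum(residuals ** 2) / (2 * m)   # the cost J(theta)
print(J)  # 0.0 -- this theta fits the toy data perfectly
```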

The task is to find the `theta` that minimizes `J`.

Gradient descent finds such a `theta` iteratively.

The partial derivative of `J(theta)` with respect to `theta_j` is `1/m*Sigma_(i=1)^m(h_theta(x^((i)))-y^((i)))*x_(j)^((i))` (the factor of 2 produced by differentiating the square cancels the 2 in `1/(2m)`).

Therefore, the update rule (applied simultaneously for all `j`) is

`theta_j = theta_j - alpha*1/m*Sigma_(i=1)^m(h_theta(x^((i)))-y^((i)))*x_(j)^((i))`
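The update rule above can be sketched in a few lines of NumPy (the toy data values are made up for illustration; `alpha` and the iteration count are arbitrary choices):

```python
import numpy as np

# Toy data: one feature, three examples; y = 1 + x_1 exactly.
X = np.array([[1.0, 1.0, 1.0],   # bias row x_0 = 1
              [1.0, 2.0, 3.0]])  # feature x_1
y = np.array([2.0, 3.0, 4.0])
theta = np.zeros(2)
alpha = 0.1                      # learning rate (arbitrary for this sketch)
m = len(y)

for _ in range(1000):
    residuals = theta @ X - y    # h_theta(x^(i)) - y^(i), shape (m,)
    grad = X @ residuals / m     # (1/m) * Sigma residual_i * x_j^(i), all j at once
    theta = theta - alpha * grad # simultaneous update of every theta_j

print(theta)  # converges toward [1, 1], the exact fit y = 1 + x_1
```

Computing `X @ residuals` updates every `theta_j` in one vectorized step, which is exactly the "simultaneous update" the rule requires.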

# Feature Scaling

Get every feature into approximately the [-1, 1] range:

`x_i = (x_i - mu_i)/sigma_i` where `mu_i` is the mean of the `i`-th feature and `sigma_i` is its standard deviation.
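A minimal sketch of this scaling for a single feature (the raw values are made up to show the effect of a large feature range):

```python
import numpy as np

# One feature across five examples (illustrative values with a large range).
x = np.array([100.0, 200.0, 300.0, 400.0, 500.0])

mu = x.mean()                # mu_i: mean of the feature
sigma = x.std()              # sigma_i: standard deviation of the feature
x_scaled = (x - mu) / sigma  # scaled feature

print(x_scaled.mean())  # ~0 after scaling
print(x_scaled.std())   # 1.0 after scaling
```

After scaling, every feature has zero mean and unit standard deviation, so gradient descent takes comparably sized steps along each `theta_j`.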

# Python Code

```python
import numpy as np


class GradientDescent:
    """Batch gradient descent for linear regression."""

    def __init__(self, alpha=0.01, step=100):
        self.alpha = alpha  # learning rate
        self.step = step    # number of iterations

    def distance(self, v1, v2):
        # squared Euclidean distance; usable as a convergence check
        v = v1 - v2
        return np.sum(v * v)

    def h(self, X, theta):
        # hypothesis h_theta(x) for all examples at once; X has shape (N+1) x m
        return np.dot(theta, X)

    def cost(self, V):
        # V holds the residuals h_theta(x^(i)) - y^(i)
        m = len(V)
        return np.sum(V * V) / (2.0 * m)

    def featureNormalize(self, X):
        """Get every feature into approximately the [-1, 1] range."""
        mu = np.mean(X, axis=1, dtype=np.float64)    # per-feature mean
        sigma = np.std(X, axis=1, dtype=np.float64)  # per-feature std deviation
        X_norm = np.empty_like(X, dtype=np.float64)  # copy: do not mutate the input
        for i in range(len(X)):
            X_norm[i] = (X[i] - mu[i]) / sigma[i]
        return X_norm, mu, sigma

    def iterate(self, data, toNormalize=False):
        m = len(data)                  # number of training examples
        data = np.transpose(data)
        featureN = len(data) - 1       # last column of the input is the target
        Y = data[featureN]             # shape (m,)
        X = data[0:featureN]           # shape N x m
        if toNormalize:
            X, mu, sigma = self.featureNormalize(X)
        X = np.vstack((np.ones(m), X))  # prepend bias row x_0 = 1; shape (N+1) x m
        theta = np.zeros(featureN + 1, dtype=np.float64)
        for _ in range(self.step):
            V = self.h(X, theta) - Y                       # residuals
            theta = theta - self.alpha / m * np.dot(X, V)  # simultaneous update
            # dist = self.distance(theta_prev, theta)      # optional convergence check
        return theta
```

Running the code above on the test data from the course homework yields:

ex1data1.txt: `theta=[-3.63029144, 1.16636235]`

ex1data2.txt: `theta=[ 338658.2492493, 103322.82942954, -474.74249522]`