In linear regression, we make the following model assumption:
y = b + w1*x1 + w2*x2 + ... + wn*xn + e
where y is the target variable, x1, ..., xn are the input features, w1, ..., wn are the feature weights, b is the intercept (bias) term, and e is the error term.
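As a minimal sketch of the prediction part of this model (the weights, intercept, and sample below are made-up example values; the error term e is unobserved):

```python
import numpy as np

# Hypothetical example values: 3 features with arbitrary weights.
w = np.array([0.5, -1.0, 2.0])   # w1 ... wn
b = 1.5                          # intercept term
x = np.array([2.0, 1.0, 0.5])    # one sample: x1 ... xn

# y_pred = b + w1*x1 + w2*x2 + ... + wn*xn
y_pred = b + np.dot(w, x)
print(y_pred)  # 1.5 + 1.0 - 1.0 + 1.0 = 2.5
```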
The goal of linear regression is to find the parameter weights and intercept that make the residuals (errors) between the predicted and true values as small as possible. The objective function is usually defined by the method of least squares, i.e. minimizing the mean of the squared residuals (the squared loss):
Loss = Σ(y_pred - y_actual)^2 / n
where y_pred is the model's prediction, y_actual is the true value, and n is the number of samples.
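As a quick sketch of this loss, using hypothetical predicted and true values for n = 4 samples:

```python
import numpy as np

# Hypothetical predictions and ground-truth values (n = 4).
y_pred = np.array([2.5, 0.0, 2.0, 8.0])
y_actual = np.array([3.0, -0.5, 2.0, 7.0])

# Loss = Σ(y_pred - y_actual)^2 / n
loss = np.mean((y_pred - y_actual) ** 2)
print(loss)  # (0.25 + 0.25 + 0.0 + 1.0) / 4 = 0.375
```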
Parameter estimation: the optimal weights and intercept can be obtained either in closed form (the normal equation of ordinary least squares) or iteratively with gradient descent, as in the code below.
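For the closed-form route, a minimal sketch using NumPy's least-squares solver on a hypothetical toy dataset (generated here from y = 2x + 1):

```python
import numpy as np

# Hypothetical toy data satisfying y = 2*x + 1 exactly.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])

# Design matrix with a column of ones for the intercept b.
X = np.column_stack([np.ones_like(x), x])

# Solve the least-squares problem X @ [b, w] ≈ y.
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
b, w = theta
print(b, w)  # recovers b ≈ 1.0, w ≈ 2.0
```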
Training process: starting from initial parameter values, compute the gradient of the loss with respect to each parameter and update the parameters a small step in the direction that reduces the loss, repeating for a fixed number of iterations.
Prediction: once trained, a new sample is predicted with y = b + w1*x1 + ... + wn*xn using the learned parameters.
Advantages and limitations:
Linear regression has the following advantages:
- It is simple and fast to train, even on large datasets.
- It is highly interpretable: each weight directly shows a feature's contribution to the prediction.
- It works well when the relationship between features and target is approximately linear.
However, linear regression also has some limitations:
- It assumes a linear relationship and cannot capture nonlinear patterns without feature engineering.
- It is sensitive to outliers, since the squared loss amplifies large residuals.
- Strongly correlated features (multicollinearity) make the estimated weights unstable.
Summary: linear regression fits a linear function of the features by minimizing the squared loss. The two implementations below both estimate the parameters with gradient descent, first in plain NumPy and then with TensorFlow.
import numpy as np

def compute_error_for_line_given_points(b, w, points):
    # Mean squared error of the line y = w*x + b over all points.
    totalError = 0
    for i in range(0, len(points)):
        x = points[i, 0]
        y = points[i, 1]
        totalError += (y - (w * x + b)) ** 2
    return totalError / float(len(points))

def step_gradient(b_current, w_current, points, learningRate):
    # One gradient-descent update of b and w on the squared loss.
    b_gradient = 0
    w_gradient = 0
    N = float(len(points))
    for i in range(0, len(points)):
        x = points[i, 0]
        y = points[i, 1]
        b_gradient += (2 / N) * ((w_current * x + b_current) - y)
        w_gradient += (2 / N) * x * ((w_current * x + b_current) - y)
    new_b = b_current - (learningRate * b_gradient)
    new_w = w_current - (learningRate * w_gradient)
    return [new_b, new_w]

def gradient_descent_runner(points, starting_b, starting_w, learning_rate, num_iterations):
    b = starting_b
    w = starting_w
    for i in range(num_iterations):
        b, w = step_gradient(b, w, points, learning_rate)
    return [b, w]

def run():
    # points = np.genfromtxt(r"D:\pycharm pjt\1.csv", delimiter=",")
    points = [[0, 0], [1, 1], [2, 2], [3, 3]]
    learning_rate = 0.0001
    initial_b = 0
    initial_w = 0
    num_iterations = 1000
    print("Starting gradient descent at b={0}, w={1}, error={2}".format(
        initial_b, initial_w,
        compute_error_for_line_given_points(initial_b, initial_w, np.array(points))))
    print("Running")
    [b, w] = gradient_descent_runner(np.array(points), initial_b, initial_w,
                                     learning_rate, num_iterations)
    print("After {0} iterations b={1}, w={2}, error={3}".format(
        num_iterations, b, w,
        compute_error_for_line_given_points(b, w, np.array(points))))

if __name__ == "__main__":
    run()
Starting gradient descent at b=0, w=0, error=3.5
Running
After 1000 iterations b=0.19624549351438988, w=0.47674666586859404, error=0.688733148234418
Linear regression with TensorFlow:
import tensorflow as tf
import numpy as np

# Define the input features and target values
x_train = np.array([1, 2, 3, 4, 5], dtype=np.float32)
y_train = np.array([2, 4, 6, 8, 10], dtype=np.float32)

# Initialize the weight and bias
w = tf.Variable(0.0)
b = tf.Variable(0.0)

# Define the linear regression model
def linear_regression(x):
    return w * x + b

# Define the loss function (mean squared error)
def loss_fn(y_true, y_pred):
    return tf.reduce_mean(tf.square(y_true - y_pred))

# Set up the optimizer
optimizer = tf.optimizers.SGD(learning_rate=0.01)

# Define one training step
def train_step(features, labels):
    with tf.GradientTape() as tape:
        predictions = linear_regression(features)
        loss_value = loss_fn(labels, predictions)
    gradients = tape.gradient(loss_value, [w, b])
    optimizer.apply_gradients(zip(gradients, [w, b]))

# Train the model
for epoch in range(1000):
    train_step(x_train, y_train)

# Print the learned parameters
print("Weight:", w.numpy())
print("Bias:", b.numpy())

# Make predictions on new inputs
x_test = np.array([6, 7, 8], dtype=np.float32)
y_pred = linear_regression(x_test)
print("Predictions:", y_pred.numpy())

