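【Introduction】

The soft-margin support vector machine is a binary classification model for training sets that are approximately linearly separable, but it cannot handle data that are not linearly separable.

To handle that case, the kernel method is introduced on top of the soft-margin support vector machine: a nonlinear transformation maps the input space to a feature space, so that a hypersurface model in the input space corresponds to a hyperplane model in the feature space.

The classification task can then be completed by solving a soft-margin support vector machine in the feature space.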
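As a small illustration of how a nonlinear transformation can turn a curved boundary into a flat one, the sketch below generates two concentric rings and applies a hand-picked map. The map `phi`, the ring radii, and the threshold 1.5 are assumptions chosen for this toy example only; they do not come from the notes above.

```python
import numpy as np

def phi(X):
    """Hypothetical feature map for this toy example: (x1, x2) -> (x1^2, x2^2)."""
    return X ** 2

rng = np.random.default_rng(0)
angles = rng.uniform(0.0, 2.0 * np.pi, size=200)
# Inner ring has radius in [0, 1), outer ring has radius in [2, 3)
radii = np.concatenate([rng.uniform(0.0, 1.0, 100), rng.uniform(2.0, 3.0, 100)])
X = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])
y = np.where(radii < 1.5, 1, -1)

# In the input space the boundary is the circle x1^2 + x2^2 = 1.5^2 (a hypersurface).
# In the feature space it becomes the straight line z1 + z2 = 1.5^2 (a hyperplane).
Z = phi(X)
pred = np.where(Z.sum(axis=1) < 1.5 ** 2, 1, -1)
accuracy = (pred == y).mean()
print("accuracy of the linear rule in feature space:", accuracy)
```

No straight line separates the two rings in the original coordinates, yet a single linear rule does so after the mapping.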

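【Hypothesis Form】

The non-linear support vector machine maps the input space of the soft-margin support vector machine into a new feature space through a mapping function $\phi(\mathbf{x})$, replaces the input-space inner product $\mathbf{x}_i\cdot\mathbf{x}_j$ with the feature-space inner product $\phi(\mathbf{x}_i)\cdot\phi(\mathbf{x}_j)$, and learns a soft-margin support vector machine from the training samples in the new feature space.

Consider a linearly inseparable training set of size $n$, $D=\{(\mathbf{x}_1,y_1),(\mathbf{x}_2,y_2),\ldots,(\mathbf{x}_n,y_n)\}$, where the input $\mathbf{x}_i$ of the $i$-th sample has $m$ features, i.e. $\mathbf{x}_i=(x_i^{(1)},x_i^{(2)},\ldots,x_i^{(m)})\in \mathbb{R}^m$, and the output is $y_i\in\mathcal{Y}=\{+1,-1\}$.

The dual problem of the soft-margin support vector machine learning algorithm is: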

$\begin{matrix} \min\limits_{\boldsymbol{\lambda}} && \frac{1}{2} \sum\limits_{i=1}^n \sum\limits_{j=1}^n \lambda_i\lambda_j y_i y_j (\mathbf{x}_i\cdot\mathbf{x}_j) - \sum\limits_{i=1}^n \lambda_i \\ s.t. && \sum\limits_{i=1}^n\lambda_i y_i= 0 \\ && 0\leq \lambda_i \leq C,&& i=1,2,\cdots,n \end{matrix}$

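The objective function involves the input instances only through inner products between pairs of instances. Replacing the inner product $\mathbf{x}_i\cdot\mathbf{x}_j$ with a positive definite kernel function $K(\mathbf{x}_i,\mathbf{x}_j)=\phi(\mathbf{x}_i)\cdot\phi(\mathbf{x}_j)$ gives the dual objective function: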

$W(\boldsymbol{\lambda}) = \frac{1}{2} \sum\limits_{i=1}^n \sum\limits_{j=1}^n \lambda_i\lambda_j y_i y_j K(\mathbf{x}_i,\mathbf{x}_j) - \sum\limits_{i=1}^n \lambda_i$
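To make the kernelized objective concrete, here is a minimal NumPy sketch that builds a Gram matrix and evaluates $W(\boldsymbol{\lambda})$ for a given multiplier vector. The helper names `rbf_gram` and `dual_objective`, the Gaussian kernel with `gamma=1.0`, and the four XOR-style points are assumptions made for illustration.

```python
import numpy as np

def rbf_gram(X, gamma=1.0):
    """Gram matrix with K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * sq_dists)

def dual_objective(lam, y, K):
    """W(lambda) = 1/2 * sum_ij lam_i lam_j y_i y_j K_ij - sum_i lam_i."""
    v = lam * y
    return 0.5 * v @ K @ v - lam.sum()

# XOR-style toy data: not linearly separable in the input space
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])

K = rbf_gram(X)
lam = np.full(4, 0.5)          # an arbitrary feasible point: sum_i lam_i * y_i = 0
W = dual_objective(lam, y, K)
print("W(lambda) =", W)
```

Only the Gram matrix `K` enters the computation; the mapping $\phi$ never has to be evaluated.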

Likewise, the inner product in the classification decision function is replaced by the kernel function:

$f(\mathbf{x}) = \text{sign} (\sum_{i=1}^n \lambda_i^*y_iK(\mathbf{x}_i,\mathbf{x})+\theta^*)$

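In this way, when the mapping function $\phi(\mathbf{x})$ is nonlinear, the learned support vector machine with a kernel function is a nonlinear classification model, i.e. a non-linear support vector machine.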
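The kernelized decision function can be checked against a fitted scikit-learn model. The sketch below assumes a two-class `SVC` with an RBF kernel, for which `dual_coef_` holds the products $\lambda_i^* y_i$ of the support vectors and `intercept_` plays the role of $\theta^*$. The `make_moons` data and the values `gamma=0.5`, `C=1.0` are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

X, y01 = make_moons(n_samples=200, noise=0.2, random_state=0)
y = 2 * y01 - 1                      # relabel {0, 1} as {-1, +1} to match the notation above

gamma = 0.5
model = SVC(kernel="rbf", gamma=gamma, C=1.0).fit(X, y)

# Only support vectors have non-zero multipliers, so the sum runs over them alone.
K = rbf_kernel(model.support_vectors_, X, gamma=gamma)   # shape (n_support, n_samples)
scores = model.dual_coef_[0] @ K + model.intercept_[0]   # sum_i lam_i y_i K(x_i, x) + theta
manual_pred = np.sign(scores).astype(int)

print("matches decision_function:", np.allclose(scores, model.decision_function(X)))
print("matches predict:", np.array_equal(manual_pred, model.predict(X)))
```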

For more on kernel functions, see: Feature Construction and Kernel Methods

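【Kernel Trick】

For kernel methods, computing the kernel function $K(\mathbf{x},\mathbf{z})$ directly is relatively easy, whereas computing $K(\mathbf{x},\mathbf{z})$ through the mappings $\phi(\mathbf{x})$ and $\phi(\mathbf{z})$ is more involved.

The kernel trick is a computational technique that speeds up kernel methods: during learning and prediction only the kernel function $K(\mathbf{x},\mathbf{z})$ is defined explicitly, while $\phi(\mathbf{x})$ and $\phi(\mathbf{z})$ are never defined explicitly, which avoids computing $\phi(\mathbf{x})$ and $\phi(\mathbf{z})$ separately.

With the kernel trick, once a kernel function $K(\mathbf{x},\mathbf{z})$ is given, learning methods for linear classification can be applied to nonlinear classification problems. For support vector machines, this extends the soft-margin support vector machine to the non-linear support vector machine.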
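A short numerical check may help here. For the homogeneous degree-2 polynomial kernel on $\mathbb{R}^2$, $K(\mathbf{x},\mathbf{z})=(\mathbf{x}\cdot\mathbf{z})^2$, one valid explicit map is $\phi(\mathbf{x})=(x_1^2,\sqrt{2}x_1x_2,x_2^2)$. The kernel and the sample vectors below are chosen purely for illustration; the function names are hypothetical.

```python
import numpy as np

def phi(x):
    """One explicit feature map for K(x, z) = (x . z)^2 on R^2."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2.0) * x1 * x2, x2 ** 2])

def poly2_kernel(x, z):
    """Kernel evaluated directly in the input space, without building phi."""
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

explicit = np.dot(phi(x), phi(z))   # map both points, then take the inner product
implicit = poly2_kernel(x, z)       # kernel trick: a single inner product in R^2
print(explicit, implicit)
```

Both routes give the same value, but the kernel route never constructs the three-dimensional feature vectors. For kernels such as the Gaussian kernel, whose feature space is infinite-dimensional, the explicit route is not available at all.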

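In practice, the kernel function is often chosen directly based on domain knowledge, and the effectiveness of the choice must be verified experimentally.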
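One common way to run such an experiment is cross-validation. The sketch below compares the four built-in `SVC` kernels on the two petal features of the iris data. The 5-fold setting and the default hyperparameters are assumptions; a real comparison would also tune `C`, `gamma`, and `degree` for each kernel.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X = X[:, [2, 3]]                     # petal length and petal width

scores = {}
for kernel in ("linear", "poly", "rbf", "sigmoid"):
    # Scaling lives inside the pipeline so it is refit on each training fold
    clf = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    scores[kernel] = cross_val_score(clf, X, y, cv=5).mean()

best_kernel = max(scores, key=scores.get)
for kernel, score in scores.items():
    print(f"{kernel}: mean CV accuracy = {score:.3f}")
print("best kernel on this data:", best_kernel)
```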

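【Learning Algorithm】

The learning algorithm of the non-linear support vector machine is given below:

Input: a linearly inseparable training set of size $n$, $D=\{(\mathbf{x}_1,y_1),(\mathbf{x}_2,y_2),\ldots,(\mathbf{x}_n,y_n)\}$, where the input $\mathbf{x}_i$ of the $i$-th sample has $m$ features, i.e. $\mathbf{x}_i=(x_i^{(1)},x_i^{(2)},\ldots,x_i^{(m)})\in \mathbb{R}^m$, and the output is $y_i\in\mathcal{Y}=\{+1,-1\}$

Output: the classification decision function

Algorithm steps:

Step 1: Choose an appropriate kernel function $K(\mathbf{x},\mathbf{z})$ and a penalty coefficient $C>0$, then construct and solve the following constrained optimization problem (the dual of the primal problem)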

$\begin{matrix} \min\limits_{\boldsymbol{\lambda}} && \frac{1}{2} \sum\limits_{i=1}^n \sum\limits_{j=1}^n \lambda_i\lambda_j y_i y_j K(\mathbf{x}_i,\mathbf{x}_j) - \sum\limits_{i=1}^n \lambda_i \\ s.t. && \sum\limits_{i=1}^n\lambda_i y_i= 0 \\ && 0\leq \lambda_i \leq C,&& i=1,2,\cdots,n \end{matrix}$

to obtain the optimal solution $\boldsymbol{\lambda}^* = (\lambda_1^*,\lambda_2^*,\cdots,\lambda_n^*)^T$

Step 2: From the optimal solution $\boldsymbol{\lambda}^*$, choose a component satisfying $0<\lambda_j^*<C$ and compute the intercept $\theta^*$:

$\theta^* = y_j - \sum_{i=1}^n \lambda_i^* y_i K(\mathbf{x}_i,\mathbf{x}_j)$

Step 3: From the optimal solution $\boldsymbol{\lambda}^*$ and $\theta^*$, construct the classification decision function

$f(\mathbf{x}) = \text{sign} (\sum_{i=1}^n \lambda_i^*y_iK(\mathbf{x}_i,\mathbf{x})+\theta^*)$

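For the constrained optimization problem in Step 1, when $K(\mathbf{x},\mathbf{z})$ is a positive definite kernel function, the problem is a convex quadratic program and can be solved with the SMO algorithm. For details on SMO, see: Sequential Minimal Optimization (SMO) Algorithm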
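The three steps can be sketched end to end on a tiny problem. The code below is an illustration under stated assumptions, not an SMO implementation: it hands the dual problem to SciPy's general-purpose SLSQP solver, which is only practical for very small $n$. The XOR-style data, the Gaussian kernel with `gamma=1.0`, `C = 10`, and the tolerance `tol` are all choices made for this example.

```python
import numpy as np
from scipy.optimize import minimize

def rbf(A, B, gamma=1.0):
    """Kernel matrix with entries exp(-gamma * ||a_i - b_j||^2)."""
    sq = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * sq)

# XOR-style toy data: not linearly separable in the input space
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])
C = 10.0
K = rbf(X, X)

# Step 1: solve the dual problem (a generic solver stands in for SMO here)
Q = (y[:, None] * y[None, :]) * K
res = minimize(
    fun=lambda lam: 0.5 * lam @ Q @ lam - lam.sum(),
    x0=np.zeros(len(y)),
    jac=lambda lam: Q @ lam - 1.0,
    bounds=[(0.0, C)] * len(y),
    constraints=[{"type": "eq", "fun": lambda lam: lam @ y, "jac": lambda lam: y}],
    method="SLSQP",
)
lam = res.x

# Step 2: pick a component strictly inside (0, C) and compute the intercept
tol = 1e-6
j = int(np.flatnonzero((lam > tol) & (lam < C - tol))[0])
theta = y[j] - np.sum(lam * y * K[:, j])

# Step 3: build the classification decision function
def decision(X_new):
    return np.sign((lam * y) @ rbf(X, X_new) + theta)

pred = decision(X)
print("lambda* =", np.round(lam, 4))
print("theta*  =", round(float(theta), 4))
print("training predictions:", pred)
```

On larger training sets a dedicated solver such as SMO is the usual choice, because it avoids handling the full $n\times n$ quadratic program at once.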

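【sklearn Implementation】

Using the iris dataset bundled with sklearn as an example, the last two features are selected to implement a non-linear support vector machine. Note that iris has three classes, whereas the derivation above is for binary classification; `SVC` handles the multi-class case with a one-vs-one scheme.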

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix,accuracy_score,classification_report,precision_score,recall_score,f1_score
from matplotlib.colors import ListedColormap

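# Feature extraction
def deal_data():
    iris = load_iris()  # the iris dataset bundled with sklearn
    # iris has three classes: rows 1-50, rows 51-100 and rows 101-150
    X = iris.data[:, [2, 3]]  # use the last two features (petal length and petal width)
    y = iris.target  # class labels (species)
    return X, y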

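# Data standardization
def standard_scaler(X_train, X_test):
    sc = StandardScaler()  # scaler object used to transform the dataset
    scaler = sc.fit(X_train)  # fit on the training set only; stores the computed mean and variance
    X_train_std = scaler.transform(X_train)  # standardize the training set with the fitted scaler
    X_test_std = scaler.transform(X_test)  # standardize the test set with the same statistics
    return X_train_std, X_test_std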

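# Model training
def train_model(X_train_std, y_train):
    # Build the non-linear SVM model
    # Options for kernel:
    #  - linear: linear kernel; gives results similar to LinearSVC but is slower
    #  - poly: polynomial kernel; gamma is the coefficient, degree is the degree, coef0 is the constant term
    #  - rbf: Gaussian kernel; gamma corresponds to 1/(2σ²)
    #  - sigmoid: sigmoid kernel; gamma is the coefficient, coef0 is the constant term
    model = SVC(kernel='rbf', gamma=0.2, random_state=1)
    # Train
    model.fit(X_train_std, y_train)
    return model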

# Model evaluation
def estimate_model(y_pred, y_test, model):
    # Confusion matrix; 3x3 in the three-class case
    cm2 = confusion_matrix(y_test, y_pred)
    # Accuracy
    acc = accuracy_score(y_test, y_pred)
    # Number of correctly classified samples
    acc_num = accuracy_score(y_test, y_pred, normalize=False)
    # Classification report (per-class metrics plus macro and weighted averages)
    macro_class_report = classification_report(y_test, y_pred, target_names=["class 0", "class 1", "class 2"])
    # Micro-averaged precision
    micro_p = precision_score(y_test, y_pred, average='micro')
    # Micro-averaged recall
    micro_r = recall_score(y_test, y_pred, average='micro')
    # Micro-averaged F1 score
    micro_f1 = f1_score(y_test, y_pred, average='micro')

    indicators = {"cm2":cm2,"acc":acc,"acc_num":acc_num,"macro_class_report":macro_class_report,"micro_p":micro_p,"micro_r":micro_r,"micro_f1":micro_f1}
    return indicators

# Visualization
def visualization(X, y, classifier, test_id=None, resolution=0.02):
    # Create the color map
    markers = ('s', 'x', 'o', '^', 'v')
    colors = ('red', 'blue', 'lightgreen', 'gray', 'cyan')
    cmap = ListedColormap(colors[:len(np.unique(y))])

    # Plot the decision regions
    x1_min, x1_max = X[:, 0].min() - 1, X[:, 0].max() + 1  # range of the first feature (horizontal axis)
    x2_min, x2_max = X[:, 1].min() - 1, X[:, 1].max() + 1  # range of the second feature (vertical axis)
    xx1, xx2 = np.meshgrid(np.arange(x1_min, x1_max, resolution), np.arange(x2_min, x2_max, resolution))  # resolution is the grid step
    Z = classifier.predict(np.array([xx1.ravel(), xx2.ravel()]).T)  # predict every grid point; ravel flattens an array
    Z = Z.reshape(xx1.shape)  # reshape the predictions back to the grid shape
    plt.contourf(xx1, xx2, Z, alpha=0.4, cmap=cmap)  # fill the regions between contour levels, colored by predicted class
    plt.xlim(xx1.min(), xx1.max())  # limit the horizontal axis to the grid range
    plt.ylim(xx2.min(), xx2.max())  # limit the vertical axis to the grid range

    # Scatter plot of the full dataset, one color and marker per class
    for idx, cl in enumerate(np.unique(y)):
        # color= (rather than c=) avoids matplotlib's warning about a single RGBA tuple
        plt.scatter(x=X[y == cl, 0], y=X[y == cl, 1], alpha=0.8, color=cmap(idx), marker=markers[idx], label=cl)

    # Highlight the test set
    if test_id is not None:
        test_id = list(test_id)  # accept a range or any other iterable of row indices
        X_test, y_test = X[test_id, :], y[test_id]
        # c sets the color; test points are drawn in one color regardless of class
        plt.scatter(x=X_test[:, 0], y=X_test[:, 1], alpha=1.0, c='gray', marker='^', linewidths=1, s=55, label='test set')

    plt.xlabel('petal length [standardized]')
    plt.ylabel('petal width [standardized]')
    plt.legend(loc='upper left')
    plt.tight_layout()
    plt.show()

    
if __name__ == "__main__":
    # Feature extraction
    X, y = deal_data()

    # Hold-out split: 70% training (105 samples), 30% test (45 samples)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

    # Data standardization
    X_train_std, X_test_std = standard_scaler(X_train, X_test)

    # Model training
    model = train_model(X_train_std, y_train)

    # Prediction
    y_pred = model.predict(X_test_std)
    print("y test:", y_test)  # true labels of the test set
    print("y pred:", y_pred)  # predicted labels

    # Model evaluation
    indicators = estimate_model(y_pred, y_test, model)
    cm2 = indicators["cm2"]
    print("Confusion matrix:\n", cm2)
    acc = indicators["acc"]
    print("Accuracy:", acc)
    acc_num = indicators["acc_num"]
    print("Number of correctly classified samples:", acc_num)
    macro_class_report = indicators["macro_class_report"]
    print("Classification report:\n", macro_class_report)
    micro_p = indicators["micro_p"]
    print("Micro precision:", micro_p)
    micro_r = indicators["micro_r"]
    print("Micro recall:", micro_r)
    micro_f1 = indicators["micro_f1"]
    print("Micro F1 score:", micro_f1)

    # Visualization
    X_combined_std = np.vstack((X_train_std, X_test_std))
    y_combined = np.hstack((y_train, y_test))
    # classifier is the fitted model; test_id gives the rows of the test set,
    # which occupy indices 105-149 because the test set is stacked after the 105 training samples
    visualization(X_combined_std, y_combined, classifier=model, test_id=range(105, 150))