References：

机器学习——特征工程之数据降维

【数据降维-第4篇】多维尺度变换（MDS）快速理解，及MATLAB实现

机器学习-降维

六种常见数据降维方法简介及代码实现

【基本思想】

约束条件

对于给定容量为 $n$ 的样本集 $X=\{\mathbf{x}_1,\mathbf{x}_2,\cdots,\mathbf{x}_n\}$，第 $i$ 个样本的输入 $\mathbf{x}_i$ 具有 $m$ 个特征值，即：$\mathbf{x}_i = (x_i^{(1)},x_i^{(2)},\cdots,x_i^{(m)})\in \mathbb{R}^{m}$，这 $n$ 个样本在原始空间的欧氏距离矩阵为 $D\in \mathbb{R}^{n\times n}$，其第 $i$ 行第 $j$ 列的元素 $\text{dist}_{ij}$ 为样本 $\mathbf{x}_{i}$ 到 $\mathbf{x}_j$ 的欧氏距离

低维嵌入的目标是获得样本在 $m’$ 维空间的表示 $Z=\{\mathbf{z}_1,\mathbf{z}_2,\cdots,\mathbf{z}_n\}\in\mathbb{R}^{m’\times n},m’\leq m$，且任意两个样本在 $m’$ 维空间中的欧氏距离等于原始空间中的欧氏距离，即约束条件：

$\Vert \mathbf{z}_i-\mathbf{z}_j \Vert = \text{dist}_{ij}$

令降维后样本的内积矩阵为 $B$，即：

$B=Z^TZ\in \mathbb{R}^{n\times n},\quad b_{ij}=\mathbf{z}_i^T\mathbf{z}_j$

对约束条件两边取平方，有：

$\begin{align*} \text{dist}_{ij}^2 &= \Vert \mathbf{z}_i \Vert^2 + \Vert \mathbf{z}_j \Vert^2 - 2\mathbf{z}_i^T\mathbf{z}_j \\ &= b_{ii}+b_{jj}-2b_{ij} \end{align*}$

内积表示

为便于讨论，令降维后的样本 $Z$ 中心化，即：

$\sum_{i=1}^n \mathbf{z}_i = \mathbf{0}$

显然有矩阵 $B$ 的行列和均为 $0$，即：

$\begin{align*} \sum_{i=1}^n b_{ij} &= b_{1j} + b_{2j} + \cdots +b_{nj} \\ &= \mathbf{z}_1^T\mathbf{z}_j + \mathbf{z}_2^T\mathbf{z}_j + \cdots + \mathbf{z}_n^T\mathbf{z}_j \\ &= (\mathbf{z}_1^T + \mathbf{z}_2^T + \cdots + \mathbf{z}_n^T) \mathbf{z}_j \\ &= \Big( \sum_{i=0}^n \mathbf{z}_i^T \Big) \mathbf{z}_j \\ &= 0 \end{align*}$

由此，有：

$\begin{align*} \sum_{i=1}^n \text{dist}_{ij}^2 &= \sum_{i=1}^n (b_{ii}+b_{jj}-2b_{ij}) \\ &= b_{11}+b_{jj}-2b_{1j} + b_{22}+b_{jj}-2b_{2j} + \cdots +b_{nn}+b_{jj}-2b_{nj} \\ &= (b_{11}+b_{22}+\cdots+b_{nn}) + nb_{jj} + \sum_{i=1}^n b_{ij} \\ &= \sum_{i=1}^n \Vert\mathbf{z}_i\Vert^2 + nb_{jj} + \sum_{i=1}^n b_{ij} \\ &= \text{tr}(B) + nb_{jj} \end{align*}$

同理，有：

$\begin{gather*} \sum_{j=0}^n b_{ij}=0 \\ \sum_{j=1}^n \text{dist}_{ij}^2 = \text{tr}(B) + nb_{ii} \end{gather*}$

进一步，对于 $\sum\limits_{i=1}^n\sum\limits_{j=1}^n \text{dist}_{ij}^2$，有：

$\begin{align*} \sum_{i=1}^n\sum_{j=1}^n \text{dist}_{ij}^2 &= \sum_{i=1}^n \Big(\sum_{j=1}^n \text{dist}_{ij}^2 \Big) \\ &= \sum_{i=1}^n \big[ \text{tr}(B)+nb_{ii} \big] \\ &= n\cdot\text{tr}(B) + n(b_{11}+b_{22}+\cdots+b_{nn}) \\ &= n\cdot\text{tr}(B) + n\cdot\text{tr}(B) \\ &= 2n\cdot\text{tr}(B) \end{align*}$

令：

$\begin{gather*} \text{dist}_{i\cdot}^2 = \frac{1}{n} \sum_{j=1}^n \text{dist}_{ij}^2 \\ \text{dist}_{\cdot j}^2 = \frac{1}{n} \sum_{i=1}^n \text{dist}_{ij}^2 \\ \text{dist}_{\cdot\cdot}^2 = \frac{1}{n^2} \sum_{i=1}^n \sum_{j=1}^n \text{dist}_{ij}^2 \end{gather*}$

那么有：

$\begin{align*} & \text{dist}_{ij}^2 - \text{dist}_{i\cdot}^2 - \text{dist}_{\cdot j}^2 + \text{dist}_{\cdot\cdot}^2 \\ =& (b_{ii}+b_{jj}-2b_{ij} ) - \frac{1}{n}[\text{tr}(B) + nb_{ii}] -\frac{1}{n}[ \text{tr}(B) + nb_{jj}] + \frac{1}{n^2} [2n\cdot\text{tr}(B)] \\ =& b_{ii}+b_{jj} - \frac{2}{n} \text{tr}(B) - b_{ii} - b_{jj} -2b_{ij} +\frac{2} {n}\text{tr}(B) \\ =& -2b_{ij} \end{align*}$

由此可通过降维前后保持不变的距离矩阵 $D$ 中的元素来求取内积矩阵 $B$ 的元素，即：

$b_{ij} = -\frac{1}{2}(\text{dist}_{ij}^2 - \text{dist}_{i\cdot}^2 - \text{dist}_{\cdot j}^2 + \text{dist}_{\cdot\cdot}^2)$

至此为止，即获得样本在 $m’$ 维空间中的内积表示

特征值分解

现在，有了样本在 $m’$ 维空间中的内积表示 $B$，对其采用特征值分解（Eigenvalue Decomposition）的方式来获取样本在 $m’$ 维空间的表示 $Z$

令对角矩阵 $\Lambda=\text{diag}(\lambda_1,\lambda_2,\cdots,\lambda_m),\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_m$ 表示内积矩阵 $B$ 的特征值构构成的特征矩阵，$V$ 为特征值对应的特征向量组成的特征矩阵，则内积矩阵 $B$ 可表示为：

$B=V\Lambda V^T$

假定 $m$ 个特征值中有 $m^*$ 个非零特征值，它们构成对角矩阵 $\Lambda_* = \text{diag}(\lambda_1 , \lambda_2 , \cdots , \lambda_{m^*})$，用 $V_{*}$ 表示对应的特征向量矩阵，那么根据 $B=Z^TZ$，可得样本在 $m’$ 维空间的表示：

$Z = \Lambda_{*}^{\frac{1}{2}} V_{*}^{T} \in \mathbb{R}^{m^*\times n}$

在实际应用中，为了有效降维，往往仅需要降维后的距离与原始空间中的距离尽可能的接近，不必严格相等，此时通常取 $m’ \ll m$ 个最大特征值来构成对角矩阵 $\tilde{\Lambda} = \text{diag}(\lambda_1 , \lambda_2 , \cdots , \lambda_{m’})$，用 $\tilde{V}$ 表示对应的特征向量矩阵，则 $Z$ 可表示为：

$Z = \tilde{\Lambda}^{\frac{1}{2}} \tilde{V}^T \in \mathbb{R}^{m'\times n}$

【算法流程】

输入：$n$ 个样本的集合 $X=\{\mathbf{x}_1,\mathbf{x}_2,\cdots,\mathbf{x}_n\}\in \mathbb{R}^{m\times n}$，低维空间维数 $m’$

输出：样本在 $m’$ 维空间的表示 $Z = \tilde{\Lambda}^{\frac{1}{2}} \tilde{V}^T \in \mathbb{R}^{m’\times n}$，每行是一个样本在 $m’$ 维空间的坐标

算法流程：

Step1：构建距离矩阵 $D$。计算原始空间中各样本间的欧氏距离，以获取距离矩阵 $D\in \mathbb{R}^{n\times n}$，其第 $i$ 行第 $j$ 列的元素 $\text{dist}_{ij}$ 为样本 $\mathbf{x}_{i}$ 到 $\mathbf{x}_j$ 的欧氏距离

Step2：根据距离矩阵 $D$，计算 $\text{dist}_{i\cdot}^2,\text{dist}_{\cdot j}^2,\text{dist}_{\cdot\cdot}^2$

Step3：计算内积矩阵 $B$

$b_{ij} = -\frac{1}{2}(\text{dist}_{ij}^2 - \text{dist}_{i\cdot}^2 - \text{dist}_{\cdot j}^2 + \text{dist}_{\cdot\cdot}^2)$

Step4：对内积矩阵进行特征值分解

Step5：计算样本在 $m’$ 维空间的表示 $Z$。取 $\tilde{\Lambda}$ 为 $m’$ 个最大特征值所构成的对角矩阵，$\tilde{V}$ 为相应的特征向量矩阵

$Z = \tilde{\Lambda}^{\frac{1}{2}} \tilde{V}^T \in \mathbb{R}^{m'\times n}$

【sklearn 实现】

以 sklearn 中的鸢尾花数据集为例，将降维到二维空间来可视化

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.manifold import MDS

# 特征提取
def deal_data():
    iris = load_iris()  # sklearn的鸢尾花数据集
    # iris分为三类，前50行一类，51-100行一类，101-150行一类
    X = iris.data
    y = iris.target
    return X, y

# 数据标准化
def standard_scaler(X):
    sc = StandardScaler() # 初始化一个sc对象去对数据集作变换
    scaler = sc.fit(X) # 归一化，存有计算出的均值和方差
    X_std = scaler.transform(X) # 利用 scaler 进行标准化
    return X_std

# 模型训练
def train_model(features):
    # 建立MDS模型
    model = MDS(n_components=2, verbose=1, random_state=42)
    # 降维
    result = model.fit_transform(features)
    return result

# 可视化
def visualization(X, y):
    plt.scatter(X[:,0], X[:,1], c=y, s=10)
        
    plt.xlabel('x1')
    plt.ylabel('x2')
    plt.legend(loc='upper left')
    plt.tight_layout()
    plt.show()


if __name__ == "__main__":
    # 数据处理
    X, y = deal_data()

    # 数据标准化
    X_std = standard_scaler(X)

    # 获取降维结果
    result = train_model(X_std)

    # 可视化
    visualization(result, y)

Alex_McAvoy

多尺度变换 MDS