References：

深入理解提升树（Boosting tree）算法

集成学习(Ensemble Learning)——提升树（Boosting Tree）

统计学习方法—提升树模型（Boosting Tree）与梯度提升树（GBDT）

【提升树】

提升树（Boosting Tree）是 Boosting 算法族的一种

对于 Boosting 提升法的加法模型：

$f(\mathbf{x}) = \sum_{t=1}^T \alpha_t G_t(\mathbf{x})$

当基函数 $G_t(\mathbf{x})$ 为决策树，且基函数系数 $\alpha_t=1$ 时，为提升树模型，即：

$f(\mathbf{x}) = \sum_{t=1}^T G_t(\mathbf{x})$

关于 Boosting 算法，详见：Boosting 提升算法

【提升树的回归算法】

CART 回归树

对于回归问题，设给定的容量为 $n$ 的训练集 $D=\{(\mathbf{x}_1,y_1),(\mathbf{x}_2,y_2),…,(\mathbf{x}_n,y_n)\}$，第 $i$ 组样本中的输入 $\mathbf{x}_i$ 具有 $m$ 个特征值，即：$\mathbf{x}_i=(x_i^{(1)},x_i^{(2)},…,x_i^{(m)})\in \mathbb{R}^m$，输出 $y_i\in\mathcal{Y}\subset \mathbb{R}$

依照 CART 回归树，将输入空间 $\mathcal{X}$ 划分为 $K$ 个互不相交的区域 $R_1,R_2,\cdots,R_K$，并且在每个区域上确定输出的常量 $c_k$，那么决策树可表示为：

$G(\mathbf{x}) = \sum_{k=1}^K c_k \mathbb{I}(\mathbf{x}\in R_k)$

其中，$K$ 为回归树的复杂度即叶结点个数，$\{(R_1,c_1),(R_2,c_2),\cdots,(R_K,c_K)\}$ 为树的区域划分和各区域上的最优值

关于 CART 回归树，详见：决策树的 CART 生成算法

前向分步法

对于提升树算法，使用前向分步法进行求解，即：

$\begin{gather*} f_0(\mathbf{x}) = 0 \\ f_t(\mathbf{x}) = f_{t-1}(\mathbf{x}) + G_t(\mathbf{x}),\quad t=1,2,\cdots,T \\ f(\mathbf{x}) = \sum_{t=1}^T G_t(\mathbf{x}) \\ \end{gather*}$

对于前向分步法的第 $t$ 步来说，给定当前模型 $f_{t-1}(\mathbf{x})$，需求解：

$(R_t^*,c_t^*) = \arg\min_{R_t,c_t} \sum_{i=1}^n L\big(y_i,f_{t-1}(\mathbf{x}_i)+G_t(\mathbf{x}_i)\big)$

得到第 $t$ 棵树的参数，即第 $t$ 棵树的区域划分和各区域上的最优值

当采用平方损失函数 $L(y,f(\mathbf{x}))=(y-f(\mathbf{x}))^2$ 时，损失变为：

$\begin{align*} L(y,f(\mathbf{x})) &= L\big(y,f_{t-1}(\mathbf{x})+G_t(\mathbf{x})\big) \\ &= \big[y-f_{t-1}(\mathbf{x})-G_t(\mathbf{x})\big]^2\\ &= \big[ r-G_t(\mathbf{x}) \big]^2 \end{align*}$

其中，$r=y-f_{t-1}(\mathbf{x})$ 为当前模型 $G_t(\mathbf{x})$ 拟合数据的残差（Residual）

因此，对于回归问题的提升树来说，只需要简单地拟合当前模型 $G_t(\mathbf{x})$ 的残差即可

算法流程

下面给出回归问题的提升树的算法流程

输入：容量为 $n$ 的训练集 $D=\{(\mathbf{x}_1,y_1),(\mathbf{x}_2,y_2),…,(\mathbf{x}_n,y_n)\}$，第 $i$ 组样本中的输入 $\mathbf{x}_i$ 具有 $m$ 个特征值，即：$\mathbf{x}_i=(x_i^{(1)},x_i^{(2)},…,x_i^{(m)})\in \mathbb{R}^m$，输出 $y_i\in\mathcal{Y}\in \mathbb{R}$

输出：提升树 $f(\mathbf{x})$

算法流程：

Step1：初始化 $f_0(\mathbf{x})$

$f_0(\mathbf{x})=0$

Step2：对 $t=1,2,\cdots,T$

1）计算残差

$r_{ti} = y_i - f_{t-1}(\mathbf{x}_i),\quad i=1,2,\cdots,n$

2）根据 CART 算法，对残差 $r_{ti}$ 学习一个回归树，得到树的区域划分 $\{R_{t1},R_{t2},\cdots,R_{tK}\}$

3）根据树的区域划分 $R_{tk}$ ，计算各区域上的最优值 $\{c_{t1},c_{t2},\cdots,c_{tK}\}$

$c_{tk} = \arg \min_{c} \sum_{\mathbf{x}_i\in R_{tk}} L(y_i,f_{t-1}(\mathbf{x}_i)+c)$

4）得到回归决策树

$G_t(\mathbf{x}) = \sum_{k=1}^K c_k \mathbb{I}(\mathbf{x}\in R_{tk})$

5）更新回归决策树

$f_t(\mathbf{x}) = f_{t-1}(\mathbf{x})+G_t(\mathbf{x})$

Step3：根据加法模型得到最终的提升树

$f(\mathbf{x}) = \sum_{t=1}^T G_t(\mathbf{x})$

【提升树的分类算法】

对于分类问题，提升树算法可以看作是 AdaBoost 分类算法的特例，这里不再赘述

关于 AdaBoost 分类算法，详见：AdaBoost 算法