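3.1 Gradient Vector
The gradient is a central concept in optimization; almost every optimization method involves it. This section first introduces the gradient intuitively. As shown in Figure 3.1, place a small ball at point A on a slope. The instant it is released, the ball rolls down along the steepest direction of the slope, and this instantaneous rolling direction is the direction of gradient descent. Mathematically, the opposite of this direction is the gradient direction, also called the direction of gradient ascent.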
![](https://epubservercos.yuewen.com/745BB7/19549640201517806/epubprivate/OEBPS/Images/Figure-P102_3712405.jpg?sign=1739158257-ROFAzjhWbM48mSeYGyCKjbjFOlWm1Nan-0-16e1963753ee92ac97effbd3270fcdae)
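Figure 3.1 Principle of the gradient direction
Chapter 6 of Book 1 of this series briefly discussed Figure 3.2 when covering the directional derivative. On the surface, infinitely many tangent lines pass through point P. lx1 and lx2 are the directions of differentiation along x at P, and ly1 and ly2 are those along y. lh2 is the direction of fastest descent, and lh1 is the direction of fastest ascent, that is, the gradient direction discussed in this section. lc1 and lc2 are tangent to the contour line; a small move along either of them does not change the height on the surface. Building on this, this section examines the gradient in more depth, along with some simple applications. Chapter 12 of Book 3 introduced the inverted-triangle differential operator ∇ (Nabla symbol), also called the Nabla operator. From this section on, ∇ denotes the gradient operation: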
![](https://epubservercos.yuewen.com/745BB7/19549640201517806/epubprivate/OEBPS/Images/Figure-P102_3712407.jpg?sign=1739158257-XD2htyjkYPpOJMRUl7qOXLerIIjbFogV-0-b1f4951a1c7e7af0bca1f3188f93ed94)
![](https://epubservercos.yuewen.com/745BB7/19549640201517806/epubprivate/OEBPS/Images/Figure-P102_3712408.jpg?sign=1739158257-PseR6AZoGGKNycAg6DburWLtaDVhDZKb-0-c33222a1ba7d71dafff817b43a21204e)
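Figure 3.2 Contours of the surface projected onto the x-y plane and several characteristic tangent lines at point P (image from Chapter 6 of Book 1 of the series)
In this section, [x1, x2] stands for [x, y], and the bivariate function f(x1, x2) is written as f(x).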
![](https://epubservercos.yuewen.com/745BB7/19549640201517806/epubprivate/OEBPS/Images/Figure-P102_3712410.jpg?sign=1739158257-9FA2vmEoows6NSfh4T6kqFWujO9hJJ7A-0-bd1d9f92bbb2064ca8bc321cd88b4303)
In the x1-x2 plane, at point P(xP1, xP2), a small move (Δx1, Δx2) away from P generally changes the value of f(x), which corresponds to a change in the contour value.
![](https://epubservercos.yuewen.com/745BB7/19549640201517806/epubprivate/OEBPS/Images/Figure-P103_3712413.jpg?sign=1739158257-2ya4rq9JnvtkcU7PcIDwlHaLzMzmtddf-0-0e770dc1941b6a9a7f85dcad2da49d28)
For example, the current point is at P and, after a small move, arrives at Q, as shown in Figure 3.3.
![](https://epubservercos.yuewen.com/745BB7/19549640201517806/epubprivate/OEBPS/Images/Figure-P103_3712414.jpg?sign=1739158257-E8krzW4pArsKkyTQJMXJZMgKbjxEP91j-0-e339dd7c872f28fdb19934d0f754ea6e)
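Figure 3.3 Change in position on the surface when moving from point P to point Q
Δf is approximated with first-order partial derivatives: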
![](https://epubservercos.yuewen.com/745BB7/19549640201517806/epubprivate/OEBPS/Images/Figure-P103_3712415.jpg?sign=1739158257-j2qQxvyCf0TqE4z5BNDxdjYjRKwJes0b-0-a31ab5fb72788d458992cfdedf606366)
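The expression above is the first-order Taylor expansion of a multivariate function covered earlier in this series. The partial derivatives of the surface at point P(xP1, xP2) along x1 and x2 are the slopes of the tangent lines at that point in the x1-z and x2-z planes, respectively, as shown in Figure 3.4.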
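As a rough numerical illustration of the first-order estimate, the sketch below compares the exact change in f with the estimate built from the two partial derivatives. It borrows the quadratic function used in this section's code listing, f = -x1^2 - 2*x2^2 - x1*x2; the point P = (1, 1) and the step (0.01, -0.02) are arbitrary assumptions chosen only for illustration.

```matlab
% First-order estimate of the change in f for a small move away from P.
% Assumptions: P and the step dx are arbitrary illustrative values.
f      = @(x1, x2) -x1.^2 - 2*x2.^2 - x1.*x2;
df_dx1 = @(x1, x2) -2*x1 - x2;      % partial derivative with respect to x1
df_dx2 = @(x1, x2) -4*x2 - x1;      % partial derivative with respect to x2

xP = [1, 1];
dx = [0.01, -0.02];

% Exact change: evaluate f at the moved point and at P
delta_f_exact  = f(xP(1) + dx(1), xP(2) + dx(2)) - f(xP(1), xP(2));
% First-order estimate: partial derivatives at P times the step components
delta_f_approx = df_dx1(xP(1), xP(2))*dx(1) + df_dx2(xP(1), xP(2))*dx(2);

fprintf('exact  change: %.6f\n', delta_f_exact);
fprintf('approx change: %.6f\n', delta_f_approx);
```

The two values differ only by the second-order terms that the first-order expansion drops.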
![](https://epubservercos.yuewen.com/745BB7/19549640201517806/epubprivate/OEBPS/Images/Figure-P103_3712416.jpg?sign=1739158257-m6ylF65I7bkFMV0iNEBHjTgkE2RUDBPc-0-8a3fb70cccd028efad593810a9593676)
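Figure 3.4 Relationship between partial derivatives and tangent slopes
Geometrically, the tangent plane of the surface at P replaces the surface itself when estimating the function value. Figure 3.5 illustrates this process. For the same projected displacement (Δx1, Δx2), moving along the surface from P leads to Q, whereas moving along the tangent plane at P leads to R. The height of R approximates the height of Q, and the height difference between R and Q is the estimation error. Figure 3.6 is a close-up of Figure 3.5 that shows the estimation process more clearly.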
![](https://epubservercos.yuewen.com/745BB7/19549640201517806/epubprivate/OEBPS/Images/Figure-P104_3712418.jpg?sign=1739158257-OgCa8kGn0L6spDGjmgx72idEZlpa1VZe-0-e205a81a988840cfb73b4c2b9581c3a2)
Figure 3.5 Change in position when moving linearly from point P to point R
![](https://epubservercos.yuewen.com/745BB7/19549640201517806/epubprivate/OEBPS/Images/Figure-P104_3712419.jpg?sign=1739158257-S7OrYhBHI8t0n8N8HJ3H6bsURWTilBZE-0-f37298e7701f3637e5eb7b697f71ffaa)
Figure 3.6 First-order Taylor estimate for a bivariate function
This estimate amounts to the inner product of two vectors, which are:
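The height gap between Q (on the surface) and R (on the tangent plane) can be tabulated for shrinking steps. This is a minimal sketch, assuming the quadratic function from this section's code listing; the point P = (1, 1), the direction d, and the step sizes are hypothetical values picked for illustration.

```matlab
% Gap between the surface (point Q) and the tangent plane at P (point R)
% for progressively smaller steps along a fixed direction.
% Assumptions: P, the direction d, and the step sizes are illustrative.
f      = @(x) -x(1)^2 - 2*x(2)^2 - x(1)*x(2);
grad_f = @(x) [-2*x(1) - x(2); -4*x(2) - x(1)];   % column vector

xP    = [1; 1];
d     = [1; -2];                 % direction of the move in the x1-x2 plane
steps = [1e-1, 1e-2, 1e-3];
err   = zeros(size(steps));

for k = 1:numel(steps)
    dx = steps(k) * d;
    height_Q = f(xP + dx);                    % height on the surface
    height_R = f(xP) + grad_f(xP).' * dx;     % height on the tangent plane
    err(k) = abs(height_Q - height_R);
    fprintf('step = %.0e, |Q - R| = %.3e\n', steps(k), err(k));
end
```

For this quadratic function the gap scales with the square of the step size, so a step ten times smaller gives an error one hundred times smaller.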
![](https://epubservercos.yuewen.com/745BB7/19549640201517806/epubprivate/OEBPS/Images/Figure-P104_3712420.jpg?sign=1739158257-UBzhSVjuLAvaFjxzWXpJqPGfrAIBSKLi-0-611e0b14bf9aaedfa1a24ede8c900504)
The vector (Δx1, Δx2) determines the direction of differentiation at P, as shown in Figure 3.7.
![](https://epubservercos.yuewen.com/745BB7/19549640201517806/epubprivate/OEBPS/Images/Figure-P105_3712422.jpg?sign=1739158257-7AlcS5J9nYZxTjEBu4ElhXXoyJSq2Pmm-0-c53468b4f54fbd61de986fc792ae10e2)
Figure 3.7 Directional derivative in the x1-x2 plane
Unless stated otherwise, the gradient of f(x1, x2) is written as a column vector:
![](https://epubservercos.yuewen.com/745BB7/19549640201517806/epubprivate/OEBPS/Images/Figure-P105_3712423.jpg?sign=1739158257-28F3h46WZnH5cYp12yRfOOpJ1tLG9y09-0-568644ad6c2add1653b6ca02248b27bc)
The gradient can also be written as a row vector:
![](https://epubservercos.yuewen.com/745BB7/19549640201517806/epubprivate/OEBPS/Images/Figure-P105_3712424.jpg?sign=1739158257-UQyaBeNX8gbELWkLmKB3Ah9uCw5LU3E3-0-39dddfed1a704a68d09791e87b4d6738)
The gradient of f(x1, x2) at a point P is:
![](https://epubservercos.yuewen.com/745BB7/19549640201517806/epubprivate/OEBPS/Images/Figure-P105_3712425.jpg?sign=1739158257-HqMAH4lkC0MLVgMrP3nhQHe8N2HmfETi-0-985ee3f92c5b12b9f27156f9992b01eb)
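To make the column-vector convention concrete, the following sketch computes a symbolic gradient and evaluates it at one point. It relies on the Symbolic Math Toolbox functions `syms`, `gradient`, and `subs`, which this section's own listing also uses; the function f is taken from that listing, while the point P = (1, 1) is an assumption for illustration.

```matlab
% Symbolic gradient of f and its value at a point P.
% Requires the Symbolic Math Toolbox; P = (1, 1) is an illustrative assumption.
syms x1 x2
f = -x1^2 - 2*x2^2 - x1*x2;

g = gradient(f, [x1, x2]);                    % 2-by-1 symbolic column vector
g_at_P = double(subs(g, [x1, x2], [1, 1]));   % numeric gradient at P

disp(g)
disp(g_at_P)
```

Transposing the result with `g_at_P.'` gives the row-vector form.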
The same idea can be explained in another way.
In the x1-x2 plane, a given direction is represented by a vector v:
![](https://epubservercos.yuewen.com/745BB7/19549640201517806/epubprivate/OEBPS/Images/Figure-P105_3712427.jpg?sign=1739158257-nzZsSV1lkyuawRysDfcMpMkV0PYhhVqq-0-8f2ee8e3f5945401b26231fe8450e30c)
The directional derivative of f(x) along v is:
![](https://epubservercos.yuewen.com/745BB7/19549640201517806/epubprivate/OEBPS/Images/Figure-P105_3712428.jpg?sign=1739158257-JfbscyBufgwR4L7U5htuAcvTrgBk02Zs-0-65d47837a23873ff83a74ba30533f669)
If v is a unit vector, that is:
![](https://epubservercos.yuewen.com/745BB7/19549640201517806/epubprivate/OEBPS/Images/Figure-P106_3712430.jpg?sign=1739158257-MQ9kBobPBvksnyVK18QSbXaBhGDhpqaB-0-8c359307ab61cdf2cc128066708091eb)
and the unit vector v is written as:
![](https://epubservercos.yuewen.com/745BB7/19549640201517806/epubprivate/OEBPS/Images/Figure-P106_3712432.jpg?sign=1739158257-p1anqKoV754Ffq8lOUTwWbd1JzMeRxxL-0-32e20cbfa6b74484ce7988b51098b190)
Figure 3.7 defines the angles θ1 and θ2. The directional derivative and the partial derivatives are related by:
![](https://epubservercos.yuewen.com/745BB7/19549640201517806/epubprivate/OEBPS/Images/Figure-P106_3712433.jpg?sign=1739158257-ont4jhkpJwh7leQFff4Ge6daBBXEF4Ac-0-b94e05ee87ad8098cb8050d0fb205830)
A similar conclusion holds for a function of three variables f(x1, x2, x3):
![](https://epubservercos.yuewen.com/745BB7/19549640201517806/epubprivate/OEBPS/Images/Figure-P106_3712434.jpg?sign=1739158257-VbdcrizZ5spnWsGYQFTmkb6pHomomtpH-0-e6926a4aa6c904372641efd53e0be648)
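The two-variable relationship between the directional derivative and the partial derivatives can be checked numerically. The sketch below assumes the standard direction-cosine form, in which the unit vector is v = (cos θ1, cos θ2) and the directional derivative is the sum of each partial derivative times the matching cosine; it also assumes θ1 is measured from the x1 axis and θ2 from the x2 axis. The function is the one from this section's listing, and P and θ1 are illustrative values.

```matlab
% Directional derivative from direction cosines versus a finite difference.
% Assumptions: theta1 is the angle between v and the x1 axis, theta2 the angle
% between v and the x2 axis; P and theta1 are illustrative values.
f      = @(x) -x(1)^2 - 2*x(2)^2 - x(1)*x(2);
grad_f = @(x) [-2*x(1) - x(2); -4*x(2) - x(1)];

xP     = [1; 1];
theta1 = deg2rad(30);
theta2 = pi/2 - theta1;
v      = [cos(theta1); cos(theta2)];      % unit direction vector

g = grad_f(xP);
D_cosines = g(1)*cos(theta1) + g(2)*cos(theta2);

h = 1e-6;                                 % small step for the central difference
D_numeric = (f(xP + h*v) - f(xP - h*v)) / (2*h);

fprintf('direction cosines: %.6f\n', D_cosines);
fprintf('finite difference: %.6f\n', D_numeric);
```

The two printed values should agree to within the accuracy of the finite difference.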
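Similar conclusions hold for functions of more variables. From the definitions of the gradient and the vector v, the derivative of f(x) in the direction of v can be written as: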
![](https://epubservercos.yuewen.com/745BB7/19549640201517806/epubprivate/OEBPS/Images/Figure-P106_3712436.jpg?sign=1739158257-a4cBpBKZwT7SdLxmCLbNVtf6T4jEWB2B-0-9607c5f2075d0c0f9eb620cd90eb604c)
By the rule for the dot product of vectors:
![](https://epubservercos.yuewen.com/745BB7/19549640201517806/epubprivate/OEBPS/Images/Figure-P106_3712437.jpg?sign=1739158257-ah0m9b4Aq8pvy3erWrs7974QUrE5J4PY-0-61ffa752d29ab3d49f3a6340e47b53b8)
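Here θ is the angle between v and the gradient. If θ = 90°, v is tangent to the contour line and the function value does not change, as shown in Figure 3.8(a) and (b). If θ = 180°, as in Figure 3.8(c), v points opposite to the gradient, which is the direction of fastest descent of the function value.
With θ = 0°, as in Figure 3.8(d), v points along the gradient, which is the direction of fastest ascent of the function value. In this case v and the gradient share the same direction, so v can be expressed in terms of ∇f(x):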
![](https://epubservercos.yuewen.com/745BB7/19549640201517806/epubprivate/OEBPS/Images/Figure-P106_3712438.jpg?sign=1739158257-KCX57Il90ClZSwtb1syhmoOMc4MQpU7S-0-8cc2c914f3866ba6460c4bc625975acf)
Therefore,
![](https://epubservercos.yuewen.com/745BB7/19549640201517806/epubprivate/OEBPS/Images/Figure-P106_3712439.jpg?sign=1739158257-YX2kuDK3VjGns9MZ6Tu9jh7K4iusnjxB-0-226b1b36a335d610fcadbfba3d7227e6)
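The optimization part of this book continues to examine this property in depth.
When θ is acute, the directional derivative is positive and the function value rises, as in Figure 3.8(e); when θ is obtuse, it is negative and the function value falls, as in Figure 3.8(f). In addition, the relationship between ∇f(x) and the vector v is almost identical to the projection introduced in the previous chapter of this book.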
![](https://epubservercos.yuewen.com/745BB7/19549640201517806/epubprivate/OEBPS/Images/Figure-P107_3712441.jpg?sign=1739158257-QazAttKeanf1kRvpnWnJrYdsj74I9cww-0-1b8b887a263338459bb61a958ce71269)
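Figure 3.8 Six cases of the directional derivative in the x1-x2 plane
The norm of the gradient vector determines the maximum rate of change of the function at each point. In other words, the maximum rate of change generally differs from point to point. The rate at which the function rises or falls at a point lies within the bounds given below: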
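The cases in Figure 3.8 can be reproduced numerically by rotating a unit vector away from the gradient and evaluating the dot product. This is only a sketch: the gradient is that of the quadratic function in this section's listing, and the point P = (1, 1) and the sampled angles are assumptions made for illustration.

```matlab
% Directional derivative as a function of the angle theta between v and the gradient.
% Assumption: P = (1, 1) and the sampled angles are illustrative.
grad_f = @(x) [-2*x(1) - x(2); -4*x(2) - x(1)];
g      = grad_f([1; 1]);
g_hat  = g / norm(g);                        % unit vector along the gradient

theta_deg = [0, 45, 90, 135, 180];
D = zeros(size(theta_deg));
for k = 1:numel(theta_deg)
    t = deg2rad(theta_deg(k));
    R = [cos(t), -sin(t); sin(t), cos(t)];   % counterclockwise rotation by theta
    v = R * g_hat;                           % unit vector at angle theta to the gradient
    D(k) = g.' * v;                          % directional derivative along v
    fprintf('theta = %3d deg, D_v f = %8.4f\n', theta_deg(k), D(k));
end
```

The value is largest at 0°, vanishes (up to rounding) at 90°, and is most negative at 180°, with positive values for acute angles and negative values for obtuse ones.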
![](https://epubservercos.yuewen.com/745BB7/19549640201517806/epubprivate/OEBPS/Images/Figure-P107_3712442.jpg?sign=1739158257-kvBJmxrUH89KNc1bSlTB1j6XkEyByFZd-0-e52a8f49784f0466df5f19778ee25eb1)
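This property is known as the Cauchy-Schwarz inequality. As Figure 3.8 shows, for a bivariate function in the x1-x2 plane, when the axes are drawn at a 1:1 scale, the gradient direction at any point is perpendicular to the tangent of the contour line through that point. In addition, optimization problems commonly use the normalized gradient vector: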
![](https://epubservercos.yuewen.com/745BB7/19549640201517806/epubprivate/OEBPS/Images/Figure-P108_3712444.jpg?sign=1739158257-cMc43uh5vN2kBkYwieyWSXWTbimdryYZ-0-2160e1b81b9a393c500c749c8c313488)
The normalized vector has norm 1:
![](https://epubservercos.yuewen.com/745BB7/19549640201517806/epubprivate/OEBPS/Images/Figure-P108_3712445.jpg?sign=1739158257-1pprcL6Ps9KQaJO07CMIHpMxW7qm3dxH-0-73c8dbc3945c4b2660fea4f0f61afd2b)
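A short sketch of both points, the bound on the rate of change and the normalized gradient, is given below. It assumes the gradient of the quadratic function from this section's listing evaluated at a hypothetical point P = (1, 1), and samples random unit directions with a fixed seed. `vecnorm` and implicit expansion require a reasonably recent MATLAB release (R2017b or later).

```matlab
% Normalized gradient and the bound on the directional derivative.
% Assumptions: P = (1, 1) and the random unit directions are illustrative.
grad_f = @(x) [-2*x(1) - x(2); -4*x(2) - x(1)];
g      = grad_f([1; 1]);

g_norm = norm(g);
n      = g / g_norm;                 % normalized gradient vector

rng(0);                              % fixed seed for repeatability
V = randn(2, 1000);
V = V ./ vecnorm(V);                 % 1000 random unit directions, one per column
D = g.' * V;                         % directional derivative along each direction

fprintf('norm of normalized gradient: %.4f\n', norm(n));
fprintf('range of D_v f: [%.4f, %.4f], bound: %.4f\n', min(D), max(D), g_norm);
```

Every sampled directional derivative stays between minus and plus the gradient norm.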
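The direction and magnitude of the gradient vector vary with position, so the direction of fastest ascent or descent at the current point is generally not the direction of fastest ascent or descent at a neighboring point. The figures below illustrate this. Figure 3.9 shows the magnitude and direction of the gradient vectors at different points along contours of different heights (-2, -4, -6, and -8). As discussed earlier, for this quadratic surface the gradient vector becomes smaller closer to the extremum. This conclusion does not hold for a cone, however: apart from the apex, the norm of the gradient of a cone is the same everywhere. Figure 3.9 also shows that, along a contour of a given height, the gradient norm is larger where the contour lines are denser, that is, where the slope is steeper. Note that the gradients in all four panels of Figure 3.9 are scaled by the same factor. The following code produces Figure 3.9.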
![](https://epubservercos.yuewen.com/745BB7/19549640201517806/epubprivate/OEBPS/Images/Figure-P108_3712447.jpg?sign=1739158257-NcLw8iQYQArCnZqlUBXWxNqnQUqWnzYo-0-0128acb8927891411eb2797d3a391e84)
Figure 3.9 Gradient vectors on contours of different heights
B4_Ch3_1.m

```matlab
clc; close all; clear all

syms x1 x2
f = -x1^2 - 2*x2^2 - x1*x2;
g = gradient(f, [x1, x2])   % symbolic gradient (column vector)

[XX1, XX2] = meshgrid(-3:0.4:3,-3:0.4:3);
[XX1_fine, XX2_fine] = meshgrid(-3:.2:3,-3:.2:3);
contour_f = subs(f, [x1 x2], {XX1_fine,XX2_fine});

figure(1)
% c_start = floor(min(double(contour_f(:))));
% c_end = floor(max(double(contour_f(:))));
% c_levels = c_start:(c_end-c_start)/20:c_end;
c_start = -24; c_end = 0;
c_levels = c_start:2:c_end;

ii = 1:4;
for i = ii
    subplot(2,2,i)
    plot_fig(g,XX1_fine,XX2_fine,contour_f,c_levels,i)
end

function plot_fig(g,XX1_fine,XX2_fine,contour_f,c_levels,i)
% Plot all contours, highlight one level, and draw gradient arrows on it.
syms x1 x2

contour(XX1_fine,XX2_fine,double(contour_f),c_levels); hold on

c_level = c_levels(end-i); % -2, -4, -6, -8
[contour_loc,~] = contour(XX1_fine,XX2_fine,double(contour_f),...
    [c_level,c_level],'LineWidth',3);

% Drop the first column of the contour matrix, which holds [level; count];
% this assumes the highlighted contour is returned as a single segment.
x1_contour_c = contour_loc(1,2:end);
x2_contour_c = contour_loc(2,2:end);

% Evaluate the gradient components at the points on the contour
dFF_dx1 = subs(g(1), [x1 x2], {x1_contour_c x2_contour_c});
dFF_dx2 = subs(g(2), [x1 x2], {x1_contour_c x2_contour_c});

scale_factor = 0.15;   % same arrow scaling in every panel
h = quiver(x1_contour_c, x2_contour_c, ...
    double(dFF_dx1)*scale_factor, double(dFF_dx2)*scale_factor);
h.AutoScale = 'off';
h.Color = [0,96,166]/255;
h.Marker = '.';
h.MarkerSize = 3;
h.MaxHeadSize = Inf;

xlabel('${x_1}$','Interpreter','latex');
ylabel('${x_2}$','Interpreter','latex');
zlabel('${f(x_1,x_2)}$','Interpreter','latex')
title(['Contour level = ',num2str(c_level)])
set(gca, 'FontName', 'Times New Roman','fontsize',10)
grid off; axis equal
xlim([-3,3]); ylim([-3,3]);
caxis([-18 0])
end
```
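The remark that the gradient shrinks near the extremum of the quadratic surface but keeps a constant norm on a cone can be checked with a few lines. This is a sketch under stated assumptions: the quadratic gradient is that of f in the listing above, while the circular cone f_cone = -sqrt(x1^2 + x2^2), the ray direction, and the radii are hypothetical choices made only for this comparison.

```matlab
% Gradient norm along a ray approaching the extremum at the origin:
% quadratic surface from the listing above versus a circular cone.
% Assumption: the cone f_cone = -sqrt(x1^2 + x2^2) is a hypothetical example.
grad_quad = @(x) [-2*x(1) - x(2); -4*x(2) - x(1)];
grad_cone = @(x) -x / norm(x);               % undefined at the apex x = 0

direction = [1; 1] / sqrt(2);                % unit direction of the ray
radii     = [2, 1, 0.5, 0.1];                % distances from the origin
norm_quad = zeros(size(radii));
norm_cone = zeros(size(radii));

for k = 1:numel(radii)
    x = radii(k) * direction;
    norm_quad(k) = norm(grad_quad(x));
    norm_cone(k) = norm(grad_cone(x));
    fprintf('r = %.1f: quadratic %.4f, cone %.4f\n', ...
        radii(k), norm_quad(k), norm_cone(k));
end
```

The quadratic gradient norm decreases as the point approaches the origin, whereas the cone's gradient norm stays at 1 for every radius.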
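With these basics of vector calculus in place, the following sections cover the properties of the normal and tangent vectors of lines, curves, planes, and surfaces.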