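Recall from Section 2.4 that calculating derivatives is the crucial step in all of the optimization algorithms that we will use to train deep networks. While the calculations are straightforward, working them out by hand can be tedious and error-prone, and the problem only grows as our models become more complex.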

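Fortunately, all modern deep learning frameworks take this work off our plates by offering automatic differentiation (often shortened to autograd). As we pass data through each successive function, the framework builds a computational graph that tracks how each value depends on the others. To calculate derivatives, automatic differentiation works backwards through this graph, applying the chain rule. The computational algorithm for applying the chain rule in this fashion is called backpropagation.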
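To make the idea of a computational graph more concrete, here is a minimal sketch of reverse-mode differentiation in plain Python. The `Var` class and its methods are hypothetical names invented for this illustration; they are not the API of any framework discussed here, and real autograd libraries are far more elaborate.

```python
class Var:
    """A scalar node in a computational graph (illustrative only)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (parent_node, local_derivative)
        self.grad = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    __radd__ = __add__
    __rmul__ = __mul__

    def backward(self):
        # Order the graph so that every node comes after its inputs
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        # Walk from the output back to the inputs, applying the chain rule
        for node in reversed(order):
            for parent, local in node.parents:
                parent.grad += node.grad * local


xs = [Var(v) for v in (0.0, 1.0, 2.0, 3.0)]
y = 2 * sum(x * x for x in xs)  # y = 2 * <x, x>
y.backward()
grads = [x.grad for x in xs]
print(y.value, grads)
```

Each arithmetic operation records its inputs together with the local derivative, and `backward` multiplies and sums those local derivatives along every path from the output to each input.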

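While autograd libraries have become a hot topic over the past decade, they have a long history. In fact, the earliest references to autograd date back more than half a century (Wengert, 1964). The core ideas behind modern backpropagation date to a PhD thesis from 1980 (Speelpenning, 1980) and were further developed in the late 1980s (Griewank, 1989). While backpropagation has become the default method for computing gradients, it is not the only option. For instance, the Julia programming language employs forward propagation (Revels et al., 2016). Before exploring methods, let's first master the autograd package.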

import torch

from mxnet import autograd, np, npx

npx.set_np()

from jax import numpy as jnp

import tensorflow as tf

2.5.1. A Simple Function

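Let's assume that we are interested in differentiating the function $y = 2\mathbf{x}^\top \mathbf{x}$ with respect to the column vector $\mathbf{x}$. To start, we assign x an initial value.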

x = torch.arange(4.0)
x

tensor([0., 1., 2., 3.])

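Before we calculate the gradient of y with respect to x, we need a place to store it. In general, we avoid allocating new memory every time we take a derivative because deep learning requires successively computing derivatives with respect to the same parameters thousands or millions of times, and we might risk running out of memory. Note that the gradient of a scalar-valued function with respect to a vector x is vector-valued and has the same shape as x.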

# Can also create x = torch.arange(4.0, requires_grad=True)
x.requires_grad_(True)
x.grad # The gradient is None by default

x = np.arange(4.0)
x

array([0., 1., 2., 3.])

# We allocate memory for a tensor's gradient by invoking `attach_grad`
x.attach_grad()
# After we calculate a gradient taken with respect to `x`, we will be able to
# access it via the `grad` attribute, whose values are initialized with 0s
x.grad

array([0., 0., 0., 0.])

x = jnp.arange(4.0)
x

Array([0., 1., 2., 3.], dtype=float32)

x = tf.range(4, dtype=tf.float32)
x

x = tf.Variable(x)

We now calculate our function of x and assign the result to y.

y = 2 * torch.dot(x, x)
y

tensor(28., grad_fn=)

We can now take the gradient of y with respect to x by calling its backward method. Next, we can access the gradient via x's grad attribute.

y.backward()
x.grad

tensor([ 0., 4., 8., 12.])

# Our code is inside an `autograd.record` scope to build the computational
# graph
with autograd.record():
  y = 2 * np.dot(x, x)
y

array(28.)

y.backward()
x.grad

array([ 0., 4., 8., 12.])

y = lambda x: 2 * jnp.dot(x, x)
y(x)

Array(28., dtype=float32)

We can now take the gradient of y with respect to x by passing through the grad transform.

from jax import grad

# The `grad` transform returns a Python function that
# computes the gradient of the original function
x_grad = grad(y)(x)
x_grad

Array([ 0., 4., 8., 12.], dtype=float32)

# Record all computations onto a tape
with tf.GradientTape() as t:
  y = 2 * tf.tensordot(x, x, axes=1)
y

We can now calculate the gradient of y with respect to x by calling the gradient method.

x_grad = t.gradient(y, x)
x_grad

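We already know that the gradient of the function $y = 2\mathbf{x}^\top \mathbf{x}$ with respect to $\mathbf{x}$ should be $4\mathbf{x}$. We can now verify that the automatic gradient computation and the expected result are identical.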

x.grad == 4 * x

tensor([True, True, True, True])
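As an independent sanity check that does not rely on any framework, the same gradient can be approximated with central finite differences. This is only a sketch in plain Python; the helper name `numerical_gradient` and the step size `h` are assumptions made for this illustration.

```python
def y(x):
    # y = 2 * <x, x>
    return 2 * sum(v * v for v in x)


def numerical_gradient(f, x, h=1e-5):
    """Approximate each partial derivative with a central difference."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        up[i] += h
        down = list(x)
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


x = [0.0, 1.0, 2.0, 3.0]
approx = numerical_gradient(y, x)
exact = [4 * v for v in x]  # the analytic gradient 4x
max_error = max(abs(a - e) for a, e in zip(approx, exact))
print(approx, max_error)
```

Finite differences need two function evaluations per input component, which is one reason why automatic differentiation is preferred for models with many parameters.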

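Now let's calculate another function of x and take its gradient. Note that PyTorch does not automatically reset the gradient buffer when we record a new gradient. Instead, the new gradient is added to the gradient that is already stored. This behavior comes in handy when we want to optimize the sum of multiple objective functions. To reset the gradient buffer, we can call x.grad.zero_() as follows: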

x.grad.zero_() # Reset the gradient
y = x.sum()
y.backward()
x.grad

tensor([1., 1., 1., 1.])
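The accumulation semantics can be imitated in a few lines of plain Python. The `GradBuffer` class below is a hypothetical stand-in for a gradient buffer such as `x.grad`; it only mimics the add-then-reset behavior described above and is not how PyTorch implements it.

```python
class GradBuffer:
    """Hypothetical stand-in for a gradient buffer such as x.grad."""

    def __init__(self, size):
        self.values = [0.0] * size

    def accumulate(self, new_grad):
        # Add the new gradient to whatever is already stored
        self.values = [old + new for old, new in zip(self.values, new_grad)]

    def zero_(self):
        # Reset the buffer to zeros
        self.values = [0.0] * len(self.values)


x = [0.0, 1.0, 2.0, 3.0]
grad_of_2_dot = [4 * v for v in x]  # gradient of y = 2 * dot(x, x)
grad_of_sum = [1.0 for _ in x]      # gradient of y = sum(x)

buf = GradBuffer(len(x))
buf.accumulate(grad_of_2_dot)
buf.accumulate(grad_of_sum)         # no reset in between
without_reset = list(buf.values)

buf.zero_()
buf.accumulate(grad_of_sum)         # reset first, then record
with_reset = list(buf.values)
print(without_reset, with_reset)
```

Without the reset, the buffer holds the gradient of the sum of both functions, $4\mathbf{x} + \mathbf{1}$, rather than the gradient of the second function alone.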

x.grad == 4 * x

array([ True, True, True, True])

Now let’s calculate another function of x and take its gradient. Note that MXNet resets the gradient buffer whenever we record a new gradient.

with autograd.record():
  y = x.sum()
y.backward()
x.grad # Overwritten by the newly calculated gradient

array([1., 1., 1., 1.])

x_grad == 4 * x

Array([ True, True, True, True], dtype=bool)

y = lambda x: x.sum()
grad(y)(x)

Array([1., 1., 1., 1.], dtype=float32)

x_grad == 4 * x

Now let’s calculate another function of x and take its gradient. Note that TensorFlow resets the gradient buffer whenever we record a new gradient.

with tf.GradientTape() as t:
  y = tf.reduce_sum(x)
t.gradient(y, x) # Overwritten by the newly calculated gradient

2.5.2. Backward for Non-Scalar Variables

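When y is a vector, the most natural representation of the derivative of y with respect to a vector x is a matrix called the Jacobian, which contains the partial derivatives of each component of y with respect to each component of x. Likewise, for higher-order y and x, the result of differentiation could be an even higher-order tensor.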

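While Jacobians do show up in some advanced machine learning techniques, more commonly we want to sum up the gradients of each component of y with respect to the full vector x, yielding a vector of the same shape as x. For example, we often have a vector representing the value of our loss function calculated separately for each example in a batch of training examples. Here, we just want to sum up the gradients computed individually for each example.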

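Because deep learning frameworks vary in how they interpret gradients of non-scalar tensors, PyTorch takes some steps to avoid confusion. Invoking backward on a non-scalar raises an error unless we tell PyTorch how to reduce the object to a scalar. More formally, we need to provide some vector $\mathbf{v}$ such that backward will compute $\mathbf{v}^\top \partial_{\mathbf{x}} \mathbf{y}$ rather than $\partial_{\mathbf{x}} \mathbf{y}$. This next part may be confusing, but for reasons that will become clear later, this argument (representing $\mathbf{v}$) is named gradient. For a more detailed description, see Yang Zhang's Medium post.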

x.grad.zero_()
y = x * x
y.backward(gradient=torch.ones(len(y))) # Faster: y.sum().backward()
x.grad

tensor([0., 2., 4., 6.])
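To see why passing a vector of ones gives the gradient of the sum, it can help to write the Jacobian of `y = x * x` out explicitly and form the product $\mathbf{v}^\top \partial_{\mathbf{x}} \mathbf{y}$ by hand. The sketch below uses plain Python lists; the function names are made up for this illustration and the Jacobian is hard-coded for this one function.

```python
def jacobian_of_square(x):
    # For y_i = x_i * x_i, dy_i/dx_j equals 2 * x_i when i == j and 0 otherwise
    n = len(x)
    return [[2 * x[i] if i == j else 0.0 for j in range(n)] for i in range(n)]


def vector_jacobian_product(v, jac):
    # (v^T J)_j = sum_i v_i * J[i][j]
    rows, cols = len(jac), len(jac[0])
    return [sum(v[i] * jac[i][j] for i in range(rows)) for j in range(cols)]


x = [0.0, 1.0, 2.0, 3.0]
jac = jacobian_of_square(x)

# v = ones: every component of y counts equally, i.e. the gradient of sum(y)
summed = vector_jacobian_product([1.0] * len(x), jac)

# A different v weights the components of y differently
weighted = vector_jacobian_product([1.0, 0.0, 0.0, 2.0], jac)
print(summed, weighted)
```

Frameworks compute such products without ever materializing the full Jacobian, which matters because the Jacobian has one entry per pair of output and input components.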

MXNet handles this problem by reducing all tensors to scalars by summing before computing a gradient. In other words, rather than returning the Jacobian $\partial_{\mathbf{x}} \mathbf{y}$, it returns the gradient of the sum $\partial_{\mathbf{x}} \sum_i y_i$.

with autograd.record():
  y = x * x
y.backward()
x.grad # Equals the gradient of y = sum(x * x)

array([0., 2., 4., 6.])

y = lambda x: x * x
# grad is only defined for scalar output functions
grad(lambda x: y(x).sum())(x)

Array([0., 2., 4., 6.], dtype=float32)

By default, TensorFlow returns the gradient of the sum. In other words, rather than returning the Jacobian $\partial_{\mathbf{x}} \mathbf{y}$, it returns the gradient of the sum $\partial_{\mathbf{x}} \sum_i y_i$.

with tf.GradientTape() as t:
  y = x * x
t.gradient(y, x) # Same as y = tf.reduce_sum(x * x)

2.5.3. Detaching Computation

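Sometimes, we wish to move some calculations outside of the recorded computational graph. For example, say that we use the input to create some auxiliary intermediate terms for which we do not want to compute a gradient. In this case, we need to detach the respective computational graph from the final result. The following toy example makes this clearer: suppose we have z = x * y and y = x * x, but we want to focus on the direct influence of x on z rather than the influence conveyed via y. In this case, we can create a new variable u that takes the same value as y but whose provenance (how it was created) has been wiped out. Thus u has no ancestors in the graph and gradients do not flow through u to x. For example, taking the gradient of z = x * u will yield the result u (not 3 * x * x, as you might have expected since z = x * x * x).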

x.grad.zero_()
y = x * x
u = y.detach()
z = u * x

z.sum().backward()
x.grad == u

tensor([True, True, True, True])
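The effect of detaching can also be checked numerically without any framework. In the sketch below, which uses a single scalar input for simplicity, "detaching" is imitated by computing `u` once and then treating it as a constant; the helper `derivative` and the step size are assumptions made for this illustration.

```python
def derivative(f, x, h=1e-6):
    """Central finite-difference approximation of df/dx."""
    return (f(x + h) - f(x - h)) / (2 * h)


x0 = 2.0
u = x0 * x0  # value of y = x * x, frozen so that it no longer depends on x


def z_detached(x):
    return u * x  # u is a constant here


def z_full(x):
    return (x * x) * x  # y is recomputed from x, so it contributes to the slope


d_detached = derivative(z_detached, x0)  # close to u = x0 ** 2
d_full = derivative(z_full, x0)          # close to 3 * x0 ** 2
print(d_detached, d_full)
```

The first slope matches `u`, while the second matches `3 * x * x`, which is the difference between the detached and the fully tracked computation.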

with autograd.record():
  y = x * x
  u = y.detach()
  z = u * x
z.backward()
x.grad == u

array([ True, True, True, True])

import jax

y = lambda x: x * x
# jax.lax primitives are Python wrappers around XLA operations
u = jax.lax.stop_gradient(y(x))
z = lambda x: u * x

grad(lambda x: z(x).sum())(x) == y(x)

Array([ True, True, True, True], dtype=bool)

# Set persistent=True to preserve the compute graph.
# This lets us run t.gradient more than once
with tf.GradientTape(persistent=True) as t:
  y = x * x
  u = tf.stop_gradient(y)
  z = u * x

x_grad = t.gradient(z, x)
x_grad == u

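Note that while this procedure detaches y's ancestors from the graph leading to z, the computational graph leading to y persists, and thus we can still calculate the gradient of y with respect to x.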

x.grad.zero_()
y.sum().backward()
x.grad == 2 * x

tensor([True, True, True, True])

y.backward()
x.grad == 2 * x

array([ True, True, True, True])

grad(lambda x: y(x).sum())(x) == 2 * x

Array([ True, True, True, True], dtype=bool)

t.gradient(y, x) == 2 * x

2.5.4. Gradients and Python Control Flow

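So far we have reviewed cases where the path from input to output was well defined via a function such as z = x * x * x. Programming offers us a lot more freedom in how we compute results. For instance, we can make them depend on auxiliary variables or on conditional choices based on intermediate results. One benefit of using automatic differentiation is that even if building the computational graph of a function requires passing through a maze of Python control flow (e.g., conditionals, loops, and arbitrary function calls), we can still calculate the gradient of the resulting variable. To illustrate this, consider the following code snippet, where the number of iterations of the while loop and the evaluation of the if statement both depend on the value of the input a.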

def f(a):
  b = a * 2
  while b.norm() < 1000:
    b = b * 2
  if b.sum() > 0:
    c = b
  else:
    c = 100 * b
  return c

def f(a):
  b = a * 2
  while np.linalg.norm(b) < 1000:
    b = b * 2
  if b.sum() > 0:
    c = b
  else:
    c = 100 * b
  return c

def f(a):
  b = a * 2
  while jnp.linalg.norm(b) < 1000:
    b = b * 2
  if b.sum() > 0:
    c = b
  else:
    c = 100 * b
  return c

def f(a):
  b = a * 2
  while tf.norm(b) < 1000:
    b = b * 2
  if tf.reduce_sum(b) > 0:
    c = b
  else:
    c = 100 * b
  return c

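Below, we call this function, passing in a random value as input. Since the input is a random variable, we do not know what form the computational graph will take. However, whenever we execute f(a) on a specific input, we realize a specific computational graph and can subsequently run backward.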

a = torch.randn(size=(), requires_grad=True)
d = f(a)
d.backward()

a = np.random.normal()
a.attach_grad()
with autograd.record():
  d = f(a)
d.backward()

from jax import random

a = random.normal(random.PRNGKey(1), ())
d = f(a)
d_grad = grad(f)(a)

a = tf.Variable(tf.random.normal(shape=()))
with tf.GradientTape() as t:
  d = f(a)
d_grad = t.gradient(d, a)
d_grad

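Even though our function f is, for demonstration purposes, a bit contrived, its dependence on the input is quite simple: it is a linear function of a with a piecewise defined scale. As such, f(a) / a is a vector of constant entries and, moreover, f(a) / a needs to match the gradient of f(a) with respect to a.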

a.grad == d / a

tensor(True)
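The claim that the slope equals f(a) / a can be cross-checked in plain Python on scalar inputs. The sketch below re-implements f for ordinary floats (using `abs` in place of the norm) and compares a central finite difference against the ratio; the two test inputs and the step size are arbitrary choices for this illustration. Note that a = 0 must be avoided, because the while loop would never terminate.

```python
def f(a):
    b = a * 2
    while abs(b) < 1000:  # abs plays the role of the norm for a scalar
        b = b * 2
    if b > 0:
        c = b
    else:
        c = 100 * b
    return c


def central_difference(func, a, h=1e-6):
    return (func(a + h) - func(a - h)) / (2 * h)


results = []
for a in (0.3, -0.7):  # one positive and one negative input
    slope = central_difference(f, a)
    ratio = f(a) / a
    results.append((a, slope, ratio))
    print(a, slope, ratio)
```

The positive input takes the `c = b` branch and the negative input takes the `c = 100 * b` branch, so the two ratios differ, yet in both cases the numerical slope agrees with f(a) / a.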

a.grad == d / a

array(True)

d_grad == d / a

Array(True, dtype=bool)

d_grad == d / a

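Dynamic control flow is very common in deep learning. For instance, when processing text, the computational graph depends on the length of the input. In these cases, automatic differentiation becomes vital for statistical modeling, since it is impossible to compute the gradient a priori.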

2.5.5. Discussion

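You have now gotten a taste of the power of automatic differentiation. The development of libraries for calculating derivatives both automatically and efficiently has been a massive productivity booster for deep learning practitioners, freeing them to focus on higher-level problems. Moreover, autograd lets us design massive models for which pen-and-paper gradient computations would be prohibitively time consuming. Interestingly, while we use autograd to optimize models (in a statistical sense), the optimization of autograd libraries themselves (in a computational sense) is a rich subject of vital interest to framework designers. Here, tools from compilers and graph manipulation are leveraged to compute results in the most expedient and memory-efficient manner.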

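For now, try to remember these basics: (i) attach gradients to those variables with respect to which we desire derivatives; (ii) record the computation of the target value; (iii) execute the backpropagation function; and (iv) access the resulting gradient.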

2.5.6. Exercises

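Why is the second derivative much more expensive to compute than the first derivative?

After running the function for backpropagation, immediately run it again and see what happens. Why?

In the control flow example where we calculate the derivative of d with respect to a, what would happen if we changed the variable a to a random vector or a matrix? At this point, the result of the calculation f(a) is no longer a scalar. What happens to the result? How do we analyze this?

Let $f(x) = \sin(x)$. Plot the graph of $f$ and of its derivative $f'$. Do not exploit the fact that $f'(x) = \cos(x)$ but rather use automatic differentiation to get the result.

Let $f(x) = ((\log x^2) \cdot \sin x) + x^{-1}$. Write out a dependency graph tracing results from $x$ to $f(x)$.

Use the chain rule to compute the derivative $\frac{df}{dx}$ of the aforementioned function, placing each term on the dependency graph that you constructed previously.

Given the graph and the intermediate derivative results, you have a number of options when computing the gradient. Evaluate the result once starting from $x$ to $f$ and once from $f$ tracing back to $x$. The path from $x$ to $f$ is commonly known as forward differentiation, whereas the path from $f$ to $x$ is known as backward differentiation.

When might you want to use forward differentiation, and when backward differentiation? Hint: consider the amount of intermediate data needed, the ability to parallelize steps, and the size of the matrices and vectors involved.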
