
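Machine learning is all about extracting information from data. So you might wonder: what could we possibly learn from synthetic data? While we might not care intrinsically about the patterns that we ourselves baked into an artificial data-generating model, such datasets are still useful for teaching purposes. They help us evaluate the properties of our learning algorithms and confirm that our implementations work as expected. For example, if we create data for which the correct parameters are known a priori, then we can verify that our model can in fact recover them.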

%matplotlib inline
import random
import torch
from d2l import torch as d2l

%matplotlib inline
import random
from mxnet import gluon, np, npx
from d2l import mxnet as d2l

npx.set_np()

%matplotlib inline
import random
import jax
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds
from jax import numpy as jnp
from d2l import jax as d2l

No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)

%matplotlib inline
import random
import tensorflow as tf
from d2l import tensorflow as d2l

3.3.1. Generating the Dataset

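For this example, we work in low dimension for brevity. The following code snippet generates 1000 examples with 2-dimensional features drawn from a standard normal distribution. The resulting design matrix $\mathbf{X}$ belongs to $\mathbb{R}^{1000 \times 2}$. We generate each label by applying a ground truth linear function and corrupting the result with additive noise $\boldsymbol{\epsilon}$, drawn independently and identically for each example: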

$$\mathbf{y} = \mathbf{X}\mathbf{w} + b + \boldsymbol{\epsilon}. \tag{3.3.1}$$

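For convenience, we assume that $\boldsymbol{\epsilon}$ is drawn from a normal distribution with mean $\mu = 0$ and standard deviation $\sigma = 0.01$. Note that, for object-oriented design, we add the code to the __init__ method of a subclass of d2l.DataModule (introduced in Section 3.2.3). It is good practice to allow the setting of any additional hyperparameters. We accomplish this with save_hyperparameters(). The batch_size will be determined later.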

class SyntheticRegressionData(d2l.DataModule): #@save
  """Synthetic data for linear regression."""
  def __init__(self, w, b, noise=0.01, num_train=1000, num_val=1000,
         batch_size=32):
    super().__init__()
    self.save_hyperparameters()
    n = num_train + num_val
    self.X = torch.randn(n, len(w))
    noise = torch.randn(n, 1) * noise
    self.y = torch.matmul(self.X, w.reshape((-1, 1))) + b + noise

class SyntheticRegressionData(d2l.DataModule): #@save
  """Synthetic data for linear regression."""
  def __init__(self, w, b, noise=0.01, num_train=1000, num_val=1000,
         batch_size=32):
    super().__init__()
    self.save_hyperparameters()
    n = num_train + num_val
    self.X = np.random.randn(n, len(w))
    noise = np.random.randn(n, 1) * noise
    self.y = np.dot(self.X, w.reshape((-1, 1))) + b + noise

class SyntheticRegressionData(d2l.DataModule): #@save
  """Synthetic data for linear regression."""
  def __init__(self, w, b, noise=0.01, num_train=1000, num_val=1000,
         batch_size=32):
    super().__init__()
    self.save_hyperparameters()
    n = num_train + num_val
    key = jax.random.PRNGKey(0)
    key1, key2 = jax.random.split(key)
    self.X = jax.random.normal(key1, (n, w.shape[0]))
    noise = jax.random.normal(key2, (n, 1)) * noise
    self.y = jnp.matmul(self.X, w.reshape((-1, 1))) + b + noise

class SyntheticRegressionData(d2l.DataModule): #@save
  """Synthetic data for linear regression."""
  def __init__(self, w, b, noise=0.01, num_train=1000, num_val=1000,
         batch_size=32):
    super().__init__()
    self.save_hyperparameters()
    n = num_train + num_val
    self.X = tf.random.normal((n, w.shape[0]))
    noise = tf.random.normal((n, 1)) * noise
    self.y = tf.matmul(self.X, tf.reshape(w, (-1, 1))) + b + noise

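Below, we set the true parameters to $\mathbf{w} = [2, -3.4]^\top$ and $b = 4.2$. Later, we can check our estimated parameters against these ground truth values.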

data = SyntheticRegressionData(w=torch.tensor([2, -3.4]), b=4.2)

data = SyntheticRegressionData(w=np.array([2, -3.4]), b=4.2)

data = SyntheticRegressionData(w=jnp.array([2, -3.4]), b=4.2)

data = SyntheticRegressionData(w=tf.constant([2, -3.4]), b=4.2)

Each row in features consists of a vector in $\mathbb{R}^2$, and each row in labels is a scalar. Let's have a look at the first entry.

print('features:', data.X[0], '\nlabel:', data.y[0])

features: tensor([-0.0499, -0.2817])
label: tensor([5.0533])

print('features:', data.X[0], '\nlabel:', data.y[0])

features: [2.2122064 1.1630787]
label: [4.684836]

print('features:', data.X[0], '\nlabel:', data.y[0])

features: [-0.86997527 -3.2320356 ]
label: [13.438176]

print('features:', data.X[0], '\nlabel:', data.y[0])

features: tf.Tensor([-0.8617247 0.9828964], shape=(2,), dtype=float32)
label: tf.Tensor([-0.86415064], shape=(1,), dtype=float32)
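
Because the true parameters are known, we can already run a quick sanity check without any of the frameworks above. The following is a minimal sketch, not part of the d2l library: it regenerates data with the same recipe in plain NumPy (the seed and the variable names X_aug, w_hat, and b_hat are assumptions made for illustration) and recovers the parameters with an ordinary least-squares fit.

```python
import numpy as np

# Regenerate data following y = Xw + b + eps, in plain NumPy.
rng = np.random.default_rng(0)  # fixed seed, assumed here for repeatability
true_w = np.array([2.0, -3.4])
true_b = 4.2
n = 1000
X = rng.standard_normal((n, 2))
noise = 0.01 * rng.standard_normal((n, 1))
y = X @ true_w.reshape(-1, 1) + true_b + noise

# Append a column of ones so that the bias is estimated with the weights.
X_aug = np.hstack([X, np.ones((n, 1))])
theta, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
w_hat, b_hat = theta[:2, 0], theta[2, 0]

print('estimated w:', w_hat, 'error:', true_w - w_hat)
print('estimated b:', b_hat, 'error:', true_b - b_hat)
```

With a noise level of 0.01 and 1000 examples, the estimates should agree with the true values to within a small fraction of the noise standard deviation. A large discrepancy would point to a bug in the data generation.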

3.3.2. Reading the Dataset

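Training machine learning models often requires multiple passes over a dataset, grabbing one minibatch of examples at a time. This data is then used to update the model. To illustrate how this works, we implement the get_dataloader method and register it in the SyntheticRegressionData class via add_to_class (introduced in Section 3.2.1). It takes a batch size, a matrix of features, and a vector of labels, and generates minibatches of size batch_size. Each minibatch therefore consists of a tuple of features and labels. Note that we need to be mindful of whether we are in training or validation mode: in the former, we want to read the data in random order, whereas in the latter, being able to read the data in a predefined order may be important for debugging.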

@d2l.add_to_class(SyntheticRegressionData)
def get_dataloader(self, train):
  if train:
    indices = list(range(0, self.num_train))
    # The examples are read in random order
    random.shuffle(indices)
  else:
    indices = list(range(self.num_train, self.num_train+self.num_val))
  for i in range(0, len(indices), self.batch_size):
    batch_indices = torch.tensor(indices[i: i+self.batch_size])
    yield self.X[batch_indices], self.y[batch_indices]

@d2l.add_to_class(SyntheticRegressionData)
def get_dataloader(self, train):
  if train:
    indices = list(range(0, self.num_train))
    # The examples are read in random order
    random.shuffle(indices)
  else:
    indices = list(range(self.num_train, self.num_train+self.num_val))
  for i in range(0, len(indices), self.batch_size):
    batch_indices = np.array(indices[i: i+self.batch_size])
    yield self.X[batch_indices], self.y[batch_indices]

@d2l.add_to_class(SyntheticRegressionData)
def get_dataloader(self, train):
  if train:
    indices = list(range(0, self.num_train))
    # The examples are read in random order
    random.shuffle(indices)
  else:
    indices = list(range(self.num_train, self.num_train+self.num_val))
  for i in range(0, len(indices), self.batch_size):
    batch_indices = jnp.array(indices[i: i+self.batch_size])
    yield self.X[batch_indices], self.y[batch_indices]

@d2l.add_to_class(SyntheticRegressionData)
def get_dataloader(self, train):
  if train:
    indices = list(range(0, self.num_train))
    # The examples are read in random order
    random.shuffle(indices)
  else:
    indices = list(range(self.num_train, self.num_train+self.num_val))
  for i in range(0, len(indices), self.batch_size):
    j = tf.constant(indices[i : i+self.batch_size])
    yield tf.gather(self.X, j), tf.gather(self.y, j)

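To build some intuition, let's inspect the first minibatch of data. Each minibatch of features tells us both its size and the dimensionality of the input features. Likewise, our minibatch of labels has a matching shape given by batch_size.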

X, y = next(iter(data.train_dataloader()))
print('X shape:', X.shape, '\ny shape:', y.shape)

X shape: torch.Size([32, 2])
y shape: torch.Size([32, 1])

X, y = next(iter(data.train_dataloader()))
print('X shape:', X.shape, '\ny shape:', y.shape)

X shape: (32, 2)
y shape: (32, 1)

X, y = next(iter(data.train_dataloader()))
print('X shape:', X.shape, '\ny shape:', y.shape)

X shape: (32, 2)
y shape: (32, 1)

X, y = next(iter(data.train_dataloader()))
print('X shape:', X.shape, '\ny shape:', y.shape)

X shape: (32, 2)
y shape: (32, 1)

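While seemingly innocuous, the call iter(data.train_dataloader()) illustrates the power of Python's object-oriented design. Note that we added a method to the SyntheticRegressionData class after creating the data object. Nonetheless, the object benefits from the after-the-fact addition of functionality to the class.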
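
To make this concrete, here is a minimal sketch of how such a decorator can work. It assumes that add_to_class is essentially a thin wrapper around setattr. The class A and the method do are hypothetical names used only for illustration, and this stand-in is not the library's source code.

```python
def add_to_class(Class):
    """Register the decorated function as a method of Class (sketch)."""
    def wrapper(obj):
        setattr(Class, obj.__name__, obj)
    return wrapper

class A:
    def __init__(self):
        self.b = 1

a = A()  # the instance is created before the method exists

@add_to_class(A)
def do(self):
    return self.b + 1

# Method lookup goes through the class, so the existing instance sees `do`.
result = a.do()
print(result)
```

The instance a picks up the new method because Python resolves attribute lookups on the class at call time rather than copying methods into each object at construction.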

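Throughout the iteration we obtain distinct minibatches until the entire dataset has been exhausted (try this). While the iteration implemented above is good for teaching purposes, it is inefficient in ways that might get us into trouble with real problems. For example, it requires that we load all the data in memory and that we perform lots of random memory access. The built-in iterators implemented in a deep learning framework are considerably more efficient, and they can deal with sources such as data stored in files, data received via a stream, and data generated or processed on the fly. Next, let's implement the same method using built-in iterators.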
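
One way to check that an epoch really exhausts the dataset is sketched below in plain Python, independent of any framework. The function minibatch_indices is a hypothetical helper that mirrors only the index logic of get_dataloader, and the seed is an assumption made for repeatability.

```python
import random

def minibatch_indices(num_examples, batch_size, shuffle=True, seed=0):
    """Yield lists of example indices, one list per minibatch."""
    indices = list(range(num_examples))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for i in range(0, num_examples, batch_size):
        yield indices[i:i + batch_size]

batches = list(minibatch_indices(1000, 32))
sizes = [len(b) for b in batches]
seen = sorted(i for b in batches for i in b)

print('number of batches:', len(batches))
print('last batch size:', sizes[-1])
print('every example seen exactly once:', seen == list(range(1000)))
```

Since 1000 is not a multiple of 32, the final minibatch holds only the 8 leftover examples, so code that consumes minibatches should not assume a fixed batch dimension.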

3.3.3. Concise Implementation of the Data Loader

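Rather than writing our own iterator, we can call the existing API in a framework to load data. As before, we need a dataset with features X and labels y. Beyond that, we set batch_size in the built-in data loader and let it take care of shuffling the examples efficiently.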

@d2l.add_to_class(d2l.DataModule) #@save
def get_tensorloader(self, tensors, train, indices=slice(0, None)):
  tensors = tuple(a[indices] for a in tensors)
  dataset = torch.utils.data.TensorDataset(*tensors)
  return torch.utils.data.DataLoader(dataset, self.batch_size,
                    shuffle=train)

@d2l.add_to_class(SyntheticRegressionData) #@save
def get_dataloader(self, train):
  i = slice(0, self.num_train) if train else slice(self.num_train, None)
  return self.get_tensorloader((self.X, self.y), train, i)

@d2l.add_to_class(d2l.DataModule) #@save
def get_tensorloader(self, tensors, train, indices=slice(0, None)):
  tensors = tuple(a[indices] for a in tensors)
  dataset = gluon.data.ArrayDataset(*tensors)
  return gluon.data.DataLoader(dataset, self.batch_size,
                 shuffle=train)

@d2l.add_to_class(SyntheticRegressionData) #@save
def get_dataloader(self, train):
  i = slice(0, self.num_train) if train else slice(self.num_train, None)
  return self.get_tensorloader((self.X, self.y), train, i)

JAX focuses on a NumPy-like API with device acceleration and functional transformations, so at least the current version does not include data loading methods. Other libraries already provide good data loaders, and JAX suggests using them instead. Here we take TensorFlow's data loader and modify it slightly so that it works with JAX.

@d2l.add_to_class(d2l.DataModule) #@save
def get_tensorloader(self, tensors, train, indices=slice(0, None)):
  tensors = tuple(a[indices] for a in tensors)
  # Use Tensorflow Datasets & Dataloader. JAX or Flax do not provide
  # any dataloading functionality
  shuffle_buffer = tensors[0].shape[0] if train else 1
  return tfds.as_numpy(
    tf.data.Dataset.from_tensor_slices(tensors).shuffle(
      buffer_size=shuffle_buffer).batch(self.batch_size))

@d2l.add_to_class(SyntheticRegressionData) #@save
def get_dataloader(self, train):
  i = slice(0, self.num_train) if train else slice(self.num_train, None)
  return self.get_tensorloader((self.X, self.y), train, i)

@d2l.add_to_class(d2l.DataModule) #@save
def get_tensorloader(self, tensors, train, indices=slice(0, None)):
  tensors = tuple(a[indices] for a in tensors)
  shuffle_buffer = tensors[0].shape[0] if train else 1
  return tf.data.Dataset.from_tensor_slices(tensors).shuffle(
    buffer_size=shuffle_buffer).batch(self.batch_size)

@d2l.add_to_class(SyntheticRegressionData) #@save
def get_dataloader(self, train):
  i = slice(0, self.num_train) if train else slice(self.num_train, None)
  return self.get_tensorloader((self.X, self.y), train, i)

The new data loader behaves just like the previous one, except that it is more efficient and has some added functionality.

X, y = next(iter(data.train_dataloader()))
print('X shape:', X.shape, 'ny shape:', y.shape)

X shape: torch.Size([32, 2])
y shape: torch.Size([32, 1])

X, y = next(iter(data.train_dataloader()))
print('X shape:', X.shape, 'ny shape:', y.shape)

X shape: (32, 2)
y shape: (32, 1)

X, y = next(iter(data.train_dataloader()))
print('X shape:', X.shape, 'ny shape:', y.shape)

X shape: (32, 2)
y shape: (32, 1)

X, y = next(iter(data.train_dataloader()))
print('X shape:', X.shape, 'ny shape:', y.shape)

X shape: (32, 2)
y shape: (32, 1)

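For instance, the data loader provided by the framework API supports the built-in __len__ method, so we can query its length, i.e., the number of batches.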

len(data.train_dataloader())

len(data.train_dataloader())

len(data.train_dataloader())

len(data.train_dataloader())
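
To see where this number comes from, consider the following sketch. ToyLoader is a hypothetical stand-in written for illustration, not a framework class. It assumes that the last, possibly smaller, minibatch is kept, so the length is the number of examples divided by the batch size, rounded up.

```python
import math

class ToyLoader:
    """Hypothetical stand-in for a framework data loader."""
    def __init__(self, num_examples, batch_size):
        self.num_examples = num_examples
        self.batch_size = batch_size

    def __len__(self):
        # Number of minibatches when the final partial batch is kept.
        return math.ceil(self.num_examples / self.batch_size)

    def __iter__(self):
        for start in range(0, self.num_examples, self.batch_size):
            stop = min(start + self.batch_size, self.num_examples)
            yield range(start, stop)

loader = ToyLoader(1000, 32)
num_batches = len(loader)
print(num_batches)
print(num_batches == sum(1 for _ in loader))
```

Supporting len() lets training code report progress or schedule learning rates per epoch without first iterating over the data.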

3.3.4. Summary

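Data loaders are a convenient way of abstracting away the process of loading and manipulating data. This way, the same machine learning algorithm can process many different types and sources of data without modification. One of the nice things about data loaders is that they can be composed. For instance, we might be loading images and then apply a postprocessing filter that crops or otherwise modifies them. As such, data loaders can be used to describe an entire data processing pipeline.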
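
Composition can be sketched with plain Python generators. The stage names batches, center_crop, and scale below are hypothetical, and the "images" are just a NumPy array of consecutive numbers. The point is only that each stage consumes an iterator of minibatches and yields transformed minibatches.

```python
import numpy as np

def batches(X, batch_size):
    """Yield consecutive minibatches of X."""
    for i in range(0, len(X), batch_size):
        yield X[i:i + batch_size]

def center_crop(batch_iter, size):
    """Crop the central size x size window of every image in each batch."""
    for batch in batch_iter:
        h, w = batch.shape[1:3]
        top, left = (h - size) // 2, (w - size) // 2
        yield batch[:, top:top + size, left:left + size]

def scale(batch_iter, factor):
    """Multiply every minibatch by a constant factor."""
    for batch in batch_iter:
        yield batch * factor

# Ten fake 8x8 "images", assumed purely for illustration.
images = np.arange(10 * 8 * 8, dtype=np.float32).reshape(10, 8, 8)
pipeline = scale(center_crop(batches(images, 4), size=4), factor=1 / 255)
shapes = [b.shape for b in pipeline]
print(shapes)
```

Because every stage has the same interface, stages can be reordered, removed, or swapped without touching the code that consumes the final iterator.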

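As for the model itself, the two-dimensional linear model is about the simplest we might encounter. It lets us test the accuracy of regression models without worrying about having too little data or an underdetermined system of equations. We will put this to good use in the next section.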

3.3.5. Exercises

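What happens if the number of examples is not divisible by the batch size? How would you change this behavior by specifying a different argument in the framework's API?

What if we want to generate a huge dataset, where both the size of the parameter vector w and the number of examples num_examples are large?

What happens if we cannot hold all the data in memory?

How would you shuffle the data if it is held on disk? Your task is to design an efficient algorithm that does not require too many random reads or writes. Hint: pseudorandom permutation generators allow you to design a reshuffle without storing the permutation table explicitly (Naor and Reingold, 1999).

Implement a data generator that produces new data on the fly every time the iterator is called.

How would you design a random data generator that generates the same data each time it is called?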
