PyTorch

What is PyTorch?

  • An open source machine learning framework.
  • A Python package that provides two high-level features:
    • Tensor computation (like NumPy) with strong GPU acceleration
    • Deep neural networks built on a tape-based autograd system (for computing gradients)

PyTorch vs TensorFlow

              PyTorch        TensorFlow
Developer     Facebook AI    Google Brain
Interface     Python & C++   Python, C++, JavaScript, Swift
Debug         Easier         Difficult (easier in 2.0)
Application   Research       Production

Tensor unit

  • A high-dimensional matrix (array)

Data type

Data Type                 dtype           tensor
32-bit floating point     torch.float32   torch.FloatTensor
64-bit integer (signed)   torch.int64     torch.LongTensor

Shape of Tensors
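
A minimal sketch of inspecting a tensor's shape (the sizes here are illustrative):

import torch

x = torch.zeros([2, 3, 4])
x.shape  # torch.Size([2, 3, 4])
x.dim()  # 3, the number of dimensions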

Constructor
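
Common ways to construct a tensor (a minimal sketch; the values are illustrative):

import torch
import numpy as np

x = torch.tensor([[1, -1], [-1, 1]])                # from a Python list
x = torch.from_numpy(np.array([[1, -1], [-1, 1]]))  # from a NumPy array
x = torch.zeros([2, 2])                             # all zeros
x = torch.ones([1, 2, 5])                           # all ones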

Operator

Squeeze
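
squeeze removes a dimension of length 1; a minimal sketch (shapes illustrative):

x = torch.zeros([1, 2, 3])
x = x.squeeze(0)  # remove the length-1 dimension at position 0
x.shape           # torch.Size([2, 3])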

Unsqueeze
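
unsqueeze inserts a new dimension of length 1; continuing the sketch:

x = torch.zeros([2, 3])
x = x.unsqueeze(1)  # insert a new dimension at position 1
x.shape             # torch.Size([2, 1, 3])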

Transpose
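
transpose swaps two dimensions; continuing the sketch:

x = torch.zeros([2, 3])
x = x.transpose(0, 1)  # swap dimensions 0 and 1
x.shape                # torch.Size([3, 2])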

Concatenate
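
torch.cat concatenates tensors along a given dimension; continuing the sketch:

x = torch.zeros([2, 1, 3])
y = torch.zeros([2, 3, 3])
z = torch.cat([x, y], dim=1)  # concatenate along dimension 1
z.shape                       # torch.Size([2, 4, 3])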

Others

  • Addition: z = x + y
  • Subtraction: z = x - y
  • Power: y = x.pow(2)
  • Summation: y = x.sum()
  • Mean: y = x.mean()

PyTorch vs NumPy

Attributes

PyTorch   NumPy
x.shape   x.shape
x.dtype   x.dtype

Shape manipulation

PyTorch              NumPy
x.reshape / x.view   x.reshape
x.squeeze()          x.squeeze()
x.unsqueeze(1)       np.expand_dims(x, 1)

Device

  • Default: tensors & modules are computed on the CPU

  • CPU: x = x.to("cpu")

  • GPU: x = x.to("cuda")

  • Check whether your computer has an NVIDIA GPU: torch.cuda.is_available()

  • Multiple GPUs: specify cuda:0, cuda:1, cuda:2, ...
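
A common pattern is to pick the device at runtime (a minimal sketch):

device = "cuda" if torch.cuda.is_available() else "cpu"
x = x.to(device)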

How to calculate gradients?
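
A minimal autograd sketch (the matrix values are illustrative): mark a tensor with requires_grad=True, compute a scalar from it, and call backward() to fill in .grad:

import torch

x = torch.tensor([[1., 0.], [-1., 1.]], requires_grad=True)
z = x.pow(2).sum()  # z is the sum of all squared entries
z.backward()        # backpropagation: compute dz/dx
x.grad              # tensor([[ 2.,  0.], [-2.,  2.]]), i.e. 2x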

Overview of the DNN Training Procedure

Dataset & Dataloader

from torch.utils.data import Dataset, DataLoader

class MyDataset(Dataset):
    def __init__(self, file):
        self.data = ...  # Read data & preprocess

    def __getitem__(self, index):
        return self.data[index]  # Returns one sample at a time

    def __len__(self):
        return len(self.data)  # Returns the size of the dataset

dataset = MyDataset(file)

# Training: shuffle=True
# Testing: shuffle=False
dataloader = DataLoader(dataset, batch_size, shuffle=True)

torch.nn

Neural Network Layers

  • Linear Layer (Fully-connected Layer): torch.nn.Linear(in_features, out_features)

layer = torch.nn.Linear(32, 64)
layer.weight.shape # torch.Size([64, 32])
layer.bias.shape # torch.Size([64])
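
Applying the layer above to a batch of inputs (the batch size 16 is illustrative):

x = torch.randn(16, 32)  # 16 samples with 32 features each
y = layer(x)
y.shape                  # torch.Size([16, 64])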

Activation Functions

  • Sigmoid Activation: nn.Sigmoid()
  • ReLU Activation: nn.ReLU()

Loss Functions

  • Mean Squared Error (for linear regression): nn.MSELoss()
  • Cross Entropy (for classification): nn.CrossEntropyLoss()

Build your own neural network

import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):  # Initialize your model & define layers
        super(MyModel, self).__init__()
        self.net = nn.Sequential(
            nn.Linear(10, 32),  # First fully-connected layer
            nn.Sigmoid(),       # Activation function
            nn.Linear(32, 1)    # Second fully-connected layer (output layer)
        )

    def forward(self, x):  # Compute output of your NN
        return self.net(x)
  • Input dimension: 10
    The input is a 10-dimensional feature vector (or a batch of shape [batch_size, 10]).

  • First layer, nn.Linear(10, 32)

    • A fully-connected (linear) layer that maps the input from 10 dimensions to 32 dimensions.
    • Parameter count: the weight matrix W has shape [32, 10] and the bias b has shape [32], for 32*10 + 32 = 352 trainable parameters.
  • Activation, nn.Sigmoid()

    • Applies the sigmoid function σ(x) = 1 / (1 + e^(-x)) element-wise to the first layer's output, squashing values into the interval (0, 1).
    • Introduces non-linearity so the network can learn complex patterns.
  • Second layer, nn.Linear(32, 1)

    • The output layer, which maps the 32-dimensional features to a 1-dimensional output (e.g., a scalar for regression or a logit for binary classification).
    • Parameter count: the weight W has shape [1, 32] and the bias b has shape [1], for 1*32 + 1 = 33 parameters.

Optimizer
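
A minimal sketch using stochastic gradient descent (torch.optim.SGD); the full loop appears under Training below:

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# For every batch of data:
optimizer.zero_grad()  # reset the gradients of the model parameters
loss.backward()        # backpropagate to compute new gradients
optimizer.step()       # update the parameters with the gradients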

Neural Network Training

Preparation

dataset = MyDataset(file)  # Read data via MyDataset
batch_size = 16
train_set = DataLoader(dataset, batch_size, shuffle=True)  # Put dataset into DataLoader
device = "cuda"
model = MyModel().to(device)  # Construct model and move to device (cpu/cuda)
criterion = nn.MSELoss()  # Set loss function
learning_rate = 0.1
optimizer = torch.optim.SGD(model.parameters(), learning_rate)  # Set optimizer

Training (Training Set)

n_epochs = 2
for epoch in range(n_epochs):  # Iterate over n_epochs
    model.train()  # Set model to train mode
    for x, y in train_set:  # Iterate through the dataloader
        optimizer.zero_grad()  # Set gradients to zero
        x, y = x.to(device), y.to(device)  # Move data to device (cpu/cuda)
        pred = model(x)  # Forward pass (compute output)
        loss = criterion(pred, y)  # Compute loss
        loss.backward()  # Compute gradients (backpropagation)
        optimizer.step()  # Update model with optimizer

Evaluation (Validation Set)

model.eval()  # Set model to evaluation mode
total_loss = 0
for x, y in dv_set:  # Iterate through the dataloader
    x, y = x.to(device), y.to(device)  # Move data to device (cpu/cuda)
    with torch.no_grad():  # Disable gradient calculation
        pred = model(x)  # Forward pass (compute output)
        loss = criterion(pred, y)  # Compute loss
    total_loss += loss.cpu().item() * len(x)  # Accumulate loss
avg_loss = total_loss / len(dv_set.dataset)  # Compute average loss

The validation loss determines whether to save the current model, i.e., whether the model has improved.

Evaluation (Testing Set)

model.eval()  # Set model to evaluation mode
preds = []
for x in tt_set:  # Iterate through the dataloader
    x = x.to(device)  # Move data to device (cpu/cuda)
    with torch.no_grad():  # Disable gradient calculation
        pred = model(x)  # Forward pass (compute output)
        preds.append(pred.cpu())  # Collect predictions
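
The collected per-batch predictions are typically merged into a single tensor afterwards, e.g.:

preds = torch.cat(preds, dim=0)  # concatenate batch predictions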

Save/Load a Neural Network

Save
torch.save(model.state_dict(), path)
Load
checkpoint = torch.load(path)
model.load_state_dict(checkpoint)

More About PyTorch

  • torchaudio: speech/audio processing
  • torchtext: natural language processing
  • torchvision: computer vision
  • skorch: scikit-learn + PyTorch
  • Useful GitHub repositories using PyTorch
    • Huggingface Transformers (transformer models: BERT, GPT, ...)
    • Fairseq (sequence modeling for NLP & speech)
    • ESPnet (speech recognition, translation, synthesis, ...)
    • Many implementations of papers
    • ...
