[PyTorch 2.0] (1) PyTorch Fundamentals

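Let's start learning the deep learning framework PyTorch. This series of study notes is aimed entirely at beginners, because I am new to PyTorch and deep learning myself. However, if you want to follow along with this series, you will need at least the following:

A working knowledge of Python. This is the basic requirement. I won't spend much time explaining Python or fundamental programming concepts in this series. The notes are aimed at PyTorch beginners, not programming beginners.

A Conda (or plain Python) environment already installed on your computer, an understanding of the basic concept of a virtual environment, and knowing how to create one.

Some ability to read English. I occasionally write entirely in English or in a mix of Chinese and English, because the tutorial videos I follow are in English and they easily rub off on me.

No particular math background is required. These notes focus on using the framework, not on the mathematics behind it.

In addition, because I am usually busy and other things may come up while I study, my learning of the framework will not be continuous, and the process may take quite a long time. So this is not a complete crash course; it is more a record of my learning journey.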

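Project setup

The official tutorial suggests creating a Jupyter notebook in Google Colab, but I don't recommend that. I'd rather you run things locally, which will matter a lot when you develop larger projects later. So please follow the official PyTorch instructions to install PyTorch on your computer. Remember that installing it into a virtual environment is probably the better idea.

I use a MacBook Pro with an M3 Pro chip. First, create a Pure Python project in PyCharm. You can let it generate a welcome main.py file; it doesn't matter either way.

Then, in a terminal with the venv activated, I run:

pip3 install torch torchvision torchaudio

If you are on a Windows or Linux computer with an NVIDIA GPU, first install the CUDA Toolkit, then follow the steps on the official website to install the PyTorch build for CPU or for NVIDIA GPUs.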

Using GPUs to run your project

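If you happen to have an NVIDIA GPU inside your computer, or you rent a high-performance compute server, you can switch to the GPU to speed up your computations.

Check whether a GPU is available:

import torch

torch.cuda.is_available()  # True if PyTorch can use a CUDA GPU

Then set the compute device: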

device = 'cuda' if torch.cuda.is_available() else 'cpu'

tensor = torch.tensor([1, 2, 3])

tensor_on_gpu = tensor.to(device)

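Sometimes we need to move a tensor back to the CPU; for example, a tensor on the GPU cannot be used in NumPy computations. Let's move it back to the CPU (.cpu() returns the tensor in CPU memory; the original stays on its device):

tensor_back_on_cpu = tensor_on_gpu.cpu()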
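Put together, the steps above give device-agnostic code. A minimal sketch, assuming only that PyTorch and NumPy are installed: it falls back to the CPU when no CUDA GPU is found, and moves the tensor back to the CPU before converting it to a NumPy array. The variable names are illustrative.

```python
import torch

# Pick the GPU if CUDA is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

tensor = torch.tensor([1, 2, 3])
tensor_on_device = tensor.to(device)  # .to() returns a tensor on the target device

# .numpy() only works on CPU tensors, so move the data back first
array = tensor_on_device.cpu().numpy()
print(tensor_on_device.device.type, array)
```

The same script runs unchanged on a laptop without a GPU and on a CUDA server.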

Tensor

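So what is a tensor? The tensor is the basic unit of computation in PyTorch, much as the vector is the basic unit of computation in R. (Trust me, you'll meet it again very soon.)

In mathematics, a tensor is an algebraic object that describes a multilinear relationship between sets of algebraic objects related to a vector space. In PyTorch the idea is similar, though in practice a tensor is simply a multi-dimensional array of numbers.

Tensors come in several kinds, for example scalars, vectors, and matrices. The figure below illustrates the tensor family: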

[Figure: the tensor family]

Let's put this into practice:

'''
@name: main.py
@author: Kynix Chan
@time: 05/11/113
'''
import torch

## Scalar
scalar = torch.tensor(7)

### Print the number of dimensions of a tensor
print(scalar.ndim)
### Result: 0

### Get tensor back as python int
print(scalar.item())
### Result: 7

## Vector
vector = torch.tensor([1, 2])

print(vector.ndim)
### Result: 1

### Get the shape of a vector.
print(vector.shape)
### Result: torch.Size([2])

## Matrix
matrix = torch.tensor([[1, 2, 3], 
                       [4, 5, 6]])

print(matrix.ndim)
### Result: 2

print(matrix.shape)
### Result: torch.Size([2, 3])

### Get the first row of the matrix (index 0 along dim 0)
print(matrix[0])
### Result: tensor([1, 2, 3])

## Tensor
tensor = torch.tensor([[[1, 2, 3],
                        [4, 5, 6]], 
                       [[7, 8, 9],
                        [10, 11, 12]]])

print(tensor.ndim)
### Result: 3

print(tensor.shape)
### Result: torch.Size([2, 2, 3])

print(tensor[0][0])
### Result: tensor([1, 2, 3])

print(tensor[0][0][0])
### Result: tensor(1)

Random tensors

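A random tensor is a tensor of the requested shape filled with random values (torch.rand samples uniformly from [0, 1)). This is very useful for producing dummy data.

## Create a random tensor of size (3, 4)
random_tensor = torch.rand(3, 4)

To create reproducible random tensors, we need to set a random seed.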

RANDOM_SEED = 42

torch.manual_seed(RANDOM_SEED)
random_tensor_A = torch.rand(3, 3)

torch.manual_seed(RANDOM_SEED)
random_tensor_B = torch.rand(3, 3)

## random_tensor_A is the same as random_tensor_B
print(torch.equal(random_tensor_A, random_tensor_B))
## Result: True
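To see why the seed has to be reset before each call, the sketch below compares tensors generated with and without re-seeding. torch.equal() checks that two tensors have the same shape and values. The torch.Generator at the end is an optional alternative that keeps the seed local to one call instead of resetting the global state; treat it as one possible approach, not the only one.

```python
import torch

RANDOM_SEED = 42

# Re-seeding the global generator before each call reproduces the same numbers
torch.manual_seed(RANDOM_SEED)
random_tensor_A = torch.rand(3, 3)
torch.manual_seed(RANDOM_SEED)
random_tensor_B = torch.rand(3, 3)

# Without re-seeding, the next call continues the sequence and gives new numbers
random_tensor_C = torch.rand(3, 3)

print(torch.equal(random_tensor_A, random_tensor_B))  # True
print(torch.equal(random_tensor_A, random_tensor_C))  # False

# A dedicated generator keeps reproducibility local to this call site
generator = torch.Generator().manual_seed(RANDOM_SEED)
random_tensor_D = torch.rand(3, 3, generator=generator)
```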

Zeros and ones

## Create a tensor full of zeros
zeros_tensor = torch.zeros(size=(3, 4))

## Create a tensor full of ones
ones_tensor = torch.ones(size=(3, 4))

A range of tensors and tensors-like

torch.arange() produces a tensor holding an evenly spaced sequence of numbers; the end value is excluded:

tensor_range = torch.arange(1, 13)
print(tensor_range)
## Result: tensor([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12])

tensor_range_2 = torch.arange(start=1, end=10, step=2)
print(tensor_range_2)
## Result: tensor([1, 3, 5, 7, 9])

If we want to create a tensor full of zeros or ones with the same shape as an existing tensor, we can use the *_like functions:

random_tensor = torch.rand(3, 4)
zeros_like = torch.zeros_like(input=random_tensor)
print(zeros_like)
## Result: tensor([[0., 0., 0., 0.],
##                 [0., 0., 0., 0.],
##                 [0., 0., 0., 0.]])
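The ones counterpart works the same way. A short sketch, assuming only PyTorch: the *_like functions take the shape (and, by default, the dtype and device) from the input tensor, and the dtype can be overridden.

```python
import torch

random_tensor = torch.rand(3, 4)

# *_like functions copy shape, dtype and device from the input tensor
ones_like = torch.ones_like(input=random_tensor)
# ...but the dtype can be overridden explicitly
zeros_like_int = torch.zeros_like(random_tensor, dtype=torch.int64)

print(ones_like.shape, ones_like.dtype)            # torch.Size([3, 4]) torch.float32
print(zeros_like_int.shape, zeros_like_int.dtype)  # torch.Size([3, 4]) torch.int64
```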

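Data types of tensors

The data in a tensor has a specific data type, controlled by the dtype parameter.

The full list of data types can be found in the PyTorch documentation.

When creating a tensor, torch.tensor() has three parameters worth knowing about here: dtype, device, and requires_grad.

dtype: the data type of the elements in the tensor.
device: the device the tensor lives on. It can live on the CPU ('cpu') or on CUDA hardware ('cuda').
requires_grad: whether autograd should track gradients for operations on this tensor.

Converting data types:

float_32_tensor = torch.tensor([3.0, 6.0, 9.0], dtype=torch.float32)
float_16_tensor = float_32_tensor.type(torch.float16)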
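A sketch of the default dtypes, the three creation parameters, and type conversion, assuming only PyTorch with its default settings (the values are arbitrary):

```python
import torch

float_32_tensor = torch.tensor([3.0, 6.0, 9.0],
                               dtype=None,           # let PyTorch infer the type
                               device=None,          # default device (the CPU unless changed)
                               requires_grad=False)  # don't track gradients
print(float_32_tensor.dtype)          # torch.float32: default for floating-point data
print(torch.tensor([1, 2, 3]).dtype)  # torch.int64: default for integer data

# .type() and .to() both return a converted copy; the original is unchanged
float_16_tensor = float_32_tensor.type(torch.float16)
float_64_tensor = float_32_tensor.to(torch.float64)

# Mixing dtypes in one operation promotes the result to the wider type
print((float_16_tensor * float_32_tensor).dtype)  # torch.float32
```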

Tensor operations

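Tensor operations include element-wise addition, subtraction, multiplication, and division, as well as matrix multiplication.

## Note: for element-wise operations between two tensors, their shapes must be the same or broadcastable, for example when one of them is a scalar.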

tensor_1 = torch.tensor([1, 2, 3])
tensor_2 = torch.tensor([4, 5, 6])

## Addition
print(tensor_1 + tensor_2)
print(torch.add(tensor_1, tensor_2))
## Result: tensor([5, 7, 9])

## Multiplication
print(tensor_1 * tensor_2)
print(torch.mul(tensor_1, tensor_2))
## Result: tensor([ 4, 10, 18])

## Subtract
print(tensor_1 - tensor_2)
print(torch.sub(tensor_1, tensor_2))
## Result: tensor([-3, -3, -3])

## Division
print(tensor_1 / tensor_2)
print(torch.div(tensor_1, tensor_2))
## Result: tensor([0.2500, 0.4000, 0.5000])
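Two behaviours are worth knowing here. A short sketch, assuming only PyTorch: broadcasting, which lets a scalar or a size-1 dimension be stretched to match the other operand, and the fact that / always performs true division, even on integer tensors.

```python
import torch

tensor_1 = torch.tensor([1, 2, 3])

# A scalar is broadcast to every element
print(tensor_1 + 10)  # tensor([11, 12, 13])
print(tensor_1 * 10)  # tensor([10, 20, 30])

# Shapes (2, 1) and (3,) broadcast to (2, 3): size-1 dims are stretched
column = torch.tensor([[10], [20]])
print(column + tensor_1)
# tensor([[11, 12, 13],
#         [21, 22, 23]])

# "/" returns floats even for integer inputs;
# "//" or torch.div(..., rounding_mode="floor") keeps integer results
print(tensor_1 / 2)   # tensor([0.5000, 1.0000, 1.5000])
print(tensor_1 // 2)  # tensor([0, 1, 1])

# Operations return new tensors; tensor_1 itself is unchanged
print(tensor_1)  # tensor([1, 2, 3])
```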

Matrix multiplication

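As a medical student, I never took linear algebra, so this is my first encounter with matrix multiplication. Let's start with a little math.

Multiplying a matrix by a scalar is of course simple: multiply every element of the matrix by the scalar to get the result: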

[Figure: multiplying a matrix by a scalar]

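The harder part is multiplying two matrices. Multiplying two matrices requires the number of columns of the first matrix to equal the number of rows of the second matrix.

A note for readers from mainland China: the Chinese terms for rows and columns are reversed between Taiwan and mainland China. In Taiwan, we call a column 欄 or 行, and a row 列.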

[Figure: matrix multiplication]

Multiplying two matrices:

[Figure: matrix multiplication]

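The dot product works like this: multiply each element in a row of the first matrix by the corresponding element in a column of the second matrix, then add up the products to get one number of the result.

Then we continue to produce the second number: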

[Figure: computing the second number of the result]

Then the third and the fourth. The final result is:

[Figure: the final result of the matrix multiplication]

We can see that multiplying an m × n matrix by an n × p matrix gives a new m × p matrix. So the inner dimensions must match, and the result takes the shape of the outer dimensions.
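The same rule written as a formula: for an m × n matrix A and an n × p matrix B, each entry of C = AB is the dot product of row i of A with column j of B. The second display works through the same matrices that the tensor example below uses.

```latex
\[
c_{ij} = \sum_{k=1}^{n} a_{ik}\, b_{kj}, \qquad 1 \le i \le m,\; 1 \le j \le p
\]

\[
\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}
\begin{bmatrix} 7 & 8 & 9 \\ 10 & 11 & 12 \\ 13 & 14 & 15 \end{bmatrix}
=
\begin{bmatrix}
1\cdot7 + 2\cdot10 + 3\cdot13 & 1\cdot8 + 2\cdot11 + 3\cdot14 & 1\cdot9 + 2\cdot12 + 3\cdot15 \\
4\cdot7 + 5\cdot10 + 6\cdot13 & 4\cdot8 + 5\cdot11 + 6\cdot14 & 4\cdot9 + 5\cdot12 + 6\cdot15
\end{bmatrix}
=
\begin{bmatrix} 66 & 72 & 78 \\ 156 & 171 & 186 \end{bmatrix}
\]
```

Here a 2 × 3 matrix times a 3 × 3 matrix gives a 2 × 3 result.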

Next, let's implement it with tensors:

tensor_1 = torch.tensor([[1, 2, 3],
                         [4, 5, 6]])

tensor_2 = torch.tensor([[7, 8, 9],
                         [10, 11, 12],
                         [13, 14, 15]])

print(torch.matmul(tensor_1, tensor_2)) # or
print(torch.mm(tensor_1, tensor_2)) # or
print(tensor_1 @ tensor_2)
## Result: tensor([[ 66,  72,  78],
##                 [156, 171, 186]])
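To connect the dot-product rule with the code, here is a deliberately naive sketch that builds the product entry by entry. matmul_by_dot_products is a hypothetical helper for illustration only; in real code, use @ or torch.matmul(), which are far faster.

```python
import torch

def matmul_by_dot_products(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: each output entry is the dot product of
    a row of `a` with a column of `b`. For learning only."""
    m, n = a.shape
    n_b, p = b.shape
    if n != n_b:  # inner dimensions must match
        raise ValueError(f"inner dimensions differ: {tuple(a.shape)} @ {tuple(b.shape)}")
    result = torch.zeros(m, p, dtype=torch.result_type(a, b))
    for i in range(m):
        for j in range(p):
            # multiply row i of a with column j of b element-wise, then sum
            result[i, j] = (a[i, :] * b[:, j]).sum()
    return result

tensor_1 = torch.tensor([[1, 2, 3],
                         [4, 5, 6]])
tensor_2 = torch.tensor([[7, 8, 9],
                         [10, 11, 12],
                         [13, 14, 15]])

manual = matmul_by_dot_products(tensor_1, tensor_2)
print(manual)
print(torch.equal(manual, tensor_1 @ tensor_2))  # True
```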

We can also use the transpose to swap the rows and columns of a matrix:

tensor_A = torch.tensor([[1, 2, 3],
                         [4, 5, 6]])
print(tensor_A.T)
## Result: tensor([[1, 4],
##                 [2, 5],
##                 [3, 6]])
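A common use of the transpose is fixing a shape mismatch before matrix multiplication. A minimal sketch, assuming only PyTorch, with two illustrative (2, 3) tensors that cannot be multiplied directly:

```python
import torch

tensor_A = torch.tensor([[1, 2, 3],
                         [4, 5, 6]])     # shape (2, 3)
tensor_B = torch.tensor([[7, 8, 9],
                         [10, 11, 12]])  # shape (2, 3)

# (2, 3) @ (2, 3) fails: the inner dimensions 3 and 2 do not match
try:
    tensor_A @ tensor_B
except RuntimeError as error:
    print("RuntimeError:", error)

# Transposing B gives (3, 2), so (2, 3) @ (3, 2) -> (2, 2)
result = tensor_A @ tensor_B.T
print(result.shape)  # torch.Size([2, 2])
print(result)
# tensor([[ 50,  68],
#         [122, 167]])
```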

Tensor aggregation

Tensor aggregation helps us find the min and max of a tensor, compute its mean and sum, and so on.

tensor = torch.arange(1, 10, dtype=torch.long)

## Min and max
print(tensor.min()) # or
print(torch.min(tensor))
## Result: tensor(1)
print(tensor.max()) # or
print(torch.max(tensor))
## Result: tensor(9), because torch.arange(1, 10) stops before 10

## Mean: requires a floating-point or complex input
print(tensor.type(torch.float).mean()) # or
print(torch.mean(tensor, dtype=torch.float))
## Result: tensor(5.)

## Sum
print(tensor.sum()) # or
print(torch.sum(tensor))
## Result: tensor(45)

## Positional min and max: return the index of the minimum or maximum value.
## argmin and argmax
print(tensor.argmin()) # or
print(torch.argmin(tensor))
## Result: tensor(0)
print(tensor.argmax()) # or
print(torch.argmax(tensor))
## Result: tensor(8)
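Aggregation functions also accept a dim argument, which reduces only along one dimension. A sketch with a made-up 2 × 3 tensor (the scores values are hypothetical):

```python
import torch

scores = torch.tensor([[1.0, 5.0, 3.0],
                       [4.0, 2.0, 6.0]])

# Without dim, aggregation runs over all elements
print(scores.sum())          # tensor(21.)
# dim=0 collapses the rows (one result per column),
# dim=1 collapses the columns (one result per row)
print(scores.sum(dim=0))     # tensor([5., 7., 9.])
print(scores.mean(dim=1))    # tensor([3., 4.])
print(scores.argmax(dim=1))  # tensor([1, 2])

# torch.max with dim returns both the values and their indices
values, indices = torch.max(scores, dim=1)
print(values, indices)       # tensor([5., 6.]) tensor([1, 2])
```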

Reshaping

We can reshape a tensor into another shape as long as the new shape has the same number of elements as the old one.

Both torch.reshape() (or Tensor.reshape()) and Tensor.view() can change the shape of a tensor.

tensor = torch.arange(1, 36, dtype=torch.long, step=2)

print(tensor)
## Result: tensor([ 1,  3,  5,  7,  9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35])

## Reshape, 2 * 3 * 3 = 18
tensor_reshaped = tensor.reshape(2, 3, 3)
print(tensor_reshaped)
'''
Result:
tensor([[[ 1,  3,  5],
         [ 7,  9, 11],
         [13, 15, 17]],

        [[19, 21, 23],
         [25, 27, 29],
         [31, 33, 35]]])
'''

## View, 2 * 3 * 3 = 18
tensor_viewed = tensor.view(2, 3, 3)
print(tensor_viewed)
'''
Result:
tensor([[[ 1,  3,  5],
         [ 7,  9, 11],
         [13, 15, 17]],

        [[19, 21, 23],
         [25, 27, 29],
         [31, 33, 35]]])
'''

OK, now let’s change the old tensor and see what will happen:

tensor[0] = 5

print(tensor)
print(tensor_reshaped)
print(tensor_viewed)
## Result
'''
tensor([ 5,  3,  5,  7,  9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35])
tensor([[[ 5,  3,  5],
         [ 7,  9, 11],
         [13, 15, 17]],

        [[19, 21, 23],
         [25, 27, 29],
         [31, 33, 35]]])
tensor([[[ 5,  3,  5],
         [ 7,  9, 11],
         [13, 15, 17]],

        [[19, 21, 23],
         [25, 27, 29],
         [31, 33, 35]]])
'''

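Oh my god! All three tensors have changed! Why? Let's take a look at how tensors are stored.

When we create a tensor, it actually occupies two regions of memory. Yes, one variable, two regions. The first is the header (metadata) region, which stores basic information such as the tensor's size and stride; the second holds the tensor's actual data.

So if we reshape or view tensor A and assign the result to tensor B with =, A and B differ only in their header regions, while the actual data region is shared. (For reshape() this holds whenever no copy is needed, as with the contiguous tensor here.)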
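One way to check the sharing, assuming only PyTorch: Tensor.data_ptr() returns the memory address of a tensor's first element, and clone() makes an independent copy of the data.

```python
import torch

tensor = torch.arange(1, 36, step=2)
tensor_viewed = tensor.view(2, 3, 3)
tensor_cloned = tensor.clone().view(2, 3, 3)  # clone() copies the data first

# Same starting address in memory means the data region is shared
print(tensor.data_ptr() == tensor_viewed.data_ptr())  # True
print(tensor.data_ptr() == tensor_cloned.data_ptr())  # False

tensor[0] = 5
print(tensor_viewed[0, 0, 0].item())  # 5: the change shows through the view
print(tensor_cloned[0, 0, 0].item())  # 1: the copy is independent
```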

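Next, let's work out what the difference between view() and reshape() actually is. To do that, we first need to understand two properties: stride and storage_offset.

Let's start with storage_offset. It gives the position of a tensor's first element in the underlying storage, counted in elements. For a tensor sliced from an original tensor whose own offset is 0, this is the offset of the new tensor's first element relative to the original tensor's first element. Let's look at an example: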

tensor_rand = torch.rand(3, 3)
print(tensor_rand)
tensor_k = tensor_rand[1:, 1:]
print(tensor_k)
print(tensor_k.storage_offset())
# Result
'''
tensor([[0.0598, 0.5079, 0.9185],
        [0.9518, 0.5277, 0.7667],
        [0.7457, 0.5774, 0.9873]])
tensor([[0.5277, 0.7667],
        [0.5774, 0.9873]])
4
'''

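This is fairly easy to understand: the first element of tensor_k is tensor_rand[1][1], which sits at position 1 × 3 + 1 = 4 in storage.

Next, let's look at stride. For a given dimension, it is the number of elements in storage that must be skipped to go from one element to the next along that dimension. Another example: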

tensor = torch.rand(2, 3, 4)
print(tensor)
print(tensor.stride())
## Result
'''
tensor([[[0.5037, 0.0922, 0.4752, 0.8638],
         [0.8847, 0.9163, 0.7370, 0.8694],
         [0.8714, 0.0986, 0.0579, 0.7894]],

        [[0.8771, 0.1222, 0.8956, 0.6742],
         [0.5307, 0.2777, 0.7311, 0.7575],
         [0.9842, 0.0023, 0.8706, 0.6529]]])
(12, 4, 1)
'''

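Let's explain how (12, 4, 1) is obtained:

12 means that jumping from the first layer to the second layer (along dim 0) skips 12 elements. In the example above, the first layer is: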


[[0.5037, 0.0922, 0.4752, 0.8638],
 [0.8847, 0.9163, 0.7370, 0.8694],
 [0.8714, 0.0986, 0.0579, 0.7894]]

The second layer is:


[[0.8771, 0.1222, 0.8956, 0.6742],
 [0.5307, 0.2777, 0.7311, 0.7575],
 [0.9842, 0.0023, 0.8706, 0.6529]]

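Each layer holds 3 × 4 = 12 elements.

4 means that within a layer, jumping from the first row to the second row skips 4 elements. For example, the first row of the first layer in the example above is:

[0.5037, 0.0922, 0.4752, 0.8638]

The second row is:

[0.8847, 0.9163, 0.7370, 0.8694]

So we need to skip 4 elements, one full row.

1 means that within the same row of the same layer, jumping from one element to the next skips 1 element. For example, in the example above, the first element of the first row of the first layer is: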

0.5037

The second element of the first row is:

0.0922

It is only 1 element away. That completes the explanation.
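Putting stride and storage_offset together: the position of element t[i, j, k] in storage is storage_offset + i × stride[0] + j × stride[1] + k × stride[2]. The sketch below checks this on a tensor whose values equal their storage positions; storage_index is a hypothetical helper written for illustration, not part of PyTorch.

```python
import torch

def storage_index(t: torch.Tensor, index: tuple) -> int:
    """Hypothetical helper: where t[index] lives in the underlying storage,
    computed as storage_offset + sum(index_k * stride_k)."""
    return t.storage_offset() + sum(i * s for i, s in zip(index, t.stride()))

# arange fills storage with 0, 1, 2, ..., so each value equals its storage position
base = torch.arange(24).reshape(2, 3, 4)
print(base.stride())         # (12, 4, 1)

sub = base[1:, 1:, 1:]       # shape (1, 2, 3), shares storage with base
print(sub.storage_offset())  # 17 = 1*12 + 1*4 + 1*1
print(sub.stride())          # (12, 4, 1): slicing keeps the strides

for idx in [(0, 0, 0), (0, 1, 1), (0, 1, 2)]:
    # computed position and actual value agree: 17, 22, 23
    print(idx, storage_index(sub, idx), sub[idx].item())
```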

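Next comes another concept: contiguity.

What is contiguity? Informally, a tensor is contiguous if, in the actual data region in memory, the element that follows any given element is also the element that follows it in the tensor itself, read row by row. OK, that's not very informal either. Let's use an example.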

Suppose we create a tensor:

tensor = torch.rand(2, 3)
'''
tensor([[0.1753, 0.6297, 0.7549],
        [0.3359, 0.2473, 0.4115]])
'''

The tensor is then stored in memory like this:

Index	Value
[0][0]	0.1753
[0][1]	0.6297
[0][2]	0.7549
[0][3]	0.3359
[0][4]	0.2473
[0][5]	0.4115

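We can see that the location indices in memory and the values correspond one to one. In the tensor, the element after 0.1753 is 0.6297, and in memory 0.6297 also sits right next to 0.1753. This is what we call contiguous.

So when does a tensor become non-contiguous? For example, when we transpose a tensor, the result still shares the actual data region with the original tensor. In other words, the positions of the elements in the data region do not change at all and still look like the table above, yet the new tensor is different from the old one.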

tensor_t = tensor.t()
'''
tensor([[0.1753, 0.3359],
        [0.6297, 0.2473],
        [0.7549, 0.4115]])
'''

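In memory, the element after 0.1753 is 0.6297, but in the transposed tensor it is 0.3359. The order of the elements in the tensor no longer matches their order in memory; this is called non-contiguous.

So what is the difference between view() and reshape()? With the above in mind, it is easy to explain: view() only works when the new shape can be expressed with the tensor's existing strides, which in practice usually means a contiguous tensor, while reshape() works on both contiguous and non-contiguous tensors. When the data cannot be viewed in the new shape without being reordered, reshape() copies it, so the result no longer shares memory with the original.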
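The difference can be observed directly. A small sketch assuming only PyTorch: Tensor.is_contiguous() reports contiguity, view() on the transposed tensor raises a RuntimeError, and reshape() succeeds by copying.

```python
import torch

tensor = torch.rand(2, 3)
tensor_t = tensor.t()  # the transpose shares storage but is not contiguous

print(tensor.is_contiguous(), tensor_t.is_contiguous())  # True False
print(tensor.stride(), tensor_t.stride())                # (3, 1) (1, 3)

try:
    tensor_t.view(6)  # the new shape cannot be described by the existing strides
except RuntimeError as error:
    print("view failed:", type(error).__name__)

flat = tensor_t.reshape(6)  # reshape() falls back to copying the data
print(flat.data_ptr() == tensor.data_ptr())  # False: a copy was made

# Making the tensor contiguous first (which copies) lets view() work again
flat_view = tensor_t.contiguous().view(6)
print(torch.equal(flat, flat_view))  # True
```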

Stack

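torch.stack() stacks existing tensors along a new dimension to form a new tensor. Its basic usage is:

torch.stack(tensors, dim=0)

The tensors argument takes a tuple (or list) of the tensors to stack, and dim specifies the position of the new dimension along which the stacking happens. For example: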

tensor1 = torch.tensor([[1, 2, 3, 4],
                        [5, 6, 7, 8],
                        [9, 10, 11, 12]])
tensor2 = torch.tensor([[13, 14, 15, 16],
                        [17, 18, 19, 20],
                        [21, 22, 23, 24]])

stacked_tensors = torch.stack((tensor1, tensor2), dim=0)
print(stacked_tensors)
## Result
'''
tensor([[[ 1,  2,  3,  4],
         [ 5,  6,  7,  8],
         [ 9, 10, 11, 12]],

        [[13, 14, 15, 16],
         [17, 18, 19, 20],
         [21, 22, 23, 24]]])
'''

stacked_tensors = torch.stack((tensor1, tensor2), dim=1)
print(stacked_tensors)
## Result
'''
tensor([[[ 1,  2,  3,  4],
         [13, 14, 15, 16]],

        [[ 5,  6,  7,  8],
         [17, 18, 19, 20]],

        [[ 9, 10, 11, 12],
         [21, 22, 23, 24]]])
'''

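Stacking requires that:

all input tensors have exactly the same shape;
dim is no greater than the number of dimensions of the input tensors (it can range from 0 to ndim, because stacking adds one dimension).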
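A quick sketch of how dim controls where the new dimension appears, using the same values as tensor1 and tensor2 above (rebuilt with arange so the block runs on its own):

```python
import torch

tensor1 = torch.arange(1, 13).reshape(3, 4)   # same values as tensor1 above
tensor2 = torch.arange(13, 25).reshape(3, 4)  # same values as tensor2 above

# For two (3, 4) tensors, the new dimension (size 2) can go at index 0, 1 or 2
for dim in range(3):
    print(dim, torch.stack((tensor1, tensor2), dim=dim).shape)
# 0 torch.Size([2, 3, 4])
# 1 torch.Size([3, 2, 4])
# 2 torch.Size([3, 4, 2])

# dim=2 pairs up the elements at the same position
print(torch.stack((tensor1, tensor2), dim=2)[0, 0])  # tensor([ 1, 13])

# Tensors of different shapes cannot be stacked
try:
    torch.stack((tensor1, tensor2[:, :3]))
except RuntimeError as error:
    print("stack failed:", type(error).__name__)
```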

Squeeze and unsqueeze

torch.squeeze() removes all dimensions of size 1 from a tensor.

tensor2 = torch.tensor([[[1, 2, 3, 4, 5]]])
print(tensor2)
print(tensor2.shape)
print(torch.squeeze(tensor2))
print(tensor2.squeeze().shape)
## Result
'''
tensor([[[1, 2, 3, 4, 5]]])
torch.Size([1, 1, 5])
tensor([1, 2, 3, 4, 5])
torch.Size([5])
'''

torch.unsqueeze() inserts a dimension of size 1 into an existing tensor at the position given by dim.

tensor3 = torch.tensor([1, 2, 3, 4, 5])
print(tensor3.unsqueeze(dim=0))
## Result: tensor([[1, 2, 3, 4, 5]])
print(tensor3.unsqueeze(dim=1))
## Result:
'''
tensor([[1],
        [2],
        [3],
        [4],
        [5]])
'''

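Note that the value of dim cannot be greater than the number of dimensions of the original tensor: for the 1-D tensor above, dim can be 0 or 1 (or a negative index counting from the end).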
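squeeze() can also target a single dimension, and unsqueeze() accepts negative indices. A minimal sketch assuming only PyTorch; x and v are illustrative names.

```python
import torch

x = torch.zeros(1, 3, 1, 2)

print(x.squeeze().shape)   # torch.Size([3, 2]): every size-1 dim removed
print(x.squeeze(0).shape)  # torch.Size([3, 1, 2]): only dim 0 removed
print(x.squeeze(1).shape)  # torch.Size([1, 3, 1, 2]): dim 1 has size 3, nothing happens

# unsqueeze accepts dim in [-(ndim + 1), ndim]; a negative dim counts from the end
v = torch.tensor([1, 2, 3, 4, 5])
print(v.unsqueeze(-1).shape)  # torch.Size([5, 1]), same as unsqueeze(dim=1)

try:
    v.unsqueeze(dim=2)  # ndim is 1, so 2 is out of range
except (IndexError, RuntimeError) as error:  # caught broadly on purpose
    print("unsqueeze failed:", type(error).__name__)
```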

Permute

torch.permute() rearranges the axes (dimensions) of an existing tensor into a specified order.

tensor = torch.rand(size = (224, 224, 3))

## Reorder the axes: 2 -> 0, 0 -> 1, 1 -> 2
print(tensor.permute(2, 0, 1).shape)
## Result: torch.Size([3, 224, 224])
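The (224, 224, 3) shape above looks like an image stored as height × width × colour channels. Assuming that interpretation, permute(2, 0, 1) moves the channels first, the layout PyTorch's convolution layers (nn.Conv2d) work with. Like view(), permute() returns a view that shares memory with the original:

```python
import torch

image_hwc = torch.rand(size=(224, 224, 3))  # assumed: height, width, colour channels
image_chw = image_hwc.permute(2, 0, 1)      # colour channels, height, width

print(image_chw.shape)  # torch.Size([3, 224, 224])

# permute() returns a view: no data is copied, so edits show through
image_hwc[0, 0, 0] = 123.0
print(image_chw[0, 0, 0].item())  # 123.0
print(image_chw.is_contiguous())  # False, so view() may need contiguous() first
```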

Indexing

In the basic case, we can get anything inside a tensor with tensor[i][j][k] or tensor[i, j, k].

One thing to note is that : selects all elements along a specific dimension. For example:

tensor = torch.tensor([[1, 2, 3, 4],
                        [5, 6, 7, 8],
                        [9, 10, 11, 12]])
print(tensor[:, 1])
## Result: tensor([ 2,  6, 10])
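A few more indexing patterns, sketched on the same tensor and assuming only PyTorch: negative indices, slices with an exclusive stop, steps, and boolean masks.

```python
import torch

tensor = torch.tensor([[1, 2, 3, 4],
                       [5, 6, 7, 8],
                       [9, 10, 11, 12]])

print(tensor[1])         # tensor([5, 6, 7, 8]): the second row
print(tensor[1, 2])      # tensor(7): same element as tensor[1][2]
print(tensor[:, -1])     # tensor([ 4,  8, 12]): the last column
print(tensor[0:2, 1:3])  # rows 0-1, columns 1-2 (the stop index is exclusive)
# tensor([[2, 3],
#         [6, 7]])
print(tensor[::2, ::3])  # rows 0 and 2, columns 0 and 3
print(tensor[tensor > 6])  # boolean mask: tensor([ 7,  8,  9, 10, 11, 12])
```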

With NumPy

PyTorch provides a bridge to NumPy. We can create a tensor from a NumPy ndarray with torch.from_numpy().

import torch
import numpy as np

array = np.arange(1, 10)
tensor = torch.from_numpy(array)

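Note that NumPy's default floating-point type is float64, so a tensor converted from a floating-point ndarray has dtype torch.float64 rather than PyTorch's default torch.float32. (The array above holds integers, so its tensor keeps NumPy's integer type, typically torch.int64.)

torch.from_numpy() has no dtype argument, so to change the type, convert afterwards, for example with tensor.type(torch.float32), or create the tensor with torch.tensor(array, dtype=torch.float32).

Use tensor.numpy() to convert a tensor into an ndarray:

tensor = torch.ones(7)
numpy_tensor = tensor.numpy()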
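Both directions of the bridge share memory on the CPU, which can be surprising. A short sketch assuming NumPy and PyTorch are installed; the variable names are illustrative.

```python
import numpy as np
import torch

array = np.arange(1.0, 4.0)      # float64 array: [1. 2. 3.]
tensor = torch.from_numpy(array)  # shares memory with the array
print(tensor.dtype)               # torch.float64

tensor_f32 = tensor.type(torch.float32)  # converted copy, no longer shared

array[0] = 100.0
print(tensor[0].item())      # 100.0: from_numpy() shares memory
print(tensor_f32[0].item())  # 1.0: the converted copy is independent

ones = torch.ones(7)
numpy_ones = ones.numpy()    # also shares memory (CPU tensors only)
ones.add_(1)                 # in-place change on the tensor
print(numpy_ones)            # [2. 2. 2. 2. 2. 2. 2.]
print(numpy_ones.dtype)      # float32
```

If you need an independent copy, convert with torch.tensor(array) or call .copy() on the NumPy side.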