【PyTorch Deep Learning Notes】Representing Real-World Data with Tensors

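Hi everyone, I'm LukeTseng, and thanks for opening this note. These notes mainly follow the book Deep Learning with PyTorch, with online resources as a supplement. This series is where I start building my deep learning foundations, so if you spot any mistakes, please let me know. Thanks!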

This post covers Chapter 4 of Deep Learning with PyTorch, Real-world data representation using tensors.

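Working with images

How is a color photo represented with numbers?

As a tensor. Suppose we have a photo that is 800 pixels wide and 600 pixels tall:

Each pixel has 3 color channels: RGB.
The whole photo can be represented as a tensor of shape (3, 600, 800).
- First dimension: the 3 color channels.
- Second and third dimensions: the image's height and width.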

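The book's example uses the imageio module, but I'm more used to PIL, so here is the PIL version.

I'll use a picture of a sunflower as the example (the image in this post is compressed; download the original file to reproduce the results below):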

[Image: Sunflower_from_Silesia2.jpg]

Image source: https://commons.wikimedia.org/wiki/File:Sunflower_from_Silesia2.jpg

Code (run in a Jupyter Notebook):

import torch
from PIL import Image
from torchvision import transforms

img = Image.open('Sunflower_from_Silesia2.jpg')
img_tensor = transforms.PILToTensor()(img)
print(img_tensor.shape)

Output:

torch.Size([3, 1697, 2434])

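NumPy and PIL usually lay images out as HWC: Height, Width, Channel. PyTorch tensors use CHW order instead, because PyTorch's convolution layers (such as nn.Conv2d) expect channel-first input.

Internally, transforms.PILToTensor() rearranges the dimensions with permute(2, 0, 1), turning HWC into CHW.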
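To see what that rearrangement does, here is a small sketch that uses a random NumPy array as a stand-in for np.array(img) of an 800×600 photo (the array is hypothetical, not the sunflower) and applies the same permute(2, 0, 1) by hand:

```python
import numpy as np
import torch

# Hypothetical stand-in for np.array(img): 600 rows (height), 800 columns (width), 3 channels
hwc = np.random.randint(0, 256, size=(600, 800, 3), dtype=np.uint8)

# Move the channel axis to the front: (H, W, C) -> (C, H, W)
chw = torch.from_numpy(hwc).permute(2, 0, 1)

print(hwc.shape)  # (600, 800, 3)
print(chw.shape)  # torch.Size([3, 600, 800])

# permute only reorders the axes: the pixel at row 10, column 20 keeps its channel values
assert int(chw[1, 10, 20]) == int(hwc[10, 20, 1])
```

torch.from_numpy shares memory with the array and permute returns a view, so no pixel data is copied here.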

You can also print the tensor itself:

import torch
from PIL import Image
from torchvision import transforms

img = Image.open('Sunflower_from_Silesia2.jpg')
img_tensor = transforms.PILToTensor()(img)
print(img_tensor)

Output:

tensor([[[ 52,  56,  53,  ...,  40,  36,  37],
         [ 50,  53,  56,  ...,  41,  37,  38],
         [ 51,  51,  52,  ...,  40,  38,  38],
         ...,
         [ 49,  50,  48,  ...,  35,  35,  35],
         [ 48,  47,  50,  ...,  35,  37,  36],
         [ 51,  48,  49,  ...,  37,  40,  39]],

        [[104, 108, 105,  ...,  87,  91,  89],
         [105, 105, 108,  ...,  89,  89,  91],
         [106, 103, 103,  ...,  88,  89,  89],
         ...,
         [ 94,  94,  94,  ...,  81,  80,  80],
         [ 94,  93,  96,  ...,  81,  81,  80],
         [ 97,  94,  95,  ...,  81,  81,  80]],

        [[188, 194, 191,  ..., 177, 173, 172],
         [188, 191, 194,  ..., 174, 172, 171],
         [189, 189, 192,  ..., 173, 170, 170],
         ...,
         [179, 181, 180,  ..., 169, 171, 171],
         [182, 181, 182,  ..., 167, 170, 169],
         [183, 180, 181,  ..., 168, 171, 170]]], dtype=torch.uint8)

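The values in the tensor are the color intensity of each pixel, ranging from 0 to 255 (dtype=torch.uint8).

The three blocks that each open with a double bracket [[ are the three channels: R, G, and B.

torch.uint8 is an 8-bit unsigned integer, which covers exactly 0 to 255.

Normalization

Before training a deep learning model, we usually divide these values by 255 to map them to floating-point numbers between 0.0 and 1.0. This step is called normalization.

There are two ways to do this. The first is to divide by 255 directly, converting the tensor's data type to float first.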

import torch
from PIL import Image
from torchvision import transforms

img = Image.open('Sunflower_from_Silesia2.jpg')
img_tensor = transforms.PILToTensor()(img)

img_normalized = img_tensor.float() / 255.0
print(img_normalized)
print(img_normalized.min(), img_normalized.max())

Output:

tensor([[[0.2039, 0.2196, 0.2078,  ..., 0.1569, 0.1412, 0.1451],
         [0.1961, 0.2078, 0.2196,  ..., 0.1608, 0.1451, 0.1490],
         [0.2000, 0.2000, 0.2039,  ..., 0.1569, 0.1490, 0.1490],
         ...,
         [0.1922, 0.1961, 0.1882,  ..., 0.1373, 0.1373, 0.1373],
         [0.1882, 0.1843, 0.1961,  ..., 0.1373, 0.1451, 0.1412],
         [0.2000, 0.1882, 0.1922,  ..., 0.1451, 0.1569, 0.1529]],

        [[0.4078, 0.4235, 0.4118,  ..., 0.3412, 0.3569, 0.3490],
         [0.4118, 0.4118, 0.4235,  ..., 0.3490, 0.3490, 0.3569],
         [0.4157, 0.4039, 0.4039,  ..., 0.3451, 0.3490, 0.3490],
         ...,
         [0.3686, 0.3686, 0.3686,  ..., 0.3176, 0.3137, 0.3137],
         [0.3686, 0.3647, 0.3765,  ..., 0.3176, 0.3176, 0.3137],
         [0.3804, 0.3686, 0.3725,  ..., 0.3176, 0.3176, 0.3137]],

        [[0.7373, 0.7608, 0.7490,  ..., 0.6941, 0.6784, 0.6745],
         [0.7373, 0.7490, 0.7608,  ..., 0.6824, 0.6745, 0.6706],
         [0.7412, 0.7412, 0.7529,  ..., 0.6784, 0.6667, 0.6667],
         ...,
         [0.7020, 0.7098, 0.7059,  ..., 0.6627, 0.6706, 0.6706],
         [0.7137, 0.7098, 0.7137,  ..., 0.6549, 0.6667, 0.6627],
         [0.7176, 0.7059, 0.7098,  ..., 0.6588, 0.6706, 0.6667]]])
tensor(0.) tensor(1.)
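Is the .float() call strictly needed for the division? A small sketch with made-up pixel values: in recent PyTorch versions, dividing a uint8 tensor by 255.0 already promotes the result to float32, so .float() mainly makes the intent explicit. It matters more for other arithmetic, where uint8 values silently wrap around:

```python
import torch

pixels = torch.tensor([0, 128, 200, 255], dtype=torch.uint8)  # made-up pixel values

scaled = pixels / 255.0          # true division promotes the result to float32
print(scaled.dtype)              # torch.float32

brighter = pixels + 100          # stays uint8, so 200 + 100 wraps around to 44
print(brighter)

safe = pixels.float() + 100      # converting first keeps the real values
print(safe)
```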

The second way is to use ToTensor() in place of PILToTensor(); it normalizes automatically (and also converts HWC to CHW):

import torch
from PIL import Image
from torchvision import transforms

img = Image.open('Sunflower_from_Silesia2.jpg')

img_tensor = transforms.ToTensor()(img)
print(img_tensor)
print(img_tensor.dtype)

Output:

tensor([[[0.2039, 0.2196, 0.2078,  ..., 0.1569, 0.1412, 0.1451],
         [0.1961, 0.2078, 0.2196,  ..., 0.1608, 0.1451, 0.1490],
         [0.2000, 0.2000, 0.2039,  ..., 0.1569, 0.1490, 0.1490],
         ...,
         [0.1922, 0.1961, 0.1882,  ..., 0.1373, 0.1373, 0.1373],
         [0.1882, 0.1843, 0.1961,  ..., 0.1373, 0.1451, 0.1412],
         [0.2000, 0.1882, 0.1922,  ..., 0.1451, 0.1569, 0.1529]],

        [[0.4078, 0.4235, 0.4118,  ..., 0.3412, 0.3569, 0.3490],
         [0.4118, 0.4118, 0.4235,  ..., 0.3490, 0.3490, 0.3569],
         [0.4157, 0.4039, 0.4039,  ..., 0.3451, 0.3490, 0.3490],
         ...,
         [0.3686, 0.3686, 0.3686,  ..., 0.3176, 0.3137, 0.3137],
         [0.3686, 0.3647, 0.3765,  ..., 0.3176, 0.3176, 0.3137],
         [0.3804, 0.3686, 0.3725,  ..., 0.3176, 0.3176, 0.3137]],

        [[0.7373, 0.7608, 0.7490,  ..., 0.6941, 0.6784, 0.6745],
         [0.7373, 0.7490, 0.7608,  ..., 0.6824, 0.6745, 0.6706],
         [0.7412, 0.7412, 0.7529,  ..., 0.6784, 0.6667, 0.6667],
         ...,
         [0.7020, 0.7098, 0.7059,  ..., 0.6627, 0.6706, 0.6706],
         [0.7137, 0.7098, 0.7137,  ..., 0.6549, 0.6667, 0.6627],
         [0.7176, 0.7059, 0.7098,  ..., 0.6588, 0.6706, 0.6667]]])
torch.float32

There is one more approach that is common in deep learning: a more advanced normalization that maps values to [-1, 1]:

import torch
from PIL import Image
from torchvision import transforms

img = Image.open('Sunflower_from_Silesia2.jpg')

transform = transforms.Compose([
    transforms.ToTensor(),  # to [0, 1]
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])  # to [-1, 1]
])

img_tensor = transform(img)
print(img_tensor)

Output:

tensor([[[-0.5922, -0.5608, -0.5843,  ..., -0.6863, -0.7176, -0.7098],
         [-0.6078, -0.5843, -0.5608,  ..., -0.6784, -0.7098, -0.7020],
         [-0.6000, -0.6000, -0.5922,  ..., -0.6863, -0.7020, -0.7020],
         ...,
         [-0.6157, -0.6078, -0.6235,  ..., -0.7255, -0.7255, -0.7255],
         [-0.6235, -0.6314, -0.6078,  ..., -0.7255, -0.7098, -0.7176],
         [-0.6000, -0.6235, -0.6157,  ..., -0.7098, -0.6863, -0.6941]],

        [[-0.1843, -0.1529, -0.1765,  ..., -0.3176, -0.2863, -0.3020],
         [-0.1765, -0.1765, -0.1529,  ..., -0.3020, -0.3020, -0.2863],
         [-0.1686, -0.1922, -0.1922,  ..., -0.3098, -0.3020, -0.3020],
         ...,
         [-0.2627, -0.2627, -0.2627,  ..., -0.3647, -0.3725, -0.3725],
         [-0.2627, -0.2706, -0.2471,  ..., -0.3647, -0.3647, -0.3725],
         [-0.2392, -0.2627, -0.2549,  ..., -0.3647, -0.3647, -0.3725]],

        [[ 0.4745,  0.5216,  0.4980,  ...,  0.3882,  0.3569,  0.3490],
         [ 0.4745,  0.4980,  0.5216,  ...,  0.3647,  0.3490,  0.3412],
         [ 0.4824,  0.4824,  0.5059,  ...,  0.3569,  0.3333,  0.3333],
         ...,
         [ 0.4039,  0.4196,  0.4118,  ...,  0.3255,  0.3412,  0.3412],
         [ 0.4275,  0.4196,  0.4275,  ...,  0.3098,  0.3333,  0.3255],
         [ 0.4353,  0.4118,  0.4196,  ...,  0.3176,  0.3412,  0.3333]]])
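transforms.Normalize computes (x - mean) / std for each channel, so with mean = std = 0.5 the range [0, 1] maps onto [-1, 1]. The sketch below applies that formula by hand to a random stand-in image (not the sunflower) and checks the first pixel value from the output above:

```python
import torch

# Random stand-in for an image already scaled to [0, 1], shape (C, H, W)
img = torch.rand(3, 4, 5)

# Reshape mean/std to (3, 1, 1) so they broadcast over height and width
mean = torch.tensor([0.5, 0.5, 0.5]).view(3, 1, 1)
std = torch.tensor([0.5, 0.5, 0.5]).view(3, 1, 1)

normalized = (img - mean) / std   # the per-channel formula Normalize applies
print(normalized.min().item(), normalized.max().item())  # both within [-1, 1]

# First red value of the sunflower is 0.2039 after ToTensor()
print((0.2039 - 0.5) / 0.5)       # about -0.5922, as in the output above
```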

3D images: Volumetric data

Medical scans such as CT and MRI produce 3D image data.

For example, a CT scan might have 100 to 500 slices, each a 512×512 image, so the tensor shape could be (100, 512, 512).

Below is the author's example (from dlwpt-code/p1ch4/2_volumetric_ct.ipynb at master · deep-learning-with-pytorch/dlwpt-code):

import numpy as np
import torch
import imageio

torch.set_printoptions(edgeitems=2, threshold=50)
dir_path = "./data/p1ch4/volumetric-dicom/2-LUNG 3.0  B70f-04083"
vol_arr = imageio.volread(dir_path, 'DICOM')
print(vol_arr.shape)

vol = torch.from_numpy(vol_arr).float()
vol = torch.unsqueeze(vol, 0)

print(vol.shape)

%matplotlib inline
import matplotlib.pyplot as plt

plt.imshow(vol_arr[50])  # show the slice at index 50
plt.show()


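imageio can read DICOM files directly; PIL cannot, so a PIL-based workflow needs a separate library such as pydicom.

DICOM (Digital Imaging and Communications in Medicine) is the standard format and protocol for medical digital images.

torch.set_printoptions(edgeitems=2, threshold=50)

This sets how tensors are printed: any tensor with more than 50 elements (threshold=50) is summarized, showing only the first and last 2 elements of each dimension (edgeitems=2) and eliding the middle.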


dir_path = "./data/p1ch4/volumetric-dicom/2-LUNG 3.0  B70f-04083"
vol_arr = imageio.volread(dir_path, 'DICOM')
print(vol_arr.shape)

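This is the code that reads the DICOM files.

imageio.volread(dir_path, 'DICOM') reads all the DICOM files in the folder into a single volume.

vol_arr.shape is (99, 512, 512): 99 CT slices, each 512×512 pixels.

Converting it to a tensor works just like with the images above: torch.from_numpy(vol_arr).float() turns the NumPy array into a float tensor.

vol = torch.unsqueeze(vol, 0) adds a dimension at position 0, changing the shape from (99, 512, 512) to (1, 99, 512, 512).

This extra dimension is the channel dimension: CT data has a single (grayscale) channel, so the volume becomes C × D × H × W. Stacking several volumes would then add a batch dimension in front, giving N × C × D × H × W.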
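A shape-only sketch of the two steps, using random numbers in place of real CT data and a smaller 64×64 slice size (an assumption to keep memory use low): unsqueeze adds the channel dimension, and torch.stack then adds the batch dimension.

```python
import torch

# Random stand-in for vol_arr: 99 slices of 64x64 (the real scan is 99 x 512 x 512)
vol = torch.randn(99, 64, 64)

vol = torch.unsqueeze(vol, 0)           # add the channel dim -> (C, D, H, W)
print(vol.shape)                        # torch.Size([1, 99, 64, 64])

# Stacking several volumes adds the batch dim in front -> (N, C, D, H, W)
batch = torch.stack([vol, vol.clone()])
print(batch.shape)                      # torch.Size([2, 1, 99, 64, 64])
```

This N × C × D × H × W layout is the one PyTorch's 3D convolution layer, nn.Conv3d, expects.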

Tabular data

The most common kind is two-dimensional tabular data, such as .csv and .xlsx files. How do we represent it with tensors? As follows (the example uses pandas to read the data):

import pandas as pd
import torch

data = pd.read_csv('data.csv')

# Separate features and labels
input_data = data.iloc[:, :-1]  # every column except the last is a feature
output_data = data.iloc[:, -1]   # the last column is the label

# Convert to tensors (torch.tensor infers the dtype unless one is given)
input_tensor = torch.tensor(input_data.to_numpy(), dtype=torch.float32)
output_tensor = torch.tensor(output_data.to_numpy())

print('Input:', input_tensor.shape, input_tensor.dtype)
print('Output:', output_tensor.shape, output_tensor.dtype)

print(f'input_tensor = {input_tensor}')
print(f'output_tensor = {output_tensor}')

The data.csv file used here:

area,rooms,age,distance_to_station,price
85.5,3,10,500,15000000
120.0,4,5,300,25000000
65.2,2,15,800,12000000
95.8,3,8,450,18000000
110.5,4,3,200,28000000

Output:

Input: torch.Size([5, 4]) torch.float32
Output: torch.Size([5]) torch.int64
input_tensor = tensor([[ 85.5000,   3.0000,  10.0000, 500.0000],
        [120.0000,   4.0000,   5.0000, 300.0000],
        [ 65.2000,   2.0000,  15.0000, 800.0000],
        [ 95.8000,   3.0000,   8.0000, 450.0000],
        [110.5000,   4.0000,   3.0000, 200.0000]])
output_tensor = tensor([15000000, 25000000, 12000000, 18000000, 28000000])

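We split the data this way because the goal is to predict the house price (the output), while the first four columns are factors that affect the price, so they serve as the input features.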

We can wrap this data in a TensorDataset to make it easier to train a model later. TensorDataset and DataLoader are imported from torch.utils.data:

from torch.utils.data import TensorDataset, DataLoader

# Build a TensorDataset
dataset = TensorDataset(input_tensor, output_tensor)

# Build a DataLoader to read the data in batches
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)

for batch_idx, (features, labels) in enumerate(dataloader):
    print(f'\nBatch {batch_idx + 1}:')
    print(f'Feature shape: {features.shape}')
    print(f'Label shape: {labels.shape}')
    print(f'Feature sample:\n{features[:2]}')  # first 2 rows
    print(f'Label sample: {labels[:2]}')

Output:


Batch 1:
Feature shape: torch.Size([5, 4])
Label shape: torch.Size([5])
Feature sample:
tensor([[ 95.8000,   3.0000,   8.0000, 450.0000],
        [120.0000,   4.0000,   5.0000, 300.0000]])
Label sample: tensor([18000000, 25000000])

There is only one batch because the .csv has very few rows: with batch_size=32, all 5 rows fit into a single batch, so the output shows only Batch 1. Because shuffle=True, which rows appear in the sample varies from run to run.
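To see more than one batch, here is a sketch that types the same five rows from data.csv in directly and uses batch_size=2 with shuffle=False (both choices are just for illustration). The five rows split into batches of 2, 2 and 1, because DataLoader keeps the smaller last batch unless drop_last=True.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# The five rows of data.csv, typed in directly so the sketch is self-contained
features = torch.tensor([[85.5, 3, 10, 500],
                         [120.0, 4, 5, 300],
                         [65.2, 2, 15, 800],
                         [95.8, 3, 8, 450],
                         [110.5, 4, 3, 200]], dtype=torch.float32)
labels = torch.tensor([15000000, 25000000, 12000000, 18000000, 28000000])

dataset = TensorDataset(features, labels)
loader = DataLoader(dataset, batch_size=2, shuffle=False)  # small batch size -> several batches

batch_sizes = [x.shape[0] for x, y in loader]
print(batch_sizes)  # [2, 2, 1]
```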

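Handling categorical (non-numeric) data: One-hot encoding

Suppose a column records the weather:

1 = sunny
2 = cloudy
3 = rain
4 = snow

If we use 1, 2, 3, 4 directly, the model may assume that 4 > 3 > 2 > 1, even though weather types have no such order.

The solution is to encode these values with one-hot encoding.

Suppose the weather values are [1, 2, 3, 4, 1].

After one-hot encoding:

1 → [1, 0, 0, 0]
2 → [0, 1, 0, 0]
3 → [0, 0, 1, 0]
4 → [0, 0, 0, 1]
1 → [1, 0, 0, 0]

In PyTorch, one-hot encoding is done with the torch.nn.functional.one_hot() function.

To use it, first import: import torch.nn.functional as F

Example: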

import torch
import torch.nn.functional as F

# Original weather data: 1=sunny, 2=cloudy, 3=rain, 4=snow
weather_data = torch.tensor([1, 2, 3, 4, 1])

# one_hot expects 0-based class indices, so subtract 1
# (alternatively, encode the categories as 0-3 from the start)
weather_indices = weather_data - 1  # becomes [0, 1, 2, 3, 0]

# One-hot encode; num_classes=4 means there are 4 categories
weather_onehot = F.one_hot(weather_indices, num_classes=4)

print("Original weather data:")
print(weather_data)
print("\nConverted indices:")
print(weather_indices)
print("\nOne-hot encoding result:")
print(weather_onehot)
print("\nShape:", weather_onehot.shape)  # torch.Size([5, 4])

Output:

Original weather data:
tensor([1, 2, 3, 4, 1])

Converted indices:
tensor([0, 1, 2, 3, 0])

One-hot encoding result:
tensor([[1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 1, 0],
        [0, 0, 0, 1],
        [1, 0, 0, 0]])

Shape: torch.Size([5, 4])

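Why is there an extra 1 after 1, 2, 3, 4? It shows that a category that appears again gets exactly the same vector (the first and last rows are identical). Real data is full of such repeats, and that is how the model can pick up statistical properties such as one kind of weather being more common than another.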


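The four values in each row stand for sunny, cloudy, rain, and snow. If the value is sunny, the first position is 1 and the rest are 0, and so on.

One-hot encoding

One-hot encoding is a common method in ML for converting categorical data into numbers.

The idea is to build a binary vector for each category, in which exactly one position is 1 and every other position is 0, just like the output in the example above.

Pros:

Categories are not mistaken for having a numeric order.
The model can learn each category on an equal footing.
Each category stays independent, with no implied ordering.

Cons:

When there are many categories, the number of feature dimensions grows dramatically (curse of dimensionality).
The result is a sparse matrix, which can waste computation and memory.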
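Since each one-hot vector has a single 1, it is simply a row of the identity matrix. This sketch builds the same result by indexing torch.eye (an alternative construction for illustration, not a claim about how F.one_hot works internally) and compares it with F.one_hot:

```python
import torch
import torch.nn.functional as F

indices = torch.tensor([0, 1, 2, 3, 0])  # the 0-based weather indices from the example above
num_classes = 4

# Row i of the identity matrix is exactly the one-hot vector for class i
onehot_eye = torch.eye(num_classes, dtype=torch.int64)[indices]

print(onehot_eye)
print(torch.equal(onehot_eye, F.one_hot(indices, num_classes)))  # True
```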

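Handling time series data

In a time series, the order of the data matters, because each point depends on the ones before it. Tabular data is the opposite: row order usually doesn't matter much, since each row is independent.

Real-world examples include stock price prediction and weather forecasting. In the former, today's price may be influenced by yesterday's; in the latter, if it rains today, it may well keep raining tomorrow.

~~I couldn't find a better dataset to replace the book's example, so I'll just use the book's XD.~~

Code example: https://github.com/deep-learning-with-pytorch/dlwpt-code/blob/master/p1ch4/4_time_series_bikes.ipynb

The dataset is the hourly rental count of the Washington, D.C. bike sharing system (Capital Bikeshare) for 2011-2012, along with weather and season information.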

import numpy as np
import torch

torch.set_printoptions(edgeitems=2, threshold=50, linewidth=75)

bikes_numpy = np.loadtxt(
    "./data/p1ch4/bike-sharing-dataset/hour-fixed.csv", 
    dtype=np.float32, 
    delimiter=",", 
    skiprows=1, 
    converters={1: lambda x: float(x[8:10])})
bikes = torch.from_numpy(bikes_numpy)
bikes

In the code below, skiprows=1 skips the header row, and converters={1: lambda x: float(x[8:10])} converts the date string in column 1 into a day number.

bikes_numpy = np.loadtxt(
    "./data/p1ch4/bike-sharing-dataset/hour-fixed.csv", 
    dtype=np.float32, 
    delimiter=",", 
    skiprows=1, 
    converters={1: lambda x: float(x[8:10])})

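Why turn the date string into a number, and what does the conversion produce? See these examples:

Original date string	`x[8:10]`	`float(x[8:10])`
"2011-01-01"	"01"	1.0
"2011-01-15"	"15"	15.0
"2011-01-31"	"31"	31.0
"2012-12-05"	"05"	5.0

After conversion, the "day number" is the day of the month (1-31).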
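A self-contained sketch of the same converter, run on a tiny CSV held in memory. The column names and values here are made up for illustration; only the converters trick comes from the book's code:

```python
import io
import numpy as np

# Hypothetical two-row CSV: an index, a date string, and a count
csv_text = "instant,dteday,cnt\n1,2011-01-01,16\n2,2011-01-15,40\n"

arr = np.loadtxt(
    io.StringIO(csv_text),
    dtype=np.float32,
    delimiter=",",
    skiprows=1,                                # skip the header row
    converters={1: lambda x: float(x[8:10])},  # "2011-01-15" -> 15.0
)
print(arr)  # column 1 now holds the day of the month
```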

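The reason for this design is to simplify the data: the network doesn't need the full date, only the day of the month.

It also reduces dimensionality: a single number (1-31) is simpler than a full date string.

Finally, it preserves periodic information: days of the month repeat every month, which can help the model learn patterns.

The output looks like this: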

tensor([[1.0000e+00, 1.0000e+00,  ..., 1.3000e+01, 1.6000e+01],
        [2.0000e+00, 1.0000e+00,  ..., 3.2000e+01, 4.0000e+01],
        ...,
        [1.7378e+04, 3.1000e+01,  ..., 4.8000e+01, 6.1000e+01],
        [1.7379e+04, 3.1000e+01,  ..., 3.7000e+01, 4.9000e+01]])

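Checking its shape gives:

torch.Size([17520, 17])

This means:

17,520 rows: 17,520 hours (730 days × 24 hours)
17 columns: 17 features (variables)

The notebook then calls bikes.stride(). Stride means "step size", and here the output is (17, 1), which means:

Moving 1 step along the first dimension (rows) → moving 17 positions in storage (each row has 17 numbers)
Moving 1 step along the second dimension (columns) → moving 1 position in storage

What is a stride? In a multidimensional tensor, it is the number of elements you must skip in the underlying one-dimensional storage to get from one element to the next element along the same dimension.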
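A small sketch of how strides locate an element: for a contiguous 2D tensor, element [i, j] sits at storage offset i * stride(0) + j * stride(1). The last line only reuses the bikes shape on a zero tensor to confirm the (17, 1) strides:

```python
import torch

t = torch.arange(6).view(2, 3)     # storage holds 0, 1, 2, 3, 4, 5 in order
print(t.stride())                  # (3, 1)

i, j = 1, 2
offset = i * t.stride(0) + j * t.stride(1)  # 1*3 + 2*1 = 5
print(t.flatten()[offset].item(), t[i, j].item())  # 5 5

print(torch.zeros(17520, 17).stride())  # (17, 1), same as bikes.stride()
```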

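Adding a time dimension

Next we add a time dimension. The data is one long sequence (17,520 hours), which we can group by day so the network can learn patterns within a day (such as the 8 a.m. commute and people leaving work at 6 p.m.).

Concretely, we reshape from (17520, 17) to (730, 24, 17): 730 days, 24 hours, and 17 features.

In code, .view() reshapes the tensor:

daily_bikes = bikes.view(-1, 24, bikes.shape[1])
daily_bikes.shape, daily_bikes.stride()

The result is (torch.Size([730, 24, 17]), (408, 17, 1)).

The -1 argument to .view() means "compute this dimension automatically". PyTorch works out $17520 \div 24 = 730$.

The second argument, 24, fixes the second dimension at 24 hours, and the third, bikes.shape[1], fixes the third at 17 features. The stride of 408 on the first dimension follows from this: moving to the next day skips $24 \times 17 = 408$ elements.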
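The same reshape on a small stand-in (48 hours × 17 made-up features, i.e. two days) shows that -1 is inferred and that .view() shares memory with the original tensor instead of copying it:

```python
import torch

# Hypothetical stand-in for `bikes`: 48 hours (2 days) x 17 features
hours = torch.arange(48 * 17, dtype=torch.float32).view(48, 17)

daily = hours.view(-1, 24, hours.shape[1])   # -1 is inferred as 48 / 24 = 2
print(daily.shape, daily.stride())           # torch.Size([2, 24, 17]) (408, 17, 1)

# view() does not copy: day 1, hour 0 is row 24 of the original tensor
print(torch.equal(daily[1, 0], hours[24]))   # True
```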

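Reordering dimensions

Many networks, for example 1D convolution layers such as nn.Conv1d, expect the format (N, C, L):

N: number of samples = 730 days.
C: number of channels = 17 features.
L: sequence length = 24 hours.

The current order, however, is (N, L, C), so we transpose the second and third dimensions (indices 1 and 2):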

daily_bikes = daily_bikes.transpose(1, 2)
daily_bikes.shape, daily_bikes.stride()

Output:

(torch.Size([730, 17, 24]), (408, 1, 17))
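Note the stride (408, 1, 17): transpose does not move any data, it only swaps strides, so the result is no longer contiguous in memory. A sketch on a zero tensor of the same shape; calling .contiguous() makes a row-major copy, which some operations such as .view() need:

```python
import torch

daily = torch.zeros(730, 24, 17)   # stand-in with the (N, L, C) shape of daily_bikes
ncl = daily.transpose(1, 2)        # (N, C, L): dims 1 and 2 swapped, no data copied
print(ncl.shape, ncl.stride())     # torch.Size([730, 17, 24]) (408, 1, 17)
print(ncl.is_contiguous())         # False

ncl_c = ncl.contiguous()           # copies into a fresh row-major layout
print(ncl_c.stride())              # (408, 24, 1)
```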

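Preparing the training data

This part deals with categorical data. The dataset has a column called weathersit that describes the weather. It is an ordinal variable with 4 levels:

1 = good weather
2 = mist
3 = light rain/snow
4 = heavy rain/snow

We could treat it as a categorical variable and one-hot encode it, or use it directly as a continuous variable. The author chooses one-hot encoding here:

# Look only at the first day of data
first_day = bikes[:24].long()  # the first 24 hours
weather_onehot = torch.zeros(first_day.shape[0], 4)  # 24 hours × 4 weather levels

# Weather condition (weathersit) is column 9; these values become the indices
first_day[:, 9]

Output:

tensor([1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 2, 2, 2, 2])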

Next, the one-hot encoding:

weather_onehot.scatter_(
    dim=1,  # scatter along the columns
    index=first_day[:,9].unsqueeze(1).long() - 1,  # 0-based weather index for each row
    value=1.0)  # value to write

Output:

tensor([[1., 0., 0., 0.],
        [1., 0., 0., 0.],
        ...,
        [0., 1., 0., 0.],
        [0., 1., 0., 0.]])
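scatter_ writes value into each row at the column given by index, so row i gets a 1 at column weathersit[i] - 1. The sketch below copies the 24 weathersit values printed above and checks that scatter_ matches F.one_hot (the comparison is added for illustration and is not part of the book's code):

```python
import torch
import torch.nn.functional as F

# weathersit values of the first 24 hours, copied from the output above
weather = torch.tensor([1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1,
                        1, 2, 2, 2, 2, 2, 3, 3, 2, 2, 2, 2])

onehot = torch.zeros(weather.shape[0], 4)
# For each row i: onehot[i][weather[i] - 1] = 1.0
onehot.scatter_(dim=1, index=weather.unsqueeze(1) - 1, value=1.0)

print(torch.equal(onehot, F.one_hot(weather - 1, num_classes=4).float()))  # True
```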

Finally, concatenate it back onto the original data so the network gets a single tensor to work with:

torch.cat((bikes[:24], weather_onehot), 1)[:1]

Output:


tensor([[ 1.0000,  1.0000,  1.0000,  0.0000,  1.0000,  0.0000,  0.0000,
          6.0000,  0.0000,  1.0000,  0.2400,  0.2879,  0.8100,  0.0000,
          3.0000, 13.0000, 16.0000,  1.0000,  0.0000,  0.0000,  0.0000]])

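The rest of the original example does essentially the same thing; the only addition is a normalization step at the very end.

Representing text with tensors

In recent years deep learning has revolutionized NLP (Natural Language Processing). One family of networks used there is the RNN (recurrent neural network), applied to text classification, analysis, generation, machine translation, and more.

The catch is that neural networks only process numbers, that is, tensors. So how do we turn text into numbers?

Text can be processed at two levels: character level and word level.

Level	Unit processed	Pros	Cons
Character-level	One character at a time	Few distinct characters (e.g. the 26 English letters plus punctuation)	Each character carries little information
Word-level	One word at a time	Each word carries a lot of information	Very large vocabulary (and words never seen before must be handled)

Character-level One-Hot encoding

The book uses Jane Austen's "Pride and Prejudice" as its example.

The example code for this section is at https://github.com/deep-learning-with-pytorch/dlwpt-code/blob/master/p1ch4/5_text_jane_austen.ipynb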


# `text` holds the full text of the book, read from file earlier in the notebook
lines = text.split('\n')
line = lines[200]
line

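We pick one line of text to encode, once again with the familiar one-hot encoding.

letter_t = torch.zeros(len(line), 128)
letter_t.shape

Each character will become a vector of length 128 (ASCII defines 128 characters).

The output is torch.Size([70, 128]).

That is, this line has 70 characters, and each one is represented by a 128-dimensional vector.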


for i, letter in enumerate(line.lower().strip()):
    letter_index = ord(letter) if ord(letter) < 128 else 0  # ASCII code; non-ASCII characters map to 0
    letter_t[i][letter_index] = 1  # put a 1 at that position

This is character-level one-hot encoding.
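Because the notebook loads the whole book first, here is a self-contained version of the same loop, run on the line that is printed in the word-level section below (copied by hand, so treat it as an assumed stand-in for lines[200]). It reproduces the (70, 128) shape and shows the non-ASCII opening quote landing at index 0:

```python
import torch

line = "“Impossible, Mr. Bennet, impossible, when I am not acquainted with him"
letter_t = torch.zeros(len(line), 128)  # one 128-dim row per character

for i, letter in enumerate(line.lower().strip()):
    # ASCII characters keep their code point; anything else (e.g. the curly quote) goes to index 0
    letter_index = ord(letter) if ord(letter) < 128 else 0
    letter_t[i][letter_index] = 1

print(letter_t.shape)        # torch.Size([70, 128])
print(letter_t[0].argmax())  # tensor(0): the curly quote
print(letter_t[1].argmax())  # tensor(105): ord('i')
```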

Word-level One-Hot encoding

def clean_words(input_str):
    punctuation = '.,;:"!?”“_-'
    word_list = input_str.lower().replace('\n',' ').split()
    word_list = [word.strip(punctuation) for word in word_list]
    return word_list

words_in_line = clean_words(line)
line, words_in_line

This is data preprocessing: strip the punctuation so that only the words themselves remain.

Output:

('“Impossible, Mr. Bennet, impossible, when I am not acquainted with him',
 ['impossible',
  'mr',
  'bennet',
  'impossible',
  'when',
  'i',
  'am',
  'not',
  'acquainted',
  'with',
  'him'])

The second step is to build a vocabulary that maps each word to an index:

word_list = sorted(set(clean_words(text)))
word2index_dict = {word: i for (i, word) in enumerate(word_list)}

len(word2index_dict), word2index_dict['impossible']

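The first line, word_list = sorted(set(clean_words(text))), uses a set to remove duplicates so each word appears only once, then sorts the words alphabetically.

The second line builds a dictionary that maps each word to its index.

The final output is (7261, 3394): the book has 7261 distinct words, and "impossible" has index 3394.

The last step is word-level one-hot encoding: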

word_t = torch.zeros(len(words_in_line), len(word2index_dict))
for i, word in enumerate(words_in_line):
    word_index = word2index_dict[word]
    word_t[i][word_index] = 1
    print('{:2} {:4} {}'.format(i, word_index, word))
    
print(word_t.shape)
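The notebook builds the vocabulary from the whole book. The sketch below runs the same clean_words, vocabulary and encoding steps on that single line only, so its 10-word vocabulary is a toy, not the book's 7261 words. Because "impossible" appears twice, rows 0 and 3 come out identical:

```python
import torch

def clean_words(input_str):
    punctuation = '.,;:"!?”“_-'
    word_list = input_str.lower().replace('\n', ' ').split()
    return [word.strip(punctuation) for word in word_list]

line = "“Impossible, Mr. Bennet, impossible, when I am not acquainted with him"
words_in_line = clean_words(line)

# Tiny vocabulary built from this one line only (the book builds it from the whole text)
word_list = sorted(set(words_in_line))
word2index_dict = {word: i for (i, word) in enumerate(word_list)}

word_t = torch.zeros(len(words_in_line), len(word2index_dict))
for i, word in enumerate(words_in_line):
    word_t[i][word2index_dict[word]] = 1

print(len(word2index_dict), word_t.shape)  # 10 torch.Size([11, 10])
```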

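Text embeddings

One-hot encoding's downside was covered earlier: once the vocabulary gets large, the number of dimensions explodes, which makes training painful.

It has another problem, too: it cannot express similarity between words. For example, "apple" and "orange" are both fruits, but in one-hot encoding every pair of different words is exactly the same distance apart, so semantic relationships are lost.

Word embeddings, also called word vectors, were developed to solve these problems.

The idea is to represent each word with a low-dimensional vector of floating-point numbers (e.g. 100 dimensions), arranged so that words with similar meanings are close together in that space.

For example:

One-Hot encoding:
apple  → [0, 0, ..., 1, 0, ..., 0]  (7261 dims)
orange → [0, 0, ..., 0, 1, ..., 0]  (7261 dims)

Word Embedding:
apple  → [0.8, 0.2, -0.5, ..., 0.3]  (100 dims)
orange → [0.75, 0.25, -0.4, ..., 0.35]  (100 dims)
dog    → [-0.1, 0.9, 0.6, ..., -0.2]  (100 dims)

One-hot values are either 0 or 1 with nothing in between, so they cannot express degrees of similarity, whereas word embeddings can. Notice that apple and orange have very similar values, while neither looks anything like dog.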
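Cosine similarity makes the difference concrete. In this sketch the one-hot indices are arbitrary, and the embeddings use only the four values written out above as a toy 4-dimensional embedding (the elided middle values are ignored):

```python
import torch
import torch.nn.functional as F

# One-hot: two different words are orthogonal, so their cosine similarity is 0
vocab_size = 7261
apple_oh = torch.zeros(vocab_size)
apple_oh[0] = 1   # arbitrary index chosen for illustration
orange_oh = torch.zeros(vocab_size)
orange_oh[1] = 1  # arbitrary index chosen for illustration
sim_onehot = F.cosine_similarity(apple_oh, orange_oh, dim=0)

# Toy 4-dim embeddings built from the values written out above
apple = torch.tensor([0.8, 0.2, -0.5, 0.3])
orange = torch.tensor([0.75, 0.25, -0.4, 0.35])
dog = torch.tensor([-0.1, 0.9, 0.6, -0.2])
sim_apple_orange = F.cosine_similarity(apple, orange, dim=0)  # about 0.99
sim_apple_dog = F.cosine_similarity(apple, dog, dim=0)        # about -0.23

print(sim_onehot, sim_apple_orange, sim_apple_dog)
```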

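Image source: Deep Learning with PyTorch, p. 100.

The book uses a two-dimensional space to show how close words are to one another.

Vector arithmetic can move one word's vector toward another's, which lets us form analogies: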

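import torch

# Tensors, not plain lists: `+` on Python lists would concatenate them
apple_vector = torch.tensor([0.1, 0.1])
red_adjustment = torch.tensor([0.0, -0.1])
yellow_adjustment = torch.tensor([0.0, 0.5])

result = apple_vector + red_adjustment + yellow_adjustment  # element-wise addition
# result ≈ [0.1, 0.5], close to lemon's [0.2, 0.5]

In other words, the result of apple_vector + red_adjustment + yellow_adjustment lands near lemon, which is what the analogy expresses.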

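Summary

Tensor representation of color images

An RGB color photo can be represented as a 3D tensor:

3 channels (RGB)
a height × width matrix of pixels

For an image 800 pixels wide and 600 pixels tall, the tensor shape is (3, 600, 800) (CHW format). Tensors use CHW because PyTorch's convolution layers expect channel-first input.

transforms.PILToTensor() converts a PIL Image into this format, rearranging HWC → CHW automatically.

The tensor's values are uint8 numbers from 0 to 255 that represent each pixel's color intensity.

Normalization

When training a model, pixel values should be rescaled to a range that suits neural network learning better:

Simple normalization to [0, 1]
- Convert the tensor to float, then divide by 255.
Using transforms.ToTensor()
- Does it automatically: HWC → CHW, then divides by 255.
Advanced normalization to [-1, 1]
- Use transforms.Normalize(mean=[0.5,...], std=[0.5,...]).

Converting tabular data to tensors

After reading a CSV with pandas:

First n columns → features (input)
Last column → label (output)

For example, with 5 rows and 4 features:

Feature tensor shape → (5, 4)
Label tensor shape → (5,)

These tensors can be wrapped with TensorDataset and DataLoader for batched model training.

References

Deep Learning with PyTorch, Chapter 4: Real-world data representation using tensors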