[Python Notes] Regular Expressions

Posted 2025-10-12 | Updated 2025-10-12 | Programming / Python



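Thanks for clicking into this article. It is the first post in my (LukeTseng's) personal Python notes series and mainly records my own learning. Everything here is for personal study, so please use your own judgment when referring to it. If you spot any mistakes, please let me know. Thank you.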

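Introduction

This thing has many names in Chinese (正規表達式, 正規表示法, 規則運算式, 常規表示法, and so on). I personally prefer 正規運算式. In English it is called a Regular Expression (RE or regex for short), and from here on I'll just call it regex.

A regex is a "syntax" or "notation" for describing patterns to match strings against. It is usually built into programming languages, including our favorite, Python. Regex is very powerful for processing and parsing strings, which is why it is worth learning.

You might wonder why regex is needed when Notepad already has search and replace. The answer is that regex does more than search for and replace fixed text. It matches by pattern, so it can pick out specific kinds of characters or strings, for example email addresses and phone numbers.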

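Using regex in Python

import re

Python has regex support built in through the standard-library re module, so all you need to do is import it.

Compiling a regex

You can compile a regex into a Pattern object and then reuse it. This step is optional: module-level functions such as re.search() compile (and cache) the pattern for you. Compiling explicitly is still handy when the same pattern is used many times. The basic syntax is: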

import re

p = re.compile('ab*')

:::info
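Prefix regex strings with r to make them raw strings. Python then does not treat the backslash \ as an escape character, and the regex engine receives it unchanged.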
:::
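This small sketch shows why the r prefix matters. It compares the same word-boundary pattern written as a normal string and as a raw string. The sample sentence is made up for illustration.

```python
import re

# In a normal string literal, Python turns "\b" into a backspace character (\x08)
normal = '\bcat\b'
# In a raw string the backslash is kept, so the regex engine sees \b (word boundary)
raw = r'\bcat\b'

print(len(normal), len(raw))               # 5 7
print(re.findall(normal, 'a cat here'))    # [] because the text has no backspace characters
print(re.findall(raw, 'a cat here'))       # ['cat']
```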

Here, 'ab*' means one letter a followed by zero or more letter b's.
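Here is a minimal sketch of reusing the compiled Pattern object p from above. The test strings are made up. A Pattern object has the same methods as the module-level functions, except that you don't pass the pattern argument.

```python
import re

# Compile once, then call methods on the Pattern object
p = re.compile(r'ab*')

m = p.match('abbbc')
print(m.group())                  # abbb
print(p.findall('a ab abb xyz'))  # ['a', 'ab', 'abb']
print(p.match('bbb'))             # None, because the string does not start with 'a'
```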

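Character Classes

A character class matches any single character from a given set and is written inside square brackets [].

Common ones include:

[abc]: matches a, b, or c.
[a-z]: matches any lowercase letter.
[A-Z]: matches any uppercase letter.
[0-9]: matches any digit.
[^abc]: matches any character except a, b, and c (negation; the ^ must come right after the opening bracket).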
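Here is a quick sketch that tries the classes listed above on a made-up sentence:

```python
import re

text = 'Cat bat rat 2 hats!'
print(re.findall(r'[cbr]at', text))     # ['bat', 'rat'] ('Cat' starts with an uppercase C)
print(re.findall(r'[A-Z]', text))       # ['C']
print(re.findall(r'[0-9]', text))       # ['2']
print(re.findall(r'[^a-zA-Z ]', text))  # ['2', '!'] (everything except letters and spaces)
```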

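Special Sequences

\d: matches any digit. For str patterns this also includes other Unicode decimal digits (such as full-width digits). It is exactly [0-9] only with the re.ASCII flag.
\D: matches any non-digit character.
\w: matches any "word" character, meaning a letter, digit, or underscore. With re.ASCII it is equivalent to [a-zA-Z0-9_]. By default it also matches Unicode letters such as Chinese characters.
\W: matches any non-word character (anything \w does not match).
\s: matches any whitespace character (space, tab, newline, etc.).
\S: matches any non-whitespace character.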
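The Unicode behavior of \d and \w is easy to check. This sketch uses a made-up string containing an accented letter and a full-width digit (３, U+FF13), and compares the default behavior with the re.ASCII flag:

```python
import re

s = 'café_1 ３'  # the last character is a full-width digit (U+FF13)

# By default, str patterns are Unicode-aware
print(re.findall(r'\w+', s))            # ['café_1', '３']
print(re.findall(r'\d', s))             # ['1', '３']

# With re.ASCII, \w and \d shrink to [a-zA-Z0-9_] and [0-9]
print(re.findall(r'\w+', s, re.ASCII))  # ['caf', '_1']
print(re.findall(r'\d', s, re.ASCII))   # ['1']

# \s covers spaces, tabs, and newlines alike
print(re.split(r'\s+', 'a b\tc\nd'))    # ['a', 'b', 'c', 'd']
```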

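Quantifiers

Quantifiers specify how many times the preceding element (a character, a character class, a group, etc.) should appear:

*: matches the preceding element 0 or more times.
+: matches the preceding element 1 or more times.
?: matches the preceding element 0 or 1 time.
{n}: matches exactly n times.
{n,}: matches at least n times.
{n,m}: matches between n and m times (inclusive).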
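The later examples cover *, ? and {n,m}. This small sketch uses made-up inputs to show the remaining quantifiers +, {n} and {n,}:

```python
import re

print(re.findall(r'a+', 'caaandy a'))        # ['aaa', 'a'] (one or more a's)
print(re.findall(r'\d{3}', '12345678'))      # ['123', '456'] (exactly 3 digits; '78' is too short)
print(re.findall(r'\d{3,}', '12 345 6789'))  # ['345', '6789'] (3 or more digits)
```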

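Anchors

Anchors match a position in the string rather than a character:

^: matches the start of the string.
$: matches the end of the string (or just before a newline at the very end).
\b: matches a word boundary, i.e. a position between a \w character and a \W character or the start/end of the string.
\B: matches any position that is not a word boundary.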
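Here is a short sketch of ^ and \B on made-up strings. The re.MULTILINE flag on the second call goes beyond the list above: it makes ^ also match right after each newline.

```python
import re

text = 'first line\nsecond line'
print(re.findall(r'^\w+', text))                # ['first'] (start of the string only)
print(re.findall(r'^\w+', text, re.MULTILINE))  # ['first', 'second'] (start of every line)

# \B: 'cat' only where it is NOT at the start of a word
print(re.findall(r'\Bcat', 'cat concat'))       # ['cat'] (the one inside 'concat')
```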

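Common Methods

| Method | Syntax | Description | Returns |
|---|---|---|---|
| `re.compile()` | `re.compile(pattern, flags=0)` | Compiles a regex into a Pattern object that can be reused, which can save repeated compilation | Pattern object |
| `re.match()` | `re.match(pattern, string, flags=0)` | Matches only at the beginning of the string | Match object or None |
| `re.search()` | `re.search(pattern, string, flags=0)` | Scans the whole string and returns the first location that matches | Match object or None |
| `re.findall()` | `re.findall(pattern, string, flags=0)` | Finds all non-overlapping matches; if the pattern has capturing groups, the groups are returned instead of the whole match | List of matched strings (or of group tuples) |
| `re.finditer()` | `re.finditer(pattern, string, flags=0)` | Finds all non-overlapping matches and returns them as an iterator | Iterator of Match objects |
| `re.sub()` | `re.sub(pattern, repl, string, count=0, flags=0)` | Replaces every match with repl (a string or a function); count limits the number of replacements | New string after replacement |
| `re.subn()` | `re.subn(pattern, repl, string, count=0, flags=0)` | Like `re.sub()`, but also returns the number of replacements | Tuple (new string, number of replacements) |
| `re.split()` | `re.split(pattern, string, maxsplit=0, flags=0)` | Splits the string at each match; maxsplit limits the number of splits | List of substrings |
| `re.fullmatch()` | `re.fullmatch(pattern, string, flags=0)` | Checks whether the entire string matches the pattern | Match object or None |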
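re.fullmatch() and re.finditer() from the table are not used in the examples that follow. Here is a minimal sketch of both on made-up inputs:

```python
import re

s = 'abc123'

# fullmatch(): the pattern must cover the entire string
print(re.fullmatch(r'[a-z]+', s))                 # None, because '123' is left over
print(re.fullmatch(r'[a-z]+\d+', s) is not None)  # True

# finditer(): yields Match objects, so positions come along with the text
for m in re.finditer(r'\d', 'a1b22'):
    print(m.group(), m.start())  # prints "1 1", then "2 3", then "2 4"
```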

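1. `re.match()` example

re.match() tries to match the pattern starting at the beginning of the string. If the beginning matches, it returns a Match object. Otherwise it returns None.

In the code below, the ^ in the pattern is an anchor that means "start of the string". Placed at the front of the regex, it requires text to begin with Hello. Because re.match() already matches only at the beginning of the string, ^ is redundant here. It just makes the intent explicit, and it does matter if the same pattern is used with re.search().

:::info
When re.match() or re.search() succeeds, it returns a Match object that provides these three methods for retrieving match information:

match.group(): returns the matched text as a string.
match.start(): returns the start position (index) of the match in the string.
match.end(): returns the end position of the match (index of the last matched character + 1).
:::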

import re

text = "Hello, world!"
pattern = r'^Hello'

# Check whether the string starts with "Hello"
match = re.match(pattern, text)

if match:
    print("The string starts with 'Hello'")
    print("Matched text:", match.group())  # Hello
else:
    print("No match")

Output:

The string starts with 'Hello'
Matched text: Hello
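To see that re.match() checks only the beginning, compare it with re.search() on the same text. The pattern 'world' is made up for this comparison:

```python
import re

text = "Hello, world!"

# re.match() only tries position 0, so a word in the middle is not found
print(re.match(r'world', text))           # None
# re.search() scans the whole string
print(re.search(r'world', text).group())  # world
# Even without ^, re.match() is anchored at the start
print(re.match(r'Hello', text).group())   # Hello
```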

2. `re.search()` example

import re
text = 'Hello World!, this is my first program for Python!'
m = re.search(r'Python', text)

if m:
    print('Found:', m.group())  # Found: Python
    print('Start:', m.start())  # Start: 43
    print('End:', m.end())  # End: 49
else:
    print('Not found')

Output:

Found: Python
Start: 43
End: 49

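3. `re.findall()` example

\d matches any single digit character. For ordinary ASCII text it is the same as [0-9], i.e. the digits 0 through 9.

Adding + after \d means the preceding pattern must occur one or more times.

Put together, \d+ reads as "one or more consecutive digits".

re.findall() returns a list of all matched strings.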

import re

string = "My phone number is 0987654321, and my friend's phone number is 0912345678"
regex = r'\d+'
matches = re.findall(regex, string)
print(matches)  # ['0987654321', '0912345678']

Output:

['0987654321', '0912345678']
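One subtlety of re.findall() is that when the pattern contains capturing groups (parentheses), it returns the captured groups instead of the whole match. Here is a sketch with a made-up string of year-month values:

```python
import re

s = 'Deadlines: 2025-10 and 2024-03'
print(re.findall(r'\d{4}-\d{2}', s))      # ['2025-10', '2024-03'] (no groups: whole matches)
print(re.findall(r'(\d{4})-\d{2}', s))    # ['2025', '2024'] (one group: list of strings)
print(re.findall(r'(\d{4})-(\d{2})', s))  # [('2025', '10'), ('2024', '03')] (two groups: tuples)
```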

4. `re.sub()` example

sub is short for substitute: it replaces the matched text. The example below replaces the string Python with C++.

import re

text = "I like Python because Python is so powerful."
result = re.sub(r'Python', 'C++', text)
print(result)  # I like C++ because C++ is so powerful.

Output:

I like C++ because C++ is so powerful.

re.sub() has a sibling called re.subn(). The two are used in exactly the same way. The only difference is that re.subn() returns a tuple: the string after replacement, followed by the number of replacements made. Reusing the example above:

import re

text = "I like Python because Python is so powerful."
result = re.subn(r'Python', 'C++', text)
print(result)  # ('I like C++ because C++ is so powerful.', 2)

Output:

('I like C++ because C++ is so powerful.', 2)
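re.sub() has two more options worth knowing. count limits how many replacements are made, and repl can be a function that receives each Match object and returns the replacement text. This is a minimal sketch; the lambda that doubles numbers is just an illustration.

```python
import re

text = "I like Python because Python is so powerful."
# count=1: only the first match is replaced
print(re.sub(r'Python', 'C++', text, count=1))
# I like C++ because Python is so powerful.

# repl as a function: each matched number is replaced by its double
print(re.sub(r'\d+', lambda m: str(int(m.group()) * 2), 'a1 b22 c333'))
# a2 b44 c666
```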

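5. `re.split()` example

re.split() splits a string at every match of the pattern.

The example below splits the string on any character in [,;:] (a comma, semicolon, or colon). This is similar to the built-in str.split(). The difference is that str.split() splits on one fixed separator string, while a regex can describe several separators at once.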

import re

text = "apple,banana;orange:grape"
result = re.split(r'[,;:]', text)
print(result)  # ['apple', 'banana', 'orange', 'grape']

Output:

['apple', 'banana', 'orange', 'grape']
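The maxsplit parameter from the table caps the number of splits. Because the separator is a regex, it can also absorb surrounding spaces. Here is a sketch on made-up inputs:

```python
import re

text = "apple,banana;orange:grape"
# Stop after 2 splits; the rest stays in the last element
print(re.split(r'[,;:]', text, maxsplit=2))  # ['apple', 'banana', 'orange:grape']

# Optional whitespace around each separator becomes part of the separator
print(re.split(r'\s*[,;:]\s*', 'apple , banana;  orange'))  # ['apple', 'banana', 'orange']
```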

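Examples of character classes, special sequences, quantifiers, and anchors

Now that the methods are covered, let's try some concrete examples.

[^0-9] matches any character other than 0 through 9: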

import re

print(re.findall(r'[^0-9]', "abc123"))  # ['a', 'b', 'c']

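Output:

['a', 'b', 'c']

\w+ matches one or more letters, digits, or underscores. Because \w is Unicode-aware for str patterns, Chinese characters also count as letters: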

import re

print(re.findall(r'\w+', "他說了 *** 在某種語言中")) # ['他說了', '在某種語言中']

Output:

['他說了', '在某種語言中']

Examples of *, ? and {n,m}:

import re

# * matches 0 or more 'b' characters
print(re.findall(r'ab*', "a ab abb abbb"))  # ['a', 'ab', 'abb', 'abbb']

# ? matches the preceding character 0 or 1 time
print(re.findall(r'colou?r', "color and colour"))  # ['color', 'colour']

# {n,m} matches 2 to 4 digits (greedy, so '55555' yields '5555')
print(re.findall(r'\d{2,4}', "1 22 333 4444 55555"))  # ['22', '333', '4444', '5555']

Output:

['a', 'ab', 'abb', 'abbb']
['color', 'colour']
['22', '333', '4444', '5555']

Examples of $ and \b:

import re

# $ matches the end of the string "Hello World"
print(re.search(r'World$', "Hello World"))  # <re.Match object; span=(6, 11), match='World'>
print(re.search(r'Hello$', "Hello World"))  # None

# \b matches a word boundary
text = "cat cats caterpillar"
print(re.findall(r'\bcat\b', text))  # ['cat']
print(re.findall(r'cat', text))  # ['cat', 'cat', 'cat']

A few small applications

1. Email validation

import re

def validate_email(email):
    pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
    if re.match(pattern, email):
        return True
    return False

print(validate_email("user@example.com"))  # True
print(validate_email("invalid.email"))  # False
print(validate_email("user@domain.co.uk"))  # True
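One caveat with the ^...$ approach is that $ also matches just before a trailing newline, so "user@example.com\n" passes validate_email(). A stricter variant uses re.fullmatch(). validate_email_strict is a hypothetical helper name, and its character rules are copied from the pattern above. It is still a simplified check, not a complete email validator.

```python
import re

# Same character rules as validate_email, but without ^ and $
EMAIL = r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'

def validate_email_strict(email):
    # fullmatch requires the whole string to match, so a trailing newline fails
    return re.fullmatch(EMAIL, email) is not None

print(bool(re.match('^' + EMAIL + '$', 'user@example.com\n')))  # True: $ matches before the final \n
print(validate_email_strict('user@example.com\n'))  # False
print(validate_email_strict('user@domain.co.uk'))   # True
```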

2. Extracting phone numbers

import re

text = """
Contact Information:
Office phone number: 02-1234-5678
phone number: 0912-345-678
home phone number: (02)8765-4321
"""

# Taiwanese phone number formats
patterns = [
    r'\d{2}-\d{4}-\d{4}',  # 02-1234-5678
    r'\d{4}-\d{3}-\d{3}',  # 0912-345-678
    r'\(\d{2}\)\d{4}-\d{4}'  # (02)8765-4321
]

for pattern in patterns:
    matches = re.findall(pattern, text)
    print(f"Found: {matches}")

Output:

Found: ['02-1234-5678']
Found: ['0912-345-678']
Found: ['(02)8765-4321']

3. Data extraction

import re

# Extract all links from HTML content
html_content = """
<a href="https://example.com">Example</a>
<a href="https://test.com">Test</a>
<a href="/local/path">Local link</a>
"""

urls = re.findall(r'href=["\']([^"\']+)["\']', html_content)
print(urls)  # ['https://example.com', 'https://test.com', '/local/path']

# Extract prices
text = "Item A: $1,299 NTD, Item B: $599 NTD, Item C: $2,999 NTD"
prices = re.findall(r'\$[\d,]+', text)
print(prices)  # ['$1,299', '$599', '$2,999']

Output:

['https://example.com', 'https://test.com', '/local/path']
['$1,299', '$599', '$2,999']
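If you need the prices as numbers, a capturing group can drop the $ sign, and the commas are then removed before calling int(). Here is a sketch on a made-up sample string. It assumes each price is followed by a space or the end of the string, because [\d,]+ would otherwise also swallow a trailing comma.

```python
import re

text = "Item A: $1,299 NTD, Item B: $599 NTD, Item C: $2,999 NTD"
# The group captures only what follows '$'; commas are stripped before int()
amounts = [int(p.replace(',', '')) for p in re.findall(r'\$([\d,]+)', text)]
print(amounts)       # [1299, 599, 2999]
print(sum(amounts))  # 4897
```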

4. Password strength check

import re

def check_password_strength(password):
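    # At least 8 characters
    if len(password) < 8:
        return False, "Password must be at least 8 characters long"

    # At least one uppercase letter
    if not re.search(r'[A-Z]', password):
        return False, "At least one uppercase letter is required"

    # At least one lowercase letter
    if not re.search(r'[a-z]', password):
        return False, "At least one lowercase letter is required"

    # At least one digit
    if not re.search(r'\d', password):
        return False, "At least one digit is required"

    # At least one special character
    if not re.search(r'[!@#$%^&*(),.?":{}|<>]', password):
        return False, "At least one special character is required"

    return True, "Password strength is good"

print(check_password_strength("weak"))  # False: fails the length check
print(check_password_strength("Strong123!"))  # True: password strength is good

Output:

(False, 'Password must be at least 8 characters long')
(True, 'Password strength is good')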

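Online regex debugging tools

There are plenty of these online. Here are two I recommend:

Enter your regex, and the site's test text shows which parts are matched.

When you design a complex regex, these tools are worth a try.

Summary

Basic syntax:

Character classes: [abc], [a-z], [0-9], etc. match one character from a given set.
Special sequences: \d (digit), \w (letter, digit, or underscore), \s (whitespace).
Quantifiers: * (0 or more), + (1 or more), ? (0 or 1), and {n,m} control how many times something appears.
Anchors: ^ (start of string), $ (end of string), \b (word boundary), and other position markers.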
