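Pydantic is a module for data validation and settings management based on Python type annotations. Installing it is simple: open a terminal and run:

```shell
pip install pydantic
```

The examples in this article use the Pydantic v1 API (the benchmark at the end lists version 1.7.3).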
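It is similar to the data validation offered by Django REST framework (DRF) serializers. The difference is that Django's set of Field types is fixed: if you want a Field of your own, you have to subclass a base class and override its methods: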
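```python
# A Django model, for comparison with the Pydantic example below.
# DRF's ModelSerializer derives its serializer fields from definitions like these.
from django.db import models


class Book(models.Model):
    id = models.AutoField(primary_key=True)
    name = models.CharField(max_length=32)
    price = models.DecimalField(max_digits=5, decimal_places=2)
    author = models.CharField(max_length=32)
    publish = models.CharField(max_length=32)
```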
Pydantic, by contrast, builds on the type annotations available in Python 3.7+. It can validate data for any class you define by subclassing BaseModel:
```python
# Pydantic data validation
from datetime import datetime
from typing import List, Optional

from pydantic import BaseModel


class User(BaseModel):
    id: int
    name = 'John Doe'
    signup_ts: Optional[datetime] = None
    friends: List[int] = []


external_data = {
    'id': '123',
    'signup_ts': '2019-06-01 12:22',
    'friends': [1, 2, '3'],
}
user = User(**external_data)
print(user.id)
print(type(user.id))
# > 123
# > <class 'int'>
print(repr(user.signup_ts))
# > datetime.datetime(2019, 6, 1, 12, 22)
print(user.friends)
# > [1, 2, 3]
print(user.dict())
"""
{
    'id': 123,
    'signup_ts': datetime.datetime(2019, 6, 1, 12, 22),
    'friends': [1, 2, 3],
    'name': 'John Doe',
}
"""
```
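The basic example above shows that Pydantic even converts types for you. In the input dict, user.id is a string. After validation it is an int, because the User class annotates id as int. Likewise, the string '3' in friends becomes the integer 3.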
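The conversion above is possible because class annotations can be read at runtime. The following toy sketch uses only the standard library (Python 3.8+ for typing.get_origin/get_args) to illustrate the idea: read the hints with typing.get_type_hints, then convert each input value to its annotated type. The plain User class, coerce and build are hypothetical names used for illustration. Pydantic's real coercion rules are far richer than calling int() or str().

```python
from typing import List, get_args, get_origin, get_type_hints


class User:
    # Plain class (no Pydantic): the annotations are only metadata readable at runtime.
    id: int
    friends: List[int] = []


def coerce(tp, value):
    """Convert value to the annotated type; this toy handles only int, str and List[...]."""
    if get_origin(tp) is list:
        (item_type,) = get_args(tp)
        return [coerce(item_type, item) for item in value]
    if tp in (int, str):
        return tp(value)  # int('123') -> 123; int('abc') raises ValueError
    raise TypeError(f'type not supported by this sketch: {tp!r}')


def build(cls, data):
    """Coerce every annotated field of cls that is present in data."""
    hints = get_type_hints(cls)  # maps 'id' -> int and 'friends' -> List[int]
    return {name: coerce(tp, data[name]) for name, tp in hints.items() if name in data}


result = build(User, {'id': '123', 'friends': [1, 2, '3']})
print(result)  # {'id': 123, 'friends': [1, 2, 3]}
```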
When the data does not match the annotated types, you get an error like this:
```python
from datetime import datetime
from typing import List, Optional

from pydantic import BaseModel


class User(BaseModel):
    id: int
    name = 'John Doe'
    signup_ts: Optional[datetime] = None
    friends: List[int] = []


external_data = {
    'id': '123',
    'signup_ts': '2019-06-01 12:222',
    'friends': [1, 2, '3'],
}
user = User(**external_data)
```

```text
Traceback (most recent call last):
  File "1.py", line 18, in <module>
    user = User(**external_data)
  File "pydantic\main.py", line 331, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for User
signup_ts
  invalid datetime format (type=value_error.datetime)
```
The error is "invalid datetime format", because the signup_ts value I passed is not a valid datetime. It has an extra 2, so the minutes read as 222.
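The following standard-library analogy shows why such a string is rejected. It is not Pydantic's own datetime parser: it assumes a fixed 'YYYY-MM-DD HH:MM' format via datetime.strptime, and parse_signup_ts is a hypothetical helper.

```python
from datetime import datetime

FORMAT = '%Y-%m-%d %H:%M'  # format assumed for this sketch only


def parse_signup_ts(raw):
    """Parse a 'YYYY-MM-DD HH:MM' string; raise ValueError if it does not match."""
    return datetime.strptime(raw, FORMAT)


print(parse_signup_ts('2019-06-01 12:22'))  # 2019-06-01 12:22:00

try:
    parse_signup_ts('2019-06-01 12:222')
except ValueError as exc:
    # %M consumes at most two digits ('22'), so the trailing '2' is left over.
    print('invalid datetime:', exc)
```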
1. Exporting data from a Pydantic model
Pydantic models come with a built-in json method that turns the validated data into a JSON string in a single call. For example, with the user object from earlier:
```python
print(user.dict())  # convert to a dict
"""
{
    'id': 123,
    'signup_ts': datetime.datetime(2019, 6, 1, 12, 22),
    'friends': [1, 2, 3],
    'name': 'John Doe',
}
"""
print(user.json())  # convert to JSON
"""
{"id": 123, "signup_ts": "2019-06-01T12:22:00", "friends": [1, 2, 3], "name": "John Doe"}
"""
```
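The two outputs differ in how the datetime is handled. dict() keeps a datetime object, while json() writes it as an ISO 8601 string. As a rough standard-library analogy (not how Pydantic implements json()), json.dumps with a default hook produces the same string. The encode_default function is a hypothetical helper.

```python
import json
from datetime import datetime

data = {
    'id': 123,
    'signup_ts': datetime(2019, 6, 1, 12, 22),
    'friends': [1, 2, 3],
    'name': 'John Doe',
}


def encode_default(obj):
    # json cannot serialize datetime natively; fall back to ISO 8601 text.
    if isinstance(obj, datetime):
        return obj.isoformat()
    raise TypeError(f'{type(obj).__name__} is not JSON serializable')


text = json.dumps(data, default=encode_default)
print(text)
# {"id": 123, "signup_ts": "2019-06-01T12:22:00", "friends": [1, 2, 3], "name": "John Doe"}
```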
This is very convenient. Pydantic can also export the model's structure as a JSON Schema, which fully describes the fields and their types:
```python
print(user.schema_json(indent=2))
"""
{
  "title": "User",
  "type": "object",
  "properties": {
    "id": {
      "title": "Id",
      "type": "integer"
    },
    "signup_ts": {
      "title": "Signup Ts",
      "type": "string",
      "format": "date-time"
    },
    "friends": {
      "title": "Friends",
      "default": [],
      "type": "array",
      "items": {
        "type": "integer"
      }
    },
    "name": {
      "title": "Name",
      "default": "John Doe",
      "type": "string"
    }
  },
  "required": [
    "id"
  ]
}
"""
```
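Because the exported schema is plain JSON, another program can consume it without importing the model. The following minimal sketch assumes the consumer honours only the "type" and "required" keywords; check and JSON_TYPES are hypothetical names. A real consumer should use a complete JSON Schema validator, such as the third-party jsonschema package.

```python
import json

# Trimmed copy of the schema printed above (only the keywords this sketch uses).
SCHEMA = json.loads('''
{
  "type": "object",
  "properties": {
    "id": {"type": "integer"},
    "signup_ts": {"type": "string", "format": "date-time"},
    "friends": {"type": "array", "items": {"type": "integer"}},
    "name": {"type": "string"}
  },
  "required": ["id"]
}
''')

# Map the JSON Schema type names used above to Python types.
JSON_TYPES = {'integer': int, 'string': str, 'array': list}


def check(payload, schema=SCHEMA):
    """Return a list of problems; checks only 'required' and each field's top-level 'type'."""
    problems = [f'missing required field: {name}'
                for name in schema['required'] if name not in payload]
    for name, value in payload.items():
        spec = schema['properties'].get(name)
        if spec is None:
            continue  # unknown keys are ignored in this sketch
        expected = JSON_TYPES[spec['type']]
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(value, expected) or isinstance(value, bool):
            problems.append(f'{name}: expected {spec["type"]}')
    return problems


print(check({'id': 123, 'friends': [1, 2, 3]}))  # []
print(check({'name': 42}))  # ['missing required field: id', 'name: expected string']
```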
2. Importing data
Besides constructing a model directly, you can load data into a validation model from ORM objects, strings and files. For example, from a raw string:
```python
from datetime import datetime

from pydantic import BaseModel


class User(BaseModel):
    id: int
    name = 'John Doe'
    signup_ts: datetime = None


m = User.parse_raw('{"id": 123, "name": "James"}')
print(m)
# > id=123 signup_ts=None name='James'
```
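Loading from a string amounts to "decode, then validate". The following standard-library sketch mimics only the decode half. parse_raw here is a hypothetical helper, and the dataclass stand-in performs no type checking or coercion, which is exactly what Pydantic adds on top.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class User:
    # Stand-in for the Pydantic model: dataclasses do NOT validate or coerce types.
    id: int
    name: str = 'John Doe'
    signup_ts: Optional[datetime] = None


def parse_raw(cls, raw):
    """Hypothetical helper: decode JSON text, then pass the keys to the constructor."""
    return cls(**json.loads(raw))


m = parse_raw(User, '{"id": 123, "name": "James"}')
print(m)  # User(id=123, name='James', signup_ts=None)
```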
Pydantic can also take an ORM object directly and convert it into a Pydantic object. For example, from the SQLAlchemy ORM:
```python
from typing import List

from sqlalchemy import Column, Integer, String
from sqlalchemy.dialects.postgresql import ARRAY
from sqlalchemy.ext.declarative import declarative_base
from pydantic import BaseModel, constr

Base = declarative_base()


class CompanyOrm(Base):
    __tablename__ = 'companies'
    id = Column(Integer, primary_key=True, nullable=False)
    public_key = Column(String(20), index=True, nullable=False, unique=True)
    name = Column(String(63), unique=True)
    domains = Column(ARRAY(String(255)))


class CompanyModel(BaseModel):
    id: int
    public_key: constr(max_length=20)
    name: constr(max_length=63)
    domains: List[constr(max_length=255)]

    class Config:
        orm_mode = True


co_orm = CompanyOrm(
    id=123,
    public_key='foobar',
    name='Testing',
    domains=['example.com', 'foobar.com'],
)
print(co_orm)
# > <models_orm_mode.CompanyOrm object at 0x7f0bdac44850>
co_model = CompanyModel.from_orm(co_orm)
print(co_model)
# > id=123 public_key='foobar' name='Testing' domains=['example.com', 'foobar.com']
```
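What orm_mode enables is reading field values from object attributes instead of dict keys. The following standard-library sketch illustrates only that idea and validates nothing. read_attributes is a hypothetical helper (not Pydantic API), and a SimpleNamespace stands in for the SQLAlchemy CompanyOrm instance.

```python
from types import SimpleNamespace
from typing import List, get_type_hints


class CompanyModel:
    # Annotations only; the example above uses a Pydantic model with orm_mode = True.
    id: int
    public_key: str
    name: str
    domains: List[str]


def read_attributes(cls, obj):
    """Hypothetical helper: fetch each annotated field with getattr rather than obj[key]."""
    return {field: getattr(obj, field) for field in get_type_hints(cls)}


# SimpleNamespace stands in for the SQLAlchemy CompanyOrm object.
co_orm = SimpleNamespace(id=123, public_key='foobar', name='Testing',
                         domains=['example.com', 'foobar.com'])
print(read_attributes(CompanyModel, co_orm))
```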
Loading from a JSON file:
```python
from datetime import datetime
from pathlib import Path

from pydantic import BaseModel


class User(BaseModel):
    id: int
    name = 'John Doe'
    signup_ts: datetime = None


path = Path('data.json')
path.write_text('{"id": 123, "name": "James"}')
m = User.parse_file(path)
print(m)
```
Loading from pickle:
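```python
import pickle
from datetime import datetime

from pydantic import BaseModel


class User(BaseModel):  # same model as in the previous examples
    id: int
    name = 'John Doe'
    signup_ts: datetime = None


pickle_data = pickle.dumps({
    'id': 123,
    'name': 'James',
    'signup_ts': datetime(2017, 7, 14)
})
# Unpickling can execute arbitrary code, so only use allow_pickle=True on trusted data.
m = User.parse_raw(
    pickle_data, content_type='application/pickle', allow_pickle=True
)
print(m)
# > id=123 signup_ts=datetime.datetime(2017, 7, 14, 0, 0) name='James'
```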
3. Custom validation
You can also attach validator decorators to add whatever validation logic you need:
```python
from pydantic import BaseModel, ValidationError, validator


class UserModel(BaseModel):
    name: str
    username: str
    password1: str
    password2: str

    @validator('name')
    def name_must_contain_space(cls, v):
        # Rule 1: name must contain a space; the value is returned title-cased.
        if ' ' not in v:
            raise ValueError('must contain a space')
        return v.title()

    @validator('password2')
    def passwords_match(cls, v, values, **kwargs):
        # Rule 2: values holds the fields validated so far, including password1.
        if 'password1' in values and v != values['password1']:
            raise ValueError('passwords do not match')
        return v

    @validator('username')
    def username_alphanumeric(cls, v):
        # Rule 3: username may contain only letters and digits.
        assert v.isalnum(), 'must be alphanumeric'
        return v
```
Above, we added three custom validation rules:
1. name must contain a space
2. password2 must match password1
3. username must be alphanumeric
Let's check whether these three validators work:
```python
user = UserModel(
    name='samuel colvin',
    username='scolvin',
    password1='zxcvbn',
    password2='zxcvbn',
)
print(user)
# > name='Samuel Colvin' username='scolvin' password1='zxcvbn' password2='zxcvbn'

try:
    UserModel(
        name='samuel',
        username='scolvin',
        password1='zxcvbn',
        password2='zxcvbn2',
    )
except ValidationError as e:
    print(e)
    """
    2 validation errors for UserModel
    name
      must contain a space (type=value_error)
    password2
      passwords do not match (type=value_error)
    """
```
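The data in the first UserModel is valid and passes validation. As the output shows, the name was also returned title-cased ('Samuel Colvin').
The second UserModel fails validation because name contains no space and password2 does not match password1. So our custom validators work as intended.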
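How can a decorator like validator plug into a class and still report every failing field at once? The following toy sketch uses only the standard library. validator, Model and ValidationFailed are hypothetical stand-ins, not Pydantic's classes. Pydantic's real machinery (metaclass, field ordering, the cls argument) is considerably more involved. In this toy, the validators are plain functions stored on the class and called directly.

```python
class ValidationFailed(Exception):
    """Carries every error collected during validation (toy stand-in for ValidationError)."""

    def __init__(self, errors):
        super().__init__(f'{len(errors)} validation error(s): {errors}')
        self.errors = errors


def validator(field):
    """Hypothetical decorator: tag a function as the validator for one field."""
    def mark(func):
        func._validates = field
        return func
    return mark


class Model:
    def __init__(self, **data):
        # Collect the tagged functions defined on the subclass, keyed by field name.
        checks = {f._validates: f for f in type(self).__dict__.values()
                  if callable(f) and hasattr(f, '_validates')}
        values, errors = {}, {}
        for field, value in data.items():  # keyword order decides what 'values' holds
            check = checks.get(field)
            try:
                values[field] = check(value, values) if check else value
            except ValueError as exc:
                errors[field] = str(exc)  # keep going so every failure is reported
        if errors:
            raise ValidationFailed(errors)
        self.__dict__.update(values)


class UserModel(Model):
    @validator('name')
    def name_must_contain_space(v, values):
        if ' ' not in v:
            raise ValueError('must contain a space')
        return v.title()

    @validator('password2')
    def passwords_match(v, values):
        if 'password1' in values and v != values['password1']:
            raise ValueError('passwords do not match')
        return v


user = UserModel(name='samuel colvin', username='scolvin',
                 password1='zxcvbn', password2='zxcvbn')
print(user.name)  # Samuel Colvin

try:
    UserModel(name='samuel', username='scolvin',
              password1='zxcvbn', password2='zxcvbn2')
except ValidationFailed as exc:
    print(exc.errors)  # {'name': 'must contain a space', 'password2': 'passwords do not match'}
```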
4. Performance
This is the part that surprised me most. In the benchmark below, Pydantic is 12.3 times faster than Django REST framework's validation:
| Package | Version | Relative performance | Mean time |
|---|---|---|---|
| pydantic | 1.7.3 | (baseline) | 93.7μs |
| attrs + cattrs | 20.3 | 1.5x slower | 143.6μs |
| valideer | 0.4.2 | 1.9x slower | 175.9μs |
| marshmallow | 3.10 | 2.4x slower | 227.6μs |
| voluptuous | 0.12 | 2.7x slower | 257.5μs |
| trafaret | 2.1.0 | 3.2x slower | 296.7μs |
| schematics | 2.1.0 | 10.2x slower | 955.5μs |
| django-rest-framework | 3.12 | 12.3x slower | 1148.4μs |
| cerberus | 1.3.2 | 25.9x slower | 2427.6μs |
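Timings like these depend on hardware, Python version and the exact benchmark, so treat them as indicative. The "relative performance" column is each library's mean time divided by Pydantic's, e.g. 1148.4 / 93.7 ≈ 12.3 for django-rest-framework. To get comparable figures on your own machine, you can measure a mean time per call with timeit, as sketched below. The validate_once workload is hypothetical and would be replaced by each library's validation call.

```python
import timeit


def validate_once():
    # Hypothetical workload standing in for one model validation.
    data = {'id': '123', 'friends': [1, 2, '3']}
    return {'id': int(data['id']), 'friends': [int(x) for x in data['friends']]}


runs = 10_000
total = timeit.timeit(validate_once, number=runs)  # total seconds for all runs
mean_us = total / runs * 1e6  # seconds -> microseconds per call
print(f'{mean_us:.1f}μs per validation')

# The relative column is a ratio of means, e.g. django-rest-framework vs pydantic:
print(round(1148.4 / 93.7, 1))  # 12.3
```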