腳本 api
This is the continuation of my previous article:
這是我上一篇文章的延續:
From Jupyter Notebook To Scripts
從Jupyter Notebook到腳本
Last time we discussed how to convert Jupyter Notebook to scripts, together with all sorts of basic engineering practices such as CI, unit testing, package environment, configuration, logging, etc
上次我們討論了如何將Jupyter Notebook轉換為腳本,以及各種基本工程實踐,例如CI,單元測試,軟件包環境,配置,日志記錄等。
Even with the script form, it still requires us to change the configuration and run the script, It is OK for Kaggle competitions because all you need is just the submission.csv, but you probably don’t want to sit behind the computer 24/7 and hit Run whenever users send you a prediction request 🙁
即使使用腳本形式,它仍然需要我們更改配置并運行腳本。對于Kaggle競賽來說,這是可以的,因為您所需要的只是submitt.csv ,但您可能不想坐在計算機后面24 / 7,并在用戶向您發送預測請求時點擊運行🙁
In this article, we will discuss how to utilize the models we have built last time and create prediction APIs to do model serving using FastAPI!
在本文中,我們將討論如何利用上次構建的模型并創建預測API來使用FastAPI進行模型服務!
For ML/DL folks, we are talking about FastAPI, NOT fast.ai !!!
對于ML / DL人士,我們正在談論FastAPI ,而不是fast.ai !
背景:FastAPI (Background: FastAPI)
There are many frameworks in Python ecosystem for API, my original thought was to use Flask. But I am impressed by how simple and intuitive [and fast as the name suggests] FastAPI is and love to try it out in this mini-project!
Python生態系統中有許多API框架,我最初的想法是使用Flask。 但是,令我印象深刻的是FastAPI如此簡單和直觀(而且顧名思義,它很快),我喜歡在這個小型項目中嘗試一下!
“Rome wasn’t built in a day”, FastAPI has learned a lot from the previous frameworks such as Django, Flask, APIStar, I cannot explain better than the creator himself and this article is great!
“羅馬不是一天建成的”,FastAPI從Django,Flask,APIStar之類的先前框架中學到了很多東西,我無法比創建者本人解釋得更好,并且這篇文章很棒!
無聊但必要的設置 (The boring but necessary setup)
Things are all in one repo which is probably not a good practice, it should be different GitHub repo in the real use case, maybe I will refactor [professional way to say “clean my previous sxxt”] later!
事情全放在一個倉庫中 ,這可能不是一個好習慣,在實際用例中應該是不同的GitHub倉庫,也許以后我會重構 [專業方式來“清理我以前的sxxt”]!
*CS folks always say single responsibility principle, instead of saying “don’t put code with different functionalities together”, next time maybe you can say “we should follow single responsibility principle for this!”
* CS員工總是說單一責任原則 ,而不是說“不要將具有不同功能的代碼放在一起”,下次您可以說“我們應該遵循單一責任原則!”
First of all, let’s update requirements.txt with the new packages, as we mentioned last time, we should specify the exact version such that others can reproduce the work!
首先,讓我們用新的軟件包更新requirements.txt,正如我們上次提到的那樣,我們應該指定確切的版本 ,以便其他人可以復制作品!
# for last article
pytest==6.0.1
pandas==1.0.1
Click==7.0
scikit-learn==0.22.1
black==19.10b0
isort==4.3.21
PyYAML==5.2# for FastAPI
fastapi==0.61.0
uvicorn==0.11.8
chardet==3.0.4
After this, we need to install requirements.txt again in the conda env [because we have new packages]
此后,我們需要在conda env中再次安裝requirements.txt [因為我們有新軟件包]
# You can skip the line below if you have created conda env
conda create - name YOU_CHANGE_THIS python=3.7 -yconda activate YOU_CHANGE_THISpip install –r requirements.txt
游戲計劃 (The game plan)
Let’s think about what is happening, we want to have API endpoint to do prediction, to be specific if users give us the input, we need to use the model to predict and return prediction.
讓我們考慮發生了什么,我們想要讓API端點進行預測,具體來說,如果用戶給我們輸入,我們需要使用模型來預測并返回預測。
Instead of having us [human] handle the incoming request, we just create an API server to wait for the requests, parse the inputs, do prediction, and return results. API is just the structural way to talk to our computer and ask for the service [prediction in this case]
代替讓我們 [人類]處理傳入的請求,我們只是創建一個API服務器來等待請求,解析輸入,進行預測并返回結果。 API只是與我們的計算機對話并請求服務的結構化方式[在這種情況下為預測]
Below is the pseudocode:
下面是偽代碼:
# Load trained model
trained_model = load_model(model_path)# Let's create a API that can receive user request
api = CreateAPI()# If user send us the request to `predict` endpoint
when user sends request to `api`.`predict`:
input = api[`predict`].get(input) # get input
prediction = trained_model(input) # apply model
return prediction # return prediction
This is good for the happy flow! But we should NEVER trust the user, just ask yourself, will you ever read the user manual in your daily life?
這對幸福的流好! 但是,我們永遠不要相信用戶,只是問問自己,您是否會在日常生活中閱讀用戶手冊?
For example, we expect {‘a’: 1, ‘b’: 2, ‘c’: 3} from the user but we may get:
例如,我們希望用戶收到{'a':1,'b':2,'c':3},但我們可能會得到:
- Wrong order {‘b’: 2, ‘a’: 1, ‘c’: 3}, or 錯誤的順序{'b':2,'a':1,'c':3},或
- Wrong key {‘a’: 1, ‘b’: 2, ‘d’: 3}, or 錯誤的鍵{'a':1,'b':2,'d':3},或
- Missing key{‘a’: 1, ‘b’: 2}, or 缺少鍵{'a':1,'b':2}或
- Negative value {‘a’: -1, ‘b’: 2, ‘c’: 3}, or 負值{'a':-1,'b':2,'c':3},或
- Wrong type {‘a’: “HELLO WORLD”, ‘b’: 2, ‘c’: 3}, or 錯誤的類型{'a':“ HELLO WORLD”,“ b”:2,“ c”:3}或
- etc etc 等
This is fatal to our API because our model doesn’t know how to respond to this. We need to introduce some input structures to protect us! Therefore, we should update our pseudocode!
這對我們的API是致命的,因為我們的模型不知道如何響應。 我們需要引入一些輸入結構來保護我們! 因此,我們應該更新我們的偽代碼!
# Define input schema
input_schema = {......}# Load trained model
trained_model = load_model(model_path)# Let's create a API that can receive user request
api = CreateAPI()# If user send us the request to `predict` endpoint
when user sends request to `api`.`predict`:
input = api[`predict`].get(input) # get input
transformed_input = apply(input_schema, input)
if not transformed_input.valid(): return Error prediction = trained_model(transformed_input) # apply model
return prediction # return prediction
代碼 (The Code)
Looks good to me now! Let’s translate them using FastAPI part by part!
現在對我來說很好! 讓我們使用FastAPI進行部分翻譯!
Input schema
輸入模式
It seems many lines but things are the same, as you can guess, we define a class called `Sample` which defines every predictor as float and greater than [gt] zero!
似乎有很多行,但是事情都是一樣的,正如您所猜到的,我們定義了一個名為`Sample`的類,該類將每個預測變量定義為float且大于[gt]零!
Load model
負荷模型
Then we load the trained model, hmmm what is `Predictor`? it is just a custom class that wraps the model with different methods so we can call a method instead of implementing the logic in API server
然后我們加載訓練后的模型hmmm,什么是“預測變量”? 它只是一個自定義類,使用不同的方法包裝模型,因此我們可以調用方法而不是在API服務器中實現邏輯
Create an API server
創建一個API服務器
Then we create the API using FastAPI……pseudocode is almost the code already
然后我們使用FastAPI創建API……偽代碼幾乎已經是代碼
predict endpoint
預測終點
This looks complicated but they are very straightforward
這看起來很復雜,但是非常簡單
Instead of saying “when the user sends a request to `api`.`predict`”
而不是說“當用戶向api發送請求時。預測”。
We say: “Hey, app, if people send “GET request” to `predict`, please run function predict_item, we expect the input follows the schema we defined in `Sample`”
我們說: “嘿,應用程序,如果人們向“ predict” 發送“ GET request”,請運行函數predict_item,我們期望輸入遵循我們在“ Sample”中定義的模式。”
What predict_item does is only transform input shape, feed to trained model and return the prediction, simple Python function
Forecast_item所做的只是變換輸入形狀,輸入經過訓練的模型并返回預測,簡單的Python函數
If you want to know more about HTTP request methods
如果您想進一步了解HTTP請求方法
But you may ask: hey! One line is missing!!! Where is the input validation? What if users provide the wrong data type/key or miss a field?
但是您可能會問:嘿! 缺少一行!!! 輸入驗證在哪里? 如果用戶提供了錯誤的數據類型/密鑰或缺少字段怎么辦?
Well…….Remember we have defined `Sample` class for the input schema? Fast API automatically validates it for us according to the schema and we don’t need to care about that!!! This saves a lot of brainpower and many lines of code to build a robust and well-tested API!
好吧……。還記得我們為輸入模式定義了`Sample`類嗎? 快速API會根據架構自動為我們驗證它,我們不需要在意!!!! 這樣可以節省大量的人力資源和許多行代碼,以構建強大且經過良好測試的API!
嘗試使用 (Try to use)
# At project root, we can run this
# --reload is for development, API server autorefresh
# when you change the codeuvicorn prediction_api.main:app --reload
You should be able to see these, the API server is now running on “http://127.0.0.1:8000”!
您應該能夠看到這些,API服務器現在正在“ http://127.0.0.1:8000”上運行!

There are different ways to experiment with the API, depending on your environment, you may use requests in Python or cURL in the command line. BTW there is a handy tool is called Postman, try it out, it is a very intuitive and user-friendly tool for API!
有多種嘗試API的方法,具體取決于您的環境,您可以在命令行中使用Python或cURL中的 請求 。 順便說一句,有一個方便的工具叫做Postman ,試試看,它是一個非常直觀且用戶友好的API工具!
We will use Python requests for the following examples, you can see them in this Notebook [sometimes Jupyter is helpful 😎]
我們將針對以下示例使用Python請求,您可以在本筆記本中查看它們[有時Jupyter會有所幫助😎]
Example below uses a valid input: YEAH! 😍 We made it! The endpoint returns the prediction!!!
以下示例使用有效輸入:YEAH! 😍我們做到了! 端點返回預測!!!
payload = {
"fixed_acidity": 10.5,
"volatile_acidity": 0.51,
"citric_acid": 0.64,
"residual_sugar": 2.4,
"chlorides": 0.107,
"free_sulfur_dioxide": 6.0,
"total_sulfur_dioxide": 15.0,
"density": 0.9973,
"pH": 3.09,
"sulphates": 0.66,
"alcohol": 11.8,
}result = requests.get("http://127.0.0.1:8000/predict", data = json.dumps(payload))print(result.json())Output
{'prediction': 1, 'utc_ts': 1597537570, 'model': 'RandomForestClassifier'}
Example below misses a field and FastAPI helps us to handle it according to our defined schema, I literally write nothing other than the schema class
下面的示例錯過了一個字段,FastAPI可以幫助我們根據定義的模式來處理它,我只寫了架構類
payload = {
"volatile_acidity": 0.51,
"citric_acid": 0.64,
"residual_sugar": 2.4,
"chlorides": 0.107,
"free_sulfur_dioxide": 6.0,
"total_sulfur_dioxide": 15.0,
"density": 0.9973,
"pH": 3.09,
"sulphates": 0.66,
"alcohol": 11.8,
}result = requests.get("http://127.0.0.1:8000/predict", data = json.dumps(payload))print(result.json())Output
{'detail': [{'loc': ['body', 'fixed_acidity'], 'msg': 'field required', 'type': 'value_error.missing'}]}
Just for fun, I also implemented a update_model PUT API to swap the models, for example, originally we were using Random Forest, I updated it to Gradient Boosting??
只是為了好玩,我還實現了update_model PUT API來交換模型,例如,最初我們使用隨機森林,我將其更新為GradientBoosting??
result = requests.put("http://127.0.0.1:8000/update_model")print(result.json())Output
{'old_model': 'RandomForestClassifier', 'new_model': 'GradientBoostingClassifier', 'utc_ts': 1597537156}
自動產生的文件 (Auto-generated Document)
One of the cool FastAPI features is auto-document, just go to http://127.0.0.1:8000/docs#/ and you will have the interactive and powerful API document out of the box! So intuitive that I don’t need to elaborate
FastAPI的一項很酷的功能是自動文檔,只需轉到http://127.0.0.1:8000/docs#/ ,您便可以立即獲得強大的交互式API文檔! 如此直觀,我無需贅述

重新訪問pytest (Revisit pytest)
I cannot emphasize enough about the importance of unit testing, it verifies the functions are doing what we expect them to do such that you will not break things accidentally!
對于單元測試的重要性,我不能太強調,它可以驗證功能是否正在按我們期望的方式進行,以確保您不會意外破壞!
But if I try to cover every test it will be too boring and lengthy. What I plan to do here is to share some areas I will brainlessly test & some [probably useful] articles. Then I will talk about a pytest feature called parameterized unit test and some testing options in pytest. The easiest way to motivate yourself to learn unit testing is to try to refactor your previous code, the larger the better!
但是,如果我嘗試涵蓋所有測試,那將太無聊且冗長。 我打算在這里做的是分享我將無腦地測試的一些領域和一些[可能有用的]文章。 然后,我將討論一種稱為參數化單元測試的pytest功能以及pytest中的一些測試選項。 激勵自己學習單元測試的最簡單方法是嘗試重構您以前的代碼,越大越好!
Unit testing
單元測試
Whenever you found difficult to write/understand the unit tests, you probably need to review your code structure first. Below are 4 areas that I will consider brainlessly:
每當發現難以編寫/理解單元測試時,您可能需要首先查看您的代碼結構。 以下是我將全力以赴的4個方面:
- Input data: dimension [eg: df.shape], type [eg: str], value range [eg: -/0/+] 輸入數據:尺寸[例如:df.shape],類型[例如:str],值范圍[例如:-/ 0 / +]
- Output data: dimension [eg: df.shape], type [eg: str], value range [eg: -/0/+] 輸出數據:尺寸[例如:df.shape],類型[例如:str],值范圍[例如:-/ 0 / +]
- Compare: output and expected result 比較:輸出和預期結果
- After I debug, prevent it happens again 調試后,防止再次發生
For example, I focus quite a lot on output dimension, type, value range below. It seems simple but if you modify any output format, it will remind you what is the expected formats!
例如,我將很多精力放在下面的輸出尺寸,類型,值范圍上。 看起來很簡單,但是如果您修改任何輸出格式,它將提醒您期望的格式是什么!
Some articles FYR:
FYR的一些文章:
Unit Testing for Data Scientists
數據科學家的單元測試
How to unit test machine learning code [deep learning]
如何對機器學習代碼進行單元測試 [深度學習]
Parameterized unit test
參數化單元測試
Suppose you have 100 mock data [annotate by D_i, i: 1..100] and you want to run the same unit test for each of them, how will you do?
假設您有100個模擬數據[用D_i注釋,我:1..100注釋],并且您要為每個模擬數據運行相同的單元測試,您將如何處理?
A brute force solution
蠻力解決方案
def test_d1():
assert some_operation(D_1)def test_d2():
assert some_operation(D_2)def test_d3():
assert some_operation(D_3)......def test_d100():
assert some_operation(D_100)
But if you need to modify `some_operation`, you need to modify it 100 times LOL………Although you can make it as a utility function, this makes the tests hard to read and very lengthy
但是,如果您需要修改“ some_operation”,則需要對其進行100倍的LOL修改…………盡管您可以將其作為實用程序功能使用,但這會使測試難以閱讀且非常冗長
A better way maybe for-loop?
更好的方法也許是循環的?
def test_d():
for D in [D_1, D_2, D_3, ..., D_100]:
assert some_operation(D)
But you can’t know exactly which tests fail because these 100 tests are all in one test
但是您無法確切知道哪些測試失敗,因為這100個測試全部在一個測試中
pytest offers us a feature called parametrize
pytest為我們提供了一個稱為 參數化 的功能
@pytest.mark.parametrize("test_object", [D_1, D_2, ..., D_100])
def test_d(test_object):
assert some_operation(test_object)
Common pytest options
常見的pytest選項
pytest FOLDER
pytest文件夾
Last time we mentioned we can just run `pytest` in command line and pytest will find ALL the tests under the folder itself. But sometimes we may not want to run all the unit tests during development [maybe some tests take a long time but unrelated to your current tasks]
上次提到時,我們可以在命令行中運行pytest ,而pytest將在文件夾本身下找到所有測試。 但是有時我們可能不希望在開發過程中運行所有的單元測試[也許某些測試需要很長時間,但與您當前的任務無關]
In that case, you can simply run pytest FOLDER, eg: `pytest ./scripts` or `pytest ./prediction_api` in the demo
在這種情況下,您可以簡單地運行pytest FOLDER ,例如:演示中的`pytest。/ scripts`或`pytest。/ prediction_api`。
parallel pytest
并行pytest
Sometimes your test cases are too heavy, it may be a good idea to run things in parallel! You can install pytest-xdist and replace pytest by py.test in your command, eg: py.test -n 4
有時您的測試用例過于繁重,最好并行運行! 您可以在命令中安裝pytest-xdist并用py.test替換pytest,例如:py.test -n 4
pytest -v
pytest -v
This is personal taste, I prefer the verbose output and see green PASSED ? to start my day
這是個人喜好,我更喜歡詳細的輸出,看到綠色的PASSED ASS開始我的一天


You can read more from the materials below:
您可以從以下材料中了解更多信息:
https://docs.pytest.org/en/stable/
https://docs.pytest.org/en/stable/
https://www.guru99.com/pytest-tutorial.html#5
https://www.guru99.com/pytest-tutorial.html#5
最后,我希望您能像我一樣欣賞這段1分鐘的YouTube視頻 😆 (At last, I hope you can enjoy this 1 min Youtube video as I do 😆)
結論 (Conclusions)
Yooo? we have created a prediction API that consumes our model, users can now send the request and get the prediction without a human sitting behind, this over-simplifies the reality [throughput, latency, model management, authentication, AB testing, etc] but this is the idea!
?,我們創建了一個預測API,該API可以使用我們的模型,用戶現在可以發送請求并獲得預測,而無需人工干預,這過分簡化了現實情況(吞吐量,延遲,模型管理,身份驗證,AB測試等)但這是主意!
At least if your prototype is at this level, engineers are much happier to take over from this point and hence speed up the whole process, and you can show them you know something 😈
至少如果您的原型處于此級別,工程師會更樂于接受這一點,從而加快了整個過程的速度,您可以向他們展示您知道的東西😈
To wrap up, we:
最后,我們:
a. Update conda env [requirements.txt]
b. Brainstorm pseudocode and convert to code [FastAPI, uvicorn]
c. Utilize API [cURL, requests, Postman]
d. Talk about Auto-generated documents by FastAPI
e. Some pytest techniques [parallel, parameterized, -v]
File tree below to show the development steps
下面的文件樹顯示了開發步驟
.
├── notebook
│ ├── prediction-of-quality-of-wine.ipynb
│ └── prediction_API_test.ipynb [c] <-consume API
├── prediction_api
│ ├── __init__.py
│ ├── api_utility.py [b] <-wrap up methods
│ ├── main.py [b] <-modify demo
│ ├── mock_data.py [e] <-Unit test
│ ├── test_api_utility.py [e] <-Unit test
│ └── test_main.py [e] <-Unit test
├── requirements.txt [a] <-FastAPI doc
.
.
.
BUT (again, bad news usually starts with BUT) they are still on my local computer.
但是 (同樣,壞消息通常以BUT開始)它們仍然在我的本地計算機上。
Although we don’t need to sit behind and hit Run, the user requests cannot reach the API endpoints. Even if they do, it means I cannot close my Macbook, it means I cannot scale if there are many incoming prediction requests😱!!!
盡管我們不需要坐在后面并單擊“運行”,但用戶請求無法到達API端點。 即使它們這樣做,也意味著我無法關閉我的Macbook,這意味著如果有很多傳入的預測請求,我也無法擴展😱!
The way to escape from this hell, as we mentioned in the last article, is to either buy another computer OR rent a server from cloud providers such as AWS
就像我們在上一篇文章中提到的那樣,擺脫困境的方法是要么購買另一臺計算機,要么從云服務提供商(例如AWS)租用服務器。
But first, we also need to ensure the code is working fine there! How?
但是首先,我們還需要確保該代碼在這里能正常工作! 怎么樣?
Short answer: Docker
簡短答案:Docker
Aside:
在旁邊:
Although I haven’t tried, there is a startup called Cortex which focuses on open source machine learning API framework and they also use FastAPI under the hood!
盡管我還沒有嘗試過,但是有一家名為Cortex的初創公司專注于開源機器學習API框架,他們還使用了FastAPI !
By now, you should be able to understand their tutorial, in short, they solve many production-level problems behind the scene such as rolling update, DL model inference, integration with AWS, Autoscaling etc…… [These are DevOps concerns? Or maybe fancier term: MLOps]
到目前為止,您應該已經能夠理解他們的教程,簡而言之,他們解決了幕后的許多生產級問題,例如滾動更新,DL模型推斷,與AWS集成,Autoscaling等……[這些是DevOps所關注的嗎? 也許是更特別的術語: MLOps
But from user [aka you] perspective, they deploy the APIs using declarative yml [similar to how we configure model in the last article], have a predictor class [similar to our Predictor class], trainer.py [similar to train.py in the last article]
但是從用戶[aka you]的角度來看,他們使用聲明性yml部署API [類似于上一篇文章中的配置模型的方式],有一個預報器類[類似于我們的Predictor類],trainer.py [類似于train.py在上一篇文章中]
Writing the code is relatively easy but writing an article for the code is hard, if you found this article useful, you can leave some comments
編寫代碼相對容易,但是為代碼編寫文章卻很困難,如果您發現本文有用,則可以留下一些評論
OR you can star my repo!
或者你可以給我的倉庫加注星號!
OR my LinkedIn [Welcome but please leave a few words to indicate you are not zombie]!
或我的LinkedIn (歡迎使用,請留下幾句話以表明您不是僵尸)!
BTW, we know COVID-19 has bad if not catastrophic impacts on everyone’s career, especially for graduates. Research shows the unlucky effect can set a graduate back for years given they do nothing wrong. Well……what else can you say🤒? If you [or know people who] are hiring, feel free to reach out and we can forward the opportunities to people who are in need 🙏
順便說一句,我們知道COVID-19對每個人的職業都有嚴重的影響,即使不是災難性的影響,尤其是對于畢業生而言。 研究 表明,不幸的是,只要他們沒有做錯任何事情,就會使畢業生退縮多年。 好吧……你還能說什么? 如果你[或知道是誰的人]雇用,隨意 伸手 ,我們可以轉發機會的人誰需要 🙏
翻譯自: https://towardsdatascience.com/from-scripts-to-prediction-api-2372c95fb7c7
腳本 api
本文來自互聯網用戶投稿,該文觀點僅代表作者本人,不代表本站立場。本站僅提供信息存儲空間服務,不擁有所有權,不承擔相關法律責任。 如若轉載,請注明出處:http://www.pswp.cn/news/388355.shtml 繁體地址,請注明出處:http://hk.pswp.cn/news/388355.shtml 英文地址,請注明出處:http://en.pswp.cn/news/388355.shtml
如若內容造成侵權/違法違規/事實不符,請聯系多彩編程網進行投訴反饋email:809451989@qq.com,一經查實,立即刪除!