Python Web Programming
The internet and the World Wide Web (WWW) are probably the most prominent sources of information today. Most of that information is retrievable through HTTP. HTTP was originally invented to share pages of hypertext (hence the name Hypertext Transfer Protocol), which eventually started the WWW.
This process occurs every time we request a web page through our devices. The exciting part is we can perform these operations programmatically to automate the retrieval and processing of information.
This article is an excerpt from the book Python Automation Cookbook, Second Edition by Jaime Buelta, a comprehensive and updated edition that enables you to develop a sharp understanding of the fundamentals required to automate business processes through real-world tasks, such as developing your first web scraping application, analyzing information to generate spreadsheet reports with graphs, and communicating with automatically generated emails.
In this article, we will learn how to use Python to fetch web pages over HTTP. Python has an HTTP client in its standard library. Further, the fantastic requests module makes obtaining web pages very convenient.
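As a minimal sketch of the standard-library route, the helper below fetches a page with `urllib.request` and decodes it to text. The name `fetch_text` and the 10-second timeout are illustrative choices, not something the book prescribes.

```python
from urllib.request import urlopen


def fetch_text(url, timeout=10):
    """Fetch `url` and return its body as text (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        # Use the charset declared in the response headers, else fall back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)
```

Calling `fetch_text('https://httpbin.org/forms/post')` should return the HTML of the test form used later in this article. With the requests module, the rough equivalent is `requests.get(url).text`.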
Interacting with forms
A common element present in web pages is forms. Forms are a way of sending values to a web page, for example, to create a new comment on a blog post, or to submit a purchase.
Browsers present forms so you can input values and send them in a single action after pressing the submit or equivalent button. We’ll see how to create this action programmatically in this recipe.

Getting ready
We’ll work against the test server https://httpbin.org/forms/post, which allows us to send a test form and sends back the submitted information.
The following is an example form to order a pizza:

Figure 1: Rendered form
You can fill the form in manually and see it return the information in JSON format, including extra information such as the browser being used.
The following shows the web form after it has been filled in:

Figure 2: Filled-in form
The following screenshot shows the JSON content that the server returns after the form is submitted:

Figure 3: Returned JSON content
We need to analyze the HTML to see the accepted data for the form. The source code is as follows:

Figure 4: Source code
Check the names of the inputs: custname, custtel, custemail, size (a radio option), topping (a multiselection checkbox), delivery (time), and comments.
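To make the inspection above concrete, here is a small sketch that lists each field name together with the fixed values offered for it. It uses only the standard library's `html.parser` rather than Beautiful Soup so that it runs without extra packages. `SAMPLE_FORM` is an abbreviated stand-in written for this example, not the exact markup of the httpbin page, and `FormFieldCollector` is a hypothetical helper.

```python
from html.parser import HTMLParser

# Abbreviated stand-in for the form markup (an assumption for illustration);
# the real page offers more options and extra attributes.
SAMPLE_FORM = """
<form method="post" action="/post">
  <input name="custname">
  <input type="tel" name="custtel">
  <input type="email" name="custemail">
  <input type="radio" name="size" value="small">
  <input type="radio" name="size" value="medium">
  <input type="checkbox" name="topping" value="bacon">
  <input type="checkbox" name="topping" value="onion">
  <input type="time" name="delivery">
  <textarea name="comments"></textarea>
</form>
"""


class FormFieldCollector(HTMLParser):
    """Collect field names and the predefined values offered for each."""

    def __init__(self):
        super().__init__()
        self.fields = {}  # name -> list of predefined values (may be empty)

    def handle_starttag(self, tag, attrs):
        if tag not in ("input", "textarea"):
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name is None:
            return
        values = self.fields.setdefault(name, [])
        # Radio buttons and checkboxes carry an allowed value in `value`.
        if attributes.get("type") in ("radio", "checkbox"):
            values.append(attributes.get("value"))


collector = FormFieldCollector()
collector.feed(SAMPLE_FORM)
for name, values in collector.fields.items():
    print(name, values)
```

Fields with an empty list accept free text, while `size` and `topping` only accept the listed values.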
How to do it…
1. Import the requests, BeautifulSoup, and re modules:
>>> import requests
>>> from bs4 import BeautifulSoup
>>> import re
2. Retrieve the form page, parse it, and print the input fields. Check that the posting URL is /post (not /forms/post):
>>> response = requests.get('https://httpbin.org/forms/post')
>>> page = BeautifulSoup(response.text, 'html.parser')
>>> form = page.find('form')
>>> {field.get('name') for field in form.find_all(re.compile('input|textarea'))}
{'delivery', 'topping', 'size', 'custemail', 'comments', 'custtel', 'custname'}
3. Note that textarea is a valid input element defined in HTML, which is why it is included in the search above. Prepare the data to be posted as a dictionary. Check that the values match those defined in the form:
>>> data = {'custname': "Sean O'Connell", 'custtel': '123-456-789',
...         'custemail': 'sean@oconnell.ie', 'size': 'small',
...         'topping': ['bacon', 'onion'], 'delivery': '20:30',
...         'comments': ''}
4. Post the values and check that the response is the same as returned in the browser:
>>> response = requests.post('https://httpbin.org/post', data)
>>> response
<Response [200]>
>>> response.json()
{'args': {}, 'data': '', 'files': {}, 'form': {'comments': '', 'custemail': 'sean@oconnell.ie', 'custname': "Sean O'Connell", 'custtel': '123-456-789', 'delivery': '20:30', 'size': 'small', 'topping': ['bacon', 'onion']}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Connection': 'close', 'Content-Length': '140', 'Content-Type': 'application/x-www-form-urlencoded', 'Host': 'httpbin.org', 'User-Agent': 'python-requests/2.22.0'}, 'json': None, 'origin': '89.100.17.159', 'url': 'https://httpbin.org/post'}
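Step 2 points out that the form is posted to /post rather than /forms/post. One way to derive the posting URL instead of hardcoding it is to resolve the form's action against the page URL. The sketch below assumes the action attribute is '/post', as the step describes; the relative action 'submit' is a hypothetical value included only to show the difference.

```python
from urllib.parse import urljoin

form_page_url = "https://httpbin.org/forms/post"
form_action = "/post"  # assumed value of the form's action attribute

# An action starting with "/" replaces the whole path of the page URL.
post_url = urljoin(form_page_url, form_action)
print(post_url)  # https://httpbin.org/post

# A relative action (hypothetical) is resolved against the page's directory.
relative_url = urljoin(form_page_url, "submit")
print(relative_url)  # https://httpbin.org/forms/submit
```

In the interactive session above, the action could be read with `form.get('action')` and passed to `urljoin` in the same way.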
How it works…
Requests directly encodes and sends data in the configured format. By default, it sends POST data in the application/x-www-form-urlencoded format.
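To see what the application/x-www-form-urlencoded format looks like, the sketch below encodes the same dictionary with the standard library's `urlencode`. This illustrates the wire format only; it is not a claim about how requests builds the body internally.

```python
from urllib.parse import urlencode

data = {
    "custname": "Sean O'Connell",
    "custtel": "123-456-789",
    "custemail": "sean@oconnell.ie",
    "size": "small",
    "topping": ["bacon", "onion"],
    "delivery": "20:30",
    "comments": "",
}

# doseq=True repeats the key once per list item (topping=bacon&topping=onion).
body = urlencode(data, doseq=True)
print(body)
print(len(body))  # 140
```

The length of 140 bytes agrees with the Content-Length header echoed back by httpbin in step 4.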
The key is to respect the format of the form and the values it accepts. An incorrect submission can return an error, typically a 400 error, which indicates a problem with the client's request.
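Because a server may reject unexpected values, it can help to check the payload locally before posting. The following is a minimal sketch: `ALLOWED` holds option lists assumed for this example (in practice they would be read from the form's HTML), and `find_invalid_values` is a hypothetical helper.

```python
# Assumed option lists for the fields that only accept fixed values.
ALLOWED = {
    "size": {"small", "medium", "large"},
    "topping": {"bacon", "cheese", "onion", "mushroom"},
}


def find_invalid_values(data, allowed=ALLOWED):
    """Return {field: [bad values]} for fields restricted to fixed options."""
    problems = {}
    for field, options in allowed.items():
        submitted = data.get(field, [])
        # A single value (radio) and a list (checkboxes) are treated alike.
        if isinstance(submitted, str):
            submitted = [submitted]
        bad = [value for value in submitted if value not in options]
        if bad:
            problems[field] = bad
    return problems


print(find_invalid_values({"size": "small", "topping": ["bacon", "onion"]}))
# {}
print(find_invalid_values({"size": "huge", "topping": ["bacon", "pineapple"]}))
# {'size': ['huge'], 'topping': ['pineapple']}
```

An empty result means every restricted field holds a permitted value, so the data can be posted as in step 4.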
There's more…
Other than following the format of forms and inputting valid values, the main problem when working with forms is the variety of mechanisms used to prevent spam and abusive behavior. Sites often require that you download a form before submitting it, as a protection against duplicate submissions or Cross-Site Request Forgery (CSRF).
To obtain the specific token, you need to first download the form, as shown in the recipe, obtain the value of the CSRF token, and resubmit it. Note that the token can have different names; this is just an example:
>>> form.find(attrs={'name': 'token'}).get('value')
'ABCEDF12345'
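The download, extract, and resubmit flow can be sketched as follows. The markup in `SAMPLE_FORM` is hypothetical, the field name `token` simply follows the example above, and `HiddenInputCollector` is an illustrative helper built on the standard library's `html.parser` rather than Beautiful Soup.

```python
from html.parser import HTMLParser

# Hypothetical form markup containing a hidden CSRF token field.
SAMPLE_FORM = """
<form method="post" action="/post">
  <input type="hidden" name="token" value="ABCEDF12345">
  <input name="custname">
</form>
"""


class HiddenInputCollector(HTMLParser):
    """Collect the name/value pairs of hidden inputs."""

    def __init__(self):
        super().__init__()
        self.hidden = {}

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "input" and attributes.get("type") == "hidden":
            name = attributes.get("name")
            if name:
                self.hidden[name] = attributes.get("value", "")


collector = HiddenInputCollector()
collector.feed(SAMPLE_FORM)

# Merge the hidden values into the payload so the token is sent back.
data = {"custname": "Sean O'Connell"}
data.update(collector.hidden)
print(data)
# {'custname': "Sean O'Connell", 'token': 'ABCEDF12345'}
```

Against a real site, the form would usually be fetched and submitted through the same `requests.Session`, since such tokens are often tied to cookies set when the form is downloaded.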
In this article, we learned how to retrieve a web form, parse it, print its input fields, and submit values to it using Python's HTTP client. We also explored the role and application of the requests, Beautiful Soup, and re modules.
About the Author
Jaime Buelta has been a full-time Python developer since 2010 and is a regular speaker at PyCon Ireland. He has been a professional programmer for over two decades and has worked with many different technologies throughout his career. He has developed software for a variety of fields and industries, including aerospace, networking and communications, industrial SCADA systems, video game online services, and financial services.
Editor’s note: Interested in learning more about coding beyond just retrieving webpages through Python? Check out some of these upcoming similar ODSC talks:
ODSC Europe: “Programming with Data: Python and Pandas” — In this training, you will learn how to accelerate your data analyses using the Python language and Pandas, a library specifically designed for tabular data analysis.
ODSC Europe: “Introduction to Linear Algebra for Data Science and Machine Learning With Python” — The goal of this session is to show you that you can start learning the math needed for machine learning and data science using code.
Translated from: https://medium.com/@ODSC/retrieving-webpages-through-python-programming-8f3bae8518a5