Installing and Using the Python BS4 Library: A Detailed Guide

2020-02-15 22:42:51

Source: reprinted

Contributor: site user

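Beautiful Soup, usually called the bs4 library after its package name, supports Python 3 and is an excellent third-party library for writing web crawlers. It is simple and pleasant to use, and in Chinese it goes by the nickname "美味湯" ("delicious soup"). At the time of writing, the latest release of bs4 was 4.6.0. This article covers only the most basic usage; for the full details, see the official Beautiful Soup Documentation.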

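Installing the bs4 library

A major strength of Python is that, as an open-source language, it has many developers writing third-party libraries for it. When we want to implement a feature, we can concentrate on the feature itself and leave the low-level details to a library. For writing crawlers, bs4 is one of those powerful helpers.

Installation is simple: run pip from the command line.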

$ pip install beautifulsoup4

Next, check whether bs4 was installed successfully:

$ pip list

If beautifulsoup4 appears in the list that pip prints, the bs4 library has been installed successfully.
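The installation can also be checked from Python itself. The following is a minimal sketch that uses only the standard library; the helper name is_installed is hypothetical and not part of bs4 or pip. Note that the package is installed as beautifulsoup4 but imported as bs4.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module with this name can be found.

    is_installed is a hypothetical helper written for this example.
    """
    return importlib.util.find_spec(module_name) is not None


# The import name is "bs4", even though pip installs "beautifulsoup4".
print("bs4 installed:", is_installed("bs4"))
```

If this prints False, the pip used for the installation probably belongs to a different Python interpreter than the one running the script.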

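Basic usage of the bs4 library

Let's start with a simple walk-through of bs4, setting aside for now the question of how to fetch pages from the web. Suppose the HTML we need to parse is the snippet below. It will be used as an example several times. It is an excerpt from Alice in Wonderland (referred to from here on as the "Alice document"):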

<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
</body>
</html>

Now let's use bs4 to parse this HTML.

# Import the BeautifulSoup class from the bs4 module
from bs4 import BeautifulSoup

# Make the soup; `html` is assumed to be a string holding the Alice document
soup = BeautifulSoup(html, 'html.parser')

# Print the parsed document, one tag per line
print(soup.prettify())

'''
OUT:
# <html>
#  <head>
#   <title>
#    The Dormouse's story
#   </title>
#  </head>
#  <body>
#   <p class="title">
#    <b>
#     The Dormouse's story
#    </b>
#   </p>
#   <p class="story">
#    Once upon a time there were three little sisters; and their names were
#    <a class="sister" href="http://example.com/elsie" id="link1">
#     Elsie
#    </a>
#    ,
#    <a class="sister" href="http://example.com/lacie" id="link2">
#     Lacie
#    </a>
#    and
#    <a class="sister" href="http://example.com/tillie" id="link3">
#     Tillie
#    </a>
#    ; and they lived at the bottom of a well.
#   </p>
#   <p class="story">
#    ...
#   </p>
#  </body>
# </html>
'''
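Once the soup has been made, individual pieces of the document can be pulled out of it. The article does not go this far, so the following is only a sketch of a likely next step. It assumes bs4 is installed, and the variable names html_doc, title, links, and third_name are chosen for this example. It shows reading the page title, collecting the href of every a tag, and looking up a tag by its id.

```python
from bs4 import BeautifulSoup

# The Alice document, stored in a string so the example is self-contained.
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
</body>
</html>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# Text inside the <title> tag
title = soup.title.string

# The href attribute of every <a> tag, in document order
links = [a["href"] for a in soup.find_all("a")]

# Text of the single tag whose id is "link3"
third_name = soup.find(id="link3").get_text()

print(title)
print(links)
print(third_name)
```

Here find_all returns every matching tag, while find returns only the first match (or None if there is none), so the id lookup would need a None check on documents where the id might be missing.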
