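This article shows how to batch-scrape the articles linked from a list page. Let's walk through it step by step; anyone who needs this may find it useful.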
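Some people treat scraper programs like treasure, and to this day some of them are still damn well charging money for them. Shame on those guys, really! That said, what I have below is probably pretty rough.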
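The code below does not write anything to a database. Once you have got this far, storing the data is easy, so anyone who needs it can finish that part on their own, and please polish the other features yourselves too. Copy the code over, together with the helper functions attached at the end, and run it to see the result.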
Dim Url, List_PageCode, Array_ArticleID, i, ArticleID
Dim Content_PageCode, Content_TempCode
Dim Content_CategoryID, Content_CategoryName, BorderID, ClassID, BorderName, ClassName
Dim ArticleTitle, ArticleAuthor, ArticleFrom, ArticleContent
Url = "http://www.webasp.net/article/class/1.htm"
List_PageCode = getHTTPPage(Url)
List_PageCode = RegExpText(List_PageCode,"打印","
List_PageCode = RegExpText(List_PageCode," 'Get the article links on the current list page, separated by commas
Array_ArticleID = Split(List_PageCode, ",")    'Build an array holding the article IDs
For i = 0 To Ubound(Array_ArticleID) - 1
    ArticleID = Array_ArticleID(i)    'Article ID
    Content_PageCode = getHTTPPage("http://www.webasp.net/article/" & ArticleID)    'Fetch the article page
    '========= Get the article category and related IDs: start =======================
    Content_TempCode = RegExpText(Content_PageCode, "技術教程 >> ", ">> 內容", 0)
    Content_CategoryID = RegExpText(Content_PageCode, "", 1)
    BorderID = Split(Content_CategoryID, ",")(0)    'Top-level category ID
    ClassID = Split(Content_CategoryID, ",")(1)    'Subcategory ID
        '========== Check whether the top-level category exists: start ===============
        'If it does not exist, insert it into the database
        '========== Check whether the top-level category exists: end ===============
    'Response.Write(BorderID & "," & ClassID & "<br>")
    Content_CategoryName = RegExpText(Content_PageCode, "/'>", "", 1)
    BorderName = Split(Content_CategoryName, ",")(0)    'Top-level category name
    ClassName = Split(Content_CategoryName, ",")(1)    'Subcategory name
        '========== Check whether the subcategory exists: start ===============
        'If it does not exist, insert it into the database
        '========== Check whether the subcategory exists: end ===============
    '========= Get the article category and related IDs: end =======================
    '========= Get the article title and content: start =============================
    ArticleTitle = RegExpText(Content_PageCode, "", "", 0)
    ArticleAuthor = RegExpText(Content_PageCode, " 作者:", "", 0)
    ArticleFrom = RegExpText(Content_PageCode, " 來源:", "", 0)
    ArticleContent = RegExpText(Content_PageCode, "", "" & vbCrLf & "        " & vbCrLf & "    ", 0)
    '========= Get the article title and content: end =============================
    Response.Write(ArticleTitle & "<br>")
    Response.Flush()
Next
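The loop above stops at `Ubound(Array_ArticleID)-1`. It therefore reaches every article only if the comma-separated string returned by `RegExpText` ends with a trailing comma, which leaves an empty last element after `Split`. If that assumption does not hold, the last article is silently skipped. Below is a minimal sketch of a hypothetical helper, `CleanIDList`, which is not part of the original code. It drops empty entries so the loop no longer depends on the trailing comma:

```vbscript
' Hypothetical helper (not from the original article): split a
' comma-separated list and keep only the non-empty, trimmed entries.
Function CleanIDList(rawList)
    Dim parts, kept(), i, n
    parts = Split(rawList, ",")
    ' Split("") yields an empty array (UBound = -1); return one as well.
    If UBound(parts) < 0 Then
        CleanIDList = Array()
        Exit Function
    End If
    ' Allocate the worst case first, then shrink to the real size.
    ReDim kept(UBound(parts))
    n = 0
    For i = 0 To UBound(parts)
        If Trim(parts(i)) <> "" Then
            kept(n) = Trim(parts(i))
            n = n + 1
        End If
    Next
    If n = 0 Then
        CleanIDList = Array()
    Else
        ReDim Preserve kept(n - 1)
        CleanIDList = kept
    End If
End Function
```

To use it, replace `Array_ArticleID = Split(List_PageCode,",")` in the listing with `Array_ArticleID = CleanIDList(List_PageCode)`, and change the loop header to `For i = 0 To Ubound(Array_ArticleID)`. An empty result gives `UBound = -1`, so the loop body simply does not run.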
A few helper functions are attached:
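The helper functions themselves are not included in this copy of the article. As a stand-in, here is a minimal sketch of `getHTTPPage` and `RegExpText`. It is inferred only from how the listing calls them and is not the author's original code.

It rests on several assumptions:
- `RegExpText(text, startMarker, endMarker, mode)` returns the text between the two markers. Mode 0 gives the first match, and mode 1 gives every match, each followed by a comma. This is consistent with the `Ubound(...)-1` loop bound and the `Split(...)(0)`/`(1)` calls.
- The pages are GB2312-encoded.
- `EscapeRegex`, `BytesToText` and `PAGE_CHARSET` are hypothetical names.

```vbscript
' Hypothetical reconstructions of the helpers the scraper calls.
' Assumption: the target pages are GB2312-encoded.
Const PAGE_CHARSET = "gb2312"

' Download a page and decode its raw bytes with PAGE_CHARSET.
' Returns "" when the server does not answer with HTTP 200.
Function getHTTPPage(url)
    Dim http
    Set http = CreateObject("MSXML2.ServerXMLHTTP.6.0")
    http.Open "GET", url, False    ' synchronous request
    http.Send
    If http.Status = 200 Then
        getHTTPPage = BytesToText(http.responseBody, PAGE_CHARSET)
    Else
        getHTTPPage = ""
    End If
    Set http = Nothing
End Function

' Convert a byte array to a string in the given charset via ADODB.Stream.
Function BytesToText(bytes, charset)
    Dim stream
    Set stream = CreateObject("ADODB.Stream")
    stream.Type = 1        ' adTypeBinary
    stream.Mode = 3        ' adModeReadWrite
    stream.Open
    stream.Write bytes
    stream.Position = 0    ' Type can only be switched at position 0
    stream.Type = 2        ' adTypeText
    stream.Charset = charset
    BytesToText = stream.ReadText
    stream.Close
    Set stream = Nothing
End Function

' Escape regex metacharacters so a marker is matched literally.
Function EscapeRegex(s)
    Dim re
    Set re = New RegExp
    re.Global = True
    re.Pattern = "([\\\^\$\.\|\?\*\+\(\)\[\]\{\}])"
    EscapeRegex = re.Replace(s, "\$1")
End Function

' Return the text between startMarker and endMarker.
' mode 0: first match only; mode 1: every match, each followed by ",".
Function RegExpText(text, startMarker, endMarker, mode)
    Dim re, matches, m, result
    Set re = New RegExp
    re.IgnoreCase = True
    re.Global = True
    ' [\s\S]*? matches lazily across line breaks (VBScript's "." does not).
    re.Pattern = EscapeRegex(startMarker) & "([\s\S]*?)" & EscapeRegex(endMarker)
    Set matches = re.Execute(text)
    result = ""
    If mode = 0 Then
        If matches.Count > 0 Then result = matches(0).SubMatches(0)
    Else
        For Each m In matches
            result = result & m.SubMatches(0) & ","
        Next
    End If
    RegExpText = result
End Function
```

Decoding `responseBody` through `ADODB.Stream` with an explicit charset avoids relying on `responseText`. `responseText` may decode the page with the wrong charset if the server does not declare one.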