Python Notes: Getting a List of GitHub Repositories

2020-02-23 05:30:02

Source: reposted

Contributed by: a reader

1. Background

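A project required access to GitHub's repo API so that repo data could be extracted and analyzed. After a day of research I finally solved the problem, although the solution is still fairly inefficient.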

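GitHub's API for listing repos returns detailed information about every repo, in JSON format. I have not yet found a way to parse JSON data that contains many entries, so I fell back on a rather crude combination of split and re. If you know a better way, feel free to leave a comment and discuss!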
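One possible answer to the question above: the listing is a single JSON array, so Python's standard json module can parse it in one call. The following is only a minimal sketch. SAMPLE is a hand-written stand-in for a real response (it keeps just two fields per entry), and extract_repo_urls is a hypothetical helper name.

```python
import json

# Hand-written stand-in for a /repositories response (assumption: real
# entries carry many more fields; only the ones used here are shown).
SAMPLE = '''
[
  {"url": "https://api.github.com/repos/wycats/merb-core",
   "owner": {"url": "https://api.github.com/users/wycats"}},
  {"url": "https://api.github.com/repos/rubinius/rubinius",
   "owner": {"url": "https://api.github.com/users/rubinius"}}
]
'''

def extract_repo_urls(body):
    """Parse a JSON array of repository objects and return each repo's API URL."""
    repos = json.loads(body)
    # Only the top-level "url" is read, so the nested owner URL is ignored
    # without any string filtering.
    return [repo["url"] for repo in repos]

if __name__ == "__main__":
    for url in extract_repo_urls(SAMPLE):
        print(url)
```

Because the parsed result is an ordinary list of dictionaries, the nested "owner" URL never needs to be filtered out by pattern matching.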

2. Code

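import os
import re

def GetUrl(num):
    # Fetch one page of public repos through curl and read the raw JSON text.
    body = os.popen('curl -G "https://api.github.com/repositories?since=%d"' % num).read()
    pattern = '"url"'
    pattern1 = 'repos'
    # Split the text so that each piece holds one "key": value pair.
    urls = body.split(',\n')
    for i in urls:
        # Keep only the "url" entries that point at a repo.
        if pattern in i and pattern1 in i:
            # The second quoted string in the piece is the URL itself.
            text = re.compile('"(.*?)"').findall(i)[1]
            print(text)

if __name__ == '__main__':
    GetUrl(1000)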

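Here num is passed as the since parameter: the API returns only repos whose ID is greater than this value, so it is a starting repository ID rather than a page number. Each request returns a single page, so by looping and repeatedly increasing num we can keep extracting more repos. Because GitHub's API is rate-limited, fetching in batches like this is a workable approach.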
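The loop described above might be sketched as follows. This is an illustration under assumptions rather than the original author's code: it swaps curl for the standard urllib.request module, assumes each entry carries "id" and "url" fields, and feeds the last ID seen back in as the next since value. The names fetch_page and collect_repo_urls and the User-Agent string are hypothetical.

```python
import json
import urllib.request

API = "https://api.github.com/repositories"

def fetch_page(since):
    """Download one page of public repos whose id is greater than `since`."""
    req = urllib.request.Request(
        "%s?since=%d" % (API, since),
        headers={"User-Agent": "repo-list-sketch"},  # hypothetical client name
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

def collect_repo_urls(start, max_pages, fetch=fetch_page):
    """Walk forward from `start`, using the last id seen as the next `since`."""
    urls = []
    since = start
    for _ in range(max_pages):   # hard cap keeps the request count bounded
        repos = fetch(since)
        if not repos:            # empty page: nothing more to read
            break
        urls.extend(repo["url"] for repo in repos)
        since = repos[-1]["id"]  # next request starts after this repo
    return urls
```

Calling collect_repo_urls(1000, 2) would issue at most two requests. The max_pages cap is there because of the rate limit, and the fetch argument lets the loop be exercised with a stub instead of the network.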

The result looks like this (the API addresses of the extracted repos):

https://api.github.com/repos/wycats/merb-core

https://api.github.com/repos/rubinius/rubinius

https://api.github.com/repos/mojombo/god

https://api.github.com/repos/vanpelt/jsawesome

https://api.github.com/repos/wycats/jspec

https://api.github.com/repos/defunkt/exception_logger

https://api.github.com/repos/defunkt/ambition

https://api.github.com/repos/technoweenie/restful-authentication

https://api.github.com/repos/technoweenie/attachment_fu

https://api.github.com/repos/topfunky/bong