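When collecting (scraping) with DedeCMS (織夢), you rarely need to collect from URLs that include a port number, and the few that do simply can't be collected. Here is a summary of how to fix DedeCMS failing to collect URLs whose port is not 80.
Problem description: when the URL being collected carries a port (the domain has been replaced with xxx to avoid any appearance of advertising):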
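Test collection URL: http://www.xxx.com:89/index.php/main/news/index.html?channel_id=104&page=1
The article URLs returned by the list test have lost the port; the result is an array of port-less links:
List URL under test: http://www.xxx.com:89/index.php/main/news/index.html?channel_id=104&page=1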
Array
(
    [0] => Array ( [title] => 講座回放|施奠東—西湖,世界風景園林的 [link] => http://www.xxx.com/index.php/main/news/15529.html [image] => http://www.xxx.com/uploadfiles/articles/20190528/15529.png )
    [1] => Array ( [title] => 喜報|恭賀我院2019年度西湖杯榮獲佳績! [link] => http://www.xxx.com/index.php/main/news/15528.html [image] => http://www.xxx.com/uploadfiles/articles/20190522/15528.jpg )
    [2] => Array ( [title] => 講座預告|西湖——世界風景園林的杰出范 [link] => http://www.xxx.com/index.php/main/news/15526.html [image] => http://www.xxx.com/uploadfiles/articles/20190516/15526.jpg )
    [3] => Array ( [title] => 講座回放|胡理琛—西湖七十年流變憶勝 [link] => http://www.xxx.com/index.php/main/news/15524.html [image] => http://www.xxx.com/uploadfiles/articles/20190513/15524.png )
    [4] => Array ( [title] => 講座回放|彭嘉恒—“南師、禪及其在西方 [link] => http://www.xxx.com/index.php/main/news/15518.html [image] => http://www.xxx.com/uploadfiles/articles/20190507/15518.png )
    [5] => Array ( [title] => 講座預告|胡理琛—西湖七十年流變憶勝 [link] => http://www.xxx.com/index.php/main/news/15516.html [image] => http://www.xxx.com/uploadfiles/articles/20190430/15516.jpg )
)
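These URLs are clearly wrong: without the :89 port they can't be opened, so nothing can be collected.
After some digging, the cause turned out to be the DedeCMS function that sets the HTML content and its source URL: it builds the site's home URL from the host alone and never checks for a port.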
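To see where the port goes missing, here is a minimal stand-alone PHP sketch (run outside DedeCMS; the variable names are illustrative, not DedeCMS code). It parses the test URL and resolves an article link with and without the port. It assumes the list page's links are root-relative, such as /index.php/main/news/15529.html, which is a guess based on the returned links.

```php
<?php
// Illustrative reproduction (not DedeCMS code): the list URL carries port 89,
// but only the host is kept when building the base for relative links.
$listUrl = 'http://www.xxx.com:89/index.php/main/news/index.html?channel_id=104&page=1';
$parts = parse_url($listUrl);

// parse_url() returns the port as a separate integer element.
var_dump($parts['host']); // string(11) "www.xxx.com"
var_dump($parts['port']); // int(89)

// Original behaviour: home URL = host only, so the port is dropped.
$homeWithoutPort = $parts['host'];
// Fixed behaviour: append ":port" when a non-80 port is present.
$homeWithPort = $parts['host']
    . ((isset($parts['port']) && $parts['port'] != 80) ? ':' . $parts['port'] : '');

// Resolve an assumed root-relative link taken from the list page.
$href = '/index.php/main/news/15529.html';
echo 'http://' . $homeWithoutPort . $href, PHP_EOL; // http://www.xxx.com/index.php/main/news/15529.html
echo 'http://' . $homeWithPort . $href, PHP_EOL;    // http://www.xxx.com:89/index.php/main/news/15529.html
```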
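The function is in include/dedehtml2.class.php.
In the SetSource function, at around line 79, add the port-handling line (the $port line tagged with the lyy comment in the code below) and append $port when building $this->HomeUrl: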
Test again. It works: the URLs now open correctly and the articles are collected.
Here is the code:
function SetSource(&$html, $url = '', $linktype='')
{
    $this->__construct();
    $this->CAtt = new DedeAttribute2();
    $url = trim($url);
    $this->SourceHtml = $html;
    $this->BaseUrl = $url;
    // Determine the document's path relative to the current URL
    $urls = @parse_url($url);
    // lyy: omit the port when it is absent or 80, otherwise append ":port"
    $port = (isset($urls['port']) && $urls['port'] != 80) ? ':'.$urls['port'] : '';
    $this->HomeUrl = $urls['host'].$port;
    $this->BaseUrlPath = $this->HomeUrl.$urls['path'];
    $this->BaseUrlPath = preg_replace("/\/([^\/]*)\.(.*)$/","/",$this->BaseUrlPath);
    $this->BaseUrlPath = preg_replace("/\/$/",'',$this->BaseUrlPath);
    if($linktype!='')
    {
        $this->GetLinkType = $linktype;
    }
    if($html != '')
    {
        $this->Analyser();
    }
}
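One detail worth checking: parse_url() leaves out the 'port' key entirely when the URL has no explicit port, so the port check should be guarded with isset(). Otherwise a port-less URL triggers an undefined-key notice or warning, and ':' . null evaluates to ':', leaving a home URL like "www.xxx.com:" with a trailing colon. The helper below, dede_home_url(), is a hypothetical stand-alone function (not part of DedeCMS) that mirrors the guarded logic so it can be tried in isolation.

```php
<?php
// Hypothetical helper (not part of DedeCMS) mirroring the guarded HomeUrl
// logic: host plus ":port" only when a port other than 80 is present.
function dede_home_url(string $url): string
{
    $urls = parse_url(trim($url));
    if ($urls === false || !isset($urls['host'])) {
        return ''; // not an absolute URL with a host
    }
    // parse_url() omits 'port' when the URL has none, so test isset() first.
    $port = (isset($urls['port']) && (int)$urls['port'] !== 80) ? ':' . $urls['port'] : '';
    return $urls['host'] . $port;
}

echo dede_home_url('http://www.xxx.com:89/index.php/main/news/index.html'), PHP_EOL; // www.xxx.com:89
echo dede_home_url('http://www.xxx.com:80/index.php'), PHP_EOL; // www.xxx.com
echo dede_home_url('http://www.xxx.com/index.php'), PHP_EOL;    // www.xxx.com
```

Like the fix above, this only treats 80 as a default port; URLs using https on 443 would keep ":443", which is harmless but not minimal.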