PHP code for checking whether a string is UTF-8 encoded

2024-05-04 21:47:59

Source: reposted

Contributed by: a site user

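In the past we usually relied on mbstring functions such as mb_detect_encoding() to work out a string's character encoding. A common form is a round-trip check with mb_convert_encoding(), as follows: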

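// Determine the string's encoding with a round-trip conversion
if ($tag === mb_convert_encoding(mb_convert_encoding($tag, 'GB2312', 'UTF-8'), 'UTF-8', 'GB2312')) {
    // The round trip left the string unchanged, so treat it as UTF-8 already
} else {
    // Otherwise assume it is GB2312 and convert it to UTF-8
    $tag = mb_convert_encoding($tag, 'UTF-8', 'GB2312');
}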

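However, with $keytitle = "%D0%BE%C6%AC"; the detected result is UTF-8, even though the value is URL-encoded GB2312 text. This is not really a bug: the decoded bytes D0 BE C6 AC also happen to form two well-formed 2-byte UTF-8 sequences, so both answers are plausible. Code should not lean too heavily on mb_detect_encoding, because with short strings the detection result is very likely to be off. How can this be handled? My approach is: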

$encode = mb_detect_encoding($keytitle, array('ASCII', 'GB2312', 'GBK', 'UTF-8'));

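The parameters of mb_detect_encoding are, in order: the input variable to check, the order in which encodings are tried (once one matches, the rest are ignored), and the strict-mode flag. Reordering the list so that the most likely encoding comes first reduces the chance of a wrong conversion.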
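To make the ambiguity concrete, here is a minimal sketch that decodes the value from the text and inspects its raw bytes. It assumes the mbstring extension is loaded; the urldecode() step and the variable names $validAsUtf8 and $validAsGbk are illustrative additions, not part of the original code. The name returned by mb_detect_encoding can differ between PHP versions, so no specific result is claimed for it.

```php
<?php
// Assumption: the value from the text arrives URL-encoded, so decode it to raw bytes first.
$keytitle = urldecode('%D0%BE%C6%AC');

// The same four bytes can be well-formed in more than one encoding,
// which is why detection on short strings is unreliable.
$validAsUtf8 = mb_check_encoding($keytitle, 'UTF-8');
$validAsGbk  = mb_check_encoding($keytitle, 'GBK');

// Strict detection, with the encodings listed from most to least likely.
$encode = mb_detect_encoding($keytitle, array('ASCII', 'GB2312', 'GBK', 'UTF-8'), true);

// Print the raw bytes in hex alongside the three results.
var_dump(bin2hex($keytitle), $validAsUtf8, $validAsGbk, $encode);
```

Because the bytes pass the UTF-8 validity check, only the detection order decides which encoding is reported, so the order has to reflect what the application actually expects to receive.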

The approach above still does not fully solve the problem, so here is another solution I found:

// Returns true if $word looks like UTF-8 encoded Chinese text and false otherwise.
// This is a heuristic, not a full UTF-8 validity check: it only looks for
// 3-byte sequences whose lead byte is in the range 0xE4-0xE9.
function is_utf8($word)
{
    // One 3-byte sequence: lead byte 0xE4-0xE9, then two continuation bytes 0x80-0xBF
    $seq = '[\xE4-\xE9][\x80-\xBF][\x80-\xBF]';

    return preg_match('/^(' . $seq . ')/', $word) === 1      // starts with such a sequence
        || preg_match('/(' . $seq . ')$/', $word) === 1      // ends with such a sequence
        || preg_match('/(' . $seq . '){2,}/', $word) === 1;  // two or more in a row anywhere
} // function is_utf8
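For comparison, a strict well-formedness check can be sketched with mb_check_encoding, a different technique from the regex heuristic above. The helper names is_valid_utf8 and to_utf8 are hypothetical, and the GBK fallback is an assumption about the caller's data, not something the original article specifies.

```php
<?php
// Hypothetical helper: true only if every byte sequence in $s is well-formed UTF-8.
function is_valid_utf8(string $s): bool
{
    return mb_check_encoding($s, 'UTF-8');
}

// Hypothetical helper: convert to UTF-8 only when the input is not already valid UTF-8.
// Assumption: any non-UTF-8 input is in $fallback (GBK here).
function to_utf8(string $s, string $fallback = 'GBK'): string
{
    return is_valid_utf8($s) ? $s : mb_convert_encoding($s, 'UTF-8', $fallback);
}

// "\xD6\xD0\xCE\xC4" is "中文" in GB2312/GBK and is not well-formed UTF-8,
// so it is converted; an already valid UTF-8 string passes through unchanged.
$converted = to_utf8("\xD6\xD0\xCE\xC4");
$unchanged = to_utf8("\xE4\xB8\xAD\xE6\x96\x87");
```

A strict check of this kind still misclassifies short GBK strings whose bytes happen to be well-formed UTF-8, such as the D0 BE C6 AC example earlier. The regex heuristic rejects that input because it contains no lead byte in the 0xE4-0xE9 range, at the cost of also rejecting valid UTF-8 that has no such 3-byte sequences, for example pure ASCII.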