Here I use DOM for XML parsing, because it is simple.
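1 The user first fills in the form in a VB client application, which generates an apply.xml file.
The VB side uses MSXML 4.0. If no encoding is specified, the file is saved as UTF-8 by default: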
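' CreateDOM is a helper (not shown) that returns a new MSXML 4.0 DOMDocument
Set dom = CreateDOM
' The declaration has no encoding attribute, so saving the document writes UTF-8
Set node = dom.createProcessingInstruction("xml", "version='1.0'")
dom.appendChild node
Set node = Nothing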
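For comparison, Java's standard JAXP API has the same default on the writing side: an identity Transformer writes UTF-8 unless OutputKeys.ENCODING is set. This is a minimal sketch, not the VB code above; the class name and the element names "apply" and "xm" are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class WriteApplyXml {
    // Builds <apply><xm>text</xm></apply> and serializes it without choosing
    // an encoding. "apply" and "xm" are hypothetical element names.
    static byte[] write(String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement("apply");
            Element xm = doc.createElement("xm");
            xm.setTextContent(text);
            root.appendChild(xm);
            doc.appendChild(root);

            ByteArrayOutputStream out = new ByteArrayOutputStream();
            // No OutputKeys.ENCODING is set, so the XML output method uses UTF-8
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
            return out.toByteArray();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Decoding the returned bytes with StandardCharsets.UTF_8 gives back the original text, including non-ASCII characters.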
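2 Next, the user uploads this XML file to the server over the web.
In PHP, the DOM XML extension only supports UTF-8 as its default encoding, so the uploaded file can be parsed directly to extract information: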
// $content holds the contents of the uploaded file
if (!$dom = domxml_open_mem($content)) {
    $t->assign('msg', "File parse error!");
    $t->render('noavailable.html', page_title, 'wrap.html');
    exit;
}
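A rough Java counterpart of this check, assuming JAXP's built-in DOM parser: it parses the uploaded bytes directly (per the XML specification, a declaration without an encoding attribute and without a BOM means UTF-8) and returns null where the PHP code shows the parse-error page. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class UploadCheck {
    // Returns the parsed document, or null if the bytes are not well-formed XML.
    static Document parseUpload(byte[] content) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr
            builder.setErrorHandler(new DefaultHandler());
            // Raw bytes let the parser detect the encoding itself;
            // with no encoding declared it falls back to UTF-8
            return builder.parse(new ByteArrayInputStream(content));
        } catch (SAXException | IOException | ParserConfigurationException e) {
            return null;
        }
    }
}
```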
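Next, the file has to be stored in the database. Because the database is MS SQL Server, which does not support UTF-8 in its character data types, the whole file is stored as binary data. What took me ages to figure out was how to store the binary data: with MySQL you can store it directly without any conversion, but MSSQL does not allow that. The reason is: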
This is because the MSSQL parser makes a clear distinction between binary and character constants. You therefore cannot easily insert binary data with "column = '$data'" syntax as in MySQL and others.
The MSSQL documentation states that binary constants should be represented by their unquoted hexadecimal byte string. That is, to set the binary column "col" to contain the bytes 0x12, 0x65 and 0x35, you should write "col = 0x126535" in your query.
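To make the quoted rule concrete, here is a small Java helper (the class and method names are hypothetical) that renders a byte array as an unquoted SQL Server binary constant, reproducing the 0x126535 example from the quoted documentation.

```java
class SqlHex {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // Formats bytes as an unquoted T-SQL binary constant, e.g. 0x126535.
    // Each byte becomes two hex digits, high nibble first.
    static String toBinaryLiteral(byte[] data) {
        StringBuilder sb = new StringBuilder(2 + data.length * 2);
        sb.append("0x");
        for (byte b : data) {
            sb.append(HEX[(b >> 4) & 0x0F]); // high nibble
            sb.append(HEX[b & 0x0F]);        // low nibble
        }
        return sb.toString();
    }
}
```

For example, toBinaryLiteral(new byte[] {0x12, 0x65, 0x35}) returns "0x126535". In Java application code you would normally avoid building the literal at all and bind the bytes with PreparedStatement.setBytes, letting the JDBC driver send them as binary.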
Here is how it is done:
// Read the uploaded file
$original = $_FILES['content']['name'];
if (!empty($original)) {
    if ($_FILES['content']['type'] == "text/xml") {
        $filename = $_FILES['content']['tmp_name'];
        $handle = fopen($filename, "rb");
        $originalcontent = fread($handle, filesize($filename));
        fclose($handle);
    }
} //end if(!empty($original))
// This step is the key: "H*" turns the bytes into a hex string, high nibble first,
// which is the order SQL Server expects (bin2hex($originalcontent) gives the same string)
$originalcontent = unpack("H*hex", $originalcontent);
$db->query("insert into ".tbl_sb_online_user." (sb_id, user_id, username, sbmc, content, created_date) values ("
    .$newid.", "
    .$u.", "
    .$db->quote(stripslashes($name)).", "
    .$db->quote(stripslashes($sbmc)).", 0x"
    .$originalcontent['hex'].", " // note the 0x prefix in front of the hex string
    ."'$now')");
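Why the format letter matters: PHP's unpack() has two hex formats, "H" (high nibble first) and "h" (low nibble first), and only "H" matches how SQL Server reads a 0x constant. This Java sketch (hypothetical names) produces both orderings for the first two bytes of an XML file so the difference is visible.

```java
class NibbleOrder {
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    // Like PHP unpack("H*"): high nibble first (same as bin2hex)
    static String highFirst(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            sb.append(HEX[(b >> 4) & 0x0F]).append(HEX[b & 0x0F]);
        }
        return sb.toString();
    }

    // Like PHP unpack("h*"): low nibble first, which SQL Server would misread
    static String lowFirst(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            sb.append(HEX[b & 0x0F]).append(HEX[(b >> 4) & 0x0F]);
        }
        return sb.toString();
    }
}
```

For the bytes of "<?" (0x3C 0x3F), highFirst gives "3c3f" while lowFirst gives "c3f3"; stored via 0xc3f3, the file would begin with the bytes 0xC3 0xF3 and no longer be readable as XML.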
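3 After the upload, users can also edit the file online. This requires reading the file back from the database, treating it as UTF-8 again, and parsing it. Although we used unpack() when storing it, nothing needs to be reversed when reading: the hex string was only the SQL syntax for the literal, and the column itself holds the original raw bytes.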
// Cast the id to int so it cannot inject SQL
$sb = $db->getrow('select sbmc, content from '.tbl_sb_online_user." where sb_id = ".(int)$sb_id);
$originalcontent = $sb['content'];
if (!$dom = domxml_open_mem($originalcontent)) {
    $t->assign('msg', "File parse error!");
    $t->render('noavailable.html', page_title, 'wrap.html', true);
    exit;
}
$context = xpath_new_context($dom);
$xpath = $context->xpath_eval("//material/xm");
// domxml returns UTF-8 text; convert it to GBK for display
$t->assign('xm', iconv("utf-8", "gbk", $xpath->nodeset[0]->get_content()));
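The same read path in Java might look like the sketch below, assuming the content column has already been fetched into a byte[] and that the document contains a material/xm element, as the XPath in the PHP code suggests. Java strings are UTF-16 internally, so the iconv() step to GBK has no direct counterpart here; an encoding is only chosen when the text is written out. Class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class ReadXm {
    // Parses the raw bytes read back from the database (no hex decoding needed)
    // and returns the text of the first //material/xm element, or null on error.
    static String readXm(byte[] stored) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // fail quietly on malformed input
            Document dom = builder.parse(new ByteArrayInputStream(stored));
            // evaluate(String, Object) returns the string value of the first matching node
            return XPathFactory.newInstance().newXPath().evaluate("//material/xm", dom);
        } catch (SAXException | IOException | ParserConfigurationException
                | XPathExpressionException e) {
            return null;
        }
    }
}
```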
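When reading the data back, note that only the Microsoft OLE DB Provider for SQL Server and the SQL Server ODBC driver automatically set @@TEXTSIZE to its maximum of 2 GB; every other client defaults to 4096 (4 KB), and longer values are truncated. So when accessing MSSQL from PHP, be sure to enable these two settings in php.ini:
mssql.textlimit = 2147483647
mssql.textsize = 2147483647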
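To see why the 4 KB default matters, this Java sketch simulates a stored document longer than the text-size limit coming back cut off: the truncated bytes are no longer well-formed XML. The 4096 figure is the default mentioned above; the helper names and the sample document are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class TextSizeDemo {
    static final int DEFAULT_TEXTSIZE = 4096; // default limit for most clients

    // Mimics the server returning only the first 'limit' bytes of the value
    static byte[] truncate(byte[] data, int limit) {
        return data.length <= limit ? data : Arrays.copyOf(data, limit);
    }

    // True if the bytes parse as well-formed XML
    static boolean isWellFormed(byte[] data) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // rethrow instead of printing
            builder.parse(new ByteArrayInputStream(data));
            return true;
        } catch (SAXException | IOException | ParserConfigurationException e) {
            return false;
        }
    }

    // Builds a hypothetical sample document larger than the limit
    static byte[] sampleDocument() {
        StringBuilder sb = new StringBuilder("<?xml version='1.0'?><apply>");
        for (int i = 0; i < 500; i++) {
            sb.append("<item>").append(i).append("</item>");
        }
        sb.append("</apply>");
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }
}
```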
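4 The back end is written in VB. To parse the file there, add the following code, which decodes a Byte() array holding UTF-8 data into a VB string: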
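' Declare in a standard module (.bas) so it can be Public
Public Declare Function MultiByteToWideChar Lib "kernel32" (ByVal CodePage As Long, ByVal dwFlags As Long, ByVal lpMultiByteStr As Long, _
    ByVal cchMultiByte As Long, ByVal lpWideCharStr As Long, ByVal cchWideChar As Long) As Long
Public Const CP_UTF8 = 65001

' Decodes a Byte() array holding UTF-8 data into a VB (UTF-16) string
Public Function utf8_decode(bUTF8() As Byte) As String
    Dim lRet As Long
    Dim lLen As Long
    Dim lBufferSize As Long
    Dim sBuffer As String
    lLen = UBound(bUTF8) + 1
    If lLen = 0 Then Exit Function
    ' UTF-8 never yields more UTF-16 characters than it has bytes, so this is ample
    lBufferSize = lLen * 2
    sBuffer = String$(lBufferSize, Chr(0))
    ' VarPtr/StrPtr pass raw pointers to the byte array and the string buffer
    lRet = MultiByteToWideChar(CP_UTF8, 0, VarPtr(bUTF8(0)), lLen, StrPtr(sBuffer), lBufferSize)
    If lRet <> 0 Then
        ' lRet is the number of characters written; drop the unused padding
        utf8_decode = Left$(sBuffer, lRet)
    Else
        ' Conversion failed: return "" rather than a string of Chr(0)
        utf8_decode = ""
    End If
End Function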
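The VB function allocates a wide-character buffer, lets MultiByteToWideChar fill it, and trims it to the returned length. Java's CharsetDecoder can follow the same pattern; this sketch (hypothetical class name) sizes the CharBuffer to the byte count, rejects malformed bytes via CodingErrorAction.REPORT instead of substituting replacement characters, and returns an empty string on failure.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Utf8Decode {
    // Decodes UTF-8 bytes into a String; returns "" if the bytes are not valid UTF-8.
    static String utf8Decode(byte[] utf8) {
        if (utf8.length == 0) {
            return "";
        }
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        // Like lBufferSize in the VB code: one char per byte is always enough,
        // because UTF-8 never decodes to more UTF-16 units than it has bytes
        CharBuffer out = CharBuffer.allocate(utf8.length);
        CoderResult result = decoder.decode(ByteBuffer.wrap(utf8), out, true);
        if (!result.isUnderflow()) {
            return ""; // malformed input
        }
        result = decoder.flush(out);
        if (!result.isUnderflow()) {
            return "";
        }
        // Like Left$(sBuffer, lRet): keep only the characters actually written
        out.flip();
        return out.toString();
    }
}
```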
The code that reads the file from the database is:
' mrc is an open ADO Recordset; xmlDoc is an MSXML 4.0 DOMDocument
Dim varContent() As Byte
Dim varFileSize As Long
Dim content As String
varFileSize = mrc.Fields("content").ActualSize
varContent = mrc.Fields("content").GetChunk(varFileSize)
content = utf8_decode(varContent)
xmlDoc.async = False
xmlDoc.resolveExternals = False
xmlDoc.loadXML content
If xmlDoc.parseError.errorCode <> 0 Then
    Dim myErr
    Set myErr = xmlDoc.parseError
    MsgBox "Error: " & myErr.reason
Else
    xmlDoc.setProperty "SelectionLanguage", "XPath"
    ' ... query the document with selectNodes / selectSingleNode here
End If
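In Java, the counterpart of checking parseError is catching SAXParseException, which carries the line of the problem much like the parseError object's line and reason properties. A sketch assuming the column has already been read into a byte[] (for example with JDBC's ResultSet.getBytes, not shown); the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class ParseReport {
    // Returns null if the document is well-formed, otherwise "line N: reason".
    static String check(byte[] stored) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // rethrows fatal errors
            builder.parse(new ByteArrayInputStream(stored));
            return null;
        } catch (SAXParseException e) {
            // Location information, similar to parseError.line and parseError.reason
            return "line " + e.getLineNumber() + ": " + e.getMessage();
        } catch (SAXException | IOException | ParserConfigurationException e) {
            return "error: " + e.getMessage();
        }
    }
}
```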
5 On a Java back end it is even easier: read the data into a byte[] and then decode it into a String using UTF-8.
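Step 5 as code: a minimal sketch, assuming the content column has been fetched as a byte[] (e.g. via ResultSet.getBytes); the class and method names are hypothetical. Note that this String constructor never throws on bad input: it substitutes U+FFFD for malformed bytes, so use a CharsetDecoder with CodingErrorAction.REPORT if corrupted data must be detected.

```java
import java.nio.charset.StandardCharsets;

class JavaBackend {
    // Turns the raw bytes of the content column into a Java (UTF-16) String
    static String contentToString(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }
}
```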
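Finally, PHP really is a very powerful scripting language. If during PHP development you run into a problem that is hard to solve and hard to find even with Google, go straight to the online documentation at php.net: many helpful people post their own experience there, and it is often very useful.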