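Chinese Encoding Conversion in One Sweep: 30 Conversion Directions Among 6 Encodings
1. The problem
When I was learning to program, someone once asked me, "Can you write a Notepad program?" I was dismissive at the time, but as I got deeper into MFC I learned that a Notepad program is not trivial after all. The hard part is converting among the four encodings it supports.
Encodings are a headache for beginners, and encoding conversion is even harder to pin down. To finish an encoding-conversion module for my graduation project, I studied the Chinese encodings and the common character sets, and then decided to solve the encoding problem of the "Notepad" program and go one step further: arbitrary conversion among six encodings, namely GB2312, Big5, GBK, Unicode, Unicode big endian, and UTF-8. (Following Notepad's naming, "Unicode" here means UTF-16 little endian.)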
2. The solution
(1) Encoding basics
a. Encodings and character sets
I will not repeat this material here; see "Character Sets and Encodings Explained" in Ancky's column on CSDN.
Blog URL: http://blog.csdn.net/ancky/article/details/2034809
b. Single-byte, double-byte, and multi-byte characters
See my earlier translation, "The Complete Guide to C++ Strings, Part I: Win32 Character Encodings".
Blog URL: http://blog.csdn.net/ziyuanxiazai123/article/details/7482360
c. Locales and code pages
See the blog post at http://hi.baidu.com/tzpwater/blog/item/bd4abb0b60bff1db3ac7636a.html
d. The Chinese encodings GB2312, GBK, and Big5: see "GB2312, GBK, and Big5 Chinese Character Encodings" on lengshine's CSDN blog, at http://blog.csdn.net/lengshine/article/details/5470545
e. Character encodings in Windows programs
See "Character Encodings in Windows Programs" at http://blog.sina.com.cn/s/blog_4e3197f20100a6z2.html
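(2) Encoding summary
a. Characteristics of the six encodings
The characteristics of the six encodings are shown in the figure below:
b. Differences in storage
ANSI (GB2312 by default on Simplified Chinese systems), Unicode, Unicode big endian, and UTF-8 files are stored differently.
Taking the two Chinese characters "你好" as an example, their storage formats are shown in the figure below: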
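The byte layouts can also be worked out by hand. "你" is U+4F60 and "好" is U+597D; in GB2312 they are stored as C4 E3 BA C3 with no byte order mark. The following is a minimal sketch in portable C++ (not the Win32 API used in the rest of this article) that derives the three Unicode layouts from the code points. The names `EncodeUtf16` and `EncodeUtf8` are invented for this illustration, and only code points in the Basic Multilingual Plane are handled.

```cpp
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// UTF-16 with a leading byte order mark (U+FEFF).
// Assumption: BMP code points only, so no surrogate pairs are produced.
Bytes EncodeUtf16(const std::vector<std::uint16_t>& codePoints, bool bigEndian) {
    Bytes out;
    auto put = [&](std::uint16_t unit) {
        std::uint8_t hi = static_cast<std::uint8_t>(unit >> 8);
        std::uint8_t lo = static_cast<std::uint8_t>(unit & 0xFF);
        if (bigEndian) { out.push_back(hi); out.push_back(lo); }
        else           { out.push_back(lo); out.push_back(hi); }
    };
    put(0xFEFF);  // stored as FF FE (little endian) or FE FF (big endian)
    for (std::uint16_t cp : codePoints) put(cp);
    return out;
}

// UTF-8 preceded by the EF BB BF signature; BMP code points only.
Bytes EncodeUtf8(const std::vector<std::uint16_t>& codePoints) {
    Bytes out = {0xEF, 0xBB, 0xBF};
    for (std::uint16_t cp : codePoints) {
        if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
            out.push_back(static_cast<std::uint8_t>(cp));
        } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
            out.push_back(static_cast<std::uint8_t>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<std::uint8_t>(0x80 | (cp & 0x3F)));
        } else {                               // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out.push_back(static_cast<std::uint8_t>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<std::uint8_t>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

With the code points {0x4F60, 0x597D} as input, this gives FF FE 60 4F 7D 59 for Unicode, FE FF 4F 60 59 7D for Unicode big endian, and EF BB BF E4 BD A0 E5 A5 BD for UTF-8.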
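c. Differences among GB2312, Big5, and GBK
All three represent a Chinese character with two bytes, but the value ranges of those bytes differ, as shown in the figure below: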
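As a rough sketch of what such ranges look like in code, the checks below use the commonly documented byte ranges rather than values read from the figure: the GB2312 Chinese-character area, Big5 with the extended lead-byte range of Windows code page 950, and GBK. The function names are invented for this illustration.

```cpp
#include <cstdint>

// GB2312 Chinese-character area: lead byte B0-F7, trail byte A1-FE.
bool InGb2312HanziRange(std::uint8_t lead, std::uint8_t trail) {
    return lead >= 0xB0 && lead <= 0xF7 && trail >= 0xA1 && trail <= 0xFE;
}

// Big5 (code page 950 layout): lead byte 81-FE, trail byte 40-7E or A1-FE.
bool InBig5Range(std::uint8_t lead, std::uint8_t trail) {
    return lead >= 0x81 && lead <= 0xFE &&
           ((trail >= 0x40 && trail <= 0x7E) || (trail >= 0xA1 && trail <= 0xFE));
}

// GBK: lead byte 81-FE, trail byte 40-FE except 7F.
bool InGbkRange(std::uint8_t lead, std::uint8_t trail) {
    return lead >= 0x81 && lead <= 0xFE &&
           trail >= 0x40 && trail <= 0xFE && trail != 0x7F;
}
```

The ranges overlap heavily: the GB2312 bytes C4 E3 for "你" pass all three checks. A byte-range test can therefore only suggest an encoding, not prove it.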
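(3) Conversion scheme
Six encodings converted into one another give 6 × 5 = 30 ordered (source, target) pairs, that is, 30 conversion directions. The scheme I use is as follows.
Conversion between multi-byte files and Unicode files is shown in the figure below:
Conversion between multi-byte files is shown in the figure below: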
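The count of 30 directions, and how they split between the two kinds of conversion above, can be checked with a short enumeration. This is a sketch: the grouping into a multi-byte family (GB2312, Big5, GBK, UTF-8) and a wide family (Unicode, Unicode big endian) follows the article, while the type and function names are invented here.

```cpp
#include <cstddef>
#include <vector>

enum class Family { MultiByte, Wide };

struct Encoding {
    const char* name;
    Family family;
};

struct RouteCounts {
    int wideToWide = 0;
    int multiByteToWide = 0;
    int wideToMultiByte = 0;
    int multiByteToMultiByte = 0;
    int Total() const {
        return wideToWide + multiByteToWide + wideToMultiByte + multiByteToMultiByte;
    }
};

// Count every ordered (source, target) pair with source != target.
RouteCounts CountRoutes() {
    const std::vector<Encoding> encodings = {
        {"GB2312", Family::MultiByte}, {"Big5", Family::MultiByte},
        {"GBK", Family::MultiByte},    {"UTF-8", Family::MultiByte},
        {"Unicode", Family::Wide},     {"Unicode big endian", Family::Wide},
    };
    RouteCounts counts;
    for (std::size_t from = 0; from < encodings.size(); ++from) {
        for (std::size_t to = 0; to < encodings.size(); ++to) {
            if (from == to) continue;  // converting to the same encoding is not a direction
            bool fromWide = encodings[from].family == Family::Wide;
            bool toWide = encodings[to].family == Family::Wide;
            if (fromWide && toWide)        ++counts.wideToWide;
            else if (!fromWide && toWide)  ++counts.multiByteToWide;
            else if (fromWide && !toWide)  ++counts.wideToMultiByte;
            else                           ++counts.multiByteToMultiByte;
        }
    }
    return counts;
}
```

The four groups come to 2, 8, 8, and 12 directions, which add up to 30.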
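(4) The three functions used for conversion
a. MultiByteToWideChar
This function converts a multi-byte string to a Unicode wide-character string.
Its prototype is:
int MultiByteToWideChar(
    UINT   CodePage,        // code page
    DWORD  dwFlags,         // conversion flags
    LPCSTR lpMultiByteStr,  // string to convert
    int    cbMultiByte,     // size of the string to convert, in bytes
    LPWSTR lpWideCharStr,   // buffer that receives the wide-character string
    int    cchWideChar      // size of that buffer, in wide characters
);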
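The last two parameters support a two-call pattern: call once with a null buffer and a size of 0 to learn the required size, allocate, then call again to convert. As a portable stand-in for that pattern (it is not the Win32 call itself), the sketch below uses `std::mbstowcs` from the C standard library. Two assumptions apply: returning the required length for a null destination is a POSIX extension that common C libraries implement, and the conversion follows the current C locale rather than an explicit code page. `WidenWithSizeQuery` is a name invented for this illustration.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Two-call conversion: query the size first, then convert into an exact-size buffer.
// Returns an empty string if the input is not valid in the current C locale.
std::wstring WidenWithSizeQuery(const char* src) {
    // First call: null destination, so only the required length is returned
    // (in wide characters, not counting the terminator).
    std::size_t needed = std::mbstowcs(nullptr, src, 0);
    if (needed == static_cast<std::size_t>(-1))
        return std::wstring();

    // Second call: convert into a zero-filled buffer with room for the terminator.
    std::vector<wchar_t> buffer(needed + 1, L'\0');
    std::size_t written = std::mbstowcs(buffer.data(), src, buffer.size());
    if (written == static_cast<std::size_t>(-1))
        return std::wstring();
    return std::wstring(buffer.data(), written);
}
```

Returning a `std::wstring` instead of a raw `new[]` buffer means the caller has nothing to free.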
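b. WideCharToMultiByte
This function converts a Unicode wide-character string to a multi-byte string; see MSDN for details on how to use it.
These two functions cover most string conversions, and they can be wrapped as helper functions for converting between multi-byte and wide-character strings: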
wchar_t* Coder::MByteToWChar(UINT CodePage, LPCSTR lpcszSrcStr)
{
    // First call: query the required size in wide characters (terminator included).
    int len = MultiByteToWideChar(CodePage, 0, lpcszSrcStr, -1, NULL, 0);
    if (len <= 0)
        return NULL;
    LPWSTR lpcwsStrDes = new wchar_t[len + 1];
    if (!lpcwsStrDes)           // pre-standard compilers such as VC 6.0 return NULL instead of throwing
        return NULL;
    memset(lpcwsStrDes, 0, sizeof(wchar_t) * (len + 1));
    // Second call: do the conversion.
    len = MultiByteToWideChar(CodePage, 0, lpcszSrcStr, -1, lpcwsStrDes, len);
    if (len)
        return lpcwsStrDes;     // the caller frees the result with delete[]
    delete[] lpcwsStrDes;
    return NULL;
}

char* Coder::WCharToMByte(UINT CodePage, LPCWSTR lpcwszSrcStr)
{
    // First call: query the required size in bytes (terminator included).
    int len = WideCharToMultiByte(CodePage, 0, lpcwszSrcStr, -1, NULL, 0, NULL, NULL);
    if (len <= 0)
        return NULL;
    char* lpszDesStr = new char[len + 1];
    if (!lpszDesStr)            // check the pointer before touching the buffer
        return NULL;
    memset(lpszDesStr, 0, sizeof(char) * (len + 1));
    // Second call: do the conversion.
    len = WideCharToMultiByte(CodePage, 0, lpcwszSrcStr, -1, lpszDesStr, len, NULL, NULL);
    if (len)
        return lpszDesStr;      // the caller frees the result with delete[]
    delete[] lpszDesStr;
    return NULL;
}
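c. LCMapString
This function performs locale-dependent character mapping. Chinese encoding conversion in particular depends on the locale, and using only the functions described in a and b above produces errors. For example, converting GB2312 directly to Big5, by calling MultiByteToWideChar to turn the GB2312 string into a Unicode string and then WideCharToMultiByte to turn the Unicode string into Big5, goes wrong. The erroneous result is shown in the figure below:
Chinese encoding conversion therefore needs LCMapString at the appropriate step to produce a correct result.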
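A plausible reason for the failure is that Big5 contains only Traditional characters, so a Simplified-only character such as "汉" has no Big5 code and the re-encoding step can only substitute a placeholder. Mapping Simplified characters to Traditional ones is a separate, character-level step that has to happen before re-encoding. The sketch below illustrates only that idea in portable C++; the two-entry table and the name `MapSimplifiedToTraditional` are hypothetical, whereas the mapping LCMapString performs covers the full character set.

```cpp
#include <map>
#include <string>

// Character-level mapping, independent of any byte encoding.
// Characters without a table entry are left unchanged.
std::u16string MapSimplifiedToTraditional(const std::u16string& text) {
    // Hypothetical miniature table, for illustration only.
    static const std::map<char16_t, char16_t> table = {
        {u'\u6C49', u'\u6F22'},  // 汉 -> 漢
        {u'\u8BED', u'\u8A9E'},  // 语 -> 語
    };
    std::u16string result = text;
    for (char16_t& ch : result) {
        auto it = table.find(ch);
        if (it != table.end())
            ch = it->second;
    }
    return result;
}
```

Only after this step does every character of the text have a counterpart in the target encoding.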
For example:
char* Coder::GB2312ToBIG5(const char* szGB2312Str)
{
    LCID lcid = MAKELCID(MAKELANGID(LANG_CHINESE, SUBLANG_CHINESE_SIMPLIFIED), SORT_CHINESE_PRC);
    // Step 1: map Simplified characters to Traditional ones; the result is still in code page 936.
    int nLength = LCMapString(lcid, LCMAP_TRADITIONAL_CHINESE, szGB2312Str, -1, NULL, 0);
    char* pBuffer = new char[nLength + 1];
    if (!pBuffer)
        return NULL;
    LCMapString(lcid, LCMAP_TRADITIONAL_CHINESE, szGB2312Str, -1, pBuffer, nLength);
    pBuffer[nLength] = 0;
    // Step 2: re-encode through Unicode: code page 936 -> wide string -> Big5.
    wchar_t* pUnicodeBuff = MByteToWChar(CP_GB2312, pBuffer);
    char* pBIG5Buff = WCharToMByte(CP_BIG5, pUnicodeBuff);
    delete[] pBuffer;
    delete[] pUnicodeBuff;
    return pBIG5Buff;           // the caller frees the result with delete[]
}
(5) Implementation
The Coder class does the conversion work.
Its code listing follows:
#if !defined(AFX_ENCODING_H__2AC955FB_9F8F_4871_9B77_C6C65730507F__INCLUDED_)
#define AFX_ENCODING_H__2AC955FB_9F8F_4871_9B77_C6C65730507F__INCLUDED_

#if _MSC_VER > 1000
#pragma once
#endif // _MSC_VER > 1000

// Code pages used by the conversions.
typedef enum CodeType
{
    CP_GB2312 = 936,
    CP_BIG5 = 950,
    CP_GBK = 0              // placeholder value, not a real code page number
} CodePages;

// Text encodings of files.
// Note: the UNICODE enumerator requires a build in which the UNICODE macro is not defined.
typedef enum TextCodeType
{
    GB2312 = 0,
    BIG5 = 1,
    GBK = 2,
    UTF8 = 3,
    UNICODE = 4,
    UNICODEBIGENDIAN = 5,
    DefaultCodeType = -1
} TextCode;

class Coder
{
public:
    Coder();
    virtual ~Coder();
public:
    // Maximum number of bytes converted per chunk.
    UINT PREDEFINEDSIZE;
    // Set the chunk size.
    void SetDefaultConvertSize(UINT nCount);
    // Name of an encoding as a string.
    CString CodeTypeToString(TextCode tc);
    // Convert a file to another encoding; this dispatches to the functions below.
    BOOL FileToOtherFile(CString filesourcepath, CString filesavepath, TextCode tcTo, TextCode tcCur = DefaultCodeType);
    // Unicode <-> Unicode big endian.
    BOOL UnicodeEndianFileConvert(CString filesourcepath, CString filesavepath, TextCode tcTo);
    // Multi-byte file -> multi-byte file.
    BOOL MBFileToMBFile(CString filesourcepath, CString filesavepath, TextCode tcTo, TextCode tcCur = DefaultCodeType);
    // Unicode file -> multi-byte file.
    BOOL UnicodeFileToMBFile(CString filesourcepath, CString filesavepath, TextCode tcTo);
    // Multi-byte file -> Unicode file.
    BOOL MBFileToUnicodeFile(CString filesourcepath, CString filesavepath, TextCode tcTo, TextCode tcCur = DefaultCodeType);
    // Detect the encoding of a file.
    TextCode GetCodeType(CString filepath);
    // String conversions among the three Chinese encodings; results are freed with delete[].
    char* BIG5ToGB2312(const char* szBIG5Str);
    char* GB2312ToBIG5(const char* szGB2312Str);
    char* GBKToGB2312(const char *szGBkStr);
    char* GB2312ToGBK(const char *szGB2312Str);
    char* GBKToBIG5(const char *szGBKStr);
    char* BIG5ToGBK(const char *szBIG5Str);
    // Wide string -> multi-byte string, and the reverse; results are freed with delete[].
    char* WCharToMByte(UINT CodePage, LPCWSTR lpcwszSrcStr);
    wchar_t* MByteToWChar(UINT CodePage, LPCSTR lpcszSrcStr);
protected:
    // Code page that corresponds to an encoding.
    UINT GetCodePage(TextCode tccur);
    // Multi-byte string -> multi-byte string.
    char* MByteToMByte(UINT CodePageCur, UINT CodePageTo, const char* szSrcStr);
    // Swap the byte order of a wide string in place.
    void UnicodeEndianConvert(LPWSTR lpwszstr);
    // Byte order marks of the file types.
    const static byte UNICODEBOM[2];
    const static byte UNICODEBEBOM[2];
    const static byte UTF8BOM[3];
};

#endif // !defined(AFX_ENCODING_H__2AC955FB_9F8F_4871_9B77_C6C65730507F__INCLUDED_)
#include "stdafx.h"
#include "Coder.h"
#include "Encoding.h"

#ifdef _DEBUG
#undef THIS_FILE
static char THIS_FILE[] = __FILE__;
#define new DEBUG_NEW
#endif

// Byte order marks found at the start of each Unicode file type.
const byte Coder::UNICODEBOM[2] = {0xFF, 0xFE};
const byte Coder::UNICODEBEBOM[2] = {0xFE, 0xFF};
const byte Coder::UTF8BOM[3] = {0xEF, 0xBB, 0xBF};

Coder::Coder()
{
    PREDEFINEDSIZE = 2097152;   // default chunk size: 2 MB
}

Coder::~Coder()
{
}

// Big5 -> GB2312: re-encode into code page 936 through Unicode, then map Traditional to Simplified.
char* Coder::BIG5ToGB2312(const char* szBIG5Str)
{
    LCID lcid = MAKELCID(MAKELANGID(LANG_CHINESE, SUBLANG_CHINESE_SIMPLIFIED), SORT_CHINESE_PRC);
    wchar_t* szUnicodeBuff = MByteToWChar(CP_BIG5, szBIG5Str);
    char* szGB2312Buff = WCharToMByte(CP_GB2312, szUnicodeBuff);
    delete[] szUnicodeBuff;
    if (!szGB2312Buff)
        return NULL;
    int nLength = LCMapString(lcid, LCMAP_SIMPLIFIED_CHINESE, szGB2312Buff, -1, NULL, 0);
    char* pBuffer = new char[nLength + 1];
    if (!pBuffer)
    {
        delete[] szGB2312Buff;
        return NULL;
    }
    memset(pBuffer, 0, sizeof(char) * (nLength + 1));
    LCMapString(lcid, LCMAP_SIMPLIFIED_CHINESE, szGB2312Buff, -1, pBuffer, nLength);
    delete[] szGB2312Buff;
    return pBuffer;
}

// GB2312 -> GBK: map Simplified characters to Traditional ones within code page 936.
char* Coder::GB2312ToGBK(const char *szGB2312Str)
{
    if (!szGB2312Str)
        return NULL;
    int nStrLen = strlen(szGB2312Str);
    if (!nStrLen)
        return NULL;
    LCID wLCID = MAKELCID(MAKELANGID(LANG_CHINESE, SUBLANG_CHINESE_SIMPLIFIED), SORT_CHINESE_PRC);
    int nReturn = LCMapString(wLCID, LCMAP_TRADITIONAL_CHINESE, szGB2312Str, nStrLen, NULL, 0);
    if (!nReturn)
        return NULL;
    char *pcBuf = new char[nReturn + 1];
    if (!pcBuf)
        return NULL;
    memset(pcBuf, 0, sizeof(char) * (nReturn + 1));
    // The input length is the source length (nStrLen), not the output size.
    LCMapString(wLCID, LCMAP_TRADITIONAL_CHINESE, szGB2312Str, nStrLen, pcBuf, nReturn);
    return pcBuf;
}

// GBK -> GB2312: map Traditional characters to Simplified ones within code page 936.
char* Coder::GBKToGB2312(const char *szGBKStr)
{
    if (!szGBKStr)
        return NULL;
    int nStrLen = strlen(szGBKStr);
    if (!nStrLen)
        return NULL;
    LCID wLCID = MAKELCID(MAKELANGID(LANG_CHINESE, SUBLANG_CHINESE_SIMPLIFIED), SORT_CHINESE_BIG5);
    int nReturn = LCMapString(wLCID, LCMAP_SIMPLIFIED_CHINESE, szGBKStr, nStrLen, NULL, 0);
    if (!nReturn)
        return NULL;
    char *pcBuf = new char[nReturn + 1];
    if (!pcBuf)
        return NULL;
    memset(pcBuf, 0, sizeof(char) * (nReturn + 1));
    // The input length is the source length (nStrLen), not the output size.
    LCMapString(wLCID, LCMAP_SIMPLIFIED_CHINESE, szGBKStr, nStrLen, pcBuf, nReturn);
    return pcBuf;
}

// GBK -> Big5, through GB2312.
char* Coder::GBKToBIG5(const char *szGBKStr)
{
    char *pTemp = GBKToGB2312(szGBKStr);
    if (!pTemp)
        return NULL;
    char *pBuffer = GB2312ToBIG5(pTemp);
    delete[] pTemp;
    return pBuffer;
}

// Big5 -> GBK, through GB2312.
char* Coder::BIG5ToGBK(const char *szBIG5Str)
{
    char *pTemp = BIG5ToGB2312(szBIG5Str);
    if (!pTemp)
        return NULL;
    char *pBuffer = GB2312ToGBK(pTemp);
    delete[] pTemp;
    return pBuffer;
}

// GB2312 -> Big5: map Simplified to Traditional, then re-encode through Unicode.
char* Coder::GB2312ToBIG5(const char* szGB2312Str)
{
    LCID lcid = MAKELCID(MAKELANGID(LANG_CHINESE, SUBLANG_CHINESE_SIMPLIFIED), SORT_CHINESE_PRC);
    int nLength = LCMapString(lcid, LCMAP_TRADITIONAL_CHINESE, szGB2312Str, -1, NULL, 0);
    char* pBuffer = new char[nLength + 1];
    if (!pBuffer)
        return NULL;
    LCMapString(lcid, LCMAP_TRADITIONAL_CHINESE, szGB2312Str, -1, pBuffer, nLength);
    pBuffer[nLength] = 0;
    wchar_t* pUnicodeBuff = MByteToWChar(CP_GB2312, pBuffer);
    char* pBIG5Buff = WCharToMByte(CP_BIG5, pUnicodeBuff);
    delete[] pBuffer;
    delete[] pUnicodeBuff;
    return pBIG5Buff;
}

// Detect a file's encoding: by BOM when there is one, otherwise by sampling byte pairs.
TextCode Coder::GetCodeType(CString filepath)
{
    CFile file;
    byte buf[3] = {0, 0, 0};
    TextCode tctemp = GB2312;   // fallback when no sampled byte pair matches
    if (file.Open(filepath, CFile::modeRead))
    {
        file.Read(buf, 3);
        if (buf[0] == UTF8BOM[0] && buf[1] == UTF8BOM[1] && buf[2] == UTF8BOM[2])
            return UTF8;
        else if (buf[0] == UNICODEBOM[0] && buf[1] == UNICODEBOM[1])
            return UNICODE;
        else if (buf[0] == UNICODEBEBOM[0] && buf[1] == UNICODEBEBOM[1])
            return UNICODEBIGENDIAN;
        else
        {
            // No BOM: sample up to 30 byte pairs, skipping 100 bytes between samples.
            // The ranges overlap, so this is only a heuristic; the last match wins.
            int time = 30;
            while (time && file.Read(buf, 2) == 2)
            {
                if ((buf[0] >= 176 && buf[0] <= 247) && (buf[1] >= 160 && buf[1] <= 254))
                    tctemp = GB2312;
                else if ((buf[0] >= 129 && buf[0] <= 255) && ((buf[1] >= 64 && buf[1] <= 126) || (buf[1] >= 161 && buf[1] <= 254)))
                    tctemp = BIG5;
                else if ((buf[0] >= 129 && buf[0] <= 254) && (buf[1] >= 64 && buf[1] <= 254))
                    tctemp = GBK;
                time--;
                file.Seek(100, CFile::current);
            }
            return tctemp;
        }
    }
    else
        return GB2312;
}

// Convert a multi-byte file (GB2312, Big5, GBK, UTF-8) to a Unicode file (little or big endian).
BOOL Coder::MBFileToUnicodeFile(CString filesourcepath, CString filesavepath, TextCode tcTo, TextCode tcCur)
{
    TextCode curtc;
    CFile filesource, filesave;
    char    *pChSrc = NULL;
    char    *pChTemp = NULL;
    wchar_t *pwChDes = NULL;
    DWORD   filelength, readlen, len;
    int     bufferlen, strlength;
    UINT    CodePage;

    // Use the caller's encoding if given, otherwise detect it.
    if (tcCur != DefaultCodeType)
        curtc = tcCur;
    else
        curtc = GetCodeType(filesourcepath);
    // The source must be multi-byte and the target must be Unicode.
    if (curtc > UTF8 || tcTo < UNICODE || curtc == tcTo)
        return FALSE;

    if (!filesource.Open(filesourcepath, CFile::modeRead) || 0 == (filelength = filesource.GetLength()))
        return FALSE;
    if (!filesave.Open(filesavepath, CFile::modeCreate | CFile::modeWrite))
        return FALSE;

    // Read in chunks of at most PREDEFINEDSIZE bytes.
    if (filelength < PREDEFINEDSIZE)
        bufferlen = filelength;
    else
        bufferlen = PREDEFINEDSIZE;
    pChSrc = new char[bufferlen + 1];
    if (!pChSrc)
        return FALSE;

    switch (curtc)
    {
    case GB2312:
        CodePage = CP_GB2312;
        break;
    case GBK:
        CodePage = CP_GB2312;   // GBK input is mapped to GB2312 first, see below
        break;
    case BIG5:
        CodePage = CP_BIG5;
        break;
    case UTF8:
        CodePage = CP_UTF8;
        break;
    default:
        break;
    }

    // Skip the BOM of a UTF-8 source file.
    if (UTF8 == curtc)
        filesource.Seek(3 * sizeof(byte), CFile::begin);

    // Write the BOM of the target encoding.
    if (UNICODEBIGENDIAN == tcTo)
        filesave.Write(&UNICODEBEBOM, 2 * sizeof(byte));
    else
        filesave.Write(&UNICODEBOM, 2 * sizeof(byte));

    // Limitation: a chunk boundary can fall inside a multi-byte character
    // when the file is larger than PREDEFINEDSIZE.
    while (filelength > 0)
    {
        memset(pChSrc, 0, sizeof(char) * (bufferlen + 1));
        if (filelength > PREDEFINEDSIZE)
            len = PREDEFINEDSIZE;
        else
            len = filelength;
        readlen = filesource.Read(pChSrc, len);
        if (!readlen)
            break;

        // Keep pChSrc pointing at the read buffer; convert from a separate pointer
        // so that the buffer is neither lost nor overrun on the next pass.
        const char *pChConv = pChSrc;
        if (GBK == curtc)
        {
            pChTemp = GBKToGB2312(pChSrc);
            pChConv = pChTemp;
        }
        pwChDes = MByteToWChar(CodePage, pChConv);
        delete[] pChTemp;
        pChTemp = NULL;
        if (!pwChDes)
            break;
        if (UNICODEBIGENDIAN == tcTo)
            UnicodeEndianConvert(pwChDes);
        strlength = wcslen(pwChDes) * sizeof(wchar_t);
        filesave.Write(pwChDes, strlength);
        filesave.Flush();
        delete[] pwChDes;       // free each chunk's result instead of only the last one
        pwChDes = NULL;
        filelength -= readlen;
    }
    delete[] pChSrc;
    delete[] pChTemp;
    delete[] pwChDes;
    return TRUE;
}

// Multi-byte string -> wide string; the caller frees the result with delete[].
wchar_t* Coder::MByteToWChar(UINT CodePage, LPCSTR lpcszSrcStr)
{
    LPWSTR lpcwsStrDes = NULL;
    // First call: query the required size in wide characters (terminator included).
    int len = MultiByteToWideChar(CodePage, 0, lpcszSrcStr, -1, NULL, 0);
    if (len <= 0)
        return NULL;
    lpcwsStrDes = new wchar_t[len + 1];
    if (!lpcwsStrDes)
        return NULL;
    memset(lpcwsStrDes, 0, sizeof(wchar_t) * (len + 1));
    // Second call: do the conversion.
    len = MultiByteToWideChar(CodePage, 0, lpcszSrcStr, -1, lpcwsStrDes, len);
    if (len)
        return lpcwsStrDes;
    else
    {
        delete[] lpcwsStrDes;
        return NULL;
    }
}

// Wide string -> multi-byte string; the caller frees the result with delete[].
char* Coder::WCharToMByte(UINT CodePage, LPCWSTR lpcwszSrcStr)
{
    char* lpszDesStr = NULL;
    int len = WideCharToMultiByte(CodePage, 0, lpcwszSrcStr, -1, NULL, 0, NULL, NULL);
    if (len <= 0)
        return NULL;
    lpszDesStr = new char[len + 1];
    if (!lpszDesStr)            // check the pointer before touching the buffer
        return NULL;
    memset(lpszDesStr, 0, sizeof(char) * (len + 1));
    len = WideCharToMultiByte(CodePage, 0, lpcwszSrcStr, -1, lpszDesStr, len, NULL, NULL);
    if (len)
        return lpszDesStr;
    else
    {
        delete[] lpszDesStr;
        return NULL;
    }
}

// Swap the two bytes of every 16-bit code unit in place.
void Coder::UnicodeEndianConvert(LPWSTR lpwszstr)
{
    int len = wcslen(lpwszstr);
    for (int index = 0; index < len; index++)
    {
        wchar_t wch = lpwszstr[index];
        lpwszstr[index] = (wchar_t)(((wch & 0x00FF) << 8) | ((wch & 0xFF00) >> 8));
    }
}

// Convert a Unicode file (little or big endian) to a multi-byte file.
BOOL Coder::UnicodeFileToMBFile(CString filesourcepath, CString filesavepath, TextCode tcTo)
{
    TextCode curtc;
    CFile filesource, filesave;
    char    *pChDes = NULL;
    char    *pChTemp = NULL;
    wchar_t *pwChSrc = NULL;
    DWORD   filelength, readlen, len;
    int     bufferlen, strlength;
    UINT    CodePage;
    curtc = GetCodeType(filesourcepath);
    // The source must be Unicode and the target must be multi-byte.
    if (curtc <= UTF8 || tcTo > UTF8 || curtc == tcTo)
        return FALSE;

    if (!filesource.Open(filesourcepath, CFile::modeRead) || 0 == (filelength = filesource.GetLength()))
        return FALSE;
    if (!filesave.Open(filesavepath, CFile::modeCreate | CFile::modeWrite))
        return FALSE;

    // Read in chunks of at most PREDEFINEDSIZE bytes.
    if (filelength < PREDEFINEDSIZE)
        bufferlen = filelength;
    else
        bufferlen = PREDEFINEDSIZE;
    pwChSrc = new wchar_t[(bufferlen / 2) + 1];
    if (!pwChSrc)
        return FALSE;

    switch (tcTo)
    {
    case GB2312:
        CodePage = CP_GB2312;
        break;
    case GBK:
        CodePage = CP_GB2312;   // converted to GBK by GB2312ToGBK below
        break;
    case BIG5:
        CodePage = CP_GB2312;   // converted to Big5 by GB2312ToBIG5 below
        break;
    case UTF8:
        CodePage = CP_UTF8;
        break;
    default:
        break;
    }

    // Skip the 2-byte BOM of the source file.
    filesource.Seek(sizeof(wchar_t), CFile::begin);
    while (filelength > 0)
    {
        memset(pwChSrc, 0, sizeof(wchar_t) * ((bufferlen / 2) + 1));
        if (filelength > PREDEFINEDSIZE)
            len = PREDEFINEDSIZE;
        else
            len = filelength;
        readlen = filesource.Read(pwChSrc, len);
        if (!readlen)
            break;
        if (UNICODEBIGENDIAN == curtc)
            UnicodeEndianConvert(pwChSrc);
        pChDes = WCharToMByte(CodePage, pwChSrc);

        // GBK and Big5 go through GB2312 first; free the intermediate buffer right away.
        if (pChDes && (GBK == tcTo || BIG5 == tcTo))
        {
            pChTemp = pChDes;
            pChDes = (GBK == tcTo) ? GB2312ToGBK(pChTemp) : GB2312ToBIG5(pChTemp);
            delete[] pChTemp;
            pChTemp = NULL;
        }
        if (!pChDes)
            break;
        strlength = strlen(pChDes);
        filesave.Write(pChDes, strlength);
        filesave.Flush();
        delete[] pChDes;        // free each chunk's result instead of only the last one
        pChDes = NULL;
        filelength -= readlen;
    }
    delete[] pChDes;
    delete[] pChTemp;
    delete[] pwChSrc;
    return TRUE;
}

// Convert a multi-byte file to another multi-byte encoding.
BOOL Coder::MBFileToMBFile(CString filesourcepath, CString filesavepath, TextCode tcTo, TextCode tcCur)
{
    TextCode curtc;
    CFile filesource, filesave;
    char    *pChDes = NULL;
    char    *pChSrc = NULL;
    DWORD   filelength, readlen, len;
    int     bufferlen, strlength;
    UINT    CodePageCur, CodePageTo;

    // Use the caller's encoding if given, otherwise detect it.
    if (DefaultCodeType != tcCur)
        curtc = tcCur;
    else
        curtc = GetCodeType(filesourcepath);
    // Both the source and the target must be multi-byte.
    if (curtc > UTF8 || tcTo > UTF8 || curtc == tcTo)
        return FALSE;

    if (!filesource.Open(filesourcepath, CFile::modeRead) || 0 == (filelength = filesource.GetLength()))
        return FALSE;
    if (!filesave.Open(filesavepath, CFile::modeCreate | CFile::modeWrite))
        return FALSE;

    // Read in chunks of at most PREDEFINEDSIZE bytes.
    if (filelength < PREDEFINEDSIZE)
        bufferlen = filelength;
    else
        bufferlen = PREDEFINEDSIZE;
    pChSrc = new char[bufferlen + 1];
    if (!pChSrc)
        return FALSE;
    // Skip the BOM of a UTF-8 source file.
    if (UTF8 == curtc)
        filesource.Seek(3 * sizeof(byte), CFile::begin);
    CodePageCur = GetCodePage(curtc);
    CodePageTo = GetCodePage(tcTo);
    while (filelength > 0)
    {
        memset(pChSrc, 0, sizeof(char) * (bufferlen + 1));
        if (filelength > PREDEFINEDSIZE)
            len = PREDEFINEDSIZE;
        else
            len = filelength;
        readlen = filesource.Read(pChSrc, len);
        if (!readlen)
            break;
        pChDes = MByteToMByte(CodePageCur, CodePageTo, pChSrc);
        if (!pChDes)
            break;
        strlength = strlen(pChDes);
        filesave.Write(pChDes, strlength);
        delete[] pChDes;        // free each chunk's result instead of only the last one
        pChDes = NULL;
        filelength -= readlen;
    }
    delete[] pChSrc;
    delete[] pChDes;
    return TRUE;
}

// Convert between Unicode (little endian) and Unicode big endian files.
BOOL Coder::UnicodeEndianFileConvert(CString filesourcepath, CString filesavepath, TextCode tcTo)
{
    TextCode curtc = GetCodeType(filesourcepath);
    if (curtc != UNICODE && curtc != UNICODEBIGENDIAN)
        return FALSE;
    if (curtc == tcTo)
        return FALSE;
    CFile filesource, filesave;
    wchar_t *pwChDes;
    DWORD length;
    if (!filesource.Open(filesourcepath, CFile::modeRead) || !filesave.Open(filesavepath, CFile::modeCreate | CFile::modeWrite))
        return FALSE;
    length = filesource.GetLength();
    if (length < 2)             // too short to hold even the BOM
        return FALSE;
    pwChDes = new wchar_t[(length / 2) + 1];
    if (!pwChDes)
        return FALSE;
    memset(pwChDes, 0, sizeof(wchar_t) * ((length / 2) + 1));
    // Skip the source BOM; otherwise it is byte-swapped along with the text
    // and the output ends up with two BOMs.
    filesource.Seek(2 * sizeof(byte), CFile::begin);
    filesource.Read(pwChDes, length - 2);
    UnicodeEndianConvert(pwChDes);
    length = wcslen(pwChDes) * 2;
    if (UNICODE == tcTo)
        filesave.Write(&UNICODEBOM, 2 * sizeof(byte));
    else
        filesave.Write(&UNICODEBEBOM, 2 * sizeof(byte));
    filesave.Write(pwChDes, length);
    filesave.Flush();
    delete[] pwChDes;
    return TRUE;
}

// Entry point: pick one of the four conversion routes from the source and target encodings.
BOOL Coder::FileToOtherFile(CString filesourcepath, CString filesavepath, TextCode tcTo, TextCode tcCur)
{
    TextCode curtc;
    BOOL bret = FALSE;
    if (DefaultCodeType != tcCur)
        curtc = tcCur;
    else
        curtc = GetCodeType(filesourcepath);
    if (curtc == tcTo)
        return FALSE;

    if (curtc >= UNICODE && tcTo >= UNICODE)        // Unicode <-> Unicode big endian
        bret = UnicodeEndianFileConvert(filesourcepath, filesavepath, tcTo);
    else if (curtc < UNICODE && tcTo >= UNICODE)    // multi-byte -> Unicode
        bret = MBFileToUnicodeFile(filesourcepath, filesavepath, tcTo, curtc);
    else if (curtc >= UNICODE && tcTo < UNICODE)    // Unicode -> multi-byte
        bret = UnicodeFileToMBFile(filesourcepath, filesavepath, tcTo);
    else if (curtc < UNICODE && tcTo < UNICODE)     // multi-byte -> multi-byte
        bret = MBFileToMBFile(filesourcepath, filesavepath, tcTo, curtc);
    return bret;
}

// Name of an encoding as a string.
CString Coder::CodeTypeToString(TextCode tc)
{
    CString strtype;
    switch (tc)
    {
    case GB2312:
        strtype = _T("GB2312");
        break;
    case BIG5:
        strtype = _T("Big5");
        break;
    case GBK:
        strtype = _T("GBK");
        break;
    case UTF8:
        strtype = _T("UTF-8");
        break;
    case UNICODE:
        strtype = _T("Unicode");
        break;
    case UNICODEBIGENDIAN:
        strtype = _T("Unicode big endian");
        break;
    }
    return strtype;
}

// Multi-byte string -> multi-byte string; the caller frees the result with delete[].
char* Coder::MByteToMByte(UINT CodePageCur, UINT CodePageTo, const char* szSrcStr)
{
    char    *pchDes = NULL;
    char    *pchTemp = NULL;
    wchar_t *pwchtemp = NULL;

    if (CodePageCur != CP_UTF8 && CodePageTo != CP_UTF8)
    {
        // Neither side is UTF-8: conversion among GB2312, Big5, and GBK.
        switch (CodePageCur)
        {
        case CP_GB2312:
            if (CP_BIG5 == CodePageTo)
                pchDes = GB2312ToBIG5(szSrcStr);
            else
                pchDes = GB2312ToGBK(szSrcStr);
            break;
        case CP_BIG5:
            if (CP_GB2312 == CodePageTo)
                pchDes = BIG5ToGB2312(szSrcStr);
            else
                pchDes = BIG5ToGBK(szSrcStr);
            break;
        case CP_GBK:
            if (CP_GB2312 == CodePageTo)
                pchDes = GBKToGB2312(szSrcStr);
            else
                pchDes = GBKToBIG5(szSrcStr);
            break;
        }
    }
    else
    {
        if (CP_UTF8 == CodePageCur)
        {
            // UTF-8 source: go through Unicode to GB2312, then on to GBK or Big5 if needed.
            pwchtemp = MByteToWChar(CodePageCur, szSrcStr);
            if (CP_GB2312 == CodePageTo)
            {
                pchDes = WCharToMByte(CP_GB2312, pwchtemp);
            }
            else
            {
                pchTemp = WCharToMByte(CP_GB2312, pwchtemp);
                if (CP_GBK == CodePageTo)
                    pchDes = GB2312ToGBK(pchTemp);
                else
                    pchDes = GB2312ToBIG5(pchTemp);
            }
        }
        else
        {
            // UTF-8 target: GBK input is mapped to GB2312 first, then everything goes through Unicode.
            if (CP_GBK == CodePageCur)
            {
                pchTemp = GBKToGB2312(szSrcStr);
                pwchtemp = MByteToWChar(CP_GB2312, pchTemp);
            }
            else
                pwchtemp = MByteToWChar(CodePageCur, szSrcStr);
            pchDes = WCharToMByte(CodePageTo, pwchtemp);
        }
    }
    delete[] pchTemp;
    delete[] pwchtemp;
    return pchDes;
}

// Code page that corresponds to an encoding.
UINT Coder::GetCodePage(TextCode tccur)
{
    UINT CodePage = 0;          // defined value for the Unicode cases, which have no code page here
    switch (tccur)
    {
    case GB2312:
        CodePage = CP_GB2312;
        break;
    case BIG5:
        CodePage = CP_BIG5;
        break;
    case GBK:
        CodePage = CP_GBK;
        break;
    case UTF8:
        CodePage = CP_UTF8;
        break;
    case UNICODEBIGENDIAN:
    case UNICODE:
        break;
    }
    return CodePage;
}

// Set the chunk size; 0 is ignored.
void Coder::SetDefaultConvertSize(UINT nCount)
{
    if (nCount != 0)
        PREDEFINEDSIZE = nCount;
}
3. Results
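Conversion among all six encodings was tested and passed on Windows 7 with VC 6.0. The 30 conversion directions are shown in the figure below:
The test program in action is shown in the figure below:
The result of converting GB2312 to GBK is shown in the figure below:
The result of converting UTF-8 to Big5 is shown in the figure below:
The code for this article and the conversion program can be downloaded from: http://download.csdn.net/user/ziyuanxiazai123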
4. Open questions
(1) I do not yet fully understand LCMapString. It takes many parameters, and understanding them requires some background knowledge.
(2) Why does text converted by Notepad contain some garbled characters, and is that garbled output actually correct? My program goes through an intermediate form, so its output has no garbled characters.
(3) Whether there is a simpler and clearer way to implement encoding conversion remains to be studied.