Batch File Rename Tool Official Site – 新起飞部落 » lazarus UTF8 unicode

Source: Baidu Wenku (百度文库) | Editor: 神马文学网 | Time: 2024/04/28 16:05:22

lazarus UTF8 unicode

Posted October 6, 2009 by Admin

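I've recently been teaching myself to program with Lazarus, and its string type has been driving me up the wall: Chinese text keeps coming out garbled. Sigh.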

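With no way around it, I gritted my teeth and set out to get a rough idea of how UTF-8 actually works. While digging through references I came across the material below. It's good, so I'm saving a copy here.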

Conversion table between Unicode and UTF-8

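UCS-4 code point range        UTF-8 byte sequence
U-00000000 – U-0000007F:      0xxxxxxx
U-00000080 – U-000007FF:      110xxxxx 10xxxxxx
U-00000800 – U-0000FFFF:      1110xxxx 10xxxxxx 10xxxxxx
U-00010000 – U-001FFFFF:      11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
U-00200000 – U-03FFFFFF:      111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
U-04000000 – U-7FFFFFFF:      1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx

The x positions are filled with the code point's bits, most significant bit first, padded with leading zeros. This is the original ISO 10646 / RFC 2279 table. RFC 3629 (2003) restricts UTF-8 to U+0000 through U+10FFFF, so the 5- and 6-byte forms are no longer valid and 4-byte sequences stop at U+10FFFF.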
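The table can be read directly as an algorithm. First pick the row whose range contains the code point, then pour the code point's bits into that row's x positions. Below is a minimal Free Pascal sketch of that idea. It is not from the original post, and the names CodePointToUTF8 and Dump are hypothetical. Only the 1- to 4-byte rows are implemented (the RFC 3629 range), and surrogate code points are not rejected.

```pascal
program Utf8EncodeDemo;
{$mode objfpc}{$H+}

uses
  SysUtils;

{ Hypothetical helper: encodes one code point by filling the x bits of the
  matching table row. Only the 1- to 4-byte rows (RFC 3629 range) are
  handled; surrogates U+D800..U+DFFF are NOT rejected in this sketch. }
function CodePointToUTF8(cp: Integer): string;
begin
  if (cp < 0) or (cp > $10FFFF) then
    raise ERangeError.CreateFmt('Code point $%x is outside U+0000..U+10FFFF', [cp]);
  if cp <= $7F then
  begin
    // 0xxxxxxx
    SetLength(Result, 1);
    Result[1] := Chr(cp);
  end
  else if cp <= $7FF then
  begin
    // 110xxxxx 10xxxxxx
    SetLength(Result, 2);
    Result[1] := Chr($C0 or (cp shr 6));
    Result[2] := Chr($80 or (cp and $3F));
  end
  else if cp <= $FFFF then
  begin
    // 1110xxxx 10xxxxxx 10xxxxxx
    SetLength(Result, 3);
    Result[1] := Chr($E0 or (cp shr 12));
    Result[2] := Chr($80 or ((cp shr 6) and $3F));
    Result[3] := Chr($80 or (cp and $3F));
  end
  else
  begin
    // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    SetLength(Result, 4);
    Result[1] := Chr($F0 or (cp shr 18));
    Result[2] := Chr($80 or ((cp shr 12) and $3F));
    Result[3] := Chr($80 or ((cp shr 6) and $3F));
    Result[4] := Chr($80 or (cp and $3F));
  end;
end;

{ Hypothetical helper: prints each byte of s in hex, e.g. "E4 B8 AD". }
procedure Dump(const s: string);
var
  i: Integer;
begin
  for i := 1 to Length(s) do
    Write(IntToHex(Ord(s[i]), 2), ' ');
  WriteLn;
end;

begin
  Dump(CodePointToUTF8($41));     // U+0041 'A'     -> 41
  Dump(CodePointToUTF8($E9));     // U+00E9 e-acute -> C3 A9
  Dump(CodePointToUTF8($4E2D));   // U+4E2D         -> E4 B8 AD
  Dump(CodePointToUTF8($1F600));  // U+1F600 emoji  -> F0 9F 98 80
end.
```

Each output line corresponds to one row of the table: 1, 2, 3 and 4 bytes respectively. Only the leading byte carries the length marker. Every following byte starts with 10, and that is the pattern the function below checks for.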


With the table above in mind, read the following function and it will make much more sense. It comes from LCLProc.

function UTF8CharacterLength(p: PChar): integer;
begin
  if p<>nil then begin
    if ord(p^)<%11000000 then begin
      // regular single byte character (#0 is a character, this is pascal ;)
      Result:=1;
    end
    else if ((ord(p^) and %11100000) = %11000000) then begin
      // could be 2 byte character
      if (ord(p[1]) and %11000000) = %10000000 then
        Result:=2
      else
        Result:=1;
    end
    else if ((ord(p^) and %11110000) = %11100000) then begin
      // could be 3 byte character
      if ((ord(p[1]) and %11000000) = %10000000)
      and ((ord(p[2]) and %11000000) = %10000000) then
        Result:=3
      else
        Result:=1;
    end
    else if ((ord(p^) and %11111000) = %11110000) then begin
      // could be 4 byte character
      if ((ord(p[1]) and %11000000) = %10000000)
      and ((ord(p[2]) and %11000000) = %10000000)
      and ((ord(p[3]) and %11000000) = %10000000) then
        Result:=4
      else
        Result:=1;
    end
    else
      Result:=1
  end else
    Result:=0;
end;
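UTF8CharacterLength only reports how many bytes the character starting at p occupies. It does not decode the character. The sketch below illustrates one way it might be used; this is an assumption about typical use, not code from LCLProc. It walks a UTF-8 string with a condensed copy of that function (same results, restated so the program compiles without the LCL) and decodes each character by reversing the table. It then compares the character count with Length(), which counts bytes for a Lazarus string. DecodeAt, IsCont and the demo string are hypothetical.

```pascal
program Utf8CountDemo;
{$mode objfpc}{$H+}

uses
  SysUtils;

{ Condensed copy of UTF8CharacterLength above: 0 for nil, 1 for ASCII or an
  invalid/truncated sequence, otherwise 2..4. Short-circuit "and" stops at
  the first non-continuation byte, so the terminating #0 is not read past. }
function UTF8CharacterLength(p: PChar): Integer;

  function IsCont(c: Char): Boolean;  // hypothetical helper: is c 10xxxxxx?
  begin
    Result := (Ord(c) and %11000000) = %10000000;
  end;

begin
  if p = nil then
    Result := 0
  else if Ord(p^) < %11000000 then
    Result := 1
  else if ((Ord(p^) and %11100000) = %11000000) and IsCont(p[1]) then
    Result := 2
  else if ((Ord(p^) and %11110000) = %11100000) and IsCont(p[1])
          and IsCont(p[2]) then
    Result := 3
  else if ((Ord(p^) and %11111000) = %11110000) and IsCont(p[1])
          and IsCont(p[2]) and IsCont(p[3]) then
    Result := 4
  else
    Result := 1;
end;

{ Hypothetical helper: reverses the table for a sequence of len bytes at p.
  For len = 1 it returns the byte value itself (ASCII, or a stray byte). }
function DecodeAt(p: PChar; len: Integer): Integer;
var
  i: Integer;
begin
  case len of
    2: Result := Ord(p^) and %00011111;  // 110xxxxx -> 5 payload bits
    3: Result := Ord(p^) and %00001111;  // 1110xxxx -> 4 payload bits
    4: Result := Ord(p^) and %00000111;  // 11110xxx -> 3 payload bits
  else
    Result := Ord(p^);                   // 0xxxxxxx (or invalid byte)
  end;
  for i := 1 to len - 1 do               // append 6 bits per continuation byte
    Result := (Result shl 6) or (Ord(p[i]) and %00111111);
end;

var
  s: string;
  p, stop: PChar;
  len, count: Integer;
begin
  s := 'A'#$C3#$A9#$E4#$B8#$AD#$E6#$96#$87;  // U+0041 U+00E9 U+4E2D U+6587 as UTF-8
  p := PChar(s);
  stop := p + Length(s);
  count := 0;
  while p < stop do
  begin
    len := UTF8CharacterLength(p);
    WriteLn('U+', IntToHex(DecodeAt(p, len), 4), ' uses ', len, ' byte(s)');
    Inc(p, len);
    Inc(count);
  end;
  // Length() counts bytes, not characters
  WriteLn('Length(s) = ', Length(s), ' bytes, ', count, ' characters');
end.
```

It should print U+0041, U+00E9, U+4E2D and U+6587, using 1, 2, 3 and 3 bytes, followed by "Length(s) = 9 bytes, 4 characters". Code that cuts or indexes such a string by byte position can split a multi-byte character in the middle. That is one common way garbled Chinese text shows up.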
Category: 2.Delphi | Tags: delphi, freepascal, lazarus, unicode, utf8