Code page

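Code page is the term [def=IBM]IBM[/def] uses for a specific [def=%E5%AD%97%E7%AC%A6%E7%B7%A8%E7%A2%BC]character encoding[/def] table: a mapping in which a sequence of [def=%E4%BD%8D]bits[/def], usually a single octet representing the values 0 through 255, is associated with a specific character. IBM and [def=%E5%BE%AE%E8%BB%9F]Microsoft[/def] often assign a code page number to a character set even when that [def=%E5%AD%97%E7%AC%A6%E9%9B%86]character set[/def] is better known by another name.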
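The idea of a code page as a lookup table can be sketched in a few lines of Python. This is only an illustration: the codec names `cp437`, `cp850` and `cp1252` are Python's standard-library names for the corresponding code pages, and the byte value 0xE9 is an arbitrary choice.

```python
# One byte value, several code pages: the table decides which character it is.
raw = bytes([0xE9])

decoded = {name: raw.decode(name) for name in ("cp437", "cp850", "cp1252")}
for name, char in decoded.items():
    print(f"{name}: 0x{raw[0]:02X} -> {char!r}")

# The reverse also holds: one character can have different byte values.
e_acute = {name: "é".encode(name) for name in ("cp437", "cp1252")}
print(e_acute)
```

The same octet is a Greek capital theta under code page 437 and a Latin "é" under code page 1252, which is why text must always travel with the identity of its code page.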

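The term "code page" originated with IBM's [def=EBCDIC]EBCDIC[/def]-based mainframe systems, but it now usually refers specifically to the IBM PC code pages. [def=%E5%BE%AE%E8%BB%9F]Microsoft[/def], a maker of PC [def=%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%B5%B1]operating systems[/def], calls these code pages [def=OEM]OEM[/def] code pages and supplements them with its own [def=ANSI]ANSI[/def] code pages.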

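With the exception of those for [def=CJK]CJK[/def] and [def=%E8%B6%8A%E5%8D%97%E8%AA%9E]Vietnamese[/def], most well-known code pages represent character sets that fit in 8 bits and involve nothing that cannot be handled by mapping each code to a single bitmap (which excludes, for example, combining characters and complex scripts).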

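The text mode of standard (VGA-compatible) PC graphics hardware is built around an 8-bit code page, although two code pages can be used at once at the cost of some color depth, and up to eight can be stored on the display adapter for easy switching.[1] A selection of code pages could be loaded into such hardware. However, it is now common for operating system vendors to provide their own character encoding and rendering systems, which run in a graphics mode and bypass this hardware mechanism entirely. The character encodings used by these graphical systems (particularly Windows) are sometimes also called "code pages".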

Relationship to ASCII

The basis of the IBM PC code pages is [def=ASCII]ASCII[/def], a 7-bit code representing 128 characters and control codes. In the past, 8-bit implementations of ASCII either set the top bit to zero or used it as a [def=parity+bit]parity bit[/def] in network data transmissions. When this bit was instead made available for representing character data, another 128 characters and control codes could be represented. IBM used this extended range to encode characters used by various languages. No formal standard existed for these ‘[def=extended+character+sets]extended character sets[/def]’; IBM merely referred to the variants as code pages, as it had always done for variants of [def=EBCDIC]EBCDIC[/def] encodings.
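The two historical uses of the eighth bit, and the shared 7-bit lower half, can be illustrated as follows. This is a minimal sketch: the helper names `with_even_parity` and `strip_top_bit` are invented for this example, and even parity is only one of several parity conventions that were in use.

```python
def with_even_parity(code: int) -> int:
    """Return an 8-bit value whose top bit makes the number of 1 bits even."""
    if not 0 <= code < 128:
        raise ValueError("ASCII codes are 7-bit values (0-127)")
    parity = bin(code).count("1") % 2
    return code | (parity << 7)


def strip_top_bit(byte: int) -> int:
    """Recover the 7-bit ASCII code from an 8-bit value."""
    return byte & 0x7F


# 'A' (0x41) already has an even number of 1 bits; 'C' (0x43) does not.
print(hex(with_even_parity(ord("A"))), hex(with_even_parity(ord("C"))))

# The lower 128 positions of these code pages (as Python implements them)
# agree with ASCII; only the upper half differs between them.
ascii_bytes = bytes(range(128))
same_lower_half = all(
    ascii_bytes.decode(cp) == ascii_bytes.decode("ascii")
    for cp in ("cp437", "cp850", "cp1252")
)
print(same_lower_half)
```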

IBM PC (OEM)

These code pages are most often used under [def=MS-DOS]MS-DOS[/def]-like operating systems; they include many [def=box+drawing+characters]box drawing characters[/def]. Since the original IBM PC code page (number 437) was not really designed for international use, several incompatible variants emerged. Microsoft refers to these as the OEM code pages. Examples include:
  • [def=%E4%BB%A3%E7%A2%BC%E9%A0%81437]437[/def] — the original IBM PC code page
  • [def=%E4%BB%A3%E7%A2%BC%E9%A0%81737]737[/def] — [def=%E5%B8%8C%E8%87%98%E8%AA%9E]Greek[/def]
  • [def=%E4%BB%A3%E7%A2%BC%E9%A0%81850]850[/def] — "Multilingual (Latin-1)" (Western European languages)
  • [def=%E4%BB%A3%E7%A2%BC%E9%A0%81852]852[/def] — "Slavic (Latin-2)" (Eastern European languages)
  • [def=%E4%BB%A3%E7%A2%BC%E9%A0%81855]855[/def] — [def=%E8%A5%BF%E8%A3%A1%E7%88%BE%E5%AD%97%E6%AF%8D]Cyrillic[/def]
  • [def=%E4%BB%A3%E7%A2%BC%E9%A0%81857]857[/def] — [def=%E5%9C%9F%E8%80%B3%E5%85%B6%E8%AA%9E]Turkish[/def]
  • [def=%E4%BB%A3%E7%A2%BC%E9%A0%81858]858[/def] — "Multilingual" with euro symbol
  • [def=%E4%BB%A3%E7%A2%BC%E9%A0%81860]860[/def] — [def=%E8%91%A1%E8%90%84%E7%89%99%E8%AA%9E]Portuguese[/def]
  • [def=%E4%BB%A3%E7%A2%BC%E9%A0%81861]861[/def] — [def=%E5%86%B0%E5%B3%B6%E8%AA%9E]Icelandic[/def]
  • [def=%E4%BB%A3%E7%A2%BC%E9%A0%81863]863[/def] — [def=%E5%8A%A0%E6%8B%BF%E5%A4%A7]Canadian[/def] [def=%E6%B3%95%E8%AA%9E]French[/def]
  • [def=%E4%BB%A3%E7%A2%BC%E9%A0%81865]865[/def] — [def=%E5%8C%97%E6%AD%90]Nordic[/def]
  • [def=%E4%BB%A3%E7%A2%BC%E9%A0%81866]866[/def] — [def=%E8%A5%BF%E8%A3%A1%E7%88%BE%E5%AD%97%E6%AF%8D]Cyrillic[/def]
  • [def=%E4%BB%A3%E7%A2%BC%E9%A0%81869]869[/def] — [def=%E5%B8%8C%E8%87%98%E8%AA%9E]Greek[/def]
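The box drawing characters and the incompatibility between OEM variants can both be seen with a small Python sketch. It assumes the standard-library codecs `cp437`, `cp850` and `cp858` faithfully reflect the corresponding code pages; the byte values are chosen purely for illustration.

```python
# Top edge of a double-line frame as it would be stored in a DOS text file.
frame_top = bytes([0xC9, 0xCD, 0xCD, 0xBB])
print(frame_top.decode("cp437"))

# A single byte read under three OEM code pages. Code page 850 gave up some
# of the box drawing characters of 437 in favor of accented letters, and
# code page 858 differs from 850 only by placing the euro sign here.
byte = b"\xd5"
variants = {cp: byte.decode(cp) for cp in ("cp437", "cp850", "cp858")}
for cp, char in variants.items():
    print(f"{cp}: U+{ord(char):04X} {char}")
```

A file written under one of these code pages and displayed under another therefore shows stray letters where frame corners were intended, or the reverse.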


Other code pages of note

  • [def=%E4%BB%A3%E7%A2%BC%E9%A0%8110000]10000[/def] — [def=Macintosh+Roman+encoding]Macintosh Roman encoding[/def] (followed by several other Mac character sets)
  • [def=%E4%BB%A3%E7%A2%BC%E9%A0%8110007]10007[/def] — [def=MacCyrillic+encoding]Macintosh Cyrillic encoding[/def]
  • [def=%E4%BB%A3%E7%A2%BC%E9%A0%8110029]10029[/def] — [def=Macintosh+Central+European+encoding]Macintosh Central European encoding[/def]
  • [def=%E4%BB%A3%E7%A2%BC%E9%A0%81932]932[/def] — Japanese
  • [def=%E4%BB%A3%E7%A2%BC%E9%A0%81936]936[/def] — [def=GBK]GBK[/def] Simplified Chinese
  • [def=%E4%BB%A3%E7%A2%BC%E9%A0%81949]949[/def] — Korean
  • [def=%E4%BB%A3%E7%A2%BC%E9%A0%81950]950[/def] — Traditional Chinese
  • [def=%E4%BB%A3%E7%A2%BC%E9%A0%811200]1200[/def] — UCS-2LE [def=Unicode]Unicode[/def] [def=little-endian]little-endian[/def]
  • [def=%E4%BB%A3%E7%A2%BC%E9%A0%811201]1201[/def] — UCS-2BE [def=Unicode]Unicode[/def] [def=big-endian]big-endian[/def]
  • [def=%E4%BB%A3%E7%A2%BC%E9%A0%8165001]65001[/def] — [def=UTF-8+]UTF-8[/def] [def=Unicode]Unicode[/def]
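Unlike the 8-bit code pages, several entries in this list use more than one byte per character. The sketch below shows this with Python's standard-library codecs. The `CODECS` table and the `encode_for` helper are assumptions of this example, not an official mapping; in particular, `utf-16-le` and `utf-16-be` stand in for UCS-2, with which they agree for characters in the Basic Multilingual Plane.

```python
# Code page number -> Python codec name (an assumption of this sketch).
CODECS = {
    932: "cp932",
    936: "gbk",
    949: "cp949",
    950: "cp950",
    1200: "utf-16-le",
    1201: "utf-16-be",
    65001: "utf-8",
}


def encode_for(code_page: int, text: str) -> bytes:
    """Encode text with the codec this sketch associates with a code page."""
    return text.encode(CODECS[code_page])


# Each East Asian code page stores these characters as two bytes apiece.
samples = {932: "日本語", 936: "简体中文", 949: "한국어", 950: "繁體中文"}
for cp, text in samples.items():
    data = encode_for(cp, text)
    print(cp, text, data.hex(" "), f"({len(data)} bytes)")

# The same character under the three Unicode code pages.
for cp in (1200, 1201, 65001):
    print(cp, encode_for(cp, "中").hex(" "))
```

ASCII characters remain single bytes in the East Asian code pages, so these are variable-width encodings rather than fixed two-byte ones.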


In modern applications, operating systems and programming languages, the IBM code pages have been rendered obsolete by international standards, such as [def=ISO_8859-1]ISO 8859-1[/def] and Unicode.

Windows (ANSI) code pages

[def=Microsoft]Microsoft[/def] defined [def=windows+code+pages]a number of code pages[/def] known as the ANSI code pages (so called because the first one, 1252, was based on an ANSI draft of what became [def=ISO+8859-1]ISO 8859-1[/def]). Code page 1252 is built on [def=ISO+8859-1]ISO 8859-1[/def] but uses the range 0x80-0x9F for extra printable characters rather than the C1 control codes used in [def=ISO-8859-1%23ISO-8859-1]ISO-8859-1[/def]. Some of the others are based in part on other parts of [def=ISO+8859]ISO 8859[/def], but are often rearranged to bring them closer to 1252.

  • [def=Windows-1250]1250[/def] — East European Latin
  • [def=Windows-1251]1251[/def] — [def=Cyrillic+alphabet]Cyrillic[/def]
  • [def=Windows-1252]1252[/def] — West European Latin
  • [def=Windows-1253]1253[/def] — [def=Greek+alphabet]Greek[/def]
  • [def=Windows-1254]1254[/def] — [def=Turkish+language]Turkish[/def]
  • [def=Windows-1255]1255[/def] — [def=Hebrew+alphabet]Hebrew[/def]
  • [def=Windows-1256]1256[/def] — [def=Arabic+alphabet]Arabic[/def]
  • [def=Windows-1257]1257[/def] — [def=Baltic+state]Baltic[/def]
  • [def=Windows-1258]1258[/def] — [def=Vietnamese+language]Vietnamese[/def]
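The difference between code page 1252 and ISO 8859-1 in the range 0x80-0x9F can be checked directly. This is a sketch that relies on Python's `cp1252` and `latin-1` codecs; the helper `printable_in_c1_range` is a name invented for this example.

```python
# Curly quotation marks around "Hi", as a Windows application might write them.
data = bytes([0x93, 0x48, 0x69, 0x94])
as_cp1252 = data.decode("cp1252")   # quotes come out as U+201C and U+201D
as_latin1 = data.decode("latin-1")  # same bytes become C1 controls U+0093/U+0094
print(repr(as_cp1252), repr(as_latin1))


def printable_in_c1_range(codec: str) -> int:
    """Count byte values 0x80-0x9F that decode to a printable character."""
    count = 0
    for value in range(0x80, 0xA0):
        try:
            char = bytes([value]).decode(codec)
        except UnicodeDecodeError:
            continue  # position left unassigned by this codec
        if char.isprintable():
            count += 1
    return count


print(printable_in_c1_range("cp1252"), printable_in_c1_range("latin-1"))
```

In Python's tables, code page 1252 assigns printable characters to 27 of these 32 positions and leaves five unassigned, while ISO 8859-1 treats all 32 as control codes.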


Many Microsoft products produce characters in the 0x80-0x9F range automatically, notably through the ‘smart quotes’ feature. This means that other software has to choose between:
  • not interoperating with documents produced with Microsoft applications
  • mis-rendering the text in question
  • adding support for the Microsoft code pages, in effect making Microsoft's implementation a de facto standard.


Microsoft applications also mislabeled [def=Windows-1252]Windows-1252[/def] text as [def=ISO-8859-1]ISO-8859-1[/def], and many Windows-based developers, unaware of the issues involved, followed their example. While current Microsoft applications seem to label Windows-1252 text correctly when they can (such as when sending e-mail), they still allow both reading and writing (e.g., through forms) these characters on websites declared as ISO-8859-1. The most popular competing web browsers do so too, favoring compatibility over standards compliance.
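The compatibility-first behavior described above can be approximated in a few lines. This is a hedged sketch, not how any particular browser implements it: `decode_lenient` and `LATIN1_LABELS` are names invented here, and the fallback for bytes that Python's `cp1252` codec leaves unassigned is one possible choice among several.

```python
# Labels that this sketch treats as "really Windows-1252" (an assumption).
LATIN1_LABELS = {"iso-8859-1", "iso8859-1", "latin-1", "latin1"}


def decode_lenient(data: bytes, declared: str) -> str:
    """Decode text, reading ISO-8859-1 labels as Windows-1252 where possible."""
    label = declared.strip().lower()
    if label in LATIN1_LABELS:
        try:
            return data.decode("cp1252")
        except UnicodeDecodeError:
            # The data uses a position cp1252 leaves unassigned; honor the label.
            return data.decode("latin-1")
    return data.decode(label)


# A page declared as ISO-8859-1 that actually contains smart quotes.
print(decode_lenient(b"\x93ok\x94", "ISO-8859-1"))
```

Because the two encodings agree everywhere outside 0x80-0x9F, correctly labeled ISO 8859-1 text is unaffected unless it genuinely contains C1 control codes, which is rare in practice.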

These code pages were sometimes viewed as part of Microsoft's [def=embrace%2C+extend+and+extinguish]embrace, extend and extinguish[/def] strategy towards open standards, though something as simple as an 8-bit character table could never really be kept proprietary. On the other hand, since standards bodies had decided not to assign graphical characters to the upper-half control-character positions 0x80-0x9F, which are hardly ever used in practice for control functions, 32 of the 256 available code positions (12.5%) were wasted.

Private code pages

When, early in the history of personal computers, users did not find their character encoding requirements met, private or local code pages were created using [def=Terminate+and+Stay+Resident]Terminate and Stay Resident[/def] utilities or by re-programming [def=BIOS]BIOS[/def] [def=EPROM]EPROMs[/def]. In some cases, unofficial code page numbers were invented (e.g., cp895).

When more diverse character set support became available, most of those code pages fell into disuse, with some exceptions such as the [def=Kamenick%C3%BD+encoding]Kamenický[/def] or KEYBCS2 encoding for the [def=Czech+alphabet]Czech[/def] and [def=Slovak+alphabet]Slovak[/def] alphabets.
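In software terms, a private code page is a stock table with a few positions replaced. The sketch below builds one on top of code page 437 using Python's charmap helpers. The two overridden positions are purely illustrative and have not been checked against the real Kamenický table; the function names are likewise invented for this example.

```python
import codecs

# Start from the 256 characters of code page 437, in byte order.
base_table = bytes(range(256)).decode("cp437")

# Hypothetical local overrides: replace two accented letters with Czech ones.
overrides = {0x80: "Č", 0x87: "č"}
private_table = "".join(overrides.get(i, ch) for i, ch in enumerate(base_table))

# Reverse map (character -> byte) used for encoding.
private_encoding_map = codecs.charmap_build(private_table)


def decode_private(data: bytes) -> str:
    """Decode bytes using the private table."""
    return codecs.charmap_decode(data, "strict", private_table)[0]


def encode_private(text: str) -> bytes:
    """Encode text using the private table."""
    return codecs.charmap_encode(text, "strict", private_encoding_map)[0]


print(decode_private(b"\x80\x87 \xc9\xcd\xbb"))
print(encode_private("Čč"))
```

Positions that are not overridden, such as the box drawing characters, keep their code page 437 meaning, so existing text-mode programs still draw their frames correctly.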

See also

  • [def=Character+encoding]Character encoding[/def]



