uuid/guid

Source: Baidu Wenku. Editor: 神马文学网. Time: 2024/04/28 16:32:49
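What is a UUID
A UUID (Universally Unique Identifier) is a 128-bit number used to uniquely identify an object or entity on the Internet. It is generated on a single machine, yet it is designed to be unique across all machines in space and time. Platforms usually provide an API for generating UUIDs. A UUID is computed according to a standard defined by the Open Software Foundation (OSF), and draws on the Ethernet card address, nanosecond-scale time, chip ID codes, and many other possible numbers.
A UUID is a combination of the following parts:
(1) The current date and time. The first part of a UUID is time-related: if you generate one UUID and then another a few seconds later, the first part differs while the rest stays the same.
(2) A clock sequence.
(3) A globally unique IEEE machine identifier. If a network card is present, it is taken from the card's MAC address; otherwise it is obtained in some other way.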
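The main drawback of a UUID is that the resulting string is fairly long. The most widely used implementation of the UUID standard is Microsoft's GUID (Globally Unique Identifier). In ColdFusion, the CreateUUID() function generates a UUID easily, in the format xxxxxxxx-xxxx-xxxx-xxxxxxxxxxxxxxxx (8-4-4-16), where each x is a hexadecimal digit in the range 0-9 or a-f. The standard UUID format is xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx (8-4-4-4-12); a CreateGUID() UDF that performs the conversion can be downloaded from cflib.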
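The difference between the two layouts is a single hyphen inside the last group. As a minimal sketch of the conversion (in Python rather than ColdFusion; the function name coldfusion_to_standard is made up for this illustration and is not the cflib UDF):

```python
import re

# Pattern for the 8-4-4-16 layout described above.
CF_PATTERN = re.compile(
    r"[0-9A-Fa-f]{8}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{16}"
)


def coldfusion_to_standard(cf_uuid: str) -> str:
    """Convert an 8-4-4-16 string to the standard 8-4-4-4-12 layout."""
    if not CF_PATTERN.fullmatch(cf_uuid):
        raise ValueError("not an 8-4-4-16 UUID string")
    # The first 23 characters are 8-4-4- plus the first four digits of the
    # last group; the standard layout puts a hyphen right after them.
    return f"{cf_uuid[:23]}-{cf_uuid[23:]}".lower()


print(coldfusion_to_standard("58E0A7D7-EEBC-11D8-96690800200C9A66"))
```

Lowercasing the result is a choice made here for readability; hexadecimal digits in a UUID string are not case-sensitive.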
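The benefits of UUIDs show in distributed software systems (for example DCE/RPC, COM+, CORBA): every node can generate identifiers that will not collide with those from other nodes, and as integration technologies such as web services develop, this advantage becomes more pronounced. Depending on the specific mechanism used, a UUID is either guaranteed to be different, or is at least extremely likely to be different, from any other UUID generated before the year 3400 AD.
UUIDs can also be used to refer to almost any kind of object. Microsoft and some other software companies favor the Globally Unique Identifier (GUID), which is one type of UUID, to refer to Component Object Model (COM) objects and other software components. The first UUIDs were created in the Network Computing System (NCS) and later became a component of the Open Software Foundation's (OSF) Distributed Computing Environment (DCE).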
UUID in detail
What is a UUID/GUID
An answer to "What is a UUID?" would be "A GUID"; this answer wouldn't help most of us any further, though.
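UUID stands for Universally Unique IDentifier and GUID stands for Globally Unique IDentifier. The correct way to construct these identifiers is described in RFC 4122, ITU-T Rec. X.667 and ISO/IEC 11578:1996.
A primary function of a UUID is to ensure uniqueness.
What it is used for:
UUIDs are used for identification purposes in many areas of the computer industry.
Possible uses include (but are not limited to):
* As Windows registry identifiers.
* As identifiers in databases.
* As identifiers in remote procedure calls (COM, CORBA).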
In each case the goal is to ensure that nobody else accidentally provides a conflicting identifier.
What it looks like:
A UUID is basically a 128-bit number, usually written as hexadecimal digits grouped as follows:
58e0a7d7-eebc-11d8-9669-0800200c9a66
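The nil UUID:
A special kind of UUID that is not guaranteed to be unique but is easy to recognize. This is the nil UUID: 00000000-0000-0000-0000-000000000000. It is used to make the UUID concept clearer, and serves as a template when constructing a new UUID.
How a UUID / GUID is constructed
General overview
A UUID is basically a 128-bit number; its structure, and the method used to construct it, are determined by the variant and the version. The correct way to construct it is described in RFC 4122. This article is the author's interpretation of RFC 4122 and aims only to give an easy-to-understand explanation of it.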
Variants
To accommodate past mistakes and future improvements, a variant is encoded into the UUID. The variant is stored in the most significant bits of the fourth group. Currently known variants are:
* 0 NCS (reserved for backward compatibility)
* 10 The current variant, described, used and produced on these pages.
* 110 Microsoft (reserved for backward compatibility)
As can be seen, the variant currently cannot start with 111; all variants starting with 111 are reserved for future use.
The variant can be extracted from the UUID after doing some work: 58e0a7d7-eebc-11d8-9669-0800200c9a66
The first two hexadecimal digits of the fourth group are 96, whose bit representation is 10010110. This starts with 10, so the UUID is constructed according to the current variant. One should currently not encounter UUIDs with an e or f in the position of the leading 9 of 9669 in this example, as these variants are reserved for future use. In the current variant this group starts with 8, 9, a or b.
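The bit test described above can be sketched in a few lines of Python. The function name uuid_variant and the label strings it returns are choices made for this illustration only.

```python
def uuid_variant(uuid_str: str) -> str:
    """Classify a UUID string by the leading bits of its fourth group."""
    # The first byte (two hex digits) of the fourth group holds the variant.
    octet = int(uuid_str.split("-")[3][:2], 16)
    if octet & 0x80 == 0x00:   # bit pattern 0xxxxxxx
        return "NCS"
    if octet & 0xC0 == 0x80:   # bit pattern 10xxxxxx
        return "current"
    if octet & 0xE0 == 0xC0:   # bit pattern 110xxxxx
        return "Microsoft"
    return "reserved"          # bit pattern 111xxxxx


print(uuid_variant("58e0a7d7-eebc-11d8-9669-0800200c9a66"))  # current
```

Note that the nil UUID falls into the NCS bucket under this test, simply because all of its bits are zero.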
Versions
In the current variant, five basic types ("versions") of UUIDs are in use:
1. Time-based with unique or random host identifier
2. DCE Security version (with POSIX UIDs)
3. Name-based (MD5 hash)
4. Random
5. Name-based (SHA-1 hash)
The version is very easy to identify, because it is stored as the first digit of the third group (the leading 1 of 11d8): 58e0a7d7-eebc-11d8-9669-0800200c9a66
Every version is made in its own way.
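Reading the version digit is a one-line string operation. The sketch below (the helper name uuid_version is hypothetical) also cross-checks the result against the version attribute of Python's standard uuid.UUID class.

```python
import uuid


def uuid_version(uuid_str: str) -> int:
    """Return the version: the first hex digit of the third group."""
    third_group = uuid_str.split("-")[2]
    return int(third_group[0], 16)


sample = "58e0a7d7-eebc-11d8-9669-0800200c9a66"
print(uuid_version(sample))        # 1
print(uuid.UUID(sample).version)   # 1, as reported by the standard library
```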
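How a time-based UUID / GUID is constructed
Apart from the version and the variant, a time-based UUID has three components:
1. A timestamp.
2. A clock ID (initially based on a random number).
3. A node ID (an IEEE 802 address, i.e. the MAC address, the physical address of the network card).
Timestamp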
The timestamp is a 60-bit value representing the number of 100-nanosecond intervals since 00:00:00.00 on 15 October 1582.
If no UTC time can be obtained, the local time can be used. In that case, care has to be taken that the clock ID is incremented whenever the clock is set backwards.
Generally a programmer does not get the current time in 100-nanosecond intervals since 15 October 1582, but, for instance, with millisecond precision since 1 January 1970. In this case, to go from milliseconds to 100-nanosecond intervals, multiply the time returned by the system by 10000, and to correct the start date, add an offset of 122192928000000000. (With other resolutions another multiplier needs to be applied; with other start dates another offset needs to be used.) If multiple UUIDs need to be generated within the same millisecond, this can be done by adding 1 to the interval count for every UUID requested inside that millisecond. No more than 10000 UUIDs can be produced per millisecond this way, though. (With a timer that increments once per microsecond, the multiplier would only be 10, and at most 10 UUIDs could be returned in the same microsecond.)
When more than one UUID is needed per 100 nanoseconds, multiple node IDs will be needed.
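The conversion described above is a multiplication plus an offset. The following is a minimal sketch; the function name and the sequence parameter (the "add 1 per extra UUID in the same millisecond" counter) are illustrative choices, not part of any standard API.

```python
import time

# Number of 100-nanosecond intervals between 15 October 1582 and 1 January 1970.
GREGORIAN_OFFSET = 122_192_928_000_000_000
# Number of 100-nanosecond intervals in one millisecond.
INTERVALS_PER_MS = 10_000


def uuid_timestamp_from_unix_ms(unix_ms: int, sequence: int = 0) -> int:
    """Convert Unix milliseconds to a UUID timestamp.

    sequence distinguishes UUIDs requested within the same millisecond
    and must stay below 10000.
    """
    if not 0 <= sequence < INTERVALS_PER_MS:
        raise ValueError("at most 10000 UUIDs per millisecond")
    return unix_ms * INTERVALS_PER_MS + GREGORIAN_OFFSET + sequence


now_ms = time.time_ns() // 1_000_000
print(hex(uuid_timestamp_from_unix_ms(now_ms)))
```

The offset can be checked independently: there are 141427 days between the two start dates, and 141427 * 86400 * 10^7 gives the constant used above.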
Clock ID
The clock ID is a 14-bit value. Its initial value should be random. If a new timestamp appears to be older than the previous timestamp used (for instance after a clock skew error, or after a reboot where the clock was re-synchronized), the clock ID should be incremented by one.
Node ID
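As a rough sketch of that rule, the class below keeps the last timestamp seen and bumps the clock ID when time appears to run backwards. The class name ClockSequence and its update method are invented for this example, and wrapping around at 14 bits is an assumption the text does not spell out.

```python
import secrets


class ClockSequence:
    """Tracks the 14-bit clock ID of a time-based UUID generator."""

    def __init__(self) -> None:
        # Start from a random 14-bit value.
        self.value = secrets.randbelow(1 << 14)
        self.last_timestamp = None

    def update(self, timestamp: int) -> int:
        """Record a new timestamp and return the clock ID to use with it."""
        if self.last_timestamp is not None and timestamp < self.last_timestamp:
            # The clock went backwards: increment, staying within 14 bits
            # (the wrap-around is an assumption of this sketch).
            self.value = (self.value + 1) & 0x3FFF
        self.last_timestamp = timestamp
        return self.value


clock = ClockSequence()
first = clock.update(1000)
second = clock.update(900)   # older timestamp, so the clock ID changes
print(first, second)
```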
The node ID should preferably be an IEEE 802 MAC address, a.k.a. "Ethernet address". IEEE 802 addresses should be unique, though some no-name manufacturers are known to re-use them. The address can come from a network card of the machine where the UUID is generated, or from another machine or network card if you have control over that card and can be absolutely sure that no other equipment will use this address to generate UUIDs. (For instance, buy a network card only for its address, write the address down, enter it in a single program and then fry the network card; or buy a range of IEEE 802 addresses from the IEEE.)
The IEEE 802 address can often be retrieved using the ipconfig /all or ifconfig eth0 commands. It is often called the "Hardware" or "Physical" address and looks like this: 08:00:20:0c:9a:66 (sometimes a - or a space is used as the delimiter).
If no IEEE 802 address is available, one can also generate a random value for this address. In this case the multicast bit (the least significant bit of the first byte) must be set to 1, to avoid clashes with legitimate IEEE 802 addresses. When a random address is used, uniqueness cannot be guaranteed. (The recipe in this case: create a 6-byte random number and perform an or operation on the first byte with 0x01.)
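The random-node recipe translates almost literally into code. This is a minimal sketch; the function name random_node_id is made up for the example, and os.urandom is used as the random source.

```python
import os


def random_node_id() -> bytes:
    """Create a random 6-byte node ID with the multicast bit set."""
    node = bytearray(os.urandom(6))
    # Set the least significant bit of the first byte so the value can
    # never collide with a legitimate (unicast) IEEE 802 address.
    node[0] |= 0x01
    return bytes(node)


print(":".join(f"{octet:02x}" for octet in random_node_id()))
```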
Mapping the components onto the UUID
The timestamp is mapped to the UUID as follows.
Take a timestamp with the (60-bit) hexadecimal value 1d8eebc58e0a7d7.
It sets the first three groups of the UUID 58e0a7d7-eebc-11d8-9669-0800200c9a66: the lowest 32 bits (58e0a7d7) form the first group, the next 16 bits (eebc) form the second group, and the highest 12 bits (1d8) form the last three digits of the third group. The 1 in front of those most significant timestamp digits indicates the UUID version; for time-based UUIDs this is 1.
The remaining parts of the UUID are the clock ID and the node ID.
The clock ID was 1669 (hexadecimal) and is put in the fourth group of the UUID: 58e0a7d7-eebc-11d8-9669-0800200c9a66. The first digit of that group is 9 rather than 1 because the most significant bit is also set, as mandated by the variant.
The node ID was 08:00:20:0c:9a:66. If the node ID were a generated one, the first octet would have to be 09 instead of 08.
Note: the correct way to construct these identifiers is described in RFC 4122; the contents of this document are my interpretation of that document.
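To tie the mapping together, here is a small sketch that assembles the example UUID from its three components using the fields argument of Python's uuid.UUID. The function name build_time_based_uuid is an invention of this example.

```python
import uuid


def build_time_based_uuid(timestamp: int, clock_id: int, node: int) -> uuid.UUID:
    """Assemble a version 1 UUID from a 60-bit timestamp, 14-bit clock ID and 48-bit node."""
    time_low = timestamp & 0xFFFFFFFF                        # first group
    time_mid = (timestamp >> 32) & 0xFFFF                    # second group
    time_hi_version = ((timestamp >> 48) & 0x0FFF) | 0x1000  # third group, version 1
    clock_hi_variant = ((clock_id >> 8) & 0x3F) | 0x80       # variant bits 10
    clock_low = clock_id & 0xFF
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_hi_variant, clock_low, node))


example = build_time_based_uuid(0x1D8EEBC58E0A7D7, 0x1669, 0x0800200C9A66)
print(example)  # 58e0a7d7-eebc-11d8-9669-0800200c9a66
```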
How a name-based UUID / GUID is constructed
The name-based UUID uses a "name" in the broadest sense imaginable, plus a UUID indicating the type of name used. Of course, the variant and version are again present in this UUID as well.
The UUID indicating the type of name used
For every namespace a UUID must be defined; for certain namespaces, UUIDs have been predefined. The Leach, Mealling and Salz draft predefines the following namespace UUIDs:
* 6ba7b810-9dad-11d1-80b4-00c04fd430c8 for DNS
* 6ba7b811-9dad-11d1-80b4-00c04fd430c8 for URL
* 6ba7b812-9dad-11d1-80b4-00c04fd430c8 for ISO OID
* 6ba7b814-9dad-11d1-80b4-00c04fd430c8 for X.500 DN
To create a name-based UUID, first convert the namespace UUID into an array of 16 bytes: create one byte from each pair of hexadecimal digits in the string, skipping the dashes, with the leftmost pair in the lowest position of the array.
Then convert the name into a sequence of octets (a byte array) as defined by the standard or conventions of the namespace.
Calculate the MD5 or SHA-1 hash over the concatenation of the namespace ID byte array followed by the byte array representing the name.
Take the first 16 bytes of the hash as the new UUID, after putting in the version and the variant.
To put in the version, take the 7th byte and perform an and operation using 0x0f, followed by an or operation with 0x30 for MD5 and 0x50 for SHA-1 hashes.
To put in the variant, take the 9th byte and perform an and operation using 0x3f, followed by an or operation with 0x80.
Appendix B of the draft seems to contain an incorrectly calculated value for the domain www.widgets.com. Running the program of Appendix A, and manual calculations with the help of the md5sum tool, show that the value should be 3d813cbb-47fb-32ba-91df-831e1593ac29, and not e902893a-9d22-3c7e-a7b8-d6e313b71d9f.
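The name-based steps in this section can be followed by hand with Python's hashlib. This is a sketch of my reading of those steps, not a reference implementation; the function name name_based_uuid and its use_sha1 flag are invented here, and the name is assumed to be already converted to octets.

```python
import hashlib
import uuid


def name_based_uuid(namespace: uuid.UUID, name: bytes, use_sha1: bool = False) -> uuid.UUID:
    """Build a version 3 (MD5) or version 5 (SHA-1) UUID by hand."""
    hasher = hashlib.sha1 if use_sha1 else hashlib.md5
    # Hash the 16 namespace bytes followed by the name octets,
    # and keep only the first 16 bytes of the digest.
    octets = bytearray(hasher(namespace.bytes + name).digest()[:16])
    # 7th byte (index 6): insert the version.
    octets[6] = (octets[6] & 0x0F) | (0x50 if use_sha1 else 0x30)
    # 9th byte (index 8): insert the variant.
    octets[8] = (octets[8] & 0x3F) | 0x80
    return uuid.UUID(bytes=bytes(octets))


print(name_based_uuid(uuid.NAMESPACE_DNS, b"www.widgets.com"))
print(uuid.uuid3(uuid.NAMESPACE_DNS, "www.widgets.com"))  # standard library, for comparison
```

The two printed values should agree, and can be compared with the values quoted for www.widgets.com in the text.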
Foreseen problems
Some namespaces allow multiple conversions into a sequence of octets, so a question that goes unanswered is: how is a name inside a namespace converted to a canonical sequence of octets? For instance, how should DNS names conforming to the IDN standards be treated (zääz.de vs xn--zz-viaa.de, which are the same in the DNS namespace), and what about the different allowed encodings? The RFC for URLs specifically states "A URI is represented as a sequence of characters, not as a sequence of octets", while an EBCDIC-encoded sequence of characters will undoubtedly yield a different result than the UTF-16-encoded character sequence.
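To make the problem concrete, the sketch below hashes the same DNS name under three different octet encodings and gets three different UUIDs. The name is written here as zääz.de, which is what xn--zz-viaa.de decodes to; the helper sha1_name_uuid and the choice of encodings are illustrative assumptions only.

```python
import hashlib
import uuid


def sha1_name_uuid(namespace: uuid.UUID, name_octets: bytes) -> uuid.UUID:
    """Version 5 UUID computed directly from already-encoded name octets."""
    octets = bytearray(hashlib.sha1(namespace.bytes + name_octets).digest()[:16])
    octets[6] = (octets[6] & 0x0F) | 0x50   # version 5
    octets[8] = (octets[8] & 0x3F) | 0x80   # current variant
    return uuid.UUID(bytes=bytes(octets))


name = "zääz.de"
candidates = {
    "UTF-8": name.encode("utf-8"),
    "UTF-16 (little-endian)": name.encode("utf-16-le"),
    "IDNA (ASCII-compatible)": name.encode("idna"),
}
for label, octets in candidates.items():
    print(label, octets, sha1_name_uuid(uuid.NAMESPACE_DNS, octets))
```

All three inputs denote the same DNS name, yet each produces a different identifier, which is why a canonical octet form would have to be agreed on per namespace.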
How a random UUID / GUID is constructed
This is the easiest way to generate a UUID (at least when you have a good random number generator). No guarantees can be made about uniqueness, though.
Here is a simple recipe:
Take 16 random bytes (octets) and put them one after another; for this description the numbering runs from byte 1 (most significant, first) to byte 16 (least significant, last). Then put in the version and variant.
To put in the version, take the 7th byte and perform an and operation using 0x0f, followed by an or operation with 0x40.
To put in the variant, take the 9th byte and perform an and operation using 0x3f, followed by an or operation with 0x80.
To make the string representation, take the hexadecimal representation of bytes 1-4 (without 0x in front), followed by a -, then bytes 5 and 6, a -, bytes 7 and 8, a -, bytes 9 and 10, a -, and finally bytes 11-16.
The result is something like:
00e8da9b-9ae8-4bdd-af76-af89bed2262f
The 4 at the start of the third group and the a at the start of the fourth group are the positions where the version and the variant are introduced. Instead of the a, an 8, 9 or b could also appear in that position.
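The recipe above can be sketched directly in Python. The function name random_uuid_string is made up for this example, and os.urandom stands in for "a good random number generator"; in practice the standard library's uuid.uuid4() does the same job.

```python
import os


def random_uuid_string() -> str:
    """Follow the recipe: 16 random bytes, then insert version and variant."""
    octets = bytearray(os.urandom(16))
    octets[6] = (octets[6] & 0x0F) | 0x40   # 7th byte: version 4
    octets[8] = (octets[8] & 0x3F) | 0x80   # 9th byte: current variant
    digits = octets.hex()
    # Group the 32 hex digits as 8-4-4-4-12.
    return "-".join((digits[0:8], digits[8:12], digits[12:16],
                     digits[16:20], digits[20:32]))


print(random_uuid_string())
```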
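Obtaining a UUID on Linux
Linux has a dedicated command for generating UUIDs: uuidgen [-r] [-t]. It prints a UUID on the command line as a string of 32 hexadecimal digits (36 characters including the hyphens). The header uuid/uuid.h, typically found under /usr/include, defines the type uuid_t, an array of 16 unsigned chars. There is also a dedicated function for generating a UUID, uuid_generate(uuid_t uu), which stores the generated UUID in the parameter uu. The result at this point is the 16-byte binary value, not a printable string.
To obtain the familiar hexadecimal text form, the binary value has to be converted: uuid_unparse turns the 16-byte uuid_t into the hexadecimal string, and uuid_parse goes the other way, turning such a string back into the 16-byte binary value.
Note:
When compiling a program that uses these uuid functions on Linux, the library has to be linked by adding -luuid at the end. The full command is: gcc -o uuid uuid.c -luuid. Also, the buffer that receives the string form must be allocated with enough space (37 bytes: 36 characters plus the terminating NUL). The most unsafe approach is to declare a bare pointer with no storage behind it and store the string through it; never do this!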
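libuuid itself is a C library, but the distinction between the 16-byte binary value and its 36-character text form can be illustrated with Python's standard uuid module. This is an analogy only, not a binding to libuuid; the comments note which libuuid call each step loosely corresponds to.

```python
import uuid

generated = uuid.uuid4()             # roughly what uuid_generate produces
binary_form = generated.bytes        # 16 raw bytes, comparable to uuid_t
text_form = str(generated)           # 36 characters, comparable to uuid_unparse output
parsed_back = uuid.UUID(text_form)   # text to binary, comparable to uuid_parse

print(len(binary_form), len(text_form))
print(parsed_back.bytes == binary_form)
```

In C, the 36 characters need a 37-byte buffer because of the terminating NUL; Python strings hide that detail.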