Jserv's blog: The C99 offsetof macro


December 20, 2005
The C99 offsetof macro
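The long-lived C language gained a new standard in 1999, when ANSI/ISO ratified the revision of C90 now known as C99. offsetof is one of the macros it specifies (the macro itself has been part of <stddef.h> since C90). It is commonly described as follows: "The offsetof() macro returns the offset of the element name within the struct or union composite. This provides a portable method to determine the offset."

Nigel Jones wrote an article for Embedded.com titled [Learn a new trick with the offsetof() macro] that shows how to use the macro, with concrete examples from embedded systems development. It is well worth reading. He begins by quoting the implementations shipped with several C compilers:

// Keil 8051 compiler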
#define offsetof(s,m) (size_t)&(((s *)0)->m)
// Microsoft x86 compiler (version 7)
#define offsetof(s,m) (size_t)(unsigned long)&(((s *)0)->m)
// Diab Coldfire compiler
#define offsetof(s,memb) ((size_t)((char *)&((s *)0)->memb-(char *)0))
Nigel Jones gives a fairly detailed explanation:

To better understand the magic of the offsetof() macro, consider the details of Keil's definition. The various operators within the macro are evaluated in an order such that the following steps are performed:

1. ((s *)0) takes the integer zero and casts it as a pointer to s.
2. ((s *)0)->m dereferences that pointer to point to structure member m.
3. &(((s *)0)->m) computes the address of m.
4. (size_t)&(((s *)0)->m) casts the result to an appropriate data type.
By definition, the structure itself resides at address 0. It follows that the address of the field pointed to (Step 3 above) must be the offset, in bytes, from the start of the structure. At this point, we can make several observations:

- We can be a bit more specific about the term "structure name." In a nutshell, if the structure name you use, call it s, results in a valid C expression when written as (s *)0->m, you can use s in the offsetof() macro. The examples shown in Listings 3 and 4 will help clarify that point.
- The member expression, m, can be of arbitrary complexity. Indeed, if you have nested structures, then the member field can be an expression that resolves to a parameter deeply nested within a structure.
- It's easy enough to see why this macro also works with unions.
- The macro won't work with bitfields. You simply can't take the address of a bitfield member of a structure or union.
The designs above are quite tricky, but note that they are highly platform dependent: they make assumptions about the addressing scheme, about the conversion to size_t, and about bit-field operations. This is why Keith Thompson commented:

That's why the offsetof() macro is defined in the standard library; there's no portable way to implement it, but there's always a non-portable way that works for a given implementation. The implementer is allowed to do things that you aren't.

As a highly portable C compiler implementation, GCC cannot rely on the "magic" above. Taking GCC 4.0.2 as an example, the relevant definition lives in /usr/lib/gcc/i486-linux-gnu/4.0.2/include/stddef.h, where the offsetof macro is defined as:

/* Offset of member MEMBER in a struct of type TYPE. */
#define offsetof(TYPE, MEMBER) __builtin_offsetof (TYPE, MEMBER)
A symbol that begins with __builtin_ tells us that this is implemented inside GCC itself. Section [5.43 Offsetof] of the GCC 4.0.0 online documentation gives the details:

GCC implements for both C and C++ a syntactic extension to implement the offsetof macro.
primary:
        "__builtin_offsetof" "(" typename "," offsetof_member_designator ")"

offsetof_member_designator:
          identifier
        | offsetof_member_designator "." identifier
        | offsetof_member_designator "[" expr "]"

This extension is sufficient such that #define offsetof(type, member) __builtin_offsetof (type, member) is a suitable definition of the offsetof macro. In C++, type may be dependent. In either case, member may consist of a single identifier, or a sequence of member accesses and array references.

Besides portability, dependent expressions are another reason GCC implements __builtin_offsetof inside its parse tree. To dig further into how __builtin_offsetof is implemented, see gcc/cp/parser.c. There, primary-expression parsing is handled by the cp_parser_primary_expression function, and a primary expression takes one of these forms:

- literal
- this
- ( expression )
- id-expression

Once the token is recognized as a CPP_KEYWORD and confirmed to be RID_OFFSETOF (a reserved identifier), cp_parser_builtin_offsetof() is called to return its representation.
Back to Nigel Jones's [Learn a new trick with the offsetof() macro]. The article uses EEPROM as its application example, since EEPROM is the most common nonvolatile memory in embedded systems. He describes how it is accessed:

Normally, such memories are byte addressable. The result is often a serial EEPROM driver that provides an API that includes a read function that looks like this:

ee_rd(uint16_t offset, uint16_t nBytes, uint8_t * dest)

In other words, read nBytes from offset offset in the EEPROM and store them at dest. The problem is knowing what offset in EEPROM to read from and how many bytes to read (in other words, the underlying size of the variable being read).

APIs like ee_rd() are found in many systems. The question is how to determine the offset within the EEPROM and the number of bytes to read. A common solution is:

typedef struct {
    int i;
    float f;
    char c;
} EEPROM;
EEPROM * const pEE = 0x0000000;
ee_rd(&(pEE->f), sizeof(pEE->f), dest);

This resembles the trick described earlier, but it hurts readability. Because it needs the pEE pointer to compute the offset, careless use can also create new problems. As Nigel Jones notes: "You can write perfectly legal code (for example, pEE->f = 3.2) and get no compiler warnings that what you're doing is disastrous." With the offsetof macro, the last two lines can be rewritten as:

ee_rd(offsetof(EEPROM,f), sizeof(float) /* f in struct EEPROM */, dest);

It is a pity that the size of float f inside the EEPROM structure cannot be obtained just as directly, so a similar trick is applied once more:

#define SIZEOF(s,m) ((size_t) sizeof(((s *)0)->m))
#define EE_RD(M,D) ee_rd(offsetof(EEPROM,M), SIZEOF(EEPROM,M), D)
Reading member f from the EEPROM into dest can now be written as:

EE_RD(f, &dest)

This overcomes the problems mentioned above in one stroke and makes the API cleaner. The later section [Use 3: protecting nonvolatile memory] is even more of a classic, though somewhat longer. The Linux kernel specifies a similar definition in linux/nvram.h, but uses GCC extensions for padding, which is neater and more explicit than the approach in Nigel Jones's example. arch/arm/kernel/asm-offsets.c likewise uses the offsetof macro to make filling in its table of offsets more readable:

#define DEFINE(sym, val) asm volatile("\n->" #sym " %0 " #val : : "i" (val))
#define BLANK() asm volatile("\n->" : : )
int main(void)
{
    DEFINE(TSK_ACTIVE_MM, offsetof(struct task_struct, active_mm));
    BLANK();
    DEFINE(TI_FLAGS, offsetof(struct thread_info, flags));
    DEFINE(TI_PREEMPT, offsetof(struct thread_info, preempt_count));
    ...

The reader comments on the article are also worth a look. Louis Huemiller's remark about "providing an abstract interface, without having to use an object orientated high-level language (e.g. C++)" is particularly interesting.
Posted by jserv on December 20, 2005 11:51 AM