"Made in China" Electronics

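China is the world's factory and produces most of the world's electronic devices, but speculation that electronics exported from China contain backdoors has circulated for years, and the recent Google incident has renewed attention to hardware backdoors. Slashdot asks: "Can you trust electronics made in China?" Some say no; others go further and argue that you cannot trust any computer product you buy. Whether to trust a given country's products is at bottom a political question. For an ordinary consumer, malware and botnets on your own system are a more realistic worry than hardware backdoors, because to an intelligence agency your data is just one more piece of endless noise. The article specifically mentions that Turing Award winner Ken Thompson publicly admitted in 1983 that he had planted a backdoor in the Unix C compiler.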
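    UNIX is written in C. One advantage of using C is that it made UNIX portable. Workstations, moreover, are sold to engineers, scientists, and others who need heavy computation; so, unlike on a PC, on machines of workstation class and above a compiler is standard equipment bundled with the operating system. A UNIX system always ships with a C compiler. And to stay portable, the C compiler bundled with UNIX must, like UNIX itself, be written in C.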
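    The C compiler itself is also written in C. When a language's compiler is written in that same language, interesting things happen. You may ask: if the machine already has a C compiler, why compile the source code of another compiler? One answer is that the built-in C compiler may be crude or outdated, so we compile a newer compiler with the old one and then install the result as the system compiler. In other words, this is how we extend the system's built-in compiler.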
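    The GRASP project at the University of Glasgow developed its Haskell compiler through the same process. Haskell is a functional language (remember functional programming from your Programming Languages course?). Functional programming has always had a more academic flavor and less practical experience behind it, and the Haskell language itself still has plenty of room to grow. GRASP writes its Haskell compiler in Haskell: start simple and produce the most primitive Haskell compiler, then use that primitive Haskell to write a more capable compiler (extending the original Haskell), then use the second version of the language to write a third version of the compiler, and so on. Because every stage is a compiler, efficiency does not suffer. One benefit is that each time the language is extended, the next compiler is immediately written in the new language, so we can see right away whether the new features are useful and how they should be used. The experience accumulated this way can guide the future design of Haskell. The goal of GRASP is to "take functional programming out of the lab."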
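    Ken Thompson, one of the creators of UNIX, took up this theme in his Turing Award Lecture and told some interesting stories. C is a language used to write operating systems, and people who write operating systems find it hard to resist the temptation to plant a backdoor. Imagine: while writing the operating system, I quietly add some code to login so that every copy of the system in the world lets me in with root privileges whenever it sees my account and password. How satisfying that would be. But I cannot write this directly in the login source code, or I would be caught at once (the source code circulates precisely so that people can read it). What to do? Tamper with the compiler instead. Call this patch1: add a step to the compiler so that if the source being compiled "looks like" it performs login, the compiler opens a hole that lets me in.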
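    But even this may not work. The compiler will be revised, and I may not be the one writing the new version. The people installing the system may not use my compiler either. So I tamper with the compiler's source code a second time. Call this patch2: if the compiler thinks the program it is compiling "looks like" the source of another compiler, it inserts patch1 above and patch2 itself.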
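    Now the operating system ships. CC1 is the built-in compiler I wrote, containing both of my modifications. Someone compiling UNIX has no choice but to use this compiler. Because CC1 contains patch1, as soon as it compiles login, the resulting login program is tampered with: whenever it sees my name, it lets me into the system with root privileges.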
    ,--------.      +-------------+      ,-----------.
    | login  |      |  Compiled   |      |   login   |
    | source |=====>|   by CC1    |=====>|  Program  |
    | (clean)|      |patch1 active|      |(infected!)|
    `--------'      +-------------+      `-----------'
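    Since compiler CC1 misbehaves, surely writing your own compiler solves the problem? But a C compiler still has to be written in C, and once it is written, what compiles it? Only CC1 can. CC1 sees that the newly written CC2 is compiler source code, so patch2 kicks in: CC1 inserts patch1 and patch2 into CC2 as well, and CC2 is now "contaminated" too.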
    ,--------.      +-------------+      ,-----------.
    |  CC2   |      |  Compiled   |      |    CC2    |
    | source |=====>|   by CC1    |=====>|  Program  |
    | (clean)|      |patch2 active|      |patches 1,2|
    `--------'      +-------------+      `-----------'
    If CC2 is then used to compile a normal login program, the compiled login program also has the backdoor, because CC2 contains patch1, and I can log in at will:
    ,--------.      +-----------+      ,----------.
    | login  |      | Compiled  |      |  login   |
    | source |=====>|  by CC2   |=====>| Program  |
    | (clean)|      |(patch 1,2)|      |(patched!)|
    `--------'      +-----------+      `----------'
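    If CC2 is used to compile yet another compiler, CC3, then because CC2 already carries patch2, CC3 is contaminated in turn; that is, CC3 also contains patch1 and patch2, and so on. In this way the backdoor is planted in every UNIX system in the world, and I can log in to any of them.
    Yet these patches appear only in the binaries. The source code of CC2 is perfectly normal, so nothing looks wrong in the source at all. We can go further and destroy the evidence: once a system is installed, the published CC1 source code need not contain the tampered code, as long as it is compiled by a compiler that has already been tampered with.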
I had assumed Ken Thompson was only writing hypothetically. Later, according to some people, this was something Ken Thompson had actually done himself.
    
Reference: ACM Turing Award Lectures:
             The First Twenty Years, 1966 to 1985
           QA76.24

Reflections on Trusting Trust
Ken Thompson


Reprinted from Communications of the ACM, Vol. 27, No. 8, August 1984, pp. 761-763. Copyright © 1984, Association for Computing Machinery, Inc. Also appears in ACM Turing Award Lectures: The First Twenty Years 1966-1985 Copyright © 1987 by the ACM press and Computers Under Attack: Intruders, Worms, and Viruses Copyright © 1990 by the ACM press.

I copied this page from the ACM, in fear that it would someday turn stale.


Introduction

I thank the ACM for this award. I can't help but feel that I am receiving this honor for timing and serendipity as much as technical merit. UNIX swept into popularity with an industry-wide change from central mainframes to autonomous minis. I suspect that Daniel Bobrow (1) would be here instead of me if he could not afford a PDP-10 and had to "settle" for a PDP-11. Moreover, the current state of UNIX is the result of the labors of a large number of people.

There is an old adage, "Dance with the one that brought you," which means that I should talk about UNIX. I have not worked on mainstream UNIX in many years, yet I continue to get undeserved credit for the work of others. Therefore, I am not going to talk about UNIX, but I want to thank everyone who has contributed.

That brings me to Dennis Ritchie. Our collaboration has been a thing of beauty. In the ten years that we have worked together, I can recall only one case of miscoordination of work. On that occasion, I discovered that we both had written the same 20-line assembly language program. I compared the sources and was astounded to find that they matched character-for-character. The result of our work together has been far greater than the work that we each contributed.

I am a programmer. On my 1040 form, that is what I put down as my occupation. As a programmer, I write programs. I would like to present to you the cutest program I ever wrote. I will do this in three stages and try to bring it together at the end.

Stage I

In college, before video games, we would amuse ourselves by posing programming exercises. One of the favorites was to write the shortest self-reproducing program. Since this is an exercise divorced from reality, the usual vehicle was FORTRAN. Actually, FORTRAN was the language of choice for the same reason that three-legged races are popular.

More precisely stated, the problem is to write a source program that, when compiled and executed, will produce as output an exact copy of its source. If you have never done this, I urge you to try it on your own. The discovery of how to do it is a revelation that far surpasses any benefit obtained by being told how to do it. The part about "shortest" was just an incentive to demonstrate skill and determine a winner.


FIGURE 1

Figure 1 shows a self-reproducing program in the C programming language. (The purist will note that the program is not precisely a self-reproducing program, but will produce a self-reproducing program.) This entry is much too large to win a prize, but it demonstrates the technique and has two important properties that I need to complete my story: (1) This program can be easily written by another program. (2) This program can contain an arbitrary amount of excess baggage that will be reproduced along with the main algorithm. In the example, even the comment is reproduced.
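
The figure itself is not reproduced in this copy. As a rough illustration of property (1), the sketch below is a small C generator that writes out a self-reproducing program: a table of character codes followed by the text that the table encodes. The names `TAIL` and `write_self_reproducer`, and the exact layout of the generated program, are assumptions made for this example and are not Thompson's original figure.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Everything that follows the numeric table in the generated program,
 * starting with the table's own terminating 0. (Hypothetical layout.) */
static const char TAIL[] =
    "\t0\n};\n"
    "#include <stdio.h>\n"
    "int main(void)\n"
    "{\n"
    "\tint i;\n"
    "\tprintf(\"char s[] = {\\n\");\n"
    "\tfor (i = 0; s[i]; i++)\n"
    "\t\tprintf(\"\\t%d,\\n\", s[i]);\n"
    "\tprintf(\"%s\", s);\n"
    "\treturn 0;\n"
    "}\n";

/* Write the source of a self-reproducing program into buf.
 * Returns the number of characters written, or 0 if buf is too small. */
size_t write_self_reproducer(char *buf, size_t cap)
{
    size_t used, i;
    int n;

    assert(buf != NULL);

    n = snprintf(buf, cap, "char s[] = {\n");
    if (n < 0 || (size_t)n >= cap)
        return 0;
    used = (size_t)n;

    /* The table: one decimal character code of TAIL per line. */
    for (i = 0; TAIL[i] != '\0'; i++) {
        n = snprintf(buf + used, cap - used, "\t%d,\n", TAIL[i]);
        if (n < 0 || (size_t)n >= cap - used)
            return 0;
        used += (size_t)n;
    }

    /* The tail itself, verbatim; it closes the table and prints it back. */
    n = snprintf(buf + used, cap - used, "%s", TAIL);
    if (n < 0 || (size_t)n >= cap - used)
        return 0;
    return used + (size_t)n;
}
```

Saving the generated text to a file, compiling it, and running it should print that same text back. Extra text appended to `TAIL` (a comment after `main`, say) is encoded in the table and reproduced along with everything else, which is property (2).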

Stage II

The C compiler is written in C. What I am about to describe is one of many "chicken and egg" problems that arise when compilers are written in their own language. In this case, I will use a specific example from the C compiler.

C allows a string construct to specify an initialized character array. The individual characters in the string can be escaped to represent unprintable characters. For example,

"Hello world\n"

represents a string with the character "\n," representing the new line character.


FIGURE 2

Figure 2 is an idealization of the code in the C compiler that interprets the character escape sequence. This is an amazing piece of code. It "knows" in a completely portable way what character code is compiled for a new line in any character set. The act of knowing then allows it to recompile itself, thus perpetuating the knowledge.
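
The figure is missing from this copy. A minimal sketch of the idea, assuming the source text is read through a string cursor; the function name `read_char` and its handling of end of input are invented for this example:

```c
#include <assert.h>
#include <stddef.h>

/* Decode one (possibly escaped) character from *src and advance the cursor.
 * Returns 0 at end of input. */
int read_char(const char **src)
{
    int c;

    assert(src != NULL && *src != NULL);

    c = (unsigned char)**src;
    if (c == '\0')
        return 0;               /* end of input: do not advance */
    (*src)++;
    if (c != '\\')
        return c;               /* ordinary character */

    c = (unsigned char)**src;
    if (c == '\0')
        return '\\';            /* lone trailing backslash */
    (*src)++;
    if (c == '\\')
        return '\\';
    if (c == 'n')
        return '\n';            /* value supplied by the compiler that built us */
    return c;                   /* unknown escape: pass the character through */
}
```

The line that returns `'\n'` never states a numeric code. Whatever compiler builds this function fills in the value for its own character set, which is the sense in which the knowledge is perpetuated.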


FIGURE 3

Suppose we wish to alter the C compiler to include the sequence "\v" to represent the vertical tab character. The extension to Figure 2 is obvious and is presented in Figure 3. We then recompile the C compiler, but we get a diagnostic. Obviously, since the binary version of the compiler does not know about "\v," the source is not legal C. We must "train" the compiler. After it "knows" what "\v" means, then our new change will become legal C. We look up on an ASCII chart that a vertical tab is decimal 11. We alter our source to look like Figure 4. Now the old compiler accepts the new source. We install the resulting binary as the new official C compiler and now we can write the portable version the way we had it in Figure 3.
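
Figures 3 and 4 are also missing from this copy. The two steps might look roughly as follows, assuming an ASCII target. Both function names are hypothetical, and only the `'v'` case differs between them.

```c
#include <assert.h>

/* This sketch hard-codes the ASCII value, so check that assumption (C11). */
static_assert('\v' == 11, "sketch assumes an ASCII execution character set");

/* Training step: the installed compiler binary does not know "\v" yet,
 * so the source spells out the code taken from the ASCII chart. */
int decode_escape_bootstrap(int c)
{
    if (c == '\\') return '\\';
    if (c == 'n')  return '\n';
    if (c == 'v')  return 11;   /* vertical tab, decimal 11 in ASCII */
    return -1;                  /* not an escape this sketch knows */
}

/* After the trained binary is installed, the portable spelling is legal
 * and the magic number can disappear from the source. */
int decode_escape_portable(int c)
{
    if (c == '\\') return '\\';
    if (c == 'n')  return '\n';
    if (c == 'v')  return '\v';
    return -1;
}
```

Once the second version has been compiled by the first, the number 11 exists only inside the binary, while the source refers to `'\v'` in terms of itself.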


FIGURE 4

This is a deep concept. It is as close to a "learning" program as I have seen. You simply tell it once, then you can use this self-referencing definition.

Stage III


FIGURE 5

Again, in the C compiler, Figure 5 represents the high-level control of the C compiler where the routine "compile" is called to compile the next line of source. Figure 6 shows a simple modification to the compiler that will deliberately miscompile source whenever a particular pattern is matched. If this were not deliberate, it would be called a compiler "bug." Since it is deliberate, it should be called a "Trojan horse."


FIGURE 6

The actual bug I planted in the compiler would match code in the UNIX "login" command. The replacement code would miscompile the login command so that it would accept either the intended encrypted password or a particular known password. Thus if this code were installed in binary and the binary were used to compile the login command, I could log into that system as any user.

Such blatant code would not go undetected for long. Even the most casual perusal of the source of the C compiler would raise suspicions.


FIGURE 7

The final step is represented in Figure 7. This simply adds a second Trojan horse to the one that already exists. The second pattern is aimed at the C compiler. The replacement code is a Stage I self-reproducing program that inserts both Trojan horses into the compiler. This requires a learning phase as in the Stage II example. First we compile the modified source with the normal C compiler to produce a bugged binary. We install this binary as the official C. We can now remove the bugs from the source of the compiler and the new binary will reinsert the bugs whenever it is compiled. Of course, the login command will remain bugged with no trace in source anywhere.
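
The propagation can be sketched with a deliberately simplified model in which a binary is reduced to three flags and every source handed to the compiler is clean. The `Binary` type, the function `compile_with`, and the substrings used to recognize compiler and login sources are all assumptions of this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef struct {
    bool is_compiler;      /* the program is itself a compiler */
    bool carries_trojans;  /* compiler binary contains both Trojan horses */
    bool login_backdoor;   /* login binary accepts the known password */
} Binary;

/* Compile clean `source` with the compiler binary `cc`.
 * The result depends on the source AND on what is hidden in `cc`. */
Binary compile_with(const Binary *cc, const char *source)
{
    Binary out = { false, false, false };

    assert(cc != NULL && cc->is_compiler && source != NULL);

    if (strstr(source, "compiler") != NULL) {
        out.is_compiler = true;
        /* Second Trojan horse: reinsert both bugs into the new compiler. */
        out.carries_trojans = cc->carries_trojans;
    }
    if (strstr(source, "login") != NULL) {
        /* First Trojan horse: miscompile login. */
        out.login_backdoor = cc->carries_trojans;
    }
    return out;
}
```

Starting from one bugged binary, each later compiler built from clean source still carries both Trojan horses, and any login it compiles is bugged; an honest compiler binary given the same sources produces clean output.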

Moral

The moral is obvious. You can't trust code that you did not totally create yourself. (Especially code from companies that employ people like me.) No amount of source-level verification or scrutiny will protect you from using untrusted code. In demonstrating the possibility of this kind of attack, I picked on the C compiler. I could have picked on any program-handling program such as an assembler, a loader, or even hardware microcode. As the level of program gets lower, these bugs will be harder and harder to detect. A well installed microcode bug will be almost impossible to detect.

After trying to convince you that I cannot be trusted, I wish to moralize. I would like to criticize the press in its handling of the "hackers," the 414 gang, the Dalton gang, etc. The acts performed by these kids are vandalism at best and probably trespass and theft at worst. It is only the inadequacy of the criminal code that saves the hackers from very serious prosecution. The companies that are vulnerable to this activity (and most large companies are very vulnerable) are pressing hard to update the criminal code. Unauthorized access to computer systems is already a serious crime in a few states and is currently being addressed in many more state legislatures as well as Congress.

There is an explosive situation brewing. On the one hand, the press, television, and movies make heroes of vandals by calling them whiz kids. On the other hand, the acts performed by these kids will soon be punishable by years in prison.

I have watched kids testifying before Congress. It is clear that they are completely unaware of the seriousness of their acts. There is obviously a cultural gap. The act of breaking into a computer system has to have the same social stigma as breaking into a neighbor's house. It should not matter that the neighbor's door is unlocked. The press must learn that misguided use of a computer is no more amazing than drunk driving of an automobile.

Acknowledgment

I first read of the possibility of such a Trojan horse in an Air Force critique (4) of the security of an early implementation of Multics.

References

  1. Bobrow, D.G., Burchfiel, J.D., Murphy, D.L., and Tomlinson, R.S. TENEX, a paged time-sharing system for the PDP-10. Commun. ACM 15, 3 (Mar. 1972), 135-143.
  2. Kernighan, B.W., and Ritchie, D.M. The C Programming Language. Prentice-Hall, Englewood Cliffs, N.J., 1978.
  3. Ritchie, D.M., and Thompson, K. The UNIX time-sharing system. Commun. ACM 17, 7 (July 1974), 365-375.
  4. Karger, P.A., and Schell, R.R. Multics Security Evaluation: Vulnerability Analysis. ESD-TR-74-193, Vol II, June 1974, p 52.