Climate-Gate: Leaked
Some time starting in mid-November 2009, ten million teletypes all started their deet-ditta-dot chatter, reeling off the following headline: "Hackers broke into the University of East Anglia's Climate Research Unit...."
I hate that. It annoys me because, just like everything else about climate-gate, it's been 'value-added'; simplified and distilled. The contents of FOIA2009.zip demand more attention to this detail and, as someone once heard Professor Jones mutter darkly, "The devil is in the details...so average it out monthly using TMax!"
The details of the files tell a story that FOIA2009.zip was compiled internally and most likely released by an internal source.
The contents of the zip file hold one top-level directory, ./FOIA. Inside that it is broken into two main directories, ./mail and ./documents. Inside ./mail are 1073 text files ordered by date. The files are named in order with increasing but not sequential numbers. Each file holds the body and only the body of an email.
In comparison, ./documents is highly disorganized. MS Word documents, FORTRAN, IDL and other computer code, Adobe Acrobat PDFs and data are sprinkled in the top directory and through several sub-directories. It's the kind of thing that makes a co-worker's disorganized desk look like the spit and polish of a boot camp floor.
What people are missing entirely is that these emails and files tell a story themselves.
The Emails
Proponents of the hacker meme are saying that s/he broke into East Anglia's network and took emails. Let's entertain that idea and see where it goes.
There is no such thing as a private email. Collecting all of the incoming and outgoing email is simple in a mail server. In Postfix the configuration parameter is always_bcc=, and Sendmail and Exim can be configured to do the same. Those are the three main mail servers in use in the Unix environment. Two of them, Sendmail and Exim, are or were in use as the external mail gateways and internal mail servers at the University of East Anglia (UEA).
When a mail server receives an email for someone@domain.net, it checks that it is authoritative for that domain. This means that a server for domain.net will not accept email for domain.ca. The mail server will usually then check the email for spam and viruses and run other filters. It will then decide whether to route the email to another server or to drop the email in a user's mailbox on that server. In all examples examined in the released emails, the mail gateway forwarded the emails to another server.
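The accept, forward or deliver decision described above can be sketched in a few lines of Python. This is only an illustration of the logic, not how Sendmail or Exim implement it: the function name route, the relay_map table and the host mailstore.domain.net are all made up for the example, and the spam and virus filtering step is left out.

```python
def route(recipient, local_domains, relay_map):
    """Decide what a gateway does with one recipient address.

    local_domains: domains this server is authoritative for.
    relay_map: domain -> internal host holding the mailboxes (hypothetical).
    """
    domain = recipient.rsplit("@", 1)[-1].lower()
    if domain not in local_domains:
        return ("reject", None)                 # not authoritative: refuse it
    if domain in relay_map:
        return ("forward", relay_map[domain])   # hand off to an internal server
    return ("deliver", None)                    # drop into a local mailbox


if __name__ == "__main__":
    local = {"domain.net"}
    relays = {"domain.net": "mailstore.domain.net"}  # made-up internal host
    print(route("someone@domain.net", local, relays))
    print(route("someone@domain.ca", local, relays))
```

In the released emails the gateway always took the "forward" branch.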
The user then has a mail client that s/he uses to read email. Outlook Express, Eudora, Apple Mail, Outlook, Thunderbird, mutt, pine and many more are all mail clients.
Mail clients use one of two methods of reading email. The first is called POP, which stands for Post Office Protocol. A mail client reading email with POP logs into the mail server, downloads the email to the machine running the mail client and will then delete the original email from the user's spool file on the mail server.
The second protocol is called IMAP, Internet Message Access Protocol. IMAP works by accessing the mailboxes on the mail server and doing most of the actions there. The messages stay on the server; the client fetches copies only to display them. Only email that is deleted and purged by the mail client is gone. Either protocol allows the user the opportunity to delete the email completely.
Most email clients are set up for reading emails with POP by default, and POP is more popular than IMAP for reading email.
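The difference matters for the argument that follows, so here is a toy in-memory model of it in Python. It is not a real POP or IMAP client (no network, no protocol commands); the class and method names are invented purely to show what is left on the server after each kind of session.

```python
class ServerMailbox:
    """Toy stand-in for one user's mailbox on a mail server."""

    def __init__(self, messages):
        self.messages = list(messages)

    def pop_download(self, leave_on_server=False):
        """POP-style session: copy everything to the client, then delete."""
        downloaded = list(self.messages)
        if not leave_on_server:
            self.messages.clear()   # the default behaviour described above
        return downloaded

    def imap_read(self):
        """IMAP-style session: read in place; the server copy stays."""
        return list(self.messages)

    def imap_delete_and_purge(self, index):
        """Under IMAP only an explicit delete plus purge removes a message."""
        del self.messages[index]
```

After pop_download() with the default setting, the server-side list is empty; after imap_read() it is unchanged.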
The released emails are a gold mine for a system administrator or network administrator to map. While none of the emails released contained headers, several included replies that contained the headers of the original emails. An experienced administrator can create an accurate map of the email topography to and from the CRU over the time period in question, 1998 through 2009.
Over the course of time, UEA's systems administrators made several changes to the way email flows through their systems. The users also made changes to the way they accessed and sent email.
The Users
Using a fairly simple grep (see footnote 1) we can see that from the start of the time-frame, 1999, until at least 2005 the CRU accessed their email on a server called pop.uea.ac.uk. Each user was assigned a username on that server. From the released emails, we can link usernames to people as follows:
Prof. Trevor Davies was user e022
Dr. Timothy Osborn was user f055
Prof. Phil Jones was user f028
Prof. Mike Hulme was user f037
Prof. Keith Briffa was user f023
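The grep itself is in the footnoted file and is not reproduced here. As a rough Python stand-in, the sketch below pulls every username that appears in front of @pop.uea.ac.uk out of a block of text. That the usernames show up in the quoted headers in that exact user@host form is an assumption, and the sample header lines in the test are invented.

```python
import re

# Assumed shape: <username>@pop.uea.ac.uk somewhere in a quoted header line.
POP_USER = re.compile(r"\b(\w+)@pop\.uea\.ac\.uk\b")

def pop_usernames(text):
    """Return the sorted, de-duplicated POP usernames found in text."""
    return sorted(set(POP_USER.findall(text)))
```

Run over the concatenated ./mail files, a pattern like this would produce the list of usernames; matching them to people still means reading the surrounding From lines by hand.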
In the previously referenced grep comes some more useful information. For instance, we know that Professor Davies was using QUALCOMM Windows Eudora Light Version 3.0.3 (32) in September of 1999 (ref. email 0937153268.txt). If you look at the README.txt for that version you can see that it requires a POP account and doesn't support IMAP.
As mentioned previously, POP deletes email on the server usually after it is downloaded. Modern POP clients do have an option to save the email on the server for some number of days, but Eudora Light 3.0.3 did not. We can say that Professor Davies' emails were definitely removed from the server as soon as "Send/Recv" was finished.
This revelation leaves only two scenarios for the hacker:
Professor Davies' email was archived on a server and the hacker was able to crack into it, or
Professor Davies kept all of his email from 1999 and he kept his computer when he was promoted to Pro-Vice Chancellor for Research and Knowledge Transfer in 2004 from his position as Dean of the School of Environmental Sciences.
The latter scenario requires that the hacker would have had to know how to break into Prof. Davies' computer and would have had to get into that computer to retrieve those early emails. If that were true, then the hacker would have had to get into every other uea.ac.uk computer involved to retrieve the emails on those systems. Given that many mail clients use a binary format for email storage and given the number of machines the hacker would have to break into to collect all of the emails, I find this scenario very improbable.
Which means that the mail servers at uea.ac.uk were configured to collect all incoming and outgoing email into a single account. As that account built up, the administrator would naturally want to archive it off to a file server where it could be saved.
This is a simple evolution. You just schedule a cron job that starts a shell script to stop the mail server, move the mail spool file into a file somewhere else, null the live spool and restart the mail server. The account would reside on the mail server; the file could be on any server.
Alternatively you could use a procmail recipe to process the email as it comes in, but that may take a bit too much processing power for a very busy account.
This also helps to explain the general order of the ./mail directory. Only a computer would be able to reliably export bodies of email into numbered files in the FOIA archive. As the numbers are in order not just numerically but also by date, the logical reasoning is that a computer program is numbering emails as they are processed for storage. This is extremely easy to do with Perl and the Mail::Box modules.
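A minimal sketch of that rotation, in Python rather than shell, might look like this. The stop_server and start_server hooks are hypothetical callables supplied by the caller (a real script would invoke the init system there), and the file names in demo() are invented. It also ignores ownership and permissions on the recreated spool, which a real script would have to restore.

```python
import shutil
import tempfile
from pathlib import Path

def rotate_spool(spool, archive_dir, label, stop_server, start_server):
    """Move the live spool aside and leave an empty spool in its place."""
    spool, archive_dir = Path(spool), Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / f"{spool.name}.{label}"
    stop_server()                       # no deliveries while the file moves
    try:
        shutil.move(str(spool), str(target))
        spool.touch()                   # recreate an empty ("nulled") spool
    finally:
        start_server()                  # always bring the server back up
    return target

def demo():
    """Run one rotation in a scratch directory and report what happened."""
    events = []
    with tempfile.TemporaryDirectory() as tmp:
        spool = Path(tmp) / "allmail"
        spool.write_text("From a@example.org\n\nhello\n")
        archived = rotate_spool(spool, Path(tmp) / "archive", "2009-11",
                                lambda: events.append("stop"),
                                lambda: events.append("start"))
        return events, archived.name, archived.read_text(), spool.read_text()
```

The archive directory could just as well be a mount from a separate file server, which is the arrangement argued for here.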
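To give a feel for how little code such an export needs, here is a sketch using Python's standard email package instead of Perl and Mail::Box. It takes raw messages, drops the headers and pairs each body with a file name. Naming the file after the Date header as a zero-padded Unix timestamp is an assumption made for the example, as is the function name; multipart messages are not handled.

```python
import email
from email.utils import parsedate_to_datetime

def body_files(raw_messages):
    """Map each raw message to (filename, body), sorted by filename."""
    out = []
    for raw in raw_messages:
        msg = email.message_from_string(raw)
        # Assumed naming rule: Date header converted to epoch seconds.
        stamp = int(parsedate_to_datetime(msg["Date"]).timestamp())
        out.append((f"{stamp:010d}.txt", msg.get_payload()))
    return sorted(out)
```

Writing each pair to disk is then a single loop over the result, and the sorted order falls out of the file names.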
The Email Servers
I've created a Dia diagram (see footnote 2) of the network topography, for email only, as demonstrated in the released emails. Here's a JPEG of it:
[Figure: email topography diagram]
The first thing that springs to mind is that the admins did a lot of fiddling with their email servers over the course of ten years. :) The second thing is the anomaly. Right in the middle of 2006-2009 there is a Microsoft Exchange Server. Normally, this wouldn't be that big of a blip, except we've already demonstrated that the servers at UEA were keeping a copy of all email in and out of the network. Admins familiar with MS Exchange know that it too is a mail server of sorts.
It is my opinion that the MS Exchange server was working in conjunction with ueams2.uea.ac.uk, and I base this opinion on the fact that ueams2.uea.ac.uk appears both before and after the MS Exchange Server. It doesn't change its IP address, nor does it change the type of mail server that is installed on it. There is a minor version update from 4.51 to 4.69. Debian's changelog covers the changes between those Exim versions.
I've shown that the emails were collected from the servers rather than from the users' accounts and workstations, but I haven't shown which servers were doing the collection. There are two options: the mail gateway or the departmental mail servers.
As has been pointed out to me, the filenames are Unix epoch timestamps. (Like, duh, Lance.) This invalidates certain parts of my analysis, but doesn't in any way invalidate my conclusions.
The point of the original information was to provide more circumstantial evidence pointing to the location of the email archives. The fact that the emails are named with epoch timestamps that relate to the creation date of the emails actually enhances this point.
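The timestamp reading is easy to check against file names cited in this article. The short Python sketch below interprets a file name as seconds since the Unix epoch; the function name is my own.

```python
from datetime import datetime, timezone

def filename_to_utc(name):
    """Interpret an archive filename such as '0937153268.txt' as epoch seconds."""
    seconds = int(name.rsplit(".", 1)[0])
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


if __name__ == "__main__":
    for name in ("0937153268.txt", "1106338806.txt", "1219239172.txt"):
        print(name, filename_to_utc(name).date().isoformat())
```

The three names decode to 1999-09-12, 2005-01-21 and 2008-08-20, which agree with the dates this article gives for those emails.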
You definitely do not want multiple machines naming files based on a Unix timestamp. It has to be a single machine because the opportunity for overwriting a file is simply too great.
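The overwrite risk is simple to demonstrate. In the Python sketch below each list stands for the file names one server would write in a day; the function reports names that more than one server would produce. The listings in the usage example are invented.

```python
from collections import Counter

def clashing_names(*server_listings):
    """Return filenames that more than one server would have written."""
    counts = Counter(name
                     for listing in server_listings
                     for name in set(listing))
    return sorted(name for name, n in counts.items() if n > 1)


if __name__ == "__main__":
    server_a = ["0937153268.txt", "0937153300.txt"]   # hypothetical listing
    server_b = ["0937153268.txt", "0937153999.txt"]   # hypothetical listing
    print(clashing_names(server_a, server_b))
```

Any name in the result is a file that one server would silently overwrite if both wrote into the same archive directory.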
As demonstrated above, I believe that the numbers of the filenames correspond to the order that the emails were archived. If so, the numbers that are missing represent other emails not captured in FOIA2009.zip.
I wrote a short Bash program (see footnote 3) to calculate the variances between the numbers in the email filenames. The result is staggering: that's a lot of email outside of what was released. Here's a graph of the variances in order as well as a graph with the variances numerically sorted. Graph info is down below.
The first graph is a little hard to read, but that's mostly because the first variance is 8,805,971. To see a little better, just lop off the first variance and rerun gnuplot. For simplicity, that graph is here. The mean of the variances is 402839.36, so the average number of emails between each released email is 402,839. (If the filenames are read as epoch timestamps, as noted above, that figure is a gap in seconds, roughly 4.7 days, rather than a count of emails.) While not really applicable, but useful, the standard deviation is 736228.56 and you can visualize that from the second graph.
I realize that a variance without a reference is useless; the reference in this instance is the number of days between emails. Here is a grep of the emails with their dates of origin.
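The Bash program is in the footnoted file and is not shown here. A Python stand-in for the same calculation could look like the sketch below. The function names are mine, and the population standard deviation is an arbitrary choice because the article does not say which form was used.

```python
from statistics import mean, pstdev

def gaps(filenames):
    """Differences between consecutive numeric filenames, in sorted order."""
    numbers = sorted(int(name.rsplit(".", 1)[0]) for name in filenames)
    return [b - a for a, b in zip(numbers, numbers[1:])]

def summarize(filenames):
    """Mean and population standard deviation of the gaps."""
    g = gaps(filenames)
    return mean(g), pstdev(g)
```

If the numbers are treated as epoch seconds, dividing a gap by 86400 expresses it in days.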
I do not see the administrators copying the email at the departmental level, but rather at the mail gateway level. This is logical for a few reasons:
The machine name ueams2.uea.ac.uk implies that there are other departmental mail servers with names like ueams1.uea.ac.uk (or even ueams.uea.ac.uk), and maybe a ueams3.uea.ac.uk. If true, then you would need to copy email from at least one other server with the same scripts. This duplication of effort is inelegant.
There is a second machine that you have to copy emails from, and that is the MS Exchange server, so you would need a third set of scripts to create a copy of email. Again, this would be unlike an Administrator.
Departmental machines can be outside the purview of Administration staff or allow non-Administrative staff access. This is not where you want to be placing copies of emails for the purposes of Institutional protection.
As shown with the email number variances, and if they are representative of the email number as it passed through UEA's email systems, that's a lot of email for a departmental mail server; it looks more like an institutional mail gateway.
As the emails have been shown to be directly related to the Unix epoch, it seems certain that a single machine was responsible for naming the files. Having multiple servers writing files out with a filename based on a timestamp will certainly overwrite some files at some point.
So given the assumptions listed above, the hacker would have to have access to the gateway mail server and/or the Administration file server where the emails were archived. This machine would most likely be an Administrative file server. It would not be optimal for an Administrator to clutter up a production server open to the Internet with sensitive archives.
The Documents
The ./FOIA/documents directory is a complete mess. There are documents from Professor Hulme, Professor Briffa, the now famous HARRY_READ_ME.txt, and many others. There seems to be no order at all.
One file in particular, ./FOIA/documents/mkhadcrut is only three lines long and contains:
tail +13021 hadcrut-1851-1996.dat | head -n 359352 | ./twistglob > hadcrut.dat
# nb. 1994- data is already dateline-aligned
cat hadcrut-1994-2001.dat >> hadcrut.dat
Pretty simple stuff: get everything in hadcrut-1851-1996.dat starting at the 13021st line. From that, get only the first 359352 lines, run them through a program called twistglob in this directory and dump the results into hadcrut.dat. Then dump all of the information in hadcrut-1994-2001.dat into the bottom of hadcrut.dat.
....Except there isn't a program called twistglob in the ./FOIA/documents/ directory. Nor is there the resultant hadcrut.dat or the source files hadcrut-1851-1996.dat and hadcrut-1994-2001.dat.
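For readers who don't live in a shell, the line selection done by the tail and head pair in mkhadcrut can be written in Python as below. This only mirrors the slicing; twistglob is missing from the archive, so nothing here says what it did. The function name is my own.

```python
from itertools import islice

def tail_head(lines, start_line, count):
    """Equivalent of `tail +START | head -n COUNT` on an iterable of lines."""
    # tail +N starts at the Nth line (1-based); head -n COUNT keeps COUNT lines.
    return islice(lines, start_line - 1, start_line - 1 + count)
```

With start_line=13021 and count=359352 this selects lines 13021 through 372372 of the source file. Note that `tail +13021` is the older spelling; current versions of tail expect `tail -n +13021`.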
This tells me that the collection of files and directories in ./documents isn't so much a shared directory on a server, but a dump directory for someone who collected all of these files. The originals would be from shared folders, home directories, desktop machines, workstations, profiles and the like.
Remember the reason that the Freedom of Information requests were denied? In email 1106338806.txt, Jan 21, 2005 Professor Phil Jones states that he will be using IPR (Intellectual Property Rights) to shelter the data from Freedom of Information requests. In email 1219239172.txt, on August 20th 2008, Prof. Jones says "The FOI line we're all using is this. IPCC is exempt from any countries FOI - the skeptics have been told this. Even though we (MOHC, CRU/UEA) possibly hold relevant info the IPCC is not part our remit (mission statement, aims etc) therefore we don't have an obligation to pass it on."
Is that why the data files, the result files and the 'twistglob' program aren't in the ./documents directory? I think this is a likely possibility.
If Prof. Jones and the UEA FOI Officer used IPR and the IPCC to shelter certain things from the FOIA, then it makes sense that things are missing from the ./documents directory. Secondly, it supports the idea that ./documents is in such disarray because it was a dump folder: one used explicitly to collect information for release pursuant to an FOI request.
Conclusion
I suggest that it isn't feasible for the emails in their tightly ordered format to have been kept at the departmental level or on the workstations of the parties. I suggest that the contents of ./documents didn't originate from a single monolithic share, but from a compendium of various sources.
For the hacker to have collected all of this information s/he would have required extraordinary capabilities. The hacker would have to crack an Administrative file server to get to the emails and crack numerous workstations, desktops, and servers to get the documents. The hacker would have to map the complete UEA network to find out who was at what station and what services that station offered. S/he would have had to develop or implement exploits for each machine and operating system without knowing beforehand whether there was anything good on the machine worth collecting.
The only reasonable explanation for the archive being in this state is that the FOI Officer at the University was practising due diligence. The UEA was collecting data that couldn't be sheltered and they created FOIA2009.zip.
It is most likely that the FOI Officer at the University put it on an anonymous ftp server or that it resided on a shared folder that many people had access to and some curious individual looked at it.
If, as some say, this was a targeted crack, then the cracker would have had to have back-doors and access to every machine at UEA and not just the CRU. It simply isn't reasonable for the FOI Officer to have kept the collection on a CRU system where CRU people had access; a UEA system would have been used instead.
Occam's razor holds that "the simplest explanation or strategy tends to be the best one". The simplest explanation in this case is that someone at UEA found it and released it to the wild, and that the release of FOIA2009.zip wasn't because of some hacker, but because of a leak from UEA by a person with scruples.
Footnotes
1 See file ./popaccounts.txt
2 See file ./email_topography.dia
3 See file ./email_variance.sh
4 See file ./gnuplotcmds
Notes
Graphs created with gnuplot using a simple command file (see footnote 4) for input. I use a stripped-down version of the file; it's the same, just stripped of comments and the filenames. The second graph is a numerically sorted version: $> sort -n ./variance_results.txt > variance_sorted_numerically.txt.
Assigned Network Numbers for UEA from RIPE.NET
RIPE.NET has assigned 139.222.0.0 - 139.222.255.255, 193.62.92.0 - 193.62.92.255, and 193.63.195.0 - 193.63.195.255 to the University of East Anglia for Internet IP addresses.
RIPE.NET Admin contact for the University of East Anglia: Peter Andrews, MSc, BSc (Hons), Head of Networking at University of East Anglia. (Per LinkedIn; Peter isn't in the UEA directory anymore, so I assume he is no longer at UEA.)
RIPE.NET Tech contact for the University of East Anglia: Andrew Paxton
Current Mail Servers at UEA
A dig for the MX record of uea.ac.uk (email servers responsible for the domain uea.ac.uk) results in the following:
$> dig mx uea.ac.uk
; <<>> DiG 9.6.1-P2 <<>> mx uea.ac.uk
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 737
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 13, ADDITIONAL: 13
;; QUESTION SECTION:
;uea.ac.uk. IN MX
;; ANSWER SECTION:
uea.ac.uk. 50935 IN MX 2 ueamailgate01.uea.ac.uk.
uea.ac.uk. 50935 IN MX 2 ueamailgate02.uea.ac.uk.
The IP addresses for the two UEA email servers are:
ueamailgate01.uea.ac.uk. 28000 IN A 139.222.131.184
ueamailgate02.uea.ac.uk. 28000 IN A 139.222.131.185
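As a quick sanity check, the two gateway addresses can be tested against the RIPE.NET assignments listed earlier, using Python's standard ipaddress module. The ranges are written here as CIDR blocks; the function name is my own.

```python
import ipaddress

# The three assigned ranges quoted above, expressed as CIDR blocks.
UEA_NETWORKS = [
    ipaddress.ip_network("139.222.0.0/16"),
    ipaddress.ip_network("193.62.92.0/24"),
    ipaddress.ip_network("193.63.195.0/24"),
]

def in_uea_range(address):
    """True if the address falls inside one of the assigned UEA ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in UEA_NETWORKS)


if __name__ == "__main__":
    for addr in ("139.222.131.184", "139.222.131.185"):
        print(addr, in_uea_range(addr))
```

Both mail gateways sit inside the 139.222.0.0 block.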
Test connections to UEA's current mailservers:
$> telnet ueamailgate01.uea.ac.uk 25
Trying 139.222.131.184...
Connected to ueamailgate01.uea.ac.uk.
Escape character is '^]'.
220 ueamailgate01.uea.ac.uk ESMTP Sendmail 8.13.1/8.13.1; Mon, 7 Dec 2009 01:45:42 GMT
quit
221 2.0.0 ueamailgate01.uea.ac.uk closing connection
Connection closed by foreign host.
$> telnet ueamailgate02.uea.ac.uk 25
Trying 139.222.131.185...
Connected to ueamailgate02.uea.ac.uk.
Escape character is '^]'.
220 ueamailgate02.uea.ac.uk ESMTP Sendmail 8.13.1/8.13.1; Mon, 7 Dec 2009 01:45:49 GMT
quit
221 2.0.0 ueamailgate02.uea.ac.uk closing connection
About Me
I've been a Unix, Windows, OS X and Linux systems and network administrator for 15 years. I've compiled, configured, and maintained everything from mail servers to single sign-on encrypted authentication systems. I run lines, build machines and tinker with code for fun. You can contact me via: lance@catprint.ca.
Lance Levsen,
December, 2009