FastCGI, SCGI, and Apache: Background and Future at VMUNIX Blues

Published by mark January 2nd, 2006 in general, coding
Over at the Ruby on Rails Weblog, David made a post titled Apache gets serious about FastCGI. I’ve tangled with FastCGI more than most people have, for a variety of reasons that I won’t get into except to say that I’ve spent time at places where Apache wasn’t the web server of choice and where PHP wasn’t the only language in use. Here are some of my thoughts on the whole FastCGI thing that may be useful to someone new to the game.
The first thing to realize is that for most experienced webmasters FastCGI has essentially been considered abandoned technology for a good many years. Like, half a decade, or in Internet time, a freaking eternity. It didn’t catch on for a bunch of reasons, one being that the FastCGI implementation in Apache hasn’t been one of Apache’s, shall we say, highlights. The other reason is that when quicker alternatives to traditional CGI were being dreamed up, Apache was The One True UNIX Web Server(tm) and a lot of people thought that The One True Way(tm) to go was with in-process modules. Modules could be very quick with no IPC overhead, and could get all sorts of useful info back from the web server that is easy to get when you’re in the damned web server itself, and rather hard to get when you’re not. This made sense when processors were 500MHz and lots of UNIXes had crappy IPC performance. So lots of Apache modules were being created, life was good, and then mod_php happened. How PHP became the de facto Apache module that’s in every Apache install on the planet is a story I’ll let someone else tell, but the end result is that in the world of UNIX/Linux web development PHP became the only thing that mattered, and it was a module, and so FastCGI as a generic, web server independent gateway was a solution looking for a problem that didn’t exist.
So, fast-forward 5 years and where does that leave Rails, Django and whatever miracle framework crops up next week? In a tough spot, because PHP remains the only Apache language module that’s ever gotten widespread adoption. Why does this matter? Because these are large frameworks written in interpreted languages, and with CGI (the other technology still supported everywhere that lets you tell the web server to run some code) you have to load the whole shebang from scratch ON EVERY REQUEST. Performance will be shit, trust me, and so for anything but development with these “megaframeworks” CGI is completely unfeasible. You need a way of getting all that code loaded into memory and having it stay persistent across requests. That’s what you get with an Apache module, and that’s what you get with FastCGI. The renewed interest in FastCGI is because suddenly PHP’s assumed rule of the webdev roost has been called into question and now you have these compelling competing frameworks, written in competing languages, needing persistence on the web server and they’re not going to get it with an Apache module. Oh, and there’s another more general trend at play that I’ll touch on below.
Back to the topic that motivated this post. What’s wrong with FastCGI in Apache? Anybody who’s tried to run a busy Zope site behind Apache via FastCGI knows this one. The UNIX Domain Sockets are unreliable, for unknown reasons. Switch to TCP runners and they sometimes hang. Inexplicably. I’ve seen it with PHP-via-FastCGI too. Matters aren’t helped by the fact that the FastCGI C code itself is crufty and, as I mentioned above, ancient (yes, 5 years ago is ancient). Finally, FastCGI in Apache just isn’t as flexible as we’ve come to expect things in Apache to be. With the general lack of interest in FastCGI over the years, it just didn’t get fixed. So people who really wished it did work generally worked around the problem. SCGI came about as a simpler FastCGI replacement in the Python world. Zope, for example, is almost universally deployed behind Apache via mod_proxy these days. Ditto for Java. But the FastCGI technology itself is clearly quite a bit better than most people’s experience with it under Apache would lead them to believe. It’s been rock solid in Zeus for many years. Recently lighttpd has also proven that FastCGI can be quite robust and quick in an open source web server, to the point that its superior FastCGI implementation propelled lighttpd into the limelight from nowhere as *the* way to run the new wave of web frameworks. (Sidenote: Both lighttpd and Zeus are non-fork()ing asynchronous daemons; coincidence?)
So FastCGI may be good stuff after all, but if we’ve gotten along just fine thank you without it for the last 5 years, the question becomes, do we really need FastCGI?
To answer that, let’s start with my comment to David’s post in response to a reader who wondered why one would bother with FastCGI when you have mod_ruby since mod_php and friends were proven to be “the best solutions”:
Actually, Ahmad, mod_(php|perl|python|ruby) has proven to often not be the best solution in practice. Embedding your interpreters in the httpd process often ends up just handcuffing you later when it comes time to do a site (perhaps one of many) upgrade, and sucks away precious memory in each Apache process making it harder to scale higher traffic sites up in volume. Per-user runners are also incredibly convenient for mapping sites into OS sandboxes (via ulimit, RBAC, SELinux, whatever).
I’m with David Morton though, and think that at this point SCGI is the better way to go if your backend doesn’t do smart proxy rewriting. FastCGI is quickly becoming irrelevant for the Python and Ruby frameworks that matter to me. I really think Zope has it right in this regard making it easy, quick, and reliable to proxy rewrite requests from Apache into the Zope appserver (and VirtualHostMonster) via mod_proxy.
My point here is that Apache modules are not a viable path forward. I think most experienced sysadmins already know this. Building and maintaining Apache gets exponentially more complex as you add modules, and that’s reason enough to avoid it in my books without even considering the memory consumption issue. A generic solution for persistent out-of-process page generation/handling is needed. There’s zero doubt about that. FastCGI is the leading contender, by default if for no other reason, so in this regard any work done to improve FastCGI support in Apache is great. I happen to think SCGI is a better route to go, but in the end they’re very similar and if either becomes more mainstream we’ll all be better off and I’ll be happy.
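To give a feel for why SCGI gets called the simpler of the two, here is a minimal Python sketch of its request framing as the SCGI specification describes it: the headers travel as one netstring (`<length>:<bytes>,`) of NUL-terminated name/value pairs, with `CONTENT_LENGTH` first and an `SCGI` header set to `1`, followed by the raw body. The function names `build_scgi_request` and `parse_scgi_request` are made up for this illustration, and the parser assumes the whole request is already in memory; a real runner would read from a socket incrementally.

```python
from __future__ import annotations


def build_scgi_request(headers: dict[str, str], body: bytes = b"") -> bytes:
    """Encode headers and body the way a web server would send them to an SCGI runner."""
    # The spec requires CONTENT_LENGTH first; SCGI=1 must also be present.
    items = [("CONTENT_LENGTH", str(len(body))), ("SCGI", "1")]
    items += [(k, v) for k, v in headers.items() if k not in ("CONTENT_LENGTH", "SCGI")]
    block = b"".join(
        k.encode("latin-1") + b"\x00" + v.encode("latin-1") + b"\x00" for k, v in items
    )
    # Netstring framing: decimal length, colon, payload, comma.
    return str(len(block)).encode("ascii") + b":" + block + b"," + body


def parse_scgi_request(data: bytes) -> tuple[dict[str, str], bytes]:
    """Split one complete SCGI request into (headers, body)."""
    length_text, colon, rest = data.partition(b":")
    if not colon:
        raise ValueError("missing netstring length")
    header_len = int(length_text)
    if rest[header_len:header_len + 1] != b",":
        raise ValueError("netstring not terminated by a comma")
    # name\0value\0name\0value\0 -> drop the empty piece after the final NUL.
    fields = rest[:header_len].split(b"\x00")[:-1]
    if len(fields) % 2:
        raise ValueError("header name without a value")
    headers = {
        fields[i].decode("latin-1"): fields[i + 1].decode("latin-1")
        for i in range(0, len(fields), 2)
    }
    if headers.get("SCGI") != "1":
        raise ValueError("not an SCGI version 1 request")
    body_start = header_len + 1
    body = rest[body_start:body_start + int(headers["CONTENT_LENGTH"])]
    return headers, body
```

That is more or less the whole request side of the protocol, which is the appeal: there is no record multiplexing or binary header layout to get wrong.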
Well… mostly happy… Because what I’m *really* wondering is whether we should be continuing with a CGI paradigm at all, or should we go the way of Java and Zope app servers and use what we already have: HTTP.
What Java and Zope app servers do (for the unfamiliar) is run their own solid HTTP servers that do intelligent URL parsing/generation for you to make sticking them behind an HTTP proxy (like Apache’s mod_proxy, or Squid, or whatever) at an arbitrary point in the URI a piece of cake. Typically you redirect some URL’s traffic (a virtual host, subdirectory, etc.) off to the dedicated app server the same way a proxy server sits between your web browser and the web server. It works just like directing requests off to a Handler in Apache, except the request is actually sent off to another HTTP server instead of handed off to a module or CGI script. And of course the reply comes back as an HTTP object that’s sent back to the originator. There’s a bunch of reasons why doing this with HTTP instead of CGI is a really nice approach. One is that setting up these app servers becomes pretty simple for sysadmins and doing the configs on the upstream webserver/proxy is IDENTICAL no matter what kind of downstream app server you’re talking to. That reduces errors. It’s flexible, too, allowing you to start up an app server instance (which, of course, acts like a web server) on a port, run it as whatever system user you want, jail it, zone it, firewall it, whatever, and then you send HTTP requests to the thing. You can go straight to the app server in your web browser to debug stuff. Since it’s HTTP we already have a full suite of tools that can do intelligent things with the protocol. Firewalls, load balancers, proxies, and so on. There’s a huge market of mature “HTTP brokers” both free and commercial, including Apache itself (mod_proxy, which can be hooked into rewrites).
The fact that the Java and Zope camps went down this road before, and both arrived at the HTTP app server solution, makes me wonder why the current darlings haven’t also taken the same approach. One argument is that it’s much simpler to implement a CGI-like interface than a full-on HTTP server. But I don’t buy that, not with languages like Python and Ruby that make it as easy to embed an HTTP server in your app as a CGI-like interface! Maybe the stock httpd modules aren’t up to snuff? Could it be that hard to fix them if they are indeed lacking, and wouldn’t that be useful to all sorts of people who don’t need these huge frameworks? Maybe the Rails and Django folks just fear they’ll automatically become as complex and evil as the Java Application Servers they’re trying to out-simplify if they take this approach? Maybe it’s just too hard to support both CGI and HTTP models properly (the Zope guys ought to know the answer) and not worth the effort when a CGI based approach “mostly works”? Maybe they think that CGI (and hence persistent CGI runners like FastCGI and SCGI) is the only path to widespread adoption given the current mass-hosting landscape?
Deployment, as always, is a key issue. In my mind a FastCGI-like solution ought to be more compatible with the web hosting crowd. FastCGI runners can be started up and shut down on the fly (it is a CGI imitator, after all). The app server approach, by contrast, is a bit more “static” and likely appeals more to the “in house” crowd (I hesitate to call this the “enterprise” crowd) who are running their own servers.
If you actually read this far, then I’m impressed, and hopefully you’ve learned something. If I’ve made an incorrect statement please do let me know in the comments below. It’s an exciting time to be involved in web development, and my first prediction for 2006 is that we’ll see either FastCGI or SCGI become a core Apache module, and that we won’t see a whole lot of movement towards an app server approach. Java and Zope are perceived as overly complex, and so people’s gut instincts may lead them away from the app server and towards FastCGI/SCGI. So pay attention to this FastCGI/SCGI stuff, cause it’s going to be important whether or not it’s the technically superior approach.
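The front-end half of that arrangement can be sketched in a few lines of Python. This is only an illustration of the idea, not how mod_proxy or Squid are implemented: the names `make_proxy_handler` and `start_proxy`, the `/app` style mount point and the GET-only handling are all assumptions made to keep the example short. It takes requests at or below a URL prefix, replays them against an HTTP app server on another port, and relays the reply.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def make_proxy_handler(prefix, backend_host, backend_port):
    """Build a handler class that forwards GETs under `prefix` to the backend."""

    class ProxyHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # Only URLs at or below the mount point go to the app server.
            if self.path != prefix and not self.path.startswith(prefix + "/"):
                self.send_error(404, "Not handled by this proxy")
                return
            upstream_path = self.path[len(prefix):] or "/"
            try:
                conn = http.client.HTTPConnection(backend_host, backend_port, timeout=5)
                conn.request("GET", upstream_path)
                upstream = conn.getresponse()
                status = upstream.status
                content_type = upstream.getheader("Content-Type", "application/octet-stream")
                body = upstream.read()
                conn.close()
            except (OSError, http.client.HTTPException):
                self.send_error(502, "Backend unreachable")
                return
            # Relay the backend's answer to the original client.
            self.send_response(status)
            self.send_header("Content-Type", content_type)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # keep the sketch quiet

    return ProxyHandler


def start_proxy(prefix, backend_host, backend_port, listen_port=0):
    """Start the proxy in a background thread and return the server object."""
    handler = make_proxy_handler(prefix, backend_host, backend_port)
    server = ThreadingHTTPServer(("127.0.0.1", listen_port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Something like `start_proxy("/app", "127.0.0.1", 8081)` would send `/app/hello` to the app server as `/hello`. Nothing in the proxy depends on what language or framework is listening on the other port, which is the point being made above.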
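As a rough illustration of how little it takes to embed an HTTP server, here is a sketch using only Python’s standard library. The `App` class is a hypothetical stand-in for a big framework that you only want to load once, and `start_app_server` is a name invented for this example. The counter shows the property that matters here: state stays in memory between requests instead of being rebuilt each time, as it would be under plain CGI.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class App:
    """Hypothetical stand-in for a framework that is expensive to load."""

    def __init__(self):
        self.requests_served = 0  # lives as long as the process does
        self._lock = threading.Lock()

    def handle(self, path):
        with self._lock:
            self.requests_served += 1
            count = self.requests_served
        return f"path={path} request_number={count}\n"


def start_app_server(app, port=0):
    """Serve `app` over HTTP on localhost in a background thread."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            body = app.handle(self.path).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # keep the sketch quiet

    # Port 0 asks the OS for any free port; read it from server.server_address.
    server = ThreadingHTTPServer(("127.0.0.1", port), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Calling `start_app_server(App(), 8081)` gives you something you can point a browser or a front-end proxy at. Whether the stock server is robust enough for production traffic is exactly the open question raised above; this only shows that the plumbing is not the hard part.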
What’s on my wish list? Being able to use a threaded Apache with FastCGI/SCGI runners. That would be a powerful combination.
17 Responses to “FastCGI, SCGI, and Apache: Background and Future”  
1 Jay Laney Jan 2nd, 2006 at 3:31 pm
Great post, and I actually read all the way through.
Yeah, I’m more along the lines of the “in house” crowd and supporting more than one HTTP server on my cluster is not exactly something I think I’m going to want to do, so anything that makes Rails play nicer with Apache is something I am certainly going to be tracking.
2 Jay Soffian Jan 5th, 2006 at 10:57 pm
It was 1998 and I and a few other folks ran the webfarm for Cox Interactive Media. Our developers wrote the dynamic portions of our site in mod_perl. Our poor little UltraSparc II’s were running out of memory way before they ran out of CPU trying to shove mod_perl into every apache process. So we segregated the mod_perl stuff off to a couple E450’s loaded with memory. At the time I had to hack mod_include and mod_proxy so that we could have the UltraSparc II apache instances pull in dynamic content from the mod_perl apache instances on the E450 via SSI documents using #include virtual’s. It worked pretty well though.
I think we briefly gave FastCGI a chance but using HTTP to talk to the mod_perl instance worked out better.
j.
3 mark Jan 5th, 2006 at 11:16 pm
Thanks for the input, er, Jay and Jay.
I’ve had similar experiences at telcos on the web hosting farms hacking some stuff together to separate static and dynamic content. Like you, we dabbled in FastCGI and in the end just did our own stuff over HTTP. More recently that’s been easier with app servers purpose-built for such scenarios.
4 Dick Davies Jan 6th, 2006 at 5:41 am
I can understand the FCGI protocol feeling a bit redundant, but if Rails/Django have good HTTP performance, why bother with a frontend server at all?
WEBrick (at least) performance is pretty lame. You could always write a native http daemon to unpack the request for you, but then you’ve just rewritten lighttpd…
5 Ahmad Jan 7th, 2006 at 3:34 pm
Nice post Mark, I enjoyed reading it to the end.
As I said in Dave’s post, there is a lot that I don’t know about in this area. If we exclude embedding the interpreter through modules, then I agree with you on this order of preference: HTTP, SCGI, FastCGI.
I like HTTP best for its simplicity.
I think that one of the advantages of FastCGI/SCGI over HTTP is that their main purpose is to work in such situations. So if needs change in the future, it would be easier to change FastCGI/SCGI to accommodate the new needs. But you can’t do that to HTTP. If you use it, you are stuck with it and if you change it, it is not HTTP anymore.
I was thinking about non-forking servers, and the fact that they might not benefit from the next wave of dual core PCs, but now that the application is running in a separate process and not embedded in the server process, I think that this is perfect.
So you might want to add that to the benefits of using a separate process.
However, there is still one big problem. How will all of this work for virtual hosting companies?
Will you let each customer run their own long-lasting processes?
I think that it makes great sense security-wise. Finally, everyone is running their processes using their own uid. No one can read other customers’ MySQL username and password files from their forum configuration file.
But it seems to me that virtual hosting companies are phobic about user-run long-lasting processes, probably for good reasons.
It might become a server administrator’s nightmare, because instead of having to just keep Apache happy, up and running, they now have hundreds of small “servers” running in their own processes and each one of them communicating with Apache using the overly complex FastCGI protocol.
6 Adam Jan 8th, 2006 at 8:14 pm
About 2 years back I was brought on to admin a web cluster containing 14 webservers (apache/php) to support a website doing 4 million page views per day. It almost became a full time job recompiling and messing with the httpd.conf. When that contract was over a new contract came up which involved a 12 million page views per day site. This new client was looking to save as much as possible on hosting fees. At the time each of these webservers was $300 per month. I switched the webserver to Litespeed webserver running PHP as FastCGI. I was able to run this on a 4-webserver cluster, saving a total of $36,000 per year over the previous installation while being able to process 3 times more page views.
Eventually I switched from Litespeed to Lighttpd, because it’s open source, and haven’t used apache since. In terms of support for dual/multi processor systems, several of the non forking webservers, Litespeed and Zeus, spawn one process per cpu. I expect Lighttpd to support this at some point. Also not forgetting that the spawned FastCGI processes will use as many cpus as you have.
In general I’ve found SCGI to be more difficult to set up than FastCGI. There seems to be a bit of a learning curve, but the interface and tools look more elegant. Moving forward I’d like to see a new interface replace CGI. Something set up and optimized to be persistent. Something like this http://litespeedtech.com/lsapi/ could be the answer.
7 Oscar Merida Jan 12th, 2006 at 8:37 am
This is a great read on the history and role of FastCGI given the rise and popularity of “CGI Frameworks”. One of those rare entries where I learn a lot from each paragraph.
I’m the de facto sys admin on a web server my friends and I operate. While we don’t handle loads anywhere near the point where apache/mod_php start to fall down, I’ve picked up a lot here and learned about some new alternatives. Going to read up on lighttpd now…
8 Alan Jan 30th, 2006 at 10:29 pm
Thanks for a very insightful post, Mark. Can’t remember how I first arrived at your blog, but I’ve been reading it for a month or so now and almost every post has been really interesting.
My personal feeling ATM is that app servers are a nice way to go. For security, I like being able to run each web service under a different UID/GID (and preferably chroot’d if feasible). Also, I subscribe to the DJB-style “small is beautiful” school of thought, and the huge, complex configuration files that seem to come with combining lots of different services under a single server (usually Apache, but could be lighttpd, too) drive me crazy.
When I need to run FastCGI apps, I tend to run a separate, stripped-down instance of lighttpd for each one. For CGI, I often run instances of cgi-httpd (from the shttpd package). What I’m looking for now is a minimal replacement for Apache with mod_proxy, since IMO Apache seems like overkill for this kind of thing.
9 dbt Feb 3rd, 2006 at 11:05 am
for Python stuff, pretty much all new python web-content-generation code is happening with WSGI, and it’s easy to serve WSGI-based content out via CGI, SCGI, FastCGI or HTTP. web.py (webpy.org) supports all four of these pretty much transparently to the real code.
Check out PythonPaste for some more WSGI awesomeness.
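A small sketch of what that gateway independence looks like in practice, using only the standard library’s `wsgiref` (this is not web.py or Paste code, and the function names are made up for the example). The `application` callable knows nothing about how it is being served; the two helpers below it are the only deployment-specific parts.

```python
from wsgiref.handlers import CGIHandler
from wsgiref.simple_server import make_server


def application(environ, start_response):
    """A WSGI app: takes the request environ, returns an iterable of bytes."""
    body = f"Hello from {environ.get('PATH_INFO', '/')}\n".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def serve_over_http(port=8000):
    """Run the app behind wsgiref's built-in HTTP server (blocks forever)."""
    with make_server("127.0.0.1", port, application) as httpd:
        httpd.serve_forever()


def serve_as_cgi():
    """Run the same app as a one-shot CGI script, reading os.environ and stdin."""
    CGIHandler().run(application)
```

SCGI and FastCGI gateways follow the same pattern with third-party adapters in place of `wsgiref`; those are left out here rather than guess at a particular package’s API.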
10 mark Feb 3rd, 2006 at 6:05 pm
dbt: I agree. WSGI and Paste are very cool. I’ve always considered the fact that there are so many web frameworks in Python to be an inevitable outcome, but hated the fact that hooking them together was painful and deployment varied so much framework to framework. Which is why WSGI is great, it defines a common bottom tier for everyone to build on, and opens up tremendous flexibility when it comes to deployment. Paste, in my opinion, represents the future. I’m planning a whole post on Paste, so I won’t say much more here. But I will say that I’m more excited about Paste than any other webdev technology right now!
11 Oliver Crow Feb 4th, 2006 at 12:39 pm
Great post, Mark. I would add a vote to the idea that mod_proxy is the way to interface application servers to Apache. It already comes as part of the core distribution. It is very easy to configure. It works well if you run different application environments. For example it’s easy to run Rails, Zope, TurboGears, Django, mod_perl and PHP applications simultaneously on one server, each as a proxy server behind a single Apache instance. In the mod_perl and PHP cases the application server process would also be Apache. In the other cases it is the app-server process running an embedded http server. In any event only the front-end Apache needs the modules that are important for talking to users (SSL, virtual hosting, etc.), and can handle any high volume static requests (e.g. image files).
Both mod_proxy and FastCGI share the feature that all of the memory hogging application stuff is kept out of the front-end server, which means that you can scale the front-end up to handle a larger number of requests without running out of server memory. It also keeps the applications self-contained in the back-end servers which has benefits for reliability (one application can’t take out another), deployment (easy to turn applications on and off) and security (each application can run as a different user, with limited access).
I have used this configuration in both dedicated server and virtual private server (VPS) hosting environments with great success. There is currently one fly in the ointment though, and that is monitoring the back-end application server processes. If the back-end server is Apache, monitoring it is not an issue. Apache is extraordinarily stable and almost never dies. Other application servers are not so reliable, and particularly the servers that run as a single process have an alarming tendency to die at some point. This is particularly a problem for example on a shared server where processes often run under O/S resource usage constraints and a process which uses too much memory or CPU may be killed by the O/S. In these cases it is necessary to have a monitoring process whose job it is to start and stop the app server process, and specifically to restart it when it dies. For this purpose djb’s daemontools can be pressed into use (http://cr.yp.to/daemontools.html), or python supervisor (http://www.plope.com/software/supervisor) and perhaps others.
The remaining issue is how to set up all of this in a shared hosting environment. Each user will need to run their own long-lived app server daemon process(es). Hosting providers will need to allow for custom app-server processes to be set up quickly and easily, monitored by an external process and automatically restarted upon system reboot.
Under BSDs cron provides the ability for each user to specify processes that are started at reboot (see the @reboot directive in man 5 crontab) . I don’t know if Linux or other unices have an equivalent. Daemontools could possibly provide the monitoring piece, although it would have to be installed in a way that made it easy for each user to have their own instance of the /services folder. And the setup of new applications could perhaps be handled by an administrative web interface. Perhaps there’s an opportunity to write that piece.
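For a sense of what the monitoring piece involves, here is a deliberately tiny Python sketch of a restart loop. It is not how daemontools or supervisor work internally; `supervise`, `max_restarts` and `backoff_seconds` are names invented for this example, and a real supervisor would also handle signals, logging and running as the right user.

```python
import subprocess
import sys
import time


def supervise(command, max_restarts=3, backoff_seconds=1.0):
    """Run `command`, restarting it each time it exits.

    Returns the list of exit codes once the restart budget is used up.
    """
    exit_codes = []
    for attempt in range(max_restarts + 1):
        process = subprocess.Popen(command)
        exit_codes.append(process.wait())  # blocks until the app server dies
        if attempt == max_restarts:
            break
        print(f"{command[0]} exited with {exit_codes[-1]}; restarting", file=sys.stderr)
        time.sleep(backoff_seconds)  # avoid a tight crash loop
    return exit_codes
```

A per-user instance of something like this, started from an `@reboot` crontab entry, would cover both the restart-on-death and the start-at-boot requirements described above, under the assumption that the host allows long-lived user processes at all.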
12 Zion Feb 16th, 2006 at 4:51 pm
Excellent article, I was thinking about doing my own SCGI implementation for PHP and then realized you are just right… Why would I bother using SCGI or FastCGI if I can write my own small HTTP server and use mod_proxy?
Thanks a lot, you saved my day (erhm… night).