YouTube Architecture | High Scalability


Tue, 07/17/2007 - 20:20 — Todd Hoff
 
YouTube grew incredibly fast, to over 100 million video views per day, with only a handful of people responsible for scaling the site. How did they manage to deliver all that video to all those users? And how have they evolved since being acquired by Google?
 
Information Sources
 
Google Video
 
Platform
 
Apache
Python
Linux (SuSE)
MySQL
psyco, a dynamic python->C compiler
lighttpd for video instead of Apache
 
What's Inside?
 
The Stats
 
Supports the delivery of over 100 million videos per day.
Founded 2/2005
3/2006 30 million video views/day
7/2006 100 million video views/day
2 sysadmins, 2 scalability software architects
2 feature developers, 2 network engineers, 1 DBA
 
Recipe for handling rapid growth
while (true)
{
identify_and_fix_bottlenecks();
drink();
sleep();
notice_new_bottleneck();
}
This loop runs many times a day.
 
Web Servers
 
NetScaler is used for load balancing and caching static content.
Run Apache with mod_fastcgi.
Requests are routed for handling by a Python application server.
The application server talks to various databases and other information sources to get all the data and formats the HTML page.
Can usually scale web tier by adding more machines.
The Python web code is usually NOT the bottleneck, it spends most of its time blocked on RPCs.
Python allows rapid, flexible development and deployment. This is critical given the competition they face.
Usually less than 100 ms page service times.
Use psyco, a dynamic python->C compiler that uses a JIT compiler approach to optimize inner loops.
For CPU-intensive activities like encryption, they use C extensions.
Some pre-generated cached HTML for expensive-to-render blocks.
Row level caching in the database.
Fully formed Python objects are cached.
Some data are calculated and sent to each application server so the values are cached in local memory. This is an underused strategy. The fastest cache is in your application server and it doesn't take much time to send precalculated data to all your servers. Just have an agent that watches for changes, precalculates, and sends.
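
As a rough illustration of that last point, here is a minimal sketch (not YouTube's actual code; the LocalCache class and the push agent are hypothetical) of precalculating values centrally and pushing them into each application server's in-process cache:

# Hypothetical sketch: an agent precalculates values centrally and pushes
# them to every app server, which keeps them in plain process-local memory.
import threading
import time

class LocalCache:
    """Process-local cache: reads are a dict lookup, no network hop."""
    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def get(self, key, default=None):
        return self._data.get(key, default)

    def replace_all(self, new_data):
        with self._lock:          # swap in a fresh snapshot atomically
            self._data = dict(new_data)

cache = LocalCache()

def precalculate():
    # Stand-in for expensive queries/aggregations done once, centrally.
    return {"top_videos": ["a1B2c3D4e5F", "x9Y8z7W6v5U"], "total_views": 100000000}

def push_agent(app_server_caches, interval_seconds=60):
    """Watches for changes, precalculates, and sends to every server."""
    while True:
        snapshot = precalculate()
        for server_cache in app_server_caches:
            server_cache.replace_all(snapshot)   # in reality, an RPC per server
        time.sleep(interval_seconds)

def render_homepage():
    # Request handlers read straight from local memory.
    return "Top videos: " + ", ".join(cache.get("top_videos", []))

cache.replace_all(precalculate())    # what the agent would do on a push
print(render_homepage())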
 
Video Serving
 
Costs include bandwidth, hardware, and power consumption.
Each video hosted by a mini-cluster. Each video is served by more than one machine.
Using a cluster means:
- More disks serving content which means more speed.
- Headroom. If a machine goes down others can take over.
- There are online backups.
Servers use the lighttpd web server for video:
- Apache had too much overhead.
- Uses epoll to wait on multiple fds.
- Switched from single process to multiple process configuration to handle more connections.
Most popular content is moved to a CDN (content delivery network):
- CDNs replicate content in multiple places. There's a better chance of content being closer to the user, with fewer hops, and content will run over a more friendly network.
- CDN machines mostly serve out of memory because the content is so popular there's little thrashing of content into and out of memory.
Less popular content (1-20 views per day) uses YouTube servers in various colo sites.
- There's a long tail effect. A video may have a few plays, but lots of videos are being played. Random disk blocks are being accessed.
- Caching doesn't do a lot of good in this scenario, so spending money on more cache may not make sense. This is a very interesting point. If you have a long tail product, caching won't always be your performance savior.
- Tune the RAID controller and pay attention to other lower level issues to help.
- Tune memory on each machine so there's not too much and not too little.
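
To make the mini-cluster idea concrete, here is a minimal sketch (the topology, machine names, and hashing below are invented for illustration, not taken from YouTube) of assigning a video to a cluster and serving it from any healthy machine in that cluster:

# Hypothetical sketch: map a video to a mini-cluster, then serve it from any
# healthy machine in that cluster so one failed box doesn't take the video down.
import hashlib
import random

MINI_CLUSTERS = {
    "cluster-a": ["vid-srv-1", "vid-srv-2", "vid-srv-3"],
    "cluster-b": ["vid-srv-4", "vid-srv-5", "vid-srv-6"],
}

def cluster_for(video_id):
    # Deterministically assign each video to one mini-cluster.
    digest = int(hashlib.md5(video_id.encode()).hexdigest(), 16)
    names = sorted(MINI_CLUSTERS)
    return names[digest % len(names)]

def pick_server(video_id, is_healthy):
    """Return any healthy machine in the video's mini-cluster."""
    machines = MINI_CLUSTERS[cluster_for(video_id)]
    healthy = [m for m in machines if is_healthy(m)]
    if not healthy:
        raise RuntimeError("no healthy server for " + video_id)
    return random.choice(healthy)   # spread requests across the cluster

# Example: the video is still served even if vid-srv-2 is down.
print(pick_server("dQw4w9WgXcQ", is_healthy=lambda m: m != "vid-srv-2"))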
 
Serving Video Key Points
 
Keep it simple and cheap.
Keep a simple network path. Not too many devices between content and users. Routers, switches, and other appliances may not be able to keep up with so much load.
Use commodity hardware. The more expensive the hardware gets, the more expensive everything else gets too (support contracts). You are also less likely to find help on the net.
Use simple common tools. They use most tools built into Linux and layer on top of those.
Handle random seeks well (SATA, tweaks).
 
Serving Thumbnails
 
Surprisingly difficult to do efficiently.
There are about 4 thumbnails for each video, so there are a lot more thumbnails than videos.
Thumbnails are hosted on just a few machines.
Saw problems associated with serving a lot of small objects:
- Lots of disk seeks and problems with inode caches and page caches at OS level.
- Ran into the per-directory file limit, Ext3 in particular. Moved to a more hierarchical structure (see the directory-hashing sketch at the end of this section). Recent improvements in the 2.6 kernel may improve Ext3 large directory handling up to 100 times, yet storing lots of files in a file system is still not a good idea.
- A high number of requests/sec, as web pages can display 60 thumbnails per page.
- Under such high loads Apache performed badly.
- Used squid (a reverse proxy) in front of Apache. This worked for a while, but as load increased performance eventually decreased. Went from 300 requests/second to 20.
- Tried using lighttpd, but in single-threaded mode it stalled. Ran into problems with multi-process mode because each process kept a separate cache.
- With so many images setting up a new machine took over 24 hours.
- Rebooting machine took 6-10 hours for cache to warm up to not go to disk.
To solve all their problems they started using Google's BigTable, a distributed data store:
- Avoids the small-file problem because it clumps files together.
- Fast, fault tolerant. Assumes it is working on an unreliable network.
- Lower latency because it uses a distributed multilevel cache. This cache works across different colocation sites.
- For more information on BigTable take a look at Google Architecture, GoogleTalk Architecture, and BigTable.
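
As an aside on the per-directory limit mentioned above, a common workaround (a sketch of the general technique only, not YouTube's actual layout) is to hash the video ID into a shallow tree of directories so no single directory ever holds millions of files:

# Hypothetical sketch: fan thumbnails out across nested directories to work
# around per-directory file limits such as the ones hit on Ext3.
import hashlib
import os

def thumbnail_path(root, video_id, index):
    digest = hashlib.md5(video_id.encode()).hexdigest()
    # Two levels of fan-out gives 256 * 256 = 65,536 directories.
    return os.path.join(root, digest[:2], digest[2:4],
                        "%s_%d.jpg" % (video_id, index))

print(thumbnail_path("/thumbs", "dQw4w9WgXcQ", 0))
# -> /thumbs/xx/yy/dQw4w9WgXcQ_0.jpg, where xx/yy are the first hash bytes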
 
Databases
 
The Early Years
- Use MySQL to store metadata like users, tags, and descriptions.
- Served data off a monolithic RAID 10 volume with 10 disks.
- Living off credit cards so they leased hardware. When they needed more hardware to handle load it took a few days to order and get delivered.
- They went through a common evolution: single server, went to a single master with multiple read slaves, then partitioned the database, and then settled on a sharding approach.
- Suffered from replica lag. The master is multi-threaded and runs on a large machine so it can handle a lot of work. Slaves are single-threaded and usually run on lesser machines, and replication is asynchronous, so the slaves can lag significantly behind the master.
- Updates cause cache misses which go to disk, where slow I/O causes slow replication.
- Using a replicating architecture you need to spend a lot of money for incremental bits of write performance.
- One of their solutions was to prioritize traffic by splitting the data into two clusters: a video watch pool and a general cluster. The idea is that people want to watch video, so that function should get the most resources. The social networking features of YouTube are less important so they can be routed to a less capable cluster.
The later years:
- Went to database partitioning.
- Split into shards with users assigned to different shards (see the sketch after this list).
- Spreads writes and reads.
- Much better cache locality which means less IO.
- Resulted in a 30% hardware reduction.
- Reduced replica lag to 0.
- Can now scale database almost arbitrarily.
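
As a rough illustration of user-based sharding (a sketch only; the shard count, DSNs, and placement scheme below are invented, not YouTube's), each user's reads and writes are routed to the shard that owns that user:

# Hypothetical sketch: route each user's reads and writes to the MySQL shard
# that owns that user, keeping working sets small and cache-friendly.
NUM_SHARDS = 8
SHARD_DSNS = ["mysql://db-shard-%02d/youtube" % i for i in range(NUM_SHARDS)]

def shard_for_user(user_id):
    # Simple modulo placement; a real system might use a directory service
    # so users can be moved between shards without rehashing everyone.
    return user_id % NUM_SHARDS

def dsn_for_user(user_id):
    return SHARD_DSNS[shard_for_user(user_id)]

def save_video_metadata(user_id, video_id, title):
    dsn = dsn_for_user(user_id)
    # connect(dsn).execute("INSERT INTO videos ...") in a real system
    print("writing %s (%r) to %s" % (video_id, title, dsn))

save_video_metadata(user_id=42, video_id="dQw4w9WgXcQ", title="demo clip")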
 
Data Center Strategy
 
Used managed hosting providers at first. Living off credit cards so it was the only way.
Managed hosting can't scale with you. You can't control hardware or make favorable networking agreements.
So they went to a colocation arrangement. Now they can customize everything and negotiate their own contracts.
Use 5 or 6 data centers plus the CDN.
Videos come out of any data center. Not closest match or anything. If a video is popular enough it will move into the CDN.
Video is bandwidth dependent, not really latency dependent. It can come from any colo.
For images latency matters, especially when you have 60 images on a page.
Images are replicated to different data centers using BigTable. Code looks at different metrics to know who is closest.
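
A minimal sketch of that "pick the closest location" decision (the regions, data center names, and latency numbers below are invented for illustration):

# Hypothetical sketch: choose the image-serving data center with the best
# measured latency for the client's network.
LATENCY_MS = {   # e.g. collected from probes or request logs
    ("eu-west", "dc-ams"): 18,
    ("eu-west", "dc-sjc"): 145,
    ("us-east", "dc-ams"): 92,
    ("us-east", "dc-sjc"): 70,
}

def closest_datacenter(client_region, datacenters=("dc-ams", "dc-sjc")):
    return min(datacenters,
               key=lambda dc: LATENCY_MS.get((client_region, dc), float("inf")))

print(closest_datacenter("eu-west"))   # -> dc-ams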
 
Lessons Learned
 
Stall for time. Creative and risky tricks can help you cope in the short term while you work out longer term solutions.
 
Prioritize. Know what's essential to your service and prioritize your resources and efforts around those priorities.

Pick your battles. Don't be afraid to outsource some essential services. YouTube uses a CDN to distribute their most popular content. Creating their own network would have taken too long and cost too much. You may have similar opportunities in your system. Take a look at Software as a Service for more ideas.

Keep it simple! Simplicity allows you to rearchitect more quickly so you can respond to problems. It's true that nobody really knows what simplicity is, but if you aren't afraid to make changes then that's a good sign simplicity is happening.

Shard. Sharding helps to isolate and constrain storage, CPU, memory, and IO. It's not just about getting more write performance.
 
Constant iteration on bottlenecks:
- Software: DB, caching
- OS: disk I/O
- Hardware: memory, RAID
 
You succeed as a team. Have a good cross-discipline team that understands the whole system and what's underneath the system. People who can set up printers, machines, install networks, and so on. With a good team all things are possible.
Comments
Mon, 07/30/2007 - 21:42 — dersteppenwolf (not verified): Great Article
Really Interesting!!

Tue, 07/31/2007 - 15:09 — Dimitry (not verified): Thanks for this.
Fantastic article.
Thanks very much for this.
Dimitry.

Wed, 08/01/2007 - 17:16 — Anonymous (not verified): The real meat of this is
The real meat of this is skipped over in a couple of lines: keep the popular content on a CDN. In other words, throw money at Akamai and let Akamai worry about it.
That is, of course, the right answer, but whether you're using Python or not hardly matters. Without Akamai's services, YouTube could never have kept up with the demand given the infrastructure described above.
Wed, 08/01/2007 - 18:18 — Anonymous (not verified): Startup hardware leasing
- Living off credit cards so they leased hardware. When they needed more hardware to handle load it took a few days to order and get delivered.
How did this work exactly? When we looked into this, we found that because we were a new startup, we had no credit (I guess not 'found', it's pretty obvious), so hardware leasing companies would only lease to us if one of us personally backed the loans. Given that startup risk was high and the bill was large, we ended up buying hardware and putting it on various low-intro-APR CCs, etc. All the big h/w vendors were like "unless we can see your last N years of tax returns etc., we're not leasing to you." Made it seem like leasing wasn't a real option for 'living off credit cards' startups.
Wed, 08/01/2007 - 19:44 — alex (not verified): awesome
wow, that's a great article, didn't think about the CDN :P

Thu, 08/02/2007 - 02:43 — Anonymous (not verified): you forgot the most important thing
Be sure to accept private venture capital financing from Sequoia, who was also the largest shareholder of Google and controlled its board. Sequoia used its influence to force Google to massively overpay for Youtube, and the Sequoia partners made an instant 500 million dollars in profit.

Thu, 08/02/2007 - 03:58 — Quinton (not verified): This was a great read
The article and video have been extremely helpful for a few projects I am working on. Thank you!
Thu, 08/02/2007 - 05:48 — gadget00 (not verified): amazing
totally amazing; 100% worth the reading.

Thu, 08/02/2007 - 09:05 — Live TV (not verified): Wow
wow that's pretty insane.
Thu, 08/02/2007 - 09:42 — George (not verified): Excellent
A very good article, yet I have not learned anything new. The current YouTube architecture is already applied to one of our customers' YouTube-like sites, a Romanian website called http://www.trilulilu.ro/ . The only thing that wasn't yet implemented by our customer is database sharding; no need for now, as the total MySQL database is under 250MB and the MySQL server handles at least more than 650 qps.
Thu, 08/02/2007 - 10:32 — Aaron (not verified): That's a great article.
That's a great article.

Thu, 08/02/2007 - 11:51 — Joephone (not verified): FastCGI?
I think it's interesting to note that they used mod_fastcgi. I mean, I use it because I'm forced to on my shared host, but I've always heard of tons of problems when trying to scale big with it (even though that's what it's designed for). I guess if done right, FastCGI can be a great asset to a server farm.
Thu, 08/02/2007 - 12:38 — Danny (not verified): Thank you
This is a great article, very interesting!

Thu, 08/02/2007 - 13:01 — Dean Whitney (not verified): More scalability stories
Would love to hear more stories like this: Flickr, Twitter, MySpace, Meebo... Clients are still brainwashed by big enterprise players, thinking they need BEA Portal Server or the like to achieve a robust, scalable enterprise solution. It's a battle to convince them not to invest their money in something that's way too expensive, takes forever to deploy, and costs a fortune (and takes forever) to make it do what you want it to do from the user experience perspective. I keep saying, "MySpace is registering 350,000 users a day and they aren't using Aqualogic - let's save that extra cash for some killer AJAX UI, widgets and an API that's actually useful."
Thu, 08/02/2007 - 20:47 — fjordan (not verified): well, u misinterpret the part...
You were right in that living off YOUR personal credit card... that's for sure the case for Youtube back in the early days as well...

Thu, 08/02/2007 - 20:53 — Kiran Vaka (not verified): Good one..thanks!
Thanks for the good article

Fri, 08/03/2007 - 13:31 — Anton Shevchuk (not verified): Excellent
OrganizeMyPHPTube.com (YouTube clone) on LAMP (Linux, Apache, MySQL, PHP)
Wed, 08/08/2007 - 08:22 — Coderoid (not verified): Wonderful! Python rocks
This is a wonderful read; it's good proof that Python is not the slow coach it's made out to be. In any language, the biggest setback will always be the programmer's skillset.
Thu, 08/09/2007 - 06:14 — Jeffos (not verified): Number of Servers used and Host
Great article!
Does anybody know the evolution in the number of servers they had to use in the course of their ascension?
How many did they start with, and what kind of config did each server have?
Also, any idea which host they were using?
Thanks
Thu, 08/09/2007 - 14:12 — Namik (not verified): Great posts
Thank you!

Thu, 08/09/2007 - 17:16 — Anonymous (not verified): OK that was interesting
OK, that was interesting information, but let's stop calling a large collection of unordered bullet points an 'article', shall we? An article has, you know, sentences. Probably arranged into paragraphs.
Tue, 08/14/2007 - 23:30 — Flex RIA (not verified): who needs Oracle, Sun, IBM or HP
This is a great example of what makes Web 2.0. Remember those evil old days of starting a startup with millions of spending on Oracle, Sun or some other big dude just for getting the basic program running. Now, who needs them.
Tue, 08/21/2007 - 04:16 — Crt (not verified): How does Youtube or tinyURL generate unique ID?
Does anyone know how Youtube or tinyURL generate unique IDs? Are they using some kind of hash function? tinyURL doesn't generate a unique ID if the URL is the same. For example, for www.google.com it always generates http://tinyurl.com/1c2. How do they encode/decode the URL?

Tue, 08/21/2007 - 04:32 — Todd Hoff
re: generating IDs
I guess I always assumed they were preallocated so they were always minimal and unpredictable, but I couldn't find an authoritative answer. It's an interesting question though.

Mon, 08/27/2007 - 04:26 — Anonymous (not verified): TinyURL needs a mapping of
TinyURL needs a mapping of code -> URL, so when you type www.google.com, it searches for the URL in its DB, and gives you the previously-generated code.
Tue, 09/11/2007 - 07:47 — Alexei A. Korolev (not verified): Re: YouTube Architecture
Thanks for the post. A must read!

Mon, 09/17/2007 - 05:49 — Derick (not verified): Re: How does Youtube or tinyURL generate unique ID?
I'm guessing their ID approaches are different, since they seem to be sequential on tinyURL (you can get www.tinyurl.com/1, www.tinyurl.com/2, and so on...), while YouTube's are not (the fact that http://www.youtube.com/watch?v=wtnAI2OuYU7 exists doesn't imply that http://www.youtube.com/watch?v=wtnAI2OuYU8 does).
The TinyURL approach could be as easy as using a base conversion function, as mentioned in http://www.oreillynet.com/onlamp/blog/2003/11/lets_get_small_with_mysql_...
I'm not sure if YouTube's IDs' non-sequentiality might be due to lost IDs (removed, banned, ...), but I would bet on a hash function.
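
A minimal sketch of the base-conversion idea mentioned in the comment above (an illustration of the general technique only, not TinyURL's or YouTube's actual scheme), turning a sequential integer ID into a short base-62 code and back:

# Hypothetical sketch: encode a sequential numeric ID as a short base-62
# string (and decode it back), the classic short-URL trick.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode(n):
    if n == 0:
        return ALPHABET[0]
    out = []
    while n > 0:
        n, rem = divmod(n, 62)
        out.append(ALPHABET[rem])
    return "".join(reversed(out))

def decode(code):
    n = 0
    for ch in code:
        n = n * 62 + ALPHABET.index(ch)
    return n

print(encode(1000000))          # -> "4c92"
print(decode(encode(1000000)))  # -> 1000000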
Thu, 10/04/2007 - 21:09 — Vadym Timofeyev (not verified): Re: YouTube Architecture
Great article, thanks!

Sun, 11/18/2007 - 12:40 — Alexei (not verified): Re: YouTube Architecture
LOL... some guys told me that Python is not for big projects. YouTube shows otherwise :)
Thank you for the article. Very useful indeed.

Sat, 12/08/2007 - 03:10 — Software Crown (not verified): Re: YouTube Architecture
YouTube's database size doubles every 5-6 months.
Fri, 12/14/2007 - 19:23 — Jason (not verified): Re: YouTube Architecture
2 sysadmins, 2 "scalability software architects", 2 feature developers, 2 network engineers, 1 DBA.
That's 9 people for tech, yes?
What is the difference between a sysadmin and a network engineer?
Unless you need two people to desktop-support the 7 other computers / make sure Unreal Tournament is working.........
As an unrelated suggestion, I am almost as interested in PEOPLE architecture as I am in technology architecture.
How many people do these websites employ, and for what roles?
(We all know PlentyOfFish, but how about the others?)
Jason
Tue, 12/18/2007 - 03:50 — Anonymous (not verified): Re: YouTube Architecture
Just so you Python guys know.....
EVE Online is written on a Python backend.... a very hacked and customised one apparently, but Python nonetheless. Sounds like their recent deployment may have developed a lot of scalability tricks with Python.
Sun, 01/13/2008 - 22:03 — Dumitru Brinzan (not verified): Re: YouTube Architecture
The YouTube team is that small? I was thinking more of a team of 20-30 people, with two dozen moderators :)
Still, knowing that they run it on Apache is great news for me, as a PHP developer :)

Wed, 01/30/2008 - 21:11 — Insight IT (not verified): Re: YouTube Architecture
Really interesting and useful article, thanks a lot for it!
Fri, 02/08/2008 - 15:16 — clusteradmin.bl...: Re: YouTube Architecture
Great article. I was surprised how few sysadmins they hire. I wonder what they use for lower-level stuff, such as systems (re)installation and updates, monitoring, etc.
-marek
--
clusteradmin.blogspot.com :: blog about building and administering clusters.
Mon, 02/11/2008 - 17:01 — youtube (not verified): Re: YouTube Architecture
I agree with you very great man!

Mon, 02/11/2008 - 17:31 — savas sahin (not verified): Re: YouTube Architecture
thank you very much!

Thu, 02/14/2008 - 21:18 — Ask sözleri (not verified): Re: Startup hardware leasing
Design and deploy industrial strength Ab Initio applications

Fri, 02/15/2008 - 15:43 — youtube (not verified): Re: YouTube Architecture
thanks

Sun, 02/24/2008 - 22:22 — youtube (not verified): Re: YouTube Architecture
thank you
Fri, 02/29/2008 - 23:34 — Anonymous (not verified): Re: YouTube Architecture
Does anyone know how YouTube determines a "unique" visit?
If I run a macro to a page and refresh, it does not show up as a view... is there a way around this?

Mon, 03/03/2008 - 08:38 — youtube (not verified): Re: YouTube Architecture
thank you very nice