Nimble Method: Guerrilla's Guide to Optimizing Rails Applications

The Battle For Performance
Rails is slow. Your Rails application, which does fairly simple database operations, will be born slow and will eventually become slower and slower while still not doing anything fancy. That's like getting a new powerful locomotive (on Rails) together with a free bonus of overladen freight cars.
Having said that, I can virtually hear the herd of voices screaming "dude, optimize your own application and don't blame Ruby and Rails for everything". Sure thing, our application isn't perfect and I do believe it can be optimized, but you can't win the performance battle just by that.
The winning tactic for this battle is guerrilla tactics. It turns out you can guerrilla-patch (aka monkey-patch) Ruby and Rails and easily get up to 3x speedup in some cases. We started doing that to our own application, Acunote, and got amazing results so far. Read on to find out how.
Ruby-Prof: The Guerrilla's Weapon
Acunote, our pet Rails application, often needs to copy a bunch of ActiveRecord (AR) objects. Like any other project management tool, our application works with tasks organized into iterations (sprints). Sometimes you want to copy tasks from the current iteration/sprint to the next one. And boy, that copy can become really slow. As an example, let's look at the performance and profiling data of one request to copy 120 tasks.
Let's measure 3 things:
the real time the request takes in the production environment;
processing time profile information using the excellent ruby-prof profiler by Charlie Savage and Shugo Maeda;
memory profile information gathered with patched ruby-prof (I took Sylvain Joyeux's memory allocation measurement mode patches and hacked together my own ruby-prof memory size measurement mode). A minimal usage sketch follows this list.
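For reference, here is roughly how such a profiling run is wrapped with ruby-prof. The copy_tasks call is a hypothetical stand-in for the "copy 120" request handler, and the memory measure mode needs the patches mentioned above:

require 'ruby-prof'

# RubyProf.measure_mode = RubyProf::MEMORY  # memory mode; requires the patches above
result = RubyProf.profile do
  copy_tasks(current_sprint, next_sprint)   # hypothetical workload being measured
end

# Print a flat profile (the same kind of table shown below)
RubyProf::FlatPrinter.new(result).print(STDOUT)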
So, for the "copy 120" request we get:
Request time (without profiler): 13.6 sec.

Time:
Total: 16.78
 %self      total       self   wait      child  calls  name
 18.36       3.34       3.08   0.00       0.26  20766  Kernel#clone
 ...

Allocated Memory Size (in KB):
Total: 1 001 730.00
 %self      total       self   wait      child  calls  name
 61.87  646795.00  619745.00   0.00   27049.00  20766  Kernel#clone
 ...
"WTF" politely describes my initial reaction when I first saw that. How on earth we can spend 17 seconds and consume 1 Gigabyte of memory by just copying 120ActiveRecord objects? Ok, ok, we not only copy them but store some moreinformation to the database like task tags, copy events and so on.Therefore we end up saving actually360AR objects instead of 120. Alsothe memory profiler is inaccurate and usually shows a bit more memorythan the process actually takes.
Still, what is going on during those 17 seconds? Why do we need a gig of RAM? Gee, on the same machine I can compile the whole Ruby interpreter in 57 seconds and the compiler will never take more than 130M...
Heh, let's calm down and see what the profiler says. The obvious suspect is the Kernel#clone method. The (stripped) call tree for it, as reported by ruby-prof, is:
Task#save
  ...
  ActiveRecord::Base#create_without_callbacks
    ActiveRecord::Base#attributes_with_quotes
      ActiveRecord::Base#attributes
        ActiveRecord::Base#clone_attributes
          ActiveRecord::Base#clone_attribute_value
            Kernel#clone
From Charlie's blog we know that the AR::Base#attributes method is evil. It clones attributes before returning them, and Charlie gave wise advice not to use it. And indeed, in your application it is a good idea to call, for example, task['foo'] instead of task.foo or task.attributes['foo']. But here AR::Base#create itself does the evil thing.
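In code, the advice looks roughly like this (Task stands in for any AR model; the behavior is as claimed above for the Rails of that era):

task = Task.find(1)
task.attributes['foo']   # clones the entire attributes hash just to read one value
task['foo']              # reads the attribute directly, without cloning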
Let me cite the Rails code (from rails/activerecord/lib/active_record/base.rb):
def update
  connection.update(
    "UPDATE #{self.class.table_name} " +
    "SET #{quoted_comma_pair_list(connection, attributes_with_quotes(false))} " +
    "WHERE #{self.class.primary_key} = #{quote_value(id)}",
    "#{self.class.name} Update"
  )
end

def create
  if self.id.nil? && connection.prefetch_primary_key?(self.class.table_name)
    self.id = connection.next_sequence_value(self.class.sequence_name)
  end
  self.id = connection.insert(
    "INSERT INTO #{self.class.table_name} " +
    "(#{quoted_column_names.join(', ')}) " +
    "VALUES(#{attributes_with_quotes.values.join(', ')})",
    "#{self.class.name} Create",
    self.class.primary_key, self.id, self.class.sequence_name
  )
  @new_record = false
  id
end

def quoted_column_names(attributes = attributes_with_quotes)
  attributes.keys.collect do |column_name|
    self.class.connection.quote_column_name(column_name)
  end
end

def attributes_with_quotes(include_primary_key = true)
  attributes.inject({}) do |quoted, (name, value)|
    if column = column_for_attribute(name)
      quoted[name] = quote_value(value, column) unless !include_primary_key && column.primary
    end
    quoted
  end
end
Great! AR::create clones all attributes twice (first via quoted_column_names and then via attributes_with_quotes). AR::update is a nicer guy; it clones attributes only once. For each of those 120 copied tasks we use AR::create twice and AR::update once. Therefore we call clone_attributes no fewer than 120*2*2 + 120 = 600 times.
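To double-check that count with the numbers above (a trivial sketch):

tasks   = 120
creates = tasks * 2          # each copied task goes through AR::create twice
updates = tasks              # ...and through AR::update once
puts creates * 2 + updates   # create clones twice per call, update once => 600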
The profiler says we lose 3 seconds and 650M of memory because of that:
ActiveRecord::Base#clone_attributes:
  Time:   3.21 sec
  Memory: 650 301.00 K
Ok. Time to employ our promised guerrilla tactics, but first let's talk about memory.
Lyrical Digression: Why Do We Care About Memory?
While the guerrilla warriors take a nap before the battle, let's digress and look at why we measure the size of allocated memory. Sure thing, allocating literally tons of memory is bad. But we're using Ruby, and Ruby has its own guerrilla warrior which strikes our application when it innocently eats its tasty memory.
The name of our enemy is "Ruby Garbage Collector", and a cunning enemy it is. As ruby's gc.c source says and Why the Lucky Stiff explains, ruby GC kicks in when you allocate more memory than is available on the heap, and the heap boundary is 8M. When we sequentially allocate 650 megabytes, there's a chance that GC will be called 650/8 = 81 times. Each GC call accounts for approximately 70-80ms, which in our case should add roundabout 5 seconds to the running time.
Let's do some math... Kernel#clone itself takes 3 seconds and we expect garbage collection to take 5 more seconds. In total, 8 seconds are wasted because of cloning. See the trend? The more memory we allocate, the worse our performance is.
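For the skeptical, the same estimate as a tiny Ruby snippet (the 8M heap step and the ~70ms per-run cost come from the figures above; everything else is arithmetic):

allocated_mb = 650
heap_step_mb = 8                            # ruby grows the heap in ~8M steps (per gc.c)
gc_runs      = allocated_mb / heap_step_mb  # => 81 expected GC runs
gc_seconds   = gc_runs * 0.07               # => ~5.7 sec at ~70ms per run
puts 3 + gc_seconds                         # Kernel#clone + GC => roughly the 8 sec above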
Here's the moral: "watch your memory". Large memory consumption costs you more than you'd otherwise think because of garbage collection.
Guerrilla-patching AR::create and AR::update
Now it's time to employ our guerrilla tactics and fix ActiveRecord. It's easy to rewrite attributes_with_quotes and avoid cloning the attributes:
module ActiveRecord
  class Base
    private

    def attributes_with_quotes(include_primary_key = true)
      result = {}
      @attributes.each_key do |name|
        if column = column_for_attribute(name)
          result[name] = quote_value(read_attribute(name), column) unless !include_primary_key && column.primary
        end
      end
      result
    end
  end
end
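One way to install such a patch, if you'd rather not paste it inline (the lib/guerrilla_patches.rb file name is my invention for illustration; the post itself simply puts the code in environment.rb, as noted next):

# config/environment.rb (Rails 1.x era): pull in monkey patches at boot.
# lib/guerrilla_patches.rb is a hypothetical file containing the patch above.
require File.join(RAILS_ROOT, 'lib', 'guerrilla_patches')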
Now let's put that into environment.rb and see what we get:
Request time (without profiler): 6.7 sec.

Time:
Total: 9.5
 %self      total       self   wait     child  calls  name
  4.84       0.69       0.46   0.00      0.23  24023  ActiveRecord::ConnectionAdapters::Quoting#quote
 ...
  2.22       0.52       0.18   0.00      0.34   2779  #realtime
 ...
  0.00       0.00       0.00   0.00      0.00      6  Kernel#clone
 ...

Allocated Memory Size (in KB):
Total: 350 966.00
 %self      total       self   wait     child  calls  name
 34.31  125655.00  120420.00   0.00   5235.00   2779  #realtime
 ...
  0.00       0.00       0.00   0.00      0.00      6  Kernel#clone
 ...
Ok, much better: 650 megabytes of memory saved with that little monkey patch! We estimated this would save 8 seconds. Let's look:
Original request time:        13.6 sec
Request time after AR patch:   6.7 sec
Time saved:                    6.9 sec
It turns out our estimate was almost correct. We indeed saved not only the time taken by Kernel#clone but also the time spent by the garbage collector, and gained a 2x speedup for free. Nice! Our guerrilla tactics clearly work, and we can for sure do better than 2x.
Guerrilla-patching Benchmark.realtime
The next suspect, as seen from the memory profile above, is Benchmark#realtime. From the time profiler the method doesn't look bad. We spend only 180 milliseconds in it, but look at the memory profiler! In those 180ms our Benchmark#realtime friend allocates 120 megabytes of memory. Guess what? The Ruby garbage collector will be happy to kick our ass again, at least 15 times, and that estimates to about a 1 second loss.
Let's see what's wrong in benchmark.rb:
def measure(label = "")
  t0, r0 = Benchmark.times, Time.now
  yield
  t1, r1 = Benchmark.times, Time.now
  Benchmark::Tms.new(t1.utime  - t0.utime,  t1.stime  - t0.stime,
                     t1.cutime - t0.cutime, t1.cstime - t0.cstime,
                     r1.to_f - r0.to_f, label)
end

def realtime(&blk)
  Benchmark::measure(&blk).real
end
Great, we calculate lots of useless information (like CPU time, user time, etc.) in Benchmark#measure and then just throw it away. Let's do something better, simpler and more memory-efficient:
module Benchmark
  def realtime
    r0 = Time.now
    yield
    r1 = Time.now
    r1.to_f - r0.to_f
  end
  module_function :realtime
end
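A quick sanity check that the patched method still returns what callers expect (this check is mine, not from the original post):

require 'benchmark'
puts Benchmark.realtime { sleep 0.25 }   # => ~0.25 with both the stock and patched versions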
So: no extra object creation, and unchanged results. And here's our payoff:
Request time (without profiler): 5.8 sec.

Time:
Total: 8.63
 %self     total      self   wait    child  calls  name
  4.29      0.69      0.46   0.00     0.23  24023  ActiveRecord::ConnectionAdapters::Quoting#quote
 ...
  0.35      0.19      0.03   0.00     0.16   2779  #realtime
 ...

Allocated Memory Size (in KB):
Total: 225 668.00
 %self     total      self   wait    child  calls  name
 43.78  98805.00  98805.00   0.00     0.00   5280  Kernel#binding
  0.00   1918.00      2.00   0.00  1916.00   2779  #realtime
 ...
Whoa! 120 megabytes and 1 second were saved with a 9-line guerrilla patch. Nice, isn't it? Instead of the original 13.6 sec our request now runs in 5.8 sec. That's more than a 2x improvement already, and we can do even better, but I'll leave that for the next blog post.
When Guerrilla Tactics Help
Nothing is wrong with Ruby and Rails until you do things in a loop. For example, Benchmark#realtime doesn't have any impact if all you do is save one AR object. You just allocate 25 extra objects that take 45 extra kilobytes. But you're doomed once you do that in a loop. Run the benchmark 200 times and the garbage collector is guaranteed to happily shoot you in the back.
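The same point in miniature (task is a hypothetical AR object; the per-call figures are the ones quoted above):

require 'benchmark'

Benchmark.realtime { task.save }     # once: ~25 extra objects, ~45 KB -- harmless

200.times do
  Benchmark.realtime { task.save }   # 200 times: ~5000 objects, ~9 MB -- GC territory
end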
Also bear in mind that the garbage collector should not be running while you profile your code. With GC on, you'll never get repeatable results because it will kick in at different times, randomly increasing the time and memory consumption of your methods.
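A simple way to arrange that around a profiling run (GC.disable and GC.enable are standard Ruby; copy_tasks is the same hypothetical workload as in the earlier sketch):

GC.disable   # keep the garbage collector out of the measurement
result = RubyProf.profile { copy_tasks(current_sprint, next_sprint) }
GC.enable    # and turn it back on afterwards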
Make Your Application Fast: Optimize MEMORY
Let me reiterate: by optimizing memory usage instead of (or in parallel with) processing time, you can get a significant performance improvement, and guerrilla-patching memory-inefficient bits is the right way to make Ruby and Rails faster. I think we're on the right track here. Memory is what we need to optimize. Memory is what wasn't optimized before. And you've just seen what we can get by optimizing memory.
We (Acunote devs) will continue the memory profiling and will blog as soon as we get new results. Our previous blog post briefly summarizes our current results; be sure to read it if you haven't done that yet. Meanwhile, don't wait for us: grab ruby-prof and my memory profiler patches and dig in yourself ;) I bet you'll find more big memory eaters, and when you do, please drop us a note. Let's collect our guerrilla patches and then file them upstream.