Migrating Rails with a Large Codebase

presented by

Greg Hurrell and Adam Derewecki

Causes stats

Facebook Platform launch partner
Platform for collective action with 180m+ users
44k+ commits, first commit December 2006
1.2k+ source code files (down from 1.4k+)
93k+ LOC
3.4k+ example spec suite

Upgrade Roadmap

Rails 2.1.0 (May 2008)
Rails 2.3.14
Rails 3.0.11
Rails 3.2.3
Rails 3.2.6, 3.2.7 etc

Plan

Get spec suite passing
Manually QA in development environment
Test on staging
Gradual roll-out to production
Long-lived branches = pain, so be expedient

Anticipated Pain Points

View auditing (due to new escaping behavior)
Gem compatibility
Asset pipeline
Routes (new DSL, and problematic for a Facebook Canvas app with "asymmetric" routes)

ree-1.8.7 to 1.9.3

Range#include? => Range#cover? to test Time range
File.open needs :binmode => true for binary files (like images)
Gem version upgrades
Range#step will not iterate over ranges of Time
Strings are not Enumerables, must call String#each_line to enumerate
Enumerable#map with no block returns an Enumerator instead of an Array
Date.today no longer calls Time.now, which broke our Time.warp
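
A few of the changes above can be sketched in plain Ruby. This is a minimal illustration, not code from the Causes codebase; the dates and the six-hour step are made-up values.

```ruby
# Range#cover? only compares against the endpoints, so it works for Time ranges.
start  = Time.utc(2012, 4, 18)
finish = Time.utc(2012, 4, 19)
window = start..finish
in_window = window.cover?(Time.utc(2012, 4, 18, 12))

# Instead of Range#step over Time values, advance the Time by hand.
six_hours = 6 * 60 * 60
hours = []
t = start
while t <= finish
  hours << t
  t += six_hours
end

# Strings are not Enumerable; enumerate lines explicitly.
lines = "first\nsecond\n".each_line.map(&:chomp)

# map without a block returns an Enumerator; call to_a when an Array is needed.
enum  = [1, 2, 3].map
array = enum.to_a
```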

ree-1.8.7 to 1.9.3

Set operations require an enumerable instead of an instance of the object
e.g. Set.new([1,2]) - 1 vs. Set.new([1,2]) - [1]
Date.parse no longer understands mm/dd/yyyy format (which Facebook uses)
Removed String#to_a
when 'condition': 'returnvalue' became when 'condition' then 'returnvalue'
{:k1, val, :k2, val2, ...} syntax removed
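
The replacements for the items above might look like the following. It is a sketch under assumptions: the date string, `label_for`, and the hash values are illustrative, and `Date.strptime` is one way to handle mm/dd/yyyy input, not necessarily the fix used in the original migration.

```ruby
require 'set'
require 'date'

# Set#- takes an enumerable, not a bare element.
remaining = Set.new([1, 2]) - [1]

# Parse mm/dd/yyyy with an explicit format instead of relying on Date.parse.
parsed = Date.strptime('04/18/2012', '%m/%d/%Y')

# String#to_a is gone; Kernel#Array wraps a string in a one-element array.
wrapped = Array('a string')

# `when ... then` replaces the old `when ...:` form.
def label_for(condition)
  case condition
  when 'condition' then 'returnvalue'
  else 'default'
  end
end

# Hash literals need explicit key => value pairs.
options = { :k1 => 'val', :k2 => 'val2' }
```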

ree-1.8.7 to 1.9.3

1.9.3 went live at 12:35

mysql gem to mysql2 gem

Benchmarking showed that when the same query was repeated 20 times, about 1 run in 20 took on the order of 100ms instead of 0.1-0.4ms
Performing a basic "SELECT * FROM" query on a table with 30k rows and fields of nearly every Ruby-representable data type, then iterating over every row using an #each-like method yielding a block*:
```
            user       system     total       real
Mysql2      0.750000   0.180000   0.930000 (  1.821655)
do_mysql    1.650000   0.200000   1.850000 (  2.811357)
Mysql       7.500000   0.210000   7.710000 (  8.065871)
```
(* https://github.com/brianmario/mysql2/)

When 'stable' isn't

Prepared Statement Cache for database queries
New feature in Rails 3.1
Allows `SELECT * FROM users WHERE id = 1' to be compiled as `SELECT * FROM users WHERE id = ?' and bind `1' as the param
Prepared statements execute faster
... except on MySQL, where there is no significant speed gain
We launched Rails 3.2.3 on the night of 2012/4/18

The next morning...

ActiveRecord::StatementInvalid: Mysql::Error: Can't create more than max_prepared_stmt_count statements (current value: 16382): SELECT `articles`.* FROM `articles` WHERE `articles`.`id` = ? LIMIT 1
Google fu => https://github.com/rails/rails/issues/5121
Statement cache breaks for :has_many relations: SELECT * FROM articles WHERE user_id = ? and group_id = 1
Notice how only one id is parameterized
Eventually, you hit the MySQL default limit of 16382 prepared statements
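
Why a partially parameterized query exhausts the limit can be shown with a toy model. `StatementCache` below is a hypothetical stand-in, not Rails or MySQL code: it keys prepared statements by SQL text and refuses new ones past a limit, which is roughly how the failure above plays out. The limit of 5 is only there to keep the example short.

```ruby
# Toy stand-in for a prepared statement cache keyed by SQL text.
class StatementCache
  attr_reader :statements

  def initialize(limit)
    @limit = limit
    @statements = {}
  end

  # Reuse an existing statement for identical SQL, otherwise prepare a new one.
  def prepare(sql)
    return @statements[sql] if @statements.key?(sql)
    raise "Can't create more than #{@limit} statements" if @statements.size >= @limit
    @statements[sql] = "stmt_#{@statements.size + 1}"
  end
end

# Fully parameterized: every call reuses the same statement.
healthy = StatementCache.new(5)
10.times { healthy.prepare('SELECT * FROM articles WHERE user_id = ? AND group_id = ?') }
healthy_count = healthy.statements.size

# Partially parameterized: each distinct inlined group_id creates a new statement.
leaky = StatementCache.new(5)
overflowed = false
begin
  (1..10).each do |group_id|
    leaky.prepare("SELECT * FROM articles WHERE user_id = ? AND group_id = #{group_id}")
  end
rescue RuntimeError
  overflowed = true
end
```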

Moving to bleeding edge

fd3984 allows the Statement Cache to be disabled

We vendored Rails and put HEAD at fd3984

commit fd398475afb64e362059a500e5cd54d08b9afdee
Author: Aaron Patterson <aaron.patterson@gmail.com>
Date:   Tue Feb 21 15:08:54 2012 -0800

prepared statements can be disabled

We did this nearly 2 months after the commit landed, and it still hadn't made it into a stable release

Master-Slave

In Rails 2.1, `masochism' Gem directed writes to the master and reads to the slave
In Rails 2.3, evaluated several alternatives (Octopus, DbCharmer); switched to the master_slave Gem
With each subsequent update, we had to repatch master_slave and our own "ar_extensions" code
Because of sitewide optimizations, we decided that SELECTs off of the master were not taxing enough to worry about getting read-from-slaves set up
Slaves exist today for failover purposes

Over-siloed database topologies

Database topology was too "smart"
Some large tables (cause_memberships ~900m rows)
`in_silo' broke with every major point release we upgraded to
Only silo databases once you hit performance problems

Caching ARel Relations

Shame on us for storing ActiveRecord objects in memcache!
Model.where(:id => x) is an ActiveRecord::Relation
If you cache this, you're caching the un-evaluated query
Every time it's retrieved from the cache, it evaluates
... probably not what you wanted memcache to do
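
The difference between caching the query and caching its results can be sketched without Rails. `LazyQuery` is a hypothetical stand-in for an unloaded ActiveRecord::Relation and a plain Hash stands in for memcache; in a real app the usual remedy is to materialize the relation (for example with `to_a`) before caching.

```ruby
# Hypothetical stand-in for an unloaded relation: the query runs on every to_a.
class LazyQuery
  attr_reader :executions

  def initialize(&query)
    @query = query
    @executions = 0
  end

  def to_a
    @executions += 1
    @query.call
  end
end

cache = {} # stand-in for memcache
relation = LazyQuery.new { [{ :id => 1 }] }

# Caching the un-evaluated query: every read hits the "database" again.
cache[:lazy] = relation
3.times { cache[:lazy].to_a }
runs_when_caching_query = relation.executions

# Caching the evaluated rows: one more query, then reads are free.
cache[:rows] = relation.to_a
3.times { cache[:rows] }
runs_after_caching_rows = relation.executions
```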

Mailers

Rails 2: Mailer.deliver_themailer
Rails 3: Mailer.themailer().deliver
Custom behavior implemented via common superclass
Reimplemented in terms of interceptors
There was no easy way to convert this, and we ended up verifying each mailer by hand
Premailer adds an additional layer of complexity
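
An interceptor is a class that responds to `delivering_email` and is registered with `ActionMailer::Base.register_interceptor`. The sketch below assumes a made-up behavior (prefixing subjects on staging), since the slides do not say what the custom behavior was; `StagingInterceptor` and `FakeMessage` are hypothetical names, and the Struct only exists so the example runs without the mail gem.

```ruby
# Hypothetical interceptor: Action Mailer calls delivering_email before each delivery.
class StagingInterceptor
  def self.delivering_email(message)
    message.subject = "[staging] #{message.subject}"
  end
end

# In a Rails 3 app this would be registered once, e.g. in an initializer:
#   ActionMailer::Base.register_interceptor(StagingInterceptor)

# Stand-in for a mail message, just enough to exercise the interceptor.
FakeMessage = Struct.new(:subject, :to)
message = FakeMessage.new('Welcome', ['user@example.com'])
StagingInterceptor.delivering_email(message)
```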

The Static Asset Pipeline

A.K.A. Silver Bullet

Huge performance boost via concatenation and minification
Slow and painstaking process
Minimal effort up front to get rolling: symlink, then incrementally migrate and SASSify
Huge changes required to deploy process

Rollout

Special care needed with async jobs
Need separate queues, each running different Rails stack
Gradual roll out across application servers
Memcache may require invalidation

Takeaways

Stay as close to vanilla Rails as possible
Be prepared to pull in commits ahead of Rails stable
Minimize your external dependencies (lean Gemfile)
Simplicity is a win for the product and for the code base ("Dumb is the new clever")
Beware ARel and lazily evaluated queries

Was it all worth it?

Huge performance gains (Ruby 1.9, Asset Pipeline, mysql2 adapter)
Able to use the latest shiny toys (Haml, Sass, Compass, Jasmine, RSpec 2, etc.)
Years of technical debt paid off, code deleted
Developer productivity and happiness higher than ever
