This ActiveRecord plugin is the end result of a real world problem - First, how to aggregate table data from across multiple databases for the purpose of pagination / searching. Second, how to transparently have one production instance have read/write access to a set of aggregated records pulled from a global view.
The first step was solved by creating a SQL view to pool all production records together for a given databse. This SQL view resides on a central database accessible by all instances of our application. From there, we query against this view to return a set of Aggregate* records. The second step, which this plugin handles, is proxying to the original database/record for the ‘aggregate’ instance and transparently interacting with the returned aggregate record as if it were the original object.
Currently, sharded_database works by reflecting on a set of returned records, and taking the result of #sharded_connection_klass method to delegate the connection to - for each instance. The path we chose had us creating a view that would return aggregate records, with an added ‘source’ column specifying the originating database. We then rigged our #sharded_connection_klass to return an abstract connection model depending on the value. Other ideas for implementation include an after_create callback that writes the source database / ID to a centralized table.
Setup your Aggregate model, inheriting from a class that establishes a connection to your global datastore.
class AggregateFoo < ActiveRecord::Base include ShardedDatabase::Aggregate source_class 'Foo' def sharded_connection_klass "Connection::#{source.classify}".constantize end end
The constant returned by #sharded_connection_klass is expected to be an subclass of ActiveRecord::Base and respond to #connection. It is suggested that you use the preserve_attributes class method in the aggregate class and use that attribute to augment and/or find the constant somehow.
As of 0.2.0, the class method #source_class is required - sharded_database no longer guesses for the correct source class!
Assuming that AggregateFoo is a model that is bound to your aggregate view/table, you can now #find against this model as you would any other ActiveRecord class.
AggregateFoo.all # => [#<Foo(Connection::One) ..>, #<Foo(Connection::Two) ..>, #<Foo(Connection::One) ..>]
Updating an attribute on the first record in the above array would update the corresponding record located in the database that Connection::One connects to.
Associations are also taken into account, and any associations that are defined on a Foo model will be sourced correctly. I have only tested this one level deep, though.
Easily done. Simply add a :aggregate_proxy option set to false to your finders.
AggreateFoo.all(:aggregate_proxy => false)
In the event that you need to load a Foo object directly, and you may or may not know the appropriate connection at runtime, sharded_database is able to apply itself to an ActiveRecord model directly.
This is done by accepting a :connection key in the options hash for a #find method. The value of :connection can either be an abstract ActiveRecord class, a Proc, or a Symbol, which will attempt to call a method of the same name on the class doing the finding. The latter two options are also supplied the #find arguments, which is useful when wanting to dynamically determine a connection.
Any class that includes ShardedDatabase::Aggregate and defines a source_class will have that class be given this functionality. To add manually:
class Bar < ActiveRecord::Base include ShardedDatabase::ModelWithConnection end
As mentioned above, there are three acceptable values for the :connection key. Borrowing heavily from ActionController’s callback setup, acceptable values can either be an ActiveRecord class, or a Proc or class method name (represented as a symbol) that will ultimately return an ActiveRecord class. Both the Proc and method will be passed a splat of the attributes passed to #find.
The method implementation will be demonstrated in the Example, but the other implementations are as follows:
Supplying a class
Bar.find(:first, :order => 'created_at asc', :connection => BarConnection)
Supplying a Proc
Bar.find(123, :connection => lambda { |*args| (args.first % 2) == 1 ? Connection::One : Connection::Two })
…Which would use Connection::One, as % 2 of 123 is 1.
A real world situation would be something akin to Flickr. Data is sharded over N number of databases, and the criteria sent to your finder method would be used to connect to the right datastore.
Imagine you have a photo sharing application, and due to the amount of photos managed, you’re forced to shard the ‘photos’ database across 3 database servers. To do this with sharded_database, you’d do something like:
class Photo < ActiveRecord::Base def self.determine_database(*attrs) # [:all, { :conditions => { :name => 'mountain' } }] case first_letter = attrs.last[:conditions][:name].first[0,1] # get the first letter when a..f : Connection::AToF when g..p : Connection::GToP when q..z : Connection::QtoZ end end Photo.all(:conditions => { :name => 'mountain' }, :connection => :determine_database)
Which would return any photo named ‘mountain’ against the database Connection::GToP is connected to.
-
Apply associations to objects found using the :connection key
-
Fix association application to regenerate a proper reflection/association
Copyright © 2008, Brennan Dunn, Decisiv Inc. Released under the MIT license.