A long time ago, in an now long forgotten ruby-talk thread, a poster asked, "what's all the fuss over these open classes? If you really need something like that just open up a Hash and add method_missing", or something to that effect. It seemed a reasonable comment. And for some time after I thought the poster made a good point: for simple needs an "OpenHash" is a good light-weight solution, useful to many cases. The basic implementation of an OpenHash is as follows.

  class OpenHash < Hash
    def method_missing(s,*a,&b)
      case s.to_s
      when /\=$/
        self[s] = a[0]
      else
        a.empty? ? self[s] : super(s,*a,&b)
      end
    end
  end

Very concise, very elegant, very charming. But, looks can be deceiving. The problem lies in the fact that #method_missing catches public messages that are undefined. So right off the bat a whole slew of hash keys will be off-limits, notably any public method already defined by Hash or Enumerable, Object or Kernel. That's a lot of keys. To mitigate this problem we can either use delegation inplace of sub-classing Hash, or we can privatize all the methods.

  private *instance_methods.map{ |x| x.to_s !~ /^__/ }

The downside of course is that we no longer have a "Hash" and no thus access to any of the Hash methods. So here we have a bit of a catch-22. How much do we want OpenHash to get out of the way of potential keys and how much do we want access to the usual set of Hash methods?

Beyond this issue there is also a more subtle problem. Any core extension method added to Hash, Enumerable (if we are sub-classing Hash), or Object and Kernel (in either case), after OpenHash has already been defined, will create another exception that can't be anticipated. The only robust work around for this is to make OpenHash a subclass of BasicObject, which means we will have to use delegation and Ruby 1.9 (or an emulative BlankSlate library).

So now you might think, clearly these issues are the reason Ruby has OpenStruct (ostruct.rb) rather then this simple OpenHash class. However, it turns out that OpenStruct is little more than a delegated OpenHash and suffers the same issues.

Despite these issues however it is tempting to think that simple classes like OpenHash are okay solutions in most cases. After all, the chances of a name clash are fairly slim. I used to think this way myself. But one day, as I considered using an OpenHash in production code, it dawned on me. Using a weak solution just because it's "light-weight" and will probably be okay is asking for trouble. Even if all my pretty tests passed today, a simple change tomorrow in some far off dependency could stymie the whole works, and in such a way that it might not be so easily noticed right away. Clearly, despite the temptation, it is much wiser to use a more robust solution, even if it seems less elegant.

Thankfully, we don't need to give up on our OpenHash just yet. It turns out there is a reasonably robust solution to it's shortcomings: We can protect all newly defined keys.

  class OpenHash < Hash
    def method_missing(s,*a,&b)
      case s.to_s
      when /\=$/
        self[s] = a[0]
        protect_slot(s)
      else
        a.empty? ? self[s] : super(s,*a,&b)
      end
      private
      def protect_slot(s)
        (class << self; self; end).class_eval do
          protected s rescue nil
        end
      end
    end
  end

This will ensure that any set value will remain so, no matter what methods are previously defined or subsequently defined in superclasses. Great! But wait... we now run into another risk. We can't depend on any original method always being available. The above code could even allow object_id = :you_loose. Ouch! Clearly we need to add a few exceptions. We can do this by adding a new method called #omit_slots.

      private
      def omit_slots
        [:__id__, :__send__, :object_id]
      end
      def protect_slot(s)
        (class << self; self; end).class_eval do
          protected s rescue nil
        end unless omit_slots.include?(s.to_sym)
      end

That will handle the most basic case. And if we need an OpenHash with more restrictive requirements we can just override #omit_slots, either via a singleton method or by sub-classing OpenHash. Easy.

This pretty much mitigates all our issues with OpenHash, though certainly there are a few things we can yet do to spruce her up, such as define and omit a #to_hash method. Even so, this implementation does leave us a bit straddled in at least one respect: We have to be very precise about what methods/keys we plan to use. That might not always be easy as we would like. So might there be a way to have our cake and eat it too?

If we are willing to give up just a little of the usual elegance, then we can. Ruby allows us to define methods ending in a ? mark. So we can use this for an alternate implementation to look up values. Lets call it OpenQuery.

  class OpenQuery < Hash
    def method_missing(s, *a, &b)
      case s.to_s
      when /\?$/
        self[s.to_s.chomp('?').to_sym]
      when /\=$/
        self[s] = a[0]
      else
        super(s, *a, &b)
      end
    end
  end

Now we have all the feature of regular Hash, plus the elegance of an OpenStruct so long as we end our look up methods with a question.

  oq = OpenQuery.new
  oq.foo = 10
  oq.bar = 20
  oq.foo?                 #=> 10
  oq.bar?                 #=> 20
  oq.map{ |k,v| v + 10 }  #=> [20, 30]

With OpenQuery we need only worry about 20 or so core methods ending in ?.

    has_key? empty? instance_variable_defined? has_value? key? nil? value?
    tainted? include? instance_of? equal? all? kind_of? eql? any? is_a?
    member? one? frozen? respond_to? none?

If any of these are an issue, we can always add #protect_slot and #omit_slots to this solution as well.

The OpenHash concept has been explored by others in the past. You can find a rendition of it here. In this case the author, Fabian Streitel, mitigates the name clash issue by not sub-classing Hash. If Fabian went one step further and sub-classed BasicObject then he'd have a pretty good solution on his hands.

Ruby Facets also has a class called OpenObject. It is more robust than either OpenHash or OpenStruct in that it allows any method name to be set, and leaves only a few methods in place such as #each and #to_hash, but even these can, for better or worse, be overwritten by the setter.

So, returning to our original question, "Is OpenHash a bad idea?", we can happily say, "No". But only if we take great care in our design. There is no perfect solution here. We can get pretty close, but there are trade-offs regardless of the solution chosen. Ultimately the key is to know exactly your needs and the potential consequences of the solution you select, and code accordingly.

Written by trans, 2010-04-14