Ruby, shallow copy surprise!
Many people who, like me, have done any C or C++ programming are used to the idea that a programming language copies variables either by value or by reference, or lets you choose explicitly.
I used to be under the impression that Ruby was a 'pass by value' language, but no longer. After spending most of a day debugging a problem, only to find that one of my base-level assumptions about Ruby was fundamentally flawed, I was quite shocked and felt more than a bit stupid. But after talking to some fellow Rubyists the other day at Rails Pub Nite, I found out I was not alone in my assumptions. This may seem silly to those with broader experience across a range of high-level languages, but to others (like me!) it can be quite surprising.
Consider this:
require 'test/unit'

class SanityTest < Test::Unit::TestCase
  def test_my_sanity
    complex_array = [{:blah => "hello"},{:foo => "blah"}]
    copied_array = complex_array
    copied_array[0][:blah] = nil

    assert_not_equal copied_array, complex_array
  end
end

# 1) Failure:
# test_my_sanity(SanityTest):
# <[{:blah=>nil}, {:foo=>"blah"}]> expected to be != to
# <[{:blah=>nil}, {:foo=>"blah"}]>
What? If Ruby is copying by value, why is that change back-propagating to our original array? For a start, the plain assignment to copied_array never copies anything; it just gives the same array a second name. How about Object#clone? That should fix whatever the problem is. But no: the test still fails. After some research it turns out that there is a big difference between the way Ruby handles 'immediate' values (small integers, known as Fixnum in older Rubies, and Symbols) and the way it handles more complex objects (String, Array, Hash, YourClass). The key thing that eventually jumped out at me when reading over the Ruby Object docs was the phrase 'shallow copy'. What is a 'shallow copy'? A shallow copy creates a new outer object, but everything that object contains is carried over as a reference to the original's contents rather than as a new object. The cloned array is a new array, yet its elements are still the very same hashes!
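To make the difference concrete, here is a small sketch that compares plain assignment with dup using equal?, which checks object identity. The variable names are mine, chosen only for illustration; clone behaves the same way as dup for this purpose.

```ruby
# Assignment binds a second name to the same array; nothing is copied.
original = [{ :blah => "hello" }, { :foo => "blah" }]
alias_of_original = original
puts alias_of_original.equal?(original)   # true: one and the same object

# dup (like clone) builds a new outer array...
shallow = original.dup
puts shallow.equal?(original)             # false: a different array
# ...but its slots still point at the very same hashes.
puts shallow[0].equal?(original[0])       # true: the hash is shared

# Replacing a slot in the copy leaves the original untouched,
shallow[1] = { :foo => "changed" }
puts original[1][:foo]                    # blah

# while mutating a shared hash is visible through both arrays.
shallow[0][:blah] = nil
puts original[0][:blah].inspect           # nil
```

So a shallow copy protects the outer array from changes such as replacing or appending elements, but not the objects stored inside it.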
So how can we fix this? It's not very Rubyish, but at least it's simple:
copied_array = Marshal.load(Marshal.dump(complex_array))
We are simply serializing the object to a string and then deserializing it into a new object. This breaks the chain of references and makes it impossible for any changes to back-propagate.
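As a rough sketch, the round trip can be wrapped in a helper. The name deep_copy is my own invention, not something Ruby provides. The example also shows one limit of the trick: Marshal cannot dump every kind of object, and a Proc, for instance, raises a TypeError.

```ruby
# deep_copy is a hypothetical helper name, not part of Ruby itself.
def deep_copy(object)
  Marshal.load(Marshal.dump(object))
end

complex_array = [{ :blah => "hello" }, { :foo => "blah" }]
copied_array = deep_copy(complex_array)

# Mutating a nested hash in the copy no longer touches the original.
copied_array[0][:blah] = nil

puts complex_array[0][:blah]                    # hello
puts copied_array[0].equal?(complex_array[0])   # false

# Marshal cannot serialize everything: a Proc raises TypeError.
begin
  deep_copy([lambda { 1 }])
rescue TypeError => e
  puts "cannot deep copy: #{e.message}"
end
```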
Update:
Gregory Brown, author of 'Ruby Best Practices', was kind enough to offer his insight into the issue. I contacted Greg because I had the feeling that I was missing a fundamental Ruby idiom that would side-step this problem. Here's what he had to say (edited for brevity):
Ruby is purely pass by value, if you consider that absolutely every value you pass is a reference (with the exception of a few immediate objects: Symbols, Fixnums, and the like).
The solution is usually to avoid copying in the first place. Write your functions in such a way that you consciously decide whether or not they should have side effects, and document them as such. [...]
So generally speaking, if you're trying to make a copy of an object, you're probably thinking about the problem wrong. Most of these cases can be handled by thinking about the way ruby references work and changing your function interfaces to work in a way that is good for your context. Unfortunately, there isn't a clear cut answer to when you should use what, it's more of an intuition and style thing.
That said, if you do find the need to do a deep copy, you can always serialize out your data and load it back in. [...]
If you want to do it "right", so that my_obj.dup does copying the way you want, you should also look up the documentation for initialize_copy(). This is sort of like a constructor for copied objects, and might be useful if you're building your own objects that need to be deep copied and don't want to use the Marshal hack.
For the first year or two of using Ruby, I used to worry about copying all the time, and my code was littered with .dup calls everywhere. These days I find anywhere I call dup() to be a code smell... sometimes it makes sense, most of the time it doesn't. The real key is to nail the right design, and then things aren't nearly as scary as they seem.
Hope this helps.
-greg
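Greg's initialize_copy suggestion might look something like the sketch below. Playlist and its tracks attribute are hypothetical names I made up for illustration. Note that this only copies one level deeper (the array and each hash in it); anything nested further down would still be shared.

```ruby
# Playlist is a hypothetical class used only for illustration.
class Playlist
  attr_reader :tracks

  def initialize(tracks)
    @tracks = tracks
  end

  # Ruby calls this on the new object during dup and clone, after the
  # instance variables have been shallow-copied from the source.
  def initialize_copy(source)
    super
    # Give the copy its own array and its own hashes instead of sharing them.
    @tracks = source.tracks.map { |track| track.dup }
  end
end

original = Playlist.new([{ :title => "Intro" }, { :title => "Outro" }])
copy = original.dup

copy.tracks[0][:title] = "Changed"
copy.tracks << { :title => "Bonus" }

puts original.tracks[0][:title]   # Intro
puts original.tracks.size         # 2
```

With this in place, callers just use dup or clone as usual and the class itself decides how deep the copy goes.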
If you don't already have a copy of 'Ruby Best Practices' then go buy it! It's definitely worth your time!