Ahhh, nothing like a little Ruby sugar to start the weekend. I did a little refactoring of a particularly ugly bit of code and couldn’t help but share it. This one is all about chopping up arrays.
In a recent Rails app I needed a slightly different take on in_groups_of. I had to chop arrays into x parts, with x variable and the parts roughly equal in length. Ideally, the resulting array of chunks would also start from the back of the original array, so the last chunk comes first.
Here is my first attempt. Warning, it’s ugly. But it serves to illustrate my point.
class Array
  # Divides an array into chunks of the same size.
  def chunk(count = 3)
    length = (self.length / count.to_f).ceil
    result = []
    slice = count - 1
    count.times do
      result << self.slice(slice * length, length)
      slice = slice - 1
    end
    result
  end
  alias / chunk
end
To get the opening class Array line out of the way: it just reopens the built-in Array class so the method can be added to it. The rest of the snippet is how a typical PHP-monkey would approach the problem: set up a bunch of local variables, set up an array, fill the array, and return the filled array at the end.
Not the Ruby Way(tm).
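Before refactoring, it helps to pin down what the method actually returns. A minimal sketch, using the first attempt unchanged and a ten-element array as made-up sample data:

```ruby
class Array
  # First attempt from above, copied as-is so this snippet runs on its own.
  def chunk(count = 3)
    length = (self.length / count.to_f).ceil
    result = []
    slice = count - 1
    count.times do
      result << self.slice(slice * length, length)
      slice = slice - 1
    end
    result
  end
  alias / chunk
end

numbers = (1..10).to_a     # sample data, not from the Rails app
p numbers.chunk(3)         # => [[9, 10], [5, 6, 7, 8], [1, 2, 3, 4]]
p(numbers / 2)             # => [[6, 7, 8, 9, 10], [1, 2, 3, 4, 5]]
```

The chunk length is the array length divided by the number of chunks, rounded up, so the short chunk is the one taken from the end of the array, and it comes out first.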
Second take:
class Array
  # Divides an array into chunks of the same size.
  def chunk(count = 3)
    length = (self.length / count.to_f).ceil
    result = []
    count.times do
      count = count - 1
      result << self.slice(count * length, length)
    end
    result
  end
  alias / chunk
end
Better: it gets rid of the separate slice variable by counting down count itself. One line fewer, but still not very Ruby-ish. The pattern we're seeing here is a simple inject. So let's use that instead:
class Array
  # Divides an array into chunks of the same size.
  def chunk(count = 3)
    length = (self.length / count.to_f).ceil
    (1..count).inject([]) do |memo, obj|
      count = count - 1
      memo << self.slice(count * length, length)
    end
  end
  alias / chunk
end
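For anyone who has not used inject much, here is a small sketch of the pattern on its own. The method names are made up for illustration, and it collects squares instead of slices to keep the example short:

```ruby
# Hypothetical helper: accumulate by hand (create the array, fill it, return it).
def squares_by_hand(numbers)
  result = []
  numbers.each { |n| result << n * n }
  result
end

# Hypothetical helper: the same accumulation with inject. The block's return
# value becomes memo for the next round; Array#<< returns the array itself,
# so memo << ... hands the accumulator straight back.
def squares_by_inject(numbers)
  numbers.inject([]) { |memo, n| memo << n * n }
end

p squares_by_hand([1, 2, 3])    # => [1, 4, 9]
p squares_by_inject([1, 2, 3])  # => [1, 4, 9]
```

The starting value passed to inject plays the role of the empty result array, and the last value of memo is what inject returns, so the explicit setup and return lines disappear.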
Ah, now we're getting somewhere! Several lines fewer, same functionality. I build a Range from the number of chunks and iterate through it with inject. Definitely much more Rubyesque. Two things still bug me. I am still decrementing the count variable, which is messy since it might be needed after the iterator some day. And the Range is just for show; its contents are unused. Let's fix that.
class Array
  # Divides an array into chunks of the same size.
  def chunk(count = 3)
    length = (self.length / count.to_f).ceil
    (1..count).inject([]) do |memo, obj|
      memo << self.slice((count - obj) * length, length)
    end
  end
  alias / chunk
end
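To make the difference with the previous version concrete, here is a sketch with two made-up standalone helpers (they are not part of the Array class above). Each returns the chunks together with the value of count once the loop has finished:

```ruby
# Hypothetical helper mirroring the third take: the block reassigns count,
# and because a block shares the method's local variables, the change sticks.
def chunks_reassigning_count(array, count)
  length = (array.length / count.to_f).ceil
  chunks = (1..count).inject([]) do |memo, _|
    count = count - 1
    memo << array.slice(count * length, length)
  end
  [chunks, count]
end

# Hypothetical helper mirroring the fourth take: count is only read.
def chunks_using_range_value(array, count)
  length = (array.length / count.to_f).ceil
  chunks = (1..count).inject([]) do |memo, obj|
    memo << array.slice((count - obj) * length, length)
  end
  [chunks, count]
end

sample = (1..10).to_a
p chunks_reassigning_count(sample, 3)  # => [[[9, 10], [5, 6, 7, 8], [1, 2, 3, 4]], 0]
p chunks_using_range_value(sample, 3)  # => [[[9, 10], [5, 6, 7, 8], [1, 2, 3, 4]], 3]
```

Both produce the same chunks, but only the second leaves count at the value the caller passed in.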
There. The local variable is left alone and I am using the contents of the Range inside the iterator. A lot better. I am sure there are ways to reduce it further using various Ruby constructs, but at this stage I am happy. The code is now lean, mean and still fairly easy to follow, and that is important too.
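As one example of such a further reduction, here is a sketch built on each_slice. It is a swapped-in technique rather than the method from this post, the name chunk_by_slices is made up, and it assumes a Ruby where each_slice without a block returns an enumerator (1.8.7 or later):

```ruby
class Array
  # Hypothetical alternative: cut the array into slices of the computed
  # length, then reverse so the last slice comes first.
  def chunk_by_slices(count = 3)
    length = (self.length / count.to_f).ceil
    each_slice(length).to_a.reverse
  end
end

p (1..10).to_a.chunk_by_slices(3)  # => [[9, 10], [5, 6, 7, 8], [1, 2, 3, 4]]
```

It is not a drop-in replacement. When the array is too short to fill every chunk, this version returns fewer chunks where the slice-based one pads the front with empty or nil entries, and on an empty array each_slice raises an ArgumentError because the slice length is zero. The different name also avoids shadowing Enumerable#chunk, which newer Rubies define.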
Wicked. It’s great to see someone else who’s trying, as I am, to improve their code to make it more “ruby”. Thanks for the ideas.
My chunk2 and chunk3 methods seem to perform a bit better than your original version. I think it’s because they do not have to create a Range object, and the subtraction is performed once instead of on every iteration. It is not a great difference, but maybe you can improve it again:
class Array
  # Divides an array into chunks of the same size.
  def chunk(count = 3)
    length = (self.length / count.to_f).ceil
    (1..count).inject([]) do |memo, obj|
      memo << self.slice((count - obj) * length, length)
    end
  end

  def chunk2(count = 3)
    length = (self.length / count.to_f).ceil
    (count - 1).downto(0).inject([]) do |memo, obj|
      memo << self.slice(obj * length, length)
    end
  end

  def chunk3(count = 3)
    length = (self.length / count.to_f).ceil
    (count - 1).downto(0).inject([]) do |memo, obj|
      memo << self[obj * length, length]
    end
  end

  alias / chunk
end

a = (1..999).to_a

require 'benchmark'
TESTS = 20000
Benchmark.bmbm do |x|
  x.report('chunk ') { TESTS.times { a.chunk } }
  x.report('chunk2 ') { TESTS.times { a.chunk2 } }
  x.report('chunk3 ') { TESTS.times { a.chunk3 } }
end

puts a.chunk == a.chunk2
puts a.chunk == a.chunk3
Your code might be faster, but it gives a LocalJumpError on my arrays…
Maybe you can give an example. In chunk2 especially, my function does exactly the same as yours, except that you count from 1 to count and take the difference between the number of chunks (count) and the current value (obj) in the loop, whereas chunk2 just counts down from count-1 to 0 and does not have to evaluate the difference.
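One possible explanation, and this is a guess since no Ruby version is mentioned: on Ruby 1.8.6 and earlier, calling downto without a block raises a LocalJumpError instead of returning an enumerator, so the downto(0).inject chain in chunk2 and chunk3 never gets as far as inject. A sketch of a variant that does not rely on that, under the made-up name chunk4:

```ruby
class Array
  # Hypothetical chunk4: builds the countdown as a plain array first, so no
  # iterator method is ever called without a block.
  def chunk4(count = 3)
    length = (self.length / count.to_f).ceil
    (0...count).to_a.reverse.map { |i| self[i * length, length] }
  end
end

p (1..10).to_a.chunk4(3)  # => [[9, 10], [5, 6, 7, 8], [1, 2, 3, 4]]
```

It creates a Range and a temporary array again, so it may give back some of the speed chunk2 gained.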