Lazy Enumerators: Process Millions of Rows Without Blowing Up Memory

Here’s a script that reads a 2GB log file and finds the first 10 error lines:

File.readlines("production.log")
  .select { |line| line.include?("ERROR") }
  .first(10)

This works. It also reads the entire 2GB file into an array, filters every single line, then throws away everything except 10 results. If the first error is on line 3, you still process millions of lines and allocate gigabytes of memory to get there.

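Now switch File.readlines to File.foreach and add .lazy:

File.foreach("production.log")
  .lazy
  .select { |line| line.include?("ERROR") }
  .first(10)

.lazy changes the behavior completely. Instead of running each step over the entire collection, Ruby pushes one line at a time through the whole chain. The moment it finds 10 errors, it stops. If those errors are in the first 500 lines, only 500 lines are ever read. Memory stays flat. The switch to File.foreach matters too: File.readlines builds the full array of lines before .lazy ever sees it, so adding .lazy to readlines would skip the wasted filtering but not the 2GB allocation.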

How lazy evaluation works

Normal (eager) enumerators process each operation fully before moving to the next. If you chain .select then .map then .first(5), Ruby runs .select on every element, builds a full intermediate array, then runs .map on every element of that array, builds another array, then takes 5 from the front.

Lazy enumerators flip this. Instead of running operations one at a time across all elements, they run all operations on one element at a time:

# Eager: process ALL elements at each step
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
  .select { |n| n.even? }    # => [2, 4, 6, 8, 10]  (processes all 10)
  .map { |n| n * 10 }        # => [20, 40, 60, 80, 100]  (processes all 5)
  .first(2)                   # => [20, 40]  (takes 2)

# Lazy: process ONE element through the whole chain
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
  .lazy
  .select { |n| n.even? }    # 1 fails, 2 passes...
  .map { |n| n * 10 }        # 2 → 20...
  .first(2)                   # got one. 3 fails, 4 passes → 40, got two. Stop.

The lazy version only evaluates elements 1 through 4. Elements 5 through 10 are never touched. For a 10-element array this doesn’t matter. For a million-row database export or a multi-gigabyte file, it’s the difference between a script that works and one that gets killed by the OOM killer.
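To see the ordering for yourself, here is a minimal sketch that logs every block call. The trace helper is a hypothetical name introduced for this illustration; it runs the same select/map/first(2) chain over whatever enumerable it receives.

```ruby
# Run the same chain over any enumerable and record each block call in order.
def trace(enum)
  log = []
  result = enum
    .select { |n| log << "select(#{n})"; n.even? }
    .map    { |n| log << "map(#{n})"; n * 10 }
    .first(2)
  [result, log]
end

eager_result, eager_log = trace((1..10).to_a)
lazy_result,  lazy_log  = trace((1..10).lazy)

p eager_result     # => [20, 40]
p lazy_result      # => [20, 40]
p eager_log.size   # => 15 (10 select calls, then 5 map calls)
p lazy_log
# => ["select(1)", "select(2)", "map(2)", "select(3)", "select(4)", "map(4)"]
```

Both versions return the same array. The eager log shows all select calls before any map call, while the lazy log interleaves them and ends as soon as the second result arrives.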

Where lazy shines

Large files

Reading files lazily is one of the most practical applications. Combine lazy with each_line or IO.foreach and you process files of any size without loading them into memory:

File.foreach("huge_dataset.csv")
  .lazy
  .reject { |line| line.start_with?("#") }
  .map { |line| line.strip.split(",") }
  .select { |fields| fields[2].to_f > 100.0 }
  .first(50)

Each line is read, checked, and either passed along or discarded before the next line is read. Peak memory usage is roughly one line plus whatever you’re accumulating in the result.
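You can check how much of a file actually gets read with a counter. The sketch below builds a throwaway file with Tempfile (the file layout and the lines_read counter are assumptions made for this demo) and runs the same kind of chain over it.

```ruby
require "tempfile"

# Throwaway file: 1,000 lines, every 10th one a "#" comment.
# The third column is 3 * i, so the first value above 100 is on line 34.
file = Tempfile.new(["dataset", ".csv"])
1.upto(1_000) do |i|
  file.puts(i % 10 == 0 ? "# comment #{i}" : "row#{i},x,#{i * 3}")
end
file.close

lines_read = 0
rows = File.foreach(file.path)
  .lazy
  .map    { |line| lines_read += 1; line }   # counting step, for illustration only
  .reject { |line| line.start_with?("#") }
  .map    { |line| line.strip.split(",") }
  .select { |fields| fields[2].to_f > 100.0 }
  .first(5)

p rows.map(&:first)   # => ["row34", "row35", "row36", "row37", "row38"]
p lines_read          # => 38
file.unlink
```

Only 38 of the 1,000 lines are read before first(5) is satisfied.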

Chained transformations with early termination

Any time you’re chaining select, map, reject, and taking a subset of the results, lazy avoids wasted work:

users
  .lazy
  .select { |u| u.active? }
  .reject { |u| u.admin? }
  .map { |u| { name: u.name, email: u.email } }
  .first(20)

Without lazy, every user gets checked for active, then every active user gets checked for admin, then every non-admin active user gets mapped. With lazy, the chain stops the moment you have 20 results.
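Here is a runnable version of that chain, as a sketch. The User struct and the generated data are hypothetical stand-ins for whatever your user objects look like, and the checked counter exists only to show how far the chain got.

```ruby
# Hypothetical stand-in for real user objects.
User = Struct.new(:name, :email, :active, :admin) do
  def active?
    active
  end

  def admin?
    admin
  end
end

# 1,000 users: every 2nd one is active, every 10th one is an admin.
users = (1..1_000).map do |i|
  User.new("user#{i}", "user#{i}@example.com", i.even?, (i % 10).zero?)
end

checked = 0
result = users
  .lazy
  .select { |u| checked += 1; u.active? }
  .reject { |u| u.admin? }
  .map    { |u| { name: u.name, email: u.email } }
  .first(20)

p result.size          # => 20
p result.last[:name]   # => "user48"
p checked              # => 48
```

With this data, the chain touches 48 users and never looks at the other 952.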

Infinite sequences

Lazy enumerators can work with sequences that have no end. An eager select or map over such a sequence never returns, because it would have to build an infinite array first.

# All natural numbers
natural_numbers = (1..Float::INFINITY).lazy

# First 10 primes (naive implementation)
require 'prime'
primes = natural_numbers.select { |n| Prime.prime?(n) }.first(10)
# => [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]

# Fibonacci sequence
fibs = Enumerator.new do |y|
  a, b = 0, 1
  loop do
    y.yield a
    a, b = b, a + b
  end
end

fibs.lazy.select(&:odd?).first(5)
# => [1, 1, 3, 5, 13]

The sequence generates values on demand. No array is ever built. You pull exactly as many values as you need and stop.
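If you are on Ruby 2.7 or later, Enumerator.produce is a shorter way to define this kind of sequence from a seed and a step. A small sketch, using powers of two as an arbitrary example:

```ruby
# Enumerator.produce (Ruby 2.7+) yields the seed, then keeps applying the block.
powers_of_two = Enumerator.produce(1) { |n| n * 2 }

# First 5 powers of two whose last decimal digit is 6.
ends_in_six = powers_of_two.lazy.select { |n| n % 10 == 6 }.first(5)
p ends_in_six   # => [16, 256, 4096, 65536, 1048576]

# take_while is a lazy stop condition; to_a materializes the finite result.
below_100 = powers_of_two.lazy.take_while { |n| n < 100 }.to_a
p below_100     # => [1, 2, 4, 8, 16, 32, 64]
```

take_while only ends the sequence if the condition eventually fails, so it is safe here because the values keep growing.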

When NOT to use lazy

Lazy enumerators add overhead per element. There’s a wrapper object involved, and each element goes through more method dispatch than a simple eager chain. For small collections, this overhead can make lazy evaluation slower than eager.

# For a small array, eager is typically faster
small = (1..100).to_a
small.select(&:even?).map { |n| n * 2 }.first(5)  # typically faster
small.lazy.select(&:even?).map { |n| n * 2 }.first(5)  # typically slower (overhead > savings)

The rule of thumb: if the collection fits comfortably in memory and you’re processing all or most of it, stay eager. Reach for lazy when:

- The dataset is large or unbounded
- You only need a subset of results (especially with first, take, or find)
- You’re chaining multiple filters that eliminate most elements early
- Memory is a constraint

If you’re processing every element and keeping every result, lazy just adds overhead with no benefit. The savings come from not processing elements, so there need to be elements you can skip.
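Rather than take the overhead on faith, measure it on your own data. The sketch below uses Benchmark from the standard library; the iteration count is an arbitrary choice, and the timings will vary with Ruby version and machine, so no expected numbers are shown.

```ruby
require "benchmark"

small = (1..100).to_a
iterations = 20_000   # arbitrary; raise it if the timings are too noisy

eager_time = Benchmark.realtime do
  iterations.times { small.select(&:even?).map { |n| n * 2 }.first(5) }
end

lazy_time = Benchmark.realtime do
  iterations.times { small.lazy.select(&:even?).map { |n| n * 2 }.first(5) }
end

puts format("eager: %.3fs  lazy: %.3fs", eager_time, lazy_time)
```

If the two numbers are close, prefer whichever version reads better.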

Forcing a lazy enumerator

Calling .lazy returns an Enumerator::Lazy object, not an array, and so does every select, map, or reject chained after it. If you need a concrete array and aren’t ending the chain with first, call .to_a or its alias .force:

result = numbers.lazy.select(&:even?).map { |n| n * 2 }
# => #<Enumerator::Lazy: ...>

result.force
# => [4, 8, 12, ...]  (concrete array)

result.to_a
# => same thing

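Be careful with .force on infinite sequences. It will try to materialize the entire sequence and never finish. Always bound an infinite lazy enumerator before materializing it: use first(n), or take(n) followed by .force. Note that take on a lazy enumerator is itself lazy, so it returns another Enumerator::Lazy rather than an array.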
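The difference between take and first is easy to miss, so here is a short sketch on an infinite sequence of even numbers:

```ruby
evens = (1..Float::INFINITY).lazy.select(&:even?)

# take is lazy: it returns another Enumerator::Lazy and does no work yet.
pending = evens.take(3)
p pending.class    # => Enumerator::Lazy

# force (or to_a) materializes it. This is safe because take(3) bounds the sequence.
p pending.force    # => [2, 4, 6]

# first is eager: it returns an Array directly.
p evens.first(3)   # => [2, 4, 6]
```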

Composing with other Enumerable methods

Most Enumerable methods work with lazy enumerators: select, reject, map, flat_map, zip, take, take_while, drop, drop_while, grep, and chunk all return another lazy enumerator. A few methods inherently need the full collection, like sort, sort_by, and min_by. They consume the entire sequence and return a plain array or value, defeating the purpose of lazy.

# This works lazily
data.lazy.select { ... }.map { ... }.take(10)

# This forces full evaluation (sort needs all elements)
data.lazy.select { ... }.sort_by { ... }.take(10)

If you need to sort, do it after you’ve narrowed the dataset down as much as possible. Filter lazily first, then sort the smaller result eagerly.
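As a small sketch of that pattern, the chain below filters and maps lazily, then lets sort_by consume what is left. The word list is made-up sample data.

```ruby
words = %w[banana fig apple kiwi cherry date elderberry grape]

# The lazy steps narrow the data; sort_by then consumes the survivors
# and returns a plain Array, not a lazy enumerator.
sorted = words
  .lazy
  .select { |w| w.length > 4 }
  .map(&:upcase)
  .sort_by { |w| [w.length, w] }   # length first, then alphabetical for ties

p sorted.class      # => Array
p sorted.first(2)   # => ["APPLE", "GRAPE"]
```

Only the five words that pass the filter ever reach the sort.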

A practical pattern: streaming CSV processing

Here’s a real-world pattern for processing large CSV exports without loading the whole file:

require 'csv'

def high_value_orders(csv_path, threshold: 1000)
  CSV.foreach(csv_path)                               # streams parsed rows, handles quoted newlines
    .lazy
    .drop(1)                                          # skip header
    .select { |row| row[4].to_f > threshold }         # amount column
    .map { |row| { id: row[0], customer: row[1], amount: row[4].to_f } }
    .each_slice(100)                                   # batch for DB insert
    .each { |batch| Order.insert_all(batch) }
end

This processes a CSV of any size one row at a time, batching inserts in groups of 100. Memory usage stays bounded by the batch size rather than growing with the file.
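To try the batching pattern without a database, here is a self-contained sketch. It assumes a sample file written with Tempfile, uses CSV.foreach with headers: true so columns can be read by name, and collects batches in a plain array as a hypothetical stand-in for a call like Order.insert_all.

```ruby
require "csv"
require "tempfile"

# Sample export: a header row plus 250 orders, with amount = id * 10.
file = Tempfile.new(["orders", ".csv"])
file.close
CSV.open(file.path, "w") do |csv|
  csv << %w[id customer date status amount]
  1.upto(250) { |i| csv << [i, "customer#{i}", "2024-01-01", "paid", i * 10] }
end

batches = []   # stand-in for a database insert

CSV.foreach(file.path, headers: true)   # without a block, returns an Enumerator
  .lazy
  .select { |row| row["amount"].to_f > 1000 }
  .map    { |row| { id: row["id"], customer: row["customer"], amount: row["amount"].to_f } }
  .each_slice(100)
  .each   { |batch| batches << batch }

p batches.map(&:size)    # => [100, 50]
p batches.first.first    # => {:id=>"101", :customer=>"customer101", :amount=>1010.0}
file.unlink
```

Orders 101 through 250 pass the threshold, so the 150 matching rows arrive as one full batch of 100 and a final batch of 50. The exact inspect format of the hash depends on your Ruby version.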

The one thing to remember

.lazy doesn’t make your code faster in the general case. It makes it possible to work with data that’s too large to fit in memory, and it avoids wasted computation when you only need a fraction of the results. When those conditions apply, it’s the difference between a script that runs and one that crashes.

For everything else, eager enumeration is simpler and faster. Reach for .lazy when you need it, not as a default.