From Python Generators to Scala Streams

Sharing is caring!

Preface

As a “Pythonista” one cool feature of Python that I was looking for in Scala were generators.
Python generators are a sort of iterators on an incrementally generated collection and they are magically created by using the yield keyword (which is also available in Scala but it has a complete different behavior)*.
For example the following Python code generates as many string objects as the given count on demand (this means that we’ll get only once object in memory on each iteration instead of a collection with multiple objects):

1
2
3
def stream(count):
    for i in range(count):
        yield "string_{}".format(i)

By calling the function above a generator instance will be returned:

1
2
string_generator = stream(10)
print(type(string_generator))

The output will be:

1
<class 'generator'>

By iterating on the returned object:

1
2
for item in string_generator:
    print(item)

We will get the following output:

1
2
3
4
5
6
7
8
9
10
string_0
string_1
string_2
string_3
string_4
string_5
string_6
string_7
string_8
string_9

So… you might wonder “what’s all the fuss about printing strings in a loop?”, the cool thing is that those objects are generated one by one on demand if and only if it’s needed, this means that you can generates bazillion of objects by having only one of them in memory at time.
This can be tested by adding a simple print statement before the yield statement.

1
2
3
4
def stream(count):
    for i in range(count):
        print("Generating new object ({})".format(i))
        yield "string_{}".format(i)

The output of:

1
2
for item in stream(10):
    print(item)

Will be:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
Generating new object (0)
string_0
Generating new object (1)
string_1
Generating new object (2)
string_2
Generating new object (3)
string_3
Generating new object (4)
string_4
Generating new object (5)
string_5
Generating new object (6)
string_6
Generating new object (7)
string_7
Generating new object (8)
string_8
Generating new object (9)
string_9

As you can see by calling stream() you won’t get a preallocated collection of objects but instead you will pick one by one during the iteration. In practice when Python interpreter find a yield inside a loop it does not run it, but instead it turn that into a special class (generator).
Python generators act like iterators, so they can be “traversed” only once (after the first iteration the generator has been consumed and it won’t return any further data).

Let’s get to Scala now…

How can implement the same thing in Scala?
The answer is: by using Stream!
Here is a basic example:

1
2
3
def stream(): Stream[String] = {
  "one" #:: "two" #:: "three" #:: "four" #:: Stream.empty[String]
}

And by calling:

1
stream().foreach(println)

The output will be:

1
2
3
4
one
two
three
four

Ok, but how does it work? Well, basically the sign #:: in plain English means “next in the queue will be”, then at the end a Stream.empty acts as a sentinel value which indicates the end of the stream.
Obviously in the real world an approach like the code above does not make sense since it’s hard-coded. In order to make a stream dynamic like the previous Python example, in Scala we have to use recursion:

1
2
3
4
5
6
7
def stream(count: Int): Stream[String] = {
  if (count > 0) {
    s"string_$count" #:: stream(count - 1) // <- go on (recursion)
  } else {
    Stream.empty[String] // <- end of stream
  }
}

Which by calling stream(10).foreach(println) produces the following output:

1
2
3
4
5
6
7
8
9
10
string_10
string_9
string_8
string_7
string_6
string_5
string_4
string_3
string_2
string_1

As you should have noticed the output is different from the Python implementation, in order to have the same one we need an helper inner function**:

1
2
3
4
5
6
7
8
9
10
11
12
def generate(count: Int): Stream[String] = {
    def stream(current: Int): Stream[String] = {
        if (current < count) {
            s"string_$current" #:: stream(current + 1)
        } else {
            Stream.empty[String]
        }
    }
    stream(0) // <- start the generation from 0
}

In this way the count will be the same (from 0 to 9).

One more thing…

While both Python generators and Scala streams are lazy, in Scala, generated objects are kept in memory and it’s possible to use the same stream multiple times (even though objects are generated only the first time).
In practice Scala streams are actually lazy collections.
Fortunately this can be easily “fixed” by returning the stream iterator instead of the stream itself (here the inner function is still present just to maintain the same order of the Python implementation, but it’s not mandatory in other implementations):

1
2
3
4
5
6
7
8
9
10
11
12
def generate(count: Int): Iterator[String] = {
    def stream(current: Int): Stream[String] = {
        if (current < count) {
            s"string_$current" #:: stream(current + 1)
        } else {
            Stream.empty[String]
        }
    }
    stream(0).iterator
}

So later on:

1
2
3
4
5
val stream: Iterator[String] = generate(10)
while (stream.hasNext) {
    println(stream.next())
}

And in this way we will get one and only one object generated and in memory at time.

Notes

* In Scala yield is used in order to implement the same as Python list comprehension:

1
for (i <- 1 to 10; if i % 2 == 0) yield i

The code above will returns a collection containing only even numbers:

1
Vector(2, 4, 6, 8, 10)

** Honestly speaking I’m not 100% sure that it’s the only way to do it, but the more clean and simple way I figured out.

Final thoughts

The examples that I’ve used are of course dumb, but I think that they are anyway more practical and more useful than the abused Fibonacci sequence generation you can find everywhere.
It’s been hard but after 1 year, I’m finally appreciating Scala and starting to master all the features of the language, in this case streams, which in the end are more flexible than Python generators even though, as usual the design choices of Scala make it hard to understand and remember how things work… for instance: wouldn’t be more readable one of the following alternative syntax:

1
2
3
4
5
"one" then "two" then "three" then Stream.empty[String]
// or
"one" follow "two" follow "three" follow Stream.empty[String]

Instead of:

1
"one" #:: "two" #:: "three" #:: Stream.empty[String]

?
…Fuck yes!

shares