Asked  7 Months ago    Answers:  5   Viewed   35 times

In Scala 2.8, there is an object in scala.collection.package.scala:

def breakOut[From, T, To](implicit b : CanBuildFrom[Nothing, T, To]) =
    new CanBuildFrom[From, T, To] {
        def apply(from: From) = b.apply() ; def apply() = b.apply()
 }

I have been told that this results in:

> import scala.collection.breakOut
> val map : Map[Int,String] = List("London", "Paris").map(x => (x.length, x))(breakOut)

map: Map[Int,String] = Map(6 -> London, 5 -> Paris)

What is going on here? Why is breakOut being called as an argument to my List?

 Answers

24

The answer is found on the definition of map:

def map[B, That](f : (A) => B)(implicit bf : CanBuildFrom[Repr, B, That]) : That 

Note that it has two parameters. The first is your function and the second is an implicit. If you do not provide that implicit, Scala will choose the most specific one available.

About breakOut

So, what's the purpose of breakOut? Consider the example given for the question, You take a list of strings, transform each string into a tuple (Int, String), and then produce a Map out of it. The most obvious way to do that would produce an intermediary List[(Int, String)] collection, and then convert it.

Given that map uses a Builder to produce the resulting collection, wouldn't it be possible to skip the intermediary List and collect the results directly into a Map? Evidently, yes, it is. To do so, however, we need to pass a proper CanBuildFrom to map, and that is exactly what breakOut does.

Let's look, then, at the definition of breakOut:

def breakOut[From, T, To](implicit b : CanBuildFrom[Nothing, T, To]) =
  new CanBuildFrom[From, T, To] {
    def apply(from: From) = b.apply() ; def apply() = b.apply()
  }

Note that breakOut is parameterized, and that it returns an instance of CanBuildFrom. As it happens, the types From, T and To have already been inferred, because we know that map is expecting CanBuildFrom[List[String], (Int, String), Map[Int, String]]. Therefore:

From = List[String]
T = (Int, String)
To = Map[Int, String]

To conclude let's examine the implicit received by breakOut itself. It is of type CanBuildFrom[Nothing,T,To]. We already know all these types, so we can determine that we need an implicit of type CanBuildFrom[Nothing,(Int,String),Map[Int,String]]. But is there such a definition?

Let's look at CanBuildFrom's definition:

trait CanBuildFrom[-From, -Elem, +To] 
extends AnyRef

So CanBuildFrom is contra-variant on its first type parameter. Because Nothing is a bottom class (ie, it is a subclass of everything), that means any class can be used in place of Nothing.

Since such a builder exists, Scala can use it to produce the desired output.

About Builders

A lot of methods from Scala's collections library consists of taking the original collection, processing it somehow (in the case of map, transforming each element), and storing the results in a new collection.

To maximize code reuse, this storing of results is done through a builder (scala.collection.mutable.Builder), which basically supports two operations: appending elements, and returning the resulting collection. The type of this resulting collection will depend on the type of the builder. Thus, a List builder will return a List, a Map builder will return a Map, and so on. The implementation of the map method need not concern itself with the type of the result: the builder takes care of it.

On the other hand, that means that map needs to receive this builder somehow. The problem faced when designing Scala 2.8 Collections was how to choose the best builder possible. For example, if I were to write Map('a' -> 1).map(_.swap), I'd like to get a Map(1 -> 'a') back. On the other hand, a Map('a' -> 1).map(_._1) can't return a Map (it returns an Iterable).

The magic of producing the best possible Builder from the known types of the expression is performed through this CanBuildFrom implicit.

About CanBuildFrom

To better explain what's going on, I'll give an example where the collection being mapped is a Map instead of a List. I'll go back to List later. For now, consider these two expressions:

Map(1 -> "one", 2 -> "two") map Function.tupled(_ -> _.length)
Map(1 -> "one", 2 -> "two") map (_._2)

The first returns a Map and the second returns an Iterable. The magic of returning a fitting collection is the work of CanBuildFrom. Let's consider the definition of map again to understand it.

The method map is inherited from TraversableLike. It is parameterized on B and That, and makes use of the type parameters A and Repr, which parameterize the class. Let's see both definitions together:

The class TraversableLike is defined as:

trait TraversableLike[+A, +Repr] 
extends HasNewBuilder[A, Repr] with AnyRef

def map[B, That](f : (A) => B)(implicit bf : CanBuildFrom[Repr, B, That]) : That 

To understand where A and Repr come from, let's consider the definition of Map itself:

trait Map[A, +B] 
extends Iterable[(A, B)] with Map[A, B] with MapLike[A, B, Map[A, B]]

Because TraversableLike is inherited by all traits which extend Map, A and Repr could be inherited from any of them. The last one gets the preference, though. So, following the definition of the immutable Map and all the traits that connect it to TraversableLike, we have:

trait Map[A, +B] 
extends Iterable[(A, B)] with Map[A, B] with MapLike[A, B, Map[A, B]]

trait MapLike[A, +B, +This <: MapLike[A, B, This] with Map[A, B]] 
extends MapLike[A, B, This]

trait MapLike[A, +B, +This <: MapLike[A, B, This] with Map[A, B]] 
extends PartialFunction[A, B] with IterableLike[(A, B), This] with Subtractable[A, This]

trait IterableLike[+A, +Repr] 
extends Equals with TraversableLike[A, Repr]

trait TraversableLike[+A, +Repr] 
extends HasNewBuilder[A, Repr] with AnyRef

If you pass the type parameters of Map[Int, String] all the way down the chain, we find that the types passed to TraversableLike, and, thus, used by map, are:

A = (Int,String)
Repr = Map[Int, String]

Going back to the example, the first map is receiving a function of type ((Int, String)) => (Int, Int) and the second map is receiving a function of type ((Int, String)) => String. I use the double parenthesis to emphasize it is a tuple being received, as that's the type of A as we saw.

With that information, let's consider the other types.

map Function.tupled(_ -> _.length):
B = (Int, Int)

map (_._2):
B = String

We can see that the type returned by the first map is Map[Int,Int], and the second is Iterable[String]. Looking at map's definition, it is easy to see that these are the values of That. But where do they come from?

If we look inside the companion objects of the classes involved, we see some implicit declarations providing them. On object Map:

implicit def  canBuildFrom [A, B] : CanBuildFrom[Map, (A, B), Map[A, B]]  

And on object Iterable, whose class is extended by Map:

implicit def  canBuildFrom [A] : CanBuildFrom[Iterable, A, Iterable[A]]  

These definitions provide factories for parameterized CanBuildFrom.

Scala will choose the most specific implicit available. In the first case, it was the first CanBuildFrom. In the second case, as the first did not match, it chose the second CanBuildFrom.

Back to the Question

Let's see the code for the question, List's and map's definition (again) to see how the types are inferred:

val map : Map[Int,String] = List("London", "Paris").map(x => (x.length, x))(breakOut)

sealed abstract class List[+A] 
extends LinearSeq[A] with Product with GenericTraversableTemplate[A, List] with LinearSeqLike[A, List[A]]

trait LinearSeqLike[+A, +Repr <: LinearSeqLike[A, Repr]] 
extends SeqLike[A, Repr]

trait SeqLike[+A, +Repr] 
extends IterableLike[A, Repr]

trait IterableLike[+A, +Repr] 
extends Equals with TraversableLike[A, Repr]

trait TraversableLike[+A, +Repr] 
extends HasNewBuilder[A, Repr] with AnyRef

def map[B, That](f : (A) => B)(implicit bf : CanBuildFrom[Repr, B, That]) : That 

The type of List("London", "Paris") is List[String], so the types A and Repr defined on TraversableLike are:

A = String
Repr = List[String]

The type for (x => (x.length, x)) is (String) => (Int, String), so the type of B is:

B = (Int, String)

The last unknown type, That is the type of the result of map, and we already have that as well:

val map : Map[Int,String] =

So,

That = Map[Int, String]

That means breakOut must, necessarily, return a type or subtype of CanBuildFrom[List[String], (Int, String), Map[Int, String]].

Tuesday, June 1, 2021
 
hnkk
answered 7 Months ago
76

The performance problem has nothing to do with the way the data is read. It is already buffered. Nothing happens until you actually iterate through the lines:

// measures time taken by enclosed code
def timed[A](block: => A) = {
  val t0 = System.currentTimeMillis
  val result = block
  println("took " + (System.currentTimeMillis - t0) + "ms")
  result
}

val source = timed(scala.io.Source.fromFile("test.txt")) // 200mb, 500 lines
// took 0ms

val lines = timed(source.getLines)
// took 0ms

timed(lines.next) // read first line
// took 1ms

// ... reset source ...

var x = 0
timed(lines.foreach(ln => x += ln.length)) // "use" every line
// took 421ms

// ... reset source ...

timed(lines.toArray)
// took 915ms

Considering a read-speed of 500mb per second for my hard drive, the optimum time would be at 400ms for the 200mb, which means that there is no room for improvements other than not converting the iterator to an array.

Depending on your application you could consider using the iterator directly instead of an Array. Because working with such a huge array in memory will definitely be a performance issue anyway.


Edit: From your comments I assume, that you want to further transform the array (Maybe split the lines into columns as you said you are reading a numeric array). In that case I recommend to do the transformation while reading. For example:

source.getLines.map(_.split(",").map(_.trim.toInt)).toArray

is considerably faster than

source.getLines.toArray.map(_.split(",").map(_.trim.toInt))

(For me it is 1.9s instead of 2.5s) because you don't transform an entire giant array into another but just each line individually, ending up in one single array (Uses only half the heap space). Also since reading the file is a bottleneck, transforming while reading has the benefit that it results in better CPU utilization.

Thursday, August 12, 2021
 
medhybrid
answered 4 Months ago
78

As of Scala 2.13, this is no longer a concern. Breakout has been removed and views are the recommended replacement.


Scala 2.13 Collections Rework

Views are also the recommended replacement for collection.breakOut. For example,

val s: Seq[Int] = ... 
val set: Set[String] = s.map(_.toString)(collection.breakOut)

can be expressed with the same performance characteristics as:

val s: Seq[Int] = ... 
val set = s.view.map(_.toString).to(Set)
Thursday, October 7, 2021
 
Optimus
answered 2 Months ago
98

Scala 2.9:

scala> List(1, "1") collect {
     |   case x: Int => x
     | }
res0: List[Int] = List(1)
Thursday, October 28, 2021
 
Michael
answered 1 Month ago
17

You can achieve do it by using the companion object of ListMap class as followings:

class Bar
class Foo(val name: String, val bar: Bar)
val myList: java.util.List[Foo] = ...
val result = ListMap((for(foo <- myList) yield (foo.name, foo.bar)):_*)
Saturday, November 20, 2021
 
Optimus
answered 1 Week ago
Only authorized users can answer the question. Please sign in first, or register a free account.
Not the answer you're looking for? Browse other questions tagged :  
Share