Wednesday, May 23, 2012

Scala Profiling talk at Riviera Scala Clojure Group

Yesterday night I gave a talk about scala profiling at Riviera Scala Closure Group in Sophia Antipolis (
Great Experience !

Here my 25 readers can find the slides:
Scala profiling presentation

Stay tuned for more details

Saturday, May 12, 2012

Scala profiling - part 1 - data structures

This is the first post of a non determined length list about performance profiling of Scala applications. Now most people will say that Scala application are ran in a JVM and compiled to Java byte code, thus the same techniques for Java profiling should be applied. This is mostly true, in fact the same profilers, and the same heap dump analyzers can be safely used for Scala applications, still there are some topics which are specific to Scala applications like:
  • Data structures to be searched for in a heap dump (we are not using the comfortable java.utul.List and so on...)
  • Objects found in the heap dumps for clojures and so on...
  • How do Scala control structure appear in a  profiler
  • How do I configure garbage collection to optimize Scala programs
This post deals with the first topic: given a heap dump, here you will find how to search for Scala Lists, Tuples, Sets, Maps and how to inspect the content.

Taking a heap dump and opening it

This is the first step for  inspecting the heap of a Scala application, and it is not at all different than for Java applications (here the description concerns Oracle JVMs, for other VMs equivalent procedures exist).
You can take it either through a memory analyzer like Eclipse MAT ( (my favourite)) or Visual VM which is included in every Oracle JDK since Java6. Alternatively since Java6 there is the command line tool jmap that allows you to take a dump.
In all these cases you have to know the pid of the Scala application. With jmap you have first to execute jps, which lists java processes with pid on a given machine.

Let's see an example:
after starting the Scala program:
new-host:~ pyppo$ jps
12764 Jps
12763 ExplodingPermGen
our pid is 12763, so we can get a heap dump through
jmap -dump:live,format=b,file=heap.bin 12763

If you prefer a visual approach, both MAT and visualvm provide you the list of running java processes when you try to acquire the dump (File->Acquire Heap Dump on MAT).

Now we have a file (normally with hprof extension) that we can open with MAT, visualvm or most of Java profilers.

Scala lists and arrays in a heap dump

Now that we have opened our heap dump we may be interested in searching lists, arrays and list wrappers there inside.
Here is a screenshot of our opened heap dump which shows all local variables of our main: <Java local> marked elements referenced by the main Thread object.

We can first search for arrays. This is easy. Arrays in Scala are pure Java arrays, thus no difference in the way we search for them than java. Here we can see an empty String array (java.lang.String[0]).
Something more interesting are immutable lists, build like this:
 var listOut = List(1,2,3)
This kind of lists are immutable linked lists and the corresponding java class (the one we have to search for in the dump) is: scala.collection.immutable.$colon$colon. This represent the head of the list. It contains two attributes: tl of the same type which is the pointer to the next element and scala$collection$immutable$$colon$colon$$hd which is the reference to the contained element.

Since the list is immutable, the :: and ::: Scala operators to create a new list adding an element to an existing list and to concatenate lists are very fast. No copy of elements between lists is done. Adding an element in front simply create a new list with the new element as head and the reference to the previous list as tail. This also means that if the objects in the list are mutable,  modifying the object will touch both the initial list and the newly built one.

A mutable flavour of Lists in Scala is ListBuffer, These instances can be found as scala.collection.mutable.ListBuffer, This wraps an immutable list defined through the previous data structure, but it keeps also a reference to the tail of the list through last0 attribute. This allows to attach new elements to the end of the list, making it mutable.

Maps and sets

Both Sets and Maps in Scala have two different kind of internal structure depending on the number of elements they contain (for immutable version). Under 5 elements the class implementing the Map/Set contains an attribute per element. While over this limit the underlying data structure is an array.
Let's first consider immutable Sets and Maps unders five elements. They are straightforward in structure and they are respectively managed through inner classes of scala.collection.immutable.Map and scala.collection.immutable.Set. For maps we have $Map1 to $Map5. Same for Sets. Both these data structures contain a list of attributes called valueN for maps and elemN for Sets.

For bigger Sets we have instances of scala.collection.immutable.HashSet$HashTrieSet. This contain an array of scala.collection.immutable.HashSet$HashSet1 each of which contains a direct reference to the element of the set and an integer representing the hash.

For mutable Sets and Maps there is no difference in the data structure between big and small collecitons. They are represented respectively through scala.collection.mutable.HashSet and scala.collection.mutable.HashMap. In both the cases the underlying data structure is an array. The difference is the content of the array: for the HashSet, the elements in the array are directly the elements provided by the user, while for the HashMap, these are wrapped scala.collection.mutable.DefaultEntry, each of which contains the key, the value and a reference to the next element in the same position in case of hash collision.


Tuples have a straightforward structure, which reminds the one of the sets of small number of elements. They are implemented through scala.Tuple[CARDINALITY] where CARDINALITY is the size of the tuple. Inside they contain a private final attribute per element. These attrubte names are _NUM where NUM is the sequential number of the element inside the tuple.

.... stay tuned for the next data structures ....