Introduction to Scala

Object-Oriented and Functional Programming for the JVM

Created by @jmarinotero

Juan Marin Otero

  • Software engineering for the past 15 years
  • GIS & Open Source
  • Now: Solutions Architect for HMDA Operations
  • Previously:
    • CTO at Boundlessgeo
    • Lead Developer for National Broadband Map at FCC
    • Solutions Engineer at Esri

The rise of functional programming

The free lunch is over

Parallel code is good

Broadband Mobile Coverage

Not just your phone

Modern applications are different

  • Multi-threaded
  • Clustered
  • Multi-datacenter
  • Distributed

Note: I'm not talking about websites, but web applications

The need to scale: Cloud Native Applications

  • Aware of infrastructure, scales resources up and down appropriately
  • Doesn't assume much about users and environment. Expects failure
  • Cost aware, adapts to different runtime conditions
  • Resilient to disaster
  • Can be deployed to multiple infrastructure options
  • Incremental deployment, completely testable

For more, see this

The Reactive Manifesto

Approach to systems architecture to meet modern demands

  • Responsive: The system responds in a timely manner if at all possible
  • Resilient: The system stays responsive in the face of failure
  • Elastic: The system stays responsive under varying workload
  • Message Driven: Reactive systems rely on asynchronous message-passing that ensures loose coupling, isolation, location transparency and provides a means to delegate errors as messages

You can also think about it this way

  • Resources, not hardware
  • Cattle machines, not pet machines
  • Service Oriented Architecture (but this is not your traditional SOA!)

Functional languages favor Immutability

This means no shared mutable state, which makes concurrent code easier to write and reason about

Code can be sent where the data is

Functional languages are also expressive and work well with data structures

Doing some work on an array in JavaScript

var i;
for(i = 0; i < someArray.length; i++) {
  var someThing = someArray[i];
  doSomeWorkOn(someThing);
}

Much better with the underscore.js library:

_.each(someArray, doSomeWorkOn);
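
For comparison, a minimal Scala sketch of the same idea. The names mirror the JavaScript above; the body of `doSomeWorkOn` is a made-up stand-in, since the slides do not say what the work is.

```scala
object ForeachDemo {
  // Hypothetical stand-in for the real work done on each element
  def doSomeWorkOn(thing: String): String = thing.toUpperCase

  val someArray: Array[String] = Array("a", "b", "c")

  def main(args: Array[String]): Unit =
    // The collection drives the iteration, so there is no index bookkeeping
    someArray.foreach(thing => println(doSomeWorkOn(thing)))
}
```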

Some popular functional programming languages

  • Haskell: purely functional, built-in concurrency and parallelism
  • Scala: strongly typed, object-oriented and functional, runs on the JVM
  • Clojure: functional and dynamic, runs on the JVM
  • F#: strongly typed, imperative and functional, runs on .NET

Java is everywhere

  • Widespread in enterprise computing (unless you prefer .NET)
  • Over 1 billion Android devices
  • The JVM has over two decades of solid engineering
  • Scales extremely well (large projects)
  • Unless you are a very good C/C++ developer, it's probably faster than what you are using
  • Very mature open source ecosystem

Java the language is by no means perfect

  • Very verbose
  • Lacks modern features
  • Imperative language, hard to do certain things
  • Great support for concurrency, but very hard to do right
  • Not really a functional language, although functional features are now being introduced
  • Much better as of JDK 8 (lambdas, streams)

Scala has become the main alternative to Java on the JVM

Scala language tour

Some nice features

  • Very compact language
  • Type inference
  • Immutability
  • Pattern matching
  • Collections API
  • Functional composition
  • Extensible language: great for DSL

Defining a class


Java

					
class Point {
    private int x;
    private int y;
 
    public Point(int x, int y) {
      setX(x);
      setY(y);
    }
 
    public int getX() {
      return x;
    }
 
    public void setX(int x) {
      this.x = x;
    }
 
    public int getY() {
      return y;
    }
 
    public void setY(int y) {
      this.y = y;
    }
 
    @Override
    public boolean equals(Object other) {
      if (other instanceof Point) {
        Point otherPoint = (Point) other;
        return otherPoint.getX() == getX() &&
            otherPoint.getY() == getY();
      } else {
        return false;
      }
    }
 
    @Override
    public int hashCode() {
      return java.util.Objects.hash(getX(), getY());
    }
}
					
					

Defining a class


Python

					
class Point:
  def __init__(self, x, y):
    self.x = x
    self.y = y
 
  def __eq__(self, other):
    if isinstance(other, Point):
      return other.x == self.x and other.y == self.y
    else:
      return False
 
  def __hash__(self):
    return hash((self.x, self.y))
	
					
					

Defining a class


Scala


					
case class Point(var x: Int, var y: Int)
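
A small sketch of what that one line provides. The compiler generates `equals`, `hashCode`, `toString` and `copy` for a case class; `PointDemo` is a name made up for this illustration.

```scala
case class Point(var x: Int, var y: Int)

object PointDemo {
  val a = Point(1, 2)
  val b = Point(1, 2)

  def main(args: Array[String]): Unit = {
    println(a == b)                    // structural equality via the generated equals
    println(a.hashCode == b.hashCode)  // generated hashCode agrees with equals
    println(a)                         // generated toString
    println(a.copy(y = 5))             // copy with one field changed
  }
}
```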
					
					

Type Inference

					
val x = 1 + 2 * 3 // The type of x is Int

val y = x.toString // The type of y is String

def add(x:Int) = x + 1 // Add returns Int values
					
					

Immutability

					
val x = 2  // Declare immutable variable

x = 3 // error: reassignment to val
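
Immutability also applies to the default collections. A minimal sketch, with all names chosen for illustration: "modifying" an immutable list or map returns a new value and leaves the original untouched.

```scala
object ImmutabilityDemo {
  val xs = List(1, 2, 3)
  val ys = 0 :: xs            // a new list; xs is not modified

  val m = Map("a" -> 1)
  val m2 = m + ("b" -> 2)     // a new map; m still has one entry

  def main(args: Array[String]): Unit = {
    println(xs)
    println(ys)
    println(m)
    println(m2)
  }
}
```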
					
					

Pattern matching

					
val geometry: Geometry = ...

geometry match {
  case p:Point => println(s"${p.x}, ${p.y}")
  case line:Line => println(line.length)
  case poly:Polygon => println(poly.perimeter)
}
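
A self-contained sketch of the same match, assuming a tiny hypothetical geometry model (the `Geometry`, `Line` and `Polygon` definitions here are illustrative, not a real GIS library). Because the trait is sealed, the compiler can warn when a case is missing.

```scala
sealed trait Geometry
case class Point(x: Double, y: Double) extends Geometry
case class Line(start: Point, end: Point) extends Geometry {
  def length: Double = math.hypot(end.x - start.x, end.y - start.y)
}
case class Polygon(vertices: List[Point]) extends Geometry {
  // Sum of edge lengths, closing the ring back to the first vertex
  def perimeter: Double =
    if (vertices.isEmpty) 0.0
    else (vertices :+ vertices.head)
      .sliding(2)
      .map { case List(a, b) => Line(a, b).length }
      .sum
}

object Describe {
  def describe(g: Geometry): String = g match {
    case Point(x, y) => s"point at $x, $y"                      // destructures the fields
    case l: Line     => s"line of length ${l.length}"           // matches on type
    case p: Polygon  => s"polygon with perimeter ${p.perimeter}"
  }
}
```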
					
					

Collections API

					
val points:List[Point] = ...

val polygon: Polygon = ...

// If point is inside polygon, print it

points.foreach(p => if (polygon.contains(p)) println(p))

// Create a buffer around each point, return that collection of polygons

points.map(p => p.buffer(1.0))

// Buffer points that are inside the polygon

points
  .filter(p => polygon.contains(p))
  .map(p => p.buffer(1.0))
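
A runnable sketch of the same filter and map style. It assumes a hypothetical axis-aligned `Box` in place of a real polygon type, so that `contains` can be implemented in one line.

```scala
case class Point(x: Double, y: Double)

// Hypothetical stand-in for a polygon: an axis-aligned bounding box
case class Box(minX: Double, minY: Double, maxX: Double, maxY: Double) {
  def contains(p: Point): Boolean =
    p.x >= minX && p.x <= maxX && p.y >= minY && p.y <= maxY
}

object CollectionsDemo {
  val points = List(Point(0.5, 0.5), Point(2.0, 2.0), Point(0.1, 0.9))
  val box = Box(0.0, 0.0, 1.0, 1.0)

  // Keep only the points inside the box
  val inside: List[Point] = points.filter(p => box.contains(p))

  // Split into (inside, outside) in a single pass
  val (in, out) = points.partition(p => box.contains(p))

  // Transform the filtered points: here, extract the x coordinate
  val xs: List[Double] = inside.map(_.x)
}
```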

					
					

This conciseness is powerful, but can be addictive and dangerous!

Make sure you understand things like the following before writing them

From 10 Scala One Liners to Impress Your Friends:

					
// Split a list of numbers into two categories based on a criterion

val (passed, failed) = List(49, 58, 76, 82, 88, 90) partition ( _ > 60 )

// Sum list of numbers

(1 to 1000).reduceLeft( _ + _ )

// Sieve of Eratosthenes, algorithm to find all primes up to n

(n: Int) => (2 to n).foldLeft((2 to n).toSet)((ps, x) => if (ps(x)) ps -- (x * x to n by x) else ps)
					
					

Yeah, I know. This is crazy

It's also beautiful if you understand it
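
To unpack the sieve one-liner above, here is a more readable sketch of the same fold. The names `Sieve` and `primesUpTo` are made up for this illustration, and the result is sorted only to make it easy to read.

```scala
object Sieve {
  def primesUpTo(n: Int): List[Int] = {
    // Start with every number from 2 to n as a candidate
    val candidates = (2 to n).toSet

    // Visit each number in order; if it is still a candidate it is prime,
    // so remove its multiples starting at x * x
    val primes = (2 to n).foldLeft(candidates) { (remaining, x) =>
      if (remaining(x)) remaining -- (x * x to n by x)
      else remaining
    }

    primes.toList.sorted
  }
}
```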

Functional Composition

Automatically mark all emails from my mom as read in my inbox

					
emails.filter(_.isNotSpam).filter(_.sender == "Mom").map(_.copy(isRead = true))					    
					
					

Or using a for comprehension

          
for {
  email <- emails
  if email.isNotSpam
  if email.sender == "Mom"
} yield email.copy(isRead = true)
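
A self-contained sketch showing that the chained calls and the for comprehension produce the same result. The `Email` case class is an assumption made for this example; the slides do not define it.

```scala
// Hypothetical model of an email, just enough for the example
case class Email(sender: String, isNotSpam: Boolean, isRead: Boolean = false)

object InboxDemo {
  val emails = List(
    Email("Mom", isNotSpam = true),
    Email("Mom", isNotSpam = false),
    Email("Boss", isNotSpam = true)
  )

  // Chained filter and map calls
  val chained: List[Email] =
    emails.filter(_.isNotSpam).filter(_.sender == "Mom").map(_.copy(isRead = true))

  // The same logic as a for comprehension with guards
  val comprehension: List[Email] =
    for {
      email <- emails
      if email.isNotSpam
      if email.sender == "Mom"
    } yield email.copy(isRead = true)
}
```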
					
					

Domain Specific Languages

A computer programming language of limited expressiveness focused on a particular domain (Martin Fowler, 2010)

Scala is great for defining your own language. Example from ScalaTest:

					
import collection.mutable.Stack
import org.scalatest._

class ExampleSpec extends FlatSpec with Matchers {

  "A Stack" should "pop values in last-in-first-out order" in {
    val stack = new Stack[Int]
    stack.push(1)
    stack.push(2)
    stack.pop() should be (2)
    stack.pop() should be (1)
  }

  it should "throw NoSuchElementException if an empty stack is popped" in {
    val emptyStack = new Stack[Int]
    a [NoSuchElementException] should be thrownBy {
      emptyStack.pop()
    } 
  }
}					
					

Good presentation on the topic

Concurrency and parallelism

This is where Scala really shines

  • Futures
  • Actor Model
  • Reactive Streams
  • High performance web applications

Futures and Promises

Future: a read only object holding a value that might become available at a later time

Promise: a writable, single assignment container which "completes" a future
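
A minimal sketch of how the two fit together, using only the standard library. `PromiseDemo` and its method names are made up for this illustration.

```scala
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object PromiseDemo {
  def run(): Int = {
    val promise = Promise[Int]()              // writable side
    val future: Future[Int] = promise.future  // read-only side

    // Consumers compose on the future before any value exists
    val doubled = future.map(_ * 2)

    // The producer completes the promise exactly once
    promise.success(21)

    // Blocking here only to return a plain value from the demo
    Await.result(doubled, 5.seconds)
  }

  // A second completion attempt is rejected: trySuccess returns false
  def completeTwice(): Boolean = {
    val p = Promise[Int]()
    p.success(1)
    p.trySuccess(2)
  }
}
```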

Scala Futures

Example 1: Blocking future until it returns

					
import scala.concurrent._
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

def main(args: Array[String]): Unit = {
  // Runs asynchronously on the implicit execution context
  val rateQuote = Future {
    connection.getCurrentValue(USD)
  }

  val purchase = rateQuote map { quote =>
    if (isProfitable(quote)) connection.buy(amount, quote)
    else throw new Exception("not profitable")
  }

  Await.result(purchase, 10.seconds) // THIS BLOCKS for up to 10 seconds! (use only for testing)
}
					
					

Scala Futures

Example 2: Callback

				  
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.{Success, Failure}

val f: Future[List[Double]] = Future {
  connection.getCurrentValues(USD, EUR)
}

f onComplete {
  case Success(quotes) => for (quote <- quotes) println(quote)
  case Failure(t) => println("An error has occurred: " + t.getMessage)
}
					
					
Avoid callbacks in nested future calls (callback hell)

Scala Futures

Example 3: For comprehensions and composition

          
val usdQuote = Future { connection.getCurrentValue(USD) }
val chfQuote = Future { connection.getCurrentValue(CHF) }
val purchase = for {
  usd <- usdQuote
  chf <- chfQuote
  if isProfitable(usd, chf)
} yield connection.buy(amount, chf)
purchase foreach { _ =>
  println("Purchased " + amount + " CHF")
}
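
A runnable sketch of the same composition, with the connection replaced by stub functions (`quote`, `isProfitable` and the returned strings are assumptions for illustration). Both futures are created before the for comprehension, so they run concurrently. A failed guard fails the future with a `NoSuchElementException`, which `recover` turns into a fallback value.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object QuoteDemo {
  // Stub for a remote rate lookup (hypothetical values)
  def quote(currency: String): Future[Double] =
    Future { if (currency == "USD") 1.0 else 0.9 }

  def isProfitable(usd: Double, chf: Double): Boolean = chf < usd

  def purchase(): Future[String] = {
    // Both lookups start here, before the for comprehension
    val usdQuote = quote("USD")
    val chfQuote = quote("CHF")

    val result = for {
      usd <- usdQuote
      chf <- chfQuote
      if isProfitable(usd, chf)
    } yield s"bought CHF at $chf"

    // The guard above fails the future when the predicate is false
    result.recover { case _: NoSuchElementException => "no purchase" }
  }
}
```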
					
					

Scala Futures using Async

An easier way to write asynchronous and non blocking code
          
import scala.async.Async.{async, await}

def slowCalcFuture: Future[Int] = ...             // 01
def combined: Future[Int] = async {               // 02
  await(slowCalcFuture) + await(slowCalcFuture)   // 03
}                                                 // 04
val x: Int = Await.result(combined, 10.seconds)   // 05
					
					

In this example, lines 1-4 are non-blocking, but not parallel. To parallelize both computations:

					
def combined: Future[Int] = async {
  val future1 = slowCalcFuture
  val future2 = slowCalcFuture
  await(future1) + await(future2)
}
					
					

Scala Parallel Collections

Example: Reduce elements of an array in parallel (using multiple CPUs)

					
val array = (1 to 100000).toArray
array.par.reduce(_ + _)
					
					

Note: the advantages of this are only visible in large collections

Actor Model

Main implementation in the Akka library

Akka is a toolkit and runtime for building highly concurrent, distributed, and resilient message-driven applications on the JVM.

Actor Model

An actor is a primitive that can make local decisions, create more actors, send more messages, and determine how to respond to the next message received

Very small memory footprint: ~2.5 million actors per GB of heap

Hello World in Akka Actors

					
import akka.actor.{ActorSystem, Actor, ActorLogging, Props }

case class Greeting(who: String)
 
class GreetingActor extends Actor with ActorLogging {
  def receive = {
    case Greeting(who) ⇒ log.info("Hello " + who)
  }
}
 
val system = ActorSystem("MySystem")
val greeter = system.actorOf(Props[GreetingActor], name = "greeter")
greeter ! Greeting("Mr. Smith")
					
					

Two types of messages

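  • Tell: fire and forget, doesn't return anything

    def receive = {
      case Greeting(who) ⇒ log.info("Hello " + who)
    }

    greeter ! Greeting("Mr. Smith")

  • Ask: returns a future

    def receive: Receive = {
      case Greeting(who) => sender() ! "Hello, " + who
    }

    // Requires akka.pattern.ask to be imported and an implicit akka.util.Timeout in scope.
    // ask returns Future[Any]; mapTo narrows it to the expected reply type.
    val f: Future[String] = (greeter ? Greeting("Mr Smith")).mapTo[String]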

Both are asynchronous and non blocking(*)

Actors' killer feature: Supervision

Actors' killer feature: Location Transparency

Remote and local process actors treated the same

Unified programming model for multicore and distributed computing

Example: Distributed Workers

This is my test cluster :-)

Reactive Streams

Reactive Streams is an initiative to provide a standard for asynchronous stream processing with non-blocking back pressure on the JVM.


  • Govern exchange of stream data across asynchronous boundary (thread, different machine)
  • Deal with backpressure in a fully non-blocking and asynchronous manner
  • Fast data producer doesn't overwhelm slow consumer
  • Dynamic push / pull model
  • Collaborative effort by engineers from Typesafe, Oracle, Pivotal, Netflix, Red Hat and others
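
A minimal sketch of demand-based back pressure, using the `java.util.concurrent.Flow` interfaces that mirror the Reactive Streams specification in the JDK (this assumes JDK 9 or later, and it is not the Akka Streams API). The subscriber class and demo object are made up for this illustration. The subscriber asks for one element at a time, so the publisher never pushes more than was requested.

```scala
import java.util.concurrent.{CountDownLatch, Flow, SubmissionPublisher, TimeUnit}
import scala.collection.mutable.ListBuffer

// Hypothetical subscriber that signals demand for a single element at a time
class OneAtATimeSubscriber extends Flow.Subscriber[Int] {
  private var subscription: Flow.Subscription = _
  val received = ListBuffer.empty[Int]
  val done = new CountDownLatch(1)

  override def onSubscribe(s: Flow.Subscription): Unit = {
    subscription = s
    s.request(1)                 // initial demand: one element
  }

  override def onNext(item: Int): Unit = {
    received += item
    subscription.request(1)      // ask for the next element only when ready
  }

  override def onError(t: Throwable): Unit = done.countDown()
  override def onComplete(): Unit = done.countDown()
}

object BackpressureDemo {
  def run(): List[Int] = {
    val publisher = new SubmissionPublisher[Int]()
    val subscriber = new OneAtATimeSubscriber
    publisher.subscribe(subscriber)

    // Items are buffered by the publisher and delivered as demand arrives
    (1 to 5).foreach(i => publisher.submit(i))
    publisher.close()            // signals onComplete after pending items

    subscriber.done.await(5, TimeUnit.SECONDS)
    subscriber.received.toList
  }
}
```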

How this works

Akka Streams Example


import akka.actor.ActorSystem
import akka.stream.ActorFlowMaterializer
import akka.stream.scaladsl.Source

object BasicTransformation {
  def main(args: Array[String]): Unit = {
    implicit val system = ActorSystem("Sys")
    import system.dispatcher
    implicit val materializer = ActorFlowMaterializer()

    val text =
      """|Lorem Ipsum is simply dummy text of the printing and typesetting industry.
         |Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, 
         |specimen book.""".stripMargin

    Source(() => text.split("\\s").iterator).
      map(_.toUpperCase).
      runForeach(println).
      onComplete(_ => system.shutdown())
  }
}
					

High Performance Web Applications with the Play Framework

  • Full MVC web framework
  • Type safety
  • First class REST and JSON support
  • Reactive, uses Akka under the hood
  • Asset compiler (JavaScript, LESS, CSS, etc)

Routes


GET   /     controllers.Application.index()
GET   /foo  controllers.Application.foo()   
					

Controllers


def index() = Action {
  Ok(views.html.index("Scala Play Demo"))
}
					

Views


@(title: String)(content: Html)

<html>
  <head>
    <title>@title</title>
  </head>
  <body>
    @content
  </body>
</html>
					

Websockets

Server


def echoWs = WebSocket.using[String] { request =>
    val (enumerator, channel) = Concurrent.broadcast[String]
    val in = Iteratee.foreach[String](channel.push)
    (in, enumerator)
}
          

Client in CoffeeScript


$ ->
    ws = new WebSocket("ws://localhost:9000/echo")
    ws.onopen = () ->
        ws.send("foo")
    ws.onmessage = (message) ->
        console.log(message.data)					
					

Play Framework Demo

Reactive scales

Big Data with Scala: Spark

Also, it claims to be 10x (on disk) to 100x (in memory) faster than Hadoop MapReduce

Spark: not just Map Reduce

Oh, the heresy. Compiling Scala to JavaScript: Scala.js

Demo

Scala ecosystem

Who uses Scala

  • Internet companies: Twitter, Foursquare, Tumblr, Klout, LinkedIn
  • Online media: The Guardian, The New York Times, The Huffington Post, NBC
  • Large Internet retail: Walmart Canada, Gilt
  • Large business: Verizon, Sony Pictures, Siemens
  • Finance: Bank of America, UBS, Credit Suisse
  • Places where scale matters: Netflix, Coursera
  • Big Data: Spark, Kafka, GearPump
  • Many more

Thank You