Introducing Fetch: A new Scala library for simple and efficient data access

by Alejandro Gomez • May 24, 2016

We’re excited to bring you a new Scala library for simplifying and optimizing access to data such as file systems, databases, and web services. Fetch, built on the Cats Free monad, makes code that reads data from remote sources both simpler and more efficient.

These types of data sources usually have a latency cost, and that means we often have to trade code clarity for performance when querying them. We can easily end up with business logic that is cluttered with explicit synchronization or with optimizations such as caching and batching.

Fetch can automatically request data from multiple sources concurrently, batch multiple requests to the same data source, and cache previous requests’ results without the user having to use any explicit concurrency constructs.

It does so by separating data fetch declaration from execution, building a tree of data dependencies in which you express concurrency with applicative composition and sequential dependency with monadic bind. It borrows heavily from the Haxl (Haskell, open sourced) and Stitch (Scala, not open sourced) projects.

Ready to see Fetch in action?

This example covers fetching data for rendering a blog. We’ll use the following types to represent users, articles, and article metadata.

type UserId = Int
case class User(id: UserId, username: String)

type ArticleId = Int
case class Article(id: ArticleId, author: UserId, content: String)
case class ArticleInfo(topic: String)

First off, we need a way to tell Fetch how to fetch that data. We can accomplish this by implementing the DataSource trait.

trait DataSource[Identity, Result]{
  def fetch(ids: NonEmptyList[Identity]): Eval[Map[Identity, Result]]
}

A DataSource takes two type parameters:

Identity: the type of the identity we want to fetch (UserId and ArticleId in our example)
Result: the type of the data we retrieve (User, Article, and ArticleInfo in our example)

The fetch method takes a non-empty list of identities and must return an Eval that will result in a map from identities to results. Accepting a list of identities gives Fetch the ability to batch requests to the same data source, and returning a map from identities to results lets Fetch detect when an identity cannot be fetched or no longer exists.
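To make the last point concrete, here is a minimal sketch of how missing identities could be spotted by comparing what was requested against the keys of the returned map. It uses only the Scala standard library; the object MissingIdentities and the helper missing are hypothetical names for illustration, not part of Fetch’s API, and Fetch’s own check may work differently.

```scala
object MissingIdentities {
  // Hypothetical helper (not part of Fetch): lists the requested identities
  // that are absent from the map a data source returned.
  def missing[I, R](requested: List[I], fetched: Map[I, R]): List[I] =
    requested.filterNot(fetched.contains)

  // A simulated response in which identity 4 was not found.
  val fetched: Map[Int, String] = Map(1 -> "@one", 2 -> "@two")

  val notFound: List[Int] = missing(List(1, 2, 4), fetched)
}
```

Because the result is keyed by identity, an absent key is enough to tell that an identity could not be resolved; no sentinel values are needed.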

For the process of fetching users, we know that the response type must be User and the request type UserId. Let’s implement a data source for fetching users; we’ll simulate a database with an in-memory map.

import cats.Eval
import cats.data.NonEmptyList
import cats.std.list._

import fetch._

val userDatabase = Map(
  1 -> User(1, "@one"),
  2 -> User(2, "@two"),
  3 -> User(3, "@three")
)

implicit object UserSource extends DataSource[UserId, User]{
  override def fetch(ids: NonEmptyList[UserId]): Eval[Map[UserId, User]] = {
    Eval.later({
      println(s"Fetching users $ids")
      userDatabase.filterKeys(ids.unwrap.contains)
    })
  }
}

Now that we have a data source, we can write a function for fetching users given an id. We just have to pass a UserId as an argument to Fetch.

def getUser(id: UserId): Fetch[User] = Fetch(id) // or, more explicitly: Fetch(id)(UserSource)

Creating and running a fetch

Now that we’ve told Fetch how to request users, we can start creating fetches. Note that creating a fetch doesn’t actually request data; we have to run the fetch to obtain the results.

val fetchUser: Fetch[User] = getUser(1)

A Fetch is just a value, and in order to get something out of it, we must execute it. We can execute a Fetch value as many times as we want, even to different target monads, since it is just an immutable value.
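The idea of a fetch being "just a value" can be illustrated with a toy sketch that does not use Fetch at all. The Description type, lookupUser, and the lookups counter below are hypothetical stand-ins built on the standard library; they only show that building a description performs no work and that the same description can be run more than once.

```scala
object DescriptionAsValue {
  // Hypothetical stand-in for a fetch: it only wraps a thunk, so building
  // one performs no work until unsafeRun is called.
  final case class Description[A](unsafeRun: () => A) {
    def map[B](f: A => B): Description[B] =
      Description(() => f(unsafeRun()))
    def flatMap[B](f: A => Description[B]): Description[B] =
      Description(() => f(unsafeRun()).unsafeRun())
  }

  // Counts simulated data-source hits.
  var lookups = 0

  def lookupUser(id: Int): Description[String] =
    Description(() => { lookups += 1; s"user-$id" })

  // Building this value does not touch the counter.
  val twoUsers: Description[(String, String)] = for {
    a <- lookupUser(1)
    b <- lookupUser(2)
  } yield (a, b)
}
```

Calling DescriptionAsValue.twoUsers.unsafeRun() performs the two simulated lookups; calling it again performs them again, since the description itself is never consumed.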

We need to provide a target monad when executing a fetch. We’ll be using Eval for now. Make sure to import fetch.implicits._ since Fetch needs an instance of MonadError[Eval, Throwable] for running a fetch in the Eval monad.

import fetch.implicits._

val result: User = Fetch.run[Eval](fetchUser).value
// Fetching users OneAnd(1,List())
//=> result: User = User(1,@one)

Sequencing

When we have two fetches that depend on each other, we can use flatMap to combine them. When composing fetches with flatMap we’re telling Fetch that the second one depends on the previous one, so it isn’t able to make any optimizations.

val fetchTwoUsers: Fetch[(User, User)] = for {
  aUser <- getUser(1)
  anotherUser <- getUser(aUser.id + 1)
} yield (aUser, anotherUser)

val result: (User, User) = Fetch.run[Eval](fetchTwoUsers).value
// Fetching users OneAnd(1,List())
// Fetching users OneAnd(2,List())

//=> result: (User, User) = (User(1,@one),User(2,@two))

As you can see in the console output, when executing the fetch, the queries are performed in two rounds. First, we fetch the user with id 1 and then fetch the user with id 2.

Batching

Whenever we use Cartesian or Applicative operations over multiple fetches, Fetch knows that those fetches are independent and is able to optimize the resulting fetch. If we combine two independent requests to the same data source, Fetch will automatically batch them together into a single request.

import cats.syntax.cartesian._

val fetchTwoUsers: Fetch[(User, User)] = getUser(1).product(getUser(2))

val result: (User, User) = Fetch.run[Eval](fetchTwoUsers).value
// Fetching users OneAnd(1,List(2))

//=> result: (User, User) = (User(1,@one),User(2,@two))

As you can see in the console output, the requests for the users with ids 1 and 2 were batched. Instead of asking the data source for two independent results, we save some work by requesting them all at once.

Deduplication

If two independent requests ask for the same identity, Fetch will detect that and deduplicate the id, so you don’t have to worry about it.

val fetchTwoUsers: Fetch[(User, User)] = getUser(1).product(getUser(1))

val result: (User, User) = Fetch.run[Eval](fetchTwoUsers).value
// Fetching users OneAnd(1,List())

//=> result: (User, User) = (User(1,@one),User(1,@one))
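
Batching and deduplication of independent requests can be pictured with a small model that is separate from Fetch. In this sketch, which assumes nothing beyond the standard library, a hypothetical Plan records the set of ids it needs and a function that builds its result from the fetched rows. Combining two plans merges their id sets, so one request is sent and repeated ids collapse. This is not Fetch’s real representation, only an illustration of the idea.

```scala
object BatchingSketch {
  // Hypothetical model (not Fetch's representation): a plan records the ids
  // it needs plus a function that builds its result from the fetched rows.
  final case class Plan[A](ids: Set[Int], build: Map[Int, String] => A) {
    // Independent plans are combined by merging their id sets.
    def product[B](that: Plan[B]): Plan[(A, B)] =
      Plan(ids ++ that.ids, rows => (build(rows), that.build(rows)))
  }

  def user(id: Int): Plan[String] = Plan(Set(id), rows => rows(id))

  // Log of the batches sent to the simulated source.
  var requests: List[Set[Int]] = Nil

  val database = Map(1 -> "@one", 2 -> "@two", 3 -> "@three")

  def source(ids: Set[Int]): Map[Int, String] = {
    requests = requests :+ ids
    database.filter { case (id, _) => ids.contains(id) }
  }

  // Running a plan sends exactly one request for all the ids it needs.
  def run[A](plan: Plan[A]): A = plan.build(source(plan.ids))
}
```

Running user(1).product(user(2)) logs the single batch Set(1, 2), while user(1).product(user(1)) logs Set(1): the duplicate id never reaches the source.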

Caching

During the execution of a fetch, previously requested results are implicitly cached. This allows us to write fetches in a very modular way, asking for all the data they need as if it were in memory. It also avoids refetching an identity that may have changed during the course of a fetch execution, which could lead to inconsistencies in the data.

val fetchCached: Fetch[(User, User)] = for {
  aUser <- getUser(1)
  anotherUser <- getUser(1)
} yield (aUser, anotherUser)

val result: (User, User) = Fetch.run[Eval](fetchCached).value
// Fetching users OneAnd(1,List())

//=> result: (User, User) = (User(1,@one),User(1,@one))

As you can see in the console output, the user with id 1 was fetched only once, in a single round-trip. The next time it was needed, we used the cached version, thus avoiding another request to the user data source.
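
As a rough illustration of this behaviour, the sketch below wraps a simulated data source in a cache and counts round-trips. CachedSource, roundTrips, and newSource are hypothetical names, and the code relies only on the standard library; it is a simplified picture of caching within one execution, not Fetch’s actual cache.

```scala
import scala.collection.mutable

object CachingSketch {
  // Hypothetical per-run cache (not Fetch's implementation).
  final class CachedSource[I, R](underlying: List[I] => Map[I, R]) {
    private val cache = mutable.Map.empty[I, R]
    var roundTrips = 0

    def fetch(ids: List[I]): Map[I, R] = {
      // Only identities that were never seen reach the underlying source.
      val uncached = ids.distinct.filterNot(cache.contains)
      if (uncached.nonEmpty) {
        roundTrips += 1
        cache ++= underlying(uncached)
      }
      // Answer from the cache, skipping identities that were never found.
      ids.flatMap(id => cache.get(id).map(r => id -> r)).toMap
    }
  }

  val database = Map(1 -> "@one", 2 -> "@two", 3 -> "@three")

  def newSource(): CachedSource[Int, String] =
    new CachedSource[Int, String](ids =>
      database.filter { case (id, _) => ids.contains(id) })
}
```

Fetching id 1 twice through the same CachedSource leaves roundTrips at 1, and a later request for ids 1 and 2 only asks the underlying source for id 2.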

Combining data from multiple sources

Now that we know some of the optimizations Fetch can create to read data efficiently, let’s look at how we can combine more than one data source. First, we’ll add the two data sources we’re missing:

one for retrieving an Article given its id
another for retrieving an article’s metadata (ArticleInfo) given its id

Let’s give it a try:

val articleDatabase: Map[ArticleId, Article] = Map(
  1 -> Article(1, 2, "An article"),
  2 -> Article(2, 3, "Another article"),
  3 -> Article(3, 4, "Yet another article")
)

implicit object ArticleSource extends DataSource[ArticleId, Article]{
  override def fetch(ids: NonEmptyList[ArticleId]): Eval[Map[ArticleId, Article]] = {
    Eval.later({
      println(s"Fetching articles $ids")
      articleDatabase.filterKeys(ids.unwrap.contains)
    })
  }
}

def getArticle(id: ArticleId): Fetch[Article] = Fetch(id)

val articleInfoDatabase: Map[ArticleId, ArticleInfo] = Map(
  1 -> ArticleInfo("monad"),
  2 -> ArticleInfo("applicative"),
  3 -> ArticleInfo("monad")
)

implicit object ArticleInfoSource extends DataSource[ArticleId, ArticleInfo]{
  override def fetch(ids: NonEmptyList[ArticleId]): Eval[Map[ArticleId, ArticleInfo]] = {
    Eval.later({
      println(s"Fetching article info $ids")
      articleInfoDatabase.filterKeys(ids.unwrap.contains)
    })
  }
}

def getArticleInfo(id: ArticleId): Fetch[ArticleInfo] = Fetch(id)

We can also implement a function for fetching an article’s author given an article:

def getAuthor(p: Article): Fetch[User] = Fetch(p.author)

Now that we have multiple sources, let’s mix them in the same fetch:

val fetchArticleAndAuthor: Fetch[(Article, User)] = for {
  article <- getArticle(1)
  user <- getAuthor(article)
} yield (article, user)

val result: (Article, User) = Fetch.run[Eval](fetchArticleAndAuthor).value
// Fetching articles OneAnd(1,List())
// Fetching users OneAnd(2,List())

//=> result: (Article, User) = (Article(1,2,An article),User(2,@two))

In the previous example, we fetched an article given its id and then fetched its author. This information could come from entirely different places, but Fetch makes working with heterogeneous sources of data very easy.

Concurrency

Earlier we saw an example of combining two fetches from the same data source and learned how the library can batch multiple requests into one. But what happens when you’re combining fetches from different data sources? The only possible optimization is to query both data sources at the same time, and that’s what Fetch does.

In the following example, we are fetching from different data sources so both requests will be evaluated together.

val fetchArticleAndUser: Fetch[(Article, User)] = getArticle(1).product(getUser(2))

val result: (Article, User) = Fetch.run[Eval](fetchArticleAndUser).value
// Fetching articles OneAnd(1,List())
// Fetching users OneAnd(2,List())

//=> result: (Article, User) = (Article(1,2,An article),User(2,@two))

Since we are interpreting the fetch to the Eval monad, which doesn’t give us any parallelism, the fetches will be run sequentially. However, if we interpret it to a Future, each independent request will run in its own logical thread, minimizing the time we have to wait for a fetch round to finish.
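
To give a feel for what running independent requests as Futures looks like, here is a sketch that uses plain scala.concurrent.Future rather than Fetch. The slowLookup helper and its 100 ms delay are hypothetical stand-ins for two remote data sources. Both futures are started before they are combined, so they can overlap when the execution context has threads available.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object ConcurrencySketch {
  // Hypothetical slow lookup standing in for a remote data source.
  def slowLookup[A](value: A)(implicit ec: ExecutionContext): Future[A] =
    Future { Thread.sleep(100); value }

  def articleAndUser()(implicit ec: ExecutionContext): Future[(String, String)] = {
    // Both futures are created before zipping, so neither waits for the other.
    val article = slowLookup("An article")
    val user = slowLookup("@two")
    article.zip(user)
  }

  // Blocks for the combined result; fine for a demo, avoid in production code.
  def demo(): (String, String) = {
    import ExecutionContext.Implicits.global
    Await.result(articleAndUser(), 5.seconds)
  }
}
```

Had the second lookup been created inside a flatMap on the first, it would only start once the first had finished, which mirrors the difference between sequencing and independent composition described earlier.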

Conclusion

Fetch can simplify the code that reads data from remote sources and increase its efficiency at the same time, freeing the programmer from implementing ad-hoc optimizations. It does so by separating data dependency declaration from execution. Fetch is implemented using the Free Monad and Interpreter pattern and based on the Cats library, although Scalaz support is also planned.

Take a look at the following resources if you are curious about Fetch:

Code on GitHub
Documentation site
Fetch: Simple & Efficient data access talk at Typelevel Summit in Oslo

If you have questions on Fetch or any other Scala library, connect with us, and we’d be happy to help out.