Metadata types with Scala 3

Strongly-typed programming languages allow us to avoid programming errors by lifting important information about data into the type system. By doing this, we can use the compiler to check our programs: the stronger our types, the more the compiler can help us verify that a program is correct.

Using types as metadata was possible in Scala 2, but Scala 3's improvements to tuples and its introduction of match types make the technique considerably more capable.

Metadata types in Scala 2

In Scala 2, you'd frequently use intersections for metadata types. For example, if you had a set, and wanted to indicate what types of data were in it, you could use intersections:

import scala.reflect.ClassTag

sealed trait A
sealed trait B
sealed trait C

class Set[T](contents: Map[Class[_], Any]) {
  def get[U](implicit ev1: T <:< U, ev2: ClassTag[U]) = contents(ev2.runtimeClass)
  def put[U](u: U)(implicit ev1: ClassTag[U]) = new Set[T with U](contents.updated(ev1.runtimeClass, u))
}

object Set {
  val empty = new Set[Any](Map.empty)
}


val test = Set.empty.put(new A{}).put(new B{})

test.get[C] // causes a compile-time failure
test.get[A]

As you can see, the metadata type here is an intersection type: each type added to the Set is also added to the Set's signature, and retrieval depends on proving that the requested type has been added to the Set.

There are a number of weaknesses to this approach to metadata types:

  • Preventing the fetching of non-existent types is frustrating
  • You cannot remove metadata easily from types built up this way

Preventing the fetching of non-existent types

In the above example, the type of test is Set[A with B]. Under this encoding you can also write test.get[A with B]. The evidence A with B <:< A with B is trivially available, so no compile-time error is reported even though no value of that type was ever put in the set.
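To see why the compiler accepts such a call, it helps to isolate the evidence that get asks for. The following is a minimal sketch, written so that it should compile under both Scala 2.13 and Scala 3; the object IntersectionDemo and the helper isSubtype are hypothetical names introduced only for this illustration.

```scala
object IntersectionDemo {
  sealed trait A
  sealed trait B

  // Hypothetical helper: it compiles only when T is a subtype of U,
  // mirroring the `ev1: T <:< U` parameter of `get` above.
  def isSubtype[T, U](implicit ev: T <:< U): Boolean = true

  def main(args: Array[String]): Unit = {
    // Asking for a single member of the intersection is accepted, as intended.
    println(isSubtype[A with B, A])
    // Asking for the whole intersection is accepted too, because every
    // type is a subtype of itself.
    println(isSubtype[A with B, A with B])
    // Even Any is accepted, since every type is a subtype of Any.
    println(isSubtype[A with B, Any])
  }
}
```

All three calls compile, which is exactly the problem: the evidence only says that the metadata type conforms to the requested type, not that a value of the requested type was stored.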

In order to avoid this issue, we must create a wrapper type for our metadata:

import scala.reflect.ClassTag

sealed trait Has[T]
sealed trait A
sealed trait B
sealed trait C

class Set[T](contents: Map[Class[_], Any]) {
  def get[U](implicit ev1: T <:< Has[U], ev2: ClassTag[U]) = contents(ev2.runtimeClass)
  def put[U](u: U)(implicit ev1: ClassTag[U]) = new Set[T with Has[U]](contents.updated(ev1.runtimeClass, u))
}

object Set {
  val empty = new Set[Any](Map.empty)
}


val test = Set.empty.put(new A{}).put(new B{})

test.get[C] // causes a compile-time failure
test.get[A]

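The Has type adds little, and it makes the metadata for Set more cluttered to read and write, but it is necessary to guard against summoning types that were never stored. Because Has is invariant, Has[A] with Has[B] is not a subtype of Has[A with B], so test.get[A with B] no longer compiles.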

Removal of data from the Set

Compared to our put and get methods, our method for removal of data ends up ugly and very unfriendly to our users.

import scala.reflect.ClassTag

sealed trait Has[T]
sealed trait Not[T] 
implicit def notAll[T]: Not[T] = new Not[T] {}
implicit def uhoh[T](implicit ev: T): Not[T] = new Not[T] {}

sealed trait A
sealed trait B
sealed trait C

class Set[T](contents: Map[Class[_], Any]) {
  def get[U](implicit ev1: T <:< Has[U], ev2: ClassTag[U]) = contents(ev2.runtimeClass)
  def put[U](u: U)(implicit ev1: ClassTag[U]) = new Set[T with Has[U]](contents.updated(ev1.runtimeClass, u))
  def remove[U, NT](implicit ev1: T <:< Has[U], ev2: T <:< NT, ev3: ClassTag[U], ev4: Not[NT <:< Has[U]]): Set[NT] = new Set(contents.removed(ev3.runtimeClass))
}

object Set {
  val empty = new Set[Any](Map.empty)
}


val test = Set.empty.put(new A{}).put(new B{})

test.get[C] // causes a compile-time failure
test.remove[B, Has[A]].get[A]
test.remove[B, Has[B]] // causes a cryptic compile-time failure
test.remove[B, Any] // this miscalculation is not caught

As you can see, our remove method requires the user to calculate the metadata type that results from the removal. As the metadata type grows in complexity, this calculation becomes more burdensome and more error-prone. We've added guards to make sure that the calculated type doesn't include the removed type and that the original metadata type is a subtype of the calculated type, but we cannot guard against a calculation that claims more was deleted than actually was, as the last line of the code above shows.

These weaknesses are observable in ZIO and ZIO2, whose environment type allows removal by provideLayer, but which requires the user to calculate the result type after the removal.

Metadata types in Scala 3

Scala 3 provides a lot of new features for types, type-level calculation, and metaprogramming. It also enhances the Tuple type quite a bit, which changes how we should encode metadata types: they should be tuples.

import scala.compiletime.{error, constValue}
import scala.reflect.ClassTag

type Contains[T <: Tuple, U] <: Boolean = T match
  case U *: r => true
  case ? *: r => Contains[r, U]
  case EmptyTuple => false

type Remove[T <: Tuple, U] <: Tuple = T match
  case U *: r => Remove[r, U]
  case t *: r => t *: Remove[r, U]
  case EmptyTuple => EmptyTuple

type Add[T <: Tuple, U] = U *: Remove[T, U]

class MSet[T <: Tuple](content: Map[Class[?], Any]):

  def put[U](u: U)(using ev: ClassTag[U]): MSet[Add[T, U]] =
    MSet(content.updated(ev.runtimeClass, u))

  inline def get[U](using ev: ClassTag[U]) =
    inline if constValue[Contains[T, U]] then
      content(ev.runtimeClass)
    else
      error("This set does not contain the requested type")

  def remove[U](using ev: ClassTag[U]) =
    MSet[Remove[T, U]](content.removed(ev.runtimeClass))

object MSet:
  val empty = MSet[EmptyTuple](Map.empty)

final class A
final class B
final class C
MSet.empty.put(A()).put(B()).remove[C].get[B] //compiles
MSet.empty.put(A()).put(B()).remove[C].get[C] //fails to compile

Using match types, we can deconstruct the tuple that acts as MSet's metadata, which makes it easy to test whether a type is present in the tuple. The result of removing metadata is calculated automatically instead of by the user, and the get method produces a readable error message if a type doesn't exist in the set. Best of all, the set's signature is relatively clean: MSet[(A, B)] contains an A and a B.

A weakness of this approach comes from how match types reduce. If the scrutinee neither matches a case nor is provably disjoint from it, the compiler stops there and never tries the later cases. In practice, this means that with non-final types you risk ending up with a metadata type that the compiler cannot compute.
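The reductions described above can be checked directly at the type level. The following is a small sketch that repeats the match types from the MSet example so that it stands alone; the value names (hasB, hasC, removedA, readdedA) and the matchTypeDemo entry point are made up for this illustration.

```scala
import scala.compiletime.constValue

final class A
final class B
final class C

// The match types from the MSet example, repeated so the sketch stands alone.
type Contains[T <: Tuple, U] <: Boolean = T match
  case U *: r => true
  case ? *: r => Contains[r, U]
  case EmptyTuple => false

type Remove[T <: Tuple, U] <: Tuple = T match
  case U *: r => Remove[r, U]
  case t *: r => t *: Remove[r, U]
  case EmptyTuple => EmptyTuple

type Add[T <: Tuple, U] = U *: Remove[T, U]

// constValue turns a fully reduced Boolean type into a runtime value.
val hasB: Boolean = constValue[Contains[(A, B), B]]
val hasC: Boolean = constValue[Contains[(A, B), C]]

// These lines compile only if both sides reduce to the same tuple type.
// Remove drops every occurrence of A; Add moves A to the front without duplicating it.
val removedA = summon[Remove[(A, B, A), A] =:= (B *: EmptyTuple)]
val readdedA = summon[Add[(A, B), A] =:= (A *: B *: EmptyTuple)]

@main def matchTypeDemo(): Unit =
  println(s"contains B: $hasB, contains C: $hasC")
```

The classes are declared final on purpose: that is what lets the compiler prove that, say, A can never also be a B, so it may move on to the next case.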

Still, this approach is stronger in almost all cases than anything Scala 2 provided.

An example use case: Enhanced builder pattern

An example of the benefits we can see with metadata types is the builder pattern. With the standard builder pattern you know from Java, it's very easy to attempt to construct something only to get a runtime exception declaring that what you were trying to build is malformed. This is the builder pattern's greatest weakness over constructors, and it holds it back when it comes to Scala.

While Scala provides many features that reduce the need for the builder pattern (curried constructors, named parameters, default parameters), the imperfect synergy between these features sometimes still makes the builder pattern necessary.

Metadata types in Scala 3 can help us overcome the weaknesses in the traditional builder pattern, allowing us to use it safely when needed! Let's see how...

Imagine we're trying to build a storage system for data in our program. We have a common API for the storage:

  • load(key: String): Data
  • store(key: String, data: Data): Unit

Our storage can be hosted on a database (for which we need a JDBC URI), on the filesystem (for which we need a path), or on an SFTP server (for which we need a URI). If we're using SFTP, we need to provide a user name and password. If we're using a database, we need to specify whether we need a connection pool and, if so, the class of the connection pool. Finally, for all three forms of storage, we need to set whether it's cached (stores up changes and saves periodically) and, if so, the frequency of fetching from the host.

Let's get started:

trait Data
trait StorageSystem:
  def load(key: String): Data
  def store(key: String, value: Data): Unit

class StorageSystemBuilder[M <: Tuple]

object StorageSystemBuilder:
  sealed trait DatabaseStorage
  sealed trait FilesystemStorage
  sealed trait SFTPStorage
  sealed trait Credentials
  sealed trait ConnectionPoolInfo
  sealed trait CacheInfo

We've set up the API for our storage system, a skeleton for the builder, and a set of metadata tags that we can use to track the state of our builder. Now let's add support for the first build path, filesystem storage.

import scala.compiletime.ops.boolean.{&&, ||}
import scala.compiletime.{constValue, error}
import scala.annotation.targetName
import java.nio.file.Path

class StorageSystemBuilder[M <: Tuple]:
  import StorageSystemBuilder.*

  def setStorage(p: Path): StorageSystemBuilder[
    FilesystemStorage *: RemoveAll[M, Tuple.Concat[SFTPSpecific, DBSpecific]]
  ] = StorageSystemBuilder()

  def setCacheInfo(cached: false): StorageSystemBuilder[CacheInfo *: M] =
    StorageSystemBuilder()

  @targetName("needsCache")
  def setCacheInfo(
    cached: true,
    syncRateInMs: Long
  ): StorageSystemBuilder[CacheInfo *: M] = StorageSystemBuilder()

  inline def build(): StorageSystem =
    inline if constValue[
        SetEquals[M, CompleteFS] || SetEquals[M, CompleteDB] ||
          SetEquals[M, CompleteSFTP]
      ]
    then
      new StorageSystem:
        def load(key: String): Data = ???
        def store(key: String, data: Data) = ()
    else error("Cannot build. The builder is currently in an incomplete state")

object StorageSystemBuilder:
  type SFTPSpecific = (SFTPStorage, Credentials)
  type DBSpecific = (DatabaseStorage, ConnectionPoolInfo)

  type CompleteFS = (FilesystemStorage, CacheInfo)
  type CompleteDB = (DatabaseStorage, ConnectionPoolInfo, CacheInfo)
  type CompleteSFTP = (SFTPStorage, Credentials, CacheInfo)

  type Remove[T <: Tuple, U] <: Tuple = T match
    case U *: t     => Remove[t, U]
    case h *: t     => h *: Remove[t, U]
    case EmptyTuple => EmptyTuple
  type RemoveAll[T <: Tuple, U <: Tuple] <: Tuple = U match
    case h *: t     => RemoveAll[Remove[T, h], t]
    case EmptyTuple => T
  type Contains[T <: Tuple, U] <: Boolean = T match
    case U *: ?     => true
    case ? *: t     => Contains[t, U]
    case EmptyTuple => false

  type IsSubsetOrEqualTo[T <: Tuple, U <: Tuple] <: Boolean = T match
    case h *: t     => Contains[U, h] && IsSubsetOrEqualTo[t, U]
    case EmptyTuple => true

  type SetEquals[T <: Tuple, U <: Tuple] = IsSubsetOrEqualTo[T, U] &&
    IsSubsetOrEqualTo[U, T]

  sealed trait DatabaseStorage
  sealed trait FilesystemStorage
  sealed trait SFTPStorage
  sealed trait Credentials
  sealed trait ConnectionPoolInfo
  sealed trait CacheInfo

We've added a lot here, so let's look at the individual pieces.

SetEquals

  type Contains[T <: Tuple, U] <: Boolean = T match
    case U *: ? => true
    case ? *: t => Contains[t, U]
    case EmptyTuple => false 

  type IsSubsetOrEqualTo[T <: Tuple, U <: Tuple] <: Boolean = T match 
    case h *: t => Contains[U, h] && IsSubsetOrEqualTo[t, U]
    case EmptyTuple => true

  type SetEquals[T <: Tuple, U <: Tuple] = IsSubsetOrEqualTo[T,U] && IsSubsetOrEqualTo[U,T]

SetEquals is a type alias that can be used to check whether two tuple types are equivalent as sets. That is, given the types (A, B, C, D) and (A, A, C, D, B, D), SetEquals returns true. Given (A, B, C) and (A, B, D, C, A), however, it returns false. This type will be helpful for checking whether our builder is in one of its valid states.
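Both comparisons can be evaluated with constValue. This is a sketch under the assumption that A, B, C, and D are final classes (so that the compiler can prove them disjoint); the value names and the setEqualsDemo entry point are invented for the example.

```scala
import scala.compiletime.constValue
import scala.compiletime.ops.boolean.&&

final class A
final class B
final class C
final class D

// Same definitions as in the builder, repeated so the sketch stands alone.
type Contains[T <: Tuple, U] <: Boolean = T match
  case U *: ? => true
  case ? *: t => Contains[t, U]
  case EmptyTuple => false

type IsSubsetOrEqualTo[T <: Tuple, U <: Tuple] <: Boolean = T match
  case h *: t => Contains[U, h] && IsSubsetOrEqualTo[t, U]
  case EmptyTuple => true

type SetEquals[T <: Tuple, U <: Tuple] =
  IsSubsetOrEqualTo[T, U] && IsSubsetOrEqualTo[U, T]

// Same elements, different order and multiplicity: equal as sets.
val sameElements: Boolean =
  constValue[SetEquals[(A, B, C, D), (A, A, C, D, B, D)]]

// The second tuple contains a D that the first one lacks: not equal.
val extraElement: Boolean =
  constValue[SetEquals[(A, B, C), (A, B, D, C, A)]]

@main def setEqualsDemo(): Unit =
  println(s"sameElements = $sameElements, extraElement = $extraElement")
```

Because order and duplicates are ignored, the builder's methods can prepend tags freely without having to normalize the metadata tuple.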

RemoveAll

  type RemoveAll[T <: Tuple, U <: Tuple] <: Tuple = U match
    case h *: t => RemoveAll[Remove[T, h], t]
    case EmptyTuple => T

Given tuple types (A,B,C,D) and (C,A,B), this match type will return Tuple1[D]. This is used to clear the state of our builder in case we want to change it to a different form.
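That reduction can be confirmed with a type-equality check. The sketch below again assumes final classes A to D; onlyD, remaining, and removeAllDemo are hypothetical names, and Tuple.Size is the standard library's type-level tuple length.

```scala
import scala.compiletime.constValue

final class A
final class B
final class C
final class D

// Same definitions as in the builder, repeated so the sketch stands alone.
type Remove[T <: Tuple, U] <: Tuple = T match
  case U *: t => Remove[t, U]
  case h *: t => h *: Remove[t, U]
  case EmptyTuple => EmptyTuple

type RemoveAll[T <: Tuple, U <: Tuple] <: Tuple = U match
  case h *: t => RemoveAll[Remove[T, h], t]
  case EmptyTuple => T

// Compiles only if the match type reduces to the one-element tuple holding D.
val onlyD = summon[RemoveAll[(A, B, C, D), (C, A, B)] =:= (D *: EmptyTuple)]

// The number of element types left after the removal.
val remaining: Int = constValue[Tuple.Size[RemoveAll[(A, B, C, D), (C, A, B)]]]

@main def removeAllDemo(): Unit =
  println(s"element types left: $remaining")
```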

We've added the setStorage method to the builder, which declares the builder to be building storage based on the file system, and clears other options that are not related to file system storage. This isn't strictly necessary, just a nicety. We've also declared two setCacheInfo methods. One only accepts false, and turns off caching, the other only accepts true and a refresh timing parameter. This means that if you don't need caching you don't need to provide a refresh period, and if you do need caching, you don't have to pass the refresh period in a Some value.
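The literal-typed overloads can be tried out in isolation. In this sketch, CacheSettings and CacheConfig are hypothetical stand-ins for the builder, chosen so that the effect of each overload is visible in a returned value.

```scala
import scala.annotation.targetName

// Hypothetical result type: None means caching is off.
final case class CacheSettings(syncRateInMs: Option[Long])

object CacheConfig:
  // Accepts only the literal `false`, so no refresh period is asked for.
  def configure(cached: false): CacheSettings = CacheSettings(None)

  // Accepts only the literal `true`, and then the refresh period is mandatory.
  @targetName("configureCached")
  def configure(cached: true, syncRateInMs: Long): CacheSettings =
    CacheSettings(Some(syncRateInMs))

@main def cacheConfigDemo(): Unit =
  println(CacheConfig.configure(false))
  println(CacheConfig.configure(true, 500L))
  // CacheConfig.configure(true) and CacheConfig.configure(false, 500L)
  // do not compile: no overload accepts those combinations.
```

A caller holding a plain Boolean variable cannot use either overload directly, which is the price of this design: the choice has to be known at compile time.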

We've also added our build method. It combines the compile-time || operator with SetEquals to determine whether the builder is currently in one of the three acceptable states, and produces a compile-time error if it is not.
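The same mechanism can be seen in a much smaller setting. The following sketch uses a hypothetical helper, requireEither, that takes two Boolean types as method type parameters and either returns a string or aborts compilation.

```scala
import scala.compiletime.{constValue, error}
import scala.compiletime.ops.boolean.||

// Hypothetical helper: X and Y are type-level Booleans. `X || Y` is folded
// at compile time, and `inline if` keeps only the branch that applies.
inline def requireEither[X <: Boolean, Y <: Boolean]: String =
  inline if constValue[X || Y] then "at least one condition holds"
  else error("Neither condition holds")

val leftOnly: String = requireEither[true, false]
val rightOnly: String = requireEither[false, true]

// requireEither[false, false] is rejected at compile time
// with the message "Neither condition holds".

@main def requireEitherDemo(): Unit =
  println(leftOnly)
  println(rightOnly)
```

In build(), the role of X and Y is played by the SetEquals checks against CompleteFS, CompleteDB, and CompleteSFTP.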

Now let's add the machinery for the database hosting (JDBCUri is a placeholder class, defined in the complete listing at the end):

  @targetName("dbStorage")
  def setStorage(p: JDBCUri): StorageSystemBuilder[
    DatabaseStorage *: RemoveAll[M, SFTPSpecific]
  ] = StorageSystemBuilder()

  inline def noConnectionPool: StorageSystemBuilder[ConnectionPoolInfo *: M] =
    inline if constValue[Contains[M, DatabaseStorage]]
    then StorageSystemBuilder()
    else error("This setting can only be used with DatabaseStorage")

  inline def setConnectionPool(
    connectionPoolClass: Class[?]
  ): StorageSystemBuilder[ConnectionPoolInfo *: M] =
    inline if constValue[Contains[M, DatabaseStorage]]
    then StorageSystemBuilder()
    else error("This setting can only be used with Database storage")

The changes are relatively few this time. We add a new overload of setStorage that accepts a JDBCUri, and this overload removes the SFTP-specific options from the metadata. There are no filesystem-specific options, so none are cleared. The noConnectionPool and setConnectionPool methods produce compile-time errors if they are used without database storage.

Finally, we add the machinery for the SFTP storage (URI here is java.net.URI):

  @targetName("sftpStorage") 
  def setStorage(p: URI): StorageSystemBuilder[
    SFTPStorage *: RemoveAll[M, DBSpecific]
  ] = StorageSystemBuilder()

  @targetName("sftpStorageComplete")
  def setStorage(p: URI, username: String, password: String): StorageSystemBuilder[SFTPStorage *: Credentials *: RemoveAll[M, DBSpecific]] = StorageSystemBuilder()

  inline def setCredentials(username: String, password: String): StorageSystemBuilder[Credentials *: M] = 
    inline if constValue[Contains[M, SFTPStorage]]
    then StorageSystemBuilder()
    else error("Builder must be in sftp storage mode to use `setCredentials`")

This is much the same as the database pieces, except that we've added a fourth overload of setStorage, which accepts credentials immediately and adds the Credentials tag to our metadata.

Our builder is now set up (the init value used below is defined in the complete listing at the end). Let's see it in action!

StorageSystemBuilder.init.setStorage(JDBCUri()).noConnectionPool.setCacheInfo(false).build()

StorageSystemBuilder.init.noConnectionPool //compile error: "This setting can only be used with DatabaseStorage"

StorageSystemBuilder.init.setStorage(Path.of("bar")).setCredentials("foo", "baz") //compile error: Builder must be in sftp storage mode to use `setCredentials`

StorageSystemBuilder.init.setStorage(JDBCUri()).noConnectionPool.build() //compile error: Cannot build. The builder is currently in an incomplete state

As you can see, our builder is now checked at compile time against invalid configurations. The particular set-up chosen for this example is only one option, and you can structure things quite differently, but this use of metadata types really strengthens builders.

Here's the complete source code for the builder example:

import scala.compiletime.ops.boolean.{&&, ||}
import scala.compiletime.{constValue, error}
import scala.annotation.targetName
import java.nio.file.Path
import java.net.URI

trait Data
trait StorageSystem:
  def load(key: String): Data
  def store(key: String, value: Data): Unit

class JDBCUri

class StorageSystemBuilder[M <: Tuple]:
  import StorageSystemBuilder.*

  def setStorage(p: Path): StorageSystemBuilder[
    FilesystemStorage *: RemoveAll[M, Tuple.Concat[SFTPSpecific, DBSpecific]]
  ] = StorageSystemBuilder()

  @targetName("dbStorage")
  def setStorage(p: JDBCUri): StorageSystemBuilder[
    DatabaseStorage *: RemoveAll[M, SFTPSpecific]
  ] = StorageSystemBuilder()

  @targetName("sftpStorage") 
  def setStorage(p: URI): StorageSystemBuilder[
    SFTPStorage *: RemoveAll[M, DBSpecific]
  ] = StorageSystemBuilder()

  @targetName("sftpStorageComplete")
  def setStorage(p: URI, username: String, password: String): StorageSystemBuilder[SFTPStorage *: Credentials *: RemoveAll[M, DBSpecific]] = StorageSystemBuilder()

  inline def setCredentials(username: String, password: String): StorageSystemBuilder[Credentials *: M] = 
    inline if constValue[Contains[M, SFTPStorage]]
    then StorageSystemBuilder()
    else error("Builder must be in sftp storage mode to use `setCredentials`")

  inline def noConnectionPool: StorageSystemBuilder[ConnectionPoolInfo *: M] =
    inline if constValue[Contains[M, DatabaseStorage]]
    then StorageSystemBuilder()
    else error("This setting can only be used with DatabaseStorage")

  @targetName("usingConnectionPool")
  inline def setConnectionPool(
    connectionPoolClass: Class[?]
  ): StorageSystemBuilder[ConnectionPoolInfo *: M] =
    inline if constValue[Contains[M, DatabaseStorage]]
    then StorageSystemBuilder()
    else error("This setting can only be used with Database storage")

  def setCacheInfo(cached: false): StorageSystemBuilder[CacheInfo *: M] =
    StorageSystemBuilder()

  @targetName("needsCache")
  def setCacheInfo(
    cached: true,
    syncRateInMs: Long
  ): StorageSystemBuilder[CacheInfo *: M] = StorageSystemBuilder()

  inline def build(): StorageSystem =
    inline if constValue[
        SetEquals[M, CompleteFS] || SetEquals[M, CompleteDB] ||
          SetEquals[M, CompleteSFTP]
      ]
    then
      new StorageSystem:
        def load(key: String): Data = ???
        def store(key: String, data: Data) = ()
    else error("Cannot build. The builder is currently in an incomplete state")

object StorageSystemBuilder:
  val init: StorageSystemBuilder[EmptyTuple] = StorageSystemBuilder()
  type SFTPSpecific = (SFTPStorage, Credentials)
  type DBSpecific = (DatabaseStorage, ConnectionPoolInfo)

  type CompleteFS = (FilesystemStorage, CacheInfo)
  type CompleteDB = (DatabaseStorage, ConnectionPoolInfo, CacheInfo)
  type CompleteSFTP = (SFTPStorage, Credentials, CacheInfo)

  type Remove[T <: Tuple, U] <: Tuple = T match
    case U *: t     => Remove[t, U]
    case h *: t     => h *: Remove[t, U]
    case EmptyTuple => EmptyTuple
  type RemoveAll[T <: Tuple, U <: Tuple] <: Tuple = U match
    case h *: t     => RemoveAll[Remove[T, h], t]
    case EmptyTuple => T
  type Contains[T <: Tuple, U] <: Boolean = T match
    case U *: ?     => true
    case ? *: t     => Contains[t, U]
    case EmptyTuple => false

  type IsSubsetOrEqualTo[T <: Tuple, U <: Tuple] <: Boolean = T match
    case h *: t     => Contains[U, h] && IsSubsetOrEqualTo[t, U]
    case EmptyTuple => true

  type SetEquals[T <: Tuple, U <: Tuple] = IsSubsetOrEqualTo[T, U] &&
    IsSubsetOrEqualTo[U, T]

  sealed trait DatabaseStorage
  sealed trait FilesystemStorage
  sealed trait SFTPStorage
  sealed trait Credentials
  sealed trait ConnectionPoolInfo
  sealed trait CacheInfo

Happy Scala hacking!
