Crossing those hills

Last time, we looked at two potential encodings of C's platform dependent type long. Today we'll renew our search, but with the help of opaque types.

Opaque types

Opaque types are a new form of type in Scala 3. They are types that exist only at compile-time, erasing to their underlying type at runtime. Take the following example:

opaque type Meter = Int

object Meter: 
  def apply(meters: Int): Meter = meters

This defines an opaque type called Meter that is backed by Int at runtime. Within the scope in which an opaque type is defined, it is merely an alias for the type that backs it at runtime. This means that within this file, Meter and Int are the same type, but outside of it, Meter is its own type with no relation to the Int type.
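To make the scoping rule concrete, here is a minimal sketch. The enclosing object Lengths, the double method and the value extension are names assumed for illustration; they are not part of the original example.

```scala
object Lengths:
  opaque type Meter = Int

  object Meter:
    def apply(meters: Int): Meter = meters

    // Hypothetical accessor so code outside Lengths can read the value.
    extension (m: Meter) def value: Int = m

  // Inside Lengths, Meter is just an alias for Int, so Int arithmetic works.
  def double(m: Meter): Meter = m * 2

// Outside Lengths, Meter is opaque and unrelated to Int.
val five: Lengths.Meter = Lengths.Meter(5)
val ten: Int = Lengths.double(five).value

// Neither of these would compile outside Lengths:
// val asInt: Int = five
// val fromInt: Lengths.Meter = 5
```

Wrapping the definition in an object simply makes the boundary of the defining scope easier to see than it is with a top-level definition.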

Opaque types can be used to enrich basic data types with information. For example, one could define a PositiveInt opaque type, and make it only instantiable after testing that an Int passed in to its apply method is positive:

opaque type PositiveInt = Int

object PositiveInt:
  def apply(int: Int): PositiveInt = 
    if int > 0 then int
    else throw new Error("Not positive!")
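If throwing is undesirable, the same check can be exposed through an Option instead. The following is only a sketch of that variant: the from and value names are assumptions for illustration, and it treats "positive" as strictly greater than zero.

```scala
opaque type PositiveInt = Int

object PositiveInt:
  // Hypothetical non-throwing constructor: None for zero and negatives.
  def from(int: Int): Option[PositiveInt] =
    if int > 0 then Some(int) else None

  // Hypothetical accessor to get the underlying Int back.
  extension (p: PositiveInt) def value: Int = p

val three: Option[PositiveInt] = PositiveInt.from(3)
val zero: Option[PositiveInt] = PositiveInt.from(0)
```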

Opaque types are also useful for implementing lightweight new types, acting as an alternative to classes that extend AnyVal in Scala 2. In this case, we want to have types that represent platform dependent types without requiring unboxing to use with MethodHandles generated by java.lang.foreign. Opaque types could be perfect for this because they erase at runtime to their underlying type, and the standard java.lang types work perfectly with foreign's MethodHandles.

CVal

To start with, let's define an opaque type called CVal. This type will act as the base type for our platform dependent types, and will house methods that should be available for all platform dependent types.

opaque type CVal = Matchable

object CVal:
  def apply(a: Matchable): CVal =
    a

In this snippet, CVal is defined as being backed by Matchable because it's the lowest level type that is understandable by the JVM in Scala 3. In Scala 2, Any would've been used, but Any technically includes opaque types and other types that exist outside of the JVM's understanding.

Let's add a method to turn a CVal back into a regular type:

opaque type CVal = Matchable

object CVal:
  def apply(a: Matchable): CVal =
    a

  extension (cval: CVal)
    def as[A <: Matchable]
        : Option[A] =
      cval match
        case a: A => Some(a)
        case _    => None

When one tries to compile this code, they'll get a compiler warning: "the type test for A cannot be checked at runtime because it refers to an abstract type member or type parameter".

In order to avoid this, one can have the method use a TypeTest, a new type class in Scala 3.

import scala.reflect.TypeTest

opaque type CVal = Matchable

object CVal:
  def apply(a: Matchable): CVal =
    a

  extension (cval: CVal)
    def as[A <: Matchable](using
        TypeTest[Matchable, A]
    ): Option[A] =
      cval match
        case a: A => Some(a)
        case _    => None

Now the as method definition compiles without warnings. This is because a TypeTest carries the information necessary to determine at runtime whether a Matchable is actually an A, and the compiler supplies that instance at the call site, where A is known. However, TypeTest can be an inefficient way to test this, as it creates and tosses away an Option in the process of matching. We may toss it aside for .isInstanceOf and .asInstanceOf later on, but at least we have the ability to check this without using those methods.
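The behaviour of TypeTest can be tried outside of CVal with a small sketch. The narrow helper below is hypothetical and exists only for illustration; it calls the TypeTest's unapply method directly instead of going through a pattern match.

```scala
import scala.reflect.TypeTest

// Hypothetical helper, not part of CVal: narrows a Matchable to A when the
// runtime value passes the TypeTest the compiler supplies at the call site.
def narrow[A <: Matchable](value: Matchable)(using
    tt: TypeTest[Matchable, A]
): Option[A] =
  tt.unapply(value)

val asInt: Option[Int] = narrow[Int](5)          // Some(5)
val asString: Option[String] = narrow[String](5) // None
val asLong: Option[Long] = narrow[Long](5)       // None: a boxed Int is not a Long
```

The last line is worth keeping in mind: the test looks at the runtime value, so an Int stored behind a Matchable will never pass a test for Long.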

At least, all of this works in theory. In reality, it's best to unit test these things to make sure there's no deficiency in our code that the compiler did not catch. I've written the following unit test to check the as method on CVal:

class CValDemonstration
    extends munit.FunSuite:
  test("demo 1") {
    val cval = CVal(5)

    cval.as[5]

    assertEquals(
      cval.as[Int],
      Some(5)
    )
  }

There are no warnings or errors when compiling this code, and when I run the test, Some(5) and cval.as[Int] are recognized as equal to each other. So far so good.

CIntegral

The next type we'll define is CIntegral. We'll make the platform dependent types that are known to be integral on all platforms into subtypes of this type, and it will be home to common methods and logic for integral platform dependent types.

opaque type CIntegral <: CVal =
  CVal

object CIntegral:
  def apply(a: Long): CIntegral =
    CVal(a)
  extension (
      cintegral: CIntegral
  )
    def asLong =
      cintegral.as[Long].get

One thing you might notice is <: CVal = CVal in our opaque type definition. This means that CIntegral erases to a CVal at runtime, but is considered a subtype of CVal at compile time. It also means that unlike CVal, which has no relation to its runtime type of Matchable except in its defining scope, CIntegral has a subtyping relationship with CVal and can be used wherever a CVal is expected.

This subtyping relationship is more for the sharing of method definitions between the two than for the ability to use CVal as the Any of platform dependent types. This is particularly important to keep in mind because it's very difficult, if not impossible, to retrieve a subtype of an opaque type after you've cast that type information away. Unlike normal JVM types that carry type information around with them at runtime, opaque types cease to exist once compilation is done, so if their type is not based on some runtime data, you cannot retrieve the child type of an opaque type.
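A small sketch of that one-way street, using stand-in types rather than CVal and CIntegral. Tags, Base, Child and raw are all invented names for illustration.

```scala
object Tags:
  opaque type Base = Long
  opaque type Child <: Base = Base

  object Base:
    def apply(l: Long): Base = l

    // Hypothetical accessor to read the underlying Long.
    extension (b: Base) def raw: Long = b

  object Child:
    def apply(l: Long): Child = l

val child: Tags.Child = Tags.Child(1L)

// Upcasting is free: Child is a subtype of Base.
val base: Tags.Base = child

// Going back down cannot be checked. Both values are a plain Long at
// runtime, so a pattern such as `case c: Tags.Child` has no runtime
// information to test against.
```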

Anyway, this definition of CIntegral allows Long values to be redefined as CIntegrals via the apply method, and allows us to convert any CIntegral into a Long via the asLong extension method. By making the sole "constructor" of CIntegral types only accept a Long, we effectively make CIntegral backed by Long values. Consequently, that makes the assumption that as[Long] will return Some practically certain. This is more of a de facto type safety than the de jure type safety that we'd want; if the definition of our patterns changes, then our code can become unsafe without the compiler warning us.

We can try to come up with something safer as our experiments continue, should they show promise.

Third try at CLong

Now that we have CIntegral defined, let's take a stab at implementing CLong. We'll define it as an opaque subtype of CIntegral.

opaque type CLong <: CIntegral = CIntegral

object CLong: 
  def apply(i: Int): CLong = 
    CIntegral(i.toLong)

In the above snippet I've declared an apply method for CLong which accepts Int values. The reason is that the C specification defines long to be at least 32 bits wide. Since that's the minimum the specification allows, any Int should fit inside a signed long on any platform. Non-conformant implementations of C exist, but we can't program our way around that madness.

Int isn't the only integral type we'd like to be able to instantiate a CLong from. An apply method for Long would be great if that's safe for the platform we're developing on. However, first we need a concept of platforms for our constructor to reason about:

enum Platform:
  case WinX64
  case MacOSX64
  case LinuxX64

This time, the Platform type can be defined as an enum, since we don't do anything fancy with it. Now, our apply method will take a platform as context, and return a defined Option if the platform supports 64-bit longs:

opaque type CLong <: CIntegral = CIntegral

object CLong:
  def apply(i: Int): CLong = 
    CIntegral(i.toLong)

  def apply(l: Long)(using
      p: Platform
  ): Option[CLong] = p match
    case Platform.LinuxX64 |
        Platform.MacOSX64 =>
      Some(CIntegral(l))
    case _ => None

This new apply method for creating CLong from Long types works as defined, but returning an Option can be unsatisfying. This is boxing, and it only exists because the compiler doesn't know what platform we're dealing with. It would be nice to allow users to avoid that boxing if they go through the effort of proving to the compiler what the current platform is. We can try to define methods that return a bare CLong if a specific Platform is present in the call's context:

object CLong:
  def apply(i: Int): CLong =
    CIntegral(i.toLong)

  def apply(l: Long)(using
      p: Platform
  ): Option[CLong] = p match
    case Platform.LinuxX64 |
        Platform.MacOSX64 =>
      Some(CIntegral(l))
    case _ => None

  def apply(l: Long)(using
      p: Platform.LinuxX64.type
  ): CLong = CIntegral(l)

  def apply(l: Long)(using
      p: Platform.MacOSX64.type
  ): CLong = CIntegral(l)

Sadly, this code doesn't compile as written. The compiler complains that the final apply is a double definition. That means that at runtime, the right apply method overload would not be able to be chosen based on the runtime types. This is mainly because the concept of singleton types like .type doesn't exist on the JVM at runtime. However, we can get around this with a new annotation present in Scala 3 called @targetName. @targetName basically gives the annotated method a different name in the bytecode, meaning the Scala compiler with its total knowledge of the types in question can instruct the JVM on which apply method to use:

import scala.annotation.targetName

opaque type CLong <: CIntegral =
  CIntegral

object CLong:
  def apply(i: Int): CLong =
    CIntegral(i.toLong)

  def apply(l: Long)(using
      p: Platform
  ): Option[CLong] = p match
    case Platform.LinuxX64 |
        Platform.MacOSX64 =>
      Some(CIntegral(l))
    case _ => None

  def apply(l: Long)(using
      p: Platform.LinuxX64.type
  ): CLong = CIntegral(l)

  @targetName(
    "certainLongMacOSX"
  )
  def apply(l: Long)(using
      p: Platform.MacOSX64.type
  ): CLong = CIntegral(l)
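The same kind of double definition can be reproduced outside of CLong. The following is a minimal sketch with invented names (Totals, total, totalOfLengths); it relies on generic type arguments being erased, rather than on singleton types, to produce the clash.

```scala
import scala.annotation.targetName

object Totals:
  def total(xs: List[Int]): Int = xs.sum

  // Without @targetName this overload would clash with the one above:
  // both erase to total(List) in bytecode.
  @targetName("totalOfLengths")
  def total(xs: List[String]): Int = xs.map(_.length).sum

// The Scala compiler still picks the overload from the static types.
val numbers: Int = Totals.total(List(1, 2, 3))     // 6
val lengths: Int = Totals.total(List("ab", "cde")) // 5
```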

In the CLong definition, @targetName gives the fourth apply method the name "certainLongMacOSX" in the bytecode emitted by the Scala compiler. Thanks to that, our definition works now. Let's test our CLong instantiation with the following unit test:

class CLongDemonstration
    extends munit.FunSuite:
  given Platform =
    Platform.LinuxX64
  test("demo 1") {
    val clong1 = CLong(5)
    val clong2 = CLong(5L)
  }

The definition of clong2 here doesn't compile because the compiler is complaining of an ambiguous overload. This is a tough problem to deal with so we won't address it right now. Instead we'll sidestep it by renaming our platform specific applies to certain.

  def certain(l: Long)(using
      Platform.LinuxX64.type
  ): CLong = CIntegral(l)

  @targetName(
    "certainLongMacOSX"
  )
  def certain(l: Long)(using
      Platform.MacOSX64.type
  ): CLong = CIntegral(l)

  test("demo 1") {
    val clong1 = CLong(5)
    val clong2 = CLong(5L)

    assertEquals(
      clong1.as[Long],
      clong2.flatMap(_.as[Long])
    )
  }

  test("demo 2") {
    val clong =
      summon[Platform] match
        case Platform.LinuxX64 =>
          CLong.certain(5L)(using
            Platform.LinuxX64
          )
        case Platform.MacOSX64 =>
          CLong.certain(5L)(using
            Platform.MacOSX64
          )
        case _ => CLong(5)

    assertEquals(
      clong.as[Long],
      Some(5L)
    )
  }

With the fixed method name, we can complete the unit test implementations, and run them. Doing so has the assertions come back true, but the match expression and the usage of certain is ugly. However, we can fix those problems by using two new features of Scala 3 - pattern bound given instances and union types:

  test("demo 2") {
    val clong =
      summon[Platform] match
        case given (Platform.LinuxX64.type |
              Platform.MacOSX64.type) =>
          CLong.certain(5L)
        case _ => CLong(5)

    assertEquals(
      clong.as[Long],
      Some(5L)
    )
  }

This new version of "demo 2" is cleaner looking, but now we get a compiler error at the invocation of certain. The compiler is complaining once again that the invocation is ambiguous, and it is because both methods could serve the union given we created. However, that's a hint that certain shouldn't have two definitions in the first place. certain should be available for use in the case that the platform is known to be either LinuxX64 or MacOSX64, and that's what the union Platform.LinuxX64.type | Platform.MacOSX64.type indicates. So let's try to collapse the two definitions into one.

  def certain(l: Long)(using
      Platform.LinuxX64.type |
        Platform.MacOSX64.type
  ): CLong = CIntegral(l)

The cleaner "demo 2" code now compiles and all the assertions pass. However, has our collapse of the two certain methods into one cost us our ability to use certain with a given of the singleton type Platform.LinuxX64.type? Let's write a third demo:

  test("demo 3") {
    val clong = 
      summon[Platform] match
        case given Platform.LinuxX64.type =>
          CLong.certain(5L)
        case _ => CLong(5)

    assertEquals(
      clong.as[Long],
      Some(5L)
    )
  }

certain can still be invoked in the case where we only have LinuxX64 or MacOSX64 in context! Things are really starting to look up!
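The combination of a pattern-bound given and a union type can also be tried in isolation. In this sketch the Platform enum mirrors the one above, while longBits, bitsOn and bitsOnLinuxOnly are hypothetical helpers invented for illustration.

```scala
enum Platform:
  case WinX64, MacOSX64, LinuxX64

// Hypothetical helper that only needs evidence of a 64-bit-long platform.
def longBits(using
    Platform.LinuxX64.type | Platform.MacOSX64.type
): Int = 64

def bitsOn(p: Platform): Int = p match
  // The matched value becomes a given of the union type for this branch.
  case given (Platform.LinuxX64.type | Platform.MacOSX64.type) =>
    longBits
  case _ => 32

def bitsOnLinuxOnly(p: Platform): Int = p match
  // A given of the narrower singleton type also satisfies the union.
  case given Platform.LinuxX64.type => longBits
  case _                            => 32
```

Because a singleton type is a subtype of any union that contains it, a single method taking the union covers both the broad and the narrow evidence.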

Trouble in paradise

Sadly, there is currently an insufficiency in our code. When describing a C function to java.lang.foreign, one gives it a set of type descriptions corresponding to Byte, Float, Long, Int, Short, Double, and some "foreign" specific types. The MethodHandle generated from that description expects those types to be passed into it. So on Windows 64-bit, if a C function requires a long, then we describe it as taking Int and we must pass in an Int. Therefore we have two choices:

  1. We can stick with an encoding similar to the one we have today, where integral types are always backed by a Long and we convert them into/out of the type needed when invoking a C function binding
  2. We just store the type needed inside CLong.

Both approaches have their upsides and downsides. An upside of approach 1 is we always know what type we're dealing with when writing code internal to these integral types. A downside is that we must know what type to convert it to or from without having examples when we need them.

Approach 2 will have us storing an Int in CIntegral types when the platform demands a 32-bit integral, which has the upside that we don't need to convert the data when passing into method handles and when extracting data, but we have to make sure that the data we're passing in is aligned with the platform definitions or type misalignment can happen.

There are also differences in how math is done on the different integral types as a side-effect of their boundaries, so getting results equivalent to Int math from a Long-backed CIntegral would mean converting to an Int and back for some operations, like ordering and division.
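Division at the edge of the Int range is one place where the difference shows up. A small sketch, with helper names invented for illustration:

```scala
// Hypothetical helpers contrasting 32-bit and 64-bit division.
def divideAsInt(a: Int, b: Int): Int = a / b
def divideAsLong(a: Int, b: Int): Long = a.toLong / b.toLong

val inInt: Int = divideAsInt(Int.MinValue, -1)    // overflows back to Int.MinValue
val inLong: Long = divideAsLong(Int.MinValue, -1) // 2147483648L, outside the Int range
val emulated: Int = inLong.toInt                  // truncating recovers the Int result
```

A Long-backed value that is meant to behave like a 32-bit long would need that final truncation step after such operations.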

For now, let's pursue approach 2, as it seems to be the simpler of the two. Let's start by modifying CIntegral and CLong to allow them to store any integral type:

opaque type CIntegral <: CVal =
  CVal

object CIntegral:
  def apply(a: AnyVal): CIntegral =
    CVal(a)
  extension (
      cintegral: CIntegral
  )
    def asLong =
      cintegral.as[Long].get

opaque type CLong <: CIntegral =
  CIntegral

object CLong:
  def apply(i: Int)(using
      p: Platform
  ): CLong =
    p match
      case Platform.WinX64 =>
        CIntegral(i)
      case Platform.MacOSX64 |
          Platform.LinuxX64 =>
        CIntegral(i)

  def apply(l: Long)(using
      p: Platform
  ): Option[CLong] = p match
    case Platform.LinuxX64 |
        Platform.MacOSX64 =>
      Some(CIntegral(l))
    case _ => None

  def certain(l: Long)(using
      Platform.LinuxX64.type |
        Platform.MacOSX64.type
  ): CLong = CIntegral(l)

If we run this code through the demonstration unit tests from before, the assertions will fail. That's because there are bugs in the implementation:

  • CLong's apply method for Int has the Linux and Mac branch failing to convert the Int to Long, an example of type misalignment
  • CIntegral's asLong method assumes that CIntegral is a Long in all cases, which is no longer true
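Both bugs come down to the fact that a stored Int does not pass a runtime test for Long. The following sketch shows the failing test and one possible way to handle both representations; onlyLong and widenToLong are hypothetical functions working on a plain Matchable, not the approach this post ends up taking.

```scala
// Mirrors the current asLong assumption: only a stored Long is accepted.
def onlyLong(value: Matchable): Option[Long] = value match
  case l: Long => Some(l)
  case _       => None

// Hypothetical alternative that accepts either representation.
def widenToLong(value: Matchable): Option[Long] = value match
  case i: Int  => Some(i.toLong) // a stored Int has to be widened explicitly
  case l: Long => Some(l)
  case _       => None

val missed: Option[Long] = onlyLong(5)     // None: 5 is stored as an Int
val widened: Option[Long] = widenToLong(5) // Some(5L)
```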

Type classes to the rescue

Type classes can be used for many things in Scala, but one of the more interesting uses for them is as a vehicle for type calculation. Take for example the following type class:

trait MyCalc[A]:
  type B

object MyCalc:
  given MyCalc[Int] with 
    type B = Float

  given MyCalc[Float] with 
    type B = String

This type class and its instances are a mapping between type A and type B. Using this type class, it's possible to write a function that demands a type B based on what type A is. Let's take a look:

class MyCalcDemonstration
    extends munit.FunSuite:
  def myFun[A](using
      mc: MyCalc[A]
  )(b: mc.B): mc.B = b

  test("compiles") {
    assertNoDiff(
      compileErrors(
        "val f: Float = myFun[Int](5f)"
      ),
      ""
    )
  }

This code compiles because a MyCalc is looked up based upon the type A passed in, and from that MyCalc instance, the input type and result type are determined. In this case, since A was Int, B is Float.
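The mapping can also be checked without a test framework. This sketch repeats the MyCalc definitions so that it stands alone, and uses typeChecks from scala.compiletime.testing to show that an argument of the wrong type is rejected; the value names are invented for illustration.

```scala
import scala.compiletime.testing.typeChecks

trait MyCalc[A]:
  type B

object MyCalc:
  given MyCalc[Int] with
    type B = Float

  given MyCalc[Float] with
    type B = String

def myFun[A](using mc: MyCalc[A])(b: mc.B): mc.B = b

val fromInt: Float = myFun[Int](5f)
val fromFloat: String = myFun[Float]("five")

// typeChecks reports whether a snippet compiles, without failing the build.
val wrongArgument: Boolean = typeChecks("""myFun[Int]("five")""")
```

With A fixed to Int, B is Float, so passing a String is expected to be rejected and wrongArgument should be false.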

We can extend this approach to create a type class that maps from a Platform and a type like CLong to the underlying type for that platform:

trait TypeRelation[
    P <: Platform,
    A
]:
  type Real <: AnyVal

This type class, TypeRelation, has two type inputs and an inner type called Real that's a subtype of AnyVal. P is the corresponding Platform for the mapping, A is the opaque type, and Real is the type that will back the opaque type at runtime. Let's try to put it into usage:

opaque type CLong <: CIntegral =
  CIntegral

object CLong:
  def apply(i: Int)(using
      p: Platform
  ): CLong =
    p match
      case Platform.WinX64 =>
        CIntegral(i)
      case Platform.MacOSX64 |
          Platform.LinuxX64 =>
        CIntegral(i)

  def apply(l: Long)(using
      p: Platform
  ): Option[CLong] = p match
    case Platform.LinuxX64 |
        Platform.MacOSX64 =>
      Some(CIntegral(l))
    case _ => None

  def certain[P <: Platform](
      using
      p: P,
      tr: TypeRelation[P, CLong]
  )(l: tr.Real): CLong =
    CIntegral(l)

  given TypeRelation[
    Platform.LinuxX64.type |
      Platform.MacOSX64.type,
    CLong
  ] with
    type Real = Long

  given TypeRelation[
    Platform.WinX64.type,
    CLong
  ] with
    type Real = Int

TypeRelation here provides the mapping needed when defined as part of CLong's companion. Of note is the redefinition of certain here. First, we move the context parameter section in front of the standard inputs. This is an option that's newly available in Scala 3, and it lets us control what inputs are allowed by first checking the context of the certain invocation. This change empowers certain to only take the integral type that matches CLong on the current platform. As a nice bonus, it also makes certain work on the Windows platform. However, this still hasn't saved us from the bug in apply. Let's see if we can modify CIntegral's apply to use TypeRelation.

opaque type CIntegral <: CVal =
  CVal

object CIntegral:
  def apply[P <: Platform](using
      p: P,
      tr: TypeRelation[P, ?]
  )(a: tr.Real): CIntegral =
    CVal(a)

  extension (
      cintegral: CIntegral
  )
    def asLong =
      cintegral.as[Long].get

opaque type CLong <: CIntegral =
  CIntegral

object CLong:
  def apply(i: Int)(using
      p: Platform
  ): CLong = p match
    case given (Platform.MacOSX64.type |
          Platform.LinuxX64.type) =>
      CIntegral(i)
    case given Platform.WinX64.type =>
      CIntegral(i)

  def apply(l: Long)(using
      p: Platform
  ): Option[CLong] = p match
    case given (Platform.LinuxX64.type |
          Platform.MacOSX64.type) =>
      Some(CIntegral(l))
    case _ => None

  def certain[P <: Platform](
      using
      p: P,
      tr: TypeRelation[P, CLong]
  )(l: tr.Real): CLong =
    CIntegral(l)

  given TypeRelation[
    Platform.LinuxX64.type |
      Platform.MacOSX64.type,
    CLong
  ] with
    type Real = Long

  given TypeRelation[
    Platform.WinX64.type,
    CLong
  ] with
    type Real = Int

Since we have no information about the opaque type being dealt with within CIntegral, we have to use ? as the platform dependent type in its apply method. However, this works fine for our purposes because the requisite TypeRelation definitions are in scope where we're invoking CIntegral's apply. With this definition, as long as a specific Platform context is available, CIntegral knows exactly what type is needed, and only allows that type. In the case where we had a bug before, the Int is now promoted to a Long automatically, and we can confirm that via the unit tests all passing once again.
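The core trick of this section, a leading using clause whose type member fixes the type of a later parameter, can be seen in isolation in the sketch below. LongRepr is a simplified, hypothetical cousin of TypeRelation with a single type input, and store, winLong and linuxLong are likewise invented names; the platform is passed as an explicit type argument here to keep the example small.

```scala
import scala.compiletime.testing.typeChecks

enum Platform:
  case WinX64, LinuxX64

// Simplified, hypothetical cousin of TypeRelation with a single type input.
trait LongRepr[P <: Platform]:
  type Real <: AnyVal

given winLong: LongRepr[Platform.WinX64.type] with
  type Real = Int

given linuxLong: LongRepr[Platform.LinuxX64.type] with
  type Real = Long

// The using clause comes first, so repr.Real can type the value parameter.
def store[P <: Platform](using repr: LongRepr[P])(
    value: repr.Real
): repr.Real = value

val onWindows: Int = store[Platform.WinX64.type](5)

// The 5 is widened to a Long here because repr.Real is Long.
val onLinux: Long = store[Platform.LinuxX64.type](5)

// A Long is rejected where the platform's long is an Int.
val tooWide: Boolean = typeChecks("store[Platform.WinX64.type](5L)")
```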

We'll explore further, trying to add some basic functionality to CIntegral and CVal, but that will have to wait until the next blogpost...

The code for this blogpost can be found under the opaque-types-1 folder at this GitHub repository.

Til next time, and happy Scala hacking!

Did you find this article valuable?

Support Mark Hammons by becoming a sponsor. Any amount is appreciated!