Our journey begins

It's been a longstanding dream of mine to make C and Scala play nice. While Scala is one of my favorite languages, it tends to be trapped in the semi-walled garden of the JVM. I want to write apps that benefit the open source community, like applications for KDE. Doing so in Scala has long meant dealing with JNI, platform differences, and more. JNI itself is fairly painful, since using it requires writing both C++ code and Java code. It also doesn't play well with pure Scala. The alternatives to JNI, JNA and JNR, suffer from similar Scala incompatibilities and are poorly documented, making it difficult to determine how to use them properly.

A few years ago, a project titled "Panama" was started to improve Java's ability to use the native capabilities of the platforms it runs on. The foreign part of the project was particularly interesting to me, as it enables the generation of bindings to C functions and libraries without necessarily requiring one to write C. Better yet, this foreign API plays well with pure Scala in most cases. That is how and why I started Slinc.

The creation of Slinc has been troublesome despite how nice the new foreign API is in Java. Of particular difficulty to me has been the definition of platform dependent types.

Platform dependent types - A nightmare for strong typing?

Platform dependent types are a concept in C, where the characteristics of a type depend on the platform the type is being used on. The standard integral primitives in C (char, short, int, long, and long long) are all defined this way, and long is a particularly prominent example. On x64 processors running macOS or Linux, long is a 64-bit wide signed integral type, meaning that the range of integers a long can hold on these platforms is −9,223,372,036,854,775,808 to +9,223,372,036,854,775,807. On x64 processors running Windows, long is a 32-bit wide signed integral type, meaning the range of integers a long can hold on Windows is −2,147,483,648 to +2,147,483,647.

Basically, on x64 Windows, long is the equivalent of Scala's Int type, and on x64 macOS and Linux it's the equivalent of Scala's Long.
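To make the difference concrete, here's a small sketch that uses Scala's own Int and Long as stand-ins for the 32-bit and 64-bit flavors of C's long. The names `fitsEverywhere`, `bigValue`, and `truncated` are made up for illustration and aren't part of Slinc.

```scala
// Int and Long stand in for the 32-bit and 64-bit variants of C's long.
// All names here are illustrative only.

// True when a value is representable on every platform discussed above,
// that is, when it fits in the narrower 32-bit long.
def fitsEverywhere(l: Long): Boolean =
  l >= Int.MinValue && l <= Int.MaxValue

// Perfectly fine where long is 64 bits wide...
val bigValue: Long = 3000000000L

// ...but squeezing it into 32 bits silently wraps around.
val truncated: Int = bigValue.toInt // -1294967296
```

A value like `bigValue` is exactly the kind of thing that works on Linux and macOS but can't be handed to a Windows long without losing information.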

This is a major difference that doesn't matter much if you're coding software for one specific platform, but once your software is supposed to be platform agnostic this becomes a major issue. And since the JVM tends to favor the "write once run anywhere" philosophy, writing a library for a JVM language to interact with C becomes difficult to do even when only dealing with the predefined primitives for the C language.

In Scala we prefer to write code in a style that uses strong typing. For me, strong type systems are ones where you're able to express yourself with types, and your expression is guided by the compiler without becoming so impossible that you have to use escape hatches like casting.

Here's an example of this in Scala, followed by the equivalent in Java as it had to be written before pattern matching for instanceof arrived in Java 16:

// Scala
val i: Number = Long.box(5L)
i match
  case l: Long => l + l

// Java
Number i = Long.valueOf(5L);
Long l;
if (i instanceof Long) {
  l = ((Long) i) + ((Long) i);
  return;
}

In both examples, we're trying to deduce if i is a Long, and then use it as a Long. The Scala version can do this without casting (at least by us), while the Java version requires the user to cast. Having this transformation of types backed by the compiler helps us avoid errors by requiring us to write code to prove to the compiler and ourselves what types values are.

So how do we express a platform dependent type like long from C in Scala while preserving the guarantees of strong typing? This type can be the Scala equivalent of Long or Int depending on what platform the program is running on. Worse yet, any and all types in C can have this property, including user defined ones, so how can the compiler help us with these types when we don't know what their definition is until the program runs?

We're going to explore that in the next few blog posts.

Requirements

For me, a pattern for platform dependent types should meet the following requirements (ranked by priority):

  1. They should be definable by people using Slinc, not just myself

  2. They should be type-safe

  3. They should be lightweight

  4. They should be easy to use

  5. They should be easy to define

A naive attempt at CLong

We can attempt to encode an analog of C's long in Scala with the following definition of CLong:

class CLong(val data: Int | Long)

The problem with this definition is that two CLongs can hold different types of data: one an Int, the other a Long.

This definition of CLong violates a number of the requirements I set at the beginning of this journey.

With regard to requirement 2: the definition is not type-safe because nothing restricts the input to CLong based on the platform the program is running on.

val clong1 = CLong(4l)
val clong2 = CLong(2)

In the case of CLong, this isn't the biggest deal because you can typically convert the possible values into each other to achieve type alignment, but other platform dependent types won't have this property.

With regard to requirement 4: the definition is not easy to use. If a person wanted to use CLong for anything, they would have to test what type it contains using a match expression, and these match expressions would pollute the user's code.

(clong1.data, clong2.data) match 
  case (i: Int, j: Int) =>  CLong(i + j)
  case (i: Long, j: Long) => CLong(i + j)
  case (i: Int, j: Long) => CLong(i.toLong + j)
  case (i: Long, j: Int) => CLong(i + j.toLong)

This would get even worse when dealing with multiple platform dependent types, and considering all the standard primitive integral types in C are potentially platform dependent, this usage style could quickly become overwhelming for users.

Finally, with regard to requirement 3: the definition is not lightweight. As can be seen, every single operation on this definition requires a pattern match, and every value, including each intermediate result, requires allocating a new instance of the class.
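One might hope to at least hide the match from users behind a helper. Here's a sketch of that idea, with the naive CLong repeated so the snippet stands alone. The `add` function is hypothetical. It tidies up the call site, but it doesn't remove the costs.

```scala
// The naive encoding from above, repeated so the sketch stands alone.
final class CLong(val data: Int | Long)

// Hypothetical helper that hides the four-way match from callers.
def add(a: CLong, b: CLong): CLong =
  (a.data, b.data) match
    case (i: Int, j: Int)   => CLong(i + j)
    case (i: Long, j: Long) => CLong(i + j)
    case (i: Int, j: Long)  => CLong(i.toLong + j)
    case (i: Long, j: Int)  => CLong(i + j.toLong)

// Cleaner for the caller, but each call still runs one pattern match
// and allocates one new CLong for the result.
val sum = add(CLong(4L), CLong(2))
```

Every other operation (subtraction, comparison, conversion) would need the same treatment, for every platform dependent type, so the boilerplate moves rather than disappears.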

This definition of CLong cannot work for my purposes.

Trying path dependent types

As mentioned in the last section, tying the definition of CLong to the platform we're currently running on is a necessity, so maybe we should start by defining a type to represent the platform:

sealed trait Platform:
  type CLong
  // An Int always fits, since long is at least 32 bits wide
  def clong(i: Int): CLong
  // A Long only fits on platforms where long is 64 bits wide
  def clong(l: Long): Option[CLong]

case object LinuxX64 extends Platform:
  type CLong = Long
  def clong(i: Int): CLong = i.toLong
  def clong(i: Long): Option[CLong] = Some(i)
  def clongCertain(i: Long): CLong = i

case object WinX64 extends Platform:
  type CLong = Int
  def clong(i: Int): CLong = i
  def clong(i: Long): Option[CLong] = None

case object MacOSX64 extends Platform:
  type CLong = Long
  def clong(i: Int): CLong = i.toLong
  def clong(i: Long): Option[CLong] = Some(i)
  def clongCertain(i: Long): CLong = i

This seems to be a better choice. Now we can write a val that pretends to detect the platform when the program is loaded and sets the right Platform value:

val platform: Platform = WinX64
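The val above is hard-coded, but for the curious, here's a rough sketch of what real detection could look like using JVM system properties. `Host` and `detect` are hypothetical names, kept separate from the Platform trait so the snippet stands alone. The strings I match on are the values JVMs commonly report for `os.name` and `os.arch`; treat them as assumptions to check on your own targets.

```scala
import java.util.Locale

// Hypothetical stand-in for the three platforms used in this post.
enum Host:
  case LinuxX64, WinX64, MacOSX64

// Maps the values of the "os.name" and "os.arch" system properties to a Host.
// The matched strings are assumptions about what JVMs typically report.
def detect(osName: String, arch: String): Option[Host] =
  val os = osName.toLowerCase(Locale.ROOT)
  val isX64 = arch == "amd64" || arch == "x86_64"
  if !isX64 then None
  else if os.startsWith("linux") then Some(Host.LinuxX64)
  else if os.startsWith("windows") then Some(Host.WinX64)
  else if os.startsWith("mac") then Some(Host.MacOSX64)
  else None

// Evaluated once, on first use; None means "a platform we don't know about".
lazy val host: Option[Host] =
  detect(System.getProperty("os.name"), System.getProperty("os.arch"))
```

Returning an Option keeps unknown platforms explicit instead of guessing a default.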

Now we can write code that uses CLong, and every such value is guaranteed to have the width that the current platform dictates:

val c: Platform#CLong = platform.clong(5)
val d: Option[Platform#CLong] = platform.clong(5l)
val e: Platform#CLong = platform match 
  case w: WinX64.type => w.clong(5l.toInt)
  case l: LinuxX64.type => l.clongCertain(5l)
  case m: MacOSX64.type => m.clongCertain(5l)

As seen in the above code, Platform#CLong is used the same way on every platform: it can always be instantiated from an Int, since 32 bits is the minimum size of a long, and it can potentially be instantiated from a Scala Long if the platform allows it (returning None if it doesn't). We can even match on the platform to detect which one we're running on, and handle the different platforms differently. Problem solved, right?

Unfortunately no...

Why path dependent types fail us here

The above implementation fails hard on requirement 1: The path dependent types are defined as part of the platform, which is controlled by Slinc, which means users cannot add platform dependent types without submitting a pull request to the Slinc repository.

There are many more platform dependent types than long in the C standard library, as well as in other C libraries. That means the number of platform dependent types we may need to define is unbounded. It also means that users need to be able to define these types.

This approach also fails requirement 5: Since in this example the types are defined as part of the Platform type, and the Platform type is sealed, that means that the file that contains the definition of all platforms grows multiplicatively with each platform dependent type definition. Likewise, all supporting methods, such as def clong, must be defined in the same file. This makes for a file of unbounded size, and it means that if someone wants to define a new platform dependent type, they have to redefine Platform itself.

Worse, let's imagine enabling basic math to be done on CLong and other platform dependent integral types:

sealed trait Platform:
  type CLong
  // Each platform must supply arithmetic for its own CLong
  given clongIntegral: Integral[CLong]
  def clong(i: Int): CLong
  def clong(l: Long): Option[CLong]

case object LinuxX64 extends Platform:
  type CLong = Long
  given clongIntegral: Integral[CLong] = Numeric.LongIsIntegral
  def clong(i: Int): CLong = i.toLong
  def clong(i: Long): Option[CLong] = Some(i)
  def clongCertain(i: Long): CLong = i

case object WinX64 extends Platform:
  type CLong = Int
  given clongIntegral: Integral[CLong] = Numeric.IntIsIntegral
  def clong(i: Int): CLong = i
  def clong(i: Long): Option[CLong] = None

case object MacOSX64 extends Platform:
  type CLong = Long
  given clongIntegral: Integral[CLong] = Numeric.LongIsIntegral
  def clong(i: Int): CLong = i.toLong
  def clong(i: Long): Option[CLong] = Some(i)
  def clongCertain(i: Long): CLong = i

Every addition we make to this type adds to the complexity of a single file and said file would eventually grow to an unmanageable size.
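For completeness, here's a sketch of what calling code could look like once an Integral instance hangs off the platform. It's a cut-down copy of the idea: only two platforms, and a plain abstract member where the listing in this post uses a given, purely to keep the snippet short. The helper `twice` is made up for illustration.

```scala
// Cut-down platform definitions so the sketch stands alone.
sealed trait Platform:
  type CLong
  def clongIntegral: Integral[CLong]
  def clong(i: Int): CLong

case object LinuxX64 extends Platform:
  type CLong = Long
  val clongIntegral: Integral[CLong] = Numeric.LongIsIntegral
  def clong(i: Int): CLong = i.toLong

case object WinX64 extends Platform:
  type CLong = Int
  val clongIntegral: Integral[CLong] = Numeric.IntIsIntegral
  def clong(i: Int): CLong = i

// Hypothetical helper: doubles a CLong without knowing its real width.
// The parameter and result types depend on the platform value `p`.
def twice(p: Platform)(x: p.CLong): p.CLong =
  p.clongIntegral.plus(x, x)

val platform: Platform = WinX64
val result = twice(platform)(platform.clong(21)) // 42 as a platform.CLong
```

The calling code stays free of matches and casts, which is nice, but note that all of the machinery that makes it work still lives inside the sealed Platform hierarchy.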

In my next post, we'll further explore the problem and other approaches to defining CLong in such a way as to hopefully overcome these issues.

The code for this article is available at this GitHub repo under the "naive" and "path-dependent-types" folders.

Happy Scala hacking!
