It's been a longstanding dream of mine to make C and Scala play nicely together. While Scala is one of my favorite languages, it tends to be trapped in the semi-walled garden of the JVM. I want to write apps that benefit the open source community, like applications for KDE. Doing so in Scala has long meant dealing with JNI, platform differences, and more. JNI itself is fairly painful, since using it requires writing both C++ code and Java code. It also doesn't play well with pure Scala. The alternatives to JNI, JNA and JNR, suffer from similar Scala incompatibilities and are poorly documented, making it difficult to determine how to use them properly.
A few years ago, a project named Panama was started to improve Java's ability to use the native capabilities of a platform. The foreign part of the project was particularly interesting to me, as it enables the generation of bindings to C functions and libraries without necessarily requiring one to write C. Better yet, this foreign API plays well with pure Scala in most cases. That is how and why I started Slinc.
The creation of Slinc has been troublesome despite how nice the new foreign API is in Java. Of particular difficulty to me has been the definition of platform dependent types.
Platform dependent types - A nightmare for strong typing?
Platform dependent types are a concept in C, where the characteristics of a type depend on the platform the type is being used on. The standard integral primitives in C, char, short, int, long, and long long, are all defined based on the platform you're using them on, though a particularly prominent example is long. On X64 processors running MacOSX or Linux, long is a 64-bit wide signed integral type, meaning that the range of possible integers a long can hold on these platforms is −9,223,372,036,854,775,808 to +9,223,372,036,854,775,807. On X64 processors running Windows, a long is a 32-bit wide signed integral type, meaning the range of possible integers a long can hold on Windows is −2,147,483,648 to +2,147,483,647.
Basically, on Windows X64, long is the equivalent of Scala's Int type, and on X64 Mac and Linux it's the equivalent of Scala's Long.
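To make the difference concrete, here is a small sketch that uses only Scala's own Int and Long, which cover the same ranges as a 32-bit and a 64-bit long. The value names are made up for illustration and are not part of Slinc.

```scala
// Scala's Int and Long cover the same ranges as a 32-bit and a 64-bit C long.
val windowsLongRange: (Int, Int) = (Int.MinValue, Int.MaxValue)
val unixLongRange: (Long, Long) = (Long.MinValue, Long.MaxValue)

// A value that is a valid long on Linux X64 but out of range on Windows X64.
val big: Long = 3_000_000_000L

// Check whether the value fits into 32 bits.
val fitsIn32Bits: Boolean = big >= Int.MinValue && big <= Int.MaxValue

// Forcing it into 32 bits silently wraps around instead of failing.
val truncated: Int = big.toInt
```

The wrap-around happens without any compiler warning, which is exactly the kind of mistake a platform-aware type should help prevent.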
This is a major difference that doesn't matter much if you're coding software for one specific platform, but once your software is supposed to be platform agnostic this becomes a major issue. And since the JVM tends to favor the "write once run anywhere" philosophy, writing a library for a JVM language to interact with C becomes difficult to do even when only dealing with the predefined primitives for the C language.
In Scala we prefer to write code in a style that uses strong typing. For me, strong type systems are ones where you're able to express yourself with types, and your expression is guided by the compiler without becoming so restrictive that you have to use escape hatches like casting.
Here's an example of this with Scala and Java (before pattern matching for instanceof was added in Java 16):
// Scala
val i: Number = Long.box(5L)
i match
  case l: Long => l + l

// Java
Number i = Long.valueOf(5L);
Long l;
if (i instanceof Long) {
    l = ((Long) i) + ((Long) i);
    return;
}
In both examples, we're trying to deduce if i is a Long, and then use it as a Long. The Scala version can do this without casting (at least by us), while the Java version requires the user to cast. Having this transformation of types backed by the compiler helps us avoid errors by requiring us to write code to prove to the compiler and ourselves what types values are.
So how do we express a platform dependent type like long from C in Scala while preserving the guarantees of strong typing? This type can be the Scala equivalent of Long or Int depending on what platform the program is running on. Worse yet, any and all types in C can have this property, including user defined ones, so how can the compiler help us with these types when we don't know what their definition is until the program runs?
We're going to explore that in the next few blog posts.
Requirements
For me, a pattern for platform dependent types should meet the following requirements (ranked by priority):
1. They should be definable by people using Slinc, not just myself
2. They should be type-safe
3. They should be light weight
4. They should be easy to use
5. They should be easy to define
A naive attempt at CLong
We can attempt to encode an analog of C's long in Scala with the following definition of CLong:
class CLong(val data: Int | Long)
The problem with this definition is that two CLongs can have different data inside them.
This definition of CLong violates a number of the requirements I set at the beginning of this journey.
With regards to requirement 2: the definition is not type-safe because nothing ties the input to CLong to the platform the program is running on. Both of the following compile on every platform:
val clong1 = CLong(4L)
val clong2 = CLong(2)
In the case of CLong, this isn't the biggest deal because you can typically convert the possible values into each other to achieve type alignment, but other platform dependent types won't have this property.
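As a minimal sketch of the problem, the snippet below shows that the width of a value is only discoverable at runtime. NaiveCLong stands in for the definition above, and widthOf is a hypothetical helper added purely for illustration.

```scala
// Stand-in for the naive definition from this section.
class NaiveCLong(val data: Int | Long)

// Hypothetical helper that reports which representation a value carries.
def widthOf(c: NaiveCLong): Int = c.data match
  case _: Int  => 32
  case _: Long => 64

// Both values have the same static type, yet carry different widths.
val fromLong = NaiveCLong(4L)
val fromInt = NaiveCLong(2)
```

Nothing in the static type distinguishes fromLong from fromInt, so the compiler cannot tell us which of them is valid on the current platform.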
With regards to requirement 4: the definition is not easy to use. If a person wanted to use CLong for anything, they would have to test what type it contains using a match expression, and these match expressions would pollute the user's code:
(clong1.data, clong2.data) match
  case (i: Int, j: Int) => CLong(i + j)
  case (i: Long, j: Long) => CLong(i + j)
  case (i: Int, j: Long) => CLong(i.toLong + j)
  case (i: Long, j: Int) => CLong(i + j.toLong)
This would get even worse when dealing with multiple platform dependent types, and considering all the standard primitive integral types in C are potentially platform dependent, this usage style could quickly become overwhelming for users.
Finally, with regards to requirement 3: the definition is not light weight. As can be seen, every single operation on this definition requires a pattern match, and every result requires allocating a new instance of the wrapper class.
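One might try to hide the match behind a method on the class. The sketch below does that under a hypothetical name, NaiveCLong, and shows that the cost does not go away: each call still runs the four-way match and allocates a new wrapper.

```scala
final class NaiveCLong(val data: Int | Long):
  // Every arithmetic operation repeats the four-way match
  // and allocates a fresh wrapper for its result.
  def +(that: NaiveCLong): NaiveCLong = (data, that.data) match
    case (i: Int, j: Int)   => NaiveCLong(i + j)
    case (i: Long, j: Long) => NaiveCLong(i + j)
    case (i: Int, j: Long)  => NaiveCLong(i.toLong + j)
    case (i: Long, j: Int)  => NaiveCLong(i + j.toLong)

// Mixing representations quietly produces a 64-bit result.
val sum = NaiveCLong(2) + NaiveCLong(4L)
```

This moves the noise out of user code, but the same boilerplate would have to be repeated for every operation and every platform dependent type.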
This definition of CLong cannot work for my purposes.
Trying path dependent types
As mentioned in the last section, tying the definition of CLong to the platform we're currently running on is a necessity, so maybe we should start by defining a type to represent the platform:
sealed trait Platform:
  type CLong
  def clong(i: Int): CLong
  def clong(l: Long): Option[CLong]

case object LinuxX64 extends Platform:
  type CLong = Long
  def clong(i: Int): CLong = i.toLong
  def clong(i: Long): Option[CLong] = Some(i)
  def clongCertain(i: Long): CLong = i

case object WinX64 extends Platform:
  type CLong = Int
  def clong(i: Int): CLong = i
  def clong(i: Long): Option[CLong] = None

case object MacOSX64 extends Platform:
  type CLong = Long
  def clong(i: Int): CLong = i.toLong
  def clong(i: Long): Option[CLong] = Some(i)
  def clongCertain(i: Long): CLong = i
This seems to be a better choice. Now we can write a val that pretends to detect the platform when the program is loaded and sets the right Platform value:
val platform: Platform = WinX64
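For reference, real detection could be sketched roughly as follows, using the JVM's os.name and os.arch system properties. The detectPlatform and currentPlatform helpers are hypothetical, the platform objects are reduced to empty stand-ins so the snippet is self-contained, and the string checks are assumptions about the values common JVMs report rather than an exhaustive list.

```scala
// Minimal stand-ins for the platform objects defined above (members omitted).
sealed trait Platform
case object LinuxX64 extends Platform
case object WinX64 extends Platform
case object MacOSX64 extends Platform

// Hypothetical helper: maps os.name and os.arch values to a Platform.
def detectPlatform(osName: String, arch: String): Option[Platform] =
  val os = osName.toLowerCase
  val isX64 = arch == "amd64" || arch == "x86_64"
  if !isX64 then None
  else if os.startsWith("linux") then Some(LinuxX64)
  else if os.startsWith("windows") then Some(WinX64)
  else if os.startsWith("mac") then Some(MacOSX64)
  else None

// Reads the properties of the running JVM.
def currentPlatform: Option[Platform] =
  detectPlatform(System.getProperty("os.name"), System.getProperty("os.arch"))
```

Returning an Option keeps unsupported platforms explicit instead of guessing a default.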
Now we can write code that uses CLong, and that type is guaranteed to have the same width everywhere in the program:
val c: Platform#CLong = platform.clong(5)
val d: Option[Platform#CLong] = platform.clong(5L)
val e: Platform#CLong = platform match
  case w: WinX64.type => w.clong(5L.toInt)
  case l: LinuxX64.type => l.clongCertain(5L)
  case m: MacOSX64.type => m.clongCertain(5L)
As seen in the above code, Platform#CLong names the same type everywhere, can be instantiated from a value of the minimum size for a long (32 bits), and can potentially be instantiated from a Scala Long if the platform allows it (returning None if it doesn't). We can even match on the platform to detect which one we're running on, and have code that handles the different platforms differently. Problem solved, right?
Unfortunately no...
Why path dependent types fail us here
The above implementation fails hard on requirement 1: The path dependent types are defined as part of the platform, which is controlled by Slinc, which means users cannot add platform dependent types without submitting a pull request to the Slinc repository.
There are many more platform dependent types than long in the C standard library, as well as in other C libraries. That means the number of platform dependent types we may need to define is unbounded. It also means that users need to be able to define these types.
This approach also fails requirement 5: since in this example the types are defined as part of the Platform type, and the Platform type is sealed, the file that contains the definitions of all platforms grows multiplicatively, because each new platform dependent type adds a definition to every platform. Likewise, all supporting methods, such as def clong, must be defined in the same file. This makes for a file of unbounded size, and it means that if someone wants to define a new platform dependent type, they have to redefine Platform itself.
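To illustrate the growth, here is a sketch of what adding just one more type would look like. It uses wchar_t, which is 16 bits wide on Windows and 32 bits wide on Linux and MacOSX, as the example. The CWChar type and cwchar method are hypothetical names chosen for this illustration.

```scala
sealed trait Platform:
  type CLong
  type CWChar // hypothetical second platform dependent type
  def clong(i: Int): CLong
  def cwchar(c: Char): CWChar

case object LinuxX64 extends Platform:
  type CLong = Long
  type CWChar = Int
  def clong(i: Int): CLong = i.toLong
  def cwchar(c: Char): CWChar = c.toInt

case object WinX64 extends Platform:
  type CLong = Int
  type CWChar = Char
  def clong(i: Int): CLong = i
  def cwchar(c: Char): CWChar = c

case object MacOSX64 extends Platform:
  type CLong = Long
  type CWChar = Int
  def clong(i: Int): CLong = i.toLong
  def cwchar(c: Char): CWChar = c.toInt
```

One new type touched the trait and all three platform objects, and none of it could have been written outside this file.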
Worse, let's imagine enabling basic math to be done on CLong and other platform dependent integral types:
sealed trait Platform:
  type CLong
  given clongIntegral: Integral[CLong]
  def clong(i: Int): CLong
  def clong(l: Long): Option[CLong]

case object LinuxX64 extends Platform:
  type CLong = Long
  given clongIntegral: Integral[CLong] = Numeric.LongIsIntegral
  def clong(i: Int): CLong = i.toLong
  def clong(i: Long): Option[CLong] = Some(i)
  def clongCertain(i: Long): CLong = i

case object WinX64 extends Platform:
  type CLong = Int
  given clongIntegral: Integral[CLong] = Numeric.IntIsIntegral
  def clong(i: Int): CLong = i
  def clong(i: Long): Option[CLong] = None

case object MacOSX64 extends Platform:
  type CLong = Long
  given clongIntegral: Integral[CLong] = Numeric.LongIsIntegral
  def clong(i: Int): CLong = i.toLong
  def clong(i: Long): Option[CLong] = Some(i)
  def clongCertain(i: Long): CLong = i
Every addition we make to this type adds to the complexity of a single file and said file would eventually grow to an unmanageable size.
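For completeness, this is roughly how such an Integral instance could be used from generic code. The sketch trims Platform down to two platforms and a single constructor, and the twice helper is a hypothetical name introduced here. It relies on a dependent method type so that the argument and result are tied to the platform that was passed in.

```scala
sealed trait Platform:
  type CLong
  given clongIntegral: Integral[CLong]
  def clong(i: Int): CLong

case object LinuxX64 extends Platform:
  type CLong = Long
  given clongIntegral: Integral[CLong] = Numeric.LongIsIntegral
  def clong(i: Int): CLong = i.toLong

case object WinX64 extends Platform:
  type CLong = Int
  given clongIntegral: Integral[CLong] = Numeric.IntIsIntegral
  def clong(i: Int): CLong = i

// Hypothetical helper: adds a CLong to itself on whichever platform is passed in.
def twice(p: Platform)(x: p.CLong): p.CLong = p.clongIntegral.plus(x, x)
```

The math works without casts or matches, but only because every platform object was edited by hand to provide the instance.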
In my next post, we'll further explore the problem and other approaches to defining CLong in such a way as to hopefully overcome these issues.
The code for this article is available at this GitHub repo under the "naive" and "path-dependent-types" folders.
Happy Scala hacking!