It's a typeof magic

24 Aug 2015

Ary Borenzweig

The story of typeof

The story of the typeof expression begins with array literals. In Crystal you can write

array = [1, 2, 3]

and the compiler will infer that the array is an Array(Int32), meaning it can only contain 32 bits integers. And you can also write:

array = [1, 'a', true]

and the compiler will infer that it’s an Array(Int32 | Char | Bool), where Int32 | Char | Bool means the union of those types: the array can hold any of those type at any point during the program’s execution.

Literals in the language, like array, hash and regular expression (regex) literals, are simple syntax rewrites to regular standard library calls. In the case of a regex, this:

/fo(o+)/

is rewritten to:

Regex.new("fo(o+)")

The rewrite of array literals needs a bit more thought. Arrays are generic, meaning that they are parameterized with a type T that specifies what type they can hold, like the Array(Int32) and Array(Int32 | Char | Bool) mentioned earlier. The non-literal way to create one is:

Array(Int32 | Char | Bool).new

In the case of an array literal we need the type to be the union type of all the elements in the array literal. And so, typeof was born. In the beginning this was called type merge and it was a compiler internal thing that you couldn’t express (there was no syntax for it), but the compiler used it for these literals. An example rewrite:

array = [1, 'a', true]

# Rewritten to this, where <type_merge>(exp1, exp2, ...) computes
# the union type of the expressions:
Array(<type_merge>(1, 'a', true)).build(3) do |buffer|
  buffer[0] = 1
  buffer[1] = 'a'
  buffer[2] = true
  3
end

Now this literal is invoking a regular method to build an array. The catch is that you couldn’t write this: <type_merge> is only the representation of this internal node that allows you to compute a type, but if you wrote the above you would get a syntax error.

We later decided that because this <type_merge> node worked pretty well, and we wanted literals to have no magic, to let users use this <type_merge> node, and named it typeof, because this name is pretty familiar in other languages. Now writing this:

array = [1, 'a', true]

and this:

Array(typeof(1, 'a', true)).build(3) do |buffer|
  buffer[0] = 1
  buffer[1] = 'a'
  buffer[2] = true
  3
end

are exactly equivalent: there’s no magic (but of course the first syntax is much easier to write and read).

Little did we know that typeof would bring a lot of power to the language.

Simple uses of typeof

One obvious use-case of typeof is to ask the compiler the inferred type of an expression. For example:

puts typeof(1) #=> Int32
puts typeof([1, 2, 3].map &.to_s) #=> Array(String)

At this point you might think that typeof(exp) is similar to exp.class. However, the first gives you the compile-time type, while the second gives you the runtime type:

exp = rand(0..1) == 0 ? 'a' : true
puts typeof(exp) #=> Char | Bool
puts exp.class   #=> Char (or Bool, depending on the chosen random value)

Another simple use case is to create a type based on another object’s type:

hash = {1 => 'a', 2 => 'b'}
other_hash = typeof(hash).new #:: Hash(Int32, Char)

In this way we can avoid repeating or hardcoding a type name.

But these are too simple to be interesting.

Advanced uses of typeof

Let’s write the Array#compact method. This method returns an Array where nil instances are removed. Of course, if we start with an Array(Int32 | Nil), that is, an array of integers and nils, we want to end with an Array(Int32).

The type grammar allows creating unions. For example Int32 | Char creates a union of Int32 and Char. However, there’s no way to subtract types. There’s no T - Nil syntax. But, using typeof, we can still write this method.

First, we define a method whose type will be the one we want:

def not_nil(exp)
  if exp.is_a?(Nil)
    raise "oops, nil"
  else
    exp
  end
end

If exp is Nil we raise an exception, otherwise we return exp. Let’s check its type:

puts typeof(not_nil(1))   #=> Int32
puts typeof(not_nil(nil)) #=> NoReturn

Thanks to the way if var.is_a?(…) works, when we give it something that’s not nil it tells us that the type is that same type. But when we give it nil, the only branch in the if that can be executed is the raise one. Now, raise has this NoReturn type, which basically means there’s no value returned by that expression… because it raises an exception! Another expression that has NoReturn is, for example, exit.

Let’s try and give not_nil something that’s a union type:

element = rand(0..1) == 0 ? 1 : nil
puts typeof(element)          #=> Int32 | Nil
puts typeof(not_nil(element)) #=> Int32

Note that the NoReturn type is gone: the “expected” type of the last expression would be Int32 | NoReturn, that is, the union of the possible types of the method. However, NoReturn doesn’t have a tangible value, so mixing NoReturn with any type T basically gives you T back. Because, if the not_nil method succeeds (that is, it doesn’t raise), you will get an integer back, otherwise an exception will be bubbled through the stack.

Now we are ready to implement the compact method:

class Array
  def compact
    result = Array(typeof(not_nil(self[0]))).new
    each do |element|
      result << element unless element.is_a?(Nil)
    end
    result
  end
end

ary = [1, nil, 2, nil, 3]
puts typeof(ary)       #=> Array(Int32 | Nil)

compacted = ary.compact
puts compacted         #=> [1, 2, 3]
puts typeof(compacted) #=> Array(Int32)

The magical line is the first one in the method:

Array(typeof(not_nil(self[0]))).new

We create an array whose type is the type that results of invoking not_nil on the first element of the array. Note that the compiler doesn’t know what types are in each position in an array, so using 0, 1 or 123 would be the same.

In this way we were able to forge a type that excludes Nil without needing to extend the type grammar: the compiler’s machinery for the type inference algorithm is all we needed.

But this is still simple. Let’s move on to something really interesting and fun.

typeof sorcery

Our next task is to implement Array#flatten. This method returns an Array that is a one-dimensional flattening of the original array (recursively). That is, for every element that is an array, extract its elements into this new array.

Note that this has to work recursively. Let’s see some expected behaviour:

ary1 = [1, [2, [3], 'a']]
puts typeof(ary1)             #=> Array(Int32 | Array(Int32 | Array(Int32) | Char))

ary1_flattened = ary1.flatten
puts ary1_flattened           #=> [1, 2, 3, 'a']
puts typeof(ary1_flattened)   #=> Array(Int32 | Char)

Like before, let’s start by writing a method whose type will have the type that we need for the flattened array:

def flatten_type(object)
  if object.is_a?(Array)
    flatten_type(object[0])
  else
    object
  end
end

puts typeof(flatten_type(1))                          #=> Int32
puts typeof(flatten_type([1, [2]]))                   #=> Int32
puts typeof(flatten_type([1, [2, ['a', 'b']]]))       #=> Int32 | Char

The method is simple: if the object is an array, we want the flatten type of any of its elements. Otherwise, the type is that of the object.

And with this, we are ready to implement flatten:

class Array
  def flatten
    result = Array(typeof(flatten_type(self))).new
    append_flattened(self, result)
    result
  end

  private def append_flattened(object, result)
    if object.is_a?(Array)
      object.each do |sub_object|
        append_flattened(sub_object, result)
      end
    else
      result << object
    end
  end
end

In this second example we were able to forge a type that is an array flattening.

Conclusion

In the end, there’s nothing really magical about typeof. It just lets you query and use the compiler’s ability to infer the type of an expression really well.

In the previous examples we were able to forge types from other types with regular stuff: types and methods. There’s nothing new to learn, there’s no special syntax for talking about types. And this is good, because it’s simple, but powerful.