Here we’ll continue explaining how Crystal assigns types to each variable and expression of your program. This post is a bit long, but in the end it’s just about making Crystal behave in the way that is most intuitive for the programmer, and as similarly to Ruby as possible.
We’ll start with literals, C functions and some primitives. Then we’ll continue with flow
control structures, like if, while, and blocks. Then we’ll talk about the special
NoReturn type and type filters.
Literals have a type of their own, known by the compiler:
true # Bool
1 # Int32
"hello" # String
1.5 # Float64
When you define a C function you must tell the compiler its types:
lib C
  fun sleep(seconds : UInt32) : UInt32
end
C.sleep(1_u32) # C.sleep has type UInt32
The allocate primitive gives you an uninitialized instance of an object:
class Foo
  def initialize(@x)
  end
end
Foo.allocate # Foo.allocate has type Foo
You don’t normally invoke it directly. Instead, you invoke new, which
the compiler automatically generates as something like this:
class Foo
  def self.new(x)
    foo = allocate
    foo.initialize(x)
    foo
  end
  def initialize(@x)
  end
end
Foo.new(1)
A similar primitive is Pointer.malloc, which gives you a typed pointer to a
memory region:
Pointer(Int32).malloc(10) # has type Pointer(Int32)
Next, when you assign an expression to a variable, the variable will be bound to that expression’s type (if the expression’s type changes, so does the variable’s type).
a = 1 # 1 is Int32, so a is Int32
The compiler tries to be as smart as possible when you use variables. For example, you can assign multiple times to a variable:
a = 1 # a is Int32
a.abs # ok, Int32 has a method 'abs'
a = "hello" # a is now String
a.size # ok, String has a method 'size'
To achieve this, the compiler remembers which expression was the last one assigned
to a variable. In the above example, after the first line the compiler knows that
a has type Int32, so a call to abs is valid. In the third line we assign
a String to it, so the compiler remembers this and, on the fourth line, it’s perfectly
valid to invoke size on it.
Additionally, the compiler remembers that both an Int32 and a String were
assigned to a. When generating LLVM code, the compiler will represent a
as a union type that can be Int32 or String. It would be something like this in C:
struct Int32OrString {
  int type_id;
  union {
    int int_value;
    string string_value;
  } data;
};
This might seem inefficient if we continually assign different types to the same variable.
However, the compiler knows that when you invoked abs, a was an Int32, so it never
checks the type_id field: it directly uses the int_value field. LLVM notices this
and optimizes this out, so in the generated code there will never be a union (the type_id
field is never read).
Going back to Ruby, if you assign a variable multiple times in a row, the last value (and type) is the one that counts for subsequent calls. Crystal mimics this behaviour. A variable then just becomes a name for the last expression that was assigned to it.
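One way to watch this happen is the typeof expression, which reports the type the compiler has for an expression at that point in the program. The following is only a minimal sketch, assuming a current Crystal compiler:

```crystal
a = 1
p typeof(a) # the compiler sees Int32 here
p a.abs     # so abs is a valid call

a = "hello"
p typeof(a) # and String here; the earlier Int32 no longer applies
p a.size    # so size is a valid call
```

Since typeof is resolved at compile time, the two lines that print it do not depend on any runtime value.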
Let’s take a piece of Ruby code and analyze it:
if some_condition
  a = 1
  a.abs
else
  a = "hello"
  a.size
end
a.size
In Ruby, the only line that can fail at runtime is the last one. The first call to abs
will never fail, as an Int32 was assigned to a. The first call to size will
also never fail, as a String was assigned to a. However, after the if, a can
either be an Int32 or a String.
So Crystal tries to keep this intuitive reasoning about a’s type. When a variable is assigned
inside an if’s then or else branch, the compiler knows that it will continue to have that type until the if
ends or until it is assigned a new expression. When an if ends, the compiler will let a
have the union of the types of the last expressions assigned to it in each branch.
The last line in Crystal will give a compiler error: “undefined method ‘size’ for Int32”.
That’s because even though String has a size method, Int32 doesn’t.
In designing the language we had two choices: make the above a compile-time error (like now) or just make it a runtime error (like in Ruby). We believe it’s better to make it a compile-time error. In some cases you might know better than the compiler and be sure that a variable has the type you expect. But in other cases the compiler will let you know that you overlooked a case or some logic, and you’ll thank it for that.
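The branch-by-branch reasoning above can be checked with typeof. This is a sketch rather than anything from the compiler's own tests; some_condition is a stand-in defined here with rand so that the snippet compiles on its own:

```crystal
some_condition = rand < 0.5 # stand-in for any runtime condition

if some_condition
  a = 1
  p typeof(a) # Int32 inside this branch
else
  a = "hello"
  p typeof(a) # String inside this branch
end

p typeof(a) # Int32 or String after the if
# a.size    # uncommenting this line is a compile-time error: Int32 has no 'size'
p a.to_s.size # fine: both Int32 and String respond to to_s
```

Calling a method that every member of the union responds to, such as to_s, is one way to keep working with the variable without narrowing its type first.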
The if has some more cases to take into account. For example, the variable a might not exist
before the if. In this case, if it’s not assigned in one of the branches, at the end of the
if it will also contain the Nil type if it’s read:
if some_condition
  a = 1
end
a # here a is Int32 or Nil
This, again, mimics Ruby’s behaviour.
Finally, an if’s type is the union of the last expressions in both branches. If a branch
is missing, it’s considered to have a Nil type.
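Because an if is an expression, its value can be assigned directly, and the rule above then decides the variable's type. A small sketch, again using a rand-based stand-in for the condition (an assumption of this example):

```crystal
some_condition = rand < 0.5 # stand-in for any runtime condition

x = if some_condition
      1
    else
      "hello"
    end
p typeof(x) # Int32 or String: one type per branch

y = if some_condition
      1
    end
p typeof(y) # Int32 or Nil: the missing else branch counts as Nil
```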
A while is in a way similar to an if:
a = 1
while some_condition
  a = "hello"
end
a # here a is Int32 or String
That’s because some_condition might be falsy the first time.
However, since a while is a loop there are some more things to consider. For example,
the last expression assigned to a variable inside a while determines the type of that
variable in the next iteration. In this way, the type at the beginning of the loop
will be a union of the type before the loop and the type at the end of the loop body:
a = 1
while some_condition
  a # here a is actually Int32 or String
  a = false # here a is Bool
  a = "hello" # here a is String
  a.size # ok, a is String
end
a # here a is Int32 or String
Some other things to consider inside a while are break and next. A break
makes the types right before the break add to the type at the exit of the while:
a = 1
while some_condition
  a # here a is Int32 or Bool
  if some_other_condition
    a = "hello" # we break, so at the exit a can also be String
    break
  end
  a = false # here a is Bool
end
a # here a is Int32 or String or Bool
A next adds the type to the beginning of the while:
a = 1
while some_condition
  a # here a is Int32 or String or Bool
  if some_other_condition
    a = "hello" # we next, so in the next iteration a can be String
    next
  end
  a = false # here a is Bool
end
a # here a is Int32 or String or Bool
Blocks are very similar to a while: they can be executed zero or more times. So
the logic for variables’ types is very similar to that of a while.
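As a rough illustration of the block case, here is a sketch using Int32#times from the standard library. The compiler does not try to prove how many times the block runs, so the type before the block stays in the union:

```crystal
a = 1
3.times do
  a = "hello" # assigned inside a block that might never be called
end
p typeof(a) # Int32 or String, just like after a while
```

At runtime a holds "hello" here, but the compile-time type still includes Int32.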
There’s a mysterious type in Crystal called NoReturn. One example of something with this type is
C’s exit function:
lib C
  fun exit(status : Int32) : NoReturn
end
C.exit(1) # this is NoReturn
puts "hello" # this will never be executed
Another very useful method that is NoReturn is raise: raising an exception.
The type basically means: after this point there’s nothing else. Nothing gets returned,
and nothing that comes afterwards is executed (of course, a rescue will be executed
if there’s one surrounding the code, but the normal path won’t be executed).
The compiler knows about NoReturn. For example, take a look at the following code:
a = some_int
if a == 1
  a = "hello"
  puts a.size # ok
  raise "Boom!"
else
  a = 2
end
a # here a can only be Int32
Remember that after an if a variable’s type is the union of the types of
both branches. However, since the first branch ends there, because raise is NoReturn,
the compiler knows that code after the if, if that branch is taken, will never be
executed. So it can definitely say: a will only have the type of the else
branch.
The same logic applies when you have return, break or next inside an if.
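For instance, with return the branch that leaves the method contributes nothing to the variable's type after the if. The method below is a hypothetical example written for this illustration:

```crystal
# size_or_zero is a made-up name used only for this sketch.
def size_or_zero(use_string : Bool)
  if use_string
    a = "hello"
  else
    return 0 # this branch never reaches the code after the if
  end
  a.size # ok: a can only be String here, with no Nil in its type
end

p size_or_zero(true)  # 5
p size_or_zero(false) # 0
```

Without the return, a would be String or Nil after the if, and the call to size would not compile.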
Also, when you define a method whose body has type NoReturn, calls to that method are in turn NoReturn:
def raise_boom
  raise "Boom!"
end
if some_condition
  a = 1
else
  raise_boom
end
a.abs # ok
Remember that an if’s type is the union of the last expressions of the if’s branches.
What is the type of the following if (and consequently of the a variable)?
a = if some_condition
  raise "Boom!"
else
  1
end
a # a is...?
Well, the then branch is definitely NoReturn. The else branch is definitely
Int32. We could conclude then that a has type NoReturn or Int32.
However, NoReturn means that nothing gets executed afterwards. So a can only
be Int32 at the end of the previous snippet, and that’s how the compiler behaves.
With this we can implement a little method called not_nil!. Here it is:
class Object
  def not_nil!
    self
  end
end
class Nil
  def not_nil!
    raise "Nil assertion failed"
  end
end
a = some_condition ? 1 : nil
a.not_nil!.abs # compiles!
a’s type is Int32 or Nil. One thing that we didn’t say yet is that when you
have a union type and you invoke a method on it, and all types respond to that method, the resulting
type is the union of the types returned by the method for each of those types.
In this case, a.not_nil! will have the type Int32 if a is Int32, or
NoReturn if it’s Nil (because of the raise). Combining these types just
gives Int32, so the above code is perfectly valid. And that’s how you can discard Nil
from a variable and turn it into a runtime exception if it turns out to be nil. No special language
construct is needed. All is made with the logic explained so far.
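Crystal's standard library ships a not_nil! method, so the effect can be observed without reopening any classes. The sketch below assumes a current standard library, where the Nil case raises NilAssertionError:

```crystal
a = rand < 0.5 ? 1 : nil # rand stands in for some_condition

p typeof(a)          # Int32 or Nil
p typeof(a.not_nil!) # Int32: the NoReturn from the Nil case drops out

begin
  nil.not_nil! # always takes the Nil path, so this raises
rescue ex : NilAssertionError
  puts "not_nil! raised #{ex.class}"
end
```

Note that typeof only asks the compiler for a type and does not evaluate its argument, so the second line never raises even when a is nil.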
Now, what if we want to execute a method on a variable whose type is Int32 or Nil,
but only if that variable is Int32? If it’s Nil, we don’t want to do anything.
We can’t use not_nil!, because that will raise a runtime exception when nil.
We can define another method, try:
class Object
  def try
    yield self
  end
end
class Nil
  def try(&block)
    nil
  end
end
a = some_condition ? 1 : nil
b = a.try &.abs # b is Int32 or Nil
(if you are not sure what &.abs means, read this)
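As a quick reminder, &.abs is shorthand for a block that calls abs on its single argument. The sketch below relies on the try that Crystal's standard library already provides, rather than redefining it, and uses rand as a stand-in for some_condition:

```crystal
a = rand < 0.5 ? -1 : nil

b = a.try &.abs                 # short block syntax
c = a.try { |value| value.abs } # the same call written out in full

p typeof(b) # Int32 or Nil
p b == c    # both forms give the same result
```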
Since doing something depending on whether a value is Nil or not is so common, Crystal
provides another way to do the above. This was briefly explained
here, but now we’ll explain it
better and combine it with the previous explanations.
If a variable is an if’s condition, the compiler assumes the variable is not nil
inside the then branch:
a = some_condition ? 1 : nil # a is Int32 or Nil
if a
  a.abs # a is Int32
end
This makes sense: if a is truthy then it means it is not nil. Not only this,
but the compiler also makes a’s type be that one after the if, combined with
whatever type a has in the else branch. For example:
a = some_condition ? 1 : nil
if a
  a.abs # ok, here a is Int32
else
  a = 1 # here a is Int32
end
a.abs # ok, a can only be Int32 here
Just like a programmer expects the above to always work in Ruby (never raise an “undefined method” error at runtime), so it works in Crystal.
We call the above a “type filter”: a’s type got filtered inside the if’s
then branch by removing Nil from the possible types a can have.
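The filter removes only Nil and leaves the rest of the union untouched. A sketch with a three-way union, built here from rand purely for illustration:

```crystal
r = rand
a = r < 0.3 ? 1 : (r < 0.6 ? "hello" : nil)
p typeof(a) # Int32 or String or Nil

if a
  p typeof(a) # Int32 or String: only Nil was filtered out
  p a.to_s    # ok, both remaining types respond to to_s
end
```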
Another type filter happens when you do is_a?:
a = some_condition ? 1 : nil
if a.is_a?(Int32)
  a.abs # ok
end
And another type filter happens when you do responds_to?:
a = some_condition ? 1 : nil
if a.responds_to?(:abs)
  a.abs # ok
end
These are special methods, known by the compiler, and that’s why the compiler is
able to filter the types. By contrast, the method nil? is not special right now
so the following won’t work:
a = some_condition ? 1 : nil
if a.nil?
else
  a.abs # should be ok, but now gives error
end
We’ll probably make nil? a special method too, so it’s more consistent with the
rest of the language and the above works. We’ll also probably make the unary ! method
special, not overloadable, so you could do:
a = some_condition ? 1 : nil
if !a
else
  a.abs # should be ok, but now gives error
end
In conclusion, as was said at the beginning of this post, we want Crystal to behave as much like Ruby as possible, and when something is intuitive and makes sense to the programmer, we want the compiler to understand it too. For example:
def foo(x)
  return unless x
  x.abs # ok
end
a = some_condition ? 1 : nil
b = foo(a)
The above shouldn’t give you a compile-time error. The programmer knows that if x
was nil inside foo, the method returns. It follows that x can never be
nil afterwards so it’s ok to invoke abs on it. How does the compiler know this?
Well, first, the compiler rewrites an unless to an if:
def foo(x)
  if x
  else
    return
  end
  x.abs # ok
end
Next, inside the then branch of the if we know that x is not nil.
Inside the else branch the method returns, so we don’t care about the type of x
afterwards. So, after the if, x can only be of type Int32. This is idiomatic
code in Ruby, and so it is in Crystal if we carefully follow the language rules.
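Putting the pieces together, the method's own return type follows from the same rules: the bare return yields nil and the last line yields Int32. A sketch, with rand standing in for some_condition:

```crystal
def foo(x)
  return unless x # x is filtered to non-nil after this line
  x.abs
end

a = rand < 0.5 ? -1 : nil
b = foo(a)
p typeof(b) # Int32 or Nil
p foo(-3)   # 3
p foo(nil)  # nil
```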
We still have to talk about methods and instance variables, but this post is already long enough so that will have to be explained in a following post. Stay tuned!