Here we’ll continue explaining how Crystal assigns types to each variable and expression of your program. This post is a bit long, but in the end it’s just about making Crystal behave in the way that is most intuitive for the programmer, and as similar as possible to Ruby.
We’ll start with literals, C functions and some primitives. Then we’ll continue with flow control structures, like if, while, and blocks. Then we’ll talk about the special NoReturn type and type filters.
Literals have a type of their own, known by the compiler:
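As a small illustration (assuming a recent Crystal compiler), typeof can be used to ask the compiler for the type it gave to a literal. The comments name the type; nothing here is specific to this post.

```crystal
# Each literal has a type the compiler knows without any inference.
typeof(true)    # Bool
typeof(1)       # Int32
typeof(1.5)     # Float64
typeof('a')     # Char
typeof("hello") # String
typeof(:foo)    # Symbol
typeof(nil)     # Nil
```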
When you define a C function you must tell the compiler its types:
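A minimal sketch of such a declaration, written for a recent Crystal compiler. LibDemo is a hypothetical binding name chosen for this example; atoi is the standard C function.

```crystal
# LibDemo is a made-up name; the argument and return types are spelled out
# because the compiler cannot infer them from C.
lib LibDemo
  fun atoi(str : UInt8*) : Int32
end

n = LibDemo.atoi("42") # a String is passed to C as a UInt8*
typeof(n)              # Int32, taken from the declaration
```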
The allocate primitive gives you an uninitialized instance of an object:
You don’t normally invoke it directly. Instead, you invoke new, which is automatically generated by the compiler as something like this:
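The following sketch imitates that generated method by hand. The class Point and the method name build are hypothetical and only stand in for the real new; the actual generated code may do more than this.

```crystal
class Point
  getter x : Int32
  getter y : Int32

  def initialize(@x : Int32, @y : Int32)
  end

  # Hypothetical stand-in for the compiler-generated `new`.
  def self.build(x, y)
    instance = allocate       # uninitialized Point
    instance.initialize(x, y) # run the initializer on it
    instance
  end
end

point = Point.build(1, 2)
typeof(point) # Point
```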
A similar primitive is Pointer.malloc, which gives you a typed pointer to a memory region:
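For instance, assuming a recent Crystal compiler, a pointer to three Int32 values could be obtained and used like this:

```crystal
# Room for three Int32 values; malloc zero-initializes the memory.
ptr = Pointer(Int32).malloc(3)
ptr[0] = 10
ptr[1] = 20

typeof(ptr)    # Pointer(Int32)
typeof(ptr[0]) # Int32
sum = ptr[0] + ptr[1] + ptr[2]
```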
Next, when you assign an expression to a variable, the variable will be bound to that expression’s type (if the expression’s type changes, the variable’s type changes too).
The compiler tries to be as smart as possible when you use variables. For example, you can assign multiple times to a variable:
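A four-line example of this, with the type of a noted at each step:

```crystal
a = 1       # a : Int32
a.abs       # valid: Int32 has abs
a = "hello" # a : String from here on
a.size      # valid: String has size
```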
To achieve this, the compiler remembers which expression was the last one assigned to a variable. In the above example, after the first line the compiler knows that a has type Int32, so a call to abs is valid. In the third line we assign a String to it, so the compiler remembers this and, on the fourth line, it’s perfectly valid to invoke size on it.
Additionally, the compiler remembers that both an Int32 and a String were assigned to a. When generating LLVM code, the compiler will represent a as a union type that can be Int32 or String. It would be something like this in C:
struct Int32OrString {
  int type_id;
  union {
    int int_value;
    string string_value;
  } data;
}
This might seem inefficient if we continually assign different types to the same variable. However, the compiler knows that when you invoked abs, a was an Int32, so it never checks the type_id field: it directly uses the int_value field. LLVM notices this and optimizes this out, so in the generated code there will never be a union (the type_id field is never read).
Going back to Ruby, if you assign a variable multiple times in a row, the last value (and type) is the one that counts for subsequent calls. Crystal mimics this behaviour. A variable then just becomes a name for the last expression that was assigned to it.
Let’s take a piece of Ruby code and analyze it:
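The analysis that follows refers to a snippet of roughly this shape. This is a reconstruction, not necessarily the exact original. It is written so that it is valid Crystal as well as Ruby, with the final call commented out because Crystal rejects it at compile time. The variable some_condition is a stand-in.

```crystal
# Stand-in condition: the type checker only sees a Bool here.
some_condition = 2 > 1

if some_condition
  a = 1
  a.abs  # a : Int32 in this branch
else
  a = "hello"
  a.size # a : String in this branch
end

# After the if, a : Int32 | String.
# a.size # the "last line" discussed in the text
```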
In Ruby, the only line that can fail at runtime is the last one. The first call to abs will never fail, as an Int32 was assigned to a. The first call to size will also never fail, as a String was assigned to a. However, after the if, a can either be an Int32 or a String.
So Crystal tries to keep this intuitive reasoning about a’s type. When a variable is assigned inside an if’s then or else branch, the compiler knows that it will continue to have that type until the if ends or until it is assigned a new expression. When an if ends, the compiler will let a have the union of the types of the last expressions assigned to it in each branch.
The last line in Crystal will give a compiler error: “undefined method ‘size’ for Int32”. That’s because even though String has a size method, Int32 doesn’t.
In designing the language we had two choices: make the above a compile-time error (like now) or just make it a runtime error (like in Ruby). We believe it’s better to make it a compile-time error. In some cases you might know better than the compiler and be sure that a variable has the type you think it has. But in other cases the compiler will let you know that you overlooked a case or some logic, and you’ll thank it for that.
The if has some more cases to take into account. For example, the variable a might not exist before the if. In this case, if it’s not assigned in one of the branches, at the end of the if it will also contain the Nil type if it’s read:
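A short sketch of that case, where some_condition is again just a stand-in:

```crystal
some_condition = 2 > 1 # stand-in; only its Bool type matters here

if some_condition
  a = 1
end

typeof(a) # Int32 | Nil, because the missing else never assigns a
```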
This, again, mimics Ruby’s behaviour.
Finally, an if’s type is the union of the last expressions in both branches. If a branch is missing, it’s considered to have a Nil type.
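Both situations can be seen by assigning the if itself to a variable (some_condition is a stand-in):

```crystal
some_condition = 2 > 1 # stand-in condition

x = if some_condition
      1
    else
      "hello"
    end
typeof(x) # Int32 | String

y = if some_condition
      1
    end
typeof(y) # Int32 | Nil: the missing else counts as Nil
```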
A while is in a way similar to an if:
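A sketch of what this looks like. The method some_condition is hypothetical; any call that returns a Bool would do.

```crystal
# Hypothetical condition; the loop ends as soon as it returns false.
def some_condition
  rand < 0.5
end

a = 1
while some_condition
  a = "hello"
end

typeof(a) # Int32 | String: the body may run zero times
```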
That’s because some_condition might be falsy the first time.
However, since a while is a loop there are some more things to consider. For example, the last expression assigned to a variable inside a while determines the type of that variable in the next iteration. In this way, the type at the beginning of the loop will be a union of the type before the loop and the type at the end of the previous iteration:
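In the sketch below the counter i only exists to make the loop finish. The interesting part is the type of a at the top of the body.

```crystal
a = 1
i = 0
while i < 3
  # Here a : Int32 | String (Int32 on the first pass, String afterwards),
  # so a call like a.abs would be rejected at this point.
  puts typeof(a)
  a = "hello"
  i += 1
end

typeof(a) # Int32 | String
```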
Some other things to consider inside a while are break and next. A break makes the types right before the break add to the type at the exit of the while:
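A small sketch, with a counter i added only so that the loop ends:

```crystal
a = 1
i = 0
while i < 5
  i += 1
  if i == 3
    a = "hello"
    break # a : String here, and that type reaches the exit of the while
  end
end

typeof(a) # Int32 | String
```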
A next adds the type to the beginning of the while:
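In this sketch the body always ends with a being an Int32, so String can only reach the top of the loop through next (the counter i is just there to end the loop):

```crystal
a = 1
i = 0
while i < 5
  # Here a : Int32 | String; the String part arrives only via next.
  puts typeof(a)
  i += 1
  if i == 2
    a = "hello"
    next # jumps back to the condition with a : String
  end
  a = 1
end
```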
Blocks are very similar to a while: they can be executed zero or more times. So the logic for variables’ types is very similar to that of a while.
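For example, with a block passed to Int32#times:

```crystal
a = 1
3.times do
  a = "hello"
end

typeof(a) # Int32 | String, as the block might not have run at all
```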
There’s a mysterious type in Crystal called NoReturn. One example of something that has this type is C’s exit function:
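A sketch of such a binding. LibDemo is a hypothetical lib name used only for this example. typeof inspects the call without running it, so the program does not actually exit.

```crystal
# LibDemo is a made-up binding name; exit is the C function.
lib LibDemo
  fun exit(status : Int32) : NoReturn
end

typeof(LibDemo.exit(0)) # NoReturn (the call itself is not executed)
```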
Another very useful method that is NoReturn is raise, which raises an exception.
The type basically means: after this point there’s nothing else. Nothing gets returned, and nothing that comes afterwards is executed (of course, a rescue will be executed if there’s one surrounding the code, but the normal path won’t be executed).
The compiler knows about NoReturn. For example, take a look at the following code:
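Something along these lines, where some_condition is a stand-in that happens to be false at run time so the sketch does not actually raise:

```crystal
some_condition = 1 > 2 # stand-in; the compiler only sees a Bool

if some_condition
  a = "hello"
  raise "boom" # NoReturn: nothing after the if runs on this path
else
  a = 2
end

a.abs # compiles, because a can only be Int32 here
```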
Remember that after an if a variable’s type is the union of the types of both branches. However, since the first branch ends there, because raise is NoReturn, the compiler knows that code after the if, if that branch is taken, will never be executed. So it can definitely say: a will only have the type of the else branch.
The same logic applies when you have return, break or next inside an if.
Also, when you define a method whose body has the type NoReturn, that method is in turn NoReturn:
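For instance, with a hypothetical helper named fail_with whose body always raises:

```crystal
# Hypothetical helper: its body is a raise, so its type is NoReturn.
def fail_with(message : String)
  raise "failed: #{message}"
end

some_condition = 1 > 2 # stand-in, false at run time
value = some_condition ? fail_with("no value") : 42

typeof(fail_with("x")) # NoReturn (typeof does not run the call)
typeof(value)          # Int32
```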
Remember that an if’s type is the union of the last expressions of the if’s branches. What type does the following if (and consequently the a variable) have?
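The question could be asked about a snippet like this one (a reconstruction, with some_condition as a stand-in that is false at run time):

```crystal
some_condition = 1 > 2 # stand-in condition

a = if some_condition
      raise "boom"
    else
      1
    end
```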
Well, the then branch is definitely NoReturn. The else branch is definitely Int32. We could conclude then that a has type NoReturn or Int32.
However, NoReturn means that nothing gets executed afterwards. So a can only be Int32 at the end of the previous snippet, and that’s how the compiler behaves.
With this we can implement a little method called not_nil!. Here it is:
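A sketch of the idea follows. Crystal’s standard library already ships a not_nil! method, so to avoid clobbering it this version uses the hypothetical name my_not_nil!. The logic is what matters: return self for any object, raise for nil.

```crystal
class Object
  # Hypothetical name; any non-nil object just returns itself.
  def my_not_nil!
    self
  end
end

struct Nil
  # For nil the body raises, so this method is NoReturn.
  def my_not_nil!
    raise "Nil assertion failed"
  end
end

some_condition = 2 > 1 # stand-in condition
a = some_condition ? 1 : nil
a.my_not_nil!.abs
```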
a’s type is Int32 or Nil. One thing that we didn’t say yet is that when you have a union type and you invoke a method on it, and all types respond to that method, the resulting type is the union of the return types of each method.
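A quick illustration of that rule with a union of two number types (some_condition is a stand-in):

```crystal
some_condition = 2 > 1 # stand-in condition

a = some_condition ? -1 : -1.5 # a : Int32 | Float64
b = a.abs                      # Int32#abs gives Int32, Float64#abs gives Float64

typeof(b) # Int32 | Float64
```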
In this case, a.not_nil! will have the type Int32 if a is Int32, or NoReturn if it’s Nil (because of the raise). Combining these types just gives Int32, so the above code is perfectly valid. And that’s how you can discard Nil from a variable and turn it into a runtime exception if it turns out to be nil. No special language construct is needed. Everything is done with the logic explained so far.
Now, what if we want to execute a method on a variable whose type is Int32 or Nil, but only if that variable is Int32? If it’s Nil, we don’t want to do anything. We can’t use not_nil!, because that will raise a runtime exception when the variable is nil.
We can define another method, try:
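Again a sketch only. The standard library has its own try, so the hypothetical name my_try is used here to leave it alone.

```crystal
class Object
  # Hypothetical name; any non-nil object is passed to the block.
  def my_try(&block)
    yield self
  end
end

struct Nil
  # For nil the block is ignored and nil is returned.
  def my_try(&block)
    self
  end
end

some_condition = 2 > 1 # stand-in condition
a = some_condition ? -3 : nil
b = a.my_try &.abs

typeof(b) # Int32 | Nil
```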
(If you are not sure what &.abs means, read this.)
Since doing something depending on whether a value is Nil or not is so common, Crystal provides another way to do the above. This was briefly explained here, but now we’ll explain it better and combine it with the previous explanations.
If a variable is an if’s condition, the compiler assumes the variable is not nil inside the then branch:
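For example (some_condition is a stand-in):

```crystal
some_condition = 2 > 1 # stand-in condition
a = some_condition ? -1 : nil # a : Int32 | Nil

if a
  a.abs # a : Int32 here, Nil has been filtered out
end
```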
This makes sense: if a is truthy then it means it is not nil. Not only this, but the compiler also makes a’s type be that one after the if, combined with whatever type a has in the else branch. For example:
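A sketch of that situation, where the else branch gives a a non-nil value (some_condition is a stand-in that is false at run time):

```crystal
some_condition = 1 > 2 # stand-in condition
a = some_condition ? -1 : nil # a : Int32 | Nil

if a
  a.abs # a : Int32 in the then branch
else
  a = 2 # a : Int32 in the else branch as well
end

a.abs # a : Int32 after the if, so this compiles
```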
Just like a programmer expects the above to always work in Ruby (never raise an “undefined method” error at runtime), so it works in Crystal.
We call the above a “type filter”: a’s type got filtered inside the if’s then branch by removing Nil from the possible types a can have.
Another type filter happens when you do is_a?:
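For example (some_condition is a stand-in):

```crystal
some_condition = 1 > 2 # stand-in condition
a = some_condition ? 1 : "hello" # a : Int32 | String

if a.is_a?(String)
  a.size # a : String here
end
```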
And another type filter happens when you do responds_to?:
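For example, only the Int32 part of this union has an abs method (some_condition is a stand-in):

```crystal
some_condition = 2 > 1 # stand-in condition
a = some_condition ? -1 : "hello" # a : Int32 | String

if a.responds_to?(:abs)
  a.abs # a : Int32 here, String does not respond to abs
end
```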
These are special methods, known by the compiler, and that’s why the compiler is able to filter the types. On the contrary, the method nil? is not special right now so the following won’t work:
We’ll probably make nil? a special method too, so it’s more consistent with the rest of the language and the above works. We’ll also probably make the unary ! method special, not overloadable, so you could do:
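The sketch below shows both forms. It assumes a compiler release in which nil? and ! have already been made special as anticipated here (recent Crystal versions do filter on both). With the compiler described in this post, the a.abs calls would be rejected.

```crystal
some_condition = 2 > 1 # stand-in condition
a = some_condition ? -1 : nil # a : Int32 | Nil

if a.nil?
  # a : Nil here
else
  a.abs # a : Int32 here
end

if !a
  # a : Nil here
else
  a.abs # a : Int32 here
end
```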
In conclusion, as was said at the beginning of this post, we want Crystal to behave as much like Ruby as possible, and if something is intuitive and makes sense for the programmer, we want the compiler to understand it too. For example:
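A method of this shape is what the discussion refers to (a reconstruction; the two calls at the end are only there to exercise it):

```crystal
def foo(x)
  return unless x
  x.abs # x can only be non-nil here
end

foo(-2)  # x : Int32
foo(nil) # x : Nil, the method returns early
```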
The above shouldn’t give you a compile time error. The programmer knows that if x was nil inside foo, the method returns. It follows that x can never be nil afterwards so it’s ok to invoke abs on it. How does the compiler know this?
Well, first, the compiler rewrites an unless to an if:
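Written out by hand, the rewritten method would look roughly like this. It is a sketch of the idea, not the compiler’s literal output.

```crystal
def foo(x)
  if x
    # then branch: x is not nil here
  else
    return # else branch: the method ends, so x's type does not matter
  end
  x.abs # after the if, only the then branch's type remains
end
```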
Next, inside the then branch of the if we know that x is not nil. Inside the else branch the method returns, so we don’t care about the type of x afterwards. So, after the if, x can only be of type Int32. This is idiomatic code in Ruby, and so it is in Crystal if we carefully follow the language rules.
We still have to talk about methods and instance variables, but this post is already long enough so that will have to be explained in a following post. Stay tuned!