The concept of Lua is wonderful. A clear, compact grammar delivers nearly all of the finer points of a first-rate language, and a crude interpreter built around one huge switch statement only takes an afternoon to write. However, if you want interpreter performance in LuaJIT's league, the conventional wisdom is that you have to write it in hand-rolled assembly. After a few months of work, [Haoran Xu] published a work-in-progress called LuaJIT Remake (LJR) that asks whether he could get better performance without any hand-written assembly.
Currently supporting Lua 5.1, LJR beats the official Lua interpreter by roughly a factor of three, and LuaJIT's interpreter, now the second-fastest Lua, by about 28% across a selection of 34 benchmarks. [Haoran] opens with a good description of how interpreters work, which gives the problem solid background and context.
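For context, the "huge switch case" style of interpreter mentioned above looks roughly like the following minimal sketch. This is not LJR code; the toy stack machine and opcodes are invented purely for illustration.

#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

enum Op : uint8_t { OP_PUSH, OP_ADD, OP_PRINT, OP_HALT };

void run(const std::vector<uint8_t>& code) {
    std::vector<double> stack;
    std::size_t pc = 0;
    for (;;) {
        switch (code[pc++]) {          // one big switch, executed in a loop
        case OP_PUSH:                  // next byte is a small immediate
            stack.push_back(code[pc++]);
            break;
        case OP_ADD: {                 // pop two values, push their sum
            double b = stack.back(); stack.pop_back();
            double a = stack.back(); stack.pop_back();
            stack.push_back(a + b);
            break;
        }
        case OP_PRINT:                 // print the top of the stack
            std::printf("%g\n", stack.back());
            break;
        case OP_HALT:
            return;
        }
    }
}

int main() {
    run({OP_PUSH, 1, OP_PUSH, 2, OP_ADD, OP_PRINT, OP_HALT});  // prints 3
}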
But the long and short of it is that big switch statements are expensive and hard for compilers to optimize, so tail calls are a reasonable alternative, albeit one with some significant drawbacks. With tail calls, each case becomes a small “function” that is jumped into and then jumped out of without mucking with the stack or the registers too much.
However, the calling convention still requires callee-saved registers to be preserved, which costs you some registers, because there is no way to tell the compiler that a given function is allowed to break the calling convention. Clang is also currently the only compiler that offers a guaranteed tail-call annotation ([[clang::musttail]]). There are other limitations too, for instance the caller and callee must have identical function prototypes to prevent unbounded stack growth.
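Here is a minimal sketch of what tail-call dispatch with [[clang::musttail]] can look like, again invented for illustration rather than taken from LJR: every opcode handler shares one prototype and "returns" by tail-calling the next handler through a dispatch table (this needs a reasonably recent Clang to build).

#include <cstdint>
#include <cstdio>

struct VM { double acc = 0; };

// All handlers share one prototype, as the musttail rule requires.
using Handler = void (*)(VM*, const uint8_t*);
extern const Handler dispatch[];

// Jump to the next opcode's handler as a guaranteed tail call.
#define NEXT(vm, pc) [[clang::musttail]] return dispatch[*(pc)]((vm), (pc) + 1)

static void op_load(VM* vm, const uint8_t* pc) {   // acc = immediate
    vm->acc = *pc++;
    NEXT(vm, pc);
}
static void op_add(VM* vm, const uint8_t* pc) {    // acc += immediate
    vm->acc += *pc++;
    NEXT(vm, pc);
}
static void op_print(VM* vm, const uint8_t* pc) {  // print acc
    std::printf("%g\n", vm->acc);
    NEXT(vm, pc);
}
static void op_halt(VM*, const uint8_t*) {}        // stop: no tail call

const Handler dispatch[] = { op_load, op_add, op_print, op_halt };

int main() {
    VM vm;
    // Opcodes index into dispatch[]: load 1, add 2, print, halt.
    const uint8_t code[] = { 0, 1, 1, 2, 2, 3 };
    dispatch[code[0]](&vm, code + 1);               // prints 3
}

Because every handler ends in a guaranteed tail call, the chain of handlers never grows the stack, which is exactly why the identical-prototype rule exists.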
So [Haoran] went back to the drawing board and built two new tools: a C++ bytecode semantic description framework and a special compiler called Deegen. The description for the Add bytecode looks like this:
void Add(TValue lhs, TValue rhs) {
    if (!lhs.Is<tDouble>() || !rhs.Is<tDouble>()) {
        ThrowError("Can't add!");
    } else {
        double res = lhs.As<tDouble>() + rhs.As<tDouble>();
        Return(TValue::Create<tDouble>(res));
    }
}

DEEGEN_DEFINE_BYTECODE(Add) {
    Operands(
        BytecodeSlotOrConstant("lhs"),
        BytecodeSlotOrConstant("rhs")
    );
    Result(BytecodeValue);
    Implementation(Add);
    Variant(
        Op("lhs").IsBytecodeSlot(),
        Op("rhs").IsBytecodeSlot()
    );
    Variant(
        Op("lhs").IsConstant(),
        Op("rhs").IsBytecodeSlot()
    );
    Variant(
        Op("lhs").IsBytecodeSlot(),
        Op("rhs").IsConstant()
    );
}
Note that Return() here is not the C keyword return; it is part of the semantic framework. A bytecode is defined by a plain C++ implementation plus a declaration of its operands, result, and variants. Clang compiles this description into LLVM IR, which is then fed into Deegen. Deegen transforms the functions to do tail calls correctly, switches them to the GHC calling convention (which has no callee-saved registers), and applies a few other optimizations, such as inline caching through a clever C++ lambda mechanism. The blog post is exceptionally well-written and offers a fantastic glimpse into the wild world of interpreters.
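Deegen expresses its inline caches through C++ lambdas in the semantic description, which is worth reading about in the post itself. As a rough illustration of the general inline-caching idea only, with all types and names made up for this sketch rather than taken from Deegen: each access site remembers the last object "shape" it saw and the slot that the lookup produced, so repeated accesses on same-shaped objects skip the hash-table lookup.

#include <cstddef>
#include <cstdio>
#include <string>
#include <unordered_map>
#include <vector>

struct Shape {                        // hypothetical hidden-class / layout table
    std::unordered_map<std::string, std::size_t> slots;
};

struct Object {
    const Shape* shape;               // which layout this object uses
    std::vector<double> storage;      // property values, by slot index
};

struct InlineCache {                  // one per property-access site
    const Shape* cachedShape = nullptr;
    std::size_t cachedSlot = 0;
};

double GetProperty(const Object& obj, const std::string& key, InlineCache& ic) {
    if (obj.shape == ic.cachedShape) {            // fast path: same shape as last time
        return obj.storage[ic.cachedSlot];
    }
    std::size_t slot = obj.shape->slots.at(key);  // slow path: real hash lookup
    ic.cachedShape = obj.shape;                   // remember for next time
    ic.cachedSlot = slot;
    return obj.storage[slot];
}

int main() {
    Shape s;
    s.slots = {{"x", 0}, {"y", 1}};
    Object o{&s, {3.0, 4.0}};
    InlineCache ic;                                // the cache for this access site
    std::printf("%g\n", GetProperty(o, "x", ic));  // slow path, fills the cache
    std::printf("%g\n", GetProperty(o, "x", ic));  // fast path, cache hit
}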
The code is on GitHub. But if you’re interested in a more whimsical interpreter, here’s a Brainf**k interpreter written in Befunge.