Tree-Walking Evaluation with Modules
The original simple evaluator could work given only a (main/only) module-object. Once module-qualified references enter the picture, it seems to need the complete set of loaded modules. Things get even weirder with specific imported names.
How to resolve a reference dynamically
The original (simplistic) way
The original evaluator used a chain-of-dictionaries to represent the dynamic environment. Every name-lookup was just a probe into this structure. That had two important consequences:
First, each time it applied a user-defined function,
it had to eagerly create class Closure(Procedure) objects for all possible sub-function
calls to guarantee that look-up would succeed, and in the right place,
for expressions within that scope. (This also took care of static-linkage.)
Second, it had to fill an outer dynamic environment with thunks for module-level globals. Even outside that, it filled another environment from built-in and preamble elements.
The chain-of-dictionaries worked, but it didn’t play well with the idea of inter-module references. At least, not by itself. Also, all that searching seems inefficient.
A bit smarter way
If a name refers to a (lazy) parameter or local sub-procedure, then the dynamic meaning of that name refers to a thunk bound to an enclosing activation record. The original interpreter used a hack: Assuming all scopes nest perfectly, it could construct (dynamic-style) thunks once for global names into an outermost dynamic environment. Then during evaluation, all names are simply look-ups into the current dynamic environment.
There is already a static definition associated with every reference:
Class WordDefiner creates the static-scope symbol table(s),
and class WordResolver associates definitions accordingly to each and every reference.
In principle, the evaluator could use a different strategy depending on
whether the name statically refers to a parameter, a sub-function, a global function,
a data constructor, or indeed even a native-function binding.
Although parameters, functions, and data-constructors are easily distinguished syntax objects, the situation within the realm of functions is a bit more complicated. Syntax alone does not distinguish global functions (which close over nothing) from nested functions (which close over the current dynamic scope) or siblings and uncles which close over some outer dynamic scope – findable only by traversing static links.
There is a straightforward partial solution to this smaller problem: Decorate each function-definition with its numeric nesting level, each parameter-definition with the level corresponding to (the inside of) its owning function, and finally each name-reference to the nesting level at which it actually appears. Then, to resolve a (non-global) name dynamically, simply walk back the indicated number of static-pointers to find the correct dynamic environment.
Note
Once it’s clear which activation record is the proper host for each name, there is no more need for search and so closures can be built only at need. This might mean a simpler (and maybe faster) evaluator.
The last bit of the puzzle is this: Inter-module references are all to global objects. Global objects do not need a static link. (Or rather, their thunks can have a null static link.) The evaluator does not specifically need to know which module a global lives in, so long as it finds globals directly by their definition link.
Dealing Well with Global References
The evaluator builds different kinds of run-time proxies for data constructors, user-defined functions, and native functions. These provide for a nice consistent internal API, so they’re still important even in a module-aware system. Thus, it still needs a notion of global scope.
- Do we store the proxies:
Attached as an attribute to global-object definitions? This certainly works for user-defined things, but might be iffy with native functions. It has the somewhat icky property of “monkey-patching” objects defined elsewhere, which seems like a terrible habit.
In a separate global dictionary? This is no friend to embedded interpreters running concurrently, but it’s fine for a stand-alone scenario.
In a global dictionary passed around with the local environment? This seems to add lots of overhead.
In an outermost static scope? This seems like a slower option.
- Do we build the proxies in advance or as needed?
As-needed adds an avoidable test for every call. In-advance means needing to know the complete set of modules up front.
- The decisions currently are:
Use a global dictionary, keyed for now to the corresponding definition-object.
Prepare in advance.
The function run_module take the list of loaded modules in topological order straight from a Loader object.
It then prepares and evaluates each module in succession, using the same global dictionary.
This works fine because the syntax objects from different modules are all distinct as hash-keys.