Actually, let's make a table of contents:

- Neural Nets in Rust: Starts with this post.
- Nobilis 3e Character Creation: Starts with this post.

Hey all! I enjoyed putting together the Let's Play over in the videogames forum, but I ended up getting a little frustrated with how much I had to rely on others for e.g. bug fixes. Ideally, I'd have the person responsible for fixing bugs in easy smacking distance.

At the same time as I do other things, I also work on various hobby coding projects. One that I've recently started focusing on again is an attempt to roll my own neural net library: the libraries I've looked at confuse me, so maybe I'll get some insight if I build one myself.

*The story so far...*

Several months ago, I read up on basic machine learning stuff, and put together a primitive autoencoder in Python. The design had some interesting ideas, which I'll try to carry forward, and some dodgy implementation details, which I'll try not to.

Crash course on the design: my attempts to understand tensor concepts ended up foundering. To get some grip on what nets are actually doing, I started translating the vector operations in the writeups into very localized descriptions of what happens to single units in the network, then worked back up from there.

The architecture I ended up going with was a symbolic DAG (directed acyclic graph) of simple mathematical operations. To see what that looks like, consider an algebraic expression like (5x + 3y)(56x - 1)^5. That expression can be thought of as a tree structure:

Code: Select all

*
├── +
│   ├── *
│   │   ├── 5
│   │   └── x
│   └── *
│       ├── 3
│       └── y
└── ^
    ├── -
    │   ├── *
    │   │   ├── 56
    │   │   └── x
    │   └── 1
    └── 5
Now, there are a couple of issues here. One is that there can be a lot of repeated structure: when evaluating, you'd like to reuse as much work as possible. But that effort is hampered if you can't actually tell whether two nodes are equal.

My solution to the latter problem was to develop a canonical form for the nodes. Equivalent node values will draw their arguments into a common form, making it trivial to confirm that two nodes are identical. This ability then feeds into solving the first problem: instead of a tree, the nodes are in a DAG, where any particular expression is only instantiated once. I'll get into how all this works in detail after I've posted the code.
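The deduplication idea is essentially hash-consing. A minimal sketch of the mechanism, with names of my own invention rather than the actual code's:

```python
# Toy hash-consing sketch (my own names, not the library's): nodes are
# immutable tuples, and a cache maps canonical contents to the single
# shared instance, so structural equality becomes pointer identity.

class Env:
    def __init__(self):
        self._cache = {}

    def _intern(self, node):
        # Return the one shared instance for these contents.
        return self._cache.setdefault(node, node)

    def add(self, left, right):
        # Canonical form here: sort the operands, so a+b and b+a
        # produce the same tuple and hit the same cache entry.
        a, b = sorted((left, right))
        return self._intern(('+', a, b))

env = Env()
n1 = env.add(('x',), ('y',))
n2 = env.add(('y',), ('x',))
print(n1 is n2)  # True: structurally equal nodes are the same object
```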

First up, the concepts the library can express: addition, multiplication, hyperbolic tangent, inputs to the network, trainable values, and named constants. Each of these can be differentiated with respect to a trainable value, with the result expressed solely in terms of concepts that are already defined. Addition and multiplication work as in calculus; a trainable value's derivative is 1 with respect to itself and 0 with respect to everything else, while inputs and constants always differentiate to 0. The derivative of tanh(x) with respect to x is just 1 - tanh^2(x), so that's no big deal either.
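Those differentiation rules can be sketched as a small recursive function over tuple trees (again a toy version of my own, not the library's; the real thing returns DAG nodes rather than fresh tuples):

```python
# Toy symbolic differentiation over tuple trees (my own names).
def diff(node, var):
    """Return the tuple tree for d(node)/d(var)."""
    if isinstance(node, (int, float)):
        return 0                          # constants vanish
    if isinstance(node, str):
        return 1 if node == var else 0    # 1 w.r.t. itself, else 0
    op = node[0]
    if op == '+':                         # sum rule
        return ('+', diff(node[1], var), diff(node[2], var))
    if op == '*':                         # product rule
        return ('+', ('*', diff(node[1], var), node[2]),
                     ('*', node[1], diff(node[2], var)))
    if op == 'tanh':                      # d tanh(u) = (1 - tanh^2 u) * u'
        t = ('tanh', node[1])
        return ('*', ('+', 1, ('*', -1, ('*', t, t))),
                     diff(node[1], var))
    raise ValueError(op)

print(diff(('*', 'x', 'x'), 'x'))
# → ('+', ('*', 1, 'x'), ('*', 'x', 1)), i.e. x + x = 2x before simplification
```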

With all that said, here's the code I'm working with now: http://pastebin.com/B5tx7ekx. This is actually a rewrite of the first, more confusing iteration, which tried to do performance hacks in PyPy. Now, you'll notice that this code is in Python, not Rust. So what gives? Well, this code is relatively strictly typed, and I'd like to see how it fares when the language can actually enforce that, and also to get some hands-on experience with Rust. (You'll also notice that there's no code to actually evaluate a DAG on an input. That got stripped out in the rewrite, for the time being.)

For now, a quick tour of the code:

Code: Select all

class Environment:

This class exists to explicitly enforce the node deduplication. It provides the intended constructors for node instances, plus functions for deriving new nodes from existing ones. I may look into creating a container class that bundles an environment with its Node, which would let me get rid of all those explicit passes of the environment. Prior to the rewrite, the methods lived at the module level, and there was a single implicit environment that I provided functions to manipulate for test purposes. Gross.

The methods contain some asserts. Some of those are useful, and some are from chasing down some absolutely shameful bugs introduced in the rewrite.

Code: Select all

    def add(self, left, right):
        ...

    def multiply(self, left, right):
        ...

    def _combine_terms(self, left, right):

These heavyweight functions work by converting their arguments into Sum nodes and applying basic operations to them; nothing too fancy. If you squint at add(), you can see how it leverages the canonical form: because the non-constant terms are sorted, it suffices to perform a modified merge of the two lists, which produces the correct, canonical merged list. I'll be looking for ways to cut down the verbiage in these functions, because it's a little over-the-top.
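The modified merge might look something like this toy version (my own names, assuming each operand is a list of (term, coefficient) pairs sorted by term):

```python
# Toy version of the sorted-terms merge (my own names): equal terms get
# their coefficients combined, and because both inputs are sorted, the
# output comes out sorted and canonical with no extra work.
def merge_terms(left, right):
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lt, lc = left[i]
        rt, rc = right[j]
        if lt == rt:                      # same term: add coefficients
            if lc + rc != 0:              # drop terms that cancel out
                out.append((lt, lc + rc))
            i += 1; j += 1
        elif lt < rt:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    out.extend(left[i:])                  # at most one of these is non-empty
    out.extend(right[j:])
    return out

print(merge_terms([('x', 5), ('y', 3)], [('x', 2), ('z', 1)]))
# → [('x', 7), ('y', 3), ('z', 1)]
```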

Code: Select all

def cmp(left, right):

This code was originally Python 2, and Python 3 removed the cmp() builtin, so I tossed this in as a quick hack. It should be unnecessary in Rust.
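For reference, the usual one-line Python 3 stand-in for the removed builtin, which I'd expect a hack like this to resemble:

```python
# Standard Python 3 replacement for the removed builtin cmp():
# bool-to-int arithmetic yields -1, 0, or 1.
def cmp(left, right):
    return (left > right) - (left < right)

print(cmp(1, 2), cmp(3, 3), cmp(5, 4))  # → -1 0 1
```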

Code: Select all

class State(collections.namedtuple('State', ['input', 'variables', 'parameters'])):

This is a remnant of the evaluation code. It represents an initial state of the network. I don't know what it'll end up looking like, but I assume somewhat close to this version. Rust won't need the weird method it has.

Code: Select all

class Node(tuple):

The base class for the read-only nodes in the DAG. Every node must offer a way to coerce to Sum, and to produce its partial derivative with respect to a trainable variable.
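A sketch of what that interface might look like (my own names; the real pastebin code differs):

```python
# Hypothetical sketch of the abstract node interface (my own names, not
# the pastebin's): every node can coerce itself to a Sum and produce a
# partial derivative with respect to a trainable variable.
class Node(tuple):
    """Base class for immutable DAG nodes."""
    def as_sum(self):
        raise NotImplementedError

    def derivative(self, variable):
        raise NotImplementedError

class Variable(Node):
    # A trainable value: derivative 1 w.r.t. itself, 0 otherwise.
    def derivative(self, variable):
        return Constant((1,)) if self == variable else Constant((0,))

class Constant(Node):
    # Constants always differentiate to 0.
    def derivative(self, variable):
        return Constant((0,))

x = Variable(('x',))
y = Variable(('y',))
print(x.derivative(x))  # → (1,)
print(x.derivative(y))  # → (0,)
```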

Then, a bunch of instantiations of the abstract class. In Rust, this is going to be accomplished with several traits and enums.

In any case, this is all we need to describe the *structure* of a neural network, including determining the gradient of a loss function in order to perform gradient descent. Actually evaluating it is another matter, and I'd like to translate this code to Rust first, before trying to re-implement that.

So, where I stand now: I've got a bunch of Python code that passes the tests (not shown) and that, aside from the complete lack of documentation and comments, I think looks pretty good. Next week I'm going to try porting it to Rust, and write about how that goes. That's all for now. See you around!