Let's Code: Nobilis 3e Character Creation


Let's Code: Nobilis 3e Character Creation

Postby mwchase » Sun Mar 05, 2017 2:53 am

EDIT: Was "Let's Code: Neural Nets (in Rust?)"

Actually, let's make a table of contents:
  • Neural Nets in Rust: Starts with this post.
  • Nobilis 3e Character Creation: Starts with this post.

Hey all! I enjoyed putting together the Let's Play over in the videogames forum, but I ended up getting a little frustrated with how much I had to rely on others for e.g. bug fixes. Ideally, I'd have the person responsible for fixing bugs in easy smacking distance.

At the same time as I do other things, I also work on various hobby coding projects. One that I've recently started focusing on again is an attempt to roll my own neural net library, because the stuff I've looked at confuses me, so maybe I'll get some insight if I make one myself.

The story so far...

Several months ago, I read up on basic machine learning stuff, and put together a primitive autoencoder in Python. The design had some interesting ideas, which I'll try to carry forward, and some dodgy implementation details, which I'll try not to.

Crash course on the design: I ended up foundering in my attempts to understand tensor concepts. In order to have some concept of what nets are doing, I started translating the vector behaviors in the writeups into very localized descriptions of what happens to single units in the network, then worked back up from there.

The architecture that I ended up going for was a symbolic DAG (directed acyclic graph) of simple mathematical operations. To explain what this was like, consider an algebraic expression, like (5x + 3y)(56x-1)^5. That expression can be thought of as a tree structure:

Code: Select all

           *
      /         \
     +           ^
   /   \       /   \
  *     *     -     5
 / \   / \   / \
5   x 3   y *   1
           / \
         56   x

Now, there are a couple of issues here. One is that there could be a lot of repeated structure. If you're evaluating this, you'd like to reuse as much information as possible. Those efforts could, however, be hampered if you can't actually tell whether two nodes are equal.

My solution to the latter problem was to develop a canonical form for the nodes. Equivalent node values will draw their arguments into a common form, making it trivial to confirm that two nodes are identical. This ability then feeds into solving the first problem: instead of a tree, the nodes are in a DAG, where any particular expression is only instantiated once. I'll get into how all this works in detail after I've posted the code.

First up, the concepts the library can express: addition, multiplication, hyperbolic tangent, inputs to the network, trainable values, and named constants. Each of these can be differentiated with respect to a trainable value, and the result expressed simply, solely in terms of the concepts that are already defined. Addition and multiplication work as in calculus, trainable values are 1 with respect to themselves, and 0 with respect to everything else, and similarly for the inputs and constants. The derivative of tanh(x) with respect to x is just 1-tanh^2(x), so that's no big deal either.
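
To make those rules concrete, here's a rough sketch of the differentiation step on a plain recursive enum, written in Rust since that's where this is headed. These are hypothetical illustration types (f64 constants, no DAG, no canonical form), not the library's actual code:

Code: Select all

#[derive(Clone)]
enum Expr {
    Constant(f64),
    Input(usize),     // index into the network inputs
    Parameter(usize), // index into the trainable values
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
    Tanh(Box<Expr>),
}

// d(expr)/d(parameter wrt), expressed only in terms of the node kinds above.
fn derivative(expr: &Expr, wrt: usize) -> Expr {
    use Expr::*;
    match expr {
        // Constants and inputs don't depend on trainable values.
        Constant(_) | Input(_) => Constant(0.0),
        // A trainable value is 1 with respect to itself, 0 with respect to anything else.
        Parameter(index) => Constant(if *index == wrt { 1.0 } else { 0.0 }),
        // Sum rule: (u + v)' = u' + v'.
        Add(u, v) => Add(Box::new(derivative(u, wrt)), Box::new(derivative(v, wrt))),
        // Product rule: (u * v)' = u' * v + u * v'.
        Mul(u, v) => Add(
            Box::new(Mul(Box::new(derivative(u, wrt)), Box::clone(v))),
            Box::new(Mul(Box::clone(u), Box::new(derivative(v, wrt)))),
        ),
        // Chain rule: tanh(u)' = (1 - tanh(u)^2) * u', written with only Add and Mul.
        Tanh(u) => {
            let t = Tanh(Box::clone(u));
            let minus_t_squared =
                Mul(Box::new(Constant(-1.0)), Box::new(Mul(Box::new(t.clone()), Box::new(t))));
            let one_minus = Add(Box::new(Constant(1.0)), Box::new(minus_t_squared));
            Mul(Box::new(one_minus), Box::new(derivative(u, wrt)))
        }
    }
}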

With all that said, here's the code I'm working with now: http://pastebin.com/B5tx7ekx. This is actually a rewrite of the first, more confusing iteration that tried to do performance hacks in PyPy. Now, you'll notice that this code is in Python, not Rust. So what gives? Well, this code is relatively strictly typed, and I'd like to see how it does, given the ability to take advantage of that, and also to get some hands-on experience with Rust. (You'll also notice that there's no code to actually evaluate a DAG on an input. That got stripped out in the rewrite, for the time being.)

For now, a quick tour of the code:

Code: Select all

class Environment:

This class serves the purpose of explicitly enforcing the node deduplication. It provides the intended constructors for node instances, and functions for deriving new nodes from them. I may look into creating a container class that bundles an environment with its Node, thereby allowing me to get rid of all those explicit passes of the environment. Prior to the rewrite, the methods lived at the module level, and there was a single implicit environment that I provided functions to manipulate for test purposes. Gross.

The methods contain some asserts. Some of those are useful, and some are from chasing down some absolutely shameful bugs introduced in the rewrite.

Code: Select all

    def add(self, left, right):
...
    def multiply(self, left, right):
...
    def _combine_terms(self, left, right):

These heavyweight functions work by converting their arguments into Sum nodes, and doing basic operations to them, not too fancy. If you squint at add(), you can see how it leverages the canonical form: because the non-constant terms are sorted, it suffices to perform a modified merge of the lists, and this creates the correct, canonical merged list. I'll be looking for ways to cut down the verbiage in these functions, because it's a little over-the-top.
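
The shape of that merge, with the terms reduced to (name, coefficient) pairs for illustration (a hypothetical Rust sketch, not the pastebin code): both inputs are already sorted, so a single pass produces the merged, still-sorted, canonical list, combining coefficients when the same term shows up on both sides.

Code: Select all

fn merge_terms(left: &[(String, f64)], right: &[(String, f64)]) -> Vec<(String, f64)> {
    let (mut i, mut j) = (0, 0);
    let mut merged = Vec::new();
    while i < left.len() && j < right.len() {
        if left[i].0 < right[j].0 {
            merged.push(left[i].clone());
            i += 1;
        } else if left[i].0 > right[j].0 {
            merged.push(right[j].clone());
            j += 1;
        } else {
            // Same term on both sides: combine the coefficients.
            let coefficient = left[i].1 + right[j].1;
            if coefficient != 0.0 {
                merged.push((left[i].0.clone(), coefficient));
            }
            i += 1;
            j += 1;
        }
    }
    // At most one of these has anything left in it.
    merged.extend_from_slice(&left[i..]);
    merged.extend_from_slice(&right[j..]);
    merged
}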

Code: Select all

def cmp(left, right):

This code was originally Python 2, and it's now Python 3. I tossed this in as a quick hack. It should be unnecessary in Rust.

Code: Select all

class State(collections.namedtuple('State', ['input', 'variables', 'parameters'])):

This is a remnant of the evaluation code. It represents an initial state of the network. I don't know what it'll end up looking like, but I assume somewhat close to this version. Rust won't need the weird method it has.

Code: Select all

class Node(tuple):

The base class for the read-only nodes in the DAG. Every node must offer a way to coerce to Sum, and to produce its partial derivative with respect to a trainable variable.

Then, a bunch of instantiations of the abstract class. In Rust, this is going to be accomplished with several traits and enums.

In any case, this is all we need to describe the structure of a neural network, including determining the gradient of a loss function, in order to perform gradient descent. Actually evaluating it is another matter, and I'd like to translate this code to Rust first, before trying to re-implement that.

So, where I stand now: I've got a bunch of Python code that passes the tests (not shown), and, aside from the complete lack of documentation and comments, I think looks pretty good. Next week I'm going to talk about trying to port it to Rust, and write about that. That's all for now. See you around!
Last edited by mwchase on Sun Apr 16, 2017 6:37 pm, edited 2 times in total.

Re: Let's Code: Neural Nets (in Rust?)

Postby mwchase » Sun Mar 12, 2017 7:35 pm

Week 2: Well, I haven't reached feature parity, but I'm learning Rust. Not much theory to go into, on account of I haven't reached feature parity.

Current code: http://pastebin.com/pNCur6tf (their syntax highlighting is broken)

So, let's see what I accomplished in the last week!

Code: Select all

extern crate num;
 
use std::cmp::Ordering;
use std::collections::HashMap;
use std::collections::hash_map::{DefaultHasher, Entry};
use std::hash::{Hash, Hasher};
use std::sync::Arc;
 
use num::{Integer, One, Signed, Zero};
Getting a bunch of types and traits into scope.

Code: Select all

trait NodeData {
    type Constant: Hash + Eq + Ord + Clone + Signed;
    type Coefficient: Hash + Eq + Ord + Clone + Signed;
    type Exponent: Hash + Clone + Integer;
    type Input: Hash + Eq + Clone;
    type SystemVariable: Hash + Eq + Clone;
    type Parameter: Hash + Eq + Clone;
}
I don't want to restrict the kinds of data the DAGs can contain to a single type, but they do need to be consistent within a DAG. And I don't want to toss around five or six type parameters at all times. So, I created a trait that contains a bunch of associated types, constrained with what they need to satisfy. I have to wonder if there's a better way, because doing it like this offends the #[derive] attribute's delicate sensibilities, on account of it's trying to constrain the NodeData instance itself, but NodeData instances don't need to be inhabited!

My secret ability is finding horrendously annoying edge cases in about five seconds.

Moving on, we have the Node implementations. Nothing too exciting here. The Sum<T> and Product<T> types have a field to store a precomputed hash, because I want hashing to be at most linearly complex.
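
The pre-hash trick is roughly this (a hypothetical standalone sketch, not the actual Sum<T>): pay the linear hashing cost once at construction, and have the Hash impl feed only the cached value to the hasher.

Code: Select all

use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(PartialEq, Eq, Clone)]
struct PreHashed<T: Hash> {
    terms: Vec<T>,
    pre_hash: u64,
}

impl<T: Hash> PreHashed<T> {
    fn new(terms: Vec<T>) -> Self {
        // Pay the linear cost once, at construction.
        let mut hasher = DefaultHasher::new();
        terms.hash(&mut hasher);
        let pre_hash = hasher.finish();
        PreHashed { terms, pre_hash }
    }
}

impl<T: Hash> Hash for PreHashed<T> {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Hashing the node later is O(1): just feed the cached value through.
        self.pre_hash.hash(state);
    }
}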

I'm using Arc<T>s all over the place because this code was formerly leaning heavily on some Python features. It should be possible to rephrase all of this in terms of statically analyzed borrows, I think, but I'm not actually sure.

Code: Select all

type SelfMap<T> = HashMap<T, Arc<T>>;
 
struct Environment<T: NodeData> {
    sum: SelfMap<Sum<T>>,
    product: SelfMap<Product<T>>,
    power: SelfMap<Power<T>>,
    primitive: SelfMap<Primitive<T>>, //I could put in the maps for the various partial
                                      //derivatives, but that's a
                                      //non-vital optimization.
}
An environment is basically a set of caches that map an instance of a node to a dynamically borrowed instance that's the canonical version for that environment.

Code: Select all

trait Summable<T: NodeData> {
    fn as_sum(&self, env: &mut Environment<T>) -> Arc<Sum<T>>;
}
 
trait Productable<T: NodeData> {
    fn as_product(&self, env: &mut Environment<T>) -> Arc<Product<T>>;
}
"There are two hard things in computer science: cache invalidation, naming things, and off-by-one errors."

Code: Select all

fn _as_sum<T: NodeData>(productable: &Productable<T>, env: &mut Environment<T>) -> Arc<Sum<T>> {
I wrote this function because I couldn't figure out how to give three types identical impls (without using macros or repeating myself), so I did this to cut down on the repetition.

Code: Select all

trait Node<T: NodeData>: Summable<T> {
    fn partial_derivative(&self, variable: &T::Parameter, env: &mut Environment<T>) -> Arc<Sum<T>>;
 
    fn fancy_clone(&self, env: &mut Environment<T>) -> Self;
 
    fn get_storage<'a>(&self, env: &'a mut Environment<T>) -> &'a mut SelfMap<Self>
       where Self: Sized;
 
   fn insert(&self, env: &mut Environment<T>) -> Arc<Self>
       where Self: Sized + Clone + Eq + Hash
   {
       if !self.get_storage(env).contains_key(self) {
           let key = self.fancy_clone(env);
           let value = key.clone();
           self.get_storage(env).insert(key, Arc::new(value));
       }
       match self.get_storage(env).get(self) {
               Some(node) => node,
               None => unreachable!(),
           }
           .clone()
   }
}
Nodes rely on the environment to have canonical forms, which is why these functions take a mutable environment. The Summable<T> inheritance requires that they provide an as_sum method, which I did above. They need to be able to take partial derivatives with respect to a parameter, and to clone themselves into a canonical form, and the last two functions handle inserting a node into the environment. I'm not sure I have these actually implemented correctly. insert itself doesn't need to be overridden, because get_storage provides the per-type mapping, and the insertion logic is otherwise identical for every node type.

Code: Select all

unimplemented!()
I have so much code stubbed out because it's not ported yet. Pretty easy to do in lots of static languages.

Code: Select all

&mut env.sum
It took me an embarrassingly long time to figure out how to write these functions, especially for how short they are.

Code: Select all

impl<T: NodeData> Productable<T> for Primitive<T> {
    fn as_product(&self, env: &mut Environment<T>) -> Arc<Product<T>> {
        match self {
            &Primitive::Sigmoid(true, ref sum) => {
                let sigmoid = env.make_sigmoid(false, sum);
                let power = sigmoid.as_power(env);
                env.make_product(&-T::Coefficient::one(), &vec![power]).unwrap()
            }
            _ => {
                let power = self.as_power(env);
                power.as_product(env)
            }
        }
    }
}
I think I messed up the Python implementation somewhat, and this hopefully makes sense now. The conversion order turns out to be Sum-Product-Primitive-Power, as opposed to the order of composition, which would end Power-Primitive.

Then most of the rest of the file is some boilerplate trait implementations. Some of them, like Clone, could I think be derived properly if #[derive] weren't a delicate flower. Some of them actually do need custom implementations.

Code: Select all

#[cfg(test)]
mod tests {
   #[test]
   fn it_works() {}
}
This was put there by the project boilerplate generation, and I haven't decided to get rid of it.

So, I still need to work on getting stuff together to handle the neural network structure. The current priority is implementing or using the Ord trait on all the Node<T> instances, then trying to implement the node arithmetic operations, then the partial derivatives. May look into finding things to dedupe, like maybe making a type to stand in for Arc<T>. Also, I should write a constructor for Environment<T>.

So, where I stand now: the code is partially ported to Rust, and compiles. There are no tests in Rust; I can maybe look into porting them once I have everything in the main module ported over. Hopefully I'll have some progress on all of this next week. That's all for now. See you around!

Re: Let's Code: Neural Nets (in Rust?)

Postby mwchase » Mon Mar 20, 2017 3:28 am

Week 3: The Week Where I Split Things Up Into A Bunch Of Files

The size of the code was getting pretty intolerable, so I tried to divide it up into several files:

The main file, kind of the dumping ground for everything not slotted into another file: http://pastebin.com/PGzmzP4m
The file that defines most of the relevant types, as well as uninteresting trait implementations: http://pastebin.com/fEzqZPj9
The file that defines ordering, which is a somewhat more interesting trait implementation: http://pastebin.com/W1v77eYf

For my own reference, this is from changeset 90:880ca00ddc7a

Because I didn't do that last week, this might end up a little scattered.

Before I divided the code up into files, I was joking that I could have a "tedious_fucking_boilerplate.rs" file. The approach I took was to try to group together tightly coupled code. Sort of a cross-eyed view of the single responsibility principle, in which I'm trying to make sure that the file layout allows for non-trivial local refactorings.

From node_defs:

Code: Select all

pub type Container<T> = Arc<T>;
 
pub fn container<T>(data: T) -> Container<T> {
    Arc::new(data)
}
I decided to make things easier on myself if I later switch out Arc (atomic reference count) for some other wrapper. I'm not sure that avoiding having to change the name everywhere will make things significantly easier, but I figure it can't make things harder.

Code: Select all

pub trait NodeData {
    type Constant: Hash + Ord + Clone + Signed;
    type Exponent: Hash + Clone + Integer;
    type Input: Hash + Ord + Clone;
    type SystemVariable: Hash + Ord + Clone;
    type Parameter: Hash + Ord + Clone;
}
I cleaned up the constraints a little. Also, in order to implement multiplication, I had to combine the Constant and Coefficient types.

Code: Select all

//I would like these fields to be private if possible.
And a pony.

From node_ordering:

Code: Select all

#[derive(Eq, Ord, PartialEq, PartialOrd)]
enum PrimitiveType {
    Input,
    SystemVariable,
    Parameter,
    Sigmoid,
}
It's not that I don't want to use #[derive()], it's that it kept on breaking so much that I couldn't. It didn't break here, though.

Code: Select all

impl<T: NodeData> Primitive<T> {
    fn simplify(&self) -> PrimitiveType {
        match self {
            &Primitive::Input(_) => PrimitiveType::Input,
            &Primitive::SystemVariable(_) => PrimitiveType::SystemVariable,
            &Primitive::Parameter(_) => PrimitiveType::Parameter,
            &Primitive::Sigmoid(_, _) => PrimitiveType::Sigmoid,
        }
    }
}
I've got impls scattered throughout the code, not in the module where the types got defined, and I'm honestly not sure if that's supposed to work... The code passes cargo check, so at some level it thinks this all makes sense... but at the same time, I ran across some commits from a few years ago talking about dramatically restricting where you can put an impl. I'm very confused.

Code: Select all

impl<T: NodeData> Ord for Primitive<T> {
    fn cmp(&self, other: &Primitive<T>) -> Ordering {
        match (self, other) {
            (&Primitive::Input(ref a), &Primitive::Input(ref b)) => a.cmp(b),
            (&Primitive::SystemVariable(ref a), &Primitive::SystemVariable(ref b)) => a.cmp(b),
            (&Primitive::Parameter(ref a), &Primitive::Parameter(ref b)) => a.cmp(b),
            (&Primitive::Sigmoid(minus_a, ref a), &Primitive::Sigmoid(minus_b, ref b)) => {
                let cmp_result = a.cmp(b);
                match cmp_result {
                    Ordering::Equal => minus_a.cmp(&minus_b),
                    _ => cmp_result,
                }
            }
            _ => self.simplify().cmp(&other.simplify()),
        }
    }
}
Hm. I don't want to touch macros yet, but I bet this whole file would benefit from some way of generalizing the semantics that Sigmoid has here.

From lib.rs:

Code: Select all

mod node_defs;
mod node_ordering;
 
use node_defs::{Container, container, NodeData, Sum, Product, Power, Primitive};
I'm a little confused that I need to have these lines, insofar as, when my Diskstation went "I'M HELPING" and duplicated lib.rs, cargo build complained about ~1 billion conflicting definitions.

Code: Select all

struct Environment<T: NodeData> {
    sum: SelfMap<Sum<T>>,
    product: SelfMap<Product<T>>,
    power: SelfMap<Power<T>>,
    primitive: SelfMap<Primitive<T>>,
    sum_derivative: HashMap<(Sum<T>, T::Parameter), Container<Sum<T>>>,
    product_derivative: HashMap<(Product<T>, T::Parameter), Container<Sum<T>>>,
    power_derivative: HashMap<(Power<T>, T::Parameter), Container<Sum<T>>>,
    primitive_derivative: HashMap<(Primitive<T>, T::Parameter), Container<Sum<T>>>,
}
I started implementing derivatives, and preemptively added a bunch of fields to the Environment struct.

Code: Select all

    fn at_zero(&self) -> T::Constant {
...
    fn adjust_constant(&self,
                       constant: &T::Constant,
                       env: &mut Environment<T>)
                       -> Container<Sum<T>> {
...
    fn conditional_negate(&self, env: &mut Environment<T>) -> Vec<Container<Product<T>>> {
...
    fn negate_products(&mut self,
                       terms: &Vec<Container<Product<T>>>)
                       -> Vec<Container<Product<T>>> {
I noticed some convenience methods I could use to cut down some on the implementation of add().

Code: Select all

terms.extend_from_slice(&right_terms[..].split_at(right_index).1);
I've been reading the documentation seriously out of order. I think it should be possible to replace these lines with something more compact.

So, I've got addition, and multiplication, and I've started working on partial derivatives, but I've hit a snag. The derivative of a sum is easy, but when I started working on the derivative of a product, I ended up wanting to just pass a chained iterator into make_product, which implies that I might want to mess with the signature and implementation.

Also, it just now occurred to me that I could probably simplify the DAG manipulation a bunch if I got rid of the concept of "Power" nodes, and simply allowed multiple consecutive copies of a single primitive. That would let me get rid of the "merge" logic that's, frankly, kind of weird. I'll have to think about this, as it'd have some non-trivial effects on sort order, and thinking ahead, pushes more work onto the DAG visitor, which is much hotter code.

(Sort order: suppose x < y. Under the current implementation, x < xy < x^2, and under the proposed implementation, x < xx < xy. I forget if sorting needs to have any particular properties, so I'm not sure if this is a dealbreaker.)

I'll be honest, I don't know what anyone else thinks of these, but for rubber-ducking of that quality, I'll keep these up as long as I can.

Meanwhile, for the Minecraft thing, I'm putting in a heroic effort to put off further updates until the crash bug is fixed.

In any case, making progress, always a good time.

So, where I stand now: it kind of feels like I didn't do much, but actually there's quite a bit of code, both new and improved. I need to rewrite some utility functions, I'm just not totally sure how, yet. That's all for now. See you around!

Re: Let's Code: Neural Nets (in Rust?)

Postby mwchase » Sun Mar 26, 2017 8:08 pm

Week 4: That Is A Beefy Diff

I'm doing something a little different with the file upload this week, and just uploading a diff. I figure this doesn't make too much difference because I haven't been including quite enough to build this, anyway.

Diff: http://pastebin.com/bS7fyH8Y

Based on changeset 137:c23d7ad1cd47

Let's dive in and see what specific accomplishments we can drag out of this ~1k line file.

Code: Select all

-use std::hash::{Hash, Hasher};
+use std::hash::Hash;
 
-use num::{One, Zero};
-
I don't need all of the imports that I had. Some of them are removed, and some just moved around.

Code: Select all

+mod node_conversions;
 mod node_defs;
 mod node_ordering;
 
-use node_defs::{Container, container, NodeData, Sum, Product, Power, Primitive};
+use node_conversions::Summable;
+use node_defs::{Container, container, FancyClone, NodeData, Sum, Product, Power, Primitive};
I added a new module for handling conversions between node types, and moved fancy_clone into its own trait, which now lives with the other simple node code. As such, various details about the construction and conversion of nodes are now hidden at the top level.

Code: Select all

+/// ...
I tried to document my functions. The result is... passable? Maybe?

Code: Select all

env.secret_make_...
I didn't like the unwraps scattered around the methods, so I implemented some private methods that do the minimum required, and kept the public versions, but with some nicer functionality.

Code: Select all

+pub ...
Exposed a few more things to hypothetical library users.

Code: Select all

-                       constant: &T::Constant,
+                       constant: T::Constant,
I tried to cut out various borrows and clones. I don't know if this really makes a difference, but I think it looks cleaner.

Code: Select all

-                     &self.terms)
...
+                     self.terms.iter().cloned())
I'm trying to hold off on constructing new vectors until they're needed. Thus, there's a lot more in the way of iterator methods now.

Code: Select all

-                .collect::<Vec<_>>()
+                .collect()
Nothing especially clever here, it just turned out that verbosity was unneeded.

Code: Select all

-    fn make_sum(&mut self,
-                minus: bool,
-                constant: &T::Constant,
-                terms: &Vec<Container<Product<T>>>)
-                -> Container<Sum<T>> {
+    /// Makes a canonicalized Sum from the given components.
+    pub fn make_sum<C, N: Into<T::Constant>>(&mut self,
+                                             minus: bool,
+                                             constant: N,
+                                             terms: C)
+                                             -> Container<Sum<T>>
+        where C: IntoIterator<Item = Container<Product<T>>>
+    {
I don't fully grasp the differences in expressiveness between where clauses and constraints, but I don't really care yet, because the checker accepts this code. The changes here are focused on loosening the constraints on the input types. Instead of "a constant" and "a vector of (containers of) products", this code accepts any types that aren't too much hassle to convert to those. The other make_ methods got a similar overhaul.

Code: Select all

-                terms.extend_from_slice(&right_terms[..].split_at(right_index).1);
+                terms.extend_from_slice(&right_terms[right_index..]);
Yup. Could have done that all along.

Code: Select all

-        if terms.len() == 0 || terms[0].coefficient >= T::Constant::zero() {
...
+        if terms.len() == 0 || terms[0].coefficient >= 0.into() {
...
-        let mut last = self.number(&T::Constant::zero());
+        let mut last = self.number(0);
Much nicer to look at, IMO.

Code: Select all

     fn partial_derivative(&self,
                           variable: &T::Parameter,
                           env: &mut Environment<T>)
-                          -> Container<Sum<T>>;
...
+                          -> Container<Sum<T>>
+        where Self: Sized + Clone + Eq + Hash
+    {
+        if !self.get_derivative_cache(env).contains_key(&(self.clone(), variable.clone())) {
+            let key = (self.fancy_clone(env), variable.clone());
+            let result = self.derivative_impl(variable, env);
+            self.get_derivative_cache(env).insert(key, result);
+        }
+        match self.get_derivative_cache(env).get(&(self.clone(), variable.clone())) {
+            Some(derivative) => derivative.clone(),
+            None => unreachable!(),
+        }
+    }
 
     fn get_storage<'a>(&self, env: &'a mut Environment<T>) -> &'a mut SelfMap<Self>
         where Self: Sized;
 
+    fn get_derivative_cache<'a>(&self,
+                                env: &'a mut Environment<T>)
+                                -> &'a mut HashMap<(Self, T::Parameter), Container<Sum<T>>>
+        where Self: Sized;
+
+    fn derivative_impl(&self,
+                       variable: &T::Parameter,
+                       env: &mut Environment<T>)
+                       -> Container<Sum<T>>;
The derivative implementation had hella boilerplate, so I made a wrapper function. Sadly, it seems like it does have to be that convoluted to satisfy the borrow checker. This may change in the future.

Then things get a little confusing on account of all the code I moved out of the file.

Code: Select all

-                let mut result = env.number(&T::Constant::zero());
-                unimplemented!()
-                /*for (index, focus) in self.powers.iter().enumerate() {
-                    let remainder = env.make_product(T::Constant::one(), )
-                }*/
...
+            let mut result = env.number(0);
+            for (index, focus) in self.powers.iter().enumerate() {
+                let remainder = env.secret_make_product(1,
+                                                        self.powers[..index]
+                                                            .iter()
+                                                            .chain(self.powers[index + 1..]
+                                                                       .iter())
+                                                            .cloned()
+                                                            .collect());
+                let partial_derivative = focus.partial_derivative(variable, env);
+                let product = env.multiply(remainder, partial_derivative);
+                result = env.add(result, product);
+            }
+            result
And there's a new partial derivative implementation.

Code: Select all

-    fn partial_derivative(&self,
-                          variable: &T::Parameter,
-                          env: &mut Environment<T>)
-                          -> Container<Sum<T>> {
-        unimplemented!();
-    }
-
...
+    fn derivative_impl(&self,
+                       variable: &T::Parameter,
+                       env: &mut Environment<T>)
+                       -> Container<Sum<T>> {
+        let derivative = self.primitive.partial_derivative(variable, env);
+        if self.exponent == 1.into() {
+            derivative
+        } else {
+            let coefficient = env.number(self.exponent.clone());
+            let remainder = env.secret_make_power(self.primitive.clone(),
+                                                  self.exponent.clone() - 1.into());
+            let product = env.multiply(derivative, remainder);
+            env.multiply(coefficient, product)
         }
     }
And another.

Code: Select all

+                let self_squared = env.secret_make_power(self_in_env, 2);
This line from the sigmoid derivative reveals part of my motivation for some of the above rewrites: I wanted numbers easily to hand, besides zero and one!

Next up, node_conversions.rs. This is mostly code moved out of lib.rs. However, I do remember a few changes...

Code: Select all

+fn _as_sum<T: NodeData, U: Productable<T>>(productable: &U,
+                                           env: &mut Environment<T>)
+                                           -> Container<Sum<T>> {
The old signature for this, with &Productable, was kind of wrong. I wanted compile-time specialization for each implementation of Productable (the name of which is one of various things that this new module keeps contained). By passing in a reference, I was having it work with a trait object, which is a run-time construct.
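
The difference in miniature (a toy trait, nothing to do with Adagio):

Code: Select all

trait Speak {
    fn speak(&self) -> String;
}

struct Duck;

impl Speak for Duck {
    fn speak(&self) -> String {
        "quack".to_string()
    }
}

// Trait object: one compiled function; the call goes through a vtable at run time.
fn greet_dyn(speaker: &dyn Speak) -> String {
    speaker.speak()
}

// Generic parameter: the compiler stamps out a specialized copy per concrete
// type, so the call is resolved at compile time.
fn greet_static<U: Speak>(speaker: &U) -> String {
    speaker.speak()
}

fn main() {
    assert_eq!(greet_dyn(&Duck), greet_static(&Duck));
}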

I don't think there's anything else terribly interesting in there, so moving on to node_defs.rs...

Code: Select all

+use {Environment, Node};
+
I moved fancy_clone in here, which means that this code needs to be aware of Environment and the Node trait. I could probably make a submodule that does this, so the uses don't pollute the top level.

Code: Select all

-    type Constant: Hash + Ord + Clone + Signed;
-    type Exponent: Hash + Clone + Integer;
+    type Constant: Hash + Ord + Clone + Signed + Into<Self::Coefficient> + From<i32>;
+    type Coefficient: Hash + Ord + Clone + Signed + Into<Self::Constant> + From<i32>;
+    type Exponent: Hash + Clone + Integer + Into<Self::Constant> + From<i32>;
The Coefficient type is back! I kind of glossed over the changes I made as a result of this, but I'm much happier with the type system requiring explicit conversions between constants and coefficients. The other thing here that I showed the results of, but didn't explain, is allowing these types to convert from i32, which is the most common type of integer literal. (Because Constants and Coefficients are supposed to interconvert, they should normally be the same type, but from this perspective, I want that to be an implementation detail.)

Code: Select all

+pub trait FancyClone<T: NodeData> {
+    fn fancy_clone(&self, env: &mut Environment<T>) -> Self;
+}
+
fancy_clone is now its own trait, which let me move it in here, which let me...

Code: Select all

-    pub pre_hash: u64, //I would like these fields to be private if possible.
+    pre_hash: u64, //I would like these fields to be private if possible.
Grant my wish. I should get rid of those comments. *one minute later* There we go.

node_ordering.rs:

Code: Select all

-use node_defs::{Container, NodeData, Sum, Product, Power, Primitive};
+use node_defs::{NodeData, Sum, Product, Power, Primitive};
I got rid of an unused import. *confetti*

Okay, so this past week was a mix of housekeeping and eliminating TODOs. At this abstract level, the DAG code is now ready. To go anywhere, it needs concrete implementations, and those implementations are constrained by the semantics of the code that actually evaluates the DAG for a given state. This means that next week is working on State structures, and Evaluators.

So, where I stand now: making steady progress, preparing to express the idea of State in an abstract fashion, and hopefully a little more flexibly than in my Python implementation. That's all for now. See you around!

I swear in a few weeks I'll have a net going. It's pretty simple to create arbitrary structures, once all the infrastructure is in place.

Re: Let's Code: Neural Nets (in Rust?)

Postby SpammyV » Sun Mar 26, 2017 8:28 pm

I'll be honest, when I saw the thread title I thought you meant the game Rust, and this would turn into one of those Minecraft coding stories like "Hey I made a redstone computer AI in Minecraft and it learned what love is what do I do now?"

Re: Let's Code: Neural Nets (in Rust?)

Postby mwchase » Sun Mar 26, 2017 9:41 pm

Would've been under the video games forum if that were the case.

Re: Let's Code: Neural Nets (in Rust?)

Postby mwchase » Sat Apr 01, 2017 2:25 am

Sneak preview of my reflections on this week's work:
Everything is on fire.

Re: Let's Code: Neural Nets (in Rust?)

Postby mwchase » Sun Apr 02, 2017 8:20 pm

Week 5: Everything Is Not Quite As On Fire

I'm doing this in part to learn Rust, so I spent a lot of this week working on a toy project to understand concurrency. I'll be annotating that extensively to see if my understanding is actually correct.

Adagio diff: https://pastebin.com/w9RFUhCY
Q Sequence diff: https://pastebin.com/H0e7hkvN

Based on changeset 162:eb1fb913e17f

Quick overview of the changes in the Adagio crate:

eval.rs: I tried to put together some data types, before I realized that I should be worrying more about how they're consumed.

lib.rs:

Code: Select all

extern crate void;
...

+mod eval;
I added the void crate because I'd like some uninhabitable types, and declared the eval module so I can typecheck it.

Code: Select all

-        match self.get_derivative_cache(env).get(&(self.clone(), variable.clone())) {
-            Some(derivative) => derivative.clone(),
-            None => unreachable!(),
-        }
+        self.get_derivative_cache(env)[&(self.clone(), variable.clone())].clone()
...
-        match self.get_storage(env).get(self) {
-            Some(node) => node.clone(),
-            None => unreachable!(),
-        }
+        self.get_storage(env)[self].clone()
Syntactic sugar exists...

Most of the rest of the changes were the new rustfmt doing its thing, and a change I'll explain later...

node_defs.rs:

Code: Select all

-    type SystemVariable: Hash + Ord + Clone;
+    type Real: Hash + Ord + Clone;
By which I mean, now. I changed my intended semantics for one of the primitive types, and changed the name to reflect that. It now represents an arbitrary real constant that cannot be represented with the rest of the semantics. (Fun fact: under Rust semantics, it's actually not permissible for any part of the DAG to contain a floating point number, since f64 implements neither Eq, Ord, nor Hash, thanks to NaN!)

Code: Select all

-    pre_hash: u64, //I would like these fields to be private if possible.
+    pre_hash: u64,
Like I said, I took the comments off.

Code: Select all

-            &Primitive::SystemVariable(ref a) => {
-                "SystemVariable".hash(state);
+            &Primitive::Real(ref a) => {
+                "Real".hash(state);
This would be a breaking change if there were any serialization going on, but there's not, so it isn't.

node_ordering.rs: Nothing going on in here I haven't already covered.

Now, let's see what was on fire!

Q Sequence: main.rs:

Code: Select all

use std::collections::HashMap;
use std::sync::{Arc,RwLock};
use std::thread;
We're working with maps, shared between threads using reference counts and locks. This is all pretty heavy-weight for a "systems" language, and I'd like to look into doing something more efficient. Later.

Code: Select all

fn q(mapping: &mut Arc<RwLock<HashMap<i32, Arc<RwLock<i32>>>>>, key: i32) -> i32 {
This is kind of a ridiculous signature, and I'll probably use some form of syntactic sugar to shorten it in Adagio. So, what's this function supposed to do?

The Hofstadter Q sequence is based off the Fibonacci sequence. Where F(n) = F(n - 1) + F(n - 2), Q(n) = Q(n - Q(n - 1)) + Q(n - Q(n - 2)). It's not actually sufficiently more complicated than the Fibonacci sequence for this purpose, but it's cool, so I used it.
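
(For comparison, stripped of all the threading machinery, the same memoized recurrence fits in a few lines; this sketch isn't part of the project, it's just the baseline the threaded version below is imitating.)

Code: Select all

use std::collections::HashMap;

fn q_memo(cache: &mut HashMap<i32, i32>, n: i32) -> i32 {
    if let Some(&value) = cache.get(&n) {
        return value;
    }
    let prev = q_memo(cache, n - 1);
    let prev_prev = q_memo(cache, n - 2);
    let value = q_memo(cache, n - prev) + q_memo(cache, n - prev_prev);
    cache.insert(n, value);
    value
}

fn main() {
    let mut cache = HashMap::new();
    cache.insert(1, 1);
    cache.insert(2, 1);
    println!("{}", q_memo(&mut cache, 10)); // prints 6
}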

In any case, the q() function returns the key-th value of the Q sequence, given a thread-shareable HashMap cache that's been pre-seeded with at least the first two values, which are both 1.

The function contains a few println macros, because threading is confusing, even in this kind of restricted problem space. (Due to the dependence on the previous value, the order of execution is deterministic. I still managed to deadlock a few times, working on this.)

This code does a bunch of lock operations, which it all assumes succeed.

Code: Select all

    {
        match mapping.read().unwrap().get(&key) {
            Some(i) => return *i.read().unwrap(),
            None => ()
        }
    }
First, check once for the value in the cache. This requires taking a read lock on the cache, checking for a value (which may not be calculated yet, and merely reserving space in the map), taking a read lock on THAT, and returning the result of dereferencing it. If there's no value found, all the locks are dropped, and the function continues.

Code: Select all

        let lock = Arc::new(RwLock::new(0));
        let mut val = lock.write().unwrap();
Prepare a new value to put in the cache, with invalid data to start, and immediately acquire a write lock on it. Since the value hasn't been shared, we know this succeeds immediately.

Code: Select all

        {
            use std::collections::hash_map::Entry::*;
            let mut mutable_mapping = mapping.write().unwrap();
            match mutable_mapping.entry(key) {
                Occupied(entry) => return *entry.get().read().unwrap(),
                Vacant(entry) => {entry.insert(lock.clone());}
            }
            println!("Inserted for {:?}", key);
        }
Acquire a write lock on the cache, then look for the entry. If it's present, bail out, returning the contained value. If it's absent, insert a reference to the locked value, and continue.

Code: Select all

        {
            let mut _mapping = mapping.clone();
            thread::spawn(move || {
                q(&mut _mapping, key - 1);
            });
        }
Create a new reference to the cache, and pass it into a thread to calculate the next-lowest Q number.

Before we finish off the q function, let's look at the function I skipped:

Code: Select all

fn poll_for(mapping: Arc<RwLock<HashMap<i32, Arc<RwLock<i32>>>>>, key: i32) -> Arc<RwLock<i32>> {
    loop {
        match mapping.read().unwrap().get(&key) {
            Some(i) => return i.clone(),
            None => ()
        }
    }
}
This takes a cache and a key, and attempts to extract a reference to the value. This function will constantly check for a value; it may be susceptible to deadlocks in non-toy contexts.

Code: Select all

        {
            let prev = poll_for(mapping.clone(), key - 1);
            let prev_prev = poll_for(mapping.clone(), key - 2);
            let first = poll_for(mapping.clone(), key - *prev.read().unwrap());
            let second = poll_for(mapping.clone(), key - *prev_prev.read().unwrap());
            *val = *first.read().unwrap() + *second.read().unwrap();
        }
        *val
Once the thread is spawned, look for the result value, and use similar code for the rest of the values, for the heck of it. Once the Q number is calculated, store it in the cache through the lock, then return the value. After this is done, the lock is dropped, and other code can read the value from cache.

Code: Select all

fn main() {
    let mut mapping = HashMap::new();
    mapping.insert(1, Arc::new(RwLock::new(1)));
    mapping.insert(2, Arc::new(RwLock::new(1)));
    let mut mapping = Arc::new(RwLock::new(mapping));
    println!("{:?}", q(&mut mapping, 10));
    println!("{:?}", q(&mut mapping, 10));
}
Initialize the cache properly, pass it in, run it, and the answer is indeed 6.

So, that's what I was working on the past few days. In the context of Adagio, I'm going to want a fixed data structure that contains intermediate storage mapping nodes to an output type, given a state that's constant for the purposes of a single calculation step. The actual flow between nodes is defined by what the nodes are, but the details of calculation can be swapped out somewhat: there should be a single output type, but depending on speed vs accuracy constraints, there could be multiple ways to actually calculate the values.

So, where I stand now: I've got a proof of concept for concurrent execution, though it's in some ways closer to concept than proof. I'll try to get it into Adagio next week. That's all for now. See you around!

Re: Let's Code: Neural Nets (in Rust?)

Postby Thomas » Mon Apr 03, 2017 7:17 pm

This is super interesting, just wanted to let you know!

Re: Let's Code: Neural Nets (in Rust?)

Postby mwchase » Mon Apr 03, 2017 8:00 pm

For the record, I realized just a few minutes ago that I did something a little wrong with this latest installment. It doesn't break anything, and it doesn't act any differently in this restricted context when I fix it, but it's there, and it'll be relevant when I try to adapt this code to the evaluation. Going back and forth on whether to go into more detail before next Sunday.

Re: Let's Code: Neural Nets (in Rust?)

Postby mwchase » Sun Apr 09, 2017 6:38 pm

Week 6: Written (In Part) Shortly After Week 5

Before we get into the code changes this week, I figured it's a good time for a change of pace. The code so far has kind of dived headlong into Rust, without really explaining what's going on. Now that we've got some non-arbitrary example code to work from, let's look back at what we've got, and at Rust in general.

Rust's big thing is the ownership system. In terms of what the various checkers actually see, in typical Rust code, there are a few ways to refer to an object:

  • Direct ownership: If a value is created within a function, or an argument's type is bare, then that function is said to own the value. It's responsible for properly deallocating the value after it finishes executing, either by dropping it when the value's associated scope ends, or by passing it into a function that takes ownership. The latter case, and the behavior arising from assigning the value to a new variable (using a let-binding), are examples of move semantics: there's only one binding to a given resource, so when we give that binding to another variable or function, it's no longer valid. Move semantics does not always apply; I'll get to that in a moment.
  • Immutable references: We often don't want a function to take ownership of a value. To accomplish this statically, we request a reference. An immutable reference type is written with an ampersand, an optional lifetime parameter, and the rest of the type. I believe it's possible to have a reference to a reference, but I've avoided that so far. References are associated with a lifetime, which is usually related to a scope. When the associated scope ends, the borrow ends. A value can have many immutable references to it.
  • Mutable references: ... But it can only have one mutable reference, and only if there are no immutable references at the same time.

All of these things are static behavior, and it's possible to have more freedom dynamically using various traits and types.

  • Clone: This is a standard interface for obtaining an independent owned value from a reference. It must be invoked explicitly.
  • Copy: A specialization of Clone. Values that are of Copy types are not subject to move semantics. They get copied implicitly.
  • Borrow: This trait offers a generic interface to various simple container types: the bare type, immutable and mutable references, 33 sizes of array, and Box, Rc, and Arc. In detail...
  • Box: A pointer to a heap allocation; I believe it otherwise acts mostly the same as an owned value. I seem to mostly read about it in the context of runtime polymorphism; the size of a box is known, even if the size of what's in it is not, which makes it possible to reason about data that doesn't have a fixed size.
  • Rc: A reference-counting pointer that cannot be sent between threads. Invoking clone() on it will, rather than cloning the underlying value, create a new pointer; this means, in fact, that the pointed-at type doesn't need to be Clone for this to work. This is an example of an important thing that I didn't quite grasp, and may not fully grasp yet: while the contents of a container or pointer are important to the high-level semantics of our code, what matters most in compilation is the interface presented by the outer layer. With an Rc, we can copy references to a structure around within a thread, and try at runtime to extract a mutable reference. This contract is defined by the Rc, and is (almost) independent of the semantics of its contents.
  • Arc: An atomic-reference-counting pointer that can be sent between threads. I've got them all over Adagio because I assume multi-threading will be helpful at some point.
  • Mutex: So you've got a value that is accessible from multiple threads. If you want to do more than look at it, you need to be able to lock access to the value dynamically, so only one thread can alter it at a time, and no other thread can see it being altered. The Mutex is the big-hammer approach to this: only one thread can access the wrapped value at once.
  • RwLock: For more advanced capabilities, and slightly more runtime cost, the RwLock allows either a single thread to get exclusive access to the wrapped value, or multiple threads to get read-only access.

Now, where I'd failed to apply what I said under Rc is this: the mutability or immutability of a container refers to the container itself, and it's possible to obtain a reference FROM the container that lets you do things to the contents that you can't do to the container. This is known as interior mutability, and it's what I neglected to take advantage of last week.
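
A tiny illustration of the point (toy code, not the Q-sequence program): the binding and the function argument are both immutable, and the RwLock still hands out write access at run time.

Code: Select all

use std::sync::{Arc, RwLock};

// No `mut` anywhere: the Arc handle is an immutable binding, but a write lock
// obtained FROM it still lets us change the wrapped value.
fn bump(counter: &Arc<RwLock<i32>>) {
    *counter.write().unwrap() += 1;
}

fn main() {
    let counter = Arc::new(RwLock::new(0));
    bump(&counter);
    bump(&counter);
    assert_eq!(*counter.read().unwrap(), 2);
}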

First diff: https://pastebin.com/Sz8x9air

I basically just got rid of the muts in the function declarations, because they're not necessary, overall. The compiler didn't tell me they weren't necessary, because it was kind of a fixed point: because the declaration had a mut, I needed a mut clone. Therefore, it needed to be mut. Whoops!

All that said, I'm still not really happy with how this is set up. So, I'm going to try to adapt the futures library to my goals.

Well, that went horribly, but I did find out that HashSet will do what I'm doing with SelfMap, but more cleanly.

Here's the diff: https://pastebin.com/R43ZNwvZ
It doesn't compile, not even a little.

That said, I did make those changes I just mentioned, and they were great:

Diff: https://pastebin.com/zNZJde8P

This is cool because now the environment structure, which maintains the DAG contract, doesn't need separate storage for keys and values.

Anyway, what went wrong with using futures? I think it was a square-peg/round-hole issue. Near as I can tell, futures uses type parameters to encode the entire structure of the computation it's performing. This lets it compile down to no dynamic dispatches, but it means I couldn't figure out how to make it work in a situation where the provenance of intermediate calculations varied dynamically.

The way I see it, I have two major options now: use the old code I had to roll my own, more dynamic version, or focus on one of the things I want to do eventually, and could do now: extracting linear algebra operations back out of the DAG, and try to hook them up to BLAS. I'm going to try to do the former first, because that seems easier to tackle now.

I've got a big document where I think about what to do next. Previous things were about adapting and porting code. I think I'll put my thoughts on what I want to do with my own version of futures here.

First up, join_all. Converts an iterable of Futures into a Future of a Vec. Pretty useful. There are really only a few operations I need to be able to perform: collect a bunch of futures into one, convert a future into another one with a closure, and convert a non-future value into a future.

Actually, what if I sidestep all of this? A Node contains all the information about its dependencies. How about I ignore threading for now and calculate based on a different mode entirely? Given an intermediate storage, a node can either calculate its value, or construct a list of nodes that must be evaluated first. So, set up a priority queue. Put the desired value in with priority 0, and pop it. If it evaluates, return the result. If it fails to evaluate, it will return a list of nodes instead. Push them with priority 1... actually, it should just return one node at a time, I think. The first node it finds not in storage. Wait, that's quadratic... but if it uses every node, then it could potentially queue up the same node many times... could use a secondary intermediate storage that just records the index checked up to.

If I do this, I might need a wrapper type to contain every type of node, so they can be compared for ordering purposes.

The basic goal of all this is to express a recursive calculation iteratively, so it only uses constant stack depth.

Now, if it's just one at a time, then I could just use an actual stack for now.
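
Concretely, the stack-plus-intermediate-storage idea might look roughly like this on a toy expression type (a hypothetical sketch, nothing like Adagio's actual nodes):

Code: Select all

use std::collections::HashMap;

#[derive(PartialEq, Eq, Hash)]
enum Expr {
    Leaf(i64),
    Add(Box<Expr>, Box<Expr>),
}

// Evaluate without recursion: a work stack plus an intermediate cache, so the
// call stack stays shallow no matter how deep the expression is.
fn eval_iterative(root: &Expr) -> i64 {
    let mut cache: HashMap<&Expr, i64> = HashMap::new();
    let mut stack = vec![root];
    while let Some(node) = stack.pop() {
        if cache.contains_key(node) {
            continue; // already evaluated (a shared subexpression, say)
        }
        match node {
            Expr::Leaf(value) => {
                cache.insert(node, *value);
            }
            Expr::Add(left, right) => {
                let left_value = cache.get(&**left).copied();
                let right_value = cache.get(&**right).copied();
                if let (Some(a), Some(b)) = (left_value, right_value) {
                    cache.insert(node, a + b);
                } else {
                    // Children aren't ready yet: revisit this node after them.
                    stack.push(node);
                    stack.push(&**left);
                    stack.push(&**right);
                }
            }
        }
    }
    cache[root]
}

fn main() {
    // (1 + 2) + 3
    let expr = Expr::Add(
        Box::new(Expr::Add(Box::new(Expr::Leaf(1)), Box::new(Expr::Leaf(2)))),
        Box::new(Expr::Leaf(3)),
    );
    assert_eq!(eval_iterative(&expr), 6);
}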

Okay, I'm making myself dizzy.

This does suggest an initial implementation: just ignore everything I've been trying out, and start off with single-threaded recursive. Still going to have intermediate storage, because it's kind of nuts otherwise.

Also, to start with, let's hardcode the relevant behavior.

Final diff (from 169:a0752c857839): https://pastebin.com/fhQ0etH5

Code: Select all

-use std::marker::PhantomData;
+//use std::marker::PhantomData;
...
-trait Input<I, T> {
...
+/*trait Input<I, T> {
...
+}*/
Area man hoards abstract, immaterial information. (Seriously, I ought to just get rid of those.)

Code: Select all

+struct State {
+    inputs: Vec<f64>,
+    parameters: Vec<f64>,
 }
This represents storage for two distinct lists of floating point numbers, basically. If I wanted to do something really over the top in terms of conciseness, I could have a fixed number of parameters, followed by a possibly variable number of inputs. But that sounds tricky. Right now it's hardcoded to double precision floating point because eh, why not, it's a number, but maybe I'll want to switch to single precision later, or maybe even a fixed-point representation.

Code: Select all

+struct Cache<T: NodeData> {
+    sums: HashMap<Container<Sum<T>>, f64>,
+    products: HashMap<Container<Product<T>>, f64>,
+    powers: HashMap<Container<Power<T>>, f64>,
+    primitives: HashMap<Container<Primitive<T>>, f64>,
+}
+
+impl<T: NodeData> Cache<T> {
+    fn new() -> Cache<T> {
+        Cache {
+            sums: HashMap::new(),
+            products: HashMap::new(),
+            powers: HashMap::new(),
+            primitives: HashMap::new(),
+        }
+    }
+}
Technically, it's not a cache but an intermediate representation, but that's long. This is, one way or another, the data structure that's going to accelerate calculations until I figure out how to convert the generic DAGs into a bunch of tensor operations.

Code: Select all

+impl<T: NodeData<Input = usize, Parameter = usize>> Cache<T>
+    where T::Constant: Into<f64>,
+          T::Coefficient: Into<f64>,
+          T::Exponent: Into<i32>
+{
Constraining data types is weird. The equality constraints are there because those values are simply indices into the fields on the state struct. Constant and Coefficient need to convert into something that supports tanh, power, and multiplication by a scaling factor, followed by addition. As I said, in this case that's a double precision float. Finally, Exponent only needs to convert to a smallish integer because I don't expect it to get too big (though nested sigmoid derivatives may prove my undoing; if I go deep, I might need to make some of the changes I mentioned earlier), and using an integer exponent is faster.
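
For concreteness, here's a hypothetical set of type choices that would satisfy all of those bounds, assuming the NodeData trait exactly as shown in the earlier diffs (the real test code may pick different types):

Code: Select all

struct SimpleData;

impl NodeData for SimpleData {
    type Constant = i32;    // Signed, Ord, Hash, and converts Into<f64>
    type Coefficient = i32; // same type, so Constant <-> Coefficient conversion is trivial
    type Exponent = i32;    // Integer, and converts Into<i32>
    type Input = usize;     // an index into State::inputs
    type Real = i32;        // placeholder; nothing exercises this yet
    type Parameter = usize; // an index into State::parameters
}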

Code: Select all

+                &Primitive::Real(_) => unimplemented!(),
I didn't bother to define anything, or indeed constrain this type, because I haven't actually tried to write code that would use this yet. My potential use cases for this variant are stuff like discrete Fourier transforms. Which I'm not going to mess with just yet.

This was kind of a rough week, what with jumping into futures, which is an exquisitely tuned module... for a different use case. I decided to focus for now on getting something that works, and I'll devote next week to proving it works. There are several steps to that: write code to produce a custom leaky rectifier I put together out of the available nodes. Way leakier than normal leaky rectifiers, but totally smooth. Then, put together a simple network for learning XOR. Then train it.

So, where I stand now: I'm currently messing with code that works well enough. I'll try to get a simple toy network running next week. That's all for now. See you around!

Re: Let's Code: Neural Nets (in Rust?)

Postby mwchase » Wed Apr 12, 2017 8:01 pm

Coming this Sunday: The final entry for now.

EDIT: Actually, you know what?

Re: Let's Code: Neural Nets (in Rust?)

Postby mwchase » Thu Apr 13, 2017 5:02 pm

Week 7: Ergonomics! (Until It Isn't...)

Until now, I hadn't really used the features I've been implementing, not truly.

Code: Select all

+++ b/rust/adagio/src/lib.rs
@@ -1,9 +1,11 @@
 extern crate num;
+extern crate num_rational;

 use std::cmp::Ordering;
 use std::collections::{HashMap, HashSet};
 use std::hash::Hash;

+mod builder;
 mod eval;
 mod node_conversions;
 mod node_defs;
@@ -344,6 +346,11 @@
         powers

     }
+
+    pub fn tanh<S: Node<T>>(&mut self, sum: Container<S>) -> Container<Primitive<T>> {
+        let sum = sum.as_sum(self);
+        self.make_sigmoid(false, sum)
+    }
 }

+++ b/rust/adagio/src/builder.rs
@@ -0,0 +1,21 @@
+use num_rational::Rational;
+
+use {Environment, Node};
+use node_defs::{Container, NodeData, Sum};
+
+impl<T:NodeData> Environment<T> {
+    fn swlu<I: Node<T>>(&mut self, input: Container<I>) -> Container<Sum<T>> where T::Constant: From<Rational> {
+        let leakiness = self.number(Rational::new(1, 10));
+        let one_half = self.number(Rational::new(1, 2));
+        let one = self.number(Rational::new(1, 1));
+        let minus_one = self.number(Rational::new(-1, 1));
+        let tanh = self.tanh(input.clone());
+        let minus_tanh = self.multiply(minus_one, tanh.clone());
+        let one_plus = self.add(one.clone(), tanh);
+        let one_minus = self.add(one, minus_tanh);
+        let scaled = self.multiply(leakiness, one_minus);
+        let sum = self.add(one_plus, scaled);
+        let half_input = self.multiply(one_half, input);
+        self.multiply(half_input, sum)
+    }
+}
Yeah. Wow. Not a good look.

What I need is a way to express simple arithmetic operations without relying on obtaining a canonical form at every step. The result will still need a canonical form, but that can happen at the end of the function, more or less. My basic plan here is to put together something that functions as an abstract syntax tree, and then gets converted to a canonicalized DAG "in one step".

I'm going to try to put together an enum to unify all the different things I'd like out of an arithmetic AST here, and then put together some impls to use a bunch of nice shorthand for math purposes. Then, there just needs to be a method on the Environment that converts the AST to a DAG in that Environment.
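
Roughly what I have in mind, as a hypothetical sketch (names and details to be determined): a throwaway AST with operator overloading, which an Environment method would canonicalize into the DAG in one pass at the end.

Code: Select all

use std::ops::{Add, Mul};

#[derive(Clone)]
enum Ast {
    Number(i64),
    Parameter(usize),
    Sum(Box<Ast>, Box<Ast>),
    Product(Box<Ast>, Box<Ast>),
}

impl Add for Ast {
    type Output = Ast;
    fn add(self, other: Ast) -> Ast {
        Ast::Sum(Box::new(self), Box::new(other))
    }
}

impl Mul for Ast {
    type Output = Ast;
    fn mul(self, other: Ast) -> Ast {
        Ast::Product(Box::new(self), Box::new(other))
    }
}

fn example() -> Ast {
    let x = Ast::Parameter(0);
    // Reads like ordinary arithmetic; nothing touches the Environment until the
    // finished expression is canonicalized into the DAG at the end.
    Ast::Number(3) * x.clone() + Ast::Number(1) * x
}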

You may also be wondering what this function is, you know, for. In essence, tanh is not a great source of nonlinearity. It gives you gradients, but when you try to use them for gradient descent, they don't actually, well, descend. The preferred nonlinearities are nearly linear for positive numbers, and linear or sublinear for negative numbers, and with lower absolute values. There are a variety of schemes in use. max(x, 0). Approximate that with some exponentials and logarithms. Make the left half something like x/100. Make the left half decay exponentially. All nice ideas, and all something that would mess with the DAG canonicalization something fierce. So, I looked at what I have. By scaling tanh and adding a constant, I can make it asymptotically approach any pair of numbers. If I then multiply it by its input, I get a function that asymptotically approaches two different line segments, and is 0 at 0, which sounds helpful. I was working under the constraint that the derivative must be positive everywhere. If the derivative has roots, then the derivative produced by the chain rule will have roots, and the system runs the risk of getting trapped by those roots. The tightest leakiness I could manage given that constraint was just a little closer to 0 than -0.1. That is close enough to crossing the axis that I don't think getting any more precise would really help. So that's where the "SWLU" (sigmoid weighted linear unit) comes from. It's probably kind of junky, but it's smooth and (relatively) simple.
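
Purely as a sanity check on the shape, here's the same formula in plain f64 (a sketch, not Adagio code; the 1/10 leakiness is taken from the builder diff above). It hugs y = x for large positive inputs, y = x/10 for large negative ones, and passes through 0.

Code: Select all

fn swlu(x: f64) -> f64 {
    let t = x.tanh();
    0.5 * x * ((1.0 + t) + 0.1 * (1.0 - t))
}

fn main() {
    for x in [-10.0, -1.0, 0.0, 1.0, 10.0] {
        println!("swlu({:5.1}) = {:.4}", x, swlu(x));
    }
}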

Okay, time to start setting up networks.

Code: Select all

error[E0277]: the trait bound `num_rational::Ratio<isize>: std::convert::From<i32>` is not satisfied
   --> src/lib.rs:569:14
    |
569 |         impl NodeData for MyNodeData {
    |              ^^^^^^^^ the trait `std::convert::From<i32>` is not implemented for `num_rational::Ratio<isize>`
    |
    = help: the following implementations were found:
              <num_rational::Ratio<T> as std::convert::From<T>>
              <num_rational::Ratio<T> as std::convert::From<(T, T)>>
    = note: required by `node_defs::NodeData`

error: aborting due to previous error

error: Could not compile `adagio`.
Ew. Okay, looks like I need to give the library some consistent opinions about data types, then look into making it generic.

Ugh, I may have to walk back some decisions if I want this to work properly. Let's look at the constraints. I need a numeric type that implements Eq, and for the purposes of the SWLU, I'd like it to specifically be some form of rational type. At the same time, I'd like to be able to use numeric literals here and there. For that, the literals don't need to be one globally specific type; they just need to be some type the constants can be built from. I seem to recall having trouble defining constraints that tie together different types in the same trait, but maybe I was just doing something wrong.

That didn't work, so I decided to just hardcode it to a single type of literal. And then I realized that SWLU should be a function on exprs.

I'm not sure how to write up what's going on now. I'm discovering random bugs in code that I haven't really tested yet.

Let's have a look at the error output I'm getting.

Code: Select all

  thread 'tests::construct_xor' panicked at 'index out of bounds: the len is 1 but the index is 1', /Users/rustbuild/src/rust-buildbot/slave/stable-dist-rustc-mac/build/src/libcollections/vec.rs:1392
stack backtrace:
   1:        0x106fb3e9c - std::sys::imp::backtrace::tracing::imp::write::h21ca2762819c7ae8
   2:        0x106fb5e8e - std::panicking::default_hook::{{closure}}::h38f99a37d00bb19b
   3:        0x106fb5a33 - std::panicking::default_hook::ha2186ee24b50729c
   4:        0x106fb6347 - std::panicking::rust_panic_with_hook::h979db19ee91d2a53
   5:        0x106fb61f4 - std::panicking::begin_panic::h6a69f5b54391c64d
   6:        0x106fb6112 - std::panicking::begin_panic_fmt::h9de2343580b3c2c4
   7:        0x106fb6077 - rust_begin_unwind
   8:        0x106fdcf90 - core::panicking::panic_fmt::haa2997386017a96f
   9:        0x106fdcf08 - core::panicking::panic_bounds_check::hc4fb0050ff5f0dcf
  10:        0x106f60a55 - <collections::vec::Vec<T> as core::ops::Index<usize>>::index::h55c0330840da5da3
  11:        0x106f730fd - <adagio::Environment<T>>::_combine_terms::hd8cf49a91fd56afe
  12:        0x106f722d7 - <adagio::Environment<T>>::multiply::hda61f96e243d5da3
  13:        0x106f74211 - <adagio::Environment<T>>::eval::ha0fea4e2212a494f
  14:        0x106f73b40 - <adagio::Environment<T>>::eval::ha0fea4e2212a494f
  15:        0x106f73b40 - <adagio::Environment<T>>::eval::ha0fea4e2212a494f
  16:        0x106f73ca2 - <adagio::Environment<T>>::eval::ha0fea4e2212a494f
  17:        0x106f73ca2 - <adagio::Environment<T>>::eval::ha0fea4e2212a494f
  18:        0x106f73ca2 - <adagio::Environment<T>>::eval::ha0fea4e2212a494f
  19:        0x106f73b40 - <adagio::Environment<T>>::eval::ha0fea4e2212a494f
  20:        0x106f73b40 - <adagio::Environment<T>>::eval::ha0fea4e2212a494f
  21:        0x106f7612e - adagio::tests::construct_xor::hf644501a83fcd418
  22:        0x106f84a3e - <F as test::FnBox<T>>::call_box::h911d6e7479d078b1
  23:        0x106fb722a - __rust_maybe_catch_panic
  24:        0x106f78f18 - std::panicking::try::do_call::h3431891132b0ee14
  25:        0x106fb722a - __rust_maybe_catch_panic
  26:        0x106f7fc7e - <F as alloc::boxed::FnBox<A>>::call_box::h2e47f51b54ac05cf
  27:        0x106fb5564 - std::sys::imp::thread::Thread::new::thread_start::hca57ad22492f366f
  28:     0x7fff8cac49ae - _pthread_body
  29:     0x7fff8cac48fa - _pthread_start
Huh. Huh.

So, I appear to have hit an issue in Rust. Not sure how I did this, but I managed to create some kind of mutant vector, or mutant vector situation, that crashes the runtime. This is an interesting thing to investigate, but I don't want to get into it now.

I did have fun doing these, so I'll be back soon with a project that's hopefully more tractable.

First, here's what I did find looking at the code.

Final diff: https://pastebin.com/EEN9AW3k

Something went wrong in the _combine_terms method, but it happened in some of the most benign-looking code, under the simplest conditions. It happens at the beginning of the first iteration of the while loop, when it attempts to index into one of the right-side vectors. I confirmed it's the vector itself by rearranging the index operations. Somehow, in one of these multiplications, the "last" vector belonging to the right-side argument manages to treat an index argument of 0 as a 1.

I don't want to look into this now, but this does provide a little object lesson: get code coverage early. If I want to chase this, I need to find a minimal example, which will be something, to be sure.

In any case, I want to take a break from this. So, I'm tossing around ideas for what to start in on. Here's what I've got so far:
  • Try to reimplement, in Python, a type to represent algebraic numbers (implementations exist for Python in the SymPy library, and for Julia in a library I read a blog post about), port it to Rust (learning to use code coverage tools along the way, because eesh, don't want to get ambushed), and try mashing it up with someone else's Rust project to roll a raytracer, in order to get GLORIOUS EXACT PRECISION RAYTRACING, which will probably, um, be indistinguishable from decades-old rendering tech, and, um, slower. (Math with types for algebraic numbers is WEIRD. The square root is mostly trivial, and there are papers on how to do addition and multiplication.)
  • Learn an existing neural network library, like someone who isn't a huge goof. Probably TensorFlow.
  • Get away from weird numeric stuff entirely. Look at indie tabletop games, and try to transfer some of the rules to an electronic form. This would probably start in Lua, and I'd try to port the result into TiddlyWiki.
If anyone has any input, I'll let it rattle around, maybe even act on it in some abstract way.

So, where I stand now: I appear to have hit a bug in Rust, but I don't really want to chase it right now. Going to switch gears. That's all for now. See you around!

Re: Let's Code: Nobilis 3e Character Creation

Postby mwchase » Sun Apr 16, 2017 6:35 pm

Week 7: A Fork In The Road OR Introduction To Yak Shaving

Hey there. I'm doing something else for a while, but before I can do something else, I need to do something else.

...

I started this wrong. Here's the deal: I'd like to mess around with prototyping some kind of helper application for RPG character creation, specifically the Lifepath system in Nobilis 3rd Edition. To do this, I want to use Lua, because it's a good choice for coming up with lightweight formats that happen to be executable. If it doesn't work, I'll use Python, but I wanted to give it a shot. Now, I decided to grab some libraries beforehand. I mean, I need to do stuff like internationalization (partly an excuse to separate game text from code, because heck if I understand copyright), and I also grabbed libraries for testing, code coverage, general utilities, and it would be nice to have logging, right?

Um.

Well.

The most prominent logging library for Lua, LuaLogging, is functionally abandoned*. I'm not totally sure how this happened; I guess Lua mostly gets used as an embedded interpreter, so logging gets handled by the host application? I hope that's why there isn't some kind of action towards getting out a version that incorporates the pull requests, and fixes the weird bug in its tostring function where it alphabetizes integer indices.

So, my options are basically to complain, or to actually do something about this.

... I really, really would rather just complain, but let's get this over with.

So, I forked the project on GitHub, renamed it, and created a new git branch to hold development, then set it to the default. The result lives at https://github.com/mwchase/log4l, and as of this writing, is barely altered from the base repository (which has not been committed to in two and a half years, and fails to support new versions of Lua properly).

Then, to do proper development, I make a local clone. I mention this step only because I'm a filthy partisan and use Mercurial wherever possible. The main takeaways I got were:
  • git branches are equivalent to mercurial bookmarks, and mercurial branches are equivalent to not doing what you'd want, at all.
  • The plugin I'm using doesn't seem to be able to handle pushing new bookmarks to GitHub, so I had to create my dev branch through the web UI.
I want to get some pull requests from the original project in, but first I should set up continuous integration. The tutorial that's at the top of the search results is for Travis, so I'll use that. The initial step is to get the original tests, against Lua 5.1 and 5.2, running. So, once I activate the repo in Travis, I just adapt a travis.yml file from the internet and go, hopefully. Looks like this project uses a little shell script for its tests, so I'll hook into that for now.

After a slight hiccup, the tests run, and, um...

Code: Select all

$ tests/run_tests.sh
lua: cannot open test.lua: No such file or directory
The command "tests/run_tests.sh" exited with 0.
Done. Your build exited with 0.
...
$ tests/run_tests.sh
lua: cannot open test.lua: No such file or directory
The command "tests/run_tests.sh" exited with 0.
Done. Your build exited with 0.
We have discovered a bold redefinition of "success".

And now the tests actually run, but they choke on something, error out, and still give a return code of zero. This test script is just relentlessly optimistic. ... Weird. hererocks, the Python package I'm using to install my Lua test environment, doesn't appear to support altering the LUA_PATH variable, which means the test setup won't work on my machine at all, and on Travis it isn't looking in all of the places it should be. Then I found a command that does the job after all. Several builds later, I've got all of the tests passing and, thanks to a StackOverflow answer, actually reporting their success properly.

There are a few directions I could take here. One would be to try to set up coverage with these tests. Another would be to enable Lua 5.3 and see how it works. I'm thinking coverage. Got coverage up and running, and oh dear. Some of these tests... don't.

I've got to fix the tests to improve the coverage to prepare to take the pull requests that'll let me use this module on the version of Lua I want. In the house that Jack built.

You can follow along with me throughout the week by watching commits at https://github.com/mwchase/log4l. This is one of the things that continuous integration with an open-source project forces you to do: make all of your weird false starts and stuff public, because it's just so much easier to get the test machines to run it.

And I think I've got enough improvements made to this now that I'd like to do one more interface pass, then look into updating the documentation and releasing this.

Status as of Saturday night: coverage is improved, I'm testing on 5.3, I've fixed a bunch of bugs and acted on some pull requests (though half the pull requests against the base repo are, um, not correct?). I think it's about time to try getting an alpha build onto LuaRocks.

Status as of Sunday afternoon: the initial version is available on LuaRocks, as log4l. It's got a bunch of bugs fixed, maybe some new ones added. It desperately needs some feedback, so I'll be trying to figure out how to get eyes on it.

So, where I stand now: I took over a project, fixed a bunch of bugs, published an updated version under a new name, and now I'm done with it for now, though the docs do need a bit more touching up. Now that the obvious really bad bugs are finished off, I'll switch gears to the project proper. That's all for now. See you around!

*LuaLogging. Is it maintained? No. No it is not. Is anyone maintaining LuaLogging? That is not a thing that is happening. Who is maintaining LuaLogging? Nobody. Nobody is maintaining LuaLogging. Is LuaLogging a library that is under active development? I do not want to say no, but honesty compels me. Has LuaLogging been updated to correctly support Lua 5.3? No, that is not a thing that has happened.

Re: Let's Code: Nobilis 3e Character Creation

Postby mwchase » Sun Apr 23, 2017 4:54 pm

Week 8: Huh.

Well, I thought I was going to find a clear story for GUIs in Lua, but, uh, not so much.

I was tempted to try to just pretend I have a GUI, write an interface to the fake GUI, and mock that interface for tests.

This is a very yak-shavy project. I ended up falling down a hole of "let's make a bunch of pure-Lua packages". I pondered converting luvit's readline implementation into a form that I grok. Like, I guess if possible, I want something that takes a stdin/stdout pair, and operates on that.

Update from several days later: I started converting readline, and ended up getting a bit burnt out because most of the changes I was making to adapt it weren't interesting. So, I started mocking up an interface using a less-featureful version of Python's cmd module. That was all pretty neat, but I had this utopian idea that it would be internationalizable, and, um...

Let's play a game of, let's see if you're smarter than I was here. So, I'm designing a terminal app with word-based commands, right. And I want to be able to swap out the interface language.

...

If it took you less than ten minutes to be totally disgusted with the scope of the technical issues there, you're more on the ball than I was!
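For anyone who hasn't used it, Python's cmd module is the shape I was imitating, and it also shows exactly where the problem lives: the command word is the method name, so swapping the interface language means renaming methods or bolting a translation layer onto the dispatch. A toy example (the LifepathShell name and its commands are made up for illustration):

Code: Select all

import cmd


class LifepathShell(cmd.Cmd):
    """Word-based command interface in the style of the stdlib cmd module."""

    prompt = '> '

    def do_key(self, arg):
        """key NAME -- pick a Key for the character."""
        print('Chose Key:', arg)

    def do_quit(self, arg):
        """quit -- leave the shell."""
        return True


if __name__ == '__main__':
    LifepathShell().cmdloop()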

My options were to either drop internationalization, or the terminal interface setup. Since my port of cmd isn't even quite half-baked yet, I went and looked into GUI libraries. I'd done this before, and come away discouraged, because the GUI-running-on-OSX-and-the-latest-version-of-the-language story is not great in Lua. But I had another look, and I'm glad I did, because the situation for Python seems to have gotten notably better than the last time I checked. Python's always had Tk, and I've poked at it occasionally, but what's different now is...

The Phoenix Project. I'm reflexively skeptical of software rewrite projects with grandiose names like that, but it looks like the effort to update wxPython has paid off! As of just a few days ago, they've started putting up pip-installable builds.

So, if I move from Lua to Python, I'll have options. Time to do mockups~...

But first, addressing an obvious point: yes, if I switch to Python, it means I'm not going to use the logging library I spent last week on. But I choose to look at it this way: now I'll have a properly-working logging library next time I go back to Lua. This'll pay off at some point.

Several mockups later, and my tender hands, softened by months of pure intellectual pursuit and the velvety smoothness of cyberspace, are callused anew by the rough labor of holding a fucking pencil oh my god. This is a thing that actually happened, and I'm almost too shocked to feel shame?

Anyway, I've got the mockups, and I'm now striving to translate them to code. This is a little embarrassing, compared to posting Rust, because I know Python well enough to be put off by the code I'm writing.

Here's a screenshot:
[Screenshot: marked-up capture of the wxPython prototype window]
Here's the code: https://pastebin.com/LnnjNzm1

(Technically, I made and marked up the screenshot with an earlier version of the code, but I didn't touch the display logic.)

Code: Select all

KeyType = collections.namedtuple('KeyType', 'strength weakness')
Everything relating to this is new. I need to actually represent the lifepath data at some point, so here's a bit of the lifepath data.

Code: Select all

POWER_TEMPLATES = ()
POWER_TEMPLATES += ...,
I have a delimiter-detecting plugin in my text editor. It gets confused too easily, and I structured this code thus to appease it.

Code: Select all

    # Not sure how best to represent this.
    "(.*'s )?(Rage|Hatred|Fury)",
Hooray for special cases, I guess.
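To spell out what that special case accepts (assuming the pattern is meant to match a whole name; this little check is mine, not from the pastebin):

Code: Select all

import re

# The special-case pattern from the lifepath data above.
PATTERN = re.compile("(.*'s )?(Rage|Hatred|Fury)")

# Matches the bare emotion, or a possessive form of it:
assert PATTERN.fullmatch('Rage')
assert PATTERN.fullmatch("The Ocean's Fury")
assert not PATTERN.fullmatch('Serenity')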

Code: Select all

DECEIVER_TEMPLATES = ()
I want to make the main lifepath code first, but no reason I can't accommodate the expansion-ish content. These are why there are all those "None"s in the Power block. Actually, I might lay things out a little differently...

Code: Select all

DECEIVER_TEMPLATES += KeyTemplate(
    DECEIVER_KEY, 'Phoenix Posy', 'Self-Creating Key, the',
    'I HAVE MADE MYSELF...', None, None, 'how you fit into the world',
    ("I'm an enemy of the world", 'from nothing')),
I tweaked this last line a bit, compared to what's in the book. Not sure what the elegant solution is, because it seems like it should match some of the other starting bullets, but it's also supposed to follow on from the name of the Key...

Code: Select all

class MultiSelect(wx.Panel):
I am pretty proud (for now) of everything in this class that doesn't directly interact with wx.

Code: Select all

for child in self.Children:
Now wait a sec. What happens if I add non-button Windows below this object? That'd be pretty weird, right?
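If that ever happens, one possible guard (hypothetical, not in the pastebin, and I'm assuming toggle buttons here) would be a helper that filters on widget type instead of looping over everything:

Code: Select all

import wx


class MultiSelect(wx.Panel):
    # ... layout code as in the pastebin ...

    def iter_buttons(self):
        """Yield only the toggle buttons, so non-button children added
        later don't get caught up in the loop."""
        for child in self.Children:
            if isinstance(child, wx.ToggleButton):
                yield child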

Code: Select all

def make_power():
Currently I just have most of the logic shoved into this function for prototyping purposes. Most of the code doesn't particularly do anything, though the limiter code on the Keys does work.

I think the biggest issue I have with this code currently comes with the conflation (implicitly endorsed by the wxPython tutorial, the early parts of it at least) of UI elements with business logic, through inheritance. This means that business logic is kind of awkwardly cordoned into specific UI elements, and the wx framework kind of owns everything. Like, I guess I can test this automatically, but it would involve some weird dummying. I think super-ideally, I'd be managing all control state from my code, and just relying on wx for layout/rendering/events. (Also, with inheritance, the only way I'm confident I won't step on an attribute is if I deliberately use a different naming convention, which I guess I was anyway.)
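To make that concrete, here's roughly the split I have in mind; it's a sketch with made-up names (SelectionState, SelectionPanel), not the current code. The plain class owns the rules, and the wx panel just lays things out and forwards events, so the interesting logic can be unit-tested without importing wx at all:

Code: Select all

import wx


class SelectionState:
    """Plain-Python model: owns the selection rules, knows nothing about wx."""

    def __init__(self, options, limit):
        self.options = options
        self.limit = limit
        self.chosen = set()

    def toggle(self, option):
        """Flip an option, respecting the limit; return its new state."""
        if option in self.chosen:
            self.chosen.discard(option)
        elif len(self.chosen) < self.limit:
            self.chosen.add(option)
        return option in self.chosen


class SelectionPanel(wx.Panel):
    """Thin wx layer: layout, rendering, and event forwarding only."""

    def __init__(self, parent, state):
        super().__init__(parent)
        self.state = state
        sizer = wx.BoxSizer(wx.VERTICAL)
        for option in state.options:
            button = wx.ToggleButton(self, label=option)
            button.Bind(wx.EVT_TOGGLEBUTTON, self.on_toggle)
            sizer.Add(button, 0, wx.EXPAND)
        self.SetSizer(sizer)

    def on_toggle(self, event):
        button = event.GetEventObject()
        button.SetValue(self.state.toggle(button.GetLabel()))

The limiter behavior on the Keys would live in something like SelectionState.toggle, where a plain unit test can poke at it directly.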

So, where I stand now: I started prototyping a GUI in Glorious Future Ultra wxPython, and it works, so long as you don't care about stuff like "aesthetics" or "being able to see a legible amount of the UI on startup", which are both on me, I think. I've got some ideas for improvement, and I'm going to try to enact them over the following week. That's all for now. See you around!

Now that I look at it, I can't get over how macOS window screenshots have a drop shadow.

Re: Let's Code: Nobilis 3e Character Creation

Postby mwchase » Tue Apr 25, 2017 11:22 pm

This week is going to be weird. Might have a post Sunday, might not. It's all about priorities, which are really messed up right now.
