kmarekspartz

Expanding our capacity to respond to unforeseen changes

Internals of Props


In my previous post about Props, I said I would elaborate on its equivalent to Gen a from Haskell's QuickCheck.

In Haskell, consider the following implementation of a tree:

data Tree a = Tree { value :: a
                   , children :: [Tree a]
                   } deriving (Eq, Show)

Let's make it an instance of Arbitrary[^1]:

instance (Arbitrary a) => Arbitrary (Tree a) where
    arbitrary = do value <- arbitrary
                   children <- arbitrary
                   return Tree { value=value
                               , children=children
                               }

[^1]: This won't terminate as it would create an infinite Tree structure! But that's not important here. See StackOverflow if you want a version that would terminate.

Since Haskell can determine types at compile time, we don't need to put in any type annotations. In particular, we don't need to tell arbitrary what type we want it to return when making the arbitrary value and children.

In Python, the following class, which implements ArbitraryInterface, would be approximately[^2] equivalent to the above Haskell type[^3]:

class Tree(ArbitraryInterface):
    value = None
    children = []

    def __init__(self, value, children):
        self.value = value
        self.children = children

    @classmethod
    def arbitrary(cls):
        return cls(
            arbitrary(int),           # Value
            arbitrary(list_of(Tree))  # Children
        )

[^2]: The Python Tree class is just for int. This would be equivalent to the Tree Int type in Haskell. If we had generics in Python... something for a later post.

[^3]: This also won't terminate. I'll leave that as an exercise for the reader.

Ignoring some details[^4], we can think of Haskell's Gen a as representing something which can be turned into an a. How can we make something that can be turned into an a? We can use a thunk! A thunk is an object which represents a paused computation. Forcing a thunk causes the computation to occur. Since Haskell is lazy, everything is a thunk by default. In Python, iterators can be thought of as lazy, but they usually represent collections of objects rather than just a single object, so they would not be the best fit here.

[^4]: For example, Haskell's QuickCheck passes in a nonce to Gen a to make sure the generated a is unique. There's also the notion of shrink which Props does nothing with. Yet.
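As a quick illustration of the contrast (lazy_squares is a made-up generator, not anything from Props), an iterator is lazy, but it models a stream of values rather than a single deferred value:

def lazy_squares(limit):
    for n in range(limit):
        yield n * n  # nothing runs until we iterate

squares = lazy_squares(3)  # no work has been done yet
list(squares)              # [0, 1, 4] -- forced by iterating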

One simple way to implement a thunk is to make a closure which takes no arguments. For example:

def setup_a_thunk(some, variables, _for, our, computation):
    def thunked_computation():
        # ...
        # Do a bunch of work
        # using the variables passed in to setup_a_thunk.
        return a_result
    return thunked_computation

We can then call setup_a_thunk a bunch of times, but it would not actually do the work until we force it, i.e. call the thunk with no arguments:

a_thunk = setup_a_thunk(...)
another_thunk = setup_a_thunk(...)

one_result = a_thunk()  # This forces the thunk.

Since objects are closures, we can view the class definition for Tree as setting up a closure, which we force when we call arbitrary.

So, there you have it. The equivalent of Haskell QuickCheck's Gen a in Props is a closure!
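To make the analogy concrete, here is a minimal sketch of generators-as-thunks. The names gen_int and list_of_gen are made up for illustration and are not Props' actual API:

import random

def gen_int():
    # A "Gen Int": forcing it produces a fresh random int.
    return random.randint(-100, 100)

def list_of_gen(gen, max_length=5):
    # A combinator: returns a new thunk without generating anything yet.
    def thunk():
        return [gen() for _ in range(random.randint(0, max_length))]
    return thunk

gen_ints = list_of_gen(gen_int)  # sets up the thunk
gen_ints()                       # forces it: a fresh random list on each call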

An analogy between type systems and databases


There are tradeoffs with almost any technology choice. Often, a choice's strength is also its weakness.

When choosing a programming language, should it use static or dynamic typing? There are arguments to be made on each side. For static, we get a reasonable amount of assurance that certain classes of bugs cannot occur. However, with dynamic, we can be more flexible about our structure and design. Many people have argued for the inverse: languages with static type systems are inflexible and languages with dynamic type systems are unsafe.

In databases, one ongoing trend is the popularity of NoSQL systems, particularly key-value stores. One argument for key-value stores is their flexibility. However, just as before, we lose the guarantees that a static type system (i.e. SQL) provides. This is just a tradeoff that has to be considered when choosing a database.

The main argument for flexibility in both dynamic languages and key-value stores is how quickly a prototype can be built, since there is little need to account for changes in schema or typing. However, when these systems need to be made robust later on, developers end up writing code to account for the schema or typing in their system. This code would be redundant and automatic in a static type system.
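As a sketch of what that redundant code tends to look like (the record shape here is hypothetical), every consumer of a schemaless store ends up re-checking the shape of its data by hand:

def load_user(record):
    # Checks that a schema or static type system would have made automatic.
    if not isinstance(record, dict):
        raise TypeError("expected a dict")
    if not isinstance(record.get("name"), str):
        raise TypeError("expected 'name' to be a string")
    if not isinstance(record.get("age"), int):
        raise TypeError("expected 'age' to be an integer")
    return record["name"], record["age"]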

It is perhaps even worse when dynamism is forced into static systems. In C, we could put all of our data into an untagged union, which would let us pass anything anywhere. We lose the safety of a (weak) static type system for a small gain in flexibility. In SQL databases, we can denormalize, put everything in one table, and use nullable strings for every column type. But again, we lose safety for flexibility.[^1]

[^1]: Interestingly, both forcing static types into a dynamic system and forcing dynamism into a static system decrease maintainability.

Expanding on the link between databases and category theory


It turns out that people more qualified than I are working on the link between databases and category theory.

While working through my backlog of unread posts from Edward Z. Yang, I came across this interesting post from 2010 (I'm pretty far behind!). Some of the links from the post are dead, so I'll remedy that here:

In 2010, David Spivak gave a tech talk entitled Databases are Categories at Galois. The slides are more digestible than the paper.

Types and Programming Languages book


Just because you've implemented something doesn't mean you understand it.

– Brian Cantwell Smith (p. 88, Types and Programming Languages)

I'd like to learn more about type systems, so I'm working my way through Benjamin C. Pierce's Types and Programming Languages. I also would like to learn more about Haskell, so I'm implementing languages from the book as I encounter them. If you'd like to follow along, the implementations I'm working on are on GitHub.

Before this I hadn't done any significant multi-file Haskell work, so I'm particularly looking for feedback on the project's structure and whether it is idiomatic.

Releasing a property-based testing framework for Python


Props (PyPI, GitHub) provides property-based testing for Python à la QuickCheck.

There are other QuickCheck-like libraries for Python.

However, I'm not sure how to add instances of Arbitrary (in the Haskell sense) in any of them! That's where Props comes in.

In Haskell's QuickCheck, Arbitrary is a typeclass:

class Arbitrary a where
    arbitrary :: Gen a

Typeclasses are like interfaces, but for types instead of classes. The way to read this definition is “Any type a which implements the Arbitrary interface must provide a method arbitrary which returns a Gen a.” Think of Gen a as a container which can be turned into random instances of a.[^1] To create a new instance of Arbitrary[^2]:

instance Arbitrary Bool where
    arbitrary = choose (False, True)

[^1]: I'll discuss the equivalent of Gen a being used in Props (closures! thunks!) in a later post.

[^2]: This specific instance is already provided for you, but it makes for a good example.

In Python, since we don't have interfaces, we define a class which provides arbitrary as a classmethod that raises NotImplementedError:

class ArbitraryInterface(object):
    @classmethod
    def arbitrary(cls):
        raise NotImplementedError

To implement an instance:

class bool(ArbitraryInterface):
    @classmethod
    def arbitrary(cls):
        return random.choice([False, True])

However, in this case, bool is already defined, and we do not want to shadow that definition. Classmethods are just functions which dispatch on the type given, so we can define arbitrary somewhat like this[^3]:

arbitrary_dict = {
    bool: lambda: random.choice([False, True]),
    ...
}

def arbitrary(cls):
    if issubclass(cls, ArbitraryInterface):
        return cls.arbitrary()
    # Fall back to the table of generators for already defined types.
    return arbitrary_dict[cls]()

[^3]: None doesn't work in this case, so I handle it separately in Props.

This way we can provide implementations of arbitrary for already defined types (either in the standard library or in external libraries) in arbitrary_dict, and also handle instances of ArbitraryInterface for our own classes.
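As a hedged usage sketch (assuming int also has an entry in arbitrary_dict, and using a made-up Point class), both paths go through the same arbitrary function:

class Point(ArbitraryInterface):
    def __init__(self, x, y):
        self.x = x
        self.y = y

    @classmethod
    def arbitrary(cls):
        # Build an arbitrary Point out of two arbitrary ints.
        return cls(arbitrary(int), arbitrary(int))

arbitrary(bool)   # looked up in arbitrary_dict
arbitrary(Point)  # dispatches to Point.arbitrary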

Object and function duality


Let's pretend we are stuck in an alternate universe where first-class functions[^1] don't exist in Python. How could we add them back in using classes, without modifying the run-time?

Here's a square function:

class SquareFunc:
    def apply(self, x):
        return x * x

square = SquareFunc()

Okay, that wasn't very exciting. What about higher-order functions?

class MapFunc:
    def apply(self, f, values):
        new_list = []
        for element in values:
            # f is a function object in this universe, so we call its apply method.
            new_list.append(f.apply(element))
        return new_list

map = MapFunc()

map.apply(square, [1, 2, 3])  # [1, 4, 9]

And that's it!

[^1]: Or list comprehensions, as that would be cheating for this example. Also, I'm pretending __call__ doesn't exist. Because.

How I use GNU Style and Diction, and aspell


For software, we have unit tests, linters, and type checkers. What do we have for natural language?

One tool that I use on occasion is diction. I can never interpret its default output, so I tell it to make suggestions with -s and to warn me of beginner mistakes with -b:

diction -bs posts/2014-02-11-linting-natural-language.md

With most word processors and websites, we have spell checkers. For plain text, we can use aspell:

aspell -c posts/2014-02-11-linting-natural-language.md

Are there other tools you use for linting natural language?

Why I prefer top-down to bottom-up programming


Think about the ideal way to write a web app. Write the code to make it happen.

Aaron Swartz (from web.py)

If we write the code that we think we will need, before we actually need it, we are likely writing code we don't need. Instead, we can take a lazy-evaluation approach and implement only the code we need as we need it! This prevents the creation of spaghetti code. Not only does it keep our implementation simple, it also lets us program in the problem domain, which makes it easier to find problems later.

I've been using readme driven development to implement my port of QuickCheck to Python. It seems to be working well so far. All I have left is to add some tests and name it! Any ideas for the name? There are already quite a few other libraries that have taken the good names.

Metaprogrammed named classes in Python


I'm finishing up my QuickCheck implementation for Python. I'm writing generator combinators to make it easier to build new generators. Since the generators I'm using are just classes with a classmethod arbitrary, the combinators dynamically create classes:

def a_combinator(generator1, generator2):
    class a_combinator_class(...):
        @classmethod
        def arbitrary(cls):
            ...
    return a_combinator_class

However, this leaves us with an issue: I would like to be able to display a name for each generator. For generators which aren't constructed with combinators, we can just use generator.__name__, but this doesn't work for generator combinators. With the above combinator, it would show a_combinator_class no matter what generators we pass in. It would be better if it showed a_combinator(generator1, generator2). We can do this by setting __name__ on a_combinator_class:

def a_combinator(generator1, generator2):
    class a_combinator_class(...):
        ...
    a_combinator_class.__name__ = ''.join([
        'a_combinator(',
        ', '.join([
            generator1.__name__,
            generator2.__name__
        ]),
        ')'
    ])
    return a_combinator_class
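For example (pair_of and Int are made-up names following the pattern above, not part of the library), the generated class now reports a useful name:

import random

class Int(object):
    @classmethod
    def arbitrary(cls):
        return random.randint(-100, 100)

def pair_of(generator1, generator2):
    class pair_of_class(object):
        @classmethod
        def arbitrary(cls):
            return (generator1.arbitrary(), generator2.arbitrary())
    pair_of_class.__name__ = 'pair_of({}, {})'.format(
        generator1.__name__,
        generator2.__name__
    )
    return pair_of_class

pair_of(Int, Int).__name__     # 'pair_of(Int, Int)'
pair_of(Int, Int).arbitrary()  # e.g. (42, -7)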

Culture over code


[...] someone who [...] has enough leadership or influence to cause other programmers to write code like they would write without them having to do it, because you don't have the hours in the day or fingers. Having that ability to spread your approach and whatever you've learned about programming, and have that go through some kind of community and produce a corpus of code that's bigger than you could do, that's as satisfying to me as being the one that stays up all night writing too much code.

Brendan Eich (from Coders at Work)
