kmarekspartz

Expanding our capacity to respond to unforeseen changes

Internals of Props


In my previous post about Props, I said I would elaborate on its equivalent to Gen a from Haskell's QuickCheck.

In Haskell, consider the following implementation of a tree:

data Tree a = Tree { value :: a
                   , children :: [Tree a]
                   } deriving (Eq, Show)

Let's make it an instance of Arbitrary[^1]:

instance (Arbitrary a) => Arbitrary (Tree a) where
    arbitrary = do value <- arbitrary
                   children <- arbitrary
                   return Tree { value=value
                               , children=children
                               }

[^1]: This won't terminate as it would create an infinite Tree structure! But that's not important here. See StackOverflow if you want a version that would terminate.

Since Haskell can determine types at compile time, we don't need to put in any type annotations. In particular, we don't need to tell arbitrary what type we want it to return when making the arbitrary value and children.

In Python, the following class, which implements ArbitraryInterface, would be approximately[^2] equivalent to the above Haskell type[^3]:

class Tree(ArbitraryInterface):
    value = None
    children = []

    def __init__(self, value, children):
        self.value = value
        self.children = children

    @classmethod
    def arbitrary(cls):
        return cls(
            arbitrary(int),           # Value
            arbitrary(list_of(Tree))  # Children
        )

[^2]: The Python Tree class is just for int. This would be equivalent to the Tree Int type in Haskell. If we had generics in Python... something for a later post.

[^3]: This also won't terminate. I'll leave that as an exercise for the reader.

Ignoring some details[^4], we can think of Haskell's Gen a as representing something which can be turned into an a. How can we make something that can be turned into an a? We can use a thunk! A thunk is an object which represents a paused computation. Forcing a thunk causes the computation to occur. Since Haskell is lazy, everything is a thunk by default. In Python, iterators can be thought of as lazy, but they usually represent collections of objects rather than just a single object, so they would not be the best fit here.

[^4]: For example, Haskell's QuickCheck passes in a nonce to Gen a to make sure the generated a is unique. There's also the notion of shrink which Props does nothing with. Yet.
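As a quick illustration of the contrast (lazy_squares is a made-up generator, not anything from Props), an iterator is lazy, but it models a stream of values rather than a single deferred value:

def lazy_squares(limit):
    for n in range(limit):
        yield n * n  # nothing runs until we iterate

squares = lazy_squares(3)  # no work has been done yet
list(squares)              # [0, 1, 4] -- forced by iterating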

One simple way to implement a thunk is to make a closure which takes no arguments. For example:

def setup_a_thunk(some, variables, _for, our, computation):
    def thunked_computation():
        # ...
        # Do a bunch of work
        # using the variables passed in to setup_a_thunk.
        return a_result
    return thunked_computation

We can then call setup_a_thunk a bunch of times, but it would not actually do the work until we force it, i.e. call the thunk with no arguments:

a_thunk = setup_a_thunk(...)
another_thunk = setup_a_thunk(...)

one_result = a_thunk()  # This forces the thunk.

Since objects are closures, we can view the class definition for Tree as setting up a closure, which we force when we call arbitrary.

So, there you have it. The equivalent of Haskell QuickCheck's Gen a in Props is a closure!
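To make the analogy concrete, here is a minimal sketch of generators-as-thunks. The names gen_int and list_of_gen are made up for illustration and are not Props' actual API:

import random

def gen_int():
    # A "Gen Int": forcing it produces a fresh random int.
    return random.randint(-100, 100)

def list_of_gen(gen, max_length=5):
    # A combinator: returns a new thunk without generating anything yet.
    def thunk():
        return [gen() for _ in range(random.randint(0, max_length))]
    return thunk

gen_ints = list_of_gen(gen_int)  # sets up the thunk
gen_ints()                       # forces it: a fresh random list on each call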

An analogy between type systems and databases


There are tradeoffs with almost any technology choice. Often, a choice's strength is also its weakness.

When choosing a programming language, should it use static or dynamic typing? There are arguments to be made on each side. For static, we get a reasonable amount of assurance that certain classes of bugs cannot occur. However, with dynamic, we can be more flexible about our structure and design. Many people have argued for the inverse: languages with static type systems are inflexible and languages with dynamic type systems are unsafe.

In databases, one ongoing trend is the popularity of NoSQL systems, particularly key-value stores. One argument for key-value stores is their flexibility. However, just as before, we lose the guarantees that a static type system (i.e. SQL) provides. This is just a tradeoff that has to be considered when choosing a database.

The main argument for flexibility in both dynamic languages and key-value stores is how quickly a prototype can be built, since there is little need to account for changes in schema or typing. However, when these systems need to be made robust later on, developers end up writing code to account for the schema or typing in their system. This code would be redundant and automatic in a static type system.
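As a sketch of what that redundant code tends to look like (the record shape here is hypothetical), every consumer of a schemaless store ends up re-checking the shape of its data by hand:

def load_user(record):
    # Checks that a schema or static type system would have made automatic.
    if not isinstance(record, dict):
        raise TypeError("expected a dict")
    if not isinstance(record.get("name"), str):
        raise TypeError("expected 'name' to be a string")
    if not isinstance(record.get("age"), int):
        raise TypeError("expected 'age' to be an integer")
    return record["name"], record["age"]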

It is perhaps even worse when dynamism is forced into static systems. In C, we could put all of our data into an untagged union, which would let us pass anything anywhere. We lose the safety of a (weak) static type system for a small gain in flexibility. In SQL databases, we can denormalize, put everything in one table, and use nullable strings for every column type. But again, we lose safety for flexibility.[^1]

[^1]: Interestingly, both forcing static types into a dynamic system and forcing dynamism into a static system decrease maintainability.

Expanding on the link between databases and category theory


It turns out that people more qualified than I are working on the link between databases and category theory.

While working through my backlog of unread posts from Edward Z. Yang, I came across this interesting post from 2010 (I'm pretty far behind!). Some of the links from the post are dead, so I'll remedy that here:

In 2010, David Spivak gave a tech talk entitled Databases are Categories at Galois. The slides are more digestible than the paper.

Types and Programming Languages book


Just because you've implemented something doesn't mean you understand it.

– Brian Cantwell Smith (p. 88, Types and Programming Languages)

I'd like to learn more about type systems, so I'm working my way through Benjamin C. Pierce's Types and Programming Languages. I also would like to learn more about Haskell, so I'm implementing languages from the book as I encounter them. If you'd like to follow along, the implementations I'm working on are on GitHub.

Before this I hadn't done any significant multi-file Haskell work, so I'm particularly looking for feedback on the project's structure and whether it is idiomatic.

Releasing a property-based testing framework for Python


Props (PyPI, GitHub) provides property-based testing for Python à la QuickCheck.

There are other QuickCheck-like libraries for Python.

However, I'm not sure how to add instances of Arbitrary (in the Haskell sense) in any of them! That's where Props comes in.

In Haskell's QuickCheck, Arbitrary is a typeclass:

class Arbitrary a where
    arbitrary :: Gen a

Typeclasses are like interfaces, but for types instead of classes. The way to read this definition is “Any type a which implements the Arbitrary interface must provide a method arbitrary which returns a Gen a.” Think of Gen a as a container which can be turned into random instances of a.[^1] To create a new instance of Arbitrary[^2]:

instance Arbitrary Bool where
    arbitrary = choose (False, True)

[^1]: I'll discuss the equivalent of Gen a being used in Props (closures! thunks!) in a later post.

[^2]: This specific instance is already provided for you, but it makes for a good example.

In Python, since we don't have interfaces, we define a class which provides arbitrary as a classmethod that raises NotImplementedError:

class ArbitraryInterface(object):
    @classmethod
    def arbitrary(cls):
        raise NotImplementedError

To implement an instance:

class bool(ArbitraryInterface):
    @classmethod
    def arbitrary(cls):
        return random.choice([False, True])

However, in this case, bool is already defined, and we do not want to shadow that definition. Classmethods are just functions which dispatch on the type given, so we can define arbitrary somewhat like this[^3]:

arbitrary_dict = {
    bool: lambda: random.choice([False, True]),
    ...
}

def arbitrary(cls):
    if issubclass(cls, ArbitraryInterface):
        return cls.arbitrary()
    # Fall back to the table of generators for already defined types.
    return arbitrary_dict[cls]()

[^3]: None doesn't work in this case, so I handle it separately in Props.

This way we can provide implementations of arbitrary for already defined types (either in the standard library or in external libraries) in arbitrary_dict, and also handle instances of ArbitraryInterface for our own classes.
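As a hedged usage sketch (assuming int also has an entry in arbitrary_dict, and using a made-up Point class), both paths go through the same arbitrary function:

class Point(ArbitraryInterface):
    def __init__(self, x, y):
        self.x = x
        self.y = y

    @classmethod
    def arbitrary(cls):
        # Build an arbitrary Point out of two arbitrary ints.
        return cls(arbitrary(int), arbitrary(int))

arbitrary(bool)   # looked up in arbitrary_dict
arbitrary(Point)  # dispatches to Point.arbitrary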

Object and function duality


Let's pretend we are stuck in an alternate universe where first-class functions[^1] don't exist in Python. How could we add them back in using classes, without modifying the run-time?

Here's a square function:

class SquareFunc:
    def apply(self, x):
        return x * x

square = SquareFunc()

Okay, that wasn't very exciting. What about higher-order functions?

class MapFunc:
    def apply(self, f, values):
        new_list = []
        for element in values:
            # f is a function object in this universe, so we call its apply method.
            new_list.append(f.apply(element))
        return new_list

map = MapFunc()

map.apply(square, [1, 2, 3])  # [1, 4, 9]

And that's it!

[^1]: Or list comprehensions, as that would be cheating for this example. Also, I'm pretending __call__ doesn't exist. Because.

How I use GNU Style and Diction, and aspell


For software, we have unit tests, linters, and type checkers. What do we have for natural language?

One tool that I use on occasion is diction. I can never interpret its default output, so I tell it to make suggestions with -s and to warn me of beginner mistakes with -b:

diction -bs posts/2014-02-11-linting-natural-language.md

With most word processors and websites, we have spell checkers. For plain text, we can use aspell:

aspell -c posts/2014-02-11-linting-natural-language.md

Are there other tools you use for linting natural language?

Why I prefer top-down to bottom-up programming


Think about the ideal way to write a web app. Write the code to make it happen.

Aaron Swartz (from web.py)

If we write the code that we think we will need, before we actually need it, we are likely writing code we don't need. Instead, we can take a lazy-evaluation approach and implement only the code we need as we need it! This prevents the creation of spaghetti code. Not only does it keep our implementation simple, it also lets us program in the problem domain, which makes it easier to find problems later.

I've been using readme driven development to implement my port of QuickCheck to Python. It seems to be working well so far. All I have left is to add some tests and name it! Any ideas for the name? There are already quite a few other libraries that have taken the good names.

Metaprogrammed named classes in Python


I'm finishing up my QuickCheck implementation for Python. I'm writing generator combinators to make it easier to build new generators. Since the generators I'm using are just classes with a classmethod arbitrary, the combinators dynamically create classes:

def a_combinator(generator1, generator2):
    class a_combinator_class(...):
        @classmethod
        def arbitrary(cls):
            ...
    return a_combinator_class

However, this leaves us with an issue: I would like to be able to display a name for each generator. For generators which aren't constructed with combinators, we can just use generator.__name__, but this doesn't work for generator combinators. With the above combinator, it would show a_combinator_class no matter what generators we pass in. It would be better if it showed a_combinator(generator1, generator2). We can do this by setting __name__ on a_combinator_class:

def a_combinator(generator1, generator2):
    class a_combinator_class(...):
        ...
    a_combinator_class.__name__ = ''.join([
        'a_combinator(',
        ', '.join([
            generator1.__name__,
            generator2.__name__
        ]),
        ')'
    ])
    return a_combinator_class
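For example (pair_of and Int are made-up names following the pattern above, not part of the library), the generated class now reports a useful name:

import random

class Int(object):
    @classmethod
    def arbitrary(cls):
        return random.randint(-100, 100)

def pair_of(generator1, generator2):
    class pair_of_class(object):
        @classmethod
        def arbitrary(cls):
            return (generator1.arbitrary(), generator2.arbitrary())
    pair_of_class.__name__ = 'pair_of({}, {})'.format(
        generator1.__name__,
        generator2.__name__
    )
    return pair_of_class

pair_of(Int, Int).__name__     # 'pair_of(Int, Int)'
pair_of(Int, Int).arbitrary()  # e.g. (42, -7)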

Culture over code


[...] someone who [...] has enough leadership or influence to cause other programmers to write code like they would write without them having to do it, because you don't have the hours in the day or fingers. Having that ability to spread your approach and whatever you've learned about programming, and have that go through some kind of community and produce a corpus of code that's bigger than you could do, that's as satisfying to me as being the one that stays up all night writing too much code.

Brendan Eich (from Coders at Work)
