kmarekspartz

Expanding our capacity to respond to unforeseen changes

A look back over some recent posts from this blog


In the last year, I've written 103 posts (an average of one every three and a half days) for a total of 20,183 words.

In the last six months, I've written 24 posts, for a total of 6,223 words. Compared to my first six months of blogging, my post quantity is definitely down, but my word count per post is up (~260 words compared to ~178), which may be a naive estimate of quality.

Russell Brown has a potential fix for the Data Type growth


A potential fix

As I mentioned in my last post, Russell Brown is working on a feature branch of riak_dt which may have a fix for some of this. Today, I ran my benchmark against a locally-compiled Riak, and then I built it with the potential fix.

My environment

I'm on Mac OS X 10.9.5, and I ran the benchmark against a single Riak 2.0.2 node running locally, installed from source. I have OTP R17, and therefore had to make a few changes in the Riak build system.

To include the patched version, I changed the riak_dt branch that riak_kv depends on to bug/rdb/faster-merge-orswot.

Results

Map writes have improved significantly. Not only do they no longer look exponential, they are much faster! Notice the time scales:

Unpatched Map writes (data-type API)

Patched Map writes (data-type API)

The same applies for Map reads:

Unpatched Map reads (data-type API)

Patched Map reads (data-type API)

The stripes in the Patched Map reads chart are likely due to the precision the benchmark uses when printing times; they can be ignored.

However, there was a regression (a tradeoff) with regard to size: the patched version appears to grow at a faster rate, though growth is still linear.

Unpatched Map byte size and number of items

Patched Map byte size and number of items

The blip around 1,000 items may be due to the elements (stringified integers) growing from 3 to 4 bytes.

There are no visually significant changes for Map reads using the key-value API.

Unpatched Map reads (key-value API)

Patched Map reads (key-value API)

20k Elements

Testing the unpatched version with 20,000 elements would be unbearable, but I did test the patched version to give a better idea of how it behaves at that size. It's not quite what I was expecting, but it is still much faster than the unpatched version was with 3,000 elements.

Patched Map writes (20k elements) (data-type API)

Patched Map reads (20k elements) (data-type API)

Raw data

Here you go!

Future work

  • Test the patched Set implementation.
  • Test against an actual cluster rather than a single local node.
  • Make a basho_bench version of the benchmark.

Once again, big thanks to Russell Brown for the quick turn-around on this! We're looking forward to using this change once it is officially released. We are also eagerly anticipating delta-mutation, which should bring the linear growth (for writes) down to amortized constant time, if I'm understanding the change correctly.

Some additional information about yesterday's post


Yesterday's post could have included a bit more information. Here's that information.

My environment

I'm on Mac OS X 10.9.5, and I ran the benchmark against a single Riak 2.0.2 node running locally, installed from homebrew, which does a binary install.

Other graphs

Some graphs I didn't include were a comparison of byte size and number of items, Set read times for both the key-value and data-type APIs, and Map read times for the key-value API. Here they are:

Set byte size and number of items

Map byte size and number of items

Set reads (key-value API)

Set reads (data-type API)

Map reads (key-value API)

Raw data

Here you go!

A potential fix

Russell Brown is working on a feature branch of riak_dt which may have a fix for some of this. I'm running the benchmark against a locally-compiled Riak right now, and then I'll build it with his potential fix and compare the benchmark results.

Update: I ran the benchmark against the potential fix with good results.

A benchmark for large (10k+ elements) Riak Data Types


Riak 2.0 includes Data Types which provide reasonable guarantees with regard to eventual consistency. However, the data types, as implemented, aren't really designed to be performant with a large number of elements.

To that end, here's a benchmark which demonstrates the behavior of large maps and sets using the Riak Ruby Client. First, it reads using the key-value API and gets the size of the object (number of bytes, not number of elements). Then, using the data-type API, it reads, adds an element, and reads twice. This allows us to compare read performance between the APIs, as well as examine performance relative to the number of elements or bytes.

require 'riak'
require 'benchmark'
require 'securerandom'

data_type = "maps"  # or "sets"

bucket = "testing CRDT #{data_type}: #{SecureRandom.hex}"

n_items = 15000
n_containers = 1

client = Riak::Client.new(nodes: [{:host => 'localhost', :pb_port => 8087}])

puts 'items, id, size, kv_read_elapsed, read_prewrite_elapsed, write_elapsed, read_postwrite_elapsed, read_postwrite_elapsed2'
(1..n_items).each do |i|
  (1..n_containers).each do |id|
    key = id.to_s
    size = 0
    kv_read_elapsed = 0
    begin
      kv_read_elapsed = Benchmark.realtime do
        size = client.bucket(bucket).get(key, type: data_type).content.raw_data.size
      end
    rescue Riak::ProtobuffsFailedRequest
      # get will fail the first time through.
    end
    container = nil
    read = nil
    write = nil
    if data_type == "maps"
      read = lambda{ container = Riak::Crdt::Map.new(client.bucket(bucket), key) }
      write = lambda{ container.sets['set'].add(i.to_s) }
    elsif data_type == "sets"
      read = lambda{ container = Riak::Crdt::Set.new(client.bucket(bucket), key) }
      write = lambda{ container.add(i.to_s) }
    end
    read_prewrite_elapsed = Benchmark.realtime(&read)
    write_elapsed = Benchmark.realtime(&write)
    read_postwrite_elapsed = Benchmark.realtime(&read)
    read_postwrite_elapsed2 = Benchmark.realtime(&read)
    puts "#{i}, #{id}, #{size}, #{kv_read_elapsed}, #{read_prewrite_elapsed}, #{write_elapsed}, #{read_postwrite_elapsed}, #{read_postwrite_elapsed2}"
  end
end

Results

I ran the benchmark for 5,000 items in a Map, and 10,000 items in a Set against a single local node.

For both Sets and Maps, size has near-linear growth relative to the number of items[^2].

[^2]: Item size should matter, but they are all short strings in this benchmark.

Set reads for both the key-value and data-type APIs are small and seemingly constant.[^1] These are nearly flat lines, so I didn't include the plots. The same goes for Map reads using the key-value API, but not Map reads using the data-type API:

Map reads (data-type API)

Consecutive reads don't seem to have much of an effect:

Map consecutive reads (data-type API)

Both Set and Map writes seem to grow superlinearly:

Set writes (data-type API)

Map writes (data-type API)

[^1]: This may be a flaw in the benchmark. Perhaps someone with more knowledge of the Ruby Riak Client can tell me more.

Future work

I have only tested against a single local node, not a full Riak cluster, though similar behavior has been seen in both. basho_bench could be used to test without going through the Ruby client. Additionally, I'd like to test the data types directly.

Update: For some missing graphs, and more information about my environment, please see my next post.

Update #2: Russell Brown may have fixed this in a feature branch. I ran the benchmark against the potential fix with good results.

Ben Franklin is a fan of Linus's Law.

These Thoughts, my dear Friend, are many of them crude and hasty, and if I were merely ambitious of acquiring some Reputation in Philosophy, I ought to keep them by me, 'till corrected and improved by Time and farther Experience. But since even short Hints, and imperfect Experiments in any new Branch of Science, being communicated, have oftentimes a good Effect, in exciting the attention of the Ingenious to the Subject, and so becoming the Occasion of more exact disquisitions (as I before observed) and more compleat Discoveries, you are at Liberty to communicate this Paper to whom you please; it being of more Importance that Knowledge should increase, than that your friend should be thought an accurate Philosopher.

Benjamin Franklin, quoted in The Invention of Air by Steven Johnson.

Global sandboxes for Cabal


I recently moved to a new laptop and decided to rethink my GHC and cabal setup. After installing GHC for Mac OS X and running cabal install cabal-install, I decided not to install any other cabal packages globally, in order to avoid cabal hell. However, there are a few programs I like to install globally with cabal, e.g. pandoc. I asked Tyler Holien what he does, and he suggested a separate cabal sandbox for each program and a symlink to the resulting binary somewhere on my $PATH.

I created a script called glabal to do the first part of this work for me:

#!/bin/bash
# ~/bin/glabal
mkdir -p ~/.installs/$1
pushd ~/.installs/$1
cabal sandbox init
cabal install $1 -j8
popd

To use this, I type glabal pandoc and it creates a sandbox in ~/.installs/pandoc and attempts to install pandoc. If this doesn't complete successfully, I either rm -r ~/.installs/pandoc or go into the sandbox manually and attempt to recover. If it does complete, the second part is a symlink from somewhere on my $PATH to the built binary, which ends up in ~/.installs/pandoc/.cabal-sandbox/bin.

I demonstrate parallelizing existing code in Haskell


I've been working through Parallel and Concurrent Programming in Haskell. In my last post, I demonstrated the facilities Haskell provides for lightweight concurrency. In this post, let's take a look at Haskell's facilities for parallelism.

As a brief example, let's parallelize Quicksort[^1].

import Control.Parallel.Strategies

Strategies provide a means to tell the run-time system how to evaluate values. We'll be using rseq, the sequential evaluation strategy, and parList, which takes a strategy for list elements and applies that strategy to each element in parallel.
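
As a minimal illustration (the squares binding is mine, purely for illustration, and isn't needed for the sort below), this sparks the evaluation of every element of a list in parallel, each element evaluated with rseq:

squares :: [Int]
squares = map (^ 2) [1 .. 100] `using` parList rseq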

Here's our non-parallelized Quicksort implementation:

quicksort :: Ord a => [a] -> [a]
quicksort [] = []
quicksort (x:xs) =
  let leftPartition = [y | y <- xs, y < x]
      rightPartition = [y | y <- xs, y >= x]
      left = quicksort leftPartition
      right = quicksort rightPartition
  in left ++ [x] ++ right

Quicksort partitions a list around a pivot, sorts each partition, and then combines the partitions and the pivot.

Our parallelized version is almost the same:

parallelsort :: Ord a => [a] -> [a]
parallelsort [] = []
parallelsort (x:xs) =
    let leftPartition = [y | y <- xs, y < x] `using` parList rseq
        rightPartition = [y | y <- xs, y >= x] `using` parList rseq
        left = parallelsort leftPartition
        right = parallelsort rightPartition
    in left ++ [x] ++ right

We simply tell the run-time system what strategy to use for the list comprehensions.

This doesn't improve much in this case, but when used judiciously, extending your existing code with parallelism is straightforward in Haskell.
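
To actually see a speedup, the program needs the threaded run-time and a core count. A hypothetical driver (inputs and main are my names, not from the original post) could be compiled with ghc -O2 -threaded -rtsopts and run with +RTS -N4 -RTS to use four cores:

-- A cheaply shuffled input, so the quicksort recursion stays reasonably balanced.
inputs :: [Int]
inputs = [ (n * 2654435761) `mod` 1000003 | n <- [1 .. 100000] ]

main :: IO ()
main = print (last (parallelsort inputs))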

This post is also available as a literate Haskell file.

[^1]: This isn't the best algorithm to parallelize, nor is this an efficient implementation, but it shows how to add parallelism to your code.

I demonstrate a concurrent implementation of the Daytime Protocol in Haskell


One example from Parallel and Concurrent Programming in Haskell is a concurrent network server. The server given in the book implements an informally specified doubling protocol: each submitted line is parsed as an Integer, and the server responds with its double.

Back in May, Andrew Clarkson gave a talk at the PyMNtos meeting about Asynchronous IO in Python. As an example, he included an asynchronous version of a Daytime server.

Let's take the Haskell doubling server, and make it a (TCP) Daytime server.

... but first, some imports.

import Control.Monad (forever)

forever is useful for making loops out of void functions (IO ()).

import Text.Printf (printf)

printf behaves similarly to how it does in other languages: it takes a format string and items to interpolate into the format string.

import Data.Time (getCurrentTime, formatTime)
import System.Locale (defaultTimeLocale)

These are used to get the current time, and format it with the default locale.

import System.IO (
    Handle

A handle is a place for IO to stream data.

  , stdout

stdout is the default handle used for things like putStrLn.

  , hPutStrLn

This is the handle equivalent of putStrLn. Rather than assuming the standard handle, the handle is passed in explicitly. One can define putStrLn (in point-free style) as putStrLn = hPutStrLn stdout.

  , hClose

Handles need to be closed when you are done with them.

  )
import Network (
    withSocketsDo

This is needed on Windows to initialize the networking subsystem. It is here for portability reasons.

  , listenOn

This opens a socket on a specified port.

  , PortID(PortNumber)

To specify the port, we'll pass in a PortNumber.

  , accept

accept takes a socket and returns a tuple of a Handle, a host, and the port.

  )
import Control.Concurrent (
    forkFinally

forkFinally creates a new (light-weight) thread to run a given action, and when that action completes, it “finally” runs a second one. You'll see when we get there.

  , threadDelay

It turns out our single-threaded implementation is quite efficient, so we'll add a slight delay to make clear how concurrency affects the server.

  )

This program has four different main functions. The fourth one is the concurrent Daytime server, so we'll use that implementation as our main main:

main = main4

First, let's write a non-server version of what a Daytime server does. According to the specification:

“Once a connection is established the current date and time is sent out the connection as a ascii character string (and any data received is thrown away). The service closes the connection after sending the quote.”

A non-server version of this would just be a simple version of date:

main1 :: IO ()
main1 = do
  ct <- getCurrentTime
  let time = formatTime defaultTimeLocale "%F %T" ct
  putStrLn time

This outputs the time, and halts.

In order to work with sockets, however, we'll need the Handle-based equivalent of this program. We'll also add a slight delay to make the benefits of concurrency clear later.

To convert main1 to use handles explicitly, we'll pass an output handle in, and use hPutStrLn instead of putStrLn:

mainWith :: Handle -> IO ()
mainWith outH = do
  threadDelay (10^6) -- to simulate "real" work.
  ct <- getCurrentTime
  let time = formatTime defaultTimeLocale "%F %T" ct
  hPutStrLn outH time

Ignoring the threadDelay, the equivalent program to main1 would be:

main2 :: IO ()
main2 = mainWith stdout

Now we can get on with implementing the server. The specification establishes port 13 for the Daytime protocol.[^1]

port :: Int
port = 13

Let's start out with the server implementation from the book:

main3 :: IO ()
main3 = withSocketsDo $ do
  sock <- listenOn (PortNumber (fromIntegral port))
  printf "Listening on port %d\n" port
  forever $ do
    (handle, host, port) <- accept sock
    printf "Accepted connection from %s: %s\n" host (show port)

Here's where things diverge. We use our mainWith in place of the book's talk. Since mainWith accepts any handle, we can pass it the handle returned by accept:

    mainWith handle

Don't forget to close the handle after the connection has been handled (as specified)!

    hClose handle

This handles each connection in sequence. Since handling the connection takes over a second of work (due to the thread delay), it can only respond to one connection per second.

Let's add some concurrency! This first part is the same:

main4 :: IO ()
main4 = withSocketsDo $ do
  sock <- listenOn (PortNumber (fromIntegral port))
  printf "Listening on port %d\n" port
  forever $ do
    (handle, host, port) <- accept sock
    printf "Accepted connection from %s: %s\n" host (show port)

Here's where forkFinally comes in. Instead of calling mainWith handle, we fork a new thread to call it. When the thread completes, we (finally) close the handle.

    forkFinally (mainWith handle)
                (\_ -> hClose handle)

Since a new thread is created for each connection, we are no longer limited to one connection per second.

To run this file[^2]:

runhaskell daytime.lhs

To test the concurrency[^3]:

yes "nc localhost 13" | parallel -j 32

This streams commands to connect to localhost on port 13, and uses parallel to keep 32 jobs running those commands. With main3, you should see one response per second, whereas with main4, you should see 32 responses per second.

[^1]: You may need root access in order to run this program. If you do not have root access, change the port to 1313 or some other number above 1024. Port numbers below 1024 are generally privileged on modern systems.

[^2]: Again, you may need root access.

[^3]: You'll need GNU parallel installed.

I've been asked a follow up question about reexporting items.


I sent yesterday's post about imports to Danny Gratzer. He had a follow-up question:

“Is there a Python equivalent to the Haskell trick of doing the following which will then export all 3 modules?”

module Foo (module X) where
import Data.Text as X
import Data.ByteString as X
import Control.Lens as X

The following should accomplish the same result, though it isn't as concise, due to Python's lack of an embedded module language. Haskell's module language is minimal[^1], but it has one; in Python, file structure determines modules (and packages).

[^1]: ... relative to languages such as ML. There is ongoing work to improve Haskell's module language.

# foo/x.py:
from data.text import *
from data.byte_string import *
from control.lens import *

# foo/__init__.py:
import foo.x as x

# bar.py:
from foo import x

I wouldn't recommend these sorts of tricks, as the import order affects what is in scope: e.g., if Data.Text and Data.ByteString both defined baz, which one would X.baz be? Ideally, import order should not matter, as it is an implementation detail that should be hidden by a module. A language (or linter) could enforce that property by preventing imports with conflicting names in general.
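
GHC, for what it's worth, only reports such a conflict when the ambiguous name is actually used. A minimal sketch (a hypothetical file; both Prelude and Data.Map export lookup):

import Data.Map (lookup)

-- The conflicting import itself is accepted; this use site fails to
-- compile with GHC's "Ambiguous occurrence 'lookup'" error:
main :: IO ()
main = print (lookup (1 :: Int) [(1, "a")])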

A comparison of imports in Python, Haskell, and Swift


At the last HaskellMN meetup, Danny Gratzer had a question about Python imports. Specifically, he wanted to import and re-export a constant. Here's how it is done in Python:

# a.py:
some_constant = 3.14

# b.py:
from a import some_constant

# c.py:
from b import some_constant

Edit: He had a follow-up question.

This reminded me that I wanted to compare the import styles of Python, Haskell, and Swift. Their imports are similar, though there are a few differences to keep in mind.

Python

Python has a few import styles:

import module1
from module2 import a, b, c
from module3 import *

The first loads module1 and brings it into scope. The second loads objects a, b, and c from module2 and brings them into scope. The third loads every object from module3 and brings them into scope.

The first is useful for short names (os, sys), but with module hierarchies it can be annoying. The second is my preferred style, as it forces you to name what will be used. The third is frowned upon, as you can accidentally overwrite variables if a module you import later provides an object with the same name as one you import earlier. Additionally, providing a list of what is going to be used in the file is useful from a documentation standpoint. “Explicit is better than implicit.”

Python modules are isomorphic to files. A module is either a file with that name or a folder with an __init__.py file in it[^1]. Submodules can be placed in the folder.

[^1]: A folder with an __init__.py is technically a package.

Python imports can be qualified using as:

from module import a as b

This imports a from module, but gives it the name b in scope.

Python imports can also happen anywhere in a file, even inside functions. This is a double-edged sword, useful for both making and breaking circular dependencies.

Haskell

Haskell, too, has a few import styles.

import qualified Module1
import Module2 (a, b, c)
import Module3

The first loads Module1 and brings it into scope. The second loads objects a, b, and c from Module2 and brings them into scope. The third loads every (exported) object from Module3 and brings them into scope.

These correspond loosely to the three Python examples; however, I have ordered them by roughly similar semantics rather than by their similar syntax. Again, I prefer the middle one, for similar reasons.

There is also a hiding option, which is the inverse of the second option above:

import Module hiding (a, b, c)

This imports everything from Module except a, b, and c.

Haskell modules are made explicit within files:

module Module where
    ...

The convention is to put them in the corresponding file structure, though this is not enforced programmatically.
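
For example, a module named Data.Foo would conventionally live at Data/Foo.hs relative to the source root (the module name here is hypothetical):

-- Data/Foo.hs
module Data.Foo where
    ...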

Unlike Python imports, Haskell imports need to be at the top of a file, optionally following a module declaration.

Haskell provides the as option, too, though only for module-level imports:

import Module as M
import qualified Module as M

However, the first example here also imports everything from Module unqualified, in addition to making the qualified name M available.
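
For example (the sorted bindings are only illustrative), after the unqualified form both names resolve:

import Data.List as L

sorted :: [Int]
sorted = sort [3, 1, 2]      -- the unqualified name is in scope

sorted' :: [Int]
sorted' = L.sort [3, 1, 2]   -- and so is the qualified one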


Swift

Swift imports are not as diverse. They only provide equivalents for the second and third examples from above:

import var module2.a
import module3

The first loads the object a from module2 and brings it into scope. The second loads every object from module3 and brings them into scope. There doesn't seem to be an equivalent to the module1 example from Python and Haskell.

Compared to the module2 examples from Python and Haskell, the Swift form imposes two additional constraints. First, the kind of the import needs to be specified: in the example I gave here, the import kind is var, though it could also be typealias, struct, class, enum, protocol, or func. The other constraint is that each item imported from a module needs its own line. A more precise equivalent to the Python and Haskell examples would be:

import var module.a
import var module.b
import var module.c

This is a little excessive. I think a better syntax would be something like one of these:

from module import var a, var b, var c
import module (var a, var b, var c)

Like Python, Swift imports can be placed anywhere within a file.

The morality of import styles

Importing a module name is fine, though repeating the module name at each access into the module can make code verbose and hard to read.

Importing specific items from a module is my preferred import style, except when it isn't idiomatic (e.g. import sys in Python).

Importing everything from a module without naming each item is frowned upon because it hides from readers what is actually in scope. Readers like to be able to see what is in scope in order to get a feel for what kind of code will be read in a file.

Additionally, if multiple modules define items with the same name, it is possible to overwrite imports implicitly, without notifying the reader:

# a.py:
x = 1

# b.py:
x = 2

# c.py:
from a import *
from b import *
print(x)

These suggestions are reflected in Python's PEP 8 and the HaskellWiki page on “Import[ing] modules properly”. These discuss the merits of the various import styles, and they arrive at similar conclusions, though the HaskellWiki page is less axiomatic and provides more rationale for its decisions. There is not yet a similar document for Swift, though the official blog post on Files and Initialization mentions that you “can even import modules at the bottom of the file (although that is not recommended Swift style)”.
