Functional Programming - Turning Eggs into Egg Salad
The Functional Programming paradigm encourages you to think of your application with respect to data, and the functions that operate on them. When explaining this to people unfamiliar with the concept, I like to reference my favorite recipe for Egg Salad.
Let’s say you start with a box of a dozen eggs. In order to make an egg salad, one would follow these steps
- Visually inspect each egg, and remove any eggs that you suspect to be rotten
- Boil each egg
- Peel and chop each boiled egg into pieces of roughly the same size
- Combine into a bowl with spicy mayonnaise, cheese and chives until the bowl is full
In functional programming parlance, you have just taken a list of eggs, and applied the following functions: filter, map, mapcat, and reduce.
Let’s look at each of these functions in detail.
The first step of our algorithm is to visually inspect each egg, and filter out each rotten egg.
Coded in the above statement, are the core principles behind a filter
- A filter accepts a list of items and a predicate, and returns a list of items. The outputted list has fewer (or equal) number of items. The predicate can only return two values: true (keep in result) or false (remove from result).
- A filter can not change any individual egg. The egg remains unchanged before and after filtering.
- The effect of filtering one egg has no impact on the filtering of the next egg (or any other egg). A rotten egg has no impact on the next egg in the series.
The next step of our algorithm iterates over every egg that remained from the filter, boiling each one.
- Map accepts a list of items and a mapping function (f). It outputs a list where each item is replaced with f(item)
- Map cannot change the number of items returned. No items can be added or removed, and each item in the result has a corresponding item in the input.
- Similar to filter, mapping one item has no impact on any other items. Boiling one egg has no impact of the cooking of another egg.
- Sometimes it may make sense to boil multiple eggs together. For performance characteristics, check out the section below on laziness.
Chopping one egg will give you a list of egg pieces. Mapping over a list of eggs and chopping each one, will give you a list of list of egg pieces. Concatenating those list gives you a list of egg pieces.
- Mapcat is very similar to a map, except the mapping function is expected to return a collection of items. These lists are concatenated to get a single, long list of items
- Mapcat can be used to change the number of items in the result, though each result item will have a single source item. But a single source item may be mapped to many (or none) result items.
- Similar to map, the items outputted from one source item has no impact on the remaining items
The final step in the recipe is to combine all the egg pieces into a bowl. Unlike the other steps, this step does not output a list of items, it outputs a single bowl of egg salad.
- Reduce will consume the entire list, accepting a starting value and a function to apply to each item.
- In our case, the starting value was a bowl containing mayonnaise and chives, and the function was to stir the next piece of egg into the bowl. This is repeated until all the egg pieces are consumed.
- If the bowl is full, unconsumed egg pieces are simply wasted
- Since reduce consumes all the egg pieces available, it is typically the last step of a map-filter-reduce chain.
The above algorithm transforms eggs into an egg salad. But an important question remains. How many eggs do I need for my egg salad? Let’s say I start with 12 eggs (10 good ones and 2 rotten ones), and I need 5 eggs worth of pieces for my egg salad.
The Non Lazy Approach
The first approach we can take is the non lazy approach. This is what is taken by Ruby, Python and most other languages by default (Ruby since implemented lazy collections, but the default enum functions are eager by default).
Let us consider the following ruby pseudo-code: eggs.filter(&:not_rotten?).map(:boil).flat_map(&:peel_and_chop).reduce(into_bowl)
Assuming we filter out the two bad eggs, we will end up boiling, peeling and chopping 10 good egg pieces. When reducing, we only need 5 eggs worth of pieces for our egg salad, which means that 5 eggs are wasted. Worse, they’ve been chopped up, so I can’t put them back into the refrigerator.
Is there a way that we can consume an egg only when needed?
The Lazy Sequence Approach
Let’s now take an extremely lazy approach. filter, map, mapcat and take can all return a lazy collection, meaning that items are realised as needed. Only reduce will force the lazy sequences. This means that as reduce needs more pieces to fill it’s bowl, a single egg will be inspected, boiled, peeled and chopped.
This guarantees that less than one egg is wasted, as a single egg is processed at a time.
However, boiling each egg individually is going to be tedious and time consuming. Is there a way to club eggs so that they can all get boiled together?
A Hybrid Approach
Clojure takes a slightly hybrid approach. Instead of boiling one egg at a time, which is time consuming, Clojure will chunk items together, and boil a batch of 32 eggs at a time. This may not give a lot of benefits when you need only 5 eggs, but it ensures that you will waste a maximum of 31 eggs, even if you are making 100 bowls of egg salad from a pool of 600 eggs.
In the case of Clojure, these optimisations happen under the hood, and you do not have easy control over these. A simple technique to emulate the same thing in any language is to chunk a list into smaller lists-of-5-items, and mapcat each list-of-5-items into the final result.
Infinitely Long Sequences
Once you start dealing with lazy sequences, it is possible to imagine infinitely long sequences. For example, you may tie the source of your eggs to your amazon account, so that every time you are out of eggs, it will automatically order a new batch of dozen. Infinite sequences know that you will never need to realise the entire sequence, you will probably consume some finite amount of items and ignore the rest.
Further, infinite sequences get items just in time. If you only order 12 eggs at a time from Amazon as you need it, you only need storage for 12 eggs at a time. All other eggs are either unrealised (in Amazon), or processed (in the bowl). Storage here is a metaphor for memory required.
Haskell takes this to an extreme level. I have no idea how this works, but Haskell will not start to boil the first egg until somebody tries to eat the egg salad. It is widely believed that when Simon Peyton Jones, the lead developer of the Glasgow Haskell Compiler, orders a pizza, a new pizza restaurant gets a license.
To Quote Carl Sagan:
Get in touch with email@example.com for a free demo of Quintype headless CMS platform.