At PSPDFKit, we have a strong bias toward functional programming, which emphasizes immutability. Using immutable data structures has a lot of benefits in regard to the readability of your code, and it also makes it easier to reason about it. In this blog post, I will point out a few of these benefits and explain lazy operations that will also improve the performance of your code based on a simple example.
Immutable Data Structures
Immutable persistent data structures are data structures that can’t be mutated after they are initiated and therefore will always hold the same value after one is set. Operations on these data structures will not change them but will instead return new values. For example, when an operation is evaluated over an immutable list, the operation will return a new list instead of changing the original list. This may sound like a big performance hit, because a lot of copying takes place when you save the return values of an operation to another immutable data structure, but hash map tries and vector tries enable structural sharing, which minimizes the need to actually copy the data and save it in a memory-efficient way.
A Simple Example
To illustrate how immutable data structures can improve the readability of your code and how lazy operations work on persistent immutable data structures, let’s look a simple example where we have a list of people and want to calculate the average height for everyone older than 20. The list only contains each person’s birth year and their height in inches. The output should be the average height in centimeters for all people older than 20.
When looking at such an example, we can easily find the operations we need to apply to our data structure to get to the desired result:
- Calculate the age for every person.
- Filter every person by age.
- Convert the height of everyone older than 20 to centimeters.
- Sum up the heights and divide by the total amount of people.
Now, let’s look at the example input data:
Now it’s time to implement the functions needed for the first three points on our list:
After defining the functions, we can chain them together to get a list of all people older than 20 with their calculated ages and their respective heights in centimeters:
With the list of the selected people (all people, older than 20) and their converted heights, we’re now able to calculate the average height for them and print the resulting value to the console:
The use of simple and small functions chained together makes this code easy to understand, but there’s a problem with this declarative approach: We have to iterate multiple times over our dataset, which could cause performance problems when dealing with a lot of data. This means that the complexity of these operations is
k O(n) but can be easily improved to a complexity of
O(n). A simple solution to this problem is to only iterate over the dataset once by combining all functions in one
Although we accomplished what we aimed for with the above, we also sacrificed the readability of our code, which makes it harder for other developers to reason about the code and to maintain and understand it. This is one example where lazy evaluation will fit in nicely and provide you the best of both worlds: the declarative and composable way of programming of our first implementation, and the performance optimization of our second implementation.
Lazy evaluation delays the evaluation of an expression until the value is needed, and it also avoids repeated evaluations. Immutable.js provides lazy evaluations through the
Seq data structure. We can modify our first implementation to work on a
Seq instead of iterating over the list multiple times:
The only thing we had to do here was insert
Immutable persistent data structures provide declarative, compositional, and — through structural sharing and lazy evaluation — efficient ways to code, which makes it easier to understand and maintain your code. For this reason, the entire state for PSPDFKit for Web is stored in Immutable.js objects.