In the first article, we’ve covered how async/await helps with async commands but it offers little help when it comes to asynchronously processing collections. In this post, we’ll look into the reduce function, which is the most versatile collection function as it can emulate all the other ones.
Reduce iteratively constructs a value and returns it, which is not necessarily a collection. That’s where the name comes from, as it reduces a collection to a value.
The iteratee function gets the previous result, called memo in the examples below, and the current value, e.
The following function sums the elements, starting with 0 (the second argument of reduce):
memo
e
result
0 (initial)
1
1
1
2
3
3
3
(end result) 6
Asynchronous reduce
An async version is almost the same, but it returns a Promise on each iteration, so memo will be the Promise of the previous result. The iteratee function needs to await it in order to calculate the next result:
memo
e
result
0 (initial)
1
Promise(1)
Promise(1)
2
Promise(3)
Promise(3)
3
(end result) Promise(6)
With the structure of async (memo, e) => await memo, the reduce can handle any async functions and it can be awaited.
Timing
Concurrency has an interesting property when it comes to reduce. In the synchronous version, elements are processed one-by-one, which is not surprising as they rely on the previous result. But when an async reduce is run, all the iteratee functions start running in parallel and wait for the previous result (await memo) only when needed.
await memo last
In the example above, all the sleeps happen in parallel, as the await memo, which makes the function to wait for the previous one to finish, comes later.
await memo first
But when the await memo comes first, the functions run sequentially:
This behavior is usually not a problem as it naturally means everything that is not dependent on the previous result will be calculated immediately, and only the dependent parts are waiting for the previous value.
When parallelism matters
But in some cases, it might be unfeasible to do something ahead of time.
For example, I had a piece of code that prints different PDFs and concatenates them into one single file using the pdf-lib library.
This implementation runs the resource-intensive printPDF function in parallel:
I noticed that when I have many pages to print, it would consume too much memory and slow down the overall process.
A simple change made the printPDF calls wait for the previous one to finish:
Conclusion
The reduce function is easy to convert into an async function, but parallelism can be tricky to figure out. Fortunately, it rarely breaks anything, but in some resource-intensive or rate-limited operations knowing how the functions are called is essential.