Sunday, July 2, 2017

Some reflections on my experiences with web technology

It has been a while since I wrote my last blog post. In the last couple of months, I have been working on many kinds of things, such as resurrecting the most important components of my personal web framework (which I developed many years ago) and making them publicly available on GitHub.

There are several reasons for me to switch back to this technology area for a brief period of time -- foremost, a couple of web sites, such as those related to my voluntary work, still use pieces of my custom framework. I recently had to make mostly maintenance-related changes to these systems.

The funny thing is that most people do not consider me a web development (or front-end) person, which has an interesting history -- many years ago (before I started doing research), I used to refer to "web technology" as one of my main technical interests. Gradually, my interest started to fade, up to the point that I stopped mentioning it.

I started this blog somewhere in the middle of my research, mainly to provide additional practical information. Furthermore, I have been using my blog to report on everything open-source related that I do.

If I had started this blog several years earlier, many articles would have been related to web technology. Back then, I spent considerable amounts of time investigating techniques, problems and solutions. In retrospect, I regret that it took me so long to make writing a recurring habit as part of my work -- many interesting discoveries were never documented and have become forgotten knowledge.

In this blog post, I will reflect on my web programming experiences, describe some of the challenges I used to face, and explain the solutions I implemented.

In the beginning: everything looked great


I vividly remember the early days when I was first introduced to the internet, somewhere in the mid-90s. Around that time, Netscape Navigator 2.0 was still the dominant and most advanced web browser.

Furthermore, the things I could do on the internet were very limited -- today we are connected to the internet almost 24 hours a day (mainly because many of us have smartphones allowing us to do so), but back then I had a very slow 33K6 dial-up modem and internet access for only one hour a week.

Aside from the fact that it was quite an amazing experience to be connected to the world despite these limitations, I was also impressed by the underlying technology used to construct web sites. It did not take long for me to experiment with these technologies myself, in particular HTML.

Quite quickly I was able to construct a personal web page whose purpose was simply to display some basic information about myself. Roughly, what I did was something like this:

<html>
  <head>
    <title>My homepage</title>
  </head>

  <body bgcolor="#ff0000" text="#000000">
    <h1>Hello world!</h1>

    <p>
      Hello, this is my homepage.
    </p>
    <p>
      <img src="image.jpg" alt="Image">
    </p>
  </body>
</html>

It was simply a web page with a red colored background displaying some text, hyperlinks and images:


You may wonder what is so special about building a web page with a dreadful background color, but before I was introduced to web technology, my programming experience was limited to various flavours of BASIC (such as Commodore 64, AMOS, GW and Quick BASIC), Visual Basic, 6502 assembly and Turbo Pascal.

Building user interfaces with these kinds of technologies was quite tedious and somewhat impractical compared to using web technology -- for example, you had to programmatically define your user interface elements, size them, position them, define style properties for each individual element, and program event handlers to respond to user events, such as mouse clicks.

With web technology this suddenly became quite easy and convenient -- I could now concisely express what I wanted and the browser took care of the rendering parts.

Unknowingly, I was introduced to a discipline called declarative programming -- I could describe what I wanted as opposed to specifying how to do something. Writing applications declaratively had all kinds of advantages beyond the ability to express things concisely.

For example, because HTML code is high-level (not entirely, but it is supposed to be), it does not matter much which browser application (or underlying platform, such as the operating system) you use, making your application quite portable. Rendering a paragraph, a link, an image or a button can be done on many different kinds of platforms from the same specification.

Another powerful property is that your application can degrade gracefully. For example, when using a text-oriented browser, your web application should still be usable without the ability to display graphics. The alternate text (alt) attribute of the image element should ensure that the image description is still visible. Even when no visualization is possible (e.g. for visually impaired people), a text-to-speech system could interpret your pages' content.

Moreover, the introduction of Cascading Style Sheets (CSS) made it possible to separate the style concern from the page structure and contents making the code of your web page much more concise. Before CSS, extensive use of presentational tags could still make your code quite messy. Separation of the style concern also made it possible to replace the stylesheet without modifying the HTML code to easily give your page a different appearance.
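
For example, the presentational attributes from the earlier homepage example could be moved into a separate stylesheet (a minimal sketch; the file name is made up):

```css
/* style.css: the presentation concern, separated from the page structure */
body {
    background-color: #ff0000; /* replaces the bgcolor attribute */
    color: #000000;            /* replaces the text attribute */
}
```

The page itself then only needs a single reference in its head section, such as <link rel="stylesheet" type="text/css" href="style.css">, so swapping the stylesheet changes the appearance without touching the HTML.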

To deal with visual discrepancies between browser implementations, HTML was standardized, and standards-mode rendering was introduced, triggered by adding a doctype to an HTML file.
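
For example, the HTML 4.01 Strict doctype looks as follows; placing it on the first line of a page triggers standards-mode rendering in browsers that support it:

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
  "http://www.w3.org/TR/html4/strict.dtd">
```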

What went wrong?


I have described a number of appealing traits of web technology in the previous section -- programming applications declaratively from a high level perspective, separation of concerns, portability because of high-level abstractions and standardization, the ability to degrade gracefully and conveniently making your application available to the world.

What could possibly be the reason I lost my passion for a technology with so many compelling properties?

I have a long list of anecdotes, but most of my reasons can be categorized as follows:

There are many complex additional concerns


Most web technologies (e.g. HTML, CSS, JavaScript) provide solutions for the front-end, mainly to serve and render pages, but many web applications are much more than simply a collection of pages -- they are in fact complex information systems.

To build information systems, we have to deal with many additional concerns, such as:

  • Data management. User-provided data must be validated, stored, transformed into something that can be visually represented, and properly escaped so that it can be inserted into a database (preventing SQL injection).
  • Security. User permissions must be validated at all kinds of levels, such as the page or section level. User roles must be defined. Secure connections must be established using the SSL protocol.
  • Scalability. When your system has many users, it will no longer be possible to serve your web application from a single web server because it lacks sufficient system resources. Your system must be decomposed and optimized (e.g. by using caching).

In my experience, the tech companies I used to work for understood these issues, but I also ran into many situations in which non-technical people did not understand that all these things were both complicated and necessary.

One time, I even ran into somebody saying: "Well, you can export Microsoft Word documents to HTML pages, right? Why should things be so difficult?".

There is a lack of abstraction facilities in HTML


As explained earlier, programming with HTML and CSS could be considered declarative programming. At the same time, declarative programming is a spectrum -- it is difficult to draw a hard line between what and how -- it all depends on the context.

The same thing applies to HTML -- from one perspective, HTML code can be considered a "what specification", since you do not have to specify how to render a paragraph, an image or a button.

In other cases, you may want to do things that cannot be directly expressed in HTML, such as embedding a photo gallery on your web page -- there is no HTML facility allowing you to concisely express that. Instead, you must provide the corresponding HTML elements that implement the gallery, such as the divisions, paragraphs, forms and images. Furthermore, HTML does not provide you any facilities to define such abstractions yourself.

If there are many recurring high level concepts to implement, you may end up copying and pasting large portions of HTML code between pages making it much more difficult to modify and maintain the application.

A consequence of not being able to define custom abstractions in HTML is that it has become very common to generate pages server side. Although this suffices to get most jobs done, generating dynamic content is many times more expensive than serving static pages, which is quite silly if you think too much about it.

A very common server-side abstraction I used to implement (in addition to an embedded gallery) is a layout manager that allows you to manage static common sections, a menu structure and dynamic content sections. I ended up inventing such a component because I was used to frames, which became deprecated. Moving away from them required me to reimplement common sections of a page over and over again.
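
To illustrate the idea behind such a layout manager (a sketch in JavaScript rather than the PHP I used back then; all names are made up), the common structure and menu are defined once, and each page only provides its page-specific parts:

```javascript
// Illustrative layout manager sketch: the common page structure is rendered
// in one place; individual pages only supply a title and a content section.
function renderPage(menu, page) {
  var menuItems = menu
    .map(function (item) {
      return '<li><a href="' + item.url + '">' + item.title + '</a></li>';
    })
    .join('');

  return '<html><head><title>' + page.title + '</title></head>' +
    '<body><ul id="menu">' + menuItems + '</ul>' +
    '<div id="content">' + page.content + '</div></body></html>';
}

// The menu is defined once and reused by every page:
var menu = [
  { url: 'index.html', title: 'Home' },
  { url: 'gallery.html', title: 'Gallery' }
];

var html = renderPage(menu, { title: 'Home', content: '<p>Welcome!</p>' });
```

Changing the menu or the common structure then only requires a modification in one place, instead of copying and pasting HTML between pages.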

In addition to generated code, using JavaScript has also become quite common, by dynamically injecting code into the DOM or transforming elements. As a result, quite a few pages will not function properly when JavaScript has been disabled or is unsupported.

Moreover, many pages embed a substantial amount of JavaScript, significantly increasing their sizes. A study reveals that quite a few modern web pages are as large as the original Doom video game.

There is a conceptual mismatch between 'pages' and 'screens'


HTML is a language designed for constructing pages, not screens. However, information systems typically require a screen-based workflow -- users need to modify data, send their change requests to the server and update their views so that their modifications become visible.

In HTML, there are only two ways to propagate parameters to the server and get feedback -- with hyperlinks (containing GET parameters) or forms. In both cases, a user gets redirected to another page that should display the result of the action.

For pages displaying tabular data or other complicated data structures, this is quite inconvenient -- we have to rerender the entire page each time we change something and scroll the user back to the location where the change was made (e.g. by defining anchors).

Again, with JavaScript this problem can be solved more properly and efficiently -- by programming an event handler (such as a handler for the click event), using the XMLHttpRequest object to send a change message to the server, and updating the appropriate DOM elements, we can rerender only the affected parts.
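
A minimal sketch of this approach could look as follows (the endpoint and parameter names are made up; the actual protocol depends on the server side):

```javascript
// Build the URL for a change request (illustrative GET-based protocol).
function buildUpdateUrl(endpoint, id, field, value) {
  return endpoint + '?id=' + encodeURIComponent(id) +
    '&' + encodeURIComponent(field) + '=' + encodeURIComponent(value);
}

// Browser-only wiring: send the change and rerender only the affected cell,
// instead of reloading and rerendering the entire page.
function sendChange(endpoint, id, field, value, cell) {
  var xhr = new XMLHttpRequest();
  xhr.open('GET', buildUpdateUrl(endpoint, id, field, value));
  xhr.onload = function () {
    if (xhr.status === 200) {
      cell.textContent = value; // update only this DOM element in place
    }
  };
  xhr.send();
}
```

An event handler attached to, for example, a table cell can then invoke sendChange, leaving the rest of the page (and the user's scroll position) untouched.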

Unfortunately, this again breaks the declarative nature of web technologies and the ability to degrade gracefully -- in a browser that lacks JavaScript support (e.g. text-oriented browsers) this solution will not work.

Also, efficient state management is a complicated problem that you may want to solve by integrating third-party JavaScript libraries, such as MobX or Redux.

Layouts are extremely difficult


In addition to the absence of abstractions in HTML (motivating me to develop a layout manager), implementing layouts in general is also something I consider to be notoriously difficult. Moreover, the layout concern is not well separated -- some aspects need to be done in your page structure (HTML code) and other aspects need to be done in stylesheets.

Although changing most visual properties of page elements in CSS is straightforward (e.g. adjusting the color of the background, text or borders), dealing with layout-related aspects (e.g. sizing and positioning page elements) is not. In many cases I had to rely on clever tricks and hacks.

One of the weirdest recurring layout tricks I used to implement is a hack to make the heights of two adjacent floating divs equal. I commonly used this to put a menu panel next to a content panel displaying text and images, where the height of both panels is not known in advance.

I ended up solving this problem as follows. I wrapped the divs in a container div:

<div id="container">
  <div id="left-column">
  </div>
  <div id="right-column">
  </div>
</div>

and I provided the following CSS stylesheet:

#container
{
    overflow: hidden;
}

#left-column
{
    padding-bottom: 3000px;
    margin-bottom: -3000px;
}

#right-column
{
    padding-bottom: 3000px;
    margin-bottom: -3000px;
}

In the container div, I abuse the overflow property (clipping the columns' excess height instead of showing scroll bars). For the panels themselves, I use a large padding value and an equivalent negative margin. The latter hack causes the panels to stretch in such a way that their heights become equal:


(As a sidenote: the above problem can now be solved in a better way using a flexbox layout, but a couple of years ago you could not use this newer CSS feature).
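
For comparison, a sketch of the flexbox equivalent -- flex items stretch to the height of their tallest sibling by default, so no padding/margin trickery is needed:

```css
#container {
    display: flex; /* the child divs become equal-height flex items */
}
```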

The padding and negative margin trick shown above is not an exception. Another notable trick is the clear hack (e.g. <div style="clear: both;"></div>) to ensure that the height of the surrounding div grows automatically with the height of your inner divs.

As usual, JavaScript can be used to abstract these oddities away, but that breaks declarativity. Furthermore, when JavaScript is used for an essential part of your layout, your page will look weird if JavaScript has been disabled.

Interoperability problems


Many web technologies have been standardized by the World Wide Web Consortium (W3C) with the purpose to ensure interoperability among browsers. The W3C also provides online validator services (e.g. for HTML and CSS) that you can use to upload your code and check for its validity.

As an outsider, you may expect that if your uploaded code passes validation, your web application front-end is interoperable and will work properly in all browsers... wrong!

For quite some time, Internet Explorer 6 was the most dominant web browser. Around the time that it was released (2001) it completely crushed its biggest competitor (Netscape) and gained 95% market share. Unfortunately, it did not support modern web standards well (e.g. CSS 2 and newer). After winning the browser wars, Microsoft pretty much stopped its development.

Other browsers kept progressing and became much more advanced (most notably Mozilla Firefox). They also followed the web standards more faithfully. Although this was a good development, the sad thing was that in 2007 (six years later) Internet Explorer 6 was still the most dominant browser, with its lacking standards support and many conformance bugs -- as a result, you were forced to implement painful IE-specific workarounds to make your web page work properly.

What I typically did was implement a web page for "decent browsers" first (e.g. Firefox, Chrome, Safari), and then add Internet Explorer-specific workarounds (such as additional stylesheets) on top. By using conditional comments, an Internet Explorer-specific feature that treats certain comments as code, I could ensure that the hacks were not picked up by any non-IE browser. An example:

<!--[if lt IE 7]><link rel="stylesheet" type="text/css" href="ie-hacks.css"><![endif]-->

The above conditional comment states that if an Internet Explorer version lower than 7 is used, then the provided ie-hacks.css stylesheet should be used. Otherwise, it is treated as a comment and will be ignored.

Fortunately, Google Chrome took over the role of most dominant web browser, and it is developed more progressively, eliminating most standardization problems. Interoperability today is still not a strong guarantee, in particular for new technologies, but it is considerably better than around the time Internet Explorer dominated the browser market.

Stakeholder difficulties


Another major challenge is the stakeholders, with their different and somewhat conflicting interests.

The most important things to end-users are that a web application provides the information they need, that it can be conveniently found (e.g. through a search engine), and that they are not distracted too much. The visual appearance of your web site also matters to some extent (e.g. a dreadful appearance will affect an end-user's ability to find what they need, as well as your credibility), but it is typically not as important as most people think.

Most clients (the group of people requesting your services) are mostly concerned with visual effects and features, and not so much with the information they need to provide to their audience.

I still vividly remember a web application that I developed whose contents could be fully managed by the end user with a simple HTML editor. I deliberately kept the editor's functionality simple -- for example, it was not possible in the editor to adjust the style (e.g. the color of the text or background), because I believed that users should simply follow the stylesheet.

Moreover, I have spent substantial amounts of time explaining to clients how to write for the web -- they need to organize their information properly, write concisely, and structure/format their text.

Despite all my efforts in bridging the gap between end-users and clients, I still remember that one particular client ran away dissatisfied because of the lack of customization. He moved to a more powerful/flexible CMS, and his new homepage looked quite horrible -- ugly background images, dreadful text colors, and entire sentences capitalized, e.g.: "WELCOME TO MY HOMEPAGE!!!!!!".

I was also once in a situation in which I had to deal with two rivaling factions in an organization -- one being very supportive of my ideas and the other being completely against them. They did not communicate with each other much, and I basically had to serve as a proxy between them.

Also, I have worked with designers, which gave me mixed experiences. A substantial group of designers basically assumed that a web page design is the same thing as a paper design, e.g. a booklet.

With a small number of them, I had quite a few difficulties explaining that web page designs need to be flexible and working toward a solution meeting these criteria -- people use different screen sizes and resolutions, resize their windows, adjust their font sizes, and so on. A design that is too static will affect how many users you attract.

Technology fashions


Web technology is quite sensitive to technological fashions -- every day new frameworks, libraries and tools appear, sometimes for relatively new and uncommon programming languages.

While new technology typically provides added value, you can also quite easily shoot yourself in the foot. Being forced to work around a system's broken foundation is not particularly a fun job.

Two of my favorite examples of questionable technology adoptions are:

  • NoSQL databases. At some point, probably because of success stories from Google, a lot of people considered traditional relational (SQL) databases not to be "web scale" and massively shifted to so-called NoSQL databases. Some well-known NoSQL databases (e.g. MongoDB) sacrifice properties in service of speed -- such as consistency guarantees.

    In my own experience, for many applications that I developed, the relational model made perfect sense. Also, consistency guarantees were way more important than speed benefits. As a matter of fact, most of my applications were fast enough. (As a sidenote: this does not disqualify NoSQL databases -- they have legitimate use cases, but in many cases they are simply not needed).
  • Single threaded event loop server applications (e.g. applications built on Node.js).

    The single-threaded event loop model has certain benefits over the traditional thread/process-per-connection approach -- little memory and multitasking overhead, making it a much better fit for handling large numbers of connections.

    Unfortunately, most people do not realize that there are two sides to the coin -- in a single-threaded event loop, the programmer has the responsibility to make sure that it never blocks so that the application remains responsive (I have seen quite a few programmers who simply lack the understanding and discipline to do that).

    Furthermore, whenever something unexpected goes wrong, you end up with an application that crashes completely, making it much more sensitive to disruptions. Also, this model is not a very good fit for computationally intensive applications.

    (Again: I am not trying to disqualify Node.js or the single-threaded event loop concept -- they have legitimate use cases and benefits, but they are not a good fit for all kinds of applications).

My own web framework


I have been extensively developing a custom PHP-based web framework between 2002 and 2009. Most of its ideas were born while I was developing a web-based information system to manage a documentation library for a side job. I noticed that there were quite a few complexities that I had to overcome to make it work properly, such as database integration, user authentication, and data validation.

After completing the documentation library, I developed a number of similar-looking information systems with similar functionality. Over the years, I learned more techniques, recognized common patterns, captured abstractions, and kept improving my personal framework. Moreover, I used my framework for a variety of additional use cases, including web sites for small companies.

In the end, it evolved into a framework providing the following high level components (the boxes in the diagram denote packages, while the arrows denote dependency relationships):


  • php-sbdata. This package can be used to validate and present data fields. It can also manage collections of data fields as forms and tables. Originally, this package was only used for presentation, but based on my experience with WebDSL, I have also integrated validation.
  • php-sbeditor. This package provides an HTML editor implementation that can be embedded into a web page. It can also optionally integrate with the data framework to expose a field as an HTML editor. When JavaScript is unsupported or disabled, it will fall back to a text area in which the user can directly edit HTML.
  • php-sblayout. This package provides a layout manager that can be used to manage common sections, the menu structure and dynamic sections of a page. A couple of years ago, I wrote a blog post explaining how it came about. In addition to a PHP package, I also created a Java Servlet/JSP implementation of the same concepts.
  • php-sbcrud is my partial solution to the page-screen mismatch problem and combines the concepts of the data management and layout management packages.

    Using the CRUD manager, every data element and data collection has its own URL, such as http://localhost/index.php/books to display a collection of books and http://localhost/index.php/books/1 to display an individual book. By default, data is displayed in view mode. Modifications can be made by appending GET parameters to the URL, such as: http://localhost/index.php/books/1?__operation=remove_book.
  • php-sbgallery provides an embeddable gallery sub application that can be embedded in various ways -- directly in a page, as a collection of sub pages via the layout manager, in an HTML editor, and as a page manager allowing you to expose albums of the gallery as sub pages in a web application.
  • php-sbpagemanager extends the layout manager with the ability to dynamically manage pages. The page manager can be used to allow end-users to manage the page structure and contents of a web application. It also embeds a picture gallery so that users can manage the images to be displayed.
  • php-sbbiblio is a library I created to display my bibliography on my personal homepage while I was doing my PhD research.

By today's standards, the features provided by the above packages are considered old-fashioned. Still, I am particularly proud of the following quality properties (some people may consider them anti-features these days):

  • Being as declarative as possible. This means that the usage of JavaScript is minimized and non-essential. Although this deliberate choice means the framework may not deal efficiently with the page-screen mismatch, it does provide other benefits. For example, the system is still usable when JavaScript has been disabled, and it even works in text-oriented browsers.
  • Small page sizes. The layout manager allows you to conveniently separate common aspects from page-specific aspects, including external stylesheets and scripts. As a result, the size of the rendered pages is relatively small (in particular compared to many modern web sites), making page loading times fast.
  • Very thin data layer. The data manager basically works with primitive values, associative arrays and PDO. It has no strong ties to any database model or an Object-Relational-Mapper (ORM). Although this may be inconvenient from a productivity point of view, the little overhead ensures that your application is fast.

Conclusion


In this blog post, I have explained where my initial enthusiasm for the web came from, my experiences (including the negative ones), and my own framework.

The fact that I am not as passionate about the web anymore did not make me leave that domain -- web-based systems these days are ubiquitous. Today, much of the work I do is system configuration, back-end and architecture related. I am not so active on the front-end side anymore, but I still look at front-end related issues from time to time.

Moreover, I have used PHP for a very long time (in 2002 there were basically not that many appealing alternatives), but I have also used many other technologies, such as Java Servlets/JSP, Node.js, and Django. I have also used several client-side frameworks, such as Angular and React.

I was also briefly involved with the development of WebDSL, an ongoing project in my former research group, but my contributions were mostly system configuration management related.

Although these technologies all offer nice features and in some ways impress me, it has been a very long time since I really felt enthusiastic about anything web related.

Availability


Three years ago, I already published the data manager, layout manager and bibliography packages on my GitHub page. I have now also published the remaining components. They can be used under the terms and conditions of the Apache Software License version 2.0.

In addition to the framework components, I published a repository with a number of example applications that have comparable features to the information systems I used to implement. The example applications share the same authentication system and can be combined together through the portal application. The example applications are GPLv3 licensed.

You may wonder why I published these packages after such a long time. There are a variety of reasons -- I always had the intention to make them open, but when I was younger I focused mostly on code, not on additional concerns such as documenting how the API should be used or providing example cases.

Moreover, in 2002 platforms such as GitHub did not exist yet (there was Sourceforge, but it worked on a project-level and was not as convenient) so it was very hard to publish something properly.

Finally, there are always things to improve, and I always run various kinds of experiments. Typically, I use my own projects as test subjects for other projects. I also have a couple of open ideas for which I can use pieces of my web framework. More about this later.

Friday, March 31, 2017

Substituting impure version specifiers in node2nix generated package compositions

In a number of previous blog posts, I have described node2nix, a tool that can be used to automatically integrate NPM packages into the Nix packages ecosystem. The biggest challenge in making this integration possible is the fact that NPM does dependency management in addition to build management -- NPM's dependency management properties conflict with Nix's purity principles.

Dealing with a conflicting dependency manager is quite simple from a conceptual perspective -- you must substitute it with a custom implementation that uses Nix to obtain all required dependencies. The remaining responsibilities (such as build management) are left untouched and still have to be carried out by the guest package manager.

Although conceptually simple, implementing such a substitution approach is much more difficult than expected. For example, in my previous blog posts I have described the following techniques:

  • Extracting dependencies. In addition to the package we intend to deploy with Nix, we must also include all its dependencies and transitive dependencies in the generation process.
  • Computing output hashes. In order to make package deployments deterministic, Nix requires that the output hashes of downloads are known in advance. As a result, we must examine all dependencies and compute their corresponding SHA256 output hashes. Some NPM projects have thousands of transitive dependencies that need to be analyzed.
  • Snapshotting versions. Nix uses SHA256 hash codes (derived from all inputs to build a package) to address specific variants or versions of packages whereas version specifiers in NPM package.json configurations are nominal -- they permit version ranges and references to external artifacts (such as Git repositories and external URLs).

    For example, a version range of >= 1.0.3 might resolve to version 1.0.3 today and to version 1.0.4 tomorrow. Translating a version range to a Nix package with a hash code identifier breaks the ability for Nix to guarantee that a package with a specific hash code yields a (nearly) bit identical build.

    To ensure reproducibility, we must snapshot the resolved version of these nominal dependency version specifiers (such as a version range) at generation time and generate the corresponding Nix expression for the resulting snapshot.
  • Simulating shared and private dependencies. In NPM projects, dependencies of a package are stored in the node_modules/ sub folder of the package. Each dependency can have private dependencies by putting them in their corresponding node_modules/ sub folder. Sharing dependencies is also possible by placing the corresponding dependency in any of the parent node_modules/ sub folders.

    Moreover, although this is not explicitly advertised as such, NPM implicitly supports cyclic dependencies and is able to cope with them, because it will refuse to install a dependency in a node_modules/ sub folder if any parent folder already provides it.

    When generating Nix expressions, we must replicate the exact same behaviour when it comes to private and shared dependencies. This is particularly important to cope with cyclic dependencies -- the Nix package manager does not allow them and we have to break any potential cycles at generation time.
  • Simulating "flat module" installations. In NPM versions older than 3.0, every dependency was installed privately by default unless a shared dependency exists that fits within the required version range.

    In newer NPM versions, this strategy has been reversed -- every dependency will be shared as much as possible until a conflict is encountered. This means that we have to move dependencies as high up in the node_modules/ folder hierarchy as possible, which is an imperative operation -- in Nix this is a problem, because packages cannot be changed after they have been built.

    To cope with flattening, we must compute the implications of flattening the dependency structure in advance at generation time.

With the above techniques, it is possible to construct a node_modules/ directory structure that is nearly identical to the one that NPM would normally compose.
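To make the sharing rule described above concrete, the following JavaScript sketch (my own simplification, not node2nix's actual generator code; all package names and versions are illustrative) walks the chain of ancestor node_modules/ folders to decide whether a dependency can be shared or must be installed privately:

```javascript
// ancestors is ordered from the innermost parent node_modules/ folder to
// the root one; each entry maps package names to the installed version
function findSharedDependency(ancestors, name, satisfies) {
    for (const nodeModules of ancestors) {
        if (name in nodeModules) {
            // The closest ancestor providing the package wins; if its
            // version does not satisfy the requirement, the dependency
            // must be installed privately instead
            return satisfies(nodeModules[name]) ? nodeModules[name] : null;
        }
    }
    return null; // provided nowhere: install privately
}

// Example: resolving "underscore" for a package two levels deep
const ancestors = [
    { async: "2.1.4" },                      // parent's node_modules/
    { underscore: "1.8.3", nijs: "0.0.25" }  // root node_modules/
];
console.log(findSharedDependency(ancestors, "underscore",
    version => version.startsWith("1."))); // 1.8.3: the shared copy is reused
```

Because the lookup stops at the closest ancestor that provides the package, this rule also breaks cycles: a dependency that is already provided by a parent folder is never installed again in a child folder.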

Impure version specifiers


Even if it were possible to reproduce the node_modules/ directory hierarchy with 100% accuracy, another problem remains -- some version specifiers always trigger network communication, regardless of whether the dependencies have already been provided, such as:

[
  { "node2nix": "latest" }
, { "nijs": "git+https://github.com/svanderburg/nijs.git#master" }
, { "prom2cb": "github:svanderburg/prom2cb" }
]

When referring to tags or Git branches, NPM is unable to determine to which version a package resolves. As a consequence, it attempts to retrieve the corresponding packages to investigate them, even when a compatible version already exists in the node_modules/ directory hierarchy.

While performing package builds, Nix takes various precautions to prevent side effects from influencing builds including network connections. As a result, an NPM package deployment will still fail despite the fact that a compatible dependency has already been provided.

In the package builder Nix expression provided by node2nix, I used to substitute these version specifiers in the package.json configuration files by a wildcard: '*'. Wildcards used to work fine for old Node.js 4.x/NPM 2.x installations, but with NPM 3.x flat module installations they became another big source of problems -- to make flat module installations work, NPM needs to know to which version a package resolves in order to determine whether it can be shared on a higher level in the node_modules/ folder hierarchy or not. Wildcards prevent NPM from making these comparisons, and as a result, some package deployments that did not fail with older versions of NPM now do.

Pinpointing version specifiers


In the latest node2nix I have solved these issues by implementing a different substitution strategy -- instead of substituting impure version specifiers by wildcards, I pinpoint all the dependencies to the exact version numbers to which these dependencies resolve. Internally, NPM addresses all dependencies by their names and version numbers only (this also has a number of weird implications, because it disregards the origins of these dependencies, but I will not go into detail on that).

I got the inspiration for this pinpointing strategy from the yarn package manager (an alternative to NPM developed by Facebook) -- when deploying a project with yarn, yarn pinpoints the installed dependencies in a so-called yarn.lock file so that package deployments become reproducible when a system is deployed for a second time.

The pinpointing strategy always prevents NPM from consulting external resources (under the condition that we have provided the package through our substitute dependency manager first) and always provides version numbers for any dependency, so that NPM can perform flat module installations. As a result, the accuracy of node2nix with newer versions of NPM has improved quite a bit.
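The substitution itself can be illustrated with a small JavaScript sketch. The resolvedVersions table and the version numbers in the example are hypothetical placeholders for the data that the generator computes at generation time; node2nix's real implementation is more elaborate:

```javascript
function pinpointDependencies(packageJson, resolvedVersions) {
    const result = JSON.parse(JSON.stringify(packageJson)); // deep copy
    for (const name of Object.keys(result.dependencies || {})) {
        // Substitute whatever the specifier was (a tag, Git URL or
        // GitHub shorthand) by the exact resolved version number
        result.dependencies[name] = resolvedVersions[name];
    }
    return result;
}

const pinned = pinpointDependencies({
    name: "example",
    dependencies: {
        "node2nix": "latest",
        "prom2cb": "github:svanderburg/prom2cb"
    }
}, { "node2nix": "1.1.1", "prom2cb": "1.0.0" });

console.log(pinned.dependencies["node2nix"]); // 1.1.1
```

With every specifier rewritten to an exact version, NPM can compare versions for flat module installations without ever contacting an external resource.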

Availability


The pinpointing strategy is part of the latest node2nix that can be obtained from the NPM registry or the Nixpkgs repository.

One month ago, I gave a talk about node2nix at FOSDEM 2017, summarizing the techniques discussed in my blog posts written so far. For convenience, I have embedded the slides into this web page:

Tuesday, March 14, 2017

Reconstructing Disnix deployment configurations

In two earlier blog posts, I have described Dynamic Disnix, an experimental framework enabling self-adaptive redeployment on top of Disnix. The purpose of this framework is to redeploy a service-oriented system whenever the conditions of the environment change, so that the system can still meet its functional and non-functional requirements.

An important category of events that change the environment are machines that crash and disappear from the network -- when a disappearing machine used to host a crucial service, a system can no longer meet its functional requirements. Fortunately, Dynamic Disnix is capable of automatically responding to such events by deploying the missing components elsewhere.

Although Dynamic Disnix supports the recovery of missing services, there is one particular kind of failure I did not take into account. In addition to the potentially crashing target machines that host the services of which a service-oriented system consists, the coordinator machine that initiates the deployment process and stores the deployment state could also disappear. When the deployment state gets lost, it is no longer possible to reliably update the system.

In this blog post, I will describe a new addition to the Disnix toolset that can be used to cope with these kinds of failures by reconstructing a coordinator machine's deployment configuration from the metadata stored on the target machines.

The Disnix upgrade workflow


As explained in earlier blog posts, Disnix requires three kinds of deployment models to carry out a deployment process: a services model capturing the components of which a system consists, an infrastructure model describing the available target machines and their properties, and a distribution model mapping services in the services model to target machines in the infrastructure model. By writing instances of these three models and running the following command-line instruction:

$ disnix-env -s services.nix -i infrastructure.nix -d distribution.nix

Disnix will carry out all activities necessary to deploy the system: building the services and their intra-dependencies from source code, distributing the services and their intra-dependencies, and activating all services in the right order.

When changing any of the models and running the same command-line instruction again, Disnix attempts to upgrade the system by only rebuilding the aspects that have changed, and only deactivating the obsolete services and activating new services.

Disnix (as well as other Nix-related tools) attempts to optimize a redeployment process by only executing the steps that are required to reach a new deployment state. In Disnix, the building and distribution steps are optimized due to the fact that every package is stored in isolation in the Nix store, in which each package has a unique filename with a hash prefix, such as:

/nix/store/acv1y1zf7w0i6jx02kfa6gxyn2kfwj3l-firefox-48.0.2

As explained in a number of earlier blog posts, the hash prefix (acv1y1zf7w0i6jx02kfa6gxyn2kfwj3l...) is derived from all inputs used to build the package, including its source code, build script, and the libraries that it links to. That, for example, means that if we upgrade a system and none of the inputs of Firefox change, we get an identical hash, and if such a package build already exists, we do not have to build the package or transfer it from an external site again.

The building step in Disnix produces a so-called low-level manifest file that is used by tools executing the remaining deployment activities:

<?xml version="1.0"?>
<manifest version="1">
  <distribution>
    <mapping>
      <profile>/nix/store/aiawhpk5irpjqj25kh6ah6pqfvaifm53-test1</profile>
      <target>test1</target>
    </mapping>
  </distribution>
  <activation>
    <mapping>
      <dependsOn>
        <dependency>
          <target>test1</target>
          <container>process</container>
          <key>d500194f55ce2096487c6d2cf69fd94a0d9b1340361ea76fb8b289c39cdc202d</key>
        </dependency>
      </dependsOn>
      <name>nginx</name>
      <service>/nix/store/aa5hn5n1pg2qbb7i8skr6vkgpnsjhlns-nginx-wrapper</service>
      <target>test1</target>
      <container>wrapper</container>
      <type>wrapper</type>
      <key>da8c3879ccf1b0ae34a952f36b0630d47211d7f9d185a8f2362fa001652a9753</key>
    </mapping>
  </activation>
  <targets>
    <target>
      <properties>
        <hostname>test1</hostname>
      </properties>
      <containers>
        <mongo-database/>
        <process/>
        <wrapper/>
      </containers>
      <system>x86_64-linux</system>
      <numOfCores>1</numOfCores>
      <clientInterface>disnix-ssh-client</clientInterface>
      <targetProperty>hostname</targetProperty>
    </target>
  </targets>
</manifest>

The above manifest file contains the following kinds of information:

  • The distribution element section maps Nix profiles (containing references to all packages implementing the services deployed to the machine) to target machines in the network. This information is used by the distribution step to transfer packages from the coordinator machine to a target machine.
  • The activation element section contains elements specifying which service to activate on which machine in the network, including other properties relevant to the activation, such as the type plugin that needs to be invoked to take care of the activation process. This information is used by the activation step.
  • The targets section contains properties of the machines in the network and is used by all tools that carry out remote deployment steps.
  • There is also an optional snapshots section (not shown in the code fragment above) that contains the properties of services whose state needs to be snapshotted, transferred and restored in case their location changes.

When a Disnix (re)deployment process successfully completes, Disnix stores the above manifest as a Disnix coordinator Nix profile on the coordinator machine for future reference, with the purpose of optimizing the successive upgrade step -- when redeploying a system, Disnix will compare the generated manifest with the previously deployed instance, and only deactivate services that have become obsolete and activate services that are new, making upgrades more efficient than fresh installations.
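The comparison between two manifest generations can be sketched as a simple set difference over the activation mappings shown earlier, using their keys as identifiers. The following JavaScript fragment is an illustration of the idea only, with made-up keys, not Disnix's actual implementation:

```javascript
function computeUpgradeSteps(oldMappings, newMappings) {
    const oldKeys = new Set(oldMappings.map(m => m.key));
    const newKeys = new Set(newMappings.map(m => m.key));
    return {
        // mappings that are no longer part of the new configuration
        deactivate: oldMappings.filter(m => !newKeys.has(m.key)),
        // mappings that did not exist in the old configuration
        activate: newMappings.filter(m => !oldKeys.has(m.key))
    };
}

const { deactivate, activate } = computeUpgradeSteps(
    [ { name: "nginx", key: "oldkey" }, { name: "mongodb", key: "dbkey" } ],
    [ { name: "nginx", key: "newkey" }, { name: "mongodb", key: "dbkey" } ]
);
// Only the changed nginx mapping gets deactivated and reactivated;
// the unchanged mongodb mapping is left untouched
```

Without the old manifest, the deactivate set cannot be computed, which is exactly why losing the coordinator machine makes reliable upgrades impossible.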

Unfortunately, when the coordinator machine storing the manifests gets lost, then also the deployment manifest gets lost. As a result, a system can no longer be reliably upgraded -- without deactivating obsolete services, newly deployed services may conflict with services that are already running on the target machines preventing the system from working properly.

Reconstructible manifests


Recently, I have modified Disnix in such a way that the deployment manifests on the coordinator machine can be reconstructed. Each Nix profile that Disnix distributes to a target machine includes a so-called profile manifest file, e.g. /nix/store/aiawhpk5irpjqj25kh6ah6pqfvaifm53-test1/manifest. Previously, this file only contained the Nix store paths to the deployed services and was primarily used by the disnix-query tool to display the installed set of services per machine.

In the latest Disnix, I have changed the format of the profile manifest file to contain all the required metadata, so that the activation mappings can be reconstructed on the coordinator machine:

stafftracker
/nix/store/mi7dn2wvwvpgdj7h8xpvyb04d1nycriy-stafftracker-wrapper
process
process
d500194f55ce2096487c6d2cf69fd94a0d9b1340361ea76fb8b289c39cdc202d
false
[{ target = "test2"; container = "process"; _key = "4827dfcde5497466b5d218edcd3326327a4174f2b23fd3c9956e664e2386a080"; } { target = "test2"; container = "process"; _key = "b629e50900fe8637c4d3ddf8e37fc5420f2f08a9ecd476648274da63f9e1ebcc"; } { target = "test1"; container = "process"; _key = "d85ba27c57ba626fa63be2520fee356570626674c5635435d9768cf7da943aa3"; }]

The above code fragment shows a portion of the profile manifest. It has a line-oriented structure in which every 7 lines represent the properties of a deployed service. The first line denotes the name of the service, the second line the Nix store path, the third line the Dysnomia container, the fourth line the Dysnomia type, the fifth line the hash code derived from all its properties, the sixth line whether the attached state must be managed by Disnix, and the seventh line an encoding of the inter-dependencies.
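For illustration, parsing this line-oriented format boils down to grouping lines in blocks of 7. The following JavaScript sketch demonstrates the idea; the field names are of my own choosing, and Disnix itself is implemented differently:

```javascript
function parseProfileManifest(text) {
    const lines = text.trim().split('\n');
    const services = [];
    for (let i = 0; i + 6 < lines.length; i += 7) {
        services.push({
            name: lines[i],                     // name of the service
            service: lines[i + 1],              // Nix store path
            container: lines[i + 2],            // Dysnomia container
            type: lines[i + 3],                 // Dysnomia type
            key: lines[i + 4],                  // hash derived from all properties
            stateful: lines[i + 5] === "true",  // state managed by Disnix?
            dependsOn: lines[i + 6]             // encoded inter-dependencies
        });
    }
    return services;
}

// Example: a single service entry with an empty inter-dependency list
const services = parseProfileManifest(
`stafftracker
/nix/store/mi7dn2wvwvpgdj7h8xpvyb04d1nycriy-stafftracker-wrapper
process
process
d500194f55ce2096487c6d2cf69fd94a0d9b1340361ea76fb8b289c39cdc202d
false
[]`);
console.log(services[0].name); // stafftracker
```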

The other portions of the deployment manifest can be reconstructed as follows: the distribution section can be derived by querying the Nix store paths of the installed profiles on the target machines, the snapshots section by checking which services have been marked as stateful and the targets section can be directly derived from a provided infrastructure model.

With the augmented data in the profile manifests on the target machines, I have developed a tool named disnix-reconstruct that can reconstruct a deployment manifest from all the metadata the manifests on the target machines provide.

I can now, for example, delete all the deployment manifest generations on the coordinator machine:

$ rm /nix/var/nix/profiles/per-user/sander/disnix-coordinator/*

and reconstruct the latest deployment manifest, by running:

$ disnix-reconstruct infrastructure.nix

The above command resolves the full paths to the Nix profiles on the target machines, then downloads their intra-dependency closures to the coordinator machine, reconstructs the deployment manifest from the profile manifests and finally installs the generated deployment manifest.

If the above command succeeds, then we can reliably upgrade a system again with the usual command-line instruction:

$ disnix-env -s services.nix -i infrastructure.nix -d distribution.nix

Extending the self-adaptive deployment framework


In addition to reconstructing deployment manifests that have gone missing, disnix-reconstruct offers another benefit -- the self-adaptive redeployment framework described in the two earlier blog posts is capable of responding to various kinds of events, including redeploying services to other machines when a machine crashes and disappears from the network.

However, when a machine disappears from the network and reappears at a later point in time, Disnix no longer knows about its configuration, which could have disastrous results when the system gets redeployed.

Fortunately, by adding disnix-reconstruct to the framework we can solve this issue:


As shown in the above diagram, whenever a change in the infrastructure is detected, we reconstruct the deployment manifest so that Disnix knows which services are deployed to it. Then when the system is being redeployed, the services on the reappearing machines can also be upgraded or undeployed completely, if needed.

The automatic reconstruction feature can be used by providing the --reconstruct parameter to the self adapt tool:

$ dydisnix-self-adapt -s services.nix -i infrastructure.nix -q qos.nix \
  --reconstruct

Conclusion


In this blog post, I have described the latest addition to Disnix: disnix-reconstruct, which can be used to reconstruct the deployment manifest on the coordinator machine from metadata stored on the target machines. With this addition, we can still update systems if the coordinator machine gets lost.

Furthermore, we can use this addition in the self-adaptive deployment framework to deal with reappearing machines that already have services deployed to them.

Finally, besides developing disnix-reconstruct, I have reached another stable point. As a result, I have decided to release Disnix 0.7. Consult the Disnix homepage for more information.

Sunday, February 12, 2017

MVC lessons in Titanium/Alloy

A while ago, I ported the simple-xmpp library from the Node.js ecosystem to Appcelerator Titanium to enrich our company's product line with chat functionality. In addition, I have created a bare bones example app that exposes most of the library's features.



Although I am not doing that much front-end development these days, nor consider myself a Titanium guru, I have observed that it is quite challenging to keep an app's code and organization clean.

In this blog post, I will report on my development experiences and describe the architecture that I have derived for the example chat application.

The Model-View-Controller (MVC) architectural pattern


Keeping the code of an end-user application sane is a challenge that is not unique to mobile applications or a specific framework, such as Titanium -- it basically applies to any system with a graphical user interface, including desktop applications and web applications.

When diving into the literature or searching the Internet, you will most likely stumble upon a very common "solution" -- the Model-View-Controller (MVC) architectural pattern, which can be used as a means to keep your system structured. It is a generically applicable pattern implemented by many kinds of libraries and frameworks for all kinds of domains, including the mobile application space.

The idea behind this pattern is that a system is separated into three distinct concerns: the model, the view and the controller. The meanings of these concerns are somewhat ambiguously defined. For example, the design patterns book written by the Gang of Four (Erich Gamma, Richard Helm, Ralph Johnson and John Vlissides) says:

MVC consists of three kinds of objects. The Model is the application object, the View is its screen presentation, and the Controller defines the way the user interface reacts to user input.

The "problem" I have with the above explanation is that it is a bit difficult to grasp the meaning of an "application object". Moreover, the definition of the controller object used in the explanation above states that it only has a relation with the user interface (a.k.a. the view), while I can also think of many scenarios in which external events are involved without invoking the user interface. I have no idea how to categorize these kinds of interactions by looking at the above description.

The paper that the book cites: "A Cookbook for Using the Model-View-Controller User Interface Paradigm in Smalltalk-80" (written by: Glenn E. Krasner and Stephen T. Pope) provides more detailed definitions. For example, it defines the model as:

The model of an application is the domain-specific software simulation or implementation of the application's central structure.

I particularly find the term "domain-specific" important -- it suggests that a model should encapsulate what matters to the problem domain, without any obfuscations of things not related to it, for example, user interface components.

The paper defines the view as follows:

In this metaphor, views deal with everything graphical: they request data from their model, and display the data

The above definition suggests that views are everything about presentation of objects belonging to the model.

Finally, the paper defines controllers as follows:

Controllers contain the interface between their associated models and views and the input devices (e.g., keyboard, pointing device, time)

In contrast to the design patterns book's definition of a controller, this definition also suggests that a controller has a relationship with the model. Moreover, it does not say anything about interactions with a physical user. Instead, it refers to input devices.

Although the paper provides more detailed definitions, it still remains difficult to draw a hard line from my perspective. For example, what is the scope of MVC? Should it apply to an entire system, or can it also be applied to components of which a system consists?

For example, in an earlier blog post about some of my experiences with web development, I described a simple MVC-based library managing the layouts of web applications. Its model basically encapsulates the structure of a web application from an abstract point of view, but it only applies to a specific sub concern, not to the system as a whole.

Despite its lack of clarity and its ambiguities, I still think MVC makes sense, for the following reasons:

  • Mixing view and controller code into the model clutters it with obfuscations, making it much harder to read and maintain.
  • There are multiple ways to present an object visually. With a clear separation between a model and view this becomes much more flexible.
  • In general, compact modules (in terms of lines of code) are in many ways better than modules with many lines of code, for example in terms of readability and maintainability. Separation of concerns stimulates reduction of the size of modules.

The Titanium and Alloy frameworks


As explained earlier, I have implemented the chat example app using the Titanium and Alloy frameworks.

Titanium is a framework targeting multiple mobile app platforms (e.g. Android, iOS, Windows and mobile web applications) using JavaScript as an implementation language, providing a unified API with minor platform differences. In contrast to platforms such as Java, Titanium does not provide a write once, run anywhere approach, but a code reuse approach -- according to their documentation, between 60 and 90% of the code can be reused among target platforms.

Moreover, the organization of Titanium's API makes a clear distinction between UI and non-UI components, but does not force anyone to strictly follow an MVC-like organization while implementing an application.

Alloy is a declarative MVC-framework that wraps around Titanium. To cite the Alloy documentation:

Alloy utilizes the model-view-controller (MVC) pattern, which separates the application into three different components:

  • Models provide the business logic, containing the rules, data and state of the application.
  • Views provide the GUI components to the user, either presenting data or allowing the user to interact with the model data.
  • Controllers provide the glue between the model and view components in the form of application logic.

(As may be noticed, the above description introduces yet another slightly different interpretation of the MVC architectural pattern.)

The Alloy framework uses a number of very specific technologies to realize an MVC organization:

  • For the models, it uses the backbone.js framework's model instances to organize the application's data. The framework supports automatic data binding to view components.
  • Views use an XML data encoding capturing the static structure of the view. Moreover, the style of each view is captured in a TSS stylesheet (having many similarities with CSS).
  • The controllers are CommonJS modules using JavaScript as an implementation language.

Furthermore, the directory structure of an Alloy application also reflects separation of concerns -- each unit of an application stores each concern in a separate directory and file. For example, in the chat app, we can implement each concern of the contacts screen by providing the following files:

./app/views/contacts.xml
./app/controllers/contacts.js
./app/styles/contacts.tss

The above files reflect each concern of the contacts screen, such as the view, the controller and the style.

In addition to defining models, views, styles and controllers on unit level, the app unit captures general properties that apply to the app as a whole.

Organizing the example chat app


Despite the fact that the Alloy framework facilitates separation of concerns to some degree, I still observed that keeping the app's code structure sane remains difficult.

Constructing views


An immediate improvement of Alloy over plain Titanium is that view code in XML is much easier to read than UI components constructed in JavaScript -- the nesting of XML elements reflects the structure of the UI. Furthermore, the style of the UI elements can be separated from the layout, improving the readability even further.

For example, the following snippet shows the structure of the login screen:

<Alloy>
    <Window class="container">
        <ScrollView>
            <View>
                <Label>Web socket URL</Label>
                <TextField id="url" hintText="ws://localhost:5280/websocket/" />
            </View>
            <View>
                <Label>Username</Label>
                <TextField id="username" hintText="sander" />
            </View>
            <View>
                 <Label>Domain name</Label>
                 <TextField id="domain" hintText="localhost" />
            </View>
            <View>
                 <Label>Resource</Label>
                 <TextField id="resource" hintText="" />
            </View>
            <View>
                  <Label>Password</Label>
                  <TextField id="password" passwordMask="true" hintText="" />
            </View>
            <Button onClick="doConnect">Connect</Button>
        </ScrollView>
    </Window>
</Alloy>

As may be observed by reading the above code fragment, it becomes quite obvious that we have a window with a scroll view inside. Inside the scroll view, we have multiple views, each containing a label and text field pair, allowing users to provide their login credentials.

Although implementing most screens in XML is quite straightforward, as their structures are quite static, I have noticed that Alloy's technologies are not particularly useful for dynamically composing screen structures, such as the contacts overview that displays a row for each contact -- the structure of this table changes whenever a new contact gets added or an existing contact gets removed.

To dynamically compose a screen, I still need to write JavaScript code in the screen's controller. Furthermore, UI elements composed in JavaScript do not take the style settings of the corresponding TSS file into account. As a result, we need to manually provide styling properties while composing the dynamic screen elements.

To keep the controller's code structured and to avoid code repetition, I have encapsulated the construction of table rows into functions.
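For example, such a function could compute all the row properties (including the styling that TSS does not apply to dynamically created components) in one place. The helper below is a hypothetical sketch: the property names mirror Titanium's TableViewRow, but the function name, the contact fields and the styling choices are my own, not the example app's actual code:

```javascript
// Build the properties for a contact row; the styling must be provided
// manually here, because TSS styles do not apply to components that are
// dynamically created in JavaScript
function createContactRowProperties(contact) {
    return {
        title: contact.name,
        color: contact.online ? "green" : "gray", // manual styling
        hasChild: true
    };
}

// In a Titanium controller, we would then do something like:
//   var row = Ti.UI.createTableViewRow(createContactRowProperties(contact));
//   table.appendRow(row);
console.log(createContactRowProperties({ name: "sander", online: true }).color);
```

Centralizing the construction in one function keeps the styling decisions out of the event handlers that add and remove rows.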

Notifying views for changes


Another practical issue I ran into is updating the UI components when something changes, such as receiving a text message or an updated status of a contact. An update to a backbone model automatically updates the attached view components, but for anything that is not backbone-based (such as XMPP's internal roster object) this does not work.

I ended up implementing my own custom non-backbone based data model, with my own implementation of the Observer design pattern -- each object in the data model inherits from the Observable prototype providing an infrastructure for observers to register and unregister themselves for notifications. Each view registers itself as an observer to the corresponding model object to update themselves.
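A minimal sketch of such an Observable prototype could look as follows. The method and property names are illustrative; the actual implementation in the example app may differ:

```javascript
function Observable() {
    this.observers = [];
}

Observable.prototype.registerObserver = function (observer) {
    this.observers.push(observer);
};

Observable.prototype.unregisterObserver = function (observer) {
    const index = this.observers.indexOf(observer);
    if (index !== -1)
        this.observers.splice(index, 1);
};

Observable.prototype.notifyObservers = function () {
    // Each registered view updates itself from the changed model object
    this.observers.forEach(observer => observer.update(this));
};

// Example: a contact in the data model notifying an attached view
function Contact(name) {
    Observable.call(this);
    this.name = name;
    this.status = "offline";
}

Contact.prototype = Object.create(Observable.prototype);

Contact.prototype.setStatus = function (status) {
    this.status = status;
    this.notifyObservers();
};
```

A view then only needs to implement an update() function and register itself on the model objects it displays.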

The app's architecture


In the end, this is the architecture of the example chat app that I came up with:


The UML diagram shows the following aspects:

  • All classes can be divided into four concerns: controllers, views, models, and utility classes. The observer infrastructure, for example, does not, in my opinion, belong to any of the MVC categories, because it is cross-cutting.
  • The XMPPEventHandler is considered to be a controller. Despite not being triggered by human actions, I still classify it as such. The event handler's only responsibility is to update the corresponding model objects once an event has been received from the XMPP server, such as a chat message.
  • All model objects inherit from a custom-made Observable prototype so that views can register and unregister themselves for update notifications.
  • Views extract information from the model objects to display. Furthermore, each view has its own controller responding to user input, such as button clicks.

Lessons learned


In addition to porting an XMPP library from the Node.js ecosystem to Titanium, I have also observed some recurring challenges when implementing the test application and keeping it structured. Despite the fact that the Alloy framework is MVC-based, it does not guarantee that your application's organization remains structured.

From my experiences, I have learned the following lessons:

  • The roles of each concern in MVC are not well defined, so you need to give your own interpretation to them. For example, I would consider any controller to be an object responding to external events, regardless of whether they have been triggered by humans or external systems. By following this interpretation, I ended up implementing the XMPP event handler as a controller.
  • Similarly for the models -- the purpose of backbone.js models is mostly to organize data, but a model is more than just data -- from my perspective, the model encapsulates domain knowledge. This also means that non-backbone objects belong to this domain. The same thing applies to non-data objects, such as functions doing computations.
  • You always have to look at your structure from an aesthetic point of view. Does it make sense? Is it readable? Can it be simplified?
  • Finally, do not rely on a framework or API to solve all your problems -- study the underlying concepts and remain critical, as a framework does not always guarantee that your organization will be right.

    Within the scope of Titanium/Alloy, the problem is that models only make sense if you use backbone models, and using XML markup + TSS for views only makes sense if your screen structure is static. The most logical outcome is to put all the missing pieces that do not classify themselves into these categories into a controller, but that is also the most likely reason why your code becomes a mess.

As a final note, the lessons learned do not apply to mobile applications or Titanium/Alloy only -- you will find similar challenges in other domains such as web applications and desktop applications.

Sunday, January 29, 2017

Some programming patterns for multi-process programming in POSIX applications

It has been a while since I wrote my last programming-related blog post. In this blog post, I am going to elaborate about some of my experiences developing multi-process POSIX applications.

From my perspective, processes are an interesting operating systems concept, in particular in UNIX/POSIX-like operating systems, such as Linux.

The IEEE Std 1003.1 POSIX standard defines a "live process" as:

An address space with one or more threads executing within that address space, and the required system resources for those threads.

Within the boundaries of many (relatively simple) applications, process creation and management is typically not required. Nonetheless, decomposing a system into sub processes can be quite useful for a variety of reasons, such as:

  • Improving a system's responsiveness by running slow/long-running tasks in the background, concurrently with other tasks. This is particularly useful to retain the ability to respond to user events, such as mouse clicks or keyboard input, while data is being processed, or to handle multiple clients connecting to a server application at the same time.
  • Increased protection. If an incorrectly implemented or malicious task crashes during execution, it neither tears down the entire system nor affects the state of any other processes.
  • More security. Child processes can be run under more restrictive user privileges, making it more difficult for the system to do any harm, such as accessing privacy-sensitive filesystem areas.
  • Portability. In a child process, we can invoke an external executable implemented in a different programming language.
  • Scalability and performance. The execution of a collection of tasks can be parallelized by means of processes and their executions can be divided over multiple CPU cores by the operating system, potentially increasing the execution speed of a program.

Although multi-process applications may provide a number of compelling benefits, programming such applications using the C programming language and the POSIX API is IMO not always straightforward -- I have found myself frequently repeating numerous patterns over and over again.

To alleviate the burden of repetition, I have identified a number of patterns, derived abstractions from them and constructed a library package, named libprocreact, providing these abstractions. The APIs that libprocreact provides are loosely inspired by reactive programming.

Programming patterns


When programming multi-process applications, there are many housekeeping tasks that need to be performed in order to properly organize them. Below, I have described a number of recurring patterns I typically implement:

Forking


The first and most prominent housekeeping task is process creation. The key ingredient in creating processes is the fork() system call, as shown in the code example below:

#include <stdio.h>
#include <unistd.h>

pid_t pid = fork();

if(pid == -1)
    printf("The child process cannot be forked!\n");
else if(pid == 0)
{
    printf("Code executed by the child process\n");
    printf("It runs in the background!\n");
    _exit(0);
}
else
{
    printf("Code executed by the parent process!\n");
    printf("The pid of the child process is: %d\n", pid);
}

Forking is a relatively simple concept -- after successfully invoking the fork() function call (i.e. the return value is not -1), a child process gets created that appears as a nearly identical clone of the parent process. For example, their memory contents and file descriptors are identical.

Furthermore, a forked child process will be executed immediately in parallel to the parent process. Since a parent and child process are almost identical, we can use the return value of fork() to make a distinction between them -- in the parent, the fork() function call returns the PID of the child process so that it can monitor its status and interact with it. In the child process, fork() returns 0.

Although creating a clone of a parent process may sound very expensive, many POSIX-compliant operating systems have optimized this process by using a Copy-On-Write (COW) memory model -- instead of copying a parent process' memory, the memory between the parent and child processes is shared. The operating system maintains a table of shared and private memory pages for each process. When a process attempts to write to a shared memory page, then the corresponding memory page is copied and marked as private to the process.

Executing tasks


After forking a child process, we typically want it to execute a task so that we can use some of the process' positive traits to our advantage. In the if(pid == 0) { ... } block (shown in the previous example), we can put the code that the child process must execute.

For example, we can execute a long running task without blocking the parent process, such as doing an expensive linear search over an array of strings:

#include <string.h>

char *strings[] = { "foo", "bar", "baz", ..., NULL };
char *search = "baz";

...
else if(pid == 0)
{
    unsigned int i;
    
    for(i = 0; strings[i] != NULL; i++)
    {
        if(strcmp(strings[i], search) == 0)
        {
            printf("%s has been found!\n", search);
            _exit(0);
        }
    }

    printf("%s cannot be found!\n", search);
    _exit(1);
}

(As may be observed in the example above, the string array is allocated by the parent process, but since the child process manifests itself as a duplicate on spawning, it has a reference to a "logical copy" as well).

We can also change the user permissions of the child process (for example, to restrict a task from having super-user permissions that might do potential harm):

#include <sys/types.h>
#include <unistd.h>

...
else if(pid == 0)
{
    if(setgid(100) == 0 && setuid(1000) == 0)
    {
        /* Execute some code with restrictive user permissions */
        ...
    }
    else
    {
        printf("Cannot change user permissions!\n");
        _exit(1);
    }
}

or invoking external (pre-built) executables, such as the cat command to stream the contents of a file to the standard output:

...
else if(pid == 0)
{
    char *const args[] = { "cat", "bigtextfile.txt", NULL };
    execvp(args[0], args);
    _exit(1);
}

Waiting


In addition to forking and carrying out tasks by child processes, the parent process must also take notice of a child process' status at some point. This is actually an obligation for certain events, for example, when a child process terminates -- a terminated child process remains a zombie until the parent process takes notice of it.

Taking notice of a process' status can be done by invoking a wait function, such as:

#include <sys/types.h>
#include <sys/wait.h>

pid_t pid;
int wstatus;

/* Fork and execute */

pid_t ret_pid = waitpid(pid, &wstatus, 0);

The above function call specifically waits for a process with a given PID to change state, and captures its wait status in the wstatus variable.

As a sidenote: besides waiting for a specific child process' state to change, it is also possible to wait for any child process to terminate (e.g. by invoking wait()). Furthermore, a wait function invocation blocks the parent process' execution by default, until a child process' state changes. We can pass the WNOHANG flag to waitpid() to prevent it from blocking.
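As an illustration, a non-blocking status check could be wrapped as follows (a minimal sketch; the helper name is made up for this example):

```c
#include <sys/types.h>
#include <sys/wait.h>

/* Non-blocking status check (a sketch): returns 1 when the child has
   terminated and has been reaped, 0 when it is still running, and -1
   when the wait status cannot be obtained. */
int child_has_terminated(pid_t pid)
{
    int wstatus;
    pid_t ret_pid = waitpid(pid, &wstatus, WNOHANG);

    if(ret_pid == 0)
        return 0; /* No state change yet: the child is still running */
    else if(ret_pid == pid)
        return 1; /* The child terminated and has been reaped */
    else
        return -1; /* The wait status could not be obtained */
}
```

Because WNOHANG makes waitpid() return immediately, such a check can be integrated into a program's main loop without stalling it.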

After a wait function invocation completes, we must interpret the return value and wait status:

if(ret_pid == -1)
    printf("Cannot obtain wait status of PID: %d\n", pid);
else if(!WIFEXITED(wstatus))
    printf("The process terminated abnormally!\n");
else if(WEXITSTATUS(wstatus) != 0)
    printf("The process execution failed!\n");
else
    printf("The process has completed its tasks successfully!\n");

In the above code fragment, we check for the following properties:

  • Whether the wait status could be obtained. Sometimes this may not be possible, for example, if a process with a given PID does not exist or when it is beyond the parent process' control.
  • Whether a process has been terminated abnormally or not. For example, abnormal termination happens when a process runs into a segmentation fault. In such cases, it may happen that a process still returns a zero exit status, incorrectly indicating that everything has succeeded.
  • Whether a process has succeeded its tasks or not. By convention, a zero exit status indicates success, while any non-zero exit status indicates failure.
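These checks can also be folded into a small reusable helper, for example (a sketch; the function name is made up for this example):

```c
#include <sys/types.h>
#include <sys/wait.h>

/* Sketch of a helper that waits for a child process and interprets its
   exit status as a boolean: returns 1 only when the child terminated
   normally with exit status 0, and 0 in any other case. */
int wait_for_success(pid_t pid)
{
    int wstatus;

    if(waitpid(pid, &wstatus, 0) == -1)
        return 0; /* The wait status could not be obtained */

    return WIFEXITED(wstatus) && WEXITSTATUS(wstatus) == 0;
}
```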

Output buffering


Sometimes, it may also be desired to propagate data back to the parent, for example, after the completion of a data transformation task. Since processes operate in their own private address space, we can no longer rely on shared memory, but we must use some means to transfer the data.

One of the possible means is using a pipe, as shown in the following code fragment:

int pipefd[2];

if(pipe(pipefd) == 0)
{
    pid_t pid = fork();

    if(pid == -1)
        fprintf(stderr, "Cannot fork process!\n");
    else if(pid == 0)
    {
        char *const args[] = { "sort", "words.txt", NULL };

        close(pipefd[0]); /* Close read-end of pipe */
        dup2(pipefd[1], 1); /* Attach write-end to the stdout */
        execvp(args[0], args);
        _exit(1); /* Only reached when execvp() fails */
    }
    else
    {
        close(pipefd[1]); /* Close write-end of pipe */
        /* Read from pipefd[0] */
        close(pipefd[0]);
    }
}
else
    fprintf(stderr, "Cannot construct a pipe!\n");

Before forking a child process, we construct a pipe consisting of two file descriptors -- a read and write end. In the child process, we close the read-end of the pipe (since it is not needed), and we write data to the write-end. In the parent, we read from the read-end and we discard the unneeded write-end.

When retrieving data from a pipe, we may want to capture its output in a data structure (such as a string, string array or struct). Capturing and transforming data to a specific structure is often not very straightforward, for example:

#define BUFFER_SIZE 1024

ssize_t bytes_read;
char buffer[BUFFER_SIZE];
char *captured_string = NULL;
unsigned int captured_string_size = 0;

while((bytes_read = read(pipefd[0], buffer, BUFFER_SIZE)) > 0)
{
    captured_string = (char*)realloc(captured_string, captured_string_size + bytes_read);
    memcpy(captured_string + captured_string_size, buffer, bytes_read);
    captured_string_size += bytes_read;
}

/* Add NUL-termination */
captured_string = (char*)realloc(captured_string, captured_string_size + 1);
captured_string[captured_string_size] = '\0';

The purpose of the above code fragment is to read data from a pipe and construct a NUL-terminated string. It repeatedly reads chunks of data (of one kilobyte each) from the read-end of the pipe, dynamically extends the size of the buffer collecting the output, appends each chunk to the buffer, and finally appends a NUL-termination to the result.

As may be noticed, there are many concerns that we have to take care of and the resulting code is not trivial at all.

Orchestrating collections of processes


In addition to implementing the above patterns for a single child process running in the background, we may also want to apply them to collections of processes running concurrently.

Orchestrating collections of processes introduces many additional challenges beyond those described in the previous sections. For example, in order to read from the processes' pipes without blocking their executions (which could happen if any of their buffers gets full), we have to multiplex the housekeeping operations in such a way that they read from each pipe breadth-first.

Multiplexing any of the previously shown patterns makes developing multi-process applications even more difficult.
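For instance, breadth-first reading from multiple pipes could be multiplexed with poll(), along these lines (a simplified sketch that discards the data it reads; a real implementation would append each chunk to a per-process output buffer):

```c
#include <poll.h>
#include <unistd.h>

/* Sketch: read breadth-first from multiple pipes with poll(), so that no
   child process blocks on a full pipe buffer. fds contains the read-ends
   of the children's pipes; num_fds is the number of pipes. */
void buffer_breadth_first(int *fds, unsigned int num_fds)
{
    char buffer[1024];
    unsigned int open_fds = num_fds, i;
    struct pollfd pfds[num_fds];

    for(i = 0; i < num_fds; i++)
    {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;
    }

    while(open_fds > 0 && poll(pfds, num_fds, -1) > 0)
    {
        for(i = 0; i < num_fds; i++)
        {
            if(pfds[i].revents & (POLLIN | POLLHUP))
            {
                ssize_t bytes_read = read(pfds[i].fd, buffer, sizeof(buffer));

                if(bytes_read <= 0) /* EOF or error: stop watching this pipe */
                {
                    close(pfds[i].fd);
                    pfds[i].fd = -1; /* poll() ignores negative fds */
                    open_fds--;
                }
                else
                {
                    /* Append the chunk to this child's output buffer here */
                }
            }
        }
    }
}
```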

A functional programming discipline


Because housekeeping tasks require us to supply many lines of boilerplate code, multi-process applications tend to become quite messy without proper organization. For example, if I were to take the sorting example (shown earlier) and extend it to capture the output of the process invocation into a string, I might end up writing:

#define BUFFER_SIZE 1024

int pipefd[2];

if(pipe(pipefd) == 0)
{
    pid_t pid = fork();

    if(pid == -1)
        fprintf(stderr, "Cannot fork process!\n");
    else if(pid == 0)
    {
        char *const args[] = { "sort", "words.txt", NULL };

        close(pipefd[0]); /* Close read-end of pipe */
        dup2(pipefd[1], 1); /* Attach write-end to the stdout */
        execvp(args[0], args);
        _exit(1); /* Only reached when execvp() fails */
    }
    else
    {
        ssize_t bytes_read;
        char buffer[BUFFER_SIZE];
        char *captured_string = NULL;
        unsigned int captured_string_size = 0;

        close(pipefd[1]); /* Close write-end of pipe */

        while((bytes_read = read(pipefd[0], buffer, BUFFER_SIZE)) > 0)
        {
            captured_string = (char*)realloc(captured_string, captured_string_size + bytes_read);
            memcpy(captured_string + captured_string_size, buffer, bytes_read);
            captured_string_size += bytes_read;
        }

        /* Add NUL-termination */
        captured_string = (char*)realloc(captured_string, captured_string_size + 1);
        captured_string[captured_string_size] = '\0';

        close(pipefd[0]); /* Close read-end of pipe */
    }
}
else
    fprintf(stderr, "Cannot construct a pipe!\n");

I do not expect anyone to study the above code fragment in detail; just by looking at its structure and length, you can clearly see that this is not a very appealing way to construct applications.

In contrast, the same task can be implemented in only one line of bash shell code:

captured_string=$(sort words.txt)

When I look at the previous code fragment from an abstract point of view, it encapsulates an implementation of a specific concern (i.e. its primary task, such as sorting an array) and a number of general housekeeping concerns, such as constructing a pipe, waiting, output buffering, and closing file descriptors.

One possible way to structure the code better is function decomposition -- we can separate the specific and general concerns into functions and put the general housekeeping aspects into a (reusable) library.

Consider the following synchronous function definition and invocation example that checks whether a given string exists in an array of strings by doing a linear search:

#include <stdio.h>
#include <string.h>

#define TRUE 1
#define FALSE 0

int array_contains_string(char **strings, const char *search)
{
    unsigned int i;

    for(i = 0; strings[i] != NULL; i++)
    {
        if(strcmp(strings[i], search) == 0)
            return TRUE;
    }

    return FALSE;
}

int main(int argc, char *argv[])
{
    char *strings[] = { "foo", "bar", "baz", NULL };
    char *search  = "baz";
    int result = array_contains_string(strings, search);

    if(result)
        printf("The array does contain the string: %s\n", search);
    else
        printf("The array does not contain the string: %s\n", search);
        
    return 0;
}

Since searching (in particular linear searching) may take some time, we may want to change the function into an asynchronous one that runs its primary task (the searching) in a child process. I can concisely express this primary concern in one single function:

#include <stdio.h>
#include <string.h>
#include <unistd.h>

pid_t array_contains_string_async(char **strings, const char *search)
{
    pid_t pid = fork();

    if(pid == 0)
    {
        unsigned int i;

        for(i = 0; strings[i] != NULL; i++)
        {
            if(strcmp(strings[i], search) == 0)
                _exit(0);
        }

        _exit(1);
    }

    return pid;
}

As may be noticed, the above function executes the same linear search procedure shown in the previous code fragment, with the following differences:

  • The function forks a child process and carries out the search operation in the child process.
  • Instead of returning a boolean value, it exits the child process with an exit status. By convention, a zero exit status indicates success while a non-zero exit status indicates failure.

We can capture the wait and exit status checking in a general utility function (procreact_wait_for_boolean()) that interprets the exit status of a child process as a boolean value (meaning that when a process exits with exit status 0 it returns TRUE and for any non-zero exit status, it returns FALSE):

#include <procreact_pid.h>

int main(int argc, char *argv[])
{
    char *strings[] = { "foo", "bar", "baz", NULL };
    char *search  = "baz";

    ProcReact_Status status;
    int result = procreact_wait_for_boolean(array_contains_string_async(strings, search), &status);

    if(status == PROCREACT_STATUS_OK)
    {
        if(result)
            printf("The array does contain the string: %s\n", search);
        else
            printf("The array does not contain the string: %s\n", search);
    }
    else
        fprintf(stderr, "The process terminated abnormally!\n");

    return 0;
}

As may be observed, by separating concerns and putting common operations into a library, we can accomplish the same result as the synchronous code fragment example, with relatively little overhead of boilerplate code.

Managing arbitrary output


The previously shown abstractions work well for functions returning a byte or a boolean, and for void-functions. However, it may also be desirable to implement asynchronous functions returning more complex data, such as strings or arrays of strings, for example:

#include <stdlib.h>
#include <string.h>

char *say_hello_to(const char *name)
{
    char *result = (char*)malloc(strlen(name) + 7 + 1);
    sprintf(result, "Hello %s!", name);
    return result;
}

The above function composes a string that greets a person with a given name. As explained earlier, implementing an asynchronous variant of the above function requires extra facilities to propagate the result back to the parent process, such as constructing a pipe, forcing us to do more housekeeping work.

In the previous examples, we were able to separate a task's primary concern into a function returning a PID and a function that waits for the process and interprets its exit status by using the PID reference. To manage complex data, we need to memorize more than just the PID -- we also need the pipe's file descriptors, the buffered data, and the end result.
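For instance, such bookkeeping could be bundled in a struct along the following lines (a hypothetical sketch for illustration only, not the actual libprocreact definition):

```c
#include <sys/types.h>

/* Hypothetical bookkeeping struct: everything the parent process must
   remember to manage an asynchronous function returning complex data. */
typedef struct
{
    pid_t pid;                /* PID of the forked child process */
    int fd;                   /* Read-end of the pipe in the parent */
    char *buffer;             /* Output buffered from the pipe so far */
    unsigned int buffer_size; /* Number of bytes buffered */
    void *result;             /* Cached end result after conversion */
}
FutureSketch;
```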

In some ways, a PID reference resembles another software abstraction -- a future in reactive programming or a promise in JavaScript. A future/promise is an object encapsulating a return value that will be provided at some point in the future.

We can encapsulate the entire housekeeping procedure for transferring and returning complex data in a ProcReact_Future struct:

#include <procreact_future.h>

ProcReact_Future say_hello_to_async(const char *name)
{
    ProcReact_Future future = procreact_initialize_future(procreact_create_string_type());

    if(future.pid == 0)
    {
        dprintf(future.fd, "Hello %s!", name);
        _exit(0);
    }

    return future;
}

The above code fragment looks similar to the synchronous function definition shown in the previous example, with the following differences:

  • By constructing a ProcReact_Future struct, we no longer have to fork a child process and construct a pipe ourselves.
  • The string composition step is carried out by the forked child process.
  • Instead of returning a heap-allocated string, we write the resulting string to the write-end of the pipe provided by the future struct and terminate the child process by invoking _exit().

The procreact_initialize_future() function takes one parameter: a type, which is responsible for reading the output from the pipe and converting it into a representation of choice -- in this case a NUL-terminated string.

We can collect the return value of the function in the parent process by invoking the procreact_future_get() function:

ProcReact_Status status;
ProcReact_Future future = say_hello_to_async(name);
char *result = procreact_future_get(&future, &status);

if(status == PROCREACT_STATUS_OK && result != NULL)
    printf("%s\n", result);
else
    fprintf(stderr, "Some error occured!\n");

The procreact_future_get() function (that looks similar to a Future's .get() method or Promise's .then() method) takes care of reading from the pipe, buffering the output, converting the output to a string, waiting for the child process to terminate and closing the obsolete file descriptors.

Furthermore, analogous to a Future or Promise, when the retrieval function gets invoked for a second time, it will return its cached value instead of reading from the pipe again.

Orchestrating collections of processes


With concerns well separated, orchestration of collections of processes also becomes easier. For example, we may want to execute multiple invocations to the following function in parallel:

ProcReact_Future return_count_async(unsigned int count)
{
    ProcReact_Future future = procreact_initialize_future(procreact_create_string_type());

    if(future.pid == 0)
    {
        dprintf(future.fd, "%u", count);
        _exit(0);
    }

    return future;
}

The purpose of the function shown above is simply to return a string representation of a given numeric counter value.

As with the abstraction facilities shown previously (such as ProcReact_Future), we can also create similar abstractions for orchestrating collections of processes, including processes whose output needs to be captured:

ProcReact_FutureIterator iterator = procreact_initialize_future_iterator(has_next_count_process,
  next_count_process,
  complete_count_process,
  &data);

The procreact_initialize_future_iterator() invocation shown above configures a ProcReact_FutureIterator struct. It takes the following parameters:

  • A pointer to a function that indicates whether there is a next element in the collection.
  • A pointer to a function that invokes the next process in the collection.
  • A pointer to a function that gets invoked when a process completes.
  • A void-pointer referring to an arbitrary data structure that gets passed to all functions above.

If I want to invoke this function 5 times to, say, count from 1 to 5, I can encapsulate the properties of this iteration process in the following data structure:

typedef struct
{
    unsigned int index;
    unsigned int amount;
    int success;
    char **results;
    unsigned int results_length;
}
IteratorData;

and compose the following instance from it:

IteratorData data = { 0, 5, TRUE, NULL, 0 };

The above struct maintains an index value indicating the element it is currently processing, an amount value holding the number of iterations that need to be executed, a success boolean status flag indicating whether all iterations have succeeded, and a results array of strings (with results_length entries) capturing the output of each function invocation.

The following function can be used to check whether we have completed the iteration or not:

static int has_next_count_process(void *data)
{
    IteratorData *iterator_data = (IteratorData*)data;
    return iterator_data->index < iterator_data->amount;
}

The following function executes each successive iteration step:

static ProcReact_Future next_count_process(void *data)
{
    IteratorData *iterator_data = (IteratorData*)data;
    ProcReact_Future future = return_count_async(iterator_data->index + 1);
    iterator_data->index++;
    return future;
}

The above function constructs a future by invoking return_count_async() with the next counter value (the current index plus one) and then increases the index.

The following function gets invoked when a process' execution finishes:

static void complete_count_process(void *data, ProcReact_Future *future, ProcReact_Status status)
{
    IteratorData *iterator_data = (IteratorData*)data;

    if(status == PROCREACT_STATUS_OK && future->result != NULL)
    {
        iterator_data->results = (char**)realloc(iterator_data->results, (iterator_data->results_length + 1) * sizeof(char*));
        iterator_data->results[iterator_data->results_length] = future->result;
        iterator_data->results_length++;
    }
    else
        iterator_data->success = FALSE;
}

The above function checks the status of the function invocation and captures the results that each function returns. When a process completes successfully (i.e. it does not terminate abnormally and provides a non-NULL result), it appends the result to the results array. In case of a failure, it sets the overall status flag of the iterator to FALSE.

With all iteration aspects abstracted away into the ProcReact_FutureIterator struct, we can execute the following function to do all iteration steps in parallel:

procreact_fork_in_parallel_buffer_and_wait(&iterator);

We can also limit the amount of processes that are allowed to run concurrently to a specific value, e.g. 2:

procreact_fork_buffer_and_wait_in_parallel_limit(&iterator, 2);

After executing all iterations, we can consult the data struct to figure out whether the iterations have succeeded and what their results are.
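For example, consuming the end result could look like this (a sketch that repeats the IteratorData definition from above so that the fragment is self-contained; print_results() is a made-up helper returning the number of results printed, or 0 on failure):

```c
#include <stdio.h>

#define TRUE 1

typedef struct
{
    unsigned int index;
    unsigned int amount;
    int success;
    char **results;
    unsigned int results_length;
}
IteratorData;

/* Print all captured results; returns the number of results printed,
   or 0 when any iteration has failed. */
unsigned int print_results(const IteratorData *data)
{
    unsigned int i;

    if(!data->success)
    {
        fprintf(stderr, "Some iterations have failed!\n");
        return 0;
    }

    for(i = 0; i < data->results_length; i++)
        printf("Result: %s\n", data->results[i]);

    return data->results_length;
}
```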

Asynchronously orchestrating collections of processes


The previous two collection examples are executed synchronously. This means that while the execution of each function that retrieves an element is done asynchronously, the overall iteration task blocks the parent process until it completes, which is not always desirable.

A possible solution to make iterations asynchronous is to fork another process and iterate over the collection in the child process, but this introduces another challenge when the collected data needs to be returned to the parent.

Instead of performing all iteration steps as a whole, we can also control each iteration step ourselves. For example, the following function invocation executes a single iteration step that composes a ProcReact_Future struct:

if(procreact_spawn_next_future(&iterator))
    printf("Spawned a process and we have more of them!\n");
else
    printf("All processes have been spawned\n");

We can also run each buffer iteration step ourselves and integrate that function call into a program's main loop:

while(TRUE)
{
    unsigned int running_processes = procreact_buffer(&iterator);

    if(running_processes == 0)
    {
        /* This indicates that there are no running processes anymore */
        /* You could do something with the end result here */
    }

    /* Do other stuff in the main loop */
}

The above code fragment allows us to evaluate processes' statuses and buffer their outputs without blocking the main loop.

Summary


In this blog post, I have described a number of common housekeeping tasks I typically need to implement when developing multi-process applications, and presented an API that abstracts over them.

To summarize, when we intend to execute a task or a collection of tasks, we can use one of the following data structures, depending on whether we need to evaluate a single value or a collection of values, and whether the type of the value is simple (e.g. a boolean, byte, or void) or complex (e.g. strings, arrays of strings, etc.):

          One                                 Many
Simple    pid_t                               ProcReact_PidIterator
Complex   ProcReact_Future<ProcReact_Type>    ProcReact_FutureIterator

Processing collections of tasks for each type can be done either synchronously or asynchronously, by picking the appropriate utility functions:

  • Simple, synchronous: procreact_fork_in_parallel_and_wait(), procreact_fork_and_wait_in_parallel_limit()
  • Simple, asynchronous: procreact_register_signal_handler(), procreact_spawn_next_pid(), procreact_complete_all_finished_processes()
  • Complex, synchronous: procreact_fork_in_parallel_buffer_and_wait(), procreact_fork_buffer_and_wait_in_parallel_limit()
  • Complex, asynchronous: procreact_spawn_next_future(), procreact_buffer()

Motivation: Disnix


After reading this blog post, you may wonder why I have developed these abstractions. My main motivation is to organize Disnix's codebase in a better way.

Disnix is a distributed deployment tool, and all deployment activities that it carries out (package management operations, state management operations, and communications) are delegated to external processes, for exactly the same reasons mentioned in the introduction.

Before deriving the abstractions described in this blog post, all process coordination in Disnix was hand coded -- Disnix's code was cluttered by boilerplate code doing output buffering, concurrency limiting, waiting, and resource allocation. As a result, some areas of the code were quite hard to read and difficult to extend. Moreover, it was also quite hard to ensure that the code is free from some serious issues, such as potential buffer overflows.

After restructuring the code to use these abstractions (between Git revisions 595c1ec and 4ba5e3d), I have managed to significantly reduce the amount of code. For example, the methods.c module (an RPC interface for all deployment operations that Disnix can execute remotely) previously consisted of 1976 LOC. After refactoring, it consists of only 488 LOC.

Moreover, the disnix-build utility's main module (build.c) has been reduced from 250 LOC to 153 LOC, and each of the 13 command-line utilities that execute deployment tasks has at least 100 fewer lines of code.

In addition to reducing the size of many modules, I have also accomplished the following:

  • Better separation of concerns. I have managed to clearly separate all asynchronous functions spawning processes into three separate libraries (libpkgmgmt for package management, libstatemgmt for state management and libinterface for communication), considerably simplifying the toolset's architecture.
  • Performance improvements. Some tools (disnix-capture-infra and disnix-query) were still executing their tasks sequentially and were difficult to optimize. With these new abstractions, parallelizing their tasks became quite simple.
  • Better error reporting. In older versions of Disnix, it was difficult to relate error messages to target machines. With the abstractions described in this blog post, implementing a translation process became much easier. As a result, all tools can now clearly report the origins of an error.

Discussion


Although the abstractions described in this blog post have allowed me to structure Disnix's code in a much better way, they are not a solution for everything.

For example, the ProcReact_Future abstraction buffers the process' output, which is inadequate for processes transferring large quantities of data. Moreover, each function invocation implies a fork. On a very large scale, this could become quite expensive, as it takes time to set up a child process and to allocate the Copy-On-Write (COW) administration. In theory, we could make it possible to reuse spawned processes among function invocations to optimize this, but currently no such optimizations have been implemented yet.

Furthermore, processes are not the only solution for implementing concurrent applications. When frequent interaction is required between parent and child, it may be better to use threads as their memory is shared (at the same time, this also has disadvantages). A major disadvantage of processes (compared to threads) is that all communication data needs to be serialized, transferred and deserialized, introducing quite a bit of communication overhead.

Apart from processes and threads, it is also possible to use a single-threaded event loop, in which the programmer has the obligation to divide bigger tasks into small tasks. For example, this model is prominently used in Node.js. The advantage of this approach is that each additional concurrent task does not require the allocation of additional resources. This works well, for example, for applications that are mostly I/O-bound (I/O is typically thousands of times slower than a CPU).

The disadvantage of such an organization is that it becomes the programmer's obligation to ensure that the main thread never blocks and that the code remains properly structured. Moreover, "context switching" between single-threaded tasks is more expensive than a context switch at CPU level, making it quite inadequate for computationally intensive applications. Finally, the operating system cannot automatically divide tasks over multiple CPU cores.

Availability


libprocreact is part of the latest development version of Disnix. Additionally, I have isolated the library code and created a separate GitHub repository for it.

Although libprocreact can be used independently from Disnix, I currently do not have any plans to make it a fully fledged general purpose library. At the moment, it is not supposed to be used for any other use cases than Disnix's.