I was reading an interesting article on DZone recently that pointed
out how many different programming languages have a rich community
producing modules that are available for reuse. In some cases this
community is not strictly affiliated with the creators of the language.
It’s instructive to see just how many examples there are out there. Here
are just a few that popped into my head without thinking hard: CPAN for Perl,
PyPI for Python, npm for JavaScript, RubyGems for Ruby, Hackage for Haskell,
and Maven Central for Java.
Together these repositories represent millions of programmer hours spent
putting together libraries, and probably millions more spent maintaining
the indexes and the servers on which they run. On the other hand, even though
C and its semi-successor C++ have been around since before the Internet age,
there isn’t an exactly equivalent library and module system available.
Of course, C is the language of Linux, and so to some extent your favorite
package manager is a C and C++ library and module system. But there’s
pretty clearly a difference in scope between what’s available for C or C++ and
what’s available even for a relative newcomer of a language like Go or a
language with a relatively small user base like Haskell. Compare
Hackage with this list of C++ libraries.
Why is that? There’s clearly a large, high-quality user base for C and
C++. I think the issue comes down to language features, and to one
language feature in particular: garbage collection. To explain why, I’ll give
a little personal history.
Several years back, I spent some time building a cross-platform user interface
based on the Mozilla codebase. How we arrived at that approach is a long story,
but it did allow us to create something truly cross-platform, with the ability
to reuse our own code on the back end, and to end up with a rich client user
interface. (This was back
before modern browsers made browser-based interfaces good.)
As a result, I spent a lot of time in the Mozilla codebase for someone who
wasn’t part of the team. I learned a lot about C++ from that codebase; it
remains a very impressive example of large-scale C++ development.
As expected, a decent amount of the core work that became the Mozilla codebase
was centered around making the same C++ work well on Windows, Linux, and Mac,
via two isolation layers: the Netscape Portable Runtime (NSPR) and the
Cross-Platform Component Object Model (XPCOM). NSPR allows most of
the Mozilla codebase to be written independent of the underlying operating
system. It handles the kinds of things you would expect: threads,
input/output, networking, and so on.
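As a rough sketch of what that looks like in practice, here is a thread
spawned through NSPR’s public API (the calls are real NSPR functions from
prthread.h; the program around them is just an illustration):

    #include <prthread.h>
    #include <cstdio>

    static void worker(void *arg)
    {
        std::printf("hello from a portable thread: %s\n", (const char *)arg);
    }

    int main()
    {
        // One call, implemented on top of Win32 threads, pthreads, etc.
        PRThread *t = PR_CreateThread(PR_USER_THREAD, worker,
                                      (void *)"same code on every platform",
                                      PR_PRIORITY_NORMAL, PR_GLOBAL_THREAD,
                                      PR_JOINABLE_THREAD, 0);
        if (t)
            PR_JoinThread(t);   // wait for the worker to finish
        return 0;
    }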
XPCOM provides a number of cross-platform features similar to Microsoft COM.
The most visible in terms of its impact in the Mozilla codebase is memory
management in the form of “nsCOMPtr” smart pointers, similar to COM smart
pointers. Basically, a reference-counted smart pointer keeps track of how many
references exist to the object it manages, and destroys that object when the
count drops to zero. It’s an elegant alternative to garbage collection;
instead of having a core runtime function that finds unused allocated memory
and cleans it up, each new allocation of memory comes with a little bit of
garbage-collection intelligence.
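To make the mechanism concrete, here is a deliberately minimal sketch of an
intrusive reference-counted smart pointer in the general style of COM and
nsCOMPtr (an illustration of the pattern, not Mozilla’s actual
implementation):

    #include <cstdio>

    // The object carries its own reference count ("intrusive" counting,
    // as in COM and XPCOM).
    class Counted {
    public:
        Counted() : mRefCnt(0) {}
        void AddRef()  { ++mRefCnt; }
        void Release() { if (--mRefCnt == 0) delete this; }
        virtual ~Counted() { std::puts("object destroyed"); }
    private:
        int mRefCnt;
    };

    // A minimal smart pointer: every copy increments the count,
    // every destruction decrements it.
    template <class T>
    class RefPtr {
    public:
        explicit RefPtr(T *p = nullptr) : mPtr(p) { if (mPtr) mPtr->AddRef(); }
        RefPtr(const RefPtr &other) : mPtr(other.mPtr) { if (mPtr) mPtr->AddRef(); }
        ~RefPtr() { if (mPtr) mPtr->Release(); }
        RefPtr &operator=(const RefPtr &other) {
            if (other.mPtr) other.mPtr->AddRef();  // addref first: safe for self-assignment
            if (mPtr) mPtr->Release();
            mPtr = other.mPtr;
            return *this;
        }
        T *operator->() const { return mPtr; }
    private:
        T *mPtr;
    };

    int main() {
        RefPtr<Counted> a(new Counted);  // count: 1
        {
            RefPtr<Counted> b = a;       // count: 2
        }                                // b leaves scope; count: 1
        return 0;                        // a leaves scope; count: 0, object freed
    }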
Of course, in order for this smart pointer to work, any time a new reference to the
object is stored outside a single function scope, the smart pointer needs to be notified
so it can increment its count. And whenever that reference goes out of scope, the pointer
also needs to be notified. And the consequences of missing one of these are pretty
significant, either in the form of memory leaks or in the form of dereferencing a pointer
that is no longer valid.
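Against the sketch above, that bookkeeping and its two failure modes look
like this:

    // Assumes the Counted class from the previous sketch.
    void leaks(Counted *obj)
    {
        obj->AddRef();          // took a reference...
        // ... used obj ...
        // ... and forgot the matching Release(): the count can never
        // reach zero again, so the object is leaked.
    }

    void danglesIfLast(Counted *obj)
    {
        Counted *alias = obj;   // stored a copy without calling AddRef()
        obj->Release();         // if that was the last reference, obj is gone
        // any use of alias here is now a use-after-free
    }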
This is one of the most sophisticated memory management schemes available in a
non-garbage-collected runtime. But it’s hard to imagine how it could be made to work
across a wide range of libraries. Either each library would have to build on top of
a single implementation of smart pointers, or each library would have its own similar
scheme to learn (and bridge between when integrating multiple libraries).
And that’s exactly what we see when we look at some example libraries. Imagine
an XML or JSON parser in C that provides data binding to a struct. The parser
needs to allocate the memory for the struct (or array of structs), along with
variable sized data inside. How long should this allocated memory stick around?
The application code that uses the struct data ideally should not have to worry
about the fact that it came from JSON, but somehow there has to be a way to
identify that the application is “done” with the data. To solve this issue, the
very well-written json-c library provides its own reference-counting scheme,
very similar to COM smart pointers. Unfortunately, that leaves the
onus on the using application to help out with reference counting, even if the
corresponding data is passed to some other library that might use it for an
unknown amount of time. The same kind of functionality can be found in GLib,
as used by GTK and the GNOME desktop, and in Boost.
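For instance, json-c’s scheme looks like this in practice; json_object_get()
and json_object_put() are the library’s actual retain and release calls,
though the program around them is just a sketch:

    #include <json-c/json.h>

    int main()
    {
        // The parser allocates the object tree and hands back one reference.
        json_object *root = json_tokener_parse("{\"name\": \"widget\"}");

        // Before storing the data anywhere that outlives this scope, the
        // application must take another reference by hand...
        json_object_get(root);

        // ...and every reference taken must be matched by exactly one put().
        json_object_put(root);

        // Releasing the last reference frees the entire tree.
        json_object_put(root);
        return 0;
    }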
All of these implementations are different, so a programmer using these three
libraries would need to learn and remember three different APIs for handling
pointer references. Under these circumstances, it’s little wonder that
libraries in C and C++ tend to be limited in scope, mostly serving to isolate
programs from the hardware or to provide low-level implementations of a
standard.
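A small sketch of the surface syntax makes the point: the same retain/release
operation has a different spelling in each library (GBytes here is just one
refcounted GLib type; Boost’s intrusive_ptr is shown by its contract):

    #include <glib.h>

    void glibStyle()
    {
        // GLib: matched *_ref() / *_unref() calls per type.
        GBytes *bytes = g_bytes_new("data", 4);
        GBytes *extra = g_bytes_ref(bytes);   // take a second reference
        g_bytes_unref(extra);
        g_bytes_unref(bytes);                 // last reference: freed
    }

    // Boost: boost::intrusive_ptr<T> delegates to free functions that
    // each library author must define for their own types:
    //     void intrusive_ptr_add_ref(T *p);
    //     void intrusive_ptr_release(T *p);
    // json-c (above): json_object_get() / json_object_put().
    // Same idea, three different APIs to learn and keep straight.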
I for one consider this a very interesting thought. The primary argument for
adding garbage collection to a language is that it raises the level of
abstraction, allowing a clearer focus on the business logic of the application
by avoiding boilerplate memory-management code (and thereby also decreasing the
chances of errors caused by memory management done poorly). But the broader
effect is to create a different kind of culture where a broad set of
community-contributed libraries are possible. It neatly illustrates the way
that the best features of a system are often emergent.