The long tail of programming languages

When we look at programming languages, we tend to believe there are only a few dominant ones, such as Java, C++, VB and C#, and that all the rest are rarely used. Real-life data, however, indicates that nothing could be further from the truth. For example, consider the April 2005 data point from a monthly survey published by Tiobe, which lists 50 languages. We can of course argue about whether the data is correct or should be measured differently, but the point is that a wide variety of languages are in daily use. Just to cover 80% of usage, we need to go down to number 9 on the list (which happens to be C#).

The Tiobe survey also tracks another 50 languages that could potentially break into the top 50. Among them, in alphabetical order, are:

ABC, Algol, APL, AppleScript, BCPL, Beta, Clarion, Clean, Curl, Dylan, Eiffel, Erlang, Groovy, Haskell, Inform, Io, Lua, Mantis, Maple, Mathematica, Modula-2, Moto, MS-DOS batch, MUMPS, Oberon, Objective-C, Occam, OPL, Oz, Pike, PL/1, Powerbuilder, Progress, Q, REALBasic, Rebol, Verilog, VHDL, Whitespace, and XSLT.

The pattern we see here follows a power-law curve that stretches out a long way. The same stretched pattern has also been observed in digital markets such as music and film, internet search, design, books, DVD rentals and TV series. It was popularized as "the Long Tail" by Chris Anderson, editor-in-chief of Wired, in an article from October 2004. Chris is now writing a book and has launched a web site about this shift from mainstream markets to niches. To quote Chris:

The Long Tail, on the other hand, is about nicheification. Rather than finding ways to create an even lower lowest common denominator, the Long Tail is about finding economically efficient ways to capitalize on the infinite diversity of taste and demand that has heretofore been overshadowed by mass markets. The millions who find themselves in the tail in some aspect of their life (and that includes all of us) are no poorer than those in the head. Indeed, they are often drawn down the tail by their refined taste, in pursuit of qualities that are not afforded by one-size-fits-all. And they are often willing to pay a premium for those goods and services that suit them better. The Long Tail is, indeed, the very opposite of commodification.

The long tail is growing because new technology keeps providing more ways to serve it. Before the internet, serving niche markets was simply not practical, which meant that everyone was forced into the mainstream. For example, Amazon carries 2.3 million book titles, compared to an average of 130,000 in a Barnes & Noble store. Netflix carries 25,000 titles, compared to an average of 3,000 in a Blockbuster store. Both Amazon and Netflix estimate that 20-30% of their revenue comes from long-tail items that are simply not available through non-internet channels, and that number is growing. Companies like Google and eBay are building entirely new businesses to serve niche markets.

We met with Chris at the recent PC Forum, where everyone was talking about his long tail idea. It is amazing how quickly the concept has been picked up and used to describe specialization for personalized needs. This got us thinking about the diversity of taste among programmers, and programming languages came to mind. Every developer has a favorite language, although with age, language religion fades and programming style becomes more important. And sure enough, the data above indicates that programming languages also have a long tail, with many different languages widely used in different niches.

The top languages are probably general-purpose languages mainly because generality suits a decision that is usually made at the corporate level, or at least standardized within a project, rather than by individual developers. Many developers who must use Java or C# at work during the day go home to use Ruby or Python on their personal projects at night.

So why is there such a long tail of programming languages in the community? And is there an efficient way to satisfy the obvious need for more niche languages and promote the language innovation that happens in the long tail?

The first question is pretty easy to answer. There is a long tail because the more specialized a language is to a domain, the better it fits the problems of that domain. Niche languages trade generality for efficiency within a domain, which makes them simply better and more efficient tools there.

The second question is a bit harder. Given the huge cost of bringing a commercially viable language to market, it is astonishing that we have such a wide variety of niche languages in use today. Even though these languages excel at what they were designed for, it is still hard to find state-of-the-art environments for them. And even though the barrier to supporting new languages, and therefore to language innovation, is coming down thanks to platforms like Eclipse and the .NET Framework, we still have a long way to go before specialized languages are considered commercially viable.

At Intentional Software we are pursuing an approach in which the choice of language is not as drastic a commitment as it is today. For our industry to mature, we think a greater variety of languages needs to be available, so that we can use the best tool in each instance and put more focus on end results.

We call this Language-Oriented Development. We believe it is a gradual shift away from the poverty of notation in which today’s developers live. JetBrains is another company that shares this vision.

There are certainly more efficient ways than today’s general-purpose languages to tell a computer our specific programming intentions, and the long tail of programming languages shows that there is a strong need for them.

For this vision to become reality, a number of barriers to entry must be overcome. These barriers include:

  • co-existence between languages, including the ability of languages to share definitions
  • changes in a language over time
  • incorporation of code in legacy languages
  • creation of new languages, including the difficulty of designing satisfactory syntax and notation
  • cost of implementing parsers, compilers, runtimes, libraries, debuggers
  • education and adoption of rapidly changing languages

We think there are ways to break through these barriers, and we will discuss them in forthcoming blog entries.