When we look at programming languages, we tend to believe there are only a few dominant ones, such as Java, C++, VB and C#, and that all the rest are rarely used. Real-life data, however, indicates that nothing could be further from the truth. For example, look at the following April 2005 data point, which lists 50 languages from a monthly survey published by Tiobe. We can of course argue about whether the data is correct, or whether it should be measured in a different way, but the point is that a wide variety of languages is in daily use. Just to cover 80% of usage, we need to go down to number 9 on the list (which happens to be C#).
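The "80% of usage" figure comes from a cumulative-share calculation over the ranked list. Below is a minimal Python sketch of that arithmetic. The function names `ranks_to_cover` and `tail_share` are hypothetical. The shares in the usage example are made-up illustrative numbers, not the Tiobe figures.

```python
from typing import Sequence


def ranks_to_cover(shares: Sequence[float], target: float = 0.80) -> int:
    """Return how many top-ranked items are needed for their combined
    share to reach `target` (a fraction) of the total."""
    ranked = sorted(shares, reverse=True)  # largest share first
    total = sum(ranked)
    if total <= 0:
        raise ValueError("shares must sum to a positive number")
    running = 0.0
    for rank, share in enumerate(ranked, start=1):
        running += share
        if running >= target * total:
            return rank
    # Reached only if target > 1 or rounding keeps the threshold out of reach.
    return len(ranked)


def tail_share(shares: Sequence[float], k: int) -> float:
    """Fraction of the total held by everything outside the top k items."""
    ranked = sorted(shares, reverse=True)
    return sum(ranked[k:]) / sum(ranked)


if __name__ == "__main__":
    # Hypothetical usage shares in percent (NOT the Tiobe data):
    # nine larger entries followed by 24 small ones.
    illustrative = [30, 20, 10, 8, 6, 5, 4, 3, 2] + [0.5] * 24
    print(ranks_to_cover(illustrative))  # 7
    print(tail_share(illustrative, 7))   # 0.17
```

Both functions sort their input first, so callers can pass shares in any order. In this made-up example, the top 7 entries cover 83% of usage, and the remaining 26 entries still hold 17%.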

The Tiobe survey also tracks another 50 languages that could potentially break into the top 50. They include, in alphabetical order:

ABC, Algol, APL, AppleScript, BCPL, Beta, Clarion, Clean, Curl, Dylan, Eiffel, Erlang, Groovy, Haskell, Inform, Io, Lua, Mantis, Maple, Mathematica, Modula-2, Moto, MS-DOS batch, MUMPS, Oberon, Objective-C, Occam, OPL, Oz, Pike, PL/1, Powerbuilder, Progress, Q, REALBasic, Rebol, Verilog, VHDL, Whitespace, and XSLT.

The pattern we see here follows a power-law curve with a long, stretched-out tail. The same pattern has also been noticed in digital markets such as music and film, internet search, design, books, DVD rentals and TV series. It was popularized as “the long tail” by Chris Anderson, editor-in-chief of Wired, in an article from October 2004. Chris is now writing a book and has launched a web site about this trend from mainstream markets to niches. To quote Chris:

The Long Tail, on the other hand, is about nicheification. Rather than finding ways to create an even lower lowest common denominator, the Long Tail is about finding economically efficient ways to capitalize on the infinite diversity of taste and demand that has heretofore been overshadowed by mass markets. The millions who find themselves in the tail in some aspect of their life (and that includes all of us) are no poorer than those in the head. Indeed, they are often drawn down the tail by their refined taste, in pursuit of qualities that are not afforded by one-size-fits-all. And they are often willing to pay a premium for those goods and services that suit them better. The Long Tail is, indeed, the very opposite of commodification.
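For readers who want the “power law” claim in symbols, one common formalization for ranked data is a Zipf-type rank-share law. This is a sketch under assumptions: the exponent α and the number of ranked items N are generic symbols, not values fitted to the Tiobe data.

$$
s_k = \frac{k^{-\alpha}}{\sum_{j=1}^{N} j^{-\alpha}}, \qquad k = 1, \dots, N, \quad \alpha > 0
$$

Here s_k is the fraction of total usage held by the item at rank k. So the shares sum to 1, and log s_k = -α log k + const, which is a straight line on a log-log plot. The share left in the tail beyond the top m items is

$$
T(m) = \sum_{k=m+1}^{N} s_k = 1 - \sum_{k=1}^{m} s_k .
$$

Smaller values of α spread usage more evenly across ranks, so T(m) stays larger for a given m. That is what a “long” tail means in this setting.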

What is growing the long tail is new technology that provides more and more ways to serve it. Before the internet, serving niche markets was simply not practical, which meant that everyone was forced into the mainstream. For example, Amazon carries 2.3 million book titles, compared to the 130,000 carried by an average Barnes & Noble store. Netflix carries 25,000 titles, compared to the 3,000 of an average Blockbuster store. Both Amazon and Netflix estimate that 20-30% of their revenue comes from this long tail, which is simply not available through non-internet channels, and that share is growing. Companies like Google and eBay are building entire new businesses to serve niche markets.

We met with Chris at the recent PC Forum, where everyone was talking about his long tail. It is amazing how quickly the concept has been picked up and used to describe specialization for personalized needs. This started us thinking about the diversity of taste among software programmers, and programming languages came to mind. Every developer has a favorite language, although with age, language religion fades away and style of programming becomes more important. And sure enough, the data above indicates that there is also a long tail in programming languages: many different languages are in active use, each in its own niche.

The top languages are probably general-purpose precisely because generality is what matters when the choice of programming language is a corporate decision, or at least standardized within a project, rather than an individual developer’s choice. Many developers who must use Java or C# at work during the day go home to use Ruby or Python on their personal projects at night.

So why is there such a long tail of programming languages in the community? And is there an efficient way to satisfy the obvious need for more niche languages and promote the language innovation that happens in the long tail?

The first question is pretty easy to answer. There is a long tail because the more specialized a language is to a domain, the better suited it is to solving problems in that domain. Niche languages trade generality for efficiency within a domain, and they are simply better and more efficient tools for that domain.

The second question is a bit harder. Given the huge cost of bringing a commercially viable language to market, it is astonishing that we have such a wide variety of niche languages in use today. Even though these languages excel at what they were designed for, it is still hard to find state-of-the-art environments for them. The barrier to supporting new languages, and therefore to language innovation, is coming down thanks to platforms like Eclipse and the .NET Framework. Even so, we still have a long way to go before specialized languages are considered commercially viable.

Intentional Software is working toward an approach in which the choice of language is not as momentous a decision as it is today. For our industry to mature, we think that a greater variety of language options needs to be available, so that we can use the best tool in each instance and put more focus on end results.

We call this Language-Oriented Development. We believe it is a gradual shift away from the current poverty of notations in which today’s developers live. JetBrains is another company that shares this vision.

There are surely more efficient ways to tell a computer our specific programming intentions than today’s general-purpose languages, and the long tail of programming languages shows that there is a strong need for them.

For this vision to become reality, a number of barriers must be overcome, including:

  • co-existence between languages, including the ability of languages to share definitions
  • changes in a language over time
  • incorporation of code written in legacy languages
  • creation of new languages, including the difficulty of designing satisfactory syntax and notation
  • cost of implementing parsers, compilers, runtimes, libraries and debuggers
  • education and adoption of rapidly changing languages

We think there are ways to break through these barriers, and we will discuss them in forthcoming blog entries.


6 Responses to The long tail of programming languages

  1. I think it’s interesting that you note the diversity of programming languages in use today, and the associated need for co-existence and interoperability. One of the problems I often encounter with “new” languages is that while they may have some very nice features, it’s often harder than it should be to get them to work with existing tools, libraries and platforms due to different assumptions. Ideally, we would be able to mix together the right set of abstractions to solve a particular problem.
    Here at the University of Illinois, we are working on an interesting approach to this problem — we are defining the formal executable semantics of a variety of different programming languages using a rewriting logic-based meta-language called Maude (http://maude.cs.uiuc.edu). By defining languages in this way, we get a variety of tools, such as interpreters and model checkers, with no extra effort. Moreover, these language specifications tend to be very concise and free of extraneous implementation details, since the Maude language provides little other than pattern matching operations. As part of Prof. Grigore Rosu’s class this semester, CS 522: Programming Language Semantics, we are working in groups to define the semantics for 8 different languages. These should be up on the web in a few weeks, but you can also see some already-complete language definitions in Maude at the following websites:
    Java – http://fsl.cs.uiuc.edu/javafan/
    BC, C and Scheme – http://fsl.cs.uiuc.edu/es/index.jsp
    Simple and Fun (toy languages) – http://fsl.cs.uiuc.edu/~grosu/classes/2005/spring/cs522/
    Overall, I have found formal executable semantics in Maude to be a very powerful methodology for defining programming languages. Compared to more concrete virtual machine-based approaches, it gives the language implementor the flexibility to define new semantic constructs for language-specific peculiarities, while still using existing modules to define the more standard parts of a language. This also makes it relatively easy to modify and evolve language definitions. Maude is by no means a final solution, but it is a very good system which unfortunately appears nowhere on the language popularity scales at this time.

  2. hakank.blogg says:

    The long tail of programming languages

    Charles Simonyi (Intentional Software) writes in The long tail of programming languages that the usage of programming languages relative to one another forms a long tail (cf. power laws). He uses the statistics from the TIOBE Programming Community Index…

  3. xeo says:

    Any set of nominal data can be ordered so as to display an apparent power law distribution. The data could just as easily have been arranged in an apparent normal distribution, for example. [Try it: select the highest value and make it the mean; take the next two values and place them symmetrically around the first value; continue until done.]
    A true power law distribution requires that the domain be defined on an interval scale. In other words, the absence of a scale on the X-axis tells us that it makes no sense to speak of power law distributions (indeed, of any particular distribution).

  4. Xeo, you are right, of course, but the main point of the observation was that besides a few very popular languages there are also a large number of less popular languages. “Long tail” refers to the integral of the remainder when the items are sorted by magnitude. Not everything has a “long tail”, for example personal computer types do not.

  5. Alt Text says:

    Google to buy Sun?

    Daniel M. Harrison at blogcritics has been all over the potential sale of Sun to Google and what it might mean….

  6. chainsaw says:

    Personally, I am pretty stunned at how much choice in languages there is today. Back in the ’80s, I don’t recall there being so much choice… (granted, I was still a youngin) – I only remember C, C++, Pascal, Turbo Pascal and Basic. Today we seem to have much, much more choice (and the list above doesn’t even cover it). I’m still deciding whether this is good or bad. I guess it is good in that (as you say) niche languages do the job better for their specific purpose (I’m thinking of Ruby or Python, for example), and you can learn them more quickly for the task. If you had to learn C# for simple tasks, it might be more difficult to get anything done. This way, most people can do the really simple stuff and learn a simple language quick and easy.
    So… come to think about it, the wide variety of choice is probably good, although maybe confusing for newbies.