I was just reading Chenglie Hu’s important article in the Communications of the ACM, "Dataless Objects Considered Harmful" (ACM membership is required to download it, or you can read it here). I feel this is significant for two reasons.

First, on a practical level, I’ve also noticed that recently graduated programmers are not well educated in the classical theories of modular programming as described by Parnas, among others. Too many programmers seem to think that just by writing classes and methods, and by the appropriate use of private and public attributes, the program will automatically be modular. In fact the opposite seems to be the case. Without an independent guiding principle for organization, the classes and public attributes become expressions of the programmers’ whims, mere packing crates for the required functionality. The OO programming languages, most notably Java but unfortunately also C#, do not do much to alleviate the situation: they prohibit the use of procedures outside of classes and provide no information-hiding boundary other than the class, which hides information but is also an implementation concept. Here we have again the old programming language conundrum: to get the benefits of a specific abstraction, say information hiding, you have to buy into a specific implementation at the same time. Of course, the whole idea of abstraction should have been to leave room for different implementations, that is, to factor complexity.

The other reason is that the article recalls Dijkstra’s seminal letter to the editor, “Go To Statement Considered Harmful”, which created a firestorm of controversy when it was published in 1968. I was maintaining an Algol compiler for the CDC 6400 at the time, so it was not a big shock for me, but I recall that the Fortran users were quite upset. We now “know” that Dijkstra was advocating structured programming and that his objection had to do with the difficulty of proving the correctness of programs with unstructured Go To statements. But re-reading the original text, I think that Dijkstra’s letter was quite obscure, and I can now sympathize with the initial incomprehension of its readers. Yet the letter started a very important movement to improve software engineering.

In presentations on Intentional Software, I often ask the audience: why should we think, in general, that “X is considered harmful” for any programming language feature X? So far we have X = “Go To statement” and X = “Dataless Objects” (for more examples see “Hello World Considered Harmful” or even “Aspect-Oriented Programming Considered Harmful”, and for a rather harsh critique of the question itself see here). I think the answer has to be situational in any case: it is not that an abstraction is harmful per se, but rather that certain uses of an abstraction are counterproductive.

I hinted at the solution in my other posts here and here. To decide whether something is harmful or useful, we have to refer to the problem being solved. If we do not know what the problem is, a tool is devoid of meaning (have you ever had that feeling when looking at strange tools in a cabinetmaker’s or bookbinder’s shop?). Hu also hints at this in his article:

“[...the undesirable result is that there will be] many intermediate variables that don’t correspond to separate entities of the application domain” (emphasis added)

So to generalize the “harmfulness” theory, we need to refer to the degrees of freedom in the problem statement, that is, in the domain intention. A programming feature is harmful in proportion to how many more degrees of freedom it has than the domain intention it is used to implement. By degrees of freedom we mean the potential parametrizations (arguments, properties, attributes, or parameters) of the abstraction and the gamut of their values. The gamut is determined by the parameter type: the range of possible parameter values can be small for enumerated types, or very large, for example when the “parameter” is a statement list, as in a loop. Harmfulness is related to excess degrees of freedom: more parameters than necessary, or larger parameter types than necessary for the purposes of the problem intention.
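To make the point about parameter types concrete, here is a minimal C sketch. The alignment setting and the names `set_alignment_int`, `set_alignment_enum` and `enum alignment` are hypothetical illustrations, not from Hu's article or this post. Both functions encode the same three-valued domain intention. The `int` version has vastly more degrees of freedom than the domain needs.

```c
/* Hypothetical illustration: a text-alignment setting whose domain
   has exactly three meaningful values. */

/* Version 1: an int parameter. The type admits billions of values but
   only 0, 1 and 2 mean anything, so every caller has to pick a magic
   number and the callee has to reject everything else. */
int set_alignment_int(int code) {
    if (code < 0 || code > 2) {
        return -1; /* out-of-domain value: an error the type permitted */
    }
    return code;
}

/* Version 2: an enumerated parameter whose named values mirror the
   domain. C enums are weakly typed (any int can still be cast in), so
   this narrows the degrees of freedom by convention and by compiler
   diagnostics rather than absolutely. */
enum alignment { ALIGN_LEFT, ALIGN_CENTER, ALIGN_RIGHT };

int set_alignment_enum(enum alignment a) {
    return (int)a; /* each named value is meaningful; no range check */
}
```

For instance, `set_alignment_int(7)` compiles but is rejected at run time with -1. In contrast, `set_alignment_enum(ALIGN_RIGHT)` states the intention directly and leaves no numeric code to choose.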

The theory seems to work with the two key examples above. For example, a Go To with a label parameter that is used to implement the end of a loop has more degrees of freedom than necessary. It could go to any place in the scope, when it needs to go to just a single well-defined place: the test for loop completion. Similarly, the dataless object has too many degrees of freedom when it is encoding a modular unit and its procedures: the object type and the object instance are both superfluous for the purpose.
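The Go To half of the argument can be sketched in C, one of the few current languages that still has `goto`. The array-summing task and the function names are hypothetical, chosen only for illustration. Both functions compute the same result, but the first forces the programmer to invent a label and permits a jump to any label in the function.

```c
/* Both functions sum a[0..n-1]; the point is how many choices each
   one forces on the programmer. */

/* Unstructured version: the label "top" is an arbitrary name, and the
   goto could legally target any label in the function, even though
   the loop only ever needs to jump back to its completion test. */
int sum_with_goto(const int *a, int n) {
    int total = 0;
    int i = 0;
top:
    if (i < n) {
        total += a[i];
        i++;
        goto top;
    }
    return total;
}

/* Structured version: the jump target is implied by the construct, so
   there is no label to name and no wrong place to jump to. */
int sum_with_while(const int *a, int n) {
    int total = 0;
    int i = 0;
    while (i < n) {
        total += a[i];
        i++;
    }
    return total;
}
```

The two versions behave identically. The structured one simply removes the label parameter, which carried no information about the problem.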

It is worth restating what is wrong and what is right with extra degrees of freedom.

The wrongs are twofold: work and errors. For any parameter the programmer has to choose a value, and that is work. If the parameter is “extra”, or if the parameter type accepts a greater infinity (powerset) of values relative to the problem, this work does not get easier; if anything, it gets harder. Lacking proper motivation, the naming of extra quantities (such as the arbitrary label at the top of the loop) can be particularly vexing, as we discussed in an earlier post on loop constructs. The second problem, errors, follows from straightforward communication theory: the larger the space of encodings, the larger the error rate. This is why voice recognition systems with limited vocabularies work much better than those with unlimited vocabularies. This is also why the Palm Pilot’s Graffiti has lower error rates than other, more general handwriting recognition systems, and why keyboards have lower error rates still. Similarly, the error rate of Go To statements that encode loops will be greater than the error rate of a structured statement, which, by the way, is still not zero.

Here are a few things that can go wrong with a structured “while” statement. Consider the code:

    while (i<0);
        f(i++);

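The result is probably not what was intended, because of the extra semicolon in the first line. It makes the loop body empty. If i starts out negative, the loop never terminates; otherwise f is called exactly once. What would be more intentional here? Suppose we could say: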

    f([i, 0))

meaning

“execute f(j) for all j in the interval starting with i, up to but not including 0, in other words in a half-closed, half-open interval; and yes, we do want to use the standard mathematical notation [a,b) for such an interval”
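To make the interval idea concrete in today's C, here is a minimal sketch. It assumes a hypothetical helper named `for_each_half_open`, which neither this post nor any standard library defines. C has no closures, so the action takes an extra context pointer; the `recorder` struct is likewise only scaffolding for the example.

```c
/* Hypothetical helper approximating f([lo, hi)): calls f(j, ctx) for
   every j with lo <= j < hi. The bounds and the action are the only
   choices left to the caller; the comparison operator, the increment
   and the loop variable are fixed once, inside the helper. */
typedef void (*int_action)(int j, void *ctx);

void for_each_half_open(int lo, int hi, int_action f, void *ctx) {
    for (int j = lo; j < hi; j++) {
        f(j, ctx);
    }
}

/* Example action: records each j it is called with (up to 16). */
struct recorder {
    int values[16];
    int count;
};

static void record(int j, void *ctx) {
    struct recorder *r = ctx;
    if (r->count < 16) {
        r->values[r->count++] = j;
    }
}
```

With i = -3, calling `for_each_half_open(i, 0, record, &r)` visits -3, -2 and -1. An empty interval (lo >= hi) calls nothing, which matches the [i, 0) reading. The extra `ctx` parameter that C forces on us is itself a small surplus degree of freedom that the notation f([i, 0)) does not have.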

Using this abstraction there would be fewer degrees of freedom, there would be no possibility of the above error, and we would also eliminate the possibility of other errors in this simple while loop, such as these:

    i<0 or i<=0?
    i++ or ++i?
    i written in the comparison and some other variable written by mistake in the increment.
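For comparison, here is a corrected version of the original while loop, with each choice point from the list marked. The recording function `f` and the `visited` array are hypothetical scaffolding that gives the loop an observable effect. This is a sketch, not code from the post.

```c
/* Hypothetical scaffolding: f records the values it is called with so
   the loop's effect can be observed. */
static int visited[16];
static int visited_count = 0;

static void f(int j) {
    if (visited_count < 16) {
        visited[visited_count++] = j;
    }
}

/* The loop from the text with the stray ';' removed. Each comment marks
   a choice point where one of the listed errors could slip in. */
void run_by_hand(int i) {
    while (i < 0)    /* '<' rather than '<=', and no ';' after the header */
        f(i++);      /* post- rather than pre-increment, and the same
                        variable i as in the comparison */
}
```

Starting from i = -3, this visits -3, -2 and -1. Writing f(++i) instead would visit -2, -1 and 0, and writing i <= 0 would add a call with 0. Both variants compile cleanly, which is exactly the kind of error the interval notation rules out.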

Of course, this interval construct may not represent the absolute minimum in degrees of freedom, but to go further we would have to know more about the problem being solved. However, the example shows that even with very local knowledge we can substantially reduce work and errors if we concentrate on the degrees of freedom.

Why do we still like degrees of freedom? Why were the Fortran programmers upset when the future of their GOTOs was threatened? Strangely enough, it had to do with the lack of choice.

What I mean by this is that if we have a limited choice of tools, for whatever reason, we will rationally choose the most general one so that we can cover a greater set of problems. So the Swiss Army knife is the choice of mountain climbers. It would not be the choice of the master bookbinder. This is not because the multi-purpose tool would not work in bookbinding, but because its degrees of freedom would mostly just get in the way. The right blade has to be opened each time, and the tool could also cause “binding errors” when the general-purpose blade slipped while being used for burnishing, for example.

So the better question is: why do we programmers have a limited choice of tools at the level of language features? After all, we are not limited in creating procedure contents, names, comments, icons, error messages, and so on. Since we are more like master bookbinders than mountain climbers, we should be better equipped.

Footnote: Actually, even within these normal functions of programming languages we have some limitations: in comments I have always wanted to include sketches and diagrams, but I cannot. Names are also quite limited. The 50-year-old rule of “only the 26 upper case letters in names” is breaking down only at the glacial speed of roughly one extra character per year. In programming we can’t have more than one name for something (try to name, cite, or point to one thing or person in real life that has only one name). Also, things in programs whose names somehow become similar to other names around them can lose their identity, which is another rule that would be utterly impractical in real life. This may all sound like griping, but in fact, because we are used to restrictions in the small things, we accept the restrictions in the large.

This is further illustrated in our earlier post about the long tail of programming languages. All the languages at the top were general-purpose languages, while we found more domain-specific languages further down. Today we live in a reality where you have to pick one language and you are stuck with the abstractions of that language. The ability to tune or extend a language to get closer to the domain we solve problems in is not yet possible. And of course mixing languages leads to all kinds of complications, because our tools do not work well across languages.

Just as we need a language and a run-time to create the procedures that we want, we could have a language and a run-time to create language features. Unfortunately, languages for syntax-directed compilation (such as Yacc) did not solve this problem. They described parsers, not language features, and in effect they had “too many degrees of freedom”, which made them very difficult to use. The main difficulty, of course, was the need to design the syntax itself; remember, what we set out to create was a language feature, not a syntax. The road toward solutions to the meta-problem is truly an arduous one. But until we increase the choice of language features to match the greater variation in the application domains we face, more and more of the features we now have will at one time or another have to be “considered harmful”. That will happen as the degrees of freedom of the features exceed the degrees of freedom in the domains.


13 Responses to Feature X Considered Harmful

  1. Sam Spadafora says:

    You’ve mentioned C# a couple of times in this blog now, including a statement that it’s your favorite programming language. I presume that your forthcoming product will be written in C#.
    The comparison of current software development to how nature generates a human from DNA was most pleasing. I also liked very much your exposition in “Things I Believe But Cannot Prove” of how, e.g., today’s flight control programs are bizarrely costly.
    Why don’t you use Lisp? Lisp, of course, stands alone in terms of simplicity. I understand that IP involves dynamic alteration of the parse tree, and Lisp stands alone here too. Furthermore, the progression of C, C++, Java, C# is one of more and more machinery. Is this not antagonistic to the notion of generating software from a seed?

  2. Dear Sam,
    Re C#: It is my favorite language in the practical sense; given the current technology and considering the libraries, the development environment and costs, we are indeed using Microsoft Visual C# .NET enhanced with JetBrains’ ReSharper. In my earlier post I actually used the expression “my choice” instead of “my favorite”. We are also in the process of bootstrapping to our own system, so I can say that my real favorite is the Intentional system.
    Re Lisp: This is a very interesting point. Lisp is historically very important, and intentional software is more similar to Lisp with Lisp macros than to any other well-known language. But Lisp is over 40 years old now and it was born in very modest circumstances (the list primitives CAR and CDR stood for the contents of the address part and the decrement part of a register on the IBM 704). So Lisp used its data model for three purposes:
    1. to represent the program – this is the heritage that Intentional primarily builds on.
    2. to be the syntax for editing the program text
    3. for the run-time environment
    The unification of these concepts was, and has remained, very profound, and it placed Lisp way ahead in its application to difficult problems. However, today we enjoy more resources and face a greater variety of problems. For this reason, intentional software separates the notation from the representation (using editable projections of the program representation), and also separates the issues of the run-time environment (using generative programming).
    This of course means that the system is much more complex than Lisp. But I feel that simplicity is not the ultimate goal (otherwise we would all be working on Turing Machines ;-); being close to the problem domain is the goal. So we need to be able to track the complexity of the domains as closely as we can. If we are more complex, we get the harmful “superfluous degrees of freedom”. If we are less complex, we will have to model the domain complexity with the implementation primitives. Then we incur not only the conceptual costs of the inherent complexity but also the modeling overhead. This will show up in the program sources, and in run-time debugging, as a jumble of primitives instead of the underlying domain structures.
    So my complaint with C# is not that it is too complex, but that its complexity cannot track the domain.

  3. Franz Bell says:

    I’m really interested in the forthcoming Intentional Software products. When do you plan to make them available to the public? Will there be an Early Access program?
    thanks, Franz

  4. Magnus Christerson says:

    Franz,
    Thank you for your interest in the Intentional System. We are not yet ready to discuss timing of our product releases, but we will start to discuss more of the technology and how it applies to the problems we see in the industry. Part of that is this blog, so keep watching here.
    And, yes, we do intend to run Early Access programs.

  5. Werner Schulz says:

    I disagree with the proposition that dataless objects are harmful.
    As a useful classification, objects can be divided into three groups: entities, value objects and services. The first have identity but their data content varies over their lifetime, value objects are immutable and have no identity, while services are neither. There are many cases where I want to encapsulate a service (the simplest ones are algorithms) but where I need more than a procedure.
    Secondly, the old argument about modules comes up again. Modules are not sufficient! They only provide a namespace, but otherwise they still degenerate into global functions and data. Just look at VB6, Modula and Fortran 90 modules. They don’t solve the problems, but classes/objects do.

  6. Charles Simonyi says:

    Dear Werner,
    I think your classification for object use is an example of a useful refinement that one can superimpose on the implementation language. The point is that most (all?) languages do not make this distinction in a natural way. Do you put comments in your code or use a naming convention to distinguish “entities”, “value objects” or “services”? Are there uniform ways in the IDE to communicate this to other people in the project, to enforce consequences, or to tune the distinctions when necessary? This is where language workbenches will be indispensable. Today Java or C# simply say “class”, and your intentions and distinctions are either lost or relegated to comment status, where it is impossible to process them (unless they are in XML, which creates other problems).
    The old argument was indeed “classes vs. modules” and classes had to win because they did more. The new argument will be “classes with information hiding and with many other things” in a way where the intention will be clearer.

    You say that, in contrast to Lisp, you separate notation from representation, and you separate issues of the runtime environment. I don’t understand what you mean by the latter (wrt runtime environments). Could you explain this in more detail?
    I think I understand what you mean by the former: Basically, you want to add more syntactic variations than Lisp allows you to have. However, I wonder what problems this effectively solves. I think that the semantic closeness to a problem domain is much more important than the notation one uses, right?
    Are you aware that there exist a number of very advanced and modern Common Lisp implementations that fit nicely in today’s programming tasks? See http://lisp.tech.coop/implementation for more information…

  8. Magnus Christerson says:

    Dear Pascal,
    Lisp was a powerful and influential innovation, as it unified notation, representation and runtime environment around a simple construct: lists. This is a very powerful model, and of course it’s easy to build more sophisticated structures like trees around this model. As long as your domain and your data are easily represented in lists, things are wonderful. But in the more general case, many notations and representations require different data structures. Think, for example, of mathematical formulas, a spreadsheet, graphical images, or for that matter SQL, C#, HTML or XML, either as the program or as the data. Yes, they could be done in lists, but for many uses it would be impractical and not the optimal structure, I think.
    By separating the notation from the representation and also from the runtime environment, we introduce more degrees of freedom to optimize for the domain and the required implementation. As you correctly point out, semantic closeness to a problem domain is very important, so we need the possibility to optimize there. And for practical purposes, the notation is similarly important, especially when we engage non-programmers. Tables might be the most natural notation for accountants, for example. And compatibility with other run-time implementation structures is also important in today’s heterogeneous and interconnected environment. We really hope to have the Lisp cake and eat it too ;-).
    And thanks for the list of Common Lisp environments – seems like the Lisp community is still healthy. Unfortunately I did not see Intellicorp’s KEE environment on the list, a wonderful Lisp based environment I used heavily in graduate school.

  9. Dear Magnus,
    Thanks for your reply. Here are some minor comments.
    Your response seems to imply that lists are the only data structure available in Lisp. This is, of course, not the case. ANSI Common Lisp provides classes, structs, all kinds of numbers, extensible characters, single- and multi-dimensional arrays, strings, hash tables, streams, etc., as part of the standard. These data types do not feel “alien” in Lisp either.
    I don’t really buy the argument that notation plays an important role, not even for non-programmers. If you replace textual with graphical representation, there are typically no complaints that such representations don’t reflect “traditional” notations either, so people seem to be able to adapt. However, only time will tell who’s right in this regard. ;)
    Thanks for the hint to KEE. Apparently, that tool was for an older dialect of Lisp. One of the steps that Common Lisp took was that it switched from dynamic scoping to lexical scoping, which was an important step forward. For that reason, you typically only find Common Lisp and Scheme as the Lisp dialects used nowadays, apart from a few exceptions, because they are the only ones that took lexical scoping seriously. Unfortunately, this indeed meant that some good ideas were thrown out as well. Let’s hope that they will be properly rediscovered…

  10. Hello Pascal,
    We take your point that Lisp has greatly evolved from its early days.
    As for the notation question, we will see. I think there is a spectrum here, so it is not black and white. Charles often tells the story about how the WYSIWYG metaphor emerged (and I’ll ask him to write something about this here). I remember using pre-WYSIWYG text editors and typesetting systems. You could certainly produce nice-looking documents with the commands available (italic, bold, etc.), but it was much easier once WYSIWYG editors became mainstream. For younger readers, it’s like comparing hand coding to a good web design tool. The coding tools can typically control more properties (degrees of freedom, as we discussed here http://blog.intentionalsoftware.com/intentional_software/2005/05/feature_x_is_co.html). But for most users the higher-level WYSIWYG design tools are easier to use, and therefore make it accessible to many more users, just like the WYSIWYG text editors did.

  11. schemeboy says:

    The Lisp people ask “why don’t you use Lisp?”
    I think it is a waste of time to ask this question: those who know and understand what Lisp is about use it. People who don’t know it stick to what they know.
    Asking this question is a bit like watching someone in a pool of water thrashing while doing the dog paddle and asking, “why don’t you do the crawl instead?”

  12. Magnus Christerson says:

    Lisp still, after more than 35 years, has a large and active user base which shows its excellence; an excellence we think should, and could, be made available to a broader audience.