Posted on 16 Comments

Delivering provably-correct code

A while ago on this blog I mentioned the idea that a programmer’s job is not to deliver code, but to deliver provably correct code. Twitter responses ranged from strongly negative (wrong on so many levels I don’t know where to start) to mildly negative (not completely wrong, but incompletely right). There were no positive responses.

The converse of deliver provably correct code is deliver crap code, as I read it. Yet, it seems unlikely that is what those people meant to say. Unfortunately, no one followed up to explain their interpretation. I can only speculate that my choice of words has different implications for them than it does for me.

Since no one wants to explain their interpretation of the phrase, I’ll explain mine.

The key point is that everyone who has a stake in the product needs to feel confident that the code works. Different stakeholders will have different levels of interest in this, and will want assurances of it at different times. To that end, we would like to have reliable, repeatable, easy-to-use, and possibly even automatic means to give people confidence in the code. These methods should not depend on the physical presence of the original author. (Here ends the short version.)

The phrase deliver provably correct code comprises words that are subject to interpretation. The word programmer is also subject to interpretation. Here are my interpretations of the words.

Programmer

Software development calls for the application of several different skill sets. These include such things as analysis, user experience design, data architecture, solution architecture, software design, programming, testing, documentation, deployment, and user training. In very simple cases, a single individual can handle all the development work. This is probably not feasible in an effort of any significant complexity.

When I use the word developer I am thinking of a person who performs one or more of these activities, and who may specialize in as few as one of them. When I use the word programmer I am thinking of a developer whose primary area of specialization is designing and writing the code. Other developers may specialize in different areas, such as software testing, database design, network security, and others. Usually, a subset of these development skill sets is required to deliver a solution. Here we are concerned specifically with the programming skill set.

Deliver

In the context of the phrase, deliver provably correct code, the word deliver means to make your code available to someone else. Your way of working will determine who “someone else” is and at what point they see your code. Here is a non-exhaustive list of possibilities for illustration:

  • Traditional environment, old school methods. You work alone. There is no continuous integration. There may or may not be a version control system. If there is, then programmers only check in their code when they believe it is “finished.” No one else sees your code until you declare it is complete enough to be seen. The point you are ready to show your code (or check it in) is your delivery point.
  • Collaborative environment, mixed methods. You mainly work alone, but engage other team members as needed. Programmers may check in incomplete code from time to time, but this does not rise to the level of continuous integration. Testers may informally check some of the code before it is formally handed off, on request. Each point when you check in code or show it to a team mate for early testing or for technical collaboration is a delivery point, for purposes of this definition.
  • Highly collaborative environment, multi-skilled team members, frequent crossing of “role” boundaries. You mainly code in pairs. Integration and testing are frequent, if not nearly continuous. Team members routinely check in code throughout the day and use feedback from the build to guide their work. You deliver code nearly continuously.

Regardless of exactly how you go about producing code, any time you deliver the code other interested parties need a simple and low-effort way to be confident it is correct, even (or especially) if you are not in the office at the moment.

Correct code

For purposes of this definition, correct refers to functional correctness, and possibly to some basic non-functional requirements. It does not imply the code is production-ready. It is the level of correctness we can expect a professional programmer to provide. It is the level of correctness the code must have before it can be exercised properly by software testing specialists. In a collaborative development environment, it is the level of functional correctness your team mates have a right to expect when they check your code out from version control. Minimally, this includes:

  • Code behaves according to expectations through a representative set of sample executions, given both valid and invalid input values, at least at the unit level.
  • Code handles border conditions properly.
  • Code handles unexpected (but typical) events such as exceptions in a way consistent with the team’s or organization’s standards.
  • Code complies with any relevant standards or guidelines for naming, branding, structure, and interaction with other systems in the environment.
  • Code deals with well-known technical issues that are not usually explicitly stated in application requirements, but that are expected in professionally-developed code, such as preventing buffer overflow exploits, etc.
  • Code supports any relevant governance requirements and regulatory requirements.
  • Code provides any relevant “hooks” into enterprise facilities such as auditing, business activity monitoring, or security systems.
  • Code satisfies basic non-functional requirements insofar as this can be validated using the tools normally available to programmers.

Whichever of these aspects is relevant to the code you are writing must be in an acceptable state each time you deliver code. If you deliver once a month, you can get away with deferring the non-functional aspects and doing them just prior to delivery. On the other hand, if you are working in a collaborative, rapid-delivery mode, then you will have to learn to build support for these aspects incrementally.

Provable

For purposes of this definition, provable means there is a way for interested parties to assure themselves the code is in acceptable condition at any time. We want this to require a minimum of effort on their part.

Earlier I described the means we use to prove code correctness as “reliable, repeatable, easy-to-use, and possibly even automatic.” When people have to expend additional effort to prove the correctness of the code, they will tend to overlook details or to neglect to do it at all. We want our “proof” to be as painless and mistake-resistant as possible.

What you don’t write, you need not prove

The most automatic, most repeatable, and most reliable way to produce correct code is to avoid writing it altogether. Generally, we can assume code that other people wrote has already been vetted. We only need to worry about code we write ourselves. Depending on the development tools we use, some portion of the solution may be provided without any need for custom code. Those portions of the solution are automatically valid. Different kinds of tools provide different levels of pre-proven code:

  • code generator
  • service orchestrator
  • framework
  • callable service
  • third-party library
  • language features

Code generator

For very straightforward applications with little or no unusual functionlity, it may be possible to use a code generator to produce a complete or nearly-complete solution. For example, Oracle’s Application Development Framework (ADF) can produce a fully-functional CRUD application based on an Oracle database schema. It offers configuration options that allow for customization, and the generated application can be extended with custom code if necessary. Only the custom code has to be “proven” explicitly.

Even custom applications usually contain quite a bit of boilerplate code. Many development tools can generate some or most of this code. Tools like Visual Studio for .NET, IntelliJ and Eclipse for Java, and Ruby Mine for Ruby can produce certain common blocks of code automatically. We need not “prove” this generated code is correct. An example of this is Eclipse’s feature to generate accessor methods for Java classes.

Service orchestration

When reusable components are implemented (or wrapped in) Web Services or RESTful services, some applications (or portions of applications) can be constructed by orchestrating selected services. Products like Microsoft’s BizTalk provide tools to assemble existing services into a meaningful sequence of execution. Tools like Juju support the same approach using cloud services. (This is not meant to be an exhaustive list of products or a recommendation.) When the nature of the problem lends itself to this approach, it may be possible to create a useful solution with a minimum of custom code by orchestrating existing services.

Frameworks

When our solution aligns with a well-known reference architecture, we can usually find frameworks designed to support that architecture. For example, many business applications these days are based on Web technologies. The Model 2 architectural model — a variant of model-view-controller in which the controller and view are tightly coupled — is a popular reference architecture for Web-based business applications. Quite a few frameworks exist that support Model 2, such as Rails, Django, Spring MVC, and JSF, to name just a few.

Provided we take care to keep our application code loosely coupled with our framework-interaction code, we can “prove” our application code without burdensome dependencies on the framework. It also falls to us to prove our application interacts properly with the framework. We need not prove functionality the framework supports directly.

Callable services

Sometimes a portion of our solution requirements can be supported by a third-party component that exposes a public interface. Examples include Web Services, RESTful services, and various forms of RPC-based services. These are loosely coupled with our application code, making it easy to isolate our custom code for explicit “proof” while trusting the third-party components have already been vetted by their owners.

Examples include the Weather Channel’s API for access to weather information, described at http://portal.theweatherchannel.com/, and Google Maps web service APIs described at https://developers.google.com/maps/documentation/webservices/. Any organization that uses a service-oriented architecture will have its own services that are available to application developers, typically providing implementation-independent access to enterprise information assets. Earlier technologies for component-based solutions may still be in use in some environments, as well. These may not be entirely implementation-independent.

Libraries

Third-party libraries that provide utility functions offer a slightly finer-grained level of pre-proven code than services. Libraries typically contain several functions that relate to the same general area of software, such as object-relational mapping, XML processing, mathematical and statistical functions, transforming and formatting data, or wrapping a low-level communication protocol.

With respect to “provability,” libraries have the same general considerations as services. That is, we need not prove the functions work, but we do need to prove our own code behaves properly regardless of what happens when we call a function. Libraries are not implementation-independent, like services. They are packaged in whatever manner works with the given programming language — .NET assemblies, Java packages, Ruby gems, .dll or .so files, etc.

Language features

Programming languages offer various levels of assurance that code written in the language will be “correct” for some definition of correct. Obviously, no programming language can “know” and guarantee the intent of an application. But languages that have strong type systems can provide a level of confidence about correctness.

For example, Idris (see http://idris-lang.org/) supports a strong type system that can assure certain kinds of code correctness. Edwin Brady, the creator of Idris, explains that “dependent types allow types to be predicated on values, meaning that some aspects of a program’s behaviour can be specified precisely in the type.” By virtue of the fact you are writing in a language that supports dependent types, you automatically have a certain level of “proof” of correctness with no additional effort on your part.

Some categories of problems may be solvable using this kind of type system alone, but it has limited application in the business application domain. Even so, similar reasoning applies to more-conventional statically-typed languages, in a much more limited way. A Java collection declared as follows

    List myList = new ArrayList();

might contain objects of any type. If our application requires the collection to contain objects of type Pokemon, then we have to go to some special effort to prove it will always contain objects of that type. We will also have to include logic in the solution to deal with cases when the list contains some other type, and in turn we will have to prove that code is correct. On the other hand, if we declare the list this way

    List<Pokemon> myList = new ArrayList<Pokemon>();

then it is not possible for the application to add any other type of object to the list. We need not “prove” this is true because it is guaranteed by a feature of the programming language.

Proving the correctness of code you can’t avoid writing

Sooner or later, you will arrive at a part of your new application that no one else has already written for you, and that cannot be generated automatically by a tool. When you deliver that code, it has to be provably correct, but it won’t happen automatically. You have to do something to make it so.

Early in the history of computer science, people became interested in the idea of using computer programs to prove the correctness of other computer programs automatically. Seminal papers by Robert Floyd in 1967 and C.A.R. Hoare in 1969 gave the field its start. There is too much material to summarize in a blog post. Please allow me a bit of an oversimplification — if we think of a program execution as a state machine, we can make assertions about the state of the machine before and after the program executes. A formal approach to this makes the process mechanical and therefore programmable. Because they are programmable, formal proofs of this kind can be made automatic. If you want to pursue the formalities further, I suggest the following resources as a starting point:

On a practical level, what we can get from all this when we’re in the middle of a development project in a corporate IT shop is a couple of types of tools for proving the correctness of code.

One type of tool performs a static analysis of the code that can determine whether the code is semantically valid. It consists of a grammar that defines how statements can be codified, axioms that are formulae asserted to be theorems, and inference rules that define mechanical ways to derive new theorems from old ones.

The other type of tool dynamically exercises the code by executing it and making assertions about the state of the system before and after the execution. This seems to be of more practical use in a business software development context than the static analysis type. It does not necessarily prove the code will handle literally every possible input value, but it can go beyond proving just the semantic validity of the code.

Both static and dynamic proofs are expressed as Hoare Triples. The standard notation looks like this:

    {P} C {Q}

where P asserts the precondition, Q asserts the postcondition, and C is the code to be verified. An example might look something like this (not a real language):

    { a > b } 
    begin max(int a, int b) {
        if a >= b 
            return a
        else 
            return b
        end
    end
    { result == a }

This became the basis for development techniques called design by contract and test-driven development, as well as providing the underpinnings for automated unit testing tools. These techniques and tools give us exactly what we need to deliver provably correct code.

Using design by contract, each function or method has a well-defined calling protocol known as a contract. If the type system of the language we are using is robust enough to define the contract completely, we are in good shape. Otherwise, we include assertions in the code to check that callers have adhered to the contract. For example, consider a Java method whose purpose is to average two numerical values.

We specify a contract that the input values must be positive and non-zero floats. The type system of the Java language can guarantee that the input values are floats, but cannot guarantee that they will be positive and non-zero values. We declare the method in a way that exploits the type system as far as it will take us, and we write assertion statements to fill in the rest of the contract:

    public float average(float value1, float value2) {
        assert value1 > 0 : value1;
        assert value2 > 0 : value2;
        return ( (value1 + value2) / 2);
    }

The Java language allows assertions to be enabled or disabled at compile time. We might decide to enable assertions in the test environment and disable them in production for performance reasons. Alternatively we might decide to keep the assertions enabled in production for safety.

Many languages support assertion statements to specify invariants, and something similar to the example above might work just fine, but the example is not ideal for Java. It is not recommended to use assertions to validate method arguments in Java (see this guide for a discussion). One reason is that when an assertion is not satisfied it will throw AssertionError, which is not very descriptive.

In Java, assertions are normally used as a last resort in the default clause of a switch statement or in the last else block in a series of if-else blocks, as a sort of safety valve to protect against proceeding with a routine using dangerously wrong inputs. They aren’t really meant to validate functionality.

We can include conditional logic rather than assert statements to verify that method arguments adhere to the contract:

    public float average(float value1, float value2) {
        if (value1 <= 0 || value2 <= 0) {
            throw new IllegalArgumentException("Arguments to average() must be positive and non-zero; called with " + value1 + " and " + value2);
        return ( (value1 + value2) / 2);
    }

It’s a bit more verbose, but possibly better when we are using Java. Other languages may not have this sort of issue. With this code in place, whenever we run the application it will throw InvalidArgumentException if client code invokes the average() method with inappropriate arguments. That exeption gives us a more precise clue than AssertionError about what’s wrong.

We can provide a higher degree of confidence in the code by verifying this behavior at a finer level of detail. We can write unit test cases to ensure the average() method behaves according to expectations. Unit test cases can be written after the fact or they can be used to drive development of the production code (test-driven development, or TDD).

If you are writing fresh code, consider using TDD. If you are working with existing code, you may need to refactor to break dependencies so that it becomes feasible to write properly-isolated unit test cases.

Unit test cases written for the JUnit test framework might look like this:

	
    @Test
    public void average_returns_the_average_of_two_floats() throws Exception {
        assertEquals(5.0f, average(2.5f, 7.5f), 0.5f);
    }

    @Test(expected=IllegalArgumentException.class)
    public void average_throws_when_first_argument_is_zero() throws Exception {
        average(0.0f, 7.5f);
    }

    @Test(expected=IllegalArgumentException.class)
    public void average_throws_when_first_argument_is_negative() throws Exception {
        average(-1.0f, 7.5f);
    }

    @Test(expected=IllegalArgumentException.class)
    public void average_throws_when_second_argument_is_zero() throws Exception {
        average(2.5f, 0.0f);
    }

    @Test(expected=IllegalArgumentException.class)
    public void average_throws_when_second_argument_is_negative() throws Exception {
        average(2.5f, -1.0f);
    }

The same general idea applies to other Java unit test frameworks and to unit test frameworks for other languages.

This approach provides the repeatability and automation we want, both for checking the basic behavior of the code and for serving as a unit-level regression test suite to help us with future enhancements.

Exploit well-known software design principles

There are various software design guidelines that help us craft clean code. Possibly the most popular set of guidelines for object-oriented design is known as the SOLID principles (see Robert C. Martin’s paper at http://www.objectmentor.com/resources/articles/Principles_and_Patterns.pdf). At the heart of all these ideas is the notion that we should strive for high cohesion and low coupling.

Keeping that basic principle in mind helps us structure our code in a way that makes it easier to deliver provably correct code. One situation in particular is very common — using a framework of some kind in conjunction with custom code. When we use a framework, our application consists of three (at least) distinct types of code:

  1. Custom code that we write ourselves;
  2. Functionality provided directly by the framework; and
  3. The “glue” between 1 and 2.

The functionality provided directly by the framework is code that someone else has kindly written and tested for us. We don’t have to prove it works.

Our custom code is hand-written, and there are no shortcuts for proving it works. A set of well-considered, automated unit tests seems to be the best way to prove the correctness of this sort of code. The unit tests should cover all the issues that are not provided automatically for us by the language’s type system or by our development tools.

The “glue” code should also be covered by automated unit tests, using test doubles for any framework components so that the unit tests can run outside the framework’s usual execution environment. Running the tests outside the normal execution environment means we know the reason for a test failure will be that the code does not behave as expected, and not that we configured the runtime environment incorrectly or some remote resource outside our direct control happens to be unavailable. It also enables unit test cases to run very fast, so that we can include many of them in each build that the continuous integration server starts.

The reason I call out this situation in particular is that I have seen quite a bit of code in the field in which the application logic is tightly intertwined with the “glue” code — webapps that commingle input validation logic with the code to move values in and out of framework-specific containers, database apps that embed product-specific SQL statements directly in the application code, functions or methods that include calls to static methods that are only available in a framework’s runtime environment, etc.

This sort of tight coupling makes it all but impossible to write fine-grained unit test cases that can run independently of the framework’s runtime environment. By implication, it means programmers have no practical way to deliver provably correct code. It is important to maintain loose coupling between the application logic and the “glue” code for interacting with the framework.

Here is a Java example that proves the correctness of the “glue” code in a Portlet filter class, as well as a little bit of custom functionality. A Portlet filter must call the doFilter() method on the FilterChain object that is passed in (glue). Our filter sets an attribute value on the RenderRequest object that is passed in (custom functionality). We assume the framework knows how to instantiate and pass those two objects (framework functionality), so we don’t explicitly verify that behavior.

A Portlet container is not required to run the test because the framework components RenderRequest, RenderResponse, and FilterChain are fake. The test cases demonstrate that the custom code we wrote to interact with the framework does in fact interact with the framework in the way we expect. It’s okay that the framework components are fake because our immediate purpose is not to verify the functionality of the framework itself.

public class SampleRenderFilterTest {

	private SampleRenderFilter filter;
	private RenderRequest request;
	private RenderResponse response;
	private FilterChain chain;

	@Before
	public void before_each_test_case() throws Exception {
		filter = new SampleRenderFilter();
		request = mock(RenderRequest.class);
		response = mock(RenderResponse.class);
		chain = mock(FilterChain.class);
		filter.doFilter(request, response, chain);
	}

	@Test
	public void it_calls_doFilter_on_the_filter_chain() throws Exception {
		verify(chain).doFilter(request, response);
	}

	@Test
	public void it_sets_the_value_of_request_attribute_foo() throws Exception {
		verify(request).setAttribute("foo", "bar");
	}

}

This is the code under test:

public class SampleRenderFilter implements RenderFilter {

	@Override
	public void destroy() {
	}

	@Override
	public void init(FilterConfig arg0) throws PortletException {
	}

	@Override
	public void doFilter(RenderRequest request, RenderResponse response, FilterChain chain) 
			throws IOException, PortletException {
		request.setAttribute("foo", "bar");
		chain.doFilter(request, response);
	}

}

Summary

Depending on the nature of the solution and the development tools in use, we can choose from a range of different ways to provide high confidence in the basic behavior of the code. Code generators, the type systems of programming languages, frameworks, standardized services, and libraries can provide pre-validated building blocks we can use to construct our solutions. When it comes to proving the correctness of code that we write with our own hands, we have to take special measures such as writing automated unit tests. Each of these things takes us part of the way toward provably correct code.

Given the characteristics of our working context, we want to choose the lightest-weight and least-intrusive methods of proof that make sense. We want it to be as easy as possible for people to prove the correctness of the code at any time they need to do so.

An invitation

It’s possible that even after all this explanation, people will consider the idea of delivering provably-correct code incompletely right at best. I invite them to help me make the explanation more completely right, keeping in mind the intended scope of the statement — that is, not intended to assure the code is absolutely production-ready, but just that it is clean enough for others on the team to use with confidence even when the original author is not available for consultation.

It’s possible that people will consider the whole idea to be wrong on so many levels that they don’t know where to start criticizing it. I invite them to imagine, for a moment, that I am their team mate, QA lead, project manager, or key stakeholder, and to explain how they propose to give me confidence that their code is correct, if not by using one or more of the means described here.

I look forward to learning.

16 thoughts on “Delivering provably-correct code

  1. Hi Dave, one problem is that nobody has ever (that I know of) written a formal specification for something like a web service. This would be an interesting exercise, but one that will be quite difficult.

    A second problem is that most of the time when people need code to be written, they don’t really know what they want–certainly not to the level that would permit them to specify it in math. This is probably a bigger problem than the first one.

    A third problem, also significant, is that I’m not sure that people value correctness enough that they are willing to pay for it. Let’s say you can get pretty good code (passes test cases, etc.) in three months for $10,000 and you can get proved-correct code in nine months for $50,000. Which would you choose if you’re trying to deploy a product like right now? In my experience the actual multiplier is much larger than what I’ve implied here.

    In summary your goals are good but I think you will notice a large gap between your example specifications and what the spec for a serious real-world API would look like. I think the kind of thing you envision will happen (at least in some industries, but probably not web services) but it’ll take another few decades of research to make it possible.

    1. Hi John,

      Thanks for your thoughtful comments.

      * Formal specification for a web service.

      For Web Services specifically, the formal specification is the WSDL. Are we thinking of two different things?

      * Specifying a program in math

      Agreed. Mathematical methods of proof can apply to implementations of mathematical algorithms. Some people believe it should be possible to prove business applications mathematically, too. To me, that seems like the “perfectly elastic spherical horse in a vacuum” thing. Even if it’s theoretically possible, it would take too long to be practical, and would be significantly more difficult than the application itself.

      The idea of specifying a program in math did not occur to me. I can’t visualize how one would do that.

      * Willingness to pay for correctness.

      “Let’s say you can get pretty good code (passes test cases, etc.) in three months for $10,000” – that meets my definition of provably correct, actually.

      Applying well-known software engineering principles and using techniques like TDD doesn’t extend development time as compared with hacking up crap code, so the cost of initial development is about the same either way. To compare apples to apples, we have to factor in the costs of support, enhancement, and ultimately replacement of the solution when it is no longer maintainable.

      * “I think the kind of thing you envision”

      I must not have expressed my ideas clearly, because the post does not describe an abstraction that I envision, but rather a practical reality in everyday work. It’s just not a reality in all organizations, unfortunately, because many programmers don’t take care to deliver clean code.

      * “will happen (at least in some industries, but probably not web services)”

      Okay, now I’m pretty sure you and I are not thinking of the same thing when we read the phrase “web services”. What does the term mean to you?

      * “another few decades of research to make it possible”

      Clearly I have not expressed myself clearly. Sorry about that.

      Now I have to get back to doing, in real life, with real life specifications and APIs, exactly what I described in the post, for a real life client who cares about both correctness and costs.

      Cheers,
      Dave

  2. Hello Dave.

    The words “provably correct code” have a precise meaning: code that you can mathematically prove to be correct. Why do you want to dilute that meaning to include things like “code that’s good enough for my purposes” ?

    1. Hi Matteo,

      I don’t want to dilute the meaning of terms that are already established. What do you think of the term, “demonstratable” or something like that, instead of “provable?”

      What worries me is that (some or many) programmers (seem to) assume that the only alternative to “mathematical proof” is to hack up crap code with no thought to how others can have any confidence that it will work. How do you suggest we address that problem?

      Thanks,
      Dave

      1. How do we address that problem? Don’t we do that every day with the kind of work we do? 🙂 But seriously, I get your point. Perhaps you could say “convincingly” correct? Because what we need to do is to convince ourselves and our collegues that our code is correct. Technical reviews worked well in a team I worked with (in addition to lots of automated tests).

        1. For certain definitions of “we,” yes. Unfortunately, it doesn’t seem to be done in most organizations.

          1. Hi Malapine,

            “well-tested” is not the same as “I read this code through and through and I reasoned about it and I’m convinced it works”. For concurrent code testing may be not enough; you might want to inspect the code and reason about it. You might want to reason mathematically about it, and you may do that at different levels of depth and of formality. Even better, reasoning about the code might lead you to find ways to make it more simple so that the correctbess becomes obvious.

          2. On reflection, I think Matteo is right about “well-tested.” It does avoid the miscommunication caused by my use of the word “provable,” but it still doesn’t quite capture the core idea here. For one thing, “well-tested” is a bit subjective. How well-tested does the code have to be? For another, “testing” is not the only way to achieve “confidence” in the code. I tried to suggest (perhaps unsuccessfully) that other things besides tests can give us confidence in the code, such as strong type systems, frameworks, code generators, and even just good software engineering practices. Even such a simple thing as following consistent naming conventions can increase confidence in code while also reducing the opportunities for defects. I’m still interested in finding a descriptive term for this concept that doesn’t have any unintended implications.

      2. Without formal mathematical proof, the most honest label for your code would be “well-tested”.

        1. I like that. There may be a more-descriptive term than that, but I haven’t thought of one yet.

  3. Hi Dave, WSDL has no mathematical meaning that I know of. Until it does, we can’t do proofs about it.

    1. Hi John,

      Thanks for the clarification. I think my mistake was to use the word “proof” in this context.

  4. Dave,

    I appreciate that you’ve backed off of the “mathematical proof” concept, but I think the ideas here are still problematic. If you take your definition of provable: “provable means there is a way for interested parties to assure themselves the code is in acceptable condition at any time,” you raise a great question: How do you provide evidence to someone else (especially a non-technical person) that they should have confidence in your code?

    You don’t answer that question. Instead, we kind of just get, “write good, well-tested code,” which is a tautology that adds nothing of real value to the discussion.

    The first hard question is: how “acceptable” do you want your assurances to be, and how do you express that requirement? There have been attempts to define general rules for how to quantify code acceptability, but they have all failed. For example, there’s the concept of “100% code coverage” — that may be a helpful concept, but is problematic for reasons that others have explained very well. (see http://java.dzone.com/articles/100-code-coverage)

    The fact is: you can always do more, up to the point when you are delivering truly mathematically provably correct code. How much should you do? The answer depends on your problem domain and the costs of failure. If you’re designing a social web app, and you’re verifying your software like you did for your military drone control system, you are either doing one or the other wrong.

    True “mathematically provably correct” systems are very much a science. Subjective assurances of the “acceptable condition” of code is still (perhaps too much) an art.

    Finally, there’s a huge problem with the assumption that the libraries, compilers, runtimes and OSs that you rely upon should be “trusted” in any sense, and thus outside the realm of what you must consider when convincing yourself that your application is correct. When you write a new feature for your web application, might be actually delivered to your user via an amazingly complex combination of browser rending technology, client OS, TCP/IP, HTTP, router and switch embedded systems, server OS and virtualization software, server app runtimes, database system(s), ORM frameworks, web application framework, compiler, linker, etc, along with a (comparatively) small sprinkling of your code. The fact that your feature is coded correctly can’t be assured without appropriate attention to how it interacts in/with this complex environment. Again, the amount of attention that you need to pay to any of these depends on your confidence in your perception of the stability/reliability of those systems and how your code interacts with it, as well as the level of confidence that your domain requires.

    You mentioned that you are currently applying these techniques to your work with your client. What evidence do you provide to your client to assure them that your code is correct? A code coverage percentage? Code samples that show that you are exploiting well-known software design principles?

    1. Brandon,

      “…we kind of just get, “write good, well-tested code,” which is a tautology…”

      Yes, I can see that. To clarify, I don’t think I said (and I’m sure I didn’t intend to say) just “well-tested,” for reasons explained in another comment. It’s also true that I don’t answer the question, in the sense that I don’t prescribe a magic pill to solve the problem.

      I would observe that most of the programmers I work with in the field don’t even have a basic concept of delivering reliable code; they just hack up any old code that happens to spill from their fingertips. Given that level of audience, the “answer” I would like to suggest is that they at least think about ways they can give others confidence in the code they deliver. Many of them don’t want to write automated tests, or they don’t know how, or they think they’re such great programmers that they don’t need to; therefore, I suggest that there are other ways to provide confidence in our code. The key point is that we don’t want to shovel crap code out the door as if we were cleaning out a stable on a rainy day, using the lack of robust mathematical proofs or “delivery pressure” or anything else as an excuse to avoid every other possible way of providing confidence in our code.

      “The fact is: you can always do more…How much should you do? The answer depends on your problem domain and the costs of failure.”

      Exactly. I can’t “answer” that in a blog post. What I hope to do is suggest that it’s a question programmers ought to be asking, and to point out a few ways they can get closer to an answer that would work in their context. Once they start thinking about it, they can probably come up with more ways than I can think of on my own. I hope they can, anyway, and I hope they share those ways with the rest of us.

      “What evidence do you provide to your client to assure them that your code is correct? A code coverage percentage? Code samples that show that you are exploiting well-known software design principles?”

      All the above, although personally I tend to de-emphasize code coverage metrics. They tend to lead to gaming. I do like to use some static code analysis metrics to demonstrate code quality, such as cyclomatic complexity. Some of it is a question of (horrors!) documentation. Any organization has architectural standards, governance requirements, and possibly regulatory requirements that must be supported, as well as coding standards and so forth. That sort of thing can be centrally documented, and teams can be reminded that the “definition of done” includes those factors in addition to the functional requirements.

      Some of it is about practices. Once a team becomes accustomed to checking in code frequently and seeing the build run frequently, they become more confident in one another’s code. The two teams I’m currently working with have been accustomed to keeping code checked out for weeks at a time and trying not to work on the same modules, to avoid collisions. I’ve encouraged them to check in multiple times per day and refactor at will, and to use messy collisions as a sort of “indicator” that they might want to revisit the design of a class; it often means the class has multiple responsibilities. At first they were reticent but they are increasingly comfortable with this, and they like the results they are getting. It’s already (in weeks) shortening lead times and reducing defects.

      Pair programming and lunch and learn events in which team members share coding techniques, refactorings, and testing techniques also tend to raise mutual confidence, as team members get to know one another’s thought process and coding habits more intimately.

      You also ask: “How do you provide evidence to someone else (especially a non-technical person) that they should have confidence in your code?”

      Via direct collaboration throughout the delivery cycle, business stakeholders and managers always know exactly where we are, what’s working and what isn’t working at the moment, and why or why not. We’ve also introduced a method known as “specification by example” to express requirements in an executable form that also becomes the acceptance test suite and, ultimately, the regression test suite (on this engagement we’re using Cucumber to support this). This pulls analysts, testers, and business stakeholders together from the start of each release cycle. Stakeholders do not take delivery of some unknown chunk of “stuff” at the end of a release cycle, with no idea what’s in the box. This tends to increase confidence.

      As Matteo mentioned in another comment, many developers already pay attention to these issues in their work. The problem is not with those (statistically few) developers. The problem is with the way code is produced generally. I’m sure I don’t have to remind anyone here of the typical quality of existing code they find in the various companies where they work. The code didn’t get that way by accident. It got that way because so many programmers don’t think about what the definition of “acceptable” needs to be in their own context. It got that way because the programmers did not know basic development techniques that can help them avoid technical debt without incurring longer lead times.

      None of this rises to the level of mathematical proof, of course. The word “provable” was ill-chosen on my part. I hope that mistake doesn’t cause people to overlook the core idea – that programmers ought to pay attention to what they are doing. You might complain that I don’t spell out exactly how they should do this in all circumstances. I’m not smart enough to do that.

    2. Brandon,

      I owe you more explanation.

      “there’s a huge problem with the assumption that the libraries, compilers, runtimes and OSs that you rely upon should be “trusted” in any sense…The fact that your feature is coded correctly can’t be assured without appropriate attention to how it interacts in/with this complex environment.”

      You raise an important point. In the context of a “typical” business application development project, the responsibility of the programmers on the development team is primarily the code they write with their own hands. At the risk of sounding glib, I’ll say the following:

      [1] Why did the team (or the organization’s architecture group) select incompatible frameworks/services/tools? Who has primary responsibility for that decision? Who came up with this wacky stack that doesn’t fit together? Why should our team’s business stakeholders pay for that decision with their project budget?

      [2] There is a point of diminishing returns in testing and re-testing third-party code. We have to draw the line somewhere. Unless there is a specific reason to do otherwise (and sometimes there is), my default is to draw the line at “the code we write” vs. “the code someone else wrote.” It’s a pragmatic trade-off, not a magically-perfect answer.

      [3] In a complicated stack, there are often compatibility issues between specific versions of the various products, and we often have to apply particular patches to some of the products to make them play nicely (or grudgingly) together. In most IT shops this is a responsibility of the enterprise architecture group. “Agile” teams don’t like this, but in a large organization this is an important function, and that group will tell your team which versions of which products are supported in the company’s technical infrastructure. If your organization is small enough that individual development teams choose and install their own third-party products, then in that context it makes sense to spend a bit of time vetting the stack.

  5. My favorite “easy” way to assess code quality is to subject it to random testing.

Comments are closed.