
Verified/Proved Promised Contracts Based Programming

Fleeting

thinking about your solution to figure out its properties can give you deeper insights into the problem, leading to even better solutions

https://semaphoreci.com/blog/property-based-testing-python-hypothesis-pytest

link with TDD and BDD -> you focus on the final objective -> describe as well as possible the properties the code needs to have. The tests are only the measure of those properties over time.

consider tests as being an instrumental objective

how to organise your tests?, common structure of tests, tdd vs bdd,

the test checks that the property P is there, hence

the code has property P => the test passes

therefore the test does not pass => the code does not have property P

but also

the code does not have property P => the test does not pass

therefore

the test passes => the code has property P
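Written compactly, with P standing for “the code has property P” and T for “the test passes”, this is plain contraposition:

    (P \Rightarrow T) \;\equiv\; (\lnot T \Rightarrow \lnot P)
    \qquad\text{and}\qquad
    (\lnot P \Rightarrow \lnot T) \;\equiv\; (T \Rightarrow P)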

There should be a logical link between those. In most current situations, there are tests and code, but they are not linked to the promises.

When creating some code, one makes promises about what the code does. The tests (the verifiers) should be the way to make sure those promises still hold over time.

because docs and tests are two sides of the same coin, it would be great to have some link between them, maybe using literate programming. The source of truth would not be the test or the documentation, but this first document that links both. (curated tests)
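A minimal sketch of that idea, assuming Python and the standard doctest module: the prose states the promise, and the examples under it are the tests that get extracted from the very same document (the gcd promise is only an illustration, not part of the original note).

    """Promise: gcd returns a positive divisor of both of its arguments.

    The examples below are both the documentation and the tests: doctest
    extracts and runs them, so this document is the single source of truth.

    >>> from math import gcd
    >>> gcd(12, 18)
    6
    >>> 12 % gcd(12, 18) == 0 and 18 % gcd(12, 18) == 0
    True
    """

    if __name__ == "__main__":
        import doctest

        doctest.testmod()  # re-verify the promises told in the document above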

When a test fails, you can trace it back to the promise that no longer holds. Having a document telling the story and extracting the tests from it would ease this.

Also, this would make it easier to reach promise coverage.

Anecdote. I was being shown the code of some person who took pride in how high the code coverage was and in the number of tests. I simply tried the code in the situation that was documented. Doing so, we highlighted several bugs and got a few “yeah, I know, I need to work on that”. In that case, I think the colleague focused so much on the tests that per did not think about what promises per would need to make about the code.

I think that writing low level tests helps a lot, mostly during the development (see TDD). But people tend to write them and forget them. I think they need to be constantly revised to check whether they still verify something that makes sense with regard to what the code is promised to do. Sometimes, you write higher level tests that encompass several things at once and are fast enough to be run in all the situations the low level tests also run. In that case, the low level tests are of little importance. Even the argument of “one test for each tiny piece of information” does not hold in my experience,

  1. I have never seen a case where I could not guess the low level issue behind a failing higher level test,
  2. the low level test is generally so precise that you don’t even understand why you tested that in the first place.

There are so many ways the code may behave that we know we cannot follow all those paths. We have to make choices about the paths we ought to test. Using promises helps a lot in doing so. Focusing too much on the low level tests makes people lose sight of the gestalt and of the paths that actually make sense.

When one works on a piece of code, one needs to be clear about how per code will be used. Those promises become the contract that the person ought to respect. For instance, if you write a whole application, the contract is towards the end user. Testing the libraries should be done only if there is confidence that it gives insights on whether this high level contract is respected or not. On the other hand, when working in a big company on a big program, you might end up working on only a library of the whole stack. In that case, I think knowing the bigger picture helps making good decisions, but the contract is about the library itself, with whoever will use it.

In other terms, it is the interface with whoever will use your work that matters, not the lower level details that build those interfaces.

In my mind, this reasoning leads to the same kind of pyramid of tests, with unit tests, integration tests and so on. Only that having this reasoning helps you avoid being right for the wrong reasons, and then make better decisions when the test stack becomes so big that it cannot pass in a few seconds and some architectural choices need to be made w.r.t. testing.

For example, it makes total sense to scan some function thoroughly if it is part of your promise, as in this property-based test:

    from math import gcd  # import added so the snippet runs on its own
    from hypothesis import given, strategies as st

    @given(st.integers(min_value=1, max_value=100),
           st.integers(min_value=-500, max_value=500))
    def test_gcd(n, m):
        d = gcd(n, m)
        # sketch of the properties the promise could state (added to complete the snippet)
        assert n % d == 0 and m % d == 0

https://semaphoreci.com/blog/property-based-testing-python-hypothesis-pytest

Yet, it makes much less sense if this function is only part of some bigger stack that you know won’t use most of its values.

It’s like having E -> f -> F: you could test all the possible values of E, or only the subset E’ that matters for your promises.
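A hypothetical sketch of that difference with Hypothesis (the function f and the bounds of E’ are made up for illustration): the only thing that changes is the strategy, i.e. which part of E you commit to.

    from hypothesis import given, strategies as st

    def f(x: int) -> int:
        # stand-in for the low level function; its body does not matter here
        return x * x

    # testing all of E: every integer Hypothesis can produce
    @given(st.integers())
    def test_f_on_E(x):
        assert f(x) >= 0

    # testing only E', the subset the interface can actually reach
    @given(st.integers(min_value=0, max_value=100))
    def test_f_on_E_prime(x):
        assert f(x) >= 0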

Tests should not be done per se; they should be done because they measure how much we satisfy some promise about some interface. The real question is “Is this promise held about the interface with the user of my work?” and, by question substitution, it becomes “How many tests and how much code coverage do I have?”. I think it is important to work on the latter (as a heuristic) while being able to focus on the former when hard decisions are needed.

If one has the sensible habit of always testing the code per writes, it is easy to have a high code coverage value in the end. But when the code is not part of the paths triggered by the interface anymore, then this coverage mostly covers dead code.

Also, you may write tests to check some paths that could theoretically happen when using the library, but cannot structurally be reached while using the interface.

There should be a link between the interface and the tests so that you can answer the question “why does this test exist?”. The answer should be “this test is useful because it verifies the promise X”.
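One hypothetical way to make that answer explicit in the test suite itself, using a custom pytest marker (the marker name and the promise wording are invented for this sketch, and the marker would need to be registered in the pytest configuration to avoid a warning):

    import pytest
    from math import gcd

    # hypothetical convention: every test carries the promise it verifies
    @pytest.mark.promise("gcd returns a divisor of both of its arguments")
    def test_gcd_divides_both():
        assert 12 % gcd(12, 18) == 0
        assert 18 % gcd(12, 18) == 0

Running pytest -m promise would then select only the tests that claim a promise, and a test that cannot be given one becomes a question to ask.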

Gherkin helps in the sense that it gives a high level way of telling the promises the interface makes, written in collaboration with the three amigos. There should be a way to link those to lower level tests.

  1. state a proposition about the code that is important. This is generally a high level story.
  2. decide how you will measure that this proposition is true
    1. either run the whole story,
    2. or run something slightly shorter, but with an explanation (proof) of how it entails the proposition,
    3. or run several slightly shorter things, but with an explanation (proof) of how those, together, entail the proposition.

Doing so, you hopefully will get used to maintaining the proofs as well as the code running them, and the logical construct “Proposition1 => Test1 ^ Test2 ^ … ^ Testn” is written down. That way, you can safely say that if any of Test1 to Testn fails, then the proposition is false, and hope for the equivalence.
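A hypothetical sketch of what writing that construct down could look like (the proposition wordings and the test names are made up for illustration):

    # "Proposition => Test1 ^ Test2 ^ ... ^ Testn", written down explicitly:
    # if any listed test fails, the proposition no longer holds.
    PROOFS = {
        "gcd returns a positive divisor of both arguments": [
            "test_gcd",                      # runs the property directly
        ],
        "the documented install story works end to end": [
            "test_install_from_scratch",     # these two together entail
            "test_first_run_after_install",  # the whole story
        ],
    }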

docs and tests are two sides of the same coin

When you write the doc, you try to educate the reader about what the code promises to bring to them.

When you write the tests, you try, as automatically as possible, to make sure that those promises hold.

example with clk

After a small change of only 3 lines of code:

  1. only two lines were covered by tests, hence 66% according to sonar’s new code definition
  2. sonar cloud tells me I don’t pass the sonar quality gate
  3. my system 1 tells me to create a test to cover that line and keep going
  4. but my system 2 tells me to wonder why this line exists.
  5. I end up writing a full document1 to explain the feature, why it is there and how it is meant to be used.
  6. in the process I realized that there were a bunch of issues that I did not spot. Even the two lines of code that were covered were completely wrong. The story I wanted to tell did not actually work unless I fixed the code at a few more locations. This is the commit I ended with.
  7. in the end, it took a long time to write, but it was worth it, because
    1. I could clarify the promise I wanted to make,
    2. I could fix other parts of the code,
    3. It felt right to make a promise about the code and make sure it goes as intended. I felt like I did not fall into the trap of measure-objective inversion and the feeling good bias.



  1. I don’t have permalinks for that. Hopefully it won’t lead to a 404 error. If so, please let me know. ↩︎