What programming strategies/good practices can be adopted to minimize the risk of software regression?
Questions that can help guide:
- what "gotchas"/bad-practices can fool the programmer and let a bug or performance regression go unnoticed?
- how to manage the workflow for this?
- what programming strategies and what tests can facilitate this process?
The answers do not need to be detailed; they can be a "checklist" of best practices, a starting point for interested parties to go deeper into the subject. But of course, the more complete the better!
Context: I ask because I'm participating in the development of more complex packages in R, and this, which I previously thought was not complicated, is proving difficult to manage! PS: There is a question like this on the English SO specifically for R, with a very interesting thread. I first thought of asking the question specifically for R. However, since it seems to me that this can be treated as a general programming question, and the R community on SOPT is still small, I'll leave it this way for now.
The term "software regression" means "software bug regression" or "fault re-emergence". On a daily basis we also use the term "side effect": we apply good doses of correction, but the sick software still suffers, precisely from some harm caused by the correction.
If we use the notion that the software " goes forward " with fixing bugs, then we can say that the software " goes backwards " with the inclusion of side-effect bugs; which gives us a more refined sense of "regression".
Complex software is not that different from the human body, and programmers are no different from doctors with years of training and experience: they will try to avoid side effects, but they will not be free of them. It is a "systemic effect"; it is in the nature of complex systems… As with a Rubik's Cube, situations arise where we try to fix one side but compromise another.
Are these like hidden demons, so that the perfectionist programmer will never have peace of mind? The first tip for the unsuspecting reader is this: only worry about the subject in complex systems, that is, in situations where you have lost the detailed view of the whole.
Incidentally, this tip already points to a preventive approach… Maybe the problem doesn't even have to exist (!), if we are able to modularize the program.
Before suspecting systemic problems, best practice suggests organizing the system into modules with as much decoupling as possible. It is the "divide and conquer" heuristic. A system of well-isolated, decoupled modules will not be complex if none of those modules is complex.
Despite being "in the nature" of the complex system, there are two fundamental approaches for nonconformists to deal with the fact (investing in one is usually enough):
Isolate and use the homologation version in production (see "Concepts" section below). Trust, but with suspicion: the "stable version" requires quarantine. Users must use the new version in production in a separate ("homologation") environment, or be aware that they are on the "new version" (and be ready to restore the old one). Homologation would not be just "the customer approves", but "the customer uses it for a while and then says they approve"… All this because, in most work environments, software testing is not taken very seriously; people only really "test" once they are in production.
Example: in web software, offer a group of more experienced users a separate address for the version under homologation (already running against the production database).
Simulate the usage environment, where assertions can be made massively and automatically: any "stable software" can be monitored and have its inputs and outputs recorded to serve as a "memorial of good behavior".
This is possible in theory; in practice, the more sophisticated the user interface, the more difficult it is to monitor. The POST and GET log of a webservice, for example, can be stored. It is necessary to certify and filter only the items in that log that can be considered "good examples of how it should work". I have done this by building huge XML files and then simulating the use of the webservice. It takes a lot of work, but it is an almost perfect solution (!).
PS: there is no point in talking about coverage tools or non-regression testing without such a log. Massive assertion checking is the methodological basis of any simulation approach.
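The "memorial of good behavior" idea can be sketched as a small replay harness: certified input/output pairs are recorded while the stable version runs, and every new version must reproduce them. A minimal sketch in Python, with a hypothetical `discount` service standing in for the real webservice (names and log format are illustrative assumptions, not part of the original answer):

```python
def replay_log(entries, handler):
    """Replay certified input/output pairs (the 'memorial of good
    behavior') against `handler`; return the entries that no longer
    match.  In real use, `entries` would be loaded from the stored
    POST/GET log (JSON, XML, ...)."""
    failures = []
    for i, entry in enumerate(entries):
        actual = handler(entry["input"])
        if actual != entry["expected"]:
            failures.append({"entry": i,
                             "expected": entry["expected"],
                             "actual": actual})
    return failures

# Hypothetical "stable" service whose good behavior was recorded earlier:
# orders above 100 get a 10% discount.
def discount(order):
    return round(order["total"] * 0.9, 2) if order["total"] > 100 else order["total"]

# Two certified examples, previously filtered from the raw log.
log = [{"input": {"total": 50},  "expected": 50},
       {"input": {"total": 200}, "expected": 180.0}]

print(replay_log(log, discount))  # → [] (empty list: no regression)
```

An empty result means the new version still honors every certified example; any non-empty result is a candidate regression to investigate before release.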
Of course, equally important, as these practices require investment: be sure to invest in good documentation, demonstrations, and team support. As @GuilhermeBernal recalled, there is also the practice of "peer review", which likewise raises the reliability of critical algorithms.
PS: in the particular case of the R language (which seems to be @Carlos's challenge), which is very math-oriented and allows the functional programming paradigm, it is worth investing in the "mathematical proof" of each critical algorithm. Algorithms with proofs do not need systematic testing… In high-reliability contexts (military, aeronautical, banking applications), proof is more important than testing.
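When a full formal proof is impractical, a lightweight middle ground is to keep an obviously correct reference implementation (the "specification") next to the optimized one and check that they agree. A sketch in Python (the idea carries over directly to R's functional style); `sum_fast` is the kind of algorithm whose correctness is provable by induction, and the exhaustive check on a small domain documents that property:

```python
def sum_spec(n):
    """Reference specification: obviously correct, possibly slow."""
    return sum(range(1, n + 1))

def sum_fast(n):
    """Optimized closed form, provable by induction on n."""
    return n * (n + 1) // 2

# The check stands in for (or accompanies) the written proof.
for n in range(1000):
    assert sum_fast(n) == sum_spec(n)
print("sum_fast matches its specification on 0..999")
```

The specification never changes between versions, so any divergence introduced by a later "optimization" is caught immediately.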
(Human and managerial side)
Ah, although it is obvious, it is worth reminding especially the client or the boss who asks you for deadlines: if you are going to need homologation later, never depend on something that has not been homologated yet. It is important to "put your foot down" and press those responsible for the homologation (or for building the test logs) to see it through. It is important to share your responsibility with the testers (in a contract, if possible!). A stable building is built on stable foundations. "Regressions of regressions" are common when the programmer has no voice on the team, or when testing is a mere formality.
User side: some "side effects" arise from a lack of notice to users that "something has changed" and that the change requires them to change their way of working as well. In this case it is not the programmer's fault, but a failure to update the software's operation manual, or a failure of communication with users.
Psychological side: we often overlook cases of "acceptable side effect", when the side effect is rare or its harm is no worse than the software before the fix. We cannot be complacent: document it and put it on the bug list. Murphy's Law says it will resurface (regression of the regression!) and do worse damage if you do not fix it.
(Addendum included after noticing a disparity of terminology in the discussion.) A brief dictionary of the terminology used, and a personal view of the context.
Types of failure (of interest within the present scope): software failures and requirements failures. The formulation, analysis, and documentation of requirements is part of the software development process, and results in what we generically and loosely call "requirements". Flawed requirements will lead to flawed software. If the requirements are reasonable, we can talk about software failures, also called bugs.
Bugtracking and new requirements : tools like Bugzilla or the same community interfaces like Github's issue tracking allow you to accurately document and assess bugs and new requirements (requests for new features).
Bug fix: we use the term "fix" somewhat confusingly, also covering the notion of "adding new functionality" (satisfying a new requirement request). For practical purposes, I will adopt this terrible habit in the present text.
Reliability foundations: I will assume (ignoring other theories for practicality) that there are only two ways to make software more reliable: testing it after it is done, and "demonstrating" (mathematical proof), at each step, that the algorithm meets the requirements, under some black-box condition or against some high-level formal description. In other words, test and proof are the only ways.
Version control: I will use the term "version" to designate only "software after a bug fix"; let's ignore "fork versions". Version control is exercised by source code management software such as git.
Test version : let's call "alpha version" the one the development team is testing, and "beta version" the one that a select group of users are testing (only testing, not yet producing)…
Homologation vs production: "production" is when the software has been accepted, is stable, and is in use by everyone. "Homologation" (approval) is a term often confused with others. In the present jargon, what Debian calls a testing release I would call a "homologation release"; in Debian, what has already been homologated and is "in production" is called the stable release. In many development environments there is no distinction between testing and homologation. One of the proposals in this answer is precisely to make that distinction.
The purpose of so-called "regression testing" is to make sure that a software change (bug fix) does not introduce new faults or side effects. Strictly speaking, it is the same as making many assertions (item 2 of the Answer section), but in practice it is a dedicated application (which simulates a user) or the test team (real users) that does this. It has a much more black-box testing profile.
Another important thing in this type of test is the mapping of modules to the features tested by the end user: test first, or more insistently, the modules most closely coupled to the modified module.
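Prioritizing "the modules most closely coupled to the modified module" can be automated if the team maintains a dependency map. A minimal sketch in Python, with a hypothetical map (module names and structure are illustrative assumptions); `affected_by` returns the changed module plus everything that depends on it, directly or transitively:

```python
# Hypothetical dependency map: module -> set of modules it depends on.
deps = {
    "billing": {"pricing", "db"},
    "pricing": {"db"},
    "reports": {"billing", "db"},
    "ui":      {"reports"},
}

def affected_by(changed, deps):
    """Modules to retest first: `changed` plus every module that
    depends on it, directly or transitively."""
    hit = {changed}
    grew = True
    while grew:
        grew = False
        for mod, uses in deps.items():
            if mod not in hit and uses & hit:
                hit.add(mod)
                grew = True
    return hit

print(sorted(affected_by("pricing", deps)))  # → ['billing', 'pricing', 'reports', 'ui']
```

Everything outside the returned set can be tested later or less insistently, concentrating the regression-test effort where a side effect is most likely.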
When it comes to new features (not fixes), the assertions can become more complex, as there is no set of previously homologated outputs to compare against. In this case, diff tools can help to compare the "new" and "old" outputs.
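For the "old vs new outputs" comparison, the standard library's `difflib` already produces a usable diff. A sketch with hypothetical report outputs (the data and file labels are invented for illustration):

```python
import difflib

# Hypothetical outputs of the same report under the stable version
# and under the candidate (new-feature) version of the software.
old_output = ["id,total", "1,50", "2,180"]
new_output = ["id,total", "1,50", "2,180.0"]

diff = list(difflib.unified_diff(old_output, new_output,
                                 fromfile="stable", tofile="candidate",
                                 lineterm=""))
print("\n".join(diff))
```

An empty diff suggests (but does not prove) behavioral equivalence; each non-empty hunk must then be classified by a human as "intended change" or "regression".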