- If it's not in the repository, it doesn't exist.
- If it's not running under continuous integration, it's broken.
These two maxims come to me through long and painful experience, which I'd like to pass on to you, in hopes that your learning process will be less long and less painful.
If it's not in the repository, it doesn't exist
The first maxim, "if it's not in the repository, it doesn't exist," is something I first learned while writing LARPs, but it is just as true in scientific projects or any other form of collaboration. For any project I'm working on with other people, I always, always set up some sort of shared storage repository, whether it be Dropbox, Google Drive, Git, Subversion, etc. If something matters, it needs to be in that repository, because if it isn't, there are oh-so-many ways for it to get accidentally deleted.
More importantly, however, anything in the repository can be seen by other people on the team, which means there's some accountability for its content. I can't count the number of times that somebody has said they're working on something, but it's just not checked in yet, and then it turns out that they weren't working on it at all, or they were working on it but it was terrible and wrong. Some of the worst experiences of my professional life, like nearly-quit-your-job level of painful, have involved somebody I was counting on failing me in this way. If somebody's reluctant to put their work in the team repository, well, that's a pretty good hint that they are embarrassed by it in some way, and thus that their work might as well not exist.
Share your work with your team. Even if it's "messy" and "not ready," insulate yourself from disaster and give people evidence that you are on the right track---or a chance to help you and correct you if you aren't.
If it's not running under continuous integration, it's broken
The second maxim, "if it's not running under continuous integration, it's broken," appears on the surface to be more specific to software. Continuous integration is a type of software testing infrastructure in which, on a regular basis, a copy of your system gets checked out of the repository (see Maxim #1) and a batch of tests is run to see whether it's still working. Typically, continuous integration runs both every time something changes in the repository and also nightly (because the external systems it depends on might have changed).
This makes a lot of sense to do for software, because software is complicated. When you improve one thing, it's easy to accidentally break another as a side effect. Building tests as you go is a way to make sure that you don't accidentally break anything (at least not anything you're testing for). If you don't test, it's a good bet that you will break things and not know it. Likewise, the environment is always changing too, as other people improve their software and hardware, so code tends to "rot" if left untouched and untested over time. So if you don't test, you won't know when it breaks, and if you don't automate the testing, you won't remember to run the tests, and then everything will break and it will be a pain.
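To make "building tests as you go" a little more concrete, here is a minimal sketch in Python using the standard `unittest` module. The function `parse_amount` and its behavior are hypothetical, invented just for illustration. The point is that each test pins down one thing the code should keep doing, so a later "improvement" that breaks it gets flagged instead of slipping through unnoticed.

```python
import unittest


def parse_amount(text: str) -> float:
    """Hypothetical helper: turn a string like "$1,234.50" into 1234.5."""
    cleaned = text.strip().replace("$", "").replace(",", "")
    return float(cleaned)


class ParseAmountTests(unittest.TestCase):
    # Each test pins down one behavior. If a later change to
    # parse_amount breaks any of them, the test run fails and says which.
    def test_plain_number(self):
        self.assertEqual(parse_amount("42"), 42.0)

    def test_currency_symbol_and_commas(self):
        self.assertEqual(parse_amount("$1,234.50"), 1234.5)

    def test_surrounding_whitespace(self):
        self.assertEqual(parse_amount("  7.25 "), 7.25)
```

Saved as, say, `test_amounts.py` (a hypothetical filename), this runs with `python -m unittest test_amounts`, which is the kind of command a continuous integration server would run on every change and every night.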
Surprisingly, I find that this applies not just to software, but to pretty much anything where there's a chance to make a mistake and a chance to check your work. Whenever I analyze data, for example, I always make sure that I automate the calculation so that I can easily re-run the analysis from scratch, and then I add "idiot checks" that give me numbers and graphs that I can look at to make sure that the analysis is actually working properly. Things often go wrong, even in routine experiments and analyses, and if I put these tests in, then I can notice when things go wrong and re-run the analysis to make it right. I fear that these checks sometimes annoy my collaborators, because they find embarrassing problems, but I'd much rather have a little bit of friction than a retraction due to easily avoidable mistakes in our interpretation of our experiments.
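Here is a minimal sketch, in Python, of what an automated, re-runnable analysis with idiot checks might look like. The data format (a list of `(group, value)` pairs, with `None` for a missing measurement), the function names, the plausible range of 0 to 100, and the 10% limit on missing values are all assumptions for illustration; real checks would depend on what the experiment actually measures.

```python
import math
from collections import defaultdict


def run_analysis(rows, plausible=(0.0, 100.0), max_missing=0.1):
    """Hypothetical analysis: mean measurement per sample group.

    `rows` is a list of (group, value) pairs, where value may be None.
    Because everything is computed from the raw rows, the analysis can
    be re-run from scratch whenever the data or the code changes.
    """
    by_group = defaultdict(list)
    missing = 0
    for group, value in rows:
        if value is None:
            missing += 1  # skip missing measurements, but count them
            continue
        by_group[group].append(value)
    means = {g: sum(v) / len(v) for g, v in by_group.items()}
    counts = {g: len(v) for g, v in by_group.items()}
    check_analysis(len(rows), missing, means, counts, plausible, max_missing)
    return means


def check_analysis(n_rows, missing, means, counts, plausible, max_missing):
    """'Idiot checks': cheap tests that catch an analysis gone wrong."""
    if n_rows == 0:
        raise ValueError("no data at all; is the input file empty?")
    # Lots of missing values usually means a parsing or file problem.
    if missing / n_rows > max_missing:
        raise ValueError(f"{missing} of {n_rows} rows are missing values")
    low, high = plausible
    for group, m in means.items():
        # A NaN usually means bad input slipped through.
        if math.isnan(m):
            raise ValueError(f"NaN mean for group {group!r}")
        # A value outside the plausible range hints at a units or parsing error.
        if not low <= m <= high:
            raise ValueError(f"implausible mean {m} for group {group!r}")
    # Print a summary to eyeball, alongside any plots.
    print(f"rows: {n_rows}, missing: {missing}")
    for group in sorted(means):
        print(f"{group}: n={counts[group]}, mean={means[group]:.2f}")
```

Raising an exception rather than using `assert` keeps the checks active even when Python runs with optimizations (`python -O`), which strips `assert` statements.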
Even my personal finances use tests. In my spreadsheets, I always include checksums that add things up two different ways so that I can make sure that they match. Otherwise I'm going to make some little cut-and-paste error or typo and then have some sort of unpleasant surprise when I figure out I've got ten thousand dollars less than I thought I did or something like that.
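One common way to add things up two different ways in a spreadsheet is cross-footing: total each row, total each column, and confirm that everything agrees. Here is a minimal Python sketch of that check, assuming a rectangular grid of amounts stored in integer cents (so floating-point rounding can't make honest numbers disagree). The function name `cross_foot` and the data layout are hypothetical.

```python
def cross_foot(grid, row_totals, column_totals):
    """Cross-footing check, like a spreadsheet's checksum cells.

    grid: rectangular list of rows of amounts, in integer cents.
    row_totals / column_totals: the total cells as they appear in the
    sheet, which may have been mistyped or pasted from the wrong place.
    Returns a list of problems; an empty list means everything agrees.
    """
    problems = []
    # Way 1: recompute each row total and compare with the sheet's value.
    for i, row in enumerate(grid):
        if sum(row) != row_totals[i]:
            problems.append(f"row {i}: cells sum to {sum(row)}, total says {row_totals[i]}")
    # Way 2: recompute each column total and compare with the sheet's value.
    for j, column in enumerate(zip(*grid)):
        if sum(column) != column_totals[j]:
            problems.append(f"column {j}: cells sum to {sum(column)}, total says {column_totals[j]}")
    # The grand total computed from rows must match the one computed from columns.
    if sum(row_totals) != sum(column_totals):
        problems.append(f"grand totals disagree: {sum(row_totals)} by rows vs {sum(column_totals)} by columns")
    return problems
```

An empty list means the sheet agrees with itself; a single typo in one total shows up both as a bad row (or column) and as mismatched grand totals.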
Check your work, and check it more than one way, and add a little bit of automation so that the checks run even when you don't think about them. It takes a bit of extra time and thought, and it's easy to neglect it because it's hard to measure disasters that don't happen. I promise you, though, investing in testing is worth it for the bigger mistakes that you'll avoid making and the crises that you'll avoid creating.