By Geoffrey Hart
Previously published as: Hart, G. 2021. Backing up your data... and other important things (part 2). https://www.worldts.com/english-writing/398/index.html
In part 1 of this article, I discussed how to back up your computer data to protect it from theft, sabotage, or accidental loss. In this part, I’ll discuss non-data things you may not have considered but that are equally essential to your work. In some cases, losing these things may set you back years in your research as you try to repeat the work you performed to create them.
“Implicit” knowledge represents information that has become so much a part of your standard thought process that you are no longer consciously aware of its importance. This includes things as simple and basic as the knowledge that you should not pipette caustic or toxic materials using your mouth or the knowledge that you should lock your lab door before returning home at the end of the day. The knowledge can also be more complex and difficult to identify without help from someone who isn’t intimately familiar with your work, such as the reason you use a specific chemical extraction method instead of published alternatives. A new graduate student in your lab is an excellent person to help you identify such knowledge, since they have not yet worked with you long enough to learn why you work in a certain way without requesting an explanation. Document this knowledge in your laboratory procedures manual to ensure that it will not be lost if a member of your research group retires or moves to another university or institute.
Implicit knowledge also includes any lessons you have learned during your research. These may be details of changes in a standard methodology, or things such as learning that a sample size of 10 individuals is inadequate because of high variation in the study population; experience may have taught you that a minimum sample size of 20 individuals is necessary to increase the chance of detecting statistically significant results. Other lessons include considerations related to field studies. For example, if you’re studying part of a commercial forest that is being managed for timber production, you must find ways to ensure that your study plots are not accidentally harvested. You might install warning signs or fences, and each time you go to that study site, stop at the managing forester’s office to say hello and remind them about your research. Similarly, if you’re studying an agricultural system, talk with the farmer who is providing a field for your study to ensure that they don’t change their cultivation methods without first discussing the change with you. In both cases, there are usually ways to negotiate with your cooperator to protect your investment in those study sites.
More explicit and easily identifiable knowledge includes details of your research and analytical methods. Even when these are based on clear and standardized publicly available protocols, there are usually modifications you have made based on your experience. For example, a chemical analysis may require small changes in the quantities and concentrations of reagents to account for unique properties of a specific study organism. You will often see this appear in journal papers with words such as “the method of Hart (2020), with small modifications”, but often there are no details about what those modifications are. Report those modifications in your papers so that this implicit knowledge becomes explicit and other researchers can benefit from what you’ve learned. Also record these modifications in your lab or research protocol manuals so that all members of your research group will use the same methods.
Many research groups spend considerable time optimizing how their hardware and software work. This often results from slight modifications to hardware or software settings for analytical equipment, most often with expert input from the manufacturer. For example, a particular model of instrument may exhibit calibration drift faster or more often than other models, so that it’s necessary to calibrate it more frequently than the user manual says or to use a different calibration standard. This knowledge may not be included in the instrument’s user manual, or may be present but may not be read by a new graduate student who has used similar equipment in another lab. Specifically defining a calibration method and schedule and displaying it prominently near the instrument will increase the likelihood that this method is used consistently by your whole research group.
Software customizations are often developed to save time or improve the results of an analysis. Sharing these shortcuts with members of your research group improves everyone’s productivity and efficiency. This category of information includes scripts developed for statistical software that cause the software to perform a standard series of tests (e.g., to test for homogeneity of variance before performing ANOVA) or to perform data-cleaning measures (e.g., to identify and highlight outliers). It may even include methods of detecting data-entry errors for data that is not initially created as a computer file (e.g., data that is entered manually rather than recorded by a datalogger). Examples include interviews with human subjects, in which the interviewer often records the responses on paper. One simple but excellent way to detect data-entry errors is to have two members of your group independently enter the data into a computer file, such as an Excel worksheet. You can then make Excel compare the data by subtraction: if the result of subtracting the value in worksheet 2 from the corresponding value in worksheet 1 is not equal to 0, at least one of the two values is incorrect, and returning to the original data will let you find and correct the problem before it affects your analysis.
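The double-entry check described above can also be scripted outside Excel. The following is only a minimal sketch in Python, assuming that the two people saved their entries as comma-separated text with identical layouts. The function names `cells_match` and `compare_entries` and the sample plot data are hypothetical and purely illustrative.

```python
import csv
import io


def cells_match(a, b):
    """Compare two cells by subtraction if both are numeric, else as text."""
    try:
        return float(a) - float(b) == 0
    except ValueError:
        # At least one cell is not a number (e.g., a header or a plot label).
        return a.strip() == b.strip()


def compare_entries(first_csv, second_csv):
    """Return (row, column, value_1, value_2) for every cell that differs."""
    rows_1 = list(csv.reader(io.StringIO(first_csv)))
    rows_2 = list(csv.reader(io.StringIO(second_csv)))
    if len(rows_1) != len(rows_2):
        raise ValueError("The two entries have different numbers of rows")
    mismatches = []
    for row_number, (row_1, row_2) in enumerate(zip(rows_1, rows_2), start=1):
        if len(row_1) != len(row_2):
            raise ValueError(f"Row {row_number}: column counts differ")
        for column_number, (a, b) in enumerate(zip(row_1, row_2), start=1):
            if not cells_match(a, b):
                mismatches.append((row_number, column_number, a, b))
    return mismatches


# Hypothetical data typed independently by two group members.
entry_1 = "plot,height_cm\nA,12.5\nB,17.0\nC,9.8\n"
entry_2 = "plot,height_cm\nA,12.5\nB,11.0\nC,9.8\n"

for row, column, a, b in compare_entries(entry_1, entry_2):
    print(f"Row {row}, column {column}: {a!r} vs {b!r}")
```

The script only reports where the two entries disagree. Deciding which value is correct still requires returning to the original paper record.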
Note: Computer scientists use the phrase “garbage in, garbage out” to refer to the consequences of failing to ensure that your input data is of high quality. Many graduate students never learn to carefully confirm the quality of their input data. This is something you should teach each new member of your research group. Lives may be at risk, not to mention your professional reputation.
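As one illustration of a scripted data-quality check, the sketch below flags values that lie far outside the middle of a data set so that someone can review them. It uses the common convention of 1.5 times the interquartile range, which is an assumption chosen for this example rather than a rule from this article. The function name `flag_outliers` and the sample measurements are hypothetical.

```python
import statistics


def flag_outliers(values, k=1.5):
    """Return (index, value) pairs lying more than k * IQR beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [(i, v) for i, v in enumerate(values) if v < low or v > high]


# Hypothetical measurements; 125.0 may be a mistyped 12.5.
heights_cm = [12.5, 11.9, 13.1, 12.2, 125.0, 12.8, 11.7, 12.4]

for index, value in flag_outliers(heights_cm):
    print(f"Check entry {index}: {value}")
```

A flagged value is not necessarily wrong. The point is to prompt a comparison with the original record before the value enters your analysis.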
So far, I’ve focused on intangible things (things you cannot touch) such as data and procedural information. But it’s also important to create backups for physical things. For example, you may have spent years breeding a specific genotype of an organism (e.g., mice with specific genetic properties that make them useful in physiological or pharmaceutical research) or specific bacterial or fungal strains that you isolated from a forest during your field research, and that show promise (e.g., because they improve plant survival in harsh environments). You may also develop specialized hardware that is essential for performing your research.
Although carefully documenting what you’ve done to develop these materials is important, it won’t help if your lab is destroyed by an accident (e.g., a fire in a neighboring lab) or a disaster (e.g., a hurricane or tsunami). You may be able to repeat a 5-year development process in only 2 years, but that’s still 2 years of lost time—time when you are unable to do any new research. This is particularly important if you live in a place such as California that is highly vulnerable to severe wildfires and earthquakes, or in a coastal area that is vulnerable to hurricanes (typhoons) or flooding. Some of these problems are nearly impossible to predict; for others, you can predict in general terms that such phenomena will increase in frequency in response to global climate change.
I have read stories of graduate students rushing to their laboratory as a hurricane approaches, so they can bring home all of their laboratory subjects (including aquariums full of exotic ant species, mice undergoing various medical treatments, and petri dishes full of microorganisms). In an emergency, that may be the best that you can do. A better solution is to be proactive and avoid the need for such an emergency response: identify colleagues who can store and preserve your materials so that you can retrieve those materials quickly if your copies are lost. You can offer to reciprocate by storing their materials. This often takes the form of international cooperative efforts, such as seed banks, but you can also do this on a small scale, with exchanges between individual labs. Multiple backups, distributed across countries or continents, are even better, since it’s possible that one cooperator will be unavailable (as was often the case during the 2020–2021 COVID-19 pandemic).
If you don’t already have a comprehensive list of the implicit knowledge, explicit knowledge, intangible things, and physical things that support your research, consider making time for your research group to hold a special meeting to document these things. Most will not be copied and preserved by your employer’s computer staff, so you’ll need to find a way to protect them yourself.
©2004–2025 Geoffrey Hart. All rights reserved.