Geoff-Hart.com: Editing, Writing, and Translation

Analyzing your data (part 3 of 3): presenting your data

By Geoffrey Hart

In part 1 and part 2 of this three-part article, I described how to explore your data to see what you’ve discovered and how to rigorously analyze the data to confirm your preliminary interpretations. In this concluding part, I’ll discuss how to present your data to show your readers what you’ve discovered and convince them your interpretation is correct. Here, the primary goal is to present data that support your conclusions, and to choose a sequence of results that creates a compelling argument in favor of those conclusions. Think of this as organizing your thoughts and your argument in a persuasive way, not as describing the mathematical and statistical methods used in your analysis; those details belong in the Methods section.

Standardizing data

Note: Although terminology varies, normalization usually refers to transformations that are intended to produce a normal distribution, whereas standardization is intended to account for different initial values in different treatments, regardless of their statistical distribution, or different units of measurement for two variables used in a multiple regression.

In part 2 of this article, I discussed some problems that result from transforming your data. A less problematic form of transformation is to express results relative to some base value, such as the value in a control or the initial value in a time series, rather than examining only the raw data. The analysis then changes from a comparison of sample means to a comparison of changes in those means. The changes may be based on a difference (i.e., you calculate the final value minus the original value), a proportion (you divide all values by the original value), or a proportional change (you subtract the original value from the current value, and divide that difference by the original value). One popular technique is to use z-scores, which express each value as the number of standard deviations by which it differs from the mean. Other standardizations include expressing values per unit area, per unit mass, or per capita.
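
To make these options concrete, here is a minimal Python sketch of the difference, proportion, proportional change, and z-score calculations. The function names and the plant-height values are hypothetical, chosen only for illustration, and I've assumed that the first value in the series is the base value.

```python
from statistics import mean, stdev


def difference(values, base):
    """Change as a difference: each value minus the base value."""
    return [v - base for v in values]


def proportion(values, base):
    """Each value divided by the base value."""
    return [v / base for v in values]


def proportional_change(values, base):
    """Difference from the base value, divided by the base value."""
    return [(v - base) / base for v in values]


def z_scores(values):
    """Number of sample standard deviations between each value and the mean."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator)
    return [(v - m) / s for v in values]


# Hypothetical time series of plant heights (cm); the first value is the base.
heights = [10.0, 12.0, 15.0, 20.0]
base = heights[0]

print(difference(heights, base))           # [0.0, 2.0, 5.0, 10.0]
print(proportion(heights, base))           # [1.0, 1.2, 1.5, 2.0]
print(proportional_change(heights, base))  # [0.0, 0.2, 0.5, 1.0]
print([round(z, 2) for z in z_scores(heights)])
```

Note that each function answers a different question about the same four numbers, which is why you should tell readers which standardization you used.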

Note: Always provide the standard deviation or standard error with every mean, or a box plot that shows the variation around a median, so that readers will understand the magnitude of the variation in your results. Present the sample size to provide additional insights into that variation.
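
As a small illustration of this advice, the following Python sketch reports the mean together with the standard deviation, the standard error, and the sample size. The `summarize` function and the yield values are hypothetical; I've assumed the sample standard deviation (n - 1 denominator) is the one you want.

```python
from math import sqrt
from statistics import mean, stdev


def summarize(sample):
    """Return the sample size, mean, standard deviation, and standard error."""
    n = len(sample)
    sd = stdev(sample)  # sample standard deviation (n - 1 denominator)
    return {"n": n, "mean": mean(sample), "sd": sd, "se": sd / sqrt(n)}


# Hypothetical crop yields (t/ha) from four plots.
yields = [4.0, 5.0, 6.0, 5.0]
s = summarize(yields)
print(f"mean = {s['mean']:.2f}, SD = {s['sd']:.2f}, "
      f"SE = {s['se']:.2f}, n = {s['n']}")
```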

Standardization is a powerful way to clarify changes in a data series and differences between treatments because it accounts for factors that might bias your interpretation of those changes, such as differences in the initial value. However, as is the case in any transformation of data, you must remember to account for the consequences of the transformation. The raw values and standardized values have different meanings. For example, if only a small percentage of a region’s farmland becomes degraded due to an unsustainable agricultural practice, the percentage suggests that the impacts of that practice are not serious. The proportion is, after all, small. But if this degradation occurs over a very large agricultural area, the total area that became degraded (the percentage multiplied by the total area) becomes large and important.

Conversely, what seems like a large proportional change based on the transformed data may prove to be unimportant in practice. For example, if the survival rate for a plant disease increases from 1 in 1000 to 2 in 1000, the proportional increase is (2 – 1)/1 = 1.00 = 100%. However, that increase has little practical significance for farmers, and is as likely to result from random chance as from a successful disease treatment. An increase from 100 in 1000 to 200 in 1000 represents exactly the same proportional change (100%), but the survival of 100 additional individuals is more likely to be important.
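
One way to check the survival example above is to compare the proportional change with a test of whether the difference could arise by chance. The sketch below uses a two-sided Fisher's exact test, which is my choice for illustration and not a method prescribed in this article. It assumes that each survival count comes from a separate group of 1000 plants; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def relative_change(old, new):
    """Proportional change: (new - old) / old."""
    return (new - old) / old


def fisher_exact_two_sided(a, n1, b, n2):
    """Two-sided Fisher's exact p-value for a of n1 versus b of n2 survivors."""
    k = a + b  # total survivors across both groups
    denom = comb(n1 + n2, k)

    def prob(x):
        # Hypergeometric probability that x of the k survivors are in group 1.
        return Fraction(comb(n1, x) * comb(n2, k - x), denom)

    observed = prob(a)
    lowest, highest = max(0, k - n2), min(k, n1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(p for p in map(prob, range(lowest, highest + 1)) if p <= observed)


print(relative_change(1, 2), relative_change(100, 200))  # 1.0 1.0

p_small = fisher_exact_two_sided(1, 1000, 2, 1000)
p_large = fisher_exact_two_sided(100, 1000, 200, 1000)
print(float(p_small))  # 1.0
print(float(p_large) < 0.001)  # True
```

Both comparisons show the same 100% proportional change, but only the second one is distinguishable from random variation under these assumptions.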

Like any transformation, standardizing your data loses some information and changes the nature of the data you’re looking at. Keep those changes in mind as you decide how to interpret the changes and how to present your interpretation.

Evaluating non-significant results

Sometimes a specific experimental design fails to reveal a significant difference between treatments. Ask yourself why. For example, researchers who don’t review the literature to learn the expected magnitude of the variation before they design their experiment often choose a too-small sample size, leading to high variation in the results that can obscure differences that would be significant with a larger sample size. Alternatively, budget and time constraints may force you to use a too-small sample. In that case, you may need to present your study as exploratory, with the goal of increasing understanding of the study system so you can design a better experiment for your subsequent research.
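
To show how the expected variation feeds into the choice of sample size, here is a rough Python sketch based on the standard normal-approximation formula for comparing two means, n = 2 × [(z for alpha + z for power) × SD / difference]² per group. The function name, the default alpha and power, and the example numbers are my assumptions; a proper power analysis for your design may differ.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(sd, difference, alpha=0.05, power=0.80):
    """Approximate n per group to detect `difference` between two means.

    Uses the normal approximation for a two-sided test with equal group
    sizes and a common standard deviation `sd` (taken from the literature).
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) * sd / difference) ** 2)


# Hypothetical values: SD of 10 units, smallest difference of interest 5 units.
print(sample_size_per_group(sd=10.0, difference=5.0))  # 63

# Halving the difference you want to detect roughly quadruples the sample.
print(sample_size_per_group(sd=10.0, difference=2.5))
```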

Researchers prefer positive (i.e., statistically significant) results because journals have a strong bias against reporting negative results (i.e., differences that are not statistically significant). However, negative results can be very important, as in the case of a medicine that produces no beneficial effect. If you designed your experiment well, have carefully controlled your selection of the study population, have obtained a large dataset, have validated your data by repeatedly calibrating your instruments against lab standards, and have replicated your results, you can be more confident that the negative result is real and that the medicine is not useful.

For subjective data, such as the data generated by many sociology and psychology studies, asking a colleague to classify the results to see whether they agree with your classification increases confidence in the classification results. Where interpretations differ, you can discuss the difference and try to design a criterion that makes the classification more objective. Ideally, that criterion will help you to agree about the correct classification.
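
If you want to put a number on how well two people's classifications agree, one common option is Cohen's kappa, which corrects the observed agreement for the agreement expected by chance. This statistic is my suggestion, not something the approach above requires, and the labels below are invented for illustration.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who classified the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must classify the same non-empty set.")
    n = len(rater_a)
    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    if expected == 1:
        return 1.0  # both raters used a single identical category
    return (observed - expected) / (1 - expected)


# Hypothetical classifications of ten interview responses.
you = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg", "pos", "neg"]
colleague = ["pos", "pos", "neg", "neg", "pos", "neg", "neg", "neg", "pos", "pos"]

print(round(cohens_kappa(you, colleague), 2))  # 0.6

# Items to discuss: positions where the two classifications differ.
disagreements = [i for i, (a, b) in enumerate(zip(you, colleague)) if a != b]
print(disagreements)  # [6, 9]
```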

Additional confidence can be provided using an experimental design based on triangulation. If two methods of measuring the same variable agree, the probability that a negative (non-significant) result is an error rather than a true lack of difference is much lower. For example, you could calculate the area of a leaf using a digital caliper and an empirically derived relationship between length, width, and area, or you could scan the leaf and use software to calculate its area. Ideally, the two values will agree. Similarly, if analyses of two different aspects of the same process lead to the same conclusion, that also reduces the likelihood that the lack of significance is an error. For example, if you measure the effect of activation of a gene using both the RNA produced by the gene and the quantity of the protein produced by translation of that RNA, and both results show no significant change in the study system in response to that gene expression, you can be more confident that activation of the gene produced no significant effect.

In extreme cases, negative results may even reverse the conclusions you reached in previous research. This can be a very good thing if it improves understanding of your subject. As an example, see Sager (2020). Of course, if you want to replace the prevailing understanding of a phenomenon with a new understanding, you’ll need strong evidence, and lots of it, to convince everyone. Tell readers what additional research will be required to support your proposed new description.

Presenting datasets clearly and consistently

Help your readers follow your description of the data by choosing a criterion for judging a result’s importance. Statistical significance is one obvious criterion, but significant results may not be meaningful in practice, as in the example of proportional changes in plant survival that I described earlier in this article. Choose an appropriate characteristic of the data you are describing. For example, when you discuss the vectors for the variables in a redundancy analysis or principal-coordinates ordination, you can limit your description to only the vectors that are longer than a certain threshold length and that also lie at an angle of <30° from the axis. Other vectors may be significant, but their correlation with the axis will be weaker, and that means you can omit those vectors from your discussion. The criterion you choose tells you which results you should focus on, which is particularly important when you can’t discuss every result (e.g., in a large, multi-variable dataset).
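
The vector criterion described above can be sketched in a few lines of Python. I've assumed that each vector is given as a pair of scores on axes 1 and 2, that the angle is measured from axis 1, and that vectors pointing in either direction along that axis count as close to it. The variable names, scores, and length threshold are hypothetical.

```python
from math import atan2, degrees, hypot


def select_vectors(vectors, min_length, max_angle_deg=30.0):
    """Keep vectors longer than min_length and within max_angle_deg of axis 1."""
    selected = {}
    for name, (x, y) in vectors.items():
        length = hypot(x, y)
        # Angle from the axis 1 line, from 0 to 90 degrees, ignoring direction.
        angle = degrees(atan2(abs(y), abs(x)))
        if length > min_length and angle < max_angle_deg:
            selected[name] = (round(length, 2), round(angle, 1))
    return selected


# Hypothetical scores for four environmental variables on axes 1 and 2.
vectors = {
    "pH": (0.9, 0.2),
    "moisture": (-0.8, 0.3),
    "slope": (0.3, 0.7),   # long, but far from axis 1
    "shade": (0.2, 0.05),  # close to axis 1, but short
}

for name, (length, angle) in select_vectors(vectors, min_length=0.5).items():
    print(f"{name}: length = {length}, angle = {angle} degrees")
```

With these values, only pH and moisture meet both conditions, so those are the vectors you would describe.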

Next, choose an efficient sequence to work through the data in a figure or table. For example:

  1. In a linear regression analysis, describe the trend for each regression line separately. For example, y increased continuously with increasing x in treatment 1, but decreased continuously with increasing x in the control. Next, examine the differences between each pair of lines. Treatment 1 may have values less than those in treatment 2 up to a certain point, then achieve higher values subsequently.

  2. In a table that presents multiple variables for each treatment, describe the results for each variable, one at a time, to show how the value of that variable differs among the treatments. Then repeat this process for the next variable and the next one until you reach the end of the variables.

Follow that order rigorously until you have described all data in the figure or table that meet your criterion for what to describe.

Constraining your presentation

Be cautious about extrapolating beyond the range of your data. Your data often describe only a small portion of the total range of possible values for a variable. If that total range is much larger, extending your interpretation beyond the range of your data is risky. For example, the beneficial response to a drug often increases with increasing dosage, right up to the point at which the drug reaches toxic levels in the patient. Even if your intuitive knowledge of the situation suggests the behavior does not change for smaller or larger values, explain why you believe your assumption is valid, and suggest any cautions that are required if someone tries to extrapolate beyond your data.
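
As a simple safeguard, you can have your analysis flag any prediction that falls outside the range of the observed data. The Python sketch below fits an ordinary least-squares line and reports whether each requested value is an interpolation or an extrapolation. The function names and the dose-response numbers are hypothetical, and a straight line is assumed only to keep the example short.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def predict(x, xs, ys):
    """Return the predicted y and whether x lies within the observed x range."""
    slope, intercept = fit_line(xs, ys)
    within_range = min(xs) <= x <= max(xs)
    return slope * x + intercept, within_range


# Hypothetical doses (mg) and responses, observed only from 10 to 40 mg.
doses = [10, 20, 30, 40]
responses = [2.0, 4.1, 5.9, 8.0]

for dose in (25, 80):
    value, ok = predict(dose, doses, responses)
    note = "within observed range" if ok else "EXTRAPOLATION: interpret with caution"
    print(f"dose {dose} mg -> predicted response {value:.2f} ({note})")
```

The prediction at 80 mg is computed just as easily as the one at 25 mg, but nothing in the data tells you whether the response keeps increasing at that dose.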

Acknowledgments

I’m grateful for the reality check on my statistical descriptions provided by Dr. Julian Norghauer. Any errors in this article are my sole responsibility.

Reference

Sager, W.W. 2020. Massif redo. Scientific American May 2020:48-53.


©2004–2024 Geoffrey Hart. All rights reserved.