Doing Successful Literature Reviews

Lesson I-5

Integrating Findings

The most immediate findings from a research study are the results of the data analysis. The results might be a statistically significant difference between two groups, the portion of a population that holds a certain view, or a theme that pervades a given social situation. These results are often accompanied by conclusions, implications, predictions, and recommendations, all of which are derivative findings.

Often reviews of the literature are focused heavily, if not exclusively, on the results of the individual studies, in an effort to make the most informed inference possible about the nature of some phenomena. For that reason, this lesson will first address integrating results, and then discuss integrating various forms of derivative findings. The integration of results will be discussed separately for quantitative and qualitative studies.

The almost universal challenge when integrating results across a set of social science studies on a given topic is this: the studies vary and the results vary. Even though all the examined studies appear ostensibly to be on the same topic, they always vary at least in some respect to their contexts (social circumstances, economic conditions, etc), conceptual framework, sampled subjects or participants, treatments applied on naturally occurring interventions being studied, and methodology. They also always vary at to some degree in their results, and indeed it is common to have some results that appear to contradict others. In short, consistency is not the norm in social science, and the challenge of the reviewer is to make the best possible inferences from a set of applicable studies despite those inconsistencies.

Fraudulent Integration of Results

The are two widely used fraudulent means of integrating research results. They are used deliberately to misrepresent what the research says on a given subject:

Only the Good News Fraud: Cite several studies, or all the studies that support one's predispositions, biases, or agendas, and ignore those that contradict them.
Different Strokes for Different Studies Fraud: Acknowledge one or more studies that oppose one's interests; discredit their conceptual frameworks, methodology or other features; and then cite the studies that support one's interests, but fail to assess their adequacy by the same criteria applied to the opposing studies

These frauds are common among advocates, political pundits, and charlatans. They are also not uncommon among over-zealous graduate students. Worse yet, established scholars sometimes perpetrate them.

Means for Integrating Results Across Quantitative Studies

The results of studies can be expressed in many metrics-such as mean differences, various types of correlations, and statistical significance. Since it is rare that all research studies use the measuring instruments, one of the prerequisites for integration across studies is that the results have to be express is some common metric. Three common metrics for quantitative research are often used: the direction of the results (positive or negative), whether or not the result is statistically significant, and the effect sizes.

Whether the result is positive or negative is a very crude metric. It ignores the magnitude of the result and also whether a result from a sample could have easy occurred from random sampling error. Despite those limitations, the direction of results is often indicated or implied in research reports, and it can be revealing when examined over a full set of studies.

The statistical significance of a result is a more sophisticated measure, indicating whether a result from a sample is likely to have occurred by chance when the phenomena of interest does not exist in the full population from which the sample was drawn. But statistical significance will occur when the phenomena of interest in the population is weak but the sample size is large (over 1,000), in which case the results are usually trivial. Conversely, if the phenomena in the population is of only modest magnitude, and most of the research studies had small sample sizes (less than 50), only a few of the studies are likely to show statistically significant results. Despite these limitations, the significance levels, when used correctly can be helpful when integrating results across studies.

Effect sizes are a measure of magnitude relative to the variance in the measure. The effect size for the difference between two groups' mean values, the effect size is computed as a ratio of the difference of the two groups' mean values and their pooled variance. In other words:

where, M₁ = the mean value of the first group, M₂ = the mean value of the second group, SD₁ = the standard deviation of the first group, and SD₂ = the standard deviation of the second group.

Mathematically comparable measures can be computed or reasonably closely estimated from several other measures. Effect sizes require some work to compute, and the needed information is sometimes not indicated in research reports, but most experts agree effect size is the best metric for integrating results across quantitative studies.

The following are several means for integrating results across quantitative studies:

Portray: Describe the results, which usually varying at least some, in a manner that highlights both similarities and differences in the results. It is usually helpful to summarize key characteristics of the studies and their results in a matrix with one row devoted to each study and each column summarizing a key characteristic and the last column indicating the result expressed in a common metric. Note that the common practice of new researchers is to describe each study and its results, one after the other, with no effort to highlight similarities and differences.

Trace History: Present the results in chronological order, trying to account for possible trends in the results by changes in the research problems addressed, the conceptual frameworks applied, the methodologies used, and possibly in the phenomena itself.

Categorize: Examine how the results may become more consistent when the studies are categorized by various aspects of the research problems addressed, the conceptual frameworks applied, the methodologies used, and the contexts under which the studies were conducted.

Summarize: Compute the portion of all studies with results in the expected direction, combine the probabilities of all the tests of statistical significance, or compute the weighted average effect sizes.

Synthesize: Use regression techniques to explain and predict the variance in the results by variations in the characteristics of the studies. These varying characteristics might include the contexts of the studies, the conceptual frameworks applied, the backgrounds of the subjects or participants, the nature of the interventions (both planned and actually implemented), the quality and possibly biases of the methodologies used, and the outcomes measured.

The procedures for summarizing and synthesizing quantitative research studies have improved dramatically since the development of "meta-analysis" techniques in the 1970s. Only a conceptual introduction and some quick procedures can be covered below in this lesson. The most readable book on these procedures is:

Cooper, H. (1998). Synthesizing research (3rd ed.). Thousand Oaks, CA: Sage. The third edition was published in 1998.

When summarizing and synthesizing quantitative research studies, researchers often encounter an interesting conundrum: are the sum of results of the studies statistically significant? For example, it is common in a set of, say, 30 studies, based on samples of on some aspect of education or human resource development, to find that only about eight of the results are statistically significant. In that case many neophyte researchers are inclined to conclude the phenomena of interest does not exist. But it is also not uncommon to find that those 30 studies had, say, 24 results in the expected direction (of which eight were statistically significant and 16 were statistically insignificant) and a remaining six results were in the unexpected direction. Compare that distribution with flipping a coin 30 times-it is highly unlikely that heads, or tails, would come up 24 times. Indeed, a "sign test" using the binomial distribution or the large sample normal approximation suggests that this will happen in not more than once in 1,000 trials of flipping a coin 30 times if the phenomena of interest does not exist in the population. Thus in this example we can prudently conclude there was some difference in the population, even though most of the studies did not find statistically significant differences. (The "sign test" is discussed in many introductory statistics textbooks and the binomial distribution is often included in the appended tables.)

This is a simple but profound insight into summarizing results across studies based on samples. Summarizing by calculating the percentage of studies with statistically significant results will often provide misleading inference, suggesting the phenomena of interest does not exist when it actually does. That is particularly so when the phenomena is of only modest strength and when most of the studies have small sample sizes (below 100). Both of those conditions are common in the social sciences. Why does the percentage of statistically significant results provide misleading results? The answer is that statistical significance testing procedures put priority on avoiding inferences that the phenomena exists when it really does not, and the consequence of that priority is to raise the chances of inferring the phenomena does not existwhen it really does exist. While that trade-off is often justified when making inferences from one study, it is not necessary with making inferences across many studies.

Yes, this is complicated! If your training in statistics is limited and you don't understand the above explanation, just remember: you should never summarize results across studies by calculating the percent of results that are statistically significant.

There are several other acceptable options for summarizing results across studies in addition to examining the proportion of positive results with the sign test. Indeed, when the necessary data is available from almost all of the studies, they are superior to the proportion of positive results. The most commonly used are: combining the actual statistical significance levels achieved in every study to simulate an overall significance level, calculating a weighted average of the effect sizes from every study, and calculating a weighted average of the Pearson correlation coefficients for every study. Harris Cooper describes these procedures on pages 120-142 of the above-mentioned book.

All these summary procedures can also be applied after "categorizing" the studies by characteristics. For instance you might group the studies into those that examined younger children and older ones, a short duration treatment and a longer duration of the treatment, standardized achievement tests and performance measures. When doing that, you are seeking to determine whether these differences in the studies help to explain the variations in the results. If so, there will be less variation within each category than for the full set of studies, and some differences between the categories. Looking at the variation of results within and between categories, for one category at a time, is the equivalent of doing several different univariate analyses within a given study. That can be revealing, but it also can be misleading when the study characteristics (which become the independent variables) happen to be correlated.

A superior approach, but applicable only when there are 30 or more results, is to use multiple regression procedures to synthesize the results. This involves using several characteristics of the studies simultaneously to predict and explain the variance in the results. The procedure involves the following steps:

for each study, code the characteristics of the studies thought to perhaps affect the results (such as age of children, duration of treatment, and type of outcome measure);

for each study calculate the effect size (as defined above);

check that the assumptions for multiple regression analysis are met by the data;

run the multiple regression analyses.

Other procedures for synthesizing quantitative research results can be found in:

Cooper, H. & Hedges, L.V. (Eds.). (1994). The handbook of research synthesis. New York: Russell Sage Foundation.
Hunter, J.E. & Schmidt, F.L. (1990). Methods of meta-analysis: Correcting error and bias in research findings.

Means for Integrating Results Across Qualitative Studies

The means for integrating results across qualitative research studies are far less formalized than those for integrating across quantitative studies. There are several reasons for this. Qualitative research itself is rarely designed to make generalizations. Thus, unlike quantitative research, it is more difficult to summarize and synthesize across qualitative studies. The contexts of qualitative research are usually better described than those of quantitative research, but the methodology, other than the general approach, is often not described in any detail; the first makes it easier to categorize the studies but the latter makes it harder. The "results" of qualitative research are rarely stated as parsimoniously as in quantitative research, making it more difficult to portray, summarize, and synthesize them. Finally, considerably less attention has been devoted to integrating across qualitative studies than has been devoted to integrating across quantitative studies. Little work has been devoted to developing procedures for integrating qualitative studies.

Despite all this, all the means cited above for integrating conceptual frameworks, methodologies, and interventions can be applied to the results of a set of qualitative studies on a given topic. The following will explain some ways of doing so:

Portray: Map out analogous and disparate themes, metaphors and concepts found among the studies. Identify gaps.

Trace History: Describe how earlier results inspired further research and new results.

Categorize: Sort studies by their conceptual frameworks, methodology, or interventions, and see if the results are more consistent within each category.

Summarize: Translate the themes, metaphors, and concepts into subsuming ones that are common to a set or sub-set of the studies.

Undermine: Identify important contradictions obscured by the rich description of the studies.

Synthesize: Use a grounded theory approach to put the similarities and differences of the results into an interpretive order.

For more on approaches for integrating qualitative research, see:

Noblit, G.W. & Hare, R.D. (1988). Meta-ethnography: Synthesizing qualitative studies. Thousand Oaks, CA: Sage.

Means for Integrating Derivative Findings Across Studies

The results of studies are usually reported with accompanying conclusions, implications, and/or recommendations, which have been inferred, at least in part, from the results. Conclusions are summaries of two or more results of a study, or statements about the generalizability of the results. Implications are the larger lessons suggested about the phenomena studied. Recommendations are exhortations to action. All three are usually found in the "Discussion" section of a research article or in the last chapter of a dissertation.

It is not uncommon to find two studies with identical results accompanied by dramatically different implications, and recommendations. The converse is also true, two studies on the same topic with different results might be accompanied by identical implications, and recommendations.

The following are some ways to integrate the conclusions, stated implications, and recommendations from a set of studies on a given topic:

Portray: Describe the similarities and differences among the conclusions, the stated implications, and recommendations.

Trace History: Show how the evolving conclusions and recommendations were influenced by the larger historical context.

Categorize: Sort the large number of varying conclusions, stated implications, and recommendations into a small number of categories.

Summarize: State the conclusions, implications, and recommendations in general terms that subsume all or most of them.

Undermine: Identify and document widespread inconsistencies between the conclusions, stated implications, and/or recommendations and the results of the studies.

Synthesize: Prepare a set of differential conclusions, implications, or recommendations for different contexts and/or interventions.

Return Home or Advance to Lesson I-6

Last Update: June 29, 2000