A statistical analysis of word patterns to elicit evidence of authorship of four Gentleman's Magazine articles
George, Ronald Gordon
This thesis describes attempts to obtain objective evidence concerning the authorship of the 1750 to 1754 Gentleman's Magazine reviews of the plays Gil Blas, Elfrida, Constantine, and The Brothers, reviews tentatively attributed to Samuel Johnson by D. J. Greene and Arthur Sherbo on subjective internal and external evidence. A digital computer was used to measure factors such as sentence length, word length, and the rate of use of various words. Control samples of known Johnson works of the same period and somewhat similar subject matter were scanned in an attempt to find factors occurring at a consistent rate. When such factors were found, their rates were compared with the rates of the same factors in the test samples by means of tests of statistical significance. The shortness of the articles necessitated splicing them together to make up 2,000-word samples. This splicing, along with the small number of samples, made it difficult to reach statistically convincing results, yet some significant results appeared. A standard error test showed significant differences between the Johnson samples and every test sample in one or more of the mean, median, and third quartile of the sentence-length distributions. Significant differences in the median and third quartile of word length were similarly found for all of the test samples except the review of Elfrida. A standard deviation test applied to the sums of percentage differences, comparing inter-Johnson comparisons with Johnson-test sample comparisons, showed a significant difference for the reviews of Gil Blas and The Brothers. A standard error test applied to the mean number of occurrences of "and" found significant differences for the reviews of Constantine and The Brothers. A chi-square test applied to the most frequently occurring words showed significant differences between the consistent Johnson occurrences and the test sample occurrences for the words "and", "that", "to", "in", and "which".
Standard deviation tests were also applied to the sums of differences found when inter-Johnson comparisons were set against Johnson-test sample comparisons on the basis of various combinations of frequently occurring words. These tests produced tentatively significant differences for all of the test samples except The Brothers, but great validity cannot be attributed to these results because they are based on derived numbers. Other, less conclusive studies produced results that often were not subject to tests of significance. Graphs of sentence length sequences showed no obvious trends, probably because the samples were so short. A manual count of occurrences of the words "by" and "I" in a number of Johnson and non-Johnson samples of 2,000 words each failed to produce significant results in a standard deviation analysis. A Johnson sample and two test samples were coded and then scanned by the computer for various grammatical and stylistic factors. These studies were not continued because they were subjective and time-consuming and did not seem likely to produce valid results. Finally, a running glossary for a Fielding sample and two Johnson samples showed no marked difference in pattern when the results were plotted. The results of these studies, while far from conclusive, did provide some evidence that, in various instances, each of the test samples differed significantly from the Johnson samples, and that there is at least some reasonable ground for doubting the theory that Johnson wrote the articles in question.
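The sentence-length comparison summarized above can be illustrated with a minimal sketch. It is not the thesis's program: the abstract does not give its formulas, so the statistic below is the common two-sample form (difference of means divided by its standard error), assumed here for illustration. The function names and the naive sentence splitter are likewise hypothetical.

```python
import math
import re
import statistics


def sentence_lengths(text):
    """Return the number of words in each sentence of `text`.

    Assumption: sentences end at '.', '!' or '?'. Real eighteenth-century
    prose (abbreviations, semicolon-heavy periods) would need more care.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]


def standard_error_z(sample_a, sample_b):
    """Difference of the two sample means divided by its standard error.

    Assumption: an unpooled two-sample form; the thesis may have used a
    different variant of the standard error test.
    """
    var_a = statistics.variance(sample_a)  # sample variance (n - 1)
    var_b = statistics.variance(sample_b)
    se = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (statistics.mean(sample_a) - statistics.mean(sample_b)) / se


if __name__ == "__main__":
    # Made-up placeholder strings, not text from Johnson or the reviews.
    control = "One two three four. Five six seven. Eight nine ten eleven twelve."
    test = "One two. Three four five. Six."
    z = standard_error_z(sentence_lengths(control), sentence_lengths(test))
    print(f"z = {z:.2f}")
```

With samples as short as those described in the abstract, a statistic of this kind rests on few sentences per sample, which is consistent with the caution the abstract expresses about the strength of its results.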