In this essay, I will attempt to explain what Witztum, Rips and
Rosenberg found in their now famous Statistical Science paper.[1] I will
try to explain why their work is taken more seriously than that of other
codes researchers, and why it is still not considered to support their
conclusions.

It is somewhat misleading to call the patterns discovered by
Witztum, Rips, and Rosenberg (WRR) "codes". The word "codes" was not
used in the original WRR paper. The patterns they found were not in the
form of coded information, unlike patterns purportedly found by some
other codes enthusiasts such as Michael Drosnin. The authors of the
original WRR paper have spoken out strongly against the use of "codes"
to predict future events, and I would guess that most readers here
recognize that Drosnin's codes have no mathematical validity.

The methodology used by WRR is much more sophisticated, and does
indeed have a sound mathematical basis. The paper is quite technical,
but the concepts are not actually hard to grasp. This discussion is
intended to explain the salient points of the dispute for interested
readers with no mathematical background.

The idea of codes formed by taking letters spaced at regular
intervals in text has been around for some time. Such coded words or
phrases are called equidistant-letter-sequences (ELS). Fascinating
patterns can and have been found with such codes in all kinds of texts,
and particular attention is paid to related phrases or words which
appear close to one another in the text.

Where WRR stands out from other codes proponents is in the method
they use to test whether associated words are closer than can be
explained by chance alone.

WRR were the first to propose a mathematically based method for
measuring the significance of such patterns, and in the paper they apply
their method to determine whether the names of famous Jewish rabbis are
"close to" the dates of their birth or death.

In their paper they state how the lists of rabbis and dates were
obtained. They define a single numeric quantity which captures the idea
of how close the rabbis' coded names appear to their dates.

They then generate 999,999 other lists by associating the names of
one rabbi with the dates of another randomly chosen rabbi. With the one
semantically correct list, this gives 1,000,000 lists altogether. All
lists are then given a measure of closeness, and ranked accordingly. The
experiment considers the rank of the one correct list in a race with the
other 999,999 random lists. Four such races were conducted, using
different measures and name conventions.

The result is that the one correct list placed highly in the four
races. Specifically, it placed 453, 5, 570 and 4 out of one million.
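
For readers who find code easier to follow than prose, the race just
described can be sketched in a few lines of Python. This is only an
illustration under stated assumptions: the distance table, the function
names (closeness, race_rank, bonferroni_bound) and the use of a plain
average are mine, not WRR's, whose actual measure is far more
complicated. The last function shows one way, a Bonferroni-style bound,
of combining the best of several ranks with the number of races run.

```python
import random


def closeness(pairs, distance):
    """Average toy distance over (name, date) index pairs; smaller is closer."""
    return sum(distance[n][d] for n, d in pairs) / len(pairs)


def race_rank(distance, n_random, seed=0):
    """Rank (1 = best) of the correct pairing in a race against n_random
    pairings in which the dates are shuffled among the names.

    distance[i][j] is a hypothetical closeness score between name i and
    date j; the correct pairing matches name i with date i.
    """
    rng = random.Random(seed)
    n = len(distance)
    correct = closeness([(i, i) for i in range(n)], distance)
    rank = 1
    for _ in range(n_random):
        dates = list(range(n))
        rng.shuffle(dates)
        # Only a strictly better shuffled list pushes the correct list down.
        if closeness(list(enumerate(dates)), distance) < correct:
            rank += 1
    return rank


def bonferroni_bound(ranks, n_lists):
    """Upper bound on the chance that the best of several races ranks this
    well: (number of races) * (best rank) / (lists per race)."""
    return min(1.0, len(ranks) * min(ranks) / n_lists)


# Toy table in which every name sits right next to its own date.
planted = [[0.0 if i == j else 1.0 for j in range(8)] for i in range(8)]
print(race_rank(planted, n_random=999))  # 1: nothing beats a perfect pairing
print(bonferroni_bound([453, 5, 570, 4], 1_000_000))
```

With the four ranks quoted above, bonferroni_bound multiplies the best
rank (4) by the number of races (4) and divides by one million lists.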
The probability that a randomly chosen association of names to dates
would place 4th or better in one of the four races is 0.000016.

The authors conclude that the extent of the proximity of the ELSs
for rabbi names with the ELSs of their dates of birth or death is not
due to chance.

Though this is not explicit in WRR, the authors do suggest elsewhere
that the proximity is due to a deliberate divine encoding of this
knowledge at the time the Torah was written down.

First, let's clarify what is not wrong with the paper. It does not
contain mathematical errors. The conclusion -- that the performance of
the correct list is not due to chance -- is solid.

A large number of criticisms have been made, but one in particular
stands out from all the others, and that is the subjective nature of the
list of names and dates used in the experiment.

It is important to note that this criticism does not constitute a
charge of fraud. It is a straightforward question on the nature of the
data, and it is central to the significance of the results of the paper.
The experiment involves lists of names and dates, and attempts to
discover if dates are unusually close to the related names. The list was
constructed by first identifying 32 famous rabbis. Then for each rabbi,
a number of different names were given. The list of names is credited to
Professor Havlin.

The question of subjectivity of the list is a question of how much
freedom there was in choosing the names. Professor Havlin has recently
made his criteria for choosing names public, and they do without any
question involve a significant degree of arbitrary judgement. That is
now a matter of public record.[2]

Furthermore, the extent of subjectivity in the choice of names was
not something that referees of Statistical Science could be expected to
judge. This is especially true since it is now a matter of public record
that the published criteria for finding names were not actually the
criteria used. The real criteria involved far more freedom than is
apparent from the published paper.

WRR speaks only of a search of the "Responsa" database at
Bar Ilan University. (This is a well known reference for debates on
Jewish law.) In fact, a huge range of sources were used, and there was
no protocol established prior to construction of the list for
determining names. According to the authors of the paper, the names
reflect the professional judgement of Professor Havlin.

The list uses a great many names, up to eleven for a single rabbi.
Many other names could have been chosen, and so the lack of an
established protocol for choosing the particular names makes the list,
by definition, subjective.

A similar point applies to the dates of birth and death; a number of
variant forms were used for writing these dates, up to six in the case
of one rabbi. The choice of date forms was also a matter of judgement.
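
Why this kind of freedom matters can be shown with a toy simulation.
What follows is a sketch of my own, not anything taken from WRR or
their critics: the "distances" are pure random noise, so there is no
real relationship between names and dates at all, and the sizes (32
names, 5 variants each, 2000 shuffled lists) and the function names are
arbitrary assumptions. The only difference between the two runs is
whether the spelling variant for each name is fixed in advance or
picked, after the fact, to sit closest to that name's own date.

```python
import random


def rank_of_correct(dist, choice, n_random, rng):
    """Rank (1 = best) of the correct name/date pairing among shuffled ones.

    dist[i][v][j] is a toy distance between variant v of name i and date j;
    choice[i] says which variant of name i was put on the list.
    """
    n = len(dist)

    def score(perm):
        return sum(dist[i][choice[i]][perm[i]] for i in range(n)) / n

    correct = score(list(range(n)))
    rank = 1
    for _ in range(n_random):
        perm = list(range(n))
        rng.shuffle(perm)
        if score(perm) < correct:
            rank += 1
    return rank


def demo(n_names=32, n_variants=5, n_random=2000, seed=1):
    """Return (rank with variants fixed in advance, rank with tuned variants)."""
    rng = random.Random(seed)
    # Pure noise: no variant of any name is genuinely close to any date.
    dist = [[[rng.random() for _ in range(n_names)]
             for _ in range(n_variants)]
            for _ in range(n_names)]
    fixed = [0] * n_names  # always take the first variant, blind to the data
    tuned = [min(range(n_variants), key=lambda v: dist[i][v][i])
             for i in range(n_names)]  # take whichever variant looks best
    return (rank_of_correct(dist, fixed, n_random, rng),
            rank_of_correct(dist, tuned, n_random, rng))


fixed_rank, tuned_rank = demo()
print("variants fixed in advance:", fixed_rank)
print("variants chosen after looking:", tuned_rank)
```

In this toy setting the list with variants fixed in advance lands
anywhere in the field, while the tuned list lands at or very near the
top, even though the data contain nothing but noise. Real choices need
not be made this crudely, or even consciously, to have a similar
effect.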

In choosing a list of names for the rabbis, there are a large number
of cases where some judgement needs to be applied. If those choices were
made differently, no correlation is measured at all. This has been
verified by experiment.[3]

The implication of this is that the significance of the phenomenon
identified in the paper applies as much to the construction of the list
of names as it does to the construction of the text of Genesis.

That is, the correlation demonstrated in the paper is now known not
to be a correlation between a body of text, and the dates of birth and
death of rabbis. It is a correlation between two bodies of text. Since
one of those bodies of text is due to the experimenters, the phenomenon
found in the paper is not particularly interesting and certainly not
scientifically or mathematically relevant.

This wraps up the case against the "codes". However, I will go on
to look at a few other aspects of the experiment.

The paper by Rips, Witztum and Rosenberg contains a very curious
omission for a statistical paper, and that is the absence of a clear
hypothesis being tested by the experiment.

Usually a statistical experiment of this kind is intended to test a
hypothesis. For example, we might hypothesize that the book of Genesis
contains descriptions of the lives of various famous rabbis encoded by
reading letters in a different order.

As a hypothesis, of course, this is far too vague. A "different
order" is not something which can be tested; and indeed WRR
considers a specific kind of order. They consider ELS patterns. There
are a number of other such choices; they look for so-called minimum skip
ELS patterns, by applying a weighting function that gives reduced
influence to ELSs with longer skips. They also use an extraordinarily
complex measure of closeness; certainly not one which would be naturally
proposed by a statistician.

We do not need to delve into the actual mathematical methods used to
weight minimality, or to measure closeness. The question is: what,
exactly, are the actual measures really trying to capture? That is, what
is the research hypothesis to be confirmed by the experiment? WRR is
rather unclear on this important point.

Whatever effect is proposed as an alternative hypothesis, it should
be capable of explaining the following points.

(1) The phenomenon is not a feature of the Torah, or of the bible.
It shows up only in Genesis, and is quite absent in the other four
books. Singling out Genesis for testing must thus be added to a long
list of subjective decisions which further dilute the research
hypothesis.

(2) The measure of closeness used by WRR is sensitive to fairly
subtle changes in the relative locations of names and dates. Thus,
although names and dates in the correct list are shown to be closer on
average than in the permuted lists, it is still the case that nearly all
the rabbis are closest of all to the wrong dates.

(3) The performance of the "correct" list ranks highly in
a race with random permutations, but it does not win. That is, this is
not a "code". There are a huge number of random permutations
which perform better than the "correct" list, and so it is not
possible to identify the correct list using the reported phenomenon.

WRR plainly state that the particular formulae used to calculate
distance could be chosen differently. In their own words:

This extract illustrates a couple of points which need to be kept in
mind when evaluating the paper.

(1) There are some general principles which have been adopted, such
as "minimum skip". The paper nowhere gives any research
hypothesis that justifies this principle. They only state that they
obtain a statistical anomaly when they focus attention on minimum skip
ELSs. Since there is no clear hypothesis to justify minimum skip, this
becomes yet another subjective choice; and the significance of the
measured phenomenon is further diluted.

(2) The authors do make some testable predictions, of a kind which
would not normally be tested in pre-publication review. They suggest
that the anomalous result would be likely to persist with other distance
measures. We now know from subsequent research that the authors were
wrong. The effect reduces substantially with other distance measures
tested, which suggests that whatever agency or cause is leading to the
anomaly, it is also connected with the choice of the distance
function.[4]

(3) The authors recognize the problem presented by subjectivity.
They assert that the function was chosen before any sample was chosen,
and it underwent no changes.

(4) As a matter of fact, choosing the function before the sample only
prevents tuning of the function to the sample. But it assists in tuning
the sample to the function. What *should* have been done to verify true
independence is to either have the function chosen by an independent
person (since they concede that other functions could be chosen) and
then kept secret until after the data samples were chosen; or else the
tests should have been run on a range of functions. The latter has now
been done, as indicated above, establishing that the data and the
function are in fact strongly correlated.

The particular highly complex distance measure used in WRR is a
holdover from earlier research conducted in the eighties, in which they
attempted to calculate probabilities directly. This earlier method was
mathematical nonsense; the numbers calculated were not probabilities
at all, and the calculation made many invalid independence assumptions.
This has been pointed out by a number of statisticians.

This fact is implicitly conceded by the authors. In the WRR paper,
appendix A.5, the numbers P1 to P4 are defined, and justified in the
following terms (for example):

What is not stated here is that the original research *did* treat
the numbers as probabilities! The suggestion of using comparisons with
permutations was made by a referee, almost certainly to overcome this
defect. However, the authors did not (despite Witztum's subsequent
remarks to the contrary!) use the referee's suggestion directly. They
continued to use their rather strange "probabilities" in the
permutations race (as described above) and (unsurprisingly!) they
obtained an exceptionally good result.

Had they actually used the referee's suggestion the result would
have been orders of magnitude worse, and the paper would most likely
have been rejected as not demonstrating the purported effect.

Some other curious effects of the actual measure used in WRR are
worthy of note. It is quite brittle. The ranking of the
"correct" list can be made many times worse by changing dates
for a rabbi whose name does not even appear in Genesis as an ELS! Two of
the rabbis in the list have no associated correct dates -- and yet have
a strong effect on the result! That is, the measure used
incorporates a significant component of noise.

Also the complexity of the measurement function conceals a number of
fairly arbitrary choices intended to assist programming such a complex
function. For example (appendix A.2 of WRR), a cutoff is applied to
obtain an expected number of ELSs for a word, which was chosen to be 10.
A number of other such choices can be found in appendix A.3. We do not
need to know the mathematical significance of the number 10, or how it
is used. All we need to know is that the results have since been tested
using a range of other numbers: nearly always with a corresponding
reduction in the published significance level. A paper giving the
details of these experiments is in preparation by Brendan McKay and
others.

The authors selected a particular text of Genesis. There are in fact
a number of texts which could have been used in its place.

It is important to note that these different texts are not analogous
to the many vastly different versions of the Christian bible which are
in common use. They usually differ only in a few words or spellings.
However, such subtle differences have a powerful effect on the patterns
of ELSs studied.

Traditional Orthodox Jewish thought does consider the Torah divine
as given. There is no question of introducing minor variations to make a
"better" translation. The Torah is the Torah, and it should not be
changed. However, this ideal of a single Torah does not mean that
Orthodox Jewish thought insists that there have been no changes. There
is a long tradition in Jewish study of the Torah of focusing on textual
problems involving extra words or letters, and there is no justification
for claiming the Koren edition (used by WRR) as the one perfect edition.
When the same experiment is run on other editions, the reported effect
is always substantially reduced.

A comprehensive discussion on textual issues is provided by
Professor Jeffrey Tigay at http://www.sas.upenn.edu/~jtigay/codetext.html
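
To make the point about textual variants concrete, here is a minimal
Python sketch of an ELS search. It is a simplification of my own, not
WRR's procedure: the function name find_els is hypothetical, the text
is a toy string of Latin letters rather than Hebrew, and only forward
skips are searched. It shows that a single extra letter in the text
can be enough to destroy an ELS that spans the point of the change.

```python
def find_els(text, word, max_skip):
    """Return (start, skip) for every place where `word` can be read off
    `text` by taking letters at a constant interval of `skip` positions."""
    hits = []
    n, k = len(text), len(word)
    for skip in range(1, max_skip + 1):
        # The last letter of the word must still fall inside the text.
        last_start = n - (k - 1) * skip
        for start in range(last_start):
            if text[start:start + k * skip:skip] == word:
                hits.append((start, skip))
    return hits


# Toy text hiding TORAH at every third letter, starting at position 1.
text = "aTbcOdeRfgAhiHj"
# The same text with one extra letter, as a variant spelling might add.
variant = text[:6] + "y" + text[6:]

print(find_els(text, "TORAH", max_skip=5))     # [(1, 3)]
print(find_els(variant, "TORAH", max_skip=5))  # []
```

In the variant, the letters after the insertion all move along by one
place, so the five letters are no longer equally spaced and the search
comes back empty.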

It should be emphasized that none of the points made above
constitutes an allegation of fraud.

There has been a long string of coincidences identified, and it
would be tempting to suggest that this means the authors carefully made
each of these choices to assist the result. That does not follow, and
this point should be explained.

First, consider the list of choices made, all of which have a
significant effect on the given result.

(1) And by far the most important: the names chosen for the rabbis.
(2) The formats chosen for the dates.
(3) The distance function.
(4) Various tunable parameters of the chosen function.
(5) The choice of the book of Genesis rather than other books.
(6) The choice of a particular text for Genesis.

In each case, any attempt to make independent choices leads to a
substantial reduction of the purported effect.

This does NOT mean that each of the choices was deliberately made in
order to get a better result. In fact, that is almost certainly not what
occurred.

We can conclude, however, that the published significance level is
definitely highly subjective. Any proposed hypothesis which explains the
result is not finding a code in the Torah. It is finding a code in the
combination of Genesis, a list of names, a distance function, an
edition, a set of tuning parameters, etc.

The most likely explanation is that the results are obtained
because the names and dates were chosen after the other choices had been
fixed, and that the choice of names was not independent of the other
choices. All it takes to obtain the long list of coincidences listed is
to have some tuning of the data to obtain a good result under the given
experimental conditions.

The paper does claim a high degree of independence in the choice of
names, in that lists of names were prepared separately. (Although
mathematically speaking, the non-independence of the lists of names and
the testing functions has the status of a theorem.) However, the
official account of the preparation of data is not entirely accurate, as
has been shown by looking at lectures by Professor Rips in the mid
eighties.[5] There appears to have been ample scope for some amount of
information exchange which would invalidate the experiment.

Furthermore, the earlier reports of this "research" do not
place the same high emphasis on independent generation of data and test
conditions. It is a fairly clear case of investigators being misled by
their own biases.

This does, of course, have implications for the competence of the
investigators. But there is no reason to shy away from such a conclusion
because it reflects badly on those concerned. One of the reasons for
having independent review and investigation is to identify shoddy
research. Professor Rips' professional reputation has, I am quite sure,
suffered a blow from which it will never fully recover. He should have
recognized the sensitivity of the experiment to the text used for naming
rabbis and giving the dates of death and birth. However, he turned over
the task of preparing and testing the list to Witztum, and did not
establish the kinds of controls needed to ensure independence.

The actual tuning of the sample was almost certainly not directly
due to Rips, but to the first author of the paper: Doron Witztum.

WRR has almost no credible supporters. Mathematicians (excepting
only Rips and Michelson who are authors of "codes" papers) are
unanimous on the subject. The vast majority of Orthodox Jewish Torah
scholars consider it nonsense. No independent religious body has come
out in support of this idea, and churches generally denounce it as
misuse of scripture.

Three further articles are cited in my references from Jewish
mathematicians who are concerned to show the errors in WRR.[6,7,8]

Why so much reaction? Vacuous mathematical papers get published from
time to time without this level of response. It can't be that the
response is from atheists determined to upset a dangerous proof of
divine action -- the most vocal response is from those who DO consider
the Torah to be divine.

And this explains the reaction. People who care about the bible are
the most vocal in refuting vacuous nonsense which if left unchallenged
could only serve to bring the bible into disrepute.

The paper describes, at first sight, an interesting phenomenon. The
major defect -- subjectivity of the list of rabbis -- was not
particularly apparent in the paper. Subsequent investigation has
confirmed this beyond doubt: the phenomenon is not a feature of the text
of Genesis: it is a feature of two texts -- Genesis, and the text of the
data. That is, the correlation is not with actual dates and rabbis. The
correlation is with the way in which the experimenters chose to write
down the dates and the rabbis.

The experiment does not set out sufficient controls on information
exchange between the sample selection and the distance functions, and
its conclusions are worthless.

[1] Witztum, D., Rips, E. and Rosenberg, Y. "Equidistant Letter
Sequences in the Book of Genesis", Statistical Science, Vol 9, No 3,
1994, pp 429-438. Abridged version on-line at http://www.fortunecity.com/tattooine/delany/11/genesis.htm
[2] http://www.torahcodes.co.il/havlin.htm
[3] http://cs.anu.edu.au/~bdm/dilugim/report2.html
[4] Bar-Hillel, M., Bar-Natan, D. and McKay, B. "The Torah Codes: Puzzle
and Solution", Chance, Vol 11, No 2, 1998, pp 13-19. On-line at http://cs.anu.edu.au/~bdm/dilugim/Chance.pdf
[5] http://cs.anu.edu.au/~bdm/dilugim/ripslect/
[6] http://wopr.com/biblecodes/TheCase.htm

From: Chris Ho-Stuart
Newsgroups: aus.religion.christian
Subject: Equidistant Letter Sequences in Genesis
Date: 4 Nov 1998 01:21:05 GMT
What are the Torah Codes?
What is the problem with the paper?
What is the alternative hypothesis?
The distance calculation.
"We stress that our definition of distance is not unique.
Although there are certain general principles (like
minimizing the skip d) some of the details can be carried
out in other ways. We feel that varying these details
is unlikely to affect the results substantially. Be that
as it may, we chose one particular definition, and have,
throughout, used _only_ it, that is, the function c(w, w')
described in appendix A.2 of the Appendix had been defined
before any sample was chosen, and it underwent no changes."
"If the c(w,w') were independent random variables [..] then
P2 would be the probability that the product PIs(w,w') is
as small as it is, or smaller. But as before, we do not use
any such uniformity or independence assumptions. Like P1,
the statistic P2 is calibrated in probability terms; but
[...] one should think of it simply as an ordinal index
that enables [comparisons of permutations]."
The text used.
The question of fraud.
The response.
Conclusion
References.