Categories
General Statistical Applications

Why Surgery-Duration Predictions are So Poor, and a Possible Remedy

Operating theatres are the most expensive resource at hospitals’ disposal. This makes optimal scheduling of surgeries to operating rooms a top priority. A pre-condition for optimal scheduling is the availability of accurate predictions of surgery-duration. Much research effort has been invested in recent years in developing methods that improve the accuracy of surgery-duration predictions. This ongoing effort includes both traditional statistical methods and newer Artificial Intelligence (AI) methods. The state-of-the-art of these methods, with relevant peer-reviewed literature, has recently been summarized by us in a new entry on Wikipedia, titled “Predictive Methods for Surgery Duration”.

Personally, I was first exposed to the problem of predicting surgery-duration over thirty years ago, when I was involved in a large-scale project encompassing all governmental hospitals in Israel (at the time). Partial results of this effort were reported in my published paper of 1986, and further details can be found in my more recent paper of 2020. Both articles are listed in the literature section at the end of this post (for podcast listeners, this list may be found on haimshore.blog).

My second involvement in developing predictive methods for surgery-duration came in more recent years, culminating in three peer-reviewed published papers (Shore 2020, 2021a,b; see references below).

Surgery-duration is known to be highly variable. The larger the variability between surgeries, the less accurate the prediction may be expected to be. To reduce this variability, newly devised predictive methods for surgery-duration tend to concentrate on subsets of surgeries, classified according to some classification system. It is assumed that via this classification, prediction accuracy may be enhanced. A common classification method, implemented worldwide, is Current Procedural Terminology (CPT®). This coding system assigns, in a hierarchical fashion, specific codes to subsets of surgeries. In doing so, variability between surgeries sharing the same CPT code is expected to be reduced, allowing for better prediction accuracy.

A second effort to increase accuracy is to include, in the predictive method, certain factors, known prior to surgery, that contribute variability to surgery-duration. It is hoped that by taking account of these factors in the predictive method, unexplained variability in surgery-duration will be reduced, thereby enhancing prediction accuracy (examples will be given shortly).

A third factor that influences accuracy is the amount of reliable data used to generate predictions. Given recent developments in our ability to process large amounts of data, commonly known as Big Data, Artificial Intelligence (AI) methods have been enlisted to assist in predicting surgery times.

These new methods and others are surveyed more thoroughly in the aforementioned entry on Wikipedia.

The new methods notwithstanding, current predictive methods for surgery-duration still deliver unsatisfactory accuracy.

Why is that so?

We believe that a major factor in the poor performance of current predictive methods is a lack of essential understanding of what constitutes the major sources of variability in surgery-duration. Based on our own personal experience, as alluded to earlier, and on our professional background as industrial engineers specializing in the analysis of work processes (of which surgeries are an example), we believe there are two sets of factors that generate variability in surgery-duration: a set of major factors and a set of secondary factors. We denote these Set 1 and Set 2 (henceforth, we refer only to variability between surgeries within a subset sharing the same code):

Set 1 — Two Major Factors:

  • Factor I. Work-content instability (possibly affected by variability in patient condition);
  • Factor II. Error variability.

Set 2 — Multiple Secondary Factors, such as patient age, professional experience and size of the medical team, number of surgeries a surgeon has to perform in a shift, and type of anaesthetic administered.

Let us explain why, in contrast to current practice, we believe that work-content instability has a critical effect on prediction accuracy, and why accounting for it in the predictive method is crucial to improving the accuracy currently obtained via traditional methods.

To prepare predictions for any random phenomenon assumed to be in steady-state, the best approach is to define its statistical distribution and estimate its parameters based on real data. Once the distribution is completely defined, various statements about the behavior of the random phenomenon (like surgery-duration) can be made.

For example:

  • What is the most likely realization (given by the distribution’s mode);
  • What is the middle value, for which any realization is equally likely to fall above or below it (given by the distribution’s median);
  • What is the probability that a realization of the random phenomenon exceeds a specified value (computed from the cumulative distribution function, CDF, as one minus the CDF at that value; see the sketch below)?
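To make these three quantities concrete, here is a minimal sketch (in Python, using scipy; it is not taken from the referenced papers) that fits a lognormal distribution to a small, made-up sample of surgery durations and computes the mode, the median and an exceedance probability.

```python
# Minimal sketch (hypothetical data): fit a lognormal distribution to surgery
# durations (in minutes) and answer the three questions listed above.
import numpy as np
from scipy import stats

durations = np.array([48, 55, 62, 70, 75, 81, 90, 104, 120, 150])  # hypothetical sample

# Fit a lognormal with the location fixed at zero (durations are strictly positive)
shape, loc, scale = stats.lognorm.fit(durations, floc=0)
dist = stats.lognorm(shape, loc=loc, scale=scale)

mode = scale * np.exp(-shape**2)   # most likely duration (mode of the lognormal)
median = dist.median()             # equal probability of falling above or below
p_exceed = dist.sf(120)            # P(duration > 120 minutes) = 1 - CDF(120)

print(f"mode = {mode:.1f} min, median = {median:.1f} min, "
      f"P(duration > 120 min) = {p_exceed:.2f}")
```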

Understanding that complete definition of the distribution is the best approach to predicting surgery-duration, let us next explain what type of distribution one can expect in each of the two extreme states defined by the two major factors of Set 1:

State 1. There is no variability in work-content (there is only error variability);

State 2. There is no error (error variability is zero; there is only work-content variability).

The two states define two different distributions for surgery-duration.

The first state, State 1, implies that the only source of variability is error. This leads to the normal distribution for an additive error, or to the log-normal distribution for a multiplicative error (namely, an error expressed as a percentage).

State 2, lack of error variability, can by definition only materialize when there is no typical value (like the mode) relative to which an error can be defined. Since no definition of error is feasible, error variability becomes zero. For work processes like surgery, this can happen only when there is no typical work-content. In statistical terms, this is a state of lack-of-memory. An example is the duration of repair jobs at a car garage, pooled over all types of repair. The distribution typical of such situations is the memoryless exponential.
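A small simulation (hypothetical parameter values, not taken from the referenced papers) illustrates how differently the two extreme states behave: a fixed work-content with a multiplicative error yields a nearly symmetric lognormal, whereas a memoryless process yields the highly skewed exponential.

```python
# Simulation sketch of the two extreme states (hypothetical parameter values).
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)
n = 100_000

# State 1: constant work-content of 60 minutes with a 10% multiplicative error
# -> durations follow a lognormal distribution, nearly symmetric for a small error.
state1 = 60 * np.exp(rng.normal(0.0, 0.10, n))

# State 2: no typical work-content (memoryless process) -> exponential durations.
state2 = rng.exponential(scale=60, size=n)

for name, sample in [("State 1 (lognormal)", state1), ("State 2 (exponential)", state2)]:
    print(f"{name}: mean = {sample.mean():.1f}, std = {sample.std():.1f}, "
          f"skewness = {stats.skew(sample):.2f}")
# State 1 skewness is close to 0; State 2 skewness is close to 2 (the exponential value).
```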

We learn from this discussion that any statistical model of surgery-duration, from which its distribution may be derived, needs to include, as extreme cases, both the normal/lognormal distributions and the exponential distribution.

This is a major constraint on any model for the distribution of surgery-duration, and it has so far eluded those engaged in developing predictive methods for surgery-duration. Lack of familiarity with basic principles of industrial engineering, and with how instability in the work-content of a work process (like surgery) influences the form of the distribution, probably constitutes the major culprit for the poor current state-of-the-art of predicting surgery-duration.

In Shore (2020), we developed a bi-variate model for surgery-duration, which delivers not only the distributions of surgery-duration in the extreme states (State 1 and State 2), but also the distributions of the intermediate states lying between them. The two components of the bi-variate model represent work-content and error as two multiplicative random variables, with relative variabilities (standard deviations) that gradually change as surgery-duration moves from State 1 (the normal/lognormal case) to State 2 (the exponential case).
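To convey the idea of a continuous transition between the two extreme states, here is a deliberately simplified simulation sketch. It is not the bi-variate model of Shore (2020): the work-content component is stood in for by a shifted exponential and the error by a multiplicative lognormal disturbance, and all parameter values are invented for illustration only.

```python
# Simplified illustrative stand-in (not the bi-variate model of Shore, 2020):
# duration = Yi * Ye, where Yi (work-content) is a shifted exponential with mode
# alpha and Ye is a multiplicative lognormal error. Parameter values are made up.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=2)

def simulate_duration(alpha, sigma_i, sigma_e, n=200_000):
    y_i = alpha + rng.exponential(scale=sigma_i, size=n)   # work-content component
    y_e = np.exp(rng.normal(0.0, sigma_e, size=n))         # multiplicative error
    return y_i * y_e

# Near State 1: stable work-content, error dominates -> close to lognormal
near_normal = simulate_duration(alpha=1.0, sigma_i=0.01, sigma_e=0.10)
# Intermediate (semi-repetitive) state: both components contribute
intermediate = simulate_duration(alpha=0.9, sigma_i=0.10, sigma_e=0.10)
# Near State 2: no typical work-content, negligible error -> close to exponential
near_exponential = simulate_duration(alpha=0.0, sigma_i=1.0, sigma_e=0.01)

for name, s in [("near normal/lognormal", near_normal),
                ("intermediate", intermediate),
                ("near exponential", near_exponential)]:
    print(f"{name}: skewness = {stats.skew(s):.2f}")
# Skewness rises gradually from roughly 0.3 towards 2 as the work-content
# component takes over from the error component.
```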

What do we hope to achieve by publishing this post (and the accompanying podcast)?

We hope that individuals engaged in developing predictive methods for surgery-duration internalize the grim reality that:

  1. Unless their predictive method allows the normal/lognormal and the exponential to serve as exact distributions of surgery-duration at the extreme states, and
  2. Unless their predictive method allows intermediate states, spanning a continuous spectrum between the two extreme states, to converge smoothly to these states (as in Shore, 2020),

the likelihood that the accuracy of predictive methods for surgery-duration will improve anytime soon will remain, as it is today, extremely slim.

Literature

[1] Shore, H (1986). An approximation for the inverse distribution function of a combination of random variables, with an application to operating theatres. Journal of Statistical Computation and Simulation, 23:157-181. Available on Shore’s ResearchGate page.

[2] Shore, H (2020). An explanatory bi-variate model for surgery-duration and its empirical validation. Communications in Statistics: Case Studies, Data Analysis and Applications, 6(2):142-166. DOI: 10.1080/23737484.2020.1740066.

[3] Shore, H (2021a). SPC scheme to monitor surgery-duration. Quality and Reliability Engineering International, 37:1561-1577. DOI: 10.1002/qre.2813.

[4] Shore, H (2021b). Estimating operating room utilisation rate for differently distributed surgery times. International Journal of Production Research. DOI: 10.1080/00207543.2021.2009141.

[5] Shore, H (2021c). “Predictive Methods for Surgery Duration”. Wikipedia. April 16, 2021.

Categories
My Research in Statistics

Where Statistics Went Wrong? And Why?

  1. Introduction

Is Statistics, a branch of mathematics that serves as a central tool for investigating nature, heading in the right direction, in a manner comparable to other branches of science that explore nature?

I believe it is not.

This belief is based on my own personal experience in a recent research project aimed at modeling surgery time (separately for different subcategories of surgeries). This research effort culminated in a trilogy of published articles (Shore 2020ab, 2021). The belief is also based on my life-long experience in academia. I am a professor emeritus, after forty years in academia and scores of articles published in refereed professional journals, dealing with both the theory and the application of Statistics. In this post, I deliver an account of my recent personal experience with modeling surgery time, and of the conclusions I have derived from it and from my cumulative experience in the analysis of data and in data-based modeling.

The post is minimally technical, so that a layperson with little knowledge of basic terms in Statistics can easily follow it.

We define a random phenomenon as one associated with uncertainty, for example, “Surgery”. A random variable (r.v) is any random quantitative property defined on a random phenomenon. Examples are surgery medical outcome (Success:  X=1; Failure: X=0), surgery duration (X>0) or patient’s maximum blood pressure during surgery (Max. X).

In practice, an r.v is characterized by its statistical distribution. The latter delivers the probability, P (0≤P≤1), that the random variable, X, assumes a certain value (if X is discrete), or that it falls in a specified interval (if X is continuous). For example, the probability that the surgery outcome will be a success, Pr(X=1), or the probability that surgery duration (SD) exceeds one hour, Pr(X>1).

Numerous statistical distributions have been developed over the centuries, starting with Bernoulli (1713), who derived what is now known as the binomial distribution, and Gauss (1809), who derived the “astronomer’s curve of error”, nowadays known as the Gauss distribution, or the normal distribution. Good accounts of the historical development of the science of probability and statistics to its present-day state appear in Britannica and in the Wikipedia entry History_of_statistics.

A central part of these descriptions is, naturally, the development of the concept of statistical distribution. At first, the main source of motivation was games of chance. This later transformed into the study of errors, as we may learn from the development of the normal distribution by Gauss. In more recent years, emphasis shifted to describing random variation as observed in all disciplines of science and technology, resulting, to date, in thousands of new distributions. The scope of this ongoing research effort may be appreciated by the sheer volume of the four-volume Compendium on Statistical Distributions by Johnson and Kotz (First Edition 1969–1972, updated periodically with Balakrishnan as an additional co-author).

The development of thousands of statistical distributions over the years, up to the present, is puzzling, if not bizarre. An innocent observer may wonder: how is it that in most other branches of science the historical development shows a clear trend towards convergence, while in modeling random variation, the most basic concept for describing processes of nature, the opposite, divergence, has happened?

Put in more basic terms: why, in science in general, is a continuous attempt made to unify the “objects of enquiry” under the umbrella of a unifying theory (forces in physics; properties of materials in chemistry; human characteristics in biology), while in the mathematical modelling of random variation this has not happened? Why, in Statistics, does the number of “objects of enquiry” keep growing instead of diminishing?

And more succinctly: Where did Statistics go wrong? And why?

I have already had the opportunity to address this issue (the miserable state-of-the-art of modelling random variation) some years ago, when I wrote (Shore, 2015):

“ “All science is either physics or stamp collecting”. This assertion, ascribed to physicist Ernest Rutherford (the discoverer of the proton, in 1919) and quoted in Kaku (1994, p. 131), intended to convey a general sentiment that the drive to converge the five fundamental forces of nature into a unifying theory, nowadays a central theme of modern physics, represents science at its best. Furthermore, this is the right approach to the scientific investigation of nature. By contrast, at least until recently, most other scientific disciplines have engaged in taxonomy (“bug collecting” or “stamp collecting”). With “stamp collecting” the scientific inquiry is restricted to the discovery and classification of the “objects of enquiry” particular to that science, however this never culminates, as in physics, in a unifying theory, from which all these objects may be deductively derived as “special cases”. Is statistics a science in a state of “stamp collecting”?”

This question remains valid today, eight years later: why has the science of Statistics, a central tool for describing statistically stable random phenomena of nature, deviated so fundamentally from the general trend towards unification?

In Section 2, we enumerate the errors that, we believe, triggered this departure of Statistics from the general trend in the scientific study of nature, and outline possible ways to eliminate these errors. Section 3 is an account of the personal learning experience that I went through while attempting to model surgery duration and its distribution; it is written, for the naive (non-statistician) reader, as a personal account of that experience. As alluded to earlier, the research effort resulted in a trilogy of articles, and in the new “Random identity paradigm”. The latter is addressed in Section 4, where new concepts, heretofore ignored by Statistics, are introduced (based on Shore, 2022). Examples are “Random identity”, “identity variation”, “losing identity” (with a characterization of the process), and “identity-full/identity-less distributions”. These concepts underlie a new methodology for modeling observed variation in natural processes (as contrasted with variation of r.v.s that are mathematical functions of other r.v.s). The new methodology is outlined, based on Shore (2022). Section 5 delivers some final thoughts and conclusions.

  2. The historical errors embedded in current-day Statistics

Studying the history of the development of statistical distributions to date, we believe that Statistics’ departure from the general trend, resulting in a gigantic number of “objects of enquiry” (as alluded to earlier), may be traced to three fundamental, inter-related errors, historically committed within Statistics:

Error 1: Failure to distinguish between two categories of statistical distributions:

Category A: Distributions that describe observed random variation of natural processes;

Category B: Distributions that describe behavior of statistics, namely, of random variables that are, by definition, mathematical functions of other random variables.

The difference between the two categories is simple: Category A is subject to certain constraints on the shape of the distribution, imposed by nature, which Category B is not (the latter is subject to other constraints, imposed by the structure of the mathematical function describing the r.v). As we shall soon realize, a major distinction between the two sets of constraints (though not the only one) concerns the permissible values of skewness and kurtosis. While for Category A these fluctuate within a specified interval, confined between the values of an identity-full distribution and those of an identity-less distribution (like the normal and the exponential, respectively; both types of distribution will be explained soon), for Category B such constraints do not hold.
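A short numerical check of the two boundary distributions named above (a scipy sketch with arbitrary parameter values, added here for illustration): the normal and the exponential keep constant skewness and kurtosis, whatever their parameters.

```python
# Numerical check: skewness and excess kurtosis of the normal and the exponential
# are constant, whatever values their parameters take (arbitrary parameter values).
from scipy import stats

for mu, sigma in [(0, 1), (10, 3), (-5, 0.2)]:
    _, _, skew, kurt = stats.norm.stats(loc=mu, scale=sigma, moments="mvsk")
    print(f"normal(mu={mu}, sigma={sigma}): skewness={skew:.1f}, excess kurtosis={kurt:.1f}")

for scale in [0.5, 1, 20]:
    _, _, skew, kurt = stats.expon.stats(scale=scale, moments="mvsk")
    print(f"exponential(scale={scale}): skewness={skew:.1f}, excess kurtosis={kurt:.1f}")
# The normal always yields (0, 0) and the exponential always yields (2, 6).
```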

Error 2: Ignoring the real nature of error:

A necessary condition for the existence of an error, indeed a basic assumption integrated implicitly into its classic definition, is that for any observed random phenomenon, and the allied r.v, there is a typical constant, an outcome of various factors inherent to the process/system (“internal factors”), and there is error (multiplicative or additive), generated by random factors external to the system/process (“external factors”). This perception of error allows its distribution to be represented by the normal, since the latter is the only one having mean/mode (supposedly determined by “internal factors”) disconnected from the standard deviation, STD (supposedly determined by a separate set of factors, “external factors”).

A good representative of the constant, relative to which error is defined, is the raw mode or the standardized mode (raw mode divided by the STD). As perceived today, the error indeed expresses random deviations from this characteristic value (the most frequently observed value).

What happens to the error, when the mode itself ceases to be constant and becomes random? How does this affect the observed random variation or, more specifically, how is error then defined and modelled?

Statistics does not provide an answer to this quandary, except for stating that varying “internal factors”, namely, non-constant system/process factors, may produce systematic variation, and that the latter may be captured and integrated into a model for variation, for example via regression models (linear regression, nonlinear regression, generalized linear models and the like). In this case, the model ceases to represent purely random variation (as univariate statistical distributions are supposed to do). It becomes a model for systematic variation, coupled with a component of random variation (the nature of the latter may be studied by “freezing” the “internal factors” at specified values). It is generally assumed in such models that a single distribution represents the component of random variation, though possibly with different parameter values for different values of the systematic effects integrated into the model. Thus, when implementing generalized linear models, the user is requested to specify a single distribution (not several), valid for all sets of the effects’ values (a sketch below illustrates this). As we shall soon learn (Error 3), “internal factors” may produce not only systematic effects, as currently wrongly assumed, but also a different component of variation, unrecognized to date. It will be addressed next as the third error.
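To make the point about generalized linear models concrete, here is a minimal, hypothetical sketch using the statsmodels package: a single response distribution (family) is specified for the entire model, regardless of the covariate values. The data below are made up for illustration.

```python
# Hypothetical GLM sketch: one response family for the whole model.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(seed=3)
x = rng.uniform(0, 1, size=200)
y = rng.gamma(shape=2.0, scale=30.0 * (1.0 + x) / 2.0)  # made-up durations; mean depends on x

X = sm.add_constant(x)
# A single family (here Gamma with a log link) is assumed for all observations:
# covariates may shift the mean, but the assumed distributional form never changes.
model = sm.GLM(y, X, family=sm.families.Gamma(link=sm.families.links.Log()))
result = model.fit()
print(result.params)
```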

Error 3: Failure to recognize the existence of a third type of variation (additional to random and systematic) — “Identity variation”:

System/process factors may potentially produce not only systematic variation, as is currently commonly assumed, but also a third component of variation that has passed under the radar, so to speak, of the science of Statistics. Ignoring this type of variation is the third historic error of Statistics. For reasons to be described soon (Sections 3 and 4), we denote this unrecognized type of variation — “Identity variation”.

  3. Modeling surgery duration — Personal learning experience that resulted in the new “Random identity paradigm”

I did not realize the enormity of the consequences of the above three errors, committed within Statistics to date, until a few years ago, when I embarked on a comprehensive research effort to model the statistical distribution of surgery duration (SD), separately for each of over a hundred medically-specified subcategories of surgeries (the latter defined according to a universally accepted standard; find details in Shore 2020a). The subject (modeling the SD distribution) was not new to me. I had been engaged in a similar effort years ago, in the eighties of the previous century (Shore, 1986). Then, based on analysis of available data and given the computing facilities of the time, I divided all surgeries (except open-heart surgeries and neurosurgeries) into two broad groups: short surgeries, which were assumed to be normally distributed, and long surgeries, assumed to be exponential. There, for the first time, I became aware of “Identity variation”, though it was not so named, which resulted in modeling the SD distribution differently for short surgeries (assumed to follow a normal distribution) and long ones (assumed to be exponential). With modern computing means, and with my cumulative experience since the publication of that paper (Shore, 1986), I thought, and felt, that a better model could be conceived, and embarked on the new project.

Probing into the available data (about ten thousand surgery times with their affiliated surgery subcategories), four insights/observations became apparent:

1. It was obvious to me that different subcategories follow different statistical distributions, beyond mere differences in the values of the distribution’s parameters (as currently generally assumed in modeling the SD distribution);

2. Given point (1), it was obvious to me that differences in distribution between subcategories should be attributed to differences in the characteristic level of work-content instability (work-content variation between surgeries within a subcategory);

3. Given points (1) and (2), it was obvious to me that this instability cannot be attributed to systematic variation. Indeed, it represents a different type of variation, “identity variation”, to date unrecognized in the Statistics literature (as alluded to earlier);

4. Given points (1) to (3), it was obvious to me that any general model of surgery time (SD) should include the normal and the exponential as exact special cases.

For the naive reader, I will explain the new concept, “identity variation”. Understanding this concept will render all of the above insights clearer.

As an industrial engineer by profession, it was obvious to me, right from the beginning of the research project, that, ignoring negligible systematic effects caused by covariates (like the surgeon performing the operation), a model for SD representing only random variation in its classical sense would not adequately represent the observed variation. Changes between subcategories in the type of distribution, as revealed by changes in distribution shape (from the symmetric shape of the normal to the extremely non-symmetric shape of the exponential, as first noticed by me in the earlier project, Shore, 1986), made it abundantly clear that the desired SD model should account for “identity loss”, occurring as we move from a repetitive process (a subcategory of repetitive surgeries, having characteristic/constant work-content) to a memory-less non-repetitive process (a subcategory of surgeries having no characteristic common work-content). As such, the SD model should include, as exact special cases, the exponential and the normal distributions.

What else do we know about the process of losing identity, as we move from the normal to the exponential, which accounts for “identity variation”?

In fact, several changes in distribution properties accompany “identity loss”. We relate again to surgeries. Like work processes in general, surgeries too may be divided into three non-overlapping and exhaustive groups: repetitive, semi-repetitive and non-repetitive. In terms of work-content, this implies:

  • Repetitive work-processes, with constant work-content (only error generates variation; SD is normally distributed);
  • Semi-repetitive work-processes (work-content varies somewhat between surgeries, to a degree dependent on the subcategory);
  • Memory-less (non-repetitive) work-processes (no characteristic work-content; for example, surgeries performed in an emergency room for all types of emergency, or service performed at a pharmacy serving customers with varying numbers of items on their prescription lists).

Thus, work-content, however it is defined (find an example in Shore, 2020a), forms “surgery identity”, with a characteristic value, the mode, that vanishes (becomes zero) for the exponential scenario (non-repetitive work-process).    

Let us delve somewhat deeper into the claim that a model for SD should include the normal and the exponential as exact special cases (not merely asymptotically, as, for example, the gamma tends to the normal).

There are four observations/properties that set the two distributions, the identity-full normal and the identity-less exponential, apart from other distributions:

Observation 1: The mean and standard deviation are represented by different parameters for the normal distribution, and by a single parameter for the exponential. This difference is a reflection of a reality where, in the normal scenario, one set of process/system factors (“internal factors”) produces signal only, and a separate set (“external factors”) produces noise only (traditionally modelled as a zero-mean, symmetrically distributed error). Moving away from the normal scenario towards the exponential scenario, we witness a gradual merging of the mean with the standard deviation, until, in the exponential scenario, both signal and noise are produced by the same set of factors and the mean and standard deviation merge into a single parameter. The clear distinction between “system/process factors” and “external/error factors”, typical of the normal scenario, has utterly vanished;

Observation 2: The mode (or rather the standardized mode), supposedly representing the typical constant on which the classical multiplicative error is defined in the normal scenario, shrinks as we move away from the normal towards the exponential. This movement, in reality, represents passing through semi-repetitive work-processes with an increasing degree of work-content instability. The standardized mode finally disappears (becomes zero) in the exponential scenario. What does this signify? What are the implications?

Observation 3: For both the normal and the exponential, skewness and kurtosis are non-parametric. Why is that, and what does this signify?

Observation 4: What happens to the classic error when the r.v moves away from the normal scenario towards the exponential? Can we still hold on to the classic definition of error, given that the “internal factors”, assumed to generate a constant mode (signal), start to produce noise? How would the error (in its classical sense) then be re-specified? Can an error be defined at all?
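Observations 1 and 2 can be illustrated with a short numerical example (arbitrary parameter values, added for the naive reader): for the normal, the mean, the standard deviation and the standardized mode are unrelated quantities, whereas for the exponential the mean and the standard deviation coincide and the mode is zero.

```python
# Numerical illustration of Observations 1 and 2 (arbitrary parameter values).
from scipy import stats

# Normal scenario: mean and standard deviation are separate parameters, and the
# standardized mode (mode / std) can take any value.
mu, sigma = 60.0, 6.0
print(f"normal: mean={mu}, std={sigma}, mode={mu}, standardized mode={mu / sigma:.1f}")

# Exponential scenario: a single parameter (the scale) plays both roles, so the
# mean and the standard deviation coincide, and the mode (hence the standardized
# mode) is zero.
scale = 60.0
mean, var = stats.expon.stats(scale=scale, moments="mv")
print(f"exponential: mean={mean:.0f}, std={var ** 0.5:.0f}, mode=0, standardized mode=0.0")
```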

All these considerations, as well as the need to include semi-repetitive surgeries within the desired model, brought me to the realization that we encounter here a different type of variation, heretofore unrecognized and not addressed in the literature. The instability of work-content (within a subcategory), which I traced to be the culprit for the change in distribution as we move from one subcategory to another, could not possibly be regarded as a cause of systematic variation. The latter is never assumed to change the shape of the distribution, only, at most, its first two moments (mean and variance). This is evident, for example, when implementing generalized linear models, a regression methodology frequently used to model systematic variation in a non-normal environment: the user is requested to specify a single distribution (normal or otherwise), never different distributions for different sets of values of the effects being modeled (supposedly delivering systematic variation). Neither can work-content variation be considered part of classic random variation (as realized in Category A distributions), since the latter assumes the existence of a single non-zero mode (for a single non-mixture univariate distribution), not a zero mode or multiple modes (as, for example, with the identity-less exponential (zero mode), its allied Poisson distribution (two modes for an integer parameter), or the identity-less uniform (an infinite number of modes); find details in Shore, 2022).

A new paradigm was born out of these deliberations — the “Random identity paradigm”. Under the new paradigm, observed non-systematic variation is assumed to originate in two components of variation: random variation, represented by a multiplicative normal/lognormal error, and identity variation, represented by an extended exponential distribution. A detailed technical development of this methodology, allied conjectures and their empirical support (from known theory-based results) are given in Shore (2022; a link to a pre-print is given in the References section). Section 4 delivers an outline of the “Random identity paradigm”.

  4. The “Random identity paradigm” — “Random identity”, “Identity variation”, “identity loss”, “identity-full/identity-less distributions” (based on Shore, 2022)

The insights detailed earlier led to the development of the new “Random identity paradigm” and its allied explanatory bi-variate model for SD (Shore, 2020a). The model was designed to fulfill an a-priori specified set of requirements, central among which is that the model include the normal and the exponential distributions as exact special cases. After implementing the new model in various applications (as alluded to earlier), we arrived at the realization that the model used in the article may, in fact, be expanded to introduce a new type of random variation, “random identity variation”, which served as the basis for the new “Random Identity Paradigm” (Shore, 2022).

A major outcome of the new paradigm is the definition of two new types of distributions, an identity-full distribution and an identity-less distribution, and a criterion to diagnose a given distribution as “identity-full”, “identity-less”, or in between. Properties of identity-less and identity-full distributions are described, in particular the property that such distributions have non-parametric skewness and kurtosis, namely, both types of distribution assume constant skewness and kurtosis values, irrespective of the values assumed by the distribution’s parameters. Another requirement, naturally, is that the desired model include a component of “identity variation”. However, the requirement also specifies that the allied distribution (representing “identity variation”) have support with the mode, if it exists, as its extreme left point (a detailed explanation is given in Shore, 2022). As shown in Shore (2020ab, 2021, 2022), this resulted in defining the exponential distribution anew (the extended exponential distribution), adding a parameter, α, that assumes the value α=0 for the exponential scenario (the error STD becomes zero), and a value tending to 1 as the model moves towards normality (with “identity variation”, expressed in the extended exponential by the parameter σi, tending to zero).

Sparing the naive reader the technical details of the complete picture conveyed by the new “Random identity paradigm” (Shore, 2022), we outline herewith the associated model, as used in the trilogy of published papers.

The basic model is given in eq. (1):

[Equations (1)–(5) appear in the linked image: Haim Shore_Equations_The Problem with Statistics_January 26 2022]

where R is the observed response (an r.v), L and S are location and scale parameters, respectively, Y is the standardized response (L=0, S=1), {Yi, Ye} are independent r.v.s representing internal/identity variation and external/error variation, respectively, ε is a zero-mode normal disturbance (error) with standard deviation σε, and Z is standard normal. The density function of the distribution of Yi in this model (the extended exponential) is eq. (2), where Yi is the component representing “identity variation” (caused by variation of system/process factors, the “internal factors”), CYi is a normalizing coefficient, and σi is a parameter representing internal/identity variation. It is easy to realize that α is the mode. At α=1, Yi becomes a left-truncated normal (a re-located half-normal). However, it is assumed that at α=1 “identity variation” vanishes, so Yi becomes a constant, equal to the mode (1). For the exponential scenario (complete loss of identity), we obtain α=0, and the disturbance, assumed to be multiplicative, becomes meaningless, namely, it vanishes (σe=0, Ye=1). Therefore, Yi and Y then both become exponential.

Let us introduce eq. (3). From (2), we obtain the pdf of Zi (eqs. (4) and (5)). Note that the mode of Zi is zero (the mode of Yi is α).

Various theorems and conjectures are articulated in Shore (2022), which deliver eye-opening insights into various regularities in the behavior of statistical distributions, previously unnoticed, and a good explanation of various statistical theoretical results heretofore considered separate and unrelated (like a logical derivation of the Central Limit Theorem from the “Random identity paradigm”).

  5. Conclusions

In this article, I have reported on my personal experience, which led me to the development of the new “Random identity paradigm” and its allied concepts. It followed my research effort to model surgery duration, which resulted in a bi-variate explanatory model, with the extended exponential distribution as the intermediate tool that paved a smooth way to unify, under a single umbrella model, the execution times of all types of work processes/surgeries: not only repetitive (normal) and non-repetitive (exponential), but also those in between (semi-repetitive). To date, we are not aware of a similar model as capable of unifying phenomena as diverse as the three categories of work-processes/surgeries. Furthermore, this modeling effort led directly to conceiving the new “Random identity paradigm” with its allied new concepts (as alluded to earlier).

The new paradigm has produced three major outcomes:

First, as demonstrated in the linked pre-print, under the new paradigm scores of theoretical statistical results, formerly derived independently and considered unrelated, are explained in a consistent and coherent manner, becoming inter-related under the unifying “Random identity paradigm”.

Secondly, various conjectures about properties of distributions are empirically verified with scores of examples/predictions from the Statistics literature. For example, the conjecture that Category B r.v.s that are functions of only identity-less r.v.s are themselves identity-less, and similarly for identity-full r.v.s.

Thirdly, the new bi-variate model has been demonstrated to represent well numerous existing distributions, as shown for diversely-shaped distributions in Shore (2020a; see the Supplementary Materials therein).

It is hoped that the new “Random identity paradigm”, representing an initial effort at unifying distributions of natural processes (Category A distributions), may pave the way for Statistics to join other branches of science in a common effort to reduce, via unification mediated by unifying theories, the number of statistical distributions, the “objects of enquiry” of random-variation modeling within the science/branch-of-mathematics of Statistics.

References

[1] Shore H (1986). An approximation for the inverse distribution function of a combination of random variables, with an application to operating theatres. Journal of Statistical Computation and Simulation, 23:157-181. DOI: 10.1080/00949658608810870 .

[2] Shore H (2015). A General Model of Random Variation. Communications in Statistics - Theory and Methods, 49(9):1819-1841. DOI: 10.1080/03610926.2013.784990.

[3] Shore H (2020a). An explanatory bi-variate model for surgery-duration and its empirical validation. Communications in Statistics: Case Studies, Data Analysis and Applications, 6(2):142-166. Published online: 07 May 2020. DOI: 10.1080/23737484.2020.1740066

[4] Shore H (2020b). SPC scheme to monitor surgery duration. Quality and Reliability Engineering International. Published online 3 December 2020. DOI: 10.1002/qre.2813.

[5] Shore H (2021). Estimating operating room utilisation rate for differently distributed surgery times. International Journal of Production Research. Published online 13 December 2021. DOI: 10.1080/00207543.2021.2009141.

[6] Shore H (2022). “When an error ceases to be error” — On the process of merging the mean with the standard deviation and the vanishing of the mode. Preprint.

(Link to the preprint: Haim Shore_Blog_Merging of Mean with STD and Vanishing of Mode_Jan 07 2022)

 

Categories
Podcasts (audio)

“Shamayim” — The Most Counter-intuitive Yet Scientifically Accurate Word in Biblical Hebrew (Podcast)

The deeper meaning and implications of the biblical Hebrew Shamayim (Sky; a post of the same title may be found here):

Categories
My Research on the Bible and Biblical Hebrew Shorties

“Shamayim” — The Most Counter-intuitive Yet Scientifically Accurate Word in Biblical Hebrew

The word Shamayim in Hebrew simply means Sky (called Rakia in biblical Hebrew; Genesis 1:8):

“And God called the Rakia Shamayim, and there was evening and there was morning second day”.

Rakia in biblical Hebrew, like in modern Hebrew, simply means sky.

So why, in the first chapter of Genesis, is the sky Divinely called Shamayim?

And why, according to the rules of biblical Hebrew, is it fundamentally counter-intuitive, yet, so scientifically accurate?

The word Shamayim comprises two syllables. The first is Sham, which simply means there, namely, that which is inaccessible from here. The second syllable, ayim, is a suffix, namely, an affix added to the end of the stem of a word. Such a suffix is added, in Hebrew, to words that represent a symmetric pair of objects or, more generally, to words that represent objects that appear in symmetry. Thus, all visible organs of the human body that appear in pairs carry the same suffix, like legs (raglayim), hands (yadayim), eyes (einayim) and ears (oznayim). However, teeth, arranged in symmetry in the human mouth, though not in pairs, also carry the same suffix: teeth in Hebrew are shinayim. Other examples may be found in Chapter 5 of my book.

Let us address the two claims in the title:

  • Why is Shamayim counter-intuitive?
  • Why is Shamayim so scientifically accurate?

The answer to the first claim is nearly self-evident. When one observes the sky at dark hours, what one observes is far from symmetric. So much so that the twelve Zodiacal constellations were invented, in ancient times, to give some sense to the different non-symmetric configurations of stars that can, to this day, be observed in the sky with the naked eye.

Yet, despite the apparent non-symmetry observed in the sky, the Divine chose to grant the sky a word indicative of its most fundamental property, as we have scientifically learned it to be in recent times, namely, its symmetry (as observed from Planet Earth), or its uniformity (as described by modern cosmology).

To learn how fundamentally uniform (or symmetric) the universe is, the reader is referred to Chapters 5 and 7 of my book, and references therein. Another good source to learn about the uniformity of the universe, as observed via telescopes and as articulated by modern science, is the excellent presentation by Don Lincoln at Wondrium channel:

https://www.youtube.com/watch?v=CRQvp3XPH_s

Note the term Desert, addressed in the lecture. The term is used, in modern cosmology, to denote the uniformity of the universe at the Big Bang (“In the beginning”).

Surprisingly, the words, Tohu Va-Vohu, describing the universe “in the beginning” (Genesis 1:2), are also associated with desert, as they are employed elsewhere in the Hebrew Bible.

Consider, for example Jeremiah (4:23, 26):

“I beheld the earth, and, lo, it was Tohu Va-Vohu…I beheld and, lo, the fruitful land has become the desert…”.

Refer also to Isaiah (34:11).

So:

  • Shamayim is counter-intuitive and at odds with the picture revealed in ancient times to the naive observer, our pre-science ancestors;
  • Shamayim nevertheless accurately describes the current scientific picture of the universe, as formed over the last hundred years or so, based on cumulative empirical data (gathered via telescopes) and on modern theories of the evolution and structure of the universe.

Articulated more simply:

Whatever direction in the sky you point to, Shamayim states that it is all the same: contrary to what the naked eye tells us, yet in conformance with what modern science tells us.

Personal confession, mind boggling…

Categories
Podcasts (audio)

The Three Pillars of Truth (Lessons from the Hebrew Alphabet; Podcast-audio)

What does “Truth” stand on? How do we tell truth from falsehood?

The Hebrew Alphabet conveys to us the essential ingredients of truth.

We denote these:

The Three Pillars of Truth.

What are they?

Categories
Podcasts (audio)

Free Will — The Act of Separating and Choosing (Podcast-audio)

Why is there free-will?

What are the necessary and sufficient requirements for free-will to be exercised?

How do we make decisions within the two worlds, comprising our lives, the “World of Law-of-Nature” and the “World of Randomness”?

These questions and others are addressed, supported by excerpts from the Bible.

 

Categories
My Research on the Bible and Biblical Hebrew Podcasts (audio)

What Do We Know of God? (Podcast-audio)

The detailed answer, based on the Jewish Hebrew Bible (Torah, the Prophets), on in-depth analysis of biblical Hebrew words and on traditional Jewish interpreters, may surprise you:

 

Categories
Podcasts (audio)

“Do Not Steal” – Is it in the Ten Commandments? (Podcast-audio)

The answer to this intriguing question may surprise you. The true meaning of the Eighth Commandment, according to traditional Jewish scholarship, is not what it appears to be.

So where does the prohibition on stealing, in the common sense of the word, appear in the Ten Commandments?

Find details in this podcast:

 

Categories
General

Free Will— The Act of Separating and Choosing

The essence of being human is exercising free will. This is the act by which we continuously create ourselves and form our personality and character.

The Divine has created mankind (“So God created mankind in his own image…”, Genesis 1:27); but He has also formed it (“And the Lord God formed mankind of the dust of the ground…”, Genesis 2:7). We, human beings, whether we wish it or not, are doomed throughout our lives to repeat, via exercising free will, the two acts of creating (establishing a solid link between soul and body, while we grow) and forming.

What is the needed environment for human beings to be able to exercise their free-will?

There are two conditions (necessary and sufficient):

[1] Existence of “Good” and “Bad” mixed together (as in “The Tree of Knowledge, good and bad”, Genesis 2:9);

[2] Hidden-ness of God and the concealment of God’s hidden-ness.

Prophet Isaiah delivers succinct and stunning expression to the existence of the first condition:

“That men may know from the rising of the sun to its setting that there is none besides me— I am Jehovah and there is no one else; Forming light and creating darkness, making peace and creating the bad, I Jehovah am doing all these” (Isaiah 45:6-7).

Note that creating (“something from nothing”) precedes forming (“imprinting form on the created”), just as forming precedes making. Yet prophet Isaiah sets the absence of light (darkness) and the bad (the harmful, the evil) at a level higher than that of light — the former were created, the latter was “just” formed.

Existence of the second condition, a daily human experience revealed in countless debates on whether God exists, is evidenced both by biblical Hebrew and by the Bible. In biblical Hebrew, “World” (Olam) derives from the same root as all Hebrew words pointing to concealment. Examples: Ta’aluma (mystery); He’almut (disappearance); Ne’elam (an unknown, as in an algebraic equation); Alum (secret, adj.). In other words, the whole world is testimony to the hidden-ness of God. Prophet Isaiah repeats the same motif:

“Indeed, thou are a God who hides thyself, O God of Israel, savior” (Isaiah 45:15).

Concealment of God, however, is itself concealed (“Does God exist?”):

“And I will surely hide my face on that day…” (Haster Astir; Deuteronomy 31:18).

The repetition of the same root twice (in two consecutive words) is traditionally interpreted by Jewish scholars as implying concealment of the concealment, a fact of life that we have all probably experienced at one time or another throughout our lives (“Does God exist?”).

Having studied the two conditions for the existence of free-will, the next question to ask is:

What are the limitations to exercising free-will and what does the latter entail?

We continuously live in two worlds, intermingled and most often inseparable and indistinguishable from one another: “World of Law-of-Nature” and “World of Randomness”. We can exercise free-will only in an environment that allows choice, namely, in the “World of Randomness”. Unlike in the “World of Law-of-Nature”, where external constraints force us to behave in certain ways (and not others, namely, no free choice is available), in the “World of Randomness”, where randomness prevails, we are free to exercise whatever our heart desires. It is only then, in the “World of randomness”, that we become an agent of our own free will.

What does exercising free-will comprise? It comprises two actions:

Separating;

Choosing.

We need to separate “Good” from “Bad” before choosing. Most often in our daily lives, the good and the bad are intermingled to a degree that the two can rarely be told apart; therefore, we need to separate before choosing. God created darkness (per prophet Isaiah), thereby allowing the good and the bad in our world to co-exist, mixed. Consider the biblical Hebrew word for “evening” (as in “…and there was evening and there was morning…”; Genesis 1:5, for example). The Hebrew word derives from the same Hebrew root used for mixing (as in “mixture”). The “Tree of Knowledge, good and bad” also implies the two mixed together. In biblical terms, one may allegorically assert that we have all eaten of “The Tree of Knowledge, good and bad”, where “Good” and “Bad” are mixed together in the same fruit. And since then, “Good” and “Bad” have become intermingled in our body and soul, handing us our mission in life: to grow and mature, to create ourselves and to form our personality and character, all via the process of separating (“Good” from “Bad”) and then choosing.

The act of separating (good from bad) is twofold, and it is expressed differently in the two worlds we inhabit:

  • In the “World of Law-of-Nature”, we need to separate “good” from “bad” because absent this separation we may choose the “bad”, thereby harming our well-being and possibly even endangering our life. Thus, buying fruit in the supermarket, we are careful to separate good apples from the bad ones (rotten apples) so that we can then make the correct choice of purchasing good apples only, benefiting our health and well-being. Separation is also inherent to many of our bodily processes (like in the kidney);
  • In the “World of Randomness”, the act of separating good from bad (or “good” from “evil”, as commonly used in biblical parlance) is a much harder task. Unlike in the “World of Law-of-Nature”, where science assists us in forming a clear distinction and separation between the good and the bad, we do not easily, clearly and immediately differentiate between the two in the “World of Randomness”. Let us demonstrate with a simple example. I am selling a used car, aware that the car carries a certain defect. I can inform the buyer about it, or I can withhold that information. In the latter case, the thinking goes like this: “I have allowed the buyer to inspect and check the car thoroughly, have I not? However, the defect was not exposed. It is the buyer’s responsibility to identify the defect, not mine, is it not?”. Such thinking testifies to the daily blurring, in the “World of Randomness”, of “good” and “bad” (or “good” and “evil”, in biblical terms). Therefore, the Jewish Torah explicitly instructs: “Thou shalt not curse the deaf, nor GIVE a stumbling block to the blind…” (Leviticus 19:14). In other words, one cannot hide behind an argument like the one just articulated. It is the seller’s responsibility to turn the blind into non-blind by alerting the buyer to the car’s defect.

Once we understand the act of separation in the two worlds, and grasp the role of science in assisting us to separate in the “World of Law-of-Nature”, how do we separate and choose rightly in the “World of Randomness”?

Moses, speaking to the Children of Israel on behalf of the Divine, set before them a clear separation and a clear choice:

* Separation: “Behold, I have given thee this day life and the good, and death and the bad” (Deuteronomy 30:15);

* Choosing: “I call upon heaven and earth to witness this day against you that I have set before thee life and death, blessing and cursing; therefore, choose life that both thou and thy seed may live” (Deuteronomy 30:19).

Is free-will an endowment of the human species, granted to it for eternity?

Not according to Scripture. The free-will act bestowed on humankind, that of separating and choosing, has a limited life-span. It is not eternal. A time will come when God will reveal Himself, and then free-will, by definition, will be no more:

“For then I will convert the peoples to a non-confounded language that they all call upon the name of Jehovah to serve him shoulder to shoulder” (Zephaniah 3:9);

“And Jehovah will be king over all the earth; on that day Jehovah will be one and his name One” (Zechariah 14:9).

Furthermore, not only will the task of separating and choosing no longer be in the hands of mankind; at End-Times, the Divine will conduct a separation process of His own. However, that separation will not be between “Good” and “Evil” (as these exist in the “World of Randomness”), but rather between the righteous and the evil (who exist amidst humankind):

“I will also turn my hand against thee, and will purge away your dross as with lye and remove all thy alloy” (Isaiah 1:25);

“Therefore, thus says the Lord of hosts: Behold, I will smelt them and try them…” (Jeremiah 9:6);

“As silver is melted in the midst of the furnace, so shall you be melted in the midst of it…” (Ezekiel 22:22);

“I will bring the third part through the fire, and refine them as one refines silver and test them as one tests gold…” (Zechariah 13:9);

“But who may abide the day of his coming? and who shall stand when He appears? For He is like a refiner’s fire and like the washers’ soap; and He shall sit as a refiner and purifier of silver…” (Malachi 3:2);

“Many will be purged, and purified and refined…” (Daniel 12:10).

 

 

 

 

Categories
General Statistical Applications My Research in Statistics

Response Modeling Methodology — Now on Wikipedia

Response Modeling Methodology (RMM) is now on Wikipedia!! RMM is a general platform for modeling monotone convex relationships, which I have been developing over the last fifteen years (as of May 2017), applying it to various scientific and engineering disciplines.

A new entry about Response Modeling Methodology (RMM) has now been added to Wikipedia, with a comprehensive literature review:

Response Modeling Methodology – Haim Shore (Wikipedia).