Category: My Research in Statistics

My Research in Statistics

Engineering Implications of Semi-Repetitive Processes (4-part Series on “Wiley StatsRef: Statistics Reference Online”; Now Published)

Post author By Haim Shore
Post date February 2, 2026

I am pleased to share that my new 4-part series on semi-repetitive processes is now published (February 18, 2026).

Below, please find abstracts and links for all four parts.

Part 1: Engineering Implications of Semi-Repetitive Processes

Abstract: Process predictability may be impaired in two ways — by lack of process information and by lack of process repetitiveness (process is partially repetitive (semi-repetitive) or not repetitive at all). In this four-part series, we address statistical engineering implications of the latter, namely, how lack of complete repetitiveness affects engineering and managerial decisions, required in the analysis and design of semi-repetitive processes and in their management. In this first part, we deliver an overview of the other three parts of the series, addressing statistical engineering questions and problems this series is intended to respond to, and the adaptations needed (relative to repetitive or non-repetitive processes). In particular, we address the dual-component variation of semi-repetitive processes (second part), measuring process repetitiveness (third part) and assessing reliability of process-time predictions, as we move from repetitive to semi-repetitive to non-repetitive processes (fourth part).

Part 2: The Dual-Component Variation of Semi-Repetitive Processes

Abstract: This is the second of a four-part series on engineering implications of semi-repetitive (SR) processes. In this part, we briefly summarize the “Random Identity Paradigm”, and in compliance with this paradigm make a distinction between two sources of variation affecting SR processes, identity/work-content instability and error. This dual-component variation affects appreciably distributions associated with SR processes. We formulate requirements for models of the dual-component variation and review examples of published models that fulfill these requirements. Adding a new requirement relating to error variation, a new model is partially developed that fulfills this requirement. The link between process repetitiveness and process predictability is addressed as preparation for the third part of this series.

Part 3: Measuring Repetitiveness of Semi-Repetitive Processes

Abstract: This is the third of a four-part series on engineering implications of semi-repetitive processes. In the fourth part, we address how process degree of repetitiveness affects its predictability. Here we explore measuring of process repetitiveness. A measure of the latter had been published, denoted Process Repetitiveness Measure (PRM). It is based on the standardized departure of the mode from the mean and is expressed in terms of the first four moments of the process distribution. Any measure that can be shown to be linearly related to PRM may obviously also serve to measure process repetitiveness. In this article, we explore two additional measures – a probability measure and one based on the coefficient of variation (CV). We show that CV is qualified for this role, having the added benefit of sparing the need to estimate third and fourth moments (known for their large standard errors). CV is appreciated both theoretically, by examining a small sample of arbitrarily selected statistical distributions, and empirically, using a database of surgery durations.

Part 4: Reliability of Process-Time Prediction for Semi-Repetitive Processes

Abstract: This is the fourth of a four-part series on “Engineering Implications of Semi-repetitive Processes”. In Part 3, we have examined and compared several candidate measures to evaluate process repetitiveness, the basis for evaluating predictability of a semi-repetitive process. In particular, we have evaluated the coefficient of variation (CV) and found it to be statistically linearly related to process repetitiveness measure (PRM), which measures process repetitiveness based on the standardized distance of the mode from the mean. In this entry, we employ CV to address how process degree of repetitiveness affects its predictability. More specifically, we formulate for semi-repetitive processes a statistical criterion by which to determine when process-time predictions cease to be acceptable due to insufficient process repetitiveness.

Tags semi-repetitive processes, technology, The Random-Identity Paradigm

My Research in Statistics

Re-defining Error within the Random Identity Paradigm (RIP; Pre-Print)

Post author By Haim Shore
Post date August 20, 2024

I have just submitted a new paper to be considered for publication (after review).

The new paper continues an earlier popular paper:

Why the Mode Departs from the Mean (Published, Open Access)

Here is an Abstract of the new submission.

Abstract

In a recent short communication (Shore, 2024a), we have introduced a new paradigm for sources of random variation. It helps explain why the mode occasionally departs from the mean. In essence, the new paradigm states that for any perceived random variation there are at most two sources of variation —identity instability and error. When identity is stable (there is only error variation), the allied statistical distribution is symmetric (mode equals the mean). When identity is completely random (there is only identity variation, error undefinable), mode either does not exist or resides at either end-points of the distribution support. The purpose of this communication is to explore how error is re-defined, consistent with the new paradigm. A general model for random variation is developed, comprising two additive independent random variables, averaged by a repetitiveness measure. Model’s implications are probed.

Tags Random Identity Paradigm (RIP)

My Research in Statistics Podcasts (audio)

Review of an Accepted Paper (“Why the Mode Departs from the Mean”)

Post author By Haim Shore
Post date March 26, 2024

My new paper (Why the Mode Departs from the Mean) has been accepted for publication in Communications in Statistics – Theory and Methods. It is now published (Open Access; Online April 14, 2024).

One reviewer, in particular, has captured well the significance of the new paper, and its ramifications for Statistics.

The review is given below.

Review of my last paper to CIS_March 25 2024

My Research in Statistics

Why the Mode Departs from the Mean (Published, Open Access)

Post author By Haim Shore
Post date January 31, 2024

I have recently authored a short communication, explaining the occasional departure of the mode from the mean in terms of the new “Random Identity Paradigm”.

This article has been accepted for publication in Communications in Statistics – Theory and Methods. It is now published (Shore, online April 2024; Open Access):

Why the mode departs from the mean—a short communication.

An insightful review of the paper, by an anonymous referee, is linked in a separate post:

Review of an Accepted Paper (“Why the Mode Departs from the Mean”)

A core concept in the explanation for the departure of the mode from the mean is the new “Random Identity Paradigm”. It is associated with new terms like Identity Stability, Identity Variation, and Identity-full/less distributions. A thorough introduction to the new paradigm, and allied terms, is delivered in Appendix A of the new paper (Shore, January 2024; Open Access):

A novel approach to modeling steady-state process-time with smooth transition from repetitive to semi-repetitive to non-repetitive (memoryless) processes. January 2024. Quality and Reliability Engineering. 40(1):220-235.

See also an earlier post, referring to the latter paper:

Why the Mode Occasionally Departs from the Mean?

Some further implications of the new paradigm are explored in my four-part series at “Wiley StatsRef: Statistics Reference Online”:

Parametric and Parameter-Free Shape Moments (Stat08459)

Comment: As of writing this comment (May 10, 2024), the new article has become the most read of all articles published in the last twelve months in the host journal (CIS – Theory and Methods, with 233K annual downloads/views):

Most read articles (of articles published in the last twelve months)

General Statistical Applications My Research in Statistics

My Four-Part Mini-Series Now on Wiley StatsRef Online

Post author By Haim Shore
Post date November 28, 2023

My four-part mini-series on Statistics is now published by Wiley:

Shore Four-Part Mini-Series on: “Wiley StatsRef: Statistics Reference Online”

Here are links to all four parts (stat08456 to stat08459):

Parametric and Parameter-Free Shape Moments (Stat08459)

Asymptotic Normality and the Coefficient of Variation (Stat08458)

The Mean, Mode, Standard Deviation and Their Mutual Relationships (Stat08457)

The Effects of the Box–Cox Transformation (Stat08456)

Tags Shore Four-Part Mini-Series on Wiley

Forecasting and Monitoring of Surgery Times General Statistical Applications My Research in Statistics

Why the Mode Occasionally Departs from the Mean?

Post author By Haim Shore
Post date June 2, 2023

The answer to this question is detailed in a new paper, just published (Shore, 2024a; Open Access):

A novel approach to modeling steady-state process-time with smooth transition from repetitive to semi-repetitive to non-repetitive (memoryless) processes

A related post, referring to a more recent paper (Shore, 2024b; Open Access):

Why the Mode Departs from the Mean (Published, Open Access)

A Layman’s Abstract, published by Wiley, may be found here:

Layman’s Abstract for Quality and Reliability Engineering International article: A novel approach to modeling steady-state process-time with smooth transition from repetitive to semi-repetitive to non-repetitive (memoryless) processes

Enjoy and please share!!

Tags Process time modeling

My Research in Statistics

Tutorial on Response Modeling Methodology (RMM)

Post author By Haim Shore
Post date February 3, 2023

I have now uploaded to YouTube my presentation of March 2006, delivered at Auburn University (USA), in which I explain my new methodology (RMM) to model variation (random or systematic).

Recent progress in artificial intelligence (AI) has allowed enhancing the audio quality of the video so that it could be uploaded to YouTube.

Associated PowerPoint presentation, first in PowerPoint format, second in PDF format (helps preserve the correct form of the equations):

Haim Shore Seminar_ Auburn Univ_March2006

Haim Shore seminar_RMM_Auburn Univ_March 2006

Entry at Wikipedia: Response modeling methodology – Wikipedia

YouTube link:

My Research in Statistics Podcasts (audio)

Where Statistics Went Wrong Modeling Random Variation (Podcast)

Post author By Haim Shore
Post date August 21, 2022

To date, within the Statistics literature, one may literally find thousands of statistical distributions.

Is this acceptable?

Or perhaps we are wrong in how we model random variation?

The related post, with references:

Where Statistics Went Wrong Modeling Random Variation

References:

My Trilogy of Articles on Surgery Times – Now Complete (Published)

Tags Professor Haim Shore Statistical Research, Random-identity Paradigm, Unification of "Objects of Enquiry"

My Research in Statistics

Where Statistics Went Wrong Modeling Random Variation

Post author By Haim Shore
Post date August 19, 2022

Update: A new free-access article, published 2024 (“Why the Mode Departs from the Mean – A Short Communication”) adds a new dimension to the contents of the post below.

A model of random variation, generated by a “random variable”, is presented in Statistics in the form of a statistical distribution (like the normal or the exponential).

For example, the weight of people at a certain age is a random variable, and its observed variation may be modeled by the normal distribution; surgery duration is a random variable, and its observed variation may, under specified circumstances, be modeled by the exponential distribution.

In the Statistics literature, one may find statistical distributions modeling random variation directly observed in nature (as the above two examples), or random variation associated with a function of random variables (like a sample average calculated from a sample of n observations).

To date, within the Statistics literature, one may literally find thousands of statistical distributions.

Is this acceptable?

Or perhaps we are wrong in how we model random variation?

Pursuant to a large-scale project, where I have modeled surgery times (a research effort reported in three recent publications, Shore 2020ab, 2021), I have reached certain conclusions about how random variation should be modeled so as to be more faithful to reality. The new approach seems to reduce the problem of the insanely gigantic number of distributions, as currently appearing in the Statistics literature.

I have summarized these new insights in a new paper, carrying the title of the post.

The Introduction section of this paper is posted below. Underneath it, one may find a link to the entire article.

Where Statistics Went Wrong Modeling Random Variation

Introduction

The development of thousands of statistical distributions to date is puzzling, if not bizarre. An innocent observer may wonder why, while in most other branches of science the historical development shows a clear trend towards unifying the “objects of enquiry” (forces in physics; properties of materials in chemistry; human characteristics in biology), this has not taken place within the mathematical modelling of random variation. Why, in Statistics, as the branch of science engaged in modeling random variation observed in nature, does the number of “objects of enquiry” (statistical distributions) keep growing?

In other words: Where has Statistics gone wrong modeling observed random variation?

Based on new insights, gained from a recent personal experience with data-based modeling of surgery time (resulting in a trilogy of published papers, Shore 2020ab, 2021), we present in this paper a new paradigm to modeling observed random variation. A fundamental insight is a new perception of how observed random variation is generated, and how it affects the form of the observed distribution. The latter is perceived to be generated not by a single source of variation (as the common concept of “random variable”, r.v., implies), but by two interacting sources of variation. One source is “Identity”, formed by “identity factors”. This source is represented in the distribution by the mode (if one exists), and it may generate identity-variation. A detailed example for this source, regarding modeling of surgery times, is presented in Shore (2020a). Another source is an interacting error, formed by “non-identity/error factors”. This source generates error variation (separate from identity variation). Combined, the two interacting sources generate the observed random variation. The random phenomenon, generating the latter, may be in two extreme states: An identity-full state (there is only error variation), and an identity-less state (identity factors become so unstable as to be indistinguishable from error factors; identity vanishes; no error can be defined). Scenarios, residing in between these two extreme states, reflect a source of variation with partial lack of identity (LoI).

The new “Random Identity Paradigm”, attributing two contributing sources to observed random variation (rather than a single one, as assumed to date), has far-reaching implications for the true relationships between location, scale and shape moments. These are probed and demonstrated extensively in this paper, with numerous examples from current Statistics literature (refer, in particular, to Section 3).

In this paper, we first introduce, in Section 2, basic terms and definitions that form the skeleton for the new random-identity paradigm. Section 3 addresses implications of the new paradigm in the form of six propositions (subsection 3.1) and five predictions (presented as conjectures, subsection 3.2). The latter are empirically supported, in Section 4, with examples from the published Statistics literature. A general model for observed random variation (Shore, 2020a), bridging the gap between current models for the two extreme states (normal, for identity-full state; exponential, for the other), is reviewed in Section 5, and its properties and implications probed. Section 6 delivers some concluding comments.

A link to the complete article:

Haim Shore_RG_Where Statistics Went Wrong_Aug 31 2022 Download

Tags Haim Shore Blog, Where Statistics went wrong?

My Research in Statistics

Where Statistics Went Wrong? And Why?

Post author By Haim Shore
Post date January 26, 2022

Update: A new free-access article, published 2024 (“Why the Mode Departs from the Mean – A Short Communication”) adds a new dimension to the contents of the post below.

Introduction

Is Statistics, a branch of mathematics that serves as a central tool to investigate nature, heading in the right direction, comparable to other branches of science that explore nature?

I believe it is not.

This belief is based on my own personal experience in a recent research project, aimed to model surgery time (separately for different subcategories of surgeries). This research effort culminated in a trilogy of published articles (Shore 2020ab, 2021). The belief is also based on my life-long experience in academia. I am professor emeritus, after forty years in academia and scores of articles published in refereed professional journals, dealing both with the theory and application of Statistics. In this post, I deliver an account of my recent personal experience with modeling surgery time, and conclusions I have derived thereof, and from my own cumulative experience in the analysis of data and in data-based modeling.

The post is minimally technical, so that a layperson, with little knowledge of basic terms in Statistics, can easily understand.

We define a random phenomenon as one associated with uncertainty, for example, “Surgery”. A random variable (r.v) is any random quantitative property defined on a random phenomenon. Examples are surgery medical outcome (Success: X=1; Failure: X=0), surgery duration (X>0) or patient’s maximum blood pressure during surgery (Max. X).

In practice, an r.v is characterized by its statistical distribution. The latter delivers the probability, P (0≤P≤1), that the random variable, X, assumes a certain value (if X is discrete), or that it will fall in a specified interval (if X is continuous). For example, the probability that surgery outcome will be a success, P_r(X=1), or the probability of surgery duration (SD) to exceed one hour, P_r(X>1).
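As a minimal illustration of the two cases just described, the sketch below computes one probability for a discrete r.v (surgery outcome) and one for a continuous r.v (surgery duration). The success probability of 0.9 and the exponential model with a mean of 1.5 hours are assumptions chosen only for this example; they are not estimates taken from the surgery data, and the function names are hypothetical.

```python
import math

def bernoulli_pmf(x, p_success):
    """P(X = x) for a discrete outcome X that takes the values 0 or 1."""
    if x not in (0, 1):
        return 0.0
    return p_success if x == 1 else 1.0 - p_success

def exponential_tail(t, mean):
    """P(X > t) for a continuous duration X, assumed exponential with the given mean."""
    if t <= 0:
        return 1.0
    return math.exp(-t / mean)

# Hypothetical values, for illustration only
p_success = 0.9            # assumed probability of a successful surgery
mean_duration_hours = 1.5  # assumed mean surgery duration

prob_success = bernoulli_pmf(1, p_success)
prob_longer_than_hour = exponential_tail(1.0, mean_duration_hours)

print(f"P(X = 1) = {prob_success:.3f}")
print(f"P(X > 1 hour) = {prob_longer_than_hour:.3f}")
```

Replacing the exponential by another distribution changes only the tail function; the discrete case returns a probability for a single value, the continuous case for an interval.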

Numerous statistical distributions have been developed over the centuries, starting with Bernoulli (1713), who derived what is now known as the binomial distribution, and Gauss (1809), deriving the “astronomer’s curve of error”, nowadays known as the Gaussian distribution, or the normal distribution. Good accounts of the historical development of the science of probability and statistics up to the present day appear in Britannica and Wikipedia, entry History_of_statistics.

A central part of these descriptions is, naturally, the development of the concept of statistical distribution. At first, the main source of motivation was games of chance. This later transformed into the study of errors, as we may learn from the development of the normal distribution by Gauss. In more recent years, emphasis shifted to describing random variation as observed in all disciplines of science and technology, resulting, to date, in thousands of new distributions. The scope of this ongoing research effort may be appreciated by the sheer volume of the four-volume Compendium on Statistical Distributions by Johnson and Kotz (First Edition 1969–1972, updated periodically with Balakrishnan as an additional co-author).

The development of thousands of statistical distributions over the years, up to the present, is puzzling, if not bizarre. An innocent observer may wonder, how is it that in most other branches of science, the historical development shows a clear trend towards convergence, while in modeling random variation, the most basic concept to describe processes of nature, the opposite has happened, namely, divergence?

Put in more basic terms: Why is it that in science, in general, a continuous attempt is made to unify, under an umbrella of a unifying theory, the “objects of enquiry” (forces in physics; properties of materials in chemistry; human characteristics in biology), while in the mathematical modelling of random variation, this has not happened? Why, in Statistics, does the number of “objects of enquiry”, instead of diminishing, keep growing?

And more succinctly: Where did Statistics go wrong? And why?

I have already had the opportunity to address this issue (the miserable state-of-the-art of modelling random variation) some years ago, when I wrote (Shore, 2015):

“ “All science is either physics or stamp collecting”. This assertion, ascribed to physicist Ernest Rutherford (the discoverer of the proton, in 1919) and quoted in Kaku (1994, p. 131), intended to convey a general sentiment that the drive to converge the five fundamental forces of nature into a unifying theory, nowadays a central theme of modern physics, represents science at its best. Furthermore, this is the right approach to the scientific investigation of nature. By contrast, at least until recently, most other scientific disciplines have engaged in taxonomy (“bug collecting” or “stamp collecting”). With “stamp collecting” the scientific inquiry is restricted to the discovery and classification of the “objects of enquiry” particular to that science, however this never culminates, as in physics, in a unifying theory, from which all these objects may be deductively derived as “special cases”. Is statistics a science in a state of “stamp collecting”?”

This question remains valid today, eight years later: Why has the science of Statistics, as a central tool to describe statistically stable random phenomena of nature, deviated so fundamentally from the general trend towards unification?

In Section 2, we enumerate the errors that, we believe, triggered this departure of Statistics from the general trend in the scientific study of nature, and outline possible outlets to eliminate these errors. Section 3 is an account of the personal learning experience that I have gone through while attempting to model surgery duration and its distribution. This article is a personal account, for the naive (non-statistician) reader, of that experience. As alluded to earlier, the research effort resulted in a trilogy of articles, and in the new “Random identity paradigm”. The latter is addressed in Section 4, where new concepts, heretofore ignored by Statistics, are introduced (based on Shore, 2022). Examples are “Random identity”, “identity variation”, “losing identity” (with characterization of the process), and “Identity-full/identity-less distributions”. These concepts underlie a new methodology to model observed variation in natural processes (as contrasted with variation of r.v.s that are mathematical functions of other r.v.s). The new methodology is outlined, based on Shore, 2022. Section 5 delivers some final thoughts and conclusions.

The historical errors embedded in current-day Statistics

Studying the history of the development of statistical distributions to date, we believe the departure of Statistics from the general trend, resulting in a gigantic number of “objects of enquiry” (as alluded to earlier), may be traced to three fundamental, inter-related, errors, historically committed within Statistics:

Error 1: Failure to distinguish between two categories of statistical distributions:

Category A: Distributions that describe observed random variation of natural processes;

Category B: Distributions that describe behavior of statistics, namely, of random variables that are, by definition, mathematical functions of other random variables.

The difference between the two categories is simple: Category A succumbs to certain constraints on the shape of distribution, imposed by nature, which Category B does not (the latter succumbs to other constraints, imposed by the structure of the mathematical function, describing the r.v). As we shall soon realize, a major distinction between the two sets of constraints (not the only one) is the permissible values for skewness and kurtosis. While for Category A, these fluctuate in a specified interval, confined between values of an identity-full distribution and an identity-less distribution (like the normal and the exponential, respectively; both types of distribution shall be explained soon), for Category B such constraints do not hold.
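To make the two end-points of that interval concrete, the sketch below computes skewness and kurtosis for the two examples named above, the normal and the exponential, from their first four raw moments. It uses standard textbook moment formulas only; the parameter values (mean 10 and standard deviation 2 for the normal, mean 2 for the exponential) are arbitrary assumptions, the function names are hypothetical, and kurtosis is reported on the scale where the normal equals 3 (not excess kurtosis).

```python
import math

def shape_from_raw_moments(m1, m2, m3, m4):
    """Return (skewness, kurtosis) from the first four raw moments E[X^k]."""
    var = m2 - m1**2
    mu3 = m3 - 3 * m1 * m2 + 2 * m1**3                    # third central moment
    mu4 = m4 - 4 * m1 * m3 + 6 * m1**2 * m2 - 3 * m1**4   # fourth central moment
    return mu3 / var**1.5, mu4 / var**2

def normal_raw_moments(mu, sigma):
    """First four raw moments of a normal distribution."""
    return (
        mu,
        mu**2 + sigma**2,
        mu**3 + 3 * mu * sigma**2,
        mu**4 + 6 * mu**2 * sigma**2 + 3 * sigma**4,
    )

def exponential_raw_moments(mean):
    """First four raw moments of an exponential distribution: E[X^k] = k! * mean^k."""
    return tuple(math.factorial(k) * mean**k for k in range(1, 5))

# Arbitrary parameter values; the shape moments do not depend on them
normal_skew, normal_kurt = shape_from_raw_moments(*normal_raw_moments(10.0, 2.0))
expo_skew, expo_kurt = shape_from_raw_moments(*exponential_raw_moments(2.0))

print(f"normal:      skewness = {normal_skew:.3f}, kurtosis = {normal_kurt:.3f}")
print(f"exponential: skewness = {expo_skew:.3f}, kurtosis = {expo_kurt:.3f}")
```

Under these formulas the normal gives skewness 0 and kurtosis 3, and the exponential gives skewness 2 and kurtosis 9, whatever parameter values are chosen.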

Error 2: Ignoring the real nature of error:

A necessary condition for the existence of an error, indeed a basic assumption integrated implicitly into its classic definition, is that for any observed random phenomenon, and the allied r.v, there is a typical constant, an outcome of various factors inherent to the process/system (“internal factors”), and there is error (multiplicative or additive), generated by random factors external to the system/process (“external factors”). This perception of error allows its distribution to be represented by the normal, since the latter is the only one having mean/mode (supposedly determined by “internal factors”) disconnected from the standard deviation, STD (supposedly determined by a separate set of factors, “external factors”).

A good representative of the constant, relative to which error is defined, is the raw mode or the standardized mode (raw mode divided by the STD). As perceived today, the error indeed expresses random deviations from this characteristic value (the most frequently observed value).

What happens to the error, when the mode itself ceases to be constant and becomes random? How does this affect the observed random variation or, more specifically, how is error then defined and modelled?

Statistics does not provide an answer to this quandary, except for stating that varying “internal factors”, namely, non-constant system/process factors, may produce systematic variation, and the latter may be captured and integrated into a model for variation, for example, via regression models (linear regression, nonlinear regression, generalized linear models and the like). In this case, the model ceases to represent purely random variation (as univariate statistical distributions are supposed to do). It becomes a model for systematic variation, coupled with a component of random variation (the nature of the latter may be studied by “freezing” “internal factors” at specified values). It is generally assumed in such models that a single distribution represents the component of random variation, though possibly with different parameters’ values for different values of the systematic effects, integrated into the model. Thus, implementing generalized linear models, the user is requested to specify a single distribution (not several), valid for all different sets of the effects’ values. As we shall soon learn (Error 3), “internal factors” may produce not only systematic effects, as currently wrongly assumed, but also a different component of variation, unrecognized to date. It will be addressed next as the third error.

Error 3: Failure to recognize the existence of a third type of variation (additional to random and systematic) — “Identity variation”:

System/process factors may potentially produce not only systematic variation, as currently commonly assumed, but also a third component of variation, one that has passed under the radar, so to speak, in the science of Statistics. Ignoring this type of variation is the third historic error of Statistics. For reasons to be described soon (Sections 3 and 4), we denote this unrecognized type of variation — “Identity variation”.

Modeling surgery duration — Personal learning experience that resulted in the new “Random identity paradigm”

I had not realized the enormity of the consequences of the above three errors, committed within Statistics to date, until a few years ago, when I embarked on a comprehensive research effort to model the statistical distribution of surgery duration (SD), separately for each of over a hundred medically-specified subcategories of surgeries (the latter defined according to a universally accepted standard; find details in Shore 2020a). The subject (modeling SD distribution) was not new to me. I had been engaged in a similar effort years ago, in the eighties of the previous century (Shore, 1986). Then, based on analysis of available data and given the computing facilities available at the time, I divided all surgeries (except open-heart surgeries and neurosurgeries), into two broad groups: short surgeries, which were assumed to be normally distributed, and long surgeries, assumed to be exponential. There, for the first time, I became aware of “Identity variation”, though not so defined, which resulted in modeling SD distribution differently for short surgeries (assumed to pursue a normal distribution) and long ones (assumed to be exponential). With modern available computing means, and with my own cumulative experience since publication of that paper (Shore, 1986), I thought, and felt, that a better model might be conceived, and embarked on the new project.

Probing into the available data (about ten thousand surgery times with affiliated surgery subcategories), four insights/observations were apparent:

1. It was obvious to me that different subcategories pursue different statistical distributions, beyond just differences in values of distribution’s parameters (as currently generally assumed in modeling SD distribution);

2. Given point (1), it was obvious to me that differences in distribution between subcategories should be attributed to differences in the characteristic level of work-content instability (work-content variation between surgeries within subcategory);

3. Given points (1) and (2), it was obvious to me that this instability cannot be attributed to systematic variation. Indeed, it represents a different type of variation, “identity variation”, to-date unrecognized in the Statistics literature (as alluded to earlier);

4. Given points (1) to (3), it was obvious to me that any general model of surgery time (SD) should include the normal and the exponential as exact special cases.

For the naive reader, I will explain the new concept, “identity variation”. Understanding this concept will render all of the above insights clearer.

As an industrial engineer by profession, it was obvious to me, right from the beginning of the research project, that, ignoring negligible systematic effects caused by covariates (like the surgeon performing the operation), a model for SD, representing only random variation in its classical sense, would not be adequate to deliver proper representation to the observed variation. Changes between subcategories in the type of distribution, as revealed by changes in distribution shape (from the symmetric shape of the normal to the extremely non-symmetric shape of the exponential, as first noticed by me in the earlier project, Shore, 1986), have made it abundantly clear that the desired SD model should account for “identity loss”, occurring as we move from a repetitive process (subcategory with repetitive surgeries, having characteristic/constant work-content) to a memory-less non-repetitive process (subcategory with surgeries having no characteristic common work-content). As such, the SD model should include, as exact special cases, the exponential and the normal distributions.

What else do we know of the process of losing identity, as we move from the normal to the exponential, which accounts for “identity variation”?

In fact, several changes in distribution properties accompany “identity loss”. We relate again to surgeries. Like work processes in general, surgeries too may be divided into three non-overlapping and exhaustive groups: repetitive, semi-repetitive and non-repetitive. In terms of work-content, this implies:

Work-processes with constant work-content (only error generates variation; SD normally distributed);
Semi-repetitive work-processes (work-content varies somewhat between surgeries, to a degree dependent on subcategory);
Memory-less work-processes (no characteristic work-content; For example, surgeries performed within an emergency room for all types of emergency, or service performed in a pharmacy, serving customers with varying number of items on the prescription list).

Thus, work-content, however it is defined (find an example in Shore, 2020a), forms “surgery identity”, with a characteristic value, the mode, that vanishes (becomes zero) for the exponential scenario (non-repetitive work-process).

Let us delve somewhat deeper into the claim that a model for SD should include the normal and the exponential as exact special cases (not merely asymptotically, as, for example, the gamma tends to the normal).

There are four observations/properties, which put the two distributions, the identity-full normal and the identity-less exponential, apart from other distributions:

Observation 1: The mean and standard deviation are represented by different parameters for the normal distribution, and by a single parameter for the exponential. This difference is a reflection of a reality where, in the normal scenario, a set of process/system factors (“internal factors”) produces signal only, and a separate set (“external factors”) produces noise only (traditionally modelled as a zero-mean symmetrically distributed error). Moving away from the normal scenario to the exponential scenario, we witness a transition towards merging of the mean with the standard deviation, until, in the exponential scenario, both signal and noise are produced by the same set of factors, and the mean and standard deviation merge to be expressed by a single parameter. The clear distinction between “system/process factors” and “external/error factors”, typical of the normal scenario, has utterly vanished;

Observation 2: The mode (or rather the standardized mode), supposedly representing the typical constant on which the classical multiplicative error is defined in the normal scenario, shrinks as we move away from the normal to the exponential. This movement, in reality, represents passing through semi-repetitive work-processes, with increasing degree of work-content instability. The standardized mode finally disappears (becoming zero) in the exponential scenario. What does this signify? What are the implications?

Observation 3: For both the normal and the exponential, skewness and kurtosis are non-parametric. Why is that, and what does this signify?

Observation 4: What happens to the classic error when the r.v. moves away from the normal scenario to the exponential? Can we still hold on to the classic definition of error, given that the “internal factors”, assumed to generate a constant mode (signal), start to produce noise? How would error (in its classical sense) then be re-specified? Can an error be defined at all?
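Observations 1 and 3 can be checked numerically. The sketch below is only an illustration, not part of the published model: it computes the mean, standard deviation, skewness and (non-excess) kurtosis of the exponential from its raw moments E[X^k] = k!/rate^k, and of the normal from its known central moments. The function names and the parameter values in the loop are hypothetical, chosen for the example.

```python
import math

def exponential_summary(rate):
    """Mean, std, skewness and kurtosis of an exponential r.v. with given rate.

    Built from the raw moments E[X^k] = k! / rate**k (k = 1..4).
    """
    m1, m2, m3, m4 = (math.factorial(k) / rate**k for k in range(1, 5))
    var = m2 - m1**2                                   # second central moment
    mu3 = m3 - 3 * m1 * m2 + 2 * m1**3                 # third central moment
    mu4 = m4 - 4 * m1 * m3 + 6 * m1**2 * m2 - 3 * m1**4  # fourth central moment
    std = math.sqrt(var)
    return m1, std, mu3 / std**3, mu4 / var**2

def normal_summary(mu, sigma):
    """Same summary for a normal r.v., from its known central moments."""
    mu3 = 0.0               # odd central moments of the normal vanish
    mu4 = 3.0 * sigma**4    # fourth central moment of the normal
    return mu, sigma, mu3 / sigma**3, mu4 / sigma**4

# Hypothetical parameter values, for illustration only.
for rate in (0.5, 2.0, 10.0):
    print("exponential, rate =", rate, exponential_summary(rate))
for mu, sigma in ((0.0, 1.0), (5.0, 3.0)):
    print("normal, mu =", mu, "sigma =", sigma, normal_summary(mu, sigma))
```

Whatever rate is chosen, the exponential's mean equals its standard deviation (Observation 1), and the skewness and kurtosis of both distributions stay fixed as the parameters change (Observation 3).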

All these considerations, as well as the need to include semi-repetitive surgeries within the desired model, brought me to the realization that we encounter here a different type of variation, heretofore unrecognized and not addressed in the literature. The instability of work-content (within subcategory), which I have traced to be the culprit for change in distribution as we move from one subcategory to another, could not possibly be regarded as a cause of systematic variation. The latter is never assumed to change the shape of the distribution, only, at most, its first two moments (mean and variance). This is evident, for example, on implementing generalized linear models, a regression methodology frequently used to model systematic variation in a non-normal environment. The user is requested to specify a single distribution (normal or otherwise), never different distributions for different sets of values of the effects being modeled (supposedly delivering systematic variation). Neither can work-content variation be considered part of the classic random variation (as realized in Category A distributions), since the latter assumes existence of a single non-zero mode (for a single non-mixture univariate distribution), not a zero mode or multiple modes (as, for example, with the identity-less exponential (zero mode), its allied Poisson distribution (two modes for an integer parameter), or the identity-less uniform (infinite number of modes); find details in Shore, 2022).
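The remark that the Poisson distribution has two modes for an integer parameter can be verified directly. The following is a minimal sketch with hypothetical helper names; the probability mass function is evaluated on the log scale (via `math.lgamma`) only to avoid overflow for large counts.

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability P(X = k), computed on the log scale."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def poisson_modes(lam, k_max=100):
    """All k in 0..k_max whose probability ties (numerically) with the maximum."""
    probs = [poisson_pmf(k, lam) for k in range(k_max + 1)]
    peak = max(probs)
    return [k for k, p in enumerate(probs) if math.isclose(p, peak, rel_tol=1e-9)]

print(poisson_modes(4))    # integer parameter: two adjacent modes
print(poisson_modes(4.5))  # non-integer parameter: a single mode
```

For an integer parameter λ, the probabilities at λ-1 and λ are equal, so the mode is not unique; for a non-integer parameter a single mode is recovered.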

A new paradigm was born out of these deliberations: the “Random identity paradigm”. Under the new paradigm, observed non-systematic variation is assumed to originate in two components of variation: random variation, represented by a multiplicative normal/lognormal error, and identity variation, represented by an extended exponential distribution. A detailed technical development of this methodology, allied conjectures and their empirical support (from known theory-based results) are given in Shore (2022; a link to a pre-print is given in the References section). In the next section (Section 4), we deliver an outline of the “Random identity paradigm”.

The “Random identity paradigm” — “Random identity”, “Identity variation”, “identity loss”, “identity-full/identity-less distributions” (based on Shore, 2022)

The insights, detailed earlier, have led to the development of the new “Random identity paradigm”, and its allied explanatory two-variate model for SD (Shore, 2020a). The model was designed to fulfill an a-priori specified set of requirements. Central among these is that the model includes the normal and the exponential distributions as exact special cases. After implementing the new model for various applications (as alluded to earlier), we have arrived at the realization that the model used in the article may, in fact, be expanded to introduce a new type of random variation, “random identity variation”, which served as the basis for the new “Random Identity Paradigm” (Shore, 2022).

A major outcome of the new paradigm is the definition of two new types of distributions, an identity-full distribution and an identity-less distribution, and a criterion to diagnose a given distribution as “identity-full”, “identity-less”, or in between. Properties of identity-less and identity-full distributions are described, in particular the property that such distributions have non-parametric skewness and kurtosis, namely, for both types of distribution, skewness and kurtosis assume constant values, irrespective of the values assumed by the distribution parameters. Another requirement, naturally, is that the desired model includes a component of “identity variation”. However, the requirement also specifies that the allied distribution (representing “identity variation”) have support with the mode, if it exists, as its extreme left point (detailed explanation is given in Shore, 2022). As shown in Shore (2020a, 2020b, 2021, 2022), this resulted in defining the exponential distribution anew (the extended exponential distribution), adding a parameter, α, that assumes a value of α=0 for the exponential scenario (error STD becomes zero), and a value tending to 1 as the model moves towards normality (with “identity variation”, expressed in the extended exponential by parameter σ_i, tending to zero).

Sparing the naive reader the technical details of the complete picture, conveyed by the new “Random identity paradigm” (Shore, 2022), we outline herewith the associated model, as used in the trilogy of published papers.

The basic model is given in eq. (1):

Haim Shore_Equations_The Problem with Statistics_January 26 2022

where R is the observed response (an r.v.), L and S are location and scale parameters, respectively, Y is the standardized response (L=0, S=1), {Y_i,Y_e} are independent r.v.s representing internal/identity variation and external/error variation, respectively, ε is zero-mode normal disturbance (error) with standard deviation σ_ε and Z is standard normal. The density function of the distribution of Y_i in this model (the extended exponential) is eq. (2), where Y_i is the component representing “identity variation” (caused by variation of system/process factors, “internal factors”), C_Yi is a normalizing coefficient, and σ_i is a parameter representing internal/identity variation. It is easy to realize that α is the mode. At α=1, Y_i becomes left-truncated normal (re-located half normal). However, it is assumed that at α=1 “identity variation” vanishes, so Y_i becomes a constant, equal to the mode (1). For the exponential scenario (complete loss of identity), we obtain α=0, and the disturbance, assumed to be multiplicative, becomes meaningless, namely, it vanishes (σ_e=0, Y_e=1). Therefore, Y_i and Y then both become exponential.

Let us introduce eq. (3). From (2), we obtain the pdf of Z_i: (eqs. (4) and (5)). Note that the mode of Z_i is zero (mode of Y_i is α).

Various theorems and conjectures are articulated in Shore (2022), which deliver eye-opening insights into various regularities in the behavior of statistical distributions, previously unnoticed, and a good explanation of various theoretical statistical results, heretofore considered separate and unrelated (like a logical derivation of the Central Limit Theorem from the “Random identity paradigm”).

Conclusions

In this article, I have reported on my personal experience, which led me to the development of the new “Random identity paradigm” and allied concepts. It followed my research effort to model surgery duration, which resulted in a bi-variate explanatory model, with the extended exponential distribution as the intermediate tool, that paved a smooth way to unify, under a single umbrella model, execution times of all types of work processes/surgeries, namely, not only repetitive (normal), or non-repetitive (exponential), but also those in between (semi-repetitive processes/surgeries). To date, we are not aware of a similar unifying model that is as capable of unifying diverse phenomena as the three categories of work-processes/surgeries. Furthermore, this modeling effort has led directly to conceiving the new “Random identity paradigm” with allied new concepts (as alluded to earlier).

The new paradigm has produced three major outcomes:

First, as demonstrated in the linked pre-print, under the new paradigm virtually scores of theoretical statistical results that have formerly been derived independently and considered unrelated, are explained in a consistent and coherent manner, becoming inter-related under the unifying “Random identity paradigm”.

Secondly, various conjectures about properties of distributions are empirically verified with scores of examples/predictions from the Statistics literature. For example, the conjecture that Category B r.v.s, which are functions of only identity-less r.v.s, are also identity-less, and similarly for identity-full r.v.s.

Thirdly, the new bi-variate model has been demonstrated to represent well numerous existent distributions, as has been shown for diversely-shaped distributions in Shore, 2020a (see Supplementary Materials therein).

It is hoped that the new “Random identity paradigm”, representing an initial effort at unifying distributions of natural processes (Category A distributions), may pave the way for Statistics to join other branches of science in a common effort to reduce, via unification mediated by unifying theories, the number of statistical distributions, the “objects of enquiry” of modeling random variation within the science/branch-of-mathematics of Statistics.

References

[1] Shore H (1986). An approximation for the inverse distribution function of a combination of random variables, with an application to operating theatres. Journal of Statistical Computation and Simulation, 23:157-181. DOI: 10.1080/00949658608810870 .

[2] Shore H (2015). A General Model of Random Variation. Communications in Statistics - Theory and Methods, 44(9):1819-1841. DOI: 10.1080/03610926.2013.784990.

[3] Shore H (2020a). An explanatory bi-variate model for surgery-duration and its empirical validation. Communications in Statistics: Case Studies, Data Analysis and Applications, 6(2):142-166. Published online: 07 May 2020. DOI: 10.1080/23737484.2020.1740066

[4] Shore H (2020b). SPC scheme to monitor surgery duration. Quality and Reliability Engineering International. Published online 03 December 2020. DOI: 10.1002/qre.2813.

[5] Shore H (2021). Estimating operating room utilisation rate for differently distributed surgery times. International Journal of Production Research. Published online 13 December 2021. DOI: 10.1080/00207543.2021.2009141.

[6] Shore H (2022). “When an error ceases to be error” — On the process of merging the mean with the standard deviation and the vanishing of the mode. Preprint.

Haim Shore_Blog_Merging of Mean with STD and Vanishing of Mode_Jan 07 2022

Tags Errors in modeling random variation, Haim Shore Blog, Numerous number of distributions, Where Statistics went wrong?

General Statistical Applications My Research in Statistics

How to Use Standard Deviations in Weighting Averages?

Post author By Haim Shore
Post date September 17, 2019

We wish to calculate a weighted average of a set of sample averages, given their standard deviations. How do we do that?

The objective is to find a weighting factor, alpha, that minimizes the variance of the weighted average, namely (for two averages):

Minimum { Variance[ (α)Average1 + (1-α)Average2 ] }

We first calculate the variance to obtain (Var is short for Variance; samples for averages assumed independent):

Variance[ (α)Average1 + (1-α)Average2 ] =

= α²Var(Average1) + (1-α)²Var(Average2) .

Differentiating with respect to alpha and equating to zero, we obtain:

(2α)Var(Average1) – 2(1-α)Var(Average2) = 0, and the optimal alpha is:

α* = Var(Average2) / [ Var(Average1) + Var(Average2) ] ,

where Var(Average) = variance/n, with n the corresponding sample size.
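The two-average result can be sketched in a few lines of Python. The sample variances and sample sizes below are hypothetical values chosen only for illustration, and the function names are not from the original post.

```python
def optimal_alpha(var_avg1, var_avg2):
    """Weight on Average1 that minimizes the variance of the weighted average."""
    return var_avg2 / (var_avg1 + var_avg2)

def weighted_average_variance(alpha, var_avg1, var_avg2):
    """Variance of alpha*Average1 + (1-alpha)*Average2 (independent samples)."""
    return alpha**2 * var_avg1 + (1 - alpha)**2 * var_avg2

# Hypothetical data: sample variances and sample sizes of the two samples.
s1_sq, n1 = 4.0, 25
s2_sq, n2 = 9.0, 9
v1 = s1_sq / n1   # Var(Average1) = variance / n
v2 = s2_sq / n2   # Var(Average2) = variance / n

alpha_star = optimal_alpha(v1, v2)
best_variance = weighted_average_variance(alpha_star, v1, v2)
print("optimal alpha:", alpha_star)
print("variance of the weighted average:", best_variance)
```

The more precise average (smaller variance) receives the larger weight, and the resulting variance is smaller than that of either average taken alone.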

We may wish to adapt this reply to specific needs. For example, for three averages we have:

Variance[ (α₁)Average1 + (α₂)Average2 + (1-α₁-α₂)Average3 ] =

= α₁²Var(Average1) + α₂²Var(Average2) + (1-α₁-α₂)² Var(Average3)

To minimize this expression, we differentiate with respect to α₁ and with respect to α₂. Equating each derivative to zero, we obtain two linear equations in two unknowns, which are easily solved:

(2α₁)Var(Average1) – 2(1-α₁-α₂)Var(Average3) = 0,

(2α₂)Var(Average2) – 2(1-α₁-α₂)Var(Average3) = 0,

or:

α₁= v₃ / [v₁ + v₃ + (v₁v₃)/v₂]

α₂= v₃ / [v₂ + v₃ + (v₂v₃)/v₁]

where v_i is Var(Average i) (i=1,2,3).

Since “in general, a system with the same number of equations and unknowns has a single unique solution” (Wikipedia, “System of linear equations”), extension to a higher number of averages (m>3), is straightforward, requiring solving a system of m-1 linear equations with m-1 unknowns.
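For a general number of averages, the linear system need not be solved explicitly. The first-order conditions above imply that α_i·v_i is the same for every i, so each weight is proportional to 1/v_i. This closed form is the standard inverse-variance weighting result; it is not stated in the post itself, so the sketch below checks it against the three-average formulas given above. The function name and the variance values are hypothetical.

```python
def inverse_variance_weights(variances):
    """Weights proportional to 1/v_i, normalized to sum to one."""
    precisions = [1.0 / v for v in variances]
    total = sum(precisions)
    return [p / total for p in precisions]

# Hypothetical values of Var(Average i), i = 1, 2, 3.
v1, v2, v3 = 0.5, 2.0, 4.0
weights = inverse_variance_weights([v1, v2, v3])

# The three-average formulas from the text, for comparison.
alpha1 = v3 / (v1 + v3 + (v1 * v3) / v2)
alpha2 = v3 / (v2 + v3 + (v2 * v3) / v1)

print("general formula:", weights)
print("formulas from the text:", [alpha1, alpha2, 1 - alpha1 - alpha2])
```

The same function applies unchanged to any number of independent averages.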

(This post appears also on my personal page at ResearchGate)

Tags Weighting averages by standard deviations

General Statistical Applications My Research in Statistics

Response Modeling Methodology Explained by Developer

Post author By Haim Shore
Post date April 14, 2019

Professor Haim Shore Lecture on RMM (Response Modeling Methodology), delivered at Department of Industrial and Systems Engineering, Samuel Ginn College of Engineering, Auburn University, USA; March 6 2006.

A comprehensive literature review may be found on Wikipedia:

Wikipedia: Response Modeling Methodology

Links to published articles about RMM on ResearchGate:

Haim Shore_ResearchGate Page_Response Modeling Methodology (RMM)_

PowerPoint Presentation: Shore_Seminar_Auburn-Univ_March 2006

PowerPoint Presentation: Shore_Seminar_Auburn-Univ_March 2006_2

Tags Distribution fitting, Model's Error Structure, Modeling Non-linear Relationships, Non-linear Modeling, Response Modeling Methodology

General Statistical Applications My Research in Statistics

Response Modeling Methodology — Now on Wikipedia

Post author By Haim Shore
Post date May 15, 2017

Response Modeling Methodology (RMM) is now on Wikipedia!! RMM is a general platform for modeling monotone convex relationships, which I have been developing over the last fifteen years (as of May, 2017), applying it to various scientific and engineering disciplines.

A new entry about Response Modeling Methodology (RMM) has now been added to Wikipedia, with a comprehensive literature review:

Response Modeling Methodology – Haim Shore (Wikipedia).

Tags Continuous Monotone Convexity, Haim Shore Blog, Response Modeling Methodology (RMM), RMM, פרופסור חיים שור

My Research in Statistics

New 3-Minute Trailer About “Math Unveils the Truth” by Oren Evron

Post author By Haim Shore
Post date September 14, 2016

A new provocative and interesting trailer by Mr. Oren Evron, about his widely viewed one-hour movie:

“The Torah – Math Unveils the Truth”:

Oren Evron – “Math Unveils the Truth” (3-minute Trailer)

With Hebrew subtitles:


Oren Evron – “Math Unveils the Truth” (3-minute Trailer; Hebrew subtitles)

Forecasting and Monitoring of Surgery Times General Statistical Applications My Research in Statistics

The Universal Distribution

Post author By Haim Shore
Post date February 3, 2015

Since studying as an undergraduate student at the Technion (Israel Institute of Technology) and learning, for the first time in my life, that randomness too has its own laws (in the form of statistical distributions, amongst others), I have become extremely appreciative of the ingenuity of the concept of statistical distribution. The sheer combining of randomness with laws, formulated in the language of mathematics not unlike any other branch of the exact sciences, fascinated me considerably, young man that I was at the time.

That admiration has since evaporated, as I have become increasingly aware of the gigantic number of statistical distributions, defined and used within the science of statistics to describe random behavior, either of real-world phenomena or of sample-statistics embedded in statistical-analysis procedures (like hypothesis testing). I realized that unlike modern-day physics, engaged to this day in the unification of the basic forces of nature, the science of statistics has failed to carry out similar attempts at unification. What the latter implies for me is derivation of a single universal distribution, relative to which all current distributions might be regarded as statistically insignificant random deviations (not unlike a sample average is a random deviation from the population mean). Such unification has never materialized, or even been attempted or debated, within the science of statistics.

Personally, I attribute this failure at unification to the fact that current foundations of statistics, with its basic concepts like probability function, probability density function (pdf) or distribution function (often denoted cumulative distribution function, or CDF), were established back in the eighteenth century to derive various early-day distributions. These foundations have not been challenged ever since. Some well-known mathematicians of the time, like Jacob and Daniel Bernoulli, Abraham de Moivre, Carl Friedrich Gauss, Pierre-Simon Laplace and Joseph Louis Lagrange, all used those basic terms of statistics to derive specific distributions. However, the basic tenets underlying formation of those mathematical models of random variation have not been challenged to this day. Central amongst these tenets is the belief that random phenomena, with their associated properly-defined random variables, each have their own specific distribution. That tenet has remained intact and unchallenged to this day. Consequently, no serious attempt at unification has ever become the core objective of the science of statistics. Furthermore, no discussion of how to proceed in the pursuit of the “universal distribution” has ever been conducted.

My sentiment about the feasibility of revolutionizing the concept of statistical distribution and deriving a universal distribution, relative to which all current distributions may be regarded as random deviations, has changed dramatically with the introduction of a new non-linear modeling approach, denoted Response Modeling Methodology (RMM). I developed RMM back in the closing years of the previous century (Shore, 2005, and references therein), and only some years later I realized that the “Continuous Monotone Convexity (CMC)” property, part and parcel of RMM, could serve to derive the universal distribution, in the sense described in the previous paragraph. (Read about the CMC property in another post in this blog).

The results of the new realization are two articles (Shore 2015, 2017), one of which has already been published and the second currently under review (see references here).

More recently, I have reached new insights regarding the “Universal Distribution”, the result of ongoing research on predicting and statistical control of surgery time. This research effort has produced the new “Random Identity Paradigm”, described and explained in various published resources. Some of these are detailed below (for others refer to references therein):

Novel approach to model process time_Haim Shore (January 2024, Free Access)

Why the mode departs from the mean — a short communication (CIS, Free Access).

Why the Mode Departs from the Mean (Post on this blog)

My Four-Part Mini-Series Now on Wiley StatsRef Online

Modeling and Forecasting Surgery-Time (Post on this blog)

Tags History of Statistics, Random-identity Paradigm, Response Modeling Methodology (RMM), Universal Distribution

My Research in Statistics

“What is the significance of the significance level?”

Post author By Haim Shore
Post date March 13, 2014

This post delivers an in-depth analysis of the significance of the statistical term Significance Level (in response to an article in Significance Journal).

In a focus article that has appeared in Significance magazine (October, 2013), the author Mark Kelly delivers an excellent review of what “luminaries have to say” regarding the proper significance level to use in statistical hypothesis testing. The author thence concludes:

“No one therefore has come up with an objective statistically based reasoning behind choosing the now ubiquitous 5% level, although there are objective reasons for levels above and below it. And no one is forcing us to choose 5% either.”

In a response article, sent to the editor of Significance, Julian Champkin, I have made the point that, unlike the claim made in the original article, there is an obvious method to determine objectively the optimal statistical significance level. While the editor accepted my article, he declined to include the detailed numerical example therein since “Your illustration, though, is a little too technical for some of our readers – we have many who are not statisticians, and we try to keep heavy maths to a minimum in the magazine.”

In a further (unanswered) e-mail to the editor, I have suggested a solution to the editor’s concern and stated that “Personally I feel that there are many practitioners out there who could benefit from this simple practical example and get aware that engineering considerations are part and parcel of hypothesis testing in an engineering environment. I often feel that these engineers are somewhat neglected in the statistics literature in favor of pure science.”

Based on my own experience of over thirty years of academic teaching to industrial engineering undergraduates, I feel that it is important that individuals working in an engineering environment understand that the viewpoint expressed in Kelly’s article in Significance magazine, which is quite prevalent, is not accurate in all circumstances.

With this in mind, the originally submitted article, titled:

“What is the significance of the significance level?” “It’s the error costs, stupid!”

is linked below:

Haim Shore_What is the significance of the significance level_Response to Significance_March 2014

Tags Error costs, Haim Shore Blog, Professor Haim Shore, Significance Magazine, Significance of the Significance Level, פרופסור חיים שור

My Research in Statistics

Statistics and “Stamp Collecting”

Post author By Haim Shore
Post date February 17, 2014

What is the linkage between the science of statistics and “Stamp Collecting”? More than you can imagine. This blog entry (with the linked article and PP presentation) was originally posted, for a restricted time period, on the Community Blog of the American Statistical Association (ASA), where the linked items were visible to members only. The blog entry is now displayed, with the linked items, visible to all.

This is the fourth and last message in this series about the consequences to statistical modeling of the continuous monotone convexity (CMC) property. The new message discusses implications of the CMC property to modeling random variation.

As a departure point for this discussion, some historic perspective about the development of the principle of unification in human perception of nature can be useful.

Our ancestors believed in a multiplicity of gods. All phenomena of nature had their particular gods, and various manifestations of the same phenomenon were indeed different displays of the wishes, desires and emotions of the relevant god. Thus, Prometheus was a deity who gave fire to the human race and for that was punished by Zeus, the king of the gods; Poseidon was the god of the seas; and Eros was the god of desire and attraction.

This convenient “explanation” for the diversity of nature phenomena had all but disappeared with the advent of monotheism. Under the “umbrella” of a single god, ancient gods were “deleted”, to be replaced by a “unified” and “unifying” almighty god, the source of all nature phenomena.

And the three major monotheistic religions had been born.

The “concept” of unification, however, did not stop there. It migrated to science, where pioneering giants of modern scientific thinking observed diverse phenomena of nature and attempted to unify them into an all-encompassing mathematics-based theory, from which the separate phenomena could be deduced as special cases. Some of the most well-known representatives of this mammoth shift in human thinking, in those early stages of modern science, were Copernicus (1473-1543), Johannes Kepler (1571-1630), Galileo Galilei (1564-1642) and Isaac Newton (1642-1727).

In particular, the science of physics had been at the forefront of these early attempts to pursue the basic concept of unity in the realm of science. Ernest Rutherford (1871–1937), known as the father of nuclear physics and the discoverer of the proton (in 1919), made the following observation at the time:

“All science is either physics or stamp collecting”.

The assertion, quoted in Kaku (1994, p. 131), was intended to convey a general sentiment that the drive to converge the four fundamental forces of nature into a unifying theory, nowadays a central theme of modern physics, represented science at its best. Furthermore, this is the only correct approach to the scientific investigation of nature. By contrast, at least until recently, most other scientific disciplines have engaged in taxonomy (“bug collecting” or “stamp collecting”). With “stamp collecting” the scientific inquiry is restricted to the discovery and classification of the “objects of enquiry”, particular to that science. However, this never culminates, as in physics, in a unifying theory from which all these objects may be deductively derived as “special cases”.

Is statistics a science of “stamp collecting”?

Observing the abundance of statistical distributions, identified to-date, an unavoidable conclusion is that statistics is indeed a science engaged in “stamp collecting”. Furthermore, serious attempts at unification (partial, at least) are rarely reported in the literature.

In a recent article (Shore, 2015), I have attempted a new paradigm for modeling random variation. The new paradigm, so I believe, may constitute an initial effort to unite all distributions under a unified “umbrella distribution”. In the new paradigm, the “Continuous Monotone Convexity (CMC)” property plays a central role in deriving a general expression to the normal-based quantile function of a generic random variable (assuming a single mode and a non-mixture distribution). Employing numeric fitting to current distributions, the new model has been shown to deliver accurate representation to scores of differently-shaped distributions (including some suggested by anonymous reviewers). Furthermore, negligible deviations from the fitted general model may be attributed to the natural imperfection of the fitting procedure or being perceived as realization of random variation around the fitted general model, not unlike a sample average is a random deviation from the population mean.

In a more recent effort (Shore, 2017), a new paradigm for modeling random variation is introduced and validated in two ways: via certain predictions about known “statistical facts” (like the Central Limit Theorem), shown to be empirically true, and via distribution fitting, using a 5-moment matching procedure, to a sample of known distributions.

These topics and others are addressed extensively in the afore-cited new article. It is my judgment that at present the CMC property constitutes the only possible avenue for achieving in statistics (as in most other modern branches of science) unification of the “objects of enquiry”, as these relate to modeling random variation.

In the affiliated Article #4, I introduce in a more comprehensive fashion (yet minimally technical) an outline of the new paradigm and elaborate on how the CMC property is employed to arrive at a “general model of random variation”. A related PowerPoint presentation, delivered last summer at a conference in Michigan, is also displayed.

Haim Shore_4_ASA_Feb 2014

Haim Shore_4_ASA_PP Presentation_Feb 2014

References

[1] Kaku M (1994). Hyperspace- A Scientific Odyssey Through Parallel Universes, Time Warps and the Tenth Dimension. Book. Oxford University Press Inc., NY.

[2] Shore, H. (2015). A General Model of Random Variation. Communications in Statistics – Theory and Methods 44 (9): 1819-1841.

[3] Shore, H. (2017). The Fundamental Process by which Random Variation is Generated. Under review.

Tags Box-Cox transformation, Distribution fitting, General model of random variation, Haim Shore Blog, Professor Haim Shore, Response Modeling Methodology (RMM)

My Research in Statistics

CMC-Based Modeling — the Approach and Its Performance Evaluation

Post author By Haim Shore
Post date January 27, 2014

This post explains the central role of Continuous Monotone Convexity (CMC) in Response Modeling Methodology (RMM).

In earlier blog entries, the unique effectiveness of the Box-Cox transformation (BCT) was addressed. I concluded that the BCT effectiveness could probably be attributed to the Continuous Monotone Convexity (CMC) property, unique to the inverse BCT (IBCT). Rather than requiring the analyst to specify a model in advance (prior to analysis), the CMC property allows the data, via parameter estimation, to determine the final form of the model (linear, power or exponential). This would most likely lead to a better fit of the estimated model, as cumulative reported experience with implementation of IBCT (or BCT) clearly attests.
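The way a single parameter selects the model form can be illustrated with a minimal sketch. It assumes the common parameterization of the BCT, (y^λ - 1)/λ, whose inverse is (1 + λx)^(1/λ) for λ ≠ 0 and exp(x) for λ = 0. The function name and the test values are hypothetical, and this is not the full RMM model.

```python
import math

def inverse_box_cox(x, lam):
    """Inverse Box-Cox transformation, assuming BCT(y) = (y**lam - 1) / lam.

    Requires 1 + lam * x > 0 when lam != 0.
    """
    if lam == 0:
        return math.exp(x)                 # exponential form (limit lam -> 0)
    return (1.0 + lam * x) ** (1.0 / lam)  # linear for lam = 1, power otherwise

x = 2.0
print("lam = 1   (linear):     ", inverse_box_cox(x, 1.0))
print("lam = 0.5 (power):      ", inverse_box_cox(x, 0.5))
print("lam = 0   (exponential):", inverse_box_cox(x, 0.0))
print("lam near 0:             ", inverse_box_cox(x, 1e-8))
```

Because the three forms lie on one continuous family indexed by λ, estimating λ from the data amounts to letting the data choose among linear, power and exponential models, which is the point made above.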

In the most recent blog entry in this series, I have introduced the “Ladder of Monotone Convex Functions”, and have demonstrated that IBCT delivers only the first three “steps” of the Ladder. Furthermore, IBCT can be extended so that a single general model can represent all monotone convex functions belonging to the Ladder. This transforms monotone convexity into a continuous spectrum so that the discrete “steps” of the Ladder (the separate models) become mere points on that spectrum.

In this third entry on the subject (and Article #3, linked below), I introduce in a more comprehensive fashion (yet minimally technical) the general model from which all the Ladder functions can be derived as special cases. This model was initially conceived in the last years of the previous century (Shore, 2005, and references therein) and has since been developed into a comprehensive modeling approach, denoted Response Modeling Methodology (RMM). In the affiliated article, an axiomatic derivation of the basic RMM model is outlined, and specific adaptations of RMM to model systematic variation and to model random variation are addressed. Published evidence is reviewed for the capability of RMM to replace current published models, previously derived within various scientific and engineering disciplines as theoretical, empirical or semi-empirical models. Disciplines surveyed include chemical engineering, software quality engineering, process capability analysis, ecology and ultrasound-based fetal-growth modeling (based on cross-sectional data).

This blog entry (with the linked article given below) was originally posted on the site of the American Statistical Association (ASA), where the linked article was visible to members only.

Haim Shore_3_ASA_Jan 2014

Tags Box-Cox transformation, Continuous Monotone Convexity, Haim Shore Blog, normalizing transformation, Professor Haim Shore, Response Modeling Methodology (RMM)

My Research in Statistics

The “Continuous Monotone Convexity (CMC)” Property and its ‎Implications to Statistical Modeling

Post author By Haim Shore
Post date December 26, 2013

In a previous post in this series, I discussed reasons for the effectiveness of the Box-Cox (BC) transformation, particularly when applied to a response variable within linear regression analysis. The final conclusion was that this effectiveness could probably be attributed to the “Continuous Monotone Convexity (CMC)” property, possessed by the inverse BC transformation. It was emphasized that the latter, comprising the three most fundamental monotone convex functions, the “linear-power-exponential” trio, delivers only a partial representation of a whole host of models of monotone convex relationships, which can be arranged in a hierarchy of monotone convexity. This hierarchy was denoted the “Ladder of Monotone Convex Functions.”

In this post (and Article #2, linked below), I address in more detail the nature of the CMC property. I specify models included in the Ladder, and show how one can deliver, via a single model, representation to all models belonging to the Ladder (analogously with the inverse BC transformation, a special case of that model). Furthermore, I point to published evidence demonstrating that models of the Ladder may often substitute, with negligible loss in accuracy, published models of monotone convexity, which had been derived from theoretical discipline-specific considerations.

This blog entry (with the linked article given below) was originally posted on the site of the American Statistical Association (ASA), where the linked article was visible to members only.

Haim Shore_2_ASA_Dec 2013

Tags Box-Cox transformation, Continuous Monotone Convexity, Haim Shore Blog, normalizing transformation, Professor Haim Shore, Response Modeling Methodology (RMM)

My Research in Statistics

Why is Box-Cox transformation so effective?

Post author By Haim Shore
Post date December 2, 2013

Comment: Read my latest peer-reviewed article on the subject (2023): 10.1002/9781118445112.stat08456

The Box-Cox transformation, and why it is so effective, has intrigued me for many years. I have had the opportunity to talk both to Box and to Cox about their transformation (Box and Cox, 1964).

I conversed with the late George Box (who died last March at age 93) when I was a visitor in Madison, Wisconsin, back in 1993-4.

A few years later I talked to David Cox at a conference on reliability in Bordeaux (MMR’2000).

I asked them both the same question, and I received the same response.

The question was: What was the theory that led to the derivation of the Box-Cox transformation?

The answer was: “No theory. This was a purely empirical observation”.

The question therefore remains: Why is the Box-Cox transformation so effective, in particular when applied to a response variable in the framework of linear regression analysis?

In a new article, posted in my personal library at the American Statistical Association (ASA) site, I discuss this issue at some length. The article is now generally available for download here (Article #1 below).

Haim Shore_1_ASA_Nov 2013

Tags Box-Cox transformation, Haim Shore Blog, normalizing transformation, Professor Haim Shore

Introduction

The historical errors embedded in current-day Statistics

Modeling surgery duration — Personal learning experience that resulted in the new “Random identity paradigm”

The “Random identity paradigm” — “Random identity”, “Identity variation”, “identity loss”, “identity-full/identity-less distributions” (based on Shore, 2022)

Conclusions

References

“All science is either physics or stamp collecting”.
