My Research in Statistics

Statistics and “Stamp Collecting”

What is the linkage between the science of statistics and “Stamp Collecting”? More than you might imagine. This blog entry (with the linked article and PowerPoint presentation) was originally posted, for a limited time, on the Community Blog of the American Statistical Association (ASA), where the linked items were visible to members only. The blog entry, with the linked items, is now visible to all.

This is the fourth and last message in this series about the consequences for statistical modeling of the continuous monotone convexity (CMC) property. This message discusses the implications of the CMC property for modeling random variation.

As a departure point for this discussion, some historical perspective on the development of the principle of unification in human perception of nature may be useful.

Our ancestors believed in a multiplicity of gods. All phenomena of nature had their particular gods, and various manifestations of the same phenomenon were simply different displays of the wishes, desires and emotions of the relevant god. Thus, Prometheus was a deity who gave fire to the human race and for that was punished by Zeus, the king of the gods; Poseidon was the god of the seas; and Eros was the god of desire and attraction.

This convenient “explanation” for the diversity of natural phenomena all but disappeared with the advent of monotheism. Under the “umbrella” of a single god, the ancient gods were “deleted”, replaced by a “unified” and “unifying” almighty god, the source of all natural phenomena.

And thus the three major monotheistic religions were born.

The concept of unification, however, did not stop there. It migrated to science, where pioneering giants of modern scientific thinking observed diverse phenomena of nature and attempted to unify them into an all-encompassing, mathematics-based theory from which the separate phenomena could be deduced as special cases. Some of the best-known representatives of this mammoth shift in human thinking, in those early stages of modern science, were Copernicus (1473–1543), Johannes Kepler (1571–1630), Galileo Galilei (1564–1642) and Isaac Newton (1642–1727).

In particular, the science of physics was at the forefront of these early attempts to pursue the basic concept of unity in the realm of science. Ernest Rutherford (1871–1937), known as the father of nuclear physics and the discoverer of the proton (in 1919), made the following observation at the time:

“All science is either physics or stamp collecting”.

The assertion, quoted in Kaku (1994, p. 131), was intended to convey a general sentiment that the drive to unite the four fundamental forces of nature into a unifying theory, nowadays a central theme of modern physics, represented science at its best; indeed, that this is the only correct approach to the scientific investigation of nature. By contrast, at least until recently, most other scientific disciplines have engaged in taxonomy (“bug collecting” or “stamp collecting”). With “stamp collecting”, scientific inquiry is restricted to the discovery and classification of the “objects of enquiry” particular to that science. It never culminates, as in physics, in a unifying theory from which all these objects may be deductively derived as special cases.

Is statistics a science of “stamp collecting”?

Observing the abundance of statistical distributions identified to date, an unavoidable conclusion is that statistics is indeed a science engaged in “stamp collecting”. Moreover, serious attempts at unification (even partial) are rarely reported in the literature.

In a recent article (Shore, 2015), I proposed a new paradigm for modeling random variation. The new paradigm, so I believe, may constitute an initial effort to unite all distributions under a single “umbrella distribution”. In the new paradigm, the Continuous Monotone Convexity (CMC) property plays a central role in deriving a general expression for the normal-based quantile function of a generic random variable (assuming a unimodal, non-mixture distribution). Employing numeric fitting to current distributions, the new model has been shown to deliver accurate representation of scores of differently shaped distributions (including some suggested by anonymous reviewers). Furthermore, the negligible deviations from the fitted general model may be attributed to the natural imperfection of the fitting procedure, or perceived as realizations of random variation around the fitted general model, just as a sample average deviates randomly from the population mean.
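As a minimal illustration of the normal-based quantile idea (my own sketch of a familiar special case, not the general model of Shore, 2015, which is far more elaborate): the quantile function of a lognormal random variable is an exponential, and hence monotone convex, function of the standard normal quantile.

```python
from statistics import NormalDist
import math

# Illustration only: the lognormal quantile function is an
# exponential -- hence monotone convex -- function of the standard
# normal quantile z_p:  Q(p) = exp(mu + sigma * z_p).
mu, sigma = 0.0, 0.5
std_normal = NormalDist()

def lognormal_quantile(p):
    """Quantile of LogNormal(mu, sigma) as a function of the normal quantile."""
    z = std_normal.inv_cdf(p)
    return math.exp(mu + sigma * z)

# The median corresponds to z = 0, so Q(0.5) = exp(mu) = 1 here.
print(lognormal_quantile(0.5))  # 1.0
```

The same pattern — a monotone convex transformation applied to the normal quantile — is what a single “umbrella” expression would need to reproduce for each member distribution.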

In a more recent effort (Shore, 2017), a new paradigm for modeling random variation is introduced and validated via certain predictions about known “statistical facts” (like the Central Limit Theorem), shown to be empirically true, and via distribution fitting, using a five-moment matching procedure, to a sample of known distributions.

These topics and others are addressed extensively in the article cited above. It is my judgment that, at present, the CMC property constitutes the only possible avenue for achieving in statistics (as in most other modern branches of science) unification of the “objects of enquiry”, as these relate to modeling random variation.

In the affiliated Article #4, I introduce in a more comprehensive fashion (yet minimally technical) an outline of the new paradigm and elaborate on how the CMC property is employed to arrive at a “general model of random variation”. A related PowerPoint presentation, delivered last summer at a conference in Michigan, is also displayed.

Haim Shore_4_ASA_Feb 2014

Haim Shore_4_ASA_PP Presentation_Feb 2014


[1] Kaku, M. (1994). Hyperspace: A Scientific Odyssey Through Parallel Universes, Time Warps, and the Tenth Dimension. Oxford University Press, New York.

[2] Shore, H. (2015). A General Model of Random Variation. Communications in Statistics – Theory and Methods, 44(9): 1819–1841.

[3] Shore, H. (2017). The Fundamental Process by which Random Variation is Generated. Under review.

My Research in Statistics

CMC-Based Modeling — the Approach and Its Performance Evaluation

This post explains the central role of Continuous Monotone Convexity (CMC) in Response Modeling Methodology (RMM).

In earlier blog entries, the unique effectiveness of the Box-Cox transformation (BCT) was addressed. I concluded that the BCT’s effectiveness could probably be attributed to the Continuous Monotone Convexity (CMC) property, unique to the inverse BCT (IBCT). Rather than requiring the analyst to specify a model in advance (prior to analysis), the CMC property allows the data, via parameter estimation, to determine the final form of the model (linear, power or exponential). This most likely leads to a better fit of the estimated model, as cumulative reported experience with the implementation of the IBCT (or BCT) clearly attests.
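Since the inverse BCT carries the weight of this argument, a minimal sketch may help. The inverse Box-Cox transformation g(x) = (1 + λx)^(1/λ) yields the linear model at λ = 1, a power model for other non-zero λ, and the exponential model in the limit λ → 0:

```python
import math

def inv_boxcox(x, lam):
    """Inverse Box-Cox transformation: (1 + lam*x)**(1/lam),
    with the limiting case lam -> 0 giving exp(x)."""
    if lam == 0:
        return math.exp(x)
    return (1.0 + lam * x) ** (1.0 / lam)

x = 0.7
print(inv_boxcox(x, 1.0))   # linear: 1 + x = 1.7
print(inv_boxcox(x, 0.0))   # exponential: e**0.7
print(inv_boxcox(x, 1e-8))  # approaches e**0.7 as lam -> 0
print(inv_boxcox(x, 0.5))   # a power function: (1 + 0.35)**2
```

The third line illustrates the “continuity” in CMC: as λ slides toward zero, the power form approaches the exponential form smoothly, so the three models sit on a single continuum indexed by λ.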

In the most recent blog entry in this series, I introduced the “Ladder of Monotone Convex Functions” and demonstrated that the IBCT delivers only the first three “steps” of the Ladder. Furthermore, the IBCT can be extended so that a single general model represents all monotone convex functions belonging to the Ladder. This transforms monotone convexity into a continuous spectrum, so that the discrete “steps” of the Ladder (the separate models) become mere points on that spectrum.

In this third entry on the subject (and Article #3, linked below), I introduce in a more comprehensive fashion (yet minimally technical) the general model from which all the Ladder functions can be derived as special cases. This model was initially conceived in the last years of the previous century (Shore, 2005, and references therein) and has since been developed into a comprehensive modeling approach, denoted Response Modeling Methodology (RMM). In the affiliated article, an axiomatic derivation of the basic RMM model is outlined, and specific adaptations of RMM to modeling systematic variation and to modeling random variation are addressed. Published evidence for the capability of RMM to replace currently published models, previously derived within various scientific and engineering disciplines as theoretical, empirical or semi-empirical models, is reviewed. Disciplines surveyed include chemical engineering, software quality engineering, process capability analysis, ecology and ultrasound-based fetal-growth modeling (based on cross-sectional data).

This blog entry (with the linked article given below) was originally posted on the site of the American Statistical Association (ASA), where the linked article was visible to members only.

Haim Shore_3_ASA_Jan 2014

My Research in Statistics

The “Continuous Monotone Convexity (CMC)” Property and Its Implications for Statistical Modeling

In a previous post in this series, I discussed reasons for the effectiveness of the Box-Cox (BC) transformation, particularly when applied to a response variable within linear regression analysis. The final conclusion was that this effectiveness could probably be attributed to the “Continuous Monotone Convexity (CMC)” property possessed by the inverse BC transformation. It was emphasized that the latter, comprising the three most fundamental monotone convex functions (the “linear-power-exponential” trio), delivers only partial representation of a whole host of models of monotone convex relationships, which can be arranged in a hierarchy of monotone convexity. This hierarchy has been denoted the “Ladder of Monotone Convex Functions.”

In this post (and Article #2, linked below), I address in more detail the nature of the CMC property. I specify the models included in the Ladder and show how a single model can represent all models belonging to the Ladder (analogously to the inverse BC transformation, a special case of that model). Furthermore, I point to published evidence demonstrating that models of the Ladder may often substitute, with negligible loss in accuracy, for published models of monotone convexity that had been derived from theoretical, discipline-specific considerations.
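The claim that the data, through the estimated parameter, select the appropriate member of the trio can be sketched numerically. In this hypothetical example (my own construction), noise-free data generated from y = e^x are fitted by a grid search over λ in the inverse BC transformation; the search settles on λ = 0, the exponential member:

```python
import math

def inv_boxcox(x, lam):
    """Inverse Box-Cox: (1 + lam*x)**(1/lam); the lam -> 0 limit is exp(x)."""
    return math.exp(x) if lam == 0 else (1.0 + lam * x) ** (1.0 / lam)

# Synthetic, noise-free data from an exponential relationship y = e**x.
xs = [0.1 * i for i in range(1, 11)]
ys = [math.exp(x) for x in xs]

def sse(lam):
    """Sum of squared errors of the inverse-BC model at a given lambda."""
    return sum((inv_boxcox(x, lam) - y) ** 2 for x, y in zip(xs, ys))

# Grid search over lambda in [0, 1]: the estimated lambda selects the
# model form (1 -> linear, 0 -> exponential, otherwise a power law).
best = min([l / 100 for l in range(0, 101)], key=sse)
print(best)  # 0.0 -- the data "choose" the exponential member
```

In practice the estimation would be done by nonlinear least squares or maximum likelihood rather than a grid, but the principle is the same: no model form is imposed in advance.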

This blog entry (with the linked article given below) was originally posted on the site of the American Statistical Association (ASA), where the linked article was visible to members only.

Haim Shore_2_ASA_Dec 2013

My Research in Statistics

Why is the Box-Cox transformation so effective?

Why the Box-Cox transformation is so effective has intrigued me for many years. I have had the opportunity to talk to both Box and Cox about their transformation (Box and Cox, 1964).

I conversed with the late George Box (who passed away in March 2013) when I was a visitor in Madison, Wisconsin, back in 1993–94.

A few years later I talked to David Cox at a conference on reliability in Bordeaux (MMR’2000).

I asked them both the same question, and I received the same response.

The question was: What was the theory that led to the derivation of the Box-Cox transformation?

The answer was: “No theory. This was a purely empirical observation”.

The question therefore remains: Why is the Box-Cox transformation so effective, in particular when applied to a response variable in the framework of linear regression analysis?
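For concreteness, a minimal sketch (my own illustration, not drawn from the article) of the transformation’s best-known effect: applying the λ = 0 (logarithmic) case of the forward Box-Cox transformation to a right-skewed response markedly reduces its sample skewness, bringing it closer to the symmetry that linear regression assumptions favor.

```python
import math

def boxcox(y, lam):
    """Forward Box-Cox transformation of a positive response y."""
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam

def skewness(data):
    """Sample skewness (population formula, for illustration)."""
    n = len(data)
    m = sum(data) / n
    s2 = sum((d - m) ** 2 for d in data) / n
    return sum((d - m) ** 3 for d in data) / (n * s2 ** 1.5)

# A right-skewed response; the log case (lambda = 0) pulls in the long tail.
y = [1, 2, 2, 3, 3, 3, 5, 8, 13, 21]
print(skewness(y))                          # strongly positive
print(skewness([boxcox(v, 0) for v in y]))  # much closer to zero
```

This shows what the transformation does, of course, not why a single λ so often suffices across applications — which is the question the article takes up.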

In a new article, posted in my personal library at the American Statistical Association (ASA) site, I discuss this issue at some length. The article is now generally available for download here (Article #1 below).

Haim Shore_1_ASA_Nov 2013