ACM Lecture Notes

Statistical Inference
K.M. Zuev
[arXiv stat.AP 1603.04929]

Abstract: What is Statistics? Opinions vary. In fact, there
is a continuous spectrum of attitudes toward statistics ranging from
pure theoreticians, proving asymptotic efficiency and searching for
most powerful tests, to wild practitioners, blindly reporting p-values
and claiming statistical significance for scientifically insignificant
results. In these notes statistics is viewed as a branch of mathematical
engineering that studies ways of extracting reliable information from
limited data for learning, prediction, and decision making in the presence
of uncertainty. The main goals of these notes are: (1) to provide a logical
introduction to statistical inference, (2) to develop statistical thinking
and an intuitive feel for the subject, and (3) to introduce the most fundamental
ideas, concepts, and methods of statistics, and explain how and why they
work, and when they don’t. These ACM lecture notes are based on
the courses the author taught at the University of Southern California
in 2012 and 2013, and at the California Institute of Technology in 2016.

Refereed Journal Publications

18. Transitional Annealed Adaptive Slice Sampling for Gaussian Process Hyper-Parameter Estimation
A. Garbuno-Iñigo, F.A. DiazDelaO, and K.M. Zuev
Submitted [arXiv stat.CO 1509.00349]

Abstract: Surrogate models have become ubiquitous in science
and engineering for their capability of emulating expensive computer
codes, necessary to model and investigate complex phenomena. Bayesian
emulators based on Gaussian processes adequately quantify the uncertainty
that results from the cost of the original simulator, and thus the inability
to evaluate it on the whole input space. However, it is common in the
literature that only a partial Bayesian analysis is carried out, whereby
the underlying hyper-parameters are estimated via gradient-free optimisation
or genetic algorithms, to name a few methods. On the other hand, maximum
a posteriori (MAP) estimation could discard important regions of the
hyper-parameter space. In this paper, we carry out a more complete Bayesian
inference, through combining Slice Sampling with some recently developed
Sequential Monte Carlo samplers. The resulting algorithm improves the
mixing in the sampling through delayed-rejection, the inclusion of an
annealing scheme akin to Asymptotically Independent Markov Sampling
and parallelisation via Transitional Markov Chain Monte Carlo. Examples
related to the estimation of hyper-parameters, as well as examples applied
in other contexts of Bayesian inference, are presented. For the purpose
of reproducibility, further development, and use in other applications,
the code to generate the examples in this paper is freely available
for download at this
http URL.

17. Rare Event Simulation
J.L. Beck and K.M. Zuev
Accepted for publication in the Springer Handbook on Uncertainty Quantification [arXiv stat.CO 1508.05047]

Abstract: Rare events are events that are expected to occur infrequently, or more technically, those that have low probabilities (say, of order $10^{-3}$ or less) of occurring according to a probability model. In the context of uncertainty quantification, rare events often correspond to failure of systems designed for high reliability, meaning that the system performance fails to meet some design or operation specifications. As reviewed in this section, computation of such rare-event probabilities is challenging. Analytical solutions are usually not available for non-trivial problems, and standard Monte Carlo simulation is computationally inefficient. Therefore, much research effort has focused on developing advanced stochastic simulation methods that are more efficient. In this section, we address the problem of estimating rare-event probabilities by Monte Carlo simulation, Importance Sampling, and Subset Simulation for highly reliable dynamic systems.

16. Gaussian Process Hyper-Parameter Estimation using Parallel Asymptotically Independent Markov Sampling
A. Garbuno-Iñigo, F.A. DiazDelaO, and K.M. Zuev
Computational Statistics & Data Analysis, vol. 103, pp. 367-383, Jun. 2016.
[Web DOI | Paper pdf | arXiv stat.CO 1506.08010]

Abstract: Gaussian process surrogates of computationally expensive
computer codes provide fast statistical approximations to model physical
processes. The training of these surrogates depends on the set of design
points chosen to run the expensive simulator. However, such a training
set is bound to be limited, and quantifying the resulting uncertainty
in the hyper-parameters of the emulator by uni-modal distributions is
likely to induce bias. This paper proposes a computationally efficient
sampler based on an extension of the Asymptotically Independent Markov
Sampling, a recently developed algorithm for Bayesian inference and
extended to optimisation problems. Structural uncertainty of the emulator
is obtained as a by-product of the Bayesian treatment of the hyper-parameters.
Model uncertainty is also acknowledged through numerical stabilisation
measures by including a nugget term in the formulation of the probability
model. The efficiency of the proposed sampler is illustrated in examples
where multi-modal distributions are encountered. For the purposes of
reproducibility, further development, and use in other applications,
we made the code used to generate the examples in this paper freely
available for download at this
https URL.

15. Hamiltonian Dynamics of Preferential Attachment
K.M. Zuev, F. Papadopoulos, and D. Krioukov
Journal of Physics A: Mathematical and Theoretical, vol. 49, art. num. 105001, Jan. 2016.
[Web DOI | Paper pdf | arXiv physics.soc-ph 1504.07981]

Abstract: Prediction and control of network dynamics are grand-challenge
problems in network science. The lack of understanding of fundamental
laws driving the dynamics of networks is among the reasons why many
practical problems of great significance remain unsolved for decades.
Here we study the dynamics of networks evolving according to preferential
attachment, known to approximate well the large-scale growth dynamics
of a variety of real networks. We show that this dynamics is Hamiltonian,
thus casting the study of complex network dynamics into the powerful
canonical formalism, in which the time evolution of a dynamical system
is described by Hamilton's equations. We derive the explicit form of
the Hamiltonian that governs network growth in preferential attachment.
This Hamiltonian turns out to be nearly identical to graph energy in
the configuration model, which shows that the ensemble of random graphs
generated by preferential attachment is nearly identical to the ensemble
of random graphs with scale-free degree distributions. In other words,
preferential attachment generates nothing but random graphs with power-law
degree distribution. The extension of the developed canonical formalism
for network analysis to richer geometric network models with non-degenerate
groups of symmetries may eventually lead to a system of equations describing
network dynamics at small scales.

14. Subset Simulation Method for Rare Event Estimation: An Introduction
K.M. Zuev
In: Beer M. et al. (Eds.) Encyclopedia of Earthquake Engineering, Springer Berlin Heidelberg, pp. 3671-3691, Oct. 2015.
[Web DOI | Paper pdf | arXiv stat.CO 1505.03506]

Abstract: Subset Simulation is an efficient and elegant
method for simulating rare events and estimating the corresponding small
tail probabilities. The method was originally developed by Siu-Kui Au
and James Beck for estimation of structural reliability of complex civil
engineering systems such as tall buildings and bridges at risk from
earthquakes. The method turned out to be so powerful and general that
over the last decade, Subset Simulation has been successfully applied
to reliability problems in geotechnical, aerospace, fire, and nuclear
engineering. This paper provides a detailed introductory description
of Subset Simulation. A simple and intuitive derivation of the method
is given, along with a discussion of its implementation. The method
is illustrated with several easy-to-understand examples. The reader
is assumed to be familiar only with elementary probability theory and
statistics.
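The core idea of Subset Simulation — writing a small probability as a product of larger conditional probabilities, each estimated with MCMC — can be sketched compactly. The standard normal input space, the uniform random-walk proposal, the chain lengths, and the toy limit-state function below are common illustrative choices, not specifics taken from this paper:

```python
import math
import random

def mm_step(x, g, b):
    """One component-wise (Modified) Metropolis step targeting the standard
    normal distribution conditioned on the event {g(x) >= b}."""
    cand = []
    for xi in x:
        yi = xi + random.uniform(-1.0, 1.0)          # symmetric proposal
        # accept each component with the ratio of standard normal densities
        if random.random() < math.exp(-(yi * yi - xi * xi) / 2.0):
            cand.append(yi)
        else:
            cand.append(xi)
    return cand if g(cand) >= b else x               # stay inside the event

def subset_simulation(g, dim, n=1000, p0=0.1, b_target=3.0, max_levels=12, seed=1):
    """Estimate P(g(X) >= b_target), X standard normal in R^dim, as a product
    of conditional probabilities, each kept close to p0 by adaptive levels."""
    random.seed(seed)
    samples = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
    p_f = 1.0
    for _ in range(max_levels):
        samples.sort(key=g, reverse=True)
        n_seed = int(p0 * n)
        b = g(samples[n_seed - 1])                   # adaptive intermediate level
        if b >= b_target:                            # final level reached
            frac = sum(g(x) >= b_target for x in samples) / n
            return p_f * frac
        p_f *= p0
        # grow each of the n_seed seeds into a chain conditioned on {g >= b}
        seeds, samples = samples[:n_seed], []
        for s in seeds:
            x = s
            for _ in range(int(1.0 / p0)):
                x = mm_step(x, g, b)
                samples.append(list(x))
    return p_f

# Toy problem: g(x) = x_0, so the exact answer is 1 - Phi(3) ~ 1.3e-3.
p_hat = subset_simulation(lambda x: x[0], dim=2)
print(p_hat)
```

Only a few thousand samples are needed here, whereas direct Monte Carlo would need on the order of a million to resolve a probability of this size.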

13. Exponential Random Simplicial Complexes
K.M. Zuev, O. Eisenberg, and D. Krioukov
Journal of Physics A: Mathematical and Theoretical, vol. 48, art. num. 465002, Oct. 2015.
[Web DOI | Paper pdf | arXiv math.ST 1502.05032]

Abstract: Exponential random graph models have attracted significant
research attention over the past decades. These models are maximum-entropy
ensembles subject to the constraints that the expected values of a set
of graph observables are equal to given values. Here we extend these
maximum-entropy ensembles to random simplicial complexes, which are
more adequate and versatile constructions to model complex systems in
many applications. We show that many random simplicial complex models
considered in the literature can be cast as maximum-entropy ensembles
under certain constraints. We introduce and analyze the most general
random simplicial complex ensemble $\mathbf{\Delta}$ with statistically
independent simplices. Our analysis is simplified by the observation
that any distribution $\mathbb{P}(O)$ on any collection of objects $\mathcal{O}=\{O\}$,
including graphs and simplicial complexes, is maximum-entropy subject
to the constraint that the expected value of $-\ln \mathbb{P}(O)$ is
equal to the entropy of the distribution. With the help of this observation,
we prove that ensemble $\mathbf{\Delta}$ is maximum-entropy subject
to the two types of constraints which fix the expected numbers of simplices
and their boundaries.
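The simplifying observation quoted in the abstract can be checked with a short Lagrange-multiplier computation (a standard argument, sketched here rather than reproduced from the paper): maximize the entropy $S[Q] = -\sum_O Q(O)\ln Q(O)$ over distributions $Q$ on $\mathcal{O}$ subject to normalization and $\sum_O Q(O)\,(-\ln \mathbb{P}(O)) = S[\mathbb{P}]$.

```latex
\mathcal{L}[Q] = -\sum_{O} Q(O)\ln Q(O)
  - \lambda\Bigl(\sum_{O} Q(O)\bigl(-\ln \mathbb{P}(O)\bigr) - S[\mathbb{P}]\Bigr)
  - \nu\Bigl(\sum_{O} Q(O) - 1\Bigr),
\qquad
\frac{\partial \mathcal{L}}{\partial Q(O)}
  = -\ln Q(O) - 1 + \lambda \ln \mathbb{P}(O) - \nu = 0
\;\Longrightarrow\;
Q(O) = \mathbb{P}(O)^{\lambda}\, e^{-1-\nu}.
```

Taking $\lambda = 1$ and $e^{-1-\nu} = 1$ gives $Q = \mathbb{P}$, which satisfies both constraints, so $\mathbb{P}$ itself is the maximum-entropy distribution under its own constraint.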

12. Emergence of Soft Communities from Geometric Preferential Attachment
K.M. Zuev, D. Krioukov, M. Boguñá, and G. Bianconi
Scientific Reports, vol. 5, art. num. 9421, Apr. 2015.
[Web DOI | Paper pdf | arXiv physics.soc-ph 1501.06835]

Abstract: All real networks are different, but many have some
structural properties in common. There seems to be no consensus on what
the most common properties are, but scale-free degree distributions,
strong clustering, and community structure are frequently mentioned
without question. Surprisingly, there exists no simple generative mechanism
explaining all three properties at once in growing networks. Here
we show how latent network geometry coupled with preferential attachment
of nodes to this geometry fills this gap. We call this mechanism geometric
preferential attachment (GPA), and validate it against the Internet.
GPA gives rise to soft communities that provide a different perspective
on the community structure in networks. The connections between GPA
and cosmological models, including inflation, are also discussed.

11. General Network Reliability Problem and its Efficient Solution by Subset Simulation
K.M. Zuev, S. Wu, and J.L. Beck
Probabilistic Engineering Mechanics, vol. 40, pp. 25-35, Apr. 2015.
[Web DOI | Paper pdf]

Abstract: Complex technological networks designed for
distribution of some resource or commodity are a pervasive feature of
modern society. Moreover, the dependence of our society on modern technological
networks constantly grows. As a result, there is an increasing demand
for these networks to be highly reliable in delivering their service.
Consequently, there is a pressing need for efficient computational
methods that can quantitatively assess the reliability of technological
networks to enhance their design and operation in the presence of uncertainty
in their future demand, supply and capacity. In this paper, we propose
a stochastic framework for quantitative assessment of the reliability
of network service, formulate a general network reliability problem
within this framework, and then show how to calculate the service reliability
using Subset Simulation, an efficient Markov chain Monte Carlo method
that was originally developed for estimating small failure probabilities
of complex dynamic systems. The efficiency of the method is demonstrated
with an illustrative example where two small-world network generation
models are compared in terms of the reliability of the networks that
they produce.

10. The Maximum Number of 3- and 4-Cliques within a Planar Maximally Filtered Graph
J. Birch, A.A. Pantelous, and K.M. Zuev
Physica A, vol. 417, pp. 221-229, Jan. 2015.
[Web DOI | Paper pdf | arXiv math-ph 1507.02929]

Abstract: Planar Maximally Filtered Graphs (PMFG) are
an important tool for filtering the most relevant information from correlation
based networks such as stock market networks. One of the main characteristics
of a PMFG is the number of its 3- and 4-cliques. Recently, in a few high-impact
papers, it was stated that, based on heuristic evidence, the maximum
numbers of 3- and 4-cliques that can exist in a PMFG with $n$ vertices
are $3n-8$ and $n-4$, respectively. In this paper, we prove that this
is indeed the case.

9. Global Optimization using the Asymptotically Independent Markov Sampling Method
K.M. Zuev and J.L. Beck
Computers & Structures, vol. 126, pp. 107-119, Sep. 2013.
[Web DOI | Paper pdf]

Abstract: In this paper, we introduce a new efficient stochastic
simulation method, AIMS-OPT, for approximating the set of globally optimal
solutions when solving optimization problems such as optimal performance-based
design problems. This method is based on Asymptotically Independent Markov
Sampling (AIMS), a recently developed advanced simulation scheme originally
proposed for Bayesian inference. This scheme combines importance sampling,
Markov chain Monte Carlo simulation and annealing for efficient sampling
from an arbitrary target distribution over a multi-dimensional space.
Instead of a single approximation of the optimal solution, AIMS-OPT produces
a set of nearly optimal solutions where the accuracy of the near-optimality
is controlled by the user. Having a set of nearly optimal system designs,
for example, can be advantageous in many practical cases such as when
there exists a whole set of optimal designs or in multi-objective optimization
where there is a Pareto optimal set. AIMS-OPT is also useful for efficient
exploration of the global sensitivity of the objective function to the
design parameters. The efficiency of AIMS-OPT is demonstrated with several
examples which have different topologies of the optimal solution sets.
Comparison is made with the results of applying Simulated Annealing, a
well-known stochastic optimization algorithm, to the three two-dimensional
problems.

8. Asymptotically Independent Markov Sampling: a new MCMC Scheme for Bayesian Inference
J.L. Beck and K.M. Zuev
International Journal for Uncertainty Quantification, vol. 3, num. 5, pp. 445-474, 2013.
[Web DOI | Paper pdf | arXiv stat.CO 1110.1880]

Abstract: In Bayesian inference, many problems can be expressed
as the evaluation of the expectation of an uncertain quantity of interest
with respect to the posterior distribution based on relevant data. The standard
Monte Carlo method is often not applicable because the encountered posterior
distributions cannot be sampled directly. In this case, the most popular
strategies are the importance sampling method, Markov chain Monte Carlo,
and annealing. In this paper, we introduce a new scheme for Bayesian inference,
called asymptotically independent Markov sampling (AIMS), which is based
on the above methods. We derive important ergodic properties of AIMS.
In particular, it is shown that, under certain conditions, the AIMS algorithm
produces a uniformly ergodic Markov chain. The choice of the free parameters
of the algorithm is discussed and recommendations are provided for this
choice, both theoretically and heuristically based. The efficiency of
AIMS is demonstrated with three numerical examples, which include both
multimodal and higher-dimensional target posterior distributions.

7. Bayesian Post-Processor and other Enhancements of Subset Simulation for Estimating Failure Probabilities
K.M. Zuev, J.L. Beck, S.K. Au, and L.S. Katafygiotis
Computers & Structures, vol. 92-93, pp. 283-296, Feb. 2012.
[Web DOI | Paper pdf | arXiv stat.CO 1110.3390]

Abstract: Estimation of small failure probabilities is one of
the most important and challenging computational problems in reliability
engineering. The failure probability is usually given by an integral over
a high-dimensional uncertain parameter space that is difficult to evaluate
numerically. This paper focuses on enhancements to Subset Simulation (SS),
proposed by Au and Beck, which provides an efficient algorithm based on
MCMC (Markov chain Monte Carlo) simulation for computing small failure
probabilities for general high-dimensional reliability problems. First,
we analyze the Modified Metropolis algorithm (MMA), an MCMC technique,
which is used in SS for sampling from high-dimensional conditional distributions.
We present some observations on the optimal scaling of MMA, and develop
an optimal scaling strategy for this algorithm when it is employed within
SS. Next, we provide a theoretical basis for the optimal value of the
conditional failure probability $p_0$, an important parameter one has
to choose when using SS. Finally, a Bayesian post-processor SS+ for the
original SS method is developed where the uncertain failure probability
that one is estimating is modeled as a stochastic variable whose possible
values belong to the unit interval. Simulated samples from SS are viewed
as informative data relevant to the system's reliability. Instead of a
single real number as an estimate, SS+ produces the posterior PDF of the
failure probability, which takes into account both prior information and
the information in the sampled data. This PDF quantifies the uncertainty
in the value of the failure probability and it may be further used in
risk analyses to incorporate this uncertainty. The relationship between
the original SS and SS+ is also discussed.

6. Modified Metropolis-Hastings Algorithm with Delayed Rejection
K.M. Zuev and L.S. Katafygiotis
Probabilistic Engineering Mechanics, vol. 26, num. 3, pp. 405-412, Jul. 2011.
[Web DOI | Paper pdf]

Abstract: The development of an efficient MCMC strategy for sampling
from complex distributions is a difficult task that needs to be solved
for calculating the small failure probabilities encountered in the high-dimensional
reliability analysis of engineering systems. Usually different variations
of the Metropolis–Hastings algorithm (MH) are used. However, the
standard MH algorithm does not generally work in high dimensions, since
it leads to very frequent repeated samples. In order to overcome this
deficiency one can use the Modified Metropolis–Hastings algorithm
(MMH) proposed by Au and Beck in 2001. Another variation of the MH algorithm,
called the Metropolis–Hastings algorithm with delayed rejection
(MHDR) has been proposed by Tierney and Mira in 1999. The key idea behind
the MHDR algorithm is to reduce the correlation between states of the
Markov chain. In this paper we combine the ideas of MMH and MHDR and propose
a novel modification of the MH algorithm, called the Modified Metropolis–Hastings
algorithm with delayed rejection (MMHDR). The efficiency of the new algorithm
is demonstrated with a numerical example where MMHDR is used together
with Subset simulation for computing small failure probabilities in high
dimensions.
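A minimal sketch of the delayed-rejection idea of Tierney and Mira that this abstract builds on, shown here for a one-dimensional target with symmetric Gaussian proposals; the target, proposal scales, and names are illustrative assumptions only — the paper's MMHDR algorithm combines this with the component-wise MMH updates used in high dimensions:

```python
import math
import random

def normpdf(x, mu, s):
    """Density of N(mu, s^2) at x."""
    return math.exp(-((x - mu) ** 2) / (2.0 * s * s)) / (s * math.sqrt(2.0 * math.pi))

def dr_step(x, target, s1=2.0, s2=0.5):
    """One Metropolis step with delayed rejection: if a bold first-stage
    candidate is rejected, try a more cautious second-stage candidate
    before repeating the current state (fewer repeated samples)."""
    y = random.gauss(x, s1)                              # stage-1 proposal
    a1_xy = min(1.0, target(y) / target(x))
    if random.random() < a1_xy:
        return y
    z = random.gauss(x, s2)                              # stage-2 proposal
    a1_zy = min(1.0, target(y) / target(z))
    # Tierney-Mira second-stage acceptance probability (symmetric proposals)
    num = target(z) * normpdf(y, z, s1) * (1.0 - a1_zy)
    den = target(x) * normpdf(y, x, s1) * (1.0 - a1_xy)
    a2 = min(1.0, num / den) if den > 0.0 else 0.0
    return z if random.random() < a2 else x

# Sampling a standard normal target.
random.seed(2)
x, xs = 0.0, []
for _ in range(20000):
    x = dr_step(x, lambda u: math.exp(-u * u / 2.0))
    xs.append(x)
print(sum(xs) / len(xs))
```

The second-stage acceptance ratio is constructed so that the two-stage move still satisfies detailed balance, which is why the chain keeps the correct stationary distribution while repeating states less often.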

5. The Horseracing Simulation Algorithm for Evaluation of Small Failure Probabilities
K.M. Zuev and L.S. Katafygiotis
Probabilistic Engineering Mechanics, vol. 26, num. 2, pp. 157-164, Apr. 2011.
[Web DOI | Paper pdf]

Abstract: Over the past decade, the civil engineering community
has increasingly realized the importance and promise of reliability-based
design optimization (RBDO). Since then, several advanced stochastic simulation
algorithms for computing small failure probabilities encountered in reliability
analysis of engineering systems have been developed. In this paper we
propose a novel advanced stochastic simulation algorithm for solving high-dimensional
reliability problems, called Horseracing Simulation (HRS). The key idea
behind HRS is as follows. Although the reliability problem itself is high-dimensional,
the limit-state function maps this high-dimensional parameter space onto
the one-dimensional real line. This mapping transforms a high-dimensional
random parameter vector, which may represent the stochastic input load
as well as any uncertain structural parameters, into a random variable
with unknown distribution, which represents the uncertain structural response.
It turns out that the corresponding cumulative distribution function (CDF)
of this random variable of interest can be accurately approximated by
empirical CDFs constructed from specially designed samples. The generation
of samples is governed by a process of “racing”
towards the failure domain, hence the name of the algorithm. The accuracy
and efficiency of the new method are demonstrated with a real-life wind
engineering example.

4. A Formal Frobenius Theorem and Argument Shift
A.V. Bolsinov and K.M. Zuev
Mathematical Notes, vol. 86, num. 1-2, pp. 10-18, Aug. 2009. [Web DOI | Paper pdf]
Matematicheskie Zametki, vol. 86, num. 1, pp. 3-13, 2009. [Web DOI | Paper pdf]

Abstract: A formal Frobenius theorem, which is an analog of
the classical integrability theorem for smooth distributions, is proved
and applied to generalize the argument shift method of A. S. Mishchenko
and A. T. Fomenko to finite-dimensional Lie algebras over any field
of characteristic zero. A completeness criterion for a commutative set
of polynomials constructed by the formal argument shift method is obtained.

3. Geometric Insight into the Challenges of Solving High-Dimensional Reliability Problems
L.S. Katafygiotis and K.M. Zuev
Probabilistic Engineering Mechanics, vol. 23, num. 2-3, pp. 208-218, Apr.-Jul. 2008.
[Web DOI | Paper pdf]

Abstract: In this paper we adopt a geometric perspective to highlight
the challenges associated with solving high-dimensional reliability problems.
Adopting a geometric point of view we highlight and explain a range of
results concerning the performance of several well-known reliability methods.
We start by investigating geometric properties of the $N$-dimensional
Gaussian space and the distribution of samples in such a space or in a
subspace corresponding to a failure domain. Next, we discuss Importance
Sampling (IS) in high dimensions. We provide a geometric understanding
as to why IS generally does not work in high dimensions. We furthermore
challenge the significance of “design point” when dealing
with strongly nonlinear problems. We conclude by showing that for the
general high-dimensional nonlinear reliability problems the selection
of an appropriate fixed IS density is practically impossible. Next, we
discuss the simulation of samples using Markov Chain Monte Carlo (MCMC)
methods. Firstly, we provide a geometric explanation as to why the standard
Metropolis–Hastings (MH) algorithm does “not work” in
high dimensions. We then explain why the modified Metropolis–Hastings
(MMH) algorithm overcomes this problem. A study of the correlation of
samples obtained using MMH as a function of different parameters follows.
Such a study leads to recommendations for fine-tuning the MMH algorithm.
Finally, the MMH algorithm is compared with the MCMC algorithm proposed
by Katafygiotis and Cheung in 2006 in terms of the correlation of samples
they generate.

2. Spectrum of the Laplace-Beltrami Operator on Suspensions of Toric Automorphisms
K.M. Zuev
Sbornik: Mathematics, vol. 197, num. 9, pp. 1297-1308, 2006. [Web DOI | Paper pdf]
Matematicheskii Sbornik, vol. 197, num. 9, pp. 43-54, 2006. [Web DOI | Paper pdf]

Abstract: The spectrum and the eigenbasis of the Laplace–Beltrami
operator on the suspensions of toric automorphisms are investigated.
A description in terms of solutions of one-dimensional Schrödinger’s
equation is presented.

1. On the Case of Integrability of a Geodesic Flow on a Homogeneous Manifold
K.M. Zuev
Moscow University Mathematics Bulletin, num. 2, pp. 13-16, 2006. [Paper pdf]

Refereed Conference Publications

12. Asymptotically Independent Markov Sampling: a new MCMC Scheme for Bayesian Inference
K.M. Zuev and J.L. Beck
2nd International Conference on Vulnerability and Risk Analysis and Management & 6th International Symposium on Uncertainty Modelling and Analysis (ASCE-ICVRAM-ISUMA-2014), Liverpool, UK, 2014. [Paper pdf]

11. Rare Events in Complex Networks and their Efficient Estimation
K.M. Zuev, J.L. Beck, and S. Wu
SIAM Workshop on Network Science NS14, Chicago, IL, USA, 2014. [Paper pdf]

10. Efficient Estimation of Complex Network Reliability
K.M. Zuev, S. Wu, and J.L. Beck
MIPT 56th Annual Scientific Conference, Dolgoprudny, Russia, 2013. [Paper pdf]

9.

8.

7.

6.

5. Computational Methods in Structural Dynamics and Earthquake Engineering (COMPDYN-2009), Rhodes, Greece, 2009. [Paper pdf]

4.

3.

2.

1.

Miscellaneous

1. S.K. Au, J.L. Beck, K.M. Zuev, and L.S. Katafygiotis
Structural Safety, vol. 34, no. 1, pp. 379-380, Jan. 2012.
[Web DOI | Paper pdf]