Essential Probability

Part III - Various Discrete Distributions

Bernoulli Random Variables and Distributions

Binomial Distributions

Binomial Distributions and Sampling with Replacement

Hypergeometric Distributions

Poisson Distributions

Poisson Approximation to the Binomial

The Geometric Distributions

A Table of Discrete Distributions

Bernoulli Random Variables and Distributions

A Bernoulli random variable is one that has only two values, 0 and 1. Often these values are used to encode the outcomes of an experiment which results in one of two mutually exclusive outcomes, for example, the toss of a coin. As another example, a randomly selected student in a course either passes it ("success") or does not pass it ("failure"). If we let X = 1 if the student passes and X = 0 if the student does not, then X is a Bernoulli random variable. In this sense, X counts the number of successes in the single trial of the experiment. Bernoulli random variables are also called Bernoulli trials.

If X is a Bernoulli random variable, let θ = P[X = 1]. Then 1 - θ = P[X = 0]. The parameter θ is often called the "success probability". The probability mass function for X can be written in compact form as

$$p(x) = P[X = x] = \theta^{x}(1-\theta)^{1-x}, \qquad x = 0, 1.$$

The mean of a Bernoulli random variable X is E[X] = θ and its variance is var(X) = θ(1-θ).
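The mean and variance can be checked directly from the two-point distribution. The following is a minimal sketch, assuming the compact form p(x) = θ^x (1-θ)^(1-x) for the probability mass function; the function names and the value θ = 0.3 are illustrative choices, not part of the original notes.

```python
def bernoulli_pmf(x: int, theta: float) -> float:
    """P[X = x] for a Bernoulli variable, using theta^x * (1 - theta)^(1 - x)."""
    if x not in (0, 1):
        return 0.0
    return theta**x * (1 - theta) ** (1 - x)


def bernoulli_moments(theta: float) -> tuple[float, float]:
    """Mean and variance computed by summing over the two possible values."""
    mean = sum(x * bernoulli_pmf(x, theta) for x in (0, 1))
    var = sum((x - mean) ** 2 * bernoulli_pmf(x, theta) for x in (0, 1))
    return mean, var


theta = 0.3  # illustrative success probability (an assumption)
mean, var = bernoulli_moments(theta)
print(mean, var)  # should agree with theta and theta * (1 - theta)
```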

 

Binomial Distributions

Let X1, X2, ..., Xn be independent Bernoulli random variables or trials, all with the same success probability θ. For example, these could be the outcomes of n tosses of a coin, with 1 indicating a head and 0 indicating a tail. The random variable

$$Y = X_1 + X_2 + \cdots + X_n$$

is the total number of successes in the n trials. Its distribution is called the binomial distribution with n trials and success probability θ. Its probability mass function is given by

$$P[Y = y] = \binom{n}{y}\,\theta^{y}(1-\theta)^{n-y}$$

for y = 0, 1, ..., n, where

$$\binom{n}{y} = \frac{n!}{y!\,(n-y)!}$$

is the binomial coefficient, which also appears in the formula for the binomial expansion. The mean of the random variable Y is

$$E[Y] = n\theta$$

and its variance is

$$\mathrm{var}(Y) = n\theta(1-\theta).$$
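As a numerical check of the binomial formulas, the sketch below evaluates the probability mass function C(n, y) θ^y (1-θ)^(n-y) with Python's `math.comb` and computes the mean and variance by direct summation. The values n = 10 and θ = 0.25 are illustrative assumptions.

```python
from math import comb


def binomial_pmf(y: int, n: int, theta: float) -> float:
    """P[Y = y] for a binomial variable with n trials and success probability theta."""
    if y < 0 or y > n:
        return 0.0
    return comb(n, y) * theta**y * (1 - theta) ** (n - y)


n, theta = 10, 0.25  # illustrative values (assumptions)
pmf = [binomial_pmf(y, n, theta) for y in range(n + 1)]

total = sum(pmf)                                   # should be 1
mean = sum(y * p for y, p in enumerate(pmf))       # compare with n * theta
var = sum((y - mean) ** 2 * p for y, p in enumerate(pmf))  # compare with n * theta * (1 - theta)
print(total, mean, var)
```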

 

Binomial Distributions and Sampling with Replacement

Suppose that a fraction θ of the members of some population P belong to some subpopulation of interest.  If a random sample of size n is taken from the population P with replacement and the number of members of the subpopulation in the sample is denoted by Y, then Y has the binomial distribution with n trials and success probability θ.   If the sample is taken without replacement, the distribution of Y is approximately the binomial distribution, provided that the sample size n is small relative to the size of the population.  For example, if Y is the number of registered voters in a random sample of 100 residents of a large city, the distribution of Y can be considered to be binomial regardless of whether the sample is taken with or without replacement.

 

Hypergeometric Distributions

If a random sample without replacement is taken from a population which is not very large compared to the sample size, the binomial distribution may not be a good approximation. The hypergeometric distribution describes the exact distribution of the number of sample items belonging to the subpopulation of interest. If N is the size of the larger population, M is the size of the subpopulation of interest, n is the size of a sample taken without replacement, and the random variable Y is the number of items in the sample that belong to the subpopulation, then the probability mass function of Y is given by

$$P[Y = y] = \frac{\binom{M}{y}\binom{N-M}{n-y}}{\binom{N}{n}}$$

where y is an integer such that 0 ≤ y ≤ M and 0 ≤ n-y ≤ N-M. If we let θ = M/N be the proportion of population members belonging to the subpopulation, then the mean and variance of the hypergeometric random variable are given by

$$E[Y] = n\theta, \qquad \mathrm{var}(Y) = n\theta(1-\theta)\,\frac{N-n}{N-1}.$$

These expressions should be compared to the corresponding expressions for the binomial distribution: the mean is the same, and the variance differs only by the factor (N-n)/(N-1), which is close to 1 when the sample size n is small relative to the population size N.
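The comparison with the binomial distribution can be made concrete numerically. The sketch below assumes the standard hypergeometric probability mass function C(M, y) C(N-M, n-y) / C(N, n) and the variance factor (N-n)/(N-1); the population values N = 50, M = 20, n = 10 are chosen only for illustration.

```python
from math import comb


def hypergeom_pmf(y: int, N: int, M: int, n: int) -> float:
    """P[Y = y] when n items are drawn without replacement from N, of which M are 'successes'."""
    if y < 0 or y > M or n - y < 0 or n - y > N - M:
        return 0.0
    return comb(M, y) * comb(N - M, n - y) / comb(N, n)


N, M, n = 50, 20, 10  # illustrative population, subpopulation, and sample sizes
theta = M / N
pmf = [hypergeom_pmf(y, N, M, n) for y in range(n + 1)]

total = sum(pmf)
mean = sum(y * p for y, p in enumerate(pmf))
var = sum((y - mean) ** 2 * p for y, p in enumerate(pmf))

binomial_var = n * theta * (1 - theta)   # variance if sampling were with replacement
correction = (N - n) / (N - 1)           # finite-population factor
print(mean, var, binomial_var * correction)
```

With these numbers the sample is a fifth of the population, so the hypergeometric variance is noticeably smaller than the binomial variance.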

 

Poisson Distributions

The number of unpredictable natural events of certain types occurring in a time interval of given length is a random variable that is often described by a Poisson distribution. This is especially true of events at an atomic or subatomic level. For example, the number of recorded arrivals at a cosmic ray detector in a one-minute time interval is well modeled as a Poisson random variable. Poisson distributions are also important because they are closely related to binomial distributions in certain circumstances.

A Poisson random variable X can have any nonnegative integer value. Its probability mass function is

$$P[X = x] = \frac{e^{-m}\,m^{x}}{x!}, \qquad x = 0, 1, 2, \ldots$$

The parameter m is a positive number and is both the mean and the variance of the distribution.
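The claim that m is both the mean and the variance can be checked numerically. The following sketch assumes the probability mass function e^(-m) m^x / x! and truncates the infinite sum at x = 59, which is an approximation that is negligible for the illustrative value m = 3.5.

```python
from math import exp, factorial


def poisson_pmf(x: int, m: float) -> float:
    """P[X = x] for a Poisson variable with parameter m."""
    if x < 0:
        return 0.0
    return exp(-m) * m**x / factorial(x)


m = 3.5          # illustrative parameter (an assumption)
xs = range(60)   # truncation point; the tail beyond it is negligible for this m

total = sum(poisson_pmf(x, m) for x in xs)
mean = sum(x * poisson_pmf(x, m) for x in xs)
var = sum((x - mean) ** 2 * poisson_pmf(x, m) for x in xs)
print(total, mean, var)  # all three sums are truncated approximations
```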

 

Poisson Approximation to the Binomial

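Let Y be a random variable with a binomial distribution based on n trials and with success probability θ. If nθ² is small, the probability mass function of Y can be approximated by the Poisson probability mass function with mean m = nθ. That is to say,

$$P[Y = y] \approx \frac{e^{-n\theta}\,(n\theta)^{y}}{y!}, \qquad y = 0, 1, \ldots, n.$$

The error of this approximation is no greater than nθ². If instead n(1-θ)² is small, the same approximation applies to the number of failures n - Y, using the Poisson distribution with mean n(1-θ); the error is then no greater than n(1-θ)².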

 

The Geometric Distributions

Let X1, X2, ... be a sequence of independent Bernoulli random variables, all with the same success probability θ. Let N denote the number of the trial on which the first success occurs. N is a random variable whose value could be any positive integer. For example, an experiment might consist of rolling a standard 6-sided die until the first occurrence of a 2. N would then be the number of rolls required to get a 2, and θ would be 1/6. The distribution of such a random variable is called a geometric distribution with success probability θ. The probability mass function of N is

$$P[N = k] = \theta(1-\theta)^{k-1}, \qquad k = 1, 2, 3, \ldots$$

The mean and variance of the geometric distribution are given by

$$E[N] = \frac{1}{\theta}, \qquad \mathrm{var}(N) = \frac{1-\theta}{\theta^{2}}.$$
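The die-rolling experiment can be simulated to compare sample statistics with the standard geometric values, taken here to be mean 1/θ = 6 and variance (1-θ)/θ² = 30 for θ = 1/6. This is a minimal sketch; the seed, the number of repetitions, and the helper name `rolls_until` are arbitrary illustrative choices, and the simulated values only approximate the theoretical ones.

```python
import random
from statistics import mean, pvariance


def rolls_until(target: int, rng: random.Random) -> int:
    """Roll a fair 6-sided die until `target` appears; return the number of rolls used."""
    count = 1
    while rng.randint(1, 6) != target:
        count += 1
    return count


rng = random.Random(0)   # fixed seed so the run is reproducible
theta = 1 / 6            # probability of rolling a 2 on a single roll
samples = [rolls_until(2, rng) for _ in range(20000)]

sample_mean = mean(samples)
sample_var = pvariance(samples)
print(sample_mean, 1 / theta)                 # sample mean vs. 1/theta
print(sample_var, (1 - theta) / theta**2)     # sample variance vs. (1-theta)/theta^2
```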

 

A Table of Discrete Distributions

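| Name | PMF | Mean | Variance | Relationships |
| --- | --- | --- | --- | --- |
| Bernoulli | $\theta^{x}(1-\theta)^{1-x}$, x = 0, 1 | θ | θ(1-θ) | Binomial, Geometric |
| Binomial | $\binom{n}{y}\theta^{y}(1-\theta)^{n-y}$, y = 0, 1, ..., n | nθ | nθ(1-θ) | Hypergeometric, Poisson |
| Hypergeometric | $\binom{M}{y}\binom{N-M}{n-y} / \binom{N}{n}$ | nθ, with θ = M/N | nθ(1-θ)(N-n)/(N-1) | Binomial |
| Poisson | $e^{-m} m^{x} / x!$, x = 0, 1, 2, ... | m | m | Binomial |
| Geometric | $\theta(1-\theta)^{k-1}$, k = 1, 2, ... | 1/θ | (1-θ)/θ² | Bernoulli |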