Essential Probability

Part II - Random Variables and Distributions

Random Variables

Nominal and Numeric Variables

Events and Random Variables

Jointly Distributed Variables

Cumulative Distribution Functions

Independent Random Variables

The Median and Other Quantiles

Discrete Random Variables

Probability Mass Functions

Expected Values of Discrete Variables

The Variance and Standard Deviation

Random Variables

A random variable is a function whose domain is the sample space Ω of a random experiment and whose codomain is the set of real numbers. Informally, a random variable is a numerical observation resulting from the outcome of a random experiment. For example, if the experiment consists of selecting a random sample of 5 members of a human population, the average age A of the 5 members is a random variable. The number M of males in the sample is another one.
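As a minimal sketch of the definition, assume a toy experiment of three tosses of a fair coin. This experiment and the names omega, prob and num_heads are illustrative and do not come from the text. The sample space Ω is an explicit list of outcomes, and the random variable is literally a Python function from outcomes to numbers.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy experiment: three tosses of a fair coin.
# The sample space is the 8 sequences of 'H' and 'T', assumed equally likely.
omega = ["".join(seq) for seq in product("HT", repeat=3)]
prob = {w: Fraction(1, len(omega)) for w in omega}

def num_heads(w):
    """A random variable: maps the outcome w to a real number (its count of heads)."""
    return w.count("H")

# Each outcome together with the value the random variable assigns to it.
for w in omega:
    print(w, num_heads(w))
```

Nothing about the function itself is random; the randomness lies entirely in which outcome ω the experiment produces.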

Nominal and Numeric Variables

The numerical values associated with experimental outcomes by a random variable may be mere tags or names for which arithmetic operations such as addition and multiplication are meaningless. Letters from the alphabet or spelled-out names would serve just as well. Such variables are called nominal variables, or sometimes factors. In contrast, numeric variables have values with true numerical significance and are often related to a scale of measurement.

Events and Random Variables

Let X denote a random variable and let I denote an interval of real numbers. I can be any kind of interval: open, closed, degenerate (consisting of a single point), a half line, or the entire set of real numbers. The set of experimental outcomes ω for which the corresponding value of X lies in I is an event, denoted by [X ∈ I]. This notation is modified to suit particular kinds of intervals, e.g., [0 < X ≤ 1], [Y > 2.4], [Z = -1].
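Using a hypothetical three-coin experiment, the sketch below builds the event [X ∈ I] as an explicit set of outcomes and computes its probability. To keep it short, an interval I is represented by a predicate (a function returning True or False); that representation, like the helper names event and P, is an assumption of this sketch rather than anything in the text.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy experiment: three tosses of a fair coin, 8 equally likely outcomes.
omega = ["".join(seq) for seq in product("HT", repeat=3)]
prob = {w: Fraction(1, len(omega)) for w in omega}

def X(w):
    # Random variable: number of heads in the outcome w.
    return w.count("H")

def event(rv, in_interval):
    """The event [rv in I]: all outcomes w whose value rv(w) satisfies the predicate for I."""
    return {w for w in omega if in_interval(rv(w))}

def P(event_set):
    # Probability of an event: the sum of the probabilities of its outcomes.
    return sum((prob[w] for w in event_set), Fraction(0))

A = event(X, lambda x: 0 < x <= 1)  # the event [0 < X <= 1]
B = event(X, lambda x: x == 3)      # a degenerate interval: the event [X = 3]
print(sorted(A), P(A))  # ['HTT', 'THT', 'TTH'] 3/8
print(sorted(B), P(B))  # ['HHH'] 1/8
```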

Jointly Distributed Variables

Two or more random variables X₁, X₂, ... are jointly distributed if they arise from the same random experiment, i.e., are defined for the same sample space. This means that for each outcome of the experiment, the variables in the list all have values simultaneously. For example, if the experiment is to randomly select one member of a human population, the age, height, sex, and marital status of the person selected are jointly distributed variables.

Cumulative Distribution Functions

If X is a random variable and x denotes an arbitrary real number, the event [X ≤ x] has a probability between 0 and 1. If x is allowed to vary over the set of all real numbers, this probability is a function of x, called the cumulative distribution function (cdf) of the random variable X. In symbols,

$$F_X(x) = P[X \le x], \quad -\infty < x < \infty.$$

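If X and Y are jointly distributed random variables, their joint cumulative distribution function is a function of two arguments, x and y. It is defined as

$$F_{X,Y}(x, y) = P[X \le x, \; Y \le y],$$

where the comma denotes the intersection of the events [X ≤ x] and [Y ≤ y].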

This notation can be extended to any number of jointly distributed random variables.
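The following sketch, assuming a hypothetical experiment of two tosses of a fair die, computes F_X and the joint cdf F_{X,Y} by brute force, adding up the probabilities of the outcomes in [X ≤ x] or in [X ≤ x, Y ≤ y]. Here X is the value of the first die and Y is the larger of the two values; both are defined on the same sample space, so they are jointly distributed. The function names are illustrative, and exact fractions are used so the results can be checked by hand.

```python
from fractions import Fraction
from itertools import product

# Hypothetical experiment: two tosses of a fair die, 36 equally likely outcomes.
omega = list(product(range(1, 7), repeat=2))
p = Fraction(1, len(omega))

def X(w):  # value shown by the first die
    return w[0]

def Y(w):  # larger of the two values, defined on the same sample space as X
    return max(w)

def cdf(x):
    """F_X(x) = P[X <= x]."""
    return sum((p for w in omega if X(w) <= x), Fraction(0))

def joint_cdf(x, y):
    """F_{X,Y}(x, y) = P[X <= x, Y <= y]."""
    return sum((p for w in omega if X(w) <= x and Y(w) <= y), Fraction(0))

print(cdf(2), cdf(2.5), cdf(6))  # 1/3 1/3 1
print(joint_cdf(2, 3))           # 1/6
```

Note that cdf(2.5) equals cdf(2): the cdf of a variable that takes only integer values is a step function, flat between consecutive integers.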

Independent Random Variables

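Jointly distributed random variables X₁, X₂, ..., X_n are independent if, for any sequence of intervals I₁, I₂, ..., I_n,

$$P[X_1 \in I_1, X_2 \in I_2, \ldots, X_n \in I_n] = P[X_1 \in I_1] \, P[X_2 \in I_2] \cdots P[X_n \in I_n].$$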

Informally, this means that if the values assumed by some of these random variables are known, that knowledge does not help in predicting the values assumed by the others. In the experiment of selecting a random sample of 5 members of a human population, the average age A of the members in the sample and the number of males M in the sample can be treated as independent random variables, at least approximately, when the population is large and age is unrelated to sex. In contrast, the average age and the largest age of members of the sample are dependent: knowing that the largest age is, say, 20 guarantees that the average age is at most 20.
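For two discrete variables, the interval condition reduces to checking the product rule P[X = a, Y = b] = P[X = a] P[Y = b] for every pair of possible values a and b. The sketch below performs that check on a hypothetical two-dice experiment (the experiment and all function names are illustrative). The first and second dice come out independent, while the first die and the larger of the two values do not, mirroring the average-age versus largest-age example.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy experiment: two tosses of a fair die, 36 equally likely outcomes.
omega = list(product(range(1, 7), repeat=2))
p = Fraction(1, len(omega))

def P(pred):
    """Probability of the event consisting of the outcomes w for which pred(w) is true."""
    return sum((p for w in omega if pred(w)), Fraction(0))

def independent(U, V):
    """Check P[U = a, V = b] == P[U = a] * P[V = b] for every pair of possible values.

    For two discrete variables this product rule over single values is
    equivalent to the product rule over arbitrary intervals.
    """
    u_vals = {U(w) for w in omega}
    v_vals = {V(w) for w in omega}
    return all(
        P(lambda w: U(w) == a and V(w) == b)
        == P(lambda w: U(w) == a) * P(lambda w: V(w) == b)
        for a in u_vals
        for b in v_vals
    )

def first(w):    # value shown by the first die
    return w[0]

def second(w):   # value shown by the second die
    return w[1]

def largest(w):  # larger of the two values
    return max(w)

print(independent(first, second))   # True
print(independent(first, largest))  # False
```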

The Median and Other Quantiles

A median of a random variable X with cdf F_X is any number m such that P[X ≤ m] ≥ 0.5 and P[X ≥ m] ≥ 0.5. Informally, this means that X has probability at least one half of being less than or equal to m and probability at least one half of being greater than or equal to m. If more than one number m satisfies this condition, we usually take the median to be the smallest such number. However, some authors define the median to be the midpoint of the interval of numbers satisfying the condition. The median is also called the 50th percentile or the second quartile of the distribution of X.

If p is a number strictly between 0 and 1, the pth quantile, or 100pth percentile, of the distribution of X is the smallest number q such that P[X ≤ q] ≥ p and P[X ≥ q] ≥ 1 - p. The 25th percentile of a distribution is also called its first quartile, and the 75th percentile is called its third quartile.
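For a discrete variable, the pth quantile can be found by accumulating probabilities in increasing order of the possible values and stopping at the first value where the running total reaches p. That value is the smallest q with P[X ≤ q] ≥ p, and it automatically satisfies P[X ≥ q] ≥ 1 - p. The sketch below assumes the distribution is given as a dictionary {value: probability} with finitely many entries (a representation chosen for illustration) and uses a fair die as the example.

```python
from fractions import Fraction

def quantile(pmf, p):
    """Smallest q with P[X <= q] >= p, for a discrete pmf given as {value: probability}.

    For 0 < p < 1 this q also satisfies P[X >= q] >= 1 - p.
    """
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    total = Fraction(0)
    for value in sorted(pmf):
        total += pmf[value]  # running total is F_X(value)
        if total >= p:
            return value
    raise ValueError("probabilities sum to less than p")

# Example: a fair die, each face 1..6 with probability 1/6.
die = {k: Fraction(1, 6) for k in range(1, 7)}
print(quantile(die, Fraction(1, 2)))  # 3  (median, smallest-number convention)
print(quantile(die, Fraction(1, 4)))  # 2  (first quartile)
print(quantile(die, Fraction(3, 4)))  # 5  (third quartile)
```

For the die, every number m with 3 ≤ m ≤ 4 satisfies the median condition, so the smallest-number convention gives 3, while the midpoint convention would give 3.5.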

Discrete Random Variables

A random variable X is discrete if its possible values can be arranged in a finite or infinite sequence x₁, x₂, ...; in contrast, a random variable of continuous type can take every value in some interval of real numbers. For example, the number of heads in 10 successive tosses of a coin is a discrete random variable with possible values 0, 1, ..., 10. The average height of a random sample of 10 adult males from the U.S. population is much more conveniently treated as a random variable of continuous type.

Probability Mass Functions

If X is a discrete random variable with possible values x₁, x₂, ..., the probability mass function of X is the function p_X whose domain is the set of possible values. It is defined by

$$p_X(x_i) = P[X = x_i], \quad i = 1, 2, \ldots$$

It is convenient to extend the domain of p_X to the set of all real numbers x by defining p_X(x) = 0 if x is not one of the x_i.
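As a sketch, take the example of counting heads in 10 tosses of a coin and assume that the coin is fair and the tosses are independent (assumptions the text does not state). Then p_X(x) = C(10, x) / 2^10 for x = 0, 1, ..., 10, where C(10, x) is a binomial coefficient, and the function below also implements the extension p_X(x) = 0 for every other real x. The name pmf_heads is hypothetical.

```python
from fractions import Fraction
from math import comb

def pmf_heads(x, n=10):
    """p_X(x) for X = number of heads in n tosses of a fair coin (tosses assumed independent).

    Uses p_X(x) = C(n, x) / 2**n for x = 0, 1, ..., n and, following the
    extension to all real numbers, returns 0 for every other x.
    """
    if x not in range(n + 1):  # not one of the possible values 0, 1, ..., n
        return Fraction(0)
    return Fraction(comb(n, int(x)), 2 ** n)

print(pmf_heads(5))    # 63/256
print(pmf_heads(2.5))  # 0
print(sum(pmf_heads(k) for k in range(11)))  # 1
```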

Expected Values of Discrete Variables

Let X be a discrete random variable with possible values x₁, x₂, ... and probability mass function p_X, and let g be a real-valued function of a real argument x. The expected value of g(X) is defined as

$$E[g(X)] = \sum_i g(x_i) \, p_X(x_i).$$

If X has an infinite sequence of possible values, this sum is an infinite series, and the expected value is defined only when the series converges absolutely. If we choose g(x) ≡ x, the expected value is called the mean of the random variable X. It is commonly denoted by the Greek letter μ (mu).
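Here is a minimal sketch of the defining sum for a variable with finitely many possible values, using a fair die as an assumed example and a dictionary {value: probability} as a convenient representation; the name expected_value is illustrative. Variables with infinitely many possible values would require the absolutely convergent series, which a finite loop cannot evaluate exactly, so they are outside the scope of this sketch.

```python
from fractions import Fraction

def expected_value(pmf, g=lambda x: x):
    """E[g(X)] = sum over the possible values x of g(x) * p_X(x).

    pmf is a dict {value: probability} with finitely many entries; the default
    g is the identity, so expected_value(pmf) returns the mean.
    """
    return sum((g(x) * px for x, px in pmf.items()), Fraction(0))

# Example: a fair die, each face 1..6 with probability 1/6.
die = {k: Fraction(1, 6) for k in range(1, 7)}
print(expected_value(die))                   # 7/2   (the mean, g(x) = x)
print(expected_value(die, lambda x: x * x))  # 91/6  (g(x) = x squared)
```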

The Variance and Standard Deviation

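Let X be a discrete random variable with possible values x₁, x₂, ..., probability mass function p_X, and mean μ. The variance of X is defined as

$$\operatorname{Var}(X) = E[(X - \mu)^2] = \sum_i (x_i - \mu)^2 \, p_X(x_i).$$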

The standard deviation of X is the square root of the variance. It is usually denoted by the Greek letter σ (sigma). Thus, the variance is also denoted by σ².
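The sketch below computes μ, σ² = E[(X - μ)²] and σ for a discrete variable with finitely many possible values, again assuming a fair die as the example and a dictionary {value: probability} as the representation; the function names are illustrative. The variance is kept as an exact fraction, while the standard deviation is returned as a floating-point number because it is usually irrational.

```python
from fractions import Fraction
import math

def mean(pmf):
    # mu = sum over the possible values x of x * p_X(x)
    return sum((x * px for x, px in pmf.items()), Fraction(0))

def variance(pmf):
    """sigma^2 = E[(X - mu)^2] = sum over x of (x - mu)^2 * p_X(x)."""
    mu = mean(pmf)
    return sum(((x - mu) ** 2 * px for x, px in pmf.items()), Fraction(0))

def std_dev(pmf):
    # sigma = square root of the variance, as a float
    return math.sqrt(variance(pmf))

# Example: a fair die, each face 1..6 with probability 1/6.
die = {k: Fraction(1, 6) for k in range(1, 7)}
print(variance(die))           # 35/12
print(round(std_dev(die), 4))  # 1.7078
```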