Qualitative/Categorical Univariate Analysis Descriptive Statistics

Statistics Terminology

Some may argue that statisticians are not really interested in generalizing from a sample to a specified population, but rather to an idealized superpopulation spanning space and time.

Best course on statistics: https://bolt.mph.ufl.edu/6050-6052/

 Introduction & Terminology

The field of statistics exists because it is usually impossible to collect data from all individuals of interest (population). Our only solution is to collect data from a subset (sample) of the individuals of interest, but our real desire is to know the “truth” about the population. Quantities such as means, standard deviations and proportions are all important values and are called “parameters” when we are talking about a population. Since we usually cannot get data from the whole population, we cannot know the values of the parameters for that population. We can, however, calculate estimates of these quantities for our sample. When they are calculated from sample data, these quantities are called “statistics.” A statistic estimates a parameter.
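To make the parameter/statistic distinction concrete, here is a minimal Python sketch. It assumes a made-up "population" of 10,000 simulated exam scores so that the parameters can actually be computed. The parameters come from every member. The statistics come from a random sample of 200 members and estimate those parameters. The scores, the cutoff of 60, and the sample size are illustrative assumptions.

```python
import random
import statistics

# Hypothetical population: 10,000 simulated exam scores (illustration only).
random.seed(42)
population = [random.gauss(70, 10) for _ in range(10_000)]

# Parameters: computed from every member of the population.
mu = statistics.mean(population)                         # population mean
sigma = statistics.pstdev(population)                    # population std dev (divides by N)
p = sum(x >= 60 for x in population) / len(population)   # population proportion scoring >= 60

# Statistics: computed from a random sample of n = 200 members.
sample = random.sample(population, 200)
x_bar = statistics.mean(sample)                          # sample mean, estimates mu
s = statistics.stdev(sample)                             # sample std dev (divides by n - 1), estimates sigma
p_hat = sum(x >= 60 for x in sample) / len(sample)       # sample proportion, estimates p

print(f"mean:       parameter {mu:.2f}  statistic {x_bar:.2f}")
print(f"std dev:    parameter {sigma:.2f}  statistic {s:.2f}")
print(f"proportion: parameter {p:.3f}  statistic {p_hat:.3f}")
```

In practice only the second half is available, because the population values are unknown. That is why the statistics are used as estimates.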

 Random Process - Random Variables - Stochastic Model - Probability Distribution - Statistical Inference - Statistical Model - Exploratory Data Analysis - Estimator - Probability Model

Many observable phenomena are random in nature. We call such a phenomenon a Random Process (Random Experiment). A random process has outcomes, and subsets of these outcomes are called Events. Random Variables map each outcome to a number, so that events can be described numerically.
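A small sketch of these terms in Python, assuming the familiar example of rolling two fair six-sided dice. The dice example and the names `sample_space`, `doubles` and `total` are illustrative choices, not from the original notes.

```python
from itertools import product

# Random process (assumed example): rolling two fair six-sided dice.
# Outcomes: every possible (first die, second die) pair.
sample_space = list(product(range(1, 7), repeat=2))   # 36 outcomes

# An event is a subset of outcomes, e.g. "both dice show the same face".
doubles = {outcome for outcome in sample_space if outcome[0] == outcome[1]}

# A random variable maps each outcome to a number; here, the sum of the faces.
def total(outcome):
    return outcome[0] + outcome[1]

print(len(sample_space))   # 36
print(len(doubles))        # 6
print(total((3, 4)))       # 7
```

The event "the sum is 7" can now be written numerically as the set of outcomes where `total(outcome) == 7`.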

We study and capture our knowledge about this random process by building a Stochastic Model. The stochastic model describes the outcome of the random process by specifying:

  1. the possible values of a random variable
  2. the probability of each of those values

Together, these two elements form a Probability Distribution.
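As an illustration, the sketch below builds the probability distribution of the sum of two fair dice. It is the same assumed example of a random process, and it pairs each possible value with its probability. Exact fractions are used so that the probabilities are not rounded.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Assumed random process: two fair dice; random variable X = sum of the faces.
outcomes = list(product(range(1, 7), repeat=2))
counts = Counter(a + b for a, b in outcomes)

# Probability distribution of X: each possible value paired with its probability.
# All 36 outcomes are equally likely, so P(X = x) = (# outcomes with sum x) / 36.
distribution = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}

for x, prob in distribution.items():
    print(x, prob)
```

The two elements listed above correspond to the keys (the possible values 2 to 12) and the values (their probabilities, which sum to 1).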

This distribution has parameters (such as the mean and standard deviation) whose values are inferred from the observed phenomena using Statistical Inference.

Before inference, the parameters of the distribution are unknown. What we have is therefore a family of distributions, since each possible value of the parameters gives a different distribution. This family is called a Statistical Model.
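A minimal sketch of a statistical model as a family of distributions, using the Bernoulli model as an assumed example. The function `bernoulli` is a hypothetical helper written for illustration. It takes one value of the unknown parameter p and returns one member of the family as a value-to-probability mapping.

```python
# Statistical model (assumed example): the Bernoulli family, indexed by an
# unknown parameter p = probability of "success" (value 1).
def bernoulli(p):
    """Hypothetical helper: return the distribution {value: probability} for one p."""
    if not 0 <= p <= 1:
        raise ValueError("p must lie in [0, 1]")
    return {0: 1 - p, 1: p}

# Each parameter value picks out a different distribution from the family.
for p in (0.2, 0.5, 0.9):
    print(p, bernoulli(p))
```

Until p is inferred from data, the model is the whole set of these distributions rather than any single one.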

Usually, a statistical model (exponential, binomial, normal, uniform, Bernoulli, etc.) is chosen with the help of Exploratory Data Analysis. Its parameters are then estimated by applying statistical inference (for example, algorithms that minimize a loss function). The result is a stochastic model (a statistical model with known parameters) that captures our knowledge about the random process. The rule that turns the observed data into the parameter estimates is called an Estimator.
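The sketch below illustrates "estimation by loss-function minimization" for the Bernoulli model. The data are made up. The loss is assumed to be the negative log-likelihood, which makes this maximum-likelihood estimation, one common choice among several. A crude grid search finds the minimizing p, and the result is compared with the known closed-form maximum-likelihood estimator, the sample proportion.

```python
import math

# Hypothetical observed data: 1 = success, 0 = failure (made up for illustration).
data = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]

def neg_log_likelihood(p, xs):
    """Loss for the Bernoulli model: -sum of log P(x | p). Assumes 0 < p < 1."""
    return -sum(math.log(p) if x == 1 else math.log(1 - p) for x in xs)

# Estimation by loss minimization: grid search over candidate values of p.
grid = [i / 1000 for i in range(1, 1000)]   # 0.001 ... 0.999
p_grid = min(grid, key=lambda p: neg_log_likelihood(p, data))

# For this model the maximum-likelihood estimator has a closed form:
# the sample proportion of successes.
p_closed = sum(data) / len(data)

print(p_grid, p_closed)
```

Here the estimator is the rule "take the sample proportion," and the value it returns (0.7 for this data) fixes the parameter. Together they turn the Bernoulli family into a single stochastic model.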

The term 'Probability Model' (probabilistic model) is usually used as a synonym for stochastic model.

Qualitative/Categorical Univariate Analysis Descriptive Statistics - Types

Statistic     Population parameter   Sample statistic   Description
size          N                      n                  number of members of the dataset (population or sample)
frequency     f                      f̂                  number of observations in a particular category
proportion    p                      p̂                  fraction of the whole that each category accounts for (p = f/N)
marginal      -                      -                  the row or column totals in a cross-tabulation (similar to a Marginal Probability Distribution)
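A short sketch computing the quantities in the table for a sample, using only the Python standard library. The eye-color and sex data are made up for illustration. For a sample, the proportion is p̂ = f̂/n. The marginal totals are obtained by summing the joint (cross-tabulated) frequencies over rows and over columns.

```python
from collections import Counter

# Hypothetical sample of a categorical variable (made-up data).
eye_color = ["brown", "blue", "brown", "green", "brown", "blue", "hazel", "brown"]

n = len(eye_color)                               # sample size n
freq = Counter(eye_color)                        # f-hat for each category
prop = {cat: f / n for cat, f in freq.items()}   # p-hat = f-hat / n

for cat, f in freq.most_common():
    print(f"{cat:<6} f-hat={f}  p-hat={prop[cat]:.3f}")

# Cross-tabulation of two categorical variables (second variable also made up).
sex = ["F", "M", "F", "F", "M", "M", "F", "M"]
joint = Counter(zip(sex, eye_color))             # frequency of each (row, column) cell

# Marginal totals: sum the cell frequencies across each row and each column.
row_totals = Counter()
col_totals = Counter()
for (row, col), f in joint.items():
    row_totals[row] += f
    col_totals[col] += f

print("row totals:", dict(row_totals))
print("column totals:", dict(col_totals))
```

The column totals reproduce the univariate frequencies of eye color. This is the sense in which marginals resemble a marginal probability distribution: summing out the other variable leaves the distribution of one variable on its own.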

Resources