Quantitative Methods

Basic Concepts

Random Variable

The object of study: a quantity whose future outcomes are uncertain

Outcomes

The possible values of a random variable

Outcome Space

The sample space $\Omega$: the set that contains all possible outcomes

Event

A specified set of outcomes

Probability

A measure that quantifies the likelihood that an event will occur

Relationships Between Events

Independent

The occurrence of A is not related to the occurrence of B

Dependent

The occurrence of A is related to the occurrence of B

Mutually exclusive

Only one of the events can occur at a time

Exhaustive

The events together cover all possible outcomes

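Relationship Between Conditional and Joint Probability

Graphical representation

Joint probability:

The shaded area of the overlap (or the shaded overlap area divided by the total area)

Conditional probability:

Cannot be represented directly by an area; it is a ratio of areas (the overlap area divided by the area of the conditioning event)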

Formulas

Conditional probability

$$P(A|B) = \frac{P(AB)}{P(B)}$$

Joint probability

$$P(AB) = P(B) \cdot P(A|B)$$
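
As a small check of the two formulas above, the sketch below enumerates the outcomes of one fair six-sided die. The die and the events `A` ("the roll is even") and `B` ("the roll is greater than 3") are assumptions chosen for illustration, not part of the original notes.

```python
from fractions import Fraction

# Outcome space of one fair six-sided die (an assumed toy example).
omega = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event (a set of outcomes) when all outcomes are equally likely."""
    return Fraction(len(event & omega), len(omega))

A = {x for x in omega if x % 2 == 0}  # "the roll is even"
B = {x for x in omega if x > 3}       # "the roll is greater than 3"

p_joint = prob(A & B)        # P(AB): both A and B occur
p_cond = p_joint / prob(B)   # P(A|B) = P(AB) / P(B)

print(p_joint, p_cond)  # 1/3 2/3
```

Multiplying `p_cond` by `prob(B)` gives back `p_joint`, which is the joint probability formula read from right to left.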

Law of total probability

$$P(A) = P(A|S_1) \cdot P(S_1) + P(A|S_2) \cdot P(S_2) + \dots + P(A|S_n) \cdot P(S_n)$$

The probability of an event equals the sum of its joint probabilities with each scenario.
$S_1, S_2, \dots, S_n$ are mutually exclusive and exhaustive.
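
The sum can be sketched in a few lines. The three scenarios and all probabilities below are made-up numbers for illustration only.

```python
# Hypothetical scenarios S_1..S_3: mutually exclusive and exhaustive,
# so their probabilities must sum to 1.
p_s = [0.5, 0.3, 0.2]          # P(S_i)
p_a_given_s = [0.1, 0.4, 0.8]  # P(A | S_i)

assert abs(sum(p_s) - 1.0) < 1e-12

# P(A) = sum over i of P(A | S_i) * P(S_i)
p_a = sum(pa * ps for pa, ps in zip(p_a_given_s, p_s))

print(round(p_a, 4))  # 0.33
```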

Descriptive Statistics

Two key perspectives

  1. Central tendency
  2. Dispersion

Comparing measures of central tendency

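Mean (arithmetic mean)

  • Easy to compute
  • Uses every observation and the number of observations
  • Strongly affected by extreme values

Mode

  • There may be one mode or several
  • Or no mode at all
  • Least affected by extreme values

Median

  • Odd number of values: the value at position (n+1)/2 of the sorted data
  • Even number of values: the average of the values at positions n/2 and (n+2)/2
  • Largely unaffected by extreme values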
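
To see how the three measures react to an extreme value, here is a minimal sketch using Python's standard `statistics` module. The data set and the outlier of 100 are assumptions for illustration.

```python
import statistics

data = [2, 3, 3, 5, 7]        # assumed toy observations
with_outlier = data + [100]   # the same data plus one extreme value

summary = {}
for name, values in (("original", data), ("with outlier", with_outlier)):
    summary[name] = (
        statistics.mean(values),    # uses every observation
        statistics.median(values),  # middle of the sorted data
        statistics.mode(values),    # most frequent value
    )
    print(name, summary[name])
```

With these numbers the mean moves from 4 to 20, the median only from 3 to 4, and the mode stays at 3.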

Expected Value

The expected value of a random variable $X$ with possible values $x_1, x_2, \dots, x_n$ is defined as:

$$E(X) = x_1 P(X=x_1) + x_2 P(X=x_2) + \dots + x_n P(X=x_n)$$
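
As a worked instance of the definition, the sketch below computes the expected value of one roll of a fair six-sided die. The die is an assumed example.

```python
from fractions import Fraction

values = [1, 2, 3, 4, 5, 6]      # possible values x_i of a fair die
probs = [Fraction(1, 6)] * 6     # P(X = x_i), equal for every face

# E(X) = sum of x_i * P(X = x_i)
expected = sum(x * p for x, p in zip(values, probs))

print(expected)  # 7/2
```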

Range

$$Range = Maximum\ Value - Minimum\ Value$$

Mean Absolute Deviation

$$MAD = \frac{\sum^{n}_{i=1}|X_i - \bar{X}|}{n}$$

where $\bar{X}$ is the mean and $n$ is the number of observations

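Variance

The average of the squared deviations around the mean

Population variance:

$$\sigma^2 = \frac{\sum^{N}_{i=1}(X_i -\mu)^2}{N}$$

Sample variance:

$$s^2 = \frac{\sum^{n}_{i=1}(X_i -\bar{X})^2}{n-1}$$

Note the difference between the two: the population variance uses the population mean $\mu$ and divides by $N$, while the sample variance uses the sample mean $\bar{X}$ and divides by $n-1$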
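
The dispersion measures above can be compared on one small data set. This is a sketch with assumed numbers; `statistics.pvariance` divides by the number of observations and `statistics.variance` divides by one less.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # assumed toy observations
mean = statistics.mean(data)      # 5

value_range = max(data) - min(data)                    # maximum minus minimum
mad = sum(abs(x - mean) for x in data) / len(data)     # mean absolute deviation
population_variance = statistics.pvariance(data)       # divides by N
sample_variance = statistics.variance(data)            # divides by n - 1

print(value_range, mad, population_variance, sample_variance)
```

The sum of squared deviations here is 32, so the population formula gives 32/8 = 4 and the sample formula gives 32/7, which is slightly larger.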

graph TB;
    population-->parameter;
    population --sampling-->sample;
    sample-->estimator;
    estimator --estimation-->parameter;

sampling and estimation

why sampling?

  • examining every member of the population would not be economically efficient
  • we cannot possibly examine every member of the population

Sampling methods

simple random sampling

a subset of a larger population, created in such a way that each element of the population has an equal probability of being selected into the subset

stratified random sampling

the population is divided into subpopulations (strata) based on one or more classification criteria; samples are then drawn from each stratum and pooled
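
A minimal sketch of the two methods, using the standard `random` module. The population, the stratum labels "A" and "B", the sample size of 10, and the choice of proportional allocation across strata are all assumptions for illustration.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 60 members in stratum "A", 40 in stratum "B".
population = [("A", i) for i in range(60)] + [("B", i) for i in range(40)]

# Simple random sampling: every member has the same chance of selection.
simple_sample = rng.sample(population, 10)

# Stratified random sampling: split by the classification criterion,
# draw from each stratum in proportion to its size, then pool the draws.
strata = {}
for member in population:
    strata.setdefault(member[0], []).append(member)

stratified_sample = []
for label, members in strata.items():
    k = round(10 * len(members) / len(population))
    stratified_sample.extend(rng.sample(members, k))

print(len(simple_sample), len(stratified_sample))
```

The stratified sample always contains 6 members of "A" and 4 of "B", while the mix in the simple random sample varies from draw to draw.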

survivorship bias

What to compute from a sample

estimator

a formula for computing a sample statistic that is used to estimate the population parameter
an estimator is a random variable, so it has a sampling distribution

estimate

particular value calculated from sample observations using an estimator

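unbiasedness

If $A' = g(X_1, X_2, \dots, X_N)$ is an estimator of the unknown parameter $A$ and $E(A') = A$, then $A'$ is called an unbiased estimator of $A$.

efficiency

Of two unbiased estimators of the same population parameter, the one with the smaller standard deviation is more efficient.

consistency

As the sample size increases, the value of the estimator gets closer and closer to the population parameter being estimated.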
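
Unbiasedness can be illustrated with a small simulation. The sketch below is an assumed setup, not taken from the notes: it draws many samples of size 5 from a normal population with variance 4 and averages two variance estimators, one dividing by n and one dividing by n - 1.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

TRUE_VARIANCE = 4.0      # assumed population: normal with mean 0 and sd 2
n, trials = 5, 20000

def sum_squared_deviations(sample):
    """Sum of squared deviations of a sample around its own mean."""
    mean = sum(sample) / len(sample)
    return sum((x - mean) ** 2 for x in sample)

total_div_n = 0.0
total_div_n_minus_1 = 0.0
for _ in range(trials):
    sample = [rng.gauss(0.0, 2.0) for _ in range(n)]
    ss = sum_squared_deviations(sample)
    total_div_n += ss / n                # estimator that divides by n
    total_div_n_minus_1 += ss / (n - 1)  # estimator that divides by n - 1

avg_div_n = total_div_n / trials
avg_div_n_minus_1 = total_div_n_minus_1 / trials
print(avg_div_n, avg_div_n_minus_1)
```

The average of the n - 1 version should land close to the true variance of 4, while the version that divides by n should average about (n - 1)/n of it, which is why the former is the unbiased one.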

estimation

point estimate

the calculated value of the sample statistic in a given sample, used as an estimate of the population parameter

interval estimate

calculating a range of values that brackets the unknown population parameter with some specified level of probability

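normal distribution

The normal distribution is described by a probability density function of a random variable. The horizontal axis shows the possible values of the random variable, the vertical axis shows the probability density, and the area enclosed by the curve and an interval on the horizontal axis is the probability that the random variable falls in that interval.
The normal curve is a bell-shaped curve that is symmetric about the line $x = \mu$. It is small at both ends, large in the middle, and symmetric left to right: the random variable is more likely to fall near $\mu$ and less likely to fall in the two tails far from $\mu$.
Notation:

$$X \sim N(\mu, \sigma^2)$$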

approximately 68% of all observations fall in the interval $\mu \pm \sigma$
approximately 95% of all observations fall in the interval $\mu \pm 1.96\sigma$
approximately 99% of all observations fall in the interval $\mu \pm 2.58\sigma$
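
These three coverage figures can be checked with `statistics.NormalDist` from the Python standard library (available since Python 3.8). The sketch uses the standard normal, so each multiplier is the number of standard deviations on either side of the mean.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

coverage = {}
for k in (1, 1.96, 2.58):
    # P(mu - k*sigma < X < mu + k*sigma) for a normal random variable
    coverage[k] = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(k, round(coverage[k], 4))
```

The results come out near 0.683, 0.950, and 0.990, matching the rounded percentages above.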

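Three distributions

| | Mean | Standard deviation | Shape |
| --- | --- | --- | --- |
| Population | $\mu$ | $\sigma$ | any |
| Sample | $\bar{X}$ | $S$ | any |
| $\bar{X}$ (estimator) | $\mu$ | $\frac{\sigma}{\sqrt{n}}$ | approximately normal (n > 30) |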

sampling distribution (the distribution of a sample statistic)

the distribution of all the distinct possible values that the statistic can assume when computed from samples of the same size randomly drawn from the same population

sampling error

sampling error is the difference between the estimator and the population parameter

standard error

the standard deviation of the sample mean

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

interval estimate: constructing a confidence interval

A $100(1 - \alpha)\%$ confidence interval for a parameter has the following structure:

$$point\ estimate \pm reliability\ factor \times standard\ error$$

  • $\alpha$ = significance level, the probability that the interval does not cover the population parameter
  • $1-\alpha$ = confidence level, the probability that the interval covers the population parameter
  • point estimate = a value of a sample statistic
  • reliability factor = a number based on the degree of confidence ($1-\alpha$) for the confidence interval
  • standard error = the standard deviation of the sample statistic
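
The structure above can be sketched for the simplest case: a confidence interval for a population mean when the population standard deviation is known, so the reliability factor comes from the standard normal distribution. The function name `z_confidence_interval` and the input numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, alpha=0.05):
    """100(1 - alpha)% interval for the mean, assuming sigma is known."""
    standard_error = sigma / sqrt(n)
    # Reliability factor: the z-value that leaves alpha/2 in each tail.
    reliability_factor = NormalDist().inv_cdf(1 - alpha / 2)
    margin = reliability_factor * standard_error
    return sample_mean - margin, sample_mean + margin

# Assumed inputs: sample mean 50, known sigma 10, sample size 100.
lower, upper = z_confidence_interval(sample_mean=50.0, sigma=10.0, n=100)
print(round(lower, 2), round(upper, 2))  # 48.04 51.96
```

With a standard error of 1 and a reliability factor of about 1.96, the interval is roughly 50 plus or minus 1.96. When sigma is unknown and the sample is small, a t-based reliability factor would normally replace the z-value; that case is not covered by this sketch.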

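Hypothesis testing

Premises

  • Proving a statement is harder than disproving it
  • A small-probability event is unlikely to occur

Key principle of hypothesis testing

Because proving is harder than disproving, the statement is split into two opposing sides:
The one to be tested is called the null hypothesis.
The conclusion that is accepted once the null hypothesis is rejected is called the alternative hypothesis.

Rejecting the null hypothesis: a small-probability event is unlikely to occur, so if the sample result would be a small-probability event under the null hypothesis, the null hypothesis is rejected.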
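
One common way to make this principle concrete is a two-sided z-test for a population mean with known standard deviation. The notes do not name a specific test, so the test itself, the function name `two_sided_z_test`, and the numbers are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_z_test(sample_mean, mu_0, sigma, n, alpha=0.05):
    """Test H0: mu = mu_0 against Ha: mu != mu_0, assuming sigma is known."""
    z_stat = (sample_mean - mu_0) / (sigma / sqrt(n))
    # Probability, under H0, of a result at least this far from mu_0.
    p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))
    return z_stat, p_value, p_value < alpha

# Assumed inputs: H0 says mu = 100; the sample of 25 has mean 105, sigma = 10.
z_stat, p_value, reject = two_sided_z_test(105.0, 100.0, 10.0, 25)
print(z_stat, round(p_value, 4), reject)
```

Here the statistic is 2.5 and the p-value is a little over 1%, below the 5% significance level, so the observed sample would be a small-probability event under the null hypothesis and the null hypothesis is rejected.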

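Correlation

One of the main goals of statistics is to find relationships between variables and then judge whether one factor affects another.
Once a relationship has been established, the next step is to look for possible underlying causes.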

covariance

$$Cov(R_i,R_j) = E[(R_i - E(R_i))(R_j - E(R_j))]$$

it measures the co-movement of two random variables

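covariance > 0
The two variables tend to be above their expected values together or below them together

covariance < 0
One variable tends to be above its expected value when the other is below its expected value, and vice versa

covariance = 0
There is no linear relationship between the two variables

Range of covariance: from negative infinity to positive infinity

Limitation

Because covariance is unbounded and depends on the units of the variables, its magnitude is not comparable, which makes it hard to use for comparing different pairs of variables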

correlation coefficient

a standardized measure of the linear relationship between two variables:

$$\rho_{X,Y} = \frac{Cov(X,Y)}{\sigma_X\sigma_Y}$$

values range from -1 to +1, and it has no units
a correlation of 0 indicates the absence of any linear relationship between the variables
the larger the absolute value of the correlation coefficient, the stronger the linear relationship
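
The sketch below computes both measures by hand on assumed toy data, treating the data as a whole population (dividing by n). It then rescales one variable by 100, as if its units had changed, to show that covariance changes with the units while correlation does not.

```python
from math import sqrt

def covariance(xs, ys):
    """Population covariance: average product of deviations from the means."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs)

def correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))

x = [1, 2, 3, 4, 5]                # assumed toy data
y = [2, 4, 5, 4, 5]
y_rescaled = [v * 100 for v in y]  # same data expressed in smaller units

cov_xy = covariance(x, y)
cov_rescaled = covariance(x, y_rescaled)
corr_xy = correlation(x, y)
corr_rescaled = correlation(x, y_rescaled)
print(cov_xy, cov_rescaled, round(corr_xy, 4), round(corr_rescaled, 4))
```

The covariance grows by a factor of 100 after rescaling, but the correlation stays at about 0.77 in both cases.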

Bayes' formula

$$P(AB) = P(A) \cdot P(B|A) = P(B) \cdot P(A|B)$$

$$P(A|B) = \frac{P(B|A)}{P(B)} \cdot P(A)$$
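
A short numeric sketch of the formula, using a made-up screening example: `A` is "has the condition" and `B` is "tests positive". All three input probabilities are assumptions, and $P(B)$ is obtained from the law of total probability over the two scenarios A and not-A.

```python
# Assumed inputs for illustration only.
p_a = 0.01              # P(A): prior probability of the condition
p_b_given_a = 0.95      # P(B|A): positive test given the condition
p_b_given_not_a = 0.05  # P(B|not A): positive test without the condition

# P(B) by total probability over the scenarios A and not-A.
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' formula: P(A|B) = P(B|A) / P(B) * P(A)
p_a_given_b = p_b_given_a / p_b * p_a

print(round(p_b, 4), round(p_a_given_b, 4))
```

Even with an accurate test, the posterior probability is only about 16% here because the prior $P(A)$ is small.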


Unless otherwise stated, all articles on this blog are licensed under CC BY-SA 4.0. Please credit the source when reposting.