Quantitative Methods
Basic Concepts
Random Variable
The object of study: a quantity whose future outcomes are uncertain.
Outcomes
The possible values of a random variable.
Outcome Space
Also called the sample space: the set that contains all possible outcomes.
Event
A specified set of outcomes.
Probability
A measure that quantifies the likelihood that an event will occur.
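Relationships Between Events
Independent
The occurrence of A does not change the probability of B: $P(B|A) = P(B)$.
Dependent
The occurrence of A changes the probability of B.
Mutually exclusive
Only one of the events can occur at a time: $P(AB) = 0$.
Exhaustive
The events together cover all possible outcomes.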
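Relationship Between Conditional and Joint Probability
Graphical representation
Joint probability:
The shaded area where the two events overlap (or that area divided by the total area).
Conditional probability:
It cannot be read off directly as an area; it is a ratio of areas.
Formulas
Conditional probability
$$P(A|B) = \frac{P(AB)}{P(B)}$$
Joint probability
$$P(AB) = P(A|B)P(B)$$
Total probability rule
The probability of an event equals the sum of the joint probabilities of that event with each scenario.
$$P(A) = \sum_{i=1}^{n} P(A|S_i)P(S_i)$$
where $S_1, S_2, \dots, S_n$ are mutually exclusive and exhaustive.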
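As a quick numeric check of the total probability rule, here is a minimal Python sketch. The three scenarios and every probability in it are made-up illustration values, not figures from these notes.

```python
# Hypothetical scenarios S1..S3, assumed mutually exclusive and exhaustive.
# All numbers below are illustrative assumptions.
p_s = {"S1": 0.5, "S2": 0.3, "S3": 0.2}              # P(S_i), sums to 1
p_a_given_s = {"S1": 0.10, "S2": 0.40, "S3": 0.80}   # P(A | S_i)

# Joint probability of A with each scenario: P(A S_i) = P(A | S_i) * P(S_i)
p_joint = {s: p_a_given_s[s] * p_s[s] for s in p_s}

# Total probability rule: P(A) is the sum of the joint probabilities.
p_a = sum(p_joint.values())

print(p_joint)
print(round(p_a, 4))
```

With these inputs the joint probabilities are 0.05, 0.12 and 0.16, so P(A) comes out at 0.33.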
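Descriptive Statistics
Two key perspectives
- Central tendency
- Dispersion
Comparing Measures of Central Tendency
Mean (arithmetic mean)
- Easy to compute
- Uses every observation and the number of observations
- Strongly affected by extreme values
Mode
- There may be one mode or several
- Or there may be no mode at all
- Least affected by extreme values
Median
- Odd number of observations: the value at position (n+1)/2
- Even number of observations: the average of the values at positions n/2 and (n+2)/2
- Largely unaffected by extreme values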
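To see how differently the three measures react to an extreme value, here is a small sketch using Python's standard `statistics` module. The data set is hypothetical, and the value 100 is included on purpose as an outlier.

```python
import statistics

# Hypothetical observations; 100 is a deliberate extreme value.
data = [2, 3, 3, 5, 7, 100]

mean = statistics.mean(data)         # uses every observation
median = statistics.median(data)     # even n: average of the two middle values
modes = statistics.multimode(data)   # list of the most frequent value(s)

print(mean, median, modes)
```

The mean is pulled up to 20 by the single outlier, while the median (4) and the mode (3) stay close to the bulk of the data.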
Expected Value
The expected value of a random variable $X$ with possible values $x_1, x_2, \dots, x_n$ is defined as:
$$E(X) = \sum_{i=1}^{n} x_i P(x_i)$$
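A tiny worked example may help here. It assumes the usual definition of expected value as the probability-weighted sum of the possible values, and uses a fair six-sided die purely as an illustration.

```python
from fractions import Fraction

# A fair six-sided die, chosen only as an illustration.
values = [1, 2, 3, 4, 5, 6]
probs = [Fraction(1, 6)] * 6

# Probability-weighted sum of the possible values.
expected = sum(x * p for x, p in zip(values, probs))
print(expected)  # 7/2
```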
Range
$$Range = Maximum\ Value - Minimum\ Value$$
Mean Absolute Deviation
$$MAD = \frac{\sum_{i=1}^{n}|X_i - \bar{X}|}{n}$$
where $\bar{X}$ is the mean and $n$ is the number of observations.
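Variance
The average of the squared deviations around the mean.
Note the difference between the population variance and the sample variance: the population variance divides by $N$, while the sample variance divides by $n-1$.
$$\sigma^2 = \frac{\sum_{i=1}^{N}(X_i - \mu)^2}{N} \qquad s^2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1}$$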
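The dispersion measures above can be computed by hand in a few lines. The sketch below uses hypothetical data and cross-checks the two variance formulas against `statistics.pvariance` and `statistics.variance` from the standard library.

```python
import statistics

data = [4, 8, 6, 2, 10]  # hypothetical observations
n = len(data)
mean = sum(data) / n

value_range = max(data) - min(data)
mad = sum(abs(x - mean) for x in data) / n

# Population variance divides by N; sample variance divides by n - 1.
pop_var = sum((x - mean) ** 2 for x in data) / n
sample_var = sum((x - mean) ** 2 for x in data) / (n - 1)

# Cross-check against the standard library.
assert abs(pop_var - statistics.pvariance(data)) < 1e-12
assert abs(sample_var - statistics.variance(data)) < 1e-12

print(value_range, mad, pop_var, sample_var)
```

For this data the range is 8, the MAD is 2.4, the population variance is 8.0 and the sample variance is 10.0, so the same observations give a larger figure when treated as a sample.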
graph TB;
population-->parameter;
population --sampling-->sample;
sample-->estimator;
estimator --estimation-->parameter;
sampling and estimation
why sampling?
- examining every member of the population would not be economically efficient
- we cannot possibly examine every member of the population
sampling methods
simple random sampling
a subset of a larger population created in such a way that each element of the population has an equal probability of being selected for the subset
stratified random sampling
the population is divided into subpopulations (strata) based on one or more classification criteria; samples are then drawn from each stratum and pooled
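The two sampling methods can be sketched with Python's `random` module. Everything here is an assumption made for illustration: the population of 100 labelled items, the sample size of 10, and in particular the proportional allocation across strata, which these notes do not specify.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: 80 "corporate" items and 20 "government" items.
population = [("corporate", i) for i in range(80)] + \
             [("government", i) for i in range(20)]

# Simple random sampling: every element has the same chance of selection.
simple_sample = rng.sample(population, 10)

# Stratified random sampling: split by the classification criterion,
# draw from each stratum in proportion to its size (an assumption), then pool.
strata = {}
for label, item in population:
    strata.setdefault(label, []).append((label, item))

stratified_sample = []
for label, members in strata.items():
    k = round(10 * len(members) / len(population))
    stratified_sample.extend(rng.sample(members, k))

print(len(simple_sample), len(stratified_sample))
```

The stratified sample always contains 8 corporate and 2 government items, whereas the simple random sample can drift away from the 80/20 split by chance.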
survivorship bias
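what to compute from a sample
estimator
a formula used to compute a sample statistic that estimates the population parameter
an estimator is a random variable, so it has a sampling distribution
estimate
a particular value calculated from sample observations using an estimator
unbiasedness
Let $\hat{\theta}$ be an estimator of the unknown parameter $\theta$. If $E(\hat{\theta}) = \theta$, then $\hat{\theta}$ is called an unbiased estimator of $\theta$.
efficiency
Of two unbiased estimators of the same population parameter, the one with the smaller standard deviation is the more efficient.
consistency
As the sample size increases, the value of the estimator gets closer and closer to the population parameter being estimated.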
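Unbiasedness can be checked exactly on a tiny example by listing every possible sample. The sketch below assumes a hypothetical three-value population and samples of size 2 drawn with replacement; it compares the sample variance with an n - 1 denominator against the version that divides by n.

```python
from fractions import Fraction
from itertools import product

population = [2, 4, 9]   # hypothetical tiny population
n = 2                    # sample size

mu = Fraction(sum(population), len(population))
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)

def sample_var(sample, ddof):
    """Variance of one sample, dividing by len(sample) - ddof."""
    m = Fraction(sum(sample), len(sample))
    return sum((x - m) ** 2 for x in sample) / (len(sample) - ddof)

# Every equally likely sample of size n, drawn with replacement.
samples = list(product(population, repeat=n))

# Expected value of each estimator = average over all possible samples.
e_unbiased = sum(sample_var(s, 1) for s in samples) / len(samples)
e_biased = sum(sample_var(s, 0) for s in samples) / len(samples)

print(sigma2, e_unbiased, e_biased)
```

The n - 1 version averages to exactly the population variance (26/3), while the version dividing by n averages to only half of it (13/3), so it is biased downward for this sample size.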
estimation
point estimate
the calculated value of the sample statistic in a given sample, used as an estimate of the population parameter
interval estimate
calculating a range of values that brackets the unknown population parameter with some specified level of probability
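normal distribution
The normal distribution is described by a probability density function. The horizontal axis shows the possible values of the random variable and the vertical axis shows the probability density. The area enclosed between the curve and a given interval on the horizontal axis is the probability that the random variable falls within that interval.
The normal curve is symmetric about $\mu$: probability is high near $\mu$ and low in the two tails far from $\mu$.
Notation: $X \sim N(\mu, \sigma^2)$

approximately 68% of all observations fall in the interval ($\mu \pm \sigma$)
approximately 95% of all observations fall in the interval ($\mu \pm 2\sigma$)
approximately 99% of all observations fall in the interval ($\mu \pm 3\sigma$)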
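These coverage figures can be verified with `statistics.NormalDist` from the Python standard library. The sketch assumes the intervals are one, two and three standard deviations either side of the mean; the values of `mu` and `sigma` are arbitrary and do not affect the result.

```python
from statistics import NormalDist

# Hypothetical parameters; any mu and any sigma > 0 give the same coverage.
mu, sigma = 10.0, 2.0
dist = NormalDist(mu, sigma)

# Probability of falling within k standard deviations of the mean.
coverage = {
    k: dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)
    for k in (1, 2, 3)
}

for k, p in coverage.items():
    print(f"within {k} sigma: {p:.4f}")
```

The exact figures are roughly 0.6827, 0.9545 and 0.9973, so the 68%, 95% and 99% quoted in the notes are rounded rules of thumb.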
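three distributions
| | Mean | Standard deviation | Shape |
|---|---|---|---|
| Population | $\mu$ | $\sigma$ | any |
| Sample | $\bar{X}$ | $S$ | any |
| Sample mean $\bar{X}$ (estimator) | $\mu$ | $\sigma/\sqrt{n}$ | normal ($n > 30$) |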
sampling distribution (the distribution of a sample statistic)
the distribution of all the distinct possible values that the statistic can assume when computed from samples of the same size randomly drawn from the same population
sampling error
sampling error is the difference between the estimator and the population parameter
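standard error
the standard deviation of the sample mean: $\sigma_{\bar{X}} = \sigma/\sqrt{n}$ (with $s$ in place of $\sigma$ when the population standard deviation is unknown)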
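The sampling distribution of the mean, and its standard deviation, can be built explicitly for a small case. This sketch assumes a hypothetical four-value population and samples of size 3 drawn with replacement, and compares the standard deviation of all possible sample means with the population standard deviation divided by the square root of n.

```python
from itertools import product
from math import sqrt

population = [1, 3, 5, 11]   # hypothetical population
n = 3                        # sample size

N = len(population)
mu = sum(population) / N
sigma = sqrt(sum((x - mu) ** 2 for x in population) / N)

# Sampling distribution of the mean: the sample mean of every
# equally likely sample of size n (drawn with replacement).
means = [sum(s) / n for s in product(population, repeat=n)]

mean_of_means = sum(means) / len(means)
std_of_means = sqrt(sum((m - mean_of_means) ** 2 for m in means) / len(means))

print(mean_of_means, std_of_means, sigma / sqrt(n))
```

Up to floating-point rounding, the sample means are centred on the population mean and their standard deviation matches sigma / sqrt(n), which is the standard error.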
interval estimate: constructing a confidence interval
A confidence interval for a parameter has the following structure:
$$\text{point estimate} \pm \text{reliability factor} \times \text{standard error}$$
- $\alpha$ = significance level, the probability that the interval does not cover the population parameter
- $1-\alpha$ = confidence level, the probability that the interval covers the population parameter
- point estimate = a value of a sample statistic
- reliability factor = a number based on the degree of confidence ($1-\alpha$) for the confidence interval
- standard error = the standard deviation of the sample statistic
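Putting the pieces together, a confidence interval for a mean might be computed as below. This is only a sketch under stated assumptions: the 36 return figures are made up, and the reliability factor is taken from the standard normal distribution on the basis that the sample is large, rather than from a t-distribution.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Hypothetical sample of 36 monthly returns (in %); the values are made up.
sample = [1.2, 0.8, -0.5, 2.1, 1.7, 0.3] * 6
alpha = 0.05                                   # significance level

point_estimate = mean(sample)
standard_error = stdev(sample) / sqrt(len(sample))
# Reliability factor: z-value leaving alpha/2 in each tail (about 1.96 for 95%).
reliability = NormalDist().inv_cdf(1 - alpha / 2)

lower = point_estimate - reliability * standard_error
upper = point_estimate + reliability * standard_error
print(f"{1 - alpha:.0%} CI: [{lower:.3f}, {upper:.3f}]")
```

Lowering `alpha` raises the reliability factor and widens the interval; a larger sample shrinks the standard error and narrows it.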
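hypothesis testing
premises
- Proving a claim is harder than disproving it
- Low-probability events rarely occur
key principles of hypothesis testing
Because proving is harder than disproving, the statement is split into two opposing hypotheses:
The one to be tested is called the null hypothesis.
The conclusion that is accepted once the null hypothesis has been rejected is called the alternative hypothesis.
Rejecting the null hypothesis rests on the idea that low-probability events rarely occur: if the observed sample would be very unlikely were the null hypothesis true, the null hypothesis is rejected.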
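The logic above can be illustrated with a two-sided z-test. All inputs are hypothetical: a null hypothesis that the population mean is 5, a population standard deviation assumed to be known, and a made-up sample result.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical setup: H0: mu = 5 versus Ha: mu != 5, sigma assumed known.
mu_0, sigma = 5.0, 2.0
sample_mean, n = 5.9, 49      # made-up sample result
alpha = 0.05                  # threshold for a "low-probability event"

# Test statistic: distance of the sample mean from mu_0 in standard errors.
z = (sample_mean - mu_0) / (sigma / sqrt(n))

# Two-sided p-value: probability of a result at least this extreme if H0 is true.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

reject_h0 = p_value < alpha
print(round(z, 2), round(p_value, 4), reject_h0)
```

Here z is 3.15 and the p-value is well below 0.05, so a result this extreme would be a low-probability event under the null hypothesis and the null is rejected.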
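relationships between variables
One of the main goals of statistics is to find relationships between different variables and then judge whether one factor influences another.
Once a relationship has been established, the next step is to try to determine whether some underlying cause lies behind it.
covariance
it measures the co-movement of two random variables
covariance > 0
The two variables tend to be above or below their respective expected values at the same time.
covariance < 0
One variable tends to be above its expected value while the other tends to be below its own, or the reverse.
covariance = 0
There is no linear relationship between the two variables.
range of covariance: from negative infinity to positive infinity
limitation
Because covariance can take any value from negative to positive infinity, its magnitude is not comparable across data sets, which makes it hard to use for side-by-side comparisons.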
correlation coefficient
a standardized measure of the linear relationship between two variables:
$$\rho_{XY} = \frac{Cov(X, Y)}{\sigma_X \sigma_Y}$$
values range from -1 to +1, and it has no units
a correlation of 0 indicates the absence of any linear relationship between the variables
the larger the absolute value of the correlation coefficient, the stronger the linear relationship
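The difference between covariance and correlation shows up clearly when one variable is rescaled. The sketch below computes both by hand on hypothetical paired data, using sample formulas with n - 1 in the denominator, and then multiplies `y` by 100 as if its units had changed.

```python
from math import sqrt

# Hypothetical paired observations.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

def sample_cov(a, b):
    """Sample covariance with n - 1 in the denominator."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / (len(a) - 1)

def sample_corr(a, b):
    """Covariance divided by the product of the two standard deviations."""
    return sample_cov(a, b) / sqrt(sample_cov(a, a) * sample_cov(b, b))

cov_xy = sample_cov(x, y)
corr_xy = sample_corr(x, y)

# Rescaling y (for example a change of units) changes covariance
# but leaves correlation unchanged.
y_scaled = [100 * v for v in y]
cov_scaled = sample_cov(x, y_scaled)
corr_scaled = sample_corr(x, y_scaled)

print(round(cov_xy, 4), round(corr_xy, 4))
print(round(cov_scaled, 2), round(corr_scaled, 4))
```

The covariance grows by a factor of 100 after rescaling while the correlation stays the same, which is why correlation is the one used for comparisons.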
Bayes' formula
$$P(A|B) = \frac{P(B|A)P(A)}{P(B)}$$
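A short numeric sketch of Bayes' formula, which updates a prior probability P(A) into P(A | B) after B has been observed. The events and all three input probabilities are hypothetical, and the denominator P(B) is obtained with the total probability rule.

```python
# Hypothetical events: A = "stock outperforms", B = "good earnings news".
p_a = 0.4               # prior P(A)
p_b_given_a = 0.7       # P(B | A)
p_b_given_not_a = 0.2   # P(B | not A)

# Total probability rule gives the denominator P(B).
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' formula: P(A | B) = P(B | A) * P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b

print(round(p_b, 2), round(p_a_given_b, 4))
```

With these inputs P(B) is 0.4 and the updated probability P(A | B) is 0.7, up from the prior of 0.4.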
Unless otherwise stated, all articles on this blog are licensed under CC BY-SA 4.0. Please credit the source when reposting.