Quantitative Methods

Basic Concepts

Random Variable

The object of study: a quantity whose future outcomes are uncertain.

Outcomes

The possible values of a random variable.

Outcome Space

The sample space $\Omega$: the set that contains all possible outcomes.

Event

A specified set of outcomes.

Probability

A measure that quantifies the likelihood that an event will occur.

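Relationships Between Events

Independent

The occurrence of A does not affect the probability of B, so $P(AB) = P(A)P(B)$.

Dependent

The occurrence of A changes the probability of B.

Mutually exclusive

At most one of the events can occur at a time, so $P(AB) = 0$.

Exhaustive

Taken together, the events cover every possible outcome.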

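Relationship Between Conditional and Joint Probability

Graphical representation

Joint probability:

The shaded area where A and B overlap (more precisely, the overlap area divided by the total area).

Conditional probability

It cannot be read directly as an area; it is a ratio of areas (for $P(A|B)$, the overlap area divided by the area of B).

Formulas

Conditional probability

P(A|B) = \frac{P(AB)}{P(B)}

Joint probability

P(AB) = P(B) \cdot P(A|B)

Total probability formula

P(A) = P(A|S_1) \cdot P(S_1) + P(A|S_2) \cdot P(S_2) + \dots + P(A|S_n) \cdot P(S_n)

The probability of an event equals the sum of its joint probabilities with each scenario.
$S_1, S_2, \dots, S_n$ are mutually exclusive and exhaustive.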
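
The total probability formula can be sketched numerically as follows. The three scenarios and all probabilities are made-up illustrative values, not figures from these notes.

```python
# Hypothetical scenarios S_1..S_3, assumed mutually exclusive and exhaustive.
p_s = {"expansion": 0.5, "stable": 0.3, "recession": 0.2}          # P(S_i)
p_a_given_s = {"expansion": 0.8, "stable": 0.5, "recession": 0.1}  # P(A | S_i)

# Exhaustive scenarios must have probabilities that sum to 1.
assert abs(sum(p_s.values()) - 1.0) < 1e-12

# Joint probability of A with each scenario: P(A S_i) = P(S_i) * P(A | S_i)
p_joint = {s: p_s[s] * p_a_given_s[s] for s in p_s}

# Total probability: P(A) is the sum of the joint probabilities.
p_a = sum(p_joint.values())

print(p_joint)
print(round(p_a, 4))
```

Each entry of `p_joint` is one term of the sum, so the scenario that contributes most to $P(A)$ is easy to read off.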

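Descriptive Statistics

Two key perspectives

Central Tendency
Dispersion

Comparing measures of central tendency

Mean (arithmetic mean)

Easy to compute
Uses every observation and the number of observations
Strongly affected by extreme values

Mode

There may be one mode or several
There may be no mode at all
Least affected by extreme values

Median

Odd number of values: the value at position (n+1)/2 of the sorted data
Even number of values: the average of the values at positions n/2 and (n+2)/2 of the sorted data
Largely unaffected by extreme values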
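
A short sketch of how the three measures react to an extreme value, using Python's standard `statistics` module. The data set is an invented example.

```python
import statistics

data = [2, 3, 3, 5, 7, 100]   # illustrative values; 100 is an extreme value
trimmed = data[:-1]           # the same data without the extreme value

mean = statistics.mean(data)            # uses every observation
median = statistics.median(data)        # even n: average of the two middle values
modes = statistics.multimode(data)      # list of all most frequent values

mean_trimmed = statistics.mean(trimmed)
median_trimmed = statistics.median(trimmed)  # odd n: the middle value

print(mean, median, modes)
print(mean_trimmed, median_trimmed)
```

Dropping the extreme value moves the mean far more than the median, while the mode is unchanged.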

Expected Value

The expected value of a random variable $X$ with possible values $x_1, x_2, x_3, \dots, x_n$ is defined as:

E(X) = x_1 P(X=x_1) + x_2 P(X=x_2) + \dots +x_n P(X=x_n)
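
As a minimal sketch, the definition is a probability-weighted sum. The distribution below is hypothetical.

```python
# Hypothetical discrete distribution: possible value -> probability
distribution = {-10.0: 0.2, 0.0: 0.5, 20.0: 0.3}

# The probabilities of all possible values must sum to 1.
assert abs(sum(distribution.values()) - 1.0) < 1e-12

# E(X) = sum of x_i * P(X = x_i)
expected_value = sum(x * p for x, p in distribution.items())

print(expected_value)
```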

Range

Range = Maximum Value - Minimum Value

Mean Absolute Deviation

MAD = \frac{\sum^{n}_{i=1}|X_i - \bar{X}|}{n}

where $\bar{X}$ is the sample mean and $n$ is the number of observations

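Variance

The average of the squared deviations around the mean.

Population variance:

\sigma^2 = \frac{\sum^{N}_{i=1}(X_i -\mu)^2}{N}

Sample variance:

s^2 = \frac{\sum^{n}_{i=1}(X_i -\bar{X})^2}{n-1}

Note the difference between the two: the population variance divides by $N$ and measures deviations from the population mean $\mu$, while the sample variance divides by $n-1$ and measures deviations from the sample mean $\bar{X}$.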
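
The dispersion measures above can be computed by hand and cross-checked against the standard library. This is a sketch on invented data; the same five numbers are treated first as a whole population and then as a sample, purely to show the two divisors.

```python
import statistics

data = [4, 8, 6, 2, 10]   # illustrative observations
n = len(data)
mean = sum(data) / n

value_range = max(data) - min(data)
mad = sum(abs(x - mean) for x in data) / n           # mean absolute deviation

squared_deviations = sum((x - mean) ** 2 for x in data)
population_variance = squared_deviations / n         # data as the whole population
sample_variance = squared_deviations / (n - 1)       # data as a sample

# Cross-check against the standard library implementations.
assert abs(population_variance - statistics.pvariance(data)) < 1e-12
assert abs(sample_variance - statistics.variance(data)) < 1e-12

print(value_range, mad, population_variance, sample_variance)
```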

graph TB;
    population-->parameter;
    population --sampling-->sample;
    sample-->estimator;
    estimator --estimation-->parameter;

Sampling and Estimation

Why sampling?

Examining every member of the population would not be economically efficient.
In many cases it is simply not possible to examine every member of the population.

Sampling methods

Simple random sampling

A subset of a larger population, created in such a way that each element of the population has an equal probability of being selected.

Stratified random sampling

The population is divided into subpopulations (strata) based on one or more classification criteria. Samples are then drawn from each stratum and pooled.
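
The two sampling methods can be sketched as below. The population, the stratum labels, and the proportional allocation rule are assumptions made for illustration; the notes do not specify how many items to draw from each stratum.

```python
import random

rng = random.Random(42)   # fixed seed so the sketch is reproducible

# Hypothetical population of (id, stratum) pairs: 60 large caps, 40 small caps.
population = [(i, "large_cap") for i in range(60)] + \
             [(i, "small_cap") for i in range(60, 100)]

# Simple random sampling: every element has the same chance of selection.
simple_sample = rng.sample(population, 10)

def stratified_sample(items, sample_size, rng):
    """Sample each stratum in proportion to its size, then pool the results."""
    strata = {}
    for item in items:
        strata.setdefault(item[1], []).append(item)
    pooled = []
    for members in strata.values():
        # Proportional allocation (assumed); rounding may need adjusting
        # when the proportions do not divide evenly.
        k = round(sample_size * len(members) / len(items))
        pooled.extend(rng.sample(members, k))
    return pooled

stratified = stratified_sample(population, 10, rng)
print(sum(1 for _, s in stratified if s == "large_cap"), "large caps of", len(stratified))
```

The stratified sample always contains the strata in their population proportions, whereas the simple random sample only does so on average.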

survivorship bias

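What to compute from a sample

Estimator

A formula for computing a sample statistic that is used to estimate a population parameter.
An estimator is a random variable, so it has a sampling distribution.

Estimate

The particular value calculated from sample observations using an estimator.

Unbiasedness

Let $A' = g(X_1, X_2, \dots, X_n)$ be an estimator of the unknown parameter $A$. If $E(A') = A$, then $A'$ is an unbiased estimator of $A$.

Efficiency

Of two unbiased estimators of the same population parameter, the one with the smaller standard deviation is the more efficient.

Consistency

As the sample size increases, the value of the estimator gets closer and closer to the population parameter being estimated.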
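
Unbiasedness can be illustrated with a small simulation that compares two variance estimators, one dividing by $n$ and one dividing by $n-1$. The normal population, its parameters, the sample size, and the number of trials are all arbitrary choices for this sketch.

```python
import random

rng = random.Random(0)      # fixed seed for reproducibility
true_variance = 4.0         # population is N(0, 2^2), an illustrative choice
n, trials = 5, 20000

biased, unbiased = [], []
for _ in range(trials):
    sample = [rng.gauss(0.0, 2.0) for _ in range(n)]
    m = sum(sample) / n
    ss = sum((x - m) ** 2 for x in sample)
    biased.append(ss / n)          # divides by n
    unbiased.append(ss / (n - 1))  # divides by n - 1

# Averaging over many samples approximates the expected value of each estimator.
avg_biased = sum(biased) / trials
avg_unbiased = sum(unbiased) / trials

print(round(avg_biased, 2), round(avg_unbiased, 2), true_variance)
```

The average of the $n-1$ estimator should land close to the true variance, while the average of the $n$ estimator should sit near $(n-1)/n$ times it.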

Estimation

Point estimate

The calculated value of the sample statistic in a given sample, used as an estimate of the population parameter.

Interval estimate

A range of values that brackets the unknown population parameter with a specified level of probability.

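Normal Distribution

The normal distribution is described by a probability density function. The horizontal axis shows the possible values of the random variable and the vertical axis shows the probability density; the area between the curve and the horizontal axis over an interval is the probability that the random variable falls in that interval.
The normal curve is a bell-shaped curve symmetric about $\mu$: low at both ends, high in the middle. The random variable is more likely to fall near $\mu$ and less likely to fall in the tails far from $\mu$.
Notation:

X \sim N(\mu ,\sigma^2)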


Approximately 68% of all observations fall in the interval $\mu \pm \sigma$
Approximately 95% of all observations fall in the interval $\mu \pm 1.96\sigma$
Approximately 99% of all observations fall in the interval $\mu \pm 2.58\sigma$
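
These coverage figures can be checked with `statistics.NormalDist` from the Python standard library. The values of `mu` and `sigma` below are arbitrary; the coverage does not depend on them.

```python
from statistics import NormalDist

mu, sigma = 5.0, 2.0   # illustrative parameters
dist = NormalDist(mu=mu, sigma=sigma)

def coverage(k):
    """Probability that X falls within k standard deviations of the mean."""
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1.0, 1.96, 2.58):
    print(f"mu +/- {k} sigma: {coverage(k):.4f}")
```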

Three distributions

	Mean	Standard deviation	Shape
Population	$\mu$	$\sigma$	Any
Sample	$\bar{X}$	$s$	Any
$\bar{X}$ (estimator)	$\mu$	$\frac{\sigma}{\sqrt{n}}$	Approximately normal (n > 30)

Sampling distribution

The distribution of all the distinct possible values that a statistic can take when computed from samples of the same size randomly drawn from the same population.

Sampling error

The difference between the value of an estimator and the population parameter it estimates.

Standard error

The standard deviation of the sample mean:

\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}
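
The standard error formula can be checked by simulation. This sketch assumes a Uniform(0, 1) population, chosen only because its standard deviation $\sqrt{1/12}$ is known exactly; the sample size and trial count are arbitrary.

```python
import math
import random
import statistics

rng = random.Random(1)        # fixed seed for reproducibility
n, trials = 36, 5000
sigma = math.sqrt(1 / 12)     # standard deviation of Uniform(0, 1)

# Draw many samples of size n and record each sample mean.
sample_means = [
    sum(rng.random() for _ in range(n)) / n
    for _ in range(trials)
]

theoretical_se = sigma / math.sqrt(n)
empirical_se = statistics.pstdev(sample_means)   # spread of the sample means

print(round(theoretical_se, 4), round(empirical_se, 4))
```

The spread of the simulated sample means should be close to $\sigma/\sqrt{n}$ even though the population itself is not normal.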

Interval estimate: constructing a confidence interval

A $100(1 - \alpha)\%$ confidence interval for a parameter has the following structure:

point\ estimate \pm reliability\ factor \times standard\ error

$\alpha$ = significance level, the probability that the interval does not cover the population parameter
$1 - \alpha$ = confidence level, the probability that the interval covers the population parameter
point estimate = a value of a sample statistic
reliability factor = a number based on the degree of confidence ( $1-\alpha$ ) for the confidence interval
standard error = the standard deviation of the sample statistic
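
A minimal sketch of this structure for a population mean, assuming the population standard deviation is known and the sample mean is approximately normal, so that the reliability factor is a z-value. The function name and input numbers are illustrative.

```python
import math
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, alpha=0.05):
    """Return (low, high) for a 100(1 - alpha)% confidence interval."""
    reliability_factor = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    standard_error = sigma / math.sqrt(n)
    margin = reliability_factor * standard_error
    return sample_mean - margin, sample_mean + margin

low, high = z_confidence_interval(sample_mean=50.0, sigma=10.0, n=100)
print(round(low, 2), round(high, 2))
```

A smaller `alpha` gives a larger reliability factor and therefore a wider interval.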

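Hypothesis Testing

Premises

Proving a claim is harder than disproving it
Small-probability events rarely occur

Key principles of hypothesis testing

Because proving is harder than disproving, the statement is split into two opposing hypotheses:
The hypothesis being tested is the null hypothesis
The conclusion accepted once the null hypothesis is rejected is the alternative hypothesis

Rejecting the null hypothesis relies on the second premise: small-probability events rarely occur, so if the observed sample would be very unlikely under the null hypothesis, the null hypothesis is rejected.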
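
The rejection logic can be sketched with a two-sided z-test for a population mean. The notes do not name a specific test, so the z-test, the known `sigma`, and all input numbers are assumptions for illustration.

```python
import math
from statistics import NormalDist

def two_sided_z_test(sample_mean, mu_0, sigma, n, alpha=0.05):
    """Test H0: mu = mu_0 against Ha: mu != mu_0 with known sigma."""
    standard_error = sigma / math.sqrt(n)
    z = (sample_mean - mu_0) / standard_error
    # Probability, under H0, of a result at least this extreme.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value < alpha

z, p_value, reject = two_sided_z_test(sample_mean=103.0, mu_0=100.0, sigma=10.0, n=50)
print(round(z, 3), round(p_value, 3), reject)
```

Here the p-value falls below `alpha`, so the sample counts as a small-probability event under the null hypothesis and the null hypothesis is rejected.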

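Relationships Between Variables

Covariance

Cov(R_i,R_j) = E[(R_i - E(R_i))(R_j - E(R_j))]

It measures the co-movement of two random variables.

covariance > 0
The two variables tend to be above, or below, their respective expected values at the same time.

covariance < 0
When one variable is above its expected value, the other tends to be below its expected value, and vice versa.

covariance = 0
There is no linear relationship between the two variables.

Range of covariance: from negative infinity to positive infinity.

Limitations

Because covariance is unbounded and depends on the units of the variables, its magnitude is not comparable across different pairs of variables, which makes it hard to use for cross-sectional comparisons.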

Correlation coefficient

A standardized measure of the linear relationship between two variables:

\rho_{X,Y} = \frac{Cov(X,Y)}{\sigma_X\sigma_Y}

Values range from -1 to +1; it has no units.
A correlation of 0 indicates the absence of any linear relationship between the variables.
The larger the absolute value of the correlation coefficient, the stronger the linear relationship.
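
A small sketch of why correlation is easier to compare than covariance: rescaling one variable changes the covariance but leaves the correlation untouched. The data and the helper names are invented for illustration, and population formulas (dividing by the number of observations) are used.

```python
import math

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

def covariance(a, b):
    """Population covariance: mean of the products of deviations."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / len(a)

def correlation(a, b):
    """Covariance divided by the product of the standard deviations."""
    return covariance(a, b) / math.sqrt(covariance(a, a) * covariance(b, b))

x_rescaled = [100 * xi for xi in x]   # same variable in different units

print(covariance(x, y), covariance(x_rescaled, y))
print(round(correlation(x, y), 4), round(correlation(x_rescaled, y), 4))
```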

Bayes' Formula

P(AB) = P(A) \cdot P(B|A) = P(B) \cdot P(A|B)

P(A|B) = \frac{P(B|A)}{P(B)} \cdot P(A)
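
A worked sketch of the formula with made-up numbers: $A$ is a rare condition and $B$ is a signal that usually, but not always, accompanies it. $P(B)$ is obtained from the total probability formula.

```python
# Hypothetical inputs
p_a = 0.01               # P(A): prior probability of A
p_b_given_a = 0.95       # P(B | A)
p_b_given_not_a = 0.05   # P(B | not A)

# Total probability: P(B) = P(B|A) P(A) + P(B|not A) P(not A)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' formula: P(A | B) = P(B | A) / P(B) * P(A)
p_a_given_b = p_b_given_a / p_b * p_a

print(round(p_b, 4), round(p_a_given_b, 4))
```

Even with a strong signal, the updated probability stays modest because the prior $P(A)$ is small.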

CFA

Certification exams

Unless otherwise stated, all articles on this blog are licensed under CC BY-SA 4.0. Please credit the source when reposting.
