Hypothesis Testing: Code
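Common hypothesis tests include the one-sample t-test, the two-sample t-test, the paired t-test, and analysis of variance (ANOVA), among others. See the code below for details.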
from scipy import stats
from scipy.stats import chi2, chi2_contingency
import pandas as pd

# 1 One-Sample T-Test
# H0: the blood pressure of female resident doctors does not differ
# significantly from that of the general population (120).
# Blood pressure data:
female_doctor_bps = [128, 127, 118, 115, 144, 142, 133, 140, 132, 131,
                     111, 132, 149, 122, 139, 119, 136, 129, 126, 128]
d = pd.Series(female_doctor_bps, name="amt")
d_ref = 120
d_std = d.std()                 # sample standard deviation (ddof=1)
d_n = d.shape[0]
## d_free = d.shape[0] - 1
d_se = d_std / (d_n ** 0.5)     # standard error of the mean
d_tvalue = (d.mean() - d_ref) / d_se
print("One-sample T-test:\tT value is:" + str(d_tvalue))
print(stats.ttest_1samp(female_doctor_bps, 120))
## The p-value here is 0.0002, far below the usual 0.05 or 0.01 thresholds,
## so we reject H0: the resting systolic blood pressure of female resident
## doctors differs from that of the general population.

# 2 Two-sample T-test
female_doctor_bps = [128, 127, 118, 115, 144, 142, 133, 140, 132, 131,
                     111, 132, 149, 122, 139, 119, 136, 129, 126, 128]
male_consultant_bps = [118, 115, 112, 120, 124, 130, 123, 110, 120, 121,
                       123, 125, 129, 130, 112, 117, 119, 120, 123, 128]
d_femal = pd.Series(female_doctor_bps)
d_male = pd.Series(male_consultant_bps)
d_femal_mean = d_femal.mean()
d_male_mean = d_male.mean()
d_femal_var = d_femal.var()
d_male_var = d_male.var()
d_femal_n = d_femal.shape[0]
d_male_n = d_male.shape[0]
# Pooled variance of the two samples
d_sp = ((d_femal_n - 1) * d_femal_var + (d_male_n - 1) * d_male_var) / (d_femal_n + d_male_n - 2)
d_t = (d_femal_mean - d_male_mean) / ((d_sp * (1 / d_femal_n + 1 / d_male_n)) ** 0.5)
print("Two-sample T-test:\tT value is:" + str(d_t))
print(stats.ttest_ind(female_doctor_bps, male_consultant_bps))
# The p-value is 0.0012, below the usual 0.05 or 0.01 thresholds, so we reject
# the null hypothesis: the systolic blood pressure of female doctors and male
# consultants differs significantly.

# 3 Paired T-Test
control = [8.0, 7.1, 6.5, 6.7, 7.2, 5.4, 4.7, 8.1, 6.3, 4.8]
treatment = [9.9, 7.9, 7.6, 6.8, 7.1, 9.9, 10.5, 9.7, 10.9, 8.2]
d_control = pd.Series(control)
d_treatment = pd.Series(treatment)
d_diff = d_treatment - d_control
d_mean = d_diff.mean()
d_diff_std = d_diff.std()       # standard deviation of the pairwise differences
d_treatment_n = d_treatment.shape[0]
d_t = d_mean / (d_diff_std / (d_treatment_n ** 0.5))
print("Paired T-Test:\tT value is:" + "\t" + str(d_t))
# ttest_rel(control, treatment) tests control - treatment, so its statistic
# has the opposite sign of d_t (which uses treatment - control).
print(stats.ttest_rel(control, treatment))
# The p-value is 0.0055, below the usual 0.05 or 0.01 thresholds, so we reject
# H0: sleep duration differs when the sleeping pill is taken.

# 4 Analysis of Variance (ANOVA)
# ctrl = [4.17, 5.58, 5.18, 6.11, 4.5, 4.61, 5.17, 4.53, 5.33, 5.14]
# trt1 = [4.81, 4.17, 4.41, 3.59, 5.87, 3.83, 6.03, 4.89, 4.32, 4.69]
# trt2 = [6.31, 5.12, 5.54, 5.5, 5.37, 5.29, 4.92, 6.15, 5.8, 5.26]
## The group sizes are equal here, but in general they do not have to be.
ctrl = [4.17, 5.58, 5.18]
trt1 = [4.81, 4.17, 4.41]
trt2 = [6.31, 5.12, 5.54]
d_group = 3
d_ctr1 = pd.Series(ctrl)
d_trt1 = pd.Series(trt1)
d_trt2 = pd.Series(trt2)
## Grand mean: sum of all observations divided by the total number of
## observations (9 in total). Dividing by n and then by the number of groups
## is valid only because the group sizes are equal here.
d_total_mean = (d_ctr1.sum() + d_trt1.sum() + d_trt2.sum()) / d_ctr1.shape[0] / d_group
## print(d_total_mean)
d_ctr1_mean = d_ctr1.mean()
d_trt1_mean = d_trt1.mean()
d_trt2_mean = d_trt2.mean()
d_ctr1_n = d_ctr1.shape[0]
d_trt1_n = d_trt1.shape[0]
d_trt2_n = d_trt2.shape[0]
## Between-group sum of squares (SSA)
d_ssa = ((d_ctr1_mean - d_total_mean) ** 2 * d_ctr1_n
         + (d_trt1_mean - d_total_mean) ** 2 * d_trt1_n
         + (d_trt2_mean - d_total_mean) ** 2 * d_trt2_n)
## print("Between-group sum of squares (SSA):\t" + str(d_ssa))
## Within-group sum of squares (SSE)
d_sse = ((4.17 - d_ctr1_mean) ** 2 + (5.58 - d_ctr1_mean) ** 2 + (5.18 - d_ctr1_mean) ** 2
         + (4.81 - d_trt1_mean) ** 2 + (4.17 - d_trt1_mean) ** 2 + (4.41 - d_trt1_mean) ** 2
         + (6.31 - d_trt2_mean) ** 2 + (5.12 - d_trt2_mean) ** 2 + (5.54 - d_trt2_mean) ** 2)
## print("Within-group sum of squares (SSE):\t" + str(d_sse))
# Total sum of squares (SST)
d_sst = ((4.17 - d_total_mean) ** 2 + (5.58 - d_total_mean) ** 2 + (5.18 - d_total_mean) ** 2
         + (4.81 - d_total_mean) ** 2 + (4.17 - d_total_mean) ** 2 + (4.41 - d_total_mean) ** 2
         + (6.31 - d_total_mean) ** 2 + (5.12 - d_total_mean) ** 2 + (5.54 - d_total_mean) ** 2)
## print("Total sum of squares (SST):\t" + str(d_sst))
# Between-group mean square (MSA) = SSA / degrees of freedom
d_msa = d_ssa / (d_group - 1)
# Within-group mean square (MSE) = SSE / degrees of freedom
d_mse = d_sse / (d_ctr1_n + d_trt1_n + d_trt2_n - d_group)
# MSA is also called the between-group variance, MSE the within-group variance
d_f = d_msa / d_mse
print("Analysis of Variance (ANOVA) F value:\t" + str(d_f))
print(stats.f_oneway(ctrl, trt1, trt2))

# 5 chi-squared test
table = [[10, 20, 30], [6, 9, 17]]
stat, p, dof, expected = chi2_contingency(table)
print('dof=%d' % dof)  # degrees of freedom: (rows - 1) * (cols - 1)
## print(expected)     # prints the expected frequency of every cell
# Expected value for row 1, column 1, as an example
print("Expected value (row 1, col 1):\t" + str('%.8f' % ((10 + 6) / (10 + 6 + 20 + 9 + 30 + 17) * (10 + 20 + 30))))
# Expected frequencies:
# [10.43478261 18.91304348 30.65217391]
# [ 5.56521739 10.08695652 16.34782609]
print('Chi-squared statistic:\t' + str('%.10f' % (
    (10 - 10.43478261) ** 2 / 10.43478261 + (20 - 18.91304348) ** 2 / 18.91304348
    + (30 - 30.65217391) ** 2 / 30.65217391 + (6 - 5.56521739) ** 2 / 5.56521739
    + (9 - 10.08695652) ** 2 / 10.08695652 + (17 - 16.34782609) ** 2 / 16.34782609)))
prob = 0.95
critical = chi2.ppf(prob, dof)
print('probability=%.3f, critical=%.3f, stat=%.8f' % (prob, critical, stat))
# Here the p-value is greater than 0.05, so we fail to reject H0: the row and
# column variables show no significant association.
if abs(stat) >= critical:
    print('Dependent (reject H0)')
else:
    print('Independent (fail to reject H0)')
# interpret p-value
alpha = 1.0 - prob
print('significance=%.3f, p=%.3f' % (alpha, p))
if p <= alpha:
    print('Dependent (reject H0)')
else:
    print('Independent (fail to reject H0)')

Output:
"F:\Python37\python.exe" E:/hypothesistest.py
One-sample T-test:    T value is:4.512403659336718
Ttest_1sampResult(statistic=4.512403659336718, pvalue=0.00023838063630967753)
Two-sample T-test:    T value is:3.5143256412718564
Ttest_indResult(statistic=3.5143256412718564, pvalue=0.0011571376404026158)
Paired T-Test:    T value is:    3.624485995178213
Ttest_relResult(statistic=-3.6244859951782136, pvalue=0.0055329408161001415)
Analysis of Variance (ANOVA) F value:    3.23528624933119
F_onewayResult(statistic=3.2352862493311934, pvalue=0.11137675915188745)
dof=2
Expected value (row 1, col 1):    10.43478261
Chi-squared statistic:    0.2715746509
probability=0.950, critical=5.991, stat=0.27157465
Independent (fail to reject H0)
significance=0.050, p=0.873
Independent (fail to reject H0)
Process finished with exit code 0
Related figures:
1 One-Sample T-Test
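Note: SE is the standard error of the sample mean, i.e. the sample standard deviation divided by the square root of the sample size.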
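As a minimal sketch of the note above, assuming the same blood pressure data and reference value of 120 used in the code section, the standard error, the t value, and the two-sided p-value can be worked out step by step. The variable names are illustrative; `stats.t.sf` is SciPy's survival function for the Student t distribution.

```python
import math
from scipy import stats

bps = [128, 127, 118, 115, 144, 142, 133, 140, 132, 131,
       111, 132, 149, 122, 139, 119, 136, 129, 126, 128]
ref = 120  # reference (population) value

n = len(bps)
mean = sum(bps) / n
# Sample standard deviation with n - 1 in the denominator
s = math.sqrt(sum((x - mean) ** 2 for x in bps) / (n - 1))
se = s / math.sqrt(n)                 # standard error of the mean
t_value = (mean - ref) / se
# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_value = 2 * stats.t.sf(abs(t_value), df=n - 1)
print(t_value, p_value)
```

The two printed numbers should agree with what `stats.ttest_1samp(bps, 120)` reports.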
2 Two-sample T-test
Note: used to test whether the means of two independent samples differ significantly.
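The pooled-variance form of this test assumes both groups share the same variance. If that assumption is doubtful, one option is Welch's t-test, which SciPy exposes through the `equal_var=False` argument of `ttest_ind`. The sketch below, using the same two blood pressure samples as the code section, compares the two variants; it is an illustration, not part of the original script.

```python
from scipy import stats

female = [128, 127, 118, 115, 144, 142, 133, 140, 132, 131,
          111, 132, 149, 122, 139, 119, 136, 129, 126, 128]
male = [118, 115, 112, 120, 124, 130, 123, 110, 120, 121,
        123, 125, 129, 130, 112, 117, 119, 120, 123, 128]

# Pooled-variance (Student) t-test: assumes equal variances
pooled = stats.ttest_ind(female, male)
# Welch's t-test: does not assume equal variances
welch = stats.ttest_ind(female, male, equal_var=False)

print(pooled.statistic, pooled.pvalue)
print(welch.statistic, welch.pvalue)
```

Because both groups have the same size here, the two t statistics coincide; only the degrees of freedom, and therefore the p-value, differ.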
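3 Paired T-Test
Notes:
1 The numerator is the difference between the means of the two paired samples, and s is the standard deviation of the pairwise differences.
2 Suitable for comparing two related samples, for example the change before and after a test.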
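Since the statistic depends only on the pairwise differences, a paired t-test is equivalent to a one-sample t-test of those differences against zero. The sketch below checks this on the control and treatment data from the code section; the variable names are illustrative.

```python
from scipy import stats

control = [8.0, 7.1, 6.5, 6.7, 7.2, 5.4, 4.7, 8.1, 6.3, 4.8]
treatment = [9.9, 7.9, 7.6, 6.8, 7.1, 9.9, 10.5, 9.7, 10.9, 8.2]

# Pairwise differences (treatment minus control)
diff = [t - c for t, c in zip(treatment, control)]

# One-sample t-test of the differences against 0 ...
one_sample = stats.ttest_1samp(diff, 0.0)
# ... gives the same result as the paired t-test in the same order
paired = stats.ttest_rel(treatment, control)

print(one_sample.statistic, one_sample.pvalue)
print(paired.statistic, paired.pvalue)
```

Swapping the argument order of `ttest_rel` flips the sign of the statistic but leaves the two-sided p-value unchanged.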
4 Analysis of Variance (ANOVA)
See the code section for the full calculation.
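The same SSA / SSE / F calculation can be written so that it also works when the groups have different sizes. The helper below is a hypothetical sketch (the name `one_way_anova` is not from the original script) that uses NumPy for the sums and `stats.f.sf` for the p-value.

```python
import numpy as np
from scipy import stats

def one_way_anova(*groups):
    """Return (F, p) for a one-way ANOVA; group sizes may differ."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)                          # number of groups
    n_total = sum(len(g) for g in groups)    # total number of observations
    grand_mean = np.concatenate(groups).mean()
    # Between-group sum of squares (SSA)
    ssa = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares (SSE)
    sse = sum(((g - g.mean()) ** 2).sum() for g in groups)
    msa = ssa / (k - 1)          # between-group mean square
    mse = sse / (n_total - k)    # within-group mean square
    f_value = msa / mse
    p_value = stats.f.sf(f_value, k - 1, n_total - k)
    return f_value, p_value

ctrl = [4.17, 5.58, 5.18]
trt1 = [4.81, 4.17, 4.41]
trt2 = [6.31, 5.12, 5.54]
f_value, p_value = one_way_anova(ctrl, trt1, trt2)
print(f_value, p_value)
print(stats.f_oneway(ctrl, trt1, trt2))
```

On these three groups the helper should reproduce the F statistic and p-value reported by `stats.f_oneway`.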
5 chi-squared test
Expected value T for each observed count A: (column total / grand total) × row total
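Applied to every cell at once, this rule is the outer product of the row totals and column totals divided by the grand total. The sketch below, assuming the 2 × 3 table from the code section, builds the expected table that way and compares it with the one returned by `chi2_contingency`.

```python
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[10, 20, 30],
                  [6, 9, 17]])

row_totals = table.sum(axis=1)    # [60, 32]
col_totals = table.sum(axis=0)    # [16, 29, 47]
grand_total = table.sum()         # 92

# Expected count per cell: row total * column total / grand total
expected = np.outer(row_totals, col_totals) / grand_total
# Chi-squared statistic: sum over cells of (observed - expected)^2 / expected
chi2_stat = ((table - expected) ** 2 / expected).sum()

_, _, _, expected_scipy = chi2_contingency(table)
print(expected)
print(chi2_stat)
```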
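The chi-squared value 0.27157465 lies between 0.21 and 0.45 (the table values for 2 degrees of freedom at p = 0.90 and p = 0.80), so the p-value lies between 0.80 and 0.90. The computed p-value is 0.873.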
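Instead of reading the bounds from a printed table, the p-value and the two bracketing table values can be obtained directly from the chi-squared distribution. The sketch below takes the statistic and degrees of freedom from the run shown earlier and uses `chi2.sf` and `chi2.isf` from SciPy.

```python
import math
from scipy.stats import chi2

stat = 0.27157465   # chi-squared statistic from the run above
dof = 2             # (rows - 1) * (cols - 1)

# Upper-tail probability of the statistic: this is the p-value
p_value = chi2.sf(stat, dof)

# Table values that bracket the statistic
lower = chi2.isf(0.90, dof)   # value with upper-tail probability 0.90
upper = chi2.isf(0.80, dof)   # value with upper-tail probability 0.80

print(round(p_value, 3), round(lower, 2), round(upper, 2))
```

For 2 degrees of freedom the upper-tail probability has the closed form exp(-stat / 2), which gives a quick sanity check on the result.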
6 References
Comparative Statistics in Python using SciPy – Ben Alex Keen
方差分析 (Analysis of Variance), 張俊紅, CSDN blog
卡方檢驗(詳解) (The Chi-Squared Test Explained), ludan_xia's blog, CSDN
A Gentle Introduction to the Chi-Squared Test for Machine Learning