Statistics for Business and Economics (13th Edition, Python) Notes: Chapters 01-02

Published: 2023/12/20 python 29 豆豆

Contents

Chapter 1: Data and Statistics
- 1.1 Applications of Statistics in Business and Economics
- 1.2 Data
- 1.3 Data Sources
- 1.4 Descriptive Statistics
- 1.5 Statistical Inference
- 1.6 Analytics
- 1.7 Big Data and Data Mining
- 1.8 Computers and Statistical Analysis
- 1.9 Ethical Guidelines for Statistical Practice
Chapter 2: Descriptive Statistics I: Tabular and Graphical Displays
- 2.1 Summarizing Data for a Categorical Variable
  - Bar chart and examples
  - Pie chart and examples
- 2.2 Summarizing Data for a Quantitative Variable
  - Single variable: dot plot
  - Single variable: histogram
  - Single variable: cumulative distribution (displot)
  - Single variable: stem-and-leaf display
- 2.3 Summarizing Data for Two Variables Using Tables
  - Crosstabulation
- 2.4 Summarizing Data for Two Variables Using Graphical Displays
  - Scatter diagram and trendline
  - Side-by-side bar chart and stacked bar chart
- 2.5 Data Visualization: Best Practices in Creating Effective Graphical Displays
  - Creating effective graphical displays
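The first time I read this book I already had a foundation from university courses, so I concentrated on the technical content and overlooked the seemingly simple fundamentals. This is probably a common failing among beginners: always chasing the practical material and neglecting the basics. That approach does help sustain interest and carries a newcomer through the entry stage, after which the fundamentals can be picked up through practice. Still, it is better to recognize the importance of the fundamentals on the first pass and to master as much of them as possible. The best way to do that is to work the exercises.
I originally picked up the book to learn data analysis. Yet when practitioners said that the most important knowledge in data analysis is descriptive statistics, I realized I had filed it away as shallow material and swallowed it whole without digesting it.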

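Chapter 1: Data and Statistics

1.1 Applications of Statistics in Business and Economics

Accounting, finance, marketing, production, economics, information systems

1.2 Data

Data, data set, element, variable, observation, categorical data, categorical variable, quantitative data, quantitative variable, cross-sectional data, time series data
**1.2.2 Scales of Measurement**
The nominal, ordinal, interval, and ratio scales are nested in that order: each scale has all the properties of the one before it.
Addition and subtraction are not meaningful on an ordinal scale, multiplication and division are not meaningful on an interval scale, and only the interval and ratio scales carry a unit of measure.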
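
The point that ratios are not meaningful on an interval scale can be checked with a little arithmetic. The sketch below is not from the book. It assumes temperature in Celsius and Fahrenheit as the interval-scale example and length in metres and centimetres as the ratio-scale example; the helper name `c_to_f` is made up for illustration.

```python
# Interval scale: temperature. The zero point is arbitrary, so a ratio of two
# readings changes with the unit, while a ratio of two differences does not.
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

low_c, mid_c, high_c = 10.0, 20.0, 30.0
low_f, mid_f, high_f = c_to_f(low_c), c_to_f(mid_c), c_to_f(high_c)

ratio_c = mid_c / low_c   # 20 / 10
ratio_f = mid_f / low_f   # 68 / 50, a different "ratio" for the same two temperatures
diff_ratio_c = (high_c - mid_c) / (mid_c - low_c)
diff_ratio_f = (high_f - mid_f) / (mid_f - low_f)

# Ratio scale: length. Zero means "none", so ratios survive a change of unit.
len_a_m, len_b_m = 1.0, 2.0
len_a_cm, len_b_cm = len_a_m * 100, len_b_m * 100
ratio_m = len_b_m / len_a_m
ratio_cm = len_b_cm / len_a_cm

print(ratio_c, ratio_f)            # the two values differ
print(diff_ratio_c, diff_ratio_f)  # the two values agree
print(ratio_m, ratio_cm)           # the two values agree
```

Saying that 20 degrees is "twice as warm" as 10 degrees therefore depends on the unit, whereas saying that 2 m is twice as long as 1 m does not.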

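1.3 Data Sources

Sources include existing sources, observational studies, and experiments. Points to watch: time and cost issues, and data acquisition errors.

1.4 Descriptive Statistics

Statistical methods that summarize data in tabular, graphical, or numerical form

1.5 Statistical Inference

Population, sample, census, sample survey
A major contribution of statistics is the use of sample data to make estimates and test hypotheses about the characteristics of a population, which is known as statistical inference.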
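
As a rough illustration of estimating a population characteristic from a sample, here is a minimal sketch using only the standard library. The population is simulated (a normal distribution with mean 50 and standard deviation 10 is an assumption made purely for the example), so the true mean can be compared with the estimate, which is not possible in a real survey.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 simulated measurements.
population = [rng.gauss(50, 10) for _ in range(10_000)]

# A census would look at every element; a sample survey looks at only some.
sample = rng.sample(population, 100)

population_mean = statistics.mean(population)  # known only because the data is simulated
sample_mean = statistics.mean(sample)          # point estimate of the population mean
standard_error = statistics.stdev(sample) / len(sample) ** 0.5

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f} (standard error about {standard_error:.2f})")
```

The standard error gives a sense of how far the sample mean is likely to sit from the population mean; it shrinks as the sample grows.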

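1.6 Analytics

Analytics includes:
Descriptive analytics: analysis of past data, BI, or retrospective review
Predictive analytics: forecasting, or identifying how variables influence one another
Prescriptive analytics: the set of analytical techniques that yields a best course of action, that is, guidance for action under real-world constraints

1.7 Big Data and Data Mining

Big data: volume, velocity, and variety (the three V's)
Data mining: automatically extracting predictive information from very large databases

1.8 Computers and Statistical Analysis

1.9 Ethical Guidelines for Statistical Practice

Statistics is the art and science of collecting, analyzing, presenting, and interpreting data

Chapter 2: Descriptive Statistics I: Tabular and Graphical Displays

2.1 Summarizing Data for a Categorical Variable

Frequency distribution, relative frequency distribution, percent frequency distribution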
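
The three distributions differ only in scaling, which a few lines of Python can show. This is a minimal sketch with made-up data (ten purchases of three hypothetical brands), using `collections.Counter` from the standard library rather than anything from the book.

```python
from collections import Counter

# Made-up categorical data: the brand chosen in 10 purchases.
purchases = ["A", "B", "A", "C", "A", "B", "C", "A", "B", "A"]

frequency = Counter(purchases)                                       # count per category
n = len(purchases)
relative = {brand: count / n for brand, count in frequency.items()}  # share of the total
percent = {brand: 100 * share for brand, share in relative.items()}  # share times 100

print("brand  freq  rel.freq  percent")
for brand in sorted(frequency):
    print(f"{brand:5}  {frequency[brand]:4d}  {relative[brand]:8.2f}  {percent[brand]:7.1f}")
```

The relative frequencies always sum to 1 and the percent frequencies to 100, which is a quick sanity check on any such table.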

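Bar chart and examples

A bar chart depicts a frequency, relative frequency, or percent frequency distribution. For a categorical variable, the bars should be separated by gaps.
Basic usage of matplotlib.pyplot.bar (the documentation includes examples):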

from matplotlib import pyplot as plt

x, y = [5, 8, 10], [12, 16, 6]
x2, y2 = [6, 9, 11], [6, 15, 7]
plt.bar(x, y, align='center')
plt.bar(x2, y2, color='g', align='center')
plt.title('Bar graph')
plt.ylabel('Y axis')
plt.xlabel('X axis')
plt.show()

Bar chart on a polar axis:

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(19680801)
N = 20
theta = np.linspace(0.0, 2 * np.pi, N, endpoint=False)
radii = 10 * np.random.rand(N)
width = np.pi / 4 * np.random.rand(N)
colors = plt.cm.viridis(radii / 10.)

ax = plt.subplot(111, projection='polar')
ax.bar(theta, radii, width=width, bottom=0.0, color=colors, alpha=0.5)
plt.show()

seaborn.barplot (the documentation includes examples) is much simpler:

import seaborn as sns
ax = sns.barplot(x="day", y="total_bill", hue="sex", data=sns.load_dataset("tips"))

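Pie chart and examples

A pie chart depicts a relative frequency or percent frequency distribution. People judge differences in length more accurately than differences in angle, so it is best to label each slice with its proportion.
matplotlib.pyplot.pie (the documentation includes examples). Three examples I find useful, with code: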

# Example 1: basic pie chart with one slice pulled out
import matplotlib.pyplot as plt

labels = 'Frogs', 'Hogs', 'Dogs', 'Logs'
sizes = [15, 30, 45, 10]
explode = (0, 0.1, 0, 0)  # only "explode" the 2nd slice (i.e. 'Hogs')

fig1, ax1 = plt.subplots()
ax1.pie(sizes, explode=explode, labels=labels, autopct='%1.1f%%', shadow=True, startangle=90)
ax1.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.
plt.show()

# Example 2: pie chart with percentage and absolute value in each slice
import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 3), subplot_kw=dict(aspect="equal"))
recipe = ["375 g flour", "75 g sugar", "250 g butter", "300 g berries"]
data = [float(x.split()[0]) for x in recipe]
ingredients = [x.split()[-1] for x in recipe]

def func(pct, allvals):
    absolute = int(pct/100.*np.sum(allvals))
    return "{:.1f}%\n({:d} g)".format(pct, absolute)

wedges, texts, autotexts = ax.pie(data, autopct=lambda pct: func(pct, data), textprops=dict(color="w"))
ax.legend(wedges, ingredients, title="Ingredients", loc="center left", bbox_to_anchor=(1, 0, 0.5, 1))
plt.setp(autotexts, size=8, weight="bold")
ax.set_title("Matplotlib bakery: A pie")
plt.show()

# Example 3: donut chart with annotated wedges
fig, ax = plt.subplots(figsize=(6, 3), subplot_kw=dict(aspect="equal"))
recipe = ["225 g flour", "90 g sugar", "1 egg", "60 g butter", "100 ml milk", "1/2 package of yeast"]
data = [225, 90, 50, 60, 100, 5]
wedges, texts = ax.pie(data, wedgeprops=dict(width=0.5), startangle=-40)
bbox_props = dict(boxstyle="square,pad=0.3", fc="w", ec="k", lw=0.72)
kw = dict(arrowprops=dict(arrowstyle="-"), bbox=bbox_props, zorder=0, va="center")

for i, p in enumerate(wedges):
    ang = (p.theta2 - p.theta1)/2. + p.theta1
    y = np.sin(np.deg2rad(ang))
    x = np.cos(np.deg2rad(ang))
    horizontalalignment = {-1: "right", 1: "left"}[int(np.sign(x))]
    connectionstyle = "angle,angleA=0,angleB={}".format(ang)
    kw["arrowprops"].update({"connectionstyle": connectionstyle})
    ax.annotate(recipe[i], xy=(x, y), xytext=(1.35*np.sign(x), 1.4*y),
                horizontalalignment=horizontalalignment, **kw)

ax.set_title("Matplotlib bakery: A donut")
plt.show()

For plotting with pandas, a single function should be enough. Its parameters in full:

DataFrame.plot(x=None, y=None, kind='line', ax=None, subplots=False, sharex=None, sharey=False, layout=None,figsize=None, use_index=True, title=None, grid=None, legend=True, style=None, logx=False, logy=False, loglog=False, xticks=None, yticks=None, xlim=None, ylim=None, rot=None,xerr=None,secondary_y=False, sort_columns=False, **kwds)

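Examples: Matplotlib examples
Examples: Seaborn Example gallery

2.2 Summarizing Data for a Quantitative Variable

Number of classes, class width, class limits, class midpoint, relative frequency distribution, percent frequency distribution, cumulative frequency distribution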
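
The terms above fit together as a short calculation. The sketch below uses a small illustrative data set of 20 integer values; the choice of 5 classes and of 10 as the first lower class limit are assumptions made for the example, not rules.

```python
import math
from itertools import accumulate

# Illustrative data: 20 integer observations.
data = [12, 14, 19, 18, 15, 15, 18, 17, 20, 27,
        22, 23, 22, 21, 33, 28, 14, 18, 16, 13]

num_classes = 5                                         # number of classes (a choice)
approx_width = (max(data) - min(data)) / num_classes    # (33 - 12) / 5
class_width = math.ceil(approx_width)                   # round up to a convenient width
first_lower = 10                                        # chosen so the classes cover all data

# Class limits for integer data: 10-14, 15-19, ...
classes = [(first_lower + i * class_width,
            first_lower + (i + 1) * class_width - 1)
           for i in range(num_classes)]

frequency = [sum(lo <= x <= hi for x in data) for lo, hi in classes]
midpoints = [(lo + hi) / 2 for lo, hi in classes]       # class midpoints
cumulative = list(accumulate(frequency))                # cumulative frequency

print("class   mid   freq  cum.freq")
for (lo, hi), mid, f, c in zip(classes, midpoints, frequency, cumulative):
    print(f"{lo}-{hi}  {mid:5.1f}  {f:4d}  {c:8d}")
```

Dividing `frequency` or `cumulative` by `len(data)` gives the relative and cumulative relative frequency distributions.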

Single variable: dot plot

Approximated with matplotlib.pyplot.scatter and seaborn.swarmplot:

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
from matplotlib.ticker import MultipleLocator

fig, ax = plt.subplots(1, 2, figsize=(12, 2))
np.random.seed(1900)
x = np.random.randint(1, 99, size=20)
data = pd.DataFrame(x, columns=['x'])
# y = how many times this value has occurred so far, so repeated values stack up
data['y'] = data.groupby('x').cumcount() + 1

ax[0].scatter(data['x'], data['y'])
ax[0].tick_params(axis='both', which='major')
# tick settings: one tick per count on y, one tick every 10 units on x
ax[0].yaxis.set_major_locator(MultipleLocator(1))
ax[0].xaxis.set_major_locator(MultipleLocator(10))

sns.swarmplot(x="x", y="y", palette=["r", "c", "y"], data=data, ax=ax[1])
plt.show()

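Single variable: histogram

Same principle as the bar chart, except that the quantitative variable is grouped into classes and there are no gaps between the bars.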

from matplotlib import pyplot as plt
import numpy as np

np.random.seed(1900)
x = np.random.randint(1, 99, size=50)
plt.hist(x, bins=[0, 20, 40, 60, 80, 100])
plt.show()

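Single variable: cumulative distribution (displot)

With matplotlib, a cumulative distribution means either computing the cumulative counts yourself or passing cumulative=True to plt.hist. seaborn's distribution functions can draw four variants in one go. The code follows the seaborn gallery example "Distribution plot options", with the deprecated sns.distplot calls replaced by histplot, kdeplot, and rugplot:

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

sns.set(style="white", palette="muted", color_codes=True)
rs = np.random.RandomState(10)

f, axes = plt.subplots(2, 2, figsize=(7, 7), sharex=True)
sns.despine(left=True)
d = rs.normal(size=100)

# sns.distplot is deprecated since seaborn 0.11; histplot, kdeplot and rugplot replace it
sns.histplot(d, color="b", ax=axes[0, 0])                            # histogram only
sns.kdeplot(d, color="r", ax=axes[0, 1])                             # density curve ...
sns.rugplot(d, color="r", ax=axes[0, 1])                             # ... with a rug of observations
sns.kdeplot(d, fill=True, color="g", ax=axes[1, 0])                  # shaded density curve
sns.histplot(d, kde=True, stat="density", color="m", ax=axes[1, 1])  # histogram with density curve

plt.setp(axes, yticks=[])
plt.tight_layout()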

Single variable: stem-and-leaf display

I have not found a library for stem-and-leaf displays yet, so this one is implemented by hand.

0 | 6 9 8 4
1 | 6 3 7 3 6 1 2
2 | 5 5 9 2
3 | 2 8 0 4
4 | 9 9
5 | 1 5 2 4 9 8 6
6 | 3 6 2
7 | 3 2 1 2
8 | 9 4 1 3 0 7 7 1 9 3 1
9 | 6 2 7 8

import numpy as np

np.random.seed(2019)
data = np.random.randint(1, 99, size=50)

# stems are the tens digits that occur in the data, in ascending order
stems = sorted(set(x // 10 for x in data))
for m in stems:
    # leaves are the units digits of the values sharing this stem, in data order
    leaves = [n % 10 for n in data if n // 10 == m]
    print(m, '|', *leaves)

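2.3 Summarizing Data for Two Variables Using Tables

Simpson's paradox: conclusions drawn from aggregated data and from unaggregated data contradict each other. (The cause is that the unaggregated groups carry unequal weights.)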
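
A small worked example makes the reversal concrete. The counts below are illustrative and are not taken from the book: two treatments, A and B, are each applied to "small" and "large" cases, and the helper names `rate` and `overall` are made up for the sketch.

```python
# Illustrative counts (successes, trials) per treatment and subgroup.
results = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}

def rate(successes, trials):
    """Success rate within one subgroup."""
    return successes / trials

def overall(groups):
    """Success rate after aggregating all subgroups of one treatment."""
    successes = sum(s for s, _ in groups.values())
    trials = sum(t for _, t in groups.values())
    return successes / trials

# Unaggregated view: compare the treatments inside each subgroup.
for group in ("small", "large"):
    a = rate(*results["A"][group])
    b = rate(*results["B"][group])
    print(f"{group:5}  A: {a:.1%}  B: {b:.1%}")

# Aggregated view: compare the treatments over all cases.
overall_a = overall(results["A"])
overall_b = overall(results["B"])
print(f"all    A: {overall_a:.1%}  B: {overall_b:.1%}")
```

A has the higher rate in both subgroups, yet B has the higher rate overall. The weights explain it: 263 of A's 350 cases fall in the harder "large" group, while 270 of B's 350 cases fall in the easier "small" group.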

Crosstabulation

A simulation of the table in the book using pandas.crosstab:

import numpy as np
import pandas as pd

np.random.seed(900)
y = np.random.randint(0, 3, size=300)
z = np.random.randint(11, 49, size=300)
data = pd.DataFrame({'Quality Rating': y, 'Meal Price': z})
# assign the result rather than calling replace(inplace=True) on a column selection
data['Quality Rating'] = data['Quality Rating'].map({0: 'Good', 1: 'Very Good', 2: 'Excellent'})
bins = [10, 19, 29, 39, 49]
price_class = pd.cut(data['Meal Price'], bins, labels=['10~19', '20~29', '30~39', '40~49'])
data['Meal Price'] = price_class
pd.crosstab(data['Quality Rating'], data['Meal Price'], margins=True, margins_name='Total')

2.4 Summarizing Data for Two Variables Using Graphical Displays

Scatter diagram and trendline

A good-looking scatter diagram (with matplotlib, the trendline has to be fitted with numpy.polyfit):

import matplotlib.pyplot as plt
import numpy as np

np.random.seed(19680801)
x = np.arange(0.0, 50.0, 2.0)
y = x ** 1.3 + np.random.rand(*x.shape) * 30.0
s = np.random.rand(*x.shape) * 800 + 500
colors = np.random.rand(*x.shape)

plt.figure(figsize=(12, 6))
plt.scatter(x, y, s, c=colors, alpha=0.5, marker=r'$\clubsuit$', label="Luck")
p1 = np.poly1d(np.polyfit(x, y, 1))
l1 = plt.plot(x, p1(x), 'r--', label='trendline')
plt.xlabel("Leprechauns")
plt.ylabel("Gold")
plt.legend(loc='upper left')
plt.show()

With seaborn the result can be even fancier (sns.jointplot takes up too much space, so it is not drawn):

import seaborn as sns; sns.set()
import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 2, figsize=(12, 6))
tips = sns.load_dataset("tips")
cmap = sns.cubehelix_palette(dark=.3, light=.8, as_cmap=True)
sns.scatterplot(x="total_bill", y="tip", hue="time", data=tips, ax=axes[0, 0])
sns.residplot(x="total_bill", y="tip", data=tips, ax=axes[0, 1])
sns.regplot(x="size", y="total_bill", data=tips, x_jitter=.1, ax=axes[1, 1])
# lmplot is a figure-level function, so it draws into a new figure of its own
sns.lmplot(x="size", y="total_bill", hue="day", col="day", data=tips, height=6, aspect=.4, x_jitter=.1)
# sns.jointplot(x="total_bill", y="tip", data=tips, kind="reg",
#               xlim=(0, 60), ylim=(0, 12), color="m", height=7)

Side-by-side bar chart and stacked bar chart

Building these composite charts with matplotlib is somewhat involved, so here are the relevant gallery examples:
Stacked Bar Graph
Grouped bar chart with labels
Discrete distribution as horizontal bar chart
First, plotting with pandas. The numbers are again those of the simulated table from section 2.3. This time they are aggregated with groupby, a total row is added, and the result is transposed.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

pd.set_option('display.precision', 1)  # number of decimal places shown
np.random.seed(900)
y = np.random.randint(0, 3, size=300)
z = np.random.randint(11, 49, size=300)
data = pd.DataFrame({'Quality Rating': y, 'Meal Price': z})
data['Quality Rating'] = data['Quality Rating'].map({0: 'Good', 1: 'Very Good', 2: 'Excellent'})
bins = [10, 19, 29, 39, 49]
price_class = pd.cut(data['Meal Price'], bins, labels=['10~19', '20~29', '30~39', '40~49'])
df = data.groupby(['Quality Rating', price_class], observed=False).count().unstack()
df = df.apply(lambda x: x / x.sum() * 100)  # column percentages
df.loc['Total'] = df.sum()  # total row, for the table only
df.drop(index='Total').T.plot(kind='bar', stacked=True)  # leave the total row out of the chart
plt.show()

Grouped bar charts. seaborn needs less code and produces more charts:

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

sns.set(style="darkgrid")
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 6))
tips = sns.load_dataset("tips")
sns.countplot(y="day", hue="sex", data=tips, ax=ax1)
sns.barplot(x="day", y="total_bill", data=tips, ax=ax2)
# catplot and FacetGrid are figure-level, so each opens its own figure
sns.catplot(x="sex", y="total_bill", hue="smoker", col="time", data=tips, kind="bar", height=4, aspect=.7)
g = sns.FacetGrid(tips, row="sex", col="time", margin_titles=True)
bins = np.linspace(0, 60, 13)
g.map(plt.hist, "total_bill", color="steelblue", bins=bins)

Stacked bar chart:

import seaborn as sns
import matplotlib.pyplot as plt

sns.set(style="whitegrid")
f, ax = plt.subplots(figsize=(15, 6))
crashes = sns.load_dataset("car_crashes").sort_values("total", ascending=False)
# draw the totals first, then overlay the alcohol-involved share in a darker shade
sns.set_color_codes("pastel")
sns.barplot(y="total", x="abbrev", data=crashes, label="Total", color="b")
sns.set_color_codes("muted")
sns.barplot(y="alcohol", x="abbrev", data=crashes, label="Alcohol-involved", color="b")
ax.legend(ncol=2, loc="lower right", frameon=True)
# the bars are vertical here, so the value axis is y
ax.set(ylim=(0, 24), xlabel="", ylabel="Automobile collisions per billion miles")
sns.despine(left=True, bottom=True)

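2.5 Data Visualization: Best Practices in Creating Effective Graphical Displays

Creating effective graphical displays

1. Give the display a clear and concise title.
2. Keep the display simple. Do not use three dimensions when two dimensions are sufficient.
3. Label each axis clearly and give the units of measure.
4. If color is used to distinguish categories, make sure the colors are distinct.
5. If multiple colors or line types are used, explain them with a legend, and place the legend close to the data it describes.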
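
The five guidelines above can be sketched in a single matplotlib chart. The data, product names, and labels below are made up for illustration, and the off-screen "Agg" backend is an assumption so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line to use an interactive backend
import matplotlib.pyplot as plt

# Made-up data: units sold per quarter for two hypothetical products.
quarters = ["Q1", "Q2", "Q3", "Q4"]
product_a = [12, 15, 11, 18]
product_b = [9, 14, 16, 13]

positions = list(range(len(quarters)))
bar_width = 0.4

fig, ax = plt.subplots(figsize=(6, 4))          # guideline 2: plain 2-D axes
ax.bar([p - bar_width / 2 for p in positions], product_a, bar_width,
       label="Product A", color="tab:blue")
ax.bar([p + bar_width / 2 for p in positions], product_b, bar_width,
       label="Product B", color="tab:orange")   # guideline 4: clearly different colors

ax.set_title("Quarterly sales by product")      # guideline 1: clear, concise title
ax.set_xlabel("Quarter")                        # guideline 3: every axis labeled ...
ax.set_ylabel("Units sold (thousands)")         # ... with its unit of measure
ax.set_xticks(positions)
ax.set_xticklabels(quarters)
ax.legend(loc="upper left")                     # guideline 5: legend inside the axes, near the bars
fig.tight_layout()
```

Call `plt.show()` with an interactive backend, or `fig.savefig(...)` with a file name of your choice, to view the result.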
