Analyzing Android apps with Python (Part 2)
This article continues from the previous one, "Analyzing Android apps with Python (Part 1)".
Converting the data types of the remaining columns
data['Reviews'] = data['Reviews'].astype('int32')  # cast review counts to integers (np.int is removed in recent NumPy, and astype has no inplace argument)
data.Reviews.head()
0 159
1 967
2 87510
3 215644
4 967
Name: Reviews, dtype: int32
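The cast above assumes every Reviews value is already a clean digit string; if one is not, astype raises an error. A more defensive sketch converts with pd.to_numeric(errors='coerce'), which turns unparseable entries into NaN so they can be inspected, then casts to pandas' nullable Int64 dtype. The sample values, including the malformed '3.0M', are hypothetical.

```python
import pandas as pd

# Hypothetical raw values as they might appear in the CSV (all strings)
raw = pd.Series(['159', '967', '87510', '3.0M'], name='Reviews')

# Unparseable entries become NaN instead of raising an error
reviews = pd.to_numeric(raw, errors='coerce')
print(raw[reviews.isna()])   # the rows that need a closer look

# The nullable integer dtype can hold integers and missing values together
reviews = reviews.astype('Int64')
print(reviews)
```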
print(data[~data.Size.str.contains('M')].head())
App Category Rating Reviews \
37 Floor Plan Creator ART_AND_DESIGN 4.1 36639
42 Textgram - write on photos ART_AND_DESIGN 4.4 295221
52 Used Cars and Trucks for Sale AUTO_AND_VEHICLES 4.6 17057
58 Restart Navigator AUTO_AND_VEHICLES 4.0 1403
67 Ulysse Speedometer AUTO_AND_VEHICLES 4.3 40211
Size Installs Type Price Content Rating Genres \
37 Varies with device 5000000 Free 0 Everyone Art & Design
42 Varies with device 10000000 Free 0 Everyone Art & Design
52 Varies with device 1000000 Free 0 Everyone Auto & Vehicles
58 201k 100000 Free 0 Everyone Auto & Vehicles
67 Varies with device 5000000 Free 0 Everyone Auto & Vehicles
Last Updated Current Ver Android Ver installs_range
37 July 14, 2018 Varies with device 2.3.3 and up 百萬+
42 July 30, 2018 Varies with device Varies with device 百萬+
52 July 30, 2018 Varies with device Varies with device 十萬+
58 August 26, 2014 1.0.1 2.2 and up 萬+
67 July 30, 2018 Varies with device Varies with device 百萬+
Broadly, the sizes come in three forms: in kilobytes (k), in megabytes (M), and indeterminate ("Varies with device").
# Convert all sizes to a common unit (kilobytes)
def size_normal(x):
    if x.endswith('M'):
        return float(x[:-1]) * 1000   # megabytes -> kilobytes (1 M = 1000 k)
    elif x.endswith('k'):
        return float(x[:-1])          # already in kilobytes
    else:
        return np.nan                 # e.g. "Varies with device"
data.Size.map(size_normal)[[1, 146, 10595]]  # spot-check that the conversion worked
1 14000.0
146 NaN
10595 470.0
Name: Size, dtype: float64
data['size_k']=data.Size.map(size_normal)
print(data.head())
App Category Rating \
0 Photo Editor & Candy Camera & Grid & ScrapBook ART_AND_DESIGN 4.1
1 Coloring book moana ART_AND_DESIGN 3.9
2 U Launcher Lite – FREE Live Cool Themes, Hide ... ART_AND_DESIGN 4.7
3 Sketch - Draw & Paint ART_AND_DESIGN 4.5
4 Pixel Draw - Number Art Coloring Book ART_AND_DESIGN 4.3
Reviews Size Installs Type Price Content Rating \
0 159 19M 10000 Free 0 Everyone
1 967 14M 500000 Free 0 Everyone
2 87510 8.7M 5000000 Free 0 Everyone
3 215644 25M 50000000 Free 0 Teen
4 967 2.8M 100000 Free 0 Everyone
Genres Last Updated Current Ver \
0 Art & Design January 7, 2018 1.0.0
1 Art & Design;Pretend Play January 15, 2018 2.0.0
2 Art & Design August 1, 2018 1.2.4
3 Art & Design June 8, 2018 Varies with device
4 Art & Design;Creativity June 20, 2018 1.1
Android Ver installs_range size_k
0 4.0.3 and up 千+ 19000.0
1 4.0.3 and up 十萬+ 14000.0
2 4.0.3 and up 百萬+ 8700.0
3 4.2 and up 千萬+ 25000.0
4 4.4 and up 萬+ 2800.0
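The row-by-row size_normal function can also be expressed with vectorized string methods. A sketch, assuming every size is either a number followed by M or k or something else entirely (which becomes NaN), and keeping the article's 1 M = 1000 k convention; the sample values are illustrative.

```python
import pandas as pd

# Hypothetical sample of the Size column
size = pd.Series(['19M', '8.7M', '201k', 'Varies with device'], name='Size')

# Split each value into a number and a unit suffix; non-matching rows become NaN
parts = size.str.extract(r'^(?P<num>[\d.]+)(?P<unit>[Mk])$')

# Same convention as size_normal: 1 M = 1000 k
factor = parts['unit'].map({'M': 1000.0, 'k': 1.0})
size_k = parts['num'].astype(float) * factor
print(size_k)
```

Using 1000 rather than 1024 follows the article; switching to binary units only requires changing the factor in the mapping.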
Converting the update time
from dateutil.parser import parse

def time_normal(time):
    # parse strings such as "January 7, 2018" into datetime objects
    return parse(time)

data['Last Updated'] = data['Last Updated'].map(time_normal)
print(data.head())
App Category Rating \
0 Photo Editor & Candy Camera & Grid & ScrapBook ART_AND_DESIGN 4.1
1 Coloring book moana ART_AND_DESIGN 3.9
2 U Launcher Lite – FREE Live Cool Themes, Hide ... ART_AND_DESIGN 4.7
3 Sketch - Draw & Paint ART_AND_DESIGN 4.5
4 Pixel Draw - Number Art Coloring Book ART_AND_DESIGN 4.3
Reviews Size Installs Type Price Content Rating \
0 159 19M 10000 Free 0 Everyone
1 967 14M 500000 Free 0 Everyone
2 87510 8.7M 5000000 Free 0 Everyone
3 215644 25M 50000000 Free 0 Teen
4 967 2.8M 100000 Free 0 Everyone
Genres Last Updated Current Ver Android Ver \
0 Art & Design 2018-01-07 1.0.0 4.0.3 and up
1 Art & Design;Pretend Play 2018-01-15 2.0.0 4.0.3 and up
2 Art & Design 2018-08-01 1.2.4 4.0.3 and up
3 Art & Design 2018-06-08 Varies with device 4.2 and up
4 Art & Design;Creativity 2018-06-20 1.1 4.4 and up
installs_range size_k
0 千+ 19000.0
1 十萬+ 14000.0
2 百萬+ 8700.0
3 千萬+ 25000.0
4 萬+ 2800.0
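The update time is now stored as a datetime. It could also be set as the index so that time-series methods can be applied, but that is outside the scope of this analysis.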
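As a sketch of that idea (not part of the original analysis), the dates can be parsed in one vectorized call with an explicit format, which assumes every value looks like "January 7, 2018", and then used as a sorted DatetimeIndex, which enables partial-string selection such as .loc['2018']. The rows are made up.

```python
import pandas as pd

# Made-up rows in the dataset's original 'Last Updated' string format
df = pd.DataFrame({
    'App': ['A', 'B', 'C', 'D'],
    'Last Updated': ['January 7, 2018', 'June 8, 2018',
                     'August 26, 2014', 'July 30, 2018'],
})

# One vectorized call instead of calling dateutil's parse on every row
df['Last Updated'] = pd.to_datetime(df['Last Updated'], format='%B %d, %Y')

# A sorted DatetimeIndex enables time-series style selection
ts = df.set_index('Last Updated').sort_index()
updated_2018 = ts.loc['2018']                  # every app last updated in 2018
per_year = ts.groupby(ts.index.year).size()    # number of apps per update year
print(updated_2018)
print(per_year)
```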
Checking for outliers
print(data.describe())
Rating Reviews Installs size_k
count 10841.000000 1.084100e+04 1.084100e+04 9146.000000
mean 4.190739 4.441119e+05 1.546291e+07 21514.504975
std 0.479738 2.927629e+06 8.502557e+07 22588.342683
min 1.000000 0.000000e+00 0.000000e+00 8.500000
25% 4.100000 3.800000e+01 1.000000e+03 4900.000000
50% 4.200000 2.094000e+03 1.000000e+05 13000.000000
75% 4.500000 5.476800e+04 5.000000e+06 30000.000000
max 5.000000 7.815831e+07 1.000000e+09 100000.000000
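The numeric columns show no obvious outliers: Rating stays within 1 to 5, and Reviews, Installs and size_k are all non-negative. Price, still stored as text, is converted later on.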
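describe() only shows summary statistics, so the check above is done by eye. A slightly more explicit sketch encodes the expected ranges as rules and counts the rows breaking each one. The rules are assumptions based on what the columns mean (ratings on a 1 to 5 scale, non-negative counts), and the rows are made up, including one deliberately invalid rating.

```python
import pandas as pd

# Made-up rows; the 19.0 rating is deliberately invalid
df = pd.DataFrame({
    'Rating':   [4.1, 3.9, 19.0, 1.0],
    'Reviews':  [159, 967, 0, 87510],
    'Installs': [10000, 500000, 0, 5000000],
    'size_k':   [19000.0, 14000.0, None, 8.5],
})

# Assumed domain rules: True means the value passes the check
checks = {
    'Rating':   df['Rating'].between(1, 5),                 # 1-5 star scale
    'Reviews':  df['Reviews'] >= 0,                         # counts cannot be negative
    'Installs': df['Installs'] >= 0,
    'size_k':   df['size_k'].isna() | (df['size_k'] >= 0),  # NaN = "Varies with device"
}

# Number of rows failing each rule, plus the offending rows themselves
violations = {col: int((~ok).sum()) for col, ok in checks.items()}
bad_rows = df[~pd.concat(checks, axis=1).all(axis=1)]
print(violations)
print(bad_rows)
```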
Removing duplicate rows
data.duplicated().sum()
483
data.drop_duplicates(inplace=True)
data.info()
Int64Index: 10358 entries, 0 to 10840
Data columns (total 15 columns):
App 10358 non-null object
Category 10358 non-null object
Rating 10358 non-null float64
Reviews 10358 non-null int32
Size 10358 non-null object
Installs 10358 non-null int32
Type 10358 non-null object
Price 10358 non-null object
Content Rating 10358 non-null object
Genres 10358 non-null object
Last Updated 10358 non-null datetime64[ns]
Current Ver 10350 non-null object
Android Ver 10356 non-null object
installs_range 10358 non-null category
size_k 8832 non-null float64
dtypes: category(1), datetime64[ns](1), float64(2), int32(2), object(9)
memory usage: 1.1+ MB
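duplicated() flags a row only when every column matches an earlier row, so after removing the 483 exact duplicates the same app can still appear more than once, for example if its review count differs between copies. Whether to deduplicate on App alone is a judgment call the original analysis does not make; this sketch on made-up rows only shows the difference.

```python
import pandas as pd

# Made-up rows: one app recorded twice identically and once with a newer review count
df = pd.DataFrame({
    'App':     ['Sketch', 'Sketch', 'Sketch', 'Calculator'],
    'Reviews': [215644, 215644, 215700, 57],
})

n_exact = df.duplicated().sum()                # rows identical to an earlier row
n_same_app = df.duplicated(subset='App').sum() # rows whose App name was already seen

deduped = df.drop_duplicates()                            # what the article does
one_per_app = df.drop_duplicates(subset='App', keep='first')
print(n_exact, n_same_app, len(deduped), len(one_per_app))
```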
data.to_csv(r'C:\Users\19078\Desktop\中級\第三關\android_data.csv', sep=',', encoding='utf_8_sig')  # save the cleaned data as CSV
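When the saved CSV is loaded again later, some dtypes do not survive the round trip: the index is written as an unnamed first column, dates come back as strings, and the category dtype of installs_range is lost. A sketch of restoring them, using an in-memory buffer instead of the file path and a small made-up frame; the encoding='utf_8_sig' argument only matters when writing a real file (it adds a BOM so Excel reads the Chinese labels correctly), so it is omitted here.

```python
import io
import pandas as pd

# Small stand-in for `data` with a datetime and a category column
df = pd.DataFrame({
    'App': ['A', 'B'],
    'Last Updated': pd.to_datetime(['2018-01-07', '2018-06-08']),
    'installs_range': pd.Categorical(['千+', '萬+']),
}, index=[0, 5])

# An in-memory buffer stands in for the file path
buf = io.StringIO()
df.to_csv(buf, sep=',')
buf.seek(0)

# index_col=0 restores the original index; parse_dates restores the datetime dtype
back = pd.read_csv(buf, index_col=0, parse_dates=['Last Updated'])
# CSV has no notion of pandas' category dtype, so it has to be re-applied
back['installs_range'] = back['installs_range'].astype('category')
print(back.dtypes)
```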
Data analysis
Effect of category on the number of reviews
a=pd.pivot_table(data,columns='Type',index='Category',values='Reviews',aggfunc='mean').sort_values(by='Free',ascending=False)[:10]
b=pd.pivot_table(data,columns='Type',index='Category',values='Reviews',aggfunc='mean').sort_values(by='Paid',ascending=False)[:10]
a['Free'].plot(kind='bar',rot=60)
b['Paid'].plot(kind='bar',rot=60)
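Comparing the two charts, the average number of reviews differs considerably between app categories. Among free apps, games, social and communication (chat) apps have the most reviews on average, while among paid apps, family, game and weather apps lead. So both the category and the pricing type (free or paid) have some influence on review counts.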
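A sketch of the same comparison that builds the pivot table once and reuses it for both the Free and Paid rankings, on made-up rows. It also computes medians, since a mean review count can be dominated by one very popular app; whether medians change the ranking on the real data was not checked in the original. Plotting works as before, e.g. top_free.plot(kind='bar', rot=60).

```python
import pandas as pd

# Made-up rows standing in for `data`
df = pd.DataFrame({
    'Category': ['GAME', 'GAME', 'SOCIAL', 'SOCIAL', 'SOCIAL', 'WEATHER', 'WEATHER'],
    'Type':     ['Free', 'Paid', 'Free', 'Free', 'Free', 'Free', 'Paid'],
    'Reviews':  [1000, 300, 5000, 10, 20, 200, 40],
})

# Rows: category, columns: Free/Paid, cells: mean review count (built once, reused below)
pt = pd.pivot_table(df, index='Category', columns='Type',
                    values='Reviews', aggfunc='mean')

# Categories without any paid app have NaN in 'Paid', so drop them before ranking
top_free = pt['Free'].sort_values(ascending=False).head(10)
top_paid = pt['Paid'].dropna().sort_values(ascending=False).head(10)

# Medians are less sensitive to a single very popular app than means
med = pd.pivot_table(df, index='Category', columns='Type',
                     values='Reviews', aggfunc='median')
print(pt, med, sep='\n\n')
```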
Relationship between category and app size
a=pd.pivot_table(data,index='Category',values='size_k',aggfunc='mean').sort_values(by='size_k',ascending=False)[:15]
print(a)
size_k
Category
GAME 44126.850000
FAMILY 27930.435770
TRAVEL_AND_LOCAL 24515.994413
SPORTS 24181.192568
ENTERTAINMENT 22638.805970
PARENTING 22512.962963
FOOD_AND_DRINK 22056.122449
HEALTH_AND_FITNESS 21643.216667
EDUCATION 20076.895833
AUTO_AND_VEHICLES 20037.146667
MEDICAL 19383.681579
FINANCE 17937.730263
SOCIAL 16875.827586
PHOTOGRAPHY 16832.045267
MAPS_AND_NAVIGATION 16614.712963
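App size clearly differs by category, with games the largest (about 44 MB on average). The other category averages listed lie between roughly 16 and 28 MB, so apps are generally a dozen to a few tens of megabytes, and a size in that range appears to be the norm.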
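Since size_k is missing for every "Varies with device" app, it helps to report how many known sizes each average is based on. A sketch on made-up rows that adds the median and count and converts kilobytes to megabytes with the article's 1 MB = 1000 kB convention:

```python
import pandas as pd

# Made-up rows; size_k is in kilobytes, NaN = "Varies with device"
df = pd.DataFrame({
    'Category': ['GAME', 'GAME', 'GAME', 'TOOLS', 'TOOLS'],
    'size_k':   [44000.0, 60000.0, None, 3000.0, 5000.0],
})

# mean/median skip NaN; count reports how many sizes were actually known
stats = df.groupby('Category')['size_k'].agg(['mean', 'median', 'count'])
stats[['mean', 'median']] = stats[['mean', 'median']] / 1000   # kB -> MB
print(stats.sort_values('mean', ascending=False))
```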
Which categories of paid apps are priced highest
data_paid=data[data.Type.isin(['Paid'])]
print(data_paid.head())
App Category Rating \
234 TurboScan: scan documents and receipts in PDF BUSINESS 4.7
235 Tiny Scanner Pro: PDF Doc Scan BUSINESS 4.8
427 Puffin Browser Pro COMMUNICATION 4.0
476 Moco+ - Chat, Meet People DATING 4.2
477 Calculator DATING 2.6
Reviews Size Installs Type Price Content Rating \
234 11442 6.8M 100000 Paid $4.99 Everyone
235 10295 39M 100000 Paid $4.99 Everyone
427 18247 Varies with device 100000 Paid $3.99 Everyone
476 1545 Varies with device 10000 Paid $3.99 Mature 17+
477 57 6.2M 1000 Paid $6.99 Everyone
Genres Last Updated Current Ver Android Ver installs_range \
234 Business 2018-03-25 1.5.2 4.0 and up 萬+
235 Business 2017-04-11 3.4.6 3.0 and up 萬+
427 Communication 2018-07-05 7.5.3.20547 4.1 and up 萬+
476 Dating 2018-06-19 2.6.139 4.1 and up 千+
477 Dating 2017-10-25 1.1.6 4.0 and up 百+
size_k
234 6800.0
235 39000.0
427 NaN
476 NaN
477 6200.0
data_paid['Price'] = data_paid['Price'].str.replace('$', '', regex=False).astype(float)  # strip the literal "$" and convert to numbers
a=data_paid.groupby('Category')['Price'].agg(['mean','count']).sort_values(by='mean',ascending=False)[:15]
print(a)
mean count
Category
FINANCE 170.637059 17
LIFESTYLE 124.256316 19
EVENTS 109.990000 1
BUSINESS 14.607500 12
FAMILY 12.945561 187
MEDICAL 12.151071 84
PRODUCTIVITY 8.961786 28
PHOTOGRAPHY 6.111500 20
MAPS_AND_NAVIGATION 5.390000 5
SOCIAL 5.323333 3
PARENTING 4.790000 2
DATING 4.490000 7
EDUCATION 4.490000 4
AUTO_AND_VEHICLES 4.490000 3
HEALTH_AND_FITNESS 4.290000 15
Note: the Price conversion triggers pandas' SettingWithCopyWarning ("A value is trying to be set on a copy of a slice from a DataFrame"), because data_paid was created by filtering data. The prices are still converted correctly, but building the subset as an explicit copy, data_paid = data[data.Type == 'Paid'].copy(), avoids the warning.
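The results show that finance, lifestyle and events apps have the highest average prices. These averages should be read together with their counts: EVENTS rests on a single paid app, and means as high as $170 and $124 are likely pulled up by a few very expensive apps.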
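A sketch that makes this caveat explicit: it reports the median next to the mean and drops categories with too few paid apps. The rows and the MIN_APPS threshold are made up for illustration; regex=False makes "$" a literal character rather than the regex end-of-string anchor.

```python
import pandas as pd

# Made-up paid apps; in the article these come from data_paid
df = pd.DataFrame({
    'Category': ['FINANCE', 'FINANCE', 'FINANCE', 'EVENTS', 'TOOLS', 'TOOLS'],
    'Price':    ['$399.99', '$2.99', '$4.99', '$109.99', '$0.99', '$1.99'],
})

# Strip the literal dollar sign and convert to numbers
df['Price'] = df['Price'].str.replace('$', '', regex=False).astype(float)

stats = df.groupby('Category')['Price'].agg(['mean', 'median', 'count'])

# Hypothetical threshold: skip categories with too few paid apps to average reliably
MIN_APPS = 2
reliable = stats[stats['count'] >= MIN_APPS].sort_values('mean', ascending=False)
print(reliable)
```

Here one expensive FINANCE app lifts the mean to about $136 while the median stays at $4.99.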
Ratio of paid to free apps by category
data.size  # counts cells (rows * columns = 10358 * 15), not rows
155370
def p_f_rate(group):
    # ratio of paid apps to free apps within one category
    n_paid = (group['Type'] == 'Paid').sum()
    n_free = (group['Type'] == 'Free').sum()
    return round(n_paid / n_free, 2)
data.groupby('Category').apply(p_f_rate).sort_values(ascending=False)[:15]
Category
PERSONALIZATION 0.27
MEDICAL 0.26
BOOKS_AND_REFERENCE 0.14
WEATHER 0.11
FAMILY 0.11
TOOLS 0.10
COMMUNICATION 0.08
GAME 0.08
SPORTS 0.07
PRODUCTIVITY 0.07
PHOTOGRAPHY 0.07
LIFESTYLE 0.05
FINANCE 0.05
HEALTH_AND_FITNESS 0.05
ART_AND_DESIGN 0.05
dtype: float64
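Personalization and medical apps have the highest paid-to-free ratios (0.27 and 0.26). Overall, though, most apps in every category are free: even the highest ratio amounts to roughly one paid app for every four free ones. The internet's "free" mindset is therefore key to running an app.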
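p_f_rate returns paid apps per free app, which is not the same as the share of apps that are paid: a ratio of 0.27 corresponds to a paid share of 0.27 / 1.27, about 21%. A sketch that computes both from a single pd.crosstab instead of groupby().apply(), on made-up rows:

```python
import pandas as pd

# Made-up rows standing in for `data`
df = pd.DataFrame({
    'Category': ['MEDICAL'] * 5 + ['GAME'] * 3,
    'Type':     ['Paid', 'Free', 'Free', 'Free', 'Free', 'Free', 'Free', 'Paid'],
})

# One row per category, one column per Type, cells are app counts
counts = pd.crosstab(df['Category'], df['Type'])

# Paid-to-free ratio (what p_f_rate computes) and the paid share of all apps
counts['paid_to_free'] = (counts['Paid'] / counts['Free']).round(2)
counts['paid_share'] = (counts['Paid'] / (counts['Paid'] + counts['Free'])).round(2)
print(counts.sort_values('paid_to_free', ascending=False))
```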