Data Mining: Getting Started with pandas, Operations
Statistics
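Because the article's example uses random data, its numbers change on every run. As a small warm-up, here is a minimal sketch of the same operations (column means, row means, and subtracting a shifted Series aligned on the row index) on fixed values. The data and the variable names `col_means`, `row_means`, and `diff` are made up for illustration and are not from the original post.

```python
import pandas as pd

# Hypothetical fixed data so that every result is reproducible.
dates = pd.date_range("20180509", periods=4)
df = pd.DataFrame(
    [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]],
    index=dates,
    columns=list("AB"),
)

col_means = df.mean()        # one value per column (axis=0 is the default)
row_means = df.mean(axis=1)  # one value per row

# shift(1) moves the values down one position; the first slot becomes NaN.
s = pd.Series([1.0, 2.0, 3.0, 4.0], index=dates).shift(1)

# Subtract s from every column, matching rows by index label.
# Rows where s is NaN come out as NaN in every column.
diff = df.sub(s, axis="index")

print(col_means, row_means, s, diff, sep="\n\n")
```

With `axis="index"` the Series is matched against the row labels and reused for each column, which is why a single NaN in `s` blanks out a whole row of the result.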
    import pandas
    import numpy

    # Create a DataFrame by passing a NumPy array, a datetime index, and column labels:
    dates = pandas.date_range("20180509", periods=6)
    df = pandas.DataFrame(numpy.random.randn(6, 4), index=dates, columns=list('ABCD'))
    print("DataFrame built from a time index and column labels:", df, sep="\n")

    # Descriptive statistics: the mean of each column
    print("Mean of each column", df.mean(), sep="\n")

    # The same operation along the other axis
    print("Mean of each row", df.mean(axis=1), sep="\n")

    # Operating on objects that have different dimensionality and need alignment:
    # pandas automatically broadcasts along the specified dimension.
    # shift(2) moves the values down two positions while the index stays in place.
    s = pandas.Series([1, 3, 5, numpy.nan, 6, 8], index=dates).shift(2)
    print("Index unchanged, values shifted:", s, sep="\n")

    # s is matched against df by index and broadcast across the columns,
    # then subtracted element by element (df - s).
    print("df-s", df.sub(s, axis='index'), sep="\n")

    "E:\Python 3.6.2\python.exe" F:/PycharmProjects/test.py
    DataFrame built from a time index and column labels:
                        A         B          C          D
    2018-05-09   0.689544  0.875232   0.452993   1.875628
    2018-05-10  -0.216719  0.298931  -1.159366   0.188906
    2018-05-11   0.268589  1.206928  -0.119726  -0.148764
    2018-05-12  -1.035244  1.092390   1.006421  -0.226186
    2018-05-13   0.670916  0.738597  -0.184312  -1.280867
    2018-05-14  -0.359534  1.109787   0.650537  -0.030985
    Mean of each column
    A    0.002925
    B    0.886978
    C    0.107758
    D    0.062955
    dtype: float64
    Mean of each row
    2018-05-09    0.973349
    2018-05-10   -0.222062
    2018-05-11    0.301757
    2018-05-12    0.209345
    2018-05-13   -0.013917
    2018-05-14    0.342451
    Freq: D, dtype: float64
    Index unchanged, values shifted:
    2018-05-09    NaN
    2018-05-10    NaN
    2018-05-11    1.0
    2018-05-12    3.0
    2018-05-13    5.0
    2018-05-14    NaN
    Freq: D, dtype: float64
    df-s
                        A          B          C          D
    2018-05-09        NaN        NaN        NaN        NaN
    2018-05-10        NaN        NaN        NaN        NaN
    2018-05-11  -0.731411   0.206928  -1.119726  -1.148764
    2018-05-12  -4.035244  -1.907610  -1.993579  -3.226186
    2018-05-13  -4.329084  -4.261403  -5.184312  -6.280867
    2018-05-14        NaN        NaN        NaN        NaN

    Process finished with exit code 0

The apply() function
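`DataFrame.apply` calls the given function once per column by default, or once per row with `axis=1`. The following is a minimal sketch on fixed values so the results can be checked by hand. The data and the names `running`, `col_range`, and `row_range` are illustrative assumptions, not part of the original post.

```python
import pandas as pd

# Hypothetical fixed data, small enough to verify by hand.
df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 0, 5]})

# The function receives each column as a Series and returns a Series,
# so the result is a DataFrame of running totals per column.
running = df.apply(lambda col: col.cumsum())

# The function returns one scalar per column, so the result is a Series
# indexed by column name.
col_range = df.apply(lambda col: col.max() - col.min())

# With axis=1 the function receives each row instead.
row_range = df.apply(lambda row: row.max() - row.min(), axis=1)

print(running, col_range, row_range, sep="\n\n")
```

For a plain running total, `df.cumsum()` gives the same values directly. `apply` is most useful when no built-in method covers the computation.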
    import pandas
    import numpy

    # Create a DataFrame by passing a NumPy array, a datetime index, and column labels:
    dates = pandas.date_range("20180509", periods=6)
    df = pandas.DataFrame(numpy.random.randn(6, 4), index=dates, columns=list('ABCD'))
    print("DataFrame built from a time index and column labels:", df, sep="\n")

    # Apply functions to the data.
    # cumsum: each row holds the running total of its column down to that row.
    print("Cumulative sum down each column, starting from the first row:", df.apply(numpy.cumsum), sep="\n")
    print("Maximum minus minimum of each column:", df.apply(lambda x: x.max() - x.min()), sep="\n")

    "E:\Python 3.6.2\python.exe" F:/PycharmProjects/test.py
    DataFrame built from a time index and column labels:
                        A          B          C          D
    2018-05-09   0.628765  -1.453298  -0.169228  -0.185065
    2018-05-10   0.444467   0.159900  -1.581807   0.852065
    2018-05-11   1.537534  -1.718371  -1.378338  -0.183929
    2018-05-12  -2.131473  -2.586691  -0.241944  -0.842446
    2018-05-13  -0.898688   0.394125   1.413996  -1.897569
    2018-05-14  -0.891981   0.913925   0.686605  -0.842980
    Cumulative sum down each column, starting from the first row:
                        A          B          C          D
    2018-05-09   0.628765  -1.453298  -0.169228  -0.185065
    2018-05-10   1.073232  -1.293399  -1.751035   0.667000
    2018-05-11   2.610767  -3.011770  -3.129372   0.483071
    2018-05-12   0.479293  -5.598461  -3.371316  -0.359374
    2018-05-13  -0.419395  -5.204337  -1.957321  -2.256944
    2018-05-14  -1.311376  -4.290412  -1.270715  -3.099924
    Maximum minus minimum of each column:
    A    3.669008
    B    3.500616
    C    2.995802
    D    2.749634
    dtype: float64

    Process finished with exit code 0

Histogramming
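`Series.value_counts()` returns how often each distinct value occurs, with the most frequent value first. As a minimal sketch, the fixed values below are the ten integers that appear in the article's sample run, used here instead of random draws so the counts can be verified. The names `counts`, `counts_by_value`, and `most_common` are illustrative.

```python
import pandas as pd

# Fixed values (the ten integers shown in the article's sample run).
s = pd.Series([5, 2, 1, 4, 1, 5, 0, 1, 0, 3])

# Index = distinct values, data = number of occurrences,
# sorted by count in descending order.
counts = s.value_counts()

# Sort by the value itself when a histogram-like ordering is wanted.
counts_by_value = counts.sort_index()

# The most frequent value is the index label of the largest count.
most_common = counts.idxmax()

print(counts, counts_by_value, most_common, sep="\n\n")
```

Here 1 occurs three times, 0 and 5 occur twice each, and 2, 3, and 4 occur once each. The relative order of values with equal counts in `counts` is best not relied upon.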
    import pandas
    import numpy

    # Create a DataFrame by passing a NumPy array, a datetime index, and column labels:
    dates = pandas.date_range("20180509", periods=6)
    df = pandas.DataFrame(numpy.random.randn(6, 4), index=dates, columns=list('ABCD'))
    print("DataFrame built from a time index and column labels:", df, sep="\n")

    # Ten random integers in the range 0 to 6, then the number of times each one occurs
    s = pandas.Series(numpy.random.randint(0, 7, size=10))
    print("A series of ten random integers:", s, sep="\n")
    print("Number of occurrences of each value:", s.value_counts(), sep="\n")

    "E:\Python 3.6.2\python.exe" F:/PycharmProjects/test.py
    DataFrame built from a time index and column labels:
                        A          B          C          D
    2018-05-09  -1.447060   0.998378  -0.272173  -0.240873
    2018-05-10   2.019563   0.397001   1.469093  -0.313272
    2018-05-11   0.932445   0.973830  -1.914278  -1.374748
    2018-05-12  -0.980636   1.336340  -0.232319   1.176833
    2018-05-13  -1.850315  -0.738035  -1.085791   1.378875
    2018-05-14   1.162965   1.892369   0.499482   0.647424
    A series of ten random integers:
    0    5
    1    2
    2    1
    3    4
    4    1
    5    5
    6    0
    7    1
    8    0
    9    3
    dtype: int32

    Process finished with exit code 0

String methods
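A Series object comes with a set of string-processing methods in its str attribute, which make it easy to operate on every element of the array. Missing values (NaN) are passed through unchanged.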
    import pandas
    import numpy

    s = pandas.Series(['A', 'B', 'C', 'Aaba', 'Baca', numpy.nan, 'CABA', 'dog', 'cat'])
    print("Series values converted to lowercase:", s.str.lower(), sep="\n")

    "E:\Python 3.6.2\python.exe" F:/PycharmProjects/test.py
    Series values converted to lowercase:
    0       a
    1       b
    2       c
    3    aaba
    4    baca
    5     NaN
    6    caba
    7     dog
    8     cat
    dtype: object

    Process finished with exit code 0
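The str accessor offers more than lower(). As a minimal sketch, the snippet below shows three other commonly used methods on a shortened, illustrative version of the series above. The variable names are made up for this example, and `na=False` is passed so that the missing entry yields False instead of a missing result.

```python
import numpy as np
import pandas as pd

# A shortened, illustrative version of the series above, including a missing value.
s = pd.Series(["A", "Aaba", np.nan, "CABA", "dog"])

# Element-wise upper-casing; the missing value stays missing.
upper = s.str.upper()

# Length of each string; the missing value stays missing.
lengths = s.str.len()

# Case-insensitive substring test; na=False maps the missing value to False.
has_a = s.str.contains("a", case=False, na=False)

print(upper, lengths, has_a, sep="\n\n")
```

Because `has_a` contains only True and False, it can be used directly as a boolean mask, for example `s[has_a]`.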
Reposted from: https://my.oschina.net/gain/blog/1823689