Time Series Study Notes 4

Published: 2023/11/27

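6. Resampling and Frequency Conversion

Resampling means converting a time series from one frequency to another. Aggregating high-frequency data to a lower frequency is downsampling; converting low-frequency data to a higher frequency is upsampling.

pandas objects all have a resample method for frequency conversion.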

In [5]: rng = pd.date_range('1/1/2000', periods=100, freq='D')

In [6]: ts = Series(np.random.randn(len(rng)), index=rng)

# Choose how the values in each bin are aggregated: mean() here; sum(), min(), etc. work the same way.
# (Older pandas versions used mean implicitly when no aggregation was given.)
In [8]: ts.resample('M').mean()
Out[8]:
2000-01-31   -0.128802
2000-02-29    0.179255
2000-03-31    0.055778
2000-04-30   -0.736071
Freq: M, dtype: float64

In [9]: ts.resample('M', kind='period').mean()
Out[9]:
2000-01   -0.128802
2000-02    0.179255
2000-03    0.055778
2000-04   -0.736071
Freq: M, dtype: float64
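
The same monthly aggregation can be sketched with deterministic data so the result is easy to check by hand. This is only an illustration, not the original session: it assumes the 'MS' (month start) alias and Series.to_period as a substitute for kind='period', because, to my knowledge, newer pandas releases (2.2 and later) deprecate both the 'M' offset alias and the kind keyword. The variable names are made up for the example.

```python
import numpy as np
import pandas as pd

# 100 daily values 0..99 starting 2000-01-01 (deterministic instead of random)
rng = pd.date_range("2000-01-01", periods=100, freq="D")
ts = pd.Series(np.arange(100, dtype=float), index=rng)

# Downsample to monthly means; "MS" labels each bin with the first day of the month
monthly = ts.resample("MS").mean()

# Convert the timestamp labels to monthly periods (2000-01, 2000-02, ...)
monthly_periods = monthly.to_period("M")

print(monthly)
print(monthly_periods)
```

Note that the labels differ from the session above: 'MS' labels each bin with the month start, while 'M' labels it with the month end. The aggregated values are the same either way.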

6.1 Downsampling

# 12 samples at one-minute intervals
In [10]: rng = pd.date_range('1/1/2017', periods=12, freq='T')

In [11]: ts = Series(np.arange(12), index=rng)

In [12]: ts
Out[12]:
2017-01-01 00:00:00     0
2017-01-01 00:01:00     1
2017-01-01 00:02:00     2
...
2017-01-01 00:08:00     8
2017-01-01 00:09:00     9
2017-01-01 00:10:00    10
2017-01-01 00:11:00    11
Freq: T, dtype: int32

# Resample into 5-minute bins and sum the values in each bin into a new Series.
# By default each bin is closed on the left and open on the right, e.g. [00:00, 00:05).
In [14]: ts.resample('5min').sum()
Out[14]:
2017-01-01 00:00:00    10
2017-01-01 00:05:00    35
2017-01-01 00:10:00    21
Freq: 5T, dtype: int32

# closed='left' is the default, so the result is the same
In [15]: ts.resample('5min', closed='left').sum()
Out[15]:
2017-01-01 00:00:00    10
2017-01-01 00:05:00    35
2017-01-01 00:10:00    21
Freq: 5T, dtype: int32

# With closed='right' the bins become open on the left and closed on the right,
# but each bin is still labeled with its left edge
In [16]: ts.resample('5min', closed='right').sum()
Out[16]:
2016-12-31 23:55:00     0
2017-01-01 00:00:00    15
2017-01-01 00:05:00    40
2017-01-01 00:10:00    11
Freq: 5T, dtype: int32

# label='left' (labeling with the left edge) is also the default
In [17]: ts.resample('5min', closed='left', label='left').sum()
Out[17]:
2017-01-01 00:00:00    10
2017-01-01 00:05:00    35
2017-01-01 00:10:00    21
Freq: 5T, dtype: int32
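
The session above does not show closed='right' combined with label='right'. A minimal sketch of that case, using the same 12 one-minute samples (the names left and right are chosen for this example only):

```python
import numpy as np
import pandas as pd

# Same data as above: 12 one-minute samples with values 0..11
rng = pd.date_range("2017-01-01", periods=12, freq="min")
ts = pd.Series(np.arange(12), index=rng)

# Default behaviour: bins [t, t+5min), labeled with the left edge t
left = ts.resample("5min", closed="left", label="left").sum()

# Bins (t-5min, t], labeled with the right edge t
right = ts.resample("5min", closed="right", label="right").sum()

print(left)
print(right)
```

With both options set to 'right', the value at 00:00 forms a bin of its own labeled 00:00, and the final bin (00:10, 00:15] is labeled 00:15.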

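The result labels can also be shifted by an offset.

# Shift the labels forward by 1 second
# (loffset was deprecated in pandas 1.1 and removed in 2.0)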
In [18]: ts.resample('5T', loffset='1s').sum()
Out[18]:
2017-01-01 00:00:01    10
2017-01-01 00:05:01    35
2017-01-01 00:10:01    21
Freq: 5T, dtype: int32
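
On pandas versions without loffset, the same effect can be had by shifting the index after aggregating. A small sketch, assuming the same 12-sample series as above (shifted is an illustrative name):

```python
import numpy as np
import pandas as pd

rng = pd.date_range("2017-01-01", periods=12, freq="min")
ts = pd.Series(np.arange(12), index=rng)

# Aggregate first, then move every label forward by one second
shifted = ts.resample("5min").sum()
shifted.index = shifted.index + pd.Timedelta("1s")

print(shifted)
```

Only the labels move; the bin boundaries and the aggregated values are unchanged.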

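OHLC resampling

Finance has its own style of resampling, OHLC: the open (first), high (maximum), low (minimum), and close (last) value of each bin.

In [19]: ts.resample('5min').ohlc()
Out[19]:
                     open  high  low  close
2017-01-01 00:00:00     0     4    0      4
2017-01-01 00:05:00     5     9    5      9
2017-01-01 00:10:00    10    11   10     11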
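
To make the meaning of the four columns concrete, the sketch below rebuilds them from the ordinary first/max/min/last aggregations and compares the result with ohlc(). The name manual is just for illustration.

```python
import numpy as np
import pandas as pd

rng = pd.date_range("2017-01-01", periods=12, freq="min")
ts = pd.Series(np.arange(12), index=rng)

# Built-in OHLC aggregation
ohlc = ts.resample("5min").ohlc()

# The same four numbers computed one aggregation at a time
manual = ts.resample("5min").agg(["first", "max", "min", "last"])
manual.columns = ["open", "high", "low", "close"]

print(ohlc)
print((ohlc.values == manual.values).all())
```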

Resampling with groupby

In [20]: rng = pd.date_range('1/1/2017', periods=100, freq='D')

In [21]: ts = Series(np.arange(100), index=rng)

# Group by the month number of each timestamp
In [22]: ts.groupby(lambda x: x.month).mean()
Out[22]:
1    15.0
2    44.5
3    74.0
4    94.5
dtype: float64

In [23]: rng[0]
Out[23]: Timestamp('2017-01-01 00:00:00', offset='D')

In [24]: rng[0].month
Out[24]: 1

# Group by day of the week (0 = Monday, 6 = Sunday)
In [25]: ts.groupby(lambda x: x.weekday).mean()
Out[25]:
0    50.0
1    47.5
2    48.5
3    49.5
4    50.5
5    51.5
6    49.0
dtype: float64
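
One difference between the two approaches shows up only when the data spans more than one year: grouping by month number pools, say, every January together, while resample keeps each calendar month separate. A sketch with 400 days of made-up data (the variable names are illustrative):

```python
import numpy as np
import pandas as pd

# 400 daily values, running from 2017-01-01 into February 2018
rng = pd.date_range("2017-01-01", periods=400, freq="D")
ts = pd.Series(np.arange(400), index=rng)

# Grouping by month number: January 2017 and January 2018 share one group
by_month_number = ts.groupby(ts.index.month).mean()

# Resampling: one bin per calendar month, labeled with the month start
by_calendar_month = ts.resample("MS").mean()

print(len(by_month_number), len(by_calendar_month))
```

Here ts.index.month is used in place of the lambda; both produce the same group keys.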

6.2 Upsampling and Interpolation

Going from a low frequency to a high frequency introduces missing values, so they need to be filled or interpolated.

In [26]: frame = DataFrame(np.random.randn(2,4), index=pd.date_range('1/1/2017',
    ...: periods=2, freq='W-WED'), columns=['Colorda','Texas','NewYork','Ohio'])

In [27]: frame
Out[27]:
             Colorda     Texas   NewYork      Ohio
2017-01-04  1.666793 -0.478740 -0.544072  1.934226
2017-01-11 -0.407898  1.072648  1.079074 -2.922704

In [28]: df_daily = frame.resample('D')

In [30]: df_daily = frame.resample('D').mean()

In [31]: df_daily
Out[31]:
             Colorda     Texas   NewYork      Ohio
2017-01-04  1.666793 -0.478740 -0.544072  1.934226
2017-01-05       NaN       NaN       NaN       NaN
2017-01-06       NaN       NaN       NaN       NaN
2017-01-07       NaN       NaN       NaN       NaN
2017-01-08       NaN       NaN       NaN       NaN
2017-01-09       NaN       NaN       NaN       NaN
2017-01-10       NaN       NaN       NaN       NaN
2017-01-11 -0.407898  1.072648  1.079074 -2.922704

In [33]: frame.resample('D', fill_method='ffill')
C:\Users\yangfl\Anaconda3\Scripts\ipython-script.py:1: FutureWarning: fill_method is deprecated to .resample()
the new syntax is .resample(...).ffill()
  if __name__ == '__main__':
Out[33]:
             Colorda     Texas   NewYork      Ohio
2017-01-04  1.666793 -0.478740 -0.544072  1.934226
2017-01-05  1.666793 -0.478740 -0.544072  1.934226
2017-01-06  1.666793 -0.478740 -0.544072  1.934226
2017-01-07  1.666793 -0.478740 -0.544072  1.934226
2017-01-08  1.666793 -0.478740 -0.544072  1.934226
2017-01-09  1.666793 -0.478740 -0.544072  1.934226
2017-01-10  1.666793 -0.478740 -0.544072  1.934226
2017-01-11 -0.407898  1.072648  1.079074 -2.922704

In [34]: frame.resample('D', fill_method='ffill', limit=2)
C:\Users\yangfl\Anaconda3\Scripts\ipython-script.py:1: FutureWarning: fill_method is deprecated to .resample()
the new syntax is .resample(...).ffill(limit=2)
  if __name__ == '__main__':
Out[34]:
             Colorda     Texas   NewYork      Ohio
2017-01-04  1.666793 -0.478740 -0.544072  1.934226
2017-01-05  1.666793 -0.478740 -0.544072  1.934226
2017-01-06  1.666793 -0.478740 -0.544072  1.934226
2017-01-07       NaN       NaN       NaN       NaN
2017-01-08       NaN       NaN       NaN       NaN
2017-01-09       NaN       NaN       NaN       NaN
2017-01-10       NaN       NaN       NaN       NaN
2017-01-11 -0.407898  1.072648  1.079074 -2.922704

In [35]: frame.resample('W-THU', fill_method='ffill')
C:\Users\yangfl\Anaconda3\Scripts\ipython-script.py:1: FutureWarning: fill_method is deprecated to .resample()
the new syntax is .resample(...).ffill()
  if __name__ == '__main__':
Out[35]:
             Colorda     Texas   NewYork      Ohio
2017-01-05  1.666793 -0.478740 -0.544072  1.934226
2017-01-12 -0.407898  1.072648  1.079074 -2.922704

In [38]: frame.resample('W-THU').ffill()
Out[38]:
             Colorda     Texas   NewYork      Ohio
2017-01-05  1.666793 -0.478740 -0.544072  1.934226
2017-01-12 -0.407898  1.072648  1.079074 -2.922704
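
The syntax recommended by the FutureWarning can be sketched as follows, together with linear interpolation, which the heading mentions but the session does not show. The frame here is a small made-up one with easy numbers (columns a and b are hypothetical), so the filled values can be verified by eye.

```python
import numpy as np
import pandas as pd

# Two weekly observations, one week apart (Wednesdays)
idx = pd.date_range("2017-01-04", periods=2, freq="W-WED")
frame = pd.DataFrame({"a": np.array([0.0, 7.0]), "b": np.array([10.0, 3.0])}, index=idx)

# Upsample to daily without filling: the six new days are NaN
daily = frame.resample("D").asfreq()

# Forward-fill, but at most 2 consecutive rows after each observation
filled = frame.resample("D").ffill(limit=2)

# Linear interpolation between the two known rows
interp = daily.interpolate(method="linear")

print(daily)
print(filled)
print(interp)
```

asfreq() is used here instead of mean() because no aggregation takes place when upsampling; each new daily bin holds at most one original row.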

6.3 Resampling with Periods

# Create two years of random monthly data
In [41]: frame = DataFrame(np.random.randn(24,4), index=pd.date_range('1-2017',
    ...: '1-2019', freq='M'), columns=['Colorda','Texas','NewYork','Ohio'])

# Downsample to annual means
In [42]: a_frame = frame.resample('A-DEC').mean()

In [43]: a_frame
Out[43]:
             Colorda     Texas   NewYork      Ohio
2017-12-31 -0.441948 -0.040711  0.036633 -0.328769
2018-12-31 -0.121778  0.181043 -0.004376  0.085500

# Upsample the annual data to quarterly frequency, forward-filling the gaps
In [45]: a_frame.resample('Q-DEC').ffill()
Out[45]:
             Colorda     Texas   NewYork      Ohio
2017-12-31 -0.441948 -0.040711  0.036633 -0.328769
2018-03-31 -0.441948 -0.040711  0.036633 -0.328769
2018-06-30 -0.441948 -0.040711  0.036633 -0.328769
2018-09-30 -0.441948 -0.040711  0.036633 -0.328769
2018-12-31 -0.121778  0.181043 -0.004376  0.085500

# Downsample the monthly data to quarterly means
In [49]: frame.resample('Q-DEC').mean()
Out[49]:
             Colorda     Texas   NewYork      Ohio
2017-03-31 -0.445315  0.488191 -0.543567 -0.459284
2017-06-30 -0.157438 -0.680145  0.295301 -0.118013
2017-09-30 -0.151736  0.092512  0.684201 -0.035097
2017-12-31 -1.013302 -0.063404 -0.289404 -0.702681
2018-03-31  0.157538 -0.175134 -0.548305  0.609768
2018-06-30 -0.231697 -0.094108  0.224245 -0.151958
2018-09-30 -0.614219  0.308801 -0.205952  0.154302
2018-12-31  0.201266  0.684613  0.512506 -0.270111
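
The examples above are indexed by timestamps. For data indexed by periods, one possible approach (an assumption of this sketch, not something the session above does) is to map each monthly period to its quarter or year and group on that. The frame pframe and its column x are made up for the example.

```python
import numpy as np
import pandas as pd

# 24 monthly periods, 2017-01 through 2018-12, with values 0..23
idx = pd.period_range("2017-01", periods=24, freq="M")
pframe = pd.DataFrame({"x": np.arange(24, dtype=float)}, index=idx)

# Map each month to the quarter containing it (quarters ending in December)
quarter = pframe.index.asfreq("Q-DEC")
quarterly = pframe.groupby(quarter).mean()

# Annual means, grouping on the year of each period
yearly = pframe.groupby(pframe.index.year).mean()

print(quarterly)
print(yearly)
```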

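7. Plotting Time Series

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from pandas import Series, DataFrame

# 20 month-end observations of three random series
frame = DataFrame(np.random.randn(20, 3),
                  index=pd.date_range('1/1/2017', periods=20, freq='M'),
                  columns=['randn1', 'randn2', 'randn3'])
frame.plot()  # one line per column, with the dates on the x-axis
plt.show()    # needed when running outside an interactive session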

8. Moving Window Functions

To be continued...

9. Notes on Performance and Memory Usage

In [50]: rng = pd.date_range('1/1/2017', periods=10000000, freq='1s')

In [51]: ts = Series(np.random.randn(len(rng)), index=rng)

In [52]: %timeit ts.resample('15s').ohlc()
1 loop, best of 3: 222 ms per loop

In [53]: %timeit ts.resample('15min').ohlc()
10 loops, best of 3: 152 ms per loop

Performance appears to have dropped somewhat compared with before.
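
Outside IPython, where %timeit is unavailable, a comparable measurement can be sketched with the standard timeit module. The series here is deliberately much smaller (100,000 points rather than 10,000,000) so that it runs quickly, which means the numbers are not comparable with those above. The variable names are illustrative.

```python
import timeit

import numpy as np
import pandas as pd

n = 100_000  # far smaller than the 10,000,000 points used above
rng = pd.date_range("2017-01-01", periods=n, freq="s")
ts = pd.Series(np.random.randn(n), index=rng)

# Average wall-clock time of three runs per resampling rule
for rule in ("15s", "15min"):
    seconds = timeit.timeit(lambda: ts.resample(rule).ohlc(), number=3) / 3
    print(f"{rule}: {seconds * 1000:.1f} ms per run")

# Number of bars produced at each rule
bars_15s = ts.resample("15s").ohlc()
bars_15min = ts.resample("15min").ohlc()
print(len(bars_15s), len(bars_15min))
```

Timings depend on the machine and the pandas version, so no expected figures are given.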

Reposted from: https://www.cnblogs.com/felo/p/6426429.html
