[Algorithm Competition Learning] Meteorological and Ocean Prediction - Task1: Common Tools for Meteorological Data Analysis

Published: 2023/12/15

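Data in meteorological science usually has several dimensions. The data in this competition, for example, has four: year, month, longitude, and latitude. To make the data easy to read and manipulate, meteorological data is usually stored in netCDF files, which use the .nc extension.

Two data analysis libraries are commonly used for meteorological data stored in netCDF files: NetCDF4 and Xarray. In this task we learn the basic objects and operations of both libraries and the basic methods for reading and processing meteorological data with them.

Learning objectives

1. Understand the basic objects and operations of NetCDF4 and Xarray, and learn the basic methods for reading and processing meteorological data with these two libraries.

Contents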

  • NetCDF4
    • Creating, opening, and closing netCDF files
    • Groups
    • Dimensions
    • Variables
    • Attributes
    • Writing and reading variable data
    • Application
  • Xarray
    • Creating a DataArray
    • Indexing
    • Attributes
    • Computation
    • GroupBy
    • Plotting
    • Converting to and from Pandas objects
    • Dataset
    • Reading and writing netCDF files
    • Application

    NetCDF4

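    Official documentation

    NetCDF4 is the Python module for the NetCDF C library. It supports object types such as Groups, Dimensions, Variables, and Attributes, together with the operations on them.

    Installing NetCDF4

    !pip install netCDF4
    import netCDF4 as nc

    Creating, opening, and closing netCDF files

    NetCDF4 creates a netCDF file, or opens an existing one, through a call to Dataset, and the data_model attribute reports the file's format. Note that after creating or opening a file you must close it before calling Dataset on it again.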

    • Create a netCDF file
    from netCDF4 import Dataset

    # Dataset takes the file name, the mode ('w', 'r+' and 'a' are the writable modes), and the file format.
    # The format is passed by keyword because the third positional parameter of Dataset is clobber.
    test = Dataset('test.nc', 'w', format='NETCDF4')
    test

    <class 'netCDF4._netCDF4.Dataset'>
    root group (NETCDF4 data model, file format HDF5):
        dimensions(sizes):
        variables(dimensions):
        groups:
    • Open an existing netCDF file
    # The tutorial opens the test.nc file created above in place of the SODA training data
    soda = Dataset('test.nc')
    soda

    <class 'netCDF4._netCDF4.Dataset'>
    root group (NETCDF4 data model, file format HDF5):
        dimensions(sizes):
        variables(dimensions):
        groups:
    • Check the file format
    print(soda.data_model)

    NETCDF4
    • Close the netCDF file
    soda.close()

    Groups

    NetCDF4 supports organizing data in hierarchical groups, similar to directories in a file system. A group can contain Variables, Dimensions, and Attributes objects as well as other groups. Dataset creates a special group called the root group, analogous to the root directory, and every group created with Dataset.createGroup is contained in the root group.

    • Create groups
    # createGroup takes a string argument as the group name
    group1 = test.createGroup('group1')
    group2 = test.createGroup('group2')
    group1

    <class 'netCDF4._netCDF4.Group'>
    group /group1:
        dimensions(sizes):
        variables(dimensions):
        groups:
    • List all groups in the file
    # Returns a dictionary of groups
    test.groups

    OrderedDict([('group1',
                  <class 'netCDF4._netCDF4.Group'>
                  group /group1:
                      dimensions(sizes):
                      variables(dimensions):
                      groups: ),
                 ('group2',
                  <class 'netCDF4._netCDF4.Group'>
                  group /group2:
                      dimensions(sizes):
                      variables(dimensions):
                      groups: )])
    • Nested groups
    # Create one more group under each of group1 and group2
    group1_1 = test.createGroup('group1/group11')
    group2_1 = test.createGroup('group2/group21')
    test.groups

    OrderedDict([('group1',
                  <class 'netCDF4._netCDF4.Group'>
                  group /group1:
                      dimensions(sizes):
                      variables(dimensions):
                      groups: group11),
                 ('group2',
                  <class 'netCDF4._netCDF4.Group'>
                  group /group2:
                      dimensions(sizes):
                      variables(dimensions):
                      groups: group21)])

    test.groups.values()

    odict_values([<class 'netCDF4._netCDF4.Group'>
    group /group1:
        dimensions(sizes):
        variables(dimensions):
        groups: group11, <class 'netCDF4._netCDF4.Group'>
    group /group2:
        dimensions(sizes):
        variables(dimensions):
        groups: group21])
    • Walk through all groups
    # Define a generator function that walks the whole group tree
    def walktree(top):
        values = top.groups.values()
        yield values
        for value in top.groups.values():
            for children in walktree(value):
                yield children

    for groups in walktree(test):
        for group in groups:
            print(group)

    <class 'netCDF4._netCDF4.Group'>
    group /group1:
        dimensions(sizes):
        variables(dimensions):
        groups: group11
    <class 'netCDF4._netCDF4.Group'>
    group /group2:
        dimensions(sizes):
        variables(dimensions):
        groups: group21
    <class 'netCDF4._netCDF4.Group'>
    group /group1/group11:
        dimensions(sizes):
        variables(dimensions):
        groups:
    <class 'netCDF4._netCDF4.Group'>
    group /group2/group21:
        dimensions(sizes):
        variables(dimensions):
        groups:

    Dimensions

    NetCDF4 uses dimensions to define the size of each variable. For example, month, the second dimension of the training samples in this competition, is a dimension object. Each sample contains 36 months of data, so a variable defined on the month dimension has size 36 along it. Variables are defined on dimensions, so the dimensions a variable uses must be created before the variable itself.

    • Create dimensions

    Dataset.createDimension takes two arguments: the dimension name and the dimension size. Setting the size to None or 0 makes the dimension unlimited.

    # Create unlimited dimensions
    level = test.createDimension('level', None)
    time = test.createDimension('time', None)
    # Create fixed-size dimensions
    lat = test.createDimension('lat', 180)
    lon = test.createDimension('lon', 360)
    • List dimensions
    test.dimensions

    OrderedDict([('level',
                  <class 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'level', size = 0),
                 ('time',
                  <class 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'time', size = 0),
                 ('lat',
                  <class 'netCDF4._netCDF4.Dimension'>: name = 'lat', size = 180),
                 ('lon',
                  <class 'netCDF4._netCDF4.Dimension'>: name = 'lon', size = 360)])
    • Check dimension sizes
    # Check the size of a dimension
    print(len(lon))

    360

    # Dimension objects are stored in a dictionary
    print(level)

    <class 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'level', size = 0

    # Check whether a dimension is unlimited
    print(time.isunlimited())
    print(lat.isunlimited())

    True
    False

    Variables

    A NetCDF4 Variables object is similar to a multidimensional NumPy array. The difference is that a NetCDF4 variable can be defined on unlimited dimensions.

    • Create variables

    Dataset.createVariable takes the following arguments: the variable name, the variable's data type, and the dimensions the variable is defined on.

    Valid data types for variables include 'f4' (32-bit floating point), 'f8' (64-bit floating point), 'i1' (8-bit signed integer), 'i2' (16-bit signed integer), 'i4' (32-bit signed integer), 'i8' (64-bit signed integer), 'u1' (8-bit unsigned integer), 'u2' (16-bit unsigned integer), 'u4' (32-bit unsigned integer), 'u8' (64-bit unsigned integer), and 'S1' (single character).

    # Create variables on a single dimension
    times = test.createVariable('time', 'f8', ('time',))
    levels = test.createVariable('level', 'i4', ('level',))
    lats = test.createVariable('lat', 'f4', ('lat',))
    lons = test.createVariable('lon', 'f4', ('lon',))

    # Create a variable on several dimensions
    temp = test.createVariable('temp', 'f4', ('time', 'level', 'lat', 'lon'))
    times

    <class 'netCDF4._netCDF4.Variable'>
    float64 time(time)
    unlimited dimensions: time
    current shape = (0,)
    filling on, default _FillValue of 9.969209968386869e+36 used

    levels

    <class 'netCDF4._netCDF4.Variable'>
    int32 level(level)
    unlimited dimensions: level
    current shape = (0,)
    filling on, default _FillValue of -2147483647 used
    • Inspect variables
    print(temp)

    <class 'netCDF4._netCDF4.Variable'>
    float32 temp(time, level, lat, lon)
    unlimited dimensions: time, level
    current shape = (0, 0, 180, 360)
    filling on, default _FillValue of 9.969209968386869e+36 used

    # Create a variable inside a group by giving its path
    ftemp = test.createVariable('/group1/group11/ftemp', 'f8', ('time', 'level', 'lat', 'lon'))
    ftemp

    <class 'netCDF4._netCDF4.Variable'>
    float64 ftemp(time, level, lat, lon)
    path = /group1/group11
    unlimited dimensions: time, level
    current shape = (0, 0, 180, 360)
    filling on, default _FillValue of 9.969209968386869e+36 used

    # A variable can be looked up by its path
    print(test['/group1/group11/ftemp'])

    <class 'netCDF4._netCDF4.Variable'>
    float64 ftemp(time, level, lat, lon)
    path = /group1/group11
    unlimited dimensions: time, level
    current shape = (0, 0, 180, 360)
    filling on, default _FillValue of 9.969209968386869e+36 used

    print(test['/group1/group11'])

    <class 'netCDF4._netCDF4.Group'>
    group /group1/group11:
        dimensions(sizes):
        variables(dimensions): float64 ftemp(time, level, lat, lon)
        groups:

    # List all variables in the file
    print(test.variables)

    OrderedDict([('time', <class 'netCDF4._netCDF4.Variable'>
    float64 time(time)
    unlimited dimensions: time
    current shape = (0,)
    filling on, default _FillValue of 9.969209968386869e+36 used), ('level', <class 'netCDF4._netCDF4.Variable'>
    int32 level(level)
    unlimited dimensions: level
    current shape = (0,)
    filling on, default _FillValue of -2147483647 used), ('lat', <class 'netCDF4._netCDF4.Variable'>
    float32 lat(lat)
    unlimited dimensions:
    current shape = (180,)
    filling on, default _FillValue of 9.969209968386869e+36 used), ('lon', <class 'netCDF4._netCDF4.Variable'>
    float32 lon(lon)
    unlimited dimensions:
    current shape = (360,)
    filling on, default _FillValue of 9.969209968386869e+36 used), ('temp', <class 'netCDF4._netCDF4.Variable'>
    float32 temp(time, level, lat, lon)
    unlimited dimensions: time, level
    current shape = (0, 0, 180, 360)
    filling on, default _FillValue of 9.969209968386869e+36 used)])

    Attributes

    Attributes objects store descriptive information about the file or its variables. A netCDF file has two kinds of attributes: global attributes and variable attributes. Global attributes describe a group or the file as a whole, and variable attributes describe a Variables object. Attribute names are up to you; description, history, and the others in the example below are all user-defined names.

    # Note: this import rebinds the name time, which earlier referred to the time dimension object
    import time

    # Set a description of the file
    test.description = 'bogus example script'
    # Set the file's history
    test.history = 'Created' + time.ctime(time.time())
    # Set the file's source
    test.source = 'netCDF4 python module tutorial'
    # Set variable attributes
    lats.units = 'degrees north'
    lons.units = 'degrees east'
    levels.units = 'hPa'
    temp.units = 'K'
    times.units = 'hours since 0001-01-01 00:00:00.0'
    times.calendar = 'gregorian'
    # List the names of the file attributes
    print(test.ncattrs())
    # List the names of the variable attributes
    print(test['lat'].ncattrs())
    print(test['time'].ncattrs())

    ['description', 'history', 'source']
    ['units']
    ['units', 'calendar']

    # Show the file attributes
    for name in test.ncattrs():
        print('Global attr {} = {}'.format(name, getattr(test, name)))

    Global attr description = bogus example script
    Global attr history = CreatedTue Jan 11 19:47:20 2022
    Global attr source = netCDF4 python module tutorial

    Writing and reading variable data

    As with arrays, data can be written to or read from a variable by slicing.

    • Write data to a variable
    from numpy.random import uniform

    nlats = len(test.dimensions['lat'])
    nlons = len(test.dimensions['lon'])
    print('temp shape before adding data = {}'.format(temp.shape))

    temp shape before adding data = (0, 0, 180, 360)

    # An unlimited dimension grows automatically to fit the data written to it
    temp[0:5, 0:10, :, :] = uniform(size=(5, 10, nlats, nlons))
    print('temp shape after adding data = {}'.format(temp.shape))

    temp shape after adding data = (5, 10, 180, 360)

    print('levels shape after adding pressure data = {}'.format(levels.shape))

    levels shape after adding pressure data = (10,)
    • Read data from a variable
    print(temp[1, 5, 100, 200])

    0.13303915

    print(temp[1, 5, 10:20, 100:110].shape)

    (10, 10)

    # Slices can use the start:stop:step form
    print(temp[1, 5, 10, 100:110:2])

    [0.08290233 0.44888723 0.11997929 0.7889917  0.17327116]

    Application

    Let's try using NetCDF4 to work with the SODA data from the training samples.

    # Open the SODA file (the listing below was run on test.nc)
    soda = Dataset('test.nc')
    # Check the file format
    print('SODA file format:', soda.data_model)
    # Show the objects contained in the file
    print(soda)

    SODA file format: NETCDF4
    <class 'netCDF4._netCDF4.Dataset'>
    root group (NETCDF4 data model, file format HDF5):
        description: bogus example script
        history: CreatedTue Jan 11 19:47:20 2022
        source: netCDF4 python module tutorial
        dimensions(sizes): level(10), time(5), lat(180), lon(360)
        variables(dimensions): float64 time(time), int32 level(level), float32 lat(lat), float32 lon(lon), float32 temp(time, level, lat, lon)
        groups: group1, group2

    # Show the dimensions and variables
    print(soda.dimensions)
    print(soda.variables)

    OrderedDict([('level', <class 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'level', size = 10), ('time', <class 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'time', size = 5), ('lat', <class 'netCDF4._netCDF4.Dimension'>: name = 'lat', size = 180), ('lon', <class 'netCDF4._netCDF4.Dimension'>: name = 'lon', size = 360)])
    OrderedDict([('time', <class 'netCDF4._netCDF4.Variable'>
    float64 time(time)
        units: hours since 0001-01-01 00:00:00.0
        calendar: gregorian
    unlimited dimensions: time
    current shape = (5,)
    filling on, default _FillValue of 9.969209968386869e+36 used), ('level', <class 'netCDF4._netCDF4.Variable'>
    int32 level(level)
        units: hPa
    unlimited dimensions: level
    current shape = (10,)
    filling on, default _FillValue of -2147483647 used), ('lat', <class 'netCDF4._netCDF4.Variable'>
    float32 lat(lat)
        units: degrees north
    unlimited dimensions:
    current shape = (180,)
    filling on, default _FillValue of 9.969209968386869e+36 used), ('lon', <class 'netCDF4._netCDF4.Variable'>
    float32 lon(lon)
        units: degrees east
    unlimited dimensions:
    current shape = (360,)
    filling on, default _FillValue of 9.969209968386869e+36 used), ('temp', <class 'netCDF4._netCDF4.Variable'>
    float32 temp(time, level, lat, lon)
        units: K
    unlimited dimensions: time, level
    current shape = (5, 10, 180, 360)
    filling on, default _FillValue of 9.969209968386869e+36 used)])

    The listing above was produced from the test.nc file created earlier, so it shows that file's dimensions and variables. The SODA file itself contains four dimensions, year, month, lat, and lon, with sizes 100, 36, 24, and 72, and four variables, sst, t300, ua, and va, each defined on the (year, month, lat, lon) dimensions.

    # Read the data in each variable
    soda_sst = soda['level'][:]
    print(soda_sst[1])

    soda_t300 = soda['temp'][:]
    print(soda_t300[1, 2, 12:24, 36])

    # soda_ua = soda['ua'][:]
    # print(soda_ua[1, 2, 12:24:2, 36:38])

    # soda_va = soda['va'][:]
    # print(soda_va[5:10, 0:12, 12, 36])

    # Close the file
    soda.close()

    --
    [0.31591734 0.59989274 0.44380635 0.7643542  0.5319885  0.78863186
     0.97203755 0.4588462  0.19999161 0.9740341  0.65341175 0.9087504 ]

    Xarray

    Official documentation

    Xarray is an open-source Python library that adds dimension names, coordinates, and attributes as labels on top of NumPy-like multidimensional arrays and lets you operate on the data using those label names directly. It can read and write netCDF files and supports further data analysis and visualization.

    Xarray has two basic data structures, DataArray and Dataset, both built on multidimensional arrays. DataArray implements the labeling, and Dataset is a dictionary-like container of DataArray objects.

    Installing Xarray requires the following dependencies:

    • Python (3.7+)
    • setuptools (40.4+)
    • Numpy (1.17+)
    • Pandas (1.0+)
    !pip install xarray

    Defaulting to user installation because normal site-packages is not writeable
    Looking in indexes: https://mirrors.aliyun.com/pypi/simple
    Collecting xarray
      Downloading https://mirrors.aliyun.com/pypi/packages/10/6f/9aa15b1f9001593d51a0e417a8ad2127ef384d08129a0720b3599133c1ed/xarray-0.16.2-py3-none-any.whl (736 kB)
    Requirement already satisfied: setuptools>=38.4 in /opt/conda/lib/python3.6/site-packages (from xarray) (51.1.1)
    Requirement already satisfied: numpy>=1.15 in /opt/conda/lib/python3.6/site-packages (from xarray) (1.19.4)
    Requirement already satisfied: pandas>=0.25 in /opt/conda/lib/python3.6/site-packages (from xarray) (1.1.5)
    Requirement already satisfied: pytz>=2017.2 in /opt/conda/lib/python3.6/site-packages (from pandas>=0.25->xarray) (2020.5)
    Requirement already satisfied: python-dateutil>=2.7.3 in /opt/conda/lib/python3.6/site-packages (from pandas>=0.25->xarray) (2.8.1)
    Requirement already satisfied: six>=1.5 in /opt/conda/lib/python3.6/site-packages (from python-dateutil>=2.7.3->pandas>=0.25->xarray) (1.15.0)
    Installing collected packages: xarray
    Successfully installed xarray-0.16.2

    import numpy as np
    import pandas as pd
    import xarray as xr

    Creating a DataArray

    xr.DataArray takes three input arguments: the array, the dimensions, and the coordinates. The dimensions are the names of the array's axes, and the coordinates are a dictionary that assigns coordinate labels to dimensions.

    # Create a 2x3 array, name its dimensions 'x' and 'y', and give the 'x' dimension the coordinate labels 10 and 20
    data = xr.DataArray(np.random.randn(2, 3), dims=('x', 'y'), coords={'x': [10, 20]})
    data

    # View the data
    print(data.values)

    # View the dimensions
    print(data.dims)

    # View the coordinates
    print(data.coords)

    # The data.attrs dictionary can store arbitrary metadata
    print(data.attrs)

    [[-1.89477837 -0.58997363 -1.77758946]
     [-0.21793173  0.77616912  0.45868184]]
    ('x', 'y')
    Coordinates:
      * x        (x) int64 10 20
    {}
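    Coordinates can be attached to every dimension, and a DataArray can also carry a name. The sketch below uses fixed values rather than random ones so the structure is easy to inspect; the name demo and the y labels are illustrative.

```python
import numpy as np
import xarray as xr

# A 2x3 array with labels on both dimensions and a variable name.
labeled = xr.DataArray(
    np.arange(6.0).reshape(2, 3),
    dims=("x", "y"),
    coords={"x": [10, 20], "y": ["a", "b", "c"]},
    name="demo",
)

shape = labeled.shape
dims = labeled.dims
y_labels = labeled.coords["y"].values.tolist()

print(labeled.name, shape, dims, y_labels)
```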

    Indexing

    Xarray supports four kinds of indexing.

    # Index by position, as in numpy
    print(data[0, :], '\n')

    # Index by coordinate label
    print(data.loc[10], '\n')

    # Index by dimension name and position; isel stands for "integer select"
    print(data.isel(x=0), '\n')

    # Index by dimension name and coordinate label; sel stands for "select"
    print(data.sel(x=10), '\n')

    <xarray.DataArray (y: 3)>
    array([-1.89477837, -0.58997363, -1.77758946])
    Coordinates:
        x        int64 10
    Dimensions without coordinates: y

    <xarray.DataArray (y: 3)>
    array([-1.89477837, -0.58997363, -1.77758946])
    Coordinates:
        x        int64 10
    Dimensions without coordinates: y

    <xarray.DataArray (y: 3)>
    array([-1.89477837, -0.58997363, -1.77758946])
    Coordinates:
        x        int64 10
    Dimensions without coordinates: y

    <xarray.DataArray (y: 3)>
    array([-1.89477837, -0.58997363, -1.77758946])
    Coordinates:
        x        int64 10
    Dimensions without coordinates: y
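    Label-based selection with sel also accepts ranges and approximate matches, which is useful for coordinates such as latitude. A short sketch on a deterministic array follows; the coordinate values are illustrative.

```python
import numpy as np
import xarray as xr

grid = xr.DataArray(
    np.arange(12.0).reshape(4, 3),
    dims=("x", "y"),
    coords={"x": [10, 20, 30, 40]},
)

# Label slices include both end points, unlike positional slices.
block = grid.sel(x=slice(20, 30))

# method="nearest" snaps to the closest existing label (27 -> 30).
nearest = grid.sel(x=27, method="nearest")

# The same row selected by integer position.
same_row = grid.isel(x=2)

print(block.x.values, nearest.x.item(), nearest.values)
```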

    Attributes

    As in NetCDF4, Xarray supports user-defined attributes on a DataArray or on its labels.

    # Set attributes on the DataArray
    data.attrs['long_name'] = 'random velocity'
    data.attrs['units'] = 'metres/sec'
    data.attrs['description'] = 'A random variable created as an example'
    data.attrs['random_attribute'] = 123
    # View the attributes
    print(data.attrs)

    {'long_name': 'random velocity', 'units': 'metres/sec', 'description': 'A random variable created as an example', 'random_attribute': 123}

    # Set attributes on a dimension label
    data.x.attrs['units'] = 'x units'
    print('Attributes of x dimension:', data.x.attrs, '\n')

    Attributes of x dimension: {'units': 'x units'}

    Computation

    DataArray arithmetic works much like that of a numpy ndarray.

    data + 10
    data.T
    data.sum()

    Aggregations can use dimension names directly.

    data.mean(dim='x')

    Operations between DataArrays broadcast by dimension name.

    a = xr.DataArray(np.random.randn(3), [data.coords['y']])
    b = xr.DataArray(np.random.randn(4), dims='z')
    print(a, '\n')
    print(b, '\n')
    print(a+b, '\n')

    <xarray.DataArray (y: 3)>
    array([0.04405523, 0.36823828, 0.38351121])
    Coordinates:
      * y        (y) int64 0 1 2

    <xarray.DataArray (z: 4)>
    array([ 0.62771044, -0.41870179, -1.38038185, -0.19742089])
    Dimensions without coordinates: z

    <xarray.DataArray (y: 3, z: 4)>
    array([[ 0.67176567, -0.37464656, -1.33632661, -0.15336566],
           [ 0.99594872, -0.05046351, -1.01214356,  0.17081739],
           [ 1.01122165, -0.03519058, -0.99687063,  0.18609032]])
    Coordinates:
      * y        (y) int64 0 1 2
    Dimensions without coordinates: z

    data - data.T
    data[:-1] - data[:1]
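    Broadcasting by name means two one-dimensional arrays on different dimensions combine into a two-dimensional result without any manual reshaping. The sketch below uses small fixed values so the result can be verified by hand; the variable names are illustrative.

```python
import xarray as xr

row_offsets = xr.DataArray([0.0, 10.0], dims="x")
col_offsets = xr.DataArray([1.0, 2.0, 3.0], dims="y")

# The operands share no dimension, so the result spans both: (x: 2, y: 3).
grid = row_offsets + col_offsets

# Aggregating by name removes only the named dimension.
col_means = grid.mean(dim="x")

print(grid.dims, grid.values.tolist())
print(col_means.values.tolist())
```

    The equivalent NumPy code would need an explicit reshape or np.newaxis to line up the axes.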

    GroupBy

    Xarray supports grouped operations with a Pandas-like API.

    labels = xr.DataArray(['E', 'F', 'E'], [data.coords['y']], name='labels')
    labels

    # Align data's y coordinate with labels, then group by the values of labels and take the mean
    data.groupby(labels).mean('y')

    # Group data along y by labels, then subtract each group's minimum
    data.groupby(labels).map(lambda x: x - x.min())
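    Another way to group, sketched below, is to attach the labels to the array as an extra coordinate along y and then group by that coordinate's name. The values are fixed so the group means can be checked by hand; this is an illustrative variant, not the tutorial's own code.

```python
import numpy as np
import xarray as xr

values = xr.DataArray(
    [[1.0, 2.0, 3.0], [4.0, 5.0, 9.0]],
    dims=("x", "y"),
    # "labels" is a non-index coordinate that tags each y position.
    coords={"x": [10, 20], "labels": ("y", ["E", "F", "E"])},
)

# Columns 0 and 2 belong to group E, column 1 to group F.
group_means = values.groupby("labels").mean("y")

e_mean = group_means.sel(labels="E").values
f_mean = group_means.sel(labels="F").values

print(e_mean, f_mean)
```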

    Plotting

    Xarray supports simple, convenient visualization. Only a brief introduction is given here; interested readers can explore the other plotting methods on their own.

    %matplotlib inline
    data.plot()

    <matplotlib.collections.QuadMesh at 0x7f7ea4327828>

    [Figure: the plot produced by data.plot() (Task1_files/Task1_116_1.png); the image was not preserved]

    Converting to and from Pandas objects

    Xarray objects convert easily to a Pandas Series or DataFrame, and Pandas objects convert back to Xarray.

    # Convert to a Pandas Series
    series = data.to_series()
    series

    x   y
    10  0   -1.894778
        1   -0.589974
        2   -1.777589
    20  0   -0.217932
        1    0.776169
        2    0.458682
    dtype: float64

    # Convert the Series back to Xarray
    series.to_xarray()

    # Convert to a Pandas DataFrame
    df = data.to_dataframe(name='colname')
    df

            colname
    x  y
    10 0  -1.894778
       1  -0.589974
       2  -1.777589
    20 0  -0.217932
       1   0.776169
       2   0.458682

    # Convert the DataFrame back to Xarray
    xr.Dataset.from_dataframe(df)
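    A round trip through Pandas should give back the same values and dimension names. The sketch below checks this on a small deterministic array, with coordinates on both dimensions so that the MultiIndex levels are explicit. The name value is illustrative.

```python
import numpy as np
import xarray as xr

original = xr.DataArray(
    np.arange(6.0).reshape(2, 3),
    dims=("x", "y"),
    coords={"x": [10, 20], "y": [0, 1, 2]},
    name="value",
)

# The Series is indexed by a MultiIndex with one level per dimension.
series = original.to_series()
index_names = list(series.index.names)
one_value = series.loc[(20, 1)]

# Converting back rebuilds the dimensions from the index levels.
restored = series.to_xarray()

print(index_names, one_value, restored.dims)
```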

    Dataset

    A Dataset is a dictionary-like container of DataArray objects and can be viewed as a DataFrame with a multidimensional structure. Compared with the Dataset in the NetCDF4 library, the two serve a similar purpose: both are containers that hold other objects.

    # Create a Dataset containing three DataArrays
    ds = xr.Dataset({'foo': data, 'bar': ('x', [1, 2]), 'baz': np.pi})
    ds

    A DataArray can be accessed either dictionary-style or with dot (attribute) access, but assignment only works with the dictionary style.

    # Access a DataArray dictionary-style
    print(ds['foo'], '\n')

    # Access a DataArray with dot access
    print(ds.foo)
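    Dictionary-style assignment is how a new variable is added to an existing Dataset. A minimal sketch, using fixed values and an illustrative variable name foo_doubled:

```python
import numpy as np
import xarray as xr

foo = xr.DataArray(
    np.arange(6.0).reshape(2, 3),
    dims=("x", "y"),
    coords={"x": [10, 20]},
)
demo_ds = xr.Dataset({"foo": foo, "bar": ("x", [1, 2])})

# Dictionary-style assignment adds a new data variable.
demo_ds["foo_doubled"] = demo_ds["foo"] * 2

# Dot access and dictionary access return the same data.
same_data = demo_ds.foo.equals(demo_ds["foo"])
variable_names = sorted(demo_ds.data_vars)

print(variable_names, same_data)
```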

    Reading and writing netCDF files

    # Write to a netCDF file
    ds.to_netcdf('xarray_test.nc')

    # Read an existing netCDF file
    xr.open_dataset('xarray_test.nc')
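    A quick way to confirm that a file was written correctly is to read it back and compare. The sketch below does this with a temporary file; it assumes a netCDF backend such as the netCDF4 package is installed, and the file name is illustrative.

```python
import os
import tempfile

import numpy as np
import xarray as xr

path = os.path.join(tempfile.mkdtemp(), "roundtrip.nc")

written = xr.Dataset(
    {"foo": (("x", "y"), np.arange(6.0).reshape(2, 3))},
    coords={"x": [10, 20]},
)
written.to_netcdf(path)

# open_dataset reads lazily, so load the values before the file is closed.
with xr.open_dataset(path) as on_disk:
    loaded = on_disk.load()

print(loaded["foo"].values)
```

    Opening inside a with-block releases the file handle afterwards, so the same path can be overwritten later without errors.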

    Application

    Let's try using Xarray to work with the SODA data from the training samples.

    # Open the SODA file
    soda = xr.open_dataset('SODA_train.nc')
    # View the file attributes
    print(soda.attrs)
    # View the objects contained in the file
    print(soda)

    {}
    <xarray.Dataset>
    Dimensions:  (lat: 24, lon: 72, month: 36, year: 100)
    Coordinates:
      * year     (year) int32 1 2 3 4 5 6 7 8 9 10 ... 92 93 94 95 96 97 98 99 100
      * month    (month) int32 1 2 3 4 5 6 7 8 9 10 ... 28 29 30 31 32 33 34 35 36
      * lat      (lat) float64 -55.0 -50.0 -45.0 -40.0 -35.0 ... 45.0 50.0 55.0 60.0
      * lon      (lon) float64 0.0 5.0 10.0 15.0 20.0 ... 340.0 345.0 350.0 355.0
    Data variables:
        sst      (year, month, lat, lon) float32 ...
        t300     (year, month, lat, lon) float32 ...
        ua       (year, month, lat, lon) float64 ...
        va       (year, month, lat, lon) float64 ...

    # View the dimensions and coordinates
    print(soda.dims)
    print(soda.coords)

    Frozen(SortedKeysDict({'year': 100, 'month': 36, 'lat': 24, 'lon': 72}))
    Coordinates:
      * year     (year) int32 1 2 3 4 5 6 7 8 9 10 ... 92 93 94 95 96 97 98 99 100
      * month    (month) int32 1 2 3 4 5 6 7 8 9 10 ... 28 29 30 31 32 33 34 35 36
      * lat      (lat) float64 -55.0 -50.0 -45.0 -40.0 -35.0 ... 45.0 50.0 55.0 60.0
      * lon      (lon) float64 0.0 5.0 10.0 15.0 20.0 ... 340.0 345.0 350.0 355.0

    # Read the data
    soda_sst = soda['sst']
    print(soda_sst[1, 1, 1, 1], '\n')

    soda_t300 = soda['t300']
    print(soda_t300[1, 2, 12:24, 36], '\n')

    soda_ua = soda['ua']
    print(soda_ua[1, 2, 12:24:2, 36:38], '\n')

    soda_va = soda['va']
    print(soda_va[5:10, 0:12, 12, 36])

    <xarray.DataArray 'sst' ()>
    array(0.549156, dtype=float32)
    Coordinates:
        year     int32 2
        month    int32 2
        lat      float64 -50.0
        lon      float64 5.0

    <xarray.DataArray 't300' (lat: 12)>
    array([ 0.350308, -0.271906, -0.394029,  0.534374,  0.378115,  0.371367,
            0.082296,  0.754251,  0.682577,  0.147856,  0.220678,  0.574088],
          dtype=float32)
    Coordinates:
        year     int32 2
        month    int32 3
      * lat      (lat) float64 5.0 10.0 15.0 20.0 25.0 ... 40.0 45.0 50.0 55.0 60.0
        lon      float64 180.0

    <xarray.DataArray 'ua' (lat: 6, lon: 2)>
    array([[ 1.222841,  1.084187],
           [-0.106073, -0.286916],
           [-0.983318, -0.892802],
           [-1.157512, -1.04381 ],
           [ 1.443658,  1.275039],
           [ 2.179182,  1.776857]])
    Coordinates:
        year     int32 2
        month    int32 3
      * lat      (lat) float64 5.0 15.0 25.0 35.0 45.0 55.0
      * lon      (lon) float64 180.0 185.0

    <xarray.DataArray 'va' (year: 5, month: 12)>
    array([[ 0.875687,  0.640397,  1.346922,  0.532989,  0.985298,  1.02812 ,
             0.853269,  0.746913,  0.289339, -0.401898, -0.832116, -0.432147],
           [ 0.040508,  0.157661, -0.734164, -0.706849, -0.567588,  0.104219,
             0.588996,  0.224966, -0.252701, -0.519716, -1.152297, -1.315635],
           [-1.742571, -2.09365 , -3.080663, -2.863212, -1.135314,  0.053631,
             0.513007,  1.139938,  1.030276,  1.018402,  0.882338,  2.161939],
           [ 1.876133,  1.298197,  0.912559,  0.072299, -0.547984,  0.95893 ,
             1.205327,  0.956807,  0.993742,  0.75878 ,  0.690233,  0.910672],
           [ 0.564618, -0.047889,  0.537964,  0.341526, -0.142936, -0.160385,
             0.36168 ,  0.315495,  0.51516 ,  0.513514,  0.066542,  0.423261]])
    Coordinates:
      * year     (year) int32 6 7 8 9 10
      * month    (month) int32 1 2 3 4 5 6 7 8 9 10 11 12
        lat      float64 5.0
        lon      float64 180.0
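    The positional slices used above can also be written with coordinate labels, which reads closer to the physical meaning. The sketch below demonstrates the equivalence on a synthetic stand-in that copies the SODA coordinate layout but holds random values and only 5 years instead of 100 (both are assumptions made to keep the example self-contained):

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the sst variable; values are random, not real data.
years = np.arange(1, 6)               # the real file has years 1..100
months = np.arange(1, 37)             # 1..36
lats = np.arange(-55.0, 65.0, 5.0)    # 24 values: -55.0 ... 60.0
lons = np.arange(0.0, 360.0, 5.0)     # 72 values: 0.0 ... 355.0

rng = np.random.default_rng(0)
sst = xr.DataArray(
    rng.standard_normal((5, 36, 24, 72)).astype("float32"),
    dims=("year", "month", "lat", "lon"),
    coords={"year": years, "month": months, "lat": lats, "lon": lons},
    name="sst",
)

# Positional indexing, as in the tutorial.
by_position = sst[1, 2, 12:24, 36]

# The same selection by label: year 2, month 3, lat 5..60, lon 180.
by_label = sst.sel(year=2, month=3, lat=slice(5, 60), lon=180)

# Averaging over named dimensions gives one value per (year, month).
area_mean = sst.mean(dim=("lat", "lon"))

print(by_label.equals(by_position), area_mean.shape)
```

    Selecting by label keeps working if the grid resolution or extent changes, whereas hard-coded positions such as 12:24 would then point at different latitudes.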

    Homework

    Basic:

    1. Use NetCDF4 and Xarray to work with the competition data and get a basic understanding of it.

    Advanced:

    2. Use Xarray to explore and visualize the training data.
