Processing data in Python via Hive TRANSFORM

Published: 2024/1/17
Here is a simple example I wrote for de-duplicating topic descriptions. The desc field in the table holds values such as "a-b-a-b-b-c", and the repeated items need to be removed.

The Python code is as follows:

    #!/usr/bin/python
    import sys


    def quchong(desc):
        # Remove duplicate items from a '-'-separated string, keeping first-seen order.
        seen = set()
        result = []
        for item in desc.split('-'):
            if item not in seen:
                seen.add(item)
                result.append(item)
        return '-'.join(result)


    # Hive pipes each row to stdin as one tab-separated line.
    for line in sys.stdin:
        line = line.rstrip('\n')
        # your process code here
        parts = line.split('\t')
        parts[2] = quchong(parts[2])
        print("\t".join(parts))

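The row-level logic can be checked without a Hive cluster by running it over in-memory lines. The following is a minimal sketch under stated assumptions: `dedupe_desc` and `transform_lines` are hypothetical names, the sample rows are invented for illustration, and keeping the first-seen order of items is an assumption about the desired result (a plain set does not guarantee any order).

```python
def dedupe_desc(desc, sep="-"):
    """Drop repeated items from a separator-joined string, keeping first-seen order."""
    seen = set()
    kept = []
    for item in desc.split(sep):
        if item not in seen:
            seen.add(item)
            kept.append(item)
    return sep.join(kept)


def transform_lines(lines, column=2):
    """Apply dedupe_desc to one tab-separated column of each input line."""
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        parts[column] = dedupe_desc(parts[column])
        yield "\t".join(parts)


# Hypothetical sample rows standing in for what Hive would pipe to stdin.
sample = ["1\ttopic\ta-b-a-b-b-c\n", "2\tother\tx-x-y\n"]
for row in transform_lines(sample):
    print(row)
```

Keeping the per-row logic in a generator like this makes it possible to feed it `sys.stdin` in the real script and a plain list in a test.
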
What follows is reposted from another source and is more detailed.

II. Incrementing fields in a Hive map column (reposted)

1. Create the table

    hive> CREATE TABLE t3 (foo STRING, bar MAP<STRING,INT>)
        > ROW FORMAT DELIMITED
        > FIELDS TERMINATED BY '\t'
        > COLLECTION ITEMS TERMINATED BY ','
        > MAP KEYS TERMINATED BY ':'
        > STORED AS TEXTFILE;
    OK

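The three delimiters in the table definition decide how a raw text line is split into columns and map entries. The sketch below imitates that splitting in Python for illustration only; it is not Hive's implementation, and `parse_map` and `parse_row` are hypothetical helper names.

```python
def parse_map(text, item_sep=",", key_sep=":"):
    """Split text like 'k1:1,k2:2' into a dict with int values."""
    result = {}
    if not text:
        return result
    for item in text.split(item_sep):
        key, value = item.split(key_sep, 1)
        result[key] = int(value)
    return result


def parse_row(line, field_sep="\t"):
    """Split one raw line into the string column and the map column."""
    foo, bar = line.rstrip("\n").split(field_sep)
    return foo, parse_map(bar)


print(parse_row("jeffgeng\tclick:13,uid:15\n"))
```
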
2. The resulting table

    hive> describe t3;
    OK
    foo     string
    bar     map<string,int>

3. Create test.txt (the two fields are separated by a tab character)

    jeffgeng        click:13,uid:15

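Typing the file by hand makes it easy to end up with spaces where the table expects a tab. A small sketch that writes the row programmatically is shown here; `format_row` and `write_rows` are hypothetical helpers, and the file goes to a temporary directory in this example rather than to the working directory used in the walkthrough.

```python
import os
import tempfile


def format_row(foo, bar):
    """Render one row: foo, a tab, then the map as 'key:value' pairs joined by commas."""
    pairs = ",".join("%s:%d" % (k, v) for k, v in bar.items())
    return "%s\t%s\n" % (foo, pairs)


def write_rows(path, rows):
    """Write (foo, bar_dict) tuples to path, one formatted row per line."""
    with open(path, "w") as handle:
        for foo, bar in rows:
            handle.write(format_row(foo, bar))


# Written to a temporary directory here; the walkthrough names the file test.txt.
out_path = os.path.join(tempfile.mkdtemp(), "test.txt")
write_rows(out_path, [("jeffgeng", {"click": 13, "uid": 15})])
```
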
4. Load test.txt into the table

    hive> LOAD DATA LOCAL INPATH 'test.txt' OVERWRITE INTO TABLE t3;
    Copying data from file:/root/src/hadoop/hadoop-0.20.2/contrib/hive-0.5.0-bin/bin/test.txt
    Loading data to table t3
    OK

After loading, the table looks like this:

    hive> select * from t3;
    OK
    jeffgeng        {"click":13,"uid":15}

5. Values in the map can be queried like this

    hive> select bar['click'] from t3;

    ...a series of MapReduce jobs...

    OK
    13

6. Write add_mapper.py

    #!/usr/bin/python
    import ast
    import sys

    for line in sys.stdin:
        line = line.rstrip('\n')
        foo, bar = line.split('\t')
        # Parse the map text into a dict; literal_eval accepts only literals, unlike eval.
        d = ast.literal_eval(bar)
        d['click'] += 1
        print('\t'.join([foo, str(d)]))

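It helps to see exactly what this first version of the mapper prints for one row. The sketch below wraps the same steps in a function so the output can be inspected locally. It assumes the map column reaches the script as text of the form `{"click":13,"uid":15}` (the form shown by `select *`); `increment_click` is a hypothetical name.

```python
import ast


def increment_click(line):
    """Mirror the first mapper: parse the map text, add 1 to 'click', emit str(dict)."""
    foo, bar = line.rstrip("\n").split("\t")
    d = ast.literal_eval(bar)
    d["click"] += 1
    return "\t".join([foo, str(d)])


# Assumed input: the map column arriving as text like {"click":13,"uid":15}.
print(increment_click('jeffgeng\t{"click":13,"uid":15}\n'))
```

The second field comes out as a Python dict representation, with quotes, spaces, and braces, rather than the `key:value,key:value` text the table's delimiters describe.
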
7. Run it in Hive

    hive> CREATE TABLE t4 (foo STRING, bar MAP<STRING,INT>)
        > ROW FORMAT DELIMITED
        > FIELDS TERMINATED BY '\t'
        > COLLECTION ITEMS TERMINATED BY ','
        > MAP KEYS TERMINATED BY ':'
        > STORED AS TEXTFILE;

    hive> add FILE add_mapper.py;

    hive> INSERT OVERWRITE TABLE t4
        > SELECT
        >   TRANSFORM (foo, bar)
        >   USING 'python add_mapper.py'
        >   AS (foo, bar)
        > FROM t3;
    FAILED: Error in semantic analysis: line 1:23 Cannot insert into target table because column number/types are different t4: Cannot convert column 1 from string to map<string,int>.

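8. Why does this error occur? The output of add_mapper.py is apparently treated as a string, and Hive does not recognize a map in that form. Columns listed after AS without a type default to string, which is why column 1 cannot be converted to map<string,int>. It turns out that the columns after AS can be given explicit types:

    INSERT OVERWRITE TABLE t4
    SELECT
      TRANSFORM (foo, bar)
      USING 'python add_mapper.py'
      AS (foo string, bar map<string,int>)
    FROM t3;
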
9. The Python script must also strip the spaces, quotes, and curly braces left over from converting the dict to a string

    #!/usr/bin/python
    import ast
    import sys

    for line in sys.stdin:
        line = line.rstrip('\n')
        foo, bar = line.split('\t')
        # Parse the map text into a dict; literal_eval accepts only literals, unlike eval.
        d = ast.literal_eval(bar)
        d['click'] += 1
        d['uid'] += 1
        # Drop the spaces, quotes, and braces that str(dict) adds,
        # leaving plain key:value pairs separated by commas.
        strmap = ''
        for x in str(d):
            if x in (' ', "'", '{', '}'):
                continue
            strmap += x
        print('\t'.join([foo, strmap]))

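Filtering characters out of `str(d)` would also remove those characters if they ever appeared inside a key. An alternative is to build the `key:value,key:value` text directly from the dict. The following is a sketch rather than the original author's method: `bump_and_serialize` is a hypothetical name, the input format is assumed to be the `{"click":13,"uid":15}` text shown earlier, and keys are sorted only to make the output order predictable.

```python
import ast


def bump_and_serialize(line, keys=("click", "uid")):
    """Increment the given counters and emit the map as 'key:value' pairs joined by commas."""
    foo, bar = line.rstrip("\n").split("\t")
    d = ast.literal_eval(bar)
    for key in keys:
        d[key] += 1
    # Build the text from the dict itself instead of cleaning up str(d).
    pairs = ",".join("%s:%d" % (k, d[k]) for k in sorted(d))
    return "\t".join([foo, pairs])


print(bump_and_serialize('jeffgeng\t{"click":13,"uid":15}\n'))
```
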
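10. Result after running

    hive> select * from t4;
    OK
    jeffgeng        {"click":14,"uid":null}
    Time taken: 0.146 seconds

In this recorded run, uid comes back as null rather than 16. A likely explanation is that a character left over from the dict string, such as the closing brace, was still attached to the last value (for example "16}"), which Hive cannot read as an int. The script has to remove both braces for every value to load.
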