"SAS Programming and Data Mining Business Cases" Study Notes, Part 19

Published: 2023/12/4
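These notes continue the series on "SAS Programming and Data Mining Business Cases". This part focuses on data-processing practice: hash objects, user-defined formats, and the powerful regular expression functions.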

1. Hash objects

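A hash object, also called a hash table, is a data structure that is accessed directly by key value.

SAS provides two classes for working with hash tables: hash, which stores the data, and hiter, which iterates over it. The hash class provides methods for looking up, adding, modifying, and deleting entries; hiter provides methods for positioning and traversal, such as first and next.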

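Advantages:

Key lookups happen in memory, which helps performance;
The hash table can add, update, or delete observations dynamically while the DATA step is running;
Data in the hash table can be located quickly, which reduces the number of lookups.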

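Common methods:

definekey: defines the key
definedata: defines the data (values)
definedone: marks the definition as complete, after which data can be loaded
add: adds a key-value pair; if the key already exists in the hash table, the add is ignored
replace: if the key exists in the hash table, replaces its data; otherwise adds the key-value pair
remove: removes a key-value pair
find: looks up the key; if it exists, writes the stored data into the corresponding variables
check: looks up the key; if it exists, returns rc=0 without modifying the current variable values
output: writes the hash table to a data set
clear: empties the hash table without deleting the object
equals: tests whether two hash objects are equal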
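A minimal sketch of how several of these methods behave on a small in-memory table. The variable names code and label and the output data set work.hash_dump are made up for illustration and do not come from the book.

```sas
data _null_;
  length code $ 3 label $ 20;

  /* Hypothetical table: key = code, data = code and label */
  declare hash h();
  h.definekey('code');
  h.definedata('code', 'label');
  h.definedone();

  code = 'A01';
  label = 'first label';
  rc = h.add();                 /* key is new, so rc is 0 */
  put 'add: ' rc=;

  label = 'second label';
  rc = h.add();                 /* key already exists: non-zero rc, table unchanged */
  put 'add again: ' rc=;

  rc = h.replace();             /* overwrites the data stored for A01 */

  label = ' ';
  rc = h.check();               /* reports whether the key exists; label stays blank */
  put 'check: ' rc= label=;

  rc = h.find();                /* copies the stored data back into label */
  put 'find: ' rc= label=;

  rc = h.output(dataset: 'work.hash_dump');   /* write the table to a data set */

  rc = h.remove();              /* delete the entry for the current key */
  n = h.num_items;              /* number of items left in the table */
  put n=;
run;
```

The difference between check and find is visible in the log: after check the variable label is still blank, while find fills it from the table.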

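Example of the find method:

libname chapt12 'f:\data_model\book_data\chapt12';

data results;
   /* Never executed; it only lets the compiler add the variables of participants to the PDV */
   if _n_ = 0 then set chapt12.participants;
   /* Load the lookup table into memory once, on the first iteration */
   if _n_ = 1 then do;
      declare hash h(dataset:'chapt12.participants');
      h.definekey('name');
      h.definedata('gender', 'treatment');
      h.definedone();
   end;
   set chapt12.weight;
   /* Keep only the observations whose name is found in the hash table */
   if h.find() = 0 then
      output;
run;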

Example of the hiter object:

data patients;
   length patient_id $ 16 discharge 8;
   input patient_id discharge :date9.;
datalines;
smith-4123 15mar2004
hagen-2834 23apr2004
smith-2437 15jan2004
flinn-2940 12feb2004
;

data _null_;
   /* Never executed; defines patient_id and discharge in the PDV */
   if _n_ = 0 then set patients;
   /* ordered:"ascending" keeps the entries sorted by key */
   declare hash ht(dataset:"patients", ordered:"ascending");
   ht.definekey("patient_id");
   ht.definedata("patient_id", "discharge");
   ht.definedone();
   declare hiter iter("ht");
   /* Walk the table from the first entry to the last */
   rc = iter.first();
   do while (rc = 0);
      put patient_id discharge :date9.;
      rc = iter.next();
   end;
run;

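declare hiter iter("ht"); defines an iterator named iter for the hash table ht. The first method then positions the iterator on the first observation in the hash table, and the next method steps through all remaining records, each of which is printed.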
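The same traversal can be run in the opposite direction. The sketch below reuses the patients data from the example above and assumes only the last and prev methods of hiter, which mirror first and next.

```sas
data patients;
  length patient_id $ 16 discharge 8;
  input patient_id discharge :date9.;
datalines;
smith-4123 15mar2004
hagen-2834 23apr2004
smith-2437 15jan2004
flinn-2940 12feb2004
;

data _null_;
  /* Never executed; defines the variables in the PDV */
  if _n_ = 0 then set patients;
  declare hash ht(dataset: 'patients', ordered: 'ascending');
  ht.definekey('patient_id');
  ht.definedata('patient_id', 'discharge');
  ht.definedone();
  declare hiter iter('ht');

  /* Start at the last entry and walk backwards */
  rc = iter.last();
  do while (rc = 0);
    put patient_id discharge :date9.;
    rc = iter.prev();
  end;
run;
```

Because the table is ordered ascending by patient_id, this prints the keys in descending order.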

Business case: merging two data sets:

data both1(drop=rc);
   declare hash plan();
   rc = plan.definekey('plan_id');
   rc = plan.definedata('plan_desc');
   rc = plan.definedone();
   /* Load every plan into the hash table */
   do until (eof1);
      set chapt12.plans end = eof1;
      rc = plan.add();
   end;
   /* Look up the plan description for every member */
   do until (eof2);
      set chapt12.members end = eof2;
      call missing(plan_desc);
      rc = plan.find();
      output;
   end;
   stop;
run;

The program above can be simplified to:

data both2;
   length plan_id $3 plan_desc $20;
   /* Let the hash object load chapt12.plans itself, once */
   if _n_ = 1 then do;
      declare hash h(dataset:'chapt12.plans');
      h.definekey('plan_id');
      h.definedata('plan_desc');
      h.definedone();
      call missing(plan_desc);
   end;
   set chapt12.members;
   /* rc stays in the output here; add (drop=rc) to match both1 */
   rc = h.find();
run;
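Both versions behave like a left join: members without a matching plan are kept with a missing plan_desc. A minimal sketch of splitting matched and unmatched members by the return code of find follows. The plans and members data sets here are small made-up samples, not the chapt12 data from the book.

```sas
/* Hypothetical sample data, not from the book */
data plans;
  length plan_id $ 3 plan_desc $ 20;
  input plan_id $ plan_desc & $20.;
datalines;
P01 Basic plan
P02 Family plan
;

data members;
  length member $ 8 plan_id $ 3;
  input member $ plan_id $;
datalines;
alice P01
bob P03
;

data matched unmatched;
  length plan_id $ 3 plan_desc $ 20;
  if _n_ = 1 then do;
    declare hash h(dataset: 'plans');
    h.definekey('plan_id');
    h.definedata('plan_desc');
    h.definedone();
  end;
  set members;
  call missing(plan_desc);
  /* find returns 0 only when plan_id exists in the hash table */
  if h.find() = 0 then output matched;
  else output unmatched;
run;
```

With this sample data, alice lands in matched and bob, whose plan_id P03 is not in plans, lands in unmatched.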

2. Formats

User-defined formats

proc format;
   /* Character format for the sex code */
   value $sex_fmt
      'F' = 'Female'
      'M' = 'Male'
      other = 'Unknown';
   /* Numeric format that groups age into bands */
   value age_dur
      low-10 = "10 and under"
      11-13 = "11-13"
      14-<15 = "14-15"
      15-high = "15 and over";
run;

Usage:

data test;
   set sashelp.class(keep=sex age);
   /* A format name must end with a period when used in put() */
   x = put(sex, $sex_fmt.);
   y = put(age, age_dur.);
run;
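A format can also be attached with a FORMAT statement instead of creating new variables with put(). The sketch below is one possible illustration using PROC FREQ on sashelp.class; it redefines $sex_fmt so that it runs on its own, with English labels chosen here for readability.

```sas
proc format;
  value $sex_fmt
    'F' = 'Female'
    'M' = 'Male'
    other = 'Unknown';
run;

/* The stored values stay F and M; only the displayed labels change */
proc freq data=sashelp.class;
  tables sex;
  format sex $sex_fmt.;
run;
```

This keeps the underlying data untouched, which is usually preferable when the labels are needed only for reporting.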

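3. Regular expressions

/.../  start and end delimiters of a regular expression
|  alternation between items ("or")
()  grouping; marks the start and end of a subexpression
.  any character except a newline
\w  any word character: digits, uppercase and lowercase letters, and the underscore
\W  any non-word character
\s  any whitespace character, including space, tab, newline, and carriage return (the source also lists the full-width CJK space, which may depend on the session encoding)
\S  any non-whitespace character
\d  any digit 0-9
\D  any non-digit character
[...]  any one of the characters inside the brackets
[^...]  any one character not inside the brackets
[a-z]  any character in the range a to z
[^a-z]  any character not in the range a to z
^  matches the start of the input string
$  matches the end of the input string
\b  a word boundary (the position before or after a word)
\B  a non-word boundary
*  matches zero or more times
+  matches one or more times
?  matches zero or one time
{n}  matches exactly n times
{n,}  matches at least n times
{n,m}  matches between n and m times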
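A small sketch that exercises a few of these metacharacters (anchors, \d, {n}, and \b) with prxmatch. The test strings are made up for illustration.

```sas
data _null_;
  length s $ 12;
  do s = '2023-12-04', '2023-12-4', 'cat', 'bobcat';
    /* ^ and $ force the whole string to look like yyyy-mm-dd;
       strip() removes the trailing blanks so that $ can anchor */
    is_date = prxmatch('/^\d{4}-\d{2}-\d{2}$/', strip(s)) > 0;
    /* \b requires cat to stand alone as a word */
    has_word = prxmatch('/\bcat\b/', strip(s)) > 0;
    put s= is_date= has_word=;
  end;
run;
```

Only '2023-12-04' satisfies the date pattern, since '2023-12-4' has a one-digit day, and only 'cat' satisfies the word-boundary pattern, since in 'bobcat' the letters before cat are word characters.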

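Common functions:

prxparse  compiles a regular expression and returns its pattern identifier
prxmatch  returns the position of the first match of the pattern
call prxsubstr  returns the start position and length of the match in the target string
prxposn  returns the text matched by a subexpression (capture group) of the regular expression
call prxposn  returns the start position and length of the text matched by a subexpression
call prxnext  returns the position and length of successive matches in the target string
prxchange  replaces the text that matches the pattern
call prxchange  replaces the text that matches the pattern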

eg1: finding the position of a match

data _null_;
   /* Compile the pattern once and keep its identifier across iterations */
   if _n_ = 1 then pattern_num = prxparse("/cat/");
   retain pattern_num;
   input string $30.;
   /* Position of the first match, 0 if there is none */
   position = prxmatch(pattern_num, string);
   file print;
   put pattern_num= string= position=;
datalines;
there is a cat in this line.
does not match cat
cat in the beginning
at the end, a cat
cat
;
run;

eg2: data validation

data match_phone;
   set chapt12.phone_numbers;
   /* (ddd) followed by an optional space, then ddd-dddd */
   if _n_ = 1 then pattern = prxparse("/\(\d\d\d\) ?\d\d\d-\d{4}/");
   retain pattern;
   if prxmatch(pattern, phone) gt 0 then output;
run;

Find the phone numbers that do not match:

data unmatch_phone;
   set chapt12.phone_numbers;
   where not prxmatch("/\(\d\d\d\) ?\d\d\d-\d{4}/", phone);
run;

eg3: extracting the string that matches a pattern

data extract;
   if _n_ = 1 then do;
      pattern = prxparse("/\(\d\d\d\) ?\d\d\d-\d{4}/");
      /* prxparse returns a missing value when the expression does not compile */
      if missing(pattern) then do;
         put "error in compiling regular expression";
         stop;
      end;
   end;
   retain pattern;
   length number $ 15;
   input string $char80.;
   /* start and length describe the first match only */
   call prxsubstr(pattern, string, start, length);
   if start gt 0 then do;
      number = substr(string, start, length);
      number = compress(number, " ");
      output;
   end;
   keep number;
datalines;
this line does not have any phone numbers on it
this line does: (123)345-4567 la di la di la
also valid (123) 999-9999
two numbers here (333)444-5555 and (800)123-4567
;
run;

eg4: extracting names

data ReversedNames;
   input name & $32.;
   datalines;
Jones, Fred
Kavich, Kate
Turley, Ron
Dulix, Yolanda
;

data FirstLastNames;
   length first last $ 16;
   keep first last;
   retain re;
   if _N_ = 1 then
      re = prxparse('/(\w+), (\w+)/');
   set ReversedNames;
   if prxmatch(re, name) then
      do;
         /* group 1 is the last name, group 2 the first name */
         last = prxposn(re, 1, name);
         first = prxposn(re, 2, name);
      end;
run;

Note: 1 and 2 refer to the two capture groups in the regular expression.

eg5: extracting names that follow a given pattern

data old;
   input name $60.;
   datalines;
Judith S Reaveley
Ralph F. Morgan
Jess Ennis
Carol Echols
Kelly Hansen Huff
Judith
Nick
Jones
;

data new;
   length first middle last $ 40;
   /* re1: first name, optional middle name, last name */
   re1 = prxparse('/(\S+)\s+([^\s]+\s+)?(\S+)/o');
   /* re2: assumed repair. The source had an invalid "(?)" group here;
      this version also captures the separating whitespace, so group 4 is the last name */
   re2 = prxparse('/(\S+)(\s+)([^\s]+\s+)?(\S+)/o');
   set old;
   id1 = prxmatch(re1, name);
   id2 = prxmatch(re2, name);
   if id1 then
      do;
         first = prxposn(re1, 1, name);
         middle = prxposn(re1, 2, name);
         last = prxposn(re1, 3, name);
      end;
   /* Group 4 exists only in re2; the source read it from re1, which has three groups */
   if id2 then test = prxposn(re2, 4, name);
   put test=;
run;

eg6: returning every position where the pattern matches

data _null_;
   expressionid = prxparse('/[crb]at/');
   text = 'the woods have a bat, cat, and a rat!';
   start = 1;
   stop = length(text);
   /* First call finds the first match and moves start past it */
   call prxnext(expressionid, start, stop, text, position, length);
   do while (position > 0);
      found = substr(text, position, length);
      put found= position= length=;
      call prxnext(expressionid, start, stop, text, position, length);
   end;
run;

Note: the first call prxnext returns a position, after which the loop is entered. Inside the loop the matching substring is extracted and call prxnext is executed again, which returns the position of the next match.

eg7: replacing text

data cat_and_mouse;
   input text $char40.;
   length new_text $ 80;
   /* s/pattern/replacement/ is a substitution expression */
   if _n_ = 1 then match = prxparse("s/[Cc]at/mouse/");
   retain match;
   /* -1 replaces every occurrence; trunc flags a truncated result */
   call prxchange(match, -1, text, new_text, len, trunc, num);
   if trunc then put "note: new_text was truncated";
datalines;
the Cat in the hat
there are two cat cats in this line
here is no replacement
;
run;
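The call routine has a function counterpart, prxchange, which returns the changed string directly and accepts the substitution expression as its first argument. The sketch below also shows capture groups being reused in the replacement through $1 and $2; the names are taken from eg4, and the variable swapped is made up for illustration.

```sas
data _null_;
  length name swapped $ 32;
  do name = 'Jones, Fred', 'Kavich, Kate';
    /* $1 is the last name and $2 the first name captured by the pattern */
    swapped = prxchange('s/(\w+), (\w+)/$2 $1/', -1, strip(name));
    put name= swapped=;
  end;
run;
```

For 'Jones, Fred' this yields 'Fred Jones'. The function form is convenient in assignments and WHERE clauses, while the call routine additionally reports the result length, truncation, and the number of changes.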
