Quickly Building a Recommendation Service with Amazon Personalize
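Amazon Personalize is a fully managed service from Amazon Web Services. It packages more than twenty years of machine learning experience at Amazon.com into the service, and its models can be further customized with your own user data. No ML experience is required: you can get started with simple APIs and build sophisticated personalized recommendations in a few clicks.
In this post, we show how to use Amazon Personalize to build a recommendation service with automated training and inference. We use the MovieLens movie rating data as sample data and store it in Amazon S3. Amazon Lambda functions trigger the data updates, model training, model updates, and batch inference.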
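Recommendation service architecture
The application pushes user data, movie data, user rating data, the inference user list, and the inference results to the corresponding Amazon S3 buckets
The full data is imported from Amazon S3 into Amazon Personalize in the defined format
Amazon Lambda triggers the model training job on a schedule
The application pushes incremental data to an Amazon S3 bucket, and an Amazon Lambda function triggers the data update and model update jobs
The application pushes the inference user list to an Amazon S3 bucket, an Amazon Lambda function triggers the inference job, and the result files are written to an Amazon S3 bucket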
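Permission setup
Create a role in Amazon IAM that Amazon Personalize uses for data import, data update, model training, model update, and model inference
In the Amazon Web Services console, create a service role for Amazon Personalize. Attach the AmazonPersonalizeFullAccess policy to the role and name it PersonalizeRole. PersonalizeRole also needs access to the relevant Amazon S3 buckets, so we grant it the corresponding bucket permissions.
Add an Amazon S3 access policy to PersonalizeRole:
Return to the Amazon IAM home page and click Policy on the left.
Click Create policy
Select JSON, paste the JSON below into the input box, and click Review policy.
If your project uses specific Amazon S3 buckets, modify or add the bucket names in Resource. This blog uses global resources as an example; if you use China Region resources, change aws to aws-cn in the ARNs of the related policies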
{
  "Version": "2012-10-17",
  "Id": "PersonalizeS3BucketAccessPolicy",
  "Statement": [
    {
      "Sid": "PersonalizeS3BucketAccessPolicy",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket",
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::user-personalization-demo-batch-input",
        "arn:aws:s3:::user-personalization-demo-batch-input/*",
        "arn:aws:s3:::user-personalization-demo-batch-output",
        "arn:aws:s3:::user-personalization-demo-batch-output/*",
        "arn:aws:s3:::user-personalization-demo-fulldata",
        "arn:aws:s3:::user-personalization-demo-fulldata/*",
        "arn:aws:s3:::user-personalization-demo-datasetupdate",
        "arn:aws:s3:::user-personalization-demo-datasetupdate/*",
        "arn:aws:s3:::user-personalization-demo",
        "arn:aws:s3:::user-personalization-demo/*"
      ]
    }
  ]
}
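The note about switching aws to aws-cn can be captured in a small helper. The following is a minimal sketch, not part of the original setup: the function name build_s3_access_policy and its partition parameter are our own illustrative choices. It generates the same policy document for any list of buckets.

```python
import json


def build_s3_access_policy(bucket_names, partition="aws"):
    """Build the PersonalizeS3BucketAccessPolicy document for the given buckets.

    partition is "aws" for global Regions and "aws-cn" for China Regions.
    (Hypothetical helper, for illustration only.)
    """
    resources = []
    for name in bucket_names:
        bucket_arn = "arn:%s:s3:::%s" % (partition, name)
        resources.append(bucket_arn)         # the bucket itself (s3:ListBucket)
        resources.append(bucket_arn + "/*")  # every object in the bucket
    return {
        "Version": "2012-10-17",
        "Id": "PersonalizeS3BucketAccessPolicy",
        "Statement": [
            {
                "Sid": "PersonalizeS3BucketAccessPolicy",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket", "s3:PutObject"],
                "Resource": resources,
            }
        ],
    }


# Example: the policy for two of the demo buckets in a China Region
demo_policy = build_s3_access_policy(
    ["user-personalization-demo-fulldata", "user-personalization-demo-datasetupdate"],
    partition="aws-cn",
)
print(json.dumps(demo_policy, indent=2))
```

The printed JSON can be pasted into the policy editor in the same way as the hand-written document.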
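Add a name and description for the policy, then click Create policy
Return to the PersonalizeRole role page and attach the newly created PersonalizeS3BucketAccessPolicy policy.
Click Attach policies
Search for PersonalizeS3BucketAccessPolicy in the search box, select the policy, and click Attach policy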
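Create the Amazon S3 buckets.
The following uses user-personalization-demo-fulldata as an example. Create the other buckets the same way
user-personalization-demo-fulldata: stores the full data (CSV format)
user-personalization-demo-datasetupdate: stores the incremental data (CSV format)
user-personalization-demo-batch-input: stores the list of users to recommend for (JSON format)
user-personalization-demo-batch-output: stores the batch recommendation results (JSON format)
In the Amazon Web Services console, go to the Amazon S3 service. Click Create bucket in the upper right
Enter the Amazon S3 bucket name, for example user-personalization-demo-fulldata
In the encryption section, select Enable and Amazon S3 key. Click Create bucket
Open the Amazon S3 bucket to modify its access policy and go to the Permissions tab
In the bucket policy section, click Edit
Copy the JSON below into the input box
If the Amazon S3 bucket names change or buckets are added, modify or add the bucket names in Resource. Note that a bucket policy can only reference the bucket it is attached to, so when you apply the policy to a given bucket, keep only that bucket's ARNs in Resource. This blog uses global resources as an example; if you use China Region resources, change aws to aws-cn in the ARNs of the related policies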
{
  "Version": "2012-10-17",
  "Id": "PersonalizeS3BucketAccessPolicy",
  "Statement": [
    {
      "Sid": "PersonalizeS3BucketAccessPolicy",
      "Effect": "Allow",
      "Principal": {
        "Service": "personalize.amazonaws.com"
      },
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::user-personalization-demo-batch-input",
        "arn:aws:s3:::user-personalization-demo-batch-input/*",
        "arn:aws:s3:::user-personalization-demo-batch-output",
        "arn:aws:s3:::user-personalization-demo-batch-output/*",
        "arn:aws:s3:::user-personalization-demo-fulldata",
        "arn:aws:s3:::user-personalization-demo-fulldata/*",
        "arn:aws:s3:::user-personalization-demo-datasetupdate",
        "arn:aws:s3:::user-personalization-demo-datasetupdate/*",
        "arn:aws:s3:::user-personalization-demo",
        "arn:aws:s3:::user-personalization-demo/*"
      ]
    }
  ]
}
Click to save the policy
Create the Lambda service role, which grants Amazon Lambda access to Amazon S3 and Amazon Personalize
Go to Amazon IAM, click Roles, then click Create role
Select Amazon Lambda
Add AmazonS3FullAccess, CloudWatchFullAccess, AWSLambdaFullAccess, and AmazonPersonalizeFullAccess
Enter the role name: lambda-s3-personalize. Click Create role to finish creating the role.
Data processing
The MovieLens data must be processed to meet the data requirements of Amazon Personalize. The processing code renames the columns of the rating data, generates the user data, renames the columns of the movie data, and saves the results in CSV format
'users.csv': user data
'items.csv': movie data
'interacts.csv': user rating data
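The processing described above can be sketched with the standard csv module. This is only an illustration under several assumptions that do not come from the original post: the input files are the MovieLens ratings.csv (columns userId, movieId, rating, timestamp) and movies.csv (columns movieId, title, genres), and RATE_F, which the post does not define, is taken here to be each user's mean rating. All function names are hypothetical.

```python
import csv
from collections import defaultdict


def build_interactions(ratings):
    """Rename MovieLens rating columns to the Interactions field names."""
    return [
        {
            "USER_ID": row["userId"],
            "ITEM_ID": row["movieId"],
            "RATING": float(row["rating"]),
            "TIMESTAMP": int(row["timestamp"]),
        }
        for row in ratings
    ]


def build_items(movies):
    """Rename MovieLens movie columns; genres stay pipe-separated."""
    return [{"ITEM_ID": row["movieId"], "GENRES": row["genres"]} for row in movies]


def build_users(interactions):
    """One row per user. ASSUMPTION: RATE_F is the user's mean rating."""
    totals = defaultdict(lambda: [0.0, 0])  # user id -> [rating sum, rating count]
    for row in interactions:
        totals[row["USER_ID"]][0] += row["RATING"]
        totals[row["USER_ID"]][1] += 1
    return [
        {"USER_ID": user_id, "RATE_F": round(total / count, 2)}
        for user_id, (total, count) in sorted(totals.items())
    ]


def read_csv(path):
    """Read a CSV file with a header row into a list of dicts."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def write_csv(path, rows, fieldnames):
    """Write a list of dicts to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


# Example usage (file names for the MovieLens inputs are assumptions):
# interactions = build_interactions(read_csv("ratings.csv"))
# write_csv("interacts.csv", interactions, ["USER_ID", "ITEM_ID", "RATING", "TIMESTAMP"])
# write_csv("items.csv", build_items(read_csv("movies.csv")), ["ITEM_ID", "GENRES"])
# write_csv("users.csv", build_users(interactions), ["USER_ID", "RATE_F"])
```

If RATE_F means something else in your data, only build_users needs to change.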
Data import
In this post, the user dataset, the movie dataset, and the interaction dataset are combined into one dataset group, using the following schemas.
{
  "type": "record",
  "name": "Items",
  "namespace": "com.amazonaws.personalize.schema",
  "fields": [
    {"name": "ITEM_ID", "type": "string"},
    {"name": "GENRES", "type": ["null", "string"], "categorical": true}
  ],
  "version": "1.0"
}

{
  "type": "record",
  "name": "Users",
  "namespace": "com.amazonaws.personalize.schema",
  "fields": [
    {"name": "USER_ID", "type": "string"},
    {"name": "RATE_F", "type": ["float", "null"]}
  ],
  "version": "1.0"
}

{
  "type": "record",
  "name": "Interactions",
  "namespace": "com.amazonaws.personalize.schema",
  "fields": [
    {"name": "USER_ID", "type": "string"},
    {"name": "ITEM_ID", "type": "string"},
    {"name": "RATING", "type": ["null", "float"]},
    {"name": "TIMESTAMP", "type": "long"}
  ],
  "version": "1.0"
}
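The import itself can also be scripted instead of done in the console. The sketch below is one possible way, not the original author's procedure. It assumes the dataset group already exists and is ACTIVE, takes a boto3 Personalize client as a parameter, and uses hypothetical resource names and helper names. It registers a schema, creates the dataset, waits for it to become ACTIVE, and starts an import job from Amazon S3.

```python
import json
import time


def wait_for_active(get_status, poll_seconds=15, max_polls=80, sleep=time.sleep):
    """Poll get_status() until it returns ACTIVE; raise on failure or timeout."""
    for _ in range(max_polls):
        status = get_status()
        if status == "ACTIVE":
            return
        if status.endswith("FAILED"):
            raise RuntimeError("resource creation failed: %s" % status)
        sleep(poll_seconds)
    raise TimeoutError("resource did not become ACTIVE in time")


def import_dataset(personalize, dataset_group_arn, dataset_type, schema,
                   s3_path, role_arn, sleep=time.sleep):
    """Create schema + dataset and start a CSV import. Returns the import job ARN.

    personalize is a boto3 client, e.g. boto3.client("personalize").
    dataset_type is "USERS", "ITEMS" or "INTERACTIONS".
    The resource names built below are placeholders chosen for this sketch.
    """
    name = "user-personalization-demo-%s" % dataset_type.lower()
    schema_arn = personalize.create_schema(
        name=name, schema=json.dumps(schema)
    )["schemaArn"]
    dataset_arn = personalize.create_dataset(
        name=name,
        schemaArn=schema_arn,
        datasetGroupArn=dataset_group_arn,
        datasetType=dataset_type,
    )["datasetArn"]
    # The dataset is created asynchronously; wait before importing into it
    wait_for_active(
        lambda: personalize.describe_dataset(datasetArn=dataset_arn)["dataset"]["status"],
        sleep=sleep,
    )
    return personalize.create_dataset_import_job(
        jobName=name + "-import",
        datasetArn=dataset_arn,
        dataSource={"dataLocation": s3_path},
        roleArn=role_arn,  # the PersonalizeRole created earlier
    )["datasetImportJobArn"]


# Example usage (requires boto3 and AWS credentials; the ARNs are yours to fill in):
# import boto3
# job_arn = import_dataset(
#     boto3.client("personalize"), dataset_group_arn, "USERS", users_schema,
#     "s3://user-personalization-demo-fulldata/users.csv", role_arn)
```

The import job also runs asynchronously, so training should start only after the job for each dataset has finished.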
Create the model training Amazon Lambda function
Use the lambda-s3-personalize role and the code below to create the model training Amazon Lambda function. Training requires you to specify the training recipe and the dataset group
import json
import logging
import os
import time

import boto3
from botocore.exceptions import ClientError

logger = logging.getLogger()
logger.setLevel(logging.INFO)


def lambda_handler(event, context):
    # Get the account ID and AWS Region. S3-style events carry the Region in
    # Records[0]; scheduled EventBridge events have no Records, so fall back to
    # the AWS_REGION variable set by the Lambda runtime.
    records = event.get('Records') or []
    awsRegion = records[0]['awsRegion'] if records else os.environ['AWS_REGION']
    client = boto3.client("sts")
    account_id = client.get_caller_identity().get('Account')

    create_solution_response = None
    solution_name = 'user-personalization-demo'
    recipe_arn = "arn:aws:personalize:::recipe/aws-user-personalization"  # training recipe
    dataset_group_arn = 'arn:aws:personalize:%s:%s:dataset-group/user-personalization-demo' % (awsRegion, account_id)
    personalize = boto3.Session().client('personalize')

    # Create a new solution. On later runs the solution already exists and
    # create_solution raises; the error is logged and training continues.
    try:
        create_solution_response = personalize.create_solution(
            name=solution_name,
            recipeArn=recipe_arn,
            datasetGroupArn=dataset_group_arn,
            performHPO=True,
            solutionConfig={
                'hpoConfig': {
                    'hpoResourceConfig': {
                        'maxNumberOfTrainingJobs': '30',
                        'maxParallelTrainingJobs': '10'
                    }
                }
            }
        )
        solution_arn = create_solution_response['solutionArn']
        print('solution_arn: ', solution_arn)
    except ClientError as e:
        if 'EVENT_INTERACTIONS' not in str(e):
            print(json.dumps(create_solution_response, indent=2))
            print(e)

    # Give the solution time to be created; the function timeout must exceed this wait
    time.sleep(120)

    # Create a new solution version. This is the model training step and takes a
    # long time, so the function does not wait for the result and simply returns.
    try:
        solution_arn = 'arn:aws:personalize:%s:%s:solution/user-personalization-demo' % (awsRegion, account_id)
        create_solution_version_response = personalize.create_solution_version(solutionArn=solution_arn)
        solution_version_arn = create_solution_version_response['solutionVersionArn']
        print('solution_version_arn:', solution_version_arn)
    except Exception as e:
        print(e)
        raise e
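After creating the Amazon Lambda function, you can add a scheduled trigger to the training function. For example, EventBridge can run training once a month with cron(0 2 1 * ? *), which fires at 02:00 UTC on the first day of each month.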
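The schedule can be created in the console or with boto3. The following sketch shows the boto3 route under stated assumptions: the rule name and target ID are placeholders of ours, and the two clients (boto3 "events" and "lambda") are passed in by the caller.

```python
def schedule_monthly_training(events, lambda_client, function_arn,
                              rule_name="user-personalization-demo-monthly-training",
                              schedule="cron(0 2 1 * ? *)"):
    """Create an EventBridge rule that invokes the training function on a schedule.

    events and lambda_client are boto3 clients for "events" and "lambda".
    rule_name is a hypothetical placeholder. Returns the rule ARN.
    """
    # cron fields: minute hour day-of-month month day-of-week year (evaluated in UTC)
    rule_arn = events.put_rule(
        Name=rule_name,
        ScheduleExpression=schedule,
        State="ENABLED",
    )["RuleArn"]
    # Allow EventBridge to invoke the function, restricted to this rule
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId=rule_name + "-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )
    # Point the rule at the training function
    events.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "training-lambda", "Arn": function_arn}],
    )
    return rule_arn


# Example usage (requires boto3 and AWS credentials):
# import boto3
# schedule_monthly_training(boto3.client("events"), boto3.client("lambda"), function_arn)
```

Note that an event delivered by a schedule has a different shape from an Amazon S3 event: it carries no Records list, so the handler cannot read the Region from event['Records'][0] for scheduled invocations.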
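Create the data update and model update Amazon Lambda function
Use the lambda-s3-personalize role and the code below to create the Amazon Lambda function. The code parses the newly added CSV file, updates the corresponding data in Amazon Personalize, and finally updates the model. The model update ensures that future recommendations cover new users and new movies
If your data contains fields beyond the required ones, add those fields in the code so that the data import completes.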
import csv
import logging
import os
import time
import urllib.parse

import boto3

logger = logging.getLogger()
logger.setLevel(logging.INFO)

print('Loading function')

s3_client = boto3.client('s3')


def lambda_handler(event, context):
    # Get the S3 trigger details (bucket and object key)
    record = event['Records'][0]
    downloadBucket = record['s3']['bucket']['name']
    # Keys in S3 event notifications are URL-encoded, with spaces sent as '+'
    key = urllib.parse.unquote_plus(record['s3']['object']['key'])

    # Get the account ID and AWS Region
    client = boto3.client("sts")
    account_id = client.get_caller_identity().get('Account')
    awsRegion = record['awsRegion']
    print(key)
    print(event)
    print(account_id)
    print(awsRegion)
    logger.info(key)

    personalize = boto3.Session().client('personalize')
    personalize_events = boto3.Session().client('personalize-events')

    # Download the file to the Lambda local directory for processing.
    # basename keeps the local path valid when the key contains a prefix.
    download_path = '/tmp/{}'.format(os.path.basename(key))
    s3_client.download_file(downloadBucket, key, download_path)

    try:
        # Incremental update of user data
        if 'users' in key:
            datasetType = 'USERS'
            with open(download_path, 'r') as this_csv_file:
                # Read the CSV file; the first row is the header
                data = csv.reader(this_csv_file, delimiter=",")
                colList = []
                userlist = []
                for line in data:
                    if len(colList) == 0:
                        colList = line
                    else:
                        newTmp = {
                            'userId': line[colList.index('USER_ID')],
                            'properties': '{"RATE_F":%s}' % (line[colList.index('RATE_F')])
                        }
                        userlist.append(newTmp)
            personalize_events.put_users(
                datasetArn='arn:aws:personalize:%s:%s:dataset/user-personalization-demo/%s' % (awsRegion, account_id, datasetType),
                users=userlist
            )
            print('updated users')

        # Incremental update of item data
        if 'items' in key:
            datasetType = 'ITEMS'
            with open(download_path, 'r') as this_csv_file:
                # Read the CSV file; the first row is the header
                data = csv.reader(this_csv_file, delimiter=",")
                colList = []
                itemlist = []
                for line in data:
                    if len(colList) == 0:
                        colList = line
                    else:
                        newTmp = {
                            'itemId': str(line[colList.index('ITEM_ID')]),
                            'properties': '{"creationTimestamp":%s,"GENRES":"%s"}' % (int(time.time()), line[colList.index('GENRES')])
                        }
                        itemlist.append(newTmp)
            personalize_events.put_items(
                datasetArn='arn:aws:personalize:%s:%s:dataset/user-personalization-demo/%s' % (awsRegion, account_id, datasetType),
                items=itemlist
            )
            print('updated items')

        # Incremental update of interaction data
        if 'interacts' in key:
            datasetType = 'INTERACTIONS'
            event_tracker_name = 'user-personalization-demo'
            dataset_group_arn = 'arn:aws:personalize:%s:%s:dataset-group/user-personalization-demo' % (awsRegion, account_id)

            # Create the event tracker
            event_tracker_response = personalize.create_event_tracker(
                name=event_tracker_name,
                datasetGroupArn=dataset_group_arn
            )
            event_tracker_arn = event_tracker_response['eventTrackerArn']
            event_tracking_id = event_tracker_response['trackingId']
            print(event_tracker_response)
            print(event_tracking_id)
            # Wait for the tracker to become usable; the function timeout must exceed this wait
            time.sleep(180)

            # Import the interaction data row by row
            with open(download_path, 'r') as this_csv_file:
                # Read the CSV file; the first row is the header
                data = csv.reader(this_csv_file, delimiter=",")
                colList = []
                for line in data:
                    if len(colList) == 0:
                        colList = line
                    else:
                        personalize_events.put_events(
                            trackingId=event_tracking_id,
                            userId=line[colList.index('USER_ID')],
                            sessionId='1',
                            eventList=[{
                                'sentAt': int(time.time()),
                                'eventType': str(line[colList.index('EVENT_TYPE')]),
                                'itemId': line[colList.index('ITEM_ID')],
                                'properties': '{"EVENT_VALUE":%s}' % (line[colList.index('EVENT_VALUE')])
                            }]
                        )
            print('updated interacts')

            # Delete the event tracker
            response = personalize.delete_event_tracker(eventTrackerArn=event_tracker_arn)

            # After the interaction data is updated, update the user-personalization model
            solution_arn = 'arn:aws:personalize:%s:%s:solution/user-personalization-demo' % (awsRegion, account_id)
            create_solution_version_response = personalize.create_solution_version(
                solutionArn=solution_arn,
                trainingMode="UPDATE"
            )
            solution_version_after_update = create_solution_version_response['solutionVersionArn']
            print('updated solution')
    except Exception as e:
        print(e)
        print('Error getting object {} from bucket {}. Make sure they exist and your bucket is in the same region as this function.'.format(key, downloadBucket))
        raise e
Because this is a data update, the Amazon Lambda function is triggered by Amazon S3 bucket events. Any data written to the incremental-data bucket triggers the function: when data is uploaded to the user-personalization-demo-datasetupdate bucket, the data update and the model update run.
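The update function above sends every parsed user or item in a single put_users or put_items call. To our knowledge these APIs cap the number of records per request (10 at the time of writing; please verify against the current API reference), so larger CSV files need batching. A minimal sketch, with hypothetical helper names chunked and put_users_in_batches:

```python
def chunked(records, size=10):
    """Yield consecutive slices of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def put_users_in_batches(personalize_events, dataset_arn, users, batch_size=10):
    """Send users to Amazon Personalize in batches; returns how many were sent.

    personalize_events is a boto3 "personalize-events" client.
    batch_size=10 is an assumption about the per-request limit.
    """
    sent = 0
    for batch in chunked(users, batch_size):
        personalize_events.put_users(datasetArn=dataset_arn, users=batch)
        sent += len(batch)
    return sent
```

The same pattern applies to put_items by swapping the call and the keyword argument.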
Create the batch inference Amazon Lambda function
Use the code below to create the Amazon Lambda function. Based on the user list in user-personalization-demo-batch-input, Amazon Personalize generates recommendations for those users and writes the results to user-personalization-demo-batch-output
import logging
import time

import boto3

logger = logging.getLogger()
logger.setLevel(logging.INFO)


def lambda_handler(event, context):
    # Get the account ID and AWS Region
    record = event['Records'][0]
    client = boto3.client("sts")
    account_id = client.get_caller_identity().get('Account')
    awsRegion = record['awsRegion']

    # Batch job name and role ARN
    current_time = int(time.time())
    batchJobName = 'user-personalization-demo-batchPredict-%s' % current_time
    role_arn = 'arn:aws:iam::%s:role/PersonalizeRole' % account_id
    personalize = boto3.Session().client('personalize')

    # Get the ARN of the latest model
    solution_versions_response = personalize.list_solution_versions(
        solutionArn='arn:aws:personalize:%s:%s:solution/user-personalization-demo' % (awsRegion, account_id),
        maxResults=100
    )
    solution_version_arn = solution_versions_response['solutionVersions'][0]['solutionVersionArn']  # pick the latest model

    try:
        # Batch recommendation
        personalize.create_batch_inference_job(
            solutionVersionArn=solution_version_arn,
            jobName=batchJobName,
            roleArn=role_arn,
            batchInferenceJobConfig={
                # Optional USER_PERSONALIZATION recipe hyperparameters: the exploration
                # weight and the age cutoff that defines a new movie (20 days in this sample)
                "itemExplorationConfig": {
                    "explorationWeight": "0.3",
                    "explorationItemAgeCutOff": "20"
                }
            },
            # S3 location of the input data
            jobInput={"s3DataSource": {"path": "s3://user-personalization-demo-batch-input/"}},
            # S3 location of the output
            jobOutput={"s3DataDestination": {"path": "s3://user-personalization-demo-batch-output/"}}
        )
    except Exception as e:
        print(e)
        raise e
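The function above treats the first entry returned by list_solution_versions as the latest model. We are not certain that the response order is guaranteed, and a version that is still training cannot serve a batch job, so a more defensive option is to pick the newest ACTIVE version explicitly. A sketch that relies on the status, creationDateTime, and solutionVersionArn fields of each summary; the helper name is hypothetical:

```python
def latest_active_solution_version(summaries):
    """Return the ARN of the most recently created ACTIVE solution version.

    summaries is the 'solutionVersions' list from list_solution_versions.
    Raises ValueError if no version has finished training successfully.
    """
    active = [s for s in summaries if s.get("status") == "ACTIVE"]
    if not active:
        raise ValueError("no ACTIVE solution version found")
    newest = max(active, key=lambda s: s["creationDateTime"])
    return newest["solutionVersionArn"]


# Example usage inside the handler:
# solution_version_arn = latest_active_solution_version(
#     solution_versions_response['solutionVersions'])
```

With this helper, a batch upload that arrives while the monthly training is still running uses the previous finished model instead of failing.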
The Amazon Lambda function is triggered by Amazon S3 bucket events. Any data written to the bucket that holds the inference user list triggers the function
The inference user list must be in JSON format, with one JSON object per line
{"userId": "XXXX1"}
{"userId": "XXXX2"}
{"userId": "XXXX3"}
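An input file in this shape can be generated from a list of user IDs. A minimal sketch; the function name build_batch_input is ours:

```python
import json


def build_batch_input(user_ids):
    """Return the batch input text: one {"userId": ...} JSON object per line."""
    lines = [json.dumps({"userId": str(user_id)}) for user_id in user_ids]
    return "".join(line + "\n" for line in lines)


# Example usage: write the file locally, then upload it to the
# user-personalization-demo-batch-input bucket to trigger the function.
# with open("users.json", "w", encoding="utf-8") as f:
#     f.write(build_batch_input(["XXXX1", "XXXX2", "XXXX3"]))
```

User IDs are converted to strings because USER_ID is declared as a string in the schemas.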
The recommendation results are stored in the user-personalization-demo-batch-output S3 bucket in JSON format.
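The result files can be read back with a few lines of Python. The sketch below assumes that each output line has the form {"input": {"userId": ...}, "output": {"recommendedItems": [...], "scores": [...]}, "error": ...}, which is our understanding of the batch output layout and should be checked against a real output file. The function name is hypothetical.

```python
import json


def parse_batch_output(text):
    """Parse batch inference output (one JSON object per line).

    Returns (recommendations, errors):
      recommendations maps userId -> list of (itemId, score) pairs
      errors maps userId -> the error reported for that user
    ASSUMPTION: the field names follow the layout described in the lead-in.
    """
    recommendations = {}
    errors = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        user_id = record["input"]["userId"]
        if record.get("error"):
            errors[user_id] = record["error"]
            continue
        output = record["output"]
        items = output["recommendedItems"]
        # Scores may be absent; pair each item with None in that case
        scores = output.get("scores") or [None] * len(items)
        recommendations[user_id] = list(zip(items, scores))
    return recommendations, errors
```

The recommendations dictionary can then be written to whatever store the application serves recommendations from.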
Summary
At this point we have built a recommendation service with Amazon Personalize. The model is now retrained on a monthly schedule. When our application writes data to the incremental-data bucket, the data and the model in Amazon Personalize are updated as well. When the application writes a new user list to the 'user-personalization-demo-batch-input' S3 bucket, Amazon Personalize generates batch recommendations for those users and writes the results to the 'user-personalization-demo-batch-output' S3 bucket.
About the author
陳恒智
Data scientist, Amazon Web Services Professional Services team