Moving AI to the Real World
If you fit one of these profiles, this article is for you:
● You are a data science manager. You’d like to improve your team’s productivity with some best practices.
● You are a data scientist. You’d like to learn what happens downstream: how your model turns into a product.
● You are a software architect. You are designing or expanding a platform to support data science use cases.
I recently completed an online course that I think you should check out. It’s called Full Stack Deep Learning. It covers the full lifecycle of an AI application, from ideation through deployment, but it does not cover theory or model fitting. If you are an intermediate data scientist and want to “zoom out” from your niche, this course will show you how the sausage is made, tracking it from one station to the next.
The course started out as a pricey SF-based bootcamp in 2018 but is now available for free. It features some industry heavyweights, including Tesla’s Andrej Karpathy and fast.ai’s Jeremy Howard. I took it because I wanted to compare my own practices against what the celebrities do. By way of background, I am a partner at Genpact, a consulting company. I help clients transform their processes, and sometimes their business, using AI. In practice, this means I create a proof of concept (POC) to demonstrate potential value and then run the team that implements the solution to capture that value.
“Full Stack Deep Learning” exceeded my expectations. It is organized into six content areas as well as hands-on labs and guest lectures from AI luminaries. Here is what I found most new and useful:
1. Setting up ML Projects
This is an “executive” module that discusses planning, prioritizing, staffing and scheduling AI projects.
I did not find much new here, but it is a good, concise executive overview. Since the course focuses on deep learning as opposed to traditional ML, it brings up three important points:
● Deep learning (DL), unlike more traditional machine learning, is “still research.” You should not plan for a 100% success rate.
● If you are “graduating” from “classical” ML to DL, plan on spending a lot more time and money on labeling than you are used to…
● …but don’t throw out your playbook. In both cases, you are looking for settings where cheap prediction will have a large business impact.
2. Infrastructure and Tooling
This is the module I found most helpful. It sets up a comprehensive framework for developing an AI/ML application, from the lab through production. At each layer or category, it covers key functionality, how it fits with other layers and the major tool choices.
Let me emphasize: what makes this course different is how comprehensive the framework is. Most public AI/ML content is focused on model development. Some sources cover just data management or just deployment. Commercial vendors often understate the complexity of the process and skip steps. This is the most “panoramic” picture I’ve seen if you are trying to understand the AI/ML pipeline from alpha to omega.
The course is “opinionated”: it sometimes calls “category winners,” which is helpful if you’re placing bets. For example, it calls Kubernetes the winner in the “resource management” category. I agree with most of these calls, but not all. Among cloud providers, for instance, it picks AWS and pans Azure as having a “bad user experience.” While AWS is excellent, several of our clients (rightly) chose Azure, particularly those that already have a Microsoft stack (Excel, MS SQL, etc.).
After setting up the overall framework, this module digs into Development and Training/Evaluation. I found three areas particularly interesting:
● Prototyping: I’m always looking for quick and easy ways to create proofs of concept (POCs) for clients. I need to produce a visually attractive, interactive POC that is easily accessible over a public or semi-public URL. My ideal solution would give me code-level control over the model without making me write a lot of HTML or JavaScript. One-click deployment is a plus. I’ve been using Shiny but would like to do something similar in Python. The course introduced me to Streamlit, which I will be investigating further. Dash is also interesting, though the course curiously does not cover it.
● Experiment Management is an interesting category: it keeps track of how well your model performs under a variety of configuration options (experiments). I had coded my own version of this for Kaggle competitions without knowing it was a category with a name. I will be checking out a few of the tools the course recommends, including Weights and Biases.
● All-in-one: There was a nice, informative comparison of all the all-in-one platforms available. AWS SageMaker and GCP AI look like the best choices at the moment. If pressed, I would bet that the others will be acquired or copied by the cloud providers.
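The Experiment Management bullet describes a home-grown tracker; at its core, such a tool only needs to record each configuration next to its scores and answer “which run was best?”. Below is a minimal sketch under stated assumptions: storage is a local JSON Lines file, and the `ExperimentLog` class and its methods are hypothetical illustrations, not the API of Weights and Biases or any other product.

```python
import json
from pathlib import Path


class ExperimentLog:
    """Hypothetical append-only experiment log: one JSON object per line."""

    def __init__(self, path):
        self.path = Path(path)

    def record(self, config, metrics):
        # Store the configuration that was tried next to the scores it produced.
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps({"config": config, "metrics": metrics}, sort_keys=True) + "\n")

    def runs(self):
        # Read every recorded experiment back; an empty list if nothing logged yet.
        if not self.path.exists():
            return []
        with self.path.open(encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]

    def best(self, metric, higher_is_better=True):
        # Return the run with the best value of `metric`, or None if no run has it.
        runs = [r for r in self.runs() if metric in r["metrics"]]
        if not runs:
            return None
        pick = max if higher_is_better else min
        return pick(runs, key=lambda r: r["metrics"][metric])
```

A typical loop calls `record()` once per configuration tried (for example, each learning rate and tree depth) and `best("val_auc")` at the end. The dedicated tools add dashboards, artifact storage and team sharing on top of this basic idea.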
3. Data Management
This module discusses how to store and manage datasets related to your pipeline. I did not find much new here. The material on data augmentation was interesting but applies mostly to computer vision, an area I have not worked in much.
4. Machine Learning Teams
This module discusses the HR side of the project: roles, team structure, project management, etc. In my view, this content belongs in module 1 above, Setting up ML Projects.
There were some interesting points, for both hiring managers and candidates, about how to get a job in the field. There is also a good summary of the typical roles in an ML project.
5. Training and Debugging
This module discusses the process of getting a model to work in the lab. It should be required reading for every data scientist, and it is similar to the workflow I used to win several Kaggle contests. You can find this content in many other places, but this is a well-organized and succinct presentation.
The discussion around debugging DL models was particularly good: get your model to run, overfit a single batch, and compare to a known result. To illustrate the depth, the course devotes a whole subsection just to overfitting a single batch.
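The “overfit a single batch” check can be sketched in a few lines. The idea: a correctly wired model and training loop should be able to drive the loss on one small batch close to zero, so if it cannot, something is broken. This is a minimal, hypothetical illustration that uses a pure-Python logistic model instead of a deep network; the function names, learning rate, step count and the 0.01 tolerance are all assumptions chosen for the toy data.

```python
import math


def softplus(t):
    # Numerically stable log(1 + exp(t)).
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def loss_and_grad(w, b, batch):
    """Mean binary cross-entropy of a logistic model and its gradient."""
    loss, gw, gb = 0.0, [0.0] * len(w), 0.0
    for x, y in batch:
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        # -log(sigmoid(z)) for positives, -log(1 - sigmoid(z)) for negatives.
        loss += softplus(-z) if y == 1 else softplus(z)
        err = sigmoid(z) - y  # derivative of the loss with respect to z
        gw = [g + err * xi for g, xi in zip(gw, x)]
        gb += err
    n = len(batch)
    return loss / n, [g / n for g in gw], gb / n


def overfit_single_batch(batch, steps=500, lr=1.0, tol=0.01):
    """Train on one batch only; return (passed, loss_history)."""
    w, b = [0.0] * len(batch[0][0]), 0.0
    history = []
    for _ in range(steps):
        loss, gw, gb = loss_and_grad(w, b, batch)
        history.append(loss)
        w = [wi - lr * gi for wi, gi in zip(w, gw)]
        b -= lr * gb
    return history[-1] < tol, history


batch = [([1.0, 2.0], 1), ([2.0, 1.0], 1), ([-1.0, -2.0], 0), ([-2.0, -1.0], 0)]
passed, history = overfit_single_batch(batch)
# Simulate a classic bug: the update step is effectively never applied.
buggy_passed, buggy_history = overfit_single_batch(batch, lr=0.0)
```

With the healthy loop the loss falls from ln 2 toward zero and the check passes; with the simulated bug the loss stays flat and the check fails, which is exactly the signal this debugging step is meant to surface.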
More tips on overfitting are at the end of the article.
6. Testing and Deployment
This module discusses how to get your model from the lab to the real world. It’s the module I was originally looking for when I took the class. I found several useful nuggets here:
Testing an ML system is very different from testing traditional software because its behavior is driven by the data as well as the algorithm.
You need to adjust your test suite accordingly. The course provides an excellent checklist for doing just that, taken from the now-famous paper Hidden Technical Debt in Machine Learning Systems.
The course recommends checking that training-time and production-time variables have approximately consistent distributions (Monitoring Test 3 in that checklist). This is a critical test. It can help you detect a runtime error, such as blanks in the data feed. It can also tell you that it may be time to retrain the model because the input differs from what you expected. A simple way to do this is to plot training data against production-time data, variable by variable. The Domino Data Lab tool does this.
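The variable-by-variable comparison can also be done numerically rather than visually. The sketch below is one possible approach, not what the course or Domino Data Lab uses. For each variable it reports the share of blanks (`None`) in production, to catch a broken feed, and a Population Stability Index (PSI) computed on equal-width bins of the training range. The bin count, the epsilon smoothing and the 0.25 “significant shift” threshold (a commonly quoted rule of thumb) are assumptions.

```python
import math


def psi(train_values, prod_values, bins=10, eps=1e-6):
    """Population Stability Index of prod_values against train_values.

    Uses equal-width bins over the training range; production values outside
    that range are clamped into the first or last bin. eps avoids log(0).
    """
    lo, hi = min(train_values), max(train_values)
    width = (hi - lo) / bins or 1.0  # guard against a constant training column

    def shares(values):
        counts = [0] * bins
        for v in values:
            counts[min(max(int((v - lo) / width), 0), bins - 1)] += 1
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(train_values), shares(prod_values)
    # PSI = sum over bins of (actual - expected) * ln(actual / expected)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def drift_report(train, prod, threshold=0.25):
    """Compare dicts mapping column name -> list of values (None marks a blank)."""
    report = {}
    for name, train_col in train.items():
        prod_col = prod.get(name, [])
        present_train = [v for v in train_col if v is not None]
        present_prod = [v for v in prod_col if v is not None]
        missing = 1 - len(present_prod) / len(prod_col) if prod_col else 1.0
        score = psi(present_train, present_prod) if present_prod else None
        report[name] = {
            "missing_rate": missing,
            "psi": score,
            "shifted": score is not None and score > threshold,
        }
    return report
```

Running `drift_report` on each scoring batch and alerting on a high `missing_rate` or `shifted` flag covers both uses described above: catching runtime data errors and signaling that retraining may be due.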
A better way, which the course does not cover, is adversarial validation: train an auxiliary model (in production) that tries to classify each observation as belonging to the training data or the production data. If this model succeeds in distinguishing the two, you have a significant distribution shift. You can then inspect the model to find the variables that contribute most to that shift.
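Adversarial validation is easy to prototype. The sketch below is a minimal, dependency-free illustration, not a production recipe. It labels training rows 0 and production rows 1, fits a plain logistic regression on half of the pooled rows, and reports the AUC on the other half. An AUC near 0.5 means the classifier cannot tell the two sets apart; an AUC near 1.0 signals a shift. Ranking variables by the absolute size of their standardized weights stands in for “inspecting the model” (in practice a gradient-boosted model and its feature importances are often used instead). The function names, hyperparameters and any AUC threshold you act on are assumptions.

```python
import math
import random


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _standardize(rows):
    # Scale each column to zero mean and unit variance so weights are comparable.
    stats = []
    for col in zip(*rows):
        mu = sum(col) / len(col)
        sd = math.sqrt(sum((v - mu) ** 2 for v in col) / len(col)) or 1.0
        stats.append((mu, sd))
    return [[(v - mu) / sd for v, (mu, sd) in zip(row, stats)] for row in rows]


def _auc(scores, labels):
    # Probability that a random prod row outscores a random train row (ties count half).
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def adversarial_validation(train_rows, prod_rows, names, epochs=200, lr=0.1, seed=0):
    """Return (holdout AUC, variables ranked by |weight|) for a train-vs-prod classifier."""
    X = _standardize(train_rows + prod_rows)
    y = [0] * len(train_rows) + [1] * len(prod_rows)  # 1 = production row
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    fit, hold = idx[: len(idx) // 2], idx[len(idx) // 2:]

    w, b = [0.0] * len(names), 0.0
    for _ in range(epochs):  # full-batch gradient descent on the logistic loss
        gw, gb = [0.0] * len(w), 0.0
        for i in fit:
            err = _sigmoid(sum(wj * xj for wj, xj in zip(w, X[i])) + b) - y[i]
            gw = [g + err * xj for g, xj in zip(gw, X[i])]
            gb += err
        w = [wj - lr * g / len(fit) for wj, g in zip(w, gw)]
        b -= lr * gb / len(fit)

    scores = [sum(wj * xj for wj, xj in zip(w, X[i])) + b for i in hold]
    auc = _auc(scores, [y[i] for i in hold])
    ranking = sorted(zip(names, w), key=lambda nw: abs(nw[1]), reverse=True)
    return auc, ranking
```

For example, if a hypothetical `income` column drifts upward in production while `age` does not, the holdout AUC rises well above 0.5 and `income` comes first in the ranking.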
Deployment is covered with a good introduction to Kubernetes and Docker, as well as GPU-based model serving.
Guest Lectures
The course includes guest lectures from industry heavyweights. The quality is highly variable. Some speakers are polished and prepared, others… not so much. I was most impressed with two guests:
● Jeremy Howard of fast.ai: This talk provided lots of “news you can use” in terms of improving model performance.
o The Fast.ai library is designed to use fewer resources (human and machine) to get good results. One example: training ImageNet in 3 hours for $25. This focus on efficiency is very much aligned with what our clients are looking for.
o Howard asks “Why are people trying to automate machine learning?” The idea is we can get much better results working together. He calls this “AugmentML” vs. “AutoML.” Platform.ai is a case in point. It is a labeling product that allows the labeler to have an interactive “conversation” with a neural network. Each iteration improves both the labels and the model. I’ve never seen anything like it, and it seems to work, at least on the video he shared.
o Howard shares a box of tricks for improving model performance, particularly for computer vision tasks. I found Test Time Augmentation (TTA) particularly eye-opening. I will have to try it in my next project.
● Andrej Karpathy of Tesla: This talk was interesting as well, although the audio wasn’t great. Karpathy discussed his Software 2.0 concept, the idea that we will increasingly use optimization methods like gradient descent to solve problems probabilistically rather than devising fixed software rules or heuristics to solve them. Like many others, I found this mental model compelling.
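Test Time Augmentation, mentioned in the Jeremy Howard notes, is simple to sketch: run the trained model on several augmented copies of the same input and average the predictions. The pure-Python version below is a hypothetical illustration, not fast.ai’s implementation. The toy `left_column_model`, the choice of only an identity and a horizontal flip, and plain averaging of probabilities are all assumptions.

```python
def identity(image):
    return image


def hflip(image):
    # Mirror each row of a 2-D image (list of lists) left to right.
    return [list(reversed(row)) for row in image]


def tta_predict(model, image, transforms=(identity, hflip)):
    """Average class probabilities over augmented copies of a single input."""
    preds = [model(t(image)) for t in transforms]
    n_classes = len(preds[0])
    return [sum(p[c] for p in preds) / len(preds) for c in range(n_classes)]


def left_column_model(image):
    # Hypothetical toy "classifier": P(class 0) = mean brightness of the left column.
    p = sum(row[0] for row in image) / len(image)
    return [p, 1.0 - p]


image = [[1.0, 0.0],
         [1.0, 0.0]]
single_view = left_column_model(image)          # prediction on the raw image only
averaged = tta_predict(left_column_model, image)  # prediction averaged over two views
```

The toy model is deliberately not flip-invariant, so the averaged prediction differs from the single-view one. With a real vision model, the hope is that averaging over plausible views smooths out errors tied to one particular crop or orientation, at the cost of one extra forward pass per transform.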
Parting Thoughts
The course is not perfect. A lot of this material was created in 2018 and is starting to show its age. Three examples:
● Richard Socher, chief scientist at Salesforce.com, argues for a unified NLP model with something called decaNLP. BERT has since taken over this niche, and GPT-3 is an exciting recent development.
● Model explainability has developed rapidly over the past few years but is not well represented in the course.
● As mentioned above, Microsoft Azure has made strides since then and, in my view, does not get a fair shake.
Despite these nits, I think the course packs a lot of value into a compact and well-organized frame. The price is right, and I recommend it to anyone interested in understanding how AI/ML applications are built.
Lastly, and for no particular reason, I hope you will enjoy this thrilling conclusion:
Source: https://towardsdatascience.com/moving-ai-to-the-real-world-e5f9d4d0f8e8
ai與虛擬現(xiàn)實
總結(jié)
以上是生活随笔為你收集整理的ai与虚拟现实_将AI推向现实世界的全部內(nèi)容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: 嵌入式和非嵌入式_我如何向非技术同事解释
- 下一篇: 亚马逊训练alexa的方法_Alexa对