Principles of Machine Learning -- Before You Start (Translation)
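The whole world is learning AI, and of course I can't be the exception. Self-driving cars, face recognition, robots everywhere... So, starting today, I will begin translating the whole of Principles of Machine Learning, which has 7 chapters plus an introduction. If there are labs along the way, I will work through them together with everyone too. So now, let's begin our machine learning journey!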
Introduction
Welcome to Principles of Machine Learning! My name is Cynthia Rudin. >> And I’m Steve Elston. >> Now machine learning is everywhere. This is the time for machine learning: it’s becoming mainstream, it’s in the search engines we use every day, it’s in the bank teller machines reading our checks, it’s in our smartphone assistants like Cortana. You know, jobs in machine learning are in every industry, and we are thrilled to be able to give you an introduction to machine learning in this course. So let Steve and me introduce ourselves first.
So I am an associate professor of computer science and electrical and computer engineering at Duke, and an associate professor of statistics at MIT, and my main expertise is in machine learning and data mining. My lab is called the Prediction Analysis Lab. I have a PhD from Princeton University, and a lot of the work that I do is applied machine learning; it’s applied to problems in the electric power industry, in healthcare, and in computational criminology. >>
Hi, I’m Steve Elston. I’m a co-founder and principal consultant at a data science consultancy in Seattle called Quantia Analytics. I’ve been working in predictive analytics and machine learning for several decades now. I’ve been a long-term R, S/S-PLUS, and Python user and developer; I started using S when it was a Bell Labs project, and of course in the recent decade moved to R like everybody else.
I’m currently an advisor to Microsoft on Azure Machine Learning and some other analytics products, and I’ve worked in a variety of industries: payment fraud prevention, telecommunications, capital markets (including things like market credit risk models, clearing, and collateral management), and also several industrial areas such as forecasting for logistics management. And I also have a PhD from Princeton University; mine is in geophysics. >> Now when I first learned about machine learning, I thought it was magic.
A way for computers to predict the future, just by seeing the past. And you know, it’s a way for computers to learn on their own how to solve problems that I can’t solve, and that’s exactly what’s going on. Computers are learning, just from observing what’s happened in the past. But it’s nothing like magic.
Now machine learning, in addition to being a really useful toolbox for industrial applications, also gives you a perspective on the way your mind works. So let’s say that I asked you why you can learn and why a computer can’t. What would you say? Would you say that it’s because you’ve seen more of the world than a computer has? I think that’s not particularly true anymore, because we have lots of pictures and video and sound now that we could feed to any computer. Is it because there are more connections in your brain than in a computer? Well, that might be part of it, but lots of creatures with much smaller brains than my computer can still learn, so that’s not it. Maybe you could argue that a brain is more flexible in some ways than a computer; maybe you could think your brain is somehow more open to identifying new types of patterns than your computer, and that’s why you can learn.
The interesting thing is that that’s not quite the way it is; in fact, it’s sort of the opposite. Your brain is really good at identifying only certain kinds of patterns; in fact, these are the types of patterns that it’s expecting. The fact that humans can learn is not so much a consequence of the human brain being flexible as it is of the human brain being inflexible, being wired to identify exactly the types of patterns that it comes across. Natural images, real sounds, patterns of behavior: these are things that we’re really good at identifying. Humans are absolutely awful at identifying patterns in large databases. We just can’t learn in some settings, and what enables us to learn in the settings we can learn in is the way that our brains are wired. It’s the structure in our mind; it’s not the flexibility, it’s the limited flexibility. It’s just that structure.
Okay, so what is the field of machine learning exactly? It completely revolves around setting up structures in the computer that limit its flexibility and allow it to learn.
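To make the idea of limited flexibility concrete, here is a small Python sketch. It is only an illustration under stated assumptions: the toy data, the choice of a straight-line model, and the names `memorizer` and `fit_line` are hypothetical and do not come from the course. A lookup table that can store any pattern has nothing to say about an input it has never seen, while a model restricted to straight lines can extrapolate from the same observations.

```python
# Toy observations generated by y = 2x + 1 (hypothetical data).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

# Fully flexible "learner": a lookup table that memorizes every pair.
memorizer = dict(zip(xs, ys))


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Structured learner: only straight lines are allowed.
slope, intercept = fit_line(xs, ys)

new_x = 10.0  # an input that was never observed
print("memorizer:", memorizer.get(new_x))        # the table has no entry for it
print("line model:", slope * new_x + intercept)  # the line still gives a prediction
```

Restricting the model to lines is what lets it say anything about the unseen input; the same restriction would hurt if the true relationship were not linear.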
Okay, setting up these structures is really a form of statistical modelling, and that’s what we’re going to do in this course. And once you can teach a computer to learn, there are a huge number of applications that you can use it on.
>> So, let’s talk about a few of the applications that we’ll use both for our demos and for the labs that you’re going to do hands-on in this course. First off, we’re going to do a classification example, and we’ll be coming back to this, and actually to each of these examples, at several points in the course. We’re going to work on classifying diabetes patients who have been in a hospital for treatment: we want to identify the ones who are at high risk of being readmitted to the hospital; that is, somehow their treatment or the follow-up to their treatment isn’t likely to be sufficient, and they’re going to wind up being re-hospitalized, which is, as you can imagine, a serious problem. It’s expensive, it’s dangerous for the patients, etc., so there are a lot of reasons why this is an important area.
We’re going to look at forecasting; forecasting for demand is used all over the place, from warehouse management to power generation. In particular, we’re going to look at forecasting demand for rented bicycles, and that will again be an application we’ll come back to at several points in this course.
A lot of things are done in clustering and segmentation, and we’re going to look at segmenting people by their income level; that’s again an analog for lots of different things that are done in everything from political science to marketing.
And finally, we’re going to look at how a recommender works; we’re going to use a database of Mexican restaurants and compute some recommendations for some of the customers who have written reviews for these restaurants. >>
Okay, now as I mentioned, humans are lousy at finding patterns in large databases, and so here are some of the applications that we’re working on in my lab that use large databases and machine learning. In all of these applications, the answer is really in the data. It really is, and by providing the computer with the proper machine learning structure to find important patterns, we can really make headway on societal problems. For instance, we’ve been looking at power grid failures, personalized advertising, and healthcare applications.
>> So, why would you want to continue with this course? What should you expect to get out of it? Well, first off, it’s going to be a hands-on introduction to machine learning. We have some great labs laid out here, and there are going to be demos, so you’re going to gain some practical experience working with data and applying machine learning algorithms of various types to those data. We’re going to look at all the major focus areas in machine learning, so we’ll cover a wide variety of algorithms, methods, and techniques.
We’re going to use Azure Machine Learning quite a bit for demos and for your labs. We’re doing this because it’s not only a great environment, it’s also a great learning environment: a lot of the tedious stuff is taken care of for you, so there are a lot of things you won’t have to spend time on when you do your weekly labs. Nonetheless, we’ll do a significant amount of data cleaning and visualization using R and/or Python; you can pick which path you’re on, so you’ll be building some skills with that.
And we hope that as you go along, as you work on these examples and listen to the theory lectures, you start to build some intuition around analytics and machine learning and how it all fits together, and mostly an intuition for what’s a useful result, what’s adding value, and what’s going in the direction you, or say your boss, want to go. And we’re going to minimize the math; there’s not going to be any heavy theory, so if you remember a little bit of calculus and some minimal linear algebra, you should be good to go here.
So what are we going to cover specifically in this course? In the first module, we’re going to give an introduction to classification; classification is largely where machine learning grew out of, historically. Then we’re going to talk about regression; there are many regression methods that are important in machine learning, and they have an even longer history in statistics, going back to the late 19th century. Then we’re going to talk about how, once you have machine learning models, you evaluate their performance. How do you know what to do to improve that performance? We’re then going to look at some more modern, powerful methods like tree and ensemble learning methods, and if you don’t know what that means, stay tuned; you’ll find out a lot about it. And we’re going to look at optimization-based learning methods such as support vector machines and neural networks. And we’ll finish up with clustering and recommenders. >>
So, as you’re taking this course, we hope you will take some steps to get the most out of it and maximize your learning experience. Overall, keep in mind that this course runs over 6 weeks, with one module per week, so you can plan your time and your work that way. For each module, we have lectures, demos, and labs; the labs derive from the lectures and the demos, and they are for you to do on your own to reinforce key learning concepts. You’ll perform the labs using, as I already mentioned, Azure Machine Learning, but also either R or Python, and I suggest you decide whether you’re going to use R or Python. Every lab has the same materials and the same steps in either language, so it doesn’t matter in terms of the learning experience. If you’re very ambitious, of course you can try both, but for most people just doing one or the other is going to be just great.
Some of you want to get the certificate from this course, so what do you need to know? First off, you need a 70% score to pass and get the certificate, and that score is divided between assessments at the end of each of the 6 modules and the final exam. All those module assessments together are half your grade, 50%, and on each assessment question you actually get two tries, so if you mess it up the first time, don’t panic, you get another chance. The other half of your grade is a final exam at the end of the class. On this one you only get one try per question, but by then you’ve been through the lectures, you’ve seen all the demos, and you’ve done all the labs, so you should be in a great position to ace that.
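As a worked illustration of the grading scheme just described, here is a minimal Python sketch. The function name `course_grade` and the sample scores are hypothetical, and the sketch assumes the six module assessments are weighted equally within their 50% share, which the lecture does not state.

```python
def course_grade(module_scores, final_exam_score):
    """Combine scores (each a fraction in [0, 1]) into an overall grade.

    Assumption: the module assessments are weighted equally within
    their combined 50% share; the final exam is the other 50%.
    """
    module_average = sum(module_scores) / len(module_scores)
    return 0.5 * module_average + 0.5 * final_exam_score


PASSING_GRADE = 0.70  # a 70% score is needed for the certificate

# Hypothetical scores for the 6 module assessments and the final exam.
modules = [0.80, 0.75, 0.90, 0.60, 0.85, 0.70]
final_exam = 0.65

grade = course_grade(modules, final_exam)
print(f"overall grade: {grade:.1%}, passed: {grade >= PASSING_GRADE}")
```

With these made-up numbers, a final exam score below 70% is still offset by a stronger module average, since each half contributes equally to the total.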
So we hope you get a lot out of this course, and we’re looking forward to presenting it. I think it’s going to be a really great, informative class to get yourself bootstrapped into the wonderful world of machine learning!