Agile Data Science: Data Science Can and Should Be Agile

Published: 2023/11/29

TL;DR

  • I have encountered a lot of resistance in the data science community to agile methodology, and specifically to the Scrum framework.
  • I don’t see it this way and claim that most disciplines would improve by adopting an agile mindset.
  • We will go through a typical Scrum sprint to highlight the compatibility of the data science process and the agile development process.
  • Finally, we discuss when Scrum is not an appropriate process to follow: for example, if you are a consultant working on many projects at a time, or if your work requires deep concentration on a single narrow issue (narrow enough that you alone can solve it).

I recently came across a Medium post claiming that Scrum is awful for data science. I have to disagree, and I would like to make the case for Agile Data Science.

The ideas in this post are heavily influenced by the book Agile Data Science 2.0 (which I highly recommend) and by personal experience. I am eager to hear about other experiences, so please share them in the comments.

First, we need to agree on what data science is and how it solves business problems, so that we can then examine the data science process and how agile (and specifically Scrum) can improve it.

What Is Data Science?

There are countless definitions online. For example, Wikipedia describes it as follows:

Data science is an inter-disciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from many structural and unstructured data.

In my opinion, it is quite an accurate definition of what data science tries to accomplish. But I would simplify this definition further.

Data Science solves business problems by combining business understanding, data and algorithms.

Compared to the definition in Wikipedia, I would like to stress that data scientists should aim to solve business problems rather than “extract knowledge and insights.”

How Does Data Science Solve Business Problems?

So data science is here to solve business problems. We need to accomplish a few things along the way:

  • Understand the business problem;
  • Identify and acquire available data;
  • Clean, transform, and prepare the data;
  • Select and fit an appropriate “model” for the given data;
  • Deploy the model to “production” (this is our attempt at solving the given problem);
  • Monitor performance.

As with everything, there are countless ways to go about implementing those steps, but I will try to persuade you that the agile (incremental and iterative) approach brings the most value to the company and the most joy to data scientists.
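
To make the steps listed above concrete, here is a minimal sketch of one pass through them in plain Python. Everything in it is an assumption made for illustration: the toy records, the function names, and the one-parameter "model" (a cutoff on transaction amount) are hypothetical and stand in for whatever data and algorithm a real team would use.

```python
from statistics import mean

# Hypothetical toy records: (transaction amount, is_fraud label).
RAW = [(12.0, 0), (15.5, 0), (None, 0), (980.0, 1), (22.0, 0), (1150.0, 1), (30.0, 0)]


def clean(rows):
    """Clean step: drop records with a missing amount."""
    return [(amount, label) for amount, label in rows if amount is not None]


def fit(rows):
    """Fit step: a one-parameter 'model', the midpoint between class mean amounts."""
    fraud = mean(amount for amount, label in rows if label == 1)
    legit = mean(amount for amount, label in rows if label == 0)
    return (fraud + legit) / 2


def predict(cutoff, amount):
    """Deploy step: flag a transaction as fraud (1) when its amount exceeds the cutoff."""
    return 1 if amount > cutoff else 0


def accuracy(cutoff, rows):
    """Monitor step: share of records the model labels correctly."""
    return sum(predict(cutoff, amount) == label for amount, label in rows) / len(rows)


data = clean(RAW)
cutoff = fit(data)
print(f"cutoff={cutoff:.2f} accuracy={accuracy(cutoff, data):.2f}")
```

The point of the sketch is not the model but the shape: every step is small enough to be replaced in a later iteration without touching the others.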

Agile Data Science Manifesto

I took this from page 6 of the book Agile Data Science 2.0, and I encourage you to read the original, but here it is:

  • Iterate, iterate, iterate: tables, charts, reports, predictions.
  • Ship intermediate output. Even failed experiments have output.
  • Prototype experiments over implementing tasks.
  • Integrate the tyrannical opinion of data in product management.
  • Climb up and down the data-value pyramid as you work.
  • Discover and pursue the critical path to a killer product.
  • Get meta. Describe the process, not just the end state.

Not all of these points are self-explanatory, and I encourage you to go and read what Russell Jurney has to say, but I hope that the main idea is clear: we share intermediate output, and we iterate to achieve value.

Given these preliminaries, let us go over a standard week for a Scrum team, assuming a one-week sprint.

Scrum Team Sprint

Day 1

There are many variations of sprint structure, but I will assume that planning is done on Monday morning. The team decides which user stories from the product backlog will be moved to the sprint backlog. The most pressing issue for our business, as is evident from the backlog ranking, is customer fraud: fraudulent transactions are driving our valuable customers off our platform. During the previous backlog refinement session, the team already discussed this task, and the product owner got additional information from the Fraud Investigation team. So during the meeting, the team decides to start with a simple experiment (and is already thinking of interesting iterations further down the road): an initial model based on simple features of the transaction and the participating users. Work is split so that the data scientist can go and look at the data the team identified for this problem. The data engineer will set up the pipeline for integrating model output into the data warehouse (DWH) systems, and the full-stack engineer starts to set up a transaction review page and an alert system for the Fraud Investigation team.

Day 2

At the start of Tuesday, the whole team gathers to share progress. The data scientist shows a few graphs which indicate that, even with limited features, we will have a decent model. At the same time, the data engineer is already halfway through setting up the system to score incoming transactions with the new model. The full-stack engineer is also progressing nicely, and after just a few minutes, everyone is back at their desks working on the agreed tasks.

Day 3

As on Tuesday, the team starts Wednesday with a standup meeting to share progress. A simple model has already been built, and there are some accuracy and error rate numbers. The data engineer shows the infrastructure for transaction scoring, and the team discusses how the features arrive at the system and what needs to be done for them to be ready for the algorithm. The full-stack engineer shows the admin panel where transaction metadata is displayed, along with the triggering mechanism. Another discussion follows on the threshold value at which the model output triggers a message for a fraud analyst. The team agrees that this value needs to be adjustable, since different models might have different output distributions and, depending on other variables, we might want to increase or decrease the number of approved transactions.
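
The adjustable threshold discussed above could look something like the following sketch. The `AlertConfig` class, the `transactions_to_review` function, the default of 0.8, and the scores are all hypothetical names and values chosen for illustration; the only idea taken from the discussion is that the cutoff lives in configuration rather than in the model.

```python
from dataclasses import dataclass


@dataclass
class AlertConfig:
    """Runtime-adjustable settings; the default value is illustrative only."""
    threshold: float = 0.8


def transactions_to_review(scores, config):
    """Return IDs of transactions whose fraud score meets the threshold."""
    return [tx_id for tx_id, score in scores.items() if score >= config.threshold]


# Hypothetical model scores keyed by transaction ID.
scores = {"tx-1": 0.15, "tx-2": 0.92, "tx-3": 0.67, "tx-4": 0.81}

config = AlertConfig()
strict = transactions_to_review(scores, config)

config.threshold = 0.6  # lowered without retraining or redeploying the model
lenient = transactions_to_review(scores, config)

print(strict, lenient)
```

Keeping the threshold separate from the model means a new model with a different score distribution only requires a configuration change, which is the flexibility the team asked for.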

Day 4

By Thursday, the team has all the pieces and uses the standup to discuss how to integrate them. The team also outlines how best to monitor models in production, so that model performance can be evaluated and degradation can be detected before it causes any real damage. They agree that a simple dashboard for monitoring accuracy and error rates will suffice for now.
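
As a rough idea of what could feed such a dashboard, the sketch below computes accuracy together with false positive and false negative rates from a batch of alerts. The function name, the choice of these two error rates, and the sample labels are assumptions for illustration; the text only says "accuracy and error rates".

```python
def monitoring_metrics(predicted, actual):
    """Compute accuracy plus false positive and false negative rates."""
    pairs = list(zip(predicted, actual))
    if not pairs:
        raise ValueError("need at least one labelled transaction")
    tp = sum(p == 1 and a == 1 for p, a in pairs)
    tn = sum(p == 0 and a == 0 for p, a in pairs)
    fp = sum(p == 1 and a == 0 for p, a in pairs)
    fn = sum(p == 0 and a == 1 for p, a in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Share of legitimate transactions that were wrongly flagged.
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
        # Share of fraudulent transactions that slipped through.
        "false_negative_rate": fn / (fn + tp) if fn + tp else 0.0,
    }


# Hypothetical day of alerts (1 = flagged) against analyst verdicts (1 = fraud).
predicted = [1, 0, 0, 1, 0, 0, 1, 0]
actual = [1, 0, 0, 0, 0, 1, 1, 0]
metrics = monitoring_metrics(predicted, actual)
print(metrics)
```

For fraud, where positives are rare, accuracy alone can look healthy while most fraud is missed, so tracking the two error rates separately is usually the more informative part of the dashboard.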

Day 5

Friday is demo day. During standup, the team discusses the last remaining issues with the first iteration of transaction fraud detection. Team members prepare for the meeting with the fraud analysts who will be using this solution.

During the demo, the team shows the fraud analysts what they have built. The team presents the performance metrics and what they mean for the analysts' work. All feedback is converted into tasks for future sprints.

Another vital part of the sprint is the retrospective, a meeting where the team discusses three things:

1. What went well in the sprint;
2. What could be improved;
3. What we will commit to improving in the next sprint.

Further down the road

During the next sprint, the team works on the next most important item from the product backlog. It might be feedback from the fraud analysts, or it might be something else that the product owner thinks will improve the overall business the most. Meanwhile, the team closely monitors the performance of the initial version of the solution. It will continue to do so because ML solutions are sensitive to changes in the assumptions that the model makes about the data distribution.
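
One way to watch for the distribution changes mentioned above is to compare a feature's recent values with the values seen at training time. The sketch below uses the population stability index (PSI) for this; that choice is mine, not something the article prescribes, and the bin edges, sample amounts, and the 0.25 alert level (a commonly cited rule of thumb) are assumptions for illustration.

```python
import math


def population_stability_index(expected, observed, edges):
    """PSI between two samples, bucketed by the given bin edges."""
    def shares(values):
        counts = [0] * (len(edges) + 1)
        for value in values:
            # Bucket index = number of edges the value exceeds.
            counts[sum(value > edge for edge in edges)] += 1
        # A small floor keeps empty buckets from producing log(0).
        return [max(count / len(values), 1e-6) for count in counts]

    exp, obs = shares(expected), shares(observed)
    return sum((o - e) * math.log(o / e) for e, o in zip(exp, obs))


# Hypothetical transaction amounts at training time and in the latest sprint.
training = [10, 20, 30, 40, 50, 60, 70, 80]
recent = [60, 70, 80, 90, 100, 65, 85, 95]
edges = [25, 50, 75]

stable = population_stability_index(training, training, edges)
drifted = population_stability_index(training, recent, edges)
print(f"stable={stable:.3f} drifted={drifted:.3f} alert={drifted > 0.25}")
```

A check like this can run on a schedule and raise a backlog item when the index crosses the agreed level, which fits the sprint rhythm described in this post.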

Discussion

The above is a relatively “clean” exposition of the Scrum process for data science solutions. The real world is rarely that tidy, but I wanted to convey a few points:

  • Data science cannot stand on its own. If we are going to have an impact on the real world, we have to collaborate in a cross-functional team; data science should be part of a wider team.
  • Iteration is critical in data science, and we should expose the artifacts of those iterations to our stakeholders to receive feedback as fast as possible.
  • Scrum is a framework designed for iterative progress, which makes it a very good fit for data science work.
  • However, it is not a framework for every endeavor. If your job requires you to think deeply for days at a time, then Scrum and agile would probably be very disruptive and counterproductive. Also, if your work requires you to handle many different small data science tasks, following Scrum would be inappropriate, and Kanban might be worth considering instead. Typical product data science work is not like that, though. Iteration is king, and getting feedback fast is key to providing the right solutions to business problems.

In summary

Data science is a perfect fit for Scrum with a single modification: we do not expect to ship finished models. Instead, we ship artifacts of our work and solicit feedback from our stakeholders so that we can make progress faster. Project managers might dislike data science for the unpredictability of its progress, but iteration is not at fault; it is the only way forward.

I would like to know what you think about agile data science. What has worked for you and your team? What didn’t work? I hope you will leave a comment!

Translated from: https://towardsdatascience.com/agile-data-science-data-science-can-and-should-be-agile-c719a511b868
