Data Science Tools
Harvard Business Review called data scientist the "sexiest job of the 21st century". But what specifically does a data scientist do, and what tools do they use?
Data science as a profession can be described as people working and experimenting with data to answer relevant data-related questions, and building and deploying scalable models that support the data. They use many technical tools to explore data, build models, and report their observations.
Here is a list of some of them:
1. Git / GitHub
Git, a version control system, and GitHub are widely used in every domain that involves open source projects, collaboration, and code maintenance. Data scientists rely on them heavily to preserve their findings and code. GitHub has also been called your "digital resume", because recruiters analyze a person's skills on GitHub.
2. Programming Language Interfaces
Python editors such as Spyder and Sublime Text, Jupyter Notebooks (which also support Julia and R), RStudio, PyCharm, Notepad++, Google Colab, and several other IDEs and code-writing platforms are very widely used by data scientists.
3. Orange and IBM Watson
Orange, IBM Watson, and many other automated machine learning model-building frameworks are handy tools for data scientists and machine learning engineers who want to experiment with different models and create highly scalable, reusable machine learning architectures.
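To make "experimenting with different models" concrete, here is a minimal sketch in plain Python. It does not use the Orange or IBM Watson APIs; the toy data, the two candidate models, and all function names are assumptions invented for illustration. It only shows the kind of fit-score-compare loop that such frameworks automate.

```python
from statistics import mean

def fit_mean(xs, ys):
    """Baseline model: always predict the average of the training targets."""
    avg = mean(ys)
    return lambda x: avg

def fit_line(xs, ys):
    """Ordinary least-squares line through the training points."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: slope * x + intercept

def mse(model, xs, ys):
    """Mean squared error of a fitted model on held-out points."""
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))

# Toy data, made up for this example: y is roughly 2x + 1.
train_x, train_y = [0, 1, 2, 3, 4], [1.1, 2.9, 5.2, 7.1, 8.8]
test_x, test_y = [5, 6], [11.0, 13.1]

# Fit every candidate on the training split and score it on the test split.
candidates = {"mean baseline": fit_mean, "least-squares line": fit_line}
scores = {
    name: mse(fit(train_x, train_y), test_x, test_y)
    for name, fit in candidates.items()
}
best = min(scores, key=scores.get)
print(scores)
print("best model:", best)
```

On this toy data the fitted line scores far better than the baseline, which is the comparison a visual tool would surface as a chart or leaderboard.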
4. D3.js and Tableau
Analytics is an integral part of data science, and understanding data through visualization lets a data scientist answer most data-driven questions from observation alone. For this, D3.js and Tableau have proved to be excellent catalysts, particularly in the field of business analytics. Honorable mentions also go to Excel and Power BI.
5. Hadoop, Apache Mahout, Hive, and Pig
Since the arrival of big data, many frameworks have been developed to handle vast streams of data, analyze them, and build models on top of them. Hadoop is extremely popular for its distributed file system, HDFS (Hadoop Distributed File System), Apache Mahout for machine learning integration, and Hive and Pig for faster big data integration. These are extremely powerful tools, favored by data scientists.
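Hadoop jobs are commonly written in the MapReduce style, and the sketch below imitates that style in a single Python process. It calls no Hadoop, Hive, or Pig API; the input lines and the function names `map_phase`, `shuffle`, and `reduce_phase` are assumptions chosen for illustration. A real cluster would run the same phases in parallel over blocks stored in HDFS.

```python
from collections import defaultdict

def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)

# Made-up input; on a cluster each line could live on a different node.
lines = ["big data needs big tools", "data tools"]

pairs = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)
```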
6. NoSQL, MongoDB, Cassandra, MySQL
SQL, short for Structured Query Language, is an integral part of the data management work that makes up the first stage of a data science project. While MySQL has long been the choice of veterans, MongoDB has picked up serious pace and has proved to be a heavily used tool among data scientists.
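As a small illustration of the kind of SQL query a data scientist writes, the sketch below uses Python's built-in `sqlite3` module as a stand-in for a MySQL server, so it runs without any setup. The `measurements` table and its rows are assumptions invented for this example, and MySQL's dialect differs from SQLite's in some details.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (city TEXT, temp_c REAL)")
conn.executemany(
    "INSERT INTO measurements (city, temp_c) VALUES (?, ?)",
    [("Oslo", 4.0), ("Oslo", 6.0), ("Cairo", 30.0), ("Cairo", 34.0)],
)

# Aggregate per city, warmest first.
rows = conn.execute(
    "SELECT city, AVG(temp_c) AS avg_temp "
    "FROM measurements GROUP BY city ORDER BY avg_temp DESC"
).fetchall()
conn.close()

for city, avg_temp in rows:
    print(city, avg_temp)
```

A document store such as MongoDB would hold each measurement as a JSON-like document instead of a table row, and the grouping would be expressed through its own query interface.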
7. Packages/Modules of Programming Languages
Packages are a crucial part of writing simple, reusable, and efficient code in many programming languages. In Python, packages like pandas, NumPy, SciPy, matplotlib, Bokeh, seaborn, statsmodels, collections, scikit-learn, urllib, Beautiful Soup, and many more are very commonly used by data scientists. Similarly, in R, tidyr, ggplot2, and others are notable mentions.
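Two of the modules named above, `collections` and `urllib`, ship with Python, so a short sketch can show how such packages keep code compact. The list of URLs is hypothetical and exists only for this example.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs

# Hypothetical log of requested URLs, invented for this example.
urls = [
    "https://example.com/search?q=pandas&page=1",
    "https://example.com/search?q=numpy",
    "https://example.org/docs?q=pandas",
]

# urlparse splits each URL into components; parse_qs decodes the query string.
hosts = Counter(urlparse(u).netloc for u in urls)
queries = Counter(parse_qs(urlparse(u).query)["q"][0] for u in urls)

print(hosts.most_common(1))
print(queries.most_common(1))
```

Without `Counter`, the same tally would need a hand-written dictionary loop, which is the kind of boilerplate these packages remove.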
Translated from: https://www.includehelp.com/data-science/tools-for-data-science.aspx