

OnLineML (1): An Introduction to Jubatus...

Published: 2023/12/31

1. Introduction. Source links: jubat.us/en/ and xuwenq.iteye.com/blog/1702746

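Jubatus (http://jubat.us/en/overview.html) is an open-source framework for distributed online machine learning on big data streams. It is somewhat similar to Storm, but judging from its documentation it offers more functionality.
Jubatus holds that future data analytics platforms should expand in three directions at once: handling bigger data, performing deeper analysis, and processing in real time. At present there is no horizontally scalable distributed architecture that can handle continuously generated streams of big data. Hadoop's MapReduce can process big data, but it is a poor fit for sophisticated machine learning algorithms; Apache Mahout is a Hadoop-based machine learning platform, but it is not suited to online processing of data streams.
Jubatus combines the strengths of online machine learning, distributed computing, and randomized algorithms, and supports basic tasks such as classification, regression, and recommendation. In line with its design goals, Jubatus has the following characteristics: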

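  • Scalable: supports scalable machine learning, processing up to 100,000 records per second on a cluster of commodity hardware
  • Real-time: analyzes data and updates models in real time
  • Deep analysis: supports a range of analyses, including classification, regression, statistics, and recommendation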
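
To make the "real-time" point concrete, here is a minimal Python sketch of a model that is updated one record at a time instead of being recomputed in batch. It uses Welford's algorithm for a running mean and variance. This is only an illustration of the online-update idea; the class name `RunningStats` and the sample readings are assumptions for this example and are not part of Jubatus.

```python
class RunningStats:
    """Streaming mean/variance via Welford's algorithm (illustrative, not Jubatus code)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value):
        """Fold one new record into the model in O(1) time and memory."""
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    def variance(self):
        """Population variance of all records seen so far."""
        return self._m2 / self.count if self.count else 0.0


stats = RunningStats()
for reading in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(reading)  # the model can be queried after every record

print(stats.count, stats.mean, stats.variance())
```

No record is stored after it has been processed, which is what lets this style of analysis keep up with an unbounded stream.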
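Jubatus is still a young project. The latest release at the time of writing is 0.3.2 (C++), and I have not yet seen any examples of commercial use. If you need machine learning on streaming data, it is worth keeping an eye on.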


2. Another source: blog.csdn.net/jixuan1989/article/details/7880978

Abstract: In the coming era of extremely large databases, computer science will face new challenges in real Big Data applications such as nation-wide M2M sensor network analysis, online advertising optimization for millions of consumers, and real-time security monitoring on the raw Internet traffic. In such applications, it is impractical or useless to apply ordinary approaches for data analysis on small datasets by storing all data into databases, analyzing the data on the databases as a batch-processing, and only visualizing the summarized output. In fact, the future of data analytics platform should expand to three directions at the same time, handling even bigger data, applying deep analytics, and processing in real-time. However, there has been no such analytics platform for massive data streams of continuously generated Big Data with a distributed scale-out architecture. For example, Hadoop is not equipped with sophisticated machine learning algorithms since most of the algorithms do not fit its MapReduce paradigm. Though Apache Mahout is also a Hadoop-based machine learning platform, online processing of data streams is still out of the scope.
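In other words: in the coming era of extremely large databases, computer science faces new challenges in real-time big data applications, such as nationwide M2M sensor network analysis, online advertising optimization for millions of users, and real-time security monitoring of Internet traffic. The traditional small-dataset approach (store all the data in a database, analyze it there as a batch job, and visualize only the summarized output) is impractical for these applications. Future analytics platforms should expand in three directions at once: bigger data, deeper analysis, and real-time processing. Yet no analytics platform with a distributed scale-out architecture exists for continuously generated big data streams. Hadoop cannot run sophisticated machine learning algorithms because most of them do not fit its MapReduce framework, and although Apache Mahout is also a Hadoop-based machine learning platform, online processing of data streams remains beyond its scope.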
Jubatus is the first open source platform for online distributed machine learning on the data streams of Big Data. We use a loose model sharing architecture for efficient training and sharing of machine learning models, by defining three fundamental operations; Update, Mix, and Analyze, in a similar way with the Map and Reduce operations in Hadoop. The point is how to reduce the size of model and the number of the Mix operations while keeping high accuracy, since Mix-ing large models for many times causes high networking cost and high latency in the distributed environment. Then our development team includes competent researchers who combine the latest advances in online machine learning, distributed computing, and randomized algorithms to provide efficient machine learning features for Jubatus. Currently, Jubatus supports basic tasks including classification, regression, and recommendation. A demo system for tweet categorization on fast Twitter data streams is available.

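Put another way: Jubatus is the first open-source platform for distributed online machine learning on big data streams. It uses a loose model-sharing architecture to train and share machine learning models efficiently, built on three basic operations, Update, Mix, and Analyze, which play a role similar to Map and Reduce in Hadoop. The key question is how to reduce both the model size and the number of Mix operations while keeping accuracy high, because mixing large models many times incurs high network cost and high latency in a distributed environment. The development team includes researchers who combine the latest advances in online machine learning, distributed computing, and randomized algorithms to give Jubatus efficient machine learning features. Jubatus currently supports basic tasks including classification, regression, and recommendation, and a demo system that categorizes tweets from fast Twitter data streams is available.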
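
The three operations can be sketched in a few lines of Python. The abstract names Update, Mix, and Analyze but does not say how Mix combines models, so this sketch assumes plain parameter averaging over a simple perceptron. The names `LocalModel` and `mix` and the toy feature dictionaries are hypothetical and are not Jubatus's API.

```python
class LocalModel:
    """One worker's linear classifier (perceptron). Hypothetical, not Jubatus's API."""

    def __init__(self):
        self.weights = {}  # feature name -> weight

    def score(self, features):
        return sum(self.weights.get(k, 0.0) * v for k, v in features.items())

    def update(self, features, label):
        """Update: learn from one labeled example (+1 or -1) using local state only."""
        if label * self.score(features) <= 0:  # wrong or undecided: adjust weights
            for k, v in features.items():
                self.weights[k] = self.weights.get(k, 0.0) + label * v

    def analyze(self, features):
        """Analyze: predict +1 or -1 from the local model, with no network traffic."""
        return 1 if self.score(features) >= 0 else -1


def mix(models):
    """Mix (assumed here to be averaging): merge all workers' weights and share them."""
    keys = set()
    for m in models:
        keys.update(m.weights)
    averaged = {k: sum(m.weights.get(k, 0.0) for m in models) / len(models)
                for k in keys}
    for m in models:
        m.weights = dict(averaged)


worker_a, worker_b = LocalModel(), LocalModel()
worker_a.update({"goal": 1.0, "match": 1.0}, +1)    # example seen only by worker A
worker_b.update({"stock": 1.0, "market": 1.0}, -1)  # example seen only by worker B
mix([worker_a, worker_b])

# After mixing, each worker can use what the other one learned.
print(worker_a.analyze({"stock": 1.0}), worker_b.analyze({"goal": 1.0}))
```

Only `mix` moves data between workers, and what it moves is the whole model. That is why the abstract stresses keeping both the model size and the number of Mix operations small.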

3. Project home page: jubat.us/en/

Jubatus is a distributed processing framework and streaming machine learning library. Jubatus includes these functionalities:

  • Online Machine Learning Library: Classification, Regression, Recommendation (Nearest Neighbor Search), Graph Mining, Anomaly Detection, Clustering
  • Feature Vector Converter (fv_converter): Data Preprocess and Feature Extraction
  • Framework for Distributed Online Machine Learning with Fault Tolerance
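
As a rough illustration of what a feature vector converter does, the Python sketch below turns a record of raw key/value pairs into a sparse numeric vector, hashing feature names into a fixed number of buckets. It is not the actual fv_converter: the function name `to_feature_vector`, the `key$token` naming scheme, the whitespace tokenizer, and the use of CRC32 are all assumptions made for this example.

```python
import zlib


def to_feature_vector(datum, num_buckets=1024):
    """Convert a dict of raw values into a sparse {bucket_index: weight} vector.

    Illustrative only; the real fv_converter is driven by server configuration.
    """
    vector = {}
    for key, value in datum.items():
        if isinstance(value, str):
            # String value: one feature per whitespace-separated token.
            features = [(f"{key}${token}", 1.0) for token in value.split()]
        else:
            # Numeric value: use the number itself as the feature weight.
            features = [(key, float(value))]
        for name, weight in features:
            # Hash the feature name so the vector has a bounded dimension.
            index = zlib.crc32(name.encode("utf-8")) % num_buckets
            vector[index] = vector.get(index, 0.0) + weight
    return vector


vec = to_feature_vector({"text": "fast data streams", "length": 17})
print(vec)
```

Hashing bounds the model size no matter how many distinct tokens appear in the stream. The cost is that two features can occasionally collide in the same bucket.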

Table of Contents

  • Quick Start
    • Install Jubatus
      • Red Hat Enterprise Linux 6.2 or later (64-bit)
      • Ubuntu Server 12.04 LTS (64-bit)
      • Other Linux Distributions (including 32-bit)
      • Mac OS X
    • Install Jubatus Client Libraries
      • C++
      • Python
      • Ruby
      • Java
    • Try Tutorial
    • Write Your Application
  • Overview
    • Scalable
    • Real-Time
    • Deep-Analysis
    • Difference from Hadoop and Mahout
  • Tutorial
    • Scenario
    • Run Tutorial
    • Tutorial in Detail
      • Dataset
      • Server Configuration
      • Use of Classifier API: Train & Classify
    • Other Tutorials
      • Classifier
      • Regression
      • Graph
      • Stat
  • Setup in Distributed Mode
    • Distributed Mode
      • Setup ZooKeeper
      • Register configuration file to ZooKeeper
      • Jubatus Proxy
      • Join Jubatus Servers to Cluster
      • Run Tutorial
    • Cluster Management in Jubatus
      • ZooKeepers & Jubatus Proxies
      • Jubavisor: Process Management Agent
  • Documentation
    • Architecture
    • Data Conversion
      • Datum
      • Flow of Data Conversion
      • Filter
      • Feature Extraction from Strings
      • Feature Extraction from Numbers
      • Feature Extraction from Binary Data
      • Hashing Key of Feature Vector
      • Plugins
    • Plugin Development
      • Plugin for Data Conversion
    • Cluster Administration Guide
      • Recommended Process Configuration
      • Managing Clusters
      • Monitoring
      • Logging
      • Save and Load
    • Building Jubatus from Source
      • Requirements
    • Using Framework
      • Using Code Generators
      • How to Get Clients
    • RPC Error Handling
      • Common Issues
      • Recommendation for each client languages
    • Backup and Recovery
      • Save and Load
    • Frequently Asked Questions (FAQs)
      • Installation
      • RPC Errors
      • Distributed Environment
      • Learning Model
      • Anomaly detection
      • Miscellaneous
  • References
    • Commands
      • Jubatus Servers
      • Distributed Environment
      • Utilities
    • Client API
      • Common Data Structures and Methods
      • Classifier
      • Regression
      • Recommender
      • Nearest Neighbor
      • Anomaly
      • Clustering
      • Stat
      • Graph
  • How to Contribute
    • We Welcome Your Contribution
    • Join the Community
    • Issue Opening Policy
    • Pull-Request Policy
    • Tips for Contributors
  • Miscellaneous
    • Publications
      • 2013
      • 2012
      • 2011
    • Contributions (Thanks a lot!)
  • About Us
    • Jubatus Team Members
Translation pending...
