Common Machine Learning Tools

Published: 2023/11/27

http://fuliang.iteye.com/blog/955023

Machine Learning

Support Vector Machine

  • SVMlight

An implementation of Vapnik's Support Vector Machine

  • LIBSVM

A Library for Support Vector Machines

Decision Tree

  • C4.5

The "classic" decision-tree tool, developed by J. R. Quinlan (Tutorial)

Maximum Entropy

  • YASMET

Yet Another Small MaxEnt Toolkit

Conditional Random Field

  • CRF++

A simple, customizable, and open source implementation of Conditional Random Fields (CRFs) for segmenting/labeling sequential data

Natural Language Processing

General

  • OpenNLP

An organizational center for open source projects related to natural language processing

  • CMU Statistical Language Modeling Toolkit

A suite of UNIX software tools to facilitate the construction and testing of statistical language models

  • The Dragon ToolKit

A Java-based development package for academic use in information retrieval (IR) and text mining. Includes many NLP tools.

  • LingPipe

A suite of Java libraries for the linguistic analysis of human language, including tools to:

    • track mentions of entities (e.g. people or proteins);
    • link entity mentions to database entries;
    • uncover relations between entities and actions;
    • classify text passages by language, character encoding, genre, topic, or sentiment;
    • correct spelling with respect to a text collection;
    • cluster documents by implicit topic and discover significant trends over time; and
    • provide part-of-speech tagging and phrase chunking.

  • Natural Language Toolkit

Open source Python modules, linguistic data and documentation for research and development in natural language processing and text analytics, with distributions for Windows, Mac OSX and Linux.

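  • Antelope

Advanced Natural Language Object-oriented Processing Environment. Includes a range of tools (notably a C# version of the Stanford Parser).

Word Segmentation

  • ICTCLAS

The Chinese word segmentation system from the Chinese Academy of Sciences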

  • Stanford Chinese Word Segmenter

A Java implementation of a CRF-based Chinese Word Segmenter

Part-of-Speech Tagging

  • Brill tagger

An error-driven, transformation-based tagger implemented by Eric Brill

  • Stanford POS Tagger

A Java implementation of the log-linear part-of-speech taggers described by Kristina Toutanova et al.

  • MBT: Memory-based Tagger
  • TreeTagger

A decision tree based tagger from the University of Stuttgart.

  • SVMTool, a POS tagger based on SVMs
  • QTAG Part of speech tagger

An HMM-based Java POS tagger from Birmingham U.

Named Entity Recognition

  • Stanford Named Entity Recognizer

A Java implementation of a Conditional Random Field sequence model, together with well-engineered features for Named Entity Recognition

  • LingPipe

Tools include statistical named-entity recognition, a heuristic sentence boundary detector, and a heuristic within-document coreference resolution engine. Java. GPL. By Bob Carpenter, Breck Baldwin and co.

  • YamCha

SVM-based NP-chunker, also usable for POS tagging, NER, etc. C/C++ open source. Won CoNLL 2000 shared task. (Less automatic than a specialized POS tagger for an end user.)

Stemming

  • Porter Stemming

A process for removing the commoner morphological and inflexional endings from words in English, by Martin Porter

  • Snowball

A small string processing language designed for creating stemming algorithms for use in Information Retrieval.
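
To make the idea of suffix stripping concrete, here is a minimal sketch of step 1a of the Porter algorithm only (the plural-ending rules SSES -> SS, IES -> I, SS -> SS, S -> nothing). The function name `porter_step_1a` is made up for this illustration; the full algorithm has several further steps that are not shown, and this is not the code shipped by either project listed above.

```python
def porter_step_1a(word: str) -> str:
    """Apply only step 1a of the Porter stemmer (plural endings).

    Rules are tried in order and the first matching suffix wins.
    """
    if word.endswith("sses"):
        return word[:-2]   # SSES -> SS, e.g. caresses -> caress
    if word.endswith("ies"):
        return word[:-2]   # IES -> I, e.g. ponies -> poni
    if word.endswith("ss"):
        return word        # SS -> SS, e.g. caress -> caress
    if word.endswith("s"):
        return word[:-1]   # S -> (nothing), e.g. cats -> cat
    return word


for w in ["caresses", "ponies", "caress", "cats"]:
    print(w, "->", porter_step_1a(w))
```

Note that a stem such as "poni" is not meant to be a dictionary word; it only needs to be the same for related forms.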

Syntactic Parsing

  • Stanford Parser

Java implementations of probabilistic natural language parsers, both highly optimized PCFG and dependency parsers, and a lexicalized PCFG parser.

  • Berkeley Parser

Text Mining

Summarization

  • Rouge (Configuring Rouge on Windows)

Other

Encryption

  • OpenSSL

Includes many cryptographic algorithms, such as RSA, DES, MD5, and SHA (Win32 installer)
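
As a small illustration of two of the digest algorithms named above, the sketch below uses Python's standard `hashlib` module rather than the OpenSSL C API or command-line tool. In many CPython builds `hashlib` is backed by OpenSSL, but that depends on the build and is an assumption here. The `message` value is made up for the example, and MD5 may be disabled on some hardened systems.

```python
import hashlib

# Sample input invented for this example.
message = b"machine learning tools"

# Hex digests: MD5 produces 128 bits (32 hex chars),
# SHA-256 produces 256 bits (64 hex chars).
md5_hex = hashlib.md5(message).hexdigest()
sha256_hex = hashlib.sha256(message).hexdigest()

print("MD5:    ", md5_hex)
print("SHA-256:", sha256_hex)
```

MD5 is shown only because the list mentions it; it is no longer considered safe for security purposes.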

Compression

  • zlib

A Massively Spiffy Yet Delicately Unobtrusive Compression Library
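
A quick way to try zlib without writing C is Python's standard `zlib` module, which wraps the library. The sketch below is a simple round trip; the sample `data` is invented for the example, and the compression level 9 is just one possible choice.

```python
import zlib

# Highly repetitive sample input, invented for this example.
data = b"machine learning tools " * 100

compressed = zlib.compress(data, 9)    # 9 = best compression, slowest
restored = zlib.decompress(compressed)

print("original bytes:  ", len(data))
print("compressed bytes:", len(compressed))
print("round trip ok:   ", restored == data)
```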

Logging

  • Apache Logging Services

Creates and maintains open-source software related to the logging of application behavior and released at no charge to the public, including

  • log4j for Java,
  • log4cxx for C++, and
  • log4net for MS .Net framework.

Note: the official release of log4cxx has a memory leak problem.

Unicode

  • ICU

A mature, widely used set of C/C++ and Java libraries providing Unicode and Globalization support for software applications

XML

  • Xerces

A validating XML parser, available in C++ and Java editions

Multi-String Matching

  • AC in C#: Aho-Corasick string matching in C#
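
For readers unfamiliar with Aho-Corasick, the following is a compact Python sketch of the technique (a trie plus failure links, scanned in one pass over the text). It is not the C# implementation linked above; the function names `build_automaton` and `find_all` are chosen for this example only.

```python
from collections import deque


def build_automaton(patterns):
    """Build the trie, failure links, and output lists."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # failure link of each state
    out = [[]]    # patterns recognized on reaching each state
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].append(p)
    # Breadth-first pass: depth-1 states keep failure link 0.
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            # Inherit matches that end at the failure state.
            out[t] = out[t] + out[fail[t]]
    return goto, fail, out


def find_all(text, patterns):
    """Return (start_index, pattern) for every occurrence in text."""
    goto, fail, out = build_automaton(patterns)
    s, hits = 0, []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        for p in out[s]:
            hits.append((i - len(p) + 1, p))
    return hits


print(find_all("ushers", ["he", "she", "his", "hers"]))
# -> [(1, 'she'), (2, 'he'), (2, 'hers')]
```

The text is scanned once regardless of how many patterns there are, which is the main reason to prefer this over running a separate search per pattern.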

HTML Parser

  • Html Agility Pack, an agile HTML parser that builds a read/write DOM and supports plain XPATH or XSLT. It is a .NET code library that allows you to parse "out of the web" HTML files.
  • Majestic-12, an open source high-performance .NET C# module that was created to parse HTML for links, indexing and other purposes. Fast, but does not build a DOM tree.

External Links

  • An annotated list of resources by the Stanford NLP Group
  • KDnuggets: lists software and other resources related to KDD

Reposted from: https://www.cnblogs.com/carl2380/archive/2012/08/24/2654681.html
