Python sklearn: An Introduction to the train_test_split Function in sklearn and a Detailed Guide to Using It

Published: 2025/3/21 python 48 豆豆



Introduction to the train_test_split function in sklearn

How to use train_test_split

1. Basic usage

Introduction to the train_test_split function in sklearn

Official documentation: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html?highlight=train_test_split#sklearn.model_selection.train_test_split

sklearn.model_selection.train_test_split(*arrays, **options)
Split arrays or matrices into random train and test subsets.
A quick utility that wraps input validation, next(ShuffleSplit().split(X, y)), and the application to input data into a single call, so that data can be split (and optionally subsampled) in a one-liner.


Parameters
*arrays：sequence of indexables with same length / shape[0]
Allowed inputs are lists, numpy arrays, scipy-sparse matrices or pandas dataframes.

test_size：float or int, default=None
If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples. If None, the value is set to the complement of the train size. If train_size is also None, it will be set to 0.25.
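The three ways of specifying test_size described above can be compared side by side. The following is a minimal sketch, assuming a made-up 10-sample array; the variable names are illustrative only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data, assumed for illustration: 10 samples with 2 features each.
X = np.arange(20).reshape((10, 2))

# Float: fraction of the samples placed in the test split.
train_a, test_a = train_test_split(X, test_size=0.2, random_state=0)

# Int: absolute number of test samples.
train_b, test_b = train_test_split(X, test_size=3, random_state=0)

# Neither test_size nor train_size given: the test fraction falls back to 0.25.
train_c, test_c = train_test_split(X, random_state=0)

print(len(train_a), len(test_a))  # 8 2
print(len(train_b), len(test_b))  # 7 3
print(len(train_c), len(test_c))  # 7 3
```

With 10 samples, the default fraction of 0.25 corresponds to 2.5 samples, and the test count is rounded up to 3.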

train_size：float or int, default=None
If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the train split. If int, represents the absolute number of train samples. If None, the value is automatically set to the complement of the test size.
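As a rough illustration of train_size, the sketch below first sets only train_size, so the test split receives the complement, and then sets both sizes so that they cover only part of the data (the "optionally subsampling" case mentioned in the introduction). The data is an assumed toy array.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data, assumed for illustration: 10 samples with 2 features each.
X = np.arange(20).reshape((10, 2))

# Only train_size given: the remaining 4 samples form the test split.
X_train, X_test = train_test_split(X, train_size=6, random_state=0)
print(X_train.shape, X_test.shape)  # (6, 2) (4, 2)

# Both sizes given and summing to less than the dataset:
# 5 training and 2 test samples are drawn, the other 3 are left out.
X_train_sub, X_test_sub = train_test_split(
    X, train_size=0.5, test_size=0.2, random_state=0)
print(X_train_sub.shape, X_test_sub.shape)  # (5, 2) (2, 2)
```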

random_state：int or RandomState instance, default=None
Controls the shuffling applied to the data before applying the split. Pass an int for reproducible output across multiple function calls. See Glossary.
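The reproducibility that an integer random_state provides can be checked directly. This is a small sketch with assumed toy data: two calls with the same seed should return identical splits.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data, assumed for illustration.
X = np.arange(20).reshape((10, 2))

# Two independent calls with the same integer seed.
first = train_test_split(X, test_size=3, random_state=42)
second = train_test_split(X, test_size=3, random_state=42)

# Compare the train parts and the test parts pairwise.
same = all(np.array_equal(a, b) for a, b in zip(first, second))
print(same)  # True
```

Leaving random_state at None draws the shuffle from NumPy's global random state instead, so repeated calls will generally produce different splits.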

shuffle：bool, default=True
Whether or not to shuffle the data before splitting. If shuffle=False then stratify must be None.
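To make the shuffle rule above concrete, the sketch below splits an assumed toy list without shuffling, which keeps the original order, and then shows that combining shuffle=False with stratify is rejected.

```python
from sklearn.model_selection import train_test_split

# Toy labels, assumed for illustration.
y = list(range(10))

# Without shuffling, the first samples go to train and the last to test.
y_train, y_test = train_test_split(y, test_size=3, shuffle=False)
print(y_train)  # [0, 1, 2, 3, 4, 5, 6]
print(y_test)   # [7, 8, 9]

# shuffle=False together with stratify raises a ValueError.
raised = False
try:
    train_test_split(y, test_size=3, shuffle=False, stratify=y)
except ValueError as err:
    raised = True
    print("ValueError:", err)
```

An unshuffled split like this can be useful for ordered data such as time series, where the test samples should come after the training samples.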

stratify：array-like, default=None
If not None, data is split in a stratified fashion, using this as the class labels.
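The effect of stratify is easiest to see on imbalanced labels. The following sketch assumes a made-up dataset with 15 samples of class 0 and 5 of class 1, and counts the classes on each side of the split.

```python
from collections import Counter

import numpy as np
from sklearn.model_selection import train_test_split

# Toy data, assumed for illustration: a 3:1 class imbalance.
X = np.arange(40).reshape((20, 2))
y = np.array([0] * 15 + [1] * 5)

# Passing the labels as stratify keeps the 3:1 ratio in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=4, stratify=y, random_state=0)

train_counts = Counter(y_train.tolist())
test_counts = Counter(y_test.tolist())
print(train_counts[0], train_counts[1])  # 12 4
print(test_counts[0], test_counts[1])    # 3 1
```

Without stratify, a small test split could by chance contain no samples of the minority class at all.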


Returns
splitting：list, length=2 * len(arrays)
List containing train-test split of inputs.

New in version 0.16: If the input is sparse, the output will be a scipy.sparse.csr_matrix. Else, output type is the same as the input type.
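The return value described above can be inspected with a short sketch. It passes three arrays at once, so six pieces come back, and then splits a sparse matrix. The array named weights is a hypothetical third input added only for illustration.

```python
import numpy as np
from scipy import sparse
from sklearn.model_selection import train_test_split

# Toy inputs, assumed for illustration; all share the same length (10).
X = np.arange(20).reshape((10, 2))
y = list(range(10))
weights = np.ones(10)  # hypothetical per-sample weights

# Three inputs give a list of 2 * 3 = 6 outputs, in train/test pairs.
parts = train_test_split(X, y, weights, test_size=3, random_state=0)
print(len(parts))  # 6
X_train, X_test, y_train, y_test, w_train, w_test = parts

# Dense inputs keep their type: ndarray stays ndarray, list stays list.
print(type(X_train).__name__, type(y_train).__name__)  # ndarray list

# Sparse input yields sparse output.
X_sparse = sparse.csr_matrix(X)
Xs_train, Xs_test = train_test_split(X_sparse, test_size=3, random_state=0)
print(sparse.issparse(Xs_train), Xs_train.shape)  # True (7, 2)
```

The same row indices are applied to every input, so X_train, y_train and w_train stay aligned sample by sample.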


How to use train_test_split

1. Basic usage

>>> import numpy as np
>>> from sklearn.model_selection import train_test_split
>>> X, y = np.arange(10).reshape((5, 2)), range(5)
>>> X
array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])
>>> list(y)
[0, 1, 2, 3, 4]
>>>
>>> X_train, X_test, y_train, y_test = train_test_split(
...     X, y, test_size=0.33, random_state=42)
...
>>> X_train
array([[4, 5],
       [0, 1],
       [6, 7]])
>>> y_train
[2, 0, 3]
>>> X_test
array([[2, 3],
       [8, 9]])
>>> y_test
[1, 4]
>>>
>>> train_test_split(y, shuffle=False)
[[0, 1, 2], [3, 4]]
