TensorFlow comprehensive example 3: classifying structured data (CSV, Keras, feature_column)
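Contents
- 1. Dataset
- 1.1 Create a dataframe from the CSV with Pandas
- 1.2 Split the dataframe into training, validation, and test sets
- 1.3 Create an input pipeline (Dataset) with tf.data
- 1.4 Understand the input pipeline
- 2. Feature columns (feature_column)
- 2.1 Numeric columns
- 2.2 Bucketized columns
- 2.3 Categorical columns
- 2.4 Embedding columns
- 2.5 Hashed feature columns
- 2.6 Crossed feature columns
- 2.7 Choose which columns to use
- 3. Build and run the model
- 3.1 Create a feature layer
- 3.2 Create, compile, and train the model
- 4. Full code
- 5. Another simple example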
Most of this article is based on: https://www.tensorflow.org/tutorials/structured_data/feature_columns?hl=zh-cn

This tutorial demonstrates how to classify structured data (for example, tabular data in a CSV file). We use Keras to define the model, and feature columns as a bridge that maps columns in the CSV to the features used to train the model. The tutorial contains complete code to:

- Load a CSV file with Pandas.
- Build an input pipeline with tf.data that batches and shuffles the rows.
- Map columns in the CSV to features used to train the model, using feature columns.
- Build, train, and evaluate a model with Keras.
1. Dataset

We will use a small dataset provided by the Cleveland Clinic Foundation for Heart Disease. The CSV contains a few hundred rows. Each row describes a patient, and each column describes an attribute. We will use this information to predict whether a patient has heart disease, which makes this a binary classification task.
```python
import numpy as np
import pandas as pd
import tensorflow as tf

from tensorflow import feature_column
from tensorflow.keras import layers
from sklearn.model_selection import train_test_split
```

1.1 Create a dataframe from the CSV with Pandas

Pandas is a Python library with many helpful utilities for loading and working with structured data. We will use Pandas to download the dataset from a URL and load it into a dataframe.
```python
URL = 'https://storage.googleapis.com/applied-dl/heart.csv'
dataframe = pd.read_csv(URL)
dataframe.head()
```

| age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 63 | 1 | 1 | 145 | 233 | 1 | 2 | 150 | 0 | 2.3 | 3 | 0 | fixed | 0 |
| 67 | 1 | 4 | 160 | 286 | 0 | 2 | 108 | 1 | 1.5 | 2 | 3 | normal | 1 |
| 67 | 1 | 4 | 120 | 229 | 0 | 2 | 129 | 1 | 2.6 | 2 | 2 | reversible | 0 |
| 37 | 1 | 3 | 130 | 250 | 0 | 0 | 187 | 0 | 3.5 | 3 | 0 | normal | 0 |
| 41 | 0 | 2 | 130 | 204 | 0 | 2 | 172 | 0 | 1.4 | 1 | 0 | normal | 0 |
1.2 Split the dataframe into training, validation, and test sets

The dataset we downloaded is a single CSV file. We will split it into training, validation, and test sets.
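The counts printed below come from how scikit-learn sizes a float `test_size`. `train_test_split` puts `ceil(test_size * n)` rows in the test part and the remainder in the training part. Here is a minimal pure-Python sketch of that arithmetic. It assumes the CSV has 303 rows, which is the sum of the three printed counts, and `split_sizes` is a hypothetical helper:

```python
import math


def split_sizes(n_rows, test_size):
    """Return (n_train, n_test) the way train_test_split sizes a float test_size."""
    n_test = math.ceil(test_size * n_rows)  # the test share is rounded up
    return n_rows - n_test, n_test


n_total = 303                                     # 193 + 49 + 61
n_train_val, n_test = split_sizes(n_total, 0.2)   # first split: (train + val) vs test
n_train, n_val = split_sizes(n_train_val, 0.2)    # second split: train vs val
print(n_train, n_val, n_test)                     # 193 49 61
```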
```python
# No random_state is set, so the split (and the sample batches below) differ from run to run.
train, test = train_test_split(dataframe, test_size=0.2)
train, val = train_test_split(train, test_size=0.2)
print(len(train), 'train examples')
print(len(val), 'validation examples')
print(len(test), 'test examples')
```

```
193 train examples
49 validation examples
61 test examples
```

1.3 Create an input pipeline (Dataset) with tf.data
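Next, we wrap the dataframes with tf.data. This lets us use feature columns as a bridge that maps columns in the Pandas dataframe to the features used to train the model. If we were working with a very large CSV file, too large to fit in memory, we would use tf.data to read it directly from disk. This tutorial does not cover that case.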
```python
# A utility method to create a tf.data dataset from a Pandas dataframe
def df_to_dataset(dataframe, shuffle=True, batch_size=32):
    dataframe = dataframe.copy()
    labels = dataframe.pop('target')  # the label; every other column is a feature
    ds = tf.data.Dataset.from_tensor_slices((dict(dataframe), labels))
    if shuffle:
        ds = ds.shuffle(buffer_size=len(dataframe))
    ds = ds.batch(batch_size)
    return ds


batch_size = 5  # a small batch size is used for demonstration purposes
train_ds = df_to_dataset(train, batch_size=batch_size)
val_ds = df_to_dataset(val, shuffle=False, batch_size=batch_size)
test_ds = df_to_dataset(test, shuffle=False, batch_size=batch_size)
```

1.4 Understand the input pipeline
Now that we have created the input pipeline, let's call it to see the format of the data it returns. We used a small batch size to keep the output readable.
```python
for feature_batch, label_batch in train_ds.take(1):
    print('Every feature:', list(feature_batch.keys()))
    print('A batch of ages:', feature_batch['age'])
    print('A batch of targets:', label_batch)
```

```
Every feature: ['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg', 'thalach', 'exang', 'oldpeak', 'slope', 'ca', 'thal']
A batch of ages: tf.Tensor([51 56 42 54 46], shape=(5,), dtype=int64)
A batch of targets: tf.Tensor([0 0 0 1 0], shape=(5,), dtype=int64)
```

The dataset returns a dictionary that maps column names (from the dataframe) to the values of those columns for the rows in the batch.
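A plain-Python analogue of `from_tensor_slices((dict(dataframe), labels))` followed by `.batch()` (without shuffling) makes this structure concrete. Each batch is a dict from column name to values, paired with a matching label batch. The sketch only illustrates that layout and does not reflect how `tf.data` works internally. `batch_columns` is a hypothetical helper. The first five rows come from the `head()` output in 1.1, and the last two are made up:

```python
def batch_columns(columns, labels, batch_size):
    """Yield (feature_dict, label_list) pairs, slicing every column the same way."""
    for start in range(0, len(labels), batch_size):
        stop = start + batch_size
        features = {name: values[start:stop] for name, values in columns.items()}
        yield features, labels[start:stop]


# Stand-in for dict(dataframe) after popping 'target' (two columns only).
columns = {
    "age": [63, 67, 67, 37, 41, 56, 62],
    "thal": ["fixed", "normal", "reversible", "normal", "normal", "normal", "normal"],
}
labels = [0, 1, 0, 0, 0, 0, 1]

for feature_batch, label_batch in batch_columns(columns, labels, batch_size=5):
    # As with .batch() by default, the last batch may be smaller.
    print(list(feature_batch.keys()), feature_batch["age"], label_batch)
```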
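2. Feature columns (feature_column)

TensorFlow provides many types of feature columns. In this section, we create several types of feature columns and demonstrate how each one transforms a column from the dataframe.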
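```python
# We will use this batch to demonstrate several types of feature columns.
# train_ds reshuffles on every pass, so this batch generally differs from the one printed in 1.4.
example_batch = next(iter(train_ds))[0]


# A utility method to create a feature column
# and use it to transform a batch of data
def demo(feature_column):
    feature_layer = layers.DenseFeatures(feature_column)
    print(feature_layer(example_batch).numpy())
```

2.1 Numeric columns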
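The output of a feature column becomes the input to the model. With the demo function defined above, we can see exactly how each column of the dataframe is transformed. A numeric column is the simplest type of column. It represents real-valued features. When you use this column, the model receives the column value from the dataframe unchanged.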
```python
age = feature_column.numeric_column("age")
demo(age)
```

```
WARNING:tensorflow:Layer dense_features is casting an input tensor from dtype float64 to the layer's dtype of float32, which is new behavior in TensorFlow 2. The layer has dtype float32 because its dtype defaults to floatx.

If you intended to run this layer in float32, you can safely ignore this warning. If in doubt, this warning is likely only an issue if you are porting a TensorFlow 1.X model to TensorFlow 2.

To change all layers to have dtype float64 by default, call `tf.keras.backend.set_floatx('float64')`. To change just this layer, pass dtype='float64' to the layer constructor. If you are the author of this layer, you can disable autocasting by passing autocast=False to the base Layer constructor.

[[62.]
 [52.]
 [40.]
 [59.]
 [56.]]
```

The warning is harmless here because the layer is meant to run in float32. In this heart disease dataset, most columns of the dataframe are numeric.
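With the default `shape=(1,)`, a numeric column passes each value to the model unchanged as a single float32 entry. That is why five ages come out as a `(5, 1)` matrix. The NumPy sketch below shows this. It also shows what a `normalizer_fn` (an optional argument of `numeric_column`) could compute. The z-score, and the use of this single batch for the mean and standard deviation, are illustrative assumptions:

```python
import numpy as np

ages = np.array([62, 52, 40, 59, 56])  # the batch printed above

# Default numeric column: values pass through unchanged, one float32 column
# per feature, so the batch becomes a (batch_size, 1) matrix.
passthrough = ages.astype(np.float32).reshape(-1, 1)
print(passthrough.shape)  # (5, 1)

# A z-score is one possible normalizer_fn. Mean and std come from this tiny
# batch only for illustration; in practice use training-set statistics.
mean, std = passthrough.mean(), passthrough.std()
normalized = (passthrough - mean) / std
print(normalized.round(2).ravel())
```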
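2.2 Bucketized columns

Often you don't want to feed a number directly into the model. Instead, you may want to split its values into categories based on numerical ranges. Consider raw data that represents a person's age. Instead of representing age as a numeric column, we can use a bucketized column to split it into several buckets. Notice that the one-hot values below show which age range each row falls into.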
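You can reproduce which bucket each age lands in with `np.digitize`. This sketch assumes left-inclusive buckets, `[b[i-1], b[i])`, which agrees with the TensorFlow output printed next. It reuses the five ages from the numeric-column demo:

```python
import numpy as np

boundaries = [18, 25, 30, 35, 40, 45, 50, 55, 60, 65]
ages = np.array([62, 52, 40, 59, 56])  # same batch as the numeric-column demo

# np.digitize returns i such that boundaries[i-1] <= age < boundaries[i],
# so the left edge of each bucket is inclusive (age 40 goes to [40, 45)).
bucket_ids = np.digitize(ages, boundaries)
one_hot = np.eye(len(boundaries) + 1)[bucket_ids]  # 10 boundaries -> 11 buckets

print(bucket_ids)     # [9 7 5 8 8]
print(one_hot.shape)  # (5, 11)
```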
```python
age_buckets = feature_column.bucketized_column(
    age, boundaries=[18, 25, 30, 35, 40, 45, 50, 55, 60, 65])
demo(age_buckets)
```

```
[[0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]]
```

The 10 boundaries define 11 buckets (below 18, [18, 25), ..., 65 and above), so each row has 11 entries.

2.3 Categorical columns
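In this dataset, thal is represented as a string ('fixed', 'normal', or 'reversible'). We cannot feed strings directly to a model, so we must first map them to numeric values. Categorical vocabulary columns provide a way to represent strings as one-hot vectors, much like the age buckets above. You can pass the vocabulary as a list with categorical_column_with_vocabulary_list, or load it from a file with categorical_column_with_vocabulary_file.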
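Conceptually, `categorical_column_with_vocabulary_list` followed by `indicator_column` maps each string to its position in the vocabulary, and then to a one-hot row. Below is a NumPy sketch of that idea. `one_hot_thal` is a hypothetical helper, and the input batch is chosen to match the demo output printed next. Giving unknown strings an all-zero row is this sketch's own choice. In TensorFlow, out-of-vocabulary handling is controlled by the `default_value` and `num_oov_buckets` arguments:

```python
import numpy as np

vocabulary = ["fixed", "normal", "reversible"]
index = {word: i for i, word in enumerate(vocabulary)}  # string -> integer id


def one_hot_thal(values):
    """One-hot encode strings by their position in the vocabulary list.

    In this sketch an unknown string simply gets an all-zero row.
    """
    out = np.zeros((len(values), len(vocabulary)), dtype=np.float32)
    for row, value in enumerate(values):
        if value in index:
            out[row, index[value]] = 1.0
    return out


print(one_hot_thal(["normal", "reversible", "reversible", "reversible", "normal"]))
```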
```python
thal = feature_column.categorical_column_with_vocabulary_list(
    'thal', ['fixed', 'normal', 'reversible'])

thal_one_hot = feature_column.indicator_column(thal)
demo(thal_one_hot)
```

```
[[0. 1. 0.]
 [0. 0. 1.]
 [0. 0. 1.]
 [0. 0. 1.]
 [0. 1. 0.]]
```

In a more complex dataset, many columns would be categorical (e.g. strings). Feature columns are most valuable when working with categorical data. Although this dataset has only one categorical column, we will use it to demonstrate several important types of feature columns that you could use with other datasets.
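2.4 Embedding columns

Suppose that instead of just a few possible strings, we had thousands (or more) of values per category. For a number of reasons, as the number of categories grows large, it becomes infeasible to train a neural network with one-hot encodings. An embedding column overcomes this limitation. Instead of a high-dimensional one-hot vector, it represents the data as a lower-dimensional dense vector in which each entry can be any number, not just 0 or 1. The size of the embedding (8 in the example below) is a parameter that must be tuned.

Key point: an embedding column works best when a categorical column has many possible values. We use one here for demonstration purposes, so that you have a complete example you can modify for other datasets in the future.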
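Under the hood, an embedding column is a lookup into a trainable matrix with one row per category and `dimension` columns. The NumPy sketch below uses a randomly initialised table. The initialiser and seed are assumptions, since TensorFlow learns these values during training. The sketch shows why examples with the same thal value receive identical vectors. It also shows that the lookup gives the same result as multiplying a one-hot matrix by the table:

```python
import numpy as np

vocab_size, dimension = 3, 8  # 3 thal strings, dimension=8 as in the demo
rng = np.random.default_rng(0)
# One row per category; random here, learned during training in a real model.
table = rng.normal(scale=0.1, size=(vocab_size, dimension)).astype(np.float32)

ids = np.array([1, 2, 2, 2, 1])  # normal, reversible, reversible, reversible, normal
embedded = table[ids]            # a lookup is plain row indexing: shape (5, 8)

# Equivalent to one_hot @ table, without building a vocab_size-wide one-hot matrix.
one_hot = np.eye(vocab_size, dtype=np.float32)[ids]
print(embedded.shape)                          # (5, 8)
print(np.allclose(one_hot @ table, embedded))  # True
```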
```python
# Notice that the input to the embedding column is the categorical column we created earlier
thal_embedding = feature_column.embedding_column(thal, dimension=8)
demo(thal_embedding)
```

```
[[-0.16302079 -0.19813393 -0.11037839 -0.2307198   0.30720705 -0.5701954
   0.0502194  -0.34920064]
 [ 0.4270712  -0.278063    0.23978122 -0.07503474  0.10773634 -0.06057737
  -0.6062939   0.19062711]
 [ 0.4270712  -0.278063    0.23978122 -0.07503474  0.10773634 -0.06057737
  -0.6062939   0.19062711]
 [ 0.4270712  -0.278063    0.23978122 -0.07503474  0.10773634 -0.06057737
  -0.6062939   0.19062711]
 [-0.16302079 -0.19813393 -0.11037839 -0.2307198   0.30720705 -0.5701954
   0.0502194  -0.34920064]]
```

The embedding values are randomly initialized, so they differ between runs. Rows with the same thal value always get the same vector.

2.5 Hashed feature columns
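Another way to represent a categorical column with a large number of values is categorical_column_with_hash_bucket. This feature column computes a hash of the input and then uses it to pick one of hash_bucket_size buckets to encode the string. With this column you do not need to provide a vocabulary. You can also make the number of hash buckets much smaller than the number of actual categories to save space.

Key point: an important downside of this technique is that collisions can occur, where different strings are mapped to the same bucket. In practice, hashed feature columns still work well for some datasets.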
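You can sketch the mechanics in pure Python: hash the string, then take the result modulo `hash_bucket_size`. This sketch uses `zlib.crc32` as a stand-in. TensorFlow uses its own fingerprint function, so real bucket ids differ. `hash_bucket` is a hypothetical helper. The last lines show the collision issue from the key point: with fewer buckets than distinct strings, at least two strings must share a bucket.

```python
import zlib


def hash_bucket(value, hash_bucket_size):
    """Map a string to a bucket id in [0, hash_bucket_size).

    zlib.crc32 stands in for TensorFlow's internal hash, so ids differ from TF's.
    """
    return zlib.crc32(value.encode("utf-8")) % hash_bucket_size


values = ["fixed", "normal", "reversible", "unseen-value"]
print({v: hash_bucket(v, 1000) for v in values})  # no vocabulary needed

# Pigeonhole principle: 4 distinct strings cannot occupy 4 different buckets out of 3.
small = {hash_bucket(v, 3) for v in values}
print(len(small) < len(values))  # True
```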
```python
thal_hashed = feature_column.categorical_column_with_hash_bucket(
    'thal', hash_bucket_size=1000)
demo(feature_column.indicator_column(thal_hashed))
```

```
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
```

Each row has 1000 entries, one per hash bucket. NumPy elides the middle of the row when printing, which hides the single 1.

2.6 Crossed feature columns
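Combining features into a single feature, known as a feature cross, lets a model learn separate weights for each combination of features. Here we create a new feature that crosses the bucketized age with thal. Note that crossed_column does not build the full table of all possible combinations, which could be very large. Instead, it is backed by a hashed column, so you can choose how large the table is.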
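A rough sketch of the idea works in three steps. Combine the age bucket and the thal string into one key per example, hash that key into `hash_bucket_size` buckets, and one-hot the result. The key format and `zlib.crc32` are illustrative assumptions, not what `crossed_column` does internally. `crossed_bucket` is a hypothetical helper:

```python
import zlib

import numpy as np

boundaries = [18, 25, 30, 35, 40, 45, 50, 55, 60, 65]
vocabulary = ["fixed", "normal", "reversible"]
HASH_BUCKET_SIZE = 1000


def crossed_bucket(age, thal, size=HASH_BUCKET_SIZE):
    """Hash the (age bucket, thal) pair into one of `size` buckets (illustrative)."""
    age_bucket = int(np.digitize(age, boundaries))
    key = f"{age_bucket}_X_{thal}"
    return zlib.crc32(key.encode("utf-8")) % size


# 11 age buckets x 3 thal values = 33 possible crosses, hashed into 1000 buckets.
print((len(boundaries) + 1) * len(vocabulary))  # 33

ages = [62, 52, 40, 59, 56]
thals = ["normal", "reversible", "reversible", "reversible", "normal"]
ids = [crossed_bucket(a, t) for a, t in zip(ages, thals)]
indicator = np.zeros((len(ids), HASH_BUCKET_SIZE), dtype=np.float32)
indicator[np.arange(len(ids)), ids] = 1.0  # one active bucket per row
print(indicator.shape)                     # (5, 1000)
```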
```python
crossed_feature = feature_column.crossed_column([age_buckets, thal], hash_bucket_size=1000)
demo(feature_column.indicator_column(crossed_feature))
```

```
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
```

2.7 Choose which columns to use
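We have seen how to use several types of feature columns. Now we will use them to train a model. The goal of this tutorial is to show the complete code (i.e., the mechanics) needed to work with feature columns, so we have picked a few columns to train our model somewhat arbitrarily.

Key point: if your aim is to build an accurate model, try a larger dataset of your own. Think carefully about which features are the most meaningful to include and how they should be represented.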
```python
feature_columns = []

# Numeric columns
for header in ['age', 'trestbps', 'chol', 'thalach', 'oldpeak', 'slope', 'ca']:
    feature_columns.append(feature_column.numeric_column(header))

# Bucketized column (reuses the `age` numeric column defined in 2.1)
age_buckets = feature_column.bucketized_column(
    age, boundaries=[18, 25, 30, 35, 40, 45, 50, 55, 60, 65])
feature_columns.append(age_buckets)

# Categorical (indicator) column
thal = feature_column.categorical_column_with_vocabulary_list(
    'thal', ['fixed', 'normal', 'reversible'])
thal_one_hot = feature_column.indicator_column(thal)
feature_columns.append(thal_one_hot)

# Embedding column
thal_embedding = feature_column.embedding_column(thal, dimension=8)
feature_columns.append(thal_embedding)

# Crossed column
crossed_feature = feature_column.crossed_column([age_buckets, thal], hash_bucket_size=1000)
crossed_feature = feature_column.indicator_column(crossed_feature)
feature_columns.append(crossed_feature)
```

3. Build and run the model

3.1 Create a feature layer
Now that we have defined our feature columns, we use a DenseFeatures layer to feed them into our Keras model.
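`DenseFeatures` concatenates the outputs of all the feature columns into one dense tensor. Summing the widths of the columns chosen in 2.7 gives the size of that tensor. This assumes numeric columns keep the default `shape=(1,)`. The total is how many inputs each example feeds into the first `Dense(128)` layer. Plain arithmetic:

```python
# Width contributed by each feature column selected in 2.7.
widths = {
    "numeric: age, trestbps, chol, thalach, oldpeak, slope, ca": 7 * 1,
    "age_buckets: 10 boundaries -> 11 buckets": 11,
    "thal one-hot: 3-word vocabulary": 3,
    "thal embedding: dimension=8": 8,
    "age_buckets x thal cross, indicator: hash_bucket_size=1000": 1000,
}
total = sum(widths.values())
print(total)  # 1029 inputs per example
```

The crossed indicator column accounts for almost all of that width.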
```python
feature_layer = tf.keras.layers.DenseFeatures(feature_columns)
```

Earlier we used a small batch size to demonstrate how feature columns work. Now we create a new input pipeline with a larger batch size.

```python
batch_size = 32
train_ds = df_to_dataset(train, batch_size=batch_size)
val_ds = df_to_dataset(val, shuffle=False, batch_size=batch_size)
test_ds = df_to_dataset(test, shuffle=False, batch_size=batch_size)
```

3.2 Create, compile, and train the model
```python
model = tf.keras.Sequential([
    feature_layer,
    layers.Dense(128, activation='relu'),
    layers.Dense(128, activation='relu'),
    layers.Dense(1, activation='sigmoid')  # probability of heart disease
])

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'],
              run_eagerly=True)

model.fit(train_ds,
          validation_data=val_ds,
          epochs=5)
```

```
7/7 [==============================] - 0s 42ms/step - loss: 0.5361 - accuracy: 0.7254 - val_loss: 0.7132 - val_accuracy: 0.5102
<tensorflow.python.keras.callbacks.History at 0x7ffbf4973410>
```

Key point: deep learning typically gives its best results on much larger and more complex datasets. With a small dataset like this one, we recommend a decision tree or a random forest as a strong baseline. The goal of this tutorial is not to train an accurate model. It demonstrates the mechanics of working with structured data, so that you have code to start from when you work with your own datasets in the future.
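Following the key point above, a random-forest baseline takes only a few lines with scikit-learn. This minimal sketch assumes the heart.csv layout: numeric columns, a string `thal` column, and a 0/1 `target` column. `random_forest_baseline` is a hypothetical helper, and the hyperparameters are untuned. The toy frame at the end is made up purely as a smoke test:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier


def random_forest_baseline(train, test, seed=0):
    """Fit a random forest on the raw columns and return test accuracy."""
    # Trees need numbers, so one-hot encode the string column with pandas.
    x_train = pd.get_dummies(train.drop(columns="target"), columns=["thal"], dtype=int)
    x_test = pd.get_dummies(test.drop(columns="target"), columns=["thal"], dtype=int)
    # Align columns in case a thal value appears in only one of the splits.
    x_test = x_test.reindex(columns=x_train.columns, fill_value=0)
    model = RandomForestClassifier(n_estimators=200, random_state=seed)
    model.fit(x_train, train["target"])
    return model.score(x_test, test["target"])


# Smoke test on a made-up toy frame with the same column layout.
toy = pd.DataFrame({
    "age": [30 + (i * 7) % 40 for i in range(60)],
    "thal": ["fixed", "normal", "reversible"] * 20,
    "target": [0, 0, 1] * 20,
})
toy_score = random_forest_baseline(toy.iloc[:45], toy.iloc[45:])
print(toy_score)
```

To compare with the Keras model, call `random_forest_baseline(train, test)` on the split from 1.2.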
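Next steps

The best way to learn more about classifying structured data is to try it yourself. Find another dataset to work with, and train a model to classify it with code similar to the above. To improve accuracy, think carefully about which features to include in your model and how they should be represented.

4. Full code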
```python
# -*- coding: utf-8 -*-
"""
AUTHOR: lujinhong
CREATED ON: 2020-08-28 11:53
PROJECT: lujinhong-commons-python3
DESCRIPTION: TODO
"""
import ssl

# Workaround for SSL certificate errors when downloading the CSV. This disables
# HTTPS certificate verification for the whole process, so use it with care.
ssl._create_default_https_context = ssl._create_unverified_context

import numpy as np
import pandas as pd
import tensorflow as tf

from tensorflow import feature_column
from tensorflow.keras import layers
from sklearn.model_selection import train_test_split

## 1. Dataset
URL = 'https://storage.googleapis.com/applied-dl/heart.csv'
dataframe = pd.read_csv(URL)
dataframe.head()  # only displays output in a notebook

train, test = train_test_split(dataframe, test_size=0.2)
train, val = train_test_split(train, test_size=0.2)


# A utility method to create a tf.data dataset from a Pandas dataframe
def df_to_dataset(dataframe, shuffle=True, batch_size=32):
    dataframe = dataframe.copy()
    labels = dataframe.pop('target')
    ds = tf.data.Dataset.from_tensor_slices((dict(dataframe), labels))
    if shuffle:
        ds = ds.shuffle(buffer_size=len(dataframe))
    ds = ds.batch(batch_size)
    return ds


batch_size = 32
train_ds = df_to_dataset(train, batch_size=batch_size)
val_ds = df_to_dataset(val, shuffle=False, batch_size=batch_size)
test_ds = df_to_dataset(test, shuffle=False, batch_size=batch_size)

## 2. Build the feature columns
feature_columns = []

# Numeric columns
for header in ['age', 'trestbps', 'chol', 'thalach', 'oldpeak', 'slope', 'ca']:
    feature_columns.append(feature_column.numeric_column(header))

# Bucketized column
age = feature_column.numeric_column("age")
age_buckets = feature_column.bucketized_column(
    age, boundaries=[18, 25, 30, 35, 40, 45, 50, 55, 60, 65])
feature_columns.append(age_buckets)

# Categorical (indicator) column
thal = feature_column.categorical_column_with_vocabulary_list(
    'thal', ['fixed', 'normal', 'reversible'])
thal_one_hot = feature_column.indicator_column(thal)
feature_columns.append(thal_one_hot)

# Embedding column
thal_embedding = feature_column.embedding_column(thal, dimension=8)
feature_columns.append(thal_embedding)

# Crossed column
crossed_feature = feature_column.crossed_column([age_buckets, thal], hash_bucket_size=1000)
crossed_feature = feature_column.indicator_column(crossed_feature)
feature_columns.append(crossed_feature)

## 3. Build and run the model
feature_layer = tf.keras.layers.DenseFeatures(feature_columns)

model = tf.keras.Sequential([
    feature_layer,
    layers.Dense(128, activation='relu'),
    layers.Dense(128, activation='relu'),
    layers.Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'],
              run_eagerly=True)

model.fit(train_ds,
          validation_data=val_ds,
          epochs=5)
```

```
Epoch 1/5
Consider rewriting this model with the Functional API.
7/7 [==============================] - 0s 42ms/step - loss: 0.5361 - accuracy: 0.7254 - val_loss: 0.7132 - val_accuracy: 0.5102
<tensorflow.python.keras.callbacks.History at 0x7ffbf4973410>
```

5. Another simple example
```python
# -*- coding: utf-8 -*-
"""
AUTHOR: lujinhong
CREATED ON: 2020-08-28 10:26
PROJECT: lujinhong-commons-python3
DESCRIPTION: TODO
"""
import tensorflow as tf
import pandas as pd

print(tf.__version__)

import ssl

# Disables HTTPS certificate verification (workaround for download errors); use with care.
ssl._create_default_https_context = ssl._create_unverified_context

## 1. Prepare the dataset
df_train = pd.read_csv('https://storage.googleapis.com/tf-datasets/titanic/train.csv')
df_eval = pd.read_csv('https://storage.googleapis.com/tf-datasets/titanic/eval.csv')
y_train = df_train.pop('survived')
y_eval = df_eval.pop('survived')

ds_train = tf.data.Dataset.from_tensor_slices((dict(df_train), y_train)).batch(2)
# Batched the same way as ds_train so it can be used with model.evaluate()
ds_eval = tf.data.Dataset.from_tensor_slices((dict(df_eval), y_eval)).batch(2)

## 2. Build the feature columns
CATEGORICAL_COLUMNS = ['sex', 'n_siblings_spouses', 'parch', 'class', 'deck',
                       'embark_town', 'alone']
NUMERIC_COLUMNS = ['age', 'fare']

feature_columns = []
# One-hot encode the categorical features; embedding_column could be used to embed them instead.
for feature_name in CATEGORICAL_COLUMNS:
    vocabulary = df_train[feature_name].unique()
    feature_columns.append(tf.feature_column.indicator_column(
        tf.feature_column.categorical_column_with_vocabulary_list(feature_name, vocabulary)))

for feature_name in NUMERIC_COLUMNS:
    feature_columns.append(tf.feature_column.numeric_column(feature_name, dtype=tf.float32))
# Besides the features above, crossed features could also be added.

## 3. Build and run the model
feature_layer = tf.keras.layers.DenseFeatures(feature_columns)

model = tf.keras.Sequential([
    feature_layer,
    # tf.keras.layers.Dense(128, activation='relu'),
    # tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
model.compile(loss='binary_crossentropy', optimizer='sgd', metrics=['accuracy'])

model.fit(ds_train)
```

```
2.3.0
WARNING:tensorflow:Layers in a Sequential model should only have a single input tensor, but we receive a <class 'dict'> input: {'sex': <tf.Tensor 'ExpandDims_8:0' shape=(None, 1) dtype=string>, 'age': <tf.Tensor 'ExpandDims:0' shape=(None, 1) dtype=float64>, 'n_siblings_spouses': <tf.Tensor 'ExpandDims_6:0' shape=(None, 1) dtype=int64>, 'parch': <tf.Tensor 'ExpandDims_7:0' shape=(None, 1) dtype=int64>, 'fare': <tf.Tensor 'ExpandDims_5:0' shape=(None, 1) dtype=float64>, 'class': <tf.Tensor 'ExpandDims_2:0' shape=(None, 1) dtype=string>, 'deck': <tf.Tensor 'ExpandDims_3:0' shape=(None, 1) dtype=string>, 'embark_town': <tf.Tensor 'ExpandDims_4:0' shape=(None, 1) dtype=string>, 'alone': <tf.Tensor 'ExpandDims_1:0' shape=(None, 1) dtype=string>}
Consider rewriting this model with the Functional API.
314/314 [==============================] - 0s 1ms/step - loss: 6.1425 - accuracy: 0.5965
<tensorflow.python.keras.callbacks.History at 0x7ffbf4d03390>
```