
TensorFlow Comprehensive Example 4: Logistic Regression with an Estimator


Contents

    • 1. Load the CSV dataset and build a Dataset
      • 1.1 Read the CSV data into a DataFrame with pandas
      • 1.2 Build a Dataset from the DataFrame
    • 2. Wrap the data in feature columns
    • 3. Build and train the model
    • 4. Build crossed features
    • 5. Make predictions

This section implements logistic regression with the Estimator API.

The Titanic dataset in CSV format is used as the training and evaluation data.

import tensorflow as tf
import pandas as pd
from IPython.display import clear_output

print(tf.__version__)
print(pd.__version__)

2.3.0
1.0.1

1. Load the CSV dataset and build a Dataset

1.1 Read the CSV data into a DataFrame with pandas

dftrain = pd.read_csv('https://storage.googleapis.com/tf-datasets/titanic/train.csv')
dfeval = pd.read_csv('https://storage.googleapis.com/tf-datasets/titanic/eval.csv')
y_train = dftrain.pop('survived')
y_eval = dfeval.pop('survived')

Let's take a look at the data:

dftrain.head()

      sex   age  n_siblings_spouses  parch     fare  class     deck  embark_town alone
0    male  22.0                   1      0   7.2500  Third  unknown  Southampton     n
1  female  38.0                   1      0  71.2833  First        C    Cherbourg     n
2  female  26.0                   0      0   7.9250  Third  unknown  Southampton     y
3  female  35.0                   1      0  53.1000  First        C  Southampton     n
4    male  28.0                   0      0   8.4583  Third  unknown   Queenstown     y
dftrain.describe()

              age  n_siblings_spouses       parch        fare
count  627.000000          627.000000  627.000000  627.000000
mean    29.631308            0.545455    0.379585   34.385399
std     12.511818            1.151090    0.792999   54.597730
min      0.750000            0.000000    0.000000    0.000000
25%     23.000000            0.000000    0.000000    7.895800
50%     28.000000            0.000000    0.000000   15.045800
75%     35.000000            1.000000    0.000000   31.387500
max     80.000000            8.000000    5.000000  512.329200

The training and evaluation sets contain the following numbers of examples:

dftrain.shape, dfeval.shape

((627, 9), (264, 9))

Let's look at the age distribution:

# Split the ages into 20 bins, count the number in each bin, and plot a histogram.
dftrain.age.hist(bins=20)
# dftrain.age is equivalent to dftrain['age']; pandas exposes each column as an attribute with the same name.

<matplotlib.axes._subplots.AxesSubplot at 0x7f83dd224c50>

Now the sex distribution:

dftrain.sex.value_counts()

male      410
female    217
Name: sex, dtype: int64

# Show the same counts as a bar chart.
dftrain.sex.value_counts().plot(kind='barh')

<matplotlib.axes._subplots.AxesSubplot at 0x7f83e18777d0>

# And the distribution of passenger class.
dftrain['class'].value_counts().plot(kind='barh')

<matplotlib.axes._subplots.AxesSubplot at 0x7f83e072a410>

Let's look at the survival rate by sex:

pd.concat([dftrain, y_train], axis=1).groupby('sex').survived.mean()

sex
female    0.778802
male      0.180488
Name: survived, dtype: float64

pd.concat([dftrain, y_train], axis=1).groupby('sex').survived.mean().plot(kind='barh').set_xlabel('% survived')

Text(0.5, 0, '% survived')

1.2 Build a Dataset from the DataFrame

Use the training data above to build a tf.data.Dataset:

# This line only demonstrates how to convert a DataFrame into a Dataset; the variable is not used below.
dataset = tf.data.Dataset.from_tensor_slices((dict(dftrain), y_train))
for ds in dataset:
    print(ds)

def make_input_fn(data_df, label_df, num_epochs=10, shuffle=True, batch_size=32):
    def input_function():
        # Build a Dataset from the DataFrame.
        ds = tf.data.Dataset.from_tensor_slices((dict(data_df), label_df))
        if shuffle:
            ds = ds.shuffle(1000)
        ds = ds.batch(batch_size).repeat(num_epochs)
        return ds
    return input_function

train_input_fn = make_input_fn(dftrain, y_train)
eval_input_fn = make_input_fn(dfeval, y_eval, num_epochs=1, shuffle=False)

ds = make_input_fn(dftrain, y_train, batch_size=10)()
for feature_batch, label_batch in ds.take(1):
    print('Some feature keys:', list(feature_batch.keys()))
    print()
    print('A batch of class:', feature_batch['class'].numpy())
    print()
    print('A batch of Labels:', label_batch.numpy())

Some feature keys: ['sex', 'age', 'n_siblings_spouses', 'parch', 'fare', 'class', 'deck', 'embark_town', 'alone']

A batch of class: [b'Second' b'Second' b'Third' b'Second' b'Third' b'Third' b'Third'
 b'Third' b'Third' b'Third']

A batch of Labels: [0 0 0 1 0 0 0 1 0 0]

The next two cells inspect individual feature columns with tf.keras.layers.DenseFeatures; note that they use the feature_columns list defined in section 2 below.

age_column = feature_columns[7]
tf.keras.layers.DenseFeatures([age_column])(feature_batch).numpy()

array([[23.],
       [28.],
       [32.],
       [31.],
       [28.],
       [ 4.],
       [28.],
       [25.],
       [35.],
       [28.]], dtype=float32)

gender_column = feature_columns[0]
tf.keras.layers.DenseFeatures([tf.feature_column.indicator_column(gender_column)])(feature_batch).numpy()

array([[1., 0.],
       [1., 0.],
       [1., 0.],
       [1., 0.],
       [1., 0.],
       [1., 0.],
       [1., 0.],
       [1., 0.],
       [1., 0.],
       [1., 0.]], dtype=float32)

2. Wrap the data in feature columns

CATEGORICAL_COLUMNS = ['sex', 'n_siblings_spouses', 'parch', 'class', 'deck',
                       'embark_town', 'alone']
NUMERIC_COLUMNS = ['age', 'fare']

feature_columns = []
for feature_name in CATEGORICAL_COLUMNS:
    vocabulary = dftrain[feature_name].unique()
    feature_columns.append(tf.feature_column.categorical_column_with_vocabulary_list(feature_name, vocabulary))

for feature_name in NUMERIC_COLUMNS:
    feature_columns.append(tf.feature_column.numeric_column(feature_name, dtype=tf.float32))

feature_columns[5]

VocabularyListCategoricalColumn(key='embark_town', vocabulary_list=('Southampton', 'Cherbourg', 'Queenstown', 'unknown'), dtype=tf.string, default_value=-1, num_oov_buckets=0)

3. Build and train the model

TensorFlow's prebuilt LinearClassifier makes it easy to build and train the model:

linear_est = tf.estimator.LinearClassifier(feature_columns=feature_columns)
linear_est.train(train_input_fn)
result = linear_est.evaluate(eval_input_fn)

clear_output()
print(result)

{'accuracy': 0.75, 'accuracy_baseline': 0.625, 'auc': 0.825375, 'auc_precision_recall': 0.7897542, 'average_loss': 0.5112618, 'label/mean': 0.375, 'loss': 0.50658554, 'precision': 0.6386555, 'prediction/mean': 0.47108686, 'recall': 0.7676768, 'global_step': 200}
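The metrics dictionary returned by evaluate can be hard to scan. As an optional aside that is not part of the original code, it can be displayed as a sorted pandas Series; a minimal sketch, assuming result is the dictionary printed above:

# Optional: show the evaluation metrics as a sorted pandas Series for easier reading.
# `result` is the dict returned by linear_est.evaluate(eval_input_fn) above.
metrics = pd.Series(result).sort_index()
print(metrics)   # accuracy 0.75, auc 0.825..., recall 0.768..., etc.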

4. Build crossed features

The features above are all used individually, but sometimes a combination of features correlates more strongly with the outcome than either feature alone. For example, it may be hard to predict survival from sex alone or age alone, yet a particular age-plus-sex combination can be highly predictive.

age_x_gender = tf.feature_column.crossed_column(['age', 'sex'], hash_bucket_size=100)
derived_feature_columns = [age_x_gender]

linear_est = tf.estimator.LinearClassifier(feature_columns=feature_columns + derived_feature_columns)
linear_est.train(train_input_fn)
result = linear_est.evaluate(eval_input_fn)

clear_output()
print(result)

{'accuracy': 0.7765151, 'accuracy_baseline': 0.625, 'auc': 0.85026014, 'auc_precision_recall': 0.7782748, 'average_loss': 0.48670053, 'label/mean': 0.375, 'loss': 0.4771559, 'precision': 0.7631579, 'prediction/mean': 0.29976445, 'recall': 0.5858586, 'global_step': 200}

With the crossed feature added, accuracy rises from 0.75 to about 0.777 and AUC from about 0.825 to about 0.850.

5. Make predictions

pred_dicts = list(linear_est.predict(eval_input_fn))
probs = pd.Series([pred['probabilities'][1] for pred in pred_dicts])

probs.plot(kind='hist', bins=20, title='predicted probabilities')

INFO:tensorflow:Calling model_fn.
WARNING:tensorflow:Layer linear/linear_model is casting an input tensor from dtype float64 to the layer's dtype of float32, which is new behavior in TensorFlow 2. The layer has dtype float32 because its dtype defaults to floatx.
If you intended to run this layer in float32, you can safely ignore this warning. If in doubt, this warning is likely only an issue if you are porting a TensorFlow 1.X model to TensorFlow 2.
To change all layers to have dtype float64 by default, call `tf.keras.backend.set_floatx('float64')`. To change just this layer, pass dtype='float64' to the layer constructor. If you are the author of this layer, you can disable autocasting by passing autocast=False to the base Layer constructor.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /var/folders/vx/w_50bfjj6xn9j5_lqhfrbcv00000gn/T/tmpo5ivmpz_/model.ckpt-200
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.

<matplotlib.axes._subplots.AxesSubplot at 0x7f83dccac390>
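Each entry in pred_dicts holds a 'probabilities' array whose second element is the predicted probability of survival. As a quick sanity check that is not part of the original code, these probabilities can be thresholded and compared against y_eval; a minimal sketch, where the 0.5 cutoff is an arbitrary choice:

# Turn the survival probabilities into hard class predictions and compare with the true labels.
# `probs` and `y_eval` are the objects defined above; 0.5 is an assumed threshold.
predicted_classes = (probs >= 0.5).astype(int)
accuracy = (predicted_classes.values == y_eval.values).mean()
print('accuracy at a 0.5 threshold:', accuracy)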

Compute the ROC curve:

from sklearn.metrics import roc_curve
from matplotlib import pyplot as plt

fpr, tpr, _ = roc_curve(y_eval, probs)
plt.plot(fpr, tpr)
plt.title('ROC curve')
plt.xlabel('false positive rate')
plt.ylabel('true positive rate')
plt.xlim(0,)
plt.ylim(0,)

(0, 1.05)
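The area under this curve can also be computed directly with scikit-learn. This step is not in the original code; a minimal sketch, which should come out close to the 'auc' value reported by evaluate above:

# Compute the ROC AUC from the same labels and predicted probabilities.
from sklearn.metrics import roc_auc_score

auc = roc_auc_score(y_eval, probs)
print('ROC AUC:', auc)  # expected to be close to the ~0.85 reported by linear_est.evaluate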
