
TFRecord tf.train.Feature

Published: 2023/11/28


1. Definition

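Encoding data ahead of time into binary TFRecord files and reading them with TensorFlow's built-in multithreaded input API gives efficient reads. The format is also cross-platform and well suited to storing complex data in a standardized way. As the protobuf (pb) definition of TFRecord shows, each TFRecord file is made up of many Examples.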

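Official definition of Example: An Example is a mostly-normalized data format for storing data for training and inference.
An Example represents one packaged data input, for instance an image together with its width, height, and label. Each piece of information is stored as a key-value pair, so an Example contains one Features message, and Features in turn contains multiple Feature entries.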

Because the TFRecord format is a fixed convention, it can be used to build any dataset.
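The nesting described above (Example, then Features, then named Feature entries) can be sketched as follows. This is a minimal illustration, not code from the original article: the keys `width`, `height`, `label`, and `image_raw` and all values are hypothetical stand-ins for one image sample.

```python
import tensorflow as tf

# Hypothetical values standing in for one 28x28 image sample.
example = tf.train.Example(
    features=tf.train.Features(
        feature={
            "width": tf.train.Feature(int64_list=tf.train.Int64List(value=[28])),
            "height": tf.train.Feature(int64_list=tf.train.Int64List(value=[28])),
            "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[7])),
            "image_raw": tf.train.Feature(
                bytes_list=tf.train.BytesList(value=[b"\x00" * 784])
            ),
        }
    )
)

# Features holds a map from string keys to Feature messages.
feature_map = example.features.feature
keys = sorted(feature_map.keys())
label_value = feature_map["label"].int64_list.value[0]

# An Example serializes to bytes and parses back to an equal message.
serialized = example.SerializeToString()
restored = tf.train.Example.FromString(serialized)
```

Every value, even a single integer such as the label, is wrapped in a list inside its Feature.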

2. Feature
Official definition:

// A Feature contains Lists which may hold zero or more values. These
// lists are the base values BytesList, FloatList, Int64List.
//
// Features are organized into categories by name. The Features message
// contains the mapping from name to Feature.

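Features is a dictionary of Feature entries: each key is a string and each value is a tf.train.Feature(). The value must take one of exactly three forms: a list of byte strings (BytesList), a list of floats (FloatList), or a list of integers (Int64List).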

tf.train.Feature(**options)
options is exactly one of the following three data formats:
bytes_list = tf.train.BytesList(value=input)  # elements are strings (bytes)
int64_list = tf.train.Int64List(value=input)  # elements are int (int32, int64)
float_list = tf.train.FloatList(value=input)  # elements are float (float32, float64)
Note: value must be a list (a vector), even when it holds a single element.
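As a small sketch of the three options above, the snippet below builds one Feature of each kind and asks each message which list field it carries. The sample values are made up for illustration; `WhichOneof("kind")` relies on the Feature proto declaring its three list fields in a oneof named `kind`.

```python
import tensorflow as tf

# Text must be encoded to bytes before it goes into a BytesList.
bytes_feature = tf.train.Feature(
    bytes_list=tf.train.BytesList(value=[b"cat", "dog".encode("utf-8")])
)
int_feature = tf.train.Feature(int64_list=tf.train.Int64List(value=[1, 2, 3]))
float_feature = tf.train.Feature(float_list=tf.train.FloatList(value=[0.5, 1.5]))

# A Feature holds exactly one of the three list types.
kinds = [
    feature.WhichOneof("kind")
    for feature in (bytes_feature, int_feature, float_feature)
]

# A single scalar still has to be wrapped in a list.
single_label = tf.train.Feature(int64_list=tf.train.Int64List(value=[5]))
```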

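When the original data is a matrix or tensor (an image, for example), every storage option discards its shape. When writing such a feature into a sample, you should therefore store the shape as an additional feature. The shape is integer data; a good convention is to name it after the original feature plus '_shape'. The reading code can then fetch the shape and restore the original layout.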

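There are two ways to store a matrix, and both need the shape stored separately so that it can be restored (the second is more convenient):

1. Flatten the matrix or tensor into a list (a vector), then pick the list type that matches the element dtype.
2. Convert the matrix or tensor to a byte string with .tostring() (named .tobytes() in current NumPy), then store it with tf.train.Feature(bytes_list=tf.train.BytesList(value=[input.tostring()])).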
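The two storage approaches above, together with the extra shape feature, can be sketched as a round trip. This is an illustrative example under assumptions: the feature names `matrix_flat`, `matrix_raw`, and `matrix_shape` are hypothetical, and `.tobytes()` is used as the current NumPy spelling of `.tostring()`.

```python
import numpy as np
import tensorflow as tf

matrix = np.arange(6, dtype=np.float32).reshape(2, 3)

# Approach 1: flatten into a vector and store it in a FloatList.
flat_feature = tf.train.Feature(
    float_list=tf.train.FloatList(value=matrix.flatten().tolist())
)
# Approach 2: store the raw bytes in a BytesList.
raw_feature = tf.train.Feature(
    bytes_list=tf.train.BytesList(value=[matrix.tobytes()])
)
# Either way, the shape goes into its own integer feature.
shape_feature = tf.train.Feature(
    int64_list=tf.train.Int64List(value=list(matrix.shape))
)

example = tf.train.Example(
    features=tf.train.Features(
        feature={
            "matrix_flat": flat_feature,
            "matrix_raw": raw_feature,
            "matrix_shape": shape_feature,
        }
    )
)

# Read everything back and use the stored shape to restore the matrix.
feats = tf.train.Example.FromString(example.SerializeToString()).features.feature
shape = tuple(feats["matrix_shape"].int64_list.value)
from_flat = np.array(feats["matrix_flat"].float_list.value, dtype=np.float32).reshape(shape)
from_raw = np.frombuffer(feats["matrix_raw"].bytes_list.value[0], dtype=np.float32).reshape(shape)
```

With the bytes approach the reader must also know the element dtype (float32 here), since the byte string itself does not record it.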

# Helper functions that wrap a value in a tf.train.Feature.
def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

# Convert one sample into a tf.train.Example.
def _make_example(pixels, label, image):
    image_raw = image.tobytes()  # np.array ---> bytes (formerly .tostring())
    example = tf.train.Example(features=tf.train.Features(feature={
        'pixels': _int64_feature(pixels),
        'label': _int64_feature(int(np.argmax(label))),  # one-hot ---> class index
        'image_raw': _bytes_feature(image_raw)
    }))
    return example

3. Persisting the full MNIST dataset as a TFRecord file

# Note: this listing uses TensorFlow 1.x APIs
# (tensorflow.examples.tutorials.mnist, tf.python_io).
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import numpy as np

# Helper functions that wrap a value in a tf.train.Feature.
def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

# Convert one sample into a tf.train.Example.
def _make_example(pixels, label, image):
    image_raw = image.tobytes()  # formerly .tostring()
    example = tf.train.Example(features=tf.train.Features(feature={
        'pixels': _int64_feature(pixels),
        'label': _int64_feature(int(np.argmax(label))),
        'image_raw': _bytes_feature(image_raw)
    }))
    return example

def save_tfrecords():
    # Read the MNIST training data.
    mnist = input_data.read_data_sets("../../datasets/MNIST_data", dtype=tf.uint8, one_hot=True)
    images = mnist.train.images  # (55000, 784) <class 'numpy.ndarray'>
    labels = mnist.train.labels  # (55000, 10) <class 'numpy.ndarray'>
    pixels = images.shape[1]  # 784 = 28 * 28
    num_examples = mnist.train.num_examples
    # Write the training data to a TFRecord file.
    with tf.python_io.TFRecordWriter("output.tfrecords") as writer:
        for index in range(num_examples):
            # Build one Example, serialize it, and write it out.
            example = _make_example(pixels, labels[index], images[index])
            writer.write(example.SerializeToString())
    print("TFRecord training file saved.")
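For readers on TensorFlow 2.x, where `tf.python_io` and the MNIST tutorial loader are not available, the same write loop can be sketched with `tf.io.TFRecordWriter`. This is an assumption-laden variant, not the article's code: random arrays stand in for MNIST, and the helper `make_example` and the file name `synthetic.tfrecords` are hypothetical.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

def make_example(image, label):
    """Pack one flattened uint8 image and its integer label into an Example."""
    return tf.train.Example(features=tf.train.Features(feature={
        "pixels": tf.train.Feature(int64_list=tf.train.Int64List(value=[image.size])),
        "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
        "image_raw": tf.train.Feature(bytes_list=tf.train.BytesList(value=[image.tobytes()])),
    }))

# Synthetic stand-in for MNIST: 10 images of 784 pixels each (an assumption).
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(10, 784), dtype=np.uint8)
labels = rng.integers(0, 10, size=10)

path = os.path.join(tempfile.mkdtemp(), "synthetic.tfrecords")
with tf.io.TFRecordWriter(path) as writer:
    for image, label in zip(images, labels):
        writer.write(make_example(image, int(label)).SerializeToString())

# Count the records that were written by iterating over the file.
num_records = sum(1 for _ in tf.data.TFRecordDataset(path))
```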

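4. Reading and parsing a TFRecord file
When reading, the parsing rules and the dtypes to restore must match the definitions used at encoding time. The image, for example, has to be declared as tf.string first and then decoded into uint8. Note that parse and the other ops here only define operations in the graph; nothing runs yet. The multithreaded reading and parsing only starts when sess.run() is called. With multiple threads, reading this kind of binary stream file is typically much faster than reading the original image files from disk.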

def test_tfrecords():
    # Read the file.
    print(len(tf.get_collection(tf.GraphKeys.QUEUE_RUNNERS)))  # 0
    reader = tf.TFRecordReader()
    # The queue runner is added to the collection automatically.
    filename_queue = tf.train.string_input_producer(["output.tfrecords"])
    print(len(tf.get_collection(tf.GraphKeys.QUEUE_RUNNERS)))  # 1
    _, serialized_example = reader.read(filename_queue)
    # Parse the serialized example that was read.
    features = tf.parse_single_example(
        serialized_example,
        features={
            'image_raw': tf.FixedLenFeature([], tf.string),
            'pixels': tf.FixedLenFeature([], tf.int64),
            'label': tf.FixedLenFeature([], tf.int64)
        })
    images = tf.decode_raw(features['image_raw'], tf.uint8)
    labels = tf.cast(features['label'], tf.int32)
    pixels = tf.cast(features['pixels'], tf.int32)
    sess = tf.Session()
    # Start the threads that feed the input queue.
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    for i in range(5):
        image, label, pixel = sess.run([images, labels, pixels])
        print(label)
    # Stop the input threads and release the session.
    coord.request_stop()
    coord.join(threads)
    sess.close()
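The queue-based reader above is TensorFlow 1.x only. As a hedged sketch of the same parsing rules in TensorFlow 2.x, the snippet below uses `tf.data.TFRecordDataset` with `tf.io.parse_single_example` and `tf.io.decode_raw`. It first writes a tiny file of its own so that it runs standalone; the file name `demo.tfrecords` and the sample values are assumptions made for illustration.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# Write three small records so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "demo.tfrecords")
image = (np.arange(784) % 256).astype(np.uint8)
with tf.io.TFRecordWriter(path) as writer:
    for label in range(3):
        example = tf.train.Example(features=tf.train.Features(feature={
            "image_raw": tf.train.Feature(bytes_list=tf.train.BytesList(value=[image.tobytes()])),
            "pixels": tf.train.Feature(int64_list=tf.train.Int64List(value=[784])),
            "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
        }))
        writer.write(example.SerializeToString())

# The parsing spec must mirror the types used when the file was written.
feature_spec = {
    "image_raw": tf.io.FixedLenFeature([], tf.string),
    "pixels": tf.io.FixedLenFeature([], tf.int64),
    "label": tf.io.FixedLenFeature([], tf.int64),
}

def parse(serialized):
    features = tf.io.parse_single_example(serialized, feature_spec)
    # The image is read as a byte string, then decoded into uint8 values.
    decoded = tf.io.decode_raw(features["image_raw"], tf.uint8)
    return decoded, tf.cast(features["label"], tf.int32)

dataset = tf.data.TFRecordDataset(path).map(parse)
parsed = [(img.numpy(), int(lbl.numpy())) for img, lbl in dataset]
```

Here the `tf.data` pipeline takes the place of the filename queue, Coordinator, and queue runners; passing `num_parallel_calls` to `map` is one way to parallelize the parsing step.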
