Image Recognition with Inception-v3 (Python, C++)

Published: 2023/11/27

Introduction

Using the Python API

Using the C++ API

Introduction

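For our brains, visual recognition seems remarkably easy. Humans can effortlessly tell a lion from a jaguar, read a road sign, or recognize a face. For a computer, though, these are hard problems: they only seem easy because our brains are extraordinarily good at understanding images.

Over the past few years, machine learning has made tremendous progress on these problems. In particular, a type of model called a deep convolutional neural network has been found to perform well on hard visual recognition tasks, matching or even exceeding human performance in some domains.

Researchers have demonstrated steady progress in computer vision by validating their work against ImageNet, an academic benchmark for computer vision. Successive models have each improved on the last and set a new state of the art: QuocNet, AlexNet, Inception (GoogLeNet), and BN-Inception-v2. Researchers inside and outside Google have published papers describing all of these models, but the results remain hard to reproduce. The next step is the release of code for running image recognition on the latest model, Inception-v3.

Inception-v3 was trained for the ImageNet Large Visual Recognition Challenge using the data from 2012. Its layer structure is shown in the figure below:

[Figure: layer structure of Inception-v3]

Inception-v3 tackles a standard computer vision task in which the model tries to classify entire images into 1000 classes, such as "zebra", "dalmatian", and "dishwasher". For example, here are the results of AlexNet classifying some images:

[Figure: AlexNet classification results on sample images]

To compare models, we look at how often the correct answer is not among the model's five most likely predictions, a measure called the "top-5 error rate". AlexNet achieved a top-5 error rate of 15.3% on the 2012 validation set; Inception (GoogLeNet), BN-Inception-v2, and Inception-v3 reached 6.67%, 4.9%, and 3.46%, respectively.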
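To make the metric concrete, here is a minimal pure-Python sketch of a top-k error rate. The function name top_k_error_rate and the toy data are illustrative assumptions, not part of the ImageNet evaluation code; the sketch only shows the counting rule described above.

```python
def top_k_error_rate(ranked_predictions, true_labels, k=5):
    """Fraction of examples whose true label is missing from the top-k guesses.

    ranked_predictions: one list per example, class IDs ordered from most to
        least likely.
    true_labels: the correct class ID for each example.
    """
    if len(ranked_predictions) != len(true_labels):
        raise ValueError("predictions and labels must have the same length")
    if not true_labels:
        raise ValueError("need at least one example")
    misses = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_labels)
        if truth not in ranked[:k]
    )
    return misses / len(true_labels)


# Hypothetical toy data: 4 examples, class IDs are arbitrary integers.
toy_predictions = [
    [3, 7, 1, 9, 4, 2],  # truth 7 is within the top 5 -> hit
    [5, 0, 8, 6, 2, 1],  # truth 1 is ranked 6th      -> miss
    [2, 4, 6, 8, 0, 9],  # truth 2 is ranked 1st      -> hit
    [9, 8, 7, 6, 5, 4],  # truth 0 is absent          -> miss
]
toy_labels = [7, 1, 2, 0]

print(top_k_error_rate(toy_predictions, toy_labels, k=5))  # 0.5
print(top_k_error_rate(toy_predictions, toy_labels, k=1))  # 0.75
```

A top-5 error rate of 3.46% therefore means that, for roughly 35 out of every 1000 validation images, the correct class is not among the model's five best guesses.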

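How well do humans do on the ImageNet challenge? Andrej Karpathy tried to measure his own performance and wrote a blog post reporting a top-5 error rate of 5.1%.

This article shows how to use Inception-v3. You will learn how to classify images into 1000 classes in Python or C++. We will also discuss how to extract higher-level features from the model so that they can be reused for other vision tasks.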

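Using the Python API

The first time it runs, classify_image.py downloads the trained model from tensorflow.org. You will need about 200 MB of free disk space.

First, clone the TensorFlow models repository from GitHub (https://github.com/tensorflow/models), then change into the tutorial directory: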

cd models/tutorials/image/imagenet

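The contents of classify_image.py are shown below. Note that the script is written against the TensorFlow 1.x API (tf.Session, tf.gfile, tf.app.run):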

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import argparse
import os.path
import re
import sys
import tarfile

import numpy as np
from six.moves import urllib
import tensorflow as tf

FLAGS = None

# pylint: disable=line-too-long
DATA_URL = 'http://download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz'
# pylint: enable=line-too-long


class NodeLookup(object):
  """Converts integer node ID's to human readable labels."""

  def __init__(self,
               label_lookup_path=None,
               uid_lookup_path=None):
    if not label_lookup_path:
      label_lookup_path = os.path.join(
          FLAGS.model_dir, 'imagenet_2012_challenge_label_map_proto.pbtxt')
    if not uid_lookup_path:
      uid_lookup_path = os.path.join(
          FLAGS.model_dir, 'imagenet_synset_to_human_label_map.txt')
    self.node_lookup = self.load(label_lookup_path, uid_lookup_path)

  def load(self, label_lookup_path, uid_lookup_path):
    """Loads a human readable English name for each softmax node.

    Args:
      label_lookup_path: string UID to integer node ID.
      uid_lookup_path: string UID to human-readable string.

    Returns:
      dict from integer node ID to human-readable string.
    """
    if not tf.gfile.Exists(uid_lookup_path):
      tf.logging.fatal('File does not exist %s', uid_lookup_path)
    if not tf.gfile.Exists(label_lookup_path):
      tf.logging.fatal('File does not exist %s', label_lookup_path)

    # Loads mapping from string UID to human-readable string
    proto_as_ascii_lines = tf.gfile.GFile(uid_lookup_path).readlines()
    uid_to_human = {}
    p = re.compile(r'[n\d]*[ \S,]*')
    for line in proto_as_ascii_lines:
      parsed_items = p.findall(line)
      uid = parsed_items[0]
      human_string = parsed_items[2]
      uid_to_human[uid] = human_string

    # Loads mapping from string UID to integer node ID.
    node_id_to_uid = {}
    proto_as_ascii = tf.gfile.GFile(label_lookup_path).readlines()
    for line in proto_as_ascii:
      if line.startswith('  target_class:'):
        target_class = int(line.split(': ')[1])
      if line.startswith('  target_class_string:'):
        target_class_string = line.split(': ')[1]
        node_id_to_uid[target_class] = target_class_string[1:-2]

    # Loads the final mapping of integer node ID to human-readable string
    node_id_to_name = {}
    for key, val in node_id_to_uid.items():
      if val not in uid_to_human:
        tf.logging.fatal('Failed to locate: %s', val)
      name = uid_to_human[val]
      node_id_to_name[key] = name

    return node_id_to_name

  def id_to_string(self, node_id):
    if node_id not in self.node_lookup:
      return ''
    return self.node_lookup[node_id]


def create_graph():
  """Creates a graph from saved GraphDef file and returns a saver."""
  # Creates graph from saved graph_def.pb.
  with tf.gfile.FastGFile(os.path.join(
      FLAGS.model_dir, 'classify_image_graph_def.pb'), 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
    _ = tf.import_graph_def(graph_def, name='')


def run_inference_on_image(image):
  """Runs inference on an image.

  Args:
    image: Image file name.

  Returns:
    Nothing
  """
  if not tf.gfile.Exists(image):
    tf.logging.fatal('File does not exist %s', image)
  image_data = tf.gfile.FastGFile(image, 'rb').read()

  # Creates graph from saved GraphDef.
  create_graph()

  with tf.Session() as sess:
    # Some useful tensors:
    # 'softmax:0': A tensor containing the normalized prediction across
    #   1000 labels.
    # 'pool_3:0': A tensor containing the next-to-last layer containing 2048
    #   float description of the image.
    # 'DecodeJpeg/contents:0': A tensor containing a string providing JPEG
    #   encoding of the image.
    # Runs the softmax tensor by feeding the image_data as input to the graph.
    softmax_tensor = sess.graph.get_tensor_by_name('softmax:0')
    predictions = sess.run(softmax_tensor,
                           {'DecodeJpeg/contents:0': image_data})
    predictions = np.squeeze(predictions)

    # Creates node ID --> English string lookup.
    node_lookup = NodeLookup()

    top_k = predictions.argsort()[-FLAGS.num_top_predictions:][::-1]
    for node_id in top_k:
      human_string = node_lookup.id_to_string(node_id)
      score = predictions[node_id]
      print('%s (score = %.5f)' % (human_string, score))


def maybe_download_and_extract():
  """Download and extract model tar file."""
  dest_directory = FLAGS.model_dir
  if not os.path.exists(dest_directory):
    os.makedirs(dest_directory)
  filename = DATA_URL.split('/')[-1]
  filepath = os.path.join(dest_directory, filename)
  if not os.path.exists(filepath):
    def _progress(count, block_size, total_size):
      sys.stdout.write('\r>> Downloading %s %.1f%%' % (
          filename, float(count * block_size) / float(total_size) * 100.0))
      sys.stdout.flush()
    filepath, _ = urllib.request.urlretrieve(DATA_URL, filepath, _progress)
    print()
    statinfo = os.stat(filepath)
    print('Successfully downloaded', filename, statinfo.st_size, 'bytes.')
  tarfile.open(filepath, 'r:gz').extractall(dest_directory)


def main(_):
  maybe_download_and_extract()
  image = (FLAGS.image_file if FLAGS.image_file else
           os.path.join(FLAGS.model_dir, 'cropped_panda.jpg'))
  run_inference_on_image(image)


if __name__ == '__main__':
  parser = argparse.ArgumentParser()
  # classify_image_graph_def.pb:
  #   Binary representation of the GraphDef protocol buffer.
  # imagenet_synset_to_human_label_map.txt:
  #   Map from synset ID to a human readable string.
  # imagenet_2012_challenge_label_map_proto.pbtxt:
  #   Text representation of a protocol buffer mapping a label to synset ID.
  parser.add_argument(
      '--model_dir',
      type=str,
      default=r'C:\Users\Administrator\Desktop\imagenet',
      help="""\
      Path to classify_image_graph_def.pb,
      imagenet_synset_to_human_label_map.txt, and
      imagenet_2012_challenge_label_map_proto.pbtxt.\
      """
  )
  parser.add_argument(
      '--image_file',
      type=str,
      default=r'C:\Users\Administrator\Desktop\imagenet\cropped_panda.jpg',
      help='Absolute path to image file.'
  )
  parser.add_argument(
      '--num_top_predictions',
      type=int,
      default=5,
      help='Display this many predictions.'
  )
  FLAGS, unparsed = parser.parse_known_args()
  tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)

Run the following command:

python classify_image.py

The command above classifies the supplied image of a giant panda.

If the model runs correctly, the script produces the following output:

giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca (score = 0.88493)
indri, indris, Indri indri, Indri brevicaudatus (score = 0.00878)
lesser panda, red panda, panda, bear cat, cat bear, Ailurus fulgens (score = 0.00317)
custard apple (score = 0.00149)
earthstar (score = 0.00127)
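The ranking step that produces output in this format can be sketched without TensorFlow. The following is a minimal pure-Python illustration, assuming a hypothetical 6-class score list and label table (the real model emits 1000 scores); the helper name top_predictions is made up for this sketch, and it mirrors the sort-then-look-up logic in run_inference_on_image() rather than reproducing it.

```python
def top_predictions(scores, id_to_label, k=5):
    """Return the k highest-scoring (label, score) pairs, best first.

    Unknown node IDs map to an empty string, like NodeLookup.id_to_string().
    """
    ranked_ids = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(id_to_label.get(i, ""), scores[i]) for i in ranked_ids[:k]]


# Hypothetical softmax output and label table, for illustration only.
toy_scores = [0.01, 0.70, 0.04, 0.20, 0.02, 0.03]
toy_id_to_label = {
    0: "zebra",
    1: "giant panda",
    2: "dishwasher",
    3: "red panda",
    4: "indri",
    5: "earthstar",
}

for label, score in top_predictions(toy_scores, toy_id_to_label, k=3):
    print("%s (score = %.5f)" % (label, score))
# giant panda (score = 0.70000)
# red panda (score = 0.20000)
# dishwasher (score = 0.04000)
```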

To classify a different JPEG image, just change the --image_file argument.

If you downloaded the model data to a different directory, point --model_dir at that directory.
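As a rough illustration of how these flags are handled, the sketch below rebuilds just the argument parser with the standard argparse module. The flag names come from the script above; the default paths and the --verbose flag are placeholders assumed for this example only.

```python
import argparse


def build_parser():
    """Builds a parser with the same three flags as classify_image.py."""
    parser = argparse.ArgumentParser()
    # '/tmp/imagenet' is a placeholder default chosen for this sketch.
    parser.add_argument("--model_dir", type=str, default="/tmp/imagenet",
                        help="Directory holding the model and label files.")
    parser.add_argument("--image_file", type=str, default="",
                        help="Absolute path to image file.")
    parser.add_argument("--num_top_predictions", type=int, default=5,
                        help="Display this many predictions.")
    return parser


# parse_known_args() keeps unrecognized flags instead of failing, which is
# why the script can forward them to tf.app.run().
flags, unparsed = build_parser().parse_known_args(
    ["--image_file", "my_photo.jpg", "--num_top_predictions", "3", "--verbose"]
)
print(flags.image_file)           # my_photo.jpg
print(flags.num_top_predictions)  # 3
print(flags.model_dir)            # /tmp/imagenet
print(unparsed)                   # ['--verbose']
```

On the command line this corresponds to an invocation such as: python classify_image.py --image_file my_photo.jpg --num_top_predictions 3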

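On Windows, you can download the example program directly from GitHub: https://github.com/tensorflow/models

Downloading the recognition model sometimes fails, however, so here is a demo I have already debugged: https://download.csdn.net/download/m0_38106923/10892062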

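Using the C++ API

You can run the same Inception-v3 model in C++ for use in production environments. To do so, download the archive containing the GraphDef that defines the model, like this (run from the root directory of the TensorFlow repository):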

curl -L "https://storage.googleapis.com/download.tensorflow.org/models/inception_v3_2016_08_28_frozen.pb.tar.gz" |tar -C tensorflow/examples/label_image/data -xz

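Next, we need to compile the C++ binary that contains the code to load and run the graph. If you have followed the instructions for downloading the TensorFlow source installation for your platform, you should be able to build the example by running this command from your shell terminal: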

bazel build tensorflow/examples/label_image/...

That should create a binary executable, which you can then run like this:

bazel-bin/tensorflow/examples/label_image/label_image

This uses the default example image that ships with the framework, and the output should look similar to this:

I tensorflow/examples/label_image/main.cc:206] military uniform (653): 0.834306
I tensorflow/examples/label_image/main.cc:206] mortarboard (668): 0.0218692
I tensorflow/examples/label_image/main.cc:206] academic gown (401): 0.0103579
I tensorflow/examples/label_image/main.cc:206] pickelhaube (716): 0.00800814
I tensorflow/examples/label_image/main.cc:206] bulletproof vest (466): 0.00535088

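In this case, we are using the default image of Admiral Grace Hopper, and you can see that the network correctly identifies that she is wearing a military uniform, with a high score of about 0.8.

For details on how it works, see the tensorflow/examples/label_image/main.cc file (https://www.tensorflowers.cn/t/7558). Hopefully this code will help you integrate TensorFlow into your own applications, so let's walk through the main functions step by step:

The command-line flags control where the files are loaded from and the properties of the input images. The model expects square 299x299 RGB images, so the input_width and input_height flags should be set to those values. We also need to scale the pixel values from integers between 0 and 255 to the floating-point values that the graph operates on. The input_mean and input_std flags control that scaling: we first subtract input_mean from each pixel value and then divide the result by input_std.

These values may look somewhat magical, but they were simply chosen by the original model author based on what he or she wanted to use as input images for training. If you have a graph that you trained yourself, just adjust the values to match whatever you used during training.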
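The scaling rule itself is simple arithmetic. Below is a minimal pure-Python sketch of it; the helper name normalize_pixels is hypothetical, and the two mean/std pairs are illustrative assumptions, since the article does not state the values the example actually uses.

```python
def normalize_pixels(pixels, input_mean, input_std):
    """Apply (value - input_mean) / input_std to every 8-bit pixel value."""
    if input_std == 0:
        raise ValueError("input_std must be non-zero")
    return [(float(p) - input_mean) / input_std for p in pixels]


# Illustrative settings only (assumptions, not values taken from the article).
# Mean 128 and std 128 map the range 0..255 to roughly -1..1:
print(normalize_pixels([0, 128, 255], input_mean=128.0, input_std=128.0))
# [-1.0, 0.0, 0.9921875]

# Mean 0 and std 255 map the range 0..255 to 0..1:
print(normalize_pixels([0, 51, 255], input_mean=0.0, input_std=255.0))
# [0.0, 0.2, 1.0]
```

Whichever pair is used, the point is that inference must apply the same transformation that was applied to the training images.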

You can see how these flags are applied to an image in the ReadTensorFromImageFile() function.

// Given an image file name, read in the data, try to decode it as an image,
// resize it to the requested size, and then scale the values as desired.
Status ReadTensorFromImageFile(string file_name, const int input_height,
                               const int input_width, const float input_mean,
                               const float input_std,
                               std::vector<Tensor>* out_tensors) {
  tensorflow::GraphDefBuilder b;

We start by creating a GraphDefBuilder, an object we can use to specify a model to run or load.

  string input_name = "file_reader";
  string output_name = "normalized";
  tensorflow::Node* file_reader =
      tensorflow::ops::ReadFile(tensorflow::ops::Const(file_name, b.opts()),
                                b.opts().WithName(input_name));

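We then create nodes for the small model we want to run, which loads, resizes, and scales the pixel values to produce the input that the main model expects. The first node is just a Const op that holds a tensor containing the file name of the image to load. That tensor is then passed as the first input to the ReadFile op. You may notice that b.opts() is passed as the last argument to every op-creation function. This argument ensures that the node is added to the model definition held in the GraphDefBuilder. The ReadFile op is also given a name by calling WithName() on b.opts(). Naming the node is not strictly necessary, since a name is assigned automatically if you don't provide one, but it does make debugging easier.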

  // Now try to figure out what kind of file it is and decode it.
  const int wanted_channels = 3;
  tensorflow::Node* image_reader;
  if (tensorflow::StringPiece(file_name).ends_with(".png")) {
    image_reader = tensorflow::ops::DecodePng(
        file_reader,
        b.opts().WithAttr("channels", wanted_channels).WithName("png_reader"));
  } else {
    // Assume if it's not a PNG then it must be a JPEG.
    image_reader = tensorflow::ops::DecodeJpeg(
        file_reader,
        b.opts().WithAttr("channels", wanted_channels).WithName("jpeg_reader"));
  }
  // Now cast the image data to float so we can do normal math on it.
  tensorflow::Node* float_caster = tensorflow::ops::Cast(
      image_reader, tensorflow::DT_FLOAT, b.opts().WithName("float_caster"));
  // The convention for image ops in TensorFlow is that all images are expected
  // to be in batches, so that they're four-dimensional arrays with indices of
  // [batch, height, width, channel]. Because we only have a single image, we
  // have to add a batch dimension of 1 to the start with ExpandDims().
  tensorflow::Node* dims_expander = tensorflow::ops::ExpandDims(
      float_caster, tensorflow::ops::Const(0, b.opts()), b.opts());
  // Bilinearly resize the image to fit the required dimensions.
  tensorflow::Node* resized = tensorflow::ops::ResizeBilinear(
      dims_expander, tensorflow::ops::Const({input_height, input_width},
                                            b.opts().WithName("size")),
      b.opts());
  // Subtract the mean and divide by the scale.
  tensorflow::ops::Div(
      tensorflow::ops::Sub(
          resized, tensorflow::ops::Const({input_mean}, b.opts()), b.opts()),
      tensorflow::ops::Const({input_std}, b.opts()),
      b.opts().WithName(output_name));

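We then keep adding more nodes to decode the file data as an image, cast the integers to floating-point values, resize the image, and finally run the subtraction and division operations on the pixel values.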
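For readers who want to see the same sequence of steps outside a TensorFlow graph, here is a rough pure-Python sketch: cast to float, add a batch dimension, resize bilinearly, then subtract the mean and divide by the scale. The function names, the tiny 2x2 image, and the sampling convention (source coordinate = output coordinate times the size ratio, with border clamping) are assumptions made for illustration; the ResizeBilinear op may differ in detail.

```python
def resize_bilinear(image, out_height, out_width):
    """Bilinearly resize an image stored as rows -> pixels -> channel floats.

    Assumed convention: src = dst * (in_size / out_size), neighbours clamped
    at the border. Other libraries offer different conventions.
    """
    in_height, in_width = len(image), len(image[0])
    channels = len(image[0][0])
    y_scale = in_height / out_height
    x_scale = in_width / out_width
    resized = []
    for out_y in range(out_height):
        src_y = out_y * y_scale
        y0 = int(src_y)
        y1 = min(y0 + 1, in_height - 1)
        wy = src_y - y0  # vertical interpolation weight
        row = []
        for out_x in range(out_width):
            src_x = out_x * x_scale
            x0 = int(src_x)
            x1 = min(x0 + 1, in_width - 1)
            wx = src_x - x0  # horizontal interpolation weight
            pixel = []
            for c in range(channels):
                top = image[y0][x0][c] * (1 - wx) + image[y0][x1][c] * wx
                bottom = image[y1][x0][c] * (1 - wx) + image[y1][x1][c] * wx
                pixel.append(top * (1 - wy) + bottom * wy)
            row.append(pixel)
        resized.append(row)
    return resized


def preprocess(image_u8, out_height, out_width, input_mean, input_std):
    """Cast to float, add a batch dimension, resize, then normalize."""
    as_float = [[[float(v) for v in px] for px in row] for row in image_u8]
    batch = [as_float]  # shape [1, height, width, channels]
    resized = [resize_bilinear(img, out_height, out_width) for img in batch]
    return [
        [[[(v - input_mean) / input_std for v in px] for px in row] for row in img]
        for img in resized
    ]


# Hypothetical 2x2 single-channel image, upscaled to 4x4.
tiny = [[[0], [100]], [[200], [255]]]
out = preprocess(tiny, 4, 4, input_mean=0.0, input_std=255.0)
print(len(out), len(out[0]), len(out[0][0]), len(out[0][0][0]))  # 1 4 4 1
```

The result has the [batch, height, width, channel] layout described in the code comments above, which is the layout the main graph expects as input.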

  // This runs the GraphDef network definition that we've just constructed, and
  // returns the results in the output tensor.
  tensorflow::GraphDef graph;
  TF_RETURN_IF_ERROR(b.ToGraphDef(&graph));

At the end of this we have a model definition stored in the variable b, which we turn into a full graph definition with the ToGraphDef() function.

  std::unique_ptr<tensorflow::Session> session(
      tensorflow::NewSession(tensorflow::SessionOptions()));
  TF_RETURN_IF_ERROR(session->Create(graph));
  TF_RETURN_IF_ERROR(session->Run({}, {output_name}, {}, out_tensors));
  return Status::OK();

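Then we create a Session object, which is the interface for actually running the graph, and run it, specifying which node we want the output from and where to put the output data.

This gives us a vector of Tensor objects, which in this case we know will contain only a single object. You can think of a Tensor here as a multi-dimensional array; it holds a 299-pixel-high, 299-pixel-wide, 3-channel image as floating-point values. If your product already has its own image-processing framework, you should be able to use that instead, as long as you apply the same transformations before feeding images into the main graph.

That was a simple example of creating a small TensorFlow graph dynamically in C++, but for the pre-trained Inception model we need to load a much larger definition from a file. You can see how this is done in the LoadGraph() function.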

// Reads a model graph definition from disk, and creates a session object you
// can use to run it.
Status LoadGraph(string graph_file_name,
                 std::unique_ptr<tensorflow::Session>* session) {
  tensorflow::GraphDef graph_def;
  Status load_graph_status =
      ReadBinaryProto(tensorflow::Env::Default(), graph_file_name, &graph_def);
  if (!load_graph_status.ok()) {
    return tensorflow::errors::NotFound("Failed to load compute graph at '",
                                        graph_file_name, "'");
  }

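If you have followed the image-loading code, many of the terms should look familiar. Rather than using a GraphDefBuilder to produce a GraphDef object, we load a protobuf file that directly contains the GraphDef.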

  session->reset(tensorflow::NewSession(tensorflow::SessionOptions()));
  Status session_create_status = (*session)->Create(graph_def);
  if (!session_create_status.ok()) {
    return session_create_status;
  }
  return Status::OK();
}

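Then we create a Session object from that GraphDef and pass it back to the caller so that it can be run later.

The GetTopLabels() function is a lot like the image loading, except that here we want to take the results of running the main graph and turn them into a sorted list of the highest-scoring labels. Just like the image loader, it creates a GraphDefBuilder, adds a couple of nodes to it, and then runs the short graph to get a pair of output tensors. In this case they represent the sorted scores and index positions of the highest results.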

// Analyzes the output of the Inception graph to retrieve the highest scores and
// their positions in the tensor, which correspond to categories.
Status GetTopLabels(const std::vector<Tensor>& outputs, int how_many_labels,
                    Tensor* indices, Tensor* scores) {
  tensorflow::GraphDefBuilder b;
  string output_name = "top_k";
  tensorflow::ops::TopK(tensorflow::ops::Const(outputs[0], b.opts()),
                        how_many_labels, b.opts().WithName(output_name));
  // This runs the GraphDef network definition that we've just constructed, and
  // returns the results in the output tensors.
  tensorflow::GraphDef graph;
  TF_RETURN_IF_ERROR(b.ToGraphDef(&graph));
  std::unique_ptr<tensorflow::Session> session(
      tensorflow::NewSession(tensorflow::SessionOptions()));
  TF_RETURN_IF_ERROR(session->Create(graph));
  // The TopK node returns two outputs, the scores and their original indices,
  // so we have to append :0 and :1 to specify them both.
  std::vector<Tensor> out_tensors;
  TF_RETURN_IF_ERROR(session->Run({}, {output_name + ":0", output_name + ":1"},
                                  {}, &out_tensors));
  *scores = out_tensors[0];
  *indices = out_tensors[1];
  return Status::OK();

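The PrintTopLabels() function takes those sorted results and prints them out in a friendly way. The CheckTopLabel() function is very similar, but for debugging purposes it just makes sure that the top label is the one we expect.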
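The idea behind these helpers, selecting the highest scores together with their original indices and then checking the best one, can be sketched in plain Python. The names top_k and check_top_label and the four-element score list are hypothetical stand-ins for illustration, not the TensorFlow TopK op or the functions in main.cc.

```python
import heapq


def top_k(scores, k):
    """Return (top_scores, top_indices), sorted from highest score to lowest."""
    best = heapq.nlargest(k, enumerate(scores), key=lambda pair: pair[1])
    top_indices = [index for index, _ in best]
    top_scores = [score for _, score in best]
    return top_scores, top_indices


def check_top_label(scores, expected_index):
    """Debug helper: is the single most likely class the one we expected?"""
    _, indices = top_k(scores, 1)
    return indices[0] == expected_index


# Hypothetical 4-class output, for illustration only.
toy_output = [0.05, 0.10, 0.60, 0.25]
best_scores, best_indices = top_k(toy_output, 2)
print(best_scores, best_indices)        # [0.6, 0.25] [2, 3]
print(check_top_label(toy_output, 2))   # True
```

Returning the indices alongside the scores matters because the index is what maps a score back to a label in the labels file.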

Finally, main() ties all of these calls together.

int main(int argc, char* argv[]) {
  // We need to call this to set up global state for TensorFlow.
  tensorflow::port::InitMain(argv[0], &argc, &argv);
  Status s = tensorflow::ParseCommandLineFlags(&argc, argv);
  if (!s.ok()) {
    LOG(ERROR) << "Error parsing command line flags: " << s.ToString();
    return -1;
  }

  // First we load and initialize the model.
  std::unique_ptr<tensorflow::Session> session;
  string graph_path = tensorflow::io::JoinPath(FLAGS_root_dir, FLAGS_graph);
  Status load_graph_status = LoadGraph(graph_path, &session);
  if (!load_graph_status.ok()) {
    LOG(ERROR) << load_graph_status;
    return -1;
  }

We load the main graph.

  // Get the image from disk as a float array of numbers, resized and normalized
  // to the specifications the main graph expects.
  std::vector<Tensor> resized_tensors;
  string image_path = tensorflow::io::JoinPath(FLAGS_root_dir, FLAGS_image);
  Status read_tensor_status = ReadTensorFromImageFile(
      image_path, FLAGS_input_height, FLAGS_input_width, FLAGS_input_mean,
      FLAGS_input_std, &resized_tensors);
  if (!read_tensor_status.ok()) {
    LOG(ERROR) << read_tensor_status;
    return -1;
  }
  const Tensor& resized_tensor = resized_tensors[0];

We load, resize, and process the input image.

  // Actually run the image through the model.
  std::vector<Tensor> outputs;
  Status run_status = session->Run({{FLAGS_input_layer, resized_tensor}},
                                   {FLAGS_output_layer}, {}, &outputs);
  if (!run_status.ok()) {
    LOG(ERROR) << "Running model failed: " << run_status;
    return -1;
  }

Here we run the loaded graph with the image as its input.

  // This is for automated testing to make sure we get the expected result with
  // the default settings. We know that label 866 (military uniform) should be
  // the top label for the Admiral Hopper image.
  if (FLAGS_self_test) {
    bool expected_matches;
    Status check_status = CheckTopLabel(outputs, 866, &expected_matches);
    if (!check_status.ok()) {
      LOG(ERROR) << "Running check failed: " << check_status;
      return -1;
    }
    if (!expected_matches) {
      LOG(ERROR) << "Self-test failed!";
      return -1;
    }
  }

For testing purposes, we can check here to make sure we get the output we expect.

  // Do something interesting with the results we've generated.
  Status print_status = PrintTopLabels(outputs, FLAGS_labels);

Finally, we print the labels we found.

  if (!print_status.ok()) {
    LOG(ERROR) << "Running print failed: " << print_status;
    return -1;
  }

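Error handling in this example uses TensorFlow's Status object, which is very convenient because its ok() checker tells you whether any error has occurred, and the status can then be printed to give a readable error message.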
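The check-each-step pattern can be illustrated with a small Python stand-in. This is only a sketch of the idea: the Status class, load_graph(), and run() below are hypothetical and are not the TensorFlow API; they merely mimic how main() stops at the first failing step.

```python
class Status:
    """Tiny stand-in for a status object: either OK or an error message."""

    def __init__(self, error_message=None):
        self.error_message = error_message

    def ok(self):
        return self.error_message is None

    def __str__(self):
        return "OK" if self.ok() else self.error_message


def load_graph(path, known_files):
    """Hypothetical loader: succeeds only if `path` is in `known_files`."""
    if path not in known_files:
        return Status("Failed to load compute graph at '%s'" % path)
    return Status()


def run(path, known_files):
    """Mimics the shape of main(): check each step, stop at the first error."""
    status = load_graph(path, known_files)
    if not status.ok():
        print("ERROR:", status)
        return -1
    return 0


print(run("model.pb", {"model.pb"}))    # 0
print(run("missing.pb", {"model.pb"}))  # prints the error line, then -1
```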

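This example demonstrates object recognition, but you should be able to use very similar code with other models that you have found or trained yourself, across all sorts of domains. I hope this small example gives you some ideas on how to use TensorFlow in your own products.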
