Summary of problems encountered when training with TensorFlow

Published: 2025/3/20

1、OP_REQUIRES failed at assign_op.h models

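The root cause of this error is running out of GPU resources. Fixes vary from setup to setup, but a few common ones are listed here.

Add os.environ['CUDA_VISIBLE_DEVICES']='2' to the eval script (this restricts the process to the GPU with index 2; adjust the index to a GPU that is free on your machine)

Force evaluation to run on the CPU

Reduce batch_size
Change the tensorflow-gpu version, which may help
Switch to a different network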
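
As a minimal sketch of the CPU-only option: hiding every CUDA device from the process makes TensorFlow fall back to the CPU. The variable has to be set before TensorFlow is imported. The helper name hide_all_gpus is hypothetical and only used for illustration.

```python
import os


def hide_all_gpus():
    """Hide every CUDA device from this process so evaluation runs on the CPU."""
    # "-1" matches no GPU index, so no CUDA device is visible to the process.
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"


# Call this at the very top of the eval script, before `import tensorflow`.
hide_all_gpus()
print(os.environ["CUDA_VISIBLE_DEVICES"])
```

To pin evaluation to one specific GPU instead, set the value to that GPU's index, as in the '2' example above.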

2、Argument must be a dense tensor: range(0, 3) - got shape [3], but wanted []

In models/research/object_detection/utils/learning_schedules.py, change:

rate_index = tf.reduce_max(tf.where(tf.greater_equal(global_step, boundaries),range(num_boundaries),[0] * num_boundaries))

to:

rate_index = tf.reduce_max(tf.where(tf.greater_equal(global_step, boundaries),list(range(num_boundaries)),[0] * num_boundaries))
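
The change is needed, as far as I can tell, because in Python 3 range() returns a lazy range object rather than a list, so code that expects a list needs the explicit list(...) conversion. A quick illustration in plain Python (the value 3 is arbitrary and matches the range(0, 3) in the error message):

```python
num_boundaries = 3

lazy = range(num_boundaries)                # a range object in Python 3
materialized = list(range(num_boundaries))  # an actual list

print(type(lazy).__name__)  # range
print(materialized)         # [0, 1, 2]
```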

3、ValueError: not enough values to unpack (expected 7, got 0)

The batch_size in config file should be set the same number as your num_clones, which could prevent this.
The batch_size in detection and classification tasks has different definition.
(from GitHub)

In other words, the batch_size in your config file must match the num_clones used by your training script.
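
A minimal sketch of a sanity check for this rule, assuming the config is plain text containing a line like batch_size: N. The helper name and the regex-based parsing are illustrative only and are not part of the Object Detection API.

```python
import re


def batch_size_matches(config_text, num_clones):
    """Return True if the first batch_size in config_text equals num_clones."""
    match = re.search(r"batch_size\s*:\s*(\d+)", config_text)
    if match is None:
        raise ValueError("no batch_size entry found in config text")
    return int(match.group(1)) == num_clones


# Hypothetical config fragment used only for this example.
example_config = """
train_config {
  batch_size: 2
}
"""
print(batch_size_matches(example_config, num_clones=2))  # True
print(batch_size_matches(example_config, num_clones=4))  # False
```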

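4、TensorBoard does not display anything

This happens when TensorBoard is not reading from the correct path. The following fixes it.

In the terminal, cd to the parent directory of the log folder, e.g. cd home/tensorBoard. Then give only the log folder name after the equals sign instead of the full path: tensorboard --logdir=logs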
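
Before launching TensorBoard, it can help to confirm that the directory you pass to --logdir actually contains event files. The sketch below relies on TensorFlow's convention of putting "tfevents" in event file names; the helper name find_event_files and the "logs" directory are assumptions for illustration.

```python
from pathlib import Path


def find_event_files(logdir):
    """Return all TensorBoard event files found under logdir, sorted by path."""
    root = Path(logdir)
    if not root.is_dir():
        return []
    # TensorFlow writes summaries to files whose names contain "tfevents".
    return sorted(p for p in root.rglob("*tfevents*") if p.is_file())


if __name__ == "__main__":
    files = find_event_files("logs")
    if files:
        for path in files:
            print(path)
    else:
        print("no event files under", Path("logs").resolve())
```

If nothing is listed, the path is wrong or training has not written any summaries yet.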

5、No scalar data was found…

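If scalar data does not show up at first, eval may simply not have produced results yet, so there is nothing to display for the moment. As long as TensorBoard itself loads normally, the data will probably appear after a short wait.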

6、Value Error: First Step Cannot Be Zero

Find a block similar to the following:

schedule {
  step: 0
  learning_rate: .0001
}

Change step to a non-zero value, or delete the block.

7、Checking GPU and CPU information

https://blog.csdn.net/weiyumeizi/article/details/83035711
https://blog.csdn.net/wujizhishui/article/details/89333957
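
For a quick check from Python, the following sketch collects basic CPU information from the standard library and, if the nvidia-smi tool is on PATH, asks it for GPU names and memory. The function name hardware_summary is hypothetical, and this is only one possible way to gather the information; the linked posts cover other commands.

```python
import os
import platform
import shutil
import subprocess


def hardware_summary():
    """Collect basic CPU info, plus GPU info when nvidia-smi is on PATH."""
    info = {
        "machine": platform.machine(),
        "cpu_count": os.cpu_count(),
        "gpus": [],
    }
    if shutil.which("nvidia-smi"):
        try:
            out = subprocess.run(
                ["nvidia-smi", "--query-gpu=name,memory.total",
                 "--format=csv,noheader"],
                capture_output=True, text=True, check=True, timeout=10,
            ).stdout
            info["gpus"] = [ln.strip() for ln in out.splitlines() if ln.strip()]
        except (subprocess.SubprocessError, OSError):
            pass  # nvidia-smi exists but failed; leave the GPU list empty
    return info


print(hardware_summary())
```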

8、fail to start snmpd

package snmpd 5.7.3+dfsg-1ubuntu4 failed to install/upgrade: subprocess installed post-installation script returned error exit status 1

https://answers.launchpad.net/ubuntu/+source/net-snmp/+question/656995

9、TensorFlow 2.1 errors, collected

RuntimeError: loss passed to Optimizer.compute_gradients should be a function when eager execution is enabled.
RuntimeError: Attempting to capture an EagerTensor without building a function.
RuntimeError: When eager execution is enabled, var_list must specify a list or dict of variables to save

When eager execution is enabled, loss must be a Python function.
In TensorFlow 2.0, eager execution is enabled by default.
So eager execution has to be disabled first:
tf.compat.v1.disable_eager_execution()

10、Fixing slow git clone from GitHub

https://www.jianshu.com/p/fb9848d5418c

11、How to fix the bug “Expected “required”, “optional”, or “repeated”.”？

The problem is a bug in the currently installed version of protobuf, so a different version has to be installed and used instead. The steps are as follows (replace /home/humayun/tensorflow with the path to your own tensorflow directory):

tensorflow$ mkdir protoc_3.3
tensorflow$ cd protoc_3.3
tensorflow/protoc_3.3$ wget https://github.com/google/protobuf/releases/download/v3.3.0/protoc-3.3.0-linux-x86_64.zip
tensorflow/protoc_3.3$ chmod 775 protoc-3.3.0-linux-x86_64.zip
tensorflow/protoc_3.3$ unzip protoc-3.3.0-linux-x86_64.zip
tensorflow/protoc_3.3$ cd ../models/
tensorflow/models$ /home/humayun/tensorflow/protoc_3.3/bin/protoc object_detection/protos/*.proto --python_out=.

https://github.com/tensorflow/models/issues/1834

12、A tutorial that works for installing the Google Object Detection API

https://zhuanlan.zhihu.com/p/215456184

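13、Fixing No module named 'pycocotools._mask'

My diagnosis was that cocoAPI was not installed correctly. There is a pycocotools directory under tensorflow/models/research, and Python imports that package first, but the _mask in it is not an importable Python module. Delete that directory, then reinstall under models/research with the following commands (activate your environment before running make install):

git clone https://github.com/cocodataset/cocoapi.git
cd cocoapi/PythonAPI
python setup.py install
make
make install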
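
To confirm which copy of pycocotools Python actually picks up, a small check with the standard library's importlib can help. The helper name module_origin is hypothetical; the sketch only reports where a module would be imported from and does not import it.

```python
import importlib.util


def module_origin(name):
    """Return the file path Python would import `name` from, or None."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        return None
    if spec is None:
        return None
    return spec.origin


# If this prints a path inside models/research, the local copy is shadowing
# the installed package.
print(module_origin("pycocotools"))
```

A result of None means the module cannot be found from the current directory, or that a bare directory without an __init__.py is being treated as a namespace package.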

14、findfont: Font family [‘serif’] not found. Falling back to DejaVu Sans.

https://blog.csdn.net/mr_muli/article/details/89485619

15、LaTeX Error: File `type1ec.sty’ not found.

apt install cm-super

16、FileNotFoundError: [Errno 2] No such file or directory: ‘latex’: ‘latex’ (Python 3.6 issue)

sudo aptitude install texlive-fonts-recommended texlive-fonts-extra
sudo apt-get install dvipng

16、WARNING:root:image 4000 does not have groundtruth difficult flag specified

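This comes from running eval on too many images, which makes eval take too long, so reduce the number. The setting is in the config file: under eval_config, set num_examples to whatever number you want, e.g. 100 or 10.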

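17、Images in my own dataset were in a different format, which broke detection

The images were in RGBA format while the program expects RGB, so detection kept failing with the following error.

ValueError: cannot reshape array of size 60654 into shape (264,256,1,3)

The cause is that an RGB image has 3 channels while an RGBA image has 4, so the shapes do not match.

So the image has to be converted from RGBA to RGB when it is loaded.

import cv2
import numpy as np

def load_image_into_numpy_array(image):
    # Convert the loaded RGBA image to a NumPy array.
    image_np = np.asarray(image)
    # Drop the alpha channel so the array has the 3 channels the model expects.
    image_np = cv2.cvtColor(image_np, cv2.COLOR_RGBA2RGB)
    return image_np

The conversion uses OpenCV: image_np = cv2.cvtColor(image_np, cv2.COLOR_RGBA2RGB).
With this change the problem should be solved.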

18、cannot import name AsyncGenerator

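The fix is to downgrade prompt-toolkit:

pip install --upgrade prompt-toolkit==2.0.1

Once it installs successfully, run

python -m ipykernel --version

If a version number is printed, the problem is solved and Jupyter can be used normally.

Alternatively, uninstall and reinstall IPython:

pip uninstall ipython
pip install ipython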

19、Cannot uninstall ‘ipython’. It is a distutils installed project and thus we cannot accurately det…

Fix: force the update with the command below. Tested and working.

sudo pip3 install --ignore-installed ipython --upgrade

20、ImportError: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `CXXABI_1.3.9' not found

https://blog.csdn.net/bitcarmanlee/article/details/90242598
