
[CV] Semantic Segmentation: The Simplest Code Implementation!

Published: 2025/3/12


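Segmentation is essential for image interpretation tasks, so let's not fall behind the trend. Let's implement it, and we'll be pros in no time!

What is semantic segmentation?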

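It describes the process of associating every pixel of an image with a class label (such as flower, person, road, sky, ocean, or car). In other words, we take an image as input and output a class decision for each pixel in that image. For example, consider the input image below, which shows a dog sitting on a bed:

In the output, we therefore want to assign each pixel one of a set of categories: dog, bed, the table in the background, and the cabinet. After semantic segmentation, the image looks like this: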

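One interesting property of semantic segmentation is that it does not distinguish instances: if there were two dogs in this image, both would be described by a single label, dog, rather than dog1 and dog2.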
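To make the per-pixel view concrete, here is a minimal sketch in plain Python. The class names, the tiny 2x3 "image", and the score values are all made up for illustration; in practice a network would produce the scores.

```python
# Hypothetical class list, for illustration only.
CLASSES = ["background", "dog", "bed"]

# scores[c][y][x] is the score of class c at pixel (y, x) of a 2x3 image.
scores = [
    [[0.9, 0.1, 0.2], [0.8, 0.1, 0.1]],  # background
    [[0.0, 0.8, 0.1], [0.1, 0.2, 0.8]],  # dog
    [[0.1, 0.1, 0.7], [0.1, 0.7, 0.1]],  # bed
]


def predict_label_map(scores):
    """Pick, for every pixel, the index of the highest-scoring class."""
    num_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    return [
        [max(range(num_classes), key=lambda c: scores[c][y][x]) for x in range(width)]
        for y in range(height)
    ]


label_map = predict_label_map(scores)
for row in label_map:
    print([CLASSES[c] for c in row])
```

The two pixels labelled dog are not adjacent, so they could belong to two different dogs, yet both receive the same class index. Telling them apart is the job of instance segmentation.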

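Semantic segmentation is commonly used for:

Autonomous driving
Industrial inspection
Classifying notable regions in satellite imagery
Medical image review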

Implementing semantic segmentation:

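The first approach is a sliding window: we break the input image into many small local crops and classify each crop separately. Overlapping crops share no computation, so this is very expensive, and it is not really used in practice.
Another approach is a fully convolutional network: the network is a whole stack of convolutional layers with no fully connected layers, which preserves the spatial size of the input. Because every convolution runs at the full input resolution, this is also extremely expensive.
The third and best approach is to downsample and then upsample inside the network. We don't run every convolution at the full spatial resolution of the image. Instead, we apply a small number of convolutional layers at the original resolution, downsample the feature map, and later upsample it again.
Here, we only increase the spatial resolution of the predictions in the second half of the network, so that the output has the same dimensions as the input image. This is far more computationally efficient, because we can make the network very deep while most of it operates at a cheaper, lower spatial resolution.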
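A rough back-of-the-envelope sketch of why working at lower resolution is cheaper. It counts the multiply-accumulate operations (MACs) of a single 3x3 convolution layer. The 160x320 size matches the input size used later in this article; the 64 channels and the downsampling factor of 8 are assumptions chosen for illustration.

```python
def conv_macs(height, width, in_ch, out_ch, kernel=3):
    """MACs of one stride-1 convolution whose output has size height x width."""
    return height * width * in_ch * out_ch * kernel * kernel


H, W, C = 160, 320, 64  # 64 channels is an assumed value

full_res = conv_macs(H, W, C, C)
eighth_res = conv_macs(H // 8, W // 8, C, C)

print(f"full resolution: {full_res:,} MACs")
print(f"1/8 resolution:  {eighth_res:,} MACs")
print(f"ratio:           {full_res // eighth_res}x")
```

Downsampling by 8 in each spatial dimension cuts the cost of a layer by a factor of 64, which is what makes a deep stack of layers affordable.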

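Let's implement this in code:

Import the libraries needed for processing, namely the key PyTorch utilities such as the data loader, Variable, transforms, and the optimizer-related functions.

Import the dataset classes for cityscapes and IDD Lite (idd_lite), the Relabel, ToLabel, and Colorize classes from transform.py, and the iouEval class from iouEval.py.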

# SSCV IIITH 2K19
import random
import time
import numpy as np
import torch
print(torch.__version__)
import math
from PIL import Image, ImageOps
from torch.optim import SGD, Adam, lr_scheduler
from torch.autograd import Variable
from torch.utils.data import DataLoader
from torchvision.transforms import Resize
from torchvision.transforms import ToTensor, ToPILImage
from dataset import cityscapes
from dataset import idd_lite
import sys
print(sys.executable)
from transform import Relabel, ToLabel, Colorize
import matplotlib
from matplotlib import pyplot as plt
%matplotlib inline
import importlib
from iouEval import iouEval, getColorEntry  # importing iouEval class from the iouEval.py file
from shutil import copyfile

Define a few global parameters:

NUM_CHANNELS = 3  # RGB images
NUM_CLASSES = 8  # IDD Lite has 8 labels (the Level 1 hierarchy of labels)
USE_CUDA = torch.cuda.is_available()
IMAGE_HEIGHT = 160
DATA_ROOT = '/tmp/school/6-segmentation/user/1/6-segmentation/idd1_lite'
BATCH_SIZE = 2
NUM_WORKERS = 4
NUM_EPOCHS = 100
ENCODER_ONLY = True
# Fall back to the CPU when no GPU is available
device = torch.device("cuda" if USE_CUDA else "cpu")
color_transform = Colorize(NUM_CLASSES)
image_transform = ToPILImage()

IOUTRAIN = False
IOUVAL = True

Augmentation: a callable that applies the same random augmentations to both the image and the target:

class MyCoTransform(object):
    def __init__(self, enc, augment=True, height=160):
        self.enc = enc
        self.augment = augment
        self.height = height

    def __call__(self, input, target):
        # Resize data to the required size
        # (bilinear for the image, nearest-neighbour for the label map)
        input = Resize((self.height, 320), Image.BILINEAR)(input)
        target = Resize((self.height, 320), Image.NEAREST)(target)

        if self.augment:
            # Random horizontal flip, applied to image and label together
            hflip = random.random()
            if hflip < 0.5:
                input = input.transpose(Image.FLIP_LEFT_RIGHT)
                target = target.transpose(Image.FLIP_LEFT_RIGHT)

            # Random translation of 0-2 pixels (fill the rest with padding)
            transX = random.randint(0, 2)
            transY = random.randint(0, 2)

            input = ImageOps.expand(input, border=(transX, transY, 0, 0), fill=0)
            target = ImageOps.expand(target, border=(transX, transY, 0, 0), fill=7)  # pad label filling with 7
            input = input.crop((0, 0, input.size[0] - transX, input.size[1] - transY))
            target = target.crop((0, 0, target.size[0] - transX, target.size[1] - transY))

        input = ToTensor()(input)
        target = ToLabel()(target)
        target = Relabel(255, 7)(target)
        return input, target
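The resize step uses bilinear interpolation for the image but nearest-neighbour for the label map. A small sketch of why, in plain Python on made-up 1-D data: class indices are categorical, so blending two of them yields a number that may name an unrelated class. The helper functions below are illustrative and are not part of the code above.

```python
def upsample_nearest(row, factor):
    """Repeat each value `factor` times (nearest-neighbour upsampling in 1-D)."""
    return [value for value in row for _ in range(factor)]


def upsample_linear(row, factor):
    """Linearly interpolate between neighbours (a 1-D analogue of bilinear)."""
    out = []
    for left, right in zip(row, row[1:]):
        for step in range(factor):
            t = step / factor
            out.append(left * (1 - t) + right * t)
    out.append(row[-1])
    return out


labels = [0, 6]  # two neighbouring pixels with class ids 0 and 6 (illustrative)

print(upsample_nearest(labels, 2))  # only ids 0 and 6 appear
print(upsample_linear(labels, 2))   # a value of 3.0 appears between them
```

Linear interpolation invents the value 3 between classes 0 and 6, which would silently relabel boundary pixels as a third class. Nearest-neighbour only ever copies existing labels.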

Loading the data: we follow the semantics recommended by PyTorch and load the data with a data loader.

best_acc = 0

co_transform = MyCoTransform(ENCODER_ONLY, augment=True, height=IMAGE_HEIGHT)
co_transform_val = MyCoTransform(ENCODER_ONLY, augment=False, height=IMAGE_HEIGHT)

# train data
dataset_train = idd_lite(DATA_ROOT, co_transform, 'train')
print(len(dataset_train))

# validation data
dataset_val = idd_lite(DATA_ROOT, co_transform_val, 'val')
print(len(dataset_val))

loader_train = DataLoader(dataset_train, num_workers=NUM_WORKERS, batch_size=BATCH_SIZE, shuffle=True)
loader_val = DataLoader(dataset_val, num_workers=NUM_WORKERS, batch_size=BATCH_SIZE, shuffle=False)
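The article does not show the idd_lite class itself. As a rough idea of what a data loader expects, here is a sketch of the map-style dataset protocol (a length plus indexing that returns an image and label pair) in plain Python. ToySegmentationDataset and iterate_batches are hypothetical names, and the real DataLoader additionally collates samples into tensors and can load them in worker processes.

```python
import random


class ToySegmentationDataset:
    """Hypothetical map-style dataset: has a length and returns (image, label) by index."""

    def __init__(self, samples, co_transform=None):
        self.samples = samples
        self.co_transform = co_transform

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        image, label = self.samples[index]
        if self.co_transform is not None:
            # The same transform object sees both image and label
            image, label = self.co_transform(image, label)
        return image, label


def iterate_batches(dataset, batch_size, shuffle, seed=0):
    """Rough stand-in for a data loader: optionally shuffle indices, then group them."""
    indices = list(range(len(dataset)))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [dataset[i] for i in indices[start:start + batch_size]]


samples = [(f"image_{i}", f"label_{i}") for i in range(5)]
dataset = ToySegmentationDataset(samples)
batches = list(iterate_batches(dataset, batch_size=2, shuffle=False))
print(len(batches), [len(b) for b in batches])
```

With five samples and a batch size of 2, the last batch is smaller than the others, which is also the default behaviour of a data loader unless incomplete batches are dropped.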

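Since this is a classification problem, we use cross-entropy loss. But why?

The answer lies in the negative logarithm: it is large for probabilities close to 0 and small for probabilities close to 1. The loss is the negative log of the probability the network assigns to the correct class, accumulated over all pixels. So whenever the network assigns high confidence to the correct class the loss is low, and whenever it assigns low confidence to the correct class the loss is high.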
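A worked sketch of that behaviour for a single pixel, in plain Python. The raw scores are made up; torch.nn.CrossEntropyLoss applies the same softmax followed by negative log to every pixel and, by default, averages the results.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(scores, target):
    """Negative log of the probability assigned to the correct class."""
    return -math.log(softmax(scores)[target])


target = 0  # index of the correct class for this pixel
cases = {
    "confident and right": [5.0, 0.0, 0.0],
    "unsure": [0.0, 0.0, 0.0],
    "confident and wrong": [0.0, 5.0, 0.0],
}

for name, scores in cases.items():
    print(f"{name:20s} loss = {cross_entropy(scores, target):.3f}")
```

The loss is close to zero when the correct class gets most of the probability, equals ln(3) when the three classes are equally likely, and grows quickly when the network is confidently wrong.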

criterion = torch.nn.CrossEntropyLoss()

Now let's load the model and set up its optimizer!

model_file = importlib.import_module('erfnet')
model = model_file.Net(NUM_CLASSES).to(device)
optimizer = Adam(model.parameters(), 5e-4, (0.9, 0.999), eps=1e-08, weight_decay=1e-4)
start_epoch = 1

And now for the heart of the code: training!

import os

steps_loss = 50
my_start_time = time.time()
for epoch in range(start_epoch, NUM_EPOCHS + 1):
    print("----- TRAINING - EPOCH", epoch, "-----")

    epoch_loss = []
    time_train = []

    doIouTrain = IOUTRAIN
    doIouVal = IOUVAL

    if doIouTrain:
        iouEvalTrain = iouEval(NUM_CLASSES)

    model.train()
    for step, (images, labels) in enumerate(loader_train):
        start_time = time.time()
        inputs = images.to(device)
        targets = labels.to(device)
        outputs = model(inputs, only_encode=ENCODER_ONLY)

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        # targets has shape (N, 1, H, W); the loss expects (N, H, W)
        loss = criterion(outputs, targets[:, 0])
        loss.backward()
        optimizer.step()

        epoch_loss.append(loss.item())
        time_train.append(time.time() - start_time)

        if doIouTrain:
            # start_time_iou = time.time()
            iouEvalTrain.addBatch(outputs.max(1)[1].unsqueeze(1).data, targets.data)
            # print("Time to add confusion matrix: ", time.time() - start_time_iou)

        # print statistics
        if steps_loss > 0 and step % steps_loss == 0:
            average = sum(epoch_loss) / len(epoch_loss)
            print(f'loss: {average:0.4} (epoch: {epoch}, step: {step})',
                  "// Avg time/img: %.4f s" % (sum(time_train) / len(time_train) / BATCH_SIZE))

    average_epoch_loss_train = sum(epoch_loss) / len(epoch_loss)

    iouTrain = 0
    if doIouTrain:
        iouTrain, iou_classes = iouEvalTrain.getIoU()
        iouStr = getColorEntry(iouTrain) + '{:0.2f}'.format(iouTrain * 100) + '\033[0m'
        print("EPOCH IoU on TRAIN set: ", iouStr, "%")

my_end_time = time.time()
print(my_end_time - my_start_time)

After training for 100 epochs, we see:

Validation:

# Validate on val images after each epoch of training
# (shown here as a standalone block; it reuses epoch, doIouVal and steps_loss from the training code)
print("----- VALIDATING - EPOCH", epoch, "-----")
model.eval()
epoch_loss_val = []
time_val = []

if doIouVal:
    iouEvalVal = iouEval(NUM_CLASSES)

for step, (images, labels) in enumerate(loader_val):
    start_time = time.time()

    inputs = images.to(device)
    targets = labels.to(device)

    # No gradients are needed during validation
    with torch.no_grad():
        outputs = model(inputs, only_encode=ENCODER_ONLY)
        # outputs = model(inputs)
    loss = criterion(outputs, targets[:, 0])
    epoch_loss_val.append(loss.item())
    time_val.append(time.time() - start_time)

    # Add batch to calculate TP, FP and FN for IoU estimation
    if doIouVal:
        # start_time_iou = time.time()
        iouEvalVal.addBatch(outputs.max(1)[1].unsqueeze(1).data, targets.data)
        # print("Time to add confusion matrix: ", time.time() - start_time_iou)

    if steps_loss > 0 and step % steps_loss == 0:
        average = sum(epoch_loss_val) / len(epoch_loss_val)
        print(f'VAL loss: {average:0.4} (epoch: {epoch}, step: {step})',
              "// Avg time/img: %.4f s" % (sum(time_val) / len(time_val) / BATCH_SIZE))

average_epoch_loss_val = sum(epoch_loss_val) / len(epoch_loss_val)

iouVal = 0
if doIouVal:
    iouVal, iou_classes = iouEvalVal.getIoU()
    print(iou_classes)
    iouStr = getColorEntry(iouVal) + '{:0.2f}'.format(iouVal * 100) + '\033[0m'
    print("EPOCH IoU on VAL set: ", iouStr, "%")
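The iouEval class is imported from iouEval.py and its internals are not shown here. As a sketch of the metric it reports, the following plain-Python example computes per-class intersection over union, TP / (TP + FP + FN), from a confusion matrix. The function names and the six-pixel example are made up, and the real class may treat some labels differently, for example by ignoring a padding class.

```python
def confusion_matrix(pred, target, num_classes):
    """conf[t][p] counts pixels whose true class is t and predicted class is p."""
    conf = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        conf[t][p] += 1
    return conf


def per_class_iou(conf):
    """IoU = TP / (TP + FP + FN) per class; None if the class never occurs."""
    n = len(conf)
    ious = []
    for c in range(n):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp                      # true class c, predicted as something else
        fp = sum(conf[r][c] for r in range(n)) - tp  # predicted c, truly something else
        denom = tp + fp + fn
        ious.append(tp / denom if denom else None)
    return ious


# Flattened label maps for a tiny 6-pixel example (values are made up).
target = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]

ious = per_class_iou(confusion_matrix(pred, target, num_classes=3))
valid = [v for v in ious if v is not None]
mean_iou = sum(valid) / len(valid)
print(ious, mean_iou)
```

Unlike plain pixel accuracy, IoU penalises both missed pixels and false alarms for each class, so a large class such as road cannot hide poor results on small classes.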

Visualizing the output:

# Qualitative analysis
dataiter = iter(loader_val)
images, labels = next(dataiter)  # next(...) works where the old dataiter.next() no longer exists

if USE_CUDA:
    images = images.to(device)

inputs = images.to(device)

with torch.no_grad():
    outputs = model(inputs, only_encode=ENCODER_ONLY)

# Per-pixel argmax over the class dimension of the first image in the batch
label = outputs[0].max(0)[1].byte().cpu().data
label_color = Colorize()(label.unsqueeze(0))
label_save = ToPILImage()(label_color)

plt.figure()
plt.imshow(ToPILImage()(images[0].cpu()))
plt.figure()
plt.imshow(label_save)
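Colorize comes from transform.py and is not shown in this article. Conceptually, turning a predicted label map into a viewable image is a palette lookup. A minimal sketch, assuming a made-up palette (the real class defines its own colours) and a black fallback for unknown indices:

```python
# Hypothetical palette: class index -> (R, G, B).
PALETTE = {
    0: (128, 64, 128),
    1: (220, 20, 60),
    2: (70, 130, 180),
}
UNKNOWN = (0, 0, 0)  # assumed fallback colour


def colorize(label_map, palette=PALETTE):
    """Replace every class index with its RGB colour, falling back to black."""
    return [[palette.get(c, UNKNOWN) for c in row] for row in label_map]


label_map = [[0, 1], [2, 7]]
color_image = colorize(label_map)
for row in color_image:
    print(row)
```

Each pixel keeps its position and only its value changes, from a class index to a colour, which is why the coloured output lines up with the input image.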

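Output image

And just like that, our model is ready!

Feel free to play with the newly built model: try training for more epochs and watch it perform better!

In short, we can now easily associate every pixel of an image with a class label, and we can tune the hyperparameters to see how the results change. This article covered the basics of semantic segmentation. To tell individual instances apart, we need instance segmentation, which is a more advanced version of semantic segmentation.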
