Code Notes on Studying Knowledge Graph Embedding Based Question Answering

Published: 2024/8/26

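Preface

My advisor recently asked me to study the paper "Knowledge Graph Embedding Based Question Answering". The paper's main point is that it uses a knowledge graph as the dataset and tackles question answering, a natural language processing task, without needing to know the structure of the data. These notes only record code snippets that I, a very inexperienced undergraduate, found potentially useful while reading the paper's GitHub code. Notes on the paper itself will go into a separate write-up.

I hope to learn and improve together with everyone. Keep going!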

Paper link:

delivery.acm.org/10.1145/330…acm=1564312374_9607150c0f9e4d7029cba11e69cb8903 (please copy the whole link)

GitHub link:

github.com/xhuang31/KE…

This post will be updated gradually.

Main text

If the question starts with specific words, delete them

For example, to remove "what is" from "what is your name" and get "your name", you can use the following code:

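    # Question prefixes grouped by length in words: whhowset[0] holds
    # one-word prefixes, whhowset[1] two-word ones, whhowset[2] three-word ones.
    whhowset = [
        {'what', 'how', 'where', 'who', 'which', 'whom'},
        {'in which', 'what is', "what 's", 'what are', 'what was', 'what were',
         'where is', 'where are', 'where was', 'where were', 'who is', 'who was',
         'who are', 'how is', 'what did'},
        {'what kind of', 'what kinds of', 'what type of', 'what types of',
         'what sort of'},
    ]
    question = ["what", "is", "your", "name"]
    # Try the longest prefix first (3 words, then 2, then 1).
    for j in range(3, 0, -1):
        if ' '.join(question[0:j]) in whhowset[j - 1]:
            del question[0:j]
            continue  # no-op at the end of the body: shorter prefixes are still checked
    print(question)

output: ['your', 'name']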
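The same idea can be wrapped in a function that leaves its input untouched and stops after the longest matching prefix. This is only a sketch: the name `strip_question_prefix` and the constant `WH_PREFIXES` are made up for illustration, and stopping after the first match is an assumption about the intended behavior, not something taken from the repository.

```python
# Hypothetical helper; the prefix table mirrors the one shown in the notes.
WH_PREFIXES = [
    {'what', 'how', 'where', 'who', 'which', 'whom'},
    {'in which', 'what is', "what 's", 'what are', 'what was', 'what were',
     'where is', 'where are', 'where was', 'where were', 'who is', 'who was',
     'who are', 'how is', 'what did'},
    {'what kind of', 'what kinds of', 'what type of', 'what types of',
     'what sort of'},
]


def strip_question_prefix(tokens):
    """Return a copy of tokens without its longest matching question prefix."""
    for size in range(len(WH_PREFIXES), 0, -1):
        if ' '.join(tokens[:size]) in WH_PREFIXES[size - 1]:
            return tokens[size:]  # stop at the longest match
    return list(tokens)  # nothing matched: return an unchanged copy


print(strip_question_prefix(["what", "is", "your", "name"]))   # ['your', 'name']
print(strip_question_prefix(["what", "kind", "of", "music"]))  # ['music']
print(strip_question_prefix(["tell", "me", "a", "joke"]))      # ['tell', 'me', 'a', 'joke']
```

Returning a new list instead of using `del` makes the helper safe to call on lists that are reused elsewhere.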

Create an n-gram list from a sentence's word list

Wikipedia explains n-grams as follows: an n-gram is a contiguous sequence of n items from a given sample of text or speech. The items can be phonemes, syllables, letters, words or base pairs according to the application. The n-grams typically are collected from a text or speech corpus.

n can be chosen freely, for example unigram (n = 1) or bigram (n = 2). Concrete examples of full n-gram lists (every n from 1 up to the sequence length):

Word: 'apple', n-gram list: ['a','p','p','l','e','ap','pp','pl','le','app','ppl','ple','appl','pple','apple']
Sentence: 'how are u', n-gram list: ['how', 'are', 'u', 'how are', 'are u', 'how are u']

    question = ["how", "are", "u"]
    grams = []
    maxlen = len(question)
    # Unigrams: every single token.
    for token in question:
        grams.append(token)
    # n-grams for n = 2 .. maxlen: slide a window of j tokens over the sentence.
    for j in range(2, maxlen + 1):
        for token in [question[idx:idx + j] for idx in range(maxlen - j + 1)]:
            grams.append(' '.join(token))
    print(grams)

output: ['how', 'are', 'u', 'how are', 'are u', 'how are u']
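The word-level and character-level examples can both be produced by one small function. The sketch below is illustrative only; the name `all_ngrams` and its `sep` parameter are assumptions and do not come from the repository.

```python
def all_ngrams(items, sep=" "):
    """Return every contiguous n-gram of items, shortest first, joined by sep."""
    n = len(items)
    return [
        sep.join(items[start:start + size])
        for size in range(1, n + 1)          # n-gram length
        for start in range(n - size + 1)     # window start position
    ]


# Word-level n-grams of a tokenized sentence.
print(all_ngrams(["how", "are", "u"]))
# Character-level n-grams of a single word (joined without a separator).
print(all_ngrams(list("apple"), sep=""))
```

A sequence of length n yields n(n+1)/2 n-grams in total, so 'apple' gives 15 entries and the three-word sentence gives 6.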

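Write the output to a file

    import os

    mids = ["I", "I", "am", "a", "human"]
    # set() removes the duplicate "I"; each remaining entity gets an index.
    with open(os.path.join('output.txt'), 'w') as outfile:
        for i, entity in enumerate(set(mids)):
            outfile.write("{}\t{}\n".format(entity, i))

output: a file named output.txt with one tab-separated entity and index per line. A set is unordered, so the order of the lines can differ between runs. For example:

    human   0
    a       1
    am      2
    I       3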
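If a stable index for each entity matters, the duplicates can be removed while keeping first-seen order. This is a minimal sketch under that assumption; `write_entity_index` and the file name `entity_index_demo.txt` are hypothetical and not part of the repository.

```python
import os
import tempfile


def write_entity_index(entities, path):
    """Write 'entity<TAB>index' lines, deduplicating in first-seen order."""
    unique = list(dict.fromkeys(entities))  # dicts keep insertion order (Python 3.7+)
    with open(path, 'w', encoding='utf-8') as outfile:
        for i, entity in enumerate(unique):
            outfile.write("{}\t{}\n".format(entity, i))
    return unique


# Hypothetical demo file in the system temp directory.
path = os.path.join(tempfile.gettempdir(), 'entity_index_demo.txt')
write_entity_index(["I", "I", "am", "a", "human"], path)
with open(path, encoding='utf-8') as infile:
    print(infile.read())
```

With this version "I" always gets index 0, "am" index 1, and so on, regardless of hash ordering.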

argparse in Python: a standard-library module (not part of PyTorch) that makes it easy to write user-friendly command-line interfaces. add_argument defines how a single command-line argument should be parsed.

function: parser.add_argument(name or flags...[, action][, nargs][, const][, default][, type][, choices][, required][, help][, metavar][, dest])

parameters (cited from the Python documentation):

const - A constant value required by some action and nargs selections.
dest - The name of the attribute to be added to the object returned by parse_args().
action - The basic type of action to be taken when this argument is encountered at the command line.

    import argparse

    parser = argparse.ArgumentParser(description='Process some integers.')
    parser.add_argument('integers', metavar='N', type=int, nargs='+',
                        help='an integer for the accumulator')
    parser.add_argument('--sum', dest='accumulate', action='store_const',
                        const=sum, default=max,
                        help='sum the integers (default: find the max)')
    args = parser.parse_args()
    # args.accumulate is max by default, or sum when --sum is passed.
    print(args.accumulate(args.integers))

output:

    python prog.py 1 2 3 4        -> 4 (the maximum)
    python prog.py 1 2 3 4 --sum  -> 10 (the sum)
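To try the parser without a terminal, `parse_args` also accepts an explicit list of strings. The sketch below rebuilds the same parser inside a helper (the name `build_parser` is made up for illustration) and shows how `dest`, `const` and `default` interact with `action='store_const'`.

```python
import argparse


def build_parser():
    """Build the integer accumulator parser from the example above."""
    parser = argparse.ArgumentParser(description='Process some integers.')
    parser.add_argument('integers', metavar='N', type=int, nargs='+',
                        help='an integer for the accumulator')
    # store_const: when --sum is present, store `const` under `dest`;
    # otherwise the attribute keeps its `default`.
    parser.add_argument('--sum', dest='accumulate', action='store_const',
                        const=sum, default=max,
                        help='sum the integers (default: find the max)')
    return parser


parser = build_parser()
for argv in (['1', '2', '3', '4'], ['1', '2', '3', '4', '--sum']):
    args = parser.parse_args(argv)  # parse a list instead of sys.argv
    print(argv, '->', args.accumulate.__name__, args.accumulate(args.integers))
```

The first call reports `max 4` and the second `sum 10`, matching the command-line runs.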

Counter object

A counter tool is provided to support convenient and rapid tallies.

    from collections import Counter

    cnt = Counter()
    # Tally each word; missing keys start at 0.
    for word in ['red', 'blue', 'red', 'green', 'blue', 'blue']:
        cnt[word] += 1
    print(cnt)

output: Counter({'blue': 3, 'red': 2, 'green': 1})
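Counter can also tally an iterable directly and report the most frequent items, which is handy when building word-frequency tables. A short sketch with the same word list:

```python
from collections import Counter

words = ['red', 'blue', 'red', 'green', 'blue', 'blue']
cnt = Counter(words)           # count directly from the iterable

print(cnt.most_common(2))      # the two most frequent words with their counts
print(cnt['yellow'])           # a missing key returns 0 instead of raising KeyError
print(sum(cnt.values()))       # total number of words counted
```

Reading a missing key does not insert it, so `'yellow' in cnt` is still False afterwards.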

PyTorch manual_seed

    import torch

    torch.manual_seed(3)  # fix the seed of PyTorch's random number generator
    print(torch.rand(3))

output: tensor([0.0043, 0.1056, 0.2858]). With the seed fixed, this tensor is the same on every run; without the torch.manual_seed call, the output is different every time.

cuDNN deterministic

In some circumstances when using the CUDA backend with cuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting torch.backends.cudnn.deterministic = True

Example:

    torch.backends.cudnn.deterministic = True
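The seed and the cuDNN flag are usually set together at the start of a script. The helper below is a hedged sketch, not the repository's code: `seed_everything` is a made-up name, and disabling `torch.backends.cudnn.benchmark` is an extra step assumed here because autotuning can pick different algorithms between runs. PyTorch is imported lazily so the sketch still runs where it is not installed.

```python
import random


def seed_everything(seed):
    """Seed Python's RNG and, if PyTorch is installed, its RNG and cuDNN flags."""
    random.seed(seed)
    try:
        import torch
    except ImportError:
        return False  # PyTorch not available: only Python's RNG was seeded
    torch.manual_seed(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False  # assumption: avoid run-to-run autotuning
    return True


seed_everything(3)
first = random.random()
seed_everything(3)
second = random.random()
print(first == second)  # True: the same seed gives the same random number
```

Even with these settings, some GPU operations may remain nondeterministic, so identical results across machines are not guaranteed.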

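torchtext

Note: the following part comes from the Zhihu article "torchtext入門教程，輕松玩轉文本數據處理" (an introductory torchtext tutorial on text data processing) by the Zhihu author Lee. It is collected here only as study notes and will be removed on request.

torchtext components:

Field: mainly holds the configuration for data preprocessing, such as the tokenization method, whether to lowercase, the start token, the end token, the padding token, and the vocabulary.
Dataset: inherits from PyTorch's Dataset and is used to load data. TabularDataset loads data conveniently once you specify the path, the format, and the Field information. torchtext also provides prebuilt Dataset objects for common datasets that can be loaded directly, and the splits method can load the training, validation, and test sets at the same time.
Iterator: the iterator that feeds data to the model; it supports custom batching.

Field:

    from torchtext import data  # Field lives here in older torchtext releases

    TEXT = data.Field(lower=True)

Here the preprocessing is configured to convert all text to lowercase.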

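Dataset

torchtext's Dataset inherits from PyTorch's Dataset and provides a method that downloads compressed data and extracts it (supports .zip, .gz, .tgz).

The splits method can read the training, validation, and test sets at the same time.

TabularDataset makes it easy to read files in CSV, TSV, or JSON format.

    # args, TEXT, ED and field are assumed to be defined earlier in the script.
    train = data.TabularDataset(path=os.path.join(args.output, 'dete_train.txt'),
                                format='tsv',
                                fields=[('text', TEXT), ('ed', ED)])
    dev, test = data.TabularDataset.splits(path=args.output, validation='valid.txt',
                                           test='test.txt', format='tsv',
                                           fields=field)

After loading the data you can build the vocabulary, and you can use pretrained word vectors when building it.

    TEXT.build_vocab(train, vectors="glove.6B.100d")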
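To make the vocabulary step less of a black box, here is a plain-Python sketch of the idea: count tokens, assign each one an integer index, and map unknown tokens to a special entry. It is not torchtext's implementation; the function names, the `<unk>`/`<pad>` specials, and the tie-breaking order are all assumptions chosen for illustration, and no pretrained vectors are involved.

```python
from collections import Counter


def build_vocab(tokenized_texts, min_freq=1, specials=('<unk>', '<pad>')):
    """Return (stoi, itos): token -> index and index -> token mappings."""
    counts = Counter(tok for text in tokenized_texts for tok in text)
    itos = list(specials)
    # Most frequent tokens first; ties are broken alphabetically.
    for tok, freq in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if freq >= min_freq and tok not in specials:
            itos.append(tok)
    stoi = {tok: idx for idx, tok in enumerate(itos)}
    return stoi, itos


def numericalize(tokens, stoi):
    """Map tokens to indices, sending unseen tokens to '<unk>'."""
    unk = stoi['<unk>']
    return [stoi.get(tok, unk) for tok in tokens]


texts = [["what", "is", "your", "name"], ["what", "is", "this"]]
stoi, itos = build_vocab(texts)
print(itos)
print(numericalize(["what", "is", "that"], stoi))  # "that" is unseen -> index of '<unk>'
```

Pretrained vectors would then be looked up row by row in the order given by `itos`, which is why the index assignment has to be fixed before the embedding matrix is filled.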

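Iterator

Iterator is torchtext's output to the model. It provides the usual ways of handling data, such as shuffling and sorting, lets you change the batch size dynamically, and also has a splits method that can output the training, validation, and test sets at the same time.

    # Training iterator: shuffle the examples on every pass.
    train_iter = data.Iterator(train, batch_size=args.batch_size,
                               device=torch.device('cuda', args.gpu), train=True,
                               repeat=False, sort=False, shuffle=True,
                               sort_within_batch=False)
    # Validation iterator: keep the original order.
    dev_iter = data.Iterator(dev, batch_size=args.batch_size,
                             device=torch.device('cuda', args.gpu), train=False,
                             repeat=False, sort=False, shuffle=False,
                             sort_within_batch=False)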
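What the batch_size and shuffle options do can be seen in a few lines of plain Python. The generator below is only a conceptual sketch, not how torchtext implements Iterator; `iterate_batches` and its `seed` parameter are invented for this illustration, and there is no padding or device placement.

```python
import random


def iterate_batches(examples, batch_size, shuffle=False, seed=None):
    """Yield lists of at most batch_size examples, optionally in shuffled order."""
    order = list(range(len(examples)))
    if shuffle:
        random.Random(seed).shuffle(order)  # private RNG: global state is untouched
    for start in range(0, len(order), batch_size):
        yield [examples[i] for i in order[start:start + batch_size]]


examples = list(range(7))
# Without shuffling, batches follow the original order; the last one is smaller.
for batch in iterate_batches(examples, batch_size=3):
    print(batch)
# With shuffling, every example still appears exactly once per pass.
print(list(iterate_batches(examples, batch_size=3, shuffle=True, seed=0)))
```

Shuffling the training set but not the validation set, as in the two iterators above, keeps evaluation results comparable between epochs.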

Floor division

Python arithmetic operator //: the division of operands where the result is the quotient in which the digits after the decimal point are removed. But if one of the operands is negative, the result is floored, i.e., rounded away from zero (towards negative infinity).

    print(9 // 4)
    print(-11 // 3)

output:

    2
    -4
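The difference between flooring and simply dropping the decimals only shows up for negative results. The short sketch below compares the two; the helper name `floor_vs_trunc` is made up for illustration.

```python
import math


def floor_vs_trunc(a, b):
    """Return (floor quotient, truncated quotient, remainder) for a / b."""
    return a // b, math.trunc(a / b), a % b


print(floor_vs_trunc(9, 4))     # (2, 2, 1): both agree for positive operands
print(floor_vs_trunc(-11, 3))   # (-4, -3, 1): // rounds down, trunc rounds toward zero
print(divmod(-11, 3))           # (-4, 1), and -4 * 3 + 1 == -11
```

Python keeps the identity `a == (a // b) * b + (a % b)`, which is why the remainder of -11 divided by 3 is 1 rather than -2.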

Reposted from: https://juejin.im/post/5d3d8157f265da1ba84ada19
