[TensorFlow] Keras Machine Learning Basics: Text Classification with TF.Hub

Published: 2023/12/14
This notebook classifies movie reviews as positive or negative using the text of the review. This is an example of binary (two-class) classification, an important and widely applicable kind of machine learning problem.

This tutorial demonstrates a basic application of transfer learning with TensorFlow Hub and Keras.

We'll use the IMDB dataset, which contains the text of 50,000 movie reviews from the Internet Movie Database. These are split into 25,000 reviews for training and 25,000 reviews for testing. The training and testing sets are balanced, meaning they contain an equal number of positive and negative reviews.

This notebook uses tf.keras, a high-level API for building and training models in TensorFlow, and TensorFlow Hub, a library and platform for transfer learning. For a more advanced text classification tutorial using tf.keras, see the MLCC Text Classification Guide.

    from __future__ import absolute_import, division, print_function, unicode_literals

    import numpy as np

    import tensorflow as tf
    import tensorflow_hub as hub
    import tensorflow_datasets as tfds

    print("Version: ", tf.__version__)
    print("Eager mode: ", tf.executing_eagerly())
    print("Hub version: ", hub.__version__)
    print("GPU is", "available" if tf.config.experimental.list_physical_devices("GPU") else "NOT AVAILABLE")

Output:

    Version: 2.0.0
    Eager mode: True
    Hub version: 0.6.0
    GPU is available

1 Download the IMDB dataset

The IMDB dataset is available on TensorFlow Datasets. The following code downloads the IMDB dataset to your machine:

    # Split the training set 60:40, so that we end up with 15,000 training
    # examples, 10,000 validation examples, and 25,000 test examples.
    train_validation_split = tfds.Split.TRAIN.subsplit([6, 4])

    (train_data, validation_data), test_data = tfds.load(
        name="imdb_reviews",
        split=(train_validation_split, tfds.Split.TEST),
        as_supervised=True)

Output (abridged):

    Downloading and preparing dataset imdb_reviews (80.23 MiB) to /home/kbuilder/tensorflow_datasets/imdb_reviews/plain_text/0.1.0...
    WARNING:tensorflow:From /home/kbuilder/.local/lib/python3.6/site-packages/tensorflow_datasets/core/file_format_adapter.py:209: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
    Instructions for updating:
    Use eager execution and:
    `tf.data.TFRecordDataset(path)`
    Dataset imdb_reviews downloaded and prepared to /home/kbuilder/tensorflow_datasets/imdb_reviews/plain_text/0.1.0. Subsequent calls will reuse this data.
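The 6:4 split can be checked with a little arithmetic. The sketch below is only an illustration: `percent_split_specs` and `split_sizes` are hypothetical helpers, not part of TFDS. The strings it builds follow the percent-slicing syntax that newer TFDS releases accept in `tfds.load(split=...)`, which may be useful if `subsplit` is not available in your installed version.

```python
def percent_split_specs(name, first_percent):
    """Build two TFDS-style slicing strings that cut split `name` at `first_percent`."""
    return (f"{name}[:{first_percent}%]", f"{name}[{first_percent}%:]")


def split_sizes(total, weights):
    """Number of examples in each part when `total` is divided in proportion to `weights`."""
    weight_sum = sum(weights)
    return [total * weight // weight_sum for weight in weights]


# 25,000 training reviews divided 6:4 into training and validation parts.
train_spec, validation_spec = percent_split_specs("train", 60)
train_size, validation_size = split_sizes(25000, [6, 4])

print(train_spec, validation_spec)
print(train_size, validation_size)
```

With these strings, a call of the form `tfds.load(name="imdb_reviews", split=(train_spec, validation_spec, "test"), as_supervised=True)` should return the three datasets directly, assuming a TFDS version that supports the slicing syntax.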

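2 Explore the data

Let's take a moment to understand the format of the data. Each example is a sentence representing the movie review and a corresponding label. The sentence is not preprocessed in any way. The label is an integer value of either 0 or 1, where 0 is a negative review and 1 is a positive review.

Let's print the first 10 examples.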

    train_examples_batch, train_labels_batch = next(iter(train_data.batch(10)))
    train_examples_batch

Output:

    <tf.Tensor: id=219, shape=(10,), dtype=string, numpy=
    array([b"As a lifelong fan of Dickens, I have invariably been disappointed by adaptations of his novels.<br /><br />Although his works presented an extremely accurate re-telling of human life at every level in Victorian Britain, throughout them all was a pervasive thread of humour that could be both playful or sarcastic as the narrative dictated. In a way, he was a literary caricaturist and cartoonist. He could be serious and hilarious in the same sentence. He pricked pride, lampooned arrogance, celebrated modesty, and empathised with loneliness and poverty. It may be a clich\xc3\xa9, but he was a people's writer.<br /><br />And it is the comedy that is so often missing from his interpretations. At the time of writing, Oliver Twist is being dramatised in serial form on BBC television. All of the misery and cruelty is their, but non of the humour, irony, and savage lampoonery. The result is just a dark, dismal experience: the story penned by a journalist rather than a novelist. It's not really Dickens at all.<br /><br />'Oliver!', on the other hand, is much closer to the mark. The mockery of officialdom is perfectly interpreted, from the blustering beadle to the drunken magistrate. The classic stand-off between the beadle and Mr Brownlow, in which the law is described as 'a ass, a idiot' couldn't have been better done. Harry Secombe is an ideal choice.<br /><br />But the blinding cruelty is also there, the callous indifference of the state, the cold, hunger, poverty and loneliness are all presented just as surely as The Master would have wished.<br /><br />And then there is crime. Ron Moody is a treasure as the sleazy Jewish fence, whilst Oliver Reid has Bill Sykes to perfection.<br /><br />Perhaps not surprisingly, Lionel Bart - himself a Jew from London's east-end - takes a liberty with Fagin by re-interpreting him as a much more benign fellow than was Dicken's original. In the novel, he was utterly ruthless, sending some of his own boys to the gallows in order to protect himself (though he was also caught and hanged). Whereas in the movie, he is presented as something of a wayward father-figure, a sort of charitable thief rather than a corrupter of children, the latter being a long-standing anti-semitic sentiment. Otherwise, very few liberties are taken with Dickens's original. All of the most memorable elements are included. Just enough menace and violence is retained to ensure narrative fidelity whilst at the same time allowing for children' sensibilities. Nancy is still beaten to death, Bullseye narrowly escapes drowning, and Bill Sykes gets a faithfully graphic come-uppance.<br /><br />Every song is excellent, though they do incline towards schmaltz. Mark Lester mimes his wonderfully. Both his and my favourite scene is the one in which the world comes alive to 'who will buy'. It's schmaltzy, but it's Dickens through and through.<br /><br />I could go on. I could commend the wonderful set-pieces, the contrast of the rich and poor. There is top-quality acting from more British regulars than you could shake a stick at.<br /><br />I ought to give it 10 points, but I'm feeling more like Scrooge today. Soak it up with your Christmas dinner. No original has been better realised.",
           b"Oh yeah! Jenna Jameson did it again! Yeah Baby! This movie rocks. It was one of the 1st movies i saw of her. And i have to say i feel in love with her, she was great in this move.<br /><br />Her performance was outstanding and what i liked the most was the scenery and the wardrobe it was amazing you can tell that they put a lot into the movie the girls cloth were amazing.<br /><br />I hope this comment helps and u can buy the movie, the storyline is awesome is very unique and i'm sure u are going to like it. Jenna amazed us once more and no wonder the movie won so many awards. Her make-up and wardrobe is very very sexy and the girls on girls scene is amazing. specially the one where she looks like an angel. It's a must see and i hope u share my interests",
           b"I saw this film on True Movies (which automatically made me sceptical) but actually - it was good. Why? Not because of the amazing plot twists or breathtaking dialogue (of which there is little) but because actually, despite what people say I thought the film was accurate in it's depiction of teenagers dealing with pregnancy.<br /><br />It's NOT Dawson's Creek, they're not graceful, cool witty characters who breeze through sexuality with effortless knowledge. They're kids and they act like kids would. <br /><br />They're blunt, awkward and annoyingly confused about everything. Yes, this could be by accident and they could just be bad actors but I don't think so. Dermot Mulroney gives (when not trying to be cool) a very believable performance and I loved him for it. Patricia Arquette IS whiny and annoying, but she was pregnant and a teenagers? The combination of the two isn't exactly lavender on your pillow. The plot was VERY predictable and but so what? I believed them, his stress and inability to cope - her brave, yet slightly misguided attempts to bring them closer together. I think the characters, acted by anyone else, WOULD indeed have been annoying and unbelievable but they weren't. It reflects the surreality of the situation they're in, that he's sitting in class and she walks on campus with the baby. I felt angry at her for that, I felt angry at him for being such a child and for blaming her. I felt it all.<br /><br />In the end, I loved it and would recommend it.<br /><br />Watch out for the scene where Dermot Mulroney runs from the disastrous counselling session - career performance.",
           b'This was a wonderfully clever and entertaining movie that I shall never tire of watching many, many times. The casting was magnificent in matching up the young with the older characters. There are those of us out here who really do appreciate good actors and an intelligent story format. As for Judi Dench, she is beautiful and a gift to any kind of production in which she stars. I always make a point to see Judi Dench in all her performances. She is a superb actress and a pleasure to watch as each transformation of her character comes to life. I can only be grateful when I see such an outstanding picture for most of the motion pictures made more recently lack good characters, good scripts and good acting. The movie public needs heroes, not deviant manikins, who lack ingenuity and talent. How wonderful to see old favorites like Leslie Caron, Olympia Dukakis and Cleo Laine. I would like to see this movie win the awards it deserves. Thank you again for a tremendous night of entertainment. I congratulate the writer, director, producer, and all those who did such a fine job.',
           b'I have no idea what the other reviewer is talking about- this was a wonderful movie, and created a sense of the era that feels like time travel. The characters are truly young, Mary is a strong match for Byron, Claire is juvenile and a tad annoying, Polidori is a convincing beaten-down sycophant... all are beautiful, curious, and decadent... not the frightening wrecks they are in Gothic.<br /><br />Gothic works as an independent piece of shock film, and I loved it for different reasons, but this works like a Merchant and Ivory film, and was from my readings the best capture of what the summer must have felt like. Romantic, yes, but completely rekindles my interest in the lives of Shelley and Byron every time I think about the film. One of my all-time favorites.',
           b"This was soul-provoking! I am an Iranian, and living in th 21st century, I didn't know that such big tribes have been living in such conditions at the time of my grandfather!<br /><br />You see that today, or even in 1925, on one side of the world a lady or a baby could have everything served for him or her clean and on-demand, but here 80 years ago, people ventured their life to go to somewhere with more grass. It's really interesting that these Persians bear those difficulties to find pasture for their sheep, but they lose many the sheep on their way.<br /><br />I praise the Americans who accompanied this tribe, they were as tough as Bakhtiari people.",
           b'Just because someone is under the age of 10 does not mean they are stupid. If your child likes this film you\'d better have him/her tested. I am continually amazed at how so many people can be involved in something that turns out so bad. This "film" is a showcase for digital wizardry AND NOTHING ELSE. The writing is horrid. I can\'t remember when I\'ve heard such bad dialogue. The songs are beyond wretched. The acting is sub-par but then the actors were not given much. Who decided to employ Joey Fatone? He cannot sing and he is ugly as sin.<br /><br />The worst thing is the obviousness of it all. It is as if the writers went out of their way to make it all as stupid as possible. Great children\'s movies are wicked, smart and full of wit - films like Shrek and Toy Story in recent years, Willie Wonka and The Witches to mention two of the past. But in the continual dumbing-down of American more are flocking to dreck like Finding Nemo (yes, that\'s right), the recent Charlie & The Chocolate Factory and eye-crossing trash like Red Riding Hood.',
           b"I absolutely LOVED this movie when I was a kid. I cried every time I watched it. It wasn't weird to me. I totally identified with the characters. I would love to see it again (and hope I wont be disappointed!). Pufnstuf rocks!!!! I was really drawn in to the fantasy world. And to me the movie was loooong. I wonder if I ever saw the series and have confused them? The acting I thought was strong. I loved Jack Wilde. He was so dreamy to an 10 year old (when I first saw the movie, not in 1970. I can still remember the characters vividly. The flute was totally believable and I can still 'feel' the evil woods. Witchy poo was scary - I wouldn't want to cross her path.",
           b'A very close and sharp discription of the bubbling and dynamic emotional world of specialy one 18year old guy, that makes his first experiences in his gay love to an other boy, during an vacation with a part of his family.<br /><br />I liked this film because of his extremly clear and surrogated storytelling , with all this "Sound-close-ups" and quiet moments wich had been full of intensive moods.<br /><br />',
           b"This is the most depressing film I have ever seen. I first saw it as a child and even thinking about it now really upsets me. I know it was set in a time when life was hard and I know these people were poor and the crops were vital. Yes, I get all that. What I find hard to take is I can't remember one single light moment in the entire film. Maybe it was true to life, I don't know. I'm quite sure the acting was top notch and the direction and quality of filming etc etc was wonderful and I know that every film can't have a happy ending but as a family film it is dire in my opinion.<br /><br />I wouldn't recommend it to anyone who wants to be entertained by a film. I can't stress enough how this film affected me as a child. I was talking about it recently and all the sad memories came flooding back. I think it would have all but the heartless reaching for the Prozac."],
          dtype=object)>

Let's also print the first 10 labels.

    train_labels_batch

Output:

    <tf.Tensor: id=220, shape=(10,), dtype=int64, numpy=array([1, 1, 1, 1, 1, 1, 0, 1, 1, 0])>
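As a small illustration of the label encoding (0 for negative, 1 for positive), the ten labels printed above can be tallied in plain Python. The `label_names` mapping is a name introduced here for readability only.

```python
from collections import Counter

# The first ten labels, copied from the tensor printed above.
labels = [1, 1, 1, 1, 1, 1, 0, 1, 1, 0]

# 0 marks a negative review, 1 a positive review.
label_names = {0: "negative", 1: "positive"}

counts = Counter(label_names[label] for label in labels)
print(counts["positive"], counts["negative"])
```

A single batch of ten need not be balanced, even though the full training and test sets are.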

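3 Build the model

A neural network is created by stacking layers, which requires three main architectural decisions:

  • How to represent the text?
  • How many layers to use in the model?
  • How many hidden units to use for each layer?

In this example, the input data consists of sentences. The labels to predict are either 0 or 1.

One way to represent the text is to convert sentences into embedding vectors. We can use a pre-trained text embedding as the first layer, which has three advantages:

  • We don't have to worry about text preprocessing.
  • We can benefit from transfer learning.
  • The embedding has a fixed size, so it is simpler to process.

For this example we will use a pre-trained text embedding model from TensorFlow Hub called google/tf2-preview/gnews-swivel-20dim/1.

There are three other pre-trained models to try for the purposes of this tutorial:

  • google/tf2-preview/gnews-swivel-20dim-with-oov/1: same as google/tf2-preview/gnews-swivel-20dim/1, but with 2.5% of the vocabulary converted to out-of-vocabulary (OOV) buckets. This can help if the vocabulary of the task and the vocabulary of the model don't fully overlap.
  • google/tf2-preview/nnlm-en-dim50/1: a larger model with a vocabulary of about 1M words and 50 dimensions.
  • google/tf2-preview/nnlm-en-dim128/1: a larger model with a vocabulary of about 1M words and 128 dimensions.

Let's first create a Keras layer that uses a TensorFlow Hub model to embed the sentences, and try it out on a few input examples. Note that no matter the length of the input text, the output shape of the embeddings is (num_examples, embedding_dimension).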

    embedding = "https://hub.tensorflow.google.cn/google/tf2-preview/gnews-swivel-20dim/1"
    hub_layer = hub.KerasLayer(embedding, input_shape=[],
                               dtype=tf.string, trainable=True)
    hub_layer(train_examples_batch[:3])

Output:

    <tf.Tensor: id=402, shape=(3, 20), dtype=float32, numpy=
    array([[ 3.9819887 , -4.4838037 ,  5.177359  , -2.3643482 , -3.2938678 ,
            -3.5364532 , -2.4786978 ,  2.5525482 ,  6.688532  , -2.3076782 ,
            -1.9807833 ,  1.1315885 , -3.0339816 , -0.7604128 , -5.743445  ,
             3.4242578 ,  4.790099  , -4.03061   , -5.992149  , -1.7297493 ],
           [ 3.4232912 , -4.230874  ,  4.1488533 , -0.29553518, -6.802391  ,
            -2.5163853 , -4.4002395 ,  1.905792  ,  4.7512794 , -0.40538004,
            -4.3401685 ,  1.0361497 ,  0.9744097 ,  0.71507156, -6.2657013 ,
             0.16533905,  4.560262  , -1.3106939 , -3.1121316 , -2.1338716 ],
           [ 3.8508697 , -5.003031  ,  4.8700504 , -0.04324996, -5.893603  ,
            -5.2983093 , -4.004676  ,  4.1236343 ,  6.267754  ,  0.11632943,
            -3.5934832 ,  0.8023905 ,  0.56146765,  0.9192484 , -7.3066816 ,
             2.8202746 ,  6.2000837 , -3.5709393 , -4.564525  , -2.305622  ]],
          dtype=float32)>
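To see why the output shape does not depend on the length of the text, here is a toy sketch in plain Python. Everything in it is an assumption made for illustration: the three-dimensional `TOY_TABLE`, whitespace tokenization, the zero vector for unknown words, and averaging as the way token vectors are combined. The actual Hub model may tokenize and combine differently.

```python
# Hypothetical 3-dimensional embedding table (the real model uses 20 dimensions).
TOY_TABLE = {
    "good": [1.0, 0.0, 2.0],
    "bad": [-1.0, 0.0, -2.0],
    "movie": [0.0, 3.0, 0.0],
}
EMBEDDING_DIM = 3
UNKNOWN = [0.0] * EMBEDDING_DIM  # assumed fallback for words missing from the table


def embed_sentence(sentence):
    """Embed each whitespace token, then average the token vectors."""
    tokens = sentence.lower().split()
    if not tokens:
        return list(UNKNOWN)
    vectors = [TOY_TABLE.get(token, UNKNOWN) for token in tokens]
    return [sum(vector[i] for vector in vectors) / len(vectors)
            for i in range(EMBEDDING_DIM)]


def embed_batch(sentences):
    """Return one fixed-length vector per sentence."""
    return [embed_sentence(sentence) for sentence in sentences]


batch = embed_batch(["good movie", "bad bad bad movie with many more words"])
shape = (len(batch), len(batch[0]))
print(shape)
```

Both sentences, short and long, map to a vector of the same length, so the batch has shape (num_examples, embedding_dimension).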

Now let's build the full model:

    model = tf.keras.Sequential()
    model.add(hub_layer)
    model.add(tf.keras.layers.Dense(16, activation='relu'))
    model.add(tf.keras.layers.Dense(1, activation='sigmoid'))

    model.summary()

Output:

    Model: "sequential"
    _________________________________________________________________
    Layer (type)                 Output Shape              Param #
    =================================================================
    keras_layer (KerasLayer)     (None, 20)                400020
    _________________________________________________________________
    dense (Dense)                (None, 16)                336
    _________________________________________________________________
    dense_1 (Dense)              (None, 1)                 17
    =================================================================
    Total params: 400,373
    Trainable params: 400,373
    Non-trainable params: 0
    _________________________________________________________________
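The parameter counts in the summary can be reproduced by hand. A fully connected layer has one weight per input-unit pair plus one bias per unit; the helper `dense_param_count` below is a name introduced here for illustration, and the embedding layer's count is simply taken from the summary.

```python
def dense_param_count(inputs, units):
    """Weights (inputs * units) plus one bias per unit."""
    return inputs * units + units


embedding_params = 400020                  # reported by model.summary()
hidden_params = dense_param_count(20, 16)  # 20-dim embedding -> 16 hidden units
output_params = dense_param_count(16, 1)   # 16 hidden units -> 1 output node

total_params = embedding_params + hidden_params + output_params
print(hidden_params, output_params, total_params)
```

Because the Hub layer was created with `trainable=True`, its 400,020 parameters count as trainable, which is why the summary reports no non-trainable parameters.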

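The layers are stacked sequentially to build the classifier:

  • The first layer is a TensorFlow Hub layer. This layer uses a pre-trained SavedModel to map a sentence to its embedding vector. The pre-trained text embedding model we are using (google/tf2-preview/gnews-swivel-20dim/1) splits the sentence into tokens, embeds each token, and then combines the embeddings. The resulting dimensions are (num_examples, embedding_dimension).
  • This fixed-length output vector is piped through a fully connected (Dense) layer with 16 hidden units.
  • The last layer is densely connected to a single output node. With the sigmoid activation function, its value is a float between 0 and 1, representing a probability or confidence level.

Let's compile the model.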
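Before compiling, it may help to see what the two Dense layers described above compute. The following is a minimal pure-Python sketch with made-up weights and a 3-value input instead of a 20-dimensional embedding; all numbers are hypothetical and chosen only so the result is easy to follow.

```python
import math


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases, activation):
    """One fully connected layer: weights[j] holds the input weights of unit j."""
    return [
        activation(sum(w * x for w, x in zip(unit_weights, inputs)) + bias)
        for unit_weights, bias in zip(weights, biases)
    ]


# Toy stand-in for a sentence embedding (hypothetical values).
embedding = [0.5, -1.0, 2.0]

# Hidden layer with 2 units (the tutorial uses 16), ReLU activation.
hidden = dense(embedding,
               weights=[[1.0, 0.0, 0.5], [-1.0, 1.0, 0.0]],
               biases=[0.0, 0.5],
               activation=relu)

# Output layer with a single sigmoid unit, giving a value between 0 and 1.
probability = dense(hidden,
                    weights=[[2.0, -3.0]],
                    biases=[-3.0],
                    activation=sigmoid)[0]

print(hidden, probability)
```

The second hidden unit receives a negative pre-activation and is clipped to zero by ReLU; the output unit's pre-activation is exactly zero, so the sigmoid returns 0.5, the point of maximum uncertainty.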

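3.1 Loss function and optimizer

A model needs a loss function and an optimizer for training. Since this is a binary classification problem and the model outputs a probability (a single-unit layer with a sigmoid activation), we'll use the binary_crossentropy loss function.

This isn't the only choice for a loss function; you could, for instance, choose mean_squared_error. But, generally, binary_crossentropy is better for dealing with probabilities: it measures the "distance" between probability distributions, or in our case, between the ground-truth distribution and the predictions.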
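To make the comparison concrete, here is a minimal pure-Python sketch of both losses, evaluated on a confidently right and a confidently wrong set of predictions. The clipping constant `eps` is an assumption added to avoid log(0), and the sample values are invented for illustration; this is not the Keras implementation.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of -[t*log(p) + (1-t)*log(1-p)] over all examples."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # keep p away from 0 and 1
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Mean of squared differences between labels and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


labels = [1, 0]
confident_right = [0.9, 0.1]
confident_wrong = [0.1, 0.9]

bce_right = binary_crossentropy(labels, confident_right)
bce_wrong = binary_crossentropy(labels, confident_wrong)
mse_right = mean_squared_error(labels, confident_right)
mse_wrong = mean_squared_error(labels, confident_wrong)

print(round(bce_right, 4), round(bce_wrong, 4))
print(round(mse_right, 4), round(mse_wrong, 4))
```

Cross-entropy grows without bound as a wrong prediction approaches full confidence, while squared error on a probability can never exceed 1, which is one way to read the preference stated above.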

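Later, when we explore regression problems (say, predicting the price of a house), we will see how to use another loss function called mean squared error.

Now, configure the model to use an optimizer and a loss function:

    model.compile(optimizer='adam',
                  loss='binary_crossentropy',
                  metrics=['accuracy'])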

4 Train the model

Train the model for 20 epochs in mini-batches of 512 samples. This is 20 iterations over all samples in the training data (train_data). While training, monitor the model's loss and accuracy on the 10,000 samples from the validation set:

    history = model.fit(train_data.shuffle(10000).batch(512),
                        epochs=20,
                        validation_data=validation_data.batch(512),
                        verbose=1)

Output:

    Epoch 1/20
    30/30 [==============================] - 5s 153ms/step - loss: 0.9062 - accuracy: 0.4985 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
    Epoch 2/20
    30/30 [==============================] - 4s 117ms/step - loss: 0.7007 - accuracy: 0.5625 - val_loss: 0.6692 - val_accuracy: 0.6029
    Epoch 3/20
    30/30 [==============================] - 4s 117ms/step - loss: 0.6486 - accuracy: 0.6379 - val_loss: 0.6304 - val_accuracy: 0.6543
    Epoch 4/20
    30/30 [==============================] - 4s 117ms/step - loss: 0.6113 - accuracy: 0.6866 - val_loss: 0.5943 - val_accuracy: 0.6966
    Epoch 5/20
    30/30 [==============================] - 3s 114ms/step - loss: 0.5764 - accuracy: 0.7176 - val_loss: 0.5650 - val_accuracy: 0.7201
    Epoch 6/20
    30/30 [==============================] - 3s 109ms/step - loss: 0.5435 - accuracy: 0.7447 - val_loss: 0.5373 - val_accuracy: 0.7424
    Epoch 7/20
    30/30 [==============================] - 3s 110ms/step - loss: 0.5132 - accuracy: 0.7723 - val_loss: 0.5080 - val_accuracy: 0.7667
    Epoch 8/20
    30/30 [==============================] - 3s 110ms/step - loss: 0.4784 - accuracy: 0.7943 - val_loss: 0.4790 - val_accuracy: 0.7833
    Epoch 9/20
    30/30 [==============================] - 3s 110ms/step - loss: 0.4440 - accuracy: 0.8172 - val_loss: 0.4481 - val_accuracy: 0.8054
    Epoch 10/20
    30/30 [==============================] - 3s 112ms/step - loss: 0.4122 - accuracy: 0.8362 - val_loss: 0.4204 - val_accuracy: 0.8196
    Epoch 11/20
    30/30 [==============================] - 3s 110ms/step - loss: 0.3757 - accuracy: 0.8534 - val_loss: 0.3978 - val_accuracy: 0.8290
    Epoch 12/20
    30/30 [==============================] - 3s 111ms/step - loss: 0.3449 - accuracy: 0.8685 - val_loss: 0.3736 - val_accuracy: 0.8413
    Epoch 13/20
    30/30 [==============================] - 3s 109ms/step - loss: 0.3188 - accuracy: 0.8798 - val_loss: 0.3570 - val_accuracy: 0.8465
    Epoch 14/20
    30/30 [==============================] - 3s 110ms/step - loss: 0.2934 - accuracy: 0.8893 - val_loss: 0.3405 - val_accuracy: 0.8549
    Epoch 15/20
    30/30 [==============================] - 3s 109ms/step - loss: 0.2726 - accuracy: 0.9003 - val_loss: 0.3283 - val_accuracy: 0.8611
    Epoch 16/20
    30/30 [==============================] - 3s 111ms/step - loss: 0.2530 - accuracy: 0.9079 - val_loss: 0.3173 - val_accuracy: 0.8648
    Epoch 17/20
    30/30 [==============================] - 3s 113ms/step - loss: 0.2354 - accuracy: 0.9143 - val_loss: 0.3096 - val_accuracy: 0.8679
    Epoch 18/20
    30/30 [==============================] - 3s 112ms/step - loss: 0.2209 - accuracy: 0.9229 - val_loss: 0.3038 - val_accuracy: 0.8700
    Epoch 19/20
    30/30 [==============================] - 3s 112ms/step - loss: 0.2037 - accuracy: 0.9287 - val_loss: 0.2990 - val_accuracy: 0.8736
    Epoch 20/20
    30/30 [==============================] - 3s 109ms/step - loss: 0.1899 - accuracy: 0.9349 - val_loss: 0.2960 - val_accuracy: 0.8751
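The "30/30" shown for every epoch is the number of mini-batches per epoch. A short sketch of the arithmetic, assuming the last, partially filled batch is kept rather than dropped (`steps_per_epoch` is a helper name introduced here):

```python
import math


def steps_per_epoch(num_examples, batch_size):
    """Number of batches needed to cover all examples, counting a partial last batch."""
    return math.ceil(num_examples / batch_size)


train_steps = steps_per_epoch(15000, 512)       # training split
validation_steps = steps_per_epoch(10000, 512)  # validation split
last_batch_size = 15000 - (train_steps - 1) * 512

print(train_steps, validation_steps, last_batch_size)
```

So each epoch runs 29 full batches of 512 reviews followed by one smaller batch of 152.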

5 Evaluate the model

Let's see how the model performs. Two values are returned: loss (a number that represents the error; lower is better) and accuracy.

    results = model.evaluate(test_data.batch(512), verbose=2)
    for name, value in zip(model.metrics_names, results):
        print("%s: %.3f" % (name, value))

Output:

    49/49 - 2s - loss: 0.3163 - accuracy: 0.8651
    loss: 0.316
    accuracy: 0.865

This fairly naive approach achieves an accuracy of about 87%. With more advanced approaches, the model should get closer to 95%.
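For reference, accuracy on a binary task is the fraction of examples whose thresholded probability matches the label. The sketch below assumes a threshold of 0.5 and uses invented probabilities; `binary_accuracy` is a name defined here, not a call into Keras.

```python
def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of examples where (probability > threshold) agrees with the label."""
    correct = sum(
        1 for label, prob in zip(y_true, y_prob)
        if (prob > threshold) == bool(label)
    )
    return correct / len(y_true)


# Hypothetical labels and sigmoid outputs for five reviews.
labels = [1, 0, 1, 1, 0]
probabilities = [0.92, 0.30, 0.41, 0.77, 0.65]

accuracy = binary_accuracy(labels, probabilities)
print(accuracy)
```

Here the third and fifth reviews fall on the wrong side of the threshold, so three of five predictions are correct.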
