
Rating Prediction of Restaurants

Published: 2023/12/15

Description

Restaurants from all over the world can be found in Bengaluru. From the United States to Japan, Russia to Antarctica, you get all types of cuisine here. Delivery, dine-out, pubs, bars, drinks, buffets, desserts: you name it and Bengaluru has it. The number of restaurants, currently about 12,000, is increasing day by day. Even with such a high number, the industry has not saturated, and new restaurants open every day. However, it has become difficult for them to compete with established restaurants. The key issues that continue to challenge them include high real estate costs, rising food costs, a shortage of quality manpower, a fragmented supply chain, and over-licensing. This Zomato data aims at analyzing the demography of a location. Most importantly, it will help new restaurants decide their theme, menu, cuisine, cost, etc. for a particular location. It also aims at finding similarity between neighborhoods of Bengaluru on the basis of food. The dataset also contains reviews for each restaurant, which will help in finding the overall rating for the place.

The basic idea of analyzing the Zomato dataset is to get a fair idea of the factors affecting the establishment of different types of restaurants in different parts of Bengaluru, and of the aggregate rating of each restaurant. Bengaluru has more than 12,000 restaurants serving dishes from all over the world. With new restaurants opening every day, the industry has not saturated and demand keeps increasing. In spite of this demand, it has become difficult for new restaurants to compete with established ones, most of them serving the same food. Bengaluru is an IT capital of India, and most people here depend mainly on restaurant food because they do not have time to cook for themselves. With such overwhelming demand, it has become important to study the demography of a location. What kind of food is more popular in a locality? Does the entire locality love vegetarian food? If so, is that locality populated by a particular community, e.g. Jains, Marwaris, or Gujaratis, who are mostly vegetarian?

Objective: Design a machine learning model to predict the rating of restaurants that accept orders through Zomato.

Prerequisites: This post assumes familiarity with basic machine learning concepts such as linear regression, decision trees, random forests, gradient boosted decision trees, one-vs-rest classifiers, multicollinearity, model-based imputation, CNN, CNN-LSTM, hyperparameter tuning, and mean squared error.

Index:

  • Reading Data: reading the CSV file and storing it in a dataframe
  • Missing Value Imputation: replacing NULL values using model-based, mean-based, and frequency-based imputation
  • Exploratory Data Analysis: plots such as pie plots, count plots, and bar plots
  • Data Preprocessing: removing stopwords and unnecessary characters from the text data
  • Vectorization: using CountVectorizer, TfidfVectorizer, and Normalizer to vectorize the data
  • Building Models: building different machine learning and deep learning models
Dataset Overview: Each row contains a restaurant record with the following features.

-> url: the URL of the restaurant on the Zomato website
-> address: the address of the restaurant in Bengaluru
-> name: the name of the restaurant
-> online_order: whether online ordering is available at the restaurant
-> book_table: whether table booking is available
-> votes: the total number of ratings the restaurant had received when the data was collected
-> phone: the phone number of the restaurant
-> location: the neighborhood in which the restaurant is located
-> rest_type: the restaurant type, such as Quick Bites or Casual Dining
-> dish_liked: dishes people liked at the restaurant
-> cuisines: food styles, separated by commas
-> approx_cost(for two people): the approximate cost of a meal for two people
-> reviews_list: a list of tuples containing reviews for the restaurant; each tuple consists of two values, the rating and the review by the customer
-> menu_item: the list of menu items available at the restaurant
-> listed_in(type): the type of meal
-> listed_in(city): the neighborhood in which the restaurant is listed

Real-world/Business objectives and constraints:

-> No strict latency requirement.
-> Interpretability is not important.

Performance Metrics: Since this is a regression problem, our performance metric is mean squared error (MSE). We will try to reduce the MSE as much as possible.
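For reference, MSE is the average of the squared differences between the true and predicted ratings. The snippet below is a minimal sketch with NumPy; the function name and the sample ratings are made up for illustration and are not taken from the original post.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of squared differences between true and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Made-up ratings, for illustration only
mse_example = mean_squared_error([3.5, 4.0, 4.2], [3.7, 3.9, 4.0])
print(mse_example)
```

A lower value means the predicted ratings sit closer to the true ones, which is why the rest of the post tries to minimize it.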

Reading Data

(51717, 17)

This dataset has 51717 rows and 17 columns.
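The loading step can be sketched roughly as follows. The file name `zomato.csv` is an assumption, so the example reads a tiny in-memory stand-in with made-up rows to stay runnable; on the real file, `df.shape` would be expected to report the (51717, 17) shown above.

```python
import io
import pandas as pd

# Tiny in-memory stand-in so the sketch runs without the real file.
# With the real data this would be something like pd.read_csv("zomato.csv"),
# where the file name is an assumption.
sample_csv = io.StringIO(
    "name,online_order,book_table,rate,votes\n"
    "Cafe A,Yes,No,4.1,120\n"
    "Cafe B,No,No,,0\n"
)
df = pd.read_csv(sample_csv)
print(df.shape)  # (rows, columns)
```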

Checking the percentage of NULL values for each feature
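One common way to get this percentage with pandas is to average the boolean null mask per column. This is a sketch on made-up data, not the original code; the column names follow the dataset overview.

```python
import numpy as np
import pandas as pd

# Made-up rows using column names from the dataset overview
df = pd.DataFrame({
    "rate": [4.1, np.nan, 3.8, np.nan],
    "dish_liked": ["Pasta", None, None, None],
    "votes": [120, 0, 45, 7],
})

# isnull() gives a True/False mask; its column mean is the fraction of nulls
null_pct = df.isnull().mean() * 100
print(null_pct.sort_values(ascending=False))
```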

Filling the Missing Values

We use three different approaches to fill the missing values: model-based, mean-based, and frequency-based imputation.

i. Model-based imputation: To fill the missing values of the "rate" and "dish_liked" columns, we use model-based imputation.

We first divide the original dataframe into two dataframes: the first contains the rows with no null values and the second contains only the rows with null values. We build the model on the first dataframe and use it to predict the missing values in the second.
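The split-fit-predict idea can be sketched as below for a numeric target such as "rate". The post does not say which model was used, so `LinearRegression`, the predictor columns, and the sample values are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up data; "votes" and "cost" are hypothetical predictor columns
df = pd.DataFrame({
    "votes": [10, 200, 50, 400, 30, 120],
    "cost": [300, 800, 400, 1200, 350, 600],
    "rate": [3.2, 4.1, np.nan, 4.5, np.nan, 3.9],
})
features = ["votes", "cost"]

known = df[df["rate"].notnull()]    # rows where the target is present
missing = df[df["rate"].isnull()]   # rows where the target is missing

# Fit on the complete rows, then predict the target for the incomplete rows
model = LinearRegression().fit(known[features], known["rate"])
df.loc[missing.index, "rate"] = model.predict(missing[features])
```

For a text column such as "dish_liked", the same pattern would need a classifier rather than a regressor.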

The model's predictions for the missing values of the "dish_liked" column:

array(['Murgh Ghee Roast, Egg Fried Rice, Thali, Mutton Biryani, Naan, Andhra Meal', 'Pizza, Mocktails, Coffee, Nachos, Salad, Pasta, Sandwiches', 'Pizza, Potato Wedges, Country Feast, Pasta, Garlic Bread, Lemonade', ..., 'Ferrero Rocher Cake, Chocolate Cake', 'Ferrero Rocher Cake, Chocolate Cake', 'Ferrero Rocher Cake, Chocolate Cake'], dtype='<U134')

The model's predictions for the missing values of the "rate" column:

array([3.47331694, 3.48851577, 3.44792981, ..., 3.58956974, 3.58956974, 3.58956974])

ii. Mean-based imputation to fill missing values

iii. Frequency-based imputation to fill missing values

We checked the most frequently occurring value in each of these columns and replaced that column's missing values with it.
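Mean-based and frequency-based imputation can both be written with `fillna` in pandas. The post does not list which columns received which treatment, so the two columns and their values below are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Made-up data; which columns get which strategy is an assumption
df = pd.DataFrame({
    "approx_cost": [300.0, np.nan, 500.0, 400.0],
    "rest_type": ["Quick Bites", None, "Quick Bites", "Cafe"],
})

# Mean-based: replace missing numbers with the column mean
df["approx_cost"] = df["approx_cost"].fillna(df["approx_cost"].mean())

# Frequency-based: replace missing categories with the most frequent value
df["rest_type"] = df["rest_type"].fillna(df["rest_type"].mode()[0])
```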

Exploratory Data Analysis

i. Analysis on location of restaurants

Fig-1

Conclusion: The number of restaurants varies by location. BTM has the highest number of restaurants in Bangalore, with 3108. New BEL Road has the fewest restaurants, followed by Banashankari. BTM accounts for 17.24% of the total restaurants in Bangalore.
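Counts and percentage shares like these are typically obtained with `value_counts`. The following is a sketch on a handful of made-up rows, not the original analysis code.

```python
import pandas as pd

# Made-up rows; the real column would hold one location per restaurant
df = pd.DataFrame({"location": ["BTM", "BTM", "HSR", "BTM", "Indiranagar"]})

counts = df["location"].value_counts()                      # restaurants per location
share = df["location"].value_counts(normalize=True) * 100   # percentage of the total
print(counts.head())
print(share.round(2).head())
```

The same two calls apply to the other categorical columns analyzed in this section, such as online_order and book_table.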

ii. Analysis on online_order

Conclusion: More restaurants allow online ordering than do not. In Bangalore, 29342 restaurants accept online orders and 20098 do not; 59.65% of restaurants allow online ordering.

iii. Analysis on ratings

Conclusion: The majority of restaurants have ratings between 3.6 and 3.9, and 15% have a rating of approximately 3.7. The minimum rating is 1.8. Not a single restaurant in Bangalore has a rating of 5.

iv. Analysis on number of stores for each restaurant

Conclusion: The number of stores per restaurant varies. CCD has the most stores in Bangalore, followed by Onesta and Just Bake. Various restaurants, such as SV Juice Corner Tiffin and Brown Box, have only one store. CCD's stores make up 9.26% of all stores in Bangalore.

v. Analysis on restaurants that allow table booking

Conclusion: 6320 restaurants accept table bookings and 43120 do not, so 87.22% of restaurants do not allow table booking. Many of these may be street-food-type restaurants.

vi. Types of cuisines sold by most of the restaurants

Conclusion: North Indian and Chinese are the two most widely sold cuisines in Bangalore. Close to 20,000 restaurants offer North Indian cuisine and close to 14,000 offer Chinese food.

vii. Items liked by people in Bangalore

Conclusion: Biryani is the most liked dish among the people of Bangalore. It is one of the most popular dishes at around 12000 restaurants. Chicken is the second most liked dish.

viii. Analysis on cost of dining

Conclusion: The average cost for two people at restaurants in Bangalore is 561. The minimum cost of dining is 40 and the maximum is 6000, so food is available in Bangalore across a wide range of prices.

ix. Analysis on votes

Conclusion: Restaurants in Bangalore have an average of 296.76 votes. The minimum is 0 and the maximum is 16832. Very few restaurants have more than 1700 votes.

x. Rating of restaurants vs online_order

Conclusion: For restaurants rated 3.7, the number accepting online orders is greater than the number that do not. For restaurants with any other rating, there are likewise more that accept online orders than that do not.

xi. Type of restaurant

Conclusion: Around 50% of the restaurants in Bangalore are of the delivery type, 24728 in total. Many restaurants (34%) offer dine-out service. The least common types are pubs and bars, buffet, and drinks and nightlife; pubs and bars number 669, the fewest among all types.

xii. Pairplots

Conclusions from the pairplot:

  • In the plot of votes vs rate, most restaurants with a higher number of votes also have better ratings.
  • In the plot of approx_cost vs rate, restaurants with higher ratings tend to cost more.
  • In the plots of rate vs cost and rate vs votes, the data points are linearly separable.
EDA Summary

  • BTM alone has 3108 restaurants, the highest number of any location in Bangalore. New BEL Road has the fewest, 725. Restaurants in BTM make up 17% of the total.
  • More restaurants take online orders than do not: 29342 restaurants accept online orders and 20098 do not.
  • Restaurant ratings vary between 1.8 and 4.9. The average rating is 3.7.
  • CCD has 93 stores in Bangalore, the highest number for any restaurant, followed by Onesta with 85.
  • 6320 restaurants accept table bookings and 43120 do not. Many of the latter may be street-food-type restaurants.
  • North Indian, Chinese, and South Indian are the top three cuisines available in most restaurants.
  • Chicken is the most liked dish among the people of Bangalore, followed by Biryani and rice.
  • The average cost of dining is 561. The minimum cost is 40 and the maximum is 4000. Overall, 87.22% of the restaurants do not allow table booking.
  • For restaurants rated 3.7, the number accepting online orders is greater than the number that do not. For restaurants with any other rating, there are likewise more that accept online orders than that do not.
  • Around 50% of the restaurants in Bangalore are of the delivery type, 24728 in total. Many restaurants (34%) offer dine-out service. The least common types are pubs and bars, buffet, and drinks and nightlife; pubs and bars number 669, the fewest among all types.
  • The largest group of restaurants that allow table booking has an average rating of 4.2, and the largest group that do not has an average rating of 3.7. Irrespective of rating, fewer restaurants allow table booking than do not.

Checking for multicollinearity

We define a function to check multicollinearity using the variance inflation factor (VIF) method.

The categorical columns are label encoded before the VIF values are computed.

Conclusion: By analyzing the VIF values, we can conclude that there is no multicollinearity among the independent variables, because the VIF value of each independent variable is very small.
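A VIF check can be sketched with NumPy alone: each column is regressed on the remaining columns (plus an intercept) and its VIF is 1 / (1 - R^2). This is not necessarily the function used in the original post; the name `compute_vif`, the column names, and the random demo data are assumptions.

```python
import numpy as np
import pandas as pd

def compute_vif(frame):
    """Return the VIF of each column: 1 / (1 - R^2) from regressing it on the others."""
    X = frame.to_numpy(dtype=float)
    vifs = {}
    for j, col in enumerate(frame.columns):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(len(y)), others])  # add an intercept column
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - resid.var() / y.var()
        vifs[col] = 1.0 / (1.0 - r2)
    return pd.Series(vifs)

# Independent random columns, so every VIF should come out close to 1
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(200, 3)), columns=["votes", "cost", "n_cuisines"])
vif = compute_vif(demo)
print(vif)
```

A VIF near 1 means a feature is barely explained by the others; a widely used rule of thumb treats values above roughly 5 to 10 as a sign of multicollinearity.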

Feature Engineering

1. Total number of cuisines available in each restaurant.

2. Total number of dishes liked by customers. It may be directly proportional to the rating.

3. Facilities offered by the restaurant: the two major facilities a restaurant can provide are online ordering and table booking, so we sum the two as a proxy for the restaurant's overall quality of service.

4. Response coding: a function converts categorical features into response-coded features. It simply performs mean value replacement, i.e. each category is replaced by the mean rating of the restaurants in that category.
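Response coding by mean value replacement can be sketched as follows. The helper name `response_code`, the sample rows, and the choice to fall back to the overall training mean for categories unseen in training are assumptions, not details from the original post.

```python
import pandas as pd

def response_code(train, column, target):
    """Map each category to the mean target value seen in the training data."""
    means = train.groupby(column)[target].mean()
    fallback = train[target].mean()  # assumption: used for unseen categories
    return means, fallback

# Made-up training rows
train = pd.DataFrame({
    "cuisines": ["North Indian", "Chinese", "North Indian", "Cafe"],
    "rate": [4.0, 3.5, 3.6, 4.2],
})
means, fallback = response_code(train, "cuisines", "rate")

# Apply the mapping learned on train to new rows
test = pd.DataFrame({"cuisines": ["Chinese", "Italian"]})
test["cuisines_coded"] = test["cuisines"].map(means).fillna(fallback)
```

Computing the means on the training split only, and then applying them to the cv and test splits, keeps target information from leaking into the evaluation data.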

Feature Engineering Summary

  • Mean value replacement for dish_liked: we apply response coding followed by mean value replacement to the dish_liked column. The resulting values are very close to those in the rate column.
  • Mean value replacement for cuisines: likewise, we apply response coding followed by mean value replacement to the cuisines column.
  • Number of cuisines available: the total number of cuisines available in each restaurant.
  • Number of dish_liked: the total number of dishes liked by customers in each restaurant.
  • Facilities offered: 2 if the restaurant allows both online_order and book_table, 1 if it allows only one of them, and 0 if it allows neither.
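The count features and the facilities score can be sketched in a few lines of pandas. The new column names (`n_cuisines`, `n_dish_liked`, `facilities`), the "Yes"/"No" encoding, and the sample rows are assumptions for illustration.

```python
import pandas as pd

# Made-up rows; "Yes"/"No" as the stored values is an assumption
df = pd.DataFrame({
    "cuisines": ["North Indian, Chinese", "Cafe", "Pizza, Italian, Desserts"],
    "dish_liked": ["Biryani, Naan", "Coffee", "Pizza, Pasta, Garlic Bread"],
    "online_order": ["Yes", "No", "Yes"],
    "book_table": ["Yes", "No", "No"],
})

def count_items(text):
    """Count the comma-separated entries in a string."""
    return len([item for item in text.split(",") if item.strip()])

df["n_cuisines"] = df["cuisines"].apply(count_items)
df["n_dish_liked"] = df["dish_liked"].apply(count_items)

# 2 = both facilities, 1 = exactly one, 0 = neither
df["facilities"] = (
    (df["online_order"] == "Yes").astype(int)
    + (df["book_table"] == "Yes").astype(int)
)
```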
Preprocessing of Features

We remove stopwords and other non-essential special characters from the review text and store the result in a preprocessed_reviews column. Finally, we replace the original review column with the preprocessed_reviews column.
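A minimal version of this cleaning step is sketched below. The short stopword set is illustrative only (a fuller list, for example NLTK's, could be substituted), and the function name and sample review are assumptions.

```python
import re

# Small illustrative stopword list; a fuller list could be used instead
STOPWORDS = {"the", "a", "an", "is", "was", "and", "of", "to", "in", "it", "this"}

def preprocess_review(text):
    """Lowercase, replace non-letter characters with spaces, and drop stopwords."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return " ".join(word for word in text.split() if word not in STOPWORDS)

cleaned = preprocess_review("The food was GREAT!!! 5/5, and the service is quick :)")
print(cleaned)
```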

Vectorization

Here we use CountVectorizer for categorical features, TF-IDF for text features, and Normalizer for numerical features.

CountVectorizer for a categorical feature:

['no', 'yes']
Shape of training dataset one hot encoding & corresponding class label: (23215, 2) (23215,)
Shape of cv dataset one hot encoding & corresponding class label: (11435, 2) (11435,)
Shape of test dataset one hot encoding & corresponding class label: (17067, 2) (17067,)

Normalizer for a numerical feature:

TF-IDF for text features:

Hyperparameter tuning for the Random Forest algorithm

Here we try to find the values of n_estimators and max_depth that give the minimum MSE for the regression model.

RandomForestRegressor(bootstrap=True, ccp_alpha=0.0, criterion='mse', max_depth=None, max_features='auto', max_leaf_nodes=None, max_samples=None, min_impurity_decrease=0.0, min_impurity_split=None, min_samples_leaf=1, min_samples_split=2, min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=None, oob_score=False, random_state=None, verbose=0, warm_start=False)
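A search over n_estimators and max_depth can be sketched with `GridSearchCV`. This swaps in k-fold cross-validation, whereas the original post appears to use a separate cv split, and the synthetic data and grid values are assumptions chosen to keep the example fast.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic regression data standing in for the vectorized features
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 4))
y = 3.5 + 0.5 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(scale=0.1, size=120)

param_grid = {"n_estimators": [10, 50], "max_depth": [3, None]}
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    scoring="neg_mean_squared_error",  # scikit-learn maximizes scores, so MSE is negated
    cv=3,
)
search.fit(X, y)

best_mse = -search.best_score_  # flip the sign back to a plain MSE
print(search.best_params_, best_mse)
```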

Applying the Random Forest model with the best hyperparameters

0.027927709527412244

Deep learning models:

We also tried a few deep learning models to predict the ratings: LSTM, LSTM-CNN, and CNN with Conv1D. For this problem, however, the machine learning models perform better than the deep learning models.

Finally, we compare the MSE values of all the models we built for predicting the ratings.

Translated from: https://medium.com/@kishanraj_16649/rating-prediction-of-restaurants-b51afd857e30
