Rating Prediction of Restaurants

Description

Restaurants from all over the world can be found in Bengaluru. From the United States to Japan, Russia to Antarctica, you get all types of cuisine here. Delivery, dine-out, pubs, bars, drinks, buffet, desserts: you name it and Bengaluru has it. The number of restaurants is increasing day by day and currently stands at approximately 12,000. Even with such a high number, the industry hasn’t been saturated yet, and new restaurants open every day. However, it has become difficult for them to compete with already established restaurants. The key issues that continue to challenge them include high real estate costs, rising food costs, a shortage of quality manpower, a fragmented supply chain and over-licensing. This Zomato data aims at analyzing the demography of each location. Most importantly, it will help new restaurants decide their theme, menu, cuisine, cost, etc. for a particular location. It also aims at finding similarities between neighborhoods of Bengaluru on the basis of food. The dataset also contains reviews for each restaurant, which will help in finding the overall rating for the place.

INDEX:

  1. Reading Data- Reading the CSV file and storing it in a dataframe
  2. Missing Value Imputation- Replacing NULL values using model based, mean based and frequency based imputation
  3. Exploratory Data Analysis- Plots such as pie plots, count plots and bar plots
  4. Data Preprocessing- Removing stopwords and unnecessary characters from the text data
  5. Vectorization- Using CountVectorizer, TfidfVectorizer and Normalizer to vectorize the data
  6. Building models- Building different machine learning and deep learning models

Reading Data

The dataframe has shape (51717, 17), i.e. 51717 rows and 17 columns.
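A minimal sketch of this step, assuming pandas is used. The helper name `load_data` and the in-memory sample rows are hypothetical; with the real dataset you would pass the path of the CSV file instead of the `StringIO` object.

```python
import io

import pandas as pd


def load_data(source):
    """Read a CSV into a DataFrame; `source` may be a file path or a file-like object."""
    return pd.read_csv(source)


# Tiny in-memory stand-in for the real file (made-up rows, for illustration only).
sample_csv = io.StringIO(
    "name,online_order,booking_table,rate,votes\n"
    "Cafe A,Yes,No,4.1,775\n"
    "Cafe B,No,No,,12\n"
)
df = load_data(sample_csv)
print(df.shape)  # (number of rows, number of columns)
```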

Checking the percentage of NULL values for each feature
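One way to compute these percentages with pandas is sketched below. The function name `null_percentage` and the toy dataframe are assumptions made for illustration, not the original notebook code.

```python
import numpy as np
import pandas as pd


def null_percentage(df):
    """Return the percentage of missing values per column, highest first."""
    return (df.isnull().mean() * 100).sort_values(ascending=False)


# Toy data with a few missing entries (illustrative values only).
toy = pd.DataFrame({
    "rate": [4.1, np.nan, 3.8, np.nan],
    "location": ["BTM", "HSR", None, "BTM"],
    "votes": [775, 12, 0, 88],
})
pct = null_percentage(toy)
print(pct)
```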

Filling the Missing values

We use 3 different approaches to fill the missing values, i.e. model based, mean based and frequency based imputation.
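The three approaches can be sketched as follows. Which column received which treatment is not stated here, so the column assignments, the toy values and the choice of a random forest as the imputation model are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Toy data (illustrative values only).
toy = pd.DataFrame({
    "votes": [775, 12, 150, 88, 400, 30],
    "approx_cost": [800.0, 300.0, np.nan, 300.0, 600.0, 250.0],
    "location": ["BTM", "HSR", None, "BTM", "BTM", "HSR"],
    "rate": [4.1, 3.2, 3.8, np.nan, 4.0, 3.3],
})

# Mean based: replace missing numeric values with the column mean.
toy["approx_cost"] = toy["approx_cost"].fillna(toy["approx_cost"].mean())

# Frequency based: replace missing categories with the most frequent value.
toy["location"] = toy["location"].fillna(toy["location"].mode()[0])

# Model based: fit a regressor on the rows where the target is known,
# then predict the target for the rows where it is missing.
features = ["votes", "approx_cost"]
known = toy["rate"].notnull()
model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(toy.loc[known, features], toy.loc[known, "rate"])
toy.loc[~known, "rate"] = model.predict(toy.loc[~known, features])

print(toy)
```

Mean and frequency based filling only look at the column itself, while the model based variant uses the other columns as predictors, so it is usually reserved for the columns that matter most.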

Exploratory Data Analysis

i. Analysis on Location of restaurant

Fig-1

ii. Analysis on online_order

iii. Analysis on ratings

iv. Analysis on number of stores for each restaurant

v. Analysis on restaurants that allow booking of tables

vi. Types of cuisines sold by most of the restaurants

vii. Items liked by people in Bangalore

viii. Analysis on cost of dining

ix. Analysis on votes

x. Rating of restaurants vs online_order

xi. Type of restaurant

xii. Pairplots

  1. In the plot of approx_cost vs rate, restaurants with higher ratings tend to have higher costs.
  2. In the plots of rate vs cost and rate vs votes, the data points appear linearly separable.

EDA Summary

  • BTM alone has 3108 restaurants, the highest number of any location in Bangalore. BEL has the fewest restaurants, i.e. 725. Restaurants in BTM make up 17% of the total.
  • More restaurants accept online orders than don’t: 29342 restaurants accept online orders and 20098 do not.
  • Restaurant ratings vary between 1.8 and 4.9. The average rating is 3.7.
  • CCD has 93 stores in Bangalore, the highest number of stores for any restaurant, followed by Onesta with 85.
  • There are 6320 restaurants that accept table bookings and 43120 that do not. The majority of restaurants may be street food type restaurants, as they do not allow table booking.
  • North Indian, Chinese and South Indian are the top 3 cuisines, available in most of the restaurants.
  • Chicken is the most liked dish among the people of Bangalore, followed by biryani and rice.
  • The average dining cost is 561. The minimum cost is 40 and the maximum cost is 4000. Overall, 87.22% of the restaurants do not allow table booking.
  • Only among restaurants rated 3.7 is the number accepting online orders lower than the number that don’t. For every other rating, more restaurants accept online orders than don’t.
  • Around 50% of the restaurants in Bangalore are of the delivery type; in total, 24728 restaurants belong to it. A large share of restaurants (34%) offer dine-out service. The least common types are pubs and bars, buffet, and drinks and nightlife. Pubs and bars number 669, the minimum among all restaurant types.
  • The largest number of restaurants that allow table booking have a rating of 4.2. The largest number of restaurants that don’t allow table booking have a rating of 3.7. Irrespective of rating, fewer restaurants allow table booking than don’t.
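The percentages quoted in this summary can be cross-checked from the counts with a few lines of arithmetic. The variable names below are chosen for illustration. Reading 43120 as the restaurants without table booking is an assumption; it is the reading that reproduces the 87.22% figure.

```python
# Counts quoted in the summary above.
online_yes, online_no = 29342, 20098
no_booking, booking = 43120, 6320
delivery = 24728

total = online_yes + online_no
# Both splits should cover the same set of restaurants.
assert total == no_booking + booking


def share(part, whole):
    """Percentage of `whole` represented by `part`, rounded to two decimals."""
    return round(100 * part / whole, 2)


print(total)                      # 49440
print(share(no_booking, total))   # 87.22
print(share(online_yes, total))   # 59.35
print(share(delivery, total))     # 50.02
```

On the real dataframe, `value_counts(normalize=True)` on the relevant column gives the same shares directly.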

Checking for multicollinearity

Defining a function to check multicollinearity using the VIF (variance inflation factor) method
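A minimal sketch of such a function is given below, using only NumPy and the textbook definition VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing feature j on all the other features. The function name `compute_vif` and the synthetic columns are assumptions; statsmodels also provides a `variance_inflation_factor` helper for the same purpose.

```python
import numpy as np
import pandas as pd


def compute_vif(df):
    """Return the variance inflation factor of every numeric column in `df`."""
    X = df.to_numpy(dtype=float)
    vifs = {}
    for j, col in enumerate(df.columns):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress column j on the remaining columns plus an intercept.
        A = np.column_stack([np.ones(len(y)), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        vifs[col] = np.inf if np.isclose(r2, 1.0) else 1.0 / (1.0 - r2)
    return pd.Series(vifs, name="VIF")


# Synthetic example: `cost_noisy` is almost a copy of `approx_cost`.
rng = np.random.default_rng(0)
votes = rng.normal(size=200)
cost = rng.normal(size=200)
toy = pd.DataFrame({
    "votes": votes,
    "approx_cost": cost,
    "cost_noisy": cost + 0.01 * rng.normal(size=200),
})
vif = compute_vif(toy)
print(vif)
```

A VIF close to 1 indicates little collinearity, while large values (a common rule of thumb is above 5 or 10) flag features that are nearly linear combinations of the others.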

Feature Engineering

  1. Total number of cuisines available in each restaurant

Feature Engineering Summary

  1. Mean value replacement for dish_liked: first we applied response coding to the dish_liked column, followed by mean value replacement. The resulting values are very close to those of the rate column.
  2. Mean value replacement for cuisines: here too, we first applied response coding to the cuisines column, followed by mean value replacement.
  3. Number of cuisines available: this column contains the total number of cuisines available in each restaurant.
  4. Number of dish_liked: this column contains the total number of dishes liked by the customers of each restaurant.
  5. Facilities offered: if the restaurant allows both online_order and booking_table, the value is 2. If it allows only one of them, the value is 1. If it allows neither, the value is 0.
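The features above can be sketched with pandas as follows. The toy rows, the new column names, the "Yes"/"No" encoding and the exact form of the response coding (mean rating per cuisine, averaged over the cuisines of a restaurant) are assumptions made for illustration.

```python
import pandas as pd

toy = pd.DataFrame({
    "cuisines": ["North Indian, Chinese", "South Indian", "Chinese, South Indian, Cafe"],
    "dish_liked": ["Biryani, Chicken", "", "Coffee"],
    "online_order": ["Yes", "No", "Yes"],
    "booking_table": ["Yes", "No", "No"],
    "rate": [4.0, 3.5, 3.0],
})


def split_items(text):
    """Split a comma-separated string into stripped, non-empty items."""
    if pd.isna(text):
        return []
    return [item.strip() for item in str(text).split(",") if item.strip()]


# Items 3 and 4: counts of cuisines and liked dishes per restaurant.
toy["num_cuisines"] = toy["cuisines"].apply(lambda s: len(split_items(s)))
toy["num_dish_liked"] = toy["dish_liked"].apply(lambda s: len(split_items(s)))

# Item 5: 2 if both facilities are offered, 1 if only one, 0 if none.
toy["facilities_offered"] = (
    (toy["online_order"] == "Yes").astype(int)
    + (toy["booking_table"] == "Yes").astype(int)
)

# Items 1 and 2 (shown for cuisines): mean rating of every cuisine,
# then the average of those means over the cuisines of each restaurant.
exploded = toy.assign(cuisine=toy["cuisines"].apply(split_items)).explode("cuisine")
cuisine_mean_rate = exploded.groupby("cuisine")["rate"].mean()
toy["cuisines_mean_rate"] = toy["cuisines"].apply(
    lambda s: cuisine_mean_rate.reindex(split_items(s)).mean()
)

print(toy[["num_cuisines", "num_dish_liked", "facilities_offered", "cuisines_mean_rate"]])
```

Because this encoding uses the target column, the per-cuisine means should be computed on the training split only and then applied to the test split, to avoid leaking ratings into the features.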

Preprocessing of Features

We remove stopwords and other non-essential special characters from the reviews and store the result in the preprocessed_reviews column. Finally, we replace the original review column with the preprocessed_reviews column.
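A minimal sketch of this cleaning step is shown below. The small `STOPWORDS` set and the function name `preprocess_review` are assumptions; a full stopword list (for example from NLTK) would normally be used instead.

```python
import re

# Deliberately tiny stopword set, for illustration only.
STOPWORDS = {"the", "a", "an", "is", "was", "and", "of", "to", "in", "it", "this", "for"}


def preprocess_review(text):
    """Lowercase the text, keep letters only, and drop stopwords."""
    text = text.lower()
    # Replace digits, punctuation and other special characters with spaces.
    text = re.sub(r"[^a-z\s]", " ", text)
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(tokens)


cleaned = preprocess_review("The biryani was AMAZING!!! 10/10, and the service is great :)")
print(cleaned)  # biryani amazing service great
```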

Vectorization

Here we use CountVectorizer for categorical features, TfidfVectorizer for text features and Normalizer for numerical features.
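The combination of the three transformers can be sketched with scikit-learn as below. The toy columns and the way the numeric column is reshaped are assumptions. Note that `Normalizer` rescales each row (sample) to unit norm, so a single numeric column is passed as one row here and reshaped back afterwards.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.preprocessing import Normalizer

toy = pd.DataFrame({
    "location": ["BTM", "HSR", "BTM"],
    "reviews": ["great biryani", "slow service", "great coffee great place"],
    "votes": [775, 12, 88],
})

# Categorical feature: one count column per category token.
cat_vec = CountVectorizer(lowercase=False)
X_location = cat_vec.fit_transform(toy["location"])

# Text feature: TF-IDF weights per word.
text_vec = TfidfVectorizer()
X_reviews = text_vec.fit_transform(toy["reviews"])

# Numeric feature: treat the whole column as one sample so that it is
# scaled to unit L2 norm, then restore the column shape.
votes_row = toy["votes"].to_numpy(dtype=float).reshape(1, -1)
X_votes = Normalizer().fit_transform(votes_row).reshape(-1, 1)

# Stack everything into a single sparse feature matrix.
X = hstack([X_location, X_reviews, csr_matrix(X_votes)]).tocsr()
print(X.shape)
```

In a train/test setting the vectorizers would be fitted on the training split only and then applied to the test split with `transform`.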

Hyperparameter tuning for Random forest algorithm

Here we search for the values of n_estimators and max_depth that give the minimum MSE for the regression model.
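One common way to run such a search is scikit-learn's `GridSearchCV`, sketched below on synthetic data. The grid values, the number of folds and the data are assumptions; the search strategy actually used for this project is not stated.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic regression data standing in for the vectorized features and ratings.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 4))
y = 3.7 + 0.5 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(scale=0.1, size=120)

# Hypothetical candidate values for the two hyperparameters.
param_grid = {"n_estimators": [10, 50], "max_depth": [2, 5]}

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    scoring="neg_mean_squared_error",  # scikit-learn maximizes scores, so MSE is negated
    cv=3,
)
search.fit(X, y)

best_mse = -search.best_score_
print(search.best_params_, best_mse)

# With the default refit=True, the best model is retrained on all the data.
best_model = search.best_estimator_
```

`best_model` can then be used directly for prediction, which corresponds to applying the random forest with the best hyperparameters.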

Applying Random forest model with best hyperparameters

Deep learning models:

We also used a few deep learning models to predict the ratings: LSTM, LSTM-CNN and CNN with Conv1D. For this problem, however, the machine learning models perform better than the deep learning models.

References:

www.appliedaicourse.com
