What is a recommendation system?
The goal of a recommendation algorithm is to recommend or predict items a user might like based on their data or based on the entire user database. Here’s a conceptual pipeline to show the process of recommending a song.
Let’s analyze the system step by step:
The code first reads the dataset with pd.read_csv(path) and stores it in a DataFrame named spotify_df.
It then explores the data, checking for missing values with spotify_df.isnull().sum() and computing the correlation matrix using spotify_df.corr().
Next, it selects the numeric columns with spotify_df.select_dtypes(include=datatypes) and then uses MinMaxScaler to scale each column independently to a range of [0, 1].
It creates a K-means model (KMeans(n_clusters=10)) and fits it to the normalized data. The cluster assignments are stored in the features column of spotify_df.
The Spotify_Recommendation class is defined to build the recommender system. The class takes the Spotify dataset as input. The recommend_songs method in this class is used to recommend songs similar to a given song. The method calculates the distance between the input song and all other songs in the dataset using the sum of absolute differences between attribute values (the Manhattan, or L1, distance). The songs with the lowest distance (i.e., the most similar ones) are recommended.
The code uses seaborn.heatmap to visualize the correlation matrix as a heatmap.
When the recommend_songs method is called, it returns a DataFrame containing the recommended songs with their artists and names.
pip show spotipy
!pip install python-dotenv spotipy
import pandas as pd
import numpy as np
import json
import re
import sys
import itertools
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import MinMaxScaler
import matplotlib.pyplot as plt
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
from spotipy.oauth2 import SpotifyOAuth
import spotipy.util as util
import warnings
warnings.filterwarnings("ignore")
#matplotlib inline
from IPython.display import display, HTML
display(HTML("<style>.container { width:90% !important; }</style>"))
spotify_df = pd.read_csv(path)
spotify_df.info()
spotify_df.isnull().sum()
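# Drop the identifier, text and date columns before computing correlations
numeric_df = spotify_df.drop(columns=['id','name','release_date','year','artists'])
corr_matrix = numeric_df.corr()
import seaborn as sns
# Plot the correlation matrix itself (not the raw data) as a heatmap
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm')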
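To make the correlation step concrete, here is a minimal sketch on a made-up three-column table. The column names and values are illustrative assumptions, not taken from the Spotify dataset. It shows what DataFrame.corr() returns: a symmetric matrix of Pearson coefficients between every pair of columns, which is what a correlation heatmap visualizes.

```python
import pandas as pd

# Hypothetical toy data: 'loudness' rises linearly with 'energy',
# while 'acousticness' falls linearly with it.
toy_corr_df = pd.DataFrame({
    "energy":       [0.1, 0.2, 0.3, 0.4, 0.5],
    "loudness":     [-20.0, -16.0, -12.0, -8.0, -4.0],
    "acousticness": [0.9, 0.7, 0.5, 0.3, 0.1],
})

# Pearson correlation between every pair of columns
toy_corr = toy_corr_df.corr()
print(toy_corr.round(2))
```

Values close to 1 or -1 indicate features that carry nearly the same information, which is worth knowing before feeding them all into a distance computation.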
# Use MinMaxScaler to normalize the data, selecting only the int and float columns
from sklearn.preprocessing import MinMaxScaler
# MinMaxScaler maps the minimum of each feature to zero and the maximum to one.
datatypes = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']
normarization = spotify_df.select_dtypes(include=datatypes)
# fit_transform scales every column independently and returns an array,
# so wrap the result back into a DataFrame with the same columns and index
normarization = pd.DataFrame(MinMaxScaler().fit_transform(normarization),
                             columns=normarization.columns, index=normarization.index)
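The scaling step can be sketched on a small table. Min-max scaling maps each value x in a column to (x - min) / (max - min), which is the transformation MinMaxScaler applies with its default feature_range of (0, 1). The column names and values below are made up for illustration, and the manual formula is included only as a cross-check.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy data with two columns on very different scales
toy_scale_df = pd.DataFrame({
    "tempo": [60.0, 120.0, 180.0],
    "popularity": [0, 50, 100],
})

# MinMaxScaler must be instantiated and fitted; fit_transform returns a NumPy array
scaled = pd.DataFrame(
    MinMaxScaler().fit_transform(toy_scale_df),
    columns=toy_scale_df.columns,
    index=toy_scale_df.index,
)

# The same result computed by hand, column by column
manual = (toy_scale_df - toy_scale_df.min()) / (toy_scale_df.max() - toy_scale_df.min())

print(scaled)
print(np.allclose(scaled, manual))
```

Note that constructing MinMaxScaler alone does not change any data; the scaling only happens when fit_transform (or fit followed by transform) is called.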
# Use K-means clustering to assign each song to a cluster based on its feature values
from sklearn.cluster import KMeans
# Build 10 clusters
kmeans = KMeans(n_clusters=10)
# Fit the clusters to the normalized data and get one cluster label per song
features = kmeans.fit_predict(normarization)
spotify_df['features'] = features
# Scale the cluster labels with MinMaxScaler as well, so the minimum label maps to zero and the maximum to one
spotify_df['features'] = MinMaxScaler().fit_transform(spotify_df[['features']]).ravel()
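As a rough illustration of the clustering step, the sketch below runs KMeans on six made-up, already-normalized points that form two obvious groups. The data, the choice of two clusters, and the random_state and n_init values are assumptions for the example; the walkthrough above uses KMeans(n_clusters=10) without a fixed seed, so its labels can differ between runs.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical normalized data: one group near (0.1, 0.1), another near (0.9, 0.9)
points = np.array([
    [0.10, 0.12], [0.08, 0.09], [0.12, 0.10],
    [0.90, 0.88], [0.92, 0.91], [0.88, 0.90],
])

# random_state makes the initialization reproducible;
# n_init=10 runs ten initializations and keeps the best one
toy_kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)

# fit_predict fits the model and returns one integer cluster label per row
labels = toy_kmeans.fit_predict(points)
print(labels)
print(toy_kmeans.cluster_centers_.round(2))
```

The labels are arbitrary identifiers: which group is called 0 and which is called 1 carries no meaning, only whether two songs share the same label.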
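# Use the data to build a recommender system for a given song
from tqdm import tqdm  # progress bar for the distance loop

class Spotify_Recommendation():
    def __init__(self, dataset):
        self.dataset = dataset

    def recommend_songs(self, songs, amount=1):
        # Distance from the input song to every other song
        distance = []
        # Row of the first song whose name matches the input (case-insensitive)
        song = self.dataset[(self.dataset.name.str.lower() == songs.lower())].head(1).values[0]
        # Candidates: every song except the input; copy so the dataset itself is not modified
        rec = self.dataset[self.dataset.name.str.lower() != songs.lower()].copy()
        for candidate in tqdm(rec.values):
            d = 0
            for col in np.arange(len(rec.columns)):
                # Columns at these positions are excluded from the distance
                if col not in [1, 6, 12, 14, 18]:
                    d = d + np.absolute(float(song[col]) - float(candidate[col]))
            distance.append(d)
        rec['distance'] = distance
        # Smallest distance first, i.e. most similar songs on top
        rec = rec.sort_values('distance')
        columns = ['artists', 'name']
        return rec[columns][:amount]

recommendations = Spotify_Recommendation(spotify_df)
recommendations.recommend_songs("Mixe", 10)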
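The distance computation in the recommender can also be written without explicit Python loops. The following is a minimal sketch, not the class above: recommend_by_l1 is a hypothetical helper, the toy catalogue is made up, and the feature columns are passed by name rather than skipped by position. It computes the same kind of score, the sum of absolute differences between the query song and each candidate.

```python
import pandas as pd

# Hypothetical toy catalogue; names and values are illustrative only
songs_df = pd.DataFrame({
    "name": ["Song A", "Song B", "Song C", "Song D"],
    "artists": ["Artist 1", "Artist 2", "Artist 3", "Artist 4"],
    "energy": [0.80, 0.75, 0.20, 0.90],
    "valence": [0.60, 0.65, 0.10, 0.30],
})

def recommend_by_l1(df, song_name, feature_cols, amount=1):
    """Return the `amount` songs with the smallest L1 distance to `song_name`."""
    is_query = df["name"].str.lower() == song_name.lower()
    if not is_query.any():
        raise ValueError(f"song not found: {song_name!r}")
    # Feature values of the first matching song
    query = df.loc[is_query, feature_cols].iloc[0]
    # Every other song is a candidate; copy to avoid modifying the input
    candidates = df.loc[~is_query].copy()
    # Row-wise L1 distance: |x1 - q1| + |x2 - q2| + ...
    candidates["distance"] = (candidates[feature_cols] - query).abs().sum(axis=1)
    return candidates.sort_values("distance")[["artists", "name", "distance"]].head(amount)

top = recommend_by_l1(songs_df, "song a", ["energy", "valence"], amount=2)
print(top)
```

Selecting features by name avoids depending on column order, and raising an error for an unknown song gives a clearer message than an index error on an empty result.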