What is a recommendation system?

The goal of a recommendation algorithm is to recommend, or predict, items a user might like, based either on that user's own data or on data from the entire user base. Here’s a conceptual pipeline showing the process of recommending a song.

[Figure: conceptual pipeline for recommending a song]

Let’s analyze the system step by step:

  1. Data Loading: The script starts by loading the Spotify dataset from a CSV file using pd.read_csv(path) and stores it in a DataFrame named spotify_df.
  2. Data Preprocessing: The script performs some data preprocessing tasks such as checking for missing values using spotify_df.isnull().sum() and computing the correlation matrix using spotify_df.corr().
  3. Data Normalization: The script applies Min-Max scaling to normalize the numeric columns in the dataset. It selects only the numeric columns from the DataFrame using spotify_df.select_dtypes(include=datatypes) and then uses MinMaxScaler to scale each column independently to the range [0, 1].
  4. K-means Clustering: The normalized numeric columns are then used as features for K-means clustering. The script initializes a K-means clustering model with 10 clusters (KMeans(n_clusters=10)) and fits it to the normalized data. The cluster assignments are stored in the features column of spotify_df.
  5. Recommender System: The Spotify_Recommendation class is defined to build the recommender system. The class takes the Spotify dataset as input. Its recommend_songs method recommends songs similar to a given song. The method calculates the distance between the input song and every other song in the dataset using a distance metric (here, the sum of the absolute differences between attribute values). The songs with the lowest distance (i.e., the most similar ones) are recommended.
  6. Visualization: The script uses seaborn.heatmap to visualize the correlation matrix as a heatmap.
  7. Recommendation Output: Calling the recommend_songs method returns a DataFrame containing the recommended songs with their artists and names.
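
The distance-based recommendation in step 5 can be illustrated on a tiny, made-up table. This is a minimal sketch, not the script's implementation: the DataFrame `toy_songs`, its values, and the helper `nearest_songs` are hypothetical, and the row-by-row loop is replaced by a vectorized sum of absolute differences (the Manhattan, or L1, distance).

```python
import pandas as pd

# Hypothetical toy data; the attribute values are invented for illustration.
toy_songs = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "artists": ["x", "y", "z", "w"],
    "energy": [0.90, 0.85, 0.20, 0.50],
    "valence": [0.80, 0.75, 0.10, 0.40],
})

def nearest_songs(df, song_name, amount=1, feature_cols=("energy", "valence")):
    """Return the `amount` songs closest to `song_name` by L1 distance."""
    cols = list(feature_cols)
    # Case-insensitive lookup of the query song; the first match is used.
    matches = df[df["name"].str.lower() == song_name.lower()]
    if matches.empty:
        raise ValueError(f"song not found: {song_name!r}")
    target = matches.iloc[0][cols].astype(float)
    # Candidates are all songs with a different name.
    candidates = df[df["name"].str.lower() != song_name.lower()].copy()
    # Sum of absolute per-attribute differences for every candidate row.
    candidates["distance"] = (candidates[cols] - target).abs().sum(axis=1)
    return candidates.sort_values("distance")[["artists", "name"]].head(amount)

top = nearest_songs(toy_songs, "a", amount=2)
print(top)
```

With these invented values, song B is closest to A, followed by D. Because the distance is a plain sum, an attribute measured on a much larger scale than the others would dominate it, which is one reason to scale attributes before comparing them.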

!pip show spotipy
!pip install python-dotenv spotipy

import pandas as pd
import numpy as np
import json
import re 
import sys
import itertools

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import MinMaxScaler
import matplotlib.pyplot as plt

import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
from spotipy.oauth2 import SpotifyOAuth
import spotipy.util as util

import warnings
warnings.filterwarnings("ignore")

%matplotlib inline
from IPython.display import display, HTML
display(HTML("<style>.container { width:90% !important; }</style>"))

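# path must point to the Spotify dataset CSV file (its location is not given here)
spotify_df = pd.read_csv(path)
spotify_df.info()
# count the missing values in each column
spotify_df.isnull().sum()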

# drop the identifier and date columns, then compute the pairwise correlations
corr_matrix = spotify_df.drop(columns=['id','name','release_date','year','artists']).corr()
corr_matrix

import seaborn as sns

sns.heatmap(corr_matrix, annot=True, cmap='coolwarm')
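
To show what the heatmap is given, here is a minimal sketch on made-up numbers. The DataFrame `toy` and its values are hypothetical. `DataFrame.corr()` returns a square, symmetric table of Pearson correlations (the default method) with ones on the diagonal.

```python
import pandas as pd

# Hypothetical values: loudness rises with energy, acousticness falls.
toy = pd.DataFrame({
    "energy": [0.1, 0.4, 0.6, 0.9],
    "loudness": [-20.0, -12.0, -9.0, -4.0],
    "acousticness": [0.9, 0.7, 0.3, 0.1],
})

corr = toy.corr()  # one row and one column per numeric attribute
print(corr.round(2))
```

In this toy table, energy correlates positively with loudness and negatively with acousticness, which a heatmap would render as opposite ends of the color scale.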

# Use Min-Max scaling to normalize the data, selecting only the int and float columns
from sklearn.preprocessing import MinMaxScaler
# MinMaxScaler -> the minimum of each feature is mapped to zero and the maximum to one.

datatypes = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']
normarization = spotify_df.select_dtypes(include=datatypes)
# fit_transform scales every column independently to [0, 1];
# wrap the result in a DataFrame to keep the column names and index
normarization = pd.DataFrame(
    MinMaxScaler().fit_transform(normarization),
    columns=normarization.columns,
    index=normarization.index,
)
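
To make the scaling concrete, here is a minimal sketch on made-up numbers that compares the formula x' = (x - min) / (max - min), applied by hand, with the output of `MinMaxScaler`. The array `toy` is hypothetical.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical two-column data on very different scales.
toy = np.array([[60.0, 0.2],
                [120.0, 0.5],
                [180.0, 0.8]])

# By hand: subtract each column's minimum, then divide by its range.
manual = (toy - toy.min(axis=0)) / (toy.max(axis=0) - toy.min(axis=0))

# With scikit-learn: fit_transform learns the per-column min and max
# and applies the same formula.
scaled = MinMaxScaler().fit_transform(toy)

print(np.allclose(manual, scaled))
```

Note that the scaler has to be fitted and applied with `fit_transform` (or `fit` followed by `transform`); constructing a `MinMaxScaler` object alone does not change any data.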

# Use K-means clustering to assign each song to a cluster based on its feature values
from sklearn.cluster import KMeans
# building 10 clusters
kmeans = KMeans(n_clusters=10)
# fitting the clusters to the normalized data
features = kmeans.fit_predict(normarization)
spotify_df['features'] = features
# Rescale the cluster labels with MinMaxScaler so that the minimum becomes zero and the maximum one.
spotify_df['features'] = MinMaxScaler().fit_transform(spotify_df[['features']]).ravel()
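
A minimal sketch of what `fit_predict` returns, on made-up points. The array `points` is hypothetical, and `n_init` and `random_state` are set here only to make the toy run repeatable; the script above does not set them.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical data: two well-separated groups of three points each.
points = np.array([[0.0, 0.1], [0.1, 0.0], [0.05, 0.05],
                   [0.9, 1.0], [1.0, 0.9], [0.95, 0.95]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(points)  # one integer cluster id per row

print(labels)
```

The first three points share one label and the last three share the other. The label numbers themselves are arbitrary identifiers, so a numeric difference between two labels says nothing about how similar the clusters are.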

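# Using the data to build a recommender system for a given song
from tqdm import tqdm  # progress bar for the loop over candidate songs

class Spotify_Recommendation():
    def __init__(self, dataset):
        self.dataset = dataset
    def recommend_songs(self, song, amount=1):
        # distances are collected in a list, initialized as empty
        distance = []
        # look up the input song by name (case-insensitive) and take the first match
        target = self.dataset[(self.dataset.name.str.lower() == song.lower())].head(1).values[0]
        # candidates: every song with a different name (copy so a column can be added safely)
        rec = self.dataset[self.dataset.name.str.lower() != song.lower()].copy()
        for candidate in tqdm(rec.values):
            d = 0
            for col in np.arange(len(rec.columns)):
                # column positions 1, 6, 12, 14 and 18 are skipped
                # (presumably the non-numeric and date columns of this dataset)
                if col not in [1, 6, 12, 14, 18]:
                    d = d + np.absolute(float(target[col]) - float(candidate[col]))
            distance.append(d)
        rec['distance'] = distance
        # smallest distance first, i.e. most similar songs first
        rec = rec.sort_values('distance')
        columns = ['artists', 'name']
        return rec[columns][:amount]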

recommendations = Spotify_Recommendation(spotify_df)
recommendations.recommend_songs("Mixe", 10)