Spring 2025, Midterm 1: Game On!
Version 1.0.1
All of the header information is important. Please read it.
Topics and number of exercises: This problem builds on your knowledge of built-in Python data structures such as lists and sets, nested data structures, math as code, and basic algorithm concepts. It has 9 exercises, numbered 0 to 8. There are 18 available points, but the threshold to earn 100% is 13 points. (Therefore, once you hit 13 points, you can stop. There is no extra credit for exceeding this threshold.)
Exercise ordering: Each exercise builds logically on previous exercises, but you may solve them in any order. That is, if you can't solve an exercise, you can still move on and try the next one. Use this to your advantage, as the exercises are not necessarily ordered by difficulty. Higher point values generally indicate more difficult exercises.
Demo cells: Code cells starting with the comment ### Run Me!!! load results from prior exercises applied to the entire data set and use those to build demo inputs. These must be run for subsequent demos to work properly, but they do not affect the test cells. The data loaded in these cells may be rather large (at least in terms of human readability). You are free to print or otherwise use Python to explore them, but we do not print them in the starter code.
Debugging your code: Right before each exercise's test cell, there is a block of text explaining the variables available to you for debugging. You may use these to test your code and can print/display them as needed. (Be careful when printing large objects; you may want to print just the head or a few rows at a time.)
Exercise point breakdown:
Exercise 0: 2 point(s)
Exercise 1: 3 point(s)
Exercise 2: 2 point(s)
Exercise 3: 3 point(s)
Exercise 4: 1 point(s)
Exercise 5: 2 point(s)
Exercise 6: 2 point(s)
Exercise 7: 1 point(s) - FREE
Exercise 8: 2 point(s)
Final reminders:
### Global imports
import dill
from cse6040_devkit import plugins, utils
from collections import defaultdict, Counter
from math import log
from pprint import pprint
utils.add_from_file('defaultdict_check', plugins)
with open('resource/asnlib/publicdata/user_items.dill', 'rb') as f:
users = dill.load(f)
with open('resource/asnlib/publicdata/games.dill', 'rb') as f:
games = dill.load(f)
Background. As of 2024, Steam is the largest digital distribution platform for selling and distributing video games. It hosts over 30,000 unique titles which are available for consumer purchase. Consumers who purchase games on Steam are provided with an account. These user accounts are tied to a user's game purchases, which makes it possible to see who owns which games. The storefront also tracks information about the games it distributes, such as relevant tags, the number of reviews, the text of individual reviews written by users, and more.
Your overall task. Your goal is to create an individually-tailored recommendation system for Steam. You will create two recommendation systems by taking two different approaches:
1. Content filtering: build a model of the kinds of games a user likes, based on the attributes (tags) of the games they own.
2. Collaborative filtering: score games based on what users similar to our user own.
At the end, we will combine these results to create an ordered list of recommended games which a user could purchase.
The datasets. You will work with two datasets to solve this problem. Both were obtained from research produced at the University of California, San Diego by Julian McAuley's research team (including Wang-Chen Kang). The datasets describe:
1. Steam users and the games (items) they own (`users`).
2. The games distributed on the Steam storefront and their attributes (`games`).
Both datasets are provided as Python lists. If you have not already done so, run the cells above this paragraph to load the data into memory.
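If you'd like a quick first look at the data before starting, here is a short, optional exploration sketch (it relies only on fields that appear later in this notebook; adjust the indices to browse other records):
### Optional: explore the loaded data
print(f'Number of users: {len(users)}')
print(f'Number of games: {len(games)}')
# Each element of `games` is a dictionary of game attributes (see exercise 0).
pprint(sorted(games[0].keys()))
# Each element of `users` includes an `items` list describing the games that user owns.
pprint(users[0]['items'][:2])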
Before we begin creating a recommendation system, we need to deal with the fact that our data are a bit messy. Let's start by cleaning up our inputs and organizing them so it's easier to work with our information later.
dictionary_key_frequency
Your task: define dictionary_key_frequency as follows:
To begin, it will be helpful to get a sense of what sorts of attributes we have access to in our input data and how frequently we have access to that information. You will do this by completing the following task:
Calculate the frequencies of the keys found in a list of dictionaries.
Inputs:
- list_of_dictionaries: A list of dictionaries.
Return:
- key_frequencies: A dictionary mapping each key that appears in any of the input dictionaries to its frequency: the number of dictionaries containing that key divided by the total number of dictionaries, rounded to 6 decimal places.
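One tool worth knowing here (see also the hint below): collections.Counter tallies hashable items in a single pass. A minimal sketch on toy data, unrelated to the exam dataset:
from collections import Counter
# Iterating over a dict yields its keys, so this counts how many
# dictionaries contain each key.
toy_dicts = [{'a': 1, 'b': 2}, {'a': 3}]
key_counts = Counter(key for d in toy_dicts for key in d)
print(key_counts)  # Counter({'a': 2, 'b': 1})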
Hints:
- You may find the Counter() data structure, provided by the collections library, helpful. It is not required to solve the problem.
### Solution - Exercise 0
def dictionary_key_frequency(list_of_dictionaries: list) -> dict:
### BEGIN SOLUTION
keys = [key for element in list_of_dictionaries for key in element]
key_counts = dict(Counter(keys))
num_elements = len(list_of_dictionaries)
for key in key_counts:
proportion = round(key_counts[key] / num_elements, 6)
key_counts[key] = proportion
return key_counts
### END SOLUTION
### Demo function call
demo_list_of_dicts = [
{"a": 1, "b": 2},
{"a": 3, "b": 11, "c": 4},
{"a": 5, "b": 6, "d": 7},
{"b": 8, "c": 9}
]
print('Here is the desired output for `demo_list_of_dicts`:')
pprint(dictionary_key_frequency(demo_list_of_dicts))
print('-------------------------------------------------')
print('Here are the keys in the `games` dataset and their frequencies:')
pprint(dictionary_key_frequency(games))
Whether your solution is working or not, run the following code cell. It will load the proper results into memory and show the expected output for the demo cell above.
with open('resource/asnlib/publicdata/dictionary_key_frequencies_demo.dill', 'rb') as fp:
dictionary_key_frequencies_demo = dill.load(fp)
The demo should display this printed output.
Here is the desired output for `demo_list_of_dicts`:
{'a': 0.75, 'b': 1.0, 'c': 0.5, 'd': 0.25}
-------------------------------------------------
Here are the keys in the `games` dataset and their frequencies:
{'app_name': 0.999938,
'developer': 0.897339,
'discount_price': 0.007002,
'early_access': 1.0,
'genres': 0.897837,
'id': 0.999938,
'metascore': 0.083305,
'price': 0.95715,
'publisher': 0.749432,
'release_date': 0.935678,
'reviews_url': 0.999938,
'sentiment': 0.776505,
'specs': 0.97915,
'tags': 0.994928,
'title': 0.936207,
'url': 1.0}
The cell below will test your solution for dictionary_key_frequency (exercise 0). The testing variables will be available for debugging under the following names in a dictionary format.
- input_vars: Input variables for your solution.
- original_input_vars: Copy of the input variables from prior to running your solution. Any key:value pair in original_input_vars should also exist in input_vars; otherwise, the inputs were modified by your solution.
- returned_output_vars: Outputs returned by your solution.
- true_output_vars: The expected output. This should "match" returned_output_vars based on the question requirements; otherwise, your solution is not returning the correct output.
### Test Cell - Exercise 0
# Load testing utility
with open('resource/asnlib/publicdata/execute_tests', 'rb') as f:
execute_tests = dill.load(f)
# Execute test
passed, test_case_vars, e = execute_tests(func=dictionary_key_frequency,
ex_name='dictionary_key_frequency',
key=b'bXITTMSvN2mdmM8cprP7s1wI32RY8znER4wcdi8MGTY=',
n_iter=50)
# Assign test case vars for debugging
input_vars, original_input_vars, returned_output_vars, true_output_vars = test_case_vars
if e: raise e
assert passed, 'The solution to dictionary_key_frequency did not pass the test.'
### BEGIN HIDDEN TESTS
passed, test_case_vars, e = execute_tests(func=dictionary_key_frequency,
ex_name='dictionary_key_frequency',
key=b'6eyvyfuHYk8oOnG006W1whQrRHKNi9LV7vkKWcQmRwc=',
n_iter=10,
hidden=True)
input_vars, original_input_vars, returned_output_vars, true_output_vars = test_case_vars
if e: raise e
assert passed, 'The solution to dictionary_key_frequency did not pass the test.'
### END HIDDEN TESTS
print('Passed! Please submit.')
organize_game_info
Your task: define organize_game_info as follows:
Organize the information in games_list, whose games are represented as dictionaries with various attributes, into a more useful format for processing with the recommender system.
Inputs:
- games_list: A list of dictionaries, where each dictionary represents a game and may or may not contain the attributes 'id', 'name', and 'tags'. The term "attribute" is not the same as the term "key" in this context. The key for an attribute may or may not be the same as the attribute's name. (e.g., the key for the 'id' attribute may be 'game_id', 'id_number', 'id', or anything else.)
- attribute_map: A dictionary that maps attribute names to their corresponding keys in the game dictionaries. For example: attribute_map['id'] is the key we would use to look up the 'id' attribute in one of the game dictionaries, rather than the string 'id'.
- game_info_lookup: An optional dictionary containing pre-existing game data that should be merged with the newly extracted information. Defaults to None.
Return:
- game_info_lookup: A dictionary mapping each game's ID to a dictionary of the form {'name': ..., 'tags': [...]}, where the tags are lowercased and stripped of surrounding whitespace.
Requirements/steps:
- Extract the 'id', 'name', and 'tags' attributes from each game in games_list, using the attribute_map to determine the correct keys. Skip any game missing its 'id' or 'name'; a missing 'tags' attribute becomes an empty list.
- If game_info_lookup is provided, merge its contents with the newly created result dictionary.
- If a game ID appears both in the new results and in game_info_lookup, the value from game_info_lookup should be kept as part of the post-merge result.
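For instance, one way to merge while keeping existing entries and without mutating the caller's dictionary (a minimal sketch on made-up data, not the required approach; see also the hint below):
existing = {'1': {'name': 'Old Game', 'tags': []}}
merged = existing.copy()                        # shallow copy: new outer dict
merged.setdefault('1', {'name': 'Duplicate'})   # existing entry wins
merged.setdefault('2', {'name': 'New Game'})    # new entry is added
print(merged['1']['name'])   # Old Game
print('2' in existing)       # False -- the caller's dictionary is untouched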
Hints:
- If game_info_lookup is provided, make sure you don't modify the input!
### Solution - Exercise 1
def organize_game_info(games_list: list, attribute_map: dict, game_info_lookup=None) -> dict:
### BEGIN SOLUTION
if not game_info_lookup:
game_info_lookup = dict()
else:
game_info_lookup = game_info_lookup.copy()
id_attr = attribute_map['id']
name_attr = attribute_map['name']
tags_attr = attribute_map['tags']
for game in games_list:
if id_attr not in game or name_attr not in game:
continue
id = game[id_attr]
name = game[name_attr]
tags = game.get(tags_attr, [])
if tags:
tags = [tag.lower().strip() for tag in tags]
if id not in game_info_lookup:
game_info_lookup[id] = {'name': name, 'tags': tags}
return game_info_lookup
### END SOLUTION
### Demo function call
def example_organize_game_info(output, ex_num, game_arg, attr_map_arg, tf, ex_IDs):
print(f'Example {ex_num} ------------------------------------------------------------')
print('`organize_game_info` will be called with the following attribute map:')
pprint(attr_map_arg)
print('\nThe full results are stored in `game_info_lookup_demo`. Here are the results for a subset of the full dictionary.')
pprint({ID: output[ID] for ID in ex_IDs})
print(f'\nThe following should show `{tf}`: {"100" in output}\n')
# Example 0 (A small-scale demo)
print(f'Example 0 ------------------------------------------------------------')
print('This example is just to help you understand the logic of the problem.')
games_list = [
{'name': 'Game A', 'game_tags': ['Action', 'Adventure']},
{'game_id': 2, 'game_name': 'Game B', 'game_tags': ['RPG', 'Strategy']},
{'id': 3, 'game_name': 'Game C', 'game_tags': ['RPG', 'Strategy']},
{'game_id': 4, 'game_name': 'Game D', 'tags': ['FPS', 'Action']},
{'game_id': 4, 'game_name': 'Game D', 'game_tags': ['FPS', 'Action']},
{'game_id': 22, 'game_name': 'Game E', 'game_tags': []}
]
attribute_map = {
'id': 'game_id',
'name': 'game_name',
'tags': 'game_tags'
}
pprint(organize_game_info(games_list, attribute_map))
# Example 1 (without existing dictionary)
attr_map_1 = {
'id': 'id',
'name': 'app_name',
'tags': 'tags'
}
game_info_lookup_demo = organize_game_info(games, attr_map_1)
example_organize_game_info(game_info_lookup_demo, 1, games, attr_map_1, False, ('10', '1002', '100400', '10090'))
# Example 2 (with existing dictionary)
attr_map_2 = {
'id': 'item_id',
'name': 'item_name',
'tags': 'tags'
}
all_user_games = [game for user in users for game in user.get('items', [])]
game_info_lookup_demo = organize_game_info(all_user_games, attr_map_2, game_info_lookup_demo)
example_organize_game_info(game_info_lookup_demo, 2, all_user_games, attr_map_2, True, ('10', '100', '10000'))
Whether your solution is working or not, run the following code cell. It will load the proper results into memory and show the expected output for the demo cell above.
with open('resource/asnlib/publicdata/game_info_lookup_partial_demo.dill', 'rb') as fp:
game_info_lookup_partial_demo = dill.load(fp)
with open('resource/asnlib/publicdata/game_info_lookup_demo.dill', 'rb') as fp:
game_info_lookup_demo = dill.load(fp)
The demo should display this printed output.
Example 0 ------------------------------------------------------------
This example is just to help you understand the logic of the problem.
{2: {'name': 'Game B', 'tags': ['rpg', 'strategy']},
4: {'name': 'Game D', 'tags': []},
22: {'name': 'Game E', 'tags': []}}
Example 1 ------------------------------------------------------------
`organize_game_info` will be called with the following attribute map:
{'id': 'id', 'name': 'app_name', 'tags': 'tags'}
The full results are stored in `game_info_lookup_demo`. Here are the results for a subset of the full dictionary.
{'10': {'name': 'Counter-Strike',
'tags': ['action',
'fps',
'multiplayer',
'shooter',
'classic',
'team-based',
'competitive',
'first-person',
'tactical',
"1990's",
'e-sports',
'pvp',
'military',
'strategy',
'score attack',
'survival',
'assassin',
'1980s',
'ninja',
'tower defense']},
'1002': {'name': 'Rag Doll Kung Fu',
'tags': ['indie', 'fighting', 'multiplayer']},
'100400': {'name': 'Silo 2', 'tags': ['animation & modeling', 'software']},
'10090': {'name': 'Call of Duty: World at War',
'tags': ['zombies',
'world war ii',
'fps',
'action',
'multiplayer',
'shooter',
'moddable',
'co-op',
'first-person',
'singleplayer',
'war',
'online co-op',
'gore',
'historical',
'survival',
'classic',
'tanks',
'great soundtrack',
'adventure',
'horror']}}
The following should show `False`: False
Example 2 ------------------------------------------------------------
`organize_game_info` will be called with the following attribute map:
{'id': 'item_id', 'name': 'item_name', 'tags': 'tags'}
The full results are stored in `game_info_lookup_demo`. Here are the results for a subset of the full dictionary.
{'10': {'name': 'Counter-Strike',
'tags': ['action',
'fps',
'multiplayer',
'shooter',
'classic',
'team-based',
'competitive',
'first-person',
'tactical',
"1990's",
'e-sports',
'pvp',
'military',
'strategy',
'score attack',
'survival',
'assassin',
'1980s',
'ninja',
'tower defense']},
'100': {'name': 'Counter-Strike: Condition Zero Deleted Scenes', 'tags': []},
'10000': {'name': 'Enemy Territory: Quake Wars', 'tags': []}}
The following should show `True`: True
The cell below will test your solution for organize_game_info (exercise 1). The testing variables will be available for debugging under the following names in a dictionary format.
- input_vars: Input variables for your solution.
- original_input_vars: Copy of the input variables from prior to running your solution. Any key:value pair in original_input_vars should also exist in input_vars; otherwise, the inputs were modified by your solution.
- returned_output_vars: Outputs returned by your solution.
- true_output_vars: The expected output. This should "match" returned_output_vars based on the question requirements; otherwise, your solution is not returning the correct output.
### Test Cell - Exercise 1
# Load testing utility
with open('resource/asnlib/publicdata/execute_tests', 'rb') as f:
execute_tests = dill.load(f)
# Execute test
passed, test_case_vars, e = execute_tests(func=organize_game_info,
ex_name='organize_game_info',
key=b'bXITTMSvN2mdmM8cprP7s1wI32RY8znER4wcdi8MGTY=',
n_iter=100)
# Assign test case vars for debugging
input_vars, original_input_vars, returned_output_vars, true_output_vars = test_case_vars
if e: raise e
assert passed, 'The solution to organize_game_info did not pass the test.'
### BEGIN HIDDEN TESTS
passed, test_case_vars, e = execute_tests(func=organize_game_info,
ex_name='organize_game_info',
key=b'6eyvyfuHYk8oOnG006W1whQrRHKNi9LV7vkKWcQmRwc=',
n_iter=10,
hidden=True)
input_vars, original_input_vars, returned_output_vars, true_output_vars = test_case_vars
if e: raise e
assert passed, 'The solution to organize_game_info did not pass the test.'
### END HIDDEN TESTS
print('Passed! Please submit.')
Background. There are a few different ways to create a recommendation system. One approach is called content filtering. In this approach, we build a model which tries to understand what types of things a person likes. Then, we recommend the things which are most similar to the things they tend to like.
We are going to do this by building a variant of a common classification model: the Naive Bayes Classifier.
split_games_by_ownership
Your task: define split_games_by_ownership as follows:
Our model is going to need to know whether a user already owns a game or not.
Return a tuple which contains two sets: the IDs for the games a user owns, and the IDs for the games a user does not own.
Inputs:
- user: A dictionary describing a single user, including the games they own.
- game_lookup: A dictionary of game information. The keys of game_lookup are the IDs of all the games which could potentially be owned by a user.
Return:
- owned: a set containing the IDs of every game owned by user.
- not_owned: a set containing the IDs of every game not owned by user.
Requirements/steps:
- Every game ID found in game_lookup that is not in owned belongs in not_owned.
Hints:
- Users store their games under the items key in their user dictionary.
- You can find a game's ID under the item_id key for each game owned by the user.
### Solution - Exercise 2
def split_games_by_ownership(user: dict, game_lookup: dict) -> tuple:
### BEGIN SOLUTION
user_games = user.get('items', [])
owned = set(map(lambda g: g.get('item_id'), user_games))
all_games = set(game_lookup.keys())
not_owned = all_games - owned
return owned, not_owned
### END SOLUTION
### Demo function call
# Example 0: Simple demo
demo_user = {
'user_id': 123,
'items': [
{'item_id': 'game_a'},
{'item_id': 'game_c'}
]
}
demo_game_lookup = {
'game_a': {'name': 'Game A'},
'game_b': {'name': 'Game B'},
'game_c': {'name': 'Game C'},
'game_d': {'name': 'Game D'}
}
print('The following output is designed to help you understand the logic of the question:')
print(split_games_by_ownership(demo_user, demo_game_lookup))
print('-------------------------------------------------------------')
# Example 1: Expected output
owned_demo, not_owned_demo = split_games_by_ownership(users[0], game_info_lookup_demo)
print('The full output is contained in `owned_demo` and `not_owned_demo`.')
print('Here are the first 15 games owned by the first user, ordered by key:')
pprint(set(sorted([game for game in owned_demo])[:15]))
print('Here are the first 15 games not owned by the first user, ordered by key:')
pprint(set(sorted([game for game in not_owned_demo])[:15]))
Whether your solution is working or not, run the following code cell. It will load the proper results into memory and show the expected output for the demo cell above.
with open('resource/asnlib/publicdata/split_games_by_ownership_demo.dill', 'rb') as fp:
owned_demo, not_owned_demo = dill.load(fp)
The demo should display this printed output.
The following output is designed to help you understand the logic of the question:
({'game_a', 'game_c'}, {'game_b', 'game_d'})
-------------------------------------------------------------
The full output is contained in `owned_demo` and `not_owned_demo`.
Here are the first 15 games owned by the first user, ordered by key:
{'10180',
'10190',
'102600',
'104700',
'104900',
'105600',
'10680',
'107100',
'107300',
'107310',
'108710',
'11200',
'113020',
'113200',
'116100'}
Here are the first 15 games not owned by the first user, ordered by key:
{'10',
'100',
'10000',
'1002',
'100400',
'100410',
'10080',
'10090',
'100970',
'100980',
'10100',
'10110',
'10120',
'10130',
'10140'}
The cell below will test your solution for split_games_by_ownership (exercise 2). The testing variables will be available for debugging under the following names in a dictionary format.
- input_vars: Input variables for your solution.
- original_input_vars: Copy of the input variables from prior to running your solution. Any key:value pair in original_input_vars should also exist in input_vars; otherwise, the inputs were modified by your solution.
- returned_output_vars: Outputs returned by your solution.
- true_output_vars: The expected output. This should "match" returned_output_vars based on the question requirements; otherwise, your solution is not returning the correct output.
### Test Cell - Exercise 2
# Load testing utility
with open('resource/asnlib/publicdata/execute_tests', 'rb') as f:
execute_tests = dill.load(f)
# Execute test
passed, test_case_vars, e = execute_tests(func=split_games_by_ownership,
ex_name='split_games_by_ownership',
key=b'bXITTMSvN2mdmM8cprP7s1wI32RY8znER4wcdi8MGTY=',
n_iter=25)
# Assign test case vars for debugging
input_vars, original_input_vars, returned_output_vars, true_output_vars = test_case_vars
if e: raise e
assert passed, 'The solution to split_games_by_ownership did not pass the test.'
### BEGIN HIDDEN TESTS
passed, test_case_vars, e = execute_tests(func=split_games_by_ownership,
ex_name='split_games_by_ownership',
key=b'6eyvyfuHYk8oOnG006W1whQrRHKNi9LV7vkKWcQmRwc=',
n_iter=5,
hidden=True)
input_vars, original_input_vars, returned_output_vars, true_output_vars = test_case_vars
if e: raise e
assert passed, 'The solution to split_games_by_ownership did not pass the test.'
### END HIDDEN TESTS
print('Passed! Please submit.')
get_tag_probabilities
Your task: define get_tag_probabilities as follows:
A Naive Bayes Classifier needs to know the probability that an item chosen at random from a class (classification group) has a given attribute.
Create a dictionary which maps tags to the smoothed probabilities that a game in a population has each tag.
Inputs:
- game_IDs: A set of game IDs which define a class of games. These IDs should all be present as keys in game_lookup.
- game_lookup: A dictionary mapping game IDs to game metadata. Each game's metadata has a tags key mapped to a list of that game's tags.
- alpha: A provided smoothing constant, which is an integer.
Return:
- smoothed_outputs: A dictionary containing the following elements:
  - tag_probabilities: A dictionary with every tag in the games specified by game_IDs as keys and their smoothed probabilities as the values.
  - smoothed_default: A floating-point value of the formula when $N_i = 0$ (see formula below).
  - bign_i: A dictionary giving, for each tag $i$, the number of games in the class specified by game_IDs which have that tag, represented by $N_i$ (see formula below).
  - bign: The number of games in the class defined by game_IDs, represented by $N$ (see formula below).
  - d: The total number of unique tags (the cardinality of the tags) which appear in ALL the games found in game_lookup, represented by $d$ (see formula below).
Requirements/steps:
- Compute the smoothed probability of each tag appearing in the class defined by game_IDs using the following formula (a worked example follows below):
$$P(x_i \mid y) = \frac{N_i + \alpha}{N + \alpha d}$$
- $N_i$ and $N$ are computed only over the games in game_IDs.
- $d$ is computed over ALL the games in game_lookup.
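For example, a worked instance of the formula with hypothetical numbers: if the class contains $N = 4$ games, $N_i = 2$ of them carry tag $i$, there are $d = 5$ unique tags across all of game_lookup, and $\alpha = 1$, then
$$P(x_i \mid y) = \frac{2 + 1}{4 + 1 \cdot 5} = \frac{3}{9} \approx 0.333333$$
while a tag appearing in none of the class's games would receive the default $\frac{0 + 1}{4 + 5} \approx 0.111111$.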
Hints:
- Extract the tags for the games in game_IDs from game_lookup.
- You may find the len() function and the set() data structure useful.
- You may find the Counter() data structure, provided by the collections library, helpful. It is not required to solve the problem.
### Solution - Exercise 3
def get_tag_probabilities(game_IDs: set, game_lookup: dict, alpha: int) -> dict:
### BEGIN SOLUTION
tags_by_game = [game_lookup[game_ID]['tags'] for game_ID in game_IDs]
total_games = len(tags_by_game)
all_tags = set([
tag
for game
in game_lookup
for tag
in game_lookup[game]['tags']
])
n_tags = len(all_tags)
tag_counts = Counter()
for game_tags in tags_by_game:
tag_counts.update(game_tags)
bign_i = tag_counts.copy()
for tag in tag_counts:
tag_counts[tag] += alpha
tag_counts[tag] /= total_games + (alpha * n_tags)
smoothed_default = alpha / (total_games + (alpha * n_tags))
output = {
'tag_probabilities': dict(tag_counts),
'smoothed_default': smoothed_default,
'bign_i': dict(bign_i),
'bign': total_games,
'd': n_tags
}
return output
### END SOLUTION
### Demo function call
owned_probs_demo = get_tag_probabilities(owned_demo, game_info_lookup_demo, 1)
not_owned_probs_demo = get_tag_probabilities(not_owned_demo, game_info_lookup_demo, 1)
print('The full results are in `owned_probs_demo`.')
# Results for owned games
print('Here are the probabilities of a subset of tags within the games owned by the user:')
pprint({tag: owned_probs_demo['tag_probabilities'][tag] for tag in ('comedy', 'fps', 'strategy', 'puzzle')})
print(f'Your `smoothed_default` value for the games owned by the user is: {owned_probs_demo["smoothed_default"]}')
print('Here are the counts for a subset of tags for `bign_i`:')
pprint({tag: owned_probs_demo['bign_i'][tag] for tag in ('comedy', 'fps', 'strategy', 'puzzle')})
print(f'Your `bign` value for the games owned by the user is: {owned_probs_demo["bign"]}')
print(f'Your `d` value for the games owned by the user is: {owned_probs_demo["d"]}')
# Results for non-owned games
print('------------------------------------------------------------------')
print('Here are the probabilities of a subset of tags within the games not owned by the user:')
pprint({tag: not_owned_probs_demo['tag_probabilities'][tag] for tag in ('comedy', 'fps', 'strategy', 'puzzle')})
print(f'Your `smoothed_default` value for the games not owned by the user is: {not_owned_probs_demo["smoothed_default"]}')
print('Here are the counts for a subset of tags for `bign_i`:')
pprint({tag: not_owned_probs_demo['bign_i'][tag] for tag in ('comedy', 'fps', 'strategy', 'puzzle')})
print(f'Your `bign` value for the games not owned by the user is: {not_owned_probs_demo["bign"]}')
print(f'Your `d` value for the games not owned by the user is: {not_owned_probs_demo["d"]}')
Whether your solution is working or not, run the following code cell. It will load the proper results into memory and show the expected output for the demo cell above.
with open('resource/asnlib/publicdata/owned_probs_demo.dill', 'rb') as fp:
owned_probs_demo = dill.load(fp)
with open('resource/asnlib/publicdata/not_owned_probs_demo.dill', 'rb') as fp:
not_owned_probs_demo = dill.load(fp)
The demo should display this printed output.
The full results are in `owned_probs_demo`.
Here are the probabilities of a subset of tags within the games owned by the user:
{'comedy': 0.10193321616871705,
'fps': 0.08787346221441125,
'puzzle': 0.08084358523725835,
'strategy': 0.11072056239015818}
Your `smoothed_default` value for the games owned by the user is: 0.0017574692442882249
Here are the counts for a subset of tags for `bign_i`:
{'comedy': 57, 'fps': 49, 'puzzle': 45, 'strategy': 62}
Your `bign` value for the games owned by the user is: 230
Your `d` value for the games owned by the user is: 339
------------------------------------------------------------------
Here are the probabilities of a subset of tags within the games not owned by the user:
{'comedy': 0.02604415274463007,
'fps': 0.028639618138424822,
'puzzle': 0.06166467780429594,
'strategy': 0.22389618138424822}
Your `smoothed_default` value for the games not owned by the user is: 2.983293556085919e-05
Here are the counts for a subset of tags for `bign_i`:
{'comedy': 872, 'fps': 959, 'puzzle': 2066, 'strategy': 7504}
Your `bign` value for the games not owned by the user is: 33181
Your `d` value for the games not owned by the user is: 339
The cell below will test your solution for get_tag_probabilities (exercise 3). The testing variables will be available for debugging under the following names in a dictionary format.
- input_vars: Input variables for your solution.
- original_input_vars: Copy of the input variables from prior to running your solution. Any key:value pair in original_input_vars should also exist in input_vars; otherwise, the inputs were modified by your solution.
- returned_output_vars: Outputs returned by your solution.
- true_output_vars: The expected output. This should "match" returned_output_vars based on the question requirements; otherwise, your solution is not returning the correct output.
### Test Cell - Exercise 3
# Load testing utility
with open('resource/asnlib/publicdata/execute_tests', 'rb') as f:
execute_tests = dill.load(f)
# Execute test
passed, test_case_vars, e = execute_tests(func=get_tag_probabilities,
ex_name='get_tag_probabilities',
key=b'bXITTMSvN2mdmM8cprP7s1wI32RY8znER4wcdi8MGTY=',
n_iter=20)
# Assign test case vars for debugging
input_vars, original_input_vars, returned_output_vars, true_output_vars = test_case_vars
if e: raise e
assert passed, 'The solution to get_tag_probabilities did not pass the test.'
### BEGIN HIDDEN TESTS
passed, test_case_vars, e = execute_tests(func=get_tag_probabilities,
ex_name='get_tag_probabilities',
key=b'6eyvyfuHYk8oOnG006W1whQrRHKNi9LV7vkKWcQmRwc=',
n_iter=5,
hidden=True)
input_vars, original_input_vars, returned_output_vars, true_output_vars = test_case_vars
if e: raise e
assert passed, 'The solution to get_tag_probabilities did not pass the test.'
### END HIDDEN TESTS
print('Passed! Please submit.')
tag_probability_with_default
Your task: define tag_probability_with_default as follows:
It would be helpful to easily bundle our smoothed_default into our collection of smoothed probabilities.
Create a default dictionary which returns the smoothed_default for a missing key.
Inputs:
- tag_probabilities: Our dictionary which maps tags to probabilities.
- smoothed_default: Our default smoothed value, which is a float.
Return:
- tag_probability_default: A default dictionary containing the values from tag_probabilities, which returns the smoothed_default for missing keys.
Requirements/steps:
- A defaultdict uses a function to return the default value (see the short sketch below).
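A quick illustration of defaultdict behavior (a standalone sketch with made-up numbers):
from collections import defaultdict
probs = defaultdict(lambda: 0.25, {'action': 0.5})
print(probs['action'])   # 0.5  (existing key)
print(probs['rpg'])      # 0.25 (missing key returns the default...)
print('rpg' in probs)    # True (...and note the lookup also inserted the key)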
Hints:
- You can pass either a lambda or a named function to the defaultdict constructor. Either approach works.
### Solution - Exercise 4
def tag_probability_with_default(tag_probabilities: dict, smoothed_default):
### BEGIN SOLUTION
tag_probability_default = defaultdict(
lambda: smoothed_default,
tag_probabilities
)
return tag_probability_default
### END SOLUTION
### Demo function call
owned_probs_default_demo = tag_probability_with_default(**{k:v for k,v in owned_probs_demo.items() if k in ['tag_probabilities','smoothed_default']})
not_owned_probs_default_demo = tag_probability_with_default(**{k:v for k,v in not_owned_probs_demo.items() if k in ['tag_probabilities','smoothed_default']})
try:
assert "A SUPER SPECIAL MYSTERY KEY" not in owned_probs_default_demo
except:
print('Did you insert the random key into the dictionary before we checked for it?')
try:
assert isinstance(owned_probs_default_demo, defaultdict), "Are you SURE you're returning a default dictionary?"
except:
print('Are you sure your dictionary is a defaultdict?')
print(f'Default value of one of our dictionaries: {owned_probs_demo["smoothed_default"]}')
print(f'Value associated with absent key: {owned_probs_default_demo["A SUPER SPECIAL MYSTERY KEY"]}')
Whether your solution is working or not, run the following code cell. It will load the proper results into memory and show the expected output for the demo cell above.
with open('resource/asnlib/publicdata/owned_probs_default_demo.dill', 'rb') as fp:
owned_probs_default_demo = dill.load(fp)
with open('resource/asnlib/publicdata/not_owned_probs_default_demo.dill', 'rb') as fp:
not_owned_probs_default_demo = dill.load(fp)
The demo should display this printed output.
Default value of one of our dictionaries: 0.0017574692442882249
Value associated with absent key: 0.0017574692442882249
The cell below will test your solution for tag_probability_with_default (exercise 4). The testing variables will be available for debugging under the following names in a dictionary format.
- input_vars: Input variables for your solution.
- original_input_vars: Copy of the input variables from prior to running your solution. Any key:value pair in original_input_vars should also exist in input_vars; otherwise, the inputs were modified by your solution.
- returned_output_vars: Outputs returned by your solution.
- true_output_vars: The expected output. This should "match" returned_output_vars based on the question requirements; otherwise, your solution is not returning the correct output.
### Test Cell - Exercise 4
# Load testing utility
with open('resource/asnlib/publicdata/execute_tests', 'rb') as f:
execute_tests = dill.load(f)
# Execute test
passed, test_case_vars, e = execute_tests(func=plugins.defaultdict_check(tag_probability_with_default),
ex_name='tag_probability_with_default',
key=b'bXITTMSvN2mdmM8cprP7s1wI32RY8znER4wcdi8MGTY=',
n_iter=100)
# Assign test case vars for debugging
input_vars, original_input_vars, returned_output_vars, true_output_vars = test_case_vars
if e: raise e
assert passed, 'The solution to tag_probability_with_default did not pass the test.'
### BEGIN HIDDEN TESTS
passed, test_case_vars, e = execute_tests(func=plugins.defaultdict_check(tag_probability_with_default),
ex_name='tag_probability_with_default',
key=b'6eyvyfuHYk8oOnG006W1whQrRHKNi9LV7vkKWcQmRwc=',
n_iter=10,
hidden=True)
input_vars, original_input_vars, returned_output_vars, true_output_vars = test_case_vars
if e: raise e
assert passed, 'The solution to tag_probability_with_default did not pass the test.'
### END HIDDEN TESTS
print('Passed! Please submit.')
bernoulli_conditional
Your task: define bernoulli_conditional as follows:
A Naive Bayes model requires some conditional probability definition to function.
Define our conditional probability function as outlined below.
Inputs:
- x_tag: Some tag, possibly contained as a key in prob_lookup.
- present: An integer equal to 0 or 1, as defined below.
- prob_lookup: A default dictionary from exercise 4. It will return the smoothed probability for x_tag.
Return:
- conditional_prob: The Bernoulli conditional probability, as defined below.
Requirements/steps:
- Letting $p_i$ be the smoothed probability for x_tag and $x_i$ be the value of present, compute the Bernoulli conditional probability:
$$P(x_i \mid y) = p_i x_i + (1 - p_i)(1 - x_i)$$
Hints:
- $p_i$ is the value in prob_lookup, associated with x_tag.
- You may prefer to write this as an if statement, depending on the value of present. That works too!
### Solution - Exercise 5
def bernoulli_conditional(x_tag: str, present: int, prob_lookup: dict):
### BEGIN SOLUTION
A = prob_lookup[x_tag] * present
B = (1 - prob_lookup[x_tag]) * (1 - present)
return A + B
### END SOLUTION
### Demo function call
# Example 0: Simple Case
prob_lookups = {"category_A": 0.7, "category_B": 0.3}
prob_given_A_present = bernoulli_conditional("category_A", 1, prob_lookups)
prob_given_A_not_present = bernoulli_conditional("category_A", 0, prob_lookups)
prob_given_B_present = bernoulli_conditional("category_B", 1, prob_lookups)
prob_given_B_not_present = bernoulli_conditional("category_B", 0, prob_lookups)
print(f"Category A has a {prob_given_A_present:.3} probability of being present and a {prob_given_A_not_present:.3} probability of not being present.")
print(f"Category B has a {prob_given_B_present:.3} probability of being present and a {prob_given_B_not_present:.3} probability of not being present.")
print("------------------------------------------------------------")
# Example 1: Demo Output
bernoulli_conditional_prob_demo = bernoulli_conditional('strategy', 0, owned_probs_default_demo)
print(f'Your conditional function produced the following probability when the "strategy" tag is absent:\n{bernoulli_conditional_prob_demo}')
bernoulli_conditional_prob_demo = bernoulli_conditional('strategy', 1, owned_probs_default_demo)
print(f'Your conditional function produced the following probability when the "strategy" tag is present:\n{bernoulli_conditional_prob_demo}')
The demo should display this printed output.
Category A has a 0.7 probability of being present and a 0.3 probability of not being present.
Category B has a 0.3 probability of being present and a 0.7 probability of not being present.
------------------------------------------------------------
Your conditional function produced the following probability when the "strategy" tag is absent:
0.8892794376098418
Your conditional function produced the following probability when the "strategy" tag is present:
0.11072056239015818
The cell below will test your solution for bernoulli_conditional (exercise 5). The testing variables will be available for debugging under the following names in a dictionary format.
- input_vars: Input variables for your solution.
- original_input_vars: Copy of the input variables from prior to running your solution. Any key:value pair in original_input_vars should also exist in input_vars; otherwise, the inputs were modified by your solution.
- returned_output_vars: Outputs returned by your solution.
- true_output_vars: The expected output. This should "match" returned_output_vars based on the question requirements; otherwise, your solution is not returning the correct output.
### Test Cell - Exercise 5
# Load testing utility
with open('resource/asnlib/publicdata/execute_tests', 'rb') as f:
execute_tests = dill.load(f)
# Execute test
passed, test_case_vars, e = execute_tests(func=bernoulli_conditional,
ex_name='bernoulli_conditional',
key=b'bXITTMSvN2mdmM8cprP7s1wI32RY8znER4wcdi8MGTY=',
n_iter=100)
# Assign test case vars for debugging
input_vars, original_input_vars, returned_output_vars, true_output_vars = test_case_vars
if e: raise e
assert passed, 'The solution to bernoulli_conditional did not pass the test.'
### BEGIN HIDDEN TESTS
passed, test_case_vars, e = execute_tests(func=bernoulli_conditional,
ex_name='bernoulli_conditional',
key=b'6eyvyfuHYk8oOnG006W1whQrRHKNi9LV7vkKWcQmRwc=',
n_iter=10,
hidden=True)
input_vars, original_input_vars, returned_output_vars, true_output_vars = test_case_vars
if e: raise e
assert passed, 'The solution to bernoulli_conditional did not pass the test.'
### END HIDDEN TESTS
print('Passed! Please submit.')
IMPORTANT! You do not need to read this section to solve exercise 6! You might find it helpful, but it is not required.
Notes. A Naive Bayes Classifier typically works by using the following formula:
$$\hat{y} = \text{argmax}_y P(y)\prod_{i=1}^n P(x_i|y)$$
We calculate the right-hand side of the equation for each class, $y$, and simply choose the class $y$ which maximizes the result.
To create a recommendation system, we will choose the class which maximizes the ratio of the conditional probabilities $P(x_i|y)$ for a game. In other words, we want to recommend the games with the highest value of the following expression:
$$\frac{\prod_{i=1}^n P(x_i|y_{\text{owned}})}{\prod_{i=1}^n P(x_i|y_{\text{not owned}})}$$
However, due to numerical considerations related to small floating-point values, we'll actually take the logarithms of both the numerator and denominator and calculate the difference. So, the quantity we actually want to maximize is:
$$\sum_{i=1}^n\ln{P(x_i|y_{\text{owned}})} - \sum_{i=1}^n\ln{P(x_i|y_{\text{not owned}})}$$
This is what you will compute in exercise 6.
If you want to learn more, you may find the relevant Scikit-Learn page informative!
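To see why the log transform matters, try multiplying a hundred small probabilities directly (a standalone sketch):
from math import log
probs = [1e-5] * 100
product = 1.0
for p in probs:
    product *= p               # the true value, 1e-500, is below float range
print(product)                 # 0.0 -- underflow
print(sum(log(p) for p in probs))  # about -1151.29, a perfectly ordinary float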
Whether your solution is working or not, run the following code cell. It will load a function needed for Exercise 6.
### Run Me!!!
demo_b_cond = utils.load_object_from_publicdata('demo_b_cond')
bernoulli_bayes
Your task: define bernoulli_bayes as follows:
For a given game, return the difference of the log-sums of the conditional probabilities.
Inputs:
- b_cond: A correctly defined bernoulli_conditional function. You may obtain $P(x_i|y)$ by calling b_cond(tag, present, probs) (where probs is either owned_probs or not_owned_probs).
- game: A game dictionary, e.g. {'name': str, 'tags': list}, in the form provided by exercise 1.
- owned_probs: A default dictionary containing the probabilities for $P(x_i|y_{\text{owned}})$.
- not_owned_probs: A default dictionary containing the probabilities for $P(x_i|y_{\text{not owned}})$.
- all_tags: A set containing every tag. $x_i$ is an arbitrary tag in all_tags.
Return:
- log_prob_diff: The difference in the sums of the log probabilities, as defined in the notes above:
$$\sum_{i=1}^n\ln{P(x_i|y_{\text{owned}})} - \sum_{i=1}^n\ln{P(x_i|y_{\text{not owned}})}$$
Requirements/steps:
- Sum the log conditional probabilities over every tag in all_tags for each of the two classes, then return the owned sum minus the not-owned sum.
- If the game has no tags, return 0.
Hints:
- present should be equal to 1 when the tag is present in game. Otherwise, it should be equal to 0.
- Note: the b_cond supplied will fail (generate a KeyError) if you reference any tag outside the feature space defined by all_tags. Use this to your advantage when debugging.
### Solution - Exercise 6
def bernoulli_bayes(b_cond, game: dict, owned_probs: dict, not_owned_probs: dict, all_tags: set):
### BEGIN SOLUTION
tags = set(game['tags'])
if not tags:
return 0
owned_prob = 0
not_owned_prob = 0
for tag in all_tags:
present = 1 if tag in tags else 0
owned_prob += log(b_cond(tag, present, owned_probs))
not_owned_prob += log(b_cond(tag, present, not_owned_probs))
return owned_prob - not_owned_prob
### END SOLUTION
### Demo function call
# Part 1: Computing the log-probabilities
all_tags = set(tag for game in game_info_lookup_demo for tag in game_info_lookup_demo[game]['tags'])
print('Here is the difference in the log-probabilities produced for "Call of Duty: World at War":')
bernoulli_bayes_prob_demo = bernoulli_bayes(
demo_b_cond,
game_info_lookup_demo['10090'],
owned_probs_default_demo,
not_owned_probs_default_demo,
all_tags
)
print(bernoulli_bayes_prob_demo)
# Part 2: Using the scores to create recommendations
game_items = game_info_lookup_demo.values()
game_order = map(
lambda g: bernoulli_bayes(
demo_b_cond,
g,
owned_probs_default_demo,
not_owned_probs_default_demo,
all_tags
),
game_items
)
sorted_games = sorted(zip(game_items, game_order), key=lambda x: x[1], reverse=True)[:15]
content_recs_demo = [game[0]['name'] for game in sorted_games]
print('Here are the 15 highest rated games for our user, based on content filtering:')
pprint(content_recs_demo)
Whether your solution is working or not, run the following code cell. It will show the expected output for the demo cell above.
with open('resource/asnlib/publicdata/bernoulli_bayes_recs.dill', 'rb') as fp:
bernoulli_bayes_recs_demo = dill.load(fp)
The demo should display this printed output.
Here is the difference in the log-probabilities produced for "Call of Duty: World at War":
12.141327968970366
Here are the 15 highest rated games for our user, based on content filtering:
['Call of Juarez® Gunslinger',
'Call of Duty® 4: Modern Warfare®',
'Team Fortress 2',
'Call of Duty®: Ghosts',
'Death Squared',
'Counter-Strike: Global Offensive',
'Killing Floor',
'Hitman: Absolution™',
'Grand Theft Auto IV: Complete Edition',
'Deus Ex: Game of the Year Edition',
'Saints Row 2',
'Left 4 Dead 2',
'Saints Row IV',
'Antichamber',
'Left 4 Dead']
The cell below will test your solution for bernoulli_bayes (exercise 6). The testing variables will be available for debugging under the following names in a dictionary format.
- input_vars: Input variables for your solution.
- original_input_vars: Copy of the input variables from prior to running your solution. Any key:value pair in original_input_vars should also exist in input_vars; otherwise, the inputs were modified by your solution.
- returned_output_vars: Outputs returned by your solution.
- true_output_vars: The expected output. This should "match" returned_output_vars based on the question requirements; otherwise, your solution is not returning the correct output.
### Test Cell - Exercise 6
# Load testing utility
with open('resource/asnlib/publicdata/execute_tests', 'rb') as f:
execute_tests = dill.load(f)
# Execute test
passed, test_case_vars, e = execute_tests(func=bernoulli_bayes,
ex_name='bernoulli_bayes',
key=b'bXITTMSvN2mdmM8cprP7s1wI32RY8znER4wcdi8MGTY=',
n_iter=100)
# Assign test case vars for debugging
input_vars, original_input_vars, returned_output_vars, true_output_vars = test_case_vars
if e: raise e
assert passed, 'The solution to bernoulli_bayes did not pass the test.'
### BEGIN HIDDEN TESTS
passed, test_case_vars, e = execute_tests(func=bernoulli_bayes,
ex_name='bernoulli_bayes',
key=b'6eyvyfuHYk8oOnG006W1whQrRHKNi9LV7vkKWcQmRwc=',
n_iter=10,
hidden=True)
input_vars, original_input_vars, returned_output_vars, true_output_vars = test_case_vars
if e: raise e
assert passed, 'The solution to bernoulli_bayes did not pass the test.'
### END HIDDEN TESTS
print('Passed! Please submit.')
Background. There is another approach we can take to try to determine which games we should recommend to users. This approach is called collaborative filtering. In this approach, we assume that users who are similar to each other are likely to enjoy similar things.
We can exploit this by calculating some score for similarity between two users. Then, we use those scores as weights and sum a collection of weighted votes to rank our items.
You will implement a version of this approach in exercise 8, using the results from the (free!) exercise 7.
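As a toy illustration of similarity-weighted voting (hypothetical similarities and game IDs, not the exam data):
# Each neighbor 'votes' for every game they own, weighted by their
# similarity to our target user.
similarities = {'user_B': 0.8, 'user_C': 0.1}
owned = {'user_B': {'game_1', 'game_2'}, 'user_C': {'game_2', 'game_3'}}
scores = {}
for user, games_owned in owned.items():
    for g in games_owned:
        scores[g] = scores.get(g, 0.0) + similarities[user]
print(scores)  # game_2 collects votes from both neighbors, so it scores highest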
cosine_similarity
Example: we have defined cosine_similarity as follows:
IMPORTANT! The following exercise is free, but you MUST run the cells to earn the points!
The provided function calculates the smoothed cosine similarity of two users, given the collections of games they own.
Inputs:
- game_set_A: A collection of game IDs which are owned by user A.
- game_set_B: A collection of game IDs which are owned by user B.
- alpha: A smoothing constant.
- feature_size: A value used to scale the smoothing constant.
Return:
- smoothed_similarity: The smoothed similarity, as defined below.
Requirements/steps:
- Treating each user's set of games as a binary ownership vector, the solution below computes
$$\text{sim}(A, B) = \frac{|A \cap B| + \alpha}{\sqrt{|A|}\sqrt{|B|} + \alpha \cdot \text{feature\_size}}$$
Notes:
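A worked instance with hypothetical numbers: if user A owns 4 games, user B owns 9, they share 3, and $\alpha = 0$, then
$$\text{sim}(A, B) = \frac{3}{\sqrt{4}\sqrt{9}} = \frac{3}{6} = 0.5$$
With $\alpha = 0.01$ and feature_size $= 28000$ (the values used in the demo below), the same pair instead gets $\frac{3.01}{6 + 280} \approx 0.0105$; the smoothing term dominates the denominator for users with small libraries.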
### Solution - Exercise 7
def cosine_similarity(game_set_A: set, game_set_B: set, alpha: float, feature_size: int):
A_mag = len(game_set_A) ** (1/2)
B_mag = len(game_set_B) ** (1/2)
A_dot_B = len(game_set_A & game_set_B)
smoothed_numerator = A_dot_B + alpha
smoothed_denominator = (A_mag * B_mag) + (alpha * feature_size)
return smoothed_numerator/smoothed_denominator
### Demo function call
set_A = set(item.get('item_id') for item in users[0].get('items'))
set_B = set(item.get('item_id') for item in users[1].get('items'))
cosine_similarity_demo = cosine_similarity(set_A, set_B, 0.01, 28000)
print(f'Our two chosen users have a cosine similarity of:\n{cosine_similarity_demo}')
The demo should display this printed output.
Our two chosen users have a cosine similarity of:
0.12979939616717187
The test cell below will always pass. Please submit to collect your free points for cosine_similarity (exercise 7).
### Test Cell - Exercise 7
print('Passed! Please submit.')
create_collab_scores
Your task: define create_collab_scores as follows:
Use the weights calculated by our similarity function to create weighted scores for the games in ID_to_games.
Inputs:
- user_key: A key which can be used to get the games owned by the user out of the ID_to_games dictionary.
- ID_to_games: A dictionary which maps user IDs to a set of games they own.
- sim_func: A function which calculates the cosine similarity of two users' game sets. Call it by writing sim_func(user_set_A, user_set_B), where the arguments are two sets of user games obtained from ID_to_games.
Return:
- game_scores: A dictionary of ALL game scores, calculated as defined below.
Requirements/steps:
- Calculate the similarity of user_key with every other user in ID_to_games.
- For each other user, add that user's similarity score to the running total for every game they own; a game's final score is the sum of the similarities of all the other users who own it.
- Do NOT compare user_key with itself!
### Solution - Exercise 8
def create_collab_scores(user_key: int, ID_to_games: dict, sim_func):
### BEGIN SOLUTION
game_set_A = ID_to_games[user_key]
cosine_similarities = {
user_ID: sim_func(game_set_A, ID_to_games[user_ID])
for user_ID
in ID_to_games
}
game_scores = defaultdict(int)
for user_ID in ID_to_games:
if user_ID == user_key:
continue
for game in ID_to_games[user_ID]:
game_scores[game] += cosine_similarities[user_ID]
return dict(game_scores)
### END SOLUTION
### Demo function call
# Create our inputs
user_ID_to_games = {
user.get('user_id', None): set(
item.get('item_id', None)
for item
in user.get('items')
) for user in users
}
def wrap_sim_func(alpha, feature_size):
def inner(set_A, set_B):
return cosine_similarity(set_A, set_B, alpha, feature_size)
return inner
demo_sim_func = wrap_sim_func(0.01, 32000)
# Create our collaborative scores
demo_collab_scores = create_collab_scores(users[0]['user_id'], user_ID_to_games, demo_sim_func)
# Inspect our results
collaborative_recs_demo = sorted(demo_collab_scores, key=lambda game: demo_collab_scores[game], reverse=True)[:15]
collaborative_recs_demo = [game_info_lookup_demo[g]['name'] for g in collaborative_recs_demo]
print(f'Collaborative score for "Call of Duty: World at War":\n{demo_collab_scores["10090"]}')
print('Here are the top 15 recommended games for our user, based on collaborative filtering:')
pprint(collaborative_recs_demo)
Whether your solution is working or not, run the following code cell. It will load the proper results into memory and show the expected output for the demo cell above.
with open('resource/asnlib/publicdata/demo_collab_scores.dill', 'rb') as fp:
demo_collab_scores = dill.load(fp)
with open('resource/asnlib/publicdata/demo_collab_recs.dill', 'rb') as fp:
demo_collab_recs = dill.load(fp)
The demo should display this printed output.
Collaborative score for "Call of Duty: World at War":
61.151157105373954
Here are the top 15 recommended games for our user, based on collaborative filtering:
["Garry's Mod",
'Counter-Strike: Global Offensive',
'Left 4 Dead 2',
'Terraria',
'Unturned',
'Portal 2',
'The Elder Scrolls V: Skyrim',
'PAYDAY 2',
'Borderlands 2',
'Counter-Strike: Source',
'Warframe',
'Half-Life 2',
'Portal',
"Sid Meier's Civilization® V",
'Just Cause 2']
The cell below will test your solution for create_collab_scores (exercise 8). The testing variables will be available for debugging under the following names in a dictionary format.
- input_vars: Input variables for your solution.
- original_input_vars: Copy of the input variables from prior to running your solution. Any key:value pair in original_input_vars should also exist in input_vars; otherwise, the inputs were modified by your solution.
- returned_output_vars: Outputs returned by your solution.
- true_output_vars: The expected output. This should "match" returned_output_vars based on the question requirements; otherwise, your solution is not returning the correct output.
### Test Cell - Exercise 8
# Load testing utility
with open('resource/asnlib/publicdata/execute_tests', 'rb') as f:
execute_tests = dill.load(f)
# Execute test
passed, test_case_vars, e = execute_tests(func=create_collab_scores,
ex_name='create_collab_scores',
key=b'bXITTMSvN2mdmM8cprP7s1wI32RY8znER4wcdi8MGTY=',
n_iter=100)
# Assign test case vars for debugging
input_vars, original_input_vars, returned_output_vars, true_output_vars = test_case_vars
if e: raise e
assert passed, 'The solution to create_collab_scores did not pass the test.'
### BEGIN HIDDEN TESTS
passed, test_case_vars, e = execute_tests(func=create_collab_scores,
ex_name='create_collab_scores',
key=b'6eyvyfuHYk8oOnG006W1whQrRHKNi9LV7vkKWcQmRwc=',
n_iter=10,
hidden=True)
input_vars, original_input_vars, returned_output_vars, true_output_vars = test_case_vars
if e: raise e
assert passed, 'The solution to create_collab_scores did not pass the test.'
### END HIDDEN TESTS
print('Passed! Please submit.')
If you've made it this far, congratulations! You are done. Please submit your exam! The remainder of this notebook is a reflection on our approach and food for further thought.
Postscript: Combining Our Results
You might reasonably be wondering: is there a way for us to combine our results from content and collaborative filtering?
The answer is: absolutely! One approach is to weight our rankings by reciprocal rank and then combine them, similar to how we approached exercise 8. The code below shows an example of how we might do this. Note that there are many other approaches we could take; this is just a relatively simple one.
def hybrid_recommendations(content_recs: list, collaborative_recs: list, alpha: float) -> list:
content_weights = [(1/rank) * alpha for rank in range(1, len(content_recs) + 1)]
collaborative_weights = [(1/rank) * (1 - alpha) for rank in range(1, len(collaborative_recs) + 1)]
recommendation_sort_weights = defaultdict(int)
for rec, weight in zip(content_recs, content_weights):
recommendation_sort_weights[rec] += weight
for rec, weight in zip(collaborative_recs, collaborative_weights):
recommendation_sort_weights[rec] += weight
sorted_recs = sorted(recommendation_sort_weights.items(), key=lambda x: x[1], reverse=True)
return sorted_recs
with open('resource/asnlib/publicdata/bernoulli_bayes_recs.dill', 'rb') as fp:
content_recs_demo = dill.load(fp)
with open('resource/asnlib/publicdata/demo_collab_scores.dill', 'rb') as fp:
demo_collab_scores = dill.load(fp)
with open('resource/asnlib/publicdata/demo_collab_recs.dill', 'rb') as fp:
collaborative_recs_demo = dill.load(fp)
hybrid_recs = hybrid_recommendations(content_recs_demo, collaborative_recs_demo, 0.5)
print('Here are our final recommendations, placing equal weight on content and collaborative filtering:')
pprint(hybrid_recs)
A separate question you might have is: "How good is our recommendation system?"
This is a great question. To properly answer this question, we would need to split our data into training and test sets and evaluate some accuracy metric over the test sets. One metric we might be able to use is the MAP@K, or "Mean Average Precision at K". We encourage those of you who are curious to try obtaining some precision metrics and seeing whether you can improve your recommendations!