Golf Performance Analytics: Strokes Gained Calculations
Version 1.0.3
All of the header information is important. Please read it.
Topics and number of exercises: This problem builds on your knowledge of string manipulation, regular expressions, JSON processing, data validation, and statistical analysis. It has 11 exercises, numbered 0 to 10. There are 19 available points; however, to earn 100% the threshold is 15 points. (Therefore, once you hit 15 points you can stop. There is no extra credit for exceeding this threshold.)
Exercise ordering: Each exercise builds logically on previous ones, but you may solve them in any order. That is, if you can't solve an exercise, you can still move on and try the next one. Use this to your advantage, as the exercises are not necessarily ordered by difficulty. Higher point values generally indicate more difficult exercises.
Demo cells: Code cells starting with the comment ### Run Me!!! load results from prior exercises (applied to the entire data set) and use them to build demo inputs. These cells must be run for subsequent demos to work properly, but they do not affect the test cells. The data loaded in these cells may be rather large (at least in terms of human readability). You are free to print or otherwise use Python to explore them, but we may not print them in the starter code.
Debugging your code: Right before each exercise's test cell there is a block of text explaining the variables available to you for debugging. You may use these to test your code, and you can print/display them as needed (be careful when printing large objects; you may want to print the head, or chunks of rows at a time).
Exercise point breakdown:
Exercise 0: 2 point(s)
Exercise 1: 2 point(s)
Exercise 2: 3 point(s)
Exercise 3: 1 point(s)
Exercise 4: 0 point(s)
Exercise 5: 3 point(s)
Exercise 6: 1 point(s)
Exercise 7: 2 point(s)
Exercise 8: 2 point(s)
Exercise 9: 1 point(s)
Exercise 10: 2 point(s)
Final reminders:
Golf is a sport where players hit a ball with a club into a series of holes using as few attempts as possible.
This exam analyzes PGA Tour shot-level data to understand professional golf performance using a metric called "strokes gained".
Strokes gained measures how much a shot decreases the expected number of strokes needed to put the ball in the hole, based on the lie and distance at the start and end of the shot. Strokes gained gives a more complete picture of a golfer's performance in different aspects of the sport than the total score alone can capture.
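The definition above reduces to a one-line formula: the strokes gained on a shot is the expected strokes from its starting position, minus the expected strokes from its ending position, minus one for the stroke itself. The baseline values in the sketch below are hypothetical, purely for illustration.

```python
def strokes_gained(expected_start, expected_end):
    """Strokes gained for one shot: positive means the shot beat the baseline."""
    return expected_start - expected_end - 1

# Hypothetical baselines: ~2.8 expected strokes from 150 yds in the fairway;
# the shot ends 20 ft from the hole, where ~1.9 strokes are expected.
print(round(strokes_gained(2.8, 1.9), 2))  # -> -0.1 (slightly worse than baseline)
```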
The analysis you are about to complete hinges on the central limit theorem (CLT). In short, the theorem states that the mean of a large sample of independent, identically distributed observations is approximately normally distributed around the true population mean, regardless of the shape of the underlying distribution.
For the purposes of this analysis the CLT allows us to accurately estimate the expected value of all possible shots as well as each player's skill by taking sample means over a large sample.
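To make this concrete, the simulation below draws samples of increasing size from a made-up, skewed distribution of strokes-to-hole outcomes; the sample mean settles near the true mean as the sample grows. The outcome mix is hypothetical, purely for illustration.

```python
import random

random.seed(6040)
# Made-up outcome mix for some lie/distance bucket: mostly 2s and 3s, a few 4s/5s.
population = [2] * 50 + [3] * 35 + [4] * 12 + [5] * 3
true_mean = sum(population) / len(population)  # 2.68

for n in (10, 100, 10_000):
    sample = random.choices(population, k=n)
    print(f"n={n:>6}: sample mean = {sum(sample) / n:.3f}")
print(f"true mean: {true_mean:.2f}")
```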
You will process and analyze golf shot data through these phases:
In our analysis we will be starting with a dataset called raw_hole_details. This dataset contains information about each hole played by players across multiple professional golf tournaments, including details about the individual strokes taken on each hole.
The variable raw_hole_details is a list where each element is a dictionary representing a single golf hole played by a player in a tournament round.
[
{
"player_id": 30926,
"tournament_id": "R2024016",
"round_number": 1,
"hole_details": {
"hole_number": 1,
"hole_score": "4",
"hole_yardage": 532,
"stroke_details": [
{
"strokeNumber": 1,
"finalStroke": false,
"distanceRemaining": "156 yds",
"playByPlay": "373 yds to right rough, 156 yds to hole",
"toLocationCode": "ERR",
"fromLocationCode": "OTB",
"toLocation": "Right Rough",
"fromLocation": "Tee Box"
},
{
"strokeNumber": 2,
"finalStroke": false,
"distanceRemaining": "50 yds",
"playByPlay": "106 yds to fairway, 50 yds to hole",
"toLocationCode": "FWY",
"fromLocationCode": "ERR",
"toLocation": "Fairway",
"fromLocation": "Right Rough"
}
// ... more strokes
]
}
}
// ... more hole records
]
Each record contains player_id, tournament_id, round_number, and hole_details. The hole_details dictionary contains hole_number, hole_score, hole_yardage, and a list of stroke_details.
This list of dictionaries contains all the data needed to perform our analysis.
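Since each record is a nested dictionary, a quick way to get oriented is to index down level by level. The sketch below builds a one-stroke sample record in the shape shown above (so it runs standalone) and pulls a field out of each level.

```python
# Sample record mirroring the structure of an element of raw_hole_details.
record = {
    'player_id': 30926,
    'tournament_id': 'R2024016',
    'round_number': 1,
    'hole_details': {
        'hole_number': 1,
        'hole_score': '4',
        'hole_yardage': 532,
        'stroke_details': [
            {'strokeNumber': 1, 'finalStroke': False,
             'distanceRemaining': '156 yds',
             'fromLocation': 'Tee Box', 'toLocation': 'Right Rough'},
        ],
    },
}

hole = record['hole_details']             # hole-level fields: one level down
first_stroke = hole['stroke_details'][0]  # stroke-level fields: two levels down
print(record['player_id'], hole['hole_yardage'], first_stroke['toLocation'])
```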
### Global imports
import dill
from cse6040_devkit import plugins, utils
from cse6040_devkit.training_wheels import run_with_timeout, suppress_stdout
import tracemalloc
from time import time
import re
from collections import defaultdict
from statistics import mean
from statsmodels.nonparametric.smoothers_lowess import lowess
import matplotlib.pyplot as plt
from pprint import pprint
utils.add_from_file('defaultdict_to_dict_recursive', utils)
The cell below loads raw_hole_details as described above. There are some problems with the data which need to be addressed. Many of the holes are missing key pieces of information. Additionally, many of the holes involve scenarios which would require more detailed knowledge of the rules of golf than what we described in the primer.
In the exercise below, you will write some code to identify these holes so we can filter them out and not consider them in the analysis.
### Run Me!!!
raw_hole_details = utils.load_object_from_publicdata('raw_hole_details')
identify_complex_hole
Your task: define identify_complex_hole
as follows:
Analyze the details of a golf hole to identify specific conditions and data completeness.

Args:
- hole_details (dict): A dictionary containing information about a golf hole, including 'hole_score' and a list of 'stroke_details'.
  - 'hole_score' (str): The score for the hole, or an empty string if not available.
  - 'stroke_details' (list): A list of dictionaries, each representing a stroke with keys:
    - 'strokeNumber' (int): The sequential number of the stroke.
    - 'distanceRemaining' (str): The remaining distance after the stroke, or an empty string.
    - 'playByPlay' (str, optional): A description of the stroke; may include keywords like 'penalty', 'drop', or 'provisional'.

Returns:
- dict: A dictionary with the following boolean flags:
  - 'penalty' (bool): True if any stroke's 'playByPlay' mentions a "penalty" (case-insensitive). Otherwise, False.
  - 'drop' (bool): True if any stroke's 'playByPlay' mentions a "drop" (case-insensitive). Otherwise, False.
  - 'provisional' (bool): True if any stroke's 'playByPlay' mentions a "provisional" (case-insensitive). Otherwise, False.
  - 'has_distances' (bool): True if any stroke has a non-empty 'distanceRemaining'. Otherwise, False.
  - 'strokes_in_sequence' (bool): True if all strokes are in sequential order. Otherwise, False.
  - 'no_score' (bool): True if 'hole_score' is an empty string. Otherwise, False.

Note: If 'stroke_details' is empty, then 'penalty', 'drop', 'provisional', 'has_distances', and 'strokes_in_sequence' are all False.

Implementation Notes
The starter code has a return statement with the correct format, and sets all of the values to False. It is up to you to update them based on the content of hole_details.
### Solution - Exercise 0
def identify_complex_hole(hole_details):
penalty = False
drop = False
provisional = False
has_distances = False
strokes_in_sequence = False
no_score = False
### BEGIN SOLUTION
if hole_details.get('stroke_details'):
strokes_in_sequence = True
no_score = hole_details.get('hole_score', '') == ''
for i, stroke in enumerate(hole_details['stroke_details']):
if stroke['strokeNumber'] != i + 1:
strokes_in_sequence = False
if stroke['distanceRemaining'] != '':
has_distances = True
play_by_play = stroke.get('playByPlay', '').lower()
if 'penalty' in play_by_play:
penalty = True
if 'drop' in play_by_play:
drop = True
if 'provisional' in play_by_play:
provisional = True
### END SOLUTION
return {
'penalty': penalty,
'drop': drop,
'provisional': provisional,
'has_distances': has_distances,
'strokes_in_sequence': strokes_in_sequence,
'no_score': no_score
}
### Demo function call
test_holes = [
{
'hole_score': '4',
'stroke_details': [
{'strokeNumber': 1, 'distanceRemaining': '156 yds', 'playByPlay': '373 yds to fairway'},
{'strokeNumber': 2, 'distanceRemaining': '33 ft', 'playByPlay': '165 yds to green'},
{'strokeNumber': 3, 'distanceRemaining': '', 'playByPlay': 'In the hole'}
]
},
{
'hole_score': '5',
'stroke_details': [
{'strokeNumber': 1, 'distanceRemaining': '', 'playByPlay': 'Tee shot with penalty'},
{'strokeNumber': 2, 'distanceRemaining': '180 yds', 'playByPlay': 'After drop to fairway'}
]
},
{
'hole_score': '',
'stroke_details': [
{'strokeNumber': 1, 'distanceRemaining': '', 'playByPlay': 'First shot'},
{'strokeNumber': 3, 'distanceRemaining': '', 'playByPlay': 'Provisional ball needed'}
]
}
]
results = []
for i, hole in enumerate(test_holes):
result = identify_complex_hole(hole)
print(f"identify_complex_hole(test_holes[{i}])")
print(f"--> {result}")
results.append(result)
The demo should display this printed output.
identify_complex_hole(test_holes[0])
--> {'penalty': False, 'drop': False, 'provisional': False, 'has_distances': True, 'strokes_in_sequence': True, 'no_score': False}
identify_complex_hole(test_holes[1])
--> {'penalty': True, 'drop': True, 'provisional': False, 'has_distances': True, 'strokes_in_sequence': True, 'no_score': False}
identify_complex_hole(test_holes[2])
--> {'penalty': False, 'drop': False, 'provisional': True, 'has_distances': False, 'strokes_in_sequence': False, 'no_score': True}
The cell below will test your solution for identify_complex_hole (exercise 0). The testing variables will be available for debugging under the following names in a dictionary format.
- input_vars: Input variables for your solution.
- original_input_vars: Copy of input variables from prior to running your solution. Any key:value pair in original_input_vars should also exist in input_vars; otherwise, the inputs were modified by your solution.
- returned_output_vars: Outputs returned by your solution.
- true_output_vars: The expected output. This should "match" returned_output_vars based on the question requirements; otherwise, your solution is not returning the correct output.
### Test Cell - Exercise 0
from cse6040_devkit.tester_fw.testers import Tester
from yaml import safe_load
from time import time
tracemalloc.start()
mem_start, peak_start = tracemalloc.get_traced_memory()
print(f"initial memory usage: {mem_start/1024/1024:.2f} MB")
# Load testing utility
with open('resource/asnlib/publicdata/execute_tests', 'rb') as f:
executor = dill.load(f)
@run_with_timeout(error_threshold=200.0, warning_threshold=100.0)
@suppress_stdout
def execute_tests(**kwargs):
return executor(**kwargs)
# Execute test
start_time = time()
passed, test_case_vars, e = execute_tests(func=identify_complex_hole,
ex_name='identify_complex_hole',
key=b'Xu3iSVjUVUiK2GstlArLkir4gmMFaLsb37QrwkeA1vE=',
n_iter=103)
# Assign test case vars for debugging
input_vars, original_input_vars, returned_output_vars, true_output_vars = test_case_vars
duration = time() - start_time
print(f"Test duration: {duration:.2f} seconds")
current_memory, peak_memory = tracemalloc.get_traced_memory()
print(f"memory after test: {current_memory/1024/1024:.2f} MB")
print(f"memory peak during test: {peak_memory/1024/1024:.2f} MB")
tracemalloc.stop()
if e: raise e
assert passed, 'The solution to identify_complex_hole did not pass the test.'
### BEGIN HIDDEN TESTS
start_time = time()
tracemalloc.start()
mem_start, peak_start = tracemalloc.get_traced_memory()
print(f"initial memory usage: {mem_start/1024/1024:.2f} MB")
passed, test_case_vars, e = execute_tests(func=identify_complex_hole,
ex_name='identify_complex_hole',
key=b'n2DFq7sQKymR55EWuGJD3DJTpo_CW7Hw0fqiTGQ9x_Y=',
n_iter=103,
hidden=True)
input_vars, original_input_vars, returned_output_vars, true_output_vars = test_case_vars
duration = time() - start_time
print(f"Test duration: {duration:.2f} seconds")
current_memory, peak_memory = tracemalloc.get_traced_memory()
print(f"memory after test: {current_memory/1024/1024:.2f} MB")
print(f"memory peak during test: {peak_memory/1024/1024:.2f} MB")
tracemalloc.stop()
if e: raise e
assert passed, 'The solution to identify_complex_hole did not pass the test.'
### END HIDDEN TESTS
print('Passed! Please submit.')
Application of identify_complex_hole
We used a correct implementation of identify_complex_hole in the code snippet below to produce simple_hole_details.
simple_hole_details = []
for hole in raw_hole_details:
hole_analysis = identify_complex_hole(hole['hole_details'])
keep_hole = (
not hole_analysis['penalty'] and
not hole_analysis['drop'] and
not hole_analysis['no_score'] and
hole_analysis['has_distances'] and
hole_analysis['strokes_in_sequence'] and
not hole_analysis['provisional']
)
if keep_hole:
simple_hole_details.append(hole)
The pre-computed result is loaded in the cell below.
### Run Me!!!
simple_hole_details = utils.load_object_from_publicdata('simple_hole_details')
The data in simple_hole_details has the same structure as raw_hole_details: a nested structure where the information about each individual shot is contained within a "hole-level" dictionary. See the example below:
{
'player_id': 30926,
'tournament_id': 'R2024016',
'round_number': 1,
'hole_details': {
'hole_number': 1,
'hole_score': '4',
'hole_yardage': 532,
'stroke_details': [
{
'strokeNumber': 1,
'finalStroke': False,
'distanceRemaining': '156 yds',
'playByPlay': '373 yds to right rough, 156 yds to hole',
'toLocationCode': 'ERR',
'fromLocationCode': 'OTB',
'toLocation': 'Right Rough',
'fromLocation': 'Tee Box'
},
... # three more shot dictionaries for the same player, tournament, round, and hole
]
}
}
We want to "flatten" it into a list of simple (non-nested) dictionaries, where each dictionary represents a single shot. Each individual shot will contribute to the baseline and eventually have a strokes gained value calculated.
extract_shot_records
Your task: define extract_shot_records
as follows:
Flattens the nested dictionary structure of individual_hole_data into a list of simple dictionaries.

Args:
- individual_hole_data (dict): A dictionary with the following keys:
  - 'player_id' (int): Unique identifier for the player.
  - 'tournament_id' (str): Unique identifier for the tournament.
  - 'round_number' (int): The round number within the tournament.
  - 'hole_details' (dict): Contains:
    - 'hole_number' (int): The number of the hole.
    - 'hole_score' (str): The score for the hole.
    - 'hole_yardage' (int): The yardage of the hole.
    - 'stroke_details' (list of dict): Each dict represents one stroke and contains the following keys:
      - 'strokeNumber' (int): The stroke number.
      - 'fromLocation' (str): The starting lie/location of the shot.
      - 'toLocation' (str): The ending lie/location of the shot.
      - 'distanceRemaining' (str): Distance remaining after the shot.
      - 'finalStroke' (bool): Whether this stroke finished the hole.

Returns:
- list of dict: A list where each element is a dictionary representing a shot record with the following keys:
  - 'player_id' (int): comes directly from individual_hole_data['player_id']
  - 'tournament_id' (str): comes directly from individual_hole_data['tournament_id']
  - 'round_number' (int): comes directly from individual_hole_data['round_number']
  - 'hole_number' (int): comes directly from hole_details['hole_number']
  - 'score' (str): comes directly from hole_details['hole_score']
  - 'yardage' (str): hole_details['hole_yardage'] converted to a string
  - 'stroke_number' (int): comes from stroke['strokeNumber']
  - 'strokes_to_hole' (int): calculated as the score minus the stroke_number plus one
  - 'start_distance' (str): the starting distance for the stroke. For the first stroke, it is the hole's yardage; for subsequent strokes, it is the 'distanceRemaining' value from the previous stroke.
  - 'start_lie' (str): comes directly from stroke['fromLocation']
  - 'end_lie' (str): comes directly from stroke['toLocation'] if available, otherwise 'Hole'. If stroke['toLocation'] is an empty string (''), the 'end_lie' value is 'Hole'.
  - 'end_distance' (str): stroke['distanceRemaining'] if available, otherwise '0'. If stroke['distanceRemaining'] is an empty string (''), the 'end_distance' value is '0'.

Implementation Notes
- individual_hole_data represents one golfer playing one hole a single time.
- Create one shot record for each stroke in individual_hole_data['hole_details']['stroke_details'].
- Some values (derived from stroke) will vary with each stroke dictionary in the stroke details.
- Other values (not derived from stroke) will be constant.
- The starter code initializes shot_records and extracts the hole-level values. It's up to you to populate shot_records.
### Solution - Exercise 1
def extract_shot_records(individual_hole_data):
shot_records = []
hole_details = individual_hole_data['hole_details']
stroke_details = hole_details['stroke_details']
hole_level_values = {
'player_id': individual_hole_data['player_id'],
'tournament_id': individual_hole_data['tournament_id'],
'round_number': individual_hole_data['round_number'],
'hole_number': hole_details['hole_number'],
'score': hole_details['hole_score'],
'yardage': str(hole_details['hole_yardage']),
}
### BEGIN SOLUTION
for i, stroke in enumerate(stroke_details):
if i == 0:
start_distance = str(hole_details['hole_yardage'])
else:
start_distance = stroke_details[i-1]['distanceRemaining']
end_distance = stroke['distanceRemaining'] if stroke['distanceRemaining'] else '0'
end_lie = stroke['toLocation'] if stroke['toLocation'] else 'Hole'
shot_records.append({
**hole_level_values,
'stroke_number': stroke['strokeNumber'],
'strokes_to_hole': int(hole_details['hole_score']) - stroke['strokeNumber'] + 1,
'start_distance': start_distance,
'start_lie': stroke.get('fromLocation', ''),
'end_lie': end_lie,
'end_distance': end_distance,
})
### END SOLUTION
return shot_records
### Demo function call
sample_individual_hole_data = {
'hole_details': {
'hole_number': 1,
'hole_score': '4',
'hole_yardage': 532,
'stroke_details': [
{'distanceRemaining': '156 yds', 'strokeNumber': 1, 'fromLocation': 'Tee Box', 'toLocation': 'Right Rough'},
{'distanceRemaining': '33 ft 7 in.', 'strokeNumber': 2, 'fromLocation': 'Primary Rough', 'toLocation': 'Right Intermediate'},
{'distanceRemaining': '7 in', 'strokeNumber': 3, 'fromLocation': 'Intermediate Rough', 'toLocation': 'Green'},
{'distanceRemaining': '', 'strokeNumber': 4, 'fromLocation': 'Green', 'toLocation': ''}
]
},
'player_id': 30926,
'round_number': 1,
'tournament_id': 'R2024016'
}
result = extract_shot_records(sample_individual_hole_data)
pprint(result)
The demo should display this printed output.
[{'end_distance': '156 yds',
'end_lie': 'Right Rough',
'hole_number': 1,
'player_id': 30926,
'round_number': 1,
'score': '4',
'start_distance': '532',
'start_lie': 'Tee Box',
'stroke_number': 1,
'strokes_to_hole': 4,
'tournament_id': 'R2024016',
'yardage': '532'},
{'end_distance': '33 ft 7 in.',
'end_lie': 'Right Intermediate',
'hole_number': 1,
'player_id': 30926,
'round_number': 1,
'score': '4',
'start_distance': '156 yds',
'start_lie': 'Primary Rough',
'stroke_number': 2,
'strokes_to_hole': 3,
'tournament_id': 'R2024016',
'yardage': '532'},
{'end_distance': '7 in',
'end_lie': 'Green',
'hole_number': 1,
'player_id': 30926,
'round_number': 1,
'score': '4',
'start_distance': '33 ft 7 in.',
'start_lie': 'Intermediate Rough',
'stroke_number': 3,
'strokes_to_hole': 2,
'tournament_id': 'R2024016',
'yardage': '532'},
{'end_distance': '0',
'end_lie': 'Hole',
'hole_number': 1,
'player_id': 30926,
'round_number': 1,
'score': '4',
'start_distance': '7 in',
'start_lie': 'Green',
'stroke_number': 4,
'strokes_to_hole': 1,
'tournament_id': 'R2024016',
'yardage': '532'}]
The cell below will test your solution for extract_shot_records (exercise 1). The testing variables will be available for debugging under the following names in a dictionary format.
- input_vars: Input variables for your solution.
- original_input_vars: Copy of input variables from prior to running your solution. Any key:value pair in original_input_vars should also exist in input_vars; otherwise, the inputs were modified by your solution.
- returned_output_vars: Outputs returned by your solution.
- true_output_vars: The expected output. This should "match" returned_output_vars based on the question requirements; otherwise, your solution is not returning the correct output.
### Test Cell - Exercise 1
from cse6040_devkit.tester_fw.testers import Tester
from yaml import safe_load
from time import time
tracemalloc.start()
mem_start, peak_start = tracemalloc.get_traced_memory()
print(f"initial memory usage: {mem_start/1024/1024:.2f} MB")
# Load testing utility
with open('resource/asnlib/publicdata/execute_tests', 'rb') as f:
executor = dill.load(f)
@run_with_timeout(error_threshold=200.0, warning_threshold=100.0)
@suppress_stdout
def execute_tests(**kwargs):
return executor(**kwargs)
# Execute test
start_time = time()
passed, test_case_vars, e = execute_tests(func=extract_shot_records,
ex_name='extract_shot_records',
key=b'Xu3iSVjUVUiK2GstlArLkir4gmMFaLsb37QrwkeA1vE=',
n_iter=100)
# Assign test case vars for debugging
input_vars, original_input_vars, returned_output_vars, true_output_vars = test_case_vars
duration = time() - start_time
print(f"Test duration: {duration:.2f} seconds")
current_memory, peak_memory = tracemalloc.get_traced_memory()
print(f"memory after test: {current_memory/1024/1024:.2f} MB")
print(f"memory peak during test: {peak_memory/1024/1024:.2f} MB")
tracemalloc.stop()
if e: raise e
assert passed, 'The solution to extract_shot_records did not pass the test.'
### BEGIN HIDDEN TESTS
start_time = time()
tracemalloc.start()
mem_start, peak_start = tracemalloc.get_traced_memory()
print(f"initial memory usage: {mem_start/1024/1024:.2f} MB")
passed, test_case_vars, e = execute_tests(func=extract_shot_records,
ex_name='extract_shot_records',
key=b'n2DFq7sQKymR55EWuGJD3DJTpo_CW7Hw0fqiTGQ9x_Y=',
n_iter=100,
hidden=True)
input_vars, original_input_vars, returned_output_vars, true_output_vars = test_case_vars
duration = time() - start_time
print(f"Test duration: {duration:.2f} seconds")
current_memory, peak_memory = tracemalloc.get_traced_memory()
print(f"memory after test: {current_memory/1024/1024:.2f} MB")
print(f"memory peak during test: {peak_memory/1024/1024:.2f} MB")
tracemalloc.stop()
if e: raise e
assert passed, 'The solution to extract_shot_records did not pass the test.'
### END HIDDEN TESTS
print('Passed! Please submit.')
Application of extract_shot_records
We used a correct implementation of extract_shot_records to build raw_shot_records with the code snippet below.
raw_shot_records = []
for individual_hole_data in simple_hole_details:
hole_records = extract_shot_records(individual_hole_data)
raw_shot_records.extend(hole_records)
It is loaded in a code cell later in the exam, close to where it is used in the analysis.
Distance is a continuous measure, but we want to compute expected values based on these distances. In golf, there is not much difference in difficulty between a 220-yard shot and a 221-yard shot. It makes sense to treat shots within small distance intervals as having the same distance (e.g., treat the shots where $220$ yards $\lt$ distance $\le 230$ yards as all being 230 yards from the hole). This will increase the accuracy of the mean calculations and reduce noise.
Additionally, the distances in all the data we have seen so far are given as strings containing a value and a unit. We can't do math with strings, so we will need to separate the value from the unit.
In the exercise below you will parse a distance string into its value and unit, and use rounding to "bin" the values.
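The "bin by rounding up" idea reduces to one line of arithmetic: divide by the interval, take the ceiling, multiply back. A minimal sketch:

```python
from math import ceil

def round_up(value, interval):
    """Round value up to the next multiple of interval (used to bin distances)."""
    return ceil(value / interval) * interval

# 10-yd bins: 220 stays at 220 (already a multiple); 221 through 230 all map to 230.
print(round_up(220, 10), round_up(221, 10), round_up(230, 10))  # -> 220 230 230
```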
parse_distance
Your task: define parse_distance
as follows:
Parses a distance string and normalizes it to the nearest interval in yards or feet.
The function supports the following input formats (whitespace between the number and the unit is optional):
- 'N yds': a whole number of yards
- 'N': a bare whole number, interpreted as yards
- 'N ft M in': feet and inches
- 'N ft': feet only
- 'N in': inches only

The parsed value is rounded up to the nearest multiple of the specified interval.

Args:
- distance_str (str): The distance string to parse.
- yards_interval (int): The interval to round up yards values.
- feet_interval (int): The interval to round up feet values.

Returns:
- tuple: A tuple (value, unit), where value is the normalized distance (int) and unit is either 'yds' or 'ft'.

Raises:
- ValueError: If the input string does not match any supported format.

Implementation Notes
- The starter code already handles the case where distance_str is empty or '0'.
- The starter code defines a helper function, round_up. This can be used to round up to the next interval as required by this exercise.
### Solution - Exercise 2
def parse_distance(distance_str, yards_interval, feet_interval):
from math import ceil
if distance_str in ('0', ''):
return 0, 'ft'
def round_up(value, interval):
if isinstance(value, str):
value = int(value)
return ceil(value / interval) * interval
### BEGIN SOLUTION
yds_match = re.match(r'^(?P<yards>\d+)\s*yds$', distance_str)
no_unit_match = re.match(r'^(?P<yards>\d+)$', distance_str)
# Allow an optional trailing period after "in" (the raw data contains e.g. '33 ft 7 in.')
ft_in_match = re.match(r'^(?P<feet>\d+)\s*ft\s*(?P<inches>\d+)\s*in\.?$', distance_str)
ft_match = re.match(r'^(?P<feet>\d+)\s*ft$', distance_str)
in_match = re.match(r'^(?P<inches>\d+)\s*in\.?$', distance_str)
value, unit = None, None
if yds_match:
value, unit = yds_match.group('yards'), 'yds'
elif no_unit_match:
value, unit = no_unit_match.group('yards'), 'yds'
elif ft_in_match:
value, unit = int(ft_in_match.group('feet')) + int(ft_in_match.group('inches'))/12, 'ft'
elif ft_match:
value, unit = ft_match.group('feet'), 'ft'
elif in_match:
value, unit = int(in_match.group('inches'))/12, 'ft'
else:
raise ValueError(f"Invalid distance format: {distance_str}")
if unit == 'yds':
interval = yards_interval
elif unit == 'ft':
interval = feet_interval
return round_up(value, interval), unit
### END SOLUTION
### Demo function call
test_cases = [
('167 yds', 10, 1),
('30 ft 6 in', 10, 1),
('3 ft 4 in', 5, 2),
('422', 25, 5),
('15 ft', 10, 3),
('8 in', 10, 1),
('150 meters', 10, 1),
('16.5 yds', 10, 1)
]
results = []
for i, (distance_str, yards_interval, feet_interval) in enumerate(test_cases):
try:
result = parse_distance(distance_str, yards_interval, feet_interval)
print(f"parse_distance(test_cases[{i}][0], test_cases[{i}][1], test_cases[{i}][2])")
print(f"--> {result}")
results.append(result)
except ValueError as e:
print(f"parse_distance(test_cases[{i}][0], test_cases[{i}][1], test_cases[{i}][2])")
print(f"--> ValueError: {e}")
results.append(f"ValueError: {e}")
The demo should display this printed output.
parse_distance(test_cases[0][0], test_cases[0][1], test_cases[0][2])
--> (170, 'yds')
parse_distance(test_cases[1][0], test_cases[1][1], test_cases[1][2])
--> (31, 'ft')
parse_distance(test_cases[2][0], test_cases[2][1], test_cases[2][2])
--> (4, 'ft')
parse_distance(test_cases[3][0], test_cases[3][1], test_cases[3][2])
--> (425, 'yds')
parse_distance(test_cases[4][0], test_cases[4][1], test_cases[4][2])
--> (15, 'ft')
parse_distance(test_cases[5][0], test_cases[5][1], test_cases[5][2])
--> (1, 'ft')
parse_distance(test_cases[6][0], test_cases[6][1], test_cases[6][2])
--> ValueError: Invalid distance format: 150 meters
parse_distance(test_cases[7][0], test_cases[7][1], test_cases[7][2])
--> ValueError: Invalid distance format: 16.5 yds
The cell below will test your solution for parse_distance (exercise 2). The testing variables will be available for debugging under the following names in a dictionary format.
- input_vars: Input variables for your solution.
- original_input_vars: Copy of input variables from prior to running your solution. Any key:value pair in original_input_vars should also exist in input_vars; otherwise, the inputs were modified by your solution.
- returned_output_vars: Outputs returned by your solution.
- true_output_vars: The expected output. This should "match" returned_output_vars based on the question requirements; otherwise, your solution is not returning the correct output.
### Test Cell - Exercise 2
from cse6040_devkit.tester_fw.testers import Tester
from yaml import safe_load
from time import time
tracemalloc.start()
mem_start, peak_start = tracemalloc.get_traced_memory()
print(f"initial memory usage: {mem_start/1024/1024:.2f} MB")
# Load testing utility
with open('resource/asnlib/publicdata/execute_tests', 'rb') as f:
executor = dill.load(f)
@run_with_timeout(error_threshold=200.0, warning_threshold=100.0)
@suppress_stdout
def execute_tests(**kwargs):
return executor(**kwargs)
# Execute test
start_time = time()
passed, test_case_vars, e = execute_tests(func=plugins.error_handler(parse_distance),
ex_name='parse_distance',
key=b'Xu3iSVjUVUiK2GstlArLkir4gmMFaLsb37QrwkeA1vE=',
n_iter=100)
# Assign test case vars for debugging
input_vars, original_input_vars, returned_output_vars, true_output_vars = test_case_vars
duration = time() - start_time
print(f"Test duration: {duration:.2f} seconds")
current_memory, peak_memory = tracemalloc.get_traced_memory()
print(f"memory after test: {current_memory/1024/1024:.2f} MB")
print(f"memory peak during test: {peak_memory/1024/1024:.2f} MB")
tracemalloc.stop()
if e: raise e
assert passed, 'The solution to parse_distance did not pass the test.'
### BEGIN HIDDEN TESTS
start_time = time()
tracemalloc.start()
mem_start, peak_start = tracemalloc.get_traced_memory()
print(f"initial memory usage: {mem_start/1024/1024:.2f} MB")
passed, test_case_vars, e = execute_tests(func=plugins.error_handler(parse_distance),
ex_name='parse_distance',
key=b'n2DFq7sQKymR55EWuGJD3DJTpo_CW7Hw0fqiTGQ9x_Y=',
n_iter=100,
hidden=True)
input_vars, original_input_vars, returned_output_vars, true_output_vars = test_case_vars
duration = time() - start_time
print(f"Test duration: {duration:.2f} seconds")
current_memory, peak_memory = tracemalloc.get_traced_memory()
print(f"memory after test: {current_memory/1024/1024:.2f} MB")
print(f"memory peak during test: {peak_memory/1024/1024:.2f} MB")
tracemalloc.stop()
if e: raise e
assert passed, 'The solution to parse_distance did not pass the test.'
### END HIDDEN TESTS
print('Passed! Please submit.')
Our shot data has many diverse start and end lies. These include "right/left" designations, descriptive names for various bad lies particular to a specific course, etc. We want to simplify these into only a few categories for our analysis (Tee, Fairway, Rough, Bunker, Green, Recovery, and Hole).
The cell below loads mappings for all observed start and end lies in the data to one of the seven standardized lies mentioned above.
It also loads the data prepared with the logic shown in the earlier cell, Application of extract_shot_records.
In the next exercise you will use provided mappings to standardize the start and end lies for a single shot.
### Run Me!!!
start_lie_map = utils.load_object_from_publicdata('start_lie_map')
end_lie_map = utils.load_object_from_publicdata('end_lie_map')
raw_shot_records = utils.load_object_from_publicdata('raw_shot_records')
standardize_lie
Your task: define standardize_lie
as follows:
Standardizes the lie of a golf shot based on the provided mappings.
Args:
- record (dict): A dictionary containing the 'start_lie' and 'end_lie' keys. It may contain other keys as well.
- start_lie_map (dict): A mapping of starting lie values to standardized lie values.
- end_lie_map (dict): A mapping of ending lie values to standardized lie values.

Returns:
- dict: The record with 'start_lie' and 'end_lie' replaced with their standardized values. A new dictionary is returned, leaving the original record unchanged.
### Solution - Exercise 3
def standardize_lie(record, start_lie_map, end_lie_map):
### BEGIN SOLUTION
start_lie = start_lie_map[record['start_lie']]
end_lie = end_lie_map[record['end_lie']]
return {
**record,
'start_lie': start_lie,
'end_lie': end_lie
}
### END SOLUTION
### Demo function call
test_records = [
{'start_lie': 'Tee Box', 'end_lie': 'Right Fairway', 'foo': 'bar'},
{'start_lie': 'Primary Rough', 'end_lie': 'Green', 'foo': 'bar'},
{'start_lie': 'Fairway', 'end_lie': 'Hole', 'foo': 'bar'},
{'start_lie': 'Green', 'end_lie': 'Hole', 'foo': 'bar'}
]
sample_start_lie_map = {
'Tee Box': 'Tee',
'Primary Rough': 'Rough',
'Fairway': 'Fairway',
'Green': 'Green'
}
sample_end_lie_map = {
'Right Fairway': 'Fairway',
'Green': 'Green',
'Hole': 'Hole'
}
results = []
for i, record in enumerate(test_records):
result = standardize_lie(record, sample_start_lie_map, sample_end_lie_map)
print(f"standardize_lie(test_records[{i}], sample_start_lie_map, sample_end_lie_map)")
print(f"--> {result}")
results.append(result)
The demo should display this printed output.
standardize_lie(test_records[0], sample_start_lie_map, sample_end_lie_map)
--> {'start_lie': 'Tee', 'end_lie': 'Fairway', 'foo': 'bar'}
standardize_lie(test_records[1], sample_start_lie_map, sample_end_lie_map)
--> {'start_lie': 'Rough', 'end_lie': 'Green', 'foo': 'bar'}
standardize_lie(test_records[2], sample_start_lie_map, sample_end_lie_map)
--> {'start_lie': 'Fairway', 'end_lie': 'Hole', 'foo': 'bar'}
standardize_lie(test_records[3], sample_start_lie_map, sample_end_lie_map)
--> {'start_lie': 'Green', 'end_lie': 'Hole', 'foo': 'bar'}
The cell below will test your solution for standardize_lie (exercise 3). The testing variables will be available for debugging under the following names in a dictionary format.
input_vars - Input variables for your solution.
original_input_vars - Copy of input variables from prior to running your solution. Any key:value pair in original_input_vars should also exist in input_vars - otherwise the inputs were modified by your solution.
returned_output_vars - Outputs returned by your solution.
true_output_vars - The expected output. This should "match" returned_output_vars based on the question requirements - otherwise, your solution is not returning the correct output.
### Test Cell - Exercise 3
from cse6040_devkit.tester_fw.testers import Tester
from yaml import safe_load
from time import time
tracemalloc.start()
mem_start, peak_start = tracemalloc.get_traced_memory()
print(f"initial memory usage: {mem_start/1024/1024:.2f} MB")
# Load testing utility
with open('resource/asnlib/publicdata/execute_tests', 'rb') as f:
executor = dill.load(f)
@run_with_timeout(error_threshold=200.0, warning_threshold=100.0)
@suppress_stdout
def execute_tests(**kwargs):
return executor(**kwargs)
# Execute test
start_time = time()
passed, test_case_vars, e = execute_tests(func=standardize_lie,
ex_name='standardize_lie',
key=b'Xu3iSVjUVUiK2GstlArLkir4gmMFaLsb37QrwkeA1vE=',
n_iter=100)
# Assign test case vars for debugging
input_vars, original_input_vars, returned_output_vars, true_output_vars = test_case_vars
duration = time() - start_time
print(f"Test duration: {duration:.2f} seconds")
current_memory, peak_memory = tracemalloc.get_traced_memory()
print(f"memory after test: {current_memory/1024/1024:.2f} MB")
print(f"memory peak during test: {peak_memory/1024/1024:.2f} MB")
tracemalloc.stop()
if e: raise e
assert passed, 'The solution to standardize_lie did not pass the test.'
### BEGIN HIDDEN TESTS
start_time = time()
tracemalloc.start()
mem_start, peak_start = tracemalloc.get_traced_memory()
print(f"initial memory usage: {mem_start/1024/1024:.2f} MB")
passed, test_case_vars, e = execute_tests(func=standardize_lie,
ex_name='standardize_lie',
key=b'n2DFq7sQKymR55EWuGJD3DJTpo_CW7Hw0fqiTGQ9x_Y=',
n_iter=100,
hidden=True)
input_vars, original_input_vars, returned_output_vars, true_output_vars = test_case_vars
duration = time() - start_time
print(f"Test duration: {duration:.2f} seconds")
current_memory, peak_memory = tracemalloc.get_traced_memory()
print(f"memory after test: {current_memory/1024/1024:.2f} MB")
print(f"memory peak during test: {peak_memory/1024/1024:.2f} MB")
tracemalloc.stop()
if e: raise e
assert passed, 'The solution to standardize_lie did not pass the test.'
### END HIDDEN TESTS
print('Passed! Please submit.')
parse_distance and standardize_lie
¶We used correct implementations of parse_distance and standardize_lie to standardize raw_shot_records with this code snippet.
standardized_shot_records = []
for record in raw_shot_records:
standardized_record = standardize_lie(record, start_lie_map, end_lie_map)
standardized_record['start_distance'], standardized_record['start_unit'] = parse_distance(standardized_record['start_distance'], 10, 1)
standardized_record['end_distance'], standardized_record['end_unit'] = parse_distance(standardized_record['end_distance'], 10, 1)
standardized_shot_records.append(standardized_record)
The code cell below loads the result into the environment.
### Run Me!!!
standardized_shot_records = utils.load_object_from_publicdata('standardized_shot_records')
The standardized_shot_records is ready to work with, and it's time to calculate the expected value (mean strokes to hole) for each distance and lie.
There may be some anomalies where only a few observations occurred for a particular combination. We do not want to include these in our calculations because they may not illustrate the true difficulty of those shots.
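To see why a minimum-count filter matters, here is a small sketch with made-up numbers (the group keys and observation values below are illustrative, not from the real data set): a group with a single observation would otherwise contribute a wildly unrepresentative "mean" to the baseline.

```python
from statistics import mean

# Hypothetical strokes-to-hole observations, keyed by (lie, unit, distance)
groups = {
    ('Fairway', 'yds', 150): [3, 4, 2, 3, 4, 3, 4, 3],  # 8 observations
    ('Rough', 'yds', 212): [7],  # one unlucky hole; not representative
}

MIN_COUNT = 3  # groups smaller than this are dropped from the baseline
baseline = {
    key: round(mean(obs), 3)
    for key, obs in groups.items()
    if len(obs) >= MIN_COUNT
}
print(baseline)  # only the Fairway group survives the filter
```

The lone 7-stroke observation is excluded rather than being recorded as the "expected" difficulty of a 212-yard shot from the rough.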
calculate_baseline
Example: we have defined calculate_baseline as follows:
This is an example. You do not need to implement anything here.
Calculates baseline average strokes to hole for each unique combination of start lie, distance unit, and distance. Groups shots by their starting lie, distance unit, and distance, then computes the mean strokes to hole for each group that meets or exceeds the specified minimum count. The result is a nested dictionary structure with rounded mean values.
Args:
shots_lod (list of dict): List of shot dictionaries, each containing at least the following keys:
'start_lie' (str): The starting lie of the shot (e.g., "Fairway").
'start_unit' (str): The unit of measurement for the starting distance (e.g., "yds" or "ft").
'start_distance' (int): The starting distance to the hole (e.g., 150).
'strokes_to_hole' (int): The number of strokes taken to complete the hole from the starting position.
min_count (int): Minimum number of shots required in a group with the same starting lie, distance unit, and distance to compute a baseline value.
Returns:
dict: Nested dictionary with structure baseline[lie][unit][distance] (float) = mean strokes to hole (rounded to 3 decimals) for each lie, distance unit, and distance meeting the minimum count criterion.
### Solution - Exercise 4
def calculate_baseline(shots_lod, min_count):
observations = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
# observations[lie][unit][distance] = list of strokes_to_hole observations
# e.g., observations['Fairway']['yds'][170] = [3, 4, 2, 5, 3]
# This is used to accumulate all strokes_to_hole values for each unique (lie, unit, distance) combination.
baseline = defaultdict(lambda: defaultdict(lambda: defaultdict(float)))
# baseline[lie][unit][distance] = mean strokes_to_hole (rounded to 3 decimals)
# e.g., baseline['Fairway']['yds'][170] = 3.5
# This will store the final computed mean values for each (lie, unit, distance) combination.
# populate observations
for shot in shots_lod:
lie = shot['start_lie']
unit = shot['start_unit']
dist = shot['start_distance']
observations[lie][unit][dist].append(shot['strokes_to_hole'])
# compute baseline means for groups meeting min_count
for lie, unit_dict in observations.items():
for unit, dist_dict in unit_dict.items():
for dist, strokes in dist_dict.items():
n = len(strokes)
if n >= min_count:
baseline[lie][unit][dist] = round(float(mean(strokes)), 3)
# Convert defaultdicts to regular dicts for the final output
# You won't need to do this if you prefer to return defaultdicts.
baseline = utils.defaultdict_to_dict_recursive(baseline)
return baseline
### Demo function call
test_scenarios = [
{
'shots': [
{'start_lie': 'Fairway', 'start_unit': 'yds', 'start_distance': 150, 'strokes_to_hole': 3},
{'start_lie': 'Fairway', 'start_unit': 'yds', 'start_distance': 150, 'strokes_to_hole': 4},
{'start_lie': 'Fairway', 'start_unit': 'yds', 'start_distance': 150, 'strokes_to_hole': 2},
{'start_lie': 'Fairway', 'start_unit': 'yds', 'start_distance': 150, 'strokes_to_hole': 3},
{'start_lie': 'Fairway', 'start_unit': 'yds', 'start_distance': 150, 'strokes_to_hole': 4}
],
'min_count': 3
},
{
'shots': [
{'start_lie': 'Green', 'start_unit': 'ft', 'start_distance': 10, 'strokes_to_hole': 1},
{'start_lie': 'Green', 'start_unit': 'ft', 'start_distance': 10, 'strokes_to_hole': 2},
{'start_lie': 'Green', 'start_unit': 'ft', 'start_distance': 10, 'strokes_to_hole': 1},
{'start_lie': 'Green', 'start_unit': 'ft', 'start_distance': 10, 'strokes_to_hole': 1},
{'start_lie': 'Tee', 'start_unit': 'yds', 'start_distance': 400, 'strokes_to_hole': 4},
{'start_lie': 'Tee', 'start_unit': 'yds', 'start_distance': 400, 'strokes_to_hole': 5},
{'start_lie': 'Rough', 'start_unit': 'yds', 'start_distance': 100, 'strokes_to_hole': 3},
{'start_lie': 'Rough', 'start_unit': 'yds', 'start_distance': 100, 'strokes_to_hole': 4},
{'start_lie': 'Rough', 'start_unit': 'yds', 'start_distance': 100, 'strokes_to_hole': 3},
{'start_lie': 'Rough', 'start_unit': 'yds', 'start_distance': 100, 'strokes_to_hole': 2}
],
'min_count': 4
}
]
results = []
for i, scenario in enumerate(test_scenarios):
result = calculate_baseline(scenario['shots'], scenario['min_count'])
print(f"calculate_baseline(test_scenarios[{i}]['shots'], test_scenarios[{i}]['min_count'])")
print(f"--> {result}")
results.append(result)
pprint(results)
The test cell below will always pass. Please submit to collect your free points for calculate_baseline (exercise 4).
### Test Cell - Exercise 4
print('Passed! Please submit.')
calculate_baseline
¶A correct implementation of calculate_baseline was used in the following code snippet.
raw_baseline = calculate_baseline(standardized_shot_records, 40)
### Run Me!!!
raw_baseline = utils.load_object_from_publicdata('raw_baseline')
Since we filtered out shot/lie combinations without enough observations, there are going to be gaps in our baseline. As it stands, we can't calculate strokes gained for those shots, or for any new shots that fall in the gaps. We will resolve this with linear interpolation.
More formally, if there are $n-1$ missing observations between the observation $(x_k, y_k)$ and the next observation $(x_{k+n}, y_{k+n})$, then:
$$y_{k+i} = y_k + \frac{i}{n}\left(y_{k+n} - y_k\right), \quad i = 1, \dots, n-1$$
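As a quick sanity check of linear interpolation, this sketch fills in the single missing point between two neighboring baseline observations, using values that appear in the second demo scenario of the exercise below ((28, 2.538) and (30, 2.308)):

```python
# Two consecutive observed baseline points, with one distance (x = 29) missing
x0, y0 = 28, 2.538
x1, y1 = 30, 2.308

# Linear interpolation: y lies on the straight line joining the two points
x = 29
y = y0 + (y1 - y0) / (x1 - x0) * (x - x0)
print(round(y, 3))  # -> 2.423
```

The result, 2.423, is exactly halfway between the two neighboring values, since 29 is halfway between 28 and 30.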
interpolate_distances
Your task: define interpolate_distances as follows:
Interpolates values between given distances at a specified interval. Given a dictionary mapping distances to values, this function generates a list of (distance, value) pairs, including interpolated values at regular intervals between the original distances. The interpolation is linear between each pair of consecutive distances. All values are rounded to three decimal places.
Example:
Args:
distance_dict (dict): Dictionary mapping distances (int) to expected strokes to hole values (float).
interval (int): The interval at which to interpolate values between distances.
Returns:
list of tuple: List of (distance (int), value (float)) pairs, including original and interpolated points.
Implementation Notes:
### Solution - Exercise 5
def interpolate_distances(distance_dict, interval):
### BEGIN SOLUTION
interpolated = []
distances = sorted(distance_dict.keys())
for x0, x1 in zip(distances[:-1], distances[1:]):
y0 = distance_dict[x0]
y1 = distance_dict[x1]
n_steps = (x1 - x0) // interval
for i in range(n_steps):
x = x0 + i * interval
y = y0 + ((y1 - y0) / (x1 - x0)) * i * interval
interpolated.append((x, round(y, 3)))
# Ensure the last point is included
interpolated.append((x1, round(y1, 3)))
return interpolated
### END SOLUTION
### Demo function call
test_scenarios = [
{
'distance_dict': {0: 0, 5: 10, 15: 30},
'interval': 5
},
{
'distance_dict': {28: 2.538, 30: 2.308, 31: 2.384, 32: 2.414, 33: 2.444, 34: 2.455, 35: 2.44, 36: 2.385, 37: 2.542, 38: 2.487},
'interval': 1
},
{
'distance_dict': {10: 1.5, 30: 2.8, 60: 4.2},
'interval': 10
}
]
results = []
for i, scenario in enumerate(test_scenarios):
result = interpolate_distances(scenario['distance_dict'], scenario['interval'])
print(f"interpolate_distances(test_scenarios[{i}]['distance_dict'], test_scenarios[{i}]['interval'])")
print(f"--> {result}")
results.append(result)
The demo should display this printed output.
interpolate_distances(test_scenarios[0]['distance_dict'], test_scenarios[0]['interval'])
--> [(0, 0.0), (5, 10.0), (10, 20.0), (15, 30)]
interpolate_distances(test_scenarios[1]['distance_dict'], test_scenarios[1]['interval'])
--> [(28, 2.538), (29, 2.423), (30, 2.308), (31, 2.384), (32, 2.414), (33, 2.444), (34, 2.455), (35, 2.44), (36, 2.385), (37, 2.542), (38, 2.487)]
interpolate_distances(test_scenarios[2]['distance_dict'], test_scenarios[2]['interval'])
--> [(10, 1.5), (20, 2.15), (30, 2.8), (40, 3.267), (50, 3.733), (60, 4.2)]
The cell below will test your solution for interpolate_distances (exercise 5). The testing variables will be available for debugging under the following names in a dictionary format.
input_vars - Input variables for your solution.
original_input_vars - Copy of input variables from prior to running your solution. Any key:value pair in original_input_vars should also exist in input_vars - otherwise the inputs were modified by your solution.
returned_output_vars - Outputs returned by your solution.
true_output_vars - The expected output. This should "match" returned_output_vars based on the question requirements - otherwise, your solution is not returning the correct output.
### Test Cell - Exercise 5
from cse6040_devkit.tester_fw.testers import Tester
from yaml import safe_load
from time import time
tracemalloc.start()
mem_start, peak_start = tracemalloc.get_traced_memory()
print(f"initial memory usage: {mem_start/1024/1024:.2f} MB")
# Load testing utility
with open('resource/asnlib/publicdata/execute_tests', 'rb') as f:
executor = dill.load(f)
@run_with_timeout(error_threshold=200.0, warning_threshold=100.0)
@suppress_stdout
def execute_tests(**kwargs):
return executor(**kwargs)
# Execute test
start_time = time()
passed, test_case_vars, e = execute_tests(func=interpolate_distances,
ex_name='interpolate_distances',
key=b'Xu3iSVjUVUiK2GstlArLkir4gmMFaLsb37QrwkeA1vE=',
n_iter=100)
# Assign test case vars for debugging
input_vars, original_input_vars, returned_output_vars, true_output_vars = test_case_vars
duration = time() - start_time
print(f"Test duration: {duration:.2f} seconds")
current_memory, peak_memory = tracemalloc.get_traced_memory()
print(f"memory after test: {current_memory/1024/1024:.2f} MB")
print(f"memory peak during test: {peak_memory/1024/1024:.2f} MB")
tracemalloc.stop()
if e: raise e
assert passed, 'The solution to interpolate_distances did not pass the test.'
### BEGIN HIDDEN TESTS
start_time = time()
tracemalloc.start()
mem_start, peak_start = tracemalloc.get_traced_memory()
print(f"initial memory usage: {mem_start/1024/1024:.2f} MB")
passed, test_case_vars, e = execute_tests(func=interpolate_distances,
ex_name='interpolate_distances',
key=b'n2DFq7sQKymR55EWuGJD3DJTpo_CW7Hw0fqiTGQ9x_Y=',
n_iter=100,
hidden=True)
input_vars, original_input_vars, returned_output_vars, true_output_vars = test_case_vars
duration = time() - start_time
print(f"Test duration: {duration:.2f} seconds")
current_memory, peak_memory = tracemalloc.get_traced_memory()
print(f"memory after test: {current_memory/1024/1024:.2f} MB")
print(f"memory peak during test: {peak_memory/1024/1024:.2f} MB")
tracemalloc.stop()
if e: raise e
assert passed, 'The solution to interpolate_distances did not pass the test.'
### END HIDDEN TESTS
print('Passed! Please submit.')
interpolate_distances
¶We used a correct implementation of interpolate_distances in the code snippet below to create interpolated_baseline.
interpolated_baseline = {}
for lie, distance_dict in raw_baseline.items():
interpolated_baseline[lie] = {}
for unit, distances in distance_dict.items():
if unit == 'yds':
interval = 10
else:
interval = 1
interpolated_distances = interpolate_distances(distances, interval)
interpolated_baseline[lie][unit] = dict(interpolated_distances)
Now there are no gaps, but the data is somewhat noisy. Run the code below to see it on scatterplots.
interpolated_baseline = utils.load_object_from_publicdata('interpolated_baseline')
# Plot for yards units
# Plot for yards units (Interpolated Baseline)
plt.figure(figsize=(12, 6))
for lie, units in interpolated_baseline.items():
if 'yds' in units:
distances, values = zip(*sorted(units['yds'].items()))
plt.scatter(distances, values, s=20, alpha=0.7, label=f"{lie} (points)")
plt.title('Interpolated Baseline by Lie (Yards)')
plt.xlabel('Distance (yds)')
plt.ylabel('Interpolated Baseline')
plt.legend()
plt.show()
# Plot for feet units (Interpolated Baseline)
plt.figure(figsize=(12, 6))
for lie, units in interpolated_baseline.items():
if 'ft' in units:
distances, values = zip(*sorted(units['ft'].items()))
plt.scatter(distances, values, s=20, alpha=0.7, label=f"{lie} (points)")
plt.title('Interpolated Baseline by Lie (Feet)')
plt.xlabel('Distance (ft)')
plt.ylabel('Interpolated Baseline')
plt.legend()
plt.show()
Generally, we want the baseline to indicate that a shot is "easier" the closer it is to the hole, all else equal (i.e., longer shots have higher expected strokes than shorter shots). That's not yet the case for our baseline because it's still noisy. We want to apply an algorithm to smooth it out.
This exercise is beyond the scope of this exam, but is included for completeness (and FREE!!!).
lowess_smooth
Example: we have defined lowess_smooth as follows:
This is a FREE exercise; the solution is provided for you!
### Solution - Exercise 6
def lowess_smooth(data, frac):
x_vals, y_vals = zip(*data)
smoothed = lowess(y_vals, x_vals, frac=frac)
return list((k, round(v, 3)) for k, v in smoothed)
The test cell below will always pass. Please submit to collect your free points for lowess_smooth (exercise 6).
### Test Cell - Exercise 6
print('Passed! Please submit.')
lowess_smooth
¶The implementation of lowess_smooth above was used to create smoothed_baseline.
smoothed_baseline = {}
for lie, distance_dict in interpolated_baseline.items():
smoothed_baseline[lie] = {}
for unit, distances in distance_dict.items():
smoothed_distances = lowess_smooth(distances, frac=0.3)
# We must ensure that the smoothed values are at least 1.0.
# Logically a start position outside of the hole must take at least 1 stroke to get to the hole.
smoothed_distances = [(k, max((v, 1.0))) for k, v in smoothed_distances]
smoothed_baseline[lie][unit] = dict(smoothed_distances)
The precomputed smoothed_baseline is loaded in the cell below. In the subsequent cell, it is shown on a scatterplot.
### Run Me!!!
smoothed_baseline = utils.load_object_from_publicdata('smoothed_baseline')
# Plot for yards units (Smoothed Baseline)
plt.figure(figsize=(12, 6))
for lie, units in smoothed_baseline.items():
if 'yds' in units:
distances, values = zip(*sorted(units['yds'].items()))
plt.scatter(distances, values, s=20, alpha=0.7, label=f"{lie} (points)")
plt.title('Smoothed Baseline by Lie (Yards)')
plt.xlabel('Distance (yds)')
plt.ylabel('Smoothed Baseline')
plt.legend()
plt.show()
# Plot for feet units (Smoothed Baseline)
plt.figure(figsize=(12, 6))
for lie, units in smoothed_baseline.items():
if 'ft' in units:
distances, values = zip(*sorted(units['ft'].items()))
plt.scatter(distances, values, s=20, alpha=0.7, label=f"{lie} (points)")
plt.title('Smoothed Baseline by Lie (Feet)')
plt.xlabel('Distance (ft)')
plt.ylabel('Smoothed Baseline')
plt.legend()
plt.show()
Strokes gained for a single shot is the expected strokes at the start, minus one stroke for the shot itself, minus the expected strokes at the end of the shot.
Mathematically:
Let $E(\text{lie}, \text{unit}, \text{distance})$ be the expected strokes to hole for a given lie, distance, and unit from the baseline. Then, strokes gained is calculated as follows:
$$SG = E(\text{lie}_{\text{start}}, \text{unit}_{\text{start}}, \text{distance}_{\text{start}}) - 1 - E(\text{lie}_{\text{end}}, \text{unit}_{\text{end}}, \text{distance}_{\text{end}})$$
In the exercise below, you will implement this formula.
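Worked through by hand with the baseline values from the first demo scenario below (a 430 yd tee shot that finishes 170 yds out in the fairway, with E(Tee, yds, 430) = 4.25 and E(Fairway, yds, 170) = 3.278):

```python
# Expected strokes from the starting position: E(Tee, yds, 430)
start_expected = 4.25
# Expected strokes from where the shot finished: E(Fairway, yds, 170)
end_expected = 3.278

# SG = E(start) - 1 - E(end); the "- 1" is the stroke just taken
sg = start_expected - 1 - end_expected
print(round(sg, 3))  # -> -0.028
```

A negative value means the shot left the player slightly worse off than an average shot from that position would have.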
calc_strokes_gained_shot
Your task: define calc_strokes_gained_shot as follows:
Calculates the strokes gained for a single golf shot based on baseline expected strokes.
Args:
shot (dict): A dictionary containing information about the shot, including:
start_lie (str): The lie type at the start of the shot (e.g., 'Fairway', 'Rough').
start_distance (int): The distance from the hole at the start of the shot.
start_unit (str): The unit of distance (e.g., 'yds' or 'ft').
end_lie (str): The lie type at the end of the shot (e.g., 'Green', 'Hole').
end_distance (int): The distance from the hole at the end of the shot.
end_unit (str): The unit of distance for the end position.
baseline (dict): A nested dictionary mapping starting lies (str), units (str), and distances (int) to expected strokes (float). baseline[lie][unit][distance] gives the expected strokes from that position.
Returns:
float: The strokes gained value for the shot, rounded to three decimal places.
Note: strokes_gained = start_expected_strokes - end_expected_strokes - 1. If the shot ended in the hole (end_lie == 'Hole'), then end_expected_strokes is considered 0.
### Solution - Exercise 7
def calc_strokes_gained_shot(shot, baseline):
### BEGIN SOLUTION
start_lie = shot['start_lie']
start_distance = shot['start_distance']
start_unit = shot['start_unit']
end_lie = shot['end_lie']
end_distance = shot['end_distance']
end_unit = shot['end_unit']
try:
start_expected_strokes = baseline[start_lie][start_unit][start_distance]
if end_lie == 'Hole':
return round(start_expected_strokes - 1, 3)
end_expected_strokes = baseline[end_lie][end_unit][end_distance]
strokes_gained = start_expected_strokes - (end_expected_strokes + 1)
return round(strokes_gained, 3)
except KeyError:
return None
### END SOLUTION
### Demo function call
test_scenarios = [
{
'shot': {'end_distance': 170, 'end_lie': 'Fairway', 'end_unit': 'yds', 'start_distance': 430, 'start_lie': 'Tee', 'start_unit': 'yds'},
'baseline': {
'Tee': {'yds': {430: 4.25}},
'Fairway': {'yds': {170: 3.278}}
}
},
{
'shot': {'end_distance': 0, 'end_lie': 'Hole', 'end_unit': 'ft', 'start_distance': 10, 'start_lie': 'Green', 'start_unit': 'ft'},
'baseline': {
'Green': {'ft': {10: 1.5}}
}
},
{
'shot': {'end_distance': 50, 'end_lie': 'Rough', 'end_unit': 'yds', 'start_distance': 200, 'start_lie': 'Fairway', 'start_unit': 'yds'},
'baseline': {
'Tee': {'yds': {430: 4.25}}
}
}
]
results = []
for i, scenario in enumerate(test_scenarios):
result = calc_strokes_gained_shot(scenario['shot'], scenario['baseline'])
print(f"calc_strokes_gained_shot(test_scenarios[{i}]['shot'], test_scenarios[{i}]['baseline'])")
print(f"--> {result}")
results.append(result)
pprint(results)
The demo should display this printed output.
calc_strokes_gained_shot(test_scenarios[0]['shot'], test_scenarios[0]['baseline'])
--> -0.028
calc_strokes_gained_shot(test_scenarios[1]['shot'], test_scenarios[1]['baseline'])
--> 0.5
calc_strokes_gained_shot(test_scenarios[2]['shot'], test_scenarios[2]['baseline'])
--> None
[-0.028, 0.5, None]
The cell below will test your solution for calc_strokes_gained_shot (exercise 7). The testing variables will be available for debugging under the following names in a dictionary format.
input_vars - Input variables for your solution.
original_input_vars - Copy of input variables from prior to running your solution. Any key:value pair in original_input_vars should also exist in input_vars - otherwise the inputs were modified by your solution.
returned_output_vars - Outputs returned by your solution.
true_output_vars - The expected output. This should "match" returned_output_vars based on the question requirements - otherwise, your solution is not returning the correct output.
### Test Cell - Exercise 7
from cse6040_devkit.tester_fw.testers import Tester
from yaml import safe_load
from time import time
tracemalloc.start()
mem_start, peak_start = tracemalloc.get_traced_memory()
print(f"initial memory usage: {mem_start/1024/1024:.2f} MB")
# Load testing utility
with open('resource/asnlib/publicdata/execute_tests', 'rb') as f:
executor = dill.load(f)
@run_with_timeout(error_threshold=200.0, warning_threshold=100.0)
@suppress_stdout
def execute_tests(**kwargs):
return executor(**kwargs)
# Execute test
start_time = time()
passed, test_case_vars, e = execute_tests(func=calc_strokes_gained_shot,
ex_name='calc_strokes_gained_shot',
key=b'Xu3iSVjUVUiK2GstlArLkir4gmMFaLsb37QrwkeA1vE=',
n_iter=102)
# Assign test case vars for debugging
input_vars, original_input_vars, returned_output_vars, true_output_vars = test_case_vars
duration = time() - start_time
print(f"Test duration: {duration:.2f} seconds")
current_memory, peak_memory = tracemalloc.get_traced_memory()
print(f"memory after test: {current_memory/1024/1024:.2f} MB")
print(f"memory peak during test: {peak_memory/1024/1024:.2f} MB")
tracemalloc.stop()
if e: raise e
assert passed, 'The solution to calc_strokes_gained_shot did not pass the test.'
### BEGIN HIDDEN TESTS
start_time = time()
tracemalloc.start()
mem_start, peak_start = tracemalloc.get_traced_memory()
print(f"initial memory usage: {mem_start/1024/1024:.2f} MB")
passed, test_case_vars, e = execute_tests(func=calc_strokes_gained_shot,
ex_name='calc_strokes_gained_shot',
key=b'n2DFq7sQKymR55EWuGJD3DJTpo_CW7Hw0fqiTGQ9x_Y=',
n_iter=102,
hidden=True)
input_vars, original_input_vars, returned_output_vars, true_output_vars = test_case_vars
duration = time() - start_time
print(f"Test duration: {duration:.2f} seconds")
current_memory, peak_memory = tracemalloc.get_traced_memory()
print(f"memory after test: {current_memory/1024/1024:.2f} MB")
print(f"memory peak during test: {peak_memory/1024/1024:.2f} MB")
tracemalloc.stop()
if e: raise e
assert passed, 'The solution to calc_strokes_gained_shot did not pass the test.'
### END HIDDEN TESTS
print('Passed! Please submit.')
There's no "application" given, because you have to implement that yourself! We need to apply the formula for all of our shot records.
### Run Me!!!
dummy_calc_strokes_gained_shot = utils.load_object_from_publicdata('dummy_calc_strokes_gained_shot')
calc_strokes_gained_all
Your task: define calc_strokes_gained_all as follows:
Calculates strokes gained for a list of shots and identifies invalid shots.
Args:
shot_records (list of dict): List of shot records, where each shot is represented as a dictionary.
baseline (dict): Baseline data used for strokes gained calculation. Maps starting lies (str), units (str), and distances (int) to expected strokes (float).
sg_calc_func (Callable): Function that calculates strokes gained for a shot given the baseline. sg_calc_func(shot_record, baseline) should return a numeric value or None if the shot is invalid.
Returns:
tuple: A tuple containing two elements:
strokes_gained (list of dict): List of shot records with strokes gained values added under a strokes_gained field, for shots where sg_calc_func returned a valid numeric value.
invalid_shots (list of dict): List of original shot records that could not be processed, i.e. those for which sg_calc_func returned None.
### Solution - Exercise 8
def calc_strokes_gained_all(shot_records, baseline, sg_calc_func):
### BEGIN SOLUTION
strokes_gained = []
invalid_shots = []
for shot in shot_records:
sg_value = sg_calc_func(shot, baseline)
if sg_value is not None:
shot_with_sg = shot.copy()
shot_with_sg['strokes_gained'] = sg_value
strokes_gained.append(shot_with_sg)
else:
invalid_shots.append(shot)
return strokes_gained, invalid_shots
### END SOLUTION
### Demo function call
test_scenarios = [
{
'shots': [
{'end_distance': 170.0, 'end_lie': 'Fairway', 'end_unit': 'yds', 'start_distance': 430.0, 'start_lie': 'Tee', 'start_unit': 'yds', 'player_id': 45609},
{'end_distance': 0, 'end_lie': 'Hole', 'end_unit': 'ft', 'start_distance': 10.0, 'start_lie': 'Green', 'start_unit': 'ft', 'player_id': 45609},
{'end_distance': 20, 'end_lie': 'Green', 'end_unit': 'ft', 'start_distance': 150, 'start_lie': 'Fairway', 'start_unit': 'yds', 'player_id': 45609},
{'end_distance': 999.0, 'end_lie': 'NonExistentLie', 'end_unit': 'yds', 'start_distance': 999.0, 'start_lie': 'InvalidLie', 'start_unit': 'yds', 'player_id': 45609},
],
'baseline': {
'Tee': {'yds': {430.0: 4.25}},
'Green': {'ft': {10.0: 1.5, 20: 2.0}},
'Fairway': {'yds': {150: 3.0}}
}
},
{
'shots': [],
'baseline': {'Tee': {'yds': {200: 3.5}}}
}
]
results = []
for i, scenario in enumerate(test_scenarios):
result = calc_strokes_gained_all(scenario['shots'], scenario['baseline'], dummy_calc_strokes_gained_shot)
print(f"calc_strokes_gained_all(test_scenarios[{i}]['shots'], test_scenarios[{i}]['baseline'], dummy_calc_strokes_gained_shot)")
print(f"--> {result}")
results.append(result)
The demo should display this printed output.
calc_strokes_gained_all(test_scenarios[0]['shots'], test_scenarios[0]['baseline'], dummy_calc_strokes_gained_shot)
--> ([{'end_distance': 170.0, 'end_lie': 'Fairway', 'end_unit': 'yds', 'start_distance': 430.0, 'start_lie': 'Tee', 'start_unit': 'yds', 'player_id': 45609, 'strokes_gained': 0.25}, {'end_distance': 0, 'end_lie': 'Hole', 'end_unit': 'ft', 'start_distance': 10.0, 'start_lie': 'Green', 'start_unit': 'ft', 'player_id': 45609, 'strokes_gained': 0.5}, {'end_distance': 20, 'end_lie': 'Green', 'end_unit': 'ft', 'start_distance': 150, 'start_lie': 'Fairway', 'start_unit': 'yds', 'player_id': 45609, 'strokes_gained': 0.0}], [{'end_distance': 999.0, 'end_lie': 'NonExistentLie', 'end_unit': 'yds', 'start_distance': 999.0, 'start_lie': 'InvalidLie', 'start_unit': 'yds', 'player_id': 45609}])
calc_strokes_gained_all(test_scenarios[1]['shots'], test_scenarios[1]['baseline'], dummy_calc_strokes_gained_shot)
--> ([], [])
The cell below will test your solution for calc_strokes_gained_all (exercise 8). The testing variables will be available for debugging under the following names in a dictionary format.
input_vars - Input variables for your solution.
original_input_vars - Copy of input variables from prior to running your solution. Any key:value pair in original_input_vars should also exist in input_vars - otherwise the inputs were modified by your solution.
returned_output_vars - Outputs returned by your solution.
true_output_vars - The expected output. This should "match" returned_output_vars based on the question requirements - otherwise, your solution is not returning the correct output.
### Test Cell - Exercise 8
from cse6040_devkit.tester_fw.testers import Tester
from yaml import safe_load
from time import time
tracemalloc.start()
mem_start, peak_start = tracemalloc.get_traced_memory()
print(f"initial memory usage: {mem_start/1024/1024:.2f} MB")
# Load testing utility
with open('resource/asnlib/publicdata/execute_tests', 'rb') as f:
executor = dill.load(f)
@run_with_timeout(error_threshold=200.0, warning_threshold=100.0)
@suppress_stdout
def execute_tests(**kwargs):
return executor(**kwargs)
# Execute test
start_time = time()
passed, test_case_vars, e = execute_tests(func=calc_strokes_gained_all,
ex_name='calc_strokes_gained_all',
key=b'Xu3iSVjUVUiK2GstlArLkir4gmMFaLsb37QrwkeA1vE=',
n_iter=100)
# Assign test case vars for debugging
input_vars, original_input_vars, returned_output_vars, true_output_vars = test_case_vars
duration = time() - start_time
print(f"Test duration: {duration:.2f} seconds")
current_memory, peak_memory = tracemalloc.get_traced_memory()
print(f"memory after test: {current_memory/1024/1024:.2f} MB")
print(f"memory peak during test: {peak_memory/1024/1024:.2f} MB")
tracemalloc.stop()
if e: raise e
assert passed, 'The solution to calc_strokes_gained_all did not pass the test.'
### BEGIN HIDDEN TESTS
start_time = time()
tracemalloc.start()
mem_start, peak_start = tracemalloc.get_traced_memory()
print(f"initial memory usage: {mem_start/1024/1024:.2f} MB")
passed, test_case_vars, e = execute_tests(func=calc_strokes_gained_all,
ex_name='calc_strokes_gained_all',
key=b'n2DFq7sQKymR55EWuGJD3DJTpo_CW7Hw0fqiTGQ9x_Y=',
n_iter=100,
hidden=True)
input_vars, original_input_vars, returned_output_vars, true_output_vars = test_case_vars
duration = time() - start_time
print(f"Test duration: {duration:.2f} seconds")
current_memory, peak_memory = tracemalloc.get_traced_memory()
print(f"memory after test: {current_memory/1024/1024:.2f} MB")
print(f"memory peak during test: {peak_memory/1024/1024:.2f} MB")
tracemalloc.stop()
if e: raise e
assert passed, 'The solution to calc_strokes_gained_all did not pass the test.'
### END HIDDEN TESTS
print('Passed! Please submit.')
calc_strokes_gained_all
¶We used correct implementations of calc_strokes_gained_shot and calc_strokes_gained_all in the code snippet below to create strokes_gained_records - adding the strokes gained metric to the data for each shot.
strokes_gained_records, invalid_shots = calc_strokes_gained_all(
standardized_shot_records, smoothed_baseline, calc_strokes_gained_shot
)
There were a few invalid shots, which we did not load. The cell below loads strokes_gained_records
into the environment.
### Run Me!!!
strokes_gained_records = utils.load_object_from_publicdata('strokes_gained_records')
In golf there are four high-level categories for shot scenarios, based on the starting lie and the distance from the hole: Off The Tee, Approach, Around The Green, and Putting.
In the next exercise you will categorize shots into these categories to eventually determine how good or bad a player is in each scenario based on strokes gained.
classify_shot
Your task: define categorize_shot
as follows:
Categorizes a golf shot based on its starting lie and distance unit.
Args:
shot (dict): A dictionary containing information about the shot.
Returns:
str: The category of the shot. Possible values are 'Off The Tee', 'Approach', 'Around The Green', and 'Putting'.
### Solution - Exercise 9
def categorize_shot(shot):
    ### BEGIN SOLUTION
    start_lie = shot['start_lie']
    start_unit = shot['start_unit']
    if start_lie == 'Tee':
        return 'Off The Tee'
    elif start_lie == 'Green':
        return 'Putting'
    elif start_unit == 'yds':
        return 'Approach'
    elif start_unit == 'ft':
        return 'Around The Green'
    ### END SOLUTION
### Demo function call
sample_shots = [
{'start_lie': 'Tee', 'start_unit': 'yds'},
{'start_lie': 'Fairway', 'start_unit': 'yds'},
{'start_lie': 'Green', 'start_unit': 'ft'},
{'start_lie': 'Rough', 'start_unit': 'ft'}
]
results = []
for i, shot in enumerate(sample_shots):
    category = categorize_shot(shot)
    print(f"categorize_shot(sample_shots[{i}])")
    print(f"--> {category}")
    results.append(category)
The demo should display this printed output.
categorize_shot(sample_shots[0])
--> Off The Tee
categorize_shot(sample_shots[1])
--> Approach
categorize_shot(sample_shots[2])
--> Putting
categorize_shot(sample_shots[3])
--> Around The Green
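The branching in categorize_shot can equivalently be written as two dictionary lookups, with the lie taking priority over the distance unit. A minimal sketch (the name categorize_shot_alt is hypothetical, not part of the exercise):

```python
def categorize_shot_alt(shot):
    """Hypothetical lookup-table formulation of the same categorization rules."""
    by_lie = {'Tee': 'Off The Tee', 'Green': 'Putting'}       # lie takes priority
    by_unit = {'yds': 'Approach', 'ft': 'Around The Green'}   # fall back to the unit
    return by_lie.get(shot['start_lie'], by_unit.get(shot['start_unit']))

print(categorize_shot_alt({'start_lie': 'Fairway', 'start_unit': 'yds'}))  # Approach
```

Either formulation encodes the same decision order: check the starting lie first, and only consult the distance unit when the lie is neither 'Tee' nor 'Green'.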
The cell below will test your solution for classify_shot (exercise 9). The testing variables will be available for debugging under the following names in a dictionary format.
input_vars - Input variables for your solution.
original_input_vars - Copy of input variables from prior to running your solution. Any key:value pair in original_input_vars should also exist in input_vars - otherwise the inputs were modified by your solution.
returned_output_vars - Outputs returned by your solution.
true_output_vars - The expected output. This should "match" returned_output_vars based on the question requirements - otherwise, your solution is not returning the correct output.
### Test Cell - Exercise 9
from cse6040_devkit.tester_fw.testers import Tester
from yaml import safe_load
from time import time
tracemalloc.start()
mem_start, peak_start = tracemalloc.get_traced_memory()
print(f"initial memory usage: {mem_start/1024/1024:.2f} MB")
# Load testing utility
with open('resource/asnlib/publicdata/execute_tests', 'rb') as f:
    executor = dill.load(f)
@run_with_timeout(error_threshold=200.0, warning_threshold=100.0)
@suppress_stdout
def execute_tests(**kwargs):
    return executor(**kwargs)
# Execute test
start_time = time()
passed, test_case_vars, e = execute_tests(func=categorize_shot,
ex_name='classify_shot',
key=b'Xu3iSVjUVUiK2GstlArLkir4gmMFaLsb37QrwkeA1vE=',
n_iter=100)
# Assign test case vars for debugging
input_vars, original_input_vars, returned_output_vars, true_output_vars = test_case_vars
duration = time() - start_time
print(f"Test duration: {duration:.2f} seconds")
current_memory, peak_memory = tracemalloc.get_traced_memory()
print(f"memory after test: {current_memory/1024/1024:.2f} MB")
print(f"memory peak during test: {peak_memory/1024/1024:.2f} MB")
tracemalloc.stop()
if e: raise e
assert passed, 'The solution to classify_shot did not pass the test.'
### BEGIN HIDDEN TESTS
start_time = time()
tracemalloc.start()
mem_start, peak_start = tracemalloc.get_traced_memory()
print(f"initial memory usage: {mem_start/1024/1024:.2f} MB")
passed, test_case_vars, e = execute_tests(func=categorize_shot,
ex_name='classify_shot',
key=b'n2DFq7sQKymR55EWuGJD3DJTpo_CW7Hw0fqiTGQ9x_Y=',
n_iter=100,
hidden=True)
input_vars, original_input_vars, returned_output_vars, true_output_vars = test_case_vars
duration = time() - start_time
print(f"Test duration: {duration:.2f} seconds")
current_memory, peak_memory = tracemalloc.get_traced_memory()
print(f"memory after test: {current_memory/1024/1024:.2f} MB")
print(f"memory peak during test: {peak_memory/1024/1024:.2f} MB")
tracemalloc.stop()
if e: raise e
assert passed, 'The solution to classify_shot did not pass the test.'
### END HIDDEN TESTS
print('Passed! Please submit.')
categorize_shot
¶The code snippet below uses a correct implementation of categorize_shot
to produce categorized_strokes_gained_records
- adding a category to each shot.
categorized_strokes_gained_records = []
for shot in strokes_gained_records:
    category = categorize_shot(shot)
    categorized_strokes_gained_records.append({
        **shot,
        'category': category
    })
There are many variables besides the player's skill that affect an individual golf shot. However, by taking the mean strokes gained for each player in each category, we get a good approximation of that player's skill in each scenario.
In the next exercise you will compute the mean strokes gained for each player in each category (as well as the mean across categories) to reveal each player's strengths and weaknesses relative to the other PGA TOUR players.
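The per-(player, category) mean described above follows a standard accumulate-then-divide pattern. A minimal sketch of that pattern, using hypothetical records:

```python
from collections import defaultdict

# Hypothetical records illustrating the per-(player, category) mean pattern.
records = [
    {'player_id': 1, 'category': 'Putting', 'strokes_gained': 0.2},
    {'player_id': 1, 'category': 'Putting', 'strokes_gained': -0.1},
    {'player_id': 1, 'category': 'Approach', 'strokes_gained': 0.05},
]
totals = defaultdict(float)  # (player_id, category) -> summed strokes gained
counts = defaultdict(int)    # (player_id, category) -> number of shots
for r in records:
    key = (r['player_id'], r['category'])
    totals[key] += r['strokes_gained']
    counts[key] += 1
means = {k: round(totals[k] / counts[k], 3) for k in totals}
print(means)  # {(1, 'Putting'): 0.05, (1, 'Approach'): 0.05}
```

The exercise asks for the same idea with a nested result structure (player -> category -> mean) plus a running "Total" across all of a player's shots.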
### Run Me!!!
categorized_strokes_gained_records = utils.load_object_from_publicdata('categorized_strokes_gained_records')
player_strokes_gained_summary
Your task: define summarize_player_strokes_gained
as follows:
Summarizes strokes gained statistics for each player and category. Aggregates total strokes gained and number of shots per player and category, then computes the average strokes gained for each category and overall ("Total"). Returns a nested dictionary mapping player IDs to their category-wise average strokes gained.
Args:
categorized_strokes_gained_records (list of dict): List of shot records, each containing 'player_id' (str), 'category' (str), and 'strokes_gained' (float).
Returns:
dict: Nested dictionary where keys are player IDs and values are dictionaries mapping category names (including "Total") to average strokes gained (float), rounded to three decimal places.
Note: defaultdict may be helpful for accumulating the results.
### Solution - Exercise 10
def summarize_player_strokes_gained(categorized_strokes_gained_records):
    ### BEGIN SOLUTION
    summary = defaultdict(lambda: defaultdict(lambda: defaultdict(float)))
    for shot in categorized_strokes_gained_records:
        player_id = shot['player_id']
        category = shot['category']
        summary[player_id][category]['total_strokes_gained'] += shot['strokes_gained']
        summary[player_id][category]['num_shots'] += 1
        summary[player_id]['Total']['total_strokes_gained'] += shot['strokes_gained']
        summary[player_id]['Total']['num_shots'] += 1
    for player_id, categories in summary.items():
        for category, stats in categories.items():
            if stats['num_shots'] > 0:
                average_strokes_gained = round(
                    stats['total_strokes_gained'] / stats['num_shots'], 3)
            else:
                average_strokes_gained = 0.0
            categories[category] = average_strokes_gained
    return utils.defaultdict_to_dict_recursive(summary)
    ### END SOLUTION
### Demo function call
sample_categorized_records = [
{'player_id': 45609, 'category': 'Off The Tee', 'strokes_gained': 0.041},
{'player_id': 45609, 'category': 'Approach', 'strokes_gained': -0.037},
{'player_id': 45609, 'category': 'Putting', 'strokes_gained': -0.21},
{'player_id': 45609, 'category': 'Off The Tee', 'strokes_gained': 0.15},
{'player_id': 45609, 'category': 'Approach', 'strokes_gained': 0.08},
{'player_id': 12345, 'category': 'Putting', 'strokes_gained': 0.25},
{'player_id': 12345, 'category': 'Around The Green', 'strokes_gained': -0.10},
{'player_id': 12345, 'category': 'Putting', 'strokes_gained': 0.18},
{'player_id': 67890, 'category': 'Off The Tee', 'strokes_gained': 0.12},
{'player_id': 67890, 'category': 'Approach', 'strokes_gained': 0.05},
{'player_id': 67890, 'category': 'Around The Green', 'strokes_gained': -0.03}
]
result = summarize_player_strokes_gained(sample_categorized_records)
pprint(result)
The demo should display this printed output.
{12345: {'Around The Green': -0.1, 'Putting': 0.215, 'Total': 0.11},
45609: {'Approach': 0.022,
'Off The Tee': 0.096,
'Putting': -0.21,
'Total': 0.005},
67890: {'Approach': 0.05,
'Around The Green': -0.03,
'Off The Tee': 0.12,
'Total': 0.047}}
The cell below will test your solution for player_strokes_gained_summary (exercise 10). The testing variables will be available for debugging under the following names in a dictionary format.
input_vars - Input variables for your solution.
original_input_vars - Copy of input variables from prior to running your solution. Any key:value pair in original_input_vars should also exist in input_vars - otherwise the inputs were modified by your solution.
returned_output_vars - Outputs returned by your solution.
true_output_vars - The expected output. This should "match" returned_output_vars based on the question requirements - otherwise, your solution is not returning the correct output.
### Test Cell - Exercise 10
from cse6040_devkit.tester_fw.testers import Tester
from yaml import safe_load
from time import time
tracemalloc.start()
mem_start, peak_start = tracemalloc.get_traced_memory()
print(f"initial memory usage: {mem_start/1024/1024:.2f} MB")
# Load testing utility
with open('resource/asnlib/publicdata/execute_tests', 'rb') as f:
    executor = dill.load(f)
@run_with_timeout(error_threshold=200.0, warning_threshold=100.0)
@suppress_stdout
def execute_tests(**kwargs):
    return executor(**kwargs)
# Execute test
start_time = time()
passed, test_case_vars, e = execute_tests(func=summarize_player_strokes_gained,
ex_name='player_strokes_gained_summary',
key=b'Xu3iSVjUVUiK2GstlArLkir4gmMFaLsb37QrwkeA1vE=',
n_iter=100)
# Assign test case vars for debugging
input_vars, original_input_vars, returned_output_vars, true_output_vars = test_case_vars
duration = time() - start_time
print(f"Test duration: {duration:.2f} seconds")
current_memory, peak_memory = tracemalloc.get_traced_memory()
print(f"memory after test: {current_memory/1024/1024:.2f} MB")
print(f"memory peak during test: {peak_memory/1024/1024:.2f} MB")
tracemalloc.stop()
if e: raise e
assert passed, 'The solution to player_strokes_gained_summary did not pass the test.'
### BEGIN HIDDEN TESTS
start_time = time()
tracemalloc.start()
mem_start, peak_start = tracemalloc.get_traced_memory()
print(f"initial memory usage: {mem_start/1024/1024:.2f} MB")
passed, test_case_vars, e = execute_tests(func=summarize_player_strokes_gained,
ex_name='player_strokes_gained_summary',
key=b'n2DFq7sQKymR55EWuGJD3DJTpo_CW7Hw0fqiTGQ9x_Y=',
n_iter=100,
hidden=True)
input_vars, original_input_vars, returned_output_vars, true_output_vars = test_case_vars
duration = time() - start_time
print(f"Test duration: {duration:.2f} seconds")
current_memory, peak_memory = tracemalloc.get_traced_memory()
print(f"memory after test: {current_memory/1024/1024:.2f} MB")
print(f"memory peak during test: {peak_memory/1024/1024:.2f} MB")
tracemalloc.stop()
if e: raise e
assert passed, 'The solution to player_strokes_gained_summary did not pass the test.'
### END HIDDEN TESTS
print('Passed! Please submit.')
summarize_player_strokes_gained
¶We used a correct implementation of summarize_player_strokes_gained
to create player_strokes_gained_summary
- which contains the overall mean strokes gained and means within each shot category for all players in the data set.
player_strokes_gained_summary = summarize_player_strokes_gained(categorized_strokes_gained_records)
### Run Me!!!
player_strokes_gained_summary = utils.load_object_from_publicdata('player_strokes_gained_summary')
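With player_strokes_gained_summary loaded, a natural follow-up is ranking players by their overall ("Total") mean strokes gained. A minimal sketch, using the demo values from Exercise 10 as stand-in data:

```python
# Hypothetical usage sketch: rank players by overall average strokes gained.
# The summary dict below reuses the Exercise 10 demo values; in practice the
# keys and values would come from player_strokes_gained_summary.
summary = {
    12345: {'Around The Green': -0.1, 'Putting': 0.215, 'Total': 0.11},
    67890: {'Approach': 0.05, 'Around The Green': -0.03,
            'Off The Tee': 0.12, 'Total': 0.047},
}
ranked = sorted(summary.items(), key=lambda kv: kv[1]['Total'], reverse=True)
for player_id, categories in ranked:
    print(player_id, categories['Total'])
```

Sorting on the "Total" key gives an overall leaderboard, while sorting on an individual category key would surface specialists in that scenario.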
You have successfully completed a comprehensive golf analytics pipeline, transforming raw PGA TOUR data into meaningful performance insights. You've implemented key techniques used by professional golf analysts to evaluate player performance and identify strategic patterns.
In addition to evaluating elite professionals, the same techniques are used in commercial software subscriptions and by PGA professionals to evaluate golfers at all levels, as part of the multi-billion-dollar golf industry.
If you have made it this far, congratulations! Remember to submit the exam to ensure you receive all the points you have earned!