Problem 7: Tensor computations

This problem is about tensors, which generalize matrices from linear algebra. It tests whether you can read an abstract description of a computational problem and translate it into an efficient implementation.

In particular, you need to implement a computational operation of the form,

$$ A(i, r) = \sum_{j=0}^{N_1-1} \sum_{k=0}^{N_2-1} \sum_{l=0}^{N_3-1} T(i, j, k, l) \cdot B(j, r) \cdot C(k, r) \cdot D(l, r). $$

Observe that it involves not only matrices ($A$, $B$, $C$, and $D$), but a tensor, $T$, which in this case has 4 "modes" (i.e., indices or axes) rather than just 2 as in the case of matrices. While that may, at first glance, seem innocuous enough, here is the catch: we will give you one implementation, and your goal is to create another one that is faster. You will get some points for correctness on a small dataset, and additional points for correctness and speed on a larger dataset.

We believe most of you will find this problem to be hard. So plan accordingly. The problem has two parts, or "exercises."

  • Exercise 0 is about the input dataset and contains just one warm-up exercise worth 2 points.
  • Exercise 1 is about implementing the above operation, with up to 3 points for correctness and 5 points for speed.

Exercise 0 (2 points): The Blog Authorship Corpus Dataset

In this problem, we are providing a random subset of blogs taken from the Blog Authorship Corpus, which in turn were taken from blogger.com in 2004.

Each blog consists of several posts. Each post consists of words. For each post, we went ahead and created "word co-occurrence pairs" based on whatever words appear close to one another within the post. Lastly, the corpus has several pieces of metadata, namely, the author's self-reported gender, age, job sector, and Western zodiac sign.

Let's use Pandas to read in these data, which are stored in comma-separated value (CSV) files. To do that, run the following code cell.

In [1]:
from IPython.display import display
import pandas as pd

def get_timer(): # Used in some timing experiments
    from time import perf_counter
    return perf_counter()

def read_blogs(tag, verbose=True):
    def msg(f):
        if verbose: print("Reading {}...".format(f))
    
    from os.path import isdir, isfile
    
    ROOT = './resource/asnlib/publicdata/blogs/' if isdir('.voc') else './publicdata/blogs/'
    BLOGGERS_FILE = '{}bloggers{}.csv'.format(ROOT, tag)
    POSTS_FILE = '{}posts{}.csv'.format(ROOT, tag)
    OCCUR_FILE = '{}occur{}.csv'.format(ROOT, tag)
    TENSOR_COORDS_DF_SOLN_FILE = '{}tensor_coords_df_soln{}.csv'.format(ROOT, tag)

    t_read__ = get_timer()
    msg(BLOGGERS_FILE)
    bloggers_df = pd.read_csv(BLOGGERS_FILE)
    msg(POSTS_FILE)
    posts_df = pd.read_csv(POSTS_FILE)
    msg(OCCUR_FILE)
    occur_df = pd.read_csv(OCCUR_FILE)    
    from os.path import isfile
    if isfile(TENSOR_COORDS_DF_SOLN_FILE):
        msg(TENSOR_COORDS_DF_SOLN_FILE)
        soln_df = pd.read_csv(TENSOR_COORDS_DF_SOLN_FILE)
    else:
        soln_df = None
    t_read__ = get_timer() - t_read__
    if verbose:
        print("==> Total loading time: ~ {:.1f} seconds.".format(t_read__))

    return bloggers_df, posts_df, occur_df, soln_df

# Read in a small sample dataset:
bloggers_df, posts_df, occur_df, soln_df = read_blogs('-25x2')
Reading ./resource/asnlib/publicdata/blogs/bloggers-25x2.csv...
Reading ./resource/asnlib/publicdata/blogs/posts-25x2.csv...
Reading ./resource/asnlib/publicdata/blogs/occur-25x2.csv...
Reading ./resource/asnlib/publicdata/blogs/tensor_coords_df_soln-25x2.csv...
==> Total loading time: ~ 0.4 seconds.

The cell produces three notable dataframes: bloggers_df, posts_df, and occur_df. Let's quickly look at each one in turn.

Bloggers (bloggers_df). The first dataframe consists of bloggers:

In [2]:
print("There are {} bloggers.".format(len(bloggers_df)))
display(bloggers_df.head())
There are 25 bloggers.
blog_id age gender sector zodiac
0 0 17 female Student Cancer
1 1 23 male Student Gemini
2 2 26 male Technology Libra
3 3 17 male indUnk Virgo
4 4 25 female indUnk Leo

Each blogger has a unique integer ID number, blog_id, which starts at zero. The age is another integer; the gender, job sector, and zodiac signs are strings.

For example, suppose the following were a row in the dataframe:

blog_id age gender sector zodiac
11 16 male Student Taurus

Then that would indicate that blogger #11 is 16 years of age, male, a student, and is a Taurus.

Posts (posts_df). The second dataframe is a list of posts:

In [3]:
print("There are {} posts.".format(len(posts_df)))
display(posts_df.head())
There are 520 posts.
post_id blog_id
0 0 0
1 1 0
2 2 0
3 3 0
4 4 0

Each post has a unique integer ID, post_id, numbered starting at zero. The author is given by blog_id, his or her integer ID. For instance, suppose this row were in the table:

post_id blog_id
2291 94

Then that would indicate that blogger #94 wrote a post whose ID number is #2291.

Co-occurrences (occur_df). This third dataframe is the list of co-occurrences:

In [4]:
print("There are {} co-occurrence triplets.".format(len(occur_df)))
occur_df.head()
There are 97911 co-occurrence triplets.
Out[4]:
word_i word_j post_id
0 10 11 0
1 22 23 0
2 3 35 0
3 2 18 0
4 5 6 0

The column word_i is an integer ID of a word; the column word_j is an integer ID of a second word; and the column post_id is the integer ID of a post. Each row means that the two corresponding words appeared "near" one another within given post. For example, suppose the following row were part of the table:

word_i word_j post_id
37 1738 598

Then that would mean word #37 and word #1738 appeared "close together" in post #598.

Exercise 0 (2 points). Suppose we assign each Western zodiac sign the following integer IDs (run the next cell to see the IDs):

In [5]:
zodiac_signs = ['Aquarius', 'Pisces', 'Aries', 'Taurus', 'Gemini',
                'Cancer', 'Leo', 'Virgo', 'Libra', 'Scorpio', 'Sagittarius', 'Capricorn']
zodiac_ids = {s: zodiac_signs.index(s) for s in zodiac_signs}
zodiac_ids
Out[5]:
{'Aquarius': 0,
 'Pisces': 1,
 'Aries': 2,
 'Taurus': 3,
 'Gemini': 4,
 'Cancer': 5,
 'Leo': 6,
 'Virgo': 7,
 'Libra': 8,
 'Scorpio': 9,
 'Sagittarius': 10,
 'Capricorn': 11}

Write a function,

def add_zodiac_ids(occur_df, posts_df, bloggers_df):

that takes the three input dataframes as input and returns a copy of occur_df with one new column, zodiac_id, containing the integer ID of the Zodiac sign for the blogger who authored the post.

For example, suppose occur_df has the following row:

word_i word_j post_id
126 2618 165

If you look up post #165 in posts_df, you'll see the following entry:

post_id blog_id
165 11

And blogger #11 has these attributes, according to bloggers_df:

blog_id age gender sector zodiac
11 16 male Student Taurus

Since a Taurus has an ID of 3, your output should contain a row of the form,

word_i word_j post_id zodiac_id
126 2618 165 3
In [6]:
import numpy as np
def add_zodiac_ids(occur_df, posts_df, bloggers_df):
    ### BEGIN SOLUTION
    #posts_zodiac_df = posts_df.merge(bloggers_df[['blog_id', 'zodiac']], how='left', on='blog_id')
    #posts_zodiac_df['zodiac_id'] = posts_zodiac_df['zodiac'].apply(lambda z: zodiac_ids[z])
    #occur_zodiac_df = occur_df.merge(posts_zodiac_df[['post_id', 'zodiac_id']], how='left', on='post_id')
    #return occur_zodiac_df
    tempdf = occur_df.merge(posts_df, on='post_id')
    tempdf = tempdf.merge(bloggers_df, on='blog_id')    
    tempdf['zodiac_id'] = tempdf['zodiac'].map(zodiac_ids)    
    return tempdf.drop(['blog_id', 'age', 'gender', 'sector', 'zodiac'], axis=1)
    ### END SOLUTION

# Preview your result:
occur_zodiac_df = add_zodiac_ids(occur_df, posts_df, bloggers_df)
assert type(occur_zodiac_df) is pd.DataFrame, "Does your function return a Pandas DataFrame?"

print("First few rows:")
display(occur_zodiac_df.head())

print("\nRow from the example:")
occur_zodiac_df[(occur_zodiac_df['word_i'] == 126)
                & (occur_zodiac_df['word_j'] == 2618)
                & (occur_zodiac_df['post_id'] == 165)]
First few rows:
word_i word_j post_id zodiac_id
0 10 11 0 5
1 22 23 0 5
2 3 35 0 5
3 2 18 0 5
4 5 6 0 5
Row from the example:
Out[6]:
word_i word_j post_id zodiac_id
26818 126 2618 165 3
In [7]:
# Test cell: `occur_zodiac_df_test`

def canonicalize_tibble(X):
    var_names = sorted(X.columns)
    Y = X[var_names].copy()
    Y.sort_values(by=var_names, inplace=True)
    Y.reset_index(drop=True, inplace=True)
    return Y

def tibbles_are_equivalent(A, B):
    A_hat = canonicalize_tibble(A)
    B_hat = canonicalize_tibble(B)
    equal = (A_hat == B_hat)
    return equal.all().all()

print("Checking equivalence...")
assert tibbles_are_equivalent(occur_zodiac_df,
                              soln_df), "Your result does not match the sample solution."
print("\n(Passed!)")
Checking equivalence...

(Passed!)

Background for the next exercise: Tensor analysis and the "MTTKRP"

Matrices are suitable for looking at 2-way relations (between rows and columns), as in the pairwise association mining problem (Topic 2) and PageRank (Topic 11).

However, a dataset may often be naturally decomposed into more than two features. For example, consider this recent blog post by Chris Moody, a data analyst at the e-commerce clothing site, Stitch Fix.

Moody considers the problem of mining user comments using, as an example, a three-way data representation: two to represent a word pair and the third to represent the document (i.e., comment) in which a word pair co-occurs. That is, imagine a 3-D table $T$, where each entry, $T(i, j, k)$, somehow measures the association of word $i$, word $j$, and document $k$. The left-hand side of the following image illustrates this concept.

The acronym "PMI" in this figure stands for pointwise mutual information, but for the purpose of this problem, you don't need to know anything about this concept.

Tensor decomposition

Definition: tensors and tensor modes. We call this type of multiway table a tensor. In particular, a $d$-way tensor is one with $d$ "axes" or modes.

You've already seen tensors where $d \leq 2$: a scalar is a 0-way tensor; a vector is a 1-way tensor; and a matrix is a 2-way tensor.

In Exercise 0 above, you created occur_zodiac_df. Each row could be interpreted as the coordinates of a 4-way tensor, and indeed, we will use it as such below.

Tensor decompositions. Moody's computational analysis task is to "factor" a $d$-way tensor into a kind of multiplication among $d$ matrices, as illustrated in the right-hand side of the figure above. This factoring is also known as a tensor decomposition. You don't need to know anything specific about tensor decompositions, only that they motivated this exam problem.

The computational bottleneck: an "MTTKRP." The slowest part of many tensor decompositions is an operation known by the (regrettable) acronym, MTTKRP, which stands for "matricized tensor times Katri-Rao product." (Try saying that five times quickly in succession!) There are several variations, but here is the one you need in this problem.

Suppose $d=4$. Next, let $N_0$, $N_1$, $N_2$, $N_3$, and $R$ be positive integers. Also let

  • $T$ be a 4-way tensor of size $N_0 \times N_1 \times N_2 \times N_3$;
  • $A$ be a matrix (2-way tensor) of size $N_0 \times R$;
  • $B$ be another matrix of size $N_1 \times R$;
  • $C$ be a third matrix of size $N_2 \times R$; and
  • $D$ be a fourth matrix of size $N_3 \times R$.

The mode-0 MTTKRP computes $A$ from $T$, $B$, $C$, and $D$ as follows. For all $0 \leq i < N_0$ and $0 \leq r < R$,

$$ A(i, r) = \sum_{j=0}^{N_1-1} \sum_{k=0}^{N_2-1} \sum_{l=0}^{N_3-1} T(i, j, k, l) \cdot B(j, r) \cdot C(k, r) \cdot D(l, r). $$

We will also refer to $T$ as the input tensor; $B$, $C$, and $D$ as the input factors; and $A$ as the output factor.

Assumptions. In data analysis problems, $T$ is typically very sparse, meaning it is mostly zero. By contrast, the factor matrices $A$, $B$, $C$, and $D$ are typically dense matrices. Also, the factors are usually "tall-and-skinny," meaning $R$ is small compared to $N_0$, $N_1$, $N_2$, and $N_3$.

The baseline MTTKRP in "pure Python"

Based on those assumptions, we have provided you with a baseline implementation written in "pure" Python.

  • It uses a dictionary to store the (sparse) tensor $T$, where each key is an $(i, j, k, l)$ tuple of coordinate indices and the value is the corresponding tensor entry, $T(i, j, k, l)$.
  • For the factor matrices ($A, B, C, D$), it assumes dense row-major storage using flat lists.

The next code cell implements this approach. Take a minute to inspect it and compare it to the formula for $A(i, r)$. Here is the same formula repeated, so it's easier to compare against the code: for all $0 \leq i < N_0$ and $0 \leq r < R$,

$$ A(i, r) = \sum_{j=0}^{N_1-1} \sum_{k=0}^{N_2-1} \sum_{l=0}^{N_3-1} T(i, j, k, l) \cdot B(j, r) \cdot C(k, r) \cdot D(l, r). $$
In [8]:
def mttkrp_0__py(T, B, C, D, N0, R):
    assert R >= 1
    assert (type(T) is dict) and all([type(X) is list for X in [B, C, D]])
    
    A = [0.] * (N0 * R)
    for (i, j, k, l), T_ijkl in T.items():
        for r in range(R):
            A[i*R + r] += T_ijkl * B[j*R + r] * C[k*R + r] * D[l*R + r]
    return A

Now run the following two code cells. The first one creates a sparse tensor, T_py, using the coordinates from occur_zodiac_df and random nonzero values. It also creates the random input factor matrices, using $R=6$. The second cell runs mttkrp_0__py to compute the output factor, and measures how long it takes.

In [9]:
from cse6040bench import benchit

def get_timer(): # Used in some timing experiments
    from time import perf_counter
    return perf_counter()

def uniform_random_values(K):
    """Returns a Python list having N uniformly sampled values in (-1, 1)."""
    from random import uniform
    return [uniform(-1, 1) for _ in range(K)]

# Construct a 4-way tensor from `occur_zodiac_df`
def assemble_random_tensor_4way_from_df(df, columns):
    from random import uniform
    assert type(columns) is list and all([type(c) is str for c in columns])
    ci, cj, ck, cl = columns[0], columns[1], columns[2], columns[3]
    I, J, K, L = list(df[ci]), list(df[cj]), list(df[ck]), list(df[cl])
    V = [uniform(-1, 1) for _ in range(len(I))]
    T = {(i, j, k, l): v for i, j, k, l, v in zip(I, J, K, L, V)}
    return T

def generate_inputs(df, R=None):
    assert R is not None
    
    num_words = max(occur_zodiac_df['word_i'].max(), occur_zodiac_df['word_j'].max()) + 1
    num_posts = occur_zodiac_df['post_id'].max() + 1
    num_zodiac = occur_zodiac_df['zodiac_id'].max() + 1

    N0, N1, N2, N3 = num_words, num_words, num_posts, num_zodiac
    print("Using a 4-way tensor of size N0 x N1 x N2 x N3 == {} x {} x {} x {}.".format(N0, N1, N2, N3))
    print("There are {} nonzero entries.".format(len(df)))
    print("Using factors with {} columns.".format(R))

    print("\nGenerating tensor... (may take a minute)")
    t0__ = get_timer()
    T_py = assemble_random_tensor_4way_from_df(occur_zodiac_df, ['word_i', 'word_j', 'post_id', 'zodiac_id'])
    print("==> Done generating inputs. (~ {:.1f} seconds)".format(get_timer() - t0__))
    
    print("\nGenerating input factors...")
    B_py = uniform_random_values(N1 * R)
    C_py = uniform_random_values(N2 * R)
    D_py = uniform_random_values(N3 * R)
    print("==> Done.")

    return T_py, B_py, C_py, D_py, N0, N1, N2, N3, R

# Demo: Generate random-valued inputs
T_py, B_py, C_py, D_py, N0, N1, N2, N3, R = generate_inputs(occur_zodiac_df, R=6)
Using a 4-way tensor of size N0 x N1 x N2 x N3 == 11653 x 11653 x 520 x 10.
There are 97911 nonzero entries.
Using factors with 6 columns.

Generating tensor... (may take a minute)
==> Done generating inputs. (~ 0.1 seconds)

Generating input factors...
==> Done.
In [10]:
def generate_solution(T_py, B_py, C_py, D_py, N0, R):
    print("Executing the baseline MTTKRP method...")
    t1__ = get_timer()
    A_py = mttkrp_0__py(T_py, B_py, C_py, D_py, N0, R)
    print("==> Done with one instance. (~ {:.1f} seconds)".format(get_timer() - t1__))
    print("""
==> NOTE: This timing is only a quick estimate;
it is based on only one measurement, and so should
not be regarded as reliable. (Later, we will test
more rigorously.)
""")
    return A_py
    
# Demo
A_py = generate_solution(T_py, B_py, C_py, D_py, N0, R)
Executing the baseline MTTKRP method...
==> Done with one instance. (~ 0.3 seconds)

==> NOTE: This timing is only a quick estimate;
it is based on only one measurement, and so should
not be regarded as reliable. (Later, we will test
more rigorously.)

Exercise 1 (8 points): Implement a faster Mode-0 MTTKRP

Your task is now to implement a version of the mode-0 MTTKRP that is faster than the pure Python one.

To start, think about how you want to store the data objects. The pure Python method used a dictionary keyed on tuples to store the tensor and row-major lists to store the matrices. What are some alternatives? You'll want to think carefully about your approach. In particular, although we will test your code for correctness on a small example, we will perform timings using both a small and a (relatively) large example. So be sure you think about efficient use of time and storage.

Next, think about how your choice affects how the MTTKRP operation will look in code.

There are many potential approaches. One idea is to view the sparse MTTKRP operation as a relational query, but do not feel constrained by this suggestion.

Once you think you have a promising approach, you'll need to implement it by creating three different functions. You must create these; our testing and timing code will assume their existence.

The first function is convert_inputs():

def convert_inputs(T_py, B_py, C_py, D_py, N0, N1, N2, N3, R):
        ...

    T, B, C, D = convert_inputs(...)

It should translate the baseline's data structures into your alternative method. Our test code needs convert_inputs() to ensure that your implementation has the same inputs as the baseline. Your function should return the converted objects.

We will not grade based on the speed of convert_inputs(), so it's okay if it is not very fast. However, it should not be too slow, either, as you don't want the autograder to time out!

The second function you need to write is mttkrp_0(), which should use your converted objects to calculate the mode-0 MTTKRP:

def mttkrp_0(T, B, C, D, N0, N1, N2, N3, R):
        ...

    A = mttkrp_0(T, B, ...)

Your mttkrp_0() should return the output factor, A, as your alternative object type.

We will grade based on how fast mttkrp_0() is. Your implementation needs to be at least two-times ($2\times$) faster than the baseline to get full credit. (There is some partial credit if is faster but does not meet the target above.)

The third function is convert_output(). It should take the result of your mttkrp_0() function and convert it back into a row-major list:

def convert_output(A, N0, R):
        ...

    A_py_result = convert_output(A, N0, R)

We need convert_output() so we can check your output against the baseline.

Lastly, before you start, note that there are three separate code cells, one for each of the above functions. After these cells, there are several more for testing and timing cells.

$$ A(i, r) = \sum_{j=0}^{N_1-1} \sum_{k=0}^{N_2-1} \sum_{l=0}^{N_3-1} T(i, j, k, l) \cdot B(j, r) \cdot C(k, r) \cdot D(l, r). $$
In [11]:
import numpy as np
import random
In [12]:
def convert_inputs(T_py, B_py, C_py, D_py, N0, N1, N2, N3, R):
    assert type(T_py) is dict and all([type(X) is list for X in [B_py, C_py, D_py]])
    assert len(B_py) >= N1*R and len(C_py) >= N2*R and len(D_py) >= N3*R
    
    ### BEGIN SOLUTION
    # This approach uses Pandas to store the tensor
    # and Numpy to store the matrices
    T = py_tensor_to_df(T_py, columns=['i', 'j', 'k', 'l', 'v'])
    B = py_matrix_to_np(B_py, N1, R)
    C = py_matrix_to_np(C_py, N2, R)
    D = py_matrix_to_np(D_py, N3, R)
    return T, B, C, D
    
def py_tensor_to_df(T_py, columns):
    indices = list(T_py.keys())
    values = list(T_py.values())
    index_colnames = columns[:-1]
    value_colname = columns[-1]
    T = pd.DataFrame(indices, columns=index_colnames)
    T[value_colname] = values
    return T

def py_matrix_to_np(X, M, N):
    return np.array(X).reshape((M, N))
    ### END SOLUTION
    
print("Calling your conversion routine...")
T, B, C, D = convert_inputs(T_py, B_py, C_py, D_py, N0, N1, N2, N3, R)
print(len(T_py))
print(len(B_py))
print(len(C_py))
print(len(D_py))
print(N0)
print("==> Done. At least it runs without crashing!")
Calling your conversion routine...
97911
69918
3120
60
11653
==> Done. At least it runs without crashing!
In [13]:
def mttkrp_0(T, B, C, D, N0, N1, N2, N3, R):
    ### BEGIN SOLUTION
    I = T['i']
    J = T['j']
    K = T['k']
    L = T['l']
    T_values = T['v'].values
    
    # Elementwise products: E_np(i, r), an N0 x R matrix:
    E_np = T_values[:, np.newaxis] * B[J, :] * C[K, :] * D[L, :]
    
    # Convert to dataframe so we can group-by I
    E_df = pd.DataFrame(E_np) # Convert to dataframe
    E_df['i'] = I
    
    # For each (i, r), evaluate the sum over (j, k, l).
    # Result is a N0_hat x R array, where N0_hat <= N0.
    # (There may be fewer rows because output might be sparse.)
    A_df = E_df.groupby('i').sum()
    
    # Return the result as a dense Numpy array
    A = np.zeros((N0, R))
    A[A_df.index, :] = A_df.values
    return A
    ### END SOLUTION
    
print("Calling your mode-0 MTTKRP...")
t2__ = get_timer()
A = mttkrp_0(T, B, C, D, N0, N1, N2, N3, R)
print("  ==> Done (~ {:.3f} seconds).".format(get_timer() - t2__))

print("""\n==> NOTE: This timing is only a quick estimate;
it is based on only one measurement, and so should
not be regarded as reliable. (Later, we will test
more rigorously, after a correctness check.)
""")
Calling your mode-0 MTTKRP...
  ==> Done (~ 0.019 seconds).

==> NOTE: This timing is only a quick estimate;
it is based on only one measurement, and so should
not be regarded as reliable. (Later, we will test
more rigorously, after a correctness check.)

In [14]:
def convert_output(A, N0, R):
    ### BEGIN SOLUTION
    return list(A.reshape(N0*R))
    ### END SOLUTION
    
print("Converting back for testing...")
A_py__yours = convert_output(A, N0, R)
Converting back for testing...

Test cells for Exercise 1. Here are the test cells that must pass. There are several, so you can accumulate some partial credit. The comment at the top of each test cell tells you how many points it is worth.

In [15]:
# Test cell: `correctness_0` (1 point)

def compare_answers(A_py, A_py__yours, multiplier=None):
    assert multiplier is not None
    ABS_ERR_THRESHOLD = 2e-15 * multiplier

    print("Basic correctness check: Comparing your output to the reference, A_py...")
    print("    [threshold: {}]...".format(ABS_ERR_THRESHOLD))

    your_differences = [(a-b) for a, b in zip(A_py, A_py__yours)]
    your_abs_differences = [abs(x) for x in your_differences]
    your_max_abs_difference = max(your_abs_differences)
    print("==> Maximum absolute error observed:", max(your_abs_differences))

    if your_max_abs_difference <= ABS_ERR_THRESHOLD:
        big_err_locations = None
    else:
        big_err_locations = [(int(i/R), i % R, x) for i, x in enumerate(your_abs_differences) if x > ABS_ERR_THRESHOLD]
        print("Your solution produced a difference greater than the acceptable tolerance, {}.".format(ABS_ERR_THRESHOLD))
        print("For a list of entries, print the list, `big_err_locations`.")
        print("(Note that {} locations exceeded the threshold.)".format(len(big_err_locations)))
    return big_err_locations

big_err_locations = compare_answers(A_py, A_py__yours, multiplier=len(T_py))
assert big_err_locations is None, \
       f"Errors at {len(big_err_locations)} locations. Check `big_err_locations` for details."
    
print("\n(Passed correctness check!)")
Basic correctness check: Comparing your output to the reference, A_py...
    [threshold: 1.9582200000000002e-10]...
==> Maximum absolute error observed: 4.973799150320701e-14

(Passed correctness check!)
In [16]:
# Test cell: `correctness_1` (1 point)
#
# This cell tests on the same structure with different nonzero values.

print("Generating a new test instance...")
from random import choice
T_py, B_py, C_py, D_py, N0, N1, N2, N3, R = generate_inputs(occur_zodiac_df, R=1+choice(range(9)))
print("==> Done.")

print("Running your implementation...")
print("    input conversion...")
T, B, C, D = convert_inputs(T_py, B_py, C_py, D_py, N0, N1, N2, N3, R)
print("    mode-0 MTTKRP...")
t3__ = get_timer()
A = mttkrp_0(T, B, C, D, N0, N1, N2, N3, R)
t3__ = get_timer() - t3__
print("    output conversion...")
A_py__yours = convert_output(A, N0, R)
print("==> Done. (MTTKRP ~ {:.1f} seconds.)".format(t3__))
print("""==> NOTE: This timing is only a quick estimate;
it is based on only one measurement, and so should
not be regarded as reliable. (Later, we will test
more rigorously, after a correctness check.""")

print("Checking the answer...")
A_py = generate_solution(T_py, B_py, C_py, D_py, N0, R)
big_err_locations = compare_answers(A_py, A_py__yours, multiplier=len(T_py))
assert big_err_locations is None

print("\n(Passed!)")
Generating a new test instance...
Using a 4-way tensor of size N0 x N1 x N2 x N3 == 11653 x 11653 x 520 x 10.
There are 97911 nonzero entries.
Using factors with 9 columns.

Generating tensor... (may take a minute)
==> Done generating inputs. (~ 0.1 seconds)

Generating input factors...
==> Done.
==> Done.
Running your implementation...
    input conversion...
    mode-0 MTTKRP...
    output conversion...
==> Done. (MTTKRP ~ 0.0 seconds.)
==> NOTE: This timing is only a quick estimate;
it is based on only one measurement, and so should
not be regarded as reliable. (Later, we will test
more rigorously, after a correctness check.
Checking the answer...
Executing the baseline MTTKRP method...
==> Done with one instance. (~ 0.4 seconds)

==> NOTE: This timing is only a quick estimate;
it is based on only one measurement, and so should
not be regarded as reliable. (Later, we will test
more rigorously.)

Basic correctness check: Comparing your output to the reference, A_py...
    [threshold: 1.9582200000000002e-10]...
==> Maximum absolute error observed: 7.460698725481052e-14

(Passed!)
In [17]:
# Test cell: `correctness_2` (1 point)
#
# This cell tests on the same structure but on a larger problem.

print("Loading a larger test problem...")
bloggers_df, posts_df, occur_df, soln_df = read_blogs('-100x2')
occur_zodiac_df = add_zodiac_ids(occur_df, posts_df, bloggers_df)

T_py, B_py, C_py, D_py, N0, N1, N2, N3, R = generate_inputs(occur_zodiac_df, R=5)
print("==> Done.")

print("Running your implementation...")
print("    input conversion...")
T, B, C, D = convert_inputs(T_py, B_py, C_py, D_py, N0, N1, N2, N3, R)
print("    mode-0 MTTKRP...")
t3__ = get_timer()
A = mttkrp_0(T, B, C, D, N0, N1, N2, N3, R)
t3__ = get_timer() - t3__
print("    output conversion...")
A_py__yours = convert_output(A, N0, R)
print("==> Done. (MTTKRP ~ {:.1f} seconds.)".format(t3__))
print("""==> NOTE: This timing is only a quick estimate;
it is based on only one measurement, and so should
not be regarded as reliable. (Later, we will test
more rigorously, after a correctness check.)
""")

print("Checking the answer...")
A_py = generate_solution(T_py, B_py, C_py, D_py, N0, R)
big_err_locations = compare_answers(A_py, A_py__yours, multiplier=len(T_py))
assert big_err_locations is None

print("\n(Passed!)")
Loading a larger test problem...
Reading ./resource/asnlib/publicdata/blogs/bloggers-100x2.csv...
Reading ./resource/asnlib/publicdata/blogs/posts-100x2.csv...
Reading ./resource/asnlib/publicdata/blogs/occur-100x2.csv...
Reading ./resource/asnlib/publicdata/blogs/tensor_coords_df_soln-100x2.csv...
==> Total loading time: ~ 0.9 seconds.
Using a 4-way tensor of size N0 x N1 x N2 x N3 == 30863 x 30863 x 2492 x 12.
There are 568421 nonzero entries.
Using factors with 5 columns.

Generating tensor... (may take a minute)
==> Done generating inputs. (~ 0.6 seconds)

Generating input factors...
==> Done.
==> Done.
Running your implementation...
    input conversion...
    mode-0 MTTKRP...
    output conversion...
==> Done. (MTTKRP ~ 0.1 seconds.)
==> NOTE: This timing is only a quick estimate;
it is based on only one measurement, and so should
not be regarded as reliable. (Later, we will test
more rigorously, after a correctness check.)

Checking the answer...
Executing the baseline MTTKRP method...
==> Done with one instance. (~ 1.3 seconds)

==> NOTE: This timing is only a quick estimate;
it is based on only one measurement, and so should
not be regarded as reliable. (Later, we will test
more rigorously.)

Basic correctness check: Comparing your output to the reference, A_py...
    [threshold: 1.136842e-09]...
==> Maximum absolute error observed: 3.481659405224491e-13

(Passed!)
In [18]:
# Test cell: `performance_0`
#
# This cell checks that you achieved at least a 1.25x (25%) speed improvement.

print("\nTiming baseline...")
t_baseline = benchit("mttkrp_0__py(T_py, B_py, C_py, D_py, N0, R)", scope=globals())

print("\nTiming your method...")
t_better = benchit("mttkrp_0(T, B, C, D, N0, N1, N2, N3, R)", scope=globals())

speedup = t_baseline['mean_time_per_run'] / t_better['mean_time_per_run']
speedup_target = 1.25
assert speedup >= speedup_target, "Your method was only {:.2f}x faster (< 1 means it was slower)!".format(speedup,
                                                                                                          speedup_target)

print("\n(Passed -- you were {:.1f}x > {}x faster!)".format(speedup, speedup_target))
Timing baseline...
Timing result: (5 trials) x (1 runs) in 6.363179047009908 secs
==> 1.2726358094019816 secs per run

Timing your method...
Timing result: (5 trials) x (10 runs) in 3.070418054005131 secs
==> 0.06140836108010262 secs per run

(Passed -- you were 20.7x > 1.25x faster!)
In [19]:
# Test cell: `performance_1`
#
# This cell checks that you achieved at least a 1.5x (50%) speed improvement.

speedup_target = 1.5
assert speedup >= speedup_target, "Your method was only {:.2f}x faster (< 1 means it was slower)!".format(speedup,
                                                                                                          speedup_target)

print("\n(Passed -- you were {:.1f}x > {}x faster!)".format(speedup, speedup_target))
(Passed -- you were 20.7x > 1.5x faster!)
In [20]:
# Test cell: `performance_2`
#
# This cell checks that you achieved at least a 2x (100%) speed improvement.

speedup_target = 2.0
assert speedup >= speedup_target, "Your method was only {:.2f}x faster (< 1 means it was slower)!".format(speedup,
                                                                                                          speedup_target)

print("\n(Passed -- you were {:.1f}x > {}x faster!)".format(speedup, speedup_target))
(Passed -- you were 20.7x > 2.0x faster!)

Fin! You've reached the end of this problem. Be sure to restart the kernel and re-run it from top-to-bottom to make sure should work. Then try submitting it to the autograder. If it passes the autograder, then you'll be good to go!