This notebook illustrates amassing a medium-sized dataset (essentially 1 second of 32- or 64-bit float mono audio: a simple sine wave) using the record_history decorator, and then using the stats.history_as_DataFrame attribute to obtain that dataset as a Pandas DataFrame.
After that basic (and naive) example, we compare the performance of this approach with one that uses numpy's ability to vectorize the same computation, and conclude that if you can vectorize, you certainly should. (The example is naive precisely because no one would call the function f below in a for loop when it's possible to use numpy universal functions (ufuncs). When that alternative is unavailable, however, record_history can come in handy.)
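To make the idea of recording a call history concrete, here is a minimal sketch of a history-recording decorator using only the standard library. It is not how log_calls implements record_history; the names record_calls, sine and the history attribute are hypothetical and exist only for this illustration.

```python
import functools
import math


def record_calls(func):
    """Hypothetical stand-in for a history-recording decorator (not log_calls itself)."""
    history = []  # one (args, retval) tuple per call

    @functools.wraps(func)
    def wrapper(*args):
        retval = func(*args)
        history.append((args, retval))  # remember what was passed and what came back
        return retval

    wrapper.history = history  # expose the records on the decorated function
    return wrapper


@record_calls
def sine(freq, t):
    return math.sin(freq * 2 * math.pi * t)


for k in range(4):
    sine(17, k / 4)

print(len(sine.history))  # one record per call
print(sine.history[0])    # the arguments and return value of the first call
```

Every call pays for the extra bookkeeping, which is the overhead the timing comparisons later in the notebook measure.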
stats.history_as_DataFrame attribute
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from log_calls import record_history
@record_history()
def f(freq, t):
    return np.sin(freq * 2 * np.pi * t)
ran_t = np.arange(0.0, 1.0, 1/44100, dtype=np.float32)
ran_t
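The sample times above come from stepping through one second in increments of 1/44100. As a rough cross-check of the sample count, the same grid can be built from integer indices, which keeps the count independent of floating-point step accumulation. This is only a sketch in plain Python; SAMPLE_RATE, DURATION and sample_times are names introduced here, not part of the notebook.

```python
SAMPLE_RATE = 44100  # samples per second, matching the 1/44100 step used above
DURATION = 1.0       # seconds of audio

# Derive each sample time from its integer index instead of adding a float step.
n_samples = int(round(SAMPLE_RATE * DURATION))
sample_times = [k / SAMPLE_RATE for k in range(n_samples)]

print(n_samples)         # number of samples, and so the number of calls in the loop
print(sample_times[0])   # first sample time
print(sample_times[-1])  # last sample time, just short of DURATION
```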
Now, naively, call f 44,100 times in a for loop, and obtain its call history as a Pandas DataFrame:
#f.stats.clear_history()
for t in ran_t:
    f(17, t)
df = f.stats.history_as_DataFrame
Examine the DataFrame and plot the recorded values:
df.info()
from IPython.display import display
display(df.head())
display(df.tail())
len(f.stats.history)
df[['t', 'retval']].head()
plt.plot(df.t, df.retval);
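The step from a call history to a table can be pictured as turning a list of per-call records into named columns. The sketch below does this with plain Python containers; the record layout (freq, t, retval) is an assumption made for illustration, and the real DataFrame produced by log_calls may carry additional columns.

```python
import math

# Hypothetical call records: one (freq, t, retval) tuple per call.
records = [(17, t, math.sin(17 * 2 * math.pi * t)) for t in (0.0, 0.25, 0.5)]

# Pivot the row-oriented records into a dict of columns,
# a shape that the pandas.DataFrame constructor accepts.
column_names = ("freq", "t", "retval")
columns = {name: [rec[i] for rec in records] for i, name in enumerate(column_names)}

print(columns["t"])
print(columns["retval"])
```

Selecting df[['t', 'retval']] as above corresponds to picking two of these columns.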
record_history vs vectorization with numpy ufuncs
def g(freq, t):
    return np.sin(freq * 2 * np.pi * t)
nodeco_vectorized_secs = %timeit -o Hz_17 = g(17, ran_t)
nodeco_vectorized_secs.best
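# Names assigned inside a %timeit statement are not kept in the notebook namespace,
# so compute the array once more before inspecting it.
Hz_17 = g(17, ran_t)
Hz_17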
plt.plot(Hz_17);
for loop:
nodeco_loop_secs = %timeit -o for t in ran_t: g(7, t)
nodeco_loop_secs.best
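Outside IPython, a measurement similar to %timeit -o and its .best value can be approximated with the standard library's timeit module. This sketch uses math.sin and a shorter list of sample times so that it runs quickly; the names ts, loop, number, runs and best are introduced here, and the repeat and loop counts are arbitrary choices rather than what %timeit would pick.

```python
import math
import timeit


def g(freq, t):
    return math.sin(freq * 2 * math.pi * t)


ts = [k / 1000 for k in range(1000)]  # far fewer samples than the notebook's 44,100


def loop():
    for t in ts:
        g(7, t)


number = 10                                          # executions of loop() per timed run
runs = timeit.repeat(loop, repeat=3, number=number)  # total seconds for each run
best = min(runs) / number                            # best per-execution time, in seconds

print(best)
```

Taking the minimum over the repeats, rather than the mean, reduces the influence of unrelated activity on the machine.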
def comparison(slower, faster):
    'slower, faster: seconds'
    ratio = slower/faster
    order_of_magnitude = np.log10(ratio)
    return ratio, order_of_magnitude
print("Without the decorator:\n"
      "Vectorized approach is %d times (about %.1f orders of magnitude) faster"
      % comparison(slower=nodeco_loop_secs.best, faster=nodeco_vectorized_secs.best))
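As a worked example of the arithmetic in comparison, the sketch below applies the same formula to made-up timings. The values 1.0 and 0.001 seconds are purely illustrative and are not measurements from this notebook; math.log10 stands in for np.log10 so the block needs only the standard library.

```python
import math


def comparison(slower, faster):
    """slower, faster: seconds. Returns (speed ratio, log10 of that ratio)."""
    ratio = slower / faster
    return ratio, math.log10(ratio)


# Made-up timings: a loop taking 1 s against a vectorized call taking 1 ms.
ratio, magnitude = comparison(slower=1.0, faster=0.001)

print(round(ratio), round(magnitude, 1))
```

Note that the %d conversion used in the print statements truncates toward zero, so a ratio slightly below a whole number prints as the next lower integer.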
Now let's compare the performance of the record_history-decorated version (f) of the same function.
record_history disabled
f.stats.clear_history()
f.record_history_settings.enabled = False
vectorized_secs_rh_disabled = %timeit -o Hz_17 = f(17, ran_t)
vectorized_secs_rh_disabled.best
for loop:
loop_secs_rh_disabled = %timeit -o for t in ran_t: f(7, t)
loop_secs_rh_disabled.best
print("With record_history disabled:\n"
      "Vectorized approach is %d times (about %.1f orders of magnitude) faster"
      % comparison(slower=loop_secs_rh_disabled.best, faster=vectorized_secs_rh_disabled.best))
print("Called in a for-loop, the no-decorator version is %d times (about %.1f orders of magnitude) faster"
      % comparison(slower=loop_secs_rh_disabled.best, faster=nodeco_loop_secs.best))
record_history enabled
f.record_history_settings.enabled = True
f.stats.clear_history()
vectorized_secs_rh_enabled = %timeit -o f.stats.clear_history(); Hz_17 = f(17, ran_t)
vectorized_secs_rh_enabled.best
len(f.stats.history)
def size_of_t_for_row(row):
    return f.stats.history[row].argvals[1].size
size_of_t_for_row(0)
f.stats.history[0].retval.size
f.stats.history
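The cells above show that a vectorized call leaves a single history record whose t argument is the whole array. The sketch below reproduces that contrast with a hand-rolled call log in plain Python: one call over all samples yields one record, while a per-sample loop yields one record per sample. The history list and this list-accepting f are hypothetical stand-ins, not the log_calls objects.

```python
import math

history = []  # hypothetical call log: one ((freq, ts), retval) entry per call


def f(freq, ts):
    """Accept a single time or a list of times (a rough stand-in for a ufunc)."""
    if isinstance(ts, list):
        retval = [math.sin(freq * 2 * math.pi * t) for t in ts]
    else:
        retval = math.sin(freq * 2 * math.pi * ts)
    history.append(((freq, ts), retval))
    return retval


times = [k / 100 for k in range(100)]

f(17, times)  # "vectorized": one call covering every sample
whole_array_records = len(history)
size_of_recorded_t = len(history[0][0][1])  # the single record holds all the times
history.clear()

for t in times:  # one call per sample
    f(17, t)
per_sample_records = len(history)

print(whole_array_records, size_of_recorded_t, per_sample_records)
```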
f.stats.clear_history()
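for loop:
# Both statements after the colon belong to the loop body, so clear_history() runs on every iteration.
loop_secs_rh_enabled = %timeit -o for t in ran_t: f(7, t); f.stats.clear_history()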
loop_secs_rh_enabled.best
print("With record_history enabled:\n"
      "Vectorized approach is %d times (about %.1f orders of magnitude) faster"
      % comparison(slower=loop_secs_rh_enabled.best, faster=vectorized_secs_rh_enabled.best))