In a recent paper, we investigated the time course of a large set of features across the Weimar Jazz Database (WJD). Several interesting trends could be observed, mainly towards higher complexity and virtuosity. In this little tutorial, we want to present a (hopefully) interesting by-product of this research that did not make it into the paper. The task is to predict the recording year of a solo using a small set of 24 selected features.

Selected features
Feature Description
cdpcx_density_3 Rel. frequency of chordal thirds
durclass_abs_hist_very_short Rel. frequency of very short tones (shorter than a 16th note at 120 bpm)
lick Rel. frequency of lick MLUs (midlevel units)
line Rel. frequency of line MLUs (midlevel units)
loudness_sd Standard deviation of tone loudness
pitch_entropy Entropy of pitch distribution
cdpcx_density_6 Rel. frequency of chordal sixths
cdpcx_density_b2 Rel. frequency of flat ninths
art_range Articulation range
cdpcx_density_5 Rel. frequency of chordal fifths
cdpcx_density Rel. frequency of chordal roots
int_bigram_entropy Entropy of interval combinations
abs_int_range Size of largest interval
durclass_abs_entropy Entropy of absolute duration classes (reference tempo 120 bpm)
pitch_range Pitch range
pitch_std Mean distance from mean pitch
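To make some of these descriptions concrete, here is a minimal Python sketch of how a few of the descriptors could be computed from the MIDI pitches and tone durations of a single solo. It is an illustration under stated assumptions, not the feature extractor used for the WJD: the function names are hypothetical, entropies are assumed to use base-2 logarithms, abs_int_range follows the table's wording ("size of largest interval"), pitch_std is taken as the standard deviation of pitch, and "very short" is taken to mean shorter than a 16th note at 120 bpm, i.e. 60 / 120 / 4 = 0.125 s.

```python
import math
from collections import Counter
from statistics import pstdev

# One beat at 120 bpm lasts 0.5 s, so a 16th note lasts 0.125 s (assumed threshold).
SIXTEENTH_AT_120_BPM = 60.0 / 120 / 4


def entropy_bits(values):
    """Shannon entropy (base 2, an assumption) of the empirical distribution of `values`."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def melody_features(pitches, durations):
    """Hypothetical re-implementation of a few descriptors from the table.

    pitches:   MIDI pitch numbers of the tones of one solo
    durations: tone durations in seconds (same length as pitches)
    """
    intervals = [b - a for a, b in zip(pitches, pitches[1:])]  # semitone steps
    return {
        "pitch_range": max(pitches) - min(pitches),
        "pitch_std": pstdev(pitches),  # standard deviation around the mean pitch
        "pitch_entropy": entropy_bits(pitches),
        "abs_int_range": max(abs(i) for i in intervals) if intervals else 0,
        "int_bigram_entropy": entropy_bits(list(zip(intervals, intervals[1:]))),
        "durclass_abs_hist_very_short":
            sum(d < SIXTEENTH_AT_120_BPM for d in durations) / len(durations),
    }


# Toy phrase: C D E F G F E D with a mix of short and long tones.
f = melody_features([60, 62, 64, 65, 67, 65, 64, 62],
                    [0.1, 0.1, 0.25, 0.25, 0.5, 0.1, 0.1, 0.5])
print(f)  # pitch_range 7, pitch_entropy 2.25 bits, abs_int_range 2, very short share 0.5
```

Real solos have many more tones and distinct pitches, so their pitch entropies are correspondingly higher than in this eight-note toy example.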

A short history of jazz

Fig. 1. Distribution of recording years in the Weimar Jazz Database.

Looking at the temporal distribution of the solos in the WJD (Fig. 1), certain clusters as well as clear gaps are noticeable. The first gap is located in the 1930s, the era of big band jazz, in which smaller ensembles and longer solos were not in fashion. This gap reflects a factual scarcity of interesting and/or sufficiently long recorded solos in a small-ensemble context. This began to change at the end of the 1930s, a development that came only to a short halt due to the recording ban in the US in 1942-43. The “second golden era” of jazz (counting the 1920s as the first “Golden Age”) begins shortly after the ban with the advent of bebop and spans the (very) long 1950s, from 1945 until about 1965.

After the second golden era, with diminishing audiences and revenues due to the growing popularity of pop and rock music on the one hand and soul, funk, and R&B on the other, jazz underwent a process of stylistic diversification. Free and avant-garde jazz developed alongside soul jazz and continuing bebop-derived styles. By the end of the 1960s, Miles Davis and others went “electric” by incorporating rock instrumentation and grooves into their music. The dominant style of the 1970s was probably jazz rock and fusion, with a penchant for original tunes, complicated arrangements and a diminished significance of the solo. The diminished role of the solo and the reduced availability of lead sheets explain the second big gap in our data in the 1970s. The 1970s solos in the WJD are by Art Pepper and Chet Baker, still playing in their 1950s style, a few calypsos by Sonny Rollins, solos by postbop trumpeter Woody Shaw, a solo by guitarist Pat Martino, and an early solo by Bob Berg in a hard bop context with Cedar Walton. The late 1980s then saw a renewed interest in acoustic jazz, largely abandoning rock elements but still informed and influenced by them (e.g., Branford & Wynton Marsalis, Michael Brecker, Steve Coleman, Pat Metheny, etc.).
This new era of “modern” jazz, or “postbop”, continues to this day, even though the landscape of jazz is very diversified by now. Basically every older jazz style is still being played, and there are also some newer additions like “Acid Jazz”, “Nu Jazz” or “Electro-Swing”, none of which are covered (yet) in the WJD.

A listing of performers broken down by decade can be found in Tab. 1.

Table 1. Performers in the WJD broken down by decade. Number of solos in brackets.
Decade Performers
1920s Bix Beiderbecke (5), Johnny Dodds (6), Kid Ory (5), Louis Armstrong (8)
1930s Benny Goodman (7), Buck Clayton (1), Chu Berry (2), Coleman Hawkins (1), Dickie Wells (1), Lester Young (2), Lionel Hampton (6), Roy Eldridge (1), Sidney Bechet (2)
1940s Buck Clayton (2), Charlie Parker (13), Charlie Shavers (1), Coleman Hawkins (2), Dexter Gordon (1), Dickie Wells (5), Dizzy Gillespie (5), Don Byas (5), Fats Navarro (6), Henry Allen (1), J.C. Higginbotham (1), J.J. Johnson (3), Kai Winding (1), Kenny Dorham (1), Lee Konitz (4), Lester Young (5), Roy Eldridge (3), Sidney Bechet (3), Sonny Stitt (6), Warne Marsh (3)
1950s Ben Webster (4), Benny Carter (3), Cannonball Adderley (3), Charlie Parker (4), Chet Baker (6), Clifford Brown (9), Coleman Hawkins (3), Curtis Fuller (1), Dizzy Gillespie (1), Don Byas (3), Gerry Mulligan (5), Hank Mobley (2), J.J. Johnson (5), John Coltrane (11), Johnny Hodges (2), Kenny Dorham (3), Lee Konitz (4), Lee Morgan (2), Miles Davis (10), Milt Jackson (6), Nat Adderley (1), Ornette Coleman (5), Paul Desmond (1), Pepper Adams (5), Phil Woods (3), Red Garland (1), Rex Stewart (1), Roy Eldridge (2), Sonny Rollins (9), Stan Getz (4), Steve Lacy (5), Zoot Sims (6)
1960s Ben Webster (1), Cannonball Adderley (2), Curtis Fuller (1), Dexter Gordon (5), Don Ellis (6), Eric Dolphy (6), Freddie Hubbard (6), George Coleman (1), Gerry Mulligan (1), Hank Mobley (2), Harry Edison (1), Herbie Hancock (5), Joe Henderson (6), John Coltrane (9), Kenny Dorham (3), Lee Morgan (2), Miles Davis (9), Nat Adderley (1), Paul Desmond (7), Phil Woods (3), Stan Getz (2), Steve Lacy (1), Wayne Shorter (10)
1970s Art Pepper (6), Bob Berg (1), Chet Baker (2), Pat Martino (1), Sonny Rollins (4), Woody Shaw (4)
1980s Benny Carter (4), Branford Marsalis (6), David Liebman (6), David Murray (2), John Abercrombie (1), Kenny Wheeler (1), Michael Brecker (2), Pat Metheny (2), Steve Coleman (6), Steve Turre (3), Woody Shaw (4), Wynton Marsalis (4)
1990s Bob Berg (6), Chris Potter (2), David Liebman (5), David Murray (4), Joe Henderson (2), Joe Lovano (8), Joshua Redman (5), Kenny Garrett (2), Kenny Wheeler (2), Michael Brecker (8), Pat Metheny (2), Steve Coleman (4), Von Freeman (1), Wynton Marsalis (3)
2000s Chris Potter (5)
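The clusters and gaps described above can be checked directly against this table. The following Python sketch (the dictionary and function names are our own, hypothetical choices) copies the rows of the table and sums the bracketed solo counts per decade.

```python
import re

# Rows of the performer table: decade -> "Performer (number of solos), ..."
PERFORMERS_BY_DECADE = {
    "1920s": "Bix Beiderbecke (5), Johnny Dodds (6), Kid Ory (5), Louis Armstrong (8)",
    "1930s": ("Benny Goodman (7), Buck Clayton (1), Chu Berry (2), Coleman Hawkins (1), "
              "Dickie Wells (1), Lester Young (2), Lionel Hampton (6), Roy Eldridge (1), "
              "Sidney Bechet (2)"),
    "1940s": ("Buck Clayton (2), Charlie Parker (13), Charlie Shavers (1), Coleman Hawkins (2), "
              "Dexter Gordon (1), Dickie Wells (5), Dizzy Gillespie (5), Don Byas (5), "
              "Fats Navarro (6), Henry Allen (1), J.C. Higginbotham (1), J.J. Johnson (3), "
              "Kai Winding (1), Kenny Dorham (1), Lee Konitz (4), Lester Young (5), "
              "Roy Eldridge (3), Sidney Bechet (3), Sonny Stitt (6), Warne Marsh (3)"),
    "1950s": ("Ben Webster (4), Benny Carter (3), Cannonball Adderley (3), Charlie Parker (4), "
              "Chet Baker (6), Clifford Brown (9), Coleman Hawkins (3), Curtis Fuller (1), "
              "Dizzy Gillespie (1), Don Byas (3), Gerry Mulligan (5), Hank Mobley (2), "
              "J.J. Johnson (5), John Coltrane (11), Johnny Hodges (2), Kenny Dorham (3), "
              "Lee Konitz (4), Lee Morgan (2), Miles Davis (10), Milt Jackson (6), "
              "Nat Adderley (1), Ornette Coleman (5), Paul Desmond (1), Pepper Adams (5), "
              "Phil Woods (3), Red Garland (1), Rex Stewart (1), Roy Eldridge (2), "
              "Sonny Rollins (9), Stan Getz (4), Steve Lacy (5), Zoot Sims (6)"),
    "1960s": ("Ben Webster (1), Cannonball Adderley (2), Curtis Fuller (1), Dexter Gordon (5), "
              "Don Ellis (6), Eric Dolphy (6), Freddie Hubbard (6), George Coleman (1), "
              "Gerry Mulligan (1), Hank Mobley (2), Harry Edison (1), Herbie Hancock (5), "
              "Joe Henderson (6), John Coltrane (9), Kenny Dorham (3), Lee Morgan (2), "
              "Miles Davis (9), Nat Adderley (1), Paul Desmond (7), Phil Woods (3), "
              "Stan Getz (2), Steve Lacy (1), Wayne Shorter (10)"),
    "1970s": ("Art Pepper (6), Bob Berg (1), Chet Baker (2), Pat Martino (1), "
              "Sonny Rollins (4), Woody Shaw (4)"),
    "1980s": ("Benny Carter (4), Branford Marsalis (6), David Liebman (6), David Murray (2), "
              "John Abercrombie (1), Kenny Wheeler (1), Michael Brecker (2), Pat Metheny (2), "
              "Steve Coleman (6), Steve Turre (3), Woody Shaw (4), Wynton Marsalis (4)"),
    "1990s": ("Bob Berg (6), Chris Potter (2), David Liebman (5), David Murray (4), "
              "Joe Henderson (2), Joe Lovano (8), Joshua Redman (5), Kenny Garrett (2), "
              "Kenny Wheeler (2), Michael Brecker (8), Pat Metheny (2), Steve Coleman (4), "
              "Von Freeman (1), Wynton Marsalis (3)"),
    "2000s": "Chris Potter (5)",
}

COUNT = re.compile(r"\((\d+)\)")  # matches the number of solos in brackets


def solos_per_decade(table):
    """Sum the bracketed solo counts in each decade's row."""
    return {decade: sum(int(n) for n in COUNT.findall(row)) for decade, row in table.items()}


counts = solos_per_decade(PERFORMERS_BY_DECADE)
for decade, n in counts.items():
    print(f"{decade}: {n:3d} {'#' * (n // 5)}")  # crude text histogram, one '#' per 5 solos
print("total:", sum(counts.values()))
```

Run as is, it reports 130 solos for the 1950s but only 23 for the 1930s and 18 for the 1970s, which matches the two gaps discussed above.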

Predicting recording year with random forests

Here, we attempt to predict the recording year of solos from their features. To this end, we used random forests with recording year as the target variable and either the full set of 159 features or the reduced set of 24 features as predictors. Random forests are an extension of decision trees and are known to be among the most powerful classification and regression methods; they can handle large sets of possibly correlated predictors. Using the full set of features, the random forest explains 55% of the variance, with a mean prediction error of 12.6 years. Using the reduced set, still 51% of the variance can be explained, with a mean prediction error of 13.4 years. The 10 most important features for the reduced set are pitch_entropy, art_range, pitch_range, pitch_std, cdpcx_density_5, int_bigram_entropy, f0_median_dev, abs_int_range, line, and cdpcx_density_6. These are mostly related to pitch and interval variability, but also to tonal aspects and, notably, to articulation and intonation.
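This is not the analysis code behind the numbers above, and the WJD feature data are not included here, so the following Python sketch fits scikit-learn's `RandomForestRegressor` to synthetic stand-in data (456 made-up "solos" with 24 made-up features and an invented trend). It illustrates one way to obtain the two reported quantities from out-of-bag (OOB) predictions: explained variance as the OOB R², and the mean prediction error, which we assume to be the mean absolute error in years. The ranking uses scikit-learn's impurity-based importances, which may differ from the importance measure behind the reported top 10.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

# Synthetic stand-in data (assumption): 456 "solos" x 24 "features",
# where only the first three columns carry an invented trend with recording year.
rng = np.random.default_rng(42)
n_solos, n_features = 456, 24
X = rng.normal(size=(n_solos, n_features))
year = 1965 + 15 * X[:, 0] + 8 * X[:, 1] - 5 * X[:, 2] + rng.normal(scale=10, size=n_solos)

forest = RandomForestRegressor(n_estimators=300, oob_score=True, random_state=0)
forest.fit(X, year)

# OOB estimates: each solo is predicted only by trees that did not see it during training.
explained_variance = forest.oob_score_                    # OOB R^2
mae = mean_absolute_error(year, forest.oob_prediction_)   # assumed meaning of "mean prediction error"
top10 = np.argsort(forest.feature_importances_)[::-1][:10]  # indices, most important first

print(f"explained variance: {explained_variance:.0%}, mean abs. error: {mae:.1f} years")
print("top-10 feature indices:", top10)
```

With the real data, `X` would hold the 159 or 24 feature columns per solo and `year` the recording years; the numbers printed for the synthetic data carry no meaning.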

Fig. 2. Distribution of minimal tree depths for the top 10 variables.
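Minimal depth, as plotted in Fig. 2, is the depth of the first split on a variable within a tree (the root has depth 0), collected over all trees of the forest; variables that matter tend to be split on early. A minimal Python sketch of that computation for a fitted scikit-learn forest, again on synthetic data: the helper name is hypothetical, and trees that never split on a variable are simply skipped, whereas other tools may treat that case differently.

```python
from collections import defaultdict

import numpy as np
from sklearn.ensemble import RandomForestRegressor


def minimal_depths(forest):
    """Map feature index -> list of minimal depths, one entry per tree that splits on it."""
    depths = defaultdict(list)
    for estimator in forest.estimators_:
        tree = estimator.tree_
        first_split = {}          # feature -> shallowest split depth in this tree
        stack = [(0, 0)]          # (node id, depth); the root has depth 0
        while stack:
            node, depth = stack.pop()
            left, right = tree.children_left[node], tree.children_right[node]
            if left == right:     # leaf node: both children are -1
                continue
            feature = int(tree.feature[node])
            first_split[feature] = min(depth, first_split.get(feature, depth))
            stack.append((left, depth + 1))
            stack.append((right, depth + 1))
        for feature, depth in first_split.items():
            depths[feature].append(depth)
    return depths


# Synthetic example (assumption): only feature 0 carries signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = 10 * X[:, 0] + rng.normal(scale=0.1, size=300)
forest = RandomForestRegressor(n_estimators=50, max_features=None, max_depth=6,
                               random_state=0).fit(X, y)

mean_min_depth = {f: float(np.mean(d)) for f, d in sorted(minimal_depths(forest).items())}
print(mean_min_depth)  # feature 0 is expected at depth 0, i.e. at the root of every tree
```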

Feeding these variables into a standard decision tree yields the tree depicted in Fig. 3. Here, four features related to pitch variability and articulation achieve 50% explained variance and a mean prediction error of 13.3 years, which is on par with the random forest on the reduced feature set.
Fig. 3. Decision tree for recording year prediction.
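A single regression tree on the top-10 variables can be sketched with scikit-learn as follows. Only the feature names come from the text; the data are synthetic, and the depth and minimum leaf size are our own assumptions chosen to keep the rules readable, not the settings behind Fig. 3. Explained variance and mean absolute error are estimated with 10-fold cross-validated predictions, and the tree is then refit on all data to print its rules.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeRegressor, export_text

# Names of the top-10 features from the text; the data below are synthetic (assumption).
feature_names = ["pitch_entropy", "art_range", "pitch_range", "pitch_std", "cdpcx_density_5",
                 "int_bigram_entropy", "f0_median_dev", "abs_int_range", "line", "cdpcx_density_6"]
rng = np.random.default_rng(1)
X = rng.normal(size=(456, len(feature_names)))
year = 1965 + 12 * X[:, 0] + 6 * X[:, 1] + 4 * X[:, 2] + rng.normal(scale=10, size=456)

# A shallow tree keeps the rules readable (depth and leaf size are assumptions).
tree = DecisionTreeRegressor(max_depth=3, min_samples_leaf=20, random_state=0)
pred = cross_val_predict(tree, X, year, cv=10)  # out-of-sample prediction for every solo
print(f"explained variance (R^2): {r2_score(year, pred):.0%}")
print(f"mean absolute error: {mean_absolute_error(year, pred):.1f} years")

tree.fit(X, year)  # refit on all data to inspect the rules
used = sorted({feature_names[i] for i in tree.tree_.feature if i >= 0})  # leaves are marked -2
print("features used:", used)
print(export_text(tree, feature_names=feature_names))
```

The printed rules show which of the ten candidates the tree actually selects, which is how a small subset such as the four features in Fig. 3 can emerge from a larger input set.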

Misclassified solos

Finally, it might be interesting to see for which solos the random forest prediction was very imprecise. This is basically a form of outlier detection. A list of solos with a prediction error of at least 30 years can be found in Tab. 2.
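The selection behind this list amounts to a simple filter on the absolute prediction error. A minimal Python sketch, using a handful of rows copied from the table below (the tuple layout and function name are hypothetical); note that the table's Difference column is the predicted minus the recording year.

```python
# Hypothetical record layout: (performer, title, recording year, predicted year).
# The rows are copied from the table of misclassified solos.
solos = [
    ("Louis Armstrong", "Basin Street Blues", 1928, 1971.8),
    ("Wynton Marsalis", "U.M.M.G.", 1991, 1955.1),
    ("Bob Berg", "Second Sight", 1993, 1958.3),
    ("Lionel Hampton", "Whispering", 1936, 1969.8),
    ("Chris Potter", "Togo", 2007, 1975.9),
    ("Dizzy Gillespie", "Be-Bop", 1945, 1975.2),
]


def outliers(rows, threshold=30.0):
    """Return (performer, title, difference) for rows with |predicted - recorded| >= threshold.

    difference = predicted - recorded, rounded to one decimal; results are
    sorted by decreasing absolute error.
    """
    flagged = [(p, t, round(pred - rec, 1)) for p, t, rec, pred in rows
               if abs(pred - rec) >= threshold]
    return sorted(flagged, key=lambda r: abs(r[2]), reverse=True)


for performer, title, diff in outliers(solos):
    print(f"{performer:16s} {title:20s} {diff:+.1f}")
```

Raising `threshold` to 33 keeps only the four largest errors (Basin Street Blues, U.M.M.G., Second Sight, Whispering).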

Table 2. Solos misclassified by at least 30 years
Performer | Title | Recording year | Predicted year | Difference (predicted - recording) | Pitch range (semitones) | Pitch entropy | Perc. 6ths | Articulation range
Louis Armstrong | Basin Street Blues | 1928 | 1971.8 | 43.8 | 37 | 4.0 | 11.9% | 1.0
Wynton Marsalis | U.M.M.G. | 1991 | 1955.1 | -35.9 | 24 | 4.2 | 8.8% | 0.8
Bob Berg | Second Sight | 1993 | 1958.3 | -34.7 | 31 | 4.1 | 7.0% | 0.9
Lionel Hampton | Whispering | 1936 | 1969.8 | 33.8 | 30 | 4.5 | 6.6% | 0.9
Kenny Garrett | Brother Hubbard | 1997 | 1963.2 | -33.8 | 27 | 3.8 | 5.4% | 1.0
David Murray | Blues for Two | 1990 | 1956.3 | -33.7 | 28 | 4.2 | 6.6% | 1.0
Steve Coleman | Cross-Fade | 1990 | 1957.3 | -32.7 | 24 | 4.3 | 0.0% | 0.9
Steve Coleman | Cross-Fade | 1990 | 1957.3 | -32.7 | 27 | 3.6 | 0.0% | 0.9
Lionel Hampton | Memories of You | 1939 | 1971.5 | 32.5 | 36 | 4.5 | 7.1% | 0.9
Steve Turre | Steve’s Blues | 1987 | 1954.6 | -32.4 | 29 | 4.2 | 9.2% | 0.9
Branford Marsalis | Three Little Words | 1988 | 1956.1 | -31.9 | 30 | 4.1 | 9.6% | 0.9
John Abercrombie | Ralph’s Piano Waltz | 1988 | 1956.3 | -31.7 | 28 | 4.4 | 11.7% | 0.9
Branford Marsalis | Housed from Edward | 1988 | 1956.5 | -31.5 | 21 | 3.7 | 8.7% | 1.0
Chris Potter | Togo | 2007 | 1975.9 | -31.1 | 42 | 4.2 | 1.8% | 1.0
Steve Coleman | Slipped again | 1991 | 1960.8 | -30.2 | 26 | 4.3 | 6.1% | 0.9
Dizzy Gillespie | Be-Bop | 1945 | 1975.2 | 30.2 | 37 | 4.6 | 6.3% | 1.0

Interestingly, most of these are post-1985 postbop solos which were erroneously placed in the 1950s or 1960s due to unusually small pitch ranges and/or low pitch entropy. Two solos by Lionel Hampton from the second half of the 1930s were placed in the 1970s. This is explained by the fact that the pitch range of the vibraphone is rather large and more easily allows jumps in pitch space. Finally, two solos by Louis Armstrong were propelled forward in time, from 1927 to 1960 and from 1928 to 1971. This is also due to unusually large pitch ranges of 37 and 27 semitones and high pitch entropies, which nevertheless indicate his forward-thinking status as an improviser.