Writing your own Feature Definition Files

In melfeature, feature definition files are written in YAML, which keeps them readable and clearly structured. The first example feature computes the pitch range (ambitus), in semitones, of a given note sequence:

label: Pitch range in semitones
description: Pitch range in semitones
feature:
  source:
    PITCH:
      param: pitch
  process:
    STAT:
      type: stat
      measure: range
      inputVec: PITCH.outputVec
  sink:
    PITCH_RANGE_FEATURE:
      input: STAT.outputVec
      label: pitch_range

Main categories

The configuration file has three main categories:

  • label - this label is used for headlines and table entries in this documentation, but can be any textual label, even empty.

  • description - a more detailed description of what the feature does. It is used in the feature explanations in this documentation, but can be any text, even empty.

  • feature - the actual feature definition.

Feature categories

Inside the feature environment, three feature groups need to be defined:

  • source - Defines the basic transformation that serves as the starting point. A separate source module must be defined for each transformation used.

  • process - Here, all processing modules are defined and connected.

  • sink - Sink modules receive output data of process modules and store it as features. melfeature saves this data to a CSV file for further analysis or visualization.
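
As a rough orientation, the nesting of these three groups can be seen in an annotated copy of the pitch-range example from the top of this section. The keys and values are unchanged from that example; only the comments are added here, and they are an interpretation of the example rather than part of the reference.

```yaml
label: Pitch range in semitones          # short label
description: Pitch range in semitones    # longer description
feature:                                 # the actual feature definition
  source:                                # group 1: where the data comes from
    PITCH:                               # source module name, referenced below
      param: pitch                       # basic transformation to start from
  process:                               # group 2: processing modules
    STAT:                                # process module name, referenced below
      type: stat                         # module type
      measure: range                     # statistical measure to compute
      inputVec: PITCH.outputVec          # input taken from the PITCH source module
  sink:                                  # group 3: what gets stored as a feature
    PITCH_RANGE_FEATURE:                 # sink module name
      input: STAT.outputVec              # input taken from the STAT process module
      label: pitch_range                 # label under which the value is stored
```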

Module definition

For each process module, a type must be defined. Frequently used module types include:

  • arithmetic - provides simple arithmetic operations.

  • hist - computes nominal, metrical, and ordinal histograms.

  • stat - provides simple statistical operations such as minimum, maximum, or mean.

  • logic - computes logical operations such as "and" or "or" to compare two vectors.

  • ngram - computes n-grams from arbitrary input vectors.

For the source module(s) and the sink module(s), no type needs to be specified.

Each module has a specific set of input and output parameters, which are explained in detail in the corresponding section of this document. Some parameters are mandatory and must be defined. Others are optional, and a default value is used when they are omitted.

For example, the stat module used in the example shown above has an input parameter named inputVec. By defining:

inputVec: PITCH.outputVec

in the config file, we connect the output parameter outputVec of the source module named PITCH (which provides all note pitch values in one vector) to the input parameter inputVec of the stat module. This allows us to further process the vector with all pitches of note events.

The second parameter, measure, selects the statistical measure to compute. Since we are only interested in the range of the input vector, we define:

measure: range

The computed pitch range is then stored in the parameter outputVec of the stat module, which we connect in the same way to the final sink module by:

input: STAT.outputVec

This configuration allows melfeature to compute the pitch range for one or more given transcriptions and to store it under the feature label pitch_range.
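
The same pattern can be extended to more than one statistic. The following is a minimal sketch, not a tested definition file, and it rests on several assumptions: that mean is the keyword the stat module accepts for the mean listed among its operations, that one source output may feed several process modules, and that a file may contain several sink modules. The module names STAT_RANGE, STAT_MEAN, and PITCH_MEAN_FEATURE and the label pitch_mean are hypothetical names chosen for illustration.

```yaml
label: Pitch range and mean pitch
description: Pitch range in semitones and mean pitch of a note sequence
feature:
  source:
    PITCH:
      param: pitch
  process:
    STAT_RANGE:                        # hypothetical module name
      type: stat
      measure: range
      inputVec: PITCH.outputVec
    STAT_MEAN:                         # hypothetical module name
      type: stat
      measure: mean                    # assumed keyword; check the stat module reference
      inputVec: PITCH.outputVec        # assumes PITCH can feed two modules
  sink:
    PITCH_RANGE_FEATURE:
      input: STAT_RANGE.outputVec
      label: pitch_range
    PITCH_MEAN_FEATURE:                # hypothetical second sink
      input: STAT_MEAN.outputVec
      label: pitch_mean
```

Each process module reads from PITCH.outputVec independently, and each sink names the module whose output it stores, so adding a further statistic would mean adding one process entry and one sink entry.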