Writing your own Feature Definition Files
In melfeature, feature configuration files are written in YAML, which keeps them readable and clearly structured. The first example feature simply computes the pitch range (ambitus), in semitones, of a given note sequence:
label: Pitch range in semitones
description: Pitch range in semitones
feature:
  source:
    PITCH:
      param: pitch
  process:
    STAT:
      type: stat
      measure: range
      inputVec: PITCH.outputVec
  sink:
    PITCH_RANGE_FEATURE:
      input: STAT.outputVec
      label: pitch_range
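As a rough illustration of what this feature computes, the following Python sketch reproduces the calculation outside of melfeature. It is not melfeature code: the function name pitch_range and the sample melody are hypothetical, and it assumes that pitches are given as MIDI note numbers, so that a difference of 1 corresponds to one semitone.

```python
def pitch_range(pitches):
    """Return the distance between the highest and the lowest pitch.

    Assumes pitches are MIDI note numbers (an assumption of this sketch),
    so the result is measured in semitones.
    """
    if not pitches:
        raise ValueError("pitch_range needs at least one pitch")
    return max(pitches) - min(pitches)


# Hypothetical note sequence: C4, E4, G4, C5 as MIDI note numbers.
melody = [60, 64, 67, 72]
print(pitch_range(melody))  # 72 - 60 = 12 semitones, i.e. one octave
```

A sequence containing a single pitch has a range of 0, and an empty sequence is rejected because no range is defined for it.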
Main categories
The configuration file has three main categories:
label
- a label used for headlines and table entries in this documentation; it can be any text, even empty.
description
- a more detailed description of the feature’s functionality, which is used in the feature explanations in this documentation; it can be any text, even empty.
feature
- the actual feature definition.
Feature categories
Inside the feature environment, three feature groups need to be defined:
source
- Defines the basic transformations that serve as the starting point. Each transformation used requires its own source module.
process
- Here, all processing modules are defined and connected.
sink
- Sink modules receive the output data of process modules and store it as features. melfeature saves this data to a CSV file for further analysis or visualization.
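The division of labour between the three groups can be sketched in plain Python. This is only a conceptual model and not how melfeature is implemented: the function names source_pitch, process_range, and sink_to_csv, the note dictionaries, and the CSV column names are all hypothetical.

```python
import csv
import io


def source_pitch(notes):
    """Source stage: extract one basic transformation (here: pitch) from the notes."""
    return [note["pitch"] for note in notes]


def process_range(vector):
    """Process stage: derive a value from the vector delivered by a source."""
    return max(vector) - min(vector)


def sink_to_csv(rows, stream):
    """Sink stage: store the computed feature values as CSV rows."""
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerow(["melody", "pitch_range"])
    writer.writerows(rows)


# Hypothetical input: two short melodies with MIDI pitches (assumed encoding).
melodies = {
    "melody_a": [{"pitch": 60}, {"pitch": 67}, {"pitch": 64}],
    "melody_b": [{"pitch": 55}, {"pitch": 57}],
}

# Chain source -> process for every melody, then hand the results to the sink.
rows = [(name, process_range(source_pitch(notes))) for name, notes in melodies.items()]

buffer = io.StringIO()  # stands in for a CSV file on disk
sink_to_csv(rows, buffer)
print(buffer.getvalue())
```

Keeping the stages separate means the same source vector could feed several process steps, and each sink only needs to know which output it stores.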
Module definition
For each module, a type must be defined. Examples of frequently used module types are:
arithmetic
- provides simple arithmetic operations.
hist
- computes nominal, metrical, and ordinal histograms.
stat
- provides simple statistical operations such as minimum, maximum, or mean.
logic
- computes logical operations such as "and" or "or" to compare two vectors.
ngram
- computes n-grams from arbitrary input vectors.
For the source module(s) and the sink module(s), no type needs to be specified.
Each module has a specific set of input and output parameters, which are explained in detail in the corresponding section of this document. Some of these parameters are mandatory, i.e., they have to be defined. Others are optional: if they are not defined, a default value is used.
For example, the stat module used in the example shown above has an input parameter named inputVec.
By defining:
inputVec: PITCH.outputVec
in the config file, we connect the output parameter outputVec of the source module named PITCH (which provides all note pitch values in one vector) to the input parameter inputVec of the stat module. This allows us to further process the vector containing the pitches of all note events.
The second parameter, measure, defines the actual statistical measure we want to compute.
If we are just interested in the range (of the input vector), we define:
measure: range
The computed pitch range is then stored in the parameter outputVec of the stat module, which we connect in the same way to the final sink module by:
input: STAT.outputVec
This configuration allows melfeature to compute the pitch range for one or more given transcriptions and to store it under the feature label pitch_range.
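The wiring described in this walkthrough, where a parameter value such as PITCH.outputVec names another module and one of its output parameters, can be mimicked with a small Python sketch. It is a toy model, not melfeature's implementation: the helpers resolve and run, the MEASURES table, and the melody dictionary are hypothetical. Only the module names, parameter names, and the range measure are taken from the example configuration.

```python
# The feature part of the example configuration, written as a Python dict.
CONFIG = {
    "source": {"PITCH": {"param": "pitch"}},
    "process": {
        "STAT": {"type": "stat", "measure": "range", "inputVec": "PITCH.outputVec"}
    },
    "sink": {
        "PITCH_RANGE_FEATURE": {"input": "STAT.outputVec", "label": "pitch_range"}
    },
}

# Hypothetical measure table for this sketch; only "range" appears in the text.
MEASURES = {"range": lambda vector: max(vector) - min(vector)}


def resolve(reference, outputs):
    """Look up a reference of the form MODULE.parameter in the computed outputs."""
    module, parameter = reference.split(".", 1)
    return outputs[module][parameter]


def run(config, melody):
    """Evaluate source, process, and sink modules in that order."""
    outputs = {}
    # Source modules expose the requested note parameter as outputVec.
    for name, spec in config["source"].items():
        outputs[name] = {"outputVec": melody[spec["param"]]}
    # Process modules read their wired input and publish their own outputVec.
    for name, spec in config["process"].items():
        vector = resolve(spec["inputVec"], outputs)
        outputs[name] = {"outputVec": MEASURES[spec["measure"]](vector)}
    # Sink modules collect the wired values under their feature label.
    return {
        spec["label"]: resolve(spec["input"], outputs)
        for spec in config["sink"].values()
    }


# Hypothetical melody with MIDI pitches (assumed encoding).
features = run(CONFIG, {"pitch": [62, 65, 69, 74, 60]})
print(features)  # {'pitch_range': 14}
```

In this model a misspelled module or parameter name in a reference raises a KeyError, which is one reason the names on both sides of a connection have to match exactly.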