Data Processing & Analysis

So far, only feature selection is implemented.

Feature Selection

Feature selection is a method from statistics and machine learning for identifying the features that best separate different classes of items in the feature space. For instance, we can use feature selection to find the features that best distinguish jazz solos from two different styles, or jazz solos played by two or more different artists.

In particular, melvis uses the tree-based feature selection algorithm from the scikit-learn Python package (see the scikit-learn documentation for details).
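The idea behind tree-based feature selection can be sketched as follows with scikit-learn. This is an illustrative example, not melvis's actual code: an ensemble of randomized trees scores each feature by its importance, and the features are then ranked by that score. The data here is synthetic.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

rng = np.random.default_rng(0)
n_items = 60

# Synthetic "feature matrix": 8 features per solo; feature 0 separates
# the two groups, the remaining features are pure noise.
X = rng.normal(size=(n_items, 8))
y = np.array([0] * 30 + [1] * 30)
X[y == 1, 0] += 3.0  # make feature 0 discriminative

# Fit an ensemble of randomized trees and read off feature importances
forest = ExtraTreesClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)

# Rank features by importance, best first
ranking = np.argsort(forest.feature_importances_)[::-1]
print(ranking[0])  # feature 0 should come out on top
```

The importances sum to 1 and quantify how much each feature contributes to separating the classes across all trees; keeping the top-ranked features is the selection step.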

One-vs-all Feature Selection

If we have defined more than two groups of items using Item Selection & Grouping, we are often interested in comparing each group individually against all remaining groups. This is referred to as one-vs-all feature selection. For instance, if we define three groups (let's say the solos from three different artists A, B, and C), one-vs-all feature selection finds the best features to

  • separate solos from artist A against all solos from artists B and C,

  • separate solos from artist B against all solos from artists A and C, and

  • separate solos from artist C against all solos from artists A and B.

Here is how we can invoke a one-vs-all feature selection in the melvis configuration file:

# Data processing & Analysis
learnMode: FeatureSelection
featureSelectionMode: OneVsAll
featureSelectionNumFeatures: 10

The parameter learnMode defines the type of data processing / analysis method to apply (currently, only FeatureSelection is supported). The parameter featureSelectionMode defines whether to use one-vs-all or multi-class feature selection; the latter is explained in the next section. Feature selection algorithms select the best features for a given item grouping and rank them. Since we are usually only interested in the best 5-10 features, their number can be set with the parameter featureSelectionNumFeatures.
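Conceptually, the one-vs-all mode reduces the multi-group problem to one binary problem per group. The following sketch shows this idea (it is a hypothetical illustration of the scheme, not melvis source code): each group is relabelled "in group" vs. "all others", and a binary tree-based selection is run for each.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

num_features = 3  # plays the role of featureSelectionNumFeatures

rng = np.random.default_rng(1)
X = rng.normal(size=(90, 6))               # synthetic feature matrix
groups = np.repeat(["A", "B", "C"], 30)    # e.g. solos by three artists

for group in np.unique(groups):
    # Binary relabelling: this group vs. all remaining groups
    y_binary = (groups == group).astype(int)
    forest = ExtraTreesClassifier(n_estimators=100, random_state=0)
    forest.fit(X, y_binary)
    # Keep the num_features best-ranked features for this group
    top = np.argsort(forest.feature_importances_)[::-1][:num_features]
    print(group, top)
```

Each loop iteration corresponds to one block in the result file, e.g. "group A vs. all others".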

Multi-class Feature Selection

For multi-class feature selection, all defined groups are considered at the same time, and only those features that best discriminate between all classes are selected. The results of the feature selection are written to the result TXT file. Here is an example output for the two-group example from Example 2 - Complex grouping:

[melvis] Feature selection results for group Berg (3 items) vs. all others (6 items)
[melvis] Feature rank | Feature label | Centroid for class [Berg] | Centroid over remaining items
[melvis] 1 | DURCLASS_RUNLENGTH.durclass_mean_seg_len | 4.3500 | 2.5900
[melvis] 2 | PC_AV_NUM_UNIQUE_PC_PHRASES.mean_number_of_unique_pc | 6.0100 | 7.5800
[melvis] 3 | CPC_HIST_FEATURES.hist_density_11 | 0.0600 | 0.0500
[melvis] 4 | GENERAL_SELF_SIM_MATRIX_PHRASES_FEATURES.nonadjacent_phrase_similarity_entropy | 0.9700 | 0.9700
[melvis] 5 | PARSON_CONST_DIRECTION_AV_LEN.mean_segment_length_constant_positive_interval_direction | 1.8200 | 2.0900
[melvis] 6 | INT_CHROMATIC_SEQUENCES_RATIO.ratio_of_chromatic_note_sequences | 0.1000 | 0.0900
[melvis] 7 | PARSON_HIST_FEATURES.parsons_hist_constant | 0.0600 | 0.0300
[melvis] 8 | DURCLASS_HIST_FEATURES.dur_class_hist_very_long | 0.0100 | 0.0100
[melvis] 9 | DURCLASS_RUNLENGTH.long_mean_seg_len | 1.0000 | 1.0100
[melvis] 10 | FUZZYINT_HIST_FEATURES.fuzzyint_hist_jump_down | 0.0500 | 0.0500

We can see that the first row indicates the one-vs-all setting: group “Berg” against all other items from other groups. Here, the 10 best features are selected. For each feature, we can see the feature rank with 1 being the best feature, the feature label, and the centroids for the class (mean feature value over all items of group “Berg”) and the other classes (mean feature value over all remaining items).
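The centroid columns can be reproduced by hand: a centroid is simply the mean feature value over the items of a group. The sketch below illustrates this for the first output row; the nine individual feature values are made up for illustration (chosen so that the means match the row above), not taken from the actual data.

```python
import numpy as np

# Hypothetical per-solo values of DURCLASS_RUNLENGTH.durclass_mean_seg_len:
# the first 3 items belong to group "Berg", the remaining 6 to other groups.
feature_values = np.array([4.1, 4.5, 4.45, 2.6, 2.5, 2.7, 2.55, 2.65, 2.55])
in_group = np.array([True, True, True] + [False] * 6)

centroid_group = feature_values[in_group].mean()    # mean over "Berg" items
centroid_rest = feature_values[~in_group].mean()    # mean over all others

print(round(centroid_group, 2), round(centroid_rest, 2))  # 4.35 2.59
```

A large gap between the two centroids is what makes a feature rank highly; in the output above, row 4 (identical centroids of 0.9700) illustrates a low-contrast feature.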

Note

See the following demo files for further examples:

  • test_melvis_feature_selection_complex_grouping_artists.yml (one-vs-all feature selection)

  • test_melvis_feature_selection_simple_grouping_tp_vs_sax.yml (multi-class feature selection)

Parameter label             | Explanation                                         | Mandatory? | Default value
--------------------------- | --------------------------------------------------- | ---------- | -----------------------------
learnMode                   | Type of analysis method [ FeatureSelection | None ] | no         | None (no analysis is applied)
featureSelectionMode        | Feature selection mode [ OneVsAll | MultiClass ]    | no         | OneVsAll
featureSelectionNumFeatures | Number of features to select                        | no         | 5