melpat

Introduction

Note

This document pertains to melpat v1.3 and higher, in which some significant changes have been made to previous version v1.1 by which backwards compatibility is partly broken. Particularly the field searches is now called requests and display types csv and raw are no longer valid and should be list. Partitions can be requested by partition or all, max became invalid. The snippet display mode is now absorbed in the list mode. Besides these changes, melpat has some new powerful features such as database mode and search with very fast regular expressions, secondary searches as well as a larger set of available transformations. So, please update your MeloSpySuite to gain the full new melpat experience.

Note

From the 1.0 version on, the MeloSpyGUI contains (nearly) all functionality of melpat with an easier to use interface. All explanations below hold accordingly for the pattern part of MeloSpyGUI as well, though some small differences exist, will be highlighted if necessary.

melpat is a command-line tool for extracting of and searching for patterns in monophonic melodies, which is a basic task in computational musicology and particularly interesting for the investigation of creative processes in jazz improvisation. The term “pattern” here basically refers to N-grams, i.e., subsequences, of melodic abstractions. Processing is done on a (user-)defined set of melodies (“the repository”) and certain abstractions (transformations, viewpoints) such as raw pitch, chordal pitch class, semitone intervals, duration classes etc. derived thereof. melpat can process an arbitrarily long list of pattern requests (MeloSpyGUI: only one request at a time) and will write results to specified output files or to the console. There are three different types of requests: search, partition and database. A pattern search searches the melodic abstractions of all occurrence of a search pattern. A partition finds all N-Grams in a melody according to user-specified criteria, whereby the full repository serves as kind of background for a specific partition, e.g., users can demand to include only N-grams which occur at least in a certain number different melodies. A database request calculates all N-grams up to a certain maximum N in the repository, which are typically a lot, but output can be pruned to manage the abundance (cf. below). This mode essentially allows to extract Markov transition probabilities of arbitrary order from the specified repository, which might be useful for other purposes as well.

melpat can be operated from the commandline alone, but because of the extensive number of parameters needed to define requests the normal mode of operation will most likely employ configuration files written in the YAML language. (MeloSpyGUI: All parameters are handled via the UI). The format of which will be explained below.

N-grams, repositories and patterns: A short introduction

A melody is represented as list of events which certain properties (onset, pitch, duration etc.). Sequences of abstraction, e.g., semitone-intervals, are derived from this basic representation for pattern analysis. A melody repository is a set of melodies, a transformation repository is a corresponding set of abstracted sequences. N-grams are then subsequences of length N of the transformed sequences. More specifically, one can define a N-pattern (pattern for short, sometimes also called m-type) as a set of N-grams having the same value, with instances occurring at certain positions in the repository. E.g., if we have sequences s1= “abcd” and s2= “cdab” in the repository, the the 2-pattern “ab” can be found in at position 0 in s1 and at position 2 in s2 (sequence elements are numbered starting with 0). The (absolute) frequency of this specific 2-pattern is 2, the relative frequency is the absolute frequency divided by the total number of N-grams in the repository, which is the combined length of all sequences minus the number of elements multiplied by N-1, since a sequence of length M can have at most M-(N-1) N-grams. For our example, this means that there are 4+4 - 2*(2-1) = 8-2 =6 possible N-grams, hence the relative frequency (or probability for that matter) is 2/6 = 1/3. On the other hand, the bi-pattern “bc” is only occurring once in the repository, hence its probability is 1/6. A partition of s1 with respect to our mini-repository using only patterns that occur at least twice would consist of “ab”, “cd”. The pattern database for this repository with maximum N of 4 (tetragrams) is “a” (frequency: 2), “b” (2), “c” (2), “d” (2) for unigrams and “ab” (freq. 2), “bc” (1), “cd” (2), “da” (1) for bigrams, “abc” (1), “bcd” (1), “cda” (1), “dab” (1) for trigrams and “abcd” (1) “cdab” (1) for 4-grams.

Input formats

melpat reads the following input formats:

  • Weimar Jazz Database-Format (SQLITE3)

  • MIDI (monophonic)

  • MCSV1 (a CSV-based melody format)

  • MCSV2 (a CSV-based melody format)

  • EsAC (one melody per file)

  • EsAC Database (SQLITE3)

  • Tony/yYIN note layers as CSV exports

Output formats

There are three different output formats available (selected by the parameter display, cf. below): list, stats and midi, where the last one is only availabe for searches but not in partition and database mode. In list mode , all pattern instances will be written to a CSV formatted file comprising several coordinates (indices, onset, metrical position) and lengths (number of elements, duration) as well as the actual values and absolute and relative frequencies. In stats mode each pattern is listed along with it absolute and relative frequencies, no position information is given. The midi format is kind of special. For each occurrence of the specified pattern, the corresponding melody snippet will be rendered as MIDI and these snippets will be concatenated with 2 sec breaks in between into a single MIDI file. The purpose of this output format is to allow quick auditory checks of the found patterns. The naming scheme for different combinations of requests and labels can be found here.

Usage

Here is the basic call to melpat:

melpat [-h] [--version] [-d DIR] [-o OUTDIR] [-f FILE] [-c CONFIG]
              [-s SEARCH] [-m MAXN] [--verbose] [--dbtype {SQLITE3}]
              [--dbpath DBPATH] [--dbuser DBUSER] [--dbpwd DBPWD]
              [positional]

Options and arguments

-h, --help

Show a help message and exits.

--version

Shows program’s version number and exit

-d <DIR>, --dir <DIR>

Use <DIR> as working directory for reading the project files. Default is the current directory.

-o <OUTDIR>, --outdir <OUTDIR>

Write result file(s) to <OUTDIR>. Default is the current directory.

-f FILE, --file FILE

Input file (wildcards allowed). If the melodies are retrieved from a database <FILE> must have the Query block syntax, wrapped up in a Python- style dictionary and embraced by single or double quotes.

-c <CONFIG>, --config <CONFIG>

Use the configuration file <CONFIG>, see below. The arguments <DIR> and <OUTDIR> override the settings in the config file, but the positional argument takes precedence over the one specified in the configuration file.

-s SEARCH, --search SEARCH

This string specificies the search (or operation) to be performed. It has the same syntax as the Search pattern syntax used in the configuration file.

-m MAXN, --maxN MAXN

Specifies the maximal N-gram length to be used.

--verbose

Be more talkative.

--dbtype DBTYPE

To read melodies from a database, the required database options need to be specified and the file option must be a query. This option indicates the type of database to use. Currently only SQLITE3 is supported (also default). Has precedence over according value in configuration file.

--dbpath DBPATH

To read melodies from a database, the required database options need to be specified and the file option must be a query. Absolute or relative path to output database are allowed. Has precedence over according value in configuration file.

--dbuser DBUSER

If credentials are needed for database access, here the user name can be specified. Has precedence over according value in configuration file.

--dbpwd DBPWD

If credentials are needed for database access, here the password can be specified. Has precedence over according value in configuration file.

positional

The last positional parameter specifies the output file basename. It is possible to use stdout here, in which case, the output will not be written to files but printed to stdout, i.e., the console window.

Configuration files

Here is a typical example:

---
dir: /My/Data/
outdir: /My/Data/Patterns/
outfile: melpat_example.csv

maxN: 30
tunes:

  #will be ignored since we are working in database mode
  - file: 'Miles*_PREFINAL.sv'

  #all solos  from the database
  - query:
      conditions:
        solo_info:
          performer: '%'
          title: '%'
      display:
        transcription_info: filename_sv
      type: sv

#use the WJazzD
database:
  type: sqlite3
  path: wjazzd.db
  password: None
  use: True

requests:

  #Interval pattern partition of "So What" by Miles Davis with minimal length of 8 intervals,
  #that occur at least twice in two different solos. Write only partition statistics
  -
    transform: interval
    pattern: max
    minN: 8
    minOccur: 2
    minSource: 2
    display: stats
    trillfilter: 2,2
    arpeggiofilter: d
    scalefilter: d
    simul: False
    items:
      - title: So What

  #Search for (strictly) rising of falling scales beginnings with respect to chord context,
  #Listing of all instances
  -
    transform: cpc
    pattern: [0,  2, '(', 3, '|', 4, ')', 5, '(',  7, ')+?(', 8, '|', 9, ')+?']
    secondary:
        transform: parsons
        pattern: ['(', +1, '|', -1, ')', '+']
        operation: match
    display: list
    label: scale_test

  #All uni- and bigrams with frequencies and probabilities from the Metrical Circle Map (N=48)
  -
    transform: mcm-48
    pattern: database
    minN: 1
    minOccur: 1
    maxN: 2
    display: stats

Explanation

The beginning of the configuration file contains global settings for the working directory dir, the output directory outdir and the basic output filename outfile. This is followed by maxN which is global parameter for maximum N-Gram size used in the partition and database mode, but this value can (and will be) overwritten for database base mode (cf. Database mode). The tunes section specifies the input set of melodies to be used. See Data selection for an detailed explanation. If the melodies are taken from a SQL-database, the very last section database comes into play, otherwise it is ignored. (Of course the order of sections is arbitrary and can be freely shuffled). The most important section is the request section. It can contain a YAML list (items starting with a hyphen) of pattern requests. As mentioned above, there are three different kinds of requests: Partition mode, Search mode and Database mode, which will be explained in full detail in following.

Pattern requests

For each type of requests a different set of parameters is used, which will be explained below.

Partition mode

Partition mode is designed for finding pattern partitions of single pieces, i.e., complete lists of all real patterns occurring in a melody with respect to all other melodies in the repository as specified by user-definable criteria, which are are minimal and maximum length (N), minimal number of occurrences (total frequency in the whole repository) and minimum number of different sources (i.e., number of different melodies where the pattern must occur). Sub-patterns are filtered out, unless they do not occur somewhere else, hence, having a “life” on their own. This filter feature reduces the result set vastly. Furthermore, options exists to filter out patterns of lesser interest, i.e., trills, scales sections and arpeggios. Partition can be stored as Lists (every pattern listed) or just with a set of Stats for each partition.

Partition mode

Field

Values

Description

transform

  1. Suitable transformations

The transformation to be applied to the melodies. Currently, only a certain subset of transformations are available.

pattern

all, partition

Both values indicate the same operation (and exist only because of the indecisiveness of the programmer).

minN

positive integer > 0

Specifies the minimum length of N-grams that should be included in the partitioning.

minOccur

positive integer > 0

Specifies the minimum number of occurrences a pattern must have in order to be included.

minSource

positive integer > 0

Specifies the minimum number of different source sequences in the repository in which a pattern must occur in order to be included. If minOccur is smaller than minSource, minOccur is effectively the same as minSource.

display

stats, list

If stats is selected, certain statistical information (cf. below) about the retrieved partitions are written to the output file. In list mode, all patterns are listed using a certain format (cf. Pattern output formats).

trillfilter

<minPeriod>, <maxPeriod>

If specified, a trill filter will be applied. A trill is defined here as a pattern that consist solely of a repetition of a single sub-pattern, which is a slight generalization from the standard musical definition. Only trills with a sub-pattern length of at least <minPeriod> and at most <maxPeriod> will be filtered from the patterns of the partition.

scalefilter

d u, n

If the transformation is pitch or interval this option enables to filter scale-like patterns, i.e., sequences of semi- and whole tones. If the value is set to d, only directed scale-like patterns will be filtered out, i.e., all having intervals the same direction (ascending, descending), if set to u direction is not considered. If set to n, the scale filter is ignored. (MeloSpyGUI: Choose Directed, Undirected or None.)

arpeggiofilter

d u, n

If the transformation is pitch or interval this option enables to filter arpeggio-like patterns, i.e., sequences for minor and major thirds. If the value is set to d, only directed arpeggios will be filtered out, i.e., all having intervals the same direction (ascending, descending). If set to u direction is not considered. If set to n, the arpeggio filter is ignored. (MeloSpyGUI: Choose Directed, Undirected or None.)

simul

True|False|N

If set to True or to a positive integer N>0, the partition will not be done using the actual melody objects but derived simulated transformations with the same overall length and the same N-gram distribution (Markov model of N-th order) as the melodies. This is useful for comparison.

items

Integer span OR list of field-value pairs specifying melodies.

Partitions will only be calculated for the specified items. If this parameter is missing, all possible items will be considered. Items can be specified either by an integer span, or by a list of metadata specifiations. An integer span has the form <N>-<M>, e.g., 0-3, in which case partitions for the first four item in the repository will be calculated. In the second case, a list (YAML list items are starting with a hyphen) of field-value pairs can be used. The basic syntax is similar to the input file specification: <METADATA-FIELD>:<VALUE>. No wild cards are allowed for the value, but all metadata fields can be used, e.g., performer, title or filename_sv for solos from the WJazzD. In the example above, the melody repository consists of all Miles Davis solos in the database, which serve as a “pattern background” for the partition of pitch patterns, but only the partition for Mile’s solo on “So What” will be written to the output file. (MeloSpyGUI: Items cannot be specified at the moment, all partitions for all melodies will be calculated.)

Search mode

In search mode, a specific pattern is searched across the repository of melodic sequences. Patterns are defined with respect to some Suitable transformations and can be searched using a special Search pattern syntax which includes the option to use regular expressions. The search can be complemented with a secondary search, mostly using another transformation, with allow to filter the search result, e.g, with respect to certain constraints such as patterns starting or ending on a strong beat in a bar on at the begin or end of musical phrase (Accents might be helpful.).

Primary search specification

Field

Values

Description

transform

Suitable transformations

The transformation to be used on the melodies. Currently, only a certain subset of transformations are usable(cf. Suitable transformations.

pattern

Search pattern syntax

A valid search pattern, other than database, all or partition.

secondary

YAML dictionary

Specifies a secondary search, i.e., a search for a pattern (possibly from a different transformation) in the result set of the primary search. See Secondary search specification below for details.

display

list, stats, midi

If stats is selected, statistical information (cf. Pattern output formats) are written to the output file (a CSV file). In list mode, all patterns are listed using a certain format (cf. above). In midi mode, all retrieved patterns are written to a MIDI file.

label

String

An arbitrary label for the given search. If a label is specified, the resulting patterns for a single search are written to different files each, which names are the value of outfile appended by the pattern label.

As explained above, secondary searches can be defined which act as a filter on the result set of the primary search. The parameters are quite similar.

Secondary search specification

Field

Values

Description

transform

Suitable transformations

The transformation to be used on the melodies. Currently, only a certain subset of transformations are usable(cf. Suitable transformations.

pattern

Search pattern syntax

A valid search pattern, other than database, all or partition.

operation

match, find, exclude, ignore

Specifies how the secondary search should act on the primary result set (basically a filter). For match, only results which match the secondary search exactly will be included in the final result set. For find, patterns which contain the secondary search are included. For exclude, results in which the secondary is not found (partially contained) are kept. For ignore the secondary search is ignored at all.

Database mode

In database mode, all information about all patterns occurring in the selected pieces using the specified transformation can be evaluated either as complete dump (list) or as statistics (stats mode). The last option is specifically useful for estimating Markov transition probabilities. In database mode, the global maximum N can be overridden, which might be useful, since there will be a lot of N-grams. Additionally, there are options to specify minimal N and minimal frequencies to prune the output.

Database mode

Field

Values

Description

transform

cf. Suitable transformations

The transformation to be used on the melodies. Currently, only a certain meaningful subset of transformations are usable.

pattern

database

minN

positive integer > 0

Specifies the minimum length of N-grams.

minOccur

positive integer > 0

Specifies the minimum number of occurrences a N-Gram to be written. Helpful to prune the output.

display

stats, list

If stats is selected, statistical information about the N-Gram are written to the output file. In list mode, all N-Grams are listed using a certain format (cf. Pattern output formats). Prepare for very large output files in list mode, if maximum N is high and/or the melody repository is large.

simul

True|False|N

If set to True or to a positive integer N>0, the database will not be calculated on the actual melody objects but on a derived simulated transformations with the same overall length and the same N-gram distribution (Markov model of N-th order) as the melody. This is useful for comparison.

Pattern output formats

There are three possible output formats list, stats, and mid for pattern request modes, whereas the mid mode is only available for pattern search, and the actual format of the output files in stats do differ for the different mode.

  • list: In this mode, the patterns occurrences are written to a CSV formatted output file. Each line contains an ID of the containing melody as well as starting positions (index in the melody, raw onset in secs, and metrical position), absolute and relative frequency, length (number of elements and absolute duration in sec) as well as the value of the pattern. If a label is specified for the search, the results will be written in a file with name derived from the specified basic output file name and the label. If the label is missing, all results will be written into a single file.

  • stats: In this mode, only global information of the retrieved patterns are written, e.g., length, value and absolute and relative frequencies.

  • midi: In this mode, the occurrences of the patterns are written to a single MIDI file with a 2 seconds pause in between pattern instances. A label allows to write the result sets of each search to a different MIDI file.

Sample output in list mode: for a search request:

id;start;N;onset;dur;metricalposition;value;freq;prob100
SonnyRollins_TheEverywhereCalypso-2_PREFINAL.sv;870;6;176.11465;0.45279;4.6.142.4.6;[0, 2, 4, 5, 7, 8];1;0.011
SonnyStitt_Elora_PREFINAL.sv;108;6;27.03469;0.94286;4.2.17.1.1;[0, 2, 4, 5, 7, 9];2;0.023
SteveTurre_Steve'sBlues_PREFINAL.sv;124;6;30.44286;1.13918;4.2.22.2.1;[0, 2, 4, 5, 7, 9];2;0.023

Sample output in stats mode for the same search request:

value;N;freq;prob100
[0, 2, 4, 5, 7, 8];6;1;0.011
[0, 2, 4, 5, 7, 9];6;2;0.023

Sample output of stats mode for a partition of two solos by Zoo Sims:

id;note_count;min_N;max_N;min_occur;min_source;pattern_count;coverage;avg_N;avg_overlap;over_coverage;log_excess_prob
ZootSims_DancingInTheDark-1_PREFINAL.sv;109;5;30;2;2;8;0.303;6.375;2.571;0.545;8.949
ZootSims_DancingInTheDark-2_PREFINAL.sv;168;5;30;2;2;7;0.22;6.0;0.833;0.135;8.698
Meaning of fields in pattern output

Field

Description

Mode

id

Identifier of the melody. For WJazzD solos, filename_sv, for songs from the EsAC DB, the EsAC id and for input files the filename will be used.

list

start

Start ID of pattern in the melody (zero-based).

list

N

Length of the pattern.

list, stats

freq

Absolute frequency of the pattern in the repository.

list, stats

prob100

Relative frequency of the pattern in percent. Baseline is the number of all possible N-grams with the same length as the pattern.

list, stats

value

Actual value of the pattern with respect to the underlying transformation. Numerical transformation will be represented as Python arrays, hence, the with square brackets around (might be changed in the future).

list, stats

onset

Onset of pattern in the melody in seconds (for EsAC songs this might be based on an arbitrary tempo of 120 bpm).

list

dur

Real duration of pattern in seconds in the melody (for EsAC songs this might be based on an arbitrary tempo of 120 bpm).

list

metricalposition

Metrical start position of pattern in the melody (cf Metrical position notation for description of the syntax).

list

note_count

Number of notes in melody.

partition: stats

min_N

Mininum pattern length as specified by the user.

partition: stats

max_N

Maximum pattern length as specified by the user.

partition: stats

min_occur

Minimum number of occurrence as specified by the user.

partition: stats

min_sources

Minimum number of different sources as specified by the user.

partition: stats

pattern_count

Total number of patterns found by the partition algorithm.

partition: stats

coverage

Share of note in the solo contained in a least one pattern (“Coverage”)

partition: stats

avg_N

Average length of patterns in the partition.

partition: stats

avg_overlap

Average overlap between patterns in the partition.

partition: stats

over_coverage

Indicator, in how many patterns notes are contained on the average. Equals 0 if every note of the melody is contained in exactly one pattern.

partition: stats

log_excess_prob

Average value of logarithm of excess probability of all patterns. Excess probability in this sense is the quotient of relative frequency of the pattern divided by the product of probabilities of each single element in the pattern. Low excess probability indicates that a pattern might have occurred just by chance from a 0-th order Markov process.

partition: stats

Suitable transformations

Here is a list of suitable transformations, since not all available transformations make sense for pattern retrieval.

Pattern transformations

Transformation

Abbreviation(s)

Type

MIDI Pitch

pitch

Integer [0:127]

Absolute Pitch Class

pc, pitch-class

Integer [0:11]

trans_intervals

interval

Integer

trans_fuzzy_intervals

fuzzyinterval

Integer [-4:4]

trans_contour

parsons

Integer [-1:1]

Chordal Pitch Class

cpc, chordal-pitch-class

Integer [0:11]

Chordal Diatonic Pitch Class

cdpc, chord-dpc, chordal-dpc, chordal-diatonic-pitch-class

String of symbols “1234567TL<>B”

Extended Chordal Diatonic Pitch Class

cdpcx, chord-dpc-x, chordal-dpc-ext, chordal-diatonic-pitch-class-ext

String of symbols “1234567TL<>B%”

Duration Classes

durclass-abs, durclass-rel

Integer [-2:+2]

Inter-onset Interval Classes

ioiclass-abs, ioiclass-rel

Integer [-2:+2]

trans_mw

metricalweights , weights, mw

Integer [0:2]

trans_meter_mcm

mcm, mcm-<N>

Integer [0:N-1], default N=48

Accents

accent-<TYPE>

Integer, Real [0:1]

Search pattern syntax

The search pattern syntax is derived from corresponding Python constructs, which are basically strings and arrays, mixed and enhanced with syntactical constructs from Pythons regular expressions. Python arrays are written as comma-separated list, embraced by square brackets, strings are list of character embraced by single- or double-quotes. The third column in table Pattern transformations indicates which syntax is to use with each transformation. If a search pattern contains values outside of the indicated range, no error will issued, but melpat will return an empty result set. The regular expression symbols are added in the case of integer arrays as string elements in the array, i.e., elements surrounded by single- or double quotes. For string based transformation, this is not necessary.

Note

The syntax used in the MeloSpyGUI is little bit simplified in comparison to the melpat syntax (for convenience). First, surrounding square brackets ([ and ]) can be left out. Second, instead of comma separated lists, single spaces can be used as separators (but not arbitrary whitespace, just single spaces!). Regular expression bits still have to be embraced by quotes (except for string type transformations).

Let’s consider the slightly simplified example from the configuration file above:

transform: cpc
pattern: [0, 2, '(', 3, '|', 4, ')', 5]
secondary:
    transform: parsons
    pattern: [+1, '+|', -1, '+']
    operation: match

The transformation of the primary search is cpc, i.e., chordal pitch classes which take values in the range from 0 to 11. The specified pattern starts with two simple elements, 0 and 2, i.e., the root and the major ninth of chords. Then follows an opening bracket ( as string element surrounded by single quotes. Next is the regular element 3, which means the minor third of a chord. Next up is the pipe symbol | which means “OR” in regular expression parlance, readily followed by a 4, which stands for the major third of a chord, and the closing bracket ). Hence, we have the partial expression (3|4) which means “either a minor or a major third”. Finally, there is a regular 5, which matches the fourth of a chord. Note, that each element has to occur exactly once. Taken together, this pattern will match any cpc-subsequence of the form 0235 or 0245, something like the lower tetrachord. However, because cpc’s are pitch classes without octave, information about interval direction is lost. A sequence 0234 consisting of a jump of minor seventh down, a jump a ninth up and a jump a major seventh down will also match the specified pattern. Now have a look at the secondary search. The transformation is Parson’s code, or interval direction, with possible values of -1, 0, and +1. Using D, R, and U for simplicity, the pattern for the secondary search translates to the regular expressions (U+|D+), which will match any sequence of only ups or only downs. The operation of the secondary search is match in this case, hence, it will find cpc sequences of the desired primary form, which are also strictly ascending or descending. If the operation would have been find, any Parson’s sequence containing at least one U or one D would be matched, which most likely would not change the result set at all (since a 0234 sequence that only consists of note repetitions would be quite special). In the case of the secondary operation exclude, the final result set would probably be empty for the same reason.

Regular Expressions: A short introduction

Regular expression are a powerful albeit a bit cryptical tool which is an integral part of nearly all important programming languages. They provide means for pattern matching and searching for substrings in strings or texts (sequences of alphanumeric and interpunctuations signs). Regexes, as they are frequently called, are around in the computing world since the 50’s and can be viewed as specialised mini-programming language. There are basically three different kind of elements in a regular expression, all of which are expressed with cryptical character sequences, which exactly makes regular expression often hard to read and digest. The three kinds of information are “What”, “Where” and “How many”. Elements from the “What” category specify which kind of character should be matched, e.g., the set of lower case characters, which are in many regular expression dialects (unfortunately, there are many different dialects, which contributes to the confusion) expressed as a so-called character set [a-z]. The simplest case is just a single character, e.g., “A”. This is already a valid regular expression and will match any occurrence of the character “A” in a string. There are also wildcards, such as the dot ., which stands for any character (except the newline, probably). The “Where” category offers the least options, basically only whether a pattern should be located at the very beginning (the caret ^) or the very end of a string (the dollar sign $). Finally, the “How many” symbols, also called quantifiers, allow to be more specific about occurrence frequency, i.e., repetitions of characters. In Python’s regular expressions, these are the special symbols * (arbitrary number, 0 to N), + (at least 1, 1 to N), ? (one or none, 0-1). Furthermore, one can be even more specific with the constructs {m} and {n,m} which translate to “exactly m” and “between n and m repetitions”. All quantifiers are written directly following a “What” specification and pertain only to the immediately preceding part. The quantifiers in their pure and raw form are “greedy little suckers”, which means, that they will try to match as many characters as possible. This is often not desired. To make them non-greedy, hence, to make them match as less as possible, one can add a ? sign, e.g., +?, *?, ??, which are the non-greedy versions. “What” specifications can be constructed from atomic “What” specifications by means of the pipe symbol | which means “this or that”, e.g., “a|b” matches an “a” or a “b”, and parantheses, e.g., “(a|b)|c” will match either an “a” or a “b” or a “c”, which means that the brackets are actually redundant in this case. But combined with quantifiers they might become significant, e.g., “(a|b)??|c+?” will match zero or one occurrence of “a” or “b” or at least one occurrence of “c” (non-greedy!). Hence, “c” will match, “a” and “b” will match, in “ac” it will find the “a”, the leftmost options, but the empty string “” will not match. There are some more features of Python regular expressions, but most of them are not useful in the case of melodic patterns, e.g., special character classes. But see here and there for more detailed information on Python’s regular expressions.

Output file naming scheme

The following table summarize the different output file namings done by melpat. <outfile> is the name of the specified output file name, <transformation> is the transformation and <label> the label of the pattern request. Optional parts are shown in square brackets, where “tf” stands for trill filter, “sf” for scale filter, “af” for arpeggio filter, and “sim<N>” for simulated N-gram databases using a Markov model of order N.

Output file naming scheme

Pattern Mode

Display Mode

Result without label

Result with label

Search

list

<outfile>.csv

<outfile>_<label>.csv

Search

stats

<outfile>_stats.csv

<outfile>_<label>_stats.csv

Search

midi

<outfile>.midi

<outfile>_<label>.midi

Partition

list

<outfile>_<transformation>_<min_N>_<min_occur>_<min_source>[_tf][_sf]_[af][_sim<N>].csv

<outfile>_<label>.csv

Partition

stats

<outfile>_<transformation>_<min_N>_<min_occur>_<min_source>[_tf][_sf]_[af]_stats[_sim<N>].csv

<outfile>_<label>.csv

Database

list

<outfile>_<transformation>_db.csv

<outfile>_<label>_db.csv

Database

stats

<outfile>_<transformation>_db_stats.csv

<outfile>_<label>_stats.csv