melpat¶

Introduction¶

Note

This document pertains to melpat v1.3 and higher. Compared to the previous version v1.1, some significant changes were made that partly break backwards compatibility. In particular, the field searches is now called requests, and the display types csv and raw are no longer valid and should be replaced by list. Partitions can be requested by partition or all; max is no longer valid. The snippet display mode has been absorbed into the list mode. Besides these changes, melpat has some powerful new features such as database mode, search with very fast regular expressions, secondary searches, and a larger set of available transformations. So, please update your MeloSpySuite to gain the full new melpat experience.

Note

From version 1.0 on, the MeloSpyGUI contains (nearly) all functionality of melpat with an easier-to-use interface. All explanations below hold accordingly for the pattern part of MeloSpyGUI as well, although some small differences exist, which will be highlighted where necessary.

melpat is a command-line tool for extracting and searching for patterns in monophonic melodies, which is a basic task in computational musicology and particularly interesting for the investigation of creative processes in jazz improvisation. The term “pattern” here basically refers to N-grams, i.e., subsequences, of melodic abstractions. Processing is done on a (user-)defined set of melodies (“the repository”) and on certain abstractions (transformations, viewpoints) derived from them, such as raw pitch, chordal pitch class, semitone intervals, duration classes etc. melpat can process an arbitrarily long list of pattern requests (MeloSpyGUI: only one request at a time) and will write the results to the specified output files or to the console. There are three different types of requests: search, partition and database. A pattern search finds all occurrences of a search pattern in the melodic abstractions. A partition finds all N-grams in a melody according to user-specified criteria, whereby the full repository serves as a kind of background for the specific partition, e.g., users can demand to include only N-grams which occur in at least a certain number of different melodies. A database request calculates all N-grams up to a certain maximum N in the repository. These are typically numerous, but the output can be pruned to manage the abundance (cf. below). This mode essentially allows extracting Markov transition probabilities of arbitrary order from the specified repository, which might be useful for other purposes as well.

melpat can be operated from the command line alone, but because of the large number of parameters needed to define requests, the normal mode of operation will most likely employ configuration files written in YAML (MeloSpyGUI: all parameters are handled via the UI). The format of these files is explained below.

N-grams, repositories and patterns: A short introduction¶

A melody is represented as a list of events with certain properties (onset, pitch, duration etc.). Sequences of abstractions, e.g., semitone intervals, are derived from this basic representation for pattern analysis. A melody repository is a set of melodies; a transformation repository is the corresponding set of abstracted sequences. N-grams are then subsequences of length N of the transformed sequences. More specifically, one can define an N-pattern (pattern for short, sometimes also called m-type) as a set of N-grams having the same value, with instances occurring at certain positions in the repository. E.g., if we have the sequences s1 = “abcd” and s2 = “cdab” in the repository, the 2-pattern “ab” can be found at position 0 in s1 and at position 2 in s2 (sequence elements are numbered starting with 0). The (absolute) frequency of this specific 2-pattern is 2. The relative frequency is the absolute frequency divided by the total number of N-grams in the repository, which is the combined length of all sequences minus the number of sequences multiplied by N-1, since a sequence of length M contains M-(N-1) N-grams. For our example, this means that there are 4+4 - 2*(2-1) = 8-2 = 6 bigrams in total, hence the relative frequency (or probability for that matter) of “ab” is 2/6 = 1/3. On the other hand, the 2-pattern “bc” occurs only once in the repository, hence its probability is 1/6. A partition of s1 with respect to our mini-repository using only patterns that occur at least twice would consist of “ab” and “cd”. The pattern database for this repository with a maximum N of 4 (tetragrams) consists of “a” (frequency: 2), “b” (2), “c” (2), “d” (2) for unigrams, “ab” (2), “bc” (1), “cd” (2), “da” (1) for bigrams, “abc” (1), “bcd” (1), “cda” (1), “dab” (1) for trigrams, and “abcd” (1), “cdab” (1) for 4-grams.
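
The counting in the mini-repository example can be reproduced with a few lines of Python. This is only an illustrative sketch and not melpat's implementation; the helper names `ngrams` and `pattern_counts` are made up for this example.

```python
from collections import Counter

def ngrams(seq, n):
    """Return all contiguous subsequences of length n (empty if n > len(seq))."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]

def pattern_counts(repo, n):
    """Count how often each N-gram value occurs in the whole repository."""
    counts = Counter()
    for seq in repo:
        counts.update(ngrams(seq, n))
    return counts

repository = ["abcd", "cdab"]  # s1 and s2 from the text

bigrams = pattern_counts(repository, 2)
total_bigrams = sum(bigrams.values())          # 4 + 4 - 2 * (2 - 1) = 6
rel_freq_ab = bigrams["ab"] / total_bigrams    # 2 / 6

# Positions of the 2-pattern "ab" as (sequence index, start index) pairs.
positions_ab = [
    (k, i)
    for k, seq in enumerate(repository)
    for i, gram in enumerate(ngrams(seq, 2))
    if gram == "ab"
]

# The full "pattern database" up to N = 4.
database = {n: pattern_counts(repository, n) for n in range(1, 5)}
```

Here `positions_ab` holds the two instances of the pattern, and `database[3]` contains the four trigrams, each with frequency 1.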

Input formats¶

melpat reads the following input formats:

Weimar Jazz Database-Format (SQLITE3)

MIDI (monophonic)

MCSV1 (a CSV-based melody format)

MCSV2 (a CSV-based melody format)

EsAC (one melody per file)

EsAC Database (SQLITE3)

Tony/pYIN note layers as CSV exports

Output formats¶

There are three different output formats available (selected by the parameter display, cf. below): list, stats and midi, where the last one is only available for searches but not in partition and database mode. In list mode, all pattern instances are written to a CSV-formatted file comprising several coordinates (indices, onset, metrical position) and lengths (number of elements, duration) as well as the actual values and the absolute and relative frequencies. In stats mode, each pattern is listed along with its absolute and relative frequencies; no position information is given. The midi format is somewhat special. For each occurrence of the specified pattern, the corresponding melody snippet is rendered as MIDI, and these snippets are concatenated with breaks of 2 seconds in between into a single MIDI file. The purpose of this output format is to allow quick auditory checks of the found patterns. The naming scheme for different combinations of requests and labels can be found in the section Output file naming scheme.

Usage¶

Here is the basic call to melpat:

melpat [-h] [--version] [-d DIR] [-o OUTDIR] [-f FILE] [-c CONFIG]
              [-s SEARCH] [-m MAXN] [--verbose] [--dbtype {SQLITE3}]
              [--dbpath DBPATH] [--dbuser DBUSER] [--dbpwd DBPWD]
              [positional]

Options and arguments¶

-h, --help¶: Show a help message and exit.

--version¶: Show the program’s version number and exit.

-d <DIR>, --dir <DIR>¶: Use <DIR> as working directory for reading the project files. Default is the current directory.

-o <OUTDIR>, --outdir <OUTDIR>¶: Write result file(s) to <OUTDIR>. Default is the current directory.

-f FILE, --file FILE¶: Input file (wildcards allowed). If the melodies are retrieved from a database, <FILE> must follow the Query block syntax, wrapped in a Python-style dictionary and enclosed in single or double quotes.

-c <CONFIG>, --config <CONFIG>¶: Use the configuration file <CONFIG>, see below. The arguments <DIR> and <OUTDIR> override the settings in the configuration file, and the positional argument likewise takes precedence over the output file specified in the configuration file.

-s SEARCH, --search SEARCH¶: This string specifies the search (or operation) to be performed. It has the same syntax as the Search pattern syntax used in the configuration file.

-m MAXN, --maxN MAXN¶: Specifies the maximal N-gram length to be used.

--verbose¶: Be more talkative.

--dbtype DBTYPE¶: To read melodies from a database, the required database options need to be specified and the file option must be a query. This option indicates the type of database to use. Currently, only SQLITE3 is supported (which is also the default). Takes precedence over the corresponding value in the configuration file.

--dbpath DBPATH¶: To read melodies from a database, the required database options need to be specified and the file option must be a query. This option gives the path to the database; absolute and relative paths are allowed. Takes precedence over the corresponding value in the configuration file.

--dbuser DBUSER¶: If credentials are needed for database access, the user name can be specified here. Takes precedence over the corresponding value in the configuration file.

--dbpwd DBPWD¶: If credentials are needed for database access, the password can be specified here. Takes precedence over the corresponding value in the configuration file.

positional¶: The last positional parameter specifies the output file basename. It is possible to use stdout here, in which case, the output will not be written to files but printed to stdout, i.e., the console window.
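
When melpat is driven from a script, the call can be assembled from the options listed above. The following is a minimal sketch that only builds and prints the command line; the helper `build_melpat_call` and the file name `patterns.yaml` are hypothetical, and only options documented above are used.

```python
import shlex

def build_melpat_call(config, workdir=None, outdir=None, basename=None, verbose=False):
    """Assemble a melpat command line as an argument list (hypothetical helper)."""
    args = ["melpat", "-c", config]
    if workdir is not None:
        args += ["-d", workdir]
    if outdir is not None:
        args += ["-o", outdir]
    if verbose:
        args.append("--verbose")
    if basename is not None:
        # The positional argument comes last: output basename or "stdout".
        args.append(basename)
    return args

call = build_melpat_call("patterns.yaml", outdir="/My/Data/Patterns/", basename="stdout")
command_line = shlex.join(call)
print(command_line)
```

The resulting list could be handed to `subprocess.run(call)` on a system where melpat is installed and on the search path.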

Configuration files¶

Here is a typical example:

---
dir: /My/Data/
outdir: /My/Data/Patterns/
outfile: melpat_example.csv

maxN: 30
tunes:

  #will be ignored since we are working in database mode
  - file: 'Miles*_PREFINAL.sv'

  #all solos  from the database
  - query:
      conditions:
        solo_info:
          performer: '%'
          title: '%'
      display:
        transcription_info: filename_sv
      type: sv

#use the WJazzD
database:
  type: sqlite3
  path: wjazzd.db
  password: None
  use: True

requests:

  #Interval pattern partition of "So What" by Miles Davis with minimal length of 8 intervals,
  #that occur at least twice in two different solos. Write only partition statistics
  -
    transform: interval
    pattern: partition
    minN: 8
    minOccur: 2
    minSource: 2
    display: stats
    trillfilter: 2,2
    arpeggiofilter: d
    scalefilter: d
    simul: False
    items:
      - title: So What

  #Search for (strictly) rising or falling scale beginnings with respect to chord context,
  #Listing of all instances
  -
    transform: cpc
    pattern: [0,  2, '(', 3, '|', 4, ')', 5, '(',  7, ')+?(', 8, '|', 9, ')+?']
    secondary:
        transform: parsons
        pattern: ['(', +1, '|', -1, ')', '+']
        operation: match
    display: list
    label: scale_test

  #All uni- and bigrams with frequencies and probabilities from the Metrical Circle Map (N=48)
  -
    transform: mcm-48
    pattern: database
    minN: 1
    minOccur: 1
    maxN: 2
    display: stats

Explanation¶

The beginning of the configuration file contains global settings for the working directory dir, the output directory outdir and the basic output filename outfile. This is followed by maxN, which is a global parameter for the maximum N-gram size used in partition and database mode; this value can be overridden for database mode (cf. Database mode). The tunes section specifies the input set of melodies to be used. See Data selection for a detailed explanation. If the melodies are taken from an SQL database, the database section comes into play, otherwise it is ignored. (Of course, the order of sections is arbitrary and can be freely shuffled.) The most important section is the requests section. It contains a YAML list (items starting with a hyphen) of pattern requests. As mentioned above, there are three different kinds of requests: Partition mode, Search mode and Database mode, which will be explained in full detail in the following.

Pattern requests¶

For each type of request, a different set of parameters is used, as explained below.

Partition mode¶

Partition mode is designed for finding pattern partitions of single pieces, i.e., complete lists of all real patterns occurring in a melody with respect to all other melodies in the repository, as specified by user-definable criteria. These criteria are minimum and maximum length (N), minimum number of occurrences (total frequency in the whole repository) and minimum number of different sources (i.e., the number of different melodies in which the pattern must occur). Sub-patterns are filtered out unless they also occur somewhere else on their own, hence having a “life” of their own. This filter vastly reduces the result set. Furthermore, options exist to filter out patterns of lesser interest, i.e., trills, scale sections and arpeggios. Partitions can be stored as lists (every pattern listed) or just as a set of statistics for each partition.

Partition mode¶
Field	Values	Description
`transform`	Suitable transformations	The transformation to be applied to the melodies. Currently, only a certain subset of transformations is available.
`pattern`	`all`, `partition`	Both values indicate the same operation (and exist only because of the indecisiveness of the programmer).
`minN`	positive integer > 0	Specifies the minimum length of N-grams that should be included in the partitioning.
`minOccur`	positive integer > 0	Specifies the minimum number of occurrences a pattern must have in order to be included.
`minSource`	positive integer > 0	Specifies the minimum number of different source sequences in the repository in which a pattern must occur in order to be included. If `minOccur` is smaller than `minSource`, `minOccur` is effectively the same as `minSource`.
`display`	`stats`, `list`	If `stats` is selected, certain statistical information (cf. below) about the retrieved partitions is written to the output file. In `list` mode, all patterns are listed using a certain format (cf. Pattern output formats).
`trillfilter`	`<minPeriod>, <maxPeriod>`	If specified, a trill filter will be applied. A trill is defined here as a pattern that consists solely of repetitions of a single sub-pattern, which is a slight generalization of the standard musical definition. Only trills with a sub-pattern length of at least `<minPeriod>` and at most `<maxPeriod>` will be filtered from the patterns of the partition.
`scalefilter`	`d`, `u`, `n`	If the transformation is `pitch` or `interval`, this option allows filtering out scale-like patterns, i.e., sequences of semitones and whole tones. If the value is set to `d`, only directed scale-like patterns will be filtered out, i.e., those in which all intervals have the same direction (ascending or descending). If set to `u`, direction is not considered. If set to `n`, the scale filter is not applied. (MeloSpyGUI: Choose `Directed`, `Undirected` or `None`.)
`arpeggiofilter`	`d`, `u`, `n`	If the transformation is `pitch` or `interval`, this option allows filtering out arpeggio-like patterns, i.e., sequences of minor and major thirds. If the value is set to `d`, only directed arpeggios will be filtered out, i.e., those in which all intervals have the same direction (ascending or descending). If set to `u`, direction is not considered. If set to `n`, the arpeggio filter is not applied. (MeloSpyGUI: Choose `Directed`, `Undirected` or `None`.)
`simul`	`True`, `False`, `N`	If set to `True` or to a positive integer N>0, the partition will not be calculated using the actual melody objects but using derived simulated transformations with the same overall length and the same N-gram distribution (Markov model of N-th order) as the melodies. This is useful for comparison.
`items`	Integer span OR list of field-value pairs specifying melodies.	Partitions will only be calculated for the specified items. If this parameter is missing, all possible items will be considered. Items can be specified either by an integer span or by a list of metadata specifications. An integer span has the form `<N>-<M>`, e.g., `0-3`, in which case partitions for the first four items in the repository will be calculated. In the second case, a list (YAML list items start with a hyphen) of field-value pairs can be used. The basic syntax is similar to the input file specification: `<METADATA-FIELD>:<VALUE>`. No wildcards are allowed for the value, but all metadata fields can be used, e.g., `performer`, `title` or `filename_sv` for solos from the WJazzD. In the example above, the melody repository consists of all solos in the database, which serve as a “pattern background” for the partition of interval patterns, but only the partition for Miles Davis’s solo on “So What” will be written to the output file. (MeloSpyGUI: Items cannot be specified at the moment; partitions for all melodies will be calculated.)
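
The three filters described in the table can be illustrated for the `interval` transformation with the following sketch. It is a simplified reading of the definitions above and not melpat's actual code; the function names are invented, and the assumption that a trill needs at least two complete repetitions of its sub-pattern is ours.

```python
def is_trill(pattern, min_period, max_period):
    """True if the pattern is a pure repetition of one sub-pattern whose
    length lies between min_period and max_period (both >= 1, inclusive)."""
    n = len(pattern)
    for period in range(min_period, max_period + 1):
        # Assumption: a whole number of repetitions, and at least two of them.
        if n % period == 0 and n >= 2 * period:
            if all(pattern[i] == pattern[i % period] for i in range(n)):
                return True
    return False

def _steps_only(intervals, sizes, directed):
    """True if all absolute interval sizes are in `sizes`; if `directed`,
    all intervals must additionally point in the same direction."""
    if not intervals or any(abs(iv) not in sizes for iv in intervals):
        return False
    if directed:
        return all(iv > 0 for iv in intervals) or all(iv < 0 for iv in intervals)
    return True

def is_scale(intervals, directed=True):
    """Scale-like: only semitones (1) and whole tones (2)."""
    return _steps_only(intervals, (1, 2), directed)

def is_arpeggio(intervals, directed=True):
    """Arpeggio-like: only minor (3) and major (4) thirds."""
    return _steps_only(intervals, (3, 4), directed)

trill = is_trill([2, -2, 2, -2], 2, 2)                # corresponds to trillfilter: 2,2
rising_scale = is_scale([2, 2, 1, 2])                 # scalefilter: d
zigzag_scale = is_scale([2, -2, 1], directed=False)   # scalefilter: u
```

With `directed=True` the zigzag example would not count as a scale, which corresponds to the difference between the values `d` and `u`.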

Search mode¶

In search mode, a specific pattern is searched for across the repository of melodic sequences. Patterns are defined with respect to one of the Suitable transformations and can be searched using a special Search pattern syntax, which includes the option to use regular expressions. The search can be complemented with a secondary search, mostly using another transformation, which allows filtering the search result, e.g., with respect to certain constraints such as patterns starting or ending on a strong beat in a bar or at the beginning or end of a musical phrase (Accents might be helpful here).

Primary search specification¶
Field	Values	Description
`transform`	Suitable transformations	The transformation to be used on the melodies. Currently, only a certain subset of transformations is usable (cf. Suitable transformations).
`pattern`	Search pattern syntax	A valid search pattern, other than `database`, `all` or `partition`.
`secondary`	YAML dictionary	Specifies a secondary search, i.e., a search for a pattern (possibly from a different transformation) in the result set of the primary search. See Secondary search specification below for details.
`display`	`list`, `stats`, `midi`	If `stats` is selected, statistical information (cf. Pattern output formats) is written to the output file (a CSV file). In `list` mode, all patterns are listed using a certain format (cf. Pattern output formats). In `midi` mode, all retrieved patterns are written to a MIDI file.
`label`	String	An arbitrary label for the given search. If a label is specified, the results of each search are written to a separate file, whose name is the value of `outfile` with the label appended.

As explained above, secondary searches can be defined which act as a filter on the result set of the primary search. The parameters are quite similar.

Secondary search specification¶
Field	Values	Description
`transform`	Suitable transformations	The transformation to be used on the melodies. Currently, only a certain subset of transformations is usable (cf. Suitable transformations).
`pattern`	Search pattern syntax	A valid search pattern, other than `database`, `all` or `partition`.
`operation`	`match`, `find`, `exclude`, `ignore`	Specifies how the secondary search should act on the primary result set (basically a filter). For `match`, only results which match the secondary search exactly will be included in the final result set. For `find`, results which contain the secondary pattern are included. For `exclude`, only results in which the secondary pattern is not found (not even partially contained) are kept. For `ignore`, the secondary search is ignored altogether.
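
The effect of the four operations can be sketched with Python's `re` module. The example assumes that the secondary transformation of a primary hit is available as a string (here Parsons code written with the letters U, D and R); the function `keep_result` is a made-up helper and not part of melpat.

```python
import re

def keep_result(secondary_regex, secondary_string, operation):
    """Decide whether a primary hit survives the secondary filter."""
    if operation == "ignore":
        return True
    if operation == "match":
        # The secondary pattern must cover the whole hit.
        return re.fullmatch(secondary_regex, secondary_string) is not None
    found = re.search(secondary_regex, secondary_string) is not None
    if operation == "find":
        return found
    if operation == "exclude":
        return not found
    raise ValueError(f"unknown operation: {operation}")

strictly_directed = "U+|D+"  # only ups or only downs

kept_by_match = keep_result(strictly_directed, "UUU", "match")
dropped_by_match = keep_result(strictly_directed, "UDU", "match")
kept_by_find = keep_result(strictly_directed, "UDU", "find")
kept_by_exclude = keep_result(strictly_directed, "RRR", "exclude")
```

In this sketch `match` corresponds to `re.fullmatch` and `find` to `re.search`, which mirrors the distinction between exact and partial containment described in the table.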

Database mode¶

In database mode, all information about all patterns occurring in the selected pieces under the specified transformation can be written either as a complete dump (list mode) or as statistics (stats mode). The latter option is particularly useful for estimating Markov transition probabilities. In database mode, the global maximum N can be overridden, which might be useful, since there will be a lot of N-grams. Additionally, there are options to specify a minimal N and minimal frequencies to prune the output.

Database mode¶
Field	Values	Description
`transform`	cf. Suitable transformations	The transformation to be used on the melodies. Currently, only a certain meaningful subset of transformations is usable.
`pattern`	`database`	Selects database mode.
`minN`	positive integer > 0	Specifies the minimum length of N-grams.
`minOccur`	positive integer > 0	Specifies the minimum number of occurrences an N-gram must have in order to be written. Helpful to prune the output.
`display`	`stats`, `list`	If `stats` is selected, statistical information about the N-grams is written to the output file. In `list` mode, all N-grams are listed using a certain format (cf. Pattern output formats). Prepare for very large output files in `list` mode if the maximum N is high and/or the melody repository is large.
`simul`	`True`, `False`, `N`	If set to `True` or to a positive integer N>0, the database will not be calculated on the actual melody objects but on derived simulated transformations with the same overall length and the same N-gram distribution (Markov model of N-th order) as the melodies. This is useful for comparison.
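
To illustrate how N-gram counts relate to Markov transition probabilities, the following sketch estimates first-order transition probabilities from bigram counts. It works on a hypothetical toy repository of strings and is not meant to reproduce melpat's output format; each bigram count is normalized by the total count of bigrams sharing the same first element.

```python
from collections import Counter

def transition_probabilities(repo):
    """Estimate P(next | current) from the bigram counts of all sequences."""
    bigram_counts = Counter()
    for seq in repo:
        bigram_counts.update(zip(seq, seq[1:]))

    # Total number of bigrams that start with a given element.
    context_totals = Counter()
    for (first, _second), count in bigram_counts.items():
        context_totals[first] += count

    return {
        (first, second): count / context_totals[first]
        for (first, second), count in bigram_counts.items()
    }

toy_repository = ["abab", "abcb"]  # hypothetical transformed sequences
probs = transition_probabilities(toy_repository)
```

In this toy repository, "b" is followed once by "a" and once by "c", so both transitions receive the probability 0.5, while "a" is always followed by "b".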

Pattern output formats¶

There are three possible output formats for pattern requests: list, stats and midi. The midi mode is only available for pattern searches, and the actual format of the output files in stats mode differs between the request modes.

list: In this mode, the pattern occurrences are written to a CSV-formatted output file. Each line contains an ID of the containing melody as well as the starting positions (index in the melody, raw onset in seconds, and metrical position), absolute and relative frequency, length (number of elements and absolute duration in seconds) as well as the value of the pattern. If a label is specified for the search, the results will be written to a file whose name is derived from the specified basic output file name and the label. If the label is missing, all results will be written to a single file.
stats: In this mode, only global information about the retrieved patterns is written, e.g., length, value and absolute and relative frequencies.
midi: In this mode, the occurrences of the patterns are written to a single MIDI file with a pause of 2 seconds between pattern instances. If labels are specified, the result set of each search is written to a separate MIDI file.

Sample output in list mode for a search request:

id;start;N;onset;dur;metricalposition;value;freq;prob100
SonnyRollins_TheEverywhereCalypso-2_PREFINAL.sv;870;6;176.11465;0.45279;4.6.142.4.6;[0, 2, 4, 5, 7, 8];1;0.011
SonnyStitt_Elora_PREFINAL.sv;108;6;27.03469;0.94286;4.2.17.1.1;[0, 2, 4, 5, 7, 9];2;0.023
SteveTurre_Steve'sBlues_PREFINAL.sv;124;6;30.44286;1.13918;4.2.22.2.1;[0, 2, 4, 5, 7, 9];2;0.023
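
Since the list output is a semicolon-separated CSV file, it can be read with Python's standard `csv` module. The sketch below parses two lines of the sample above from an in-memory string; for a real result file, the `io.StringIO` object would be replaced by an opened file. Converting the `value` column with `json.loads` is an assumption that works for numerical transformations written as bracketed lists.

```python
import csv
import io
import json

sample = """id;start;N;onset;dur;metricalposition;value;freq;prob100
SonnyRollins_TheEverywhereCalypso-2_PREFINAL.sv;870;6;176.11465;0.45279;4.6.142.4.6;[0, 2, 4, 5, 7, 8];1;0.011
SonnyStitt_Elora_PREFINAL.sv;108;6;27.03469;0.94286;4.2.17.1.1;[0, 2, 4, 5, 7, 9];2;0.023
"""

rows = []
for row in csv.DictReader(io.StringIO(sample), delimiter=";"):
    # Convert the numeric columns; all other columns stay strings.
    row["start"] = int(row["start"])
    row["N"] = int(row["N"])
    row["freq"] = int(row["freq"])
    row["onset"] = float(row["onset"])
    row["dur"] = float(row["dur"])
    row["prob100"] = float(row["prob100"])
    row["value"] = json.loads(row["value"])  # "[0, 2, ...]" -> list of ints
    rows.append(row)
```

After parsing, each entry of `rows` is a dictionary keyed by the column names of the header line.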

Sample output in stats mode for the same search request:

value;N;freq;prob100
[0, 2, 4, 5, 7, 8];6;1;0.011
[0, 2, 4, 5, 7, 9];6;2;0.023

Sample output of stats mode for a partition of two solos by Zoot Sims:

id;note_count;min_N;max_N;min_occur;min_source;pattern_count;coverage;avg_N;avg_overlap;over_coverage;log_excess_prob
ZootSims_DancingInTheDark-1_PREFINAL.sv;109;5;30;2;2;8;0.303;6.375;2.571;0.545;8.949
ZootSims_DancingInTheDark-2_PREFINAL.sv;168;5;30;2;2;7;0.22;6.0;0.833;0.135;8.698

Meaning of fields in pattern output¶
Field	Description	Mode
`id`	Identifier of the melody. For WJazzD solos, `filename_sv`, for songs from the EsAC DB, the EsAC id and for input files the filename will be used.	`list`
`start`	Start ID of pattern in the melody (zero-based).	`list`
`N`	Length of the pattern.	`list`, `stats`
`freq`	Absolute frequency of the pattern in the repository.	`list`, `stats`
`prob100`	Relative frequency of the pattern in percent. Baseline is the number of all possible N-grams with the same length as the pattern.	`list`, `stats`
`value`	Actual value of the pattern with respect to the underlying transformation. Values of numerical transformations are represented as Python arrays, hence the surrounding square brackets (this might change in the future).	`list`, `stats`
`onset`	Onset of pattern in the melody in seconds (for EsAC songs this might be based on an arbitrary tempo of 120 bpm).	`list`
`dur`	Real duration of pattern in seconds in the melody (for EsAC songs this might be based on an arbitrary tempo of 120 bpm).	`list`
`metricalposition`	Metrical start position of the pattern in the melody (cf. Metrical position notation for a description of the syntax).	`list`
`note_count`	Number of notes in melody.	partition: `stats`
`min_N`	Minimum pattern length as specified by the user.	partition: `stats`
`max_N`	Maximum pattern length as specified by the user.	partition: `stats`
`min_occur`	Minimum number of occurrences as specified by the user.	partition: `stats`
`min_source`	Minimum number of different sources as specified by the user.	partition: `stats`
`pattern_count`	Total number of patterns found by the partition algorithm.	partition: `stats`
`coverage`	Share of notes in the solo that are contained in at least one pattern (“Coverage”).	partition: `stats`
`avg_N`	Average length of patterns in the partition.	partition: `stats`
`avg_overlap`	Average overlap between patterns in the partition.	partition: `stats`
`over_coverage`	Indicates in how many patterns a note is contained on average. Equals 0 if every note of the melody is contained in exactly one pattern.	partition: `stats`
`log_excess_prob`	Average value of logarithm of excess probability of all patterns. Excess probability in this sense is the quotient of relative frequency of the pattern divided by the product of probabilities of each single element in the pattern. Low excess probability indicates that a pattern might have occurred just by chance from a 0-th order Markov process.	partition: `stats`
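
The definitions of `prob100` and of the excess probability can be made concrete with the mini-repository from the introduction. The sketch below computes both quantities for a single pattern; the `log_excess_prob` column averages the logarithm over all patterns of a partition. Which logarithm base melpat uses is not stated here, so the natural logarithm is an assumption, as are all function names.

```python
import math
from collections import Counter

def ngram_counts(repo, n):
    """Count all N-grams of length n in a repository of sequences."""
    counts = Counter()
    for seq in repo:
        counts.update(seq[i:i + n] for i in range(len(seq) - n + 1))
    return counts

def relative_frequency(pattern, repo):
    """Frequency of the pattern divided by the number of all N-grams of its length."""
    grams = ngram_counts(repo, len(pattern))
    return grams[pattern] / sum(grams.values())

def excess_probability(pattern, repo):
    """Relative frequency divided by the product of the element probabilities."""
    units = ngram_counts(repo, 1)
    unit_total = sum(units.values())
    independent = math.prod(units[element] / unit_total for element in pattern)
    return relative_frequency(pattern, repo) / independent

repository = ["abcd", "cdab"]
prob100_ab = 100 * relative_frequency("ab", repository)   # in percent
excess_ab = excess_probability("ab", repository)          # (2/6) / (2/8 * 2/8)
log_excess_ab = math.log(excess_ab)                       # natural log (assumption)
```

An excess probability well above 1, as for “ab” here, means that the pattern occurs more often than expected if its elements were drawn independently.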

Suitable transformations¶

Here is a list of suitable transformations, since not all available transformations make sense for pattern retrieval.

Pattern transformations¶
Transformation	Abbreviation(s)	Type
MIDI Pitch	`pitch`	Integer [0:127]
Absolute Pitch Class	`pc`, `pitch-class`	Integer [0:11]
Semitone Intervals	`interval`	Integer
Fuzzy Intervals	`fuzzyinterval`	Integer [-4:4]
Contour (Parsons Code)	`parsons`	Integer [-1:1]
Chordal Pitch Class	`cpc`, `chordal-pitch-class`	Integer [0:11]
Chordal Diatonic Pitch Class	`cdpc`, `chord-dpc`, `chordal-dpc`, `chordal-diatonic-pitch-class`	String of symbols “1234567TL<>B”
Extended Chordal Diatonic Pitch Class	`cdpcx`, `chord-dpc-x`, `chordal-dpc-ext`, `chordal-diatonic-pitch-class-ext`	String of symbols “1234567TL<>B%”
Duration Classes	`durclass-abs`, `durclass-rel`	Integer [-2:+2]
Inter-onset Interval Classes	`ioiclass-abs`, `ioiclass-rel`	Integer [-2:+2]
Metrical Weights	`metricalweights`, `weights`, `mw`	Integer [0:2]
Metrical Circle Map	`mcm`, `mcm-<N>`	Integer [0:N-1], default N=48
Accents	`accent-<TYPE>`	Integer, Real [0:1]

Search pattern syntax¶

The search pattern syntax is derived from the corresponding Python constructs, basically strings and arrays, mixed and enhanced with syntactical constructs from Python’s regular expressions. Python arrays are written as comma-separated lists enclosed in square brackets; strings are lists of characters enclosed in single or double quotes. The third column of the table Pattern transformations indicates which syntax to use with each transformation. If a search pattern contains values outside the indicated range, no error will be issued, but melpat will return an empty result set. In the case of integer arrays, the regular expression symbols are added as string elements of the array, i.e., as elements surrounded by single or double quotes. For string-based transformations, this is not necessary.

Note

The syntax used in the MeloSpyGUI is a little bit simplified in comparison to the melpat syntax (for convenience). First, the surrounding square brackets ([ and ]) can be left out. Second, instead of comma-separated lists, single spaces can be used as separators (but not arbitrary whitespace, just single spaces!). Regular expression bits still have to be enclosed in quotes (except for string-type transformations).

Let’s consider the slightly simplified example from the configuration file above:

transform: cpc
pattern: [0, 2, '(', 3, '|', 4, ')', 5]
secondary:
    transform: parsons
    pattern: [+1, '+|', -1, '+']
    operation: match

The transformation of the primary search is cpc, i.e., chordal pitch classes, which take values in the range from 0 to 11. The specified pattern starts with two simple elements, 0 and 2, i.e., the root and the major ninth of a chord. Then follows an opening bracket ( as a string element surrounded by single quotes. Next is the regular element 3, which stands for the minor third of a chord. Next up is the pipe symbol |, which means “OR” in regular expression parlance, followed by a 4, which stands for the major third of a chord, and the closing bracket ). Hence, we have the partial expression (3|4), which means “either a minor or a major third”. Finally, there is a regular 5, which matches the fourth of a chord. Note that each element has to occur exactly once. Taken together, this pattern will match any cpc subsequence of the form 0235 or 0245, something like the lower tetrachord. However, because cpcs are pitch classes without octave information, the interval direction is lost. A sequence 0245 consisting of a jump of a minor seventh down, a jump of a major ninth up and a jump of a major seventh down will also match the specified pattern.

Now have a look at the secondary search. The transformation is Parsons code, or interval direction, with possible values of -1, 0, and +1. Using D, R, and U for simplicity, the pattern for the secondary search translates to the regular expression (U+|D+), which will match any sequence of only ups or only downs. The operation of the secondary search is match in this case; hence, it will find cpc sequences of the desired primary form which are also strictly ascending or descending. If the operation had been find, any Parsons sequence containing at least one U or one D would be matched, which most likely would not change the result set at all (since a sequence of the form 0235 or 0245 that consists only of note repetitions would be quite special). In the case of the secondary operation exclude, the final result set would probably be empty for the same reason.
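
One way to picture how such a mixed array becomes a regular expression is to map every integer to a single character and to copy the quoted regular expression bits verbatim. The following sketch does exactly that with Python's `re` module. The character mapping, the `offset` parameter and all function names are assumptions made for illustration; melpat's internal encoding may differ.

```python
import re

def to_char(value, offset=0):
    """Map an integer to one character; offset shifts negative values to >= 0."""
    return chr(ord("A") + value + offset)

def encode(sequence, offset=0):
    """Encode a sequence of integers as a string, one character per element."""
    return "".join(to_char(v, offset) for v in sequence)

def compile_pattern(elements, offset=0):
    """Integers become characters, string elements are kept as regex syntax."""
    parts = [el if isinstance(el, str) else to_char(el, offset) for el in elements]
    return re.compile("".join(parts))

# Primary search on chordal pitch classes (values 0..11 -> "A".."L").
primary = compile_pattern([0, 2, '(', 3, '|', 4, ')', 5])

# Secondary search on Parsons code (values -1..1 -> "A".."C" with offset 1).
secondary = compile_pattern([+1, '+|', -1, '+'], offset=1)

cpc_sequence = [7, 0, 2, 4, 5, 9]
hits = [(m.start(), m.end()) for m in primary.finditer(encode(cpc_sequence))]

# Interval directions inside a hit: strictly rising vs. zigzag (made-up data).
rising_ok = secondary.fullmatch(encode([1, 1, 1], offset=1)) is not None
zigzag_ok = secondary.fullmatch(encode([-1, 1, -1], offset=1)) is not None
```

Here `hits` contains the index span of the subsequence 0245, and only the strictly rising direction sequence passes the secondary `match` filter.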

Regular Expressions: A short introduction

Regular expressions are a powerful albeit somewhat cryptic tool which is an integral part of nearly all important programming languages. They provide means for pattern matching and for searching for substrings in strings or texts (sequences of alphanumeric characters and punctuation marks). Regexes, as they are frequently called, have been around in the computing world since the 1950s and can be viewed as a specialised mini programming language. There are basically three different kinds of elements in a regular expression, all of which are expressed with cryptic character sequences, which is exactly what often makes regular expressions hard to read and digest. The three kinds of information are “What”, “Where” and “How many”.

Elements from the “What” category specify which kind of character should be matched, e.g., the set of lower-case characters, which in many regular expression dialects (unfortunately, there are many different dialects, which contributes to the confusion) is expressed as a so-called character set [a-z]. The simplest case is just a single character, e.g., “A”. This is already a valid regular expression and will match any occurrence of the character “A” in a string. There are also wildcards, such as the dot ., which stands for any character (except, by default, the newline). The “Where” category offers the fewest options, basically only whether a pattern should be located at the very beginning (the caret ^) or the very end of a string (the dollar sign $). Finally, the “How many” symbols, also called quantifiers, allow being more specific about occurrence frequency, i.e., repetitions of characters. In Python’s regular expressions, these are the special symbols * (arbitrary number, 0 to N), + (at least 1, 1 to N) and ? (one or none, 0-1). Furthermore, one can be even more specific with the constructs {m} and {n,m}, which translate to “exactly m” and “between n and m repetitions”.

All quantifiers are written directly following a “What” specification and pertain only to the immediately preceding part. The quantifiers in their pure and raw form are “greedy little suckers”, which means that they will try to match as many characters as possible. This is often not desired. To make them non-greedy, i.e., to make them match as little as possible, one can add a ? sign, which gives the non-greedy versions +?, *? and ??. “What” specifications can be constructed from atomic “What” specifications by means of the pipe symbol |, which means “this or that”, e.g., “a|b” matches an “a” or a “b”, and by means of parentheses, e.g., “(a|b)|c” will match either an “a” or a “b” or a “c”, which means that the brackets are actually redundant in this case. But combined with quantifiers they might become significant, e.g., “(a|b)??|c+?” will match zero or one occurrence of “a” or “b”, or at least one occurrence of “c” (non-greedy!). Hence, matched against a whole string, “a”, “b” and “c” will all match, and so will the empty string “”, because the first alternative allows zero occurrences. When searching inside a longer string such as “ac”, the non-greedy first alternative already succeeds with an empty match at the very first position. There are some more features of Python regular expressions, but most of them are not useful in the case of melodic patterns, e.g., special character classes. See the documentation of Python’s re module for more detailed information on Python’s regular expressions.
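
The behaviour of greedy and non-greedy quantifiers can be checked directly with Python's `re` module. The snippet below is a small illustration on plain strings and is independent of melpat.

```python
import re

# Greedy vs. non-greedy: "+" takes as much as possible, "+?" as little as possible.
greedy = re.search(r"a+", "aaa").group()
lazy = re.search(r"a+?", "aaa").group()

# Counted repetitions: {m} means exactly m, {n,m} between n and m.
pairs = re.findall(r"a{2}", "aaaaa")
too_long = re.fullmatch(r"a{1,3}", "aaaa")

# Alternation with parentheses and non-greedy quantifiers.
expr = r"(a|b)??|c+?"
whole_a = re.fullmatch(expr, "a") is not None
whole_c = re.fullmatch(expr, "c") is not None
whole_empty = re.fullmatch(expr, "") is not None
first_hit_in_ac = re.search(expr, "ac").group()
```

`re.fullmatch` requires the expression to cover the entire string, whereas `re.search` returns the first (leftmost) match, which for this expression is the empty string.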

Output file naming scheme¶

The following table summarizes the naming of output files by melpat. <outfile> is the specified output file name, <transformation> is the transformation and <label> the label of the pattern request. Optional parts are shown in square brackets, where “tf” stands for trill filter, “sf” for scale filter, “af” for arpeggio filter, and “sim<N>” for simulated N-gram databases using a Markov model of order N.

Output file naming scheme¶
Pattern Mode	Display Mode	Result without label	Result with label
Search	`list`	<outfile>.csv	<outfile>_<label>.csv
Search	`stats`	<outfile>_stats.csv	<outfile>_<label>_stats.csv
Search	`midi`	<outfile>.midi	<outfile>_<label>.midi
Partition	`list`	<outfile>_<transformation>_<min_N>_<min_occur>_<min_source>[_tf][_sf][_af][_sim<N>].csv	<outfile>_<label>.csv
Partition	`stats`	<outfile>_<transformation>_<min_N>_<min_occur>_<min_source>[_tf][_sf][_af]_stats[_sim<N>].csv	<outfile>_<label>.csv
Database	`list`	<outfile>_<transformation>_db.csv	<outfile>_<label>_db.csv
Database	`stats`	<outfile>_<transformation>_db_stats.csv	<outfile>_<label>_stats.csv
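
For scripts that post-process melpat results, the search rows of the table can be expressed as a small helper function. This is a sketch derived solely from the table; the function name is invented, and it assumes that `outfile` is given without a file extension.

```python
def search_output_name(outfile, display, label=None):
    """Build the expected result file name for a search request
    according to the naming table (search mode only)."""
    suffixes = {"list": ".csv", "stats": "_stats.csv", "midi": ".midi"}
    if display not in suffixes:
        raise ValueError(f"unknown display mode: {display}")
    stem = outfile if label is None else f"{outfile}_{label}"
    return stem + suffixes[display]

plain_list = search_output_name("melpat_example", "list")
labelled_stats = search_output_name("melpat_example", "stats", label="scale_test")
labelled_midi = search_output_name("melpat_example", "midi", label="scale_test")
```

Partition and database requests follow the longer schemes in the table and would need the transformation and filter settings as additional arguments.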