Produce Like a Nerd (2): You can do all of this with a DAW, can't you?…
In part 1 we saw a toolchain that automates the production of a notation video with a reasonable backing track from a notation text file and a configuration file.
All of this is based on several open-source programmes and a command-line coordination programme called LilypondToBandVideoConverter [LTBVC]; hence it is not very interactive.
An obvious idea would be to combine the interactive approach of a DAW with the possibilities of this tool chain.
I shall show the procedure using the DAW Reaper as an example, but you could do this similarly for other DAWs, as long as they provide an API and the scripts presented here can be ported to that platform.
What we are doing is the following: we shall replicate the steps from the first part and have a look at how we can use the DAW intelligently for them. This relies on scripts [LTBVC-Plugins] and plugins [SoXPlugins] I have written and made available via GitHub.
As an example we again take the demo song from part one. The GitHub link also provides the corresponding Reaper demo project, which we will use in this part.
Structuring the Project
The notes for the four voices (bass, vocals, guitar, drums) are each collected in a single track of the DAW, but we split the notes into fragments within the tracks.
For easier orientation on the time axis, we define regions for the intro, verse 1 and 2 and the outro. Normally you would set these up by hand, but we use a simple script for that, which derives those regions from a structure track.
From experience it is a lot easier to copy, move, rename or colour MIDI items on a track than to manipulate regions correspondingly. Hence we build such a structure track and generate the regions from it via the action MakeRegionsFromRegionStructureTrack. Position, name and colour are taken over from the MIDI items into the regions (see figure 1).
It shows the simple structure of the demo song quite clearly.
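To illustrate what that action does, here is a minimal sketch in plain Python (not the actual ReaScript code from [LTBVC-Plugins]); the item fields and the example times are made up:

```python
# Minimal sketch: derive project regions from the MIDI items of a dedicated
# structure track.  Name, position and colour of each item are simply
# carried over to the corresponding region.

from dataclasses import dataclass
from typing import List

@dataclass
class MediaItem:
    name: str          # e.g. "Intro", "Verse 1"
    position: float    # start time in seconds
    length: float      # duration in seconds
    color: int         # native DAW colour value

@dataclass
class Region:
    name: str
    start: float
    end: float
    color: int

def regionsFromStructureTrack (itemList: List[MediaItem]) -> List[Region]:
    """Creates one region per MIDI item of the structure track."""
    return [Region(item.name, item.position, item.position + item.length,
                   item.color)
            for item in sorted(itemList, key=lambda item: item.position)]

# hypothetical structure of the demo song (times are invented)
structureTrackItems = [MediaItem("Intro",    0.0, 16.0, 0x80FF00),
                       MediaItem("Verse 1", 16.0, 32.0, 0x00A0FF),
                       MediaItem("Verse 2", 48.0, 32.0, 0x00A0FF),
                       MediaItem("Outro",   80.0, 16.0, 0x80FF00)]
for region in regionsFromStructureTrack(structureTrackItems):
    print(region)
```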

Note Entry — Phases “extract”, “score” and “midi”
The demo song consists of several similar parts, as one could easily see in part 1. In the DAW we enter the notes in the standard manner and make aliases (“pooled items”) for identical fragments, which then change synchronously whenever one of the connected fragments is changed. For example, the notes in the bass for verse 1 and 2 are identical; hence an “original” and an “alias” are used. Figure 2 shows how that looks for the song.

Figure 2: ltbvc demo song
But since the external tool chain relies heavily on the lilypond program [Lilypond]: how do you get the note text for the lilypond file? Do you have to type it in manually somehow? No, there is another script for that called ExportLilypond, which generates the lilypond representation for a selected MIDI item.
Notes are generated in English notation (which fits the convention of the external tool chain from part 1 quite well). This means an f♯ (F sharp) is notated as “fs”, an e♭ (E flat) as “ef”. The algorithm analyses the MIDI notes of the note fragment along the measures and groups them into the minimal number of notes conformant to common notation guidelines. Chords are also recognised automatically and notated correctly.
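To give an idea of the pitch-name part of that conversion, here is a much simplified sketch; the real script also handles durations, ties, chords, the octave bookkeeping of “\relative” and a sensible choice between sharp and flat spellings (the fixed spelling table below is just an assumption):

```python
# Simplified sketch: map a MIDI pitch to an English lilypond pitch name
# relative to some reference octave.  The real ExportLilypond script also
# handles durations, ties, chords and the "\relative" octave bookkeeping.

_PITCH_NAMES = ["c", "cs", "d", "ef", "e", "f",
                "fs", "g", "af", "a", "bf", "b"]   # one spelling per pitch class

def lilypondPitchName (midiPitch: int, referenceOctave: int = 3) -> str:
    """Returns the English lilypond name for <midiPitch> with octave marks
       (' or ,) relative to <referenceOctave>."""
    octave = midiPitch // 12 - 1          # MIDI convention: 60 -> C4
    pitchName = _PITCH_NAMES[midiPitch % 12]
    octaveOffset = octave - referenceOctave
    octaveMarks = ("'" * octaveOffset if octaveOffset > 0
                   else "," * -octaveOffset)
    return pitchName + octaveMarks

print(lilypondPitchName(54))   # MIDI 54 = F sharp -> "fs"
print(lilypondPitchName(63))   # MIDI 63 = E flat  -> "ef'"
```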
Each note is split into (tied) partial notes that lie on raster positions of a measure where they may be located according to those guidelines. This sometimes leads to seemingly strange splits: figure 3a shows the original sequence, figure 3b the splitting by the script according to notation guidelines.
Figure 3: (a) the original sequence, (b) the splitting by the script
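The following is a much simplified sketch of that splitting idea: a note crossing a beat boundary is cut there and the parts are tied. The real algorithm works on the full raster of positions and durations allowed by the notation guidelines:

```python
# Much simplified sketch of the note splitting: a note crossing a beat
# boundary is cut at that boundary and the parts are tied.  The real
# algorithm uses the complete raster of positions and durations allowed
# by common notation guidelines (halves, quarters, eighths, ...).

from typing import List, Tuple

def splitNoteAtBeats (start: float, duration: float,
                      beatLength: float = 1.0) -> List[Tuple[float, float]]:
    """Splits a note given by (<start>, <duration>) in quarter-note units
       into tied segments that do not cross beat boundaries."""
    segmentList = []
    position = start
    end = start + duration
    while position < end:
        nextBeat = (position // beatLength + 1) * beatLength
        segmentEnd = min(end, nextBeat)
        segmentList.append((position, segmentEnd - position))
        position = segmentEnd
    return segmentList

# a half note starting on the second eighth of the measure
# -> eighth + quarter + eighth, all tied
print(splitNoteAtBeats(start=0.5, duration=2.0))
# [(0.5, 0.5), (1.0, 1.0), (2.0, 0.5)]
```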
The start octave for the lilypond fragment depends on the fragment name (indicating the instrument). For example, the lilypond representation of a guitar fragment starts with the lilypond text “\relative c'” (= C3), one for a bass fragment with “\relative c” (= C1).
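The mapping itself is simple; a sketch covering only the two cases mentioned above (the other instruments and the fallback are assumptions) might look like this:

```python
# Sketch: choose the "\relative" start octave from the fragment name.
# Only the two cases from the text are listed; the fallback is an assumption.

def relativePrefixForFragment (fragmentName: str) -> str:
    nameToPrefixMap = { "guitar" : "\\relative c'",
                        "bass"   : "\\relative c" }
    for instrumentName, prefix in nameToPrefixMap.items():
        if fragmentName.lower().startswith(instrumentName):
            return prefix
    return "\\relative c'"   # assumed default

print(relativePrefixForFragment("bass-verse1"))   # -> \relative c
```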
The result of applying the script to the bass verse is shown in figure 4. For a drum fragment, a drum notation is automatically used (see figure 5).

Note that — compared to the notation in the first part — repetitions are not generated automatically (“\repeat unfold 7 { a8 }”); you will have to add those manually.

It is also useful to normalise the notes entered in the DAW. Typically you want to compare them with the externally generated MIDI file (for example, to check the effects of the humanisation settings).
For this normalisation the script normalizeStructuredVoiceTracks has to be executed; it scans all structured voice tracks (identifying them via a naming convention), sets all MIDI velocities to the default 80, quantises the notes to the lilypond raster and deletes all MIDI control codes for pan, volume and reverb, because those settings will be defined in the external tool chain via the configuration file.
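A sketch of that normalisation on a plain event list (not the actual ReaScript code; the raster size and the event representation are assumptions) could look like this:

```python
# Sketch of the normalisation applied to the MIDI events of a structured
# voice track: fixed velocity, quantisation to the lilypond raster and
# removal of the pan/volume/reverb controllers.

from dataclasses import dataclass
from typing import List

_defaultVelocity    = 80
_discardedCCNumbers = { 7, 10, 91 }   # volume, pan, reverb send (GM controllers)

@dataclass
class MidiEvent:
    kind: str        # "note" or "controlChange"
    time: float      # position in quarter notes
    data1: int       # note pitch or controller number
    data2: int       # velocity or controller value

def normalizeEvents (eventList: List[MidiEvent],
                     rasterSize: float = 0.25) -> List[MidiEvent]:
    """Returns a normalised copy of <eventList>: velocities are set to the
       default, times are quantised to <rasterSize> (in quarters) and the
       pan/volume/reverb controllers are removed."""
    result = []
    for event in eventList:
        if event.kind == "controlChange" and event.data1 in _discardedCCNumbers:
            continue
        time = round(event.time / rasterSize) * rasterSize
        velocity = _defaultVelocity if event.kind == "note" else event.data2
        result.append(MidiEvent(event.kind, time, event.data1, velocity))
    return result
```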
So it is quite straightforward to gather the notes for the lilypond file, and from it one can generate the voice extracts, the score, the silent video and the MIDI file.
Comparing the DAW Notes with the Generated MIDI File — “midi” Phase
Because you typically fiddle around with the notes in the lilypond file (for example, to extract common parts or notate the repetitions more intelligently), it could happen that the DAW notes do not fully match the generated MIDI file.
Of course, there are also differences caused by the humanisation of the MIDI file, but the difference to the normalised notes in the DAW is typically very small. When checking the tool chain, we are mainly interested in massive deviations (for example, a note in the wrong octave or a completely divergent song structure).
The MIDI file generated by the ltbvc could be imported into the DAW project again and again, but this is tedious, especially when those MIDI tracks are distributed freely in the project. But once the initial import has been done, a script called ImportMidi helps massively with the re-import.
The script relies on the predictable structure of and the naming conventions for the tracks in the generated MIDI file. It scans the project for tracks conforming to those conventions: each has a single MIDI item whose name ends with “.mid” and starts with the voice name and the MIDI file name.
Altogether it does the following: it reads the MIDI file into temporary tracks, updates the existing tracks with the appropriate names and then deletes those temporary tracks. In the updated tracks all MIDI control codes for pan, volume and reverb are removed (because — as above — those values will later be defined by the song configuration). Finally those tracks are locked against accidental changes (they shall only be read or re-imported).
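The convention check itself could be sketched like this (the separator between voice name and MIDI file name as well as the example names are assumptions):

```python
# Sketch of the naming convention check used to find the import target
# tracks: a candidate track has exactly one MIDI item whose name starts
# with the voice name plus the MIDI file name and ends with ".mid".

from typing import List

def isMidiFileTrack (itemNameList: List[str],
                     voiceName: str, midiFileName: str) -> bool:
    """Tells whether a track with items named <itemNameList> is the import
       target for <voiceName> from <midiFileName>."""
    return (len(itemNameList) == 1
            and itemNameList[0].startswith(voiceName + "-" + midiFileName)
            and itemNameList[0].endswith(".mid"))

print(isMidiFileTrack(["bass-demo_song.mid"], "bass", "demo_song"))   # True
```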

The four voices of the demo song need four MIDI tracks. Figure 6 shows the tracks and the MIDI items with the names conforming to the convention in the Reaper demo project.
Once you have imported those tracks you can compare the structured voice tracks from above with the tracks imported from the external tool chain. For example, in Reaper you can open the MIDI editor for such a structured voice track and additionally select the corresponding MIDI file track for a comparison.
Figure 7 shows such a comparison between a structured bass track and the related MIDI track. For demonstration purposes, some notes in the structured bass track were moved arbitrarily after they had been transferred to the external tool chain: it is quite visible that the tracks are no longer identical (the moved notes are circled in red, the generated notes in green).

Intermediate Result (after “preprocess” Phase)
We have covered the first half of the processing phases (see figure 8). With a DAW we can export notes into lilypond (via “exportLilypond”), normalise the notes in structured voice tracks (via “normalizeStructuredVoiceTracks”) and also quickly import the generated MIDI file to compare it with the data in the DAW (via “importMidi”).

However, we have only arrived at the MIDI file and the silent video. Now the question arises how the audio generation can be supported. The aim must be that the result in the DAW is as close as possible to — and ideally identical with — the result of the external tool chain on the command line.
The idea is that plugins in the DAW transform MIDI data into audio data and apply effects to it in a form fully corresponding to the external tool chain.
In the following sections we will see whether this works, and how.
Generation of Audio from MIDI Notes — Emulation of the “rawaudio” Phase
Well, it should not be too hard to make a DAW convert a MIDI track into audio: this is what it does all the time 😉
In principle, this is correct: you just add a conversion plugin (very often a VSTi plugin) to the track that takes MIDI and generates audio. And that's it.
But we do not want just any generated audio signal, but one that is as close as possible to the audio file generated in the tool chain via the programme fluidsynth [Fluidsynth] using a soundfont.
There were some previous efforts like Alexey Zhelezov's [FluidSynthVST] or Birch Labs' [JuicySFPlugin], but those are a bit tricky to use and support for them is unclear. But the main point is that even though they rely on the FluidSynth library, they do not exactly reproduce an external fluidsynth rendition of some MIDI data.
So I implemented two programs based on the FluidSynth library [FSPlugin]. The first component is a DAW plugin called FluidSynthPlugin. It has a simplistic interface where you specify a soundfont, several fluidsynth settings and possibly a MIDI program to be selected by putting text data in a text field. Then you are able to convert an incoming MIDI stream in a DAW to audio using the FluidSynth library.
The second component circumvents the rasterisation done by the fluidsynth player: it is a simplistic but pedantic command-line converter called FluidSynthConverter. It converts a MIDI file into a WAV file, is also based on the fluidsynth library and does the same sample-exact event feeding into that library as the plugin.
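The principle of that sample-exact feeding can be sketched with the pyfluidsynth binding (this is only an illustration, not the actual converter code; the soundfont name and the tiny event list are made up):

```python
# Hypothetical sketch of "sample-exact event feeding": all samples up to an
# event's exact sample position are rendered before the event is passed to
# the synthesizer, instead of rendering in fixed-size buffers.

import fluidsynth   # pyfluidsynth binding of the fluidsynth library
import numpy as np

sampleRate = 44100
synthesizer = fluidsynth.Synth(samplerate=sampleRate)
soundFontId = synthesizer.sfload("FluidR3_GM.sf2")     # assumed soundfont
synthesizer.program_select(0, soundFontId, 0, 34)      # preset 34: a picked bass

# (samplePosition, kind, key, velocity) - a tiny hand-made event list
eventList = [ (0,     "on",  40, 80),
              (22050, "off", 40,  0),
              (22050, "on",  43, 80),
              (44100, "off", 43,  0) ]

outputBlocks = []
currentPosition = 0
for position, kind, key, velocity in eventList:
    if position > currentPosition:
        # render exactly up to the event position
        outputBlocks.append(synthesizer.get_samples(position - currentPosition))
        currentPosition = position
    if kind == "on":
        synthesizer.noteon(0, key, velocity)
    else:
        synthesizer.noteoff(0, key)
outputBlocks.append(synthesizer.get_samples(sampleRate))   # one second of release
audioData = np.concatenate(outputBlocks)
```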
This means: a MIDI track (or a structured voice track) has the FluidSynthPlugin as its first effect, and there the exact MIDI instrument specified for the voice in the configuration file is selected. If you have specified the MIDI instrument “34” for the bass voice there, then a “Picked Bass” from the same soundfont should be selected in the FluidSynthPlugin. Figure 9 shows that configuration according to the configuration file for all voices of the demo song.

Unfortunately there are some problems: when a soundfont uses so-called modulators, those will never be in sync with the ones in the external tool chain. Also, plugins process audio data with a specific raster (the buffer size, for example 64 samples); when the internal and external rasters are not completely congruent, the generated audio signal will not be identical. And that already happens when the start position of the DAW playback does not lie on such a raster position of the external generation.
Nevertheless, for innocuous soundfonts (without modulators and/or chorus) and with some raster adaptation in the DAW, the difference between the DAW and the external rendering is minimal. When using both components (command line and DAW) on the same MIDI data, they typically produce audio output with a difference of less than -200dBFS in a spectrum analysis (as shown in figure 10).

Effects from SoX as Plugins — Emulation of the “refinedaudio” Phase
Okay, if it is hard to transform MIDI notes into raw audio data in the DAW bit-identically to the external tool chain, then it is surely completely hopeless to emulate the SoX effects in the DAW. SoX [SoX] does not provide any DAW plugins — it is just a command-line programme — and arbitrary effects by other manufacturers are, of course, never bit-identical to those command-line effects.
Completely futile. Crap.
Until 2021 the situation was like that, but because I want to emulate the command-line processing exactly in the DAW, I wrote my own SoX plugins for the bread-and-butter effects of SoX [SoXPlugins]. They are freely available and run on the typical platforms (Windows, macOS and Linux).
They cover the SoX effects allpass, band, bandpass, bandreject, bass, biquad, compand, equalizer, gain, highpass, lowpass, mcompand, overdrive, phaser, reverb, treble and tremolo, grouped into six VST plugins: Multiband-Compander, Filter, Gain, Overdrive, Reverb and Tremolo/Phaser.
All their parameters can be set by sliders, and the ranges are identical to those of the SoX effects on the command line. This needs some getting used to, because, for example, the compander threshold may be set to a minimum of -128dB or the phaser gain may be at most 1000, but this is intentional. Figure 11a shows the UI of the SoX phaser, figure 11b the UI of one band in a SoX multiband compander.
Figure 11: (a) UI of the SoX phaser, (b) UI of one band of the SoX multiband compander
So the aim for those plugins is not to provide overwhelming effects, but to emulate the command line effects from SoX.
Although those plugins have been completely reimplemented, they deliver results identical to the existing SoX effects on the command line for every parameter combination. To check that at least for a random sample, the software package contains a test project where — for each of the SoX effects above — audio fragments rendered via the command line are compared with audio rendered in realtime by the SoX plugins. This is done by adding the internally and externally rendered audio, but phase shifting one of the summands by 180° (a so-called “null test”). They cancel each other out almost completely, which means: they are identical. Strictly speaking they have a deviation of less than -150dB in a spectral analysis; you can only hear that with golden ears 😉 (see figure 12).
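Such a null test can easily be automated; here is a small sketch using numpy and the soundfile package (the file names are just examples):

```python
# Sketch of an automated "null test": subtract the externally rendered audio
# from the internally rendered audio (equivalent to adding it with a 180°
# phase shift) and report the residual level.

import numpy as np
import soundfile as sf   # pip install soundfile

externalData, sampleRate = sf.read("phaser_external.wav")
internalData, _          = sf.read("phaser_internal.wav")

length = min(len(externalData), len(internalData))
residual = internalData[:length] - externalData[:length]

peakLevel = np.max(np.abs(residual))
residualDb = 20 * np.log10(peakLevel) if peakLevel > 0 else float("-inf")
print("residual peak level: %.1f dBFS" % residualDb)   # should be far below -150
```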

Ha, plugin master, now I have caught you in a lie after all: you have two modulation effects, tremolo and phaser; those, for sure, do not run internally in sync with some external tool chain. So much for “bit-identical”, right?
In principle this is correct — we already saw this when discussing the audio generation from MIDI in the previous section: if some externally generated audio snippet with modulation were at an arbitrary position in the project, a modulation by a corresponding plugin would only be congruent by accident; typically it is out of phase. The reason is that plugins typically start their modulation when playback begins, which technically means that the 0° phase of the modulation lies at the playback start time.
Figure 13 shows that situation for an example. We assume a tremolo effect that is applied to a fragment in an audio track starting at time t_fragment in the DAW. The playback in the DAW is assumed to start at time t_play. As one can easily see, the modulation of the externally processed fragment (which simply puts a tremolo on the original snippet) has its 0° phase exactly at time t_fragment. However, the internal effect in the DAW has its 0° phase at time t_play (see also the red dots on the respective tracks marking the 0° phase positions). This would lead to massive differences between externally and internally generated audio.

But this can be rectified by defining, for the internal effect, at which point in time the modulation phase should be 0° (so-called “timelocking”). If you set this to t_fragment, the modulations will be synchronous (as you can see when comparing the second with the lowest track). Of course, the effect starts at t_play, but its modulation phase is shifted such that it reaches phase 0° exactly at t_fragment.
Figure 14 shows, for example, how to set this reference time for the SoX tremolo effect. The parameter Time Offset is set to the above time t_fragment (we assume that this is the position 324.0s) to be synchronous with the external modulation.
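The underlying idea can be sketched as follows: the modulation phase is derived from absolute project time and the configured offset, not from the playback start (the sine tremolo formula below is a generic one, not necessarily the exact SoX implementation):

```python
# Sketch of a "timelocked" tremolo LFO: the modulation phase is computed
# from absolute project time, so that phase 0° falls exactly on the chosen
# time offset (t_fragment), no matter when playback starts.

import math

def tremoloGain (projectTime: float, modulationFrequency: float,
                 depth: float, timeOffset: float) -> float:
    """Gain of a sine tremolo at absolute <projectTime>, with phase 0° at
       <timeOffset> (the fragment start, e.g. 324.0s in the example above)."""
    phase = 2 * math.pi * modulationFrequency * (projectTime - timeOffset)
    return 1.0 - depth / 2 * (1.0 - math.cos(phase))

# at t_fragment itself the gain is exactly 1 (phase 0°), regardless of t_play
print(tremoloGain(324.0, modulationFrequency=5.0, depth=0.4, timeOffset=324.0))
```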

If you configure the external effect chain in SoX and the internal chain in the DAW identically with the same parameter values and also add the time shifts shown above for modulations, there are no deviations at all between external and internal audio (more precisely, less than -150dB, see above; this is absolutely inaudible).
Well, even that is a little lie. When null testing you can sometimes hear deviations as ticks; this is caused by digital clipping in the external SoX programme that does not occur in the DAW plugins, because those use floating-point arithmetic instead of the integer arithmetic used in SoX. However, the SoX programme reports that clipping in its error log; you just have to adapt the effect parameters accordingly to get rid of the clipping.
Mixing and Adding Audio Tracks to the Video — “mix” and “finalvideo” Phases
For mixing the audio tracks we do not need any special functions in a DAW: when there are voices in separate tracks of the DAW, their proportion of the total mix can be controlled via faders that should be set to the values in the configuration file.
And the silent video can often be put into its own track in a DAW — Reaper can handle that —, so it is easy to check how well it fits the audio tracks (or vice versa).
What we do not have are the subtitles with the measure numbers in the video. But they are dispensable: a DAW has a time display that typically shows those measures anyway.
Conclusion and Final Remark
We are done!!
In part 1 we saw how to use a text file for an arrangement in Lilypond notation and a configuration file as inputs to fully automatically generate several results: voice extracts, score, MIDI file, audio files with effects and finally notation videos. The process runs on the command line using common open-source programmes and the coordination programme ltbvc.
It is a bit tedious, but it delivers usable results.
In this part 2 it was shown that this process on the command line (a classical “edit-compile-run” approach) can be reasonably complemented by interaction with a DAW (see figure 15). You can
- record the notes themselves in the DAW and transform them into text fragments for Lilypond (via “exportLilypond”),
- quickly compare the externally generated and humanised MIDI file with the notes in the DAW (via “importMidi”),
- produce audio from MIDI that is very close to the externally rendered audio (with a residual noise of about -200dB), when some preconditions are obeyed,
- apply DAW plugins (“SoXPlugins”) equivalent in output to the ones in the external effect chain from SoX, and
- of course, use simple DAW services (like, for example, mixing voices via the volume faders).
There is one gap in the emulation with the DAW: currently it is hard to convert MIDI for complex soundfonts with modulation into audio in an identical fashion both in a DAW and on the command line. Here one must either accept using simpler soundfonts or live with some difference in the modulations.

Wow, whoever has persevered to this point has my absolute respect: in the two parts of this article, you have seen a very extensive and complex process for making music arrangements and notation videos, but admittedly also a rather nerdy one (hence the title of this article).
For me this approach is perfect; maybe somebody can use it for himself or at least adapt parts of it for his context.
Of course, any feedback is welcome, whether positive or negative!