As mentioned in the previous post, the automatic transcription process is checked and examined by humans. What is actually done in this last step of creating a ready-for-publish product?
Firstly, the transcription is compared with the audio. Audio recognition is a highly complex task: the relevant pitches have to be separated from the frequencies that are irrelevant to the transcription, such as overtones and noise (e.g. percussion or the consonants of the lyrics). All of these are merged together into a single, complex waveform that must be decoded. If wrong notes or chords appear, they are corrected. These corrections in turn help the audio recognition technology to become even better.
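To give a feeling for the kind of problem involved, here is a minimal sketch of frequency-domain peak picking. It is only an illustration of the principle, not Soundnotation's actual technology: a real transcription system has to cope with polyphony, overtones, and noise far more robustly than a single FFT peak. The synthesized test signal below (an A4 at 440 Hz with a quieter overtone and added noise) is an assumption made for the example.

```python
import numpy as np

def dominant_pitch(signal, sample_rate):
    """Return the strongest frequency (in Hz) of a mono signal via FFT peak picking."""
    spectrum = np.abs(np.fft.rfft(signal))          # magnitude spectrum
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return freqs[np.argmax(spectrum)]               # frequency bin with the most energy

# Synthesize one second of A4 (440 Hz) plus a quieter overtone and some noise,
# mimicking the mixture of relevant and irrelevant frequencies in a recording.
sr = 44100
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t) + 0.3 * np.sin(2 * np.pi * 880 * t)
noisy = tone + 0.1 * np.random.default_rng(0).standard_normal(sr)

print(dominant_pitch(noisy, sr))  # close to 440.0
```

Even in this toy setting, the overtone at 880 Hz shows up in the spectrum; deciding which peaks are notes and which are overtones or noise is exactly where automatic transcription gets hard and human checking earns its keep.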
Secondly, the score sometimes has to be adjusted to conventions of notation. For example, rhythms may need to be rewritten so that they reveal the beats of the basic metre (mostly by dividing a note into two notes that are then tied), swing is notated with even note values and the corresponding interpretation mark rather than with triplets, accidentals are respelled enharmonically to fit the tonality of the song, and so on.
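The tie convention can be sketched in a few lines. The function below, a simplified illustration of my own (not part of any real engraving software), splits a note at every beat boundary it crosses, producing the segments that would then be notated as tied notes; real notation rules are subtler, since beaming and metre decide which boundaries actually need to be shown.

```python
from fractions import Fraction

def split_at_beats(start, duration, beat=Fraction(1)):
    """Split a note that crosses beat boundaries into tied segments.

    start and duration are measured in beats; returns the list of
    segment durations that together span the original note.
    """
    segments = []
    pos, remaining = start, duration
    while remaining > 0:
        next_beat = (pos // beat + 1) * beat          # next beat boundary after pos
        seg = min(remaining, next_beat - pos)         # play up to the boundary (or the end)
        segments.append(seg)
        pos += seg
        remaining -= seg
    return segments

# A half note starting on the "and" of beat one in 4/4:
print(split_at_beats(Fraction(1, 2), Fraction(2)))  # [1/2, 1, 1/2]
```

The half note entering off the beat becomes an eighth, a tied quarter, and a tied eighth, so a reader can see where each beat falls.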
Thirdly, if the score is an arrangement for instruments other than the original ones, it might be necessary to make some adjustments for easier execution or a more idiomatic sound. Especially when an accompaniment is written out, the voicings of chords are often subject to subtle changes.
Finally, the layout is adjusted. Every note should have enough space to be easily readable, a system should not contain too few or too many bars, and a page should not comprise too few or too many systems. Furthermore, every score should comply with Soundnotation’s score design.
Thus, the process of creating a score from an audio file involves both automatic processing and human expertise. The former does the otherwise time-consuming transcription work, while the latter ensures a high-quality product.