Topic Name: U of R Researchers Successfully Compressed Music File 1,000 Times Smaller than MP3
Category: Computer science & technology
Research persons: Mark Bocko
Location: University of Rochester, United States
Researchers at the University of Rochester have digitally reproduced music in a
file nearly 1,000 times smaller than a regular MP3 file.
The music, a 20-second clarinet solo, is encoded in less than a single kilobyte,
and is made possible by two innovations: recreating in a computer both the
real-world physics of a clarinet and the physics of a clarinet player.
The achievement, announced at the International Conference on Acoustics,
Speech, and Signal Processing held in Las Vegas, is not yet a
flawless reproduction of an original performance, but the researchers say it's
"This is essentially a human-scale system of reproducing music," says Mark Bocko,
professor of electrical and computer engineering and co-creator of the
technology. "Humans can manipulate their tongue, breath, and fingers only so
fast, so in theory we shouldn't really have to measure the music many thousands
of times a second like we do on a CD. As a result, I think we may have found the
absolute least amount of data needed to reproduce a piece of music."
In replaying the music, a computer literally reproduces the original performance
based on everything it knows about clarinets and clarinet playing. Two of
Bocko's doctoral students, Xiaoxiao Dong and Mark Sterling, worked with Bocko to
measure every aspect of a clarinet that affects its sound—from the backpressure
in the mouthpiece for every different fingering, to the way sound radiates from
the instrument. They then built a computer model of the clarinet, and the result
is a virtual instrument built entirely from the real-world acoustical
measurements.
The team then set about creating a virtual player for the virtual clarinet. They
modeled how a clarinet player interacts with the instrument including the
fingerings, the force of breath, and the pressure of the player's lips to
determine how they would affect the response of the virtual clarinet. Then, says
Bocko, it's a matter of letting the computer "listen" to a real clarinet
performance to infer and record the various actions required to create a
specific sound. The original sound is then reproduced by feeding the record of
the player's actions back into the computer model.
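The storage budget implied by the article's figures (a 20-second clip in under one kilobyte) can be worked out directly. The sketch below is illustrative only. It assumes a kilobyte of 1,000 bytes and, hypothetically, that each of the three player controls mentioned (fingering, breath force, lip pressure) costs one byte per update. The article does not describe the actual encoding, and the function name `budget` is invented for this example.

```python
# Back-of-the-envelope data budget for a 20-second clip stored in under 1 kilobyte.
CLIP_SECONDS = 20       # clip length given in the article
FILE_BYTES = 1000       # assumption: 1 kilobyte taken as 1,000 bytes (an upper bound)
CONTROLS = 3            # fingering, breath force, lip pressure
BYTES_PER_CONTROL = 1   # hypothetical: one byte per control value per update


def budget(clip_seconds, file_bytes, controls, bytes_per_control):
    """Return (bytes per second, bits per second, control updates per second)."""
    bytes_per_second = file_bytes / clip_seconds
    bits_per_second = bytes_per_second * 8
    updates_per_second = bytes_per_second / (controls * bytes_per_control)
    return bytes_per_second, bits_per_second, updates_per_second


bytes_per_s, bits_per_s, updates_per_s = budget(
    CLIP_SECONDS, FILE_BYTES, CONTROLS, BYTES_PER_CONTROL
)
print(f"{bytes_per_s:.0f} bytes/s, {bits_per_s:.0f} bit/s, "
      f"{updates_per_s:.1f} control updates/s")
```

Under these assumptions the file leaves room for roughly 16 to 17 updates of the player's actions per second, far fewer than the tens of thousands of samples per second used on a CD.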
At present the results are a very close, though not yet a perfect,
representation of the original sound.
"We are still working on including 'tonguing,' or how the player strikes the
reed with the tongue to start notes in staccato passages," says Bocko. "But in
music with more sustained and connected notes the method works quite well and
it's difficult to tell the synthesized sound from the original."
As the method is refined the researchers imagine that it may give computer
musicians more intuitive ways to create expressive music by including the
actions of a virtual musician in computer synthesizers. And although the human
vocal tract is highly complex, Bocko says the method may in principle be
extended to vocals as well.
The current method handles only a single instrument at a time. However, in other
work in the University's Music Research Lab with post-doctoral researcher
Gordana Velikic and Dave Headlam, professor of music theory at the University of
Rochester's Eastman School of Music, the team has produced a method of
separating multiple instruments in a mix, so the two methods can be combined to
produce a very compact recording.
Bocko believes that the quality will continue to improve as the acoustic
measurements and the resulting synthesis algorithms become more accurate, and he
says this process may represent the maximum possible data compression of music.
"Maybe the future of music recording lies in reproducing performers and not
recording them," says Bocko.
Note for MPEG-1 Audio Layer 3
MPEG-1 Audio Layer 3, more commonly referred to as MP3, is a digital audio
encoding format using a form of lossy data compression.
It is a common audio format for consumer audio storage, as well as a de facto
standard encoding for the transfer and playback of music on digital audio
players.
MP3's use of a lossy compression algorithm is designed to greatly reduce the
amount of data required to represent the audio recording and still sound like a
faithful reproduction of the original uncompressed audio for most listeners, but
is not considered High Fidelity audio by the elite connoisseur. An MP3 file that
is created using the mid-range bitrate setting of 128 kbit/s will result in a
file that is typically about 1/10th the size of the CD file created from the
original audio source. An MP3 file can also be constructed at higher or lower
bitrates, with higher or lower resulting quality.
MP3 is an audio-specific format. It was invented by a team of international
engineers at Philips, CCETT (Centre commun d'études de télévision et
télécommunications), IRT, AT&T-Bell Labs and Fraunhofer Society, and it became
an ISO/IEC standard in 1991. The compression works by reducing accuracy of
certain parts of sound that are deemed beyond the auditory resolution ability of
most people. This method is commonly referred to as Perceptual Coding.
It provides a representation of sound within a short term time/frequency
analysis window, by using psychoacoustic models to discard or reduce precision
of components less audible to human hearing, and recording the remaining
information in an efficient manner. This is relatively similar to the principles
used by JPEG, an image compression format.
The MPEG-1 standard does not include a precise specification for an MP3 encoder.
Implementers of the standard were supposed to devise their own algorithms
suitable for removing parts of the information in the raw audio (or rather its
MDCT representation in the frequency domain). During encoding, 576 time domain
samples are taken and are transformed to 576 frequency domain samples. If there
is a transient, 192 samples are taken instead of 576. This is done to limit the
temporal spread of quantization noise accompanying the transient.
As a result, there are many different MP3 encoders available, each producing
files of differing quality. Comparisons are widely available, so it is easy for
a prospective user of an encoder to research the best choice. It must be kept in
mind that an encoder that is proficient at encoding at higher bit rates (such as
LAME) is not necessarily as good at lower bit rates.
Decoding, on the other hand, is carefully defined in the standard. Most decoders
are "bitstream compliant", which means that the decompressed output they
produce from a given MP3 file will be the same (within a specified degree of
rounding tolerance) as the output specified mathematically in the ISO/IEC
standard document. The MP3 file has a standard format, which is a frame that
consists of 384, 576, or 1152 samples (depends on MPEG version and layer), and
all the frames have associated header information (32 bits) and side information
(9, 17, or 32 bytes, depending on MPEG version and stereo/mono). The header and
side information help the decoder to decode the associated Huffman encoded data.
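The frame layout described above can be made concrete with a small calculation. The sketch below uses the commonly cited frame-length formula for MPEG-1 Layer 3 (1152 samples per frame, so 144 × bit rate / sample rate bytes, plus an optional padding byte). The function names are illustrative, and the payload estimate ignores the optional CRC and the bit reservoir.

```python
SAMPLES_PER_FRAME = 1152   # MPEG-1 Layer 3
HEADER_BYTES = 4           # 32-bit frame header
SIDE_INFO_STEREO = 32      # side information for MPEG-1 stereo
SIDE_INFO_MONO = 17        # side information for MPEG-1 mono


def frame_bytes(bit_rate_bps, sample_rate_hz, padding=0):
    """Frame length in bytes: 1152 samples / 8 bits = 144 bytes per unit of rate ratio."""
    return (144 * bit_rate_bps) // sample_rate_hz + padding


def frame_duration_ms(sample_rate_hz):
    """Playback time covered by one frame, in milliseconds."""
    return 1000.0 * SAMPLES_PER_FRAME / sample_rate_hz


def payload_bytes(bit_rate_bps, sample_rate_hz, stereo=True, padding=0):
    """Bytes left for Huffman-coded data after header and side information."""
    side = SIDE_INFO_STEREO if stereo else SIDE_INFO_MONO
    return frame_bytes(bit_rate_bps, sample_rate_hz, padding) - HEADER_BYTES - side


size = frame_bytes(128_000, 44_100)
print(size, "bytes per frame,",
      round(frame_duration_ms(44_100), 2), "ms per frame,",
      payload_bytes(128_000, 44_100), "bytes of coded audio data")
```

At 128 kbit/s and 44.1 kHz this gives 417 bytes per unpadded frame, each covering about 26 ms of audio.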
Several bit rates are specified in the MPEG-1 Layer 3 standard: 32, 40, 48, 56,
64, 80, 96, 112, 128, 160, 192, 224, 256 and 320 kbit/s, and the available
sampling frequencies are 32, 44.1 and 48 kHz. A sample rate of 44.1 kHz is
almost always used, because this is also used for CD audio, the main source used
for creating MP3 files. A greater variety of bit rates are used on the Internet.
128 kbit/s is the most common, because it typically offers adequate audio quality
in a relatively small space. 192 kbit/s is often used by those who notice
artifacts at lower bit rates. As Internet bandwidth and hard
drive sizes have increased, 128 kbit/s files are slowly being replaced
with higher bitrates like 192 kbit/s, with some being encoded up to MP3's
maximum of 320 kbit/s. It is unlikely that higher bit rates will be popular with
any lossy audio codec as higher bit rates than 320 kbit/s encroach on the domain
of lossless codecs such as FLAC.
By contrast, uncompressed audio as stored on a compact disc has a bit rate of
1,411.2 kbit/s (16 bits/sample × 44,100 samples/second × 2 channels / 1,000
bits/kbit).
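The arithmetic behind this figure, and behind the rule of thumb that a 128 kbit/s MP3 is about one tenth the size of the CD original, can be checked in a few lines. The function name below is illustrative.

```python
def pcm_bit_rate_kbps(bits_per_sample, sample_rate_hz, channels):
    """Bit rate of uncompressed PCM audio in kbit/s (1 kbit = 1,000 bits)."""
    return bits_per_sample * sample_rate_hz * channels / 1000


cd_kbps = pcm_bit_rate_kbps(16, 44_100, 2)   # CD audio: 16-bit, 44.1 kHz, stereo
mp3_kbps = 128                               # mid-range MP3 bit rate from the text
ratio = cd_kbps / mp3_kbps

print(f"CD audio: {cd_kbps} kbit/s")
print(f"CD is {ratio:.1f} times the bit rate of a {mp3_kbps} kbit/s MP3")
```

The ratio comes out at roughly 11 to 1, which matches the "about 1/10th the size" figure quoted for 128 kbit/s files.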
Some additional bit rates and sample rates were made available in the MPEG-2 and
the (unofficial) MPEG-2.5 standards: bit rates of 8, 16, 24, and 144 kbit/s and
sample rates of 8, 11.025, 12, 16, 22.05 and 24 kHz.
Note for Guitar Synthesizer
A guitar/synthesizer (also guitar synthesizer, guitar/synth, g-synth, synth
guitar, guitar-synth, or guitar synth) is any one of a number of systems
originally conceived to allow a guitar player to play synthesizers. MIDI guitar
is often used as a synonym for the field of guitar/synthesis or for a
guitar/synthesizer, but MIDI is not involved in every case.
Traditionally, synthesizers have a keyboard interface to allow a human to play
the instrument, but the human interface does not necessarily need to be a
keyboard, nor indeed is any human interface necessary. (See sound module.)
Because synthesizers generate sounds electronically, theoretically any sort of
input device can actuate them. A guitar/synthesizer provides an interface which
is familiar to a guitarist.
There are two main types of guitar/synthesizer: those which are real guitars
outfitted with additional gear to actuate a synthesizer, and those which are
guitar-like MIDI controllers. Both types have their advantages and
disadvantages.
Some manufacturers of effects units market so-called guitar/synth pedals. These
effects use a variety of techniques to make a guitar sound more like a
synthesizer, but they aren't really guitar/synthesizers.
The earliest guitar/synthesizers were based on actual guitars. Roland
Corporation developed the earliest truly functioning guitar synth system, the
Roland GR-500, and remains a significant proponent of this paradigm of guitar
synthesis. Other notable manufacturers include(d) Arp, Terratec/Axon, Ibanez,
Casio and Yamaha Corporation. Guitar/synths in this category are the most
popular, and consist of the following components:
A guitar. The guitar is usually an electric guitar, but may also be an acoustic
A hexaphonic pickup (also called a divided pickup), which provides six distinct
outputs, one for each string.
A converter, which determines the pitch coming from each of the strings and
transmits this information to a synthesizer.
A synthesizer, which generates the intended note.
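As a rough illustration of the converter stage listed above: once a pitch has been detected on a string, it has to be turned into a note the synthesizer understands. The sketch below assumes a MIDI-based system and uses the standard equal-temperament mapping (A4 = 440 Hz = MIDI note 69). The pitch detection itself is not shown, the function name is illustrative, and the open-string frequencies are those of a guitar in standard tuning.

```python
import math


def frequency_to_midi_note(frequency_hz, a4_hz=440.0):
    """Nearest MIDI note number for a detected fundamental frequency."""
    return round(69 + 12 * math.log2(frequency_hz / a4_hz))


# Open strings of a guitar in standard tuning, low to high (Hz).
OPEN_STRINGS_HZ = {
    "E2": 82.41,
    "A2": 110.00,
    "D3": 146.83,
    "G3": 196.00,
    "B3": 246.94,
    "E4": 329.63,
}

notes = {name: frequency_to_midi_note(f) for name, f in OPEN_STRINGS_HZ.items()}
for name, note in notes.items():
    print(f"{name}: MIDI note {note}")
```

With a hexaphonic pickup, a conversion of this kind would run independently on each of the six string signals.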
The hexaphonic pickup may be a separate component which can be mounted on almost
any guitar, or it may be built into the guitar as original equipment. The
earliest guitar/synths required the musician to use a proprietary guitar, which
was designed with an integrated hexaphonic pickup. Roland later developed its GK
line of pickups which allowed the pickup to be mounted onto any guitar. Today,
several guitar manufacturers, such as Godin, offer guitar models with an
integrated "RMC hexaphonic pickup and preamp system", which is compatible with
Roland guitar-synth hardware. The RMC pickup system uses piezo crystal
technology built into the saddles of the guitar bridge to pick up the string
vibration. This vibration is then converted into either a piezo acoustic signal
or a 13-pin hexaphonic synth signal. Fender Instruments released its own version
of the guitar synth, marketed as "Roland-ready": a Fender Stratocaster that
directly integrates the Roland GK-2 hardware.
The chief advantages of this type of system are:
The timbres of the guitar and synthesizer can be blended together at any ratio,
enabling the musician to play guitar alone, guitar and synthesizer, or
synthesizer alone.
In many models, almost any guitar can be used.
This research is funded by the National Science Foundation.