U.S. patent application number 14/231962 was filed with the patent office on 2014-04-01 and published on 2014-07-31 for method for dynamically adjusting the spectral content of an audio signal.
The applicant listed for this patent is J. CRAIG OXFORD, D. MICHAEL SHIELDS, PATRICK TAYLOR. Invention is credited to J. CRAIG OXFORD, D. MICHAEL SHIELDS, PATRICK TAYLOR.
Application Number | 20140211967 14/231962 |
Family ID | 46327326 |
Filed Date | 2014-04-01 |
United States Patent Application | 20140211967 |
Kind Code | A1 |
OXFORD; J. CRAIG; et al. | July 31, 2014 |
METHOD FOR DYNAMICALLY ADJUSTING THE SPECTRAL CONTENT OF AN AUDIO SIGNAL
Abstract
A method for dynamically adjusting the spectral content of an
audio signal, which increases the harmonic content of said audio
signal, said method comprising translating an encoded digital
signal into data bands, creating a psychoacoustic model to identify
sections of said data bands that are deficient in harmonic quality,
analyzing the fundamental frequency and amplitude of said
harmonically deficient data bands, creating additional higher order
harmonics for said harmonically deficient data bands, adding said
higher order harmonics back to said encoded digital signal to form
a newly enhanced signal, inverse filtering said newly enhanced
signal, and converting said inverse filtered signal to an analog
waveform for consumption by the listener.
Inventors: | OXFORD; J. CRAIG; (NASHVILLE, TN); TAYLOR; PATRICK; (HUNTSVILLE, TN); SHIELDS; D. MICHAEL; (ST. PAUL, MN) |
Applicant: |
Name | City | State | Country | Type |
OXFORD; J. CRAIG | NASHVILLE | TN | US | |
TAYLOR; PATRICK | HUNTSVILLE | TN | US | |
SHIELDS; D. MICHAEL | ST. PAUL | MN | US | |
Family ID: | 46327326 |
Appl. No.: | 14/231962 |
Filed: | April 1, 2014 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
13037207 | Feb 28, 2011 | 8687818 |
14231962 | | |
11708452 | Feb 20, 2007 | 7899192 |
13037207 | | |
11633908 | Dec 5, 2006 | |
11708452 | | |
60794293 | Apr 22, 2006 | |
Current U.S. Class: | 381/101 |
Current CPC Class: | H03G 5/16 20130101; G10L 21/038 20130101; H04R 5/04 20130101 |
Class at Publication: | 381/101 |
International Class: | H03G 5/16 20060101 H03G005/16 |
Claims
1. A method for modifying the spectral content of an audio signal,
comprising the steps of: identifying sections of an audio signal
that are deficient in harmonic quality; adding higher order
harmonics into said audio signal to form an enhanced signal; and
inverse filtering said enhanced signal.
2. The method of claim 1, wherein said audio signal is an encoded
digital signal.
3. The method of claim 1, wherein the step of identifying includes
the creation of a psychoacoustic model.
4. The method of claim 3, wherein the audio signal is first
translated into data bands, and the psychoacoustic model identifies
sections of the data bands that are deficient in harmonic
quality.
5. The method of claim 4, wherein the fundamental frequency and
amplitude of the harmonically deficient data bands are analyzed
prior to creating additional higher order harmonics for the
harmonically deficient data bands.
6. The method of claim 1, wherein the inverse-filtered enhanced
signal is a digital signal.
7. The method of claim 6, further comprising the step of converting
the inverse-filtered enhanced digital signal to an analog
waveform.
8. The method of claim 4, wherein said psychoacoustic model
incorporates several layers of harmonics to identify said deficient
data bands.
Description
[0001] This application is a continuation of and claims the benefit
of U.S. Utility application Ser. No. 13/037,207, now issued as U.S.
Pat. No. 8,687,818, filed Feb. 28, 2011, which is a continuation of
U.S. Utility application Ser. No. 11/708,452, filed Feb. 20, 2007,
which claims benefit of and priority to U.S. Provisional Patent
Application No. 60/794,293, filed Apr. 22, 2006, and also which is
a continuation-in-part application of U.S. Ser. No. 11/633,908,
filed Dec. 5, 2006, which claims benefit of and priority to U.S.
Provisional Patent Application No. 60/794,293, filed Apr. 22, 2006.
The specification, figures and complete disclosures of U.S.
Provisional Patent Application No. 60/794,293 and U.S. Utility
application Ser. Nos. 11/633,908; 11/653,510; 11/708,452; and
13/037,207 are incorporated herein by specific reference for all
purposes.
FIELD OF INVENTION
[0002] The present invention relates to a method for dynamically
adjusting the spectral content of a digital audio signal wherein
significant processing is performed to modify a signal's harmonic
content.
BACKGROUND OF THE INVENTION
[0003] Much audio is stored, distributed and processed in the
digital domain. Regardless of this fact, the audio must ultimately
be converted back to analog in order to be used. Many audio purists
resist the digitization of audio, preferring pure analog sources
such as LP recordings, which originate from analog master tapes.
This is because of inherent defects in what are termed "lossy
compression" and "lossless compression" in audio data compression.
In both lossy and lossless compression, information redundancy is
reduced, using methods such as coding, pattern recognition and
linear prediction to reduce the amount of information used to
describe the data. The idea behind lossy audio compression was to
use psychoacoustics to recognize that not all data in an audio
stream can be perceived by the human auditory system. Most lossy
compression reduces perceptual redundancy by first identifying
sounds which are considered perceptually insignificant. Typical
examples include high frequencies, or sounds that occur at the same
time as other, louder sounds, which are coded with decreased
accuracy or not coded at all.
[0004] However, reducing perceptual redundancy often does not
achieve sufficient compression for a particular application and
requires further lossy compression with a difference in quality
that is more readily perceived by the user. While the data
reduction is again guided by some model of how important the sound
is as perceived by the human ear, with the goal of efficiency and
optimized quality for the target data rate, the use of lossy
compression may result in a perceived reduction of the audio
quality that ranges from none to severe.
[0005] Currently, data removed during lossy compression cannot be
recovered by decompression. Additionally, audio quality is affected
when a file is decompressed and recompressed (generational losses),
which makes lossy compression unsuitable for storing the
intermediate results in professional audio engineering
applications. It is nonetheless very popular with end users
(particularly MP3), since a megabyte can store almost a minute's
worth of music at adequate quality.
[0006] Timbre or tone color is known in psychoacoustics as sound
quality or sound color. Timbre has been called "the
psychoacoustician's multidimensional wastebasket category" as it
can denote many apparently unrelated aspects of sound (McAdams, S.,
and Bregman, A., "Hearing Musical Streams," Comput. Music J.). It
should be pointed out that the addition or restoration of harmonics
will have the effect of sharpening the rise of the leading edge of
transient signals; this is analogous to edge enhancement in video.
It has been observed that the rendering of the leading edge of
transient signals is a key element in the perception of tone color
or timbre and in the rapid identification of sounds. Thus, restoring
the harmonics lost to audio compression also serves to restore
timbre, resulting in a higher quality listening experience.
[0007] While this method is obviously useful for compressed digital
audio signals, it is also useful to enhance non-compressed digital
audio signals. This will result in a richer timbre or tone color to
the audio signal and an enhanced listening experience.
SUMMARY OF THE INVENTION
[0008] The present invention seeks to restore the perceptual and
emotional elements lost to the technical process of audio
processing. The present invention uses a psychoacoustic model to
translate an encoded digital signal into data bands that are
analyzed for harmonic significance. A frequency analysis is then
performed and sections of sound that are deficient in harmonic
quality are identified. The sections are analyzed for their
fundamental frequency and amplitude. Additional signals of higher
order harmonics for the sections are created and the higher order
harmonics are added back to the coded signal to form a newly
enhanced signal, which is inverse filtered and converted to an
analog waveform for consumption by the listener.
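By way of illustration only, one possible way to flag a section as deficient in harmonic quality is sketched below in Python. The single-bin DFT measurement, the function names `tone_magnitude` and `is_harmonically_deficient`, the harmonic orders examined, and the `min_ratio` threshold are all assumptions made for this sketch and are not specified in this disclosure. The sketch further assumes that the fundamental frequency `f0` of the section is already known.

```python
import math


def tone_magnitude(window, sample_rate, freq):
    """Amplitude of one frequency component, measured with a single-bin DFT."""
    n = len(window)
    re = sum(s * math.cos(2.0 * math.pi * freq * i / sample_rate)
             for i, s in enumerate(window))
    im = sum(s * math.sin(2.0 * math.pi * freq * i / sample_rate)
             for i, s in enumerate(window))
    return 2.0 * math.hypot(re, im) / n


def is_harmonically_deficient(window, sample_rate, f0,
                              orders=(2, 3, 4), min_ratio=0.1):
    """True when the summed harmonic amplitude is small relative to f0.

    orders and min_ratio are hypothetical tuning values, not from the text.
    """
    fundamental = tone_magnitude(window, sample_rate, f0)
    if fundamental == 0.0:
        return False  # silence: nothing to enhance
    harmonics = sum(tone_magnitude(window, sample_rate, k * f0) for k in orders)
    return harmonics / fundamental < min_ratio


# A bare 200 Hz sine has no overtones; adding a strong 2nd harmonic changes that.
RATE = 8000
bare = [math.sin(2.0 * math.pi * 200.0 * i / RATE) for i in range(800)]
rich = [s + 0.5 * math.sin(2.0 * math.pi * 400.0 * i / RATE)
        for i, s in enumerate(bare)]
print(is_harmonically_deficient(bare, RATE, 200.0))
print(is_harmonically_deficient(rich, RATE, 200.0))
```

In this example the bare sine is flagged as deficient and the tone carrying a second harmonic is not. A practical embodiment would likely derive the threshold from the psychoacoustic model rather than from a fixed ratio.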
BRIEF DESCRIPTION OF THE DRAWINGS
[0009] FIG. 1 represents a block diagram of the audio enhancement
process.
[0010] FIG. 2 shows a block diagram of the memory elements of the
proposed harmonic enhancement process.
DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS
[0011] Common digital audio standards such as MPEG-1 (Layers
I-III), MPEG-2, Microsoft Windows Media audio, PAC, ATRAC, and
others use a variety of encoding techniques to quantize and produce
digital representations of analog acoustic sources. The sampling
and encoding of audio is performed according to complex
psychoacoustic models of human auditory perception in conjunction
with data reduction schemes to produce a coded audio signal which
can be decoded with less sophisticated circuitry to produce a
stereophonic audio signal. Bandwidth limitations and bit rate
requirements for the storage and transmission of digital data
dictate the use of inherently lossy coding algorithms. The purpose of
the psychoacoustic model is to take advantage of the fact that the
human auditory system can detect sound information up to certain
thresholds and the presence of certain sounds can influence the
ability of the brain to detect and perceive other sounds. The
overall amount of data can be reduced by not encoding the audio
signals that would be masked from the perception of the listener.
For this reason, this family of encoding schemes is referred to as
perceptual encoding.
[0012] Perceptual coding commonly works by separating an incoming
audio signal into groups of bands that are compared to the
psychoacoustic model. Those signals that are above the auditory
threshold are quantized and passed through the encoding chain. The
signals below the masking threshold are discarded, and all
information from those samples is destroyed. The net effect is a
final audio signal that is representative of the original analog
source but that is inherently incomplete. Some of the information
that is lost in the perceptual coding process is some of the most
important information necessary to retain the richness of the
original analog recording. One of the major reasons for the effect
is the fact that most psychoacoustic models are created and tested
using static, non-organic sounds such as steady sinusoidal tones.
The tones are produced at varying amplitudes and frequencies to
determine the clinical ranges of human audio perception. Models,
however, do not incorporate the complex and often unpredictable
response of the ear to complex changing stimuli such as musical
recordings which incorporate the perception of several layers of
harmonics. The resulting digital signals are often described as
being technically precise, but lacking in perceptual depth.
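The band-by-band decision described above may be sketched as follows, purely as an illustration. The RMS level measure, the uniform quantizer with step size `step`, the per-band thresholds, and the function names `band_level_db` and `encode_bands` are assumptions chosen for brevity; actual perceptual coders compute masking thresholds dynamically and allocate bits per band.

```python
import math


def band_level_db(samples):
    """RMS level of one band in dB relative to full scale (1.0)."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(rms) if rms > 0.0 else float("-inf")


def encode_bands(bands, thresholds_db, step=0.125):
    """Quantize bands above their threshold; discard the rest.

    A discarded band is stored as None: its samples are gone and a
    decoder has nothing from which to rebuild them.
    """
    encoded = []
    for samples, threshold_db in zip(bands, thresholds_db):
        if band_level_db(samples) > threshold_db:
            encoded.append([round(s / step) for s in samples])
        else:
            encoded.append(None)
    return encoded


bands = [[0.5, -0.5, 0.5, -0.5],          # loud band, about -6 dB
         [0.001, -0.001, 0.001, -0.001]]  # quiet band, about -60 dB
print(encode_bands(bands, thresholds_db=[-40.0, -40.0]))
```

Here the loud band survives as quantizer indices while the quiet band is dropped entirely, which illustrates why the discarded information cannot be directly recovered by a decoder.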
[0013] The present invention is designed to enhance a pre-produced
digital audio signal to produce a more musically convincing product
for the listener. The digital damage done to the audio signal in
the form of quantization noise, and the information lost during the
original recording encoding cannot be directly recovered during the
decoding process. It is therefore necessary to create a set of
processing techniques and algorithms that will work in conjunction
with previously established decoding standards to produce a new
enhanced output signal.
[0014] The DSP implementation, as shown in FIG. 1, involves the use
of a harmonic analyzer to examine the existing encoded data. In
order to minimize the amount of digital noise from further data
conversions, the encoded data is reevaluated after the audio stream
has passed through the demultiplexing and error checking processes
of the decoder. The subbands of digital data are windowed and
scaled at values appropriate for the harmonic analysis. A
filterbank is applied to the newly reconstructed bands of data, and
an enhanced audio signal is created.
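As a non-limiting illustration of the windowing and scaling step, the following Python sketch tapers one block of subband samples and applies a gain. The choice of a Hann window, the `scale` parameter, and the function names are assumptions of this sketch; the disclosure does not name a particular window shape or scaling rule.

```python
import math


def hann_window(n):
    """Symmetric Hann window of length n (n >= 2)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def window_and_scale(subband, scale=1.0):
    """Taper one block of subband samples and apply a gain."""
    if len(subband) < 2:
        raise ValueError("need at least two samples to window")
    weights = hann_window(len(subband))
    return [scale * s * w for s, w in zip(subband, weights)]


block = window_and_scale([1.0, 1.0, 1.0, 1.0, 1.0], scale=2.0)
print([round(v, 6) for v in block])
```

The taper drives the block edges toward zero, which reduces spectral leakage when the block is subsequently analyzed for its harmonic content.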
[0015] The psychoacoustic analyzer dynamically examines the decoded
sub bands of data with adaptive sample windowing to account for the
differences in window size necessary to accurately detect transient
audio information and frequency dependent audio information. A
buffer, as shown in FIG. 2, is used to store sequential window
information for dynamic analysis. In each sample window, the
fundamental frequency of the incoming signal is determined and a
series of supplementary signals is created at multiples of the
detected fundamental frequency. The supplementary signals are
created with progressively decreasing amplitudes. The original
signal and the artificially created harmonic implements are merged
together and placed in a buffer for distribution to inverse
filterbanks for the final creation of the analog output signal.
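The per-window processing described above may be sketched as follows, under stated assumptions. The disclosure does not specify how the fundamental frequency is detected or how quickly the supplementary amplitudes fall off; this sketch assumes an autocorrelation-based estimate, a peak-amplitude measurement, harmonic orders 2 through 4, a geometric `rolloff` of 0.5 per order, and harmonics that start at zero phase. All function and parameter names are hypothetical.

```python
import math


def estimate_fundamental(window, sample_rate, f_min=50.0, f_max=1000.0):
    """Estimate the fundamental in Hz from the strongest autocorrelation lag."""
    lag_min = max(1, int(sample_rate / f_max))
    lag_max = min(int(sample_rate / f_min), len(window) - 1)
    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        score = sum(window[i] * window[i + lag]
                    for i in range(len(window) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


def add_harmonics(window, sample_rate, orders=(2, 3, 4), rolloff=0.5):
    """Return (f0, enhanced window) with synthetic harmonics mixed in."""
    f0 = estimate_fundamental(window, sample_rate)
    amplitude = max(abs(s) for s in window)
    enhanced = list(window)
    for k in orders:
        if k * f0 >= sample_rate / 2.0:
            continue  # skip harmonics at or above the Nyquist frequency
        gain = amplitude * rolloff ** (k - 1)  # each order is quieter
        for i in range(len(enhanced)):
            enhanced[i] += gain * math.sin(
                2.0 * math.pi * k * f0 * i / sample_rate)
    return f0, enhanced


RATE = 8000
tone = [math.sin(2.0 * math.pi * 200.0 * i / RATE) for i in range(800)]
f0, enhanced = add_harmonics(tone, RATE)
print(f0)
```

For the 200 Hz test tone, the sketch adds components at 400 Hz, 600 Hz, and 800 Hz with amplitudes of 0.5, 0.25, and 0.125 relative to the fundamental. A practical embodiment would also need to track the phase of the fundamental so that the added harmonics align with the original waveform across successive windows.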
[0016] The psychoacoustic model used in the harmonic analysis is
designed based upon the responsiveness of the human ear to harmonic
stimulation. For the sake of audio reproduction, the preferred
embodiment of the new psychoacoustic model is to use musical
influences as the test and effectiveness criteria for the design.
In this psychoacoustic model, instead of using static, non-organic
sounds such as steady sinusoidal tones, the complexity of musical
influences is used, which would incorporate several layers of
harmonics.
[0017] Thus, it should be understood that the embodiments and
examples described herein have been chosen and described in order
to best illustrate the principles of the invention and its
practical applications to thereby enable one of ordinary skill in
the art to best utilize the invention in various embodiments and
with various modifications as are suited for particular uses
contemplated. Even though specific embodiments of this invention
have been described, they are not to be taken as exhaustive. There
are several variations that will be apparent to those skilled in
the art.
* * * * *