U.S. patent application number 13/909996 was filed with the patent office on June 4, 2013, and published on December 5, 2013, for acoustic signature matching of audio content.
The applicant listed for this patent is Microsoft Corporation. The invention is credited to Damien Auroux, Manuel Millot, and Michael Oneppo.
United States Patent Application 20130325888
Kind Code: A1
Application Number: 13/909996
Family ID: 49671426
Published: December 5, 2013
Inventors: Oneppo, Michael; et al.
ACOUSTIC SIGNATURE MATCHING OF AUDIO CONTENT
Abstract
Various embodiments relating to identifying an acoustic
signature of an audio content item are provided. In one embodiment,
an audio subsample of a test audio content item may be compared
with corresponding audio subsamples of each of a plurality of
catalog audio content items. If the audio subsample of the test
audio content item matches the corresponding audio subsamples of two
or more catalog audio content items, those catalog audio content
items may be selected as candidate audio content items. A complete
audio sample of the test audio content item may be compared to
corresponding complete audio samples of each of the candidate audio
content items. One of the candidate audio content items may be
selected as a matching audio content item.
Inventors: Oneppo, Michael (Paris, FR); Millot, Manuel (Saint-Germain-en-Laye, FR); Auroux, Damien (Paris, FR)
Applicant: Microsoft Corporation (Redmond, WA, US)
Family ID: 49671426
Appl. No.: 13/909996
Filed: June 4, 2013
Related U.S. Patent Documents
Application Number: 61/655,406
Filing Date: Jun 4, 2012
Current U.S. Class: 707/758
Current CPC Class: G06F 16/148 (20190101); G06Q 30/0256 (20130101); H04L 65/4084 (20130101); G06F 16/4387 (20190101); H04L 65/60 (20130101); G06Q 50/184 (20130101); G06F 16/27 (20190101); G06F 16/178 (20190101)
Class at Publication: 707/758
International Class: G06F 17/30 (20060101)
Claims
1. A method of identifying an acoustic signature of a test audio
content item, the method comprising: comparing an audio subsample
of the test audio content item with corresponding audio subsamples
of each of a plurality of catalog audio content items; if the audio
subsample of the test audio content item matches the corresponding
audio subsamples of two or more catalog audio content items,
selecting those catalog audio content items as candidate audio
content items; comparing a complete audio sample of the test audio
content item to corresponding complete audio samples of each of the
candidate audio content items; and selecting one of the candidate
audio content items as a matching audio content item.
2. The method of claim 1, wherein, if the complete audio sample of
the test audio content item matches the corresponding complete
audio sample of only one of the candidate audio content items, then
that candidate audio content item is selected as the matching audio
content item.
3. The method of claim 1, wherein, if the complete audio sample of
the test audio content item matches corresponding complete audio
samples of two or more candidate audio content items, then one of
the two or more candidate audio content items is selected as the
matching audio content item based on secondary matching
criteria.
4. The method of claim 3, wherein the secondary matching criteria
includes metadata.
5. The method of claim 1, further comprising: if the audio
subsample of the test audio content item matches the corresponding
audio subsample of only one catalog audio content item, selecting
that catalog audio content item as the matching audio content
item.
6. The method of claim 1, further comprising: if the complete audio
sample of the test audio content item does not match corresponding
complete audio samples of any of the candidate audio content items,
reporting that the test audio content item does not match any of
the catalog audio content items.
7. The method of claim 6, further comprising: converting the test
audio content item to a catalog audio content item.
8. The method of claim 1, wherein the audio subsample of the test
audio content item and the corresponding audio subsamples of the
catalog audio content items have a same duration and a same
temporal position.
9. The method of claim 1, wherein the complete audio sample is an
entire audio duration of the test audio content item.
10. A method of identifying an acoustic signature of a test audio
content item, the method comprising: comparing an audio subsample
of the test audio content item with corresponding audio subsamples
of each of a plurality of catalog audio content items; if the audio
subsample of the test audio content item matches the corresponding
audio subsample of only one catalog audio content item, selecting
that catalog audio content item as a matching audio content item;
if the audio subsample of the test audio content item matches the
corresponding audio subsamples of two or more catalog audio content
items, selecting those catalog audio content items as candidate
audio content items; comparing a complete audio sample of the test
audio content item to corresponding complete audio samples of each
of the candidate audio content items; and if the complete audio
sample of the test audio content item matches the
corresponding complete audio sample of only one candidate audio
content item, selecting that candidate audio content item as the
matching audio content item.
11. The method of claim 10, further comprising: if the complete
audio sample of the test audio content item matches corresponding
complete audio samples of two or more candidate audio content
items, selecting one of the two or more candidate audio content
items as the matching audio content item based on secondary
matching criteria.
12. The method of claim 11, wherein the secondary matching criteria
includes metadata.
13. The method of claim 10, further comprising: if the audio
subsample of the test audio content item does not match
corresponding audio subsamples of any of the catalog audio content
items, reporting that the test audio content item does not match
any of the catalog audio content items and converting the test
audio content item to a catalog content item.
14. The method of claim 10, further comprising: if the complete
audio sample of the test audio content item does not match
corresponding complete audio samples of any of the candidate audio
content items, reporting that the test audio content item does not
match any of the catalog audio content items; and converting the
test audio content item to a catalog content item.
15. The method of claim 10, wherein the audio subsample of the test
audio content item and the corresponding audio subsamples of the
catalog audio content items have a same duration and a same
temporal position.
16. The method of claim 10, wherein the complete audio sample is an
entire audio duration of the test audio content item.
17. A method of identifying an acoustic signature of a test audio
content item, the method comprising: receiving a hash file of the
test audio content item; if the hash file identifies a catalog
audio content item of a plurality of catalog audio content items,
selecting that catalog audio content item as a matching audio
content item; if the hash file does not identify a catalog audio
content item, comparing an audio subsample of the test audio
content item with corresponding audio subsamples of each of the
plurality of catalog audio content items; if the audio subsample of
the test audio content item matches the corresponding audio
subsample of only one catalog audio content item, selecting that
catalog audio content item as a matching audio content item; if the
audio subsample of the test audio content item matches the
corresponding audio subsamples of two or more catalog audio content
items, selecting those catalog audio content items as candidate
audio content items; comparing a complete audio sample of the test
audio content item to corresponding complete audio samples of each
of the candidate audio content items; and if the complete audio
sample of the test audio content item matches the
corresponding complete audio sample of only one candidate audio
content item, selecting that candidate audio content item as the
matching audio content item.
18. The method of claim 17, further comprising: if the complete
audio sample of the test audio content item matches corresponding
complete audio samples of two or more candidate audio content
items, selecting one of the two or more candidate audio content
items that has metadata that most closely matches corresponding
metadata of the test audio content item as the matching audio
content item.
19. The method of claim 17, wherein the audio subsample of the test
audio content item and the corresponding audio subsamples of the
catalog audio content items have a same duration and a same
temporal position.
20. The method of claim 17, wherein the complete audio sample is an
entire audio duration of the test audio content item.
Description
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This application claims priority to U.S. Provisional Patent
Application No. 61/655,406, filed Jun. 4, 2012 and entitled
MULTI-SCREEN MEDIA DELIVERY, the entirety of which is hereby
incorporated herein by reference for all purposes.
BACKGROUND
[0002] Various applications may employ acoustic audio
identification to identify audio content, such as to provide
identifying information for an unknown song. Existing approaches
for providing acoustic audio identification (a.k.a., acoustic
fingerprinting) typically analyze a small portion (e.g., 15
seconds) of an audio file, because analyzing the entire audio file
can be an expensive process, both in regard to processing resources
and other system costs. However, because these approaches only
analyze a small portion of an audio file, in cases where different
audio files have only very minor differences, such audio files
cannot be easily differentiated and identified. For example, such
acoustic fingerprinting approaches may be incapable of
differentiating between an explicit version and a censored version
of the same song, as the analyzed portion of the songs may be the
same, but other portions of the song may be different.
SUMMARY
[0003] This Summary is provided to introduce a selection of
concepts in a simplified form that are further described below in
the Detailed Description. This Summary is not intended to identify
key features or essential features of the claimed subject matter,
nor is it intended to be used to limit the scope of the claimed
subject matter. Furthermore, the claimed subject matter is not
limited to implementations that solve any or all disadvantages
noted in any part of this disclosure.
[0004] Various embodiments relating to identifying an acoustic
signature of an audio content item are provided. In one embodiment,
an audio subsample of a test audio content item may be compared
with corresponding audio subsamples of each of a plurality of
catalog audio content items. If the audio subsample of the test
audio content item matches the corresponding audio subsamples of
two or more catalog audio content items, those catalog audio
content items may be selected as candidate audio content items. A
complete audio sample of the test audio content item may be
compared to corresponding complete audio samples of each of the
candidate audio content items. One of the candidate audio content
items may be selected as a matching audio content item.
BRIEF DESCRIPTION OF THE DRAWINGS
[0005] FIG. 1 shows a computing system according to an embodiment
of the present disclosure.
[0006] FIGS. 2-3 show a method of identifying an acoustic signature
of a test audio content item according to an embodiment of the
present disclosure.
[0007] FIG. 4 shows a computing system according to an embodiment
of the present disclosure.
DETAILED DESCRIPTION
[0008] This description relates to identifying an acoustic
signature of an audio content item (a.k.a., acoustic
fingerprinting). More particularly, this description relates to an
acoustic fingerprinting approach that includes a two-pass process
to identify an audio content item by comparing the audio content
item (referred to herein as the `test audio content item`) to a
plurality of audio content items in a catalog (referred to herein
as `catalog audio content items`). In particular, the first pass
may include comparing an audio subsample (e.g., a 15 second clip)
of a test audio content item with corresponding audio subsamples of
each of a plurality of catalog audio content items. If the audio
subsample of the test audio content item matches the corresponding
audio subsamples of two or more catalog audio content items, those
catalog audio content items may be selected as candidate audio
content items. The second pass may include comparing a complete
audio sample of the test audio content item to corresponding
complete audio samples of each of the candidate audio content
items. One of the candidate audio content items may be selected as
a matching audio content item based on matching criteria.
[0009] By identifying candidates in the first step and comparing
complete audio samples in the second step, accuracy of acoustic
identification may be increased relative to an approach that merely
compares audio subsamples. Moreover, processing performance may be
increased relative to an approach that compares a complete audio
sample of a test audio content item to complete audio samples of
all catalog audio content items.
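The two-pass flow described above can be sketched in Python. This is an illustrative outline only, not the patented implementation: the `identify` helper and the fingerprint values are hypothetical stand-ins, and real acoustic fingerprints would be compared within tolerances rather than by equality.

```python
def identify(test_sub_fp, test_full_fp, catalog):
    """Two-pass identification sketch.

    catalog maps item id -> (subsample_fp, complete_fp).
    Returns a single matching id, a list of tied candidates (to be
    resolved by secondary criteria), or None for no match.
    """
    # First pass: compare only the short subsample fingerprints.
    candidates = [cid for cid, (sub_fp, _) in catalog.items()
                  if sub_fp == test_sub_fp]
    if not candidates:
        return None                 # no match; item may be added to the catalog
    if len(candidates) == 1:
        return candidates[0]        # a unique subsample match suffices
    # Second pass: compare complete fingerprints of the candidates only.
    finalists = [cid for cid in candidates
                 if catalog[cid][1] == test_full_fp]
    if not finalists:
        return None
    if len(finalists) == 1:
        return finalists[0]
    return finalists                # tie: apply secondary matching criteria
```

Only the candidates surviving the first pass incur the cost of a complete-sample comparison, which is what preserves processing performance.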
[0010] FIG. 1 shows a computing system 100 in accordance with an
embodiment of the present disclosure. The computing system 100
includes a plurality of client computing machines (represented by a
client computing machine 102, referred to herein as the `client`).
The plurality of client computing machines may be in communication
with an audio identification service computing machine 110
(referred to herein as the `audio identification service`) over a
network 108, such as the Internet. In particular, the clients may
send requests to the audio identification service to acoustically
fingerprint or identify different audio content items. Further,
other services may send requests to the audio identification
service to acoustically fingerprint or identify different audio
content items, such as a music management service or the like.
Moreover, it is to be understood that the audio identification
service may acoustically fingerprint or identify an audio content
item without a request from another entity. In other words, the
audio identification service may initiate acoustic identification
of an audio content item.
[0011] It should be understood that virtually any number of
different clients may be in communication with the audio
identification service without departing from the scope of this
disclosure. Non-limiting examples of clients may include desktop
computers, laptop computers, smart phones, tablet computers, gaming
consoles, set-top boxes, networked televisions, networked stereos,
mobile devices, and any other suitable computing machine.
[0012] The client 102 includes a library 104 of audio content
items. The library of audio content items may include any suitable
type of audio content item including any suitable audio file or
audio recording, such as a song or other music, audio book or other
spoken word, sound effect, movie with audio component, etc. The
library of audio content items may include any suitable number of
audio content items. For example, the library may include a
collection of songs that a user has purchased via an online
marketplace, uploaded from compact discs or other media, or
otherwise acquired. In some embodiments, the library may be at
least partially stored locally at the client. In some embodiments,
the library may be stored remotely from the client, and may be
accessed by the client (e.g., the library may include pointers that
point to remote storage locations of corresponding audio content
items). Although the fingerprinting and identification concepts are
described in the context of sound and acoustic signatures, it is to
be understood that these concepts are broadly applicable to visual
or video fingerprinting or identification. In some embodiments, the
identification service may be configured to visually fingerprint or
identify video content or other imagery in addition to or instead of
fingerprinting or identifying audio content.
[0013] The library 104 may include a test audio content item 106
that may be acoustically fingerprinted or identified by the audio
identification service. The test audio content item may be
representative of any audio content item in the library. It is to
be understood that the test audio content item may be acoustically
fingerprinted or identified for any suitable reason or as part of
any suitable operation without departing from the scope of the
present disclosure. For example, the test audio content item may be
acoustically identified as part of a copyright compliance,
licensing, or other appropriate scheme.
[0014] The audio identification service 110 may be configured to
acoustically fingerprint or identify an audio signature of a test
audio content item from a client. More particularly, the audio
identification service may be configured to perform a two-pass
identification process that increases the accuracy of acoustic
identification while maintaining good processing performance.
[0015] In some embodiments, the audio identification service may
receive at least some portion of the test audio content item (e.g.,
song bits or data) from the client, such as a subsample or complete
sample of the test audio content item, and the audio identification
service may generate the acoustic fingerprint to perform the
identification analysis. In some embodiments, acoustic fingerprints
of the subsample and/or the complete sample of the test audio
content item may be generated locally at the client, and the
acoustic fingerprints may be sent to the audio identification
service to perform the identification analysis. In some
embodiments, the audio identification service may be integrated
with the client and the acoustic fingerprinting and identification
analysis may be performed locally at the client.
[0016] The audio identification service 110 may include a catalog
112 including a plurality of catalog audio content items 114. The
catalog may include any suitable type of audio content item
including any suitable audio file or audio recording, such as a
song or other music, audio book or other spoken word, sound effect,
etc. For example, the catalog may include different versions of the
same song that may have minimal differences (e.g., an explicit
version and a censored version), and may be treated as different
audio content items. As another example, the catalog may include
different versions of the same song that are acoustically
identical, but have other differences, such as different metadata,
licensing, etc. These acoustically identical songs may be treated
as different audio content items.
[0017] The catalog of audio content items may include any suitable
number of audio content items. Generally, the catalog may include
an entire collection of audio content items, whereas a client's
library may include a subset of the collection of audio content
items. However, in some embodiments, the catalog and a client's
library may have the same collection of audio content items.
Further, in some embodiments, a client's library may include one or
more audio content items that are not included in the catalog, or
that may be added to the catalog upon performing an acoustic
identification as a test audio content item.
[0018] The audio identification service may be configured to
identify an acoustic signature of a test audio content item. For
example, the audio identification service may be configured to
receive an identification of the test audio content item from a
client. In one example, the identification may include metadata
associated with the test audio content item. For example, the
metadata may include an artist, an album, a song title, a duration,
a file name, a folder name, a track number, a release year, or any
other suitable information to identify the audio content item. In
another example, the audio identification service may be configured
to receive a hash file of the test audio content item. The hash
file may be used to determine if the test audio content item has
been previously acoustically identified by the audio identification
service. If the hash file identifies a catalog audio content item,
the audio identification service may be configured to select that
catalog audio content item as a matching audio content item that
matches an acoustic signature of the test audio content item. If
the hash file identifies a catalog audio content item as a matching
audio content item, the identification process does not have to be
performed, because it has been performed previously for the test
audio content item. If the hash file does not identify a catalog
audio content item, the audio identification service may continue
with the identification process. It is to be understood that the
hash file comparison does not involve a comparison of acoustic
signatures or fingerprints. For example, the hash file may include
a relationship between the identification of the test audio content
item and a matching catalog audio content item. This relationship
may be used to look up a matching audio content item.
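The hash-file shortcut described in this paragraph amounts to a lookup table keyed by a content hash. A minimal sketch, assuming a SHA-256 digest of the file bytes as the hash (the patent does not specify a hash function):

```python
import hashlib

# Hypothetical index of previous identifications: file digest -> catalog id.
_hash_index = {}

def lookup_by_hash(file_bytes):
    """Return the previously matched catalog id, or None if unseen."""
    digest = hashlib.sha256(file_bytes).hexdigest()
    return _hash_index.get(digest)

def remember_match(file_bytes, catalog_id):
    """Record a completed identification so it need not be repeated."""
    _hash_index[hashlib.sha256(file_bytes).hexdigest()] = catalog_id
```

As the paragraph notes, this lookup involves no acoustic comparison at all; it only short-circuits work that was already done for the same file.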
[0019] The audio identification service may be configured to
compare an audio subsample of the test audio content item with
corresponding audio subsamples of each of the plurality of catalog
audio content items in the catalog. For example, the audio
subsample and the corresponding audio subsamples may have a same
duration and a same temporal position or offset. In one particular
example, an audio subsample may have a duration of 15 seconds, and
may be offset 1 minute from the beginning of a track. In some
embodiments, the audio subsample may be predefined. In some
embodiments, the audio subsample may be received from the client.
In some embodiments, the test audio content item may be
acoustically analyzed to identify an audio subsample that may be
individualized or unique in order to reduce a possibility of
misidentifying the test audio content item.
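Slicing a fixed-duration, fixed-offset subsample out of decoded audio is straightforward; the 15-second duration and 1-minute offset below are the example values from the paragraph above, applied to a hypothetical flat array of PCM samples:

```python
def extract_subsample(samples, sample_rate, offset_s=60.0, duration_s=15.0):
    """Return the window of PCM samples used for the first-pass comparison.

    Both the test item and every catalog item must be sliced at the same
    offset and duration for the comparison to be meaningful.
    """
    start = int(offset_s * sample_rate)
    return samples[start:start + int(duration_s * sample_rate)]
```
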
[0020] It is to be understood that the audio subsamples may be
compared according to any suitable acoustic fingerprinting or
identification technology without departing from this description.
For example, an acoustic fingerprinting comparison and
identification algorithm may take into account perceptual audio
characteristics of an audio content item. In other words, when
comparing acoustic fingerprints of two audio content items, if two
audio content items sound alike to the human ear, their acoustic
fingerprints should match, even if their bitwise representations
are quite different. For example, a comparison of acoustic
fingerprints may contemplate perceptual characteristics such as
average zero crossing rate, estimated tempo, average spectrum,
spectral flatness, prominent tones across a set of bands, and
bandwidth. Furthermore, differences in frequency, amplitude, and/or
other parameters may be considered by a comparison of acoustic
fingerprints.
[0021] In some embodiments, two audio content items may be
determined to match if one or more of the above characteristics of
each of the two audio content items are within a corresponding
threshold value of each other. In other words, two acoustic
signatures may be determined to match if relative changes of given
characteristics are reasonably the same even if absolute values are
slightly different. In some embodiments, two acoustic signatures
may be determined to match if any suitable portions of the two
samples match within a threshold value. For example, if an acoustic
signature of a sample temporally positioned at 11-15 seconds of a
test song matches an acoustic signature of a sample temporally
positioned at 9-13 seconds of a catalog song, then the two songs
may be determined to be matching. Such determinations may allow for
small changes in data, such as timing shifts or changes in other
characteristics.
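A threshold comparison of perceptual features might look like the following sketch. The feature names come from the list above, but the tolerance values are invented for illustration; a real system would tune them:

```python
# Hypothetical per-feature tolerances; not values from the patent.
THRESHOLDS = {
    "zero_crossing_rate": 0.02,
    "tempo_bpm": 2.0,
    "spectral_flatness": 0.05,
}

def features_match(a, b, thresholds=THRESHOLDS):
    """Two fingerprints match if every feature is within its tolerance,
    so perceptually alike items match even when absolute values differ."""
    return all(abs(a[k] - b[k]) <= tol for k, tol in thresholds.items())
```
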
[0022] In some embodiments, a parameter of the test audio content
item may be used to reduce a number of catalog audio content items
involved in the comparison of the audio subsample of the test audio
content item with audio subsamples of the catalog audio content
items. For example, the number of catalog audio content items may
be reduced based on a duration (e.g., the duration of the complete
sample) of the test audio content item. In particular, catalog
audio content items that have a duration that is different from the
duration of the test audio content item by more than a threshold
value may be omitted from the comparison of the audio subsamples.
It is to be understood that any suitable parameter may be used to
narrow down the number of catalog audio content items involved in
the first pass comparison of audio subsamples. Accordingly, the
first pass comparison may be performed more quickly than a
comparison that involves all catalog audio content items.
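The duration prefilter reduces the first-pass workload to items of roughly the right length. A sketch, with a hypothetical two-second tolerance (the patent does not give a threshold value):

```python
def prefilter_by_duration(test_duration_s, catalog_durations, tolerance_s=2.0):
    """Keep only catalog items whose duration is within the tolerance of
    the test item's; catalog_durations maps item id -> seconds."""
    return [cid for cid, dur in catalog_durations.items()
            if abs(dur - test_duration_s) <= tolerance_s]
```
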
[0023] In some embodiments, if the audio subsample of the test
audio content item does not match any of the corresponding audio
subsamples of the catalog audio content items, the audio
identification service may be configured to report that the test
audio content item does not match any of the catalog audio content
items. In some embodiments, the audio identification service may be
configured to convert the test audio content item to a catalog
audio content item. In other words, the test audio content item may
be added to the catalog. In some embodiments, if the audio
subsample of the test audio content item does not match any of the
corresponding audio subsamples of the catalog audio content items,
another audio subsample of the test audio content item may be
compared to corresponding audio subsamples of the catalog audio
content items. For example, the audio subsample may have a
different duration (e.g., the duration may be increased) or a
different temporal position (e.g., the audio subsample may start 2
minutes from a beginning of the track).
[0024] If the audio subsample of the test audio content item
matches the corresponding audio subsample of only one catalog audio
content item, the audio identification service may be configured to
select that catalog audio content item as a matching audio content
item that matches an acoustic signature of the test audio content
item.
[0025] If the audio subsample of the test audio content item
matches the corresponding audio subsamples of two or more catalog
audio content items, the audio identification service may be
configured to select those catalog audio content items as candidate
audio content items. The audio identification service may be
configured to compare a complete audio sample of the test audio
content item to corresponding complete audio samples of each of the
candidate audio content items. For example, the complete audio
sample may be an entire audio duration of the test audio content
item.
[0026] If the complete audio sample of the test audio content item
does not match corresponding complete audio samples of any of the candidate
audio content items, the audio identification service may be
configured to report that the test audio content item does not
match any of the catalog audio content items. In some embodiments,
the audio identification service may be configured to convert the
test audio content item to a catalog audio content item. In other
words, the test audio content item may be added to the catalog.
[0027] If the complete audio sample of the test audio content item
matches the corresponding complete audio sample of only one
candidate audio content item, the audio identification service may
be configured to select that candidate audio content item as the
matching audio content item that matches an acoustic signature of
the test audio content item.
[0028] If the complete audio sample of the test audio content item
matches corresponding complete audio samples of two or more
candidate audio content items, the audio identification service may
be configured to select one of the two or more candidate audio
content items that has metadata that most closely matches
corresponding metadata of the test audio content item as the
matching audio content item. It is to be understood that any
desired metadata may be used as the secondary matching criteria to
select the matching audio content item. For example, if two or more
candidate audio content items are acoustically identical, the
version that has a license to be played in a region associated with
the client may be selected as the matching audio content item. In
some embodiments, information other than metadata may be used as
secondary matching criteria to select a candidate as a matching
audio content item.
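Resolving a tie among acoustically identical candidates by metadata can be as simple as counting matching fields. The scoring below is a hypothetical tiebreak for illustration, not criteria specified by the patent:

```python
def pick_by_metadata(test_meta, candidates):
    """Select the candidate sharing the most metadata fields with the
    test item; candidates maps item id -> metadata dict."""
    def score(cid):
        meta = candidates[cid]
        return sum(1 for k, v in test_meta.items() if meta.get(k) == v)
    return max(candidates, key=score)
```
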
[0029] In some embodiments, the audio identification service may be
configured to select one of the candidate audio content items as a
matching audio content item based on matching criteria. For
example, the matching criteria may include the above described
cases. In another example, if none of the candidates match the test
audio content item, then a next closest item may be selected as a
matching audio content item. For example, if the test audio content
item is an explicit version of a song that is not included in the
catalog, then a censored version of the same song that is included
in the catalog may be provided as a matching song.
[0030] In some embodiments, the audio identification service may be
stateless. For example, instead of keeping temporary results of a
comparison of audio subsamples on a server side, the results may be
returned to the client with the first pass results, and sent back
to the server with a second pass request. Moreover, if the audio
identification service identifies candidate audio content items,
the candidates may be reported to the client. In some embodiments,
information from a given identification session or routine may be
stored in a cache shared across instances at the audio
identification service. For example, when a service instance
replies to a first-pass request with a response requesting a second pass, a
requestID and intermediate results (e.g., candidates provided from
the audio subsample comparison) may be stored in the shared cache
(e.g., with expiration after a few minutes). Subsequently, when
another service instance receives the corresponding second-pass
request, the service instance may look up the requestID in the
shared cache and retrieve the intermediate results in order to
process the second pass.
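The stateless handoff described above hinges on a shared, expiring store of first-pass results keyed by a requestID. A toy in-process stand-in (a deployed service would use a distributed cache shared across instances):

```python
import time
import uuid

class SharedCache:
    """In-memory stand-in for a cache shared across service instances."""

    def __init__(self, ttl_s=300.0):       # e.g., expire after a few minutes
        self.ttl_s = ttl_s
        self._store = {}                   # request_id -> (expires_at, candidates)

    def put_first_pass(self, candidates):
        """Store first-pass candidates; the id is returned to the client
        along with the response requesting a second pass."""
        request_id = str(uuid.uuid4())
        self._store[request_id] = (time.time() + self.ttl_s, candidates)
        return request_id

    def get_for_second_pass(self, request_id):
        """Retrieve the intermediate results, or None if expired/unknown."""
        entry = self._store.pop(request_id, None)
        if entry is None or entry[0] < time.time():
            return None
        return entry[1]
```

Because any instance holding the cache can service the second-pass request, no instance needs to keep per-session state of its own.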
[0031] FIGS. 2-3 show a method 200 of identifying an acoustic
signature of a test audio content item according to an embodiment
of the present disclosure. For example, the method 200 may be
performed by the audio identification service computing machine 110
shown in FIG. 1.
[0032] At 202, the method 200 may include receiving a hash file of
the test audio content item.
[0033] At 204, the method 200 may include determining whether the
hash file identifies a catalog audio content item of a plurality of
catalog audio content items. If the hash file identifies a catalog
audio content item of the plurality of catalog audio content items,
then the method 200 moves to 206. Otherwise, the method 200 moves
to 208.
[0034] At 206, the method 200 may include selecting the catalog
audio content item identified from the hash file as a matching
audio content item, and returning to other operations.
[0035] At 208, the method 200 may include comparing an audio
subsample of the test audio content item with corresponding audio
subsamples of each of the plurality of catalog audio content
items.
[0036] At 210, the method 200 may include determining how many
audio subsamples of catalog audio content items match the audio
subsample of the test audio content item. If the audio subsample of
the test audio content item does not match corresponding audio
subsamples of any of the catalog audio content items, then the
method 200 moves to 222. If the audio subsample of the test audio
content item matches the corresponding audio subsample of only one
catalog audio content item, then the method 200 moves to 220. If
the audio subsample of the test audio content item matches the
corresponding audio subsamples of two or more catalog audio content
items, then the method 200 moves to 212.
[0037] At 212, the method 200 may include selecting catalog audio
content items that have matching audio subsamples as candidate
audio content items.
[0038] At 214, the method 200 may include comparing a complete
audio sample of the test audio content item to corresponding
complete audio samples of each of the candidate audio content
items.
[0039] At 216 of FIG. 3B, the method 200 may include determining
how many complete audio samples of catalog audio content items
match the complete audio sample of the test audio content item. If
the complete audio sample of the test audio content item does not
match corresponding complete audio samples of any of the catalog
audio content items, then the method 200 moves to 222. If the
complete audio sample of the test audio content item matches the
corresponding complete audio sample of only one catalog audio
content item, then the method 200 moves to 220. If the complete
audio sample of the test audio content item matches the
corresponding complete audio sample of two or more catalog audio
content items, then the method 200 moves to 218.
[0040] At 218, the method 200 may include selecting one of the two
or more candidate audio content items as the matching audio content
item based on secondary matching criteria, and returning to other
operations. For example, the secondary matching criteria may
include selecting one of the candidate audio content items that has
metadata that most closely matches corresponding metadata of the
test audio content item as the matching audio content item. It is
to be understood that any suitable secondary matching criteria may
be employed to select a candidate audio content item as a matching
audio content item.
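The metadata-based secondary criteria may be sketched as follows. This is a hypothetical example, not the claimed implementation: "closeness" is modeled simply as the count of matching metadata fields, and the function name and dictionary layout are assumptions for illustration; any suitable similarity measure could be substituted.

```python
def select_by_metadata(test_metadata, candidates):
    """Return the candidate whose metadata most closely matches the
    test item's metadata, where closeness is the number of metadata
    fields with equal values."""
    def closeness(candidate):
        return sum(
            1 for key, value in test_metadata.items()
            if candidate.get("metadata", {}).get(key) == value
        )
    return max(candidates, key=closeness)
```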
[0041] At 220, the method 200 may include selecting a catalog audio
content item that has been identified as having the only matching
audio subsample or the only matching complete audio sample as the
matching audio content item, and returning to other operations.
[0042] At 222, the method 200 may include reporting that the test
audio content item does not match any of the catalog audio content
items.
[0043] At 224, the method 200 may include converting the test audio
content item to a catalog audio content item. In other words, the
test audio content item may be added to the catalog as a catalog
audio content item.
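The overall flow of method 200 can be summarized in pseudocode form. This is an illustrative sketch loosely following steps 202-224, not the claimed implementation: the dictionary fields ("hash", "subsample", "sample"), equality-based matching, and the tie-break of returning the first full match in place of the secondary criteria of step 218 are all simplifying assumptions for the example.

```python
def identify(test_item, catalog, hash_index):
    """Two-pass identification sketch following steps 202-224."""
    # 202-206: a hash file lookup short-circuits the acoustic comparison.
    match = hash_index.get(test_item["hash"])
    if match is not None:
        return match
    # 208-212: first pass compares audio subsamples to select candidates.
    candidates = [c for c in catalog
                  if c["subsample"] == test_item["subsample"]]
    if not candidates:
        # 222-224: no match; the test item is added to the catalog.
        catalog.append(test_item)
        return None
    if len(candidates) == 1:
        return candidates[0]  # 220: only one matching subsample.
    # 214-216: second pass compares complete audio samples of candidates.
    full_matches = [c for c in candidates
                    if c["sample"] == test_item["sample"]]
    if not full_matches:
        catalog.append(test_item)  # 222-224 again
        return None
    if len(full_matches) == 1:
        return full_matches[0]  # 220: only one matching complete sample.
    # 218: secondary matching criteria would break the tie; the first
    # full match stands in for that selection here.
    return full_matches[0]
```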
[0044] By identifying candidates in the first pass and comparing
complete audio samples in the second pass, accuracy of acoustic
identification may be increased relative to an approach that merely
compares audio subsamples. Moreover, processing performance may be
increased relative to an approach that compares a complete audio
sample of a test audio content item to complete audio samples of
all catalog audio content items.
[0045] In some embodiments, the methods and processes described
herein may be tied to a computing system of one or more computing
devices. In particular, such methods and processes may be
implemented as a computer-application program or service, an
application-programming interface (API), a library, and/or other
computer-program product.
[0046] FIG. 4 schematically shows a non-limiting embodiment of a
computing system 400 that can enact one or more of the methods and
processes described above. For example, computing system 400 may be
representative of the client computing machine 102 or the audio
identification service computing machine 110 shown in FIG. 1.
Computing system 400 is shown in simplified form. Computing system
400 may take the form of one or more personal computers, server
computers, tablet computers, home-entertainment computers, network
computing devices, gaming devices, mobile computing devices, mobile
communication devices (e.g., smart phone), and/or other computing
devices.
[0047] Computing system 400 includes a logic machine 402 and a
storage machine 404. Computing system 400 may optionally include a
display subsystem 406, input subsystem 408, communication subsystem
410, and/or other components not shown in FIG. 4.
[0048] Logic machine 402 includes one or more physical devices
configured to execute instructions. For example, the logic machine
may be configured to execute instructions that are part of one or
more applications, services, programs, routines, libraries,
objects, components, data structures, or other logical constructs.
Such instructions may be implemented to perform a task, implement a
data type, transform the state of one or more components, achieve a
technical effect, or otherwise arrive at a desired result.
[0049] The logic machine may include one or more processors
configured to execute software instructions. Additionally or
alternatively, the logic machine may include one or more hardware
or firmware logic machines configured to execute hardware or
firmware instructions. Processors of the logic machine may be
single-core or multi-core, and the instructions executed thereon
may be configured for sequential, parallel, and/or distributed
processing. Individual components of the logic machine optionally
may be distributed among two or more separate devices, which may be
remotely located and/or configured for coordinated processing.
Aspects of the logic machine may be virtualized and executed by
remotely accessible, networked computing devices configured in a
cloud-computing configuration.
[0050] Storage machine 404 includes one or more physical devices
configured to hold instructions executable by the logic machine to
implement the methods and processes described herein. When such
methods and processes are implemented, the state of storage machine
404 may be transformed--e.g., to hold different data.
[0051] Storage machine 404 may include removable and/or built-in
devices. Storage machine 404 may include optical memory (e.g., CD,
DVD, HD-DVD, Blu-Ray Disc, etc.), semiconductor memory (e.g., RAM,
EPROM, EEPROM, etc.), and/or magnetic memory (e.g., hard-disk
drive, floppy-disk drive, tape drive, MRAM, etc.), among others.
Storage machine 404 may include volatile, nonvolatile, dynamic,
static, read/write, read-only, random-access, sequential-access,
location-addressable, file-addressable, and/or content-addressable
devices.
[0052] It will be appreciated that storage machine 404 includes one
or more physical devices. However, aspects of the instructions
described herein alternatively may be propagated by a communication
medium (e.g., an electromagnetic signal, an optical signal, etc.)
that is not held by a physical device for a finite duration.
[0053] Aspects of logic machine 402 and storage machine 404 may be
integrated together into one or more hardware-logic components.
Such hardware-logic components may include field-programmable gate
arrays (FPGAs), program- and application-specific integrated
circuits (PASIC/ASICs), program- and application-specific standard
products (PSSP/ASSPs), system-on-a-chip (SOC), and complex
programmable logic devices (CPLDs), for example.
[0054] It will be appreciated that a "service", as used herein, is
an application program executable across multiple user sessions. A
service may be available to one or more system components,
programs, and/or other services. In some implementations, a service
may run on one or more server-computing devices.
[0055] When included, display subsystem 406 may be used to present
a visual representation of data held by storage machine 404. This
visual representation may take the form of a graphical user
interface (GUI). As the herein described methods and processes
change the data held by the storage machine, and thus transform the
state of the storage machine, the state of display subsystem 406
may likewise be transformed to visually represent changes in the
underlying data. Display subsystem 406 may include one or more
display devices utilizing virtually any type of technology. Such
display devices may be combined with logic machine 402 and/or
storage machine 404 in a shared enclosure, or such display devices
may be peripheral display devices.
[0056] When included, input subsystem 408 may comprise or interface
with one or more user-input devices such as a keyboard, mouse,
touch screen, or game controller. In some embodiments, the input
subsystem may comprise or interface with selected natural user
input (NUI) componentry. Such componentry may be integrated or
peripheral, and the transduction and/or processing of input actions
may be handled on- or off-board. Example NUI componentry may
include a microphone for speech and/or voice recognition; an
infrared, color, stereoscopic, and/or depth camera for machine
vision and/or gesture recognition; a head tracker, eye tracker,
accelerometer, and/or gyroscope for motion detection and/or intent
recognition; as well as electric-field sensing componentry for
assessing brain activity.
[0057] When included, communication subsystem 410 may be configured
to communicatively couple computing system 400 with one or more
other computing devices. Communication subsystem 410 may include
wired and/or wireless communication devices compatible with one or
more different communication protocols. As non-limiting examples,
the communication subsystem may be configured for communication via
a wireless telephone network, or a wired or wireless local- or
wide-area network. In some embodiments, the communication subsystem
may allow computing system 400 to send and/or receive messages to
and/or from other devices via a network such as the Internet.
[0058] It will be understood that the configurations and/or
approaches described herein are exemplary in nature, and that these
specific embodiments or examples are not to be considered in a
limiting sense, because numerous variations are possible. The
specific routines or methods described herein may represent one or
more of any number of processing strategies. As such, various acts
illustrated and/or described may be performed in the sequence
illustrated and/or described, in other sequences, in parallel, or
omitted. Likewise, the order of the above-described processes may be
changed.
[0059] The subject matter of the present disclosure includes all
novel and nonobvious combinations and subcombinations of the
various processes, systems and configurations, and other features,
functions, acts, and/or properties disclosed herein, as well as any
and all equivalents thereof.
* * * * *