U.S. patent application number 17/184165 was published by the patent office on 2021-06-17 for information processing method and apparatus, and storage medium.
This patent application is currently assigned to Tencent Technology (Shenzhen) Company Limited. The applicant listed for this patent is Tencent Technology (Shenzhen) Company Limited. Invention is credited to Zhaopeng TU, Xing WANG, Baosong YANG.
United States Patent Application 20210182501
Kind Code: A1
Publication Date: June 17, 2021
Application Number: 17/184165
Document ID: /
Family ID: 1000005458857
Inventors: TU; Zhaopeng; et al.
INFORMATION PROCESSING METHOD AND APPARATUS, AND STORAGE MEDIUM
Abstract
Embodiments of this disclosure disclose an information
processing method, an apparatus, and a non-transitory computer readable
medium. The method includes: obtaining a target text sequence
corresponding to to-be-processed text information; obtaining a
context vector according to the target text sequence; determining a
logical similarity corresponding to the target text sequence
according to the context vector and the target text sequence; and
encoding the target text sequence corresponding to target text
information by using the logical similarity to obtain a text
encoding result. In this embodiment of this disclosure, a context
vector related to a discrete sequence is used to encode the
discrete sequence, to strengthen the dependence between elements in
the discrete sequence, thereby enhancing the performance of a
neural network model and improving the learning capability of the
model.
Inventors: TU; Zhaopeng (Shenzhen, CN); YANG; Baosong (Shenzhen, CN); WANG; Xing (Shenzhen, CN)
Applicant: Tencent Technology (Shenzhen) Company Limited, Shenzhen, CN
Assignee: Tencent Technology (Shenzhen) Company Limited, Shenzhen, CN
Family ID: 1000005458857
Appl. No.: 17/184165
Filed: February 24, 2021
Related U.S. Patent Documents

Application Number | Filing Date | Patent Number
PCT/CN2019/117227 | Nov 11, 2019 | --
17184165 | -- | --
Current U.S. Class: 1/1
Current CPC Class: G06N 3/08 (20130101); G06F 40/40 (20200101)
International Class: G06F 40/40 (20060101); G06N 3/08 (20060101)
Foreign Application Data

Date | Code | Application Number
Nov 19, 2018 | CN | 201811376563.5
Claims
1. An information processing method, applied to a computer device,
comprising: obtaining a target text sequence corresponding to
to-be-processed text information; obtaining a context vector
according to the target text sequence; determining a logical
similarity corresponding to the target text sequence according to
the context vector and the target text sequence; and encoding the
target text sequence by using the logical similarity to obtain a
text encoding result.
2. The method according to claim 1, wherein obtaining the context
vector according to the target text sequence comprises: obtaining a
vector of each element in the target text sequence; and calculating
an average value of the target text sequence according to the
vector of each element in the target text sequence, the average
value being used to represent the context vector.
3. The method according to claim 1, wherein obtaining the context
vector according to the target text sequence comprises: obtaining L
layers of text sequences generated before the target text sequence,
L being an integer greater than or equal to 1; and generating the
context vector according to the L layers of text sequences.
4. The method according to claim 1, wherein obtaining the context
vector according to the target text sequence comprises: obtaining L
layers of text sequences corresponding to the target text sequence,
the L layers of text sequences being network layers generated
before the target text sequence, L being an integer greater than or
equal to 1; obtaining L layers of first context vectors according
to the L layers of text sequences, each layer of first context
vector being an average value of elements in the each layer of text
sequence; obtaining a second context vector according to the target
text sequence, the second context vector being an average value of
elements in the target text sequence; and calculating the context
vector according to the L layers of first context vectors and the
second context vector.
5. The method according to claim 1, wherein determining the logical
similarity corresponding to the target text sequence according to
the context vector and the target text sequence comprises:
determining a target query vector and a target key vector according
to the context vector and the target text sequence, the target
query vector corresponding to the target text sequence and the
target key vector corresponding to the target text sequence; and
determining the logical similarity according to the target query
vector and the target key vector.
6. The method according to claim 5, wherein determining the target
query vector and the target key vector according to the context
vector and the target text sequence comprises: calculating an
original query vector, an original key vector and an original value
vector according to the target text sequence; calculating a query
vector scalar and a key vector scalar according to the context
vector, the original query vector, and the original key vector; and
calculating the target query vector and the target key vector
according to the context vector, the query vector scalar, and the
key vector scalar.
7. The method according to claim 5, wherein determining the logical
similarity according to the target query vector and the target key
vector comprises: calculating the logical similarity in the
following manner: e = (Q̂ K̂^T) / √d, wherein e represents the
logical similarity, Q̂ represents the target query vector, K̂
represents the target key vector, K̂^T represents the transpose of
the target key vector, and d represents the dimension of the hidden
state vector of the model.
8. The method according to claim 5, wherein the method further
comprises: dividing, after obtaining the target text sequence
corresponding to the to-be-processed text information, the target
text sequence into X text subsequences, X being an integer greater
than 1; and wherein determining the target query vector and
the target key vector according to the context vector and the
target text sequence comprises: generating X query vectors and X
key vectors according to the context vector and the X text
subsequences, each text subsequence corresponding to one query
vector and one key vector.
9. The method according to claim 8, wherein: determining the
logical similarity according to the target query vector and the
target key vector comprises performing a calculation on each text
subsequence and the query vector and key vector corresponding to
that text subsequence to obtain X sub-logical similarities.
10. The method according to claim 9, wherein: encoding the target
text sequence by using the logical similarity to obtain the text
encoding result comprises: determining a sub-weight value
corresponding to each text subsequence according to each
sub-logical similarity, the sub-weight value representing a
relationship between elements in the text subsequence; determining
a sub-output vector according to the sub-weight value corresponding
to each text subsequence; generating a target output vector
according to the sub-output vector corresponding to each text
subsequence; and encoding the target text sequence by using the
target output vector to obtain the text encoding result.
11. The method according to claim 1, wherein encoding the target
text sequence by using the logical similarity to obtain the text
encoding result comprises: determining a weight value corresponding
to the target text sequence according to the logical similarity,
the weight value being used for representing a relationship between
elements in the target text sequence; determining a target output
vector according to the weight value corresponding to the target
text sequence; and encoding the target text sequence by using the
target output vector to obtain the text encoding result.
12. The method according to claim 1, wherein obtaining the context
vector according to the target text sequence comprises: obtaining
vector relationships between elements in the target text sequence;
and calculating the context vector according to the vector
relationships between the elements in the target text sequence.
13. The method according to claim 1, further comprising: obtaining
a target context vector according to the text encoding result;
determining a logical similarity corresponding to the text encoding
result according to the target context vector and the text encoding
result; and decoding the text encoding result by using the logical
similarity corresponding to the text encoding result to obtain a
text decoding result.
14. An information processing apparatus, comprising: a memory,
configured to store a program; a processor; and a bus system
configured to electrically couple the memory and the processor and
to enable the memory and the processor to communicate with each
other, wherein the processor is configured to execute the program
in the memory to perform a plurality of steps comprising: obtaining
a target text sequence corresponding to to-be-processed text
information; obtaining a context vector according to the target
text sequence; determining a logical similarity corresponding to
the target text sequence according to the context vector and the
target text sequence; and encoding the target text sequence by
using the logical similarity to obtain a text encoding result.
15. The information processing apparatus according to claim 14,
wherein the processor is further configured to execute the program
in the memory to perform steps, comprising: obtaining a target
context vector according to the text encoding result; determining a
logical similarity corresponding to the text encoding result
according to the target context vector and the text encoding
result; and decoding the text encoding result by using the logical
similarity corresponding to the text encoding result to obtain a
text decoding result.
16. The information processing apparatus according to claim 14,
wherein the step of obtaining the context vector according to the
target text sequence comprises: obtaining a vector of each element
in the target text sequence; and calculating an average value of
the target text sequence according to the vector of each
element in the target text sequence, the average value being used
for representing the context vector.
17. The information processing apparatus according to claim 14,
wherein the step of determining the logical similarity
corresponding to the target text sequence according to the context
vector and the target text sequence comprises: determining a target
query vector and a target key vector according to the context
vector and the target text sequence, the target query vector
corresponding to the target text sequence and the target key vector
corresponding to the target text sequence; and determining the
logical similarity according to the target query vector and the
target key vector.
18. The information processing apparatus according to claim 17,
wherein: the processor is further configured to execute the program
in the memory to perform the step of dividing, after obtaining the
target text sequence corresponding to the to-be-processed text
information, the target text sequence into X text subsequences, X
being an integer greater than 1; and determining the target query
vector and the target key vector according to the context vector
and the target text sequence comprises: generating X query vectors
and X key vectors according to the context vector and the X text
subsequences, each text subsequence corresponding to one query
vector and one key vector.
19. The information processing apparatus according to claim 18,
wherein: determining the logical similarity according to the target
query vector and the target key vector comprises performing a
calculation on each text subsequence and the query vector and key
vector corresponding to that text subsequence to obtain X
sub-logical similarities; and encoding the target text sequence by
using the logical similarity to obtain the text encoding result
comprises: determining a sub-weight value corresponding to each
text subsequence according to each sub-logical similarity, the
sub-weight value representing a relationship between elements in
the text subsequence; determining a sub-output vector according to
the sub-weight value corresponding to each text subsequence;
generating a target output vector according to the sub-output
vector corresponding to each text subsequence; and encoding the
target text sequence by using the target output vector to obtain
the text encoding result.
20. A non-transitory computer readable medium storing a
computer-readable program that, when executed, causes a computer
device to perform a plurality of steps, comprising: obtaining a target
text sequence corresponding to to-be-processed text information;
obtaining a context vector according to the target text sequence;
determining a logical similarity corresponding to the target text
sequence according to the context vector and the target text
sequence; and encoding the target text sequence by using the
logical similarity to obtain a text encoding result.
Description
RELATED APPLICATIONS
[0001] This application is a continuation of PCT International
Patent Application No. PCT/CN2019/117227, entitled "INFORMATION
PROCESSING METHOD AND APPARATUS, AND STORAGE MEDIUM" and filed with
the China National Intellectual Property Administration on Nov. 11,
2019, which claims priority to Chinese Patent Application No.
201811376563.5, entitled "TEXT TRANSLATION METHOD AND INFORMATION
PROCESSING METHOD AND APPARATUSES" and filed with the Chinese
Patent Office on Nov. 19, 2018. This application claims priority to
the above applications, and the above applications are incorporated
by reference in their entireties.
TECHNICAL FIELD
[0002] This application relates to the field of artificial
intelligence (AI), and in particular, to an information processing
method and apparatus and a storage medium.
BACKGROUND
[0003] An attention mechanism has become a basic module in most
deep learning models, and can dynamically select relevant
representations in networks as required. Studies have shown that
the attention mechanism plays a significant role in tasks such as
machine translation (MT) and image annotation.
[0004] In related technologies, an attention weight is calculated
for each element in a discrete sequence. The dependence between
hidden states in a neural network is directly calculated. A direct
connection is established between each upper-layer network
representation and a lower-layer network representation.
[0005] However, during the calculation of the dependence between
two elements in related technologies, only the relationship between
the two elements is considered. Therefore, for a discrete sequence,
the network representation of elements in the entire discrete
sequence is weak, and as a result, the performance of a neural
network model is degraded.
SUMMARY
[0006] Some embodiments of this disclosure provide an information
processing method and apparatus and a storage medium. A context
vector related to a discrete sequence is used to encode the
discrete sequence, to strengthen the dependence between elements in
the discrete sequence, thereby enhancing the performance of a
neural network model and improving the learning capability of the
model.
[0007] In view of this, an aspect of this disclosure provides a
text translation method, applied to a computer device. The method
includes:
[0008] obtaining a target text sequence corresponding to target
text information, the target text sequence including a plurality of
elements;
[0009] obtaining a context vector according to the target text
sequence;
[0010] determining a target query vector and a target key vector
according to the context vector and the target text sequence, the
target query vector having a correspondence with elements in the
target text sequence, the target key vector having a correspondence
with elements in the target text sequence;
[0011] determining a logical similarity corresponding to the target
text sequence according to the target query vector and the target
key vector;
[0012] encoding the target text sequence corresponding to the
target text information by using the logical similarity to obtain a
text encoding result; and
[0013] decoding the text encoding result to obtain a text
translation result corresponding to the target text
information.
[0014] Another aspect of this disclosure provides an information
processing method, including:
[0015] obtaining a target text sequence corresponding to
to-be-processed text information;
[0016] obtaining a context vector according to the target text
sequence;
[0017] determining a logical similarity corresponding to the target
text sequence according to the context vector and the target text
sequence; and
[0018] encoding the target text sequence by using the logical
similarity to obtain a text encoding result.
[0019] Still another aspect of this disclosure provides a text
translation apparatus, including:
[0020] an obtaining module, configured to obtain a target text
sequence corresponding to target text information, the target text
sequence including a plurality of elements;
[0021] the obtaining module being further configured to obtain a
context vector according to the target text sequence;
[0022] a determination module, configured to determine a target
query vector and a target key vector according to the context
vector and the target text sequence that are obtained by the
obtaining module, the target query vector having a correspondence
with elements in the target text sequence, the target key vector
having a correspondence with elements in the target text
sequence;
[0023] the determination module being further configured to
determine a logical similarity corresponding to the target text
sequence according to the target query vector and the target key
vector;
[0024] an encoding module, configured to encode the target text
sequence corresponding to the target text information by using the
logical similarity determined by the determination module to obtain
a text encoding result; and
[0025] a decoding module, configured to decode the text encoding
result encoded by the encoding module to obtain a text translation
result corresponding to the target text information.
[0026] Still another aspect of this disclosure provides an
information processing method, applied to a computer device, and
including:
[0027] obtaining a text encoding result;
[0028] obtaining a target context vector according to the text
encoding result;
[0029] determining a logical similarity corresponding to the text
encoding result according to the target context vector and the text
encoding result; and
[0030] decoding the text encoding result by using the logical
similarity corresponding to the text encoding result to obtain a
text decoding result.
[0031] Still another aspect of this disclosure provides an
information processing apparatus, including:
[0032] an obtaining module, configured to obtain a target text
sequence corresponding to to-be-processed text information, the
obtaining module being further configured to obtain a context
vector according to the target text sequence;
[0033] a determination module, configured to determine a logical
similarity corresponding to the target text sequence according to
the context vector and the target text sequence that are obtained
by the obtaining module; and
[0034] an encoding module, configured to encode the target text
sequence by using the logical similarity determined by the
determination module to obtain a text encoding result.
[0035] Still another aspect of this disclosure provides a text
translation apparatus, including a memory, a processor, and a bus
system,
[0036] the memory being configured to store a program;
[0037] the processor being configured to execute the program in the
memory, to perform the following operations:
[0038] obtaining a target text sequence corresponding to target
text information, the target text sequence including a plurality of
elements;
[0039] obtaining a context vector according to the target text
sequence;
[0040] determining a target query vector and a target key vector
according to the context vector and the target text sequence, the
target query vector having a correspondence with elements in the
target text sequence, the target key vector having a correspondence
with elements in the target text sequence;
[0041] determining a logical similarity corresponding to the target
text sequence according to the target query vector and the target
key vector;
[0042] encoding the target text sequence corresponding to the
target text information by using the logical similarity to obtain a
text encoding result; and
[0043] decoding the text encoding result to obtain a text
translation result corresponding to the target text information;
and
[0044] the bus system being configured to connect the memory and
the processor, to enable the memory and the processor to
communicate with each other.
[0045] Still another aspect of this disclosure provides an
information processing apparatus, including a memory, a processor,
and a bus system,
[0046] the memory being configured to store a program;
[0047] the processor being configured to execute the program in the
memory, to perform the following operations:
[0048] obtaining a target text sequence corresponding to
to-be-processed text information;
[0049] obtaining a context vector according to the target text
sequence;
[0050] determining a logical similarity corresponding to the target
text sequence according to the context vector and the target text
sequence; and
[0051] encoding the target text sequence by using the logical
similarity to obtain a text encoding result; and
[0052] the bus system being configured to connect the memory and
the processor, to enable the memory and the processor to
communicate with each other.
[0053] Still another aspect of this disclosure provides a
computer-readable storage medium, the computer-readable storage
medium storing instructions, the instructions, when run on a
computer, causing the computer to perform the method in the
foregoing aspects.
[0054] Still another aspect of this disclosure provides an
information processing apparatus, including:
[0055] an obtaining module, configured to obtain a text encoding
result;
[0056] the obtaining module being further configured to obtain a
target context vector according to the text encoding result;
[0057] a determination module, configured to determine a logical
similarity corresponding to the text encoding result according to
the target context vector and the text encoding result; and
[0058] a decoding module, configured to decode the text encoding
result by using the logical similarity corresponding to the text
encoding result to obtain a text decoding result.
[0059] Still another aspect of this disclosure provides an
information processing apparatus, including a memory, a processor,
and a bus system,
[0060] the memory being configured to store a program;
[0061] the processor being configured to execute the program in the
memory, to perform the following operations:
[0062] obtaining a text encoding result;
[0063] obtaining a target context vector according to the text
encoding result;
[0064] determining a logical similarity corresponding to the text
encoding result according to the target context vector and the text
encoding result; and
[0065] decoding the text encoding result by using the logical
similarity corresponding to the text encoding result to obtain a
text decoding result; and
[0066] the bus system being configured to connect the memory and
the processor, to enable the memory and the processor to
communicate with each other.
[0067] Still another aspect of this disclosure provides a
non-transitory computer readable medium storing a computer-readable
program that, when executed, causes a computer device to perform a
plurality of steps. The steps comprise obtaining a target text
sequence corresponding to to-be-processed text information;
obtaining a context vector according to the target text sequence;
determining a logical similarity corresponding to the target text
sequence according to the context vector and the target text
sequence; and encoding the target text sequence by using the
logical similarity to obtain a text encoding result.
[0068] It can be seen from the foregoing technical solutions that
the embodiments of this disclosure have the following
advantages.
[0069] In the embodiments of this disclosure, an information
processing method is provided. First, a target text sequence
corresponding to to-be-processed text information is obtained, the
target text sequence including a plurality of elements; a context
vector is then obtained according to the target text sequence; a
target query vector and a target key vector are then determined
according to the context vector and the target text sequence, the
target query vector having a correspondence with elements in the
target text sequence, the target key vector having a correspondence
with elements in the target text sequence; and finally, a logical
similarity corresponding to the target text sequence is determined
according to the target query vector and the target key vector, and
the target text sequence corresponding to target text information
is encoded by using the logical similarity to obtain a text
encoding result. In the foregoing manner, a context vector related
to a discrete sequence is used to encode the discrete sequence, to
strengthen the dependence between elements in the discrete
sequence, thereby enhancing the performance of a neural network
model and improving the learning capability of the model.
BRIEF DESCRIPTION OF THE DRAWINGS
[0070] FIG. 1 is a schematic diagram of a basic architecture of
modeling a discrete sequence using a self-attention neural network
(SAN) model in the related technologies.
[0071] FIG. 2 is a schematic diagram showing the relationship
between two words in a SAN model in the related technologies.
[0072] FIG. 3 is a schematic diagram of the architecture of a text
translation system according to an embodiment of this
disclosure.
[0073] FIG. 4 is a schematic flowchart of the calculation of a SAN
model according to an embodiment of this disclosure.
[0074] FIG. 5 is a schematic diagram of an embodiment of a text
translation method according to an embodiment of this
disclosure.
[0075] FIG. 6 is a schematic diagram of an embodiment of an
information processing method according to an embodiment of this
disclosure.
[0076] FIG. 7 is a schematic diagram of another embodiment of an
information processing method according to an embodiment of this
disclosure.
[0077] FIG. 8 is a schematic diagram of an embodiment of a global
context vector according to an embodiment of this disclosure.
[0078] FIG. 9 is a schematic diagram of an embodiment of a depth
context vector according to an embodiment of this disclosure.
[0079] FIG. 10 is a schematic diagram of an embodiment of a
depth-global context vector according to an embodiment of this
disclosure.
[0080] FIG. 11 is a schematic structural diagram of a stacked
multi-head self-attention network according to an embodiment of
this disclosure.
[0081] FIG. 12 is a schematic diagram of a comparison of
translation using a SAN model in an application scenario according
to this disclosure.
[0082] FIG. 13 is a schematic diagram of another embodiment of an
information processing method according to an embodiment of this
disclosure.
[0083] FIG. 14 is a schematic diagram of an embodiment of a text
translation apparatus according to an embodiment of this
disclosure.
[0084] FIG. 15 is a schematic diagram of an embodiment of an
information processing apparatus according to an embodiment of this
disclosure.
[0085] FIG. 16 is a schematic diagram of another embodiment of an
information processing apparatus according to an embodiment of this
disclosure.
[0086] FIG. 17 is a schematic diagram of another embodiment of an
information processing apparatus according to an embodiment of this
disclosure.
[0087] FIG. 18 is a schematic diagram of another embodiment of an
information processing apparatus according to an embodiment of this
disclosure.
[0088] FIG. 19 is a schematic structural diagram of a terminal
device according to an embodiment of this disclosure.
[0089] FIG. 20 is a schematic structural diagram of a server
according to an embodiment of this disclosure.
DETAILED DESCRIPTION
[0090] AI involves a theory, a method, a technology, and an
application system that use a digital computer or a machine
controlled by the digital computer to simulate, extend, and expand
human intelligence, perceive an environment, obtain knowledge, and
use knowledge to obtain an optimal result. In other words, AI is a
comprehensive technology in computer science and attempts to
understand the essence of intelligence and produce a new
intelligent machine that can react in a manner similar to human
intelligence. AI studies the design principles and implementation
methods of various intelligent machines to enable the machines to
have the functions of perception, reasoning, and decision-making.
[0091] The AI technology is a comprehensive discipline and relates
to a wide range of fields including a hardware-level technology and
a software-level technology. The basic AI technology generally
includes technologies such as a sensor, a dedicated AI chip, cloud
computing, distributed storage, a big data processing technology,
an operating/interaction system, and electromechanical integration.
AI software technologies mainly include several major directions
such as a natural language processing (NLP) technology and machine
learning (ML)/deep learning.
[0092] NLP is a direction in the fields of computer science and
AI. It studies various theories and methods that enable effective
communication between humans and computers in natural language. NLP
is a science that integrates linguistics, computer science, and
mathematics. Research in this field involves natural language,
that is, the language that people use daily, so it is closely
related to the study of linguistics. NLP technologies
usually include text processing, semantic understanding, machine
translation, robot question answering, knowledge graphs, and other
technologies.
[0093] ML is a multi-disciplinary subject involving a plurality of
disciplines such as probability theory, statistics, approximation
theory, convex analysis, and algorithm complexity theory. ML
specializes in studying how a computer simulates or implements a
human learning behavior to obtain new knowledge or skills, and
reorganize an existing knowledge structure to keep improving its
performance. ML is the core of AI, is a basic way to make
computers intelligent, and is applied to various fields of AI. ML
and deep learning generally include technologies such as an
artificial neural network, a belief network, reinforcement
learning, transfer learning, inductive learning, and learning from
demonstrations.
[0094] With the research and progress of the AI technology, the AI
technology is studied and applied to a plurality of fields, such as
a common virtual assistant, a smart speaker, and a smart customer
service. It is believed that with the development of technologies,
the AI technology will be applied to more fields and play an
increasingly important role.
[0095] Some embodiments of this disclosure provide a text
translation method, an information processing method and
apparatuses. A context vector related to a discrete sequence is
used to encode the discrete sequence and to strengthen the
dependence between elements in the discrete sequence, thereby
enhancing the performance of a neural network model and improving
the learning capability of the model.
[0096] In the specification, claims, and accompanying drawings of
this disclosure, the terms "first," "second," "third," "fourth,"
and the like (if existing) are intended to distinguish between
similar objects rather than describe a specific sequence or a
precedence order. It is to be understood that data used in this way
is exchangeable in a proper case, so that the embodiments of this
disclosure described herein can be implemented in an order
different from the order shown or described herein. Moreover, the
terms "include," "contain," and any other variants mean to cover
the non-exclusive inclusion. For example, a process, method,
system, product, or device that includes a list of steps or units
is not necessarily limited to those steps or units that are
expressly listed, but it may include other steps or units not
expressly listed or inherent to such a process, method, system,
product, or device.
[0097] It is to be understood that a SAN model provided in this
disclosure is a neural network structure model based on a
self-attention mechanism. Applications based on the SAN model are
also very extensive, including question answering systems, acoustic
modeling, natural language inference, sentence classification, text
translation, and the like. The SAN model calculates an attention
weight for each element pair in a discrete sequence. Therefore,
compared with a recurrent neural network (RNN) in a conventional
sequence modeling method, the SAN model may capture long-distance
dependencies more directly. For example, a new-generation neural
machine translation (NMT) architecture can fully use the attention
mechanism, and it achieves better translation quality than a neural
machine translation system that uses an RNN for modeling in
translation tasks across multiple language pairs.
[0098] FIG. 1 is a schematic diagram of the architecture of
modeling a discrete sequence using a SAN model in related
technologies. The SAN model can directly calculate the dependence
between hidden states in a neural network. A direct connection is
established between each upper-layer network representation and a
lower-layer network representation. A SAN model specializes in
capturing the dependence between elements. FIG. 2 is a schematic
diagram showing the relationship between two words in a SAN model
in solutions in related technologies. As shown in the figure, in a
SAN model using an attention mechanism, only the relationship
between two words is considered to calculate the dependence between
the two words (such as "talk" and "Sharon" in FIG. 2). However, it
is found through research that context information is capable of
enhancing the dependence between network representations.
In particular, for an attention model, the use of context information
can enhance the SAN model. In addition, in this disclosure,
internal elements of a discrete sequence are used to represent
context information to avoid dependence on external resources,
thereby greatly improving the simplicity and flexibility of the SAN
model and making it easy to deploy and implement.
[0099] The application of the SAN model to a text translation
scenario is used as an example for description below. FIG. 3 is a
schematic diagram of the architecture of a text translation system
according to an embodiment of this disclosure. As shown in the
figure, a SAN model provided in this disclosure is deployed on a
server. After a terminal device transmits text information to the
server, the server encodes and decodes the text information to
generate a translation result. The server then transmits the
translation result to the terminal device, and the terminal device
displays the translation result. In an embodiment, during the
actual application, the SAN model may be deployed on the terminal
device. That is, when the terminal device is offline, the SAN model
may still be used to translate text information to generate a
translation result. Then, the terminal device displays the
translation result. It may be understood that the terminal device
includes, but not limited to, a tablet computer, a mobile phone, a
notebook computer, a personal computer (PC), and a palm
computer.
[0100] The SAN model provided in this disclosure generally includes
four text processing steps, namely, generating word vectors,
encoding, applying an attention mechanism, and predicting. First,
in the first step, high-dimensional sparse binary vectors are
mapped into low-dimensional dense vectors in a word vector table.
For example, assuming that a received text is a string of American
Standard Code for Information Interchange (ASCII) characters and
has 256 possible values, each possible value is represented as a
256-dimensional binary vector. Only the value of the 97th
dimension of the vector of a character "a" is equal to 1, and the
values of other dimensions are all equal to 0. Only the value of
the 98th dimension of the vector of a character "b" is equal
to 1, and the values of other dimensions are all equal to 0. This
representation method is referred to as a "one hot" form. The
vector representations of different characters are completely
different. In most neural network models, an input text is first
divided into several words, and the words are then represented by
using word vectors. Other models extend a word vector
representation with other information. For example, in addition to
the identification of a word, a string of tags is also entered.
Next, tag vectors may be obtained through learning, and the tag
vectors are concatenated into a word vector. This may allow the
addition of some position-sensitive information to the word vector
representation.
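For illustration only, a minimal Python sketch of the "one hot" form described above, assuming a 256-value ASCII alphabet (the function name is hypothetical):

    def one_hot(ch, size=256):
        # Build the "one hot" vector for one ASCII character: all zeros
        # except a single 1 at the character's code point.
        vec = [0] * size
        vec[ord(ch)] = 1  # 'a' has code 97, 'b' has code 98
        return vec

    assert one_hot("a")[97] == 1 and sum(one_hot("a")) == 1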
[0101] In the second step, assuming that the sequence of the word
vector is obtained, the encoding step is to convert the sequence of
the word vector into a sentence matrix, and each row of the matrix
represents the meaning of each word in the context. A bidirectional
RNN model may be used in this step; long short-term memory (LSTM)
and gated recurrent unit (GRU) structures also work well here. Each
row of vectors is calculated in two parts: the first part is
forward calculation, and the second part is reverse calculation.
The two parts are then concatenated into a complete vector.
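A minimal sketch of such bidirectional encoding with a vanilla RNN cell, chosen for brevity (LSTM or GRU cells would slot in the same way; all weights below are hypothetical):

    import numpy as np

    def rnn(X, W, U, h0):
        # Run a vanilla RNN cell over the sequence X, returning one
        # hidden state per position.
        h, out = h0, []
        for x in X:
            h = np.tanh(W @ x + U @ h)
            out.append(h)
        return out

    def birnn(X, W, U, h0):
        fwd = rnn(X, W, U, h0)              # forward calculation
        bwd = rnn(X[::-1], W, U, h0)[::-1]  # reverse calculation, re-aligned
        return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]

    d, I = 4, 5
    rng = np.random.default_rng(0)
    X = [rng.normal(size=d) for _ in range(I)]  # word vectors
    W, U, h0 = rng.normal(size=(d, d)), rng.normal(size=(d, d)), np.zeros(d)
    rows = birnn(X, W, U, h0)  # sentence matrix: I rows, each of size 2 * d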
[0102] In the third step, the matrix obtained in the second step is
compressed into a vector representation, so that the vector
representation may be transmitted into a standard feedforward
neural network for prediction. The advantage of the attention
mechanism over other compression methods is that an auxiliary
context vector is inputted. Finally, in the prediction step, after
text content is compressed into a vector, a final target
representation, that is, a category tag, a real value, or a vector,
may be learned. A network model may be considered as a state
machine controller, for example, a transition-based parser, to make
structured predictions.
[0103] For ease of understanding, FIG. 4 is a schematic flowchart
of the calculation of a SAN model according to an embodiment of
this disclosure. As shown in the figure, in a calculation method
based on a SAN model, a process of generating a network
representation of each element is as follows:
[0104] In step S1, an input sequence is given, and the first
network layer of the SAN model converts discrete elements in the
input sequence into a continuous spatial representation.
A masking layer is an optional layer. Since the input sequences
may have inconsistent lengths in actual operation, all the input
sequences may be set to the same length through the masking layer.
That is, the longest sequence is used as the standard, and each
shorter sequence is zero-padded to the length of the longest
sequence.
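A minimal sketch of this zero-padding, assuming lists of token values of unequal lengths (names are hypothetical):

    def pad_sequences(seqs, pad_value=0):
        # Zero-pad every sequence to the length of the longest one.
        max_len = max(len(s) for s in seqs)
        return [s + [pad_value] * (max_len - len(s)) for s in seqs]

    print(pad_sequences([[5, 2], [7, 1, 9, 4]]))  # [[5, 2, 0, 0], [7, 1, 9, 4]]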
[0106] In step S2, a context vector is generated according to a
spatial representation of the input sequence.
[0107] In step S3, three different learnable parameter matrices are
used to linearly change the spatial representation of the input
sequence to obtain a query vector sequence, a key vector sequence,
and a value vector sequence. Then, a logical similarity between a
query and each key-value pair is modeled by using a dot product in
combination with the context vector.
[0108] In step S4, the logical similarity is normalized to obtain a
weight between the query and each key-value pair.
[0109] Each element in the input sequence is normalized. Assuming
that there are five elements, the sum of the weights of these five
elements after normalization is 1.
[0110] In step S5, an output vector of the current element is
obtained by weighted summation of the values according to the
weights calculated in step S4; in actual calculation, this is the
dot product of the weights and the values.
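The following Python sketch walks through steps S1 through S5 for a single attention head. How the context vector is "combined" with the dot product in step S3 is not fully specified here, so the additive fold-in and all parameter names below are assumptions, not a definitive implementation:

    import numpy as np

    def softmax(x, axis=-1):
        x = x - x.max(axis=axis, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def context_aware_attention(H, Wq, Wk, Wv):
        # H: (I, d) continuous representation of the input sequence (step S1).
        c = H.mean(axis=0)                # step S2: global context vector
        Q, K, V = H @ Wq, H @ Wk, H @ Wv  # step S3: linear projections
        Q_hat, K_hat = Q + c, K + c       # fold in the context vector (assumed additive)
        d = H.shape[1]
        e = Q_hat @ K_hat.T / np.sqrt(d)  # step S3: logical similarity
        alpha = softmax(e, axis=-1)       # step S4: weights summing to 1 per element
        return alpha @ V                  # step S5: weighted sum of the values

    I, d = 5, 8
    rng = np.random.default_rng(0)
    H = rng.normal(size=(I, d))
    Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
    out = context_aware_attention(H, Wq, Wk, Wv)  # shape (5, 8)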
[0111] This embodiment of this disclosure provides a context
enhancement model that does not need to introduce additional
external information, thereby improving the performance of a
self-attention network. A text translation
method in this disclosure is described below. Referring to FIG. 5,
an embodiment of a text translation method in this embodiment of
the present disclosure includes the following steps:
[0112] Step 101: Obtain a target text sequence corresponding to
target text information, the target text sequence including a
plurality of elements.
[0113] In this embodiment, to-be-processed text information is
first obtained. The to-be-processed text information may be a
discrete input sequence, for example, H={h.sub.1, . . . , h.sub.I}.
An embedding layer of a neural network is then used to convert
discrete elements into a continuous spatial representation, that
is, the target text sequence.
[0114] The embedding layer, at the beginning of the neural
network, is used to convert input information into vectors. The
first step of using the embedding layer is to encode the
to-be-processed text information by indexing, assigning an index to
each piece of different to-be-processed text information. Next, an
embedding matrix is created to determine how many "latent factors"
are assigned to each index, that is, how long each vector is, so
that the embedding matrix may be used to represent the
to-be-processed text information instead of a huge encoding
vector.
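As a rough illustration of the embedding lookup just described, assuming a toy vocabulary with hypothetical indices:

    import numpy as np

    vocab = {"Today": 0, "is": 1, "a": 2, "nice": 3, "day": 4}  # hypothetical indices
    d_model = 8                           # number of "latent factors" per index
    rng = np.random.default_rng(0)
    embedding_matrix = rng.normal(size=(len(vocab), d_model))

    def embed(words):
        # Each word index selects one row of the embedding matrix,
        # replacing a huge one-hot encoding vector.
        return np.stack([embedding_matrix[vocab[w]] for w in words])

    H = embed(["Today", "is", "a", "nice", "day"])  # target text sequence, shape (5, 8)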
[0115] Step 102: Obtain a context vector according to the target
text sequence.
[0116] In this embodiment, a corresponding context vector is
generated according to the target text sequence. The context vector
is learned from the internal representation in the network, thereby
ensuring the simplicity and ease of use of the SAN model. During
actual application, there are three ways to represent the context
vector: a current-layer representation is used to calculate a
global context vector; a history-layer representation is used to
calculate a syntax-semantic context vector; and a history-layer
global context vector is used to simultaneously obtain global
information and a syntax-semantic context representation.
[0117] Step 103: Determine a target query vector and a target key
vector according to the context vector and the target text
sequence, the target query vector having a correspondence with
elements in the target text sequence and the target key vector
having a correspondence with elements in the target text
sequence.
[0118] In this embodiment, the target query vector and the target
key vector are determined according to the context vector and the
target text sequence. The target query vector has a correspondence
with elements in the target text sequence, and the target key
vector has a correspondence with elements in the target text
sequence. For example, Q_1 in the target query vector corresponds
to element h_1 of the target text sequence, and K_1 in the target
key vector also corresponds to h_1.
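One plausible way to derive the target query and key vectors from the context vector (cf. claim 6) is a gated combination; the sigmoid gate and the parameters Uq and Uk below are assumptions, not the definitive formulation:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def target_qk(H, c, Wq, Wk, Uq, Uk):
        # H: (I, d) target text sequence; c: (d,) context vector.
        Q, K = H @ Wq, H @ Wk                # original query and key vectors
        lam_q = sigmoid(Q @ Uq + c @ Uq)     # query vector scalar (assumed gate), (I, 1)
        lam_k = sigmoid(K @ Uk + c @ Uk)     # key vector scalar (assumed gate), (I, 1)
        Q_hat = (1 - lam_q) * Q + lam_q * c  # target query vector
        K_hat = (1 - lam_k) * K + lam_k * c  # target key vector
        return Q_hat, K_hat

    I, d = 5, 8
    rng = np.random.default_rng(0)
    H = rng.normal(size=(I, d)); c = H.mean(axis=0)
    Wq, Wk = rng.normal(size=(d, d)), rng.normal(size=(d, d))
    Uq, Uk = rng.normal(size=(d, 1)), rng.normal(size=(d, 1))
    Q_hat, K_hat = target_qk(H, c, Wq, Wk, Uq, Uk)  # each of shape (5, 8)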
[0119] Step 104: Determine a logical similarity corresponding to
the target text sequence according to the target query vector and
the target key vector.
[0120] In this embodiment, the logical similarity corresponding to
the target text sequence is generated according to the target query
vector and the target key vector. It can be seen from the above
steps that the target query vector includes a plurality of
elements, that the target key vector also includes a plurality of
elements, and that each element has a correspondence with elements
in the target text sequence. Therefore, when determining the
logical similarity corresponding to the target text sequence, each
element in the target query vector is associated with each element
in the target key vector. For example, the logical similarity is
represented as e, and e_ij represents the similarity between the
i-th element in the target query vector and the j-th element in the
target key vector.
[0121] Step 105: Encode the target text sequence corresponding to
the target text information by using the logical similarity to
obtain a text encoding result.
[0122] In this embodiment, the logical similarity is used to encode
the target text sequence corresponding to the target text
information. Assuming that the target text information is "Today is
a nice day", the five elements (words) in the sentence need to be
converted to obtain the target text sequence. The logical
similarity is then used to perform a first encoding on the target
text sequence; a second encoding may further be performed based on
the first encoding, and so on. Assuming a five-layer network, the
target text sequence needs to be encoded five times until a text
encoding result is eventually outputted.
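A minimal sketch of this stacked, layer-by-layer encoding; layer_fn stands in for one encoder layer and is hypothetical:

    def encode(H, layer_fn, num_layers=5):
        # With a five-layer network, the sequence is encoded five times;
        # each pass re-encodes the previous layer's output.
        for _ in range(num_layers):
            H = layer_fn(H)
        return H  # text encoding result

    encoded = encode([1.0, 2.0, 3.0], lambda H: [h * 0.5 for h in H])  # trivial stand-in layer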
[0123] Step 106: Decode the text encoding result to obtain a text
translation result corresponding to the target text
information.
[0124] In this embodiment, in a translation scenario, after a
source end encodes the target text sequence, the text encoding
result is transmitted to a destination end, and the destination end
decodes the text encoding result. During decoding, elements (words)
are usually generated one by one. That is, one word is generated
after each decoding step. The text encoding result is a representation
of a word vector and a context vector of a word. The word vector
and the context vector are used to calculate a new network vector
representation. A word is then obtained after the network vector
representation passes through a softmax layer. This word is then
used to calculate a next word until the translation result of the
target text information is outputted. For example, "Today is a
nice day" is translated into a Chinese sentence with the same or
similar meaning.
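The word-by-word decoding loop can be sketched as follows; decoder_step and the token names are hypothetical stand-ins for the decoder network and its softmax layer:

    def greedy_decode(encoding, decoder_step, start_token, end_token, max_len=50):
        # Generate words one by one; each step feeds the previously
        # generated word back in until the end token is produced.
        words, prev = [], start_token
        for _ in range(max_len):
            probs = decoder_step(encoding, prev)  # softmax output: word -> probability
            prev = max(probs, key=probs.get)      # most probable next word
            if prev == end_token:
                break
            words.append(prev)
        return words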
[0125] In this embodiment of the present disclosure, a text
translation method is provided. First, a target text sequence
corresponding to target text information is obtained, the target
text sequence including a plurality of elements. A context vector
is obtained according to the target text sequence. A target query
vector and a target key vector are determined according to the
context vector and the target text sequence, in which the target
query vector has a correspondence with elements in the target text
sequence, and the target key vector has a correspondence with
elements in the target text sequence. A logical similarity
corresponding to the target text sequence is determined according
to the target query vector and the target key vector. The target
text sequence corresponding to target text information is encoded
by using the logical similarity to obtain a text encoding result,
and the text encoding result is decoded to obtain a text
translation result corresponding to the target text information. In
the foregoing manner, a context vector related to a discrete
sequence is used to encode the discrete sequence, to strengthen the
dependence between elements in the discrete sequence, so that a
network representation between different words can be flexibly
learned by using context information, thereby improving the quality
of machine translation.
[0126] The information processing method in this disclosure is
described below. The information processing method provided in this
embodiment of this disclosure is applied to a computer device. The
computer device is an electronic device with computing and
processing capabilities. For example, the computer device may be a
terminal or a server. The terminal may be a mobile phone, a tablet
computer, a PC, or the like. The server may be a single server or a server
cluster formed by a plurality of servers. As shown in FIG. 6, the
information processing method provided in this embodiment of this
disclosure includes the following steps:
[0127] Step 110. Obtain a target text sequence corresponding to
to-be-processed text information.
[0128] In this embodiment of this disclosure, the to-be-processed
text information may be any piece of text information. The
to-be-processed text information may be a discrete sequence. The
target text sequence corresponding to the to-be-processed text
information may be obtained by inputting the to-be-processed text
information into an embedding layer of the neural network. For
example, the target text sequence is a continuous spatial
representation.
[0129] Step 120. Obtain a context vector according to the target
text sequence.
[0130] In this embodiment of this disclosure, the context vector is
used for representing the context information corresponding to the
target text sequence, and the context vector is obtained according
to the target text sequence without introducing additional
external information.
[0131] Step 130. Determine a logical similarity corresponding to
the target text sequence according to the context vector and the
target text sequence.
[0132] The logical similarity is used for characterizing the
similarity between a query and a key. The logical similarity
corresponding to the target text sequence is determined by using
the context vector, so that the final representation of the target
text sequence incorporates context information.
[0133] Step 140. Encode the target text sequence by using the
logical similarity to obtain a text encoding result.
[0134] The target text sequence is encoded by using the logical
similarity with context information, so that the text encoding
result is more accurate.
[0135] This embodiment of this disclosure is described by using
only an example in which the information processing method is
applied to the field of machine translation. In other possible
implementations, the information processing method provided in this
embodiment of this disclosure is also applicable to other tasks
that use a self-attention network model to process language
information, for example, language models, sentence classification,
language reasoning, question answering, and dialog systems. The application
field of the information processing method is not limited in this
embodiment of this disclosure.
[0136] In summary, in the technical solution provided in this
embodiment of this disclosure, a target text sequence corresponding
to to-be-processed text information is obtained; a context vector
is obtained according to the target text sequence; a logical
similarity corresponding to the target text sequence is determined
according to the context vector and the target text sequence; and
the target text sequence is encoded by using the logical similarity
to obtain a text encoding result. In the foregoing manner, a
context vector related to a discrete sequence is used to encode the
discrete sequence, to strengthen the dependence between elements in
the discrete sequence, thereby enhancing the performance of a
neural network model and improving the learning capability of the
model.
[0137] An information processing method in this disclosure is
described below. Referring to FIG. 7, another embodiment of an
information processing method in this embodiment of this disclosure
includes the following steps:
[0138] Step 201: Obtain a target text sequence corresponding to
to-be-processed text information, the target text sequence
including a plurality of elements.
[0139] In this embodiment, to-be-processed text information is
first obtained. The to-be-processed text information may be a
discrete input sequence, for example, H={h.sub.1, . . . ,h.sub.I}.
An embedding layer of a neural network is then used to convert
discrete elements into a continuous spatial representation, that
is, the target text sequence.
[0140] The embedding layer, at the beginning of the neural
network, is used to convert input information into vectors. The
first step of using the embedding layer is to encode the
to-be-processed text information by indexing, assigning an index to
each piece of different to-be-processed text information. Next, an
embedding matrix is created to determine how many "latent factors"
are assigned to each index, that is, how long each vector is, so
that the embedding matrix may be used to represent the
to-be-processed text information instead of a huge encoding
vector.
[0141] Step 202: Obtain a context vector according to the target
text sequence.
[0142] In this embodiment, a corresponding context vector is
generated according to the target text sequence. The context vector
is learned from the internal representation in the network, thereby
ensuring the simplicity and ease of use of the SAN model. During
actual application, there are three ways to represent the context
vector: a current-layer representation is used to calculate a
global context vector, a history-layer representation is used to
calculate a syntax-semantic context vector, and a history-layer
global context vector is used to simultaneously obtain global
information and a syntax-semantic context representation.
[0143] Step 203: Determine a target query vector and a target key
vector according to the context vector and the target text
sequence, in which the target query vector has a correspondence
with elements in the target text sequence and the target key vector
has a correspondence with elements in the target text sequence.
[0144] In this embodiment, the target query vector and the target
key vector are determined according to the context vector and the
target text sequence. The target query vector has a correspondence
with elements in the target text sequence, and the target key
vector has a correspondence with elements in the target text
sequence. For example, Q_1 in the target query vector corresponds
to element h_1 of the target text sequence, and K_1 in the target
key vector also corresponds to h_1.
[0145] Step 204: Determine a logical similarity corresponding to
the target text sequence according to the target query vector and
the target key vector.
[0146] In this embodiment, the logical similarity corresponding to
the target text sequence is generated according to the target query
vector and the target key vector. It can be seen from the above
steps that the target query vector includes a plurality of
elements, the target key vector also includes a plurality of
elements, and each element has a correspondence with elements in
the target text sequence. Therefore, when determining the logical
similarity corresponding to the target text sequence, each element
in the target query vector is associated with each element in the
target key vector. For example, the logical similarity is
represented as e, and e_ij represents the similarity between the
i-th element in the target query vector and the j-th element in the
target key vector.
[0147] Step 205: Encode the target text sequence corresponding to
target text information by using the logical similarity to obtain a
text encoding result.
[0148] In this embodiment, the logical similarity is used to encode
the target text sequence corresponding to the target text
information. Assuming that the target text information is "Today is
a nice day", the five elements (words) in the sentence need to be
converted to obtain the target text sequence. The logical
similarity is then used to perform a first encoding on the target
text sequence. A second encoding may further be performed based on
the first encoding, and so on. Assuming a five-layer network, the
target text sequence needs to be encoded five times until a text
encoding result is eventually outputted.
[0149] In this embodiment of this disclosure, the information
processing method is provided. First, a target text sequence
corresponding to to-be-processed text information is obtained, the
target text sequence including a plurality of elements; a context
vector is then obtained according to the target text sequence; a
target query vector and a target key vector are then determined
according to the context vector and the target text sequence, the
target query vector having a correspondence with elements in the
target text sequence, the target key vector having a correspondence
with elements in the target text sequence; and finally, a logical
similarity corresponding to the target text sequence is determined
according to the target query vector and the target key vector, and
the target text sequence corresponding to target text information
is encoded by using the logical similarity to obtain a text
encoding result. In the foregoing manner, a context vector related
to a discrete sequence is used to encode the discrete sequence, to
strengthen the dependence between elements in the discrete
sequence, thereby enhancing the performance of a neural network
model and improving the learning capability of the model.
[0150] Based on the embodiment corresponding to FIG. 7, in an
exemplary embodiment of the information processing method provided
in this embodiment of this disclosure, the step of obtaining a
context vector according to the target text sequence may
include:
[0151] 1. obtaining a vector of each element in the target text
sequence; and
[0152] 2. calculating an average value of the target text sequence
according to the vector of the each element in the target text
sequence, the average value being used for representing the context
vector.
[0153] In this embodiment, a method for globally generating a
context vector by using a target text sequence is described.
Specifically, a single unified context vector is shared by all
elements in the target text sequence, which requires summarizing
the information represented by all elements in a layer.
[0154] A conventional self-attention network calculates an
attention weight between two elements (for example, "talk" and
"Sharon") separately without considering the overall information of
the target text sequence. This embodiment of this disclosure
considers the impact of the entire target text sequence on each
element. FIG. 8 is a schematic diagram of an embodiment of a global
context vector in this embodiment of this disclosure. As shown in
the figure, the average value in the target text sequence is used
as the representation of an input layer. The context vector herein
is not a matrix because it is obtained by averaging over one layer
of the target text sequence. Specifically, a target text
sequence H is first obtained. H includes a plurality of elements,
that is, H={h.sub.1, . . . , h.sub.I}, with 1 to I elements. The
average value of the target text sequence is then calculated
according to the vector of each element in the target text
sequence. That is, the average value is calculated by using the
following formula:
$$c = \bar{H} = \frac{1}{I}\sum_{i=1}^{I} h_i,$$
[0155] where c represents the average value of the target text
sequence, and the average value is the context vector.
[0156] Assuming that the target text sequence includes three
elements A, B, and C, where A, B, and C are all vectors, the
average value obtained by using (A+B+C)/3 may be used as the
context vector. In other possible implementations, the
relationships between the following element pairs need to be
obtained: A and A, A and B, A and C, B and A, B and B, B and C, C
and A, C and B, and C and C. The average value is then calculated
according to the vector relationships between these element pairs
and used as the context vector.
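For illustration only, a minimal NumPy sketch of this global
averaging is shown below; the three two-dimensional elements are
hypothetical values standing in for A, B, and C.

```python
import numpy as np

# Global context vector: average the element vectors of the target text
# sequence H (shape I x d). The numbers here are illustrative only.
H = np.array([[1.0, 2.0],    # element A
              [3.0, 4.0],    # element B
              [5.0, 6.0]])   # element C
c = H.mean(axis=0)           # c = (A + B + C) / 3, a d-dimensional vector
print(c)                     # [3. 4.]
```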
[0157] In the above embodiment, the context vector is obtained
only by averaging the vectors of the elements in the target text
sequence. In other possible implementations, the context vector may
be obtained by taking the maximum value or by applying other linear
transformations.
This embodiment of this disclosure does not limit the manner of
obtaining a context vector.
[0158] Second, in this embodiment of this disclosure, a method for
obtaining a context vector based on a global text sequence is
provided. That is, a vector of each element in the target text
sequence is obtained. An average value of the target text sequence
is calculated according to the vector of each element in the target
text sequence. The average value is represented as the context
vector. In the foregoing manner, the context vector may be obtained
through the entire text sequence, to provide a feasible manner of
implementing the solution, thereby improving the operability of the
solution.
[0159] In addition, the method for obtaining a context vector
provided in this embodiment of this disclosure has simple
operations and a fast calculation speed.
[0160] Based on the embodiment corresponding to FIG. 7, in an
exemplary embodiment of the information processing method provided
in this embodiment of this disclosure, the obtaining a context
vector according to the target text sequence may include:
[0161] 1. obtaining L layers of text sequences generated before the
target text sequence, L being an integer greater than or equal to
1; and
[0162] 2. generating the context vector according to the L layers
of text sequences.
[0163] For example, the L layers of text sequences are concatenated
to generate a context vector. In other possible implementations,
the context vector may be generated according to the L layers of
text sequences by using a convolutional neural network, an RNN, a
gated unit or a variant thereof, or a simple linear
transformation.
[0164] In this embodiment, a method for deeply generating a context
vector by using a target text sequence is described. Specifically,
a neural network model usually has a plurality of layers of
networks, and a depth context vector represents a plurality of
layers of networks that interact with each other. For ease of
description, FIG. 9 is a schematic diagram of an embodiment of a
depth context vector according to an embodiment of this disclosure.
As shown in the figure, assuming that the target text sequence is
an (L+1).sup.th layer, it is necessary to obtain inputs of all
preceding layers, that is, text sequences of the first layer to an
L.sup.th layer. The plurality of layers of text sequences are
concatenated to obtain a depth context vector C:
$$C = \left[H^{1}, \ldots, H^{L}\right].$$
[0165] The context vector C herein is a matrix. H.sup.1 in FIG. 9
represents the text sequence of the first layer of network, H.sup.2
represents the text sequence of the second layer of network, and
H.sup.3 represents the target text sequence of the current layer.
For "talk" and "Sharon", it is equivalent to that the bottom two
layers of networks are concatenated together. If the dimension of
each layer of network is 512, the dimension obtained after
concatenation is 1024, that is, the depth d.sub.c=n.times.Ld, where
n represents the number of vectors in one layer of network, L
represents the number of network layers generated before the target
text sequence, and d represents the dimension of the inputted
hidden state.
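For illustration only, a minimal NumPy sketch of this depth
concatenation follows, assuming two lower layers of dimension 512;
the sizes and random values are hypothetical.

```python
import numpy as np

# Depth context vector: concatenate the representations of the L layers
# generated before the current one along the feature dimension.
I, d, L = 6, 512, 2                                        # illustrative sizes
layer_outputs = [np.random.randn(I, d) for _ in range(L)]  # H^1 ... H^L
C = np.concatenate(layer_outputs, axis=-1)                 # shape (I, L*d) = (6, 1024)
```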
[0166] Second, in this embodiment of this disclosure, a method for
obtaining a context vector based on a depth text sequence is
provided. That is, L layers of text sequences corresponding to the
target text sequence are first obtained, the L layers of text
sequences being network layers generated before the target text
sequence, L being an integer greater than or equal to 1; and the
context vector is then generated according to the L layers of text
sequences. In the foregoing manner, the context vector may be
obtained by using the plurality of depth text sequences, to provide
a feasible manner of implementing the solution, thereby improving
the operability of the solution.
[0167] Based on the embodiment corresponding to FIG. 7, in an
exemplary embodiment of the information processing method provided
in this embodiment of this disclosure, the step of obtaining a
context vector according to the target text sequence may
include:
[0168] 1. obtaining L layers of text sequences corresponding to the
target text sequence, the L layers of text sequences being network
layers generated before the target text sequence, L being an
integer greater than or equal to 1;
[0169] 2. obtaining L layers of first context vectors according to
the L layers of text sequences, each layer of first context vector
being an average value of elements in each layer of text
sequence;
[0170] 3. obtaining a second context vector according to the target
text sequence, the second context vector being an average value of
elements in the target text sequence; and
[0171] 4. calculating the context vector according to the L layers
of first context vectors and the second context vector.
[0172] In this embodiment, a method for deeply generating a context
vector by using a target text sequence and globally generating a
context vector by using a target text sequence is described.
Specifically, a neural network model usually has a plurality of
layers of networks, a depth context vector represents a plurality
of layers of networks that interact with each other, and a global
context vector represents information represented by all elements
in a target text sequence. For ease of description, FIG. 10 is a
schematic diagram of an embodiment of a depth-global context vector
according to an embodiment of this disclosure. As shown in the
figure, assuming that the target text sequence is the (L+1).sup.th
layer, it is necessary to obtain inputs of all preceding layers,
that is, text sequences of the first layer to the L.sup.th layer.
It is necessary to use the manner provided in an exemplary
embodiment corresponding to FIG. 7 to calculate a global context
vector of each layer of text sequence, to obtain {c.sup.1, . . . ,
c.sup.L}, where c.sup.1 represents the average value of elements in
the first layer of text sequence, referred to as a first context
vector, and c.sup.2 represents the average value of elements in the
second layer of text sequence, which is likewise referred to as a
first context vector. Finally, it is necessary to obtain the average
value of the elements in the target text sequence corresponding to
the current layer, that is, c.sup.L+1, where c.sup.L+1 is referred
to as the second context vector.
[0173] The plurality of layers of context vector representations
are concatenated to obtain a depth-global context vector c of
(L+1)d dimensions, that is,
[0174] $$c = \left[c^{1}, \ldots, c^{L+1}\right];$$ where c herein
is a vector rather than a matrix.
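For illustration only, a minimal NumPy sketch of this depth-global
construction follows, assuming L=2 lower layers plus the current
layer; the sizes and random values are hypothetical.

```python
import numpy as np

# Depth-global context vector: average each of the L lower layers and the
# current layer, then concatenate the per-layer means into one vector.
I, d, L = 6, 512, 2
layers = [np.random.randn(I, d) for _ in range(L + 1)]   # H^1 ... H^L, H^(L+1)
c = np.concatenate([h.mean(axis=0) for h in layers])     # shape ((L+1)*d,)
```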
[0175] Second, in this embodiment of this disclosure, a method for
obtaining a context vector based on depth and global text sequences
is provided. That is, L layers of first context vectors are first
obtained according to the L layers of text sequences, each layer of
first context vector being an average value of elements in each
layer of text sequence. The second context vector is then obtained
according to the target text sequence, the second context vector
being an average value of elements in the target text sequence.
Finally, the context vector is calculated according to the L layers
of first context vectors and the second context vector. In the
foregoing manner, the context vector may be obtained by using the
plurality of depth-global text sequences, to provide a feasible
manner of implementing the solution, thereby improving the
operability of the solution.
[0176] Based on FIG. 7 and any one of the foregoing embodiments
corresponding to FIG. 7, in an exemplary embodiment of the
information processing method provided in this embodiment of this
disclosure, the determining a target query vector and a target key
vector according to the context vector and the target text sequence
may include:
[0177] calculating an original query vector, an original key
vector, and an original value vector according to the target text
sequence, the original value vector being used for determining a
target output vector corresponding to the target text sequence;
[0178] calculating a query vector scalar and a key vector scalar
according to the context vector, the original query vector, and the
original key vector; and
[0179] calculating the target query vector and the target key
vector according to the context vector, the query vector scalar,
and the key vector scalar.
[0180] In this embodiment, how to generate the target query vector
and the target key vector in combination with the context vector is
described. In this disclosure, a self-attention model is proposed,
and the model may incorporate a context vector based on a text
sequence. First, the original query vector, the original key
vector, and the original value vector are calculated according to
the target text sequence, the original value vector being used for
determining the target output vector corresponding to the target
text sequence. The query vector scalar and the key vector scalar
may then be calculated according to the context vector, the
original query vector, and the original key vector. Each scalar
takes a value between 0 and 1 and is used for controlling the
strength of the relationship between the context vector and the
original query vector and between the context vector and the
original key vector. In the range of 0 to 1, a larger scalar
indicates stronger correlation.
[0181] Finally, the target query vector and the target key vector
are calculated according to the context vector, the query vector
scalar, and the key vector scalar.
[0182] Next, in this embodiment of this disclosure, a manner of
determining the target query vector and the target key vector
according to the context vector and the target text sequence is
described. That is, the original query vector, the original key
vector, and the original value vector are first calculated
according to the target text sequence. The query vector scalar and
the key vector scalar are then calculated according to the context
vector, the original query vector, and the original key vector.
Finally, the target query vector and the target key vector are
calculated according to the context vector, the query vector
scalar, and the key vector scalar. In the foregoing manner, the
context vector is incorporated into the target query vector and the
target key vector, to enhance the feature representation of the
original query vector and the original key vector, thereby
strengthening the network representation of the entire text
sequence and improving the model learning performance.
[0183] In this embodiment, specific formulas are used to calculate
the original query vector, the original key vector, and the
original value vector; the query vector scalar and the key vector
scalar; and the target query vector and the target key vector.
[0184] In this embodiment of this disclosure, the query vector
scalar is used for controlling the strength relationship between
the context vector and the original query vector, and the key
vector scalar is used for controlling the strength relationship
between the context vector and the original key vector.
[0185] Specifically, the sequence represented by a source end
vector needs to be generated first. That is, the target text
sequence H={h.sub.1, . . . , h.sub.I} corresponding to the
to-be-processed text information is obtained, and the output of the
lower layer is then used as the input of the current layer. The
original query vector, the original key vector, and the original
value vector are calculated in the following manner:
$$\begin{bmatrix} Q & K & V \end{bmatrix} = H \begin{bmatrix} W_Q & W_K & W_V \end{bmatrix},$$
[0186] where Q represents the original query vector, K represents
the original key vector, V represents the original value vector, H
represents the target text sequence, W.sub.Q represents a first
parameter matrix, W.sub.K represents a second parameter matrix,
W.sub.V represents a third parameter matrix, and the first
parameter matrix, the second parameter matrix, and the third
parameter matrix are pre-trained parameter matrices, that is,
{W.sub.Q, W.sub.K, W.sub.V} are all trainable parameter matrices.
The parameter matrix may be represented as d.times.d. d represents
the dimension of the inputted hidden state (a value such as 512 or
1024 may be used, which is not limited herein). Certainly, during
actual application, the parameter matrix may also be represented as
d.sub.1.times.d.sub.2.
[0187] Based on the original query vector Q and the original key
vector K obtained above, the query vector scalar and the key vector
scalar may be calculated in combination with the context vector,
that is,
$$\begin{bmatrix} \lambda_Q \\ \lambda_K \end{bmatrix} = \sigma\left( \begin{bmatrix} Q \\ K \end{bmatrix} \begin{bmatrix} V_Q^{H} \\ V_K^{H} \end{bmatrix} + C \begin{bmatrix} U_Q \\ U_K \end{bmatrix} \begin{bmatrix} V_Q^{C} \\ V_K^{C} \end{bmatrix} \right),$$
[0188] where .lamda..sub.Q represents the query vector scalar,
.lamda..sub.K represents the key vector scalar,
$\sigma(\cdot)$ represents the logistic sigmoid nonlinearity, and
is used for mapping the scalar to a value between 0 and
1. C represents the context vector, U.sub.Q represents a fourth
parameter matrix, U.sub.K represents a fifth parameter matrix, the
fourth parameter matrix and the fifth parameter matrix are
pre-trained parameter matrices, V.sub.Q.sup.H represents a first
linear transformation factor, V.sub.K.sup.H represents a second
linear transformation factor, V.sub.Q.sup.C represents a third
linear transformation factor, and V.sub.K.sup.C represents a fourth
linear transformation factor.
[0189] The fourth parameter matrix U.sub.Q and the fifth parameter
matrix U.sub.K are trainable parameter matrices of d.sub.c.times.d.
The first linear transformation factor V.sub.Q.sup.H and the second
linear transformation factor V.sub.K.sup.H are linear
transformation factors of d.times.1, and are used for linearly
mapping the d-dimensional vector to a scalar. The third linear
transformation factor V.sub.Q.sup.C and the fourth linear
transformation factor V.sub.K.sup.C are linear transformation
factors of d.times.1, and are used for linearly mapping each vector
(d-dimensional) in Q to a scalar (1-dimensional).
[0190] Finally, the target query vector and the target key vector
are calculated in the following manner:
$$\begin{bmatrix} \hat{Q} \\ \hat{K} \end{bmatrix} = \left(1 - \begin{bmatrix} \lambda_Q \\ \lambda_K \end{bmatrix}\right) \begin{bmatrix} Q \\ K \end{bmatrix} + \begin{bmatrix} \lambda_Q \\ \lambda_K \end{bmatrix} \left( C \begin{bmatrix} U_Q \\ U_K \end{bmatrix} \right),$$
[0191] where {circumflex over (Q)} represents the target query
vector, and {circumflex over (K)} represents the target key vector.
Therefore, the target query vector and the target key vector
incorporating the context vector are obtained. The weighted sum of
the original query vector and the context vector is calculated with
the scalar .lamda..sub.Q as the weight, and the weighted sum of the
original key vector and the context vector is calculated with the
scalar .lamda..sub.K as the weight. The two weighted sums
dynamically adjust the proportions of the context representation
participating in the final target query vector and the final target
key vector.
[0192] Next, in this embodiment of this disclosure, a specific
calculation manner is provided. The original query vector, the
original key vector, and the original value vector may be
calculated according to the target text sequence. The query vector
scalar and the key vector scalar are calculated according to the
context vector, the original query vector, and the original key
vector. The target query vector and the target key vector are
calculated according to the context vector, the query vector
scalar, and the key vector scalar. In the foregoing manner, a
specific operation manner is provided for implementing the
solution, and the calculation of the formula is used to clarify how
to obtain the parameters, thereby ensuring the feasibility and
operability of the solution.
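For illustration only, the following NumPy sketch walks through the
three formulas above end to end for the global-context case (so
that d.sub.c=d). The random matrices are hypothetical stand-ins for
the pre-trained parameters, and the sizes are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

I, d = 6, 64                        # sequence length, hidden dimension
d_c = d                             # context dimension (global-context case)
H = np.random.randn(I, d)           # target text sequence
C = H.mean(axis=0, keepdims=True)   # global context vector, shape (1, d_c)

# Random stand-ins for the pre-trained parameter matrices and factors.
W_Q, W_K, W_V = [np.random.randn(d, d) for _ in range(3)]
U_Q, U_K = [np.random.randn(d_c, d) for _ in range(2)]
V_Q_H, V_K_H, V_Q_C, V_K_C = [np.random.randn(d, 1) for _ in range(4)]

# Original query, key, and value vectors.
Q, K, V = H @ W_Q, H @ W_K, H @ W_V

# Query and key vector scalars in (0, 1): how strongly the context
# participates relative to the original query and key.
lam_Q = sigmoid(Q @ V_Q_H + (C @ U_Q) @ V_Q_C)   # shape (I, 1)
lam_K = sigmoid(K @ V_K_H + (C @ U_K) @ V_K_C)

# Target query and key vectors: weighted sums of the original vectors
# and the projected context.
Q_hat = (1 - lam_Q) * Q + lam_Q * (C @ U_Q)
K_hat = (1 - lam_K) * K + lam_K * (C @ U_K)
```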
[0193] In this embodiment, after the target query vector and the
target key vector with the context vector are obtained, the logical
similarity may be calculated by using the following formula, that
is,
$$e = \frac{\hat{Q}\hat{K}^{T}}{\sqrt{d}},$$
[0194] where e represents the logical similarity, {circumflex over
(Q)} represents the target query vector, {circumflex over (K)}
represents the target key vector, {circumflex over (K)}.sup.T
represents the transpose of the target key vector, and d represents
the dimension of the hidden state vector of the model. e herein
represents a matrix, where e.sub.ij represents a logical similarity
between an i.sup.th element of the target query vector {circumflex
over (Q)} and a j.sup.th element of the target key vector
{circumflex over (K)}.
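For illustration only, a self-contained NumPy sketch of this scaled
dot product follows; Q_hat and K_hat are random stand-ins for the
target query and key vectors computed above.

```python
import numpy as np

# Logical similarity: scaled dot product of the context-aware query and key.
I, d = 6, 64
Q_hat = np.random.randn(I, d)        # target query vector (stand-in)
K_hat = np.random.randn(I, d)        # target key vector (stand-in)
e = (Q_hat @ K_hat.T) / np.sqrt(d)   # e[i, j]: similarity of elements i and j
```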
[0195] Next, in this embodiment of this disclosure, a manner of
calculating the logical similarity corresponding to the target text
sequence according to the target query vector and the target key
vector is provided. In the foregoing manner, a specific operation
manner is provided for implementing the solution, and the
calculation of the formula is used to clarify how to obtain the
parameters, thereby ensuring the feasibility and operability of the
solution.
[0196] Based on the embodiment corresponding to FIG. 7, in an
exemplary embodiment of the information processing method provided
in this embodiment of this disclosure, the encoding the target text
sequence corresponding to the target text information by using the
logical similarity to obtain a text encoding result may
include:
[0197] determining a weight value corresponding to the target text
sequence according to the logical similarity, the weight value
being used for representing a relationship between elements in the
target text sequence;
[0198] determining a target output vector according to the weight
value corresponding to the target text sequence; and
[0199] encoding the target text sequence corresponding to target
text information by using the target output vector to obtain the
text encoding result.
[0200] In this embodiment, after the logical similarity is
obtained, the target text sequence corresponding to the target text
information may be encoded by using the logical similarity to
obtain the text encoding result. Specifically, the weight value
corresponding to the target text sequence is first determined
according to the logical similarity. The weight value is used for
representing a relationship between elements in the target text
sequence. That is, the weight value .alpha. of each key-value pair
may be calculated by using the following formula:
$$\alpha = \operatorname{softmax}(e),$$
[0201] According to the obtained weight value .alpha., since the
output vector of a current element is obtained by the weighted
summation of all values, the product of the weight and the value
needs to be calculated during actual calculation, that is,
$$O = \alpha V,$$
[0202] where O represents the target output vector, and V
represents the original value vector. An output vector needs to be
calculated for each layer of network until the network
representation of each element is encoded.
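For illustration only, a small NumPy sketch of this weighting step
follows, using a numerically stable row-wise softmax; e and V are
random stand-ins with illustrative shapes.

```python
import numpy as np

# Encoding step: turn the logical similarity e into attention weights with a
# row-wise softmax, then take the weighted sum of the value vectors.
I, d = 6, 64
e = np.random.randn(I, I)                     # logical similarity (stand-in)
V = np.random.randn(I, d)                     # original value vector (stand-in)

e_stable = e - e.max(axis=-1, keepdims=True)  # subtract row max for stability
alpha = np.exp(e_stable)
alpha /= alpha.sum(axis=-1, keepdims=True)    # alpha = softmax(e); rows sum to 1
O = alpha @ V                                 # target output vector, shape (I, d)
```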
[0203] Next, in this embodiment of this disclosure, how to encode
the target text sequence corresponding to the target text
information by using the logical similarity to obtain a text
encoding result is described. First, the weight value corresponding
to the target text sequence is determined according to the logical
similarity. The target output vector is then determined according
to the weight value corresponding to the target text sequence, and
the target text sequence corresponding to the target text
information is finally encoded by using the target output vector to
obtain the text encoding result. In the foregoing manner, in the
process of encoding text information, the output vector containing
the context vector is used to strengthen the local information of
the discrete sequence. This implementation improves the quality of
model learning and implements better application to different
products.
[0204] Based on the embodiment corresponding to FIG. 7, in an
exemplary embodiment of the information processing method provided
in this embodiment of this disclosure, after the obtaining a target
text sequence corresponding to to-be-processed text information,
the method may further include dividing the target text sequence
into X text subsequences, X being an integer greater than 1. The
step of determining a target query vector and a target key vector
according to the context vector and the target text sequence may
include generating X query vectors and X key vectors according to
the context vector and the X text subsequences, each text
subsequence corresponding to one query vector and one key vector.
The step of determining the logical similarity according to the
target query vector and the target key vector may include
performing calculation on each text subsequence by using the query
vector and the key vector that correspond to the text subsequence,
to obtain X sub-logical similarities. The step of encoding the target text
sequence by using the logical similarity to obtain a text encoding
result may include: determining a sub-weight value corresponding to
the each text subsequence according to each sub-logical similarity,
the sub-weight value being used for representing a relationship
between elements in the text subsequence; determining a sub-output
vector according to the sub-weight value corresponding to the each
text subsequence; generating a target output vector according to
the sub-output vector corresponding to the each text subsequence;
and encoding the target text sequence by using the target output
vector to obtain the text encoding result.
[0205] In this embodiment, a method for encoding a target text
sequence by using a stacked multi-head self-attention network is
described. For ease of understanding, FIG. 11 is a schematic
structural diagram of a stacked multi-head self-attention network
in this embodiment of this disclosure. First, the target text
sequence is divided into X text subsequences (that is, X heads are
obtained). Assuming that X is 4, the entire target text sequence is
divided into 4 text subsequences. A corresponding query vector and
key vector are then generated for each text subsequence. For the
text subsequence corresponding to each head, different parameter
matrices are used to calculate the query vector and the key vector
to obtain different eigenvectors, so that different heads may focus
on different local information. Finally, the outputted vector
features of the heads are integrated through linear transformation
and transferred to the next layer.
[0206] Next, each text subsequence is sequentially processed with
the query vector and the key vector that correspond to it, to
obtain X sub-logical similarities. A
sub-weight value corresponding to the each text subsequence is then
determined according to each sub-logical similarity. A sub-output
vector is determined according to the sub-weight value
corresponding to the each text subsequence, and a target output
vector is generated according to the sub-output vector
corresponding to the each text subsequence. The target text
sequence corresponding to the target text information is encoded by
using the target output vector. The process is repeated many times
until the encoding is completed for the network representation and
the text encoding result is obtained.
[0207] In the stacked multi-head self-attention network, the query
(Q), key (K), and value (V) first undergo a linear transformation,
and are then inputted into the scaled dot product. This process
needs to be performed .beta. times, that is, once per head of the
"multi-head". Moreover, the parameter matrices for the linear
transformation of Q, K, and V are different each time. The results
of the .beta. scaled dot products are concatenated, and the value
obtained after one final linear transformation is used as the
result of multi-head attention. The benefit of this is that the
model is allowed to learn relevant information in different
representation subspaces, which can later be verified through
attention visualization.
[0208] Multi-head attention is used to connect the encoder to the
decoder: K and V are the layer outputs of the encoder (K=V herein),
and Q is the input of the multi-head attention in the decoder.
Encoder-decoder attention is used to perform translation alignment.
Multi-head self-attention is used in both the encoder and the
decoder to learn the representation of the text; in self-attention,
K=V=Q. For example, if one sentence is inputted,
attention calculation needs to be performed on every word in the
sentence and all words in the sentence. The purpose is to learn the
word dependence within the sentence and capture the internal
structure of the sentence.
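For illustration only, a compact NumPy sketch of the multi-head
split and merge follows, assuming X=4 heads. For brevity it uses
Q=K=V within each head and omits the context-vector gating shown
earlier; W_O is a hypothetical output projection.

```python
import numpy as np

# Multi-head self-attention: split the representation into X heads, attend
# within each head, then merge the head outputs with a linear transformation.
I, d, X = 6, 64, 4
H = np.random.randn(I, d)                   # target text sequence (stand-in)
W_O = np.random.randn(d, d)                 # output projection (stand-in)

outputs = []
for h in np.split(H, X, axis=-1):           # X text subsequences of width d // X
    e = (h @ h.T) / np.sqrt(d // X)         # sub-logical similarity (Q=K=V here)
    e -= e.max(axis=-1, keepdims=True)      # numerical stability
    alpha = np.exp(e)
    alpha /= alpha.sum(axis=-1, keepdims=True)
    outputs.append(alpha @ h)               # sub-output vector for this head
O = np.concatenate(outputs, axis=-1) @ W_O  # merged target output vector
```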
[0209] Next, in this embodiment of this disclosure, a method using
a multi-head attention mechanism is proposed to implement encoding.
That is, a target text sequence is first divided into X text
subsequences, X being an integer greater than 1. X query vectors
and X key vectors are then generated according to the context
vector and the X text subsequences. Each text subsequence and a
query vector and a key vector that correspond to the each text
subsequence are calculated to obtain X sub-logical similarities.
Finally, a sub-weight value corresponding to the each text
subsequence is determined according to each sub-logical similarity.
A sub-output vector is determined according to the sub-weight value
corresponding to the each text subsequence, and a target output
vector is generated according to the sub-output vector
corresponding to the each text subsequence. The target text
sequence corresponding to the target text information is encoded by
using the target output vector to obtain a text encoding result. In
the foregoing manner, the entire network uses residual connections
and normalizes the layers, so that the deep network can be better
optimized, and the training speed is faster than that of the
mainstream model.
[0210] For ease of description, machine translation is used as an
example. The sentences used for testing are divided into 10 groups
by length, and the bilingual evaluation understudy (BLEU) scores
are evaluated for each group. FIG.
12 is a schematic diagram of a comparison of translation using a
SAN model in an application scenario of this disclosure. As shown
in the figure, the abscissa in FIG. 12 represents the sentence
length, and the ordinate represents the BLEU difference between the
SAN model enhanced by a context vector and a baseline model. It can
be seen that the translation quality of the SAN model enhanced by
the context vector is significantly better than that of the
relevant model across different sentence lengths. Longer sentences
(such as sentences with more than 20 words) involve complicated
syntax and deep semantics, so depending on the relationships
between elements is all the more necessary for them.
[0211] Table 1 shows the effect of the network model provided in
this disclosure on a machine translation system.
TABLE 1

                                 Translation effect    Computing resources
  Model                          BLEU     .DELTA.      Quantity of    Training
                                                       parameters     speed
  Baseline: Relevant model       27.64    --           88.0M          1.28
  Embodiments of this disclosure:
    Global                       28.26    +0.62        91.0M          1.26
    Depth                        28.31    +0.67        95.9M          1.18
    Depth-global                 28.45    +0.81        99.0M          1.25
[0212] Generally, when the BLEU score increases by more than 0.5
points, a significant increase is indicated. .DELTA. is the
absolute value of the increase. The unit of the quantity of
parameters is million (M), and the unit of training speed is the
quantity of iterations per second. Therefore, as shown in Table 1,
the methods proposed in this disclosure significantly improve the
translation quality. In particular, the proposed methods perform
better in the translation of longer sentences.
[0213] As shown in FIG. 13, an embodiment of this disclosure
further provides another information processing method, applied to
a computer device. The method includes the following steps:
[0214] Step 210: Obtain a text encoding result.
[0215] Step 220: Obtain a target context vector according to the
text encoding result.
[0216] Step 230: Determine a logical similarity corresponding to
the text encoding result according to the target context vector and
the text encoding result.
[0217] Step 240: Decode the text encoding result by using the
logical similarity corresponding to the text encoding result to
obtain a text decoding result.
[0218] For descriptions of step 220 to step 240, reference may be
made to the embodiment in FIG. 7, and details are not described
herein again.
[0219] In summary, in the technical solution provided in this
embodiment of this disclosure, a text encoding result is obtained,
a target context vector is obtained according to the text encoding
result, a logical similarity corresponding to the text encoding
result is determined according to the target context vector and the
text encoding result, and the text encoding result is decoded by
using the logical similarity corresponding to the text encoding
result to obtain a text decoding result. In the foregoing manner,
the dependence between elements in a text encoding result is
strengthened, and network representations between different words
can be flexibly learned by using context information, thereby
enhancing the performance of a neural network model and improving
the learning capability of the model.
[0220] In an embodiment, the obtaining a context vector according
to the text encoding result includes:
[0221] obtaining a vector of each element in the text encoding
result; and
[0222] calculating an average value of the text encoding result
according to the vector of the each element in the text encoding
result, the average value being used for representing the context
vector.
[0223] In an embodiment, the obtaining a context vector according
to the text encoding result includes:
[0224] obtaining L layers of text sequences generated before the
text encoding result, L being an integer greater than or equal to
1; and
[0225] generating the context vector according to the L layers of
text sequences.
[0226] In an embodiment, the obtaining a context vector according
to the text encoding result includes:
[0227] obtaining L layers of text sequences corresponding to the
text encoding result, the L layers of text sequences being network
layers generated before the text encoding result, L being an
integer greater than or equal to 1;
[0228] obtaining L layers of first context vectors according to the
L layers of text sequences, each layer of first context vector
being an average value of elements in each layer of text
sequence;
[0229] obtaining a second context vector according to the text
encoding result, the second context vector being an average value
of elements in the text encoding result; and
[0230] calculating the context vector according to the L layers of
first context vectors and the second context vector.
[0231] In an embodiment, the determining a logical similarity
corresponding to the text encoding result according to the context
vector and the text encoding result includes:
[0232] determining a target query vector and a target key vector
according to the context vector and the text encoding result, the
target query vector corresponding to the text encoding result, the
target key vector corresponding to the text encoding result;
and
[0233] determining the logical similarity according to the target
query vector and the target key vector.
[0234] In an embodiment, determining a target query vector and a
target key vector according to the context vector and the text
encoding result includes:
[0235] calculating an original query vector, an original key
vector, and an original value vector according to the text encoding
result;
[0236] calculating a query vector scalar and a key vector scalar
according to the context vector, the original query vector, and the
original key vector; and
[0237] calculating the target query vector and the target key
vector according to the context vector, the query vector scalar,
and the key vector scalar.
[0238] In an embodiment, calculating an original query vector, an
original key vector, and an original value vector according to the
text encoding result includes:
[0239] calculating the original query vector, the original key
vector, and the original value vector in the following manner:
$$\begin{bmatrix} Q & K & V \end{bmatrix} = H \begin{bmatrix} W_Q & W_K & W_V \end{bmatrix},$$
[0240] where Q represents the original query vector, K represents
the original key vector, V represents the original value vector, H
represents the text encoding result, W.sub.Q represents a first
parameter matrix, W.sub.K represents a second parameter matrix,
W.sub.V represents a third parameter matrix, and the first
parameter matrix, the second parameter matrix, and the third
parameter matrix are pre-trained parameter matrices; and
[0241] the calculating a query vector scalar and a key vector
scalar according to the context vector, the original query vector,
and the original key vector includes:
[0242] calculating the query vector scalar and the key vector
scalar in the following manner:
$$\begin{bmatrix} \lambda_Q \\ \lambda_K \end{bmatrix} = \sigma\left( \begin{bmatrix} Q \\ K \end{bmatrix} \begin{bmatrix} V_Q^{H} \\ V_K^{H} \end{bmatrix} + C \begin{bmatrix} U_Q \\ U_K \end{bmatrix} \begin{bmatrix} V_Q^{C} \\ V_K^{C} \end{bmatrix} \right),$$
[0243] where .lamda..sub.Q represents the query vector scalar,
.lamda..sub.K represents the key vector scalar,
$\sigma(\cdot)$ represents a sigmoid nonlinearity, C
represents the context vector, U.sub.Q represents a fourth
parameter matrix, U.sub.K represents a fifth parameter matrix, the
fourth parameter matrix and the fifth parameter matrix are
pre-trained parameter matrices, V.sub.Q.sup.H represents a first
linear transformation factor, V.sub.K.sup.H represents a second
linear transformation factor, V.sub.Q.sup.C represents a third
linear transformation factor, and V.sub.K.sup.C represents a fourth
linear transformation factor; and
[0244] the calculating the target query vector and the target key
vector according to the context vector, the query vector scalar,
and the key vector scalar includes:
[0245] calculating the target query vector and the target key
vector in the following manner:
$$\begin{bmatrix} \hat{Q} \\ \hat{K} \end{bmatrix} = \left(1 - \begin{bmatrix} \lambda_Q \\ \lambda_K \end{bmatrix}\right) \begin{bmatrix} Q \\ K \end{bmatrix} + \begin{bmatrix} \lambda_Q \\ \lambda_K \end{bmatrix} \left( C \begin{bmatrix} U_Q \\ U_K \end{bmatrix} \right),$$
[0246] where {circumflex over (Q)} represents the target query
vector, and {circumflex over (K)} represents the target key
vector.
[0247] In an embodiment, determining the logical similarity
corresponding to the text encoding result according to the target
query vector and the target key vector includes:
[0248] calculating the logical similarity in the following
manner:
$$e = \frac{\hat{Q}\hat{K}^{T}}{\sqrt{d}},$$
[0249] where e represents the logical similarity, {circumflex over
(Q)} represents the target query vector, {circumflex over (K)}
represents the target key vector, {circumflex over (K)}.sup.T
represents the transpose of the target key vector, and d represents
the dimension of the hidden state vector of the model.
[0250] In an embodiment, decoding the text encoding result by using
the logical similarity to obtain a text decoding result
includes:
[0251] determining a weight value corresponding to the text
encoding result according to the logical similarity, the weight
value being used for representing a relationship between elements
in the text encoding result;
[0252] determining a target output vector according to the weight
value corresponding to the text encoding result; and
[0253] decoding the text encoding result by using the target output
vector to obtain the text decoding result.
[0254] In an embodiment, after obtaining the text encoding result
corresponding to the to-be-processed text information, the method
further includes:
[0255] dividing the text encoding result into X text subsequences,
X being an integer greater than 1;
[0256] the determining a target query vector and a target key
vector according to the context vector and the text encoding result
includes:
[0257] generating X query vectors and X key vectors according to
the context vector and the X text subsequences, each text
subsequence corresponding to one query vector and one key
vector;
[0258] the determining the logical similarity corresponding to the
text encoding result according to the target query vector and the
target key vector includes:
[0259] calculating the each text subsequence and a query vector and
a key vector that correspond to the each text subsequence, to
obtain X sub-logical similarities; and
[0260] the decoding the text encoding result by using the logical
similarity to obtain a text decoding result includes:
[0261] determining a sub-weight value corresponding to the each
text subsequence according to each sub-logical similarity, the
sub-weight value being used for representing a relationship between
elements in the text subsequence;
[0262] determining a sub-output vector according to the sub-weight
value corresponding to the each text subsequence;
[0263] generating a target output vector according to the
sub-output vector corresponding to the each text subsequence;
and
[0264] decoding the text encoding result by using the target output
vector to obtain the text decoding result.
[0265] For the foregoing descriptions, reference may be made to the
embodiment in FIG. 7, and details are not described herein
again.
[0266] A text translation apparatus in this disclosure is described
below in detail. FIG. 14 is a schematic diagram of an embodiment of
a text translation apparatus according to an embodiment of this
disclosure. The apparatus has the function of implementing the
foregoing method embodiment, and the function may be implemented by
hardware or may be implemented by hardware executing corresponding
software. In an embodiment, the text translation apparatus 30
includes:
[0267] an obtaining module 301, configured to obtain a target text
sequence corresponding to target text information, the target text
sequence including a plurality of elements;
[0268] the obtaining module 301 being further configured to obtain
a context vector according to the target text sequence;
[0269] a determination module 302, configured to determine a target
query vector and a target key vector according to the context
vector and the target text sequence that are obtained by the
obtaining module 301, the target query vector having a
correspondence with elements in the target text sequence, the
target key vector having a correspondence with elements in the
target text sequence;
[0270] the determination module 302 being further configured to
determine a logical similarity corresponding to the target text
sequence according to the target query vector and the target key
vector;
[0271] an encoding module 303, configured to encode the target text
sequence corresponding to the target text information by using the
logical similarity determined by the determination module 302 to
obtain a text encoding result; and
[0272] a decoding module 304, configured to decode the text
encoding result encoded by the encoding module 303 to obtain a text
translation result corresponding to the target text
information.
[0273] In this embodiment, the obtaining module 301 obtains a
target text sequence corresponding to target text information, the
target text sequence including a plurality of elements; the
obtaining module 301 obtains a context vector according to the
target text sequence; the determination module 302 determines a
target query vector and a target key vector according to the
context vector and the target text sequence that are obtained by
the obtaining module 301, the target query vector having a
correspondence with elements in the target text sequence, the
target key vector having a correspondence with elements in the
target text sequence; and the determination module 302 determines a
logical similarity corresponding to the target text sequence
according to the target query vector and the target key vector, the
encoding module 303 encodes the target text sequence corresponding
to the target text information by using the logical similarity
determined by the determination module 302 to obtain a text
encoding result, and the decoding module 304 decodes the text
encoding result encoded by the encoding module 303 to obtain a text
translation result corresponding to the target text
information.
[0274] In this embodiment of this disclosure, the text translation
apparatus is provided. First, a target text sequence corresponding
to target text information is obtained, the target text sequence
including a plurality of elements; a context vector is obtained
according to the target text sequence; a target query vector and a
target key vector are determined according to the context vector
and the target text sequence, the target query vector having a
correspondence with elements in the target text sequence, the
target key vector having a correspondence with elements in the
target text sequence; and a logical similarity corresponding to the
target text sequence is determined according to the target query
vector and the target key vector, the target text sequence
corresponding to the target text information is encoded by using
the logical similarity to obtain a text encoding result, and the
text encoding result is decoded to obtain a text translation result
corresponding to the target text information. In the foregoing
manner, a context vector related to a discrete sequence is used to
encode the discrete sequence, to strengthen the dependence between
elements in the discrete sequence, so that a network representation
between different words can be flexibly learned by using context
information, thereby improving the quality of machine
translation.
[0275] An information processing apparatus in this disclosure is
described below in detail. The information processing apparatus has
the function of implementing the foregoing method embodiment, and
the function may be implemented by hardware or may be implemented
by hardware executing corresponding software. The apparatus may be
a computer device or may be disposed in a computer device. In an
embodiment, as shown in FIG. 15, an information processing
apparatus 1500 includes:
[0276] an obtaining module 1510, configured to obtain a target text
sequence corresponding to to-be-processed text information;
[0277] the obtaining module 1510 being further configured to obtain
a context vector according to the target text sequence;
[0278] a determination module 1520, configured to determine a
logical similarity corresponding to the target text sequence
according to the context vector and the target text sequence that
are obtained by the obtaining module; and
[0279] an encoding module 1530, configured to encode the target
text sequence by using the logical similarity determined by the
determination module to obtain a text encoding result.
[0280] In summary, in the technical solution provided in this
embodiment of this disclosure, a target text sequence corresponding
to to-be-processed text information is obtained; a context vector
is obtained according to the target text sequence; a logical
similarity corresponding to the target text sequence is determined
according to the context vector and the target text sequence; and
the target text sequence is encoded by using the logical similarity
to obtain a text encoding result. In the foregoing manner, a
context vector related to a discrete sequence is used to encode the
discrete sequence, to strengthen the dependence between elements in
the discrete sequence, thereby enhancing the performance of a
neural network model and improving the learning capability of the
model.
[0281] In an exemplary embodiment, the obtaining module 1510 is
configured to:
[0282] obtain a vector of each element in the target text sequence;
and
[0283] calculate an average value of the target text sequence
according to the vector of the each element in the target text
sequence, the average value being used for representing the context
vector.
[0284] In an exemplary embodiment, the obtaining module 1510 is
configured to:
[0285] obtain L layers of text sequences generated before the
target text sequence, L being an integer greater than or equal to
1; and
[0286] generate the context vector according to the L layers of
text sequences.
[0287] In an exemplary embodiment, the obtaining module 1510 is
configured to:
[0288] obtain L layers of text sequences corresponding to the
target text sequence, the L layers of text sequences being network
layers generated before the target text sequence, L being an
integer greater than or equal to 1;
[0289] obtain L layers of first context vectors according to the L
layers of text sequences, each layer of first context vector being
an average value of elements in each layer of text sequence;
[0290] obtain a second context vector according to the target text
sequence, the second context vector being an average value of
elements in the target text sequence; and
[0291] calculate the context vector according to the L layers of
first context vectors and the second context vector.
[0292] In an exemplary embodiment, the determination module 1520 is
configured to:
[0293] determine a target query vector and a target key vector
according to the context vector and the target text sequence, the
target query vector corresponding to the target text sequence, the
target key vector corresponding to the target text sequence;
and
[0294] determine the logical similarity according to the target
query vector and the target key vector.
[0295] In an exemplary embodiment, the determination module 1520 is
configured to:
[0296] calculate an original query vector, an original key vector,
and an original value vector according to the target text
sequence;
[0297] calculate a query vector scalar and a key vector scalar
according to the context vector, the original query vector, and the
original key vector; and
[0298] calculate the target query vector and the target key vector
according to the context vector, the query vector scalar, and the
key vector scalar.
[0299] In an exemplary embodiment, the determination module 1520 is
configured to:
[0300] calculate the original query vector, the original key
vector, and the original value vector in the following manner:
$$\begin{bmatrix} Q & K & V \end{bmatrix} = H \begin{bmatrix} W_Q & W_K & W_V \end{bmatrix},$$
[0301] where Q represents the original query vector, K represents
the original key vector, V represents the original value vector, H
represents the target text sequence, W.sub.Q represents a first
parameter matrix, W.sub.K represents a second parameter matrix,
W.sub.V represents a third parameter matrix, and the first
parameter matrix, the second parameter matrix, and the third
parameter matrix are pre-trained parameter matrices;
[0302] the determination module 1520 is configured to:
[0303] calculate the query vector scalar and the key vector scalar
in the following manner:
$$\begin{bmatrix} \lambda_Q \\ \lambda_K \end{bmatrix} = \sigma\left( \begin{bmatrix} Q \\ K \end{bmatrix} \begin{bmatrix} V_Q^{H} \\ V_K^{H} \end{bmatrix} + C \begin{bmatrix} U_Q \\ U_K \end{bmatrix} \begin{bmatrix} V_Q^{C} \\ V_K^{C} \end{bmatrix} \right),$$
[0304] where .lamda..sub.Q represents the query vector scalar,
.lamda..sub.K represents the key vector scalar,
$\sigma(\cdot)$ represents a sigmoid nonlinearity, C
represents the context vector, U.sub.Q represents a fourth
parameter matrix, U.sub.K represents a fifth parameter matrix, the
fourth parameter matrix and the fifth parameter matrix are
pre-trained parameter matrices, V.sub.Q.sup.H represents a first
linear transformation factor, V.sub.K.sup.H represents a second
linear transformation factor, V.sub.Q.sup.C represents a third
linear transformation factor, and V.sub.K.sup.C represents a fourth
linear transformation factor; and
[0305] the determination module 1520 is configured to:
[0306] calculate the target query vector and the target key vector
in the following manner:
$$\begin{bmatrix} \hat{Q} \\ \hat{K} \end{bmatrix} = \left(1 - \begin{bmatrix} \lambda_Q \\ \lambda_K \end{bmatrix}\right) \begin{bmatrix} Q \\ K \end{bmatrix} + \begin{bmatrix} \lambda_Q \\ \lambda_K \end{bmatrix} \left( C \begin{bmatrix} U_Q \\ U_K \end{bmatrix} \right),$$
[0307] where {circumflex over (Q)} represents the target query
vector, and {circumflex over (K)} represents the target key
vector.
[0308] In an exemplary embodiment, the determination module is
configured to:
[0309] calculate the logical similarity in the following
manner:
$$e = \frac{\hat{Q}\hat{K}^{T}}{\sqrt{d}},$$
[0310] where e represents the logical similarity, {circumflex over
(Q)} represents the target query vector, {circumflex over (K)}
represents the target key vector, {circumflex over (K)}.sup.T
represents the transpose of the target key vector, and d represents
the dimension of the hidden state vector of the model.
[0311] The encoding module 1530 is configured to:
[0312] determine a weight value corresponding to the target text
sequence according to the logical similarity, the weight value
being used for representing a relationship between elements in the
target text sequence;
[0313] determine a target output vector according to the weight
value corresponding to the target text sequence; and
[0314] encode the target text sequence by using the target output
vector to obtain the text encoding result.
[0315] In an exemplary embodiment, the apparatus 1500 further
includes:
[0316] a division module (not shown in the figure), configured to
divide the target text sequence into X text subsequences, X being
an integer greater than 1;
[0317] the determination module 1520 is configured to:
[0318] generate X query vectors and X key vectors according to the
context vector and the X text subsequences, each text subsequence
corresponding to one query vector and one key vector;
[0319] the determining the logical similarity according to the
target query vector and the target key vector includes:
[0320] calculating the each text subsequence and a query vector and
a key vector that correspond to the each text subsequence, to
obtain X sub-logical similarities;
[0321] the encoding module 1530 is configured to:
[0322] determine a sub-weight value corresponding to the each text
subsequence according to each sub-logical similarity, the
sub-weight value being used for representing a relationship between
elements in the text subsequence;
[0323] determine a sub-output vector according to the sub-weight
value corresponding to the each text subsequence;
[0324] generate a target output vector according to the sub-output
vector corresponding to the each text subsequence; and
[0325] encode the target text sequence by using the target output
vector to obtain the text encoding result.
[0326] The information processing apparatus in this disclosure is
described below in detail. FIG. 16 is a schematic diagram of
another embodiment of an information processing apparatus according
to an embodiment of this disclosure. The apparatus has the function
of implementing the foregoing method embodiment, and the function
may be implemented by hardware or may be implemented by hardware
executing corresponding software. The apparatus may be a computer
device or may be disposed in a computer device. In an embodiment,
an information processing apparatus 40 includes:
[0327] an obtaining module 401, configured to obtain a target text
sequence corresponding to to-be-processed text information, the
target text sequence including a plurality of elements;
[0328] the obtaining module 401 being configured to obtain a
context vector according to the target text sequence;
[0329] a determination module 402, configured to determine a target
query vector and a target key vector according to the context
vector and the target text sequence that are obtained by the
obtaining module 401, the target query vector having a
correspondence with elements in the target text sequence, the
target key vector having a correspondence with elements in the
target text sequence;
[0330] the determination module 402 being further configured to
determine a logical similarity corresponding to the target text
sequence according to the target query vector and the target key
vector; and
[0331] an encoding module 403, configured to encode the target text
sequence corresponding to target text information by using the
logical similarity determined by the determination module 402 to
obtain a text encoding result.
[0332] In this embodiment, the obtaining module 401 obtains a
target text sequence corresponding to to-be-processed text
information, the target text sequence including a plurality of
elements; the obtaining module 401 obtains a context vector
according to the target text sequence; the determination module 402
determines a target query vector and a target key vector according
to the context vector and the target text sequence that are
obtained by the obtaining module 401, the target query vector
having a correspondence with elements in the target text sequence,
the target key vector having a correspondence with elements in the
target text sequence, and the determination module 402 determines a
logical similarity corresponding to the target text sequence
according to the target query vector and the target key vector; and
the encoding module 403 encodes the target text sequence
corresponding to the target text information by using the logical
similarity determined by the determination module 402 to obtain a
text encoding result.
[0333] In this embodiment of this disclosure, the information
processing apparatus is provided. First, a target text sequence
corresponding to to-be-processed text information is obtained, the
target text sequence including a plurality of elements; a context
vector is then obtained according to the target text sequence; a
target query vector and a target key vector are then determined
according to the context vector and the target text sequence, the
target query vector having a correspondence with elements in the
target text sequence, the target key vector having a correspondence
with elements in the target text sequence; and finally, a logical
similarity corresponding to the target text sequence is determined
according to the target query vector and the target key vector, and
the target text sequence corresponding to target text information
is encoded by using the logical similarity to obtain a text
encoding result. In the foregoing manner, a context vector related
to a discrete sequence is used to encode the discrete sequence, to
strengthen the dependence between elements in the discrete
sequence, thereby enhancing the performance of a neural network
model and improving the learning capability of the model.
[0334] Based on the embodiment corresponding to FIG. 16, in another
embodiment of the information processing apparatus 40 provided in
this embodiment of this disclosure,
[0335] the obtaining module 401 is specifically configured to:
obtain a vector of each element in the target text sequence;
and
[0336] calculate an average value of the target text sequence
according to the vector of each element in the target text
sequence, the average value being used for representing the context
vector.
[0337] Further, in this embodiment of this disclosure, a method for
obtaining a context vector based on a global text sequence is
provided. That is, a vector of each element in the target text
sequence is obtained. An average value of the target text sequence
is calculated according to the vector of each element in the target
text sequence. The average value is represented as the context
vector. In the foregoing manner, the context vector may be obtained
through the entire text sequence, to provide a feasible manner of
implementing the solution, thereby improving the operability of the
solution.
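By way of a non-limiting illustrative sketch of the averaging just
described (using Python with NumPy; the sequence length n = 6 and
dimension d = 512 are hypothetical), the context vector is simply
the mean of the element vectors:

    import numpy as np

    def global_context(H):
        # context vector C: average of the element vectors in the
        # target text sequence H, assumed to have shape (n, d)
        return H.mean(axis=0, keepdims=True)   # shape (1, d)

    H = np.random.randn(6, 512)   # hypothetical: n = 6 elements, d = 512
    C = global_context(H)

Keeping C as a (1, d) row lets it broadcast over all positions in
the later sketches.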
[0338] Based on the embodiment corresponding to FIG. 16, in another
embodiment of the information processing apparatus 40 provided in
this embodiment of this disclosure,
[0339] the obtaining module 401 is specifically configured to:
obtain L layers of text sequences corresponding to the target text
sequence, the L layers of text sequences being network layers
generated before the target text sequence, L being an integer
greater than or equal to 1; and
[0340] generate the context vector according to the L layers of
text sequences.
[0341] Further, in this embodiment of this disclosure, a method for
obtaining a context vector based on a depth text sequence is
provided. That is, L layers of text sequences corresponding to the
target text sequence are first obtained, the L layers of text
sequences being network layers generated before the target text
sequence, L being an integer greater than or equal to 1; and the
context vector is then generated according to the L layers of text
sequences. In the foregoing manner, the context vector may be
obtained by using the plurality of depth text sequences, to provide
a feasible manner of implementing the solution, thereby improving
the operability of the solution.
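The filing does not fix how the L layers are combined; one plausible
reading, sketched below under that assumption, concatenates the
representations produced by the L network layers generated before
the target text sequence:

    import numpy as np

    def deep_context(layer_outputs):
        # layer_outputs: a list of L arrays of shape (n, d), the text
        # sequences produced by the L preceding network layers
        return np.concatenate(layer_outputs, axis=-1)   # shape (n, L * d)

    layers = [np.random.randn(6, 512) for _ in range(3)]   # hypothetical L = 3
    C = deep_context(layers)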
[0342] Based on the embodiment corresponding to FIG. 16, in another
embodiment of the information processing apparatus 40 provided in
this embodiment of this disclosure,
[0343] the obtaining module 401 is specifically configured to:
obtain L layers of text sequences corresponding to the target text
sequence, the L layers of text sequences being network layers
generated before the target text sequence, L being an integer
greater than or equal to 1;
[0344] obtain L layers of first context vectors according to the L
layers of text sequences, each layer of first context vector being
an average value of elements in each layer of text sequence;
[0345] obtain a second context vector according to the target text
sequence, the second context vector being an average value of
elements in the target text sequence; and
[0346] calculate the context vector according to the L layers of
first context vectors and the second context vector.
[0347] Further, in this embodiment of this disclosure, a method for
obtaining a context vector based on depth and global text sequences
is provided. That is, L layers of first context vectors are first
obtained according to the L layers of text sequences, each layer of
first context vector being an average value of elements in each
layer of text sequence. The second context vector is then obtained
according to the target text sequence, the second context vector
being an average value of elements in the target text sequence.
Finally, the context vector is calculated according to the L layers
of first context vectors and the second context vector. In the
foregoing manner, the context vector may be obtained by using the
plurality of depth-global text sequences, to provide a feasible
manner of implementing the solution, thereby improving the
operability of the solution.
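A minimal sketch of this depth-global variant follows, assuming (as
one possible calculation, since the filing leaves the combination
open) that the L + 1 average vectors are concatenated:

    import numpy as np

    def deep_global_context(layer_outputs, H):
        # L layers of first context vectors: per-layer averages of elements
        firsts = [layer.mean(axis=0) for layer in layer_outputs]  # L arrays (d,)
        # second context vector: average of elements in the target sequence H
        second = H.mean(axis=0)                                   # shape (d,)
        # assumed combination: concatenate the L + 1 average vectors
        return np.concatenate(firsts + [second])                  # ((L + 1) * d,)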
[0348] Based on the embodiment corresponding to FIG. 16, in another
embodiment of the information processing apparatus 40 provided in
this embodiment of this disclosure,
[0349] the determination module 402 is specifically configured to:
calculate an original query vector, an original key vector, and an
original value vector according to the target text sequence, the
original value vector being used for determining a target output
vector corresponding to the target text sequence;
[0350] calculate a query vector scalar and a key vector scalar
according to the context vector, the original query vector, and the
original key vector; and
[0351] calculate the target query vector and the target key vector
according to the context vector, the query vector scalar, and the
key vector scalar.
[0352] Next, in this embodiment of this disclosure, a manner of
determining the target query vector and the target key vector
according to the context vector and the target text sequence is
described. That is, the original query vector, the original key
vector, and the original value vector are first calculated
according to the target text sequence. The query vector scalar and
the key vector scalar are then calculated according to the context
vector, the original query vector, and the original key vector.
Finally, the target query vector and the target key vector are
calculated according to the context vector, the query vector
scalar, and the key vector scalar. In the foregoing manner, the
context vector is incorporated into the target query vector and the
target key vector, to enhance the feature representation of the
original query vector and the original key vector, thereby
strengthening the network representation of the entire text
sequence and improving the model learning performance.
[0353] Based on the embodiment corresponding to FIG. 16, in another
embodiment of the information processing apparatus 40 provided in
this embodiment of this disclosure,
[0354] the determination module 402 is specifically configured to
calculate the original query vector, the original key vector, and
the original value vector in the following manner:
$$\begin{bmatrix} Q & K & V \end{bmatrix} = H \begin{bmatrix} W_Q & W_K & W_V \end{bmatrix},$$
[0355] where Q represents the original query vector, K represents
the original key vector, V represents the original value vector, H
represents the target text sequence, $W_Q$ represents a first
parameter matrix, $W_K$ represents a second parameter matrix,
$W_V$ represents a third parameter matrix, and the first
parameter matrix, the second parameter matrix, and the third
parameter matrix are pre-trained parameter matrices;
[0356] calculate the query vector scalar and the key vector scalar
in the following manner:
$$\begin{bmatrix} \lambda_Q \\ \lambda_K \end{bmatrix} = \sigma\left( \begin{bmatrix} Q \\ K \end{bmatrix} \begin{bmatrix} V_Q^H \\ V_K^H \end{bmatrix} + C \begin{bmatrix} U_Q \\ U_K \end{bmatrix} \begin{bmatrix} V_Q^C \\ V_K^C \end{bmatrix} \right),$$
[0357] where $\lambda_Q$ represents the query vector scalar,
$\lambda_K$ represents the key vector scalar, $\sigma(\cdot)$
represents a sigmoid nonlinear transformation, C represents the
context vector, $U_Q$ represents a fourth parameter matrix, $U_K$
represents a fifth parameter matrix, the fourth parameter matrix
and the fifth parameter matrix are pre-trained parameter matrices,
$V_Q^H$ represents a first linear transformation factor, $V_K^H$
represents a second linear transformation factor, $V_Q^C$
represents a third linear transformation factor, and $V_K^C$
represents a fourth linear transformation factor; and
[0358] calculate the target query vector and the target key vector
in the following manner:
$$\begin{bmatrix} \hat{Q} \\ \hat{K} \end{bmatrix} = \left( 1 - \begin{bmatrix} \lambda_Q \\ \lambda_K \end{bmatrix} \right) \begin{bmatrix} Q \\ K \end{bmatrix} + \begin{bmatrix} \lambda_Q \\ \lambda_K \end{bmatrix} \left( C \begin{bmatrix} U_Q \\ U_K \end{bmatrix} \right),$$
[0359] where $\hat{Q}$ represents the target query vector, and
$\hat{K}$ represents the target key vector.
[0360] Next, in this embodiment of this disclosure, a specific
calculation manner is provided. The original query vector, the
original key vector, and the original value vector may be
calculated according to the target text sequence. The query vector
scalar and the key vector scalar are calculated according to the
context vector, the original query vector, and the original key
vector. The target query vector and the target key vector are
calculated according to the context vector, the query vector
scalar, and the key vector scalar. In the foregoing manner, a
specific operation manner is provided for implementing the
solution, and the calculation of the formula is used to clarify how
to obtain the parameters, thereby ensuring the feasibility and
operability of the solution.
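The three formulas can be traced end to end in the following
non-limiting sketch. The parameter matrices and linear
transformation factors are random stand-ins for the pre-trained
values, and the shapes (n positions, model dimension d, context
dimension equal to d) are assumptions:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def context_gated_qk(H, C, W_Q, W_K, W_V, U_Q, U_K,
                         v_QH, v_KH, v_QC, v_KC):
        # [Q K V] = H [W_Q W_K W_V]: the original query/key/value vectors
        Q, K, V = H @ W_Q, H @ W_K, H @ W_V          # each (n, d)
        # context projections C U_Q and C U_K, broadcast over the n positions
        CQ, CK = C @ U_Q, C @ U_K                    # each (1, d)
        # query/key vector scalars lambda_Q, lambda_K (one per position)
        lam_Q = sigmoid(Q @ v_QH + CQ @ v_QC)        # shape (n, 1)
        lam_K = sigmoid(K @ v_KH + CK @ v_KC)        # shape (n, 1)
        # target query/key vectors: gated mix of the originals and the context
        Q_hat = (1.0 - lam_Q) * Q + lam_Q * CQ
        K_hat = (1.0 - lam_K) * K + lam_K * CK
        return Q_hat, K_hat, V

    rng = np.random.default_rng(0)
    n, d = 6, 512
    H = rng.standard_normal((n, d))                  # target text sequence
    C = H.mean(axis=0, keepdims=True)                # global context vector
    mats = [rng.standard_normal((d, d)) for _ in range(5)]  # W_Q W_K W_V U_Q U_K
    vecs = [rng.standard_normal((d, 1)) for _ in range(4)]  # V_Q^H V_K^H V_Q^C V_K^C
    Q_hat, K_hat, V = context_gated_qk(H, C, *mats, *vecs)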
[0361] Based on the embodiment corresponding to FIG. 16, in another
embodiment of the information processing apparatus 40 provided in
this embodiment of this disclosure,
[0362] the determination module 402 is specifically configured to
calculate the logical similarity in the following manner:
$$e = \frac{\hat{Q}\hat{K}^{T}}{\sqrt{d}},$$
[0363] where e represents the logical similarity, $\hat{Q}$
represents the target query vector, $\hat{K}$ represents the target
key vector, $\hat{K}^{T}$ represents the transpose of the target key
vector, and d represents the dimension of the hidden state vector of
the model.
[0364] Next, in this embodiment of this disclosure, a manner of
calculating the logical similarity corresponding to the target text
sequence according to the target query vector and the target key
vector is provided. In the foregoing manner, a specific operation
manner is provided for implementing the solution, and the
calculation of the formula is used to clarify how to obtain the
parameters, thereby ensuring the feasibility and operability of the
solution.
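Continuing the earlier sketch, the logical similarity is a single
scaled dot product (reading the d in the formula as sitting under a
radical, as in standard scaled-dot-product attention):

    import numpy as np

    def logical_similarity(Q_hat, K_hat, d):
        # e = Q_hat K_hat^T / sqrt(d): one score for every pair of elements
        return (Q_hat @ K_hat.T) / np.sqrt(d)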
[0365] Based on the embodiment corresponding to FIG. 16, in another
embodiment of the information processing apparatus 40 provided in
this embodiment of this disclosure,
[0366] the encoding module 403 is specifically configured to:
determine a weight value corresponding to the target text sequence
according to the logical similarity, the weight value being used
for representing a relationship between elements in the target text
sequence;
[0367] determine a target output vector according to the weight
value corresponding to the target text sequence; and
[0368] encode the target text sequence corresponding to target text
information by using the target output vector to obtain a text
encoding result.
[0369] Next, in this embodiment of this disclosure, how to encode
the target text sequence corresponding to the target text
information by using the logical similarity to obtain a text
encoding result is described. First, the weight value corresponding
to the target text sequence is determined according to the logical
similarity, the target output vector is then determined according
to the weight value corresponding to the target text sequence, and
the target text sequence corresponding to the target text
information is finally encoded by using the target output vector to
obtain the text encoding result. In the foregoing manner, in the
process of encoding text information, the output vector containing
the context vector is used to strengthen the local information of
the discrete sequence, improve the quality of model learning, and
implement better disclosure to different products.
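A sketch of this encoding step follows, assuming (as is
conventional, although the filing does not name it) that the weight
values are obtained with a row-wise softmax over the logical
similarity:

    import numpy as np

    def softmax(x, axis=-1):
        x = x - x.max(axis=axis, keepdims=True)
        ex = np.exp(x)
        return ex / ex.sum(axis=axis, keepdims=True)

    def target_output(e, V):
        # weight value: normalized relationship between elements
        weights = softmax(e, axis=-1)   # shape (n, n)
        # target output vector: weighted sum of the original value vectors
        return weights @ V              # shape (n, d)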
[0370] Based on the embodiment corresponding to FIG. 16, referring
to FIG. 17, in another embodiment of the information processing
apparatus 40 provided in this embodiment of this disclosure, the
information processing apparatus 40 further includes a division
module 404;
[0371] the division module 404 is configured to: after the
obtaining module 401 obtains a target text sequence corresponding
to to-be-processed text information, divide the target text
sequence into X text subsequences, X being an integer greater than
1;
[0372] the determination module 402 is specifically configured to:
generate X query vectors and X key vectors according to the context
vector and the X text subsequences, each text subsequence
corresponding to one query vector and one key vector; and
[0373] perform a calculation on each text subsequence and a query
vector and a key vector that correspond to the text subsequence, to
obtain X sub-logical similarities; and
[0374] the encoding module 403 is specifically configured to:
determine a sub-weight value corresponding to each text
subsequence according to the corresponding sub-logical similarity,
the sub-weight value being used for representing a relationship
between elements in the text subsequence;
[0375] determine a sub-output vector according to the sub-weight
value corresponding to each text subsequence;
[0376] generate a target output vector according to the sub-output
vectors corresponding to the text subsequences; and
[0377] encode the target text sequence corresponding to target text
information by using the target output vector to obtain a text
encoding result.
[0378] Next, in this embodiment of this disclosure, a method using
a multi-head attention mechanism is proposed to implement encoding.
That is, a target text sequence is first divided into X text
subsequences, X being an integer greater than 1. X query vectors
and X key vectors are then generated according to the context
vector and the X text subsequences. A calculation is performed on
each text subsequence and the query vector and the key vector that
correspond to the text subsequence, to obtain X sub-logical
similarities. Finally, a sub-weight value corresponding to each text
subsequence is determined according to the corresponding sub-logical
similarity, a sub-output vector is determined according to the
sub-weight value, and a target output vector is generated from the
sub-output vectors of the text subsequences. The target text
sequence corresponding to target text information is encoded by
using the target output vector to obtain a text encoding result. In
the foregoing manner, the entire network uses residual connections
and layer normalization, so that the deep network can be better
optimized and trained faster than the mainstream model.
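A minimal sketch of the multi-head division follows, assuming the X
text subsequences are the usual per-head splits along the model
dimension, and that head_fn is an assumed per-head attention helper
such as the earlier sketches (residual connections and layer
normalization are omitted for brevity):

    import numpy as np

    def multi_head_output(H, C, X, head_fn):
        # divide the target text sequence into X per-head subsequences
        n, d = H.shape
        d_h = d // X
        subs = [H[:, x * d_h:(x + 1) * d_h] for x in range(X)]
        # one sub-output vector per subsequence, from its own query/key/value
        sub_outputs = [head_fn(sub, C) for sub in subs]
        # target output vector: concatenation of the X sub-output vectors
        return np.concatenate(sub_outputs, axis=-1)   # shape (n, d)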
[0379] An embodiment of this disclosure further provides an
information processing apparatus. The information processing
apparatus has the function of implementing the foregoing method
embodiment, and the function may be implemented by hardware or may
be implemented by hardware executing corresponding software. The
apparatus may be a computer device or may be disposed in a computer
device. In an embodiment, as shown in FIG. 18, an information
processing apparatus 1800 includes:
[0380] an obtaining module 1810, configured to obtain a text
encoding result;
[0381] the obtaining module 1810 being further configured to obtain
a target context vector according to the text encoding result;
[0382] a determination module 1820, configured to determine a
logical similarity corresponding to the text encoding result
according to the target context vector and the text encoding
result; and
[0383] a decoding module 1830, configured to decode the text
encoding result by using the logical similarity corresponding to
the text encoding result to obtain a text decoding result.
[0384] In summary, in the technical solution provided in this
embodiment of this disclosure, a text encoding result is obtained,
a target context vector is obtained according to the text encoding
result, a logical similarity corresponding to the text encoding
result is determined according to the target context vector and the
text encoding result, and the text encoding result is decoded by
using the logical similarity corresponding to the text encoding
result to obtain a text decoding result. In the foregoing manner,
the dependence between elements in a text encoding result is
strengthened, and network representations between different words
can be flexibly learned by using context information, thereby
enhancing the performance of a neural network model and improving
the learning capability of the model.
[0385] In an exemplary embodiment, the obtaining module 1810 is
configured to:
[0386] obtain a vector of each element in the text encoding result;
and
[0387] calculate an average value of the text encoding result
according to the vector of each element in the text encoding
result, the average value being used for representing the context
vector.
[0388] In an exemplary embodiment, the obtaining module 1810 is
configured to:
[0389] obtain L layers of text sequences generated before the text
encoding result, L being an integer greater than or equal to 1;
and
[0390] generate the context vector according to the L layers of
text sequences.
[0391] In an exemplary embodiment, the obtaining module 1810 is
configured to:
[0392] obtain L layers of text sequences corresponding to the text
encoding result, the L layers of text sequences being network
layers generated before the text encoding result, L being an
integer greater than or equal to 1;
[0393] obtain L layers of first context vectors according to the L
layers of text sequences, each layer of first context vector being
an average value of elements in each layer of text sequence;
[0394] obtain a second context vector according to the text
encoding result, the second context vector being an average value
of elements in the text encoding result; and
[0395] calculate the context vector according to the L layers of
first context vectors and the second context vector.
[0396] In an exemplary embodiment, the determination module 1820 is
configured to:
[0397] determine a target query vector and a target key vector
according to the context vector and the text encoding result, the
target query vector corresponding to the text encoding result, the
target key vector corresponding to the text encoding result;
and
[0398] determine the logical similarity according to the target
query vector and the target key vector.
[0399] In an exemplary embodiment, the determination module 1820 is
configured to:
[0400] calculate an original query vector, an original key vector,
and an original value vector according to the text encoding
result;
[0401] calculate a query vector scalar and a key vector scalar
according to the context vector, the original query vector, and the
original key vector; and
[0402] calculate the target query vector and the target key vector
according to the context vector, the query vector scalar, and the
key vector scalar.
[0403] In an exemplary embodiment, the determination module 1820 is
configured to:
[0404] calculate the original query vector, the original key
vector, and the original value vector in the following manner:
$$\begin{bmatrix} Q & K & V \end{bmatrix} = H \begin{bmatrix} W_Q & W_K & W_V \end{bmatrix},$$
[0405] where Q represents the original query vector, K represents
the original key vector, V represents the original value vector, H
represents the text encoding result, $W_Q$ represents a first
parameter matrix, $W_K$ represents a second parameter matrix,
$W_V$ represents a third parameter matrix, and the first
parameter matrix, the second parameter matrix, and the third
parameter matrix are pre-trained parameter matrices; and
[0406] the determination module 1820 is configured to:
[0407] calculate the query vector scalar and the key vector scalar
in the following manner:
$$\begin{bmatrix} \lambda_Q \\ \lambda_K \end{bmatrix} = \sigma\left( \begin{bmatrix} Q \\ K \end{bmatrix} \begin{bmatrix} V_Q^H \\ V_K^H \end{bmatrix} + C \begin{bmatrix} U_Q \\ U_K \end{bmatrix} \begin{bmatrix} V_Q^C \\ V_K^C \end{bmatrix} \right),$$
[0408] where $\lambda_Q$ represents the query vector scalar,
$\lambda_K$ represents the key vector scalar, $\sigma(\cdot)$
represents a sigmoid nonlinear transformation, C represents the
context vector, $U_Q$ represents a fourth parameter matrix, $U_K$
represents a fifth parameter matrix, the fourth parameter matrix
and the fifth parameter matrix are pre-trained parameter matrices,
$V_Q^H$ represents a first linear transformation factor, $V_K^H$
represents a second linear transformation factor, $V_Q^C$
represents a third linear transformation factor, and $V_K^C$
represents a fourth linear transformation factor; and
[0409] the determination module 1820 is configured to:
[0410] calculate the target query vector and the target key vector
in the following manner:
$$\begin{bmatrix} \hat{Q} \\ \hat{K} \end{bmatrix} = \left( 1 - \begin{bmatrix} \lambda_Q \\ \lambda_K \end{bmatrix} \right) \begin{bmatrix} Q \\ K \end{bmatrix} + \begin{bmatrix} \lambda_Q \\ \lambda_K \end{bmatrix} \left( C \begin{bmatrix} U_Q \\ U_K \end{bmatrix} \right),$$
[0411] where $\hat{Q}$ represents the target query vector, and
$\hat{K}$ represents the target key vector.
[0412] In an exemplary embodiment, the determination module 1820 is
configured to:
[0413] calculate the logical similarity in the following
manner:
$$e = \frac{\hat{Q}\hat{K}^{T}}{\sqrt{d}},$$
[0414] where e represents the logical similarity, $\hat{Q}$
represents the target query vector, $\hat{K}$ represents the target
key vector, $\hat{K}^{T}$ represents the transpose of the target key
vector, and d represents the dimension of the hidden state vector of
the model.
[0415] In an exemplary embodiment, the decoding module 1830 is
configured to:
[0416] determine a weight value corresponding to the text encoding
result according to the logical similarity, the weight value being
used for representing a relationship between elements in the text
encoding result;
[0417] determine a target output vector according to the weight
value corresponding to the text encoding result; and
[0418] decode the text encoding result by using the target output
vector to obtain the text decoding result.
[0419] In an exemplary embodiment, the apparatus 1800 further
includes:
[0420] a division module (not shown in the figure), configured to
divide the text encoding result into X text subsequences, X being
an integer greater than 1;
[0421] the determination module 1820 is configured to:
[0422] generate X query vectors and X key vectors according to the
context vector and the X text subsequences, each text subsequence
corresponding to one query vector and one key vector;
[0423] the determining the logical similarity corresponding to the
text encoding result according to the target query vector and the
target key vector includes:
[0424] performing a calculation on each text subsequence and a query
vector and a key vector that correspond to the text subsequence, to
obtain X sub-logical similarities;
[0425] the decoding module 1830 is configured to:
[0426] determine a sub-weight value corresponding to each text
subsequence according to the corresponding sub-logical similarity,
the sub-weight value being used for representing a relationship
between elements in the text subsequence;
[0427] determine a sub-output vector according to the sub-weight
value corresponding to each text subsequence;
[0428] generate a target output vector according to the sub-output
vectors corresponding to the text subsequences; and
[0429] decode the text encoding result by using the target output
vector to obtain the text decoding result.
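Putting the decoder-side modules together, the following
non-limiting sketch reuses the context_gated_qk and softmax helpers
from the encoder sketches above; the projection from the output
vector to the decoded text is outside its scope:

    import numpy as np

    def decode_step(enc, params, d):
        # target context vector obtained from the text encoding result
        C = enc.mean(axis=0, keepdims=True)
        # same gated attention as on the encoder side
        Q_hat, K_hat, V = context_gated_qk(enc, C, *params)
        # logical similarity, weight values, and the output vector
        # on which the text decoding result is based
        e = (Q_hat @ K_hat.T) / np.sqrt(d)
        return softmax(e, axis=-1) @ V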
[0430] The term "module" may refer to a software module, a hardware
module, or a combination thereof. A software module may include a
computer program or part of the computer program that has a
predefined function and works together with other related parts to
achieve a predefined goal, such as those functions described in
this disclosure. A hardware module may be implemented using
processing circuitry and/or memory configured to perform the
functions described in this disclosure. Each module can be
implemented using one or more processors (or processors and
memory). Likewise, a processor (or processors and memory) can be
used to implement one or more modules. Moreover, each module can be
part of an overall module that includes the functionalities of the
module.
[0431] An embodiment of the present disclosure further provides
another terminal device. As shown in FIG. 19, for ease of
description, only parts related to the embodiment of the present
disclosure are shown. For specific technical details that are not
disclosed, please refer to the method part of the embodiments of
the present disclosure. The terminal device may be any terminal
device including a mobile phone, a tablet computer, a personal
digital assistant (PDA), a point of sales (POS), and an on-board
computer, and the terminal device being a mobile phone is used as
an example.
[0432] FIG. 19 is a block diagram of the structure of a part of a
mobile phone related to a terminal device according to an
embodiment of the present disclosure. Referring to FIG. 19, the
mobile phone includes components such as a radio frequency (RF)
circuit 510, a memory 520, an input unit 530, a display unit 540, a
sensor 550, an audio circuit 560, a wireless fidelity (Wi-Fi)
module 570, a processor 580, and a power supply 590. A person
skilled in the art may understand that the structure of the mobile
phone shown in FIG. 19 does not constitute a limitation on the
mobile phone, and the mobile phone may include more components or
fewer components than those shown in the figure, or some components
may be combined, or a different component deployment may be
used.
[0433] The following makes a specific description of components of
the mobile phone with reference to FIG. 19.
[0434] The RF circuit 510 may be configured to receive and transmit
signals during an information receiving and transmitting process or
a call process. Specifically, the RF circuit receives downlink
information from a base station, then delivers the downlink
information to the processor 580 for processing, and transmits
uplink data to the base station. Usually, the RF circuit
510 includes, but is not limited to, an antenna, at least one
amplifier, a transceiver, a coupler, a low noise amplifier (LNA),
and a duplexer. In addition, the RF circuit 510 may also
communicate with a network and another device through wireless
communication. The wireless communication may use any communication
standard or protocol, including but not limited to Global System
for Mobile Communication (GSM), general packet radio service
(GPRS), Code Division Multiple Access (CDMA), Wideband Code
Division Multiple Access (WCDMA), Long Term Evolution (LTE), email,
Short Messaging Service (SMS), and the like.
[0435] The memory 520 may be configured to store a software program
and module. The processor 580 runs the software program and module
stored in the memory 520, to implement various functional
applications and data processing of the mobile phone. The memory 520
may mainly include a program storage area and a data storage area.
The program storage area may store an operating system, an
application program required by at least one function (such as a
sound playback function and an image display function), and the
like. The data storage area may store data (such as audio data and
an address book) created according to the use of the mobile phone,
and the like. In addition, the memory 520 may include a high-speed
random access memory, and may also include a non-volatile memory,
such as at least one magnetic disk storage device, a flash memory,
or another non-volatile solid-state storage device.
[0436] The input unit 530 may be configured to receive input digit
or character information, and generate a keyboard signal input
related to the user setting and function control of the mobile
phone. Specifically, the input unit 530 may include a touch panel
531 and another input device 532. The touch panel 531, which may
also be referred to as a touch screen, may collect a touch
operation of a user on or near the touch panel (such as an
operation of a user on the touch panel 531 or near the touch panel
531 by using any suitable object or accessory such as a finger or a
stylus), and drive a corresponding connection apparatus according
to a preset program. In an embodiment, the touch panel 531 may
include two parts: a touch detection apparatus and a touch
controller. The touch detection apparatus detects a touch position
of the user, detects a signal generated by the touch operation, and
transfers the signal to the touch controller. The touch controller
receives the touch information from the touch detection apparatus,
converts the touch information into touch point coordinates, and
transmits the touch point coordinates to the processor 580.
Moreover, the touch controller can receive and execute a command
transmitted from the processor 580. In addition, the touch panel
531 may be implemented by using various types, such as a resistive
type, a capacitive type, an infrared type, and a surface acoustic
wave type. In addition to the touch panel 531, the input unit 530
may further include the another input device 532. Specifically, the
another input device 532 may include, but is not limited to, one or
more of a physical keyboard, a functional key (such as a volume
control key or a switch key), a track ball, a mouse, and a
joystick.
[0437] The display unit 540 may be configured to display
information inputted by the user or information provided for the
user, and various menus of the mobile phone. The display unit 540
may include a display panel 541. In an embodiment, the display
panel 541 may be configured by using a liquid crystal display
(LCD), an organic light-emitting diode (OLED), or the like.
Further, the touch panel 531 may cover the display panel 541. After
detecting a touch operation on or near the touch panel 531, the
touch panel transfers the touch operation to the processor 580, to
determine a type of a touch event. Then, the processor 580 provides
a corresponding visual output on the display panel 541 according to
the type of the touch event. Although in FIG. 19, the touch panel
531 and the display panel 541 are used as two separate parts to
implement input and output functions of the mobile phone, in some
embodiments, the touch panel 531 and the display panel 541 may be
integrated to implement the input and output functions of the
mobile phone.
[0438] The mobile phone may further include at least one sensor 550
such as an optical sensor, a motion sensor, and other sensors.
Specifically, the optical sensor may include an ambient light
sensor and a proximity sensor. The ambient light sensor may adjust
luminance of the display panel 541 according to brightness of the
ambient light. The proximity sensor may switch off the display
panel 541 and/or backlight when the mobile phone is moved to the
ear. As one type of motion sensor, an acceleration sensor can
detect magnitude of accelerations in various directions (generally
on three axes), may detect magnitude and a direction of the gravity
when static, and may be applied to an application that recognizes
the attitude of the mobile phone (for example, switching between
landscape orientation and portrait orientation, a related game, and
magnetometer attitude calibration), a function related to vibration
recognition (such as a pedometer and a knock), and the like. Other
sensors, such as a gyroscope, a barometer, a hygrometer, a
thermometer, and an infrared sensor, which may be configured in the
mobile phone, are not further described herein.
[0439] The audio circuit 560, a speaker 561, and a microphone 562
may provide audio interfaces between a user and the mobile phone.
The audio circuit 560 may convert received audio data into an
electrical signal and transmit the electrical signal to the speaker
561. The speaker 561 converts the electrical signal into a sound
signal for output. On the other hand, the microphone 562 converts a
collected sound signal into an electrical signal. The audio circuit
560 receives the electrical signal, converts the electrical signal
into audio data, and outputs the audio data to the processor 580
for processing. Then, the processor transmits the audio data to,
for example, another mobile phone by using the RF circuit 510, or
outputs the audio data to the memory 520 for further
processing.
[0440] Wi-Fi is a short distance wireless transmission technology.
The mobile phone may help, by using the Wi-Fi module 570, a user
receive and transmit an email, browse a web page, access streaming
media, and the like. This provides wireless broadband Internet
access for the user. Although FIG. 19 shows the Wi-Fi module 570,
it may be understood that the Wi-Fi module is not a necessary
component of the mobile phone, and the Wi-Fi module may be omitted
as required without departing from the essence of the present
disclosure.
[0441] The processor 580 is a control center of the mobile phone,
and is connected to various parts of the entire mobile phone by
using various interfaces and lines. By running or executing a
software program and/or module stored in the memory 520, and
invoking data stored in the memory 520, the processor executes
various functions of the mobile phone and performs data processing,
thereby monitoring the entire mobile phone. In an embodiment, the
processor 580 may include one or more processing units. In an
embodiment, the processor 580 may integrate an application
processor and a modem. The application processor mainly processes an
operating system, a user interface, an application program, and the
like. The modem mainly processes wireless communication. It may be
understood that the foregoing modem may alternatively not be
integrated into the processor 580.
[0442] The mobile phone further includes the power supply 590 (such
as a battery) for supplying power to the components. In an
embodiment, the power supply may be logically connected to the
processor 580 by using a power management system, thereby
implementing functions such as charging, discharging, and power
consumption management by using the power management system.
[0443] Although not shown in the figure, the mobile phone may
further include a camera, a Bluetooth module, and the like, which
are not further described herein.
[0444] In this embodiment of the present disclosure, the processor
580 included in the terminal also has the functions of implementing
the foregoing method embodiments.
[0445] FIG. 20 is a schematic structural diagram of a server
according to an embodiment of this disclosure. The server 600 may
vary greatly due to different configurations or performance, and
may include one or more central processing units (CPUs) 622 (for
example, one or more processors) and a memory 632, and one or more
storage media 630 (for example, one or more mass storage devices)
that store application programs 642 or data 644. The memory 632 and
the storage medium 630 may be transient or persistent storages. A
program stored in the storage medium 630 may include one or more
modules (which are not marked in the figure), and each module may
include a series of instruction operations on the server. Further,
the CPU 622 may be set to communicate with the storage medium 630,
and perform, on the server 600, the series of instruction
operations in the storage medium 630.
[0446] The server 600 may further include one or more power
supplies 626, one or more wired or wireless network interfaces 650,
one or more input/output (I/O) interfaces 658, and/or one or more
operating systems 641 such as Windows Server.TM., Mac OS X.TM.,
Unix.TM., Linux.TM., or FreeBSD.TM..
[0447] The steps performed by the server in the foregoing
embodiment may be based on the server structure shown in FIG.
20.
[0448] In this embodiment of this disclosure, the CPU 622 included
in the server has the function of implementing the foregoing method
embodiment.
[0449] Persons skilled in the art may clearly understand that, for
the purpose of convenient and brief description, for a detailed
working process of the system, apparatus, and unit described above,
refer to a corresponding process in the method embodiments, and
details are not described herein again.
[0450] In the embodiments provided in this disclosure, it is to be
understood that the disclosed system, apparatus, and method may be
implemented in other manners. For example, the described apparatus
embodiment is merely an example. For example, the unit division is
merely logical function division and may be other division during
actual implementation. For example, a plurality of units or
components may be combined or integrated into another system, or
some features may be ignored or not performed. In addition, the
displayed or discussed mutual couplings or direct couplings or
communication connections may be implemented by using some
interfaces. The indirect couplings or communication connections
between the apparatuses or units may be implemented in electronic,
mechanical, or other forms.
[0451] The units described as separate components may or may not be
physically separated, and the components displayed as units may or
may not be physical units, and may be located in one place or may
be distributed over multiple network units. Some or all of the
units may be selected according to actual needs to achieve the
objectives of the solutions of the embodiments.
[0452] In addition, functional units in the embodiments of this
disclosure may be integrated into one processing unit, or each of
the units may be physically separated, or two or more units may be
integrated into one unit. The integrated unit may be implemented in
the form of hardware, or may be implemented in a form of a software
functional unit.
[0453] When the integrated unit is implemented in the form of a
software functional unit and sold or used as an independent
product, the integrated unit may be stored in a computer-readable
storage medium. Based on such an understanding, the technical
solutions of this disclosure essentially, or the part contributing
to the prior art, or all or some of the technical solutions may be
implemented in the form of a software product. The computer
software product is stored in a storage medium and includes several
instructions for instructing a computer device (which may be a PC,
a server, or a network device) to perform all or some of the steps
of the methods described in the embodiments of this disclosure. The
foregoing storage medium includes: any medium that can store
program code, such as a USB flash drive, a removable hard disk, a
read-only memory (ROM), a random access memory (RAM), a magnetic
disk, or an optical disc.
[0454] The foregoing embodiments are merely intended for describing
the technical solutions of this disclosure, but not for limiting
this disclosure. Although this disclosure is described in detail
with reference to the foregoing embodiments, persons of ordinary
skill in the art should understand that they may still make
modifications to the technical solutions described in the foregoing
embodiments or make equivalent replacements to some technical
features thereof, without departing from the spirit and scope of
the technical solutions of the embodiments of this disclosure.
* * * * *