FIELD: physics.
SUBSTANCE: the invention relates to machine learning models that synthesize video clips from text descriptions. A method of generating a video clip from a text description includes the steps of: receiving a text description of the video clip to be generated; obtaining, from the received text description, a vector representation e of the text description in the vector space of a pre-trained neural network model that links images and text descriptions; obtaining a vector representation of size 2×N×L of a sequence of key points of the video clip to be generated, which is synthesized by a trained diffusion motion model based on the vector representation e of the text description, where 2 is the number of coordinates, the first coordinate refers to the frame height H and the second coordinate refers to the frame width W, N is the number of key points in each frame, and L is the number of frames in the video clip; mapping the vector representation of the sequence of key points into a series of L two-dimensional key-point images, wherein each two-dimensional key-point image in said series corresponds to a respective frame of the video clip to be generated; and generating a sequence of video clip frames using a pre-trained stable diffusion model, wherein the generation of each frame by the stable diffusion model is additionally controlled by a controlling neural network model based on the two-dimensional key-point image of the corresponding frame from said series.
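The mapping step, in which the 2×N×L key-point representation becomes L two-dimensional images, can be illustrated with a minimal sketch. The abstract does not specify how the images are drawn, so the following are assumptions made only for illustration: the coordinates are already in pixel units, each key point is marked as a single pixel in a binary H×W image, and the name `keypoints_to_images` is hypothetical.

```python
from typing import List

Keypoints = List[List[List[float]]]  # shape [2][N][L]: coordinate, key point, frame
Image = List[List[int]]              # shape [H][W]; 1 marks a key point


def keypoints_to_images(kp: Keypoints, height: int, width: int) -> List[Image]:
    """Rasterize a 2 x N x L key-point array into L binary H x W images."""
    if len(kp) != 2:
        raise ValueError("expected exactly two coordinate planes")
    rows, cols = kp  # first coordinate: along height H; second: along width W
    n_points = len(rows)
    n_frames = len(rows[0]) if n_points else 0
    images: List[Image] = []
    for f in range(n_frames):
        img = [[0] * width for _ in range(height)]
        for p in range(n_points):
            # Assumption: pixel-unit coordinates; round, then clamp to the frame.
            y = min(max(int(round(rows[p][f])), 0), height - 1)
            x = min(max(int(round(cols[p][f])), 0), width - 1)
            img[y][x] = 1
        images.append(img)
    return images


# Toy input with N = 2 key points and L = 2 frames in a 4 x 4 frame.
kp = [
    [[0, 1], [2, 3]],  # height coordinate of each key point, per frame
    [[0, 1], [3, 0]],  # width coordinate of each key point, per frame
]
frames = keypoints_to_images(kp, height=4, width=4)
```

Each element of `frames` would then serve as the per-frame conditioning image for the controlling network. A practical implementation would likely draw larger markers or connecting lines rather than single pixels.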
EFFECT: possibility of synthesizing a high-quality video clip from a text description without using reference video clips.
18 cl, 9 dwg
Title | Year | Author | Number |
---|---|---|---|
METHOD AND SYSTEM FOR TRAINING CHATBOT SYSTEM | 2023 | | RU2820264C1 |
METHOD AND SYSTEM FOR RECOGNIZING USER'S SPEECH FRAGMENT | 2021 | | RU2808582C2 |
METHOD AND SYSTEM FOR SEARCHING GRAPHIC IMAGES | 2022 | | RU2807639C1 |
METHOD FOR CONSTRUCTING A DEPTH MAP FROM A PAIR OF IMAGES | 2022 | | RU2806009C2 |
NEURAL NETWORKS WITH ATTENTION-BASED SEQUENCE TRANSFORMATION | 2018 | | RU2749945C1 |
METHOD AND SERVER FOR PERFORMING PROBLEM-ORIENTED TRANSLATION | 2021 | | RU2820953C2 |
METHOD AND SYSTEM FOR DETERMINING SYNTHETICALLY MODIFIED FACE IMAGES ON VIDEO | 2021 | | RU2768797C1 |
METHOD OF SYNTHESIS OF A TWO-DIMENSIONAL IMAGE OF A SCENE VIEWED FROM A REQUIRED VIEW POINT AND ELECTRONIC COMPUTING APPARATUS FOR IMPLEMENTATION THEREOF | 2020 | | RU2749749C1 |
METHOD FOR GENERATING MATHEMATICAL MODELS OF A PATIENT USING ARTIFICIAL INTELLIGENCE TECHNIQUES | 2017 | | RU2720363C2 |
NEURAL NETWORK TRANSFER OF THE FACIAL EXPRESSION AND POSITION OF THE HEAD USING HIDDEN POSITION DESCRIPTORS | 2020 | | RU2755396C1 |
Dates
2024-07-22—Published
2024-01-18—Filed