"--- title: StreamingT2V emoji: 🔥 colorFrom: purple colorTo: blue sdk: gradio sdk_version: 4.25.0 app_file: app.py pinned: false short_description: Consistent, Dynamic, and Extendable Long Video Generation --- This repository is the official implementation of StreamingT2V. **StreamingT2V: Consistent..."
## Usage

```shell
pip install gradio
git clone https://huggingface.co/spaces/PAIR/streamingt2v
```
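To run the demo locally after cloning, a minimal sketch (`app.py` is the entry point declared in the frontmatter; the presence of a `requirements.txt` in the Space is an assumption):

```shell
cd streamingt2v
pip install -r requirements.txt   # dependency file assumed to exist in the Space
python app.py                     # app_file declared in the frontmatter above
```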
# StreamingT2V
This repository is the official implementation of StreamingT2V.
**StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text**

Roberto Henschel, Levon Khachatryan, Daniil Hayrapetyan, Hayk Poghosyan, Vahram Tadevosyan, Zhangyang Wang, Shant Navasardyan, Humphrey Shi
arXiv preprint | Video | Project page
StreamingT2V is an advanced autoregressive technique that enables the creation of long videos with rich motion dynamics and no stagnation. It ensures temporal consistency throughout the video, aligns closely with the descriptive text, and maintains high frame-level image quality. Our demonstrations include videos of up to 1200 frames, spanning 2 minutes, and the method can be extended to even longer durations. Importantly, StreamingT2V is not tied to a specific Text2Video base model, so improvements in base models can yield even higher-quality videos.
## News

* [03/21/2024] Paper StreamingT2V released!
* [04/03/2024] Code and model released!
## Setup

1. Clone this repository and enter it:
    ```shell
    git clone https://github.com/Picsart-AI-Research/StreamingT2V.git
    cd StreamingT2V/
    ```
2. Install the requirements using Python 3.10 and CUDA >= 11.6 (a command sketch follows this list).
3. (Optional) Install FFmpeg if it's missing on your system.
4. Download the weights from HF and put them into the `t2v_enhanced/checkpoints` directory.
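A command-level sketch of steps 2-4, assuming a conda environment; the requirements file location and the exact HF weights repository are assumptions, so adjust them to match the repository:

```shell
# Step 2: environment (Python 3.10; CUDA >= 11.6 assumed present on the host)
conda create -n st2v python=3.10 -y
conda activate st2v
pip install -r requirements.txt   # path to the requirements file is an assumption

# Step 3 (optional): install FFmpeg if missing (Debian/Ubuntu example)
sudo apt-get install -y ffmpeg

# Step 4: fetch the weights into t2v_enhanced/checkpoints
# (the repo id PAIR/StreamingT2V is an assumption -- use the repo linked from HF)
huggingface-cli download PAIR/StreamingT2V --local-dir t2v_enhanced/checkpoints
```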
<h2 class="text-xl font-bold mt-8 mb-4 text-gray-900 dark:text-white">Inference</h2>
<h3 class="text-lg font-semibold mt-6 mb-3 text-gray-900 dark:text-white">For Text-to-Video</h3>
To use other base models, add the `--base_model=AnimateDiff` argument. Use `python inference.py --help` for more options.

### For Image-to-Video
<h2 class="text-xl font-bold mt-8 mb-4 text-gray-900 dark:text-white">Results</h2>
Detailed results can be found in the Project page.<h2 class="text-xl font-bold mt-8 mb-4 text-gray-900 dark:text-white">License</h2>
Our code is published under the CreativeML Open RAIL-M license.
We include ModelscopeT2V, AnimateDiff, and DynamiCrafter in the demo for research purposes and to demonstrate the flexibility of the StreamingT2V framework to incorporate different T2V/I2V models. For commercial usage of these components, please refer to their original licenses.
<h2 class="text-xl font-bold mt-8 mb-4 text-gray-900 dark:text-white">BibTeX</h2>
If you use our work in your research, please cite our publication:
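A reconstruction of the citation from the title and author list above; the arXiv identifier is an assumption and should be verified against the arXiv page:

```bibtex
@article{henschel2024streamingt2v,
  title={StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text},
  author={Henschel, Roberto and Khachatryan, Levon and Hayrapetyan, Daniil and Poghosyan, Hayk and Tadevosyan, Vahram and Wang, Zhangyang and Navasardyan, Shant and Shi, Humphrey},
  journal={arXiv preprint arXiv:2403.14773},
  year={2024}
}
```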