- cross-posted to:
- localllama@sh.itjust.works
X-post : https://aussie.zone/post/23348593 by @Eyekaytee@aussie.zone
🚀 Meet Qwen-Image — a 20B MMDiT model for next-gen text-to-image generation. Especially strong at creating stunning graphic posters with native text. Now open-source.
🔍 Key Highlights:
🔹 SOTA text rendering — rivals GPT-4o in English, best-in-class for Chinese
🔹 In-pixel text generation — no overlays, fully integrated
🔹 Bilingual support, diverse fonts, complex layouts
🎨 Also excels at general image generation — from photorealistic to anime, impressionist to minimalist. A true creative powerhouse.
Blog: https://qwenlm.github.io/blog/qwen-image/
Hugging Face: https://huggingface.co/Qwen/Qwen-Image
Model Scope: https://modelscope.cn/models/Qwen/Qwen-Image/summary
GitHub: https://github.com/QwenLM/Qwen-Image
Technical Report: https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/Qwen_Image.pdf
WaveSpeed Demo: https://wavespeed.ai/models/wavespeed-ai/qwen-image/text-to-image
Demo: https://modelscope.cn/aigc/imageGeneration?tab=advanced
Would love to see someone running this on the Ryzen AI Max 395. Running this locally would be pretty awesome.
What kind of hardware is needed to run models like this?
The page says a generation takes almost 2 minutes on an A100, and the model is nearly 30 GB in size.
So probably only top consumer-grade GPUs, like the 5090, could run it, and at fairly slow inference speeds. At least the raw model.
40 GB+ of VRAM
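A quick back-of-envelope check on that VRAM figure (a rough sketch counting weights only; it ignores activations, the VAE, and the text encoder, so real usage will be higher):

```python
# Rough VRAM estimate for just the weights of a 20B-parameter model.
# Ignores activations, VAE, and text-encoder overhead (assumption).
def weight_vram_gb(params_billion: float, bytes_per_param: float) -> float:
    return params_billion * 1e9 * bytes_per_param / 1024**3

bf16_gb = weight_vram_gb(20, 2)  # bf16/fp16: 2 bytes per parameter
int8_gb = weight_vram_gb(20, 1)  # 8-bit quantized: 1 byte per parameter
print(f"bf16: ~{bf16_gb:.0f} GB, 8-bit: ~{int8_gb:.0f} GB")
```

So ~37 GB for the bf16 weights alone, which is why 40 GB+ cards (or quantized variants) come up; an 8-bit quant would roughly halve that.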