Recent advancements in text-to-image generation have been driven by diffusion models, but single-stage models face challenges in computational efficiency and image detail refinement. To address this, the authors propose CogView3, a cascaded framework that enhances text-to-image diffusion by first creating low-resolution images and then applying relay-based super-resolution. This approach results in competitive text-to-image outputs while greatly reducing training and inference costs. Experimental results show that CogView3 outperforms the current state-of-the-art open-source text-to-image diffusion model, SDXL, by 77.0% in human evaluations, and its distilled variant achieves comparable performance while using only 1/10 of the inference time.
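To make the cascade idea concrete, here is a minimal toy sketch of the two-stage flow described above: a base stage produces a low-resolution image, and a second stage refines an upsampled copy of it. It assumes that "relay-based" means the second stage starts from a noised upsampling of the low-resolution result rather than from pure noise; that reading, along with every name below (cascade_generate, toy_base_stage, toy_refine_stage, relay_sigma), is a hypothetical illustration and not the authors' implementation. The two stages are stand-ins, not real diffusion models.

```python
import random
from typing import Callable, List

Image = List[List[float]]  # toy grayscale image: rows of floats


def upsample_nearest(img: Image, factor: int) -> Image:
    """Nearest-neighbour upsampling: repeat every pixel `factor` times per axis."""
    return [
        [px for px in row for _ in range(factor)]
        for row in img
        for _ in range(factor)
    ]


def add_noise(img: Image, sigma: float, rng: random.Random) -> Image:
    """Add Gaussian noise with standard deviation `sigma` to every pixel."""
    return [[px + rng.gauss(0.0, sigma) for px in row] for row in img]


def cascade_generate(
    prompt: str,
    base_stage: Callable[[str], Image],
    refine_stage: Callable[[str, Image], Image],
    factor: int = 2,
    relay_sigma: float = 0.5,
    seed: int = 0,
) -> Image:
    """Two-stage cascade: low-resolution generation, then refinement."""
    rng = random.Random(seed)
    low_res = base_stage(prompt)  # stage 1: cheap low-resolution image
    # Assumed "relay" step: the second stage starts from the noised,
    # upsampled low-resolution image instead of from pure noise.
    relay_start = add_noise(upsample_nearest(low_res, factor), relay_sigma, rng)
    return refine_stage(prompt, relay_start)  # stage 2: super-resolution


def toy_base_stage(prompt: str) -> Image:
    """Stand-in for a low-resolution diffusion model: a flat 4x4 grey image."""
    return [[0.5] * 4 for _ in range(4)]


def toy_refine_stage(prompt: str, start: Image) -> Image:
    """Stand-in for the super-resolution model: clamp pixels into [0, 1]."""
    return [[min(1.0, max(0.0, px)) for px in row] for row in start]


image = cascade_generate("a brown kitten, watercolour", toy_base_stage, toy_refine_stage)
print(len(image), len(image[0]))  # 8 8
```

The point of the structure is that the expensive high-resolution stage only has to refine an image that already exists, which is one plausible reason a cascade can cost less than generating at full resolution in a single pass.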

AI image generation keeps getting more competitive: CogView-3-Plus brings an across-the-board quality upgrade
Original post by Bag Algorithm Notes
27 September 2024 10:01 Beijing
A friend at Zhipu AI told me that their text-to-image model has gone through round after round of iteration, upgrading from CogView to CogView-3-Plus, and that this time it is absolutely top-notch.

As I remember it, CogView is a product of the pre-large-model era. Even before LLMs arrived, text-to-image models, with SD as the best-known example, were already taking off. China started late and was slow to follow, though, and Chinese-built models often failed to understand Chinese, producing plenty of laughably literal misreadings of dish names.

For example, ‘Squirrel Mandarin Fish’

Pictures such as ‘Buddha Jumps Over the Wall’

Pictures such as ‘Donkey Meat Hotcakes’

Pictures such as ‘Beer Duck’


My friend said it really is different this time: "Let me first show you a portrait of a woman I generated."


It does have that feel: the details, the brushwork, and the overall look of the image keep getting better. Hand-picked cases often say very little, though. How well a model works in practice has to be judged carefully from a large number of test results across many dimensions and scenarios.

I said: hold off on the bragging. I have a long-cherished test set of my own that covers many angles, including scene, content, camera framing, style, and brushwork. Let me put the model through its paces.

Oil Painting Prompt: a classic oil painting depicting a blonde noblewoman in a gorgeous blue dress in the style of an oil painting.

Photography Prompt: a serene mountain lake in the black and white style of Ansel Adams, with the lake reflecting the surrounding pine forest. Morning fog, rolling hills in the distance, and faint morning light in the sky.


Watercolour Prompt: A brown kitten sleeping quietly curled up with soft fluffy fur, very cute, watercolour.



Sketch Prompt: detailed pencil sketch of an imposing tiger, standing on a grassy plain, with dense forest and mountains in the background, every muscle line of the animal is etched in vivid detail.


Crayon Prompt: A child’s crayon drawing of a family, a red house surrounded by a green meadow, the sun high in the sky and four smiling figures standing in front of the door, holding hands in a childlike manner.




Children’s Picture Book Prompt: a page of children’s picture book illustration of a little boy sitting on the moon with an open book in his hand, surrounded by a sky full of stars, the night sky reveals warmth and serenity.

Stamp Design Prompt: A stamp design depicting an eagle hunting, with a vast sky in the background and the eagle’s wings spread out in a majestic manner.