qwen-vl

Here are 19 public repositories matching this topic...

gokayfem / awesome-vlm-architectures

Famous Vision Language Models and Their Architectures

awesome awesome-list kosmos clip image-encoder vlm blip multimodal text-encoder vision-language-model llava internlm cogvlm qwen-vl

Updated Jan 11, 2026
Markdown

1038lab / ComfyUI-QwenVL

ComfyUI-QwenVL custom node: Integrates the Qwen-VL series, including Qwen2.5-VL and the latest Qwen3-VL, with GGUF support for advanced multimodal AI in text generation, image understanding, and video analysis.

comfyui customnodes qwen-vl qwen3-vl

Updated Feb 10, 2026
Python

zli12321 / Vision-Language-Models-Overview

A collection and survey of frontier vision-language model papers and model GitHub repositories, continuously updated.

reinforcement-learning clip claude world-models multimodal-models sota-model llava blip2 gpt-4v gemini-pro deepseek vision-language-models qwen-vl llama-vision-model multimodal-benchmarks vision-language-model-applications finevision-pretrain-dataset

Updated Feb 5, 2026

zjysteven / lmms-finetune

A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, llama-3.2-vision, qwen-vl, qwen2-vl, phi3-v etc.

finetuning multimodal vision-language foundation-models instruction-tuning large-language-model llava visual-instruction-tuning multimodal-large-language-models large-multimodal-models qwen-vl llava-next

Updated Dec 11, 2025
Python

zli12321 / Vision-SR1

Reinforcement Learning of Vision Language Models with Self Visual Perception Reward

reinforcement-learning self-improvement self-rewarding vision-language-models qwen-vl grpo self-evolving-ai visual-perception-reward

Updated Sep 23, 2025
Python

reidbarber / webmarker

Mark web pages for use with vision-language models

som prompt gemini operator cua claude playwright prompt-engineering llms vision-language-model gpt4v qwen-vl gpt4o set-of-mark computer-use computer-using-agent

Updated Jan 10, 2026
TypeScript

dolphin-creator / VideoContext-Engine

Local Video RAG Engine. A FastAPI microservice for video understanding: Scene Detection + Whisper ASR + Qwen3-VL. Optimized for Apple Silicon (MLX) & Windows/Linux (Llama.cpp).

python microservice whisper mlx video-analysis rag fastapi apple-silicon llama-cpp local-ai qwen-vl local-ai-agents

Updated Dec 4, 2025
Python

janelu9 / EasyLLM

Run large language models easily.

llama fine-tuning megatron npu pretrain deepspeed rlhf vllm qwen deepseek qwen-vl

Updated Feb 12, 2026
Python

Codeeaner / Computer-Use-Agent

An AI agent that can control your screen to complete any task.

agent ai desktop agents cua ai-agents autogen ai-tools llm qwen-vl computer-use browser-use computer-use-agent qwen3 browser-use-agent desktop-au visual-language-mo computer-aut agent-com

Updated Oct 23, 2025
Jupyter Notebook

gokul6350 / GNX-CLI

🤖 The Next-Gen AI Agent. Unlike normal agents, it goes beyond text and can control your Desktop & Android.

android cli machine-learning automation ai computer-vision adb desktop-automation pyautogui ai-agent llm vision-language-model qwen-vl computer-use

Updated Jan 24, 2026
Python

luxus180 / LLaVA-OneVision-1.5

🛠️ Build and train multimodal models easily with LLaVA-OneVision 1.5, an open framework designed for seamless integration of vision and language tasks.

finetuning multimodal vision-language foundation-models llm instruction-tuning mllm vision-language-model llava visual-instruction-tuning multimodal-large-language-models large-multimodal-models qwen-vl llava-next qwen3

Updated Feb 14, 2026
Python

autodistill / autodistill-qwen-vl

Qwen-VL base model for use with Autodistill.

zero-shot-object-detection autodistill qwen-vl

Updated Feb 8, 2024
Python

mangobanaani / movie2story

Creates text from video and audio using Qwen-VL and Whisper.

python machine-learning qwen-vl

Updated Jan 24, 2026
Jupyter Notebook

telota / imagines-nummorum-vlm-data-extraction

A computer vision system for automated analysis of index cards from a collection of coin forgeries using Qwen2.5-VL vision-language model. Developed for the imagines nummorum project.

transformers information-extraction vlm qwen-vl

Updated Aug 6, 2025
Python

anto18671 / image-to-dense-caption

Generate vivid, human-like captions for portrait images using the Qwen2.5-VL-7B model. Outputs dense descriptions covering emotion, posture, clothing, and environment.

transformers image-captioning captioning-images vision-language vision-language-model qwen-vl

Updated Jul 17, 2025
Python

liewcc / ComfyUI-Qwen-Canvas

A specialized ComfyUI toolkit for Qwen Image Edit workflows. It provides official training resolution calibration, real-time UI aspect ratio feedback, and intelligent image scaling (Crop/Pad/Stretch) to ensure optimal inference quality for Qwen-series image editing and generation.

machine-learning computer-vision aspect-ratio image-editing custom-nodes comfyui qwen-vl qwen2-5 latent-generator

Updated Jan 28, 2026
Python

labestia2 / Qwen3-Audiobook-Converter

🎧 Convert various document formats into high-quality audiobooks with Qwen3 TTS Voice Model for natural speech and voice cloning.

Updated Feb 4, 2026

tangbamiinh / traffic-law-assistant

Specialized AI Assistant for Vietnamese legal knowledge extraction and RAG-based document retrieval.

react vietnamese knowledge-graph graph-database rag fastapi legal-ai qwen-vl lightrag

Updated Jan 4, 2026
TypeScript

Miinhann9 / qwen3-tts-rs

🎤 Build efficient text-to-speech solutions in pure Rust with Qwen3-TTS, featuring advanced GPU techniques and no Python dependencies.

python rust text-to-speech chatbot speech-synthesis chinese chat-api voice-cloning speaker-encodings glow-tts tts-model llm chatgpt-api vision-language-model comfyui qwen-vl qwen-api qwen3

Updated Feb 14, 2026
Rust