Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
github.com/vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
https://github.com/vllm-project/vllm
[New Model]: SparseLLM/prosparse-llama-2-7b
eljrte opened this issue about 2 months ago
[Frontend] Multi-Modality Support for Loading Local Image Files
chaunceyjiang opened this pull request about 2 months ago
[Bug]: awq marlin error for deepseek v2 lite
TechxGenus opened this issue about 2 months ago
[Hardware][CPU] Update torch 2.5
bigPYJ1151 opened this pull request about 2 months ago
[Bug]: Mistral 'SentencePieceTokenizer' object has no attribute 'id_to_byte_piece'
liziniu opened this issue about 2 months ago
[Bug]: vllm 0.6.3.post1 does not work with `response_format`
Quang-elec44 opened this issue about 2 months ago
[Bugfix] Respect modules_to_not_convert within awq_marlin
mgoin opened this pull request about 2 months ago
[Bugfix] Fix layer skip logic with bitsandbytes
mgoin opened this pull request about 2 months ago
[Kernels] Add an inductor pass to rewrite and fuse collective communication ops with gemms
bnellnm opened this pull request about 2 months ago
[Bugfix]Using the correct type hints
gshtras opened this pull request about 2 months ago
add github action to build and push cpu-inference image
br3no opened this pull request about 2 months ago
[Kernels] Add an inductor pass to rewrite and fuse collective communication ops with gemms (WIP not for review)
bnellnm opened this pull request about 2 months ago
fix point in path
youkaichao opened this pull request about 2 months ago
[Frontend] Add max_tokens prometheus metric
tomeras91 opened this pull request about 2 months ago
[V1] Complete v1 sample and prompt logprobs support
afeldman-nm opened this pull request about 2 months ago
[Usage]: Hidden States not working in Speculative Decode
ChiKaWa3077 opened this issue about 2 months ago
[Bug]: seq_group_metadata.encoder_seq_data.get_len() AttributeError: 'NoneType' object has no attribute 'get_len'
bingwork opened this issue about 2 months ago
[Neuron] Skip model attention head compatibility check with tensor pa…
sssrijan-amazon opened this pull request about 2 months ago
[torch.compile] Adding torch compile annotations to some models
CRZbulabula opened this pull request about 2 months ago
[Bug]: Running on a single machine with multiple GPUs error
Wiselnn570 opened this issue about 2 months ago
[Bug]: Function calling with Qwen & Streaming ('NoneType' object has no attribute 'get')
githebs opened this issue about 2 months ago
[New Model]: Are there plans to support speech-generation large models such as GPT-SoVITS?
wyp19960713 opened this issue about 2 months ago
[Bug]: You are using a model of type qwen2_vl to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
cqray1990 opened this issue about 2 months ago
[V1] Support VLMs with fine-grained scheduling
WoosukKwon opened this pull request about 2 months ago
[Bug]: LLaVA-v1.5-13B with OpenAI compatible API reported an error
ligeng0197 opened this issue about 2 months ago
[Doc] Update Qwen documentation
jeejeelee opened this pull request about 2 months ago
[New Model]: NV-Embed-v2
warlockedward opened this issue about 2 months ago
[torch.compile] rework test plans
youkaichao opened this pull request about 2 months ago
[Installation]: pynvml.NVMLError_InvalidArgument: Invalid Argument
jedi0605 opened this issue about 2 months ago
[Misc] Remove deprecated arg for cuda graph capture
ywang96 opened this pull request about 2 months ago
[Performance]: Qwen2-VL-7B AWQ model performance
zzf2grx opened this issue about 2 months ago
[Prototype][WIP] Prefix Cache Aware Scheduling for V0
rickyyx opened this pull request about 2 months ago
[Bugfix/Core] Remove assertion for Flashinfer k_scale and v_scale
pavanimajety opened this pull request about 2 months ago
[Frontend] Pythonic tool parser
mdepinet opened this pull request about 2 months ago
[Kernel][Triton] Add Triton implementation for scaled_mm_triton to support int8 SmoothQuant, symmetric case
rasmith opened this pull request about 2 months ago
[V1] Multiprocessing Tensor Parallel Support for v1
tlrmchlsmth opened this pull request about 2 months ago
[Kernel] Initial Machete W4A8 support + Refactors
LucasWilkinson opened this pull request about 2 months ago
[help wanted]: add sliding window support for flashinfer
youkaichao opened this issue about 2 months ago
[Bugfix] Fix chunked prefill with model dtype float32 on Turing Devices
wallashss opened this pull request about 2 months ago
[CI/Build] Adding a forced docker system prune to clean up space
Alexei-V-Ivanov-AMD opened this pull request about 2 months ago
[Bug]: could not broadcast input array from shape (944,) into shape (512,)
olegkhr opened this issue about 2 months ago
[New Model]: BAAI/bge-m3
javiplav opened this issue about 2 months ago
[CI/Build] Add Model Tests for Qwen2-VL
alex-jw-brooks opened this pull request about 2 months ago
[Feature]: Online video support for VLMs
litianjian opened this issue about 2 months ago
[Bug]: api_server.py: error: unrecognized arguments: --task embedding
javiplav opened this issue about 2 months ago
[BugFix][Kernel] Fix Illegal memory access in causal_conv1d in H100
mzusman opened this pull request about 2 months ago
[Bug] params Type is not right?
cqray1990 opened this issue about 2 months ago
[Usage]: How to use GLM4v multi_modal_data to make Multi-turn dialogue
Jimmy-L99 opened this issue about 2 months ago
[Usage]: ValueError: Unexpected weight for Qwen2-VL GPTQ 4-bit custom model.
bhavyajoshi-mahindra opened this issue about 2 months ago
[Feature]: host wheel via pypi index?
youkaichao opened this issue about 2 months ago
[Bug]: Interesting finding: The official pip package v0.6.3 is broken. However, installing `https://vllm-wheels.s3.us-west-2.amazonaws.com/nightly/vllm-1.0.0.dev-cp38-abi3-manylinux1_x86_64.whl` fixes this issue. (`vLLM API server version 0.6.3.post2.dev139+g622b7ab9`)
Wiselnn570 opened this issue about 2 months ago
[V1] `AsyncLLM` Implementation
robertgshaw2-neuralmagic opened this pull request about 2 months ago
[Bug]: The Qwen series models produce garbled output when generating long texts.
hongqing1986 opened this issue about 2 months ago
[Kernel][ROCm][AMD] fp8 moe configs for MI300X. Mixtral-8x(7B,22B) TP=1,2,4,8
divakar-amd opened this pull request about 2 months ago
[Model] Support quantization of Qwen2VisionTransformer for Qwen2-VL
mgoin opened this pull request about 2 months ago
[CI/Build] Add Model Tests for PixtralHF
mgoin opened this pull request about 2 months ago
[Feature]: Integrate Writing in the Margins inference pattern ($5,000 Bounty)
melisa-writer opened this issue about 2 months ago
[Bugfix][Quantization]Fix support for non quantized visual layers in otherwise quantized mllama model, including missing scaling factors
gshtras opened this pull request about 2 months ago
[Bug]: Can't start `vllm serve microsoft/Phi-3-small-8k-instruct`
Yuto-24 opened this issue about 2 months ago
[Usage]: prefix caching support for multimodal models
mearcstapa-gqz opened this issue about 2 months ago
[Feature]: Is it supported Qwen2.5 tool_choice: auto?
deku0818 opened this issue about 2 months ago
[Performance]: Sampler account for most of time comparing to prefill and decode
zhjunqin opened this issue about 2 months ago
[torch.compile] Simplify exception trace in compilation tests
CRZbulabula opened this pull request about 2 months ago
[Bug]: ModuleNotFoundError: No module named 'openai.types'
StevenTang1998 opened this issue about 2 months ago
[Hardware] using current_platform.seed_everything
wangshuai09 opened this pull request about 2 months ago
[Usage]: how to return logits
psh0628-eng opened this issue about 2 months ago
Prototyping `LLMEngineCore`
robertgshaw2-neuralmagic opened this pull request about 2 months ago
[Frontend]: enable state callbacks for offline inference
sethkimmel3 opened this pull request about 2 months ago
[Misc] Refactor benchmark_throughput.py
lk-chen opened this pull request about 2 months ago
trigger
robertgshaw2-neuralmagic opened this pull request about 2 months ago
[Bug]: [help wanted] MoE + TP + custom allreduce bug
youkaichao opened this issue about 2 months ago
[Doc] fix third-party model example
russellb opened this pull request about 2 months ago
[Bug]: Nonsense output for Qwen2.5 72B after upgrading to latest vllm 0.6.3.post1 [REPROs]
pseudotensor opened this issue about 2 months ago
[perf bench] H200 development
simon-mo opened this pull request about 2 months ago
[Model] Add Idefics3 support
jeejeelee opened this pull request about 2 months ago
[Model][Quantization] HQQ support through Marlin kernel expansion
ElizaWszola opened this pull request about 2 months ago
[Bug]: OSError: [Errno 98] Address already in use
ericxsun opened this issue about 2 months ago
[Model] add tool parser for openbmb/MiniCPM3-4B
Cppowboy opened this pull request about 2 months ago
[Feature]: Qwen2.5 model : ValueError: This model does not support the 'embedding' task. Supported tasks: {'generate'}
tarikbeijing opened this issue about 2 months ago
[Frontend] Chat-based Embeddings API
DarkLight1337 opened this pull request about 2 months ago
[Bug]: Llama-3.2-11B-Vision-Instruct Inference Can't Stop
sudanl opened this issue about 2 months ago
[xpu] Use allreduce to replace gather can reduce extra cat
ys950902 opened this pull request about 2 months ago
[Model] Add support for H2OVL-Mississippi models
cooleel opened this pull request about 2 months ago
Bump actions/checkout from 4.2.1 to 4.2.2
dependabot[bot] opened this pull request about 2 months ago
Bump actions/setup-python from 5.2.0 to 5.3.0
dependabot[bot] opened this pull request about 2 months ago
[Usage]:
Slicknuts23 opened this issue about 2 months ago
[Feature]: OpenAI supply image path
SinanAkkoyun opened this issue about 2 months ago
[misc] use out argument for flash attention
youkaichao opened this pull request about 2 months ago
[Bug]: ValueError: At most 1 image(s) may be provided in one request.
eav-solution opened this issue about 2 months ago
[Bug]: offline inference with ray fails on multinode
gpucce opened this issue about 2 months ago
[Bug]: "Address already in use" for 1 minute after crash (since 0.6.2)
hibukipanim opened this issue about 2 months ago
[Usage]: MistralModel architecture not supported
dequeueing opened this issue about 2 months ago
[Hardware][NVIDIA] Add non-NVML CUDA mode for Jetson
conroy-cheers opened this pull request about 2 months ago
Fix cache management in "Close inactive issues and PRs" actions workflow
hmellor opened this pull request about 2 months ago
[Usage]: It seems succeed to deploy llm. but I get errors when "curl http://127.0.0.1:8011/v1/models"
gaohang opened this issue about 2 months ago
[Bug]: Qwen2-VL incoherent output with OpenAI API
SinanAkkoyun opened this issue about 2 months ago
[Bug]: tensor parallelism multinode
gpucce opened this issue about 2 months ago
[Bugfix][SpecDecode] kv corruption with bonus tokens in spec decode
llsj14 opened this pull request about 2 months ago
[Bug]: Bfloat16 or Half are not compatible with HF float16/bfloat16 result.
jason9693 opened this issue about 2 months ago
[Bug]: Jetson support regression
conroy-cheers opened this issue about 2 months ago