Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
github.com/vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
https://github.com/vllm-project/vllm
[New Model]: Mistral-Nemo
Hambaobao opened this issue 6 months ago
[Bug]: Failed to import from vllm._C with ImportError("/lib64/libc.so.6: version `GLIBC_2.32' not found
balcklive opened this issue 6 months ago
[Usage]: Can't utilize all VRAM for context
vlsav opened this issue 6 months ago
[Performance]: GPU utilization is low when running large batches on H100
sleepwalker2017 opened this issue 6 months ago
[ Misc ] `fbgemm` checkpoints
robertgshaw2-neuralmagic opened this pull request 6 months ago
[Bug]: Cannot load fp8 model of internlm2-chat-7b offline
EstellaXinyuZhang opened this issue 6 months ago
[Core] Allow specifying custom Executor
Yard1 opened this pull request 6 months ago
[RFC]: Single Program Multiple Data (SPMD) Worker Control Plane
ruisearch42 opened this issue 6 months ago
[Bug]: vllm doesn't support multi-instance GPU
cfhammill opened this issue 6 months ago
[ci][test] add correctness test for cpu offloading
youkaichao opened this pull request 6 months ago
[Model] Support Mistral-Nemo
mgoin opened this pull request 6 months ago
[ Kernel ] Enable Dynamic Per Token `fp8`
robertgshaw2-neuralmagic opened this pull request 6 months ago
[CI/Build] bump ruff version, fix linting issues
dtrifiro opened this pull request 6 months ago
[Feature]: mistralai/Mistral-Nemo-Instruct-2407 support
bjoernpl opened this issue 6 months ago
[Usage]: How to release GPU of vLLM model in python code
quanshr opened this issue 6 months ago
[Bugfix][CI/Build][Hardware][AMD] Fix AMD tests, add HF cache, update CK FA, add partially supported model notes
mawong-amd opened this pull request 6 months ago
[CI/Build] replace yapf with ruff
dtrifiro opened this pull request 6 months ago
[Misc] Consolidate and optimize logic for building padded tensors
DarkLight1337 opened this pull request 6 months ago
[Feature]: return Usage info for streaming request for each chunk in ChatCompletion
yecohn opened this issue 6 months ago
[Bug]: vllm turned off my pc (loading mixtral8x7b)
juanluis17 opened this issue 6 months ago
[Bug]: vllm not support fp8 kv cache when use flashinfer
kuangdao opened this issue 6 months ago
[Bugfix] Corrected Typographical Errors from "indicies" to "indices"
JHLEE17 opened this pull request 6 months ago
[Core] Reduce unnecessary compute when logprobs=None
peng1999 opened this pull request 6 months ago
[Bug]: inter-token latency is lower than TPOT in serving benchmark result
Jeffwan opened this issue 6 months ago
[doc][distributed] add more doc for setting up multi-node environment
youkaichao opened this pull request 6 months ago
[Misc] Support FP8 kv cache scales from compressed-tensors
mgoin opened this pull request 6 months ago
added bitsandbytes dependency in common requirement.txt file
dipatidar opened this pull request 6 months ago
[Misc] Small perf improvements
Yard1 opened this pull request 6 months ago
[Model] Pipeline Parallel Support for DeepSeek v2
tjohnson31415 opened this pull request 6 months ago
[Model] Initialize support for InternVL2 series models
Isotr0py opened this pull request 6 months ago
FP8 Dynamic-Per-Token Quant
varun-sundar-rabindranath opened this pull request 6 months ago
[DOC] - Add docker image to Cerebrium Integration
milo157 opened this pull request 6 months ago
[Usage]: No chat template provided. Chat API will not work. How do I get vllm to support Codellama-34B in openai format?
x0w3n opened this issue 6 months ago
[Feature]: Add OpenAI server `prompt_logprobs` support
Theodotus1243 opened this issue 6 months ago
[Bug]: The _get_stats() are called multiple times which cause incorrect metrics collecting in do_log_stats()
yejingfu opened this issue 6 months ago
[TPU] Refactor TPU worker & model runner
WoosukKwon opened this pull request 6 months ago
[Misc] Use `torch.Tensor` for type annotation
WoosukKwon opened this pull request 6 months ago
[TPU] Remove multi-modal args in TPU backend
WoosukKwon opened this pull request 6 months ago
[New Model]: Support for Telechat
hzhaoy opened this issue 6 months ago
[Model] Add Support for GPTQ Fused MOE
izhuhaoran opened this pull request 6 months ago
[Bugfix] Update flashinfer.py with PagedAttention forwards - Fixes Gemma2 OpenAI Server Crash
noamgat opened this pull request 6 months ago
[Bug]: When I use gemma2 27b, the openai.api returns content "" as none ChatCompletionMessage(content='', role='assistant', function_call=None, tool_calls=[])
Minami-su opened this issue 6 months ago
deploying embedding model in same way as LLM
riyajatar37003 opened this issue 6 months ago
[core][model] yet another cpu offload implementation
youkaichao opened this pull request 6 months ago
[Bugfix] Fix for multinode crash on 4 PP
andoorve opened this pull request 6 months ago
[Bug]: The metrics have not improved.
zjjznw123 opened this issue 6 months ago
Sequence parallel
wbdr opened this pull request 6 months ago
[Not for review]test gemma lora
jeejeelee opened this pull request 6 months ago
[misc][distributed] add seed to dummy weights
youkaichao opened this pull request 6 months ago
[CI/Build] Update flashinfer to v0.0.9 (#6489)
170928 opened this pull request 6 months ago
[Misc] Updated flashinfer to v0.0.9 in the following test scripts:
170928 opened this issue 6 months ago
[misc][distributed] improve tests
youkaichao opened this pull request 6 months ago
[ Kernel ] Fp8 Channelwise Weight Support
robertgshaw2-neuralmagic opened this pull request 6 months ago
[Bug]: No module named `jsonschema.protocols`.
eff-kay opened this issue 6 months ago
[Spec Decode] Disable Log Prob serialization to CPU for spec decoding for both draft and target models.
sroy745 opened this pull request 6 months ago
[Model] Support Mamba
tlrmchlsmth opened this pull request 6 months ago
[Not for review] Spmd tp rebase
ruisearch42 opened this pull request 6 months ago
[ROCm] Cleanup Dockerfile and remove outdated patch
hongxiayang opened this pull request 6 months ago
[New Model]: Codestral Mamba
K-Mistele opened this issue 6 months ago
[Bug]: AttributeError: '_OpNamespace' '_C' object has no attribute 'rotary_embedding' / gemma-2-9b with vllm=0.5.2
choco9966 opened this issue 6 months ago
[Bug]: Gemma 27B crashes on GCP A100
noamgat opened this issue 6 months ago
[Bug]: [vllm-openvino]: ValueError: `use_cache` was set to `True` but the loaded model only supports `use_cache=False`.
HPUedCSLearner opened this issue 6 months ago
[Feature]: Pipeline parallelism support for qwen model
hiyforever opened this issue 6 months ago
[Usage]: PeftModelForCausalLM is not JSON serializable
jazzisfuture opened this issue 6 months ago
[Performance]: [Speculative Decoding] Measurement of Cost Coefficient through vLLM
bong-furiosa opened this issue 7 months ago
[Misc][Speculative decoding] Typos and typing fixes
ShangmingCai opened this pull request 7 months ago
[Bug]: failed when run Qwen2-54B-A14B-GPTQ-Int4(MOE)
weiminw opened this issue 7 months ago
unable to run vllm model deployment
riyajatar37003 opened this issue 7 months ago
[Bugfix][Frontend] Fix missing `/metrics` endpoint
DarkLight1337 opened this pull request 7 months ago
[Bug]: Can't load gemma-2-9b-it with vllm 0.5.2
vlsav opened this issue 7 months ago
[Bug]: No metrics exposed at /metrics with 0.5.2 (0.5.1 is fine), possible regression?
frittentheke opened this issue 7 months ago
[CI/Build] Remove "boardwalk" image asset
DarkLight1337 opened this pull request 7 months ago
[Bugfix] enable prefix caching for AsyncLLMEngine when requesting prompt_logprobs
KrishnaM251 opened this pull request 7 months ago
[Distributed][Model] Rank-based Component Creation for Pipeline Parallelism Memory Optimization
wushidonguc opened this pull request 7 months ago
[Misc] Log spec decode metrics
comaniac opened this pull request 7 months ago
[Bug]: vLLM is unable to load Mistral on Inferentia and AWS neuron
servient-ashwin opened this issue 7 months ago
[Model] H2O Danube3-4b
g-eoj opened this pull request 7 months ago
[Bug]: Seed issue with Pipeline Parallel
andoorve opened this issue 7 months ago
[Not for review] PP ADAG
ruisearch42 opened this pull request 7 months ago
[Bug]: TypeError: 'NoneType' object is not callable when start Gemma2-27b-it
candowu opened this issue 7 months ago
[Core] Use numpy to speed up padded token processing
peng1999 opened this pull request 7 months ago
[Draft] proposal for ipex quant support
jikunshang opened this pull request 7 months ago
[doc][misc] doc update
youkaichao opened this pull request 7 months ago
[Bug]: Severe computation errors when batching request for microsoft/Phi-3-mini-128k-instruct
lance0108 opened this issue 7 months ago
[Doc] add env docs for flashinfer backend
DefTruth opened this pull request 7 months ago
[VLM] Minor space optimization for `ClipVisionModel`
ywang96 opened this pull request 7 months ago
Add FUNDING.yml
simon-mo opened this pull request 7 months ago
v0.5.2, v0.5.3, v0.6.0 Release Tracker
simon-mo opened this issue 7 months ago
bump version to v0.5.2
simon-mo opened this pull request 7 months ago
[Bug]: autogen can't work with vllm v0.5.1
tonyaw opened this issue 7 months ago
[Doc][CI/Build] Update docs and tests to use `vllm serve`
DarkLight1337 opened this pull request 7 months ago
[Bugfix] Convert image to RGB by default
DarkLight1337 opened this pull request 7 months ago
[Bug]: illegal memory access when increase max_model_length on FP8 models
IEI-mjx opened this issue 7 months ago
[Bugfix] Benchmark serving script used global parameter 'args' in function 'sample_random_requests'
lxline opened this pull request 7 months ago
[Bug]: Paligemma support for PNG files
BabyChouSr opened this issue 7 months ago
[ CI ] 0.4.3.post1 Hotfix
robertgshaw2-neuralmagic opened this pull request 7 months ago
[BugFix][Model] Jamba - Handle aborted requests, Add tests and fix cleanup bug
mzusman opened this pull request 7 months ago
[Feature]: Return softmax of attention layer.
DouHappy opened this issue 7 months ago
[ Misc ] Enable Quantizing All Layers of DeekSeekv2
robertgshaw2-neuralmagic opened this pull request 7 months ago