Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
github.com/vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
https://github.com/vllm-project/vllm
[Bugfix] We have fixed the bug that occurred when using FlashInfer as the backend in vLLM Speculative Decoding.
bong-furiosa opened this pull request 7 months ago
[Bugfix]Fix evict v2 with long context length
puf147 opened this pull request 7 months ago
[CI] docfix
rkooo567 opened this pull request 7 months ago
[Doc] add debugging tips
youkaichao opened this pull request 7 months ago
[Core] Refactor Worker and ModelRunner to consolidate control plane communication
stephanie-wang opened this pull request 7 months ago
[Performance]: Qwen2-72B-Instruction-GPTQ-Int4 Openai Server Request Problem
syngokhan opened this issue 7 months ago
hidden-states from final (or middle layers)
janphilippfranken opened this issue 7 months ago
[Bug]:The vllm service takes two hours to start Because of NCCL
zhaotyer opened this issue 7 months ago
[Bug]: topk=1 and temperature=0 cause different output in vllm
rangehow opened this issue 7 months ago
[Doc][Typo] Fixing Missing Comma
ywang96 opened this pull request 7 months ago
[Bugfix] Add device assertion to TorchSDPA
bigPYJ1151 opened this pull request 7 months ago
[Kernel] Suppress mma.sp warning on CUDA 12.5 and later
tlrmchlsmth opened this pull request 7 months ago
[Speculative decoding] Initial spec decode docs
cadedaniel opened this pull request 7 months ago
[Core][Distributed] add shm broadcast
youkaichao opened this pull request 7 months ago
[Bugfix] fix lora_dtype value type in arg_utils.py
c3-ali opened this pull request 7 months ago
[Bug]: EngineArgs missing value type for `lora_dtype`
c3-ali opened this issue 7 months ago
[Kernel] Vectorized FP8 quantize kernel
comaniac opened this pull request 7 months ago
[Bug]: Llama3 output limited to around 10 tokens
arifsaeed opened this issue 7 months ago
[ci] Fix Buildkite agent path
khluu opened this pull request 7 months ago
[Kernel] Factor out epilogues from cutlass kernels
tlrmchlsmth opened this pull request 7 months ago
[Kernel] Adding fused bias add to cutlass_scaled_mm_dq kernel
cyang49 opened this pull request 7 months ago
[Misc] Remove VLLM_BUILD_WITH_NEURON env variable
WoosukKwon opened this pull request 7 months ago
[Doc] Add documentation for FP8 W8A8
mgoin opened this pull request 7 months ago
[Kernel] `w4a16` support for `compressed-tensors`
dsikka opened this pull request 7 months ago
Bump version to v0.5.0
simon-mo opened this pull request 7 months ago
[Docs] Add Docs on Limitations of VLM Support
ywang96 opened this pull request 7 months ago
[CI] Upgrade codespell version.
rkooo567 opened this pull request 7 months ago
[Hardware][Intel] OpenVINO vLLM backend
ilya-lavrenov opened this pull request 7 months ago
[RFC]: OpenVINO vLLM backend
ilya-lavrenov opened this issue 7 months ago
0.4.3 error CUDA error: an illegal memory access was encountered
maxin9966 opened this issue 7 months ago
[misc][typo] fix typo
youkaichao opened this pull request 7 months ago
[Core][Distributed] add same-node detection
youkaichao opened this pull request 7 months ago
[Misc] Various simplifications and typing fixes
njhill opened this pull request 7 months ago
[WIP][Core] Support tensor parallel division with remainder of attention heads
NadavShmayo opened this pull request 7 months ago
[Bug]: Docker image starts vllm.entrypoints.openai.api_server , Docker opens port 8000 but vllm isn't listening on 8000
elabz opened this issue 7 months ago
[Bug]: load nvidia/Llama3-ChatQA-1.5-8B model 15 min
JJplane opened this issue 7 months ago
[CI/Build] Add nightly benchmarking for tgi, tensorrt-llm and lmdeploy
KuntaiDu opened this pull request 7 months ago
[Bug]: Multi GPU setup for VLLM in Openshift still does not work
jayteaftw opened this issue 7 months ago
[Model] Add GLM-4v support
songxxzp opened this pull request 7 months ago
[CI/Test] improve robustness of test by replacing del with context manager (vllm_runner)
youkaichao opened this pull request 7 months ago
[Kernel][RFC] Initial commit containing new Triton kernels for multi lora serving.
FurtherAI opened this pull request 7 months ago
[Bugfix] Take the VRAM usage of prompt_logprobs into account
Conless opened this pull request 7 months ago
[Core][Distributed] merge two broadcast_tensor_dict
youkaichao opened this pull request 7 months ago
[Misc][Breaking] Change FP8 checkpoint format from act_scale -> input_scale
mgoin opened this pull request 7 months ago
[Bug Fix] Fix the support check for FP8 CUTLASS
cli99 opened this pull request 7 months ago
[Bug]: TorchSDPAMetadata is out of date
Reichenbachian opened this issue 7 months ago
[Misc] Update to comply with the new `compressed-tensors` config
dsikka opened this pull request 7 months ago
[Bugfix][Core] fix broken state for recompute
youkaichao opened this pull request 7 months ago
[Speculative Decoding 2/2 ] Integrate typical acceptance sampler into Spec Decode Worker
sroy745 opened this pull request 7 months ago
[CI/Test] improve robustness of test by replacing del with context manager (hf_runner)
youkaichao opened this pull request 7 months ago
[RFC]: Refactor MoE
robertgshaw2-neuralmagic opened this issue 7 months ago
[Misc] Remove unused cuda_utils.h in CPU backend
DamonFool opened this pull request 7 months ago
[Bug]: with `--enable-prefix-caching` , `/completions` crashes server with `echo=True` above certain prompt length
hibukipanim opened this issue 7 months ago
[Bug]: Qwen2 MoE: AttributeError: 'MergedColumnParallelLinear' object has no attribute 'weight'. Did you mean: 'qweight'?
geekwish opened this issue 7 months ago
fix DbrxFusedNormAttention missing cache_config
Calvinnncy97 opened this pull request 7 months ago
[Performance]: [Automatic Prefix Caching] When hitting the KV cached blocks, the first execute is slow, and then is fast.
soacker opened this issue 7 months ago
[Usage]: Howto quiet the terminal 'Info' outputs in vllm
rohitnanda1443 opened this issue 7 months ago
[Bug]: non-deterministic Python gc order leads to flaky tests
youkaichao opened this issue 7 months ago
[Bug]: Getting an empty string ('') for every call on fine-tuned Code-Llama-7b-hf model
arthbohra opened this issue 7 months ago
[Misc] Add args for selecting distributed executor to benchmarks
BKitor opened this pull request 7 months ago
[Bug]: Unexpected prompt token logprob behaviors of llama 2 when setting echo=True for openai-api server
fywalter opened this issue 7 months ago
[Misc][Utils] allow get_open_port to be called for multiple times
youkaichao opened this pull request 7 months ago
remove sort_keys=True in guided_decoding
DeyangKong opened this pull request 7 months ago
[Core] Fix sharing of stateful logits processors
maxdebayser opened this pull request 7 months ago
[Bug]: vLLM does not support virtual GPU
youkaichao opened this issue 7 months ago
[MISC] Upgrade dependency to PyTorch 2.3.1
comaniac opened this pull request 7 months ago
Sa 24 sparse
dsikka opened this pull request 7 months ago
[Doc] Add an automatic prefix caching section in vllm documentation
KuntaiDu opened this pull request 7 months ago
[AMD][ROCm][CI] unit tests fixes or skip
hongxiayang opened this pull request 7 months ago
[Usage]: Streaming Response from vLLM 0.4.2 -> 0.4.3
BiboyQG opened this issue 7 months ago
[Feature][Frontend]: Continued `stream_options` implementation also in CompletionRequest
Etelis opened this pull request 7 months ago
[New Model]: mistralai/Codestral-22B-v0.1
eduardozamudio opened this issue 7 months ago
[Installation]: Compiling VLLM for cpu only.
Zibri opened this issue 7 months ago
[Performance]: gptq and awq quantization do not improve the performance
aaronlyt opened this issue 7 months ago
[Bugfix] OpenAI entrypoint limits logprobs while ignoring server defined --max-logprobs
maor-ps opened this pull request 7 months ago
GLM-4-9B-Chat:
Geaming-CHN opened this issue 7 months ago
[Bugfix]if the content is started with ":"(response of ping), client should i…
sywangyi opened this pull request 7 months ago
[Installation]: Building editable for vllm fails (pip install -e .)
felixzhu555 opened this issue 7 months ago
[Bug]: Cannot request more than 5 logprobs
coder109 opened this issue 7 months ago
Addition of lacked ignored_seq_groups in _schedule_chunked_prefill
JamesLim-sy opened this pull request 7 months ago
[Core][Distributed] add coordinator to reduce code duplication in tp and pp
youkaichao opened this pull request 7 months ago
[Hardware] Initial TPU integration
WoosukKwon opened this pull request 7 months ago
[Misc] Skip for logits_scale == 1.0
WoosukKwon opened this pull request 7 months ago
[Usage]: the docker image v0.4.3 cannot work
BUJIDAOVS opened this issue 7 months ago
[Misc] Missing error message for custom ops import
DamonFool opened this pull request 7 months ago
trigger_ci_cd
sergey-tinkoff opened this pull request 7 months ago
[Bug]: Regression in predictions in v0.4.3
hibukipanim opened this issue 7 months ago
[Model] Dynamic image size support for LLaVA-NeXT
DarkLight1337 opened this pull request 7 months ago
[Frontend] OpenAI API server: Add `add_special_tokens` to ChatCompletionRequest (default False)
tomeras91 opened this pull request 7 months ago
test
geeker-smallwhite opened this pull request 7 months ago
[Core] Dynamic image size support for VLMs
DarkLight1337 opened this pull request 7 months ago
[Kernel] Update Cutlass int8 kernel configs for SM80
varun-sundar-rabindranath opened this pull request 7 months ago
[Bug]: high gpu_memory_utilization with 'OOM' and low gpu_memory_utilization with 'No available memory for the cache blocks'
mars-ch opened this issue 7 months ago
[Bug]: chatglm3 with lora adapter
Qingyuncookie opened this issue 7 months ago
[Bug]: When I call the speculative model through the vllm interface, an error is reported: TypeError: 'type' object is not subscriptable
YuCheng-Qi opened this issue 7 months ago
[Misc] Fix docstring of get_attn_backend
WoosukKwon opened this pull request 7 months ago
[Bug]: a bug
lambda7xx opened this issue 7 months ago
[Usage]: How to load a model with less CPU memory
liulfy opened this issue 7 months ago