Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
github.com/vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
https://github.com/vllm-project/vllm
[Usage]: how should I do data parallelism using vLLM?
YuWang916 opened this issue 7 months ago
[Bugfix] Fix KV head calculation for MPT models when using GQA
bfontain opened this pull request 7 months ago
[CI/Build] Test buildkite monorepo plugin
dgoupil opened this pull request 7 months ago
[Frontend] token_ids are a useless param sent to the logit_bias_logits_processor.
Etelis opened this pull request 7 months ago
[Core] Remove unnecessary copies in flash attn backend
Yard1 opened this pull request 7 months ago
[Kernel] Refactor CUTLASS kernels to always take scales that reside on the GPU
tlrmchlsmth opened this pull request 7 months ago
[Kernel] Marlin_24: Ensure the mma.sp instruction is using the ::ordered_metadata modifier (introduced with PTX 8.5)
alexm-neuralmagic opened this pull request 7 months ago
[Feature][Frontend]: Add support for `stream_options` in `ChatCompletionRequest`
Etelis opened this pull request 7 months ago
[Usage]: Do we have any tutorials for using vllm with tensorrt-LLM?
weiyunfei opened this issue 7 months ago
[Bug]: nsys cannot track the cuda kernel called by the process except rank 0
crazy-JiangDongHua opened this issue 7 months ago
[Speculative Decoding 1/2 ] Add typical acceptance sampling as one of the sampling techniques in the verifier
sroy745 opened this pull request 7 months ago
[CI/Build] increase wheel size limit to 200 MB
youkaichao opened this pull request 7 months ago
[Misc] remove duplicate definition of `seq_lens_tensor` in model_runner.py
ita9naiwa opened this pull request 7 months ago
[Feature]: How to Enable VLLM to Work with PreTrainedModel Objects in my MOE-LoRA? THX
zhaofangtao opened this issue 7 months ago
[Feature]:
double-vin opened this issue 7 months ago
[Usage]: extractive question answering using VLLM
suryavan11 opened this issue 7 months ago
[Doc] Use intersphinx and update entrypoints docs
DarkLight1337 opened this pull request 7 months ago
[New Model]: LLaVA-NeXT-Video support
AmazDeng opened this issue 7 months ago
[Bug]: The tail problem
ZixinxinWang opened this issue 7 months ago
Add gptq_marlin test to cover bug report https://github.com/vllm-project/vllm/issues/5088
alexm-neuralmagic opened this pull request 7 months ago
Add gptq_marlin test to cover bug report #5088
alexm-neuralmagic opened this pull request 7 months ago
[Bugfix] Avoid Warnings in SparseML Activation Quantization
robertgshaw2-neuralmagic opened this pull request 7 months ago
[Bugfix] Automatically Detect SparseML models
robertgshaw2-neuralmagic opened this pull request 7 months ago
[Misc] Simplify code and fix type annotations in `conftest.py`
DarkLight1337 opened this pull request 7 months ago
[Usage]: Multiple sampling params with OpenAI library
JH-lee95 opened this issue 7 months ago
[Kernel] Add `w4a16` support for `compressed_tensors` models
dsikka opened this pull request 7 months ago
[Kernel] Add support for block size 96 to the paged attention kernel.
bfontain opened this pull request 7 months ago
[Kernel] CUTLASS epilogue refactor and kernels with quantized outputs
tlrmchlsmth opened this pull request 7 months ago
[Bug]: Crash sometimes using LLM entrypoint and LoRA adapters
flexorRegev opened this issue 7 months ago
[CI/Build] Docker cleanup functionality for amd servers
okakarpa opened this pull request 7 months ago
[Bug]: vLLM embeddings example code doesn't work
orionw opened this issue 7 months ago
New CI template on AWS stack
khluu opened this pull request 7 months ago
[ibm-granite/granite-8b-code-instruct]: Empty responses on ibm-granite
eduardozamudio opened this issue 7 months ago
[Bugfix] gptq_marlin: Ensure g_idx_sort_indices is not a Parameter
alexm-neuralmagic opened this pull request 7 months ago
[Misc]: Loading microsoft/Phi-3-medium-128k-instruct with vLLM
AkshataDM opened this issue 7 months ago
[Bug]: async engine failure when placing multiple LoRA adapters under load
DavidPeleg6 opened this issue 7 months ago
[Bug]: cannot clean up the memory usage after instantiating the LLM class.
c3-ali opened this issue 7 months ago
[Doc][Build] update after removing vllm-nccl
youkaichao opened this pull request 7 months ago
[Bug]: [WSL] no response when vllm.entrypoints.openai.api_server run
sung-ho-moon opened this issue 7 months ago
[Speculative Decoding] Enable arbitrary model inputs
abhigoyal1997 opened this pull request 7 months ago
[CI/Build] Simplify OpenAI server setup in tests
DarkLight1337 opened this pull request 7 months ago
[Core] Avoid the need to pass `None` values to `Sequence.inputs`
DarkLight1337 opened this pull request 7 months ago
[Misc] Add vLLM version getter to utils
DarkLight1337 opened this pull request 7 months ago
[Bugfix][CI/Build] Fix codespell failing to skip files in `git diff`
DarkLight1337 opened this pull request 7 months ago
[Bugfix][CI/Build] Fix test and improve code for `merge_async_iterators`
DarkLight1337 opened this pull request 7 months ago
[Bug]: Can't run vllm distributed inference with vLLM + Ray
linchen111 opened this issue 7 months ago
[Bug]: The implementation of DynamicNTKScalingRotaryEmbedding may have errors.
macheng6 opened this issue 7 months ago
[Feature] vLLM CLI for serving and querying OpenAI compatible server
EthanqX opened this pull request 8 months ago
[Bug]: Gemma model fails with GPTQ marlin
arunpatala opened this issue 8 months ago
[Installation]: Error when importing LLM from vllm
manishkumar0709 opened this issue 8 months ago
[Bug]: The vllm is disconnected after running for some time
zxcdsa45687 opened this issue 8 months ago
[RFC]: OpenAI Triton-only backend
bringlein opened this issue 8 months ago
[Usage]: curl http://localhost:8000/generate returns {"detail":"Not Found"}; generate relu cannot be used
fishingcatgo opened this issue 8 months ago
[Model] Support MAP-NEO model
xingweiqu opened this pull request 8 months ago
[Usage]: quantization option usage
Juelianqvq opened this issue 8 months ago
[Core][CUDA Graph] add output buffer for cudagraph to reduce memory footprint
youkaichao opened this pull request 8 months ago
[CI/Build][Misc] Add CI that benchmarks vllm performance on those PRs with `perf-benchmarks` label
KuntaiDu opened this pull request 8 months ago
[Bug]: Build/Install Issues with pip install -e .
Msiavashi opened this issue 8 months ago
[Model] Add support for falcon-11B
Isotr0py opened this pull request 8 months ago
[Misc] Add a test case for 'microsoft/Phi-3-small-8k-instruct', because special tokens can cause a crash
AllenDou opened this pull request 8 months ago
[Bug]: The VRAM usage of calculating log_probs is not considered in profile run
Conless opened this issue 8 months ago
[Feature]: Integration of transformers past_key_values into the vllm kvcache Function
ChaoZhou2023 opened this issue 8 months ago
Heterogeneous Speculative Decoding (CPU + GPU)
jiqing-feng opened this pull request 8 months ago
[Model] Add Internlm2 LoRA support
Isotr0py opened this pull request 8 months ago
[Misc]: How to use guided decoding and regex as well?
debraj135 opened this issue 8 months ago
[Bug]: When loading model weights, loading never finishes
tjrlwjd1 opened this issue 8 months ago
[Usage]: no support for mistralai/Mistral-7B-Instruct-v0.3
yananchen1989 opened this issue 8 months ago
[Bug]: vllm.engine.async_llm_engine.AsyncEngineDeadError: Background loop has errored already.
heungson opened this issue 8 months ago
[Core] Allow AQLM on Pascal
sasha0552 opened this pull request 8 months ago
[Bug]: Cannot build cpu docker image
licryle opened this issue 8 months ago
[Feature]: multi-steps model_runner?
leiwen83 opened this issue 8 months ago
[Frontend] Add tokenize/detokenize endpoints
sasha0552 opened this pull request 8 months ago
[Bugfix] Adds outlines performance improvement
lynkz-matt-psaltis opened this pull request 8 months ago
Running vLLM on a Ray cluster, logging stuck at loading
maherr13 opened this issue 8 months ago
[Feature]: Add num_requests_preempted metric
sathyanarays opened this issue 8 months ago
Chat method for offline llm
nunjunj opened this pull request 8 months ago
[Installation]:
Kastycupra opened this issue 8 months ago
[Kernel][Misc] Use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops
bnellnm opened this pull request 8 months ago
Bump version to v0.4.3
simon-mo opened this pull request 8 months ago
[Misc] add logging level env var
youkaichao opened this pull request 8 months ago
[Misc] Make Serving Benchmark More User-friendly
ywang96 opened this pull request 8 months ago
[Misc]: Understanding Batching Mechanism in Prefill and Decode Phases
Msiavashi opened this issue 8 months ago
[Feature]: Additional metrics to enable better autoscaling / load balancing of vLLM servers in Kubernetes
achandrasekar opened this issue 8 months ago
ci draft
khluu opened this pull request 8 months ago
[Model] Enable FP8 QKV in MoE and refine kernel tuning script
comaniac opened this pull request 8 months ago
[Core] Change LoRA embedding sharding to support loading methods
Yard1 opened this pull request 8 months ago
[Kernel] Dynamic Per-Token Activation Quantization
dsikka opened this pull request 8 months ago
[Kernel][RFC] Refactor the punica kernel based on Triton
jeejeelee opened this pull request 8 months ago
[Bug]: vLLM throws an error when running on NVIDIA's latest driver, 555.85
gaye746560359 opened this issue 8 months ago
[CI/Build] CMakeLists: build all extensions' cmake targets at the same time
dtrifiro opened this pull request 8 months ago
[Misc]: LLM is responding with advertisement
Pocoyo7798 opened this issue 8 months ago
[FRONTEND] OpenAI `tools` support named functions
br3no opened this pull request 8 months ago
[Bugfix] logprobs is not compatible with the OpenAI spec #4795
Etelis opened this pull request 8 months ago
[Bug]: Command-R incorrect output contains `<EOS_TOKEN>` and seems to do text prediction rather than conversation
epignatelli opened this issue 8 months ago
[BUGFIX] [FRONTEND] Correct chat logprobs
br3no opened this pull request 8 months ago
[Usage]: I use llama3. I found that one token is 'Ġor' in tokenizer.get_vocab(). But when I use vllm server, I got ' or' in response.
fengshansi opened this issue 8 months ago
[Bugfix][Frontend] Cleanup "fix chat logprobs"
DarkLight1337 opened this pull request 8 months ago
[Kernel] Initial commit containing new Triton kernels for multi lora serving.
FurtherAI opened this pull request 8 months ago
[Bug]: Wrong results in LangChain integration
Warit314 opened this issue 8 months ago
[Bug]: Mistral 7b inst v0.3 fails to run
yaronr opened this issue 8 months ago