Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
github.com/vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
https://github.com/vllm-project/vllm
[Core] Centralize GPU Worker construction
njhill opened this pull request 8 months ago
[Bug]: cannot load model back due to [does not appear to have a file named config.json]
yananchen1989 opened this issue 8 months ago
[WIP][Hardware][Intel] support intel builds with intel c++
kannon92 opened this pull request 8 months ago
Add support for ReFT
RonanKMcGovern opened this issue 8 months ago
[Core] Pipeline Parallel Support
andoorve opened this pull request 8 months ago
[Doc]: Offline Inference Distributed Broken for TP
sam-h-bean opened this issue 8 months ago
[Hardware][Nvidia] Enable support for Pascal GPUs
jasonacox opened this pull request 8 months ago
[RFC]: environment variable management in vllm
youkaichao opened this issue 8 months ago
[kernel] fix sliding window in prefix prefill Triton kernel
mmoskal opened this pull request 8 months ago
[Bug]: Can not run openapi server with cpu backend
kannon92 opened this issue 8 months ago
[Frontend] add tok/s speed metric to llm class when using tqdm
MahmoudAshraf97 opened this pull request 8 months ago
[Bug]: TypeError in XFormersMetadata
skonto opened this issue 8 months ago
[Model]: Support for InternVL-Chat-V1-5
Iven2132 opened this issue 8 months ago
[Bug]: Running llama2-7b on H20, Floating point exception (core dumped) appears on float16
yk1012664593 opened this issue 8 months ago
[Usage]: I doubt about the meaning of --enable-prefix-caching
chenchunhui97 opened this issue 8 months ago
[Bug]: vllm 0.4.1 and transformers 4.40.1 have conflicting dependencies on pydantic
AbbottKilig opened this issue 8 months ago
[Bug]: Chunked prefill doesn't seem to work when --kv-cache-dtype fp8
rkooo567 opened this issue 8 months ago
[Model] Phi-3 4k sliding window temp. fix
caiom opened this pull request 8 months ago
[Speculative decoding] Support target-model logprobs
cadedaniel opened this pull request 8 months ago
[Bug]: Phi3 still not supported
andrew-vold opened this issue 8 months ago
✨ support local cache for models
prashantgupta24 opened this pull request 8 months ago
[Installation]: GitHub access required during install for vllm >=0.4.1 (for cu12-libnccl.so.2.18.1)
mattmalcher opened this issue 8 months ago
[Feature]: GPTQ/AWQ quantization is not fully optimized yet. The speed can be slower than non-quantized models.
ShubhamVerma16 opened this issue 8 months ago
[Feature]: AssertionError: Speculative decoding not yet supported for RayGPU backend.
cocoza4 opened this issue 8 months ago
[Core] Add `multiproc_worker_utils` for multiprocessing-based workers
njhill opened this pull request 8 months ago
[Frontend] Add APIs for dynamic LoRA models load/unload
graceleeis opened this pull request 8 months ago
[Kernel] Use flashinfer for decoding
LiuXiaoxuanPKU opened this pull request 8 months ago
[Bug]: mistralai/Mixtral-8x22B-Instruct-v0.1 fails to load 2/3 times on aae08249acca69060d0a8220cab920e00520932c
pseudotensor opened this issue 8 months ago
[Kernel] Optimize FP8 support for MoE kernel / Mixtral via static scales
pcmoritz opened this pull request 8 months ago
[Bug]: Call to CUDA function failed - unknown error
roclark opened this issue 8 months ago
[Misc]: RuntimeError: Cannot find any model weights [vllm=0.4.0]
vishwa27yvs opened this issue 8 months ago
[Kernel] Support Fp8 Checkpoints (Dynamic + Static)
robertgshaw2-neuralmagic opened this pull request 8 months ago
[New Model]: launch error of Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4
eigen2017 opened this issue 8 months ago
[Misc] Upgrade outlines to v0.0.41
psykhi opened this pull request 8 months ago
Add logger extra
olehviniarchyk opened this pull request 8 months ago
[Core] Consolidate prompt arguments to LLM engines
DarkLight1337 opened this pull request 8 months ago
[Kernel][Core][WIP] Tree attention and parallel decoding
yukavio opened this pull request 8 months ago
[Bug]: phi-3 (microsoft/Phi-3-mini-128k-instruct) fails with assert "factor" in rope_scaling
pseudotensor opened this issue 8 months ago
[Usage]: Flash Attention not working any more
Techinix opened this issue 8 months ago
[CI] check size of the wheels
simon-mo opened this pull request 8 months ago
[Misc]: How is the continuous batching feature of vLLM implemented?
llx-08 opened this issue 8 months ago
[New Model]: Support Phi-3
alexkreidler opened this issue 8 months ago
Allow user to define whitespace pattern for outlines
robcaulk opened this pull request 8 months ago
[Feature]: batched parallel decoding
snyhlxde1 opened this issue 8 months ago
[Usage]: ValueError: Cannot find the config file for awq
grumpyp opened this issue 8 months ago
[New Model]: Llama 3 8B Instruct
K-Mistele opened this issue 8 months ago
[Speculative decoding] CUDA graph support
heeju-kim2 opened this pull request 8 months ago
[Bug]: Engine iteration timed out. This should never happen occurred when vllm 0.4.1 deployed llama3.
blackblue9 opened this issue 8 months ago
[Hardware][Nvidia] Enable support for Pascal GPUs
cduk opened this pull request 8 months ago
[WIP] Infrastructure for encoder/decoder support
afeldman-nm opened this pull request 8 months ago
[Bug]: vllm stall on llama3-70b warmup with 0.4.1
piercefreeman opened this issue 8 months ago
[Bug]: CPU Inference vllm_ops not defined
bsu3338 opened this issue 8 months ago
[MISC] Rework logger to enable pythonic custom logging configuration to be provided
tdg5 opened this pull request 8 months ago
add standalone_api_server
alex-k-cart opened this pull request 8 months ago
[CI/Build] AMD CI pipeline with extended set of tests.
Alexei-V-Ivanov-AMD opened this pull request 8 months ago
[Bug]: offline test, Process hangs without exiting when using cuda graph
DefTruth opened this issue 8 months ago
[Bug]: Repeatedly printing after the conversation ends<| im_end |><| im_start |>
huangshengfu opened this issue 8 months ago
[Speculative decoding] Fix async executing
zxdvd opened this pull request 8 months ago
[Feature]: Cannot use FlashAttention backend for Volta and Turing GPUs. (but FlashAttention v1.0.9 supports Turing GPU.)
tutu329 opened this issue 8 months ago
[Bug]: Ray memory leak
saattrupdan opened this issue 8 months ago
Llama-3-70b: Should I apply some special template to use llama-3?
UbeCc opened this issue 8 months ago
[Speculative decoding] Add ngram prompt lookup decoding
leiwen83 opened this pull request 8 months ago
[Misc]: is it possible to load lora adapter on request basis with out restarting the base model for every new lora trained?
Wizmak9 opened this issue 8 months ago
[Misc]: Total number of attention heads (40) must be divisible by tensor parallel size (6)
CNXDZS opened this issue 8 months ago
[Bug]: NameError: name 'vllm_ops' is not defined
yananchen1989 opened this issue 8 months ago
[Model] Add moondream vision language model
vikhyat opened this pull request 8 months ago
[Bug]: NCCL locating mechanism in multi-user environment
ticoneva opened this issue 8 months ago
[Bugfix] Fix marlin kernel crash on H100
alexm-neuralmagic opened this pull request 8 months ago
[Feature]: beam search mode to allow for more options in sampling process
GeauxEric opened this issue 8 months ago
[Speculative decoding] [Performance]: Re-enable bonus tokens
cadedaniel opened this issue 8 months ago
Performance Regression between v0.4.0 and v0.4.1
simon-mo opened this issue 8 months ago
[Frontend] [Core] perf: Automatically detect vLLM-tensorized model, update `tensorizer` to version 2.9.0
sangstar opened this pull request 8 months ago
[Usage]: Make request to LLAVA server.
premg16 opened this issue 8 months ago
[Usage]: How to use LoRARequest with AsyncLLMEngine?
Rares9999 opened this issue 8 months ago
[Installation]: Failed to build from source code. Python=3.9 CUDA=12.1
WJMacro opened this issue 8 months ago
[Frontend] Support GPT-4V Chat Completions API
DarkLight1337 opened this pull request 8 months ago
[Model] Initial support for LLaVA-NeXT
DarkLight1337 opened this pull request 8 months ago
[Bug]: KeyError: 'model.layers.24.mlp.down_proj.weight' for llama 7b model SqueezeLLM quantization
condy0919 opened this issue 8 months ago
[Core] Support image processor
DarkLight1337 opened this pull request 8 months ago
[Misc]: optimize eager mode host time
functionxu123 opened this pull request 8 months ago
[RFC]: Multi-modality Support Refactoring
ywang96 opened this issue 8 months ago
[Bug]: Disk I/O Error when using tools due to shared outlines cache database
AaronFriel opened this issue 8 months ago
[New Model]: Please update docker to support llama3
HangLu123 opened this issue 8 months ago
Adding max queue time parameter
KrishnaM251 opened this pull request 8 months ago
[Bug]: lora base_model.model.lm_head.base_layer.weight is not supported
u650080 opened this issue 8 months ago
[Usage]: Llama 3 8B Instruct Inference
aliozts opened this issue 8 months ago
[Bug]: Server crash for bloom-3b while use prefix_caching, `AssertionError assert Lk in {16, 32, 64, 128}`
DefTruth opened this issue 8 months ago
Add `vllm serve` to wrap `vllm.entrypoints.openai.api_server`
simon-mo opened this pull request 8 months ago
[CI/Build] Further decouple HuggingFace implementation from ours during tests
DarkLight1337 opened this pull request 8 months ago
[BugFix] fix num_lookahead_slots missing in async executor
leiwen83 opened this pull request 8 months ago
[Misc]: How to access the KV cache directly?
BDHU opened this issue 8 months ago
[Feature]: AMD ROCm 6.1 Support
kannan-scalers-ai opened this issue 8 months ago
[Bug]: Processed prompts: 5%|▌ | 429/8535 [00:27<08:37, 15.68it/s] RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
pangpang-xuan opened this issue 8 months ago
[Usage]: if I want to run a 34B model,like yi-34B-chat,how can I use multi GPU,I just have A100 40G
hellostronger opened this issue 8 months ago
[Usage]: How to get the latency of each request with benchmark_serving.py
wanzhenchn opened this issue 8 months ago
[Core] Enable prefix caching with block manager v2 enabled
leiwen83 opened this pull request 8 months ago
[Feature]: Phi2 LoRA support
zero-or-one opened this issue 8 months ago
[Misc]Add customized information for models
jeejeelee opened this pull request 8 months ago
[Bug]: Invalid Device Ordinal on ROCm
Bellk17 opened this issue 8 months ago