Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
github.com/vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
https://github.com/vllm-project/vllm
[ Kernel ] AWQ Fused MoE
robertgshaw2-neuralmagic opened this pull request 7 months ago
[ci][build] fix commit id
youkaichao opened this pull request 7 months ago
[Bugfix][CI/Build] Test prompt adapters in openai entrypoint tests
g-eoj opened this pull request 7 months ago
[doc][distributed] add suggestion for distributed inference
youkaichao opened this pull request 7 months ago
[ Misc ] Apply MoE Refactor to Qwen2 + Deepseekv2 To Support Fp8
robertgshaw2-neuralmagic opened this pull request 7 months ago
[Feature]: Apply chat template through `LLM` class
robertgshaw2-neuralmagic opened this issue 7 months ago
[ Kernel ] AWQ Fused MoE
robertgshaw2-neuralmagic opened this pull request 7 months ago
[Bug]: Timeout Error When Deploying Llamafied InternLM2-5-7B-Chat-1M Model via vLLM OpenAI API Server
mf-skjung opened this issue 7 months ago
[Bugfix][CI/Build] Fix testing for generated commit hash
mgoin opened this pull request 7 months ago
[Doc] Add documentations for nightly benchmarks
KuntaiDu opened this pull request 7 months ago
Updating LM Format Enforcer version to v10.3
noamgat opened this pull request 7 months ago
[ci][distributed] add pipeline parallel correctness test
youkaichao opened this pull request 7 months ago
[Bugfix] use float32 precision in samplers/test_logprobs.py for comparing with HF
tdoublep opened this pull request 7 months ago
when i set tensor_parallel_size>1(A100 * 4), it does not work
cx-hub opened this issue 7 months ago
[core][distributed] simplify code to support pipeline parallel
youkaichao opened this pull request 7 months ago
Remove unnecessary trailing period in spec_decode.rst
terrytangyuan opened this pull request 7 months ago
Report usage for beam search
simon-mo opened this pull request 7 months ago
[Model] Pipeline parallel support for Mixtral
binxuan opened this pull request 7 months ago
[Misc] Add deprecation warning for beam search
WoosukKwon opened this pull request 7 months ago
[Misc]: _run_workers_async function of DistributedGPUExecutorAsync
HMJW opened this issue 7 months ago
[Misc] Disambiguate quantized types via a new ScalarType
LucasWilkinson opened this pull request 7 months ago
[Bug]: Gemma-2 + FlashInfer: ValueError: Unsupported max_frags_z:
HanGuo97 opened this issue 7 months ago
[CI/Build] Cross python wheel
robertgshaw2-neuralmagic opened this pull request 7 months ago
[Doc] xpu backend requires running setvars.sh
rscohn2 opened this pull request 7 months ago
[Bug]: Problem loading Gemma 2 27b-it
rdaiello opened this issue 7 months ago
[Bug]: Runtime AssertionError: 32768 is not divisible by 3, multiproc_worker_utils.py:120, when using 3 GPUs for tensor-parallel
haltingstate opened this issue 7 months ago
[Kernel] Turn off CUTLASS scaled_mm for Ada Lovelace
tlrmchlsmth opened this pull request 7 months ago
[RFC]: A Graph Optimization System in vLLM using torch.compile
bnellnm opened this issue 7 months ago
torch.compile based model optimizer
bnellnm opened this pull request 7 months ago
[Bug]: vLLM 0.5.1 tensor parallel 2 hang
Flynn-Zh opened this issue 7 months ago
[BUGFIX] Raise an error for no draft token case when draft_tp>1
wooyeonlee0 opened this pull request 7 months ago
[Feature]: Request for Ascend NPU support
xuedinge233 opened this issue 7 months ago
[Bug]: vllm.engine.async_llm_engine.AsyncEngineDeadError: Background loop has errored already.
momomobinx opened this issue 7 months ago
[ Misc ] More Cleanup of Marlin
robertgshaw2-neuralmagic opened this pull request 7 months ago
[ Misc ] Support Act Order in Compressed Tensors
robertgshaw2-neuralmagic opened this pull request 7 months ago
[BigFix] Fix the lm_head in gpt_bigcode in lora mode
maxdebayser opened this pull request 7 months ago
[ Misc ] Support Models With Bias in `compressed-tensors` integration
robertgshaw2-neuralmagic opened this pull request 7 months ago
[Installation]: Running ohereForAI/c4ai-command-r-v01 with main pytorch
laithsakka opened this issue 7 months ago
[Bugfix] Fix Ray Metrics API usage
Yard1 opened this pull request 7 months ago
[ Misc ] Remove separate bias add
robertgshaw2-neuralmagic opened this pull request 7 months ago
[ROCm][AMD] unify CUDA_VISIBLE_DEVICES usage in vllm to get device count
hongxiayang opened this pull request 7 months ago
[Misc] Remove flashinfer warning, add flashinfer tests to CI
LiuXiaoxuanPKU opened this pull request 7 months ago
[CI/Build] (2/2) Switching AMD CI to store images in Docker Hub
adityagoel14 opened this pull request 7 months ago
[Bugfix] Fix usage stats logging exception warning with OpenVINO
helena-intel opened this pull request 7 months ago
[Feature]: FlashAttention 3 support
orellavie1212 opened this issue 7 months ago
[doc] update pipeline parallel in readme
youkaichao opened this pull request 7 months ago
[distributed][misc] keep consistent with how pytorch finds libcudart.so
youkaichao opened this pull request 7 months ago
[BugFix] BatchResponseData body should be optional
zifeitong opened this pull request 7 months ago
[Kernel] Fix identical branches
stevegrubb opened this pull request 7 months ago
[Model][Phi3-Small] Remove scipy from blocksparse_attention
mgoin opened this pull request 7 months ago
[Bug]: OpenAI batch file format pydantic validation error
ArsalShakil opened this issue 7 months ago
[Misc] add fixture to guided processor tests
kevinbu233 opened this pull request 7 months ago
[Bug]: get that Exception in thread Thread-3 (_report_usage_worker): (vllm OpenVINO,When python3 vllm/benchmarks/benchmark_throughput.py,)
HPUedCSLearner opened this issue 7 months ago
[bug fix] Fix llava next feature size calculation.
xwjiang2010 opened this pull request 7 months ago
[Core] draft_model_runner: Implement prepare_inputs on GPU for advance_step
alexm-neuralmagic opened this pull request 7 months ago
[Bug]: Metrics time_to_first_token_seconds, time_per_output_token_seconds not working correctly
thies1006 opened this issue 7 months ago
[Performance]: how to use NVIDIA Nsight Compute in Linux
chenglu66 opened this issue 7 months ago
fix cuda118 can't find libcudart.so error
zhaotyer opened this pull request 7 months ago
[Bug]: Unable to run phi-3-small in latest release
ssmi153 opened this issue 7 months ago
[Bug]: Error on inference with LoRa request (safetensors format)
tsvisab opened this issue 7 months ago
[Bug]: `tests/basic_correctness/test_chunked_prefill.py` is failing on main in fp32
tdoublep opened this issue 7 months ago
[Bug]: Gemma 2 GPTQ - Complete output via API but incomplete through batch inference
ArsalShakil opened this issue 7 months ago
wip
thri5ha opened this pull request 7 months ago
[Bug]: Gloo library cannot communicate between two computers
JKYtydt opened this issue 7 months ago
[Bug]: VLLM's output is unstable version==0.5.1
ffxmm opened this issue 7 months ago
[Model] RowParallelLinear: pass bias to quant_method.apply
tdoublep opened this pull request 7 months ago
[Bugfix] GPTBigCodeForCausalLM: Remove lm_head from supported_lora_modules.
tdoublep opened this pull request 7 months ago
[Usage]: Maximum Context Length Exceeded Due to Base64-Encoded Image in Prompt
tusharraskar opened this issue 7 months ago
[Feature]: Hybrid Attention
leo6022 opened this issue 7 months ago
[Bug]: VLLM 0.5.1 with LLaVA 1.6 exceptions
andrePankraz opened this issue 7 months ago
[Model]: Support for InternVL2
Weiyun1025 opened this issue 7 months ago
[Misc] refactor(config): clean up unused code
aniaan opened this pull request 7 months ago
[Bug]: In k8s pod, it takes approximately 1 hour to start the model using vllm
WangxuP opened this issue 7 months ago
[Core] offload model weights to CPU conditionally
chenqianfzh opened this pull request 7 months ago
[Core] Support Lora lineage and base model metadata management
Jeffwan opened this pull request 7 months ago
[Bug]: Server fails to boot due to a tensor size mismatch when LoRA is enabled for GPTBigCode
tjohnson31415 opened this issue 7 months ago
[Bugfix][Neuron] Fix soft prompt method error in NeuronExecutor
WoosukKwon opened this pull request 7 months ago
[BUG FIX]fix compile error when building with torch2.1
maidabu opened this pull request 7 months ago
[Bug]: Gloo Connection reset by peer
thies1006 opened this issue 7 months ago
[Feature]: Is there any plan to support Cross-Layer Attention (CLA) ?
JiayiFeng opened this issue 7 months ago
[Misc]: Random Output Generation with mistralai/Mixtral-8x22B-v0.1
rajagond opened this issue 7 months ago
[Usage]: In phi3 vision maximum context length issue
tusharraskar opened this issue 7 months ago
[Feature]: Multi-Proposers support for speculative decoding.
ShangmingCai opened this issue 7 months ago
[Bug]: Vllm 0.5.1+cu118 timeout when init CustomAllreduce
zhaotyer opened this issue 7 months ago
Speculative decoding leads to zombie requests
naturomics opened this issue 7 months ago
[Model] Add support for 'gte-Qwen2' embedding models
Nickydusk opened this pull request 7 months ago
[ci] try to add multi-node tests
youkaichao opened this pull request 7 months ago
[CI/Build][TPU] Add TPU CI test
WoosukKwon opened this pull request 7 months ago
[Bug]: deepseek-coder-v2-lite-instruct; Exception in worker VllmWorkerProcess while processing method initialize_cache: [Errno 2] No such file or directory: '/root/.triton/cache/de758c429c9ff1f18930bbd9c3004506/fused_moe_kernel.json.tmp.pid_1528_587007', Traceback (most recent call last):
fengyang95 opened this issue 7 months ago
[RFC]: Enhancing LoRA Management for Production Environments in vLLM
Jeffwan opened this issue 7 months ago
[core] Sampling controller interface
mmoskal opened this pull request 7 months ago
[Doc]: Latency vs Throughput Configurations
antferdom opened this issue 7 months ago
[Bug]: TypeError: Can't instantiate abstract class NeuronWorker with abstract method execute_worker
areanddee opened this issue 7 months ago
[BugFix] get_and_reset only when scheduler outputs are not empty
mzusman opened this pull request 7 months ago
[Bug]: Qwen2 Moe FP8 not supported on L40
TopIdiot opened this issue 7 months ago
[Core][Model] Add simple_model_runner and a new model XLMRobertaForSequenceClassification through multimodal interface
AllenDou opened this pull request 7 months ago
No executable after building vllm from source with CPU support
parkesorgua opened this issue 7 months ago
[Bug]: tensor parallel (of 4 cards) gives bad answers in version 0.5.1 and later (compared to 0.4.1) with gptq marlin kernels (compared to gptq)
orellavie1212 opened this issue 7 months ago
[BugFix]: fix engine timeout due to request abort
pushan01 opened this pull request 7 months ago