Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
vLLM
vLLM is a high-throughput and memory-efficient inference and serving engine for large language models (LLMs).
Collective: https://opencollective.com/vllm
Host: opensource
Code: https://github.com/vllm-project/vllm
[Bug]: Can't load gemma-2-9b-it with vllm 0.5.2
github.com/vllm-project/vllm - vlsav opened this issue 3 months ago
[Bug]: No metrics exposed at /metrics with 0.5.2 (0.5.1 is fine), possible regression?
github.com/vllm-project/vllm - frittentheke opened this issue 3 months ago
[CI/Build] Remove "boardwalk" image asset
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 3 months ago
[Bugfix] enable prefix caching for AsyncLLMEngine when requesting prompt_logprobs
github.com/vllm-project/vllm - KrishnaM251 opened this pull request 3 months ago
[Distributed][Model] Rank-based Component Creation for Pipeline Parallelism Memory Optimization
github.com/vllm-project/vllm - wushidonguc opened this pull request 3 months ago
[Misc] Log spec decode metrics
github.com/vllm-project/vllm - comaniac opened this pull request 3 months ago
[Bug]: vLLM is unable to load Mistral on Inferentia and AWS neuron
github.com/vllm-project/vllm - servient-ashwin opened this issue 3 months ago
[Bug]: Seed issue with Pipeline Parallel
github.com/vllm-project/vllm - andoorve opened this issue 3 months ago
[Not for review] PP ADAG
github.com/vllm-project/vllm - ruisearch42 opened this pull request 3 months ago
[Bug]: TypeError: 'NoneType' object is not callable when start Gemma2-27b-it
github.com/vllm-project/vllm - candowu opened this issue 3 months ago
[Core] Use numpy to speed up padded token processing
github.com/vllm-project/vllm - peng1999 opened this pull request 3 months ago
[Draft] proposal for ipex quant support
github.com/vllm-project/vllm - jikunshang opened this pull request 3 months ago
[doc][misc] doc update
github.com/vllm-project/vllm - youkaichao opened this pull request 3 months ago
[Bug]: Severe computation errors when batching request for microsoft/Phi-3-mini-128k-instruct
github.com/vllm-project/vllm - lance0108 opened this issue 3 months ago
[Doc] add env docs for flashinfer backend
github.com/vllm-project/vllm - DefTruth opened this pull request 3 months ago
[VLM] Minor space optimization for `ClipVisionModel`
github.com/vllm-project/vllm - ywang96 opened this pull request 3 months ago
v0.5.2, v0.5.3, v0.6.0 Release Tracker
github.com/vllm-project/vllm - simon-mo opened this issue 3 months ago
bump version to v0.5.2
github.com/vllm-project/vllm - simon-mo opened this pull request 3 months ago
[Bug]: autogen can't work with vllm v0.5.1
github.com/vllm-project/vllm - tonyaw opened this issue 3 months ago
[Doc][CI/Build] Update docs and tests to use `vllm serve`
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 3 months ago
[Bugfix] Convert image to RGB by default
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 3 months ago
[Bug]: illegal memory access when increase max_model_length on FP8 models
github.com/vllm-project/vllm - IEI-mjx opened this issue 3 months ago
[Bugfix] Benchmark serving script used global parameter 'args' in function 'sample_random_requests'
github.com/vllm-project/vllm - lxline opened this pull request 3 months ago
[Bug]: Paligemma support for PNG files
github.com/vllm-project/vllm - BabyChouSr opened this issue 3 months ago
[ CI ] 0.4.3.post1 Hotfix
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 3 months ago
[BugFix][Model] Jamba - Handle aborted requests, Add tests and fix cleanup bug
github.com/vllm-project/vllm - mzusman opened this pull request 3 months ago
[Feature]: Return softmax of attention layer.
github.com/vllm-project/vllm - DouHappy opened this issue 3 months ago
[ Misc ] Enable Quantizing All Layers of DeekSeekv2
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 3 months ago
[ Kernel ] AWQ Fused MoE
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 3 months ago
[ci][build] fix commit id
github.com/vllm-project/vllm - youkaichao opened this pull request 3 months ago
[Bugfix][CI/Build] Test prompt adapters in openai entrypoint tests
github.com/vllm-project/vllm - g-eoj opened this pull request 3 months ago
[doc][distributed] add suggestion for distributed inference
github.com/vllm-project/vllm - youkaichao opened this pull request 3 months ago
[ Misc ] Apply MoE Refactor to Qwen2 + Deepseekv2 To Support Fp8
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 3 months ago
[Feature]: Apply chat template through `LLM` class
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this issue 3 months ago
[ Kernel ] AWQ Fused MoE
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 3 months ago
[Bug]: Timeout Error When Deploying Llamafied InternLM2-5-7B-Chat-1M Model via vLLM OpenAI API Server
github.com/vllm-project/vllm - mf-skjung opened this issue 3 months ago
[Bugfix][CI/Build] Fix testing for generated commit hash
github.com/vllm-project/vllm - mgoin opened this pull request 3 months ago
[Doc] Add documentations for nightly benchmarks
github.com/vllm-project/vllm - KuntaiDu opened this pull request 3 months ago
Updating LM Format Enforcer version to v10.3
github.com/vllm-project/vllm - noamgat opened this pull request 3 months ago
[ci][distributed] add pipeline parallel correctness test
github.com/vllm-project/vllm - youkaichao opened this pull request 3 months ago
[Bugfix] use float32 precision in samplers/test_logprobs.py for comparing with HF
github.com/vllm-project/vllm - tdoublep opened this pull request 3 months ago
when i set tensor_parallel_size>1(A100 * 4), it does not work
github.com/vllm-project/vllm - cx-hub opened this issue 3 months ago
[core][distributed] simplify code to support pipeline parallel
github.com/vllm-project/vllm - youkaichao opened this pull request 3 months ago
Remove unnecessary trailing period in spec_decode.rst
github.com/vllm-project/vllm - terrytangyuan opened this pull request 3 months ago
Report usage for beam search
github.com/vllm-project/vllm - simon-mo opened this pull request 3 months ago
[Model] Pipeline parallel support for Mixtral
github.com/vllm-project/vllm - binxuan opened this pull request 3 months ago
[Misc] Add deprecation warning for beam search
github.com/vllm-project/vllm - WoosukKwon opened this pull request 3 months ago
[Misc] Disambiguate quantized types via a new ScalarType
github.com/vllm-project/vllm - LucasWilkinson opened this pull request 3 months ago
[Bug]: Gemma-2 + FlashInfer: ValueError: Unsupported max_frags_z:
github.com/vllm-project/vllm - HanGuo97 opened this issue 3 months ago
[CI/Build] Cross python wheel
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 3 months ago
[Doc] xpu backend requires running setvars.sh
github.com/vllm-project/vllm - rscohn2 opened this pull request 3 months ago
[Bug]: Problem loading Gemma 2 27b-it
github.com/vllm-project/vllm - rdaiello opened this issue 3 months ago
[Bug]: Runtime AssertionError: 32768 is not divisible by 3, multiproc_worker_utils.py:120, when using 3 GPUs for tensor-parallel
github.com/vllm-project/vllm - haltingstate opened this issue 3 months ago
[Kernel] Turn off CUTLASS scaled_mm for Ada Lovelace
github.com/vllm-project/vllm - tlrmchlsmth opened this pull request 3 months ago
[RFC]: A Graph Optimization System in vLLM using torch.compile
github.com/vllm-project/vllm - bnellnm opened this issue 3 months ago
torch.compile based model optimizer
github.com/vllm-project/vllm - bnellnm opened this pull request 3 months ago
[Bug]: vLLM 0.5.1 tensor parallel 2 hang
github.com/vllm-project/vllm - Flynn-Zh opened this issue 3 months ago
[BUGFIX] Raise an error for no draft token case when draft_tp>1
github.com/vllm-project/vllm - wooyeonlee0 opened this pull request 3 months ago
[Feature]: Request for Ascend NPU support
github.com/vllm-project/vllm - xuedinge233 opened this issue 3 months ago
[ Misc ] More Cleanup of Marlin
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 3 months ago
[ Misc ] Support Act Order in Compressed Tensors
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 3 months ago
[BigFix] Fix the lm_head in gpt_bigcode in lora mode
github.com/vllm-project/vllm - maxdebayser opened this pull request 3 months ago
[ Misc ] Support Models With Bias in `compressed-tensors` integration
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 3 months ago
[Installation]: Running ohereForAI/c4ai-command-r-v01 with main pytorch
github.com/vllm-project/vllm - laithsakka opened this issue 3 months ago
[Bugfix] Fix Ray Metrics API usage
github.com/vllm-project/vllm - Yard1 opened this pull request 3 months ago
[ Misc ] Remove separate bias add
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 3 months ago
[ROCm][AMD] unify CUDA_VISIBLE_DEVICES usage in vllm to get device count
github.com/vllm-project/vllm - hongxiayang opened this pull request 3 months ago
[Misc] Remove flashinfer warning, add flashinfer tests to CI
github.com/vllm-project/vllm - LiuXiaoxuanPKU opened this pull request 3 months ago
[CI/Build] (2/2) Switching AMD CI to store images in Docker Hub
github.com/vllm-project/vllm - adityagoel14 opened this pull request 3 months ago
[Bugfix] Fix usage stats logging exception warning with OpenVINO
github.com/vllm-project/vllm - helena-intel opened this pull request 3 months ago
[Feature]: FlashAttention 3 support
github.com/vllm-project/vllm - orellavie1212 opened this issue 3 months ago
[doc] update pipeline parallel in readme
github.com/vllm-project/vllm - youkaichao opened this pull request 3 months ago
[distributed][misc] keep consistent with how pytorch finds libcudart.so
github.com/vllm-project/vllm - youkaichao opened this pull request 3 months ago
[BugFix] BatchResponseData body should be optional
github.com/vllm-project/vllm - zifeitong opened this pull request 3 months ago
[Kernel] Fix identical branches
github.com/vllm-project/vllm - stevegrubb opened this pull request 3 months ago
[Model][Phi3-Small] Remove scipy from blocksparse_attention
github.com/vllm-project/vllm - mgoin opened this pull request 3 months ago
[Bug]: OpenAI batch file format pydantic validation error
github.com/vllm-project/vllm - ArsalShakil opened this issue 3 months ago
[Misc] add fixture to guided processor tests
github.com/vllm-project/vllm - kevinbu233 opened this pull request 3 months ago
[Bug]: get that Exception in thread Thread-3 (_report_usage_worker): (vllm OpenVINO,When python3 vllm/benchmarks/benchmark_throughput.py,)
github.com/vllm-project/vllm - HPUedCSLearner opened this issue 3 months ago
[bug fix] Fix llava next feature size calculation.
github.com/vllm-project/vllm - xwjiang2010 opened this pull request 3 months ago
[Core] draft_model_runner: Implement prepare_inputs on GPU for advance_step
github.com/vllm-project/vllm - alexm-neuralmagic opened this pull request 3 months ago
[Bug]: Metrics time_to_first_token_seconds, time_per_output_token_seconds not working correctly
github.com/vllm-project/vllm - thies1006 opened this issue 3 months ago
[Performance]: how to use NVIDIA Nsight Compute in lunix
github.com/vllm-project/vllm - chenglu66 opened this issue 3 months ago
fix cuda118 can't find libcudart.so error
github.com/vllm-project/vllm - zhaotyer opened this pull request 3 months ago
[Bug]: Unable to run phi-3-small in latest release
github.com/vllm-project/vllm - ssmi153 opened this issue 3 months ago
[Bug]: Error on inference with LoRa request (safetensors format)
github.com/vllm-project/vllm - tsvisab opened this issue 3 months ago
[Bug]: `tests/basic_correctness/test_chunked_prefill.py` is failing on main in fp32
github.com/vllm-project/vllm - tdoublep opened this issue 3 months ago
[Bug]: Gemma 2 GPTQ - Complete output via API but incomplete through batch inference
github.com/vllm-project/vllm - ArsalShakil opened this issue 3 months ago
[Bug]: VLLM's output is unstable version==0.5.1
github.com/vllm-project/vllm - ffxmm opened this issue 3 months ago
[Model] RowParallelLinear: pass bias to quant_method.apply
github.com/vllm-project/vllm - tdoublep opened this pull request 3 months ago
[Bugfix] GPTBigCodeForCausalLM: Remove lm_head from supported_lora_modules.
github.com/vllm-project/vllm - tdoublep opened this pull request 3 months ago
[Usage]: Maximum Context Length Exceeded Due to Base64-Encoded Image in Prompt
github.com/vllm-project/vllm - tusharraskar opened this issue 3 months ago
[Bug]: VLLM 0.5.1 with LLaVA 1.6 exceptions
github.com/vllm-project/vllm - andrePankraz opened this issue 3 months ago
[Model]: Support for InternVL2
github.com/vllm-project/vllm - Weiyun1025 opened this issue 3 months ago