Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
vLLM
vLLM is a high-throughput and memory-efficient inference and serving engine for large language models (LLMs).
Collective: https://opencollective.com/vllm
Host: opensource
Code: https://github.com/vllm-project/vllm
[bugfix][distributed] fix multi-node bug for shared memory
github.com/vllm-project/vllm - youkaichao opened this pull request 3 months ago
[Core] Modulize prepare input and attention metadata builder
github.com/vllm-project/vllm - comaniac opened this pull request 3 months ago
[Feature]: Implementation of Sliding Window Attention for Full Context Support with Gemma-2
github.com/vllm-project/vllm - Motorratte opened this issue 3 months ago
[Frontend] Kill the server on engine death
github.com/vllm-project/vllm - joerunde opened this pull request 3 months ago
[ Kernel ] FP8 Dynamic Per Token Quant - Add scale_ub
github.com/vllm-project/vllm - varun-sundar-rabindranath opened this pull request 3 months ago
Add FP8 quantization `ignored_layers` support in llama
github.com/vllm-project/vllm - cli99 opened this pull request 3 months ago
[Bug]: Intel GPU Test failing in CI
github.com/vllm-project/vllm - tdoublep opened this issue 3 months ago
Fp8 dyn per tok
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 3 months ago
[Bugfix][Core] Output sampling: heuristic to choose between candidates
github.com/vllm-project/vllm - NihalPotdar opened this pull request 3 months ago
[Bugfix][Core]: Guard for KeyErrors that can occur if a request is aborted with Pipeline Parallel
github.com/vllm-project/vllm - tjohnson31415 opened this pull request 3 months ago
[Bug]: Crash possible with Pipeline Parallel when aborting requests
github.com/vllm-project/vllm - tjohnson31415 opened this issue 3 months ago
[Feature]: LLM2Vec (Fine-Tuned Embeddings) Support
github.com/vllm-project/vllm - DorotheaMueller opened this issue 3 months ago
[Bugfix] cuda: handle case visible devices is a MIG or GPU uuid
github.com/vllm-project/vllm - cfhammill opened this pull request 3 months ago
[Bugfix][Frontend] remove duplicate init logger
github.com/vllm-project/vllm - dtrifiro opened this pull request 3 months ago
[Docs] Update docs for wheel location
github.com/vllm-project/vllm - simon-mo opened this pull request 3 months ago
[Misc] Fix input_scale typing in w8a8_utils.py
github.com/vllm-project/vllm - mgoin opened this pull request 3 months ago
[Bugfix] [SpecDecode] AsyncMetricsCollector: update time since last collection
github.com/vllm-project/vllm - tdoublep opened this pull request 3 months ago
[Bug]: SpecDecode AsyncMetricsCollector _last_metrics_collect_time is never reset
github.com/vllm-project/vllm - tdoublep opened this issue 3 months ago
[Usage]: ncclSystemError: System call (e.g. socket, malloc) or external library call failed or device error.
github.com/vllm-project/vllm - jueming0312 opened this issue 3 months ago
[Feature]: Support Lora Adapter generated from mistral-finetune
github.com/vllm-project/vllm - tensimixt opened this issue 3 months ago
[Model]: Llava-Next-Video support
github.com/vllm-project/vllm - TKONIY opened this issue 3 months ago
add tqdm when loading checkpoint shards
github.com/vllm-project/vllm - zhaotyer opened this pull request 3 months ago
[Misc] Enhance prefix-caching benchmark tool
github.com/vllm-project/vllm - Jeffwan opened this pull request 3 months ago
[Bug]: gptq model fails on pascal gpu with long prompt
github.com/vllm-project/vllm - shesung opened this issue 3 months ago
[Core] Support load and unload LoRA in api server
github.com/vllm-project/vllm - Jeffwan opened this pull request 3 months ago
[Feature]: Any thoughts about MI50 support ?
github.com/vllm-project/vllm - linchen111 opened this issue 3 months ago
[Bug]: Distributed Inference and Serving
github.com/vllm-project/vllm - warlockedward opened this issue 3 months ago
[Bug]: Failed to import from vllm._C with ImportError("/lib64/libc.so.6: version `GLIBC_2.32' not found
github.com/vllm-project/vllm - balcklive opened this issue 3 months ago
[Usage]: Can't utilize all VRAM for context
github.com/vllm-project/vllm - vlsav opened this issue 3 months ago
[Performance]: GPU utilization is low when running large batches on H100
github.com/vllm-project/vllm - sleepwalker2017 opened this issue 3 months ago
[ Misc ] `fbgemm` checkpoints
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 3 months ago
[Bug]: Cannot load fp8 model of internlm2-chat-7b offline
github.com/vllm-project/vllm - EstellaXinyuZhang opened this issue 3 months ago
[Core] Allow specifying custom Executor
github.com/vllm-project/vllm - Yard1 opened this pull request 3 months ago
[Bug]: vllm doesn't support multi-instance GPU
github.com/vllm-project/vllm - cfhammill opened this issue 3 months ago
[ci][test] add correctness test for cpu offloading
github.com/vllm-project/vllm - youkaichao opened this pull request 3 months ago
[Model] Support Mistral-Nemo
github.com/vllm-project/vllm - mgoin opened this pull request 3 months ago
[ Kernel ] Enable Dynamic Per Token `fp8`
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 3 months ago
[CI/Build] bump ruff version, fix linting issues
github.com/vllm-project/vllm - dtrifiro opened this pull request 3 months ago
[Feature]: mistralai/Mistral-Nemo-Instruct-2407 support
github.com/vllm-project/vllm - bjoernpl opened this issue 3 months ago
[Usage]: How to release GPU of vLLM model in python code
github.com/vllm-project/vllm - quanshr opened this issue 3 months ago
[Bugfix][CI/Build][Hardware][AMD] Fix AMD tests, add HF cache, update CK FA, add partially supported model notes
github.com/vllm-project/vllm - mawong-amd opened this pull request 3 months ago
[CI/Build] replace yapf with ruff
github.com/vllm-project/vllm - dtrifiro opened this pull request 3 months ago
[Misc] Consolidate and optimize logic for building padded tensors
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 3 months ago
[Feature]: return Usage info for streaming request for each chunk in ChatCompletion
github.com/vllm-project/vllm - yecohn opened this issue 3 months ago
[Bug]: vllm turned off my pc (loading mixtral8x7b)
github.com/vllm-project/vllm - juanluis17 opened this issue 3 months ago
[Bug]: vllm not support fp8 kv cache when use flashinfer
github.com/vllm-project/vllm - kuangdao opened this issue 3 months ago
[Bugfix] Corrected Typographical Errors from "indicies" to "indices"
github.com/vllm-project/vllm - JHLEE17 opened this pull request 3 months ago
[Core] Reduce unnecessary compute when logprobs=None
github.com/vllm-project/vllm - peng1999 opened this pull request 3 months ago
[Bug]: inter-token latency is lower than TPOT in serving benchmark result
github.com/vllm-project/vllm - Jeffwan opened this issue 3 months ago
[doc][distributed] add more doc for setting up multi-node environment
github.com/vllm-project/vllm - youkaichao opened this pull request 3 months ago
[Misc] Support FP8 kv cache scales from compressed-tensors
github.com/vllm-project/vllm - mgoin opened this pull request 3 months ago
added bitsandbytes dependency in common requirement.txt file
github.com/vllm-project/vllm - dipatidar opened this pull request 3 months ago
[Misc] Small perf improvements
github.com/vllm-project/vllm - Yard1 opened this pull request 3 months ago
[Model] Pipeline Parallel Support for DeepSeek v2
github.com/vllm-project/vllm - tjohnson31415 opened this pull request 3 months ago
[Model] Initialize support for InternVL2 series models
github.com/vllm-project/vllm - Isotr0py opened this pull request 3 months ago
FP8 Dynamic-Per-Token Quant
github.com/vllm-project/vllm - varun-sundar-rabindranath opened this pull request 3 months ago
[DOC] - Add docker image to Cerebrium Integration
github.com/vllm-project/vllm - milo157 opened this pull request 3 months ago
[Usage]: No chat template provided. Chat API will not work. How do I get vllm to support Codellama-34B in openai format?
github.com/vllm-project/vllm - x0w3n opened this issue 3 months ago
[Feature]: Add OpenAI server `prompt_logprobs` support
github.com/vllm-project/vllm - Theodotus1243 opened this issue 3 months ago
[Bug]: The _get_stats() are called multiple times which cause incorrect metrics collecting in do_log_stats()
github.com/vllm-project/vllm - yejingfu opened this issue 3 months ago
[TPU] Refactor TPU worker & model runner
github.com/vllm-project/vllm - WoosukKwon opened this pull request 3 months ago
[Misc] Use `torch.Tensor` for type annotation
github.com/vllm-project/vllm - WoosukKwon opened this pull request 3 months ago
[TPU] Remove multi-modal args in TPU backend
github.com/vllm-project/vllm - WoosukKwon opened this pull request 3 months ago
[New Model]: Support for Telechat
github.com/vllm-project/vllm - hzhaoy opened this issue 3 months ago
[Model] Add Support for GPTQ Fused MOE
github.com/vllm-project/vllm - izhuhaoran opened this pull request 3 months ago
[Bugfix] Update flashinfer.py with PagedAttention forwards - Fixes Gemma2 OpenAI Server Crash
github.com/vllm-project/vllm - noamgat opened this pull request 3 months ago
deploying embedding model in same way as LLM
github.com/vllm-project/vllm - riyajatar37003 opened this issue 3 months ago
[Installation]: ERROR: Could not find a version that satisfies the requirement numba (from outlines) (from versions: none); ERROR: No matching distribution found for numba; ERROR: Ignored the following versions that require a different python version: 1.6.2, 1.6.3, 1.7.0, 1.7.1 (Requires-Python >=3.7,<3.10)
github.com/vllm-project/vllm - XyLove0223 opened this issue 3 months ago
[core][model] yet another cpu offload implementation
github.com/vllm-project/vllm - youkaichao opened this pull request 3 months ago
[Bugfix] Fix for multinode crash on 4 PP
github.com/vllm-project/vllm - andoorve opened this pull request 3 months ago
[Bug]: The metrics have not improved.
github.com/vllm-project/vllm - zjjznw123 opened this issue 3 months ago
[Not for review]test gemma lora
github.com/vllm-project/vllm - jeejeelee opened this pull request 3 months ago
[misc][distributed] add seed to dummy weights
github.com/vllm-project/vllm - youkaichao opened this pull request 3 months ago
[CI/Build] Update flashinfer to v0.0.9 (#6489)
github.com/vllm-project/vllm - 170928 opened this pull request 3 months ago
[Misc] Updated flashinfer to v0.0.9 in the following test scripts:
github.com/vllm-project/vllm - 170928 opened this issue 3 months ago
[misc][distributed] improve tests
github.com/vllm-project/vllm - youkaichao opened this pull request 3 months ago
[ Kernel ] Fp8 Channelwise Weight Support
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 3 months ago
[Bug]: No module named `jsonschema.protocols`.
github.com/vllm-project/vllm - eff-kay opened this issue 3 months ago
[Spec Decode] Disable Log Prob serialization to CPU for spec decoding for both draft and target models.
github.com/vllm-project/vllm - sroy745 opened this pull request 3 months ago
[Model] Support Mamba
github.com/vllm-project/vllm - tlrmchlsmth opened this pull request 3 months ago
[Not for review] Spmd tp rebase
github.com/vllm-project/vllm - ruisearch42 opened this pull request 3 months ago
[ROCm] Cleanup Dockerfile and remove outdated patch
github.com/vllm-project/vllm - hongxiayang opened this pull request 3 months ago
[New Model]: Codestral Mamba
github.com/vllm-project/vllm - K-Mistele opened this issue 3 months ago
[Bug]: AttributeError: '_OpNamespace' '_C' object has no attribute 'rotary_embedding' / gemma-2-9b with vllm=0.5.2
github.com/vllm-project/vllm - choco9966 opened this issue 3 months ago
[Bug]: Gemma 27B crashes on GCP A100
github.com/vllm-project/vllm - noamgat opened this issue 3 months ago
[Bug]: [vllm-openvino]: ValueError: `use_cache` was set to `True` but the loaded model only supports `use_cache=False`.
github.com/vllm-project/vllm - HPUedCSLearner opened this issue 3 months ago
[Feature]: Pipeline parallelism support for qwen model
github.com/vllm-project/vllm - hiyforever opened this issue 3 months ago
[Usage]: PeftModelForCausalLM is not JSON serializable
github.com/vllm-project/vllm - jazzisfuture opened this issue 3 months ago
[Performance]: [Speculative Decoding] Measurement of Cost Coefficient through vLLM
github.com/vllm-project/vllm - bong-furiosa opened this issue 3 months ago
[Misc][Speculative decoding] Typos and typing fixes
github.com/vllm-project/vllm - ShangmingCai opened this pull request 3 months ago
[Bug]: failed when run Qwen2-54B-A14B-GPTQ-Int4(MOE)
github.com/vllm-project/vllm - weiminw opened this issue 3 months ago
unable to run vllm model deployment
github.com/vllm-project/vllm - riyajatar37003 opened this issue 3 months ago
[Bugfix][Frontend] Fix missing `/metrics` endpoint
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 3 months ago