Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
vLLM
vLLM is a high-throughput and memory-efficient inference and serving engine for large language models (LLMs).
Collective: https://opencollective.com/vllm
Host: opensource
Code: https://github.com/vllm-project/vllm
[Misc] refactor(config): clean up unused code
github.com/vllm-project/vllm - aniaan opened this pull request 3 months ago
[Bug]: In k8s pod, it takes approximately 1 hour to start the model using vllm
github.com/vllm-project/vllm - WangxuP opened this issue 3 months ago
[Core] offload model weights to CPU conditionally
github.com/vllm-project/vllm - chenqianfzh opened this pull request 3 months ago
[Core] Support Lora lineage and base model metadata management
github.com/vllm-project/vllm - Jeffwan opened this pull request 3 months ago
[Bug]: Server fails to boot due to a tensor size mismatch when LoRA is enabled for GPTBigCode
github.com/vllm-project/vllm - tjohnson31415 opened this issue 3 months ago
[Bugfix][Neuron] Fix soft prompt method error in NeuronExecutor
github.com/vllm-project/vllm - WoosukKwon opened this pull request 3 months ago
[BUG FIX]fix compile error when building with torch2.1
github.com/vllm-project/vllm - maidabu opened this pull request 3 months ago
[Bug]: Gloo Connection reset by peer
github.com/vllm-project/vllm - thies1006 opened this issue 3 months ago
[Feature]: Is there any plan to support Cross-Layer Attention (CLA) ?
github.com/vllm-project/vllm - JiayiFeng opened this issue 3 months ago
[Misc]: Random Output Generation with mistralai/Mixtral-8x22B-v0.1
github.com/vllm-project/vllm - rajagond opened this issue 3 months ago
[Usage]: In phi3 vision maximum context length issue
github.com/vllm-project/vllm - tusharraskar opened this issue 3 months ago
[Feature]: Multi-Proposers support for speculative decoding.
github.com/vllm-project/vllm - ShangmingCai opened this issue 3 months ago
[Bug]: Vllm 0.5.1+cu118 timeout when init CustomAllreduce
github.com/vllm-project/vllm - zhaotyer opened this issue 3 months ago
[Model] Add support for 'gte-Qwen2' embedding models
github.com/vllm-project/vllm - Nickydusk opened this pull request 3 months ago
[ci] try to add multi-node tests
github.com/vllm-project/vllm - youkaichao opened this pull request 3 months ago
[CI/Build][TPU] Add TPU CI test
github.com/vllm-project/vllm - WoosukKwon opened this pull request 3 months ago
[Bug]: deepseek-coder-v2-lite-instruct; Exception in worker VllmWorkerProcess while processing method initialize_cache: [Errno 2] No such file or directory: '/root/.triton/cache/de758c429c9ff1f18930bbd9c3004506/fused_moe_kernel.json.tmp.pid_1528_587007', Traceback (most recent call last):
github.com/vllm-project/vllm - fengyang95 opened this issue 3 months ago
[RFC]: Enhancing LoRA Management for Production Environments in vLLM
github.com/vllm-project/vllm - Jeffwan opened this issue 3 months ago
[core] Sampling controller interface
github.com/vllm-project/vllm - mmoskal opened this pull request 3 months ago
[Bug]: TypeError: Can't instantiate abstract class NeuronWorker with abstract method execute_worker
github.com/vllm-project/vllm - areanddee opened this issue 3 months ago
[BugFix] get_and_reset only when scheduler outputs are not empty
github.com/vllm-project/vllm - mzusman opened this pull request 3 months ago
[Bug]: Qwen2 Moe FP8 not supported on L40
github.com/vllm-project/vllm - TopIdiot opened this issue 3 months ago
[Core][Model] Add simple_model_runner and a new model XLMRobertaForSequenceClassification through multimodal interface
github.com/vllm-project/vllm - AllenDou opened this pull request 3 months ago
No executable after building vllm from source with CPU support
github.com/vllm-project/vllm - parkesorgua opened this issue 3 months ago
[Bug]: tensor parallel (of 4 cards) gives bad answers in version 0.5.1 and later (compared to 0.4.1) with gptq marlin kernels (compared to gptq)
github.com/vllm-project/vllm - orellavie1212 opened this issue 3 months ago
[BugFix]: fix engine timeout due to request abort
github.com/vllm-project/vllm - pushan01 opened this pull request 3 months ago
[Bug]: Engine timeout error due to request step residual
github.com/vllm-project/vllm - pushan01 opened this issue 3 months ago
[Bug]: segfault when using google/gemma-2-27b-it on vLLM
github.com/vllm-project/vllm - federicotorrielli opened this issue 3 months ago
add benchmark test for fixed input and output length
github.com/vllm-project/vllm - haichuan1221 opened this pull request 3 months ago
[Installation]: Installation with OpenVINO get dependency conflict Error !!!
github.com/vllm-project/vllm - HPUedCSLearner opened this issue 3 months ago
[Usage]: Gemma2-9b not working on A10G 24gb gpu
github.com/vllm-project/vllm - Abhinay2323 opened this issue 3 months ago
[Bug]: Performance : slow inference for FP8 on L20 with 0.5.1(v0.5.0.post1 was fine)
github.com/vllm-project/vllm - garycaokai opened this issue 3 months ago
[Installation]: Gemma2 Installing Flash Infer `[rank0]: TypeError: 'NoneType' object is not callable`
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this issue 3 months ago
[Core] Support dynamically loading Lora adapter from HuggingFace
github.com/vllm-project/vllm - Jeffwan opened this pull request 3 months ago
[Feature]: Support loading lora adapters from HuggingFace in runtime
github.com/vllm-project/vllm - Jeffwan opened this issue 3 months ago
[Bug]: relative path doesn't work for Lora adapter model
github.com/vllm-project/vllm - Jeffwan opened this issue 3 months ago
[Doc] Fix the lora adapter path in server startup script
github.com/vllm-project/vllm - Jeffwan opened this pull request 3 months ago
[RFC] Drop beam search support
github.com/vllm-project/vllm - WoosukKwon opened this issue 3 months ago
[Bug]: benchmark_throughput gets TypeError: XFormersMetadata.__init__() got an unexpected keyword argument 'is_prompt' with CPU
github.com/vllm-project/vllm - LGLG42 opened this issue 3 months ago
[ BugFix ] Prompt Logprobs Detokenization
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 3 months ago
[Bug]: Gemma2 supports 8192 context with sliding window, but vllm only does 4196 or fails if try 8192
github.com/vllm-project/vllm - pseudotensor opened this issue 3 months ago
[Bug]: issue with Phi3 mini GPTQ 4Bit/8Bit
github.com/vllm-project/vllm - gm3000 opened this issue 3 months ago
[Hardware][Intel CPU][DOC] Update docs for CPU backend
github.com/vllm-project/vllm - zhouyuan opened this pull request 3 months ago
[Bug]: AsyncEngineDeadError: Task finished unexpectedly with qwen2 72b
github.com/vllm-project/vllm - thomZ1 opened this issue 3 months ago
[Installation]: pip install -e .
github.com/vllm-project/vllm - Kev1ntan opened this issue 3 months ago
[Usage]: Is there a way to make the results of two different calls to VLLM with temperature > 0 consistent?
github.com/vllm-project/vllm - Some-random opened this issue 3 months ago
do not exclude `object` field in CompletionStreamResponse
github.com/vllm-project/vllm - kczimm opened this pull request 3 months ago
[misc][frontend] log all available endpoints
github.com/vllm-project/vllm - youkaichao opened this pull request 3 months ago
[Bug]: No end point available after model is fully loaded
github.com/vllm-project/vllm - hassanzadeh opened this issue 3 months ago
[Bug]: Guided decoding with Phi-3-small crashes
github.com/vllm-project/vllm - crosiumreborn opened this issue 3 months ago
[Bug]: gemma-2-27b error loading with vllm.LLM
github.com/vllm-project/vllm - jl3676 opened this issue 3 months ago
[Usage]: OpenAI-like API in offline inference
github.com/vllm-project/vllm - 1ncludeSteven opened this issue 3 months ago
[Bug]: AsyncEngineDeadError: Task finished unexpectedly with Gemma2 9B
github.com/vllm-project/vllm - nelyajizi opened this issue 3 months ago
[Feature]: Precise model device placement
github.com/vllm-project/vllm - vwxyzjn opened this issue 3 months ago
[Usage]: BNB Gemma2 9b loading problems
github.com/vllm-project/vllm - orellavie1212 opened this issue 3 months ago
[Core][Speculative Decoding] Add multi-query verifier for speculative decoding without batch expansion
github.com/vllm-project/vllm - sighingnow opened this pull request 3 months ago
[Usage]: solve problem like "Found no NVIDIA driver on your system." in WSL2
github.com/vllm-project/vllm - HelloCard opened this issue 3 months ago
[core][distributed] add zmq fallback for broadcasting large objects
github.com/vllm-project/vllm - youkaichao opened this pull request 3 months ago
Add test test (this is a test pr)
github.com/vllm-project/vllm - llmpros opened this pull request 3 months ago
[Bug]: Multiprocessing FileNotFound error in triton cache
github.com/vllm-project/vllm - jl3676 opened this issue 3 months ago
[Usage]: Struggling to get fp8 inference working correctly on 8xL40s
github.com/vllm-project/vllm - williambarberjr opened this issue 3 months ago
[Feature]: Support AVX2 for CPU (drop AVX-512 requirement)
github.com/vllm-project/vllm - kozuch opened this issue 4 months ago
[Bug]: Empty strings as output using gemma-2-27B with 4 A10s
github.com/vllm-project/vllm - lucafirefox opened this issue 4 months ago
[Bug]: LLaVA 1.6 in 0.5.1: Exceptions after some bigger image request, stuck in faulty mode
github.com/vllm-project/vllm - andrePankraz opened this issue 4 months ago
[Feature]: ROPE scaling supported by vLLM gemma2
github.com/vllm-project/vllm - kkk935208447 opened this issue 4 months ago
[Doc]: Code Shared for OpenAI Embedding Client gives base64 encode error
github.com/vllm-project/vllm - palash-fin opened this issue 4 months ago
[Bug]: As V100 does not support FlashAttention, it is not possible to run the gemma model, hopefully it can support the xformers way to run it
github.com/vllm-project/vllm - warlockedward opened this issue 4 months ago
Add FlashInfer to default Dockerfile
github.com/vllm-project/vllm - simon-mo opened this pull request 4 months ago
[Bug]: New bug in 0.5.1 (v0.5.0.post1 was fine)
github.com/vllm-project/vllm - andrePankraz opened this issue 4 months ago
[Core] implement disaggregated prefilling via KV cache transfer
github.com/vllm-project/vllm - KuntaiDu opened this pull request 4 months ago
[Bug]: TypeError: 'NoneType' object is not callable when loading Gemma 2 9B with new 0.5.1 version
github.com/vllm-project/vllm - DanielusG opened this issue 4 months ago
[Doc] Move guide for multimodal model and other improvements
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 4 months ago
[Doc] Reorganize Supported Models by Type
github.com/vllm-project/vllm - ywang96 opened this pull request 4 months ago
[Bug]: When running gemma2 7b, an error is reported [rank0]: RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16BF, lda, b, CUDA_R_16BF, ldb, &fbeta, c, CUDA_R_16BF, ldc, compute_type, CUBLAS_GEMM_DEFAULT_TENSOR_OP)` Set up according to the prompts: os.environ['VLLM_ATTENTION_BACKEND'] = 'FLASHINFER' print("Environment variable set for VLLM_ATTENTION_BACKEND:", os.getenv('VLLM_ATTENTION_BACKEND'))
github.com/vllm-project/vllm - orderer0001 opened this issue 4 months ago
[Feature]: Return hidden states (in progress?)
github.com/vllm-project/vllm - Elanmarkowitz opened this issue 4 months ago
[Core] Refactor _prepare_model_input_tensors - take 2
github.com/vllm-project/vllm - comaniac opened this pull request 4 months ago
Move release wheel env var to Dockerfile instead
github.com/vllm-project/vllm - simon-mo opened this pull request 4 months ago
Fix release wheel build env var
github.com/vllm-project/vllm - simon-mo opened this pull request 4 months ago
Update wheel builds to strip debug
github.com/vllm-project/vllm - simon-mo opened this pull request 4 months ago
[Bug]: Batch expansion doesn't work with lora
github.com/vllm-project/vllm - Adhyyan1252 opened this issue 4 months ago
[Docs] Fix readthedocs for tag build
github.com/vllm-project/vllm - simon-mo opened this pull request 4 months ago
bump version to v0.5.1
github.com/vllm-project/vllm - simon-mo opened this pull request 4 months ago
[Bug]: When starting deepseek-coder-v2-lite-instruct with vllm on 4 GPUs, one of them is at 0%.
github.com/vllm-project/vllm - fengyang95 opened this issue 4 months ago
[Usage]: How to use Multi-instance in Vllm? (Model replication on multiple GPUs)
github.com/vllm-project/vllm - KimMinSang96 opened this issue 4 months ago
[Feature]: expose the tqdm progress bar to enable logging the progress
github.com/vllm-project/vllm - hugolytics opened this issue 4 months ago
Exception when loading Baichuan2-13B-Chat: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 50.00 MiB. GPU
github.com/vllm-project/vllm - czhcc opened this issue 4 months ago
[Bug]: When tensor_parallel_size>1, RuntimeError: Cannot re-initialize CUDA in forked subprocess.
github.com/vllm-project/vllm - excelsimon opened this issue 4 months ago
[Feature]: Integrate new backend
github.com/vllm-project/vllm - XDaoHong opened this issue 4 months ago
[Performance]: the performance with chunked-prefill-enabled is lower than default
github.com/vllm-project/vllm - BestKuan opened this issue 4 months ago
[VLM] Cleanup validation and update docs
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 4 months ago
[Feature]: Model ChatGLMForCausalLM does not support LoRA, but LoRA is enabled.
github.com/vllm-project/vllm - wangbhan opened this issue 4 months ago
[Bug]: CUDA error when using multiple GPUs
github.com/vllm-project/vllm - ndao600 opened this issue 4 months ago
[VLM] Improve consistency between feature size calculation and dummy data for profiling
github.com/vllm-project/vllm - ywang96 opened this pull request 4 months ago
[Bug]: When using tp for inference, an error occurs: Worker VllmWorkerProcess pid 3283517 died, exit code: -15.
github.com/vllm-project/vllm - B-201 opened this issue 4 months ago
[Bugfix] Enable chunked-prefill and prefix cache with flash-attn backend
github.com/vllm-project/vllm - sighingnow opened this pull request 4 months ago
[Hardware][Intel-Gaudi] Add Intel Gaudi (HPU) inference backend
github.com/vllm-project/vllm - kzawora-intel opened this pull request 4 months ago
[Feature]: deepseek-v2 awq support
github.com/vllm-project/vllm - fengyang95 opened this issue 4 months ago
[Usage]: Internal server error when serving LoRA adapters with Open-AI compatible vLLM server
github.com/vllm-project/vllm - ebi64 opened this issue 4 months ago