Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
vLLM
vLLM is a high-throughput and memory-efficient inference and serving engine for large language models (LLMs).
Collective: https://opencollective.com/vllm
Host: opensource
Code: https://github.com/vllm-project/vllm
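
As a quick orientation before the recent repository activity below, a minimal offline-inference sketch using vLLM's Python API (the model name is purely illustrative):

    # Minimal vLLM offline-inference sketch; the model name is illustrative.
    from vllm import LLM, SamplingParams

    llm = LLM(model="facebook/opt-125m")  # any causal LM supported by vLLM
    params = SamplingParams(temperature=0.8, max_tokens=64)

    for output in llm.generate(["The capital of France is"], params):
        print(output.outputs[0].text)

The same engine also backs vLLM's OpenAI-compatible server (python -m vllm.entrypoints.openai.api_server), which many of the issues below refer to.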
[misc] Optimize speculative decoding
github.com/vllm-project/vllm - jacob-crux opened this pull request about 2 months ago
[Bugfix] Fix #7592 vllm 0.5.4 enable_chunked_prefill throughput is slightly lower than 0.5.3~0.5.0.
github.com/vllm-project/vllm - noooop opened this pull request about 2 months ago
[CI/Build] Added OpenVINO backend tests run
github.com/vllm-project/vllm - ilya-lavrenov opened this pull request about 2 months ago
[Doc]: Using AWQ with tensor-parallel-size 4 gives bad results, but tensor-parallel-size 2 gives good results
github.com/vllm-project/vllm - Soulscb opened this issue about 2 months ago
[Usage]: set num_crops in LVLM
github.com/vllm-project/vllm - Liyan06 opened this issue about 2 months ago
Inclusion of InternVLChatModel In PP_SUPPORTED_MODELS(Pipeline Parallelism)
github.com/vllm-project/vllm - Manikandan-Thangaraj-ZS0321 opened this pull request about 2 months ago
[Bug]: minicpmv2_6 OOM
github.com/vllm-project/vllm - Howe-Young opened this issue about 2 months ago
[Bug]: Chatglm2 with KeyError: 'transformer.layers.1.mlp.dense_4h_to_h.weight'
github.com/vllm-project/vllm - jimmy-walker opened this issue about 2 months ago
[Bug]: Prefix Caching same prompts gives different results
github.com/vllm-project/vllm - danielhanchen opened this issue about 2 months ago
[Bugfix] Fix Phi-3v crash when input images are of certain sizes
github.com/vllm-project/vllm - zifeitong opened this pull request about 2 months ago
Update RemoteOpenAIServer to use common prepare_weights function
github.com/vllm-project/vllm - mgoin opened this pull request about 2 months ago
[ci][test] fix RemoteOpenAIServer
github.com/vllm-project/vllm - youkaichao opened this pull request about 2 months ago
[Misc]: Multi-Node Multi-GPU (tensor parallel plus pipeline parallel inference)
github.com/vllm-project/vllm - ilovesouthpark opened this issue about 2 months ago
[Bugfix][CI/Build] Fix model name being overwritten
github.com/vllm-project/vllm - DarkLight1337 opened this pull request about 2 months ago
[Misc] Remove snapshot_download usage in InternVL2 test
github.com/vllm-project/vllm - Isotr0py opened this pull request about 2 months ago
[ci][test] exclude model download time in server start time
github.com/vllm-project/vllm - youkaichao opened this pull request about 2 months ago
[Bug]: AI21-Jamba-1.5-Mini RuntimeError: Triton Error [CUDA]: an illegal memory access was encountered
github.com/vllm-project/vllm - pseudotensor opened this issue about 2 months ago
[BUG]: Support AI21-Jamba-1.5-Large (and mini)
github.com/vllm-project/vllm - pseudotensor opened this issue about 2 months ago
[misc][core] lazy import outlines
github.com/vllm-project/vllm - youkaichao opened this pull request about 2 months ago
[Bug]: vLLM with the Ray backend and nsight enabled can't get perf metrics due to a connection issue
github.com/vllm-project/vllm - paladin2000cn opened this issue about 2 months ago
[Bugfix] Fix guided_decode LogitsProcessor FSM_state missing error
github.com/vllm-project/vllm - xuechendi opened this pull request about 2 months ago
[Bug]: Nemotron 340B does not generate EOS token
github.com/vllm-project/vllm - natolambert opened this issue about 2 months ago
[Bug]: tool_calls parsing error with CPU
github.com/vllm-project/vllm - xuechendi opened this issue about 2 months ago
[WIP][Model] Add support for multiple audio chunks/audio URLs
github.com/vllm-project/vllm - petersalas opened this pull request about 2 months ago
[Misc] Update compressed tensors lifecycle to remove `prefix` from `create_weights`
github.com/vllm-project/vllm - dsikka opened this pull request about 2 months ago
[Bugfix][Intel] Fix XPU Dockerfile Build
github.com/vllm-project/vllm - tylertitsworth opened this pull request about 2 months ago
Bump version to v0.5.5
github.com/vllm-project/vllm - simon-mo opened this pull request about 2 months ago
[Performance][BlockManagerV2] Mark prefix cache block as computed after schedule
github.com/vllm-project/vllm - comaniac opened this pull request about 2 months ago
[Misc] An Example to Compute the Low-noise Perplexity Estimate for Llama-2 model family.
github.com/vllm-project/vllm - Alexei-V-Ivanov-AMD opened this pull request about 2 months ago
[CI/Build] Reorganize models tests
github.com/vllm-project/vllm - DarkLight1337 opened this pull request about 2 months ago
[MODEL] add Exaone model support
github.com/vllm-project/vllm - nayohan opened this pull request about 2 months ago
[Usage]: When debugging with vLLM, a CUDA error occurs.
github.com/vllm-project/vllm - kinglion811 opened this issue about 2 months ago
[Bug]: Error: No available node types can fulfill resource request
github.com/vllm-project/vllm - thies1006 opened this issue about 2 months ago
[Core] Chunked Prefill support for Multi Step Scheduling
github.com/vllm-project/vllm - varun-sundar-rabindranath opened this pull request about 2 months ago
[Bug]: Docker build for ROCm fails for latest release and main branch
github.com/vllm-project/vllm - Spurthi-Bhat-ScalersAI opened this issue about 2 months ago
[Usage]: The seed in vllm.SamplingParams and vllm.LLM
github.com/vllm-project/vllm - caiqizh opened this issue about 2 months ago
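
For context on the seed question above, a hedged sketch of how a seed is usually supplied, assuming the documented behaviour of the engine-level seed and the per-request SamplingParams seed (model name illustrative):

    from vllm import LLM, SamplingParams

    llm = LLM(model="facebook/opt-125m", seed=0)         # engine-level seed
    params = SamplingParams(temperature=1.0, seed=1234)  # per-request sampling seed
    print(llm.generate(["Once upon a time"], params)[0].outputs[0].text)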
[Bug]: Llama 3 answers starting with <|start_header_id|>assistant<|end_header_id|>
github.com/vllm-project/vllm - erickrf opened this issue about 2 months ago
[Hardware][Intel GPU] Add intel GPU pipeline parallel support.
github.com/vllm-project/vllm - jikunshang opened this pull request about 2 months ago
[github][misc] promote asking llm first
github.com/vllm-project/vllm - youkaichao opened this pull request about 2 months ago
[Usage]: How to generate independent samples for a given input?
github.com/vllm-project/vllm - caiqizh opened this issue about 2 months ago
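
One common answer to the question above, sketched under the assumption that SamplingParams(n=...) returns that many independently sampled completions per prompt (model name illustrative):

    from vllm import LLM, SamplingParams

    llm = LLM(model="facebook/opt-125m")
    params = SamplingParams(n=4, temperature=1.0)  # four independent samples per prompt
    result = llm.generate(["Write a haiku about the sea."], params)[0]
    for sample in result.outputs:
        print(sample.text)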
[Bugfix] Catch up with removed parameter 'is_prompt' in cpu/xpu model runner
github.com/vllm-project/vllm - anencore94 opened this pull request about 2 months ago
[misc] Add Torch profiler support for CPU-only devices
github.com/vllm-project/vllm - DamonFool opened this pull request about 2 months ago
[Misc] Update `qqq` to use vLLMParameters
github.com/vllm-project/vllm - dsikka opened this pull request about 2 months ago
[Bug]: falcon-40B model support
github.com/vllm-project/vllm - jikunshang opened this issue about 2 months ago
[Misc] Update `marlin` to use vLLMParameters
github.com/vllm-project/vllm - dsikka opened this pull request about 2 months ago
[Bug]: ModuleNotFoundError: No module named 'openai.types'
github.com/vllm-project/vllm - Juelianqvq opened this issue about 2 months ago
[Bug]: Running mistral-large results in an error related to NCCL
github.com/vllm-project/vllm - White-Friday opened this issue about 2 months ago
[Core][Kernels] Use FlashInfer backend for FP8 KV Cache when available.
github.com/vllm-project/vllm - pavanimajety opened this pull request about 2 months ago
[Bug]: my vllm phi-3-vision server runs one request correctly then returns an error for the same request stating 2509 image tokens to 0 placeholders
github.com/vllm-project/vllm - SPZtaymed opened this issue about 2 months ago
[core][torch.compile] not compile for profiling
github.com/vllm-project/vllm - youkaichao opened this pull request about 2 months ago
[Usage]: Is there any way to hook features inside vision-language model?
github.com/vllm-project/vllm - minuenergy opened this issue about 2 months ago
[Bug]: FP8 Marlin fallback out of memory regression
github.com/vllm-project/vllm - cduk opened this issue about 2 months ago
[Bug]: Critical distributed executor bug
github.com/vllm-project/vllm - clintg6 opened this issue about 2 months ago
[TPU] Enable neural-magic pre-quantized W8A8/16 checkpoint for TPU backend
github.com/vllm-project/vllm - lsy323 opened this pull request about 2 months ago
[Core] Add multi-step support to LLMEngine
github.com/vllm-project/vllm - alexm-neuralmagic opened this pull request about 2 months ago
[Bug]: Phi-3-small-128k-instruct on 1 A100 GPUs - Assertion error: Does not support prefix-enabled attention.
github.com/vllm-project/vllm - congcongchen123 opened this issue about 2 months ago
[Bug]: Installing vLLM results in a build error
github.com/vllm-project/vllm - Mysnake opened this issue about 2 months ago
[Model][VLM] Support multi-images inputs for Phi-3-vision models
github.com/vllm-project/vllm - Isotr0py opened this pull request about 2 months ago
[Bug]: /metrics endpoint shows less information at latest (0.5.4) vllm docker container.
github.com/vllm-project/vllm - kulievvitaly opened this issue about 2 months ago
[Bug]: Can't load vision model `microsoft/Phi-3.5-vision-instruct`
github.com/vllm-project/vllm - remiconnesson opened this issue about 2 months ago
[Feature]: Check for presence of files at startup
github.com/vllm-project/vllm - cduk opened this issue about 2 months ago
[Bug]: vLLM online mode gives varying logprobs for the same prompt even when temperature is 0
github.com/vllm-project/vllm - tonyaw opened this issue about 2 months ago
[Feature]: High throughput has not been achieved in the decoding stage when using JSON-format output
github.com/vllm-project/vllm - Liucd0520 opened this issue about 2 months ago
[Bug]: Using fp8 cutlass scaled_mm causes wrong output
github.com/vllm-project/vllm - xTayEx opened this issue about 2 months ago
[Bug]: llama3-405b-fp8 NCCL communication
github.com/vllm-project/vllm - wangwensuo opened this issue about 2 months ago
[Bug]: for mistral-7B, local batch inference mode causes OOM error, while serving mode does not cause error
github.com/vllm-project/vllm - yananchen1989 opened this issue about 2 months ago
[Kernel] Expand MoE weight loading + Add Fused Marlin MoE Kernel
github.com/vllm-project/vllm - dsikka opened this pull request about 2 months ago
[Misc] Update `gptq_marlin_24` to use vLLMParameters
github.com/vllm-project/vllm - dsikka opened this pull request about 2 months ago
Add more percentiles and latencies
github.com/vllm-project/vllm - wschin opened this pull request 2 months ago
[Kernel] Add torch custom op for all_reduce
github.com/vllm-project/vllm - SageMoore opened this pull request 2 months ago
[Performance] Enable chunked prefill and prefix caching together
github.com/vllm-project/vllm - comaniac opened this pull request 2 months ago
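
For reference on the pull request above, both features are already exposed as engine arguments; a hedged sketch of enabling them together (the exact interaction of the two flags depends on the vLLM version, and the model name is illustrative):

    from vllm import LLM

    llm = LLM(
        model="meta-llama/Meta-Llama-3-8B-Instruct",
        enable_chunked_prefill=True,   # split long prefills across scheduler steps
        enable_prefix_caching=True,    # reuse KV cache for shared prompt prefixes
    )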
[Usage]: How do I configure Phi-3-vision for high throughput?
github.com/vllm-project/vllm - hommayushi3 opened this issue 2 months ago
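
For the throughput question above, a generic starting point rather than a tuned configuration; the numeric values below are assumptions to adjust for the actual GPU and workload:

    from vllm import LLM

    llm = LLM(
        model="microsoft/Phi-3-vision-128k-instruct",
        trust_remote_code=True,       # Phi-3-vision ships custom modeling code
        max_model_len=8192,           # cap context to leave room for more concurrent sequences
        gpu_memory_utilization=0.9,   # fraction of GPU memory used for weights and KV cache
        max_num_seqs=64,              # upper bound on concurrently scheduled sequences
    )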
[Misc] Raise a more informative exception in add/remove_logger
github.com/vllm-project/vllm - Yard1 opened this pull request 2 months ago
[BugFix] Fix server crash on empty prompt
github.com/vllm-project/vllm - maxdebayser opened this pull request 2 months ago
[Bug]: llama 3.1 405B RuntimeError: start (1024) + length (256) exceeds dimension size (1024)
github.com/vllm-project/vllm - youkaichao opened this issue 2 months ago
[Installation]: Failed to build vLLM from source; bisecting the most recent changes points to https://github.com/vllm-project/vllm/pull/7174
github.com/vllm-project/vllm - congcongchen123 opened this issue 2 months ago
Combine async postprocessor and multi-step - first WIP version
github.com/vllm-project/vllm - alexm-neuralmagic opened this pull request 2 months ago
[Bug]: Requesting Prompt Logprobs with an MLP Speculator Crashes the Server
github.com/vllm-project/vllm - tjohnson31415 opened this issue 2 months ago
[Usage]: Wait for the response for each prediction
github.com/vllm-project/vllm - savi8sant8s opened this issue 2 months ago
[Feature]: Phi-3.5 is a strong model for its size, including vision support; it accepts multiple images, but vLLM does not yet support this
github.com/vllm-project/vllm - pseudotensor opened this issue 2 months ago
[Model] Add Mistral Tokenization to improve robustness and chat encoding
github.com/vllm-project/vllm - patrickvonplaten opened this pull request 2 months ago
[Usage]: About bitsandbytes
github.com/vllm-project/vllm - emreekmekcioglu1 opened this issue 2 months ago
[Frontend]-config-cli-args
github.com/vllm-project/vllm - KaunilD opened this pull request 2 months ago
[Bugfix][Hardware][CPU] Fix `mm_limits` initialization for CPU backend
github.com/vllm-project/vllm - Isotr0py opened this pull request 2 months ago
[Bugfix] chat method add_generation_prompt param
github.com/vllm-project/vllm - brian14708 opened this pull request 2 months ago
[Bug]: vLLM server not supporting stabilityai/stablelm-3b-4e1t model on CPU
github.com/vllm-project/vllm - jerin-scalers-ai opened this issue 2 months ago
[New Model]: ValueError: Model architectures ['PhiMoEForCausalLM'] are not supported for now
github.com/vllm-project/vllm - Nishant-kirito opened this issue 2 months ago
[Bugfix] Pass PYTHONPATH from setup.py to CMake
github.com/vllm-project/vllm - sasha0552 opened this pull request 2 months ago
[Model] Adding support for MSFT Phi-3.5-MoE
github.com/vllm-project/vllm - wenxcs opened this pull request 2 months ago
[Usage]: Potential Hardware Failure when running vllm
github.com/vllm-project/vllm - NicolasDrapier opened this issue 2 months ago
[New Model]: MiniCPM-V-2_6-int4
github.com/vllm-project/vllm - tangent2018 opened this issue 2 months ago
[Bug]: Unexpected non-determinism with vLLM 0.5.4 and Llama 3.1
github.com/vllm-project/vllm - br3no opened this issue 2 months ago
[Model] 1.58bits BitNet Model Support
github.com/vllm-project/vllm - LeiWang1999 opened this pull request 2 months ago
[Usage]: How to use FP8 or other quantization algorithms for Minicpmv2_6
github.com/vllm-project/vllm - Howe-Young opened this issue 2 months ago
[Bugfix] Mirror jinja2 in pyproject.toml
github.com/vllm-project/vllm - sasha0552 opened this pull request 2 months ago
[Bug]: Using CPU for inference, an error occurred. [Engine iteration timed out. This should never happen! ]
github.com/vllm-project/vllm - liuzhipengchd opened this issue 2 months ago
[Bug]: torch.OutOfMemoryError: CUDA out of memory
github.com/vllm-project/vllm - Sandwiches97 opened this issue 2 months ago