Ecosyste.ms: Open Collective
An open API service for software projects hosted on Open Collective.
vLLM
vLLM is a high-throughput and memory-efficient inference and serving engine for large language models (LLMs).
Collective: https://opencollective.com/vllm
Host: opensource
Code: https://github.com/vllm-project/vllm
[core] [3/N] multi-step args and sequence.py
github.com/vllm-project/vllm - SolitaryThinker opened this pull request 2 months ago
[misc] Add Torch profiler support
github.com/vllm-project/vllm - SolitaryThinker opened this pull request 2 months ago
[Feature]: Integrate `flash-infer` FP8 KV Cache Chunked-Prefill (Append Attention)
github.com/vllm-project/vllm - jon-chuang opened this issue 2 months ago
[Core] More-efficient cross-attention parallel QKV computation
github.com/vllm-project/vllm - afeldman-nm opened this pull request 2 months ago
support bitsandbytes 8-bit and FP4 quantized models
github.com/vllm-project/vllm - chenqianfzh opened this pull request 2 months ago
[Doc] Add docs for llmcompressor INT8 and FP8 checkpoints
github.com/vllm-project/vllm - mgoin opened this pull request 2 months ago
[Bug][Frontend] Add and test client timeouts
github.com/vllm-project/vllm - joerunde opened this pull request 2 months ago
[Core] Fix tracking of model forward time to the span traces in case of PP>1
github.com/vllm-project/vllm - sfc-gh-mkeralapura opened this pull request 2 months ago
[Feature]: CI - Split up "Models Test" and "Vision Language Models Test"
github.com/vllm-project/vllm - jon-chuang opened this issue 2 months ago
[Core][Model][Frontend] Model architecture plugins
github.com/vllm-project/vllm - NadavShmayo opened this pull request 2 months ago
[Misc] update fp8 to use `vLLMParameter`
github.com/vllm-project/vllm - dsikka opened this pull request 2 months ago
[Model] Adding Granite model.
github.com/vllm-project/vllm - shawntan opened this pull request 2 months ago
[Usage]: GPTQ quantization behavior
github.com/vllm-project/vllm - onlinex opened this issue 2 months ago
Simplify Jamba state management
github.com/vllm-project/vllm - tlrmchlsmth opened this pull request 2 months ago
[misc][plugin] add plugin system implementation
github.com/vllm-project/vllm - youkaichao opened this pull request 2 months ago
[Misc] Deprecation Warning when setting --engine-use-ray
github.com/vllm-project/vllm - wallashss opened this pull request 2 months ago
[Misc] Update `awq` and `awq_marlin` to use `vLLMParameters`
github.com/vllm-project/vllm - dsikka opened this pull request 2 months ago
[New Model]: LLaVA-OneVision
github.com/vllm-project/vllm - EthanZoneCoding opened this issue 2 months ago
[Misc]: How to use intel-gpu in openvino
github.com/vllm-project/vllm - liuxingbin opened this issue 2 months ago
[Kernel] W8A16 Int8 inside FusedMoE
github.com/vllm-project/vllm - mzusman opened this pull request 2 months ago
[CI] Update performance benchmark: upgrade trt-llm to r24.07, and add SGLang
github.com/vllm-project/vllm - KuntaiDu opened this pull request 2 months ago
[frontend] isolate api server process and engine process
github.com/vllm-project/vllm - youkaichao opened this pull request 2 months ago
[VLM][Model] Add test for InternViT vision encoder
github.com/vllm-project/vllm - Isotr0py opened this pull request 2 months ago
[RFC]: Refactor the service pipeline to overlap GPU execution and CPU operations
github.com/vllm-project/vllm - sleepwalker2017 opened this issue 2 months ago
[CI/Build] Minor refactoring for vLLM assets
github.com/vllm-project/vllm - ywang96 opened this pull request 2 months ago
[Misc] Update `awq_marlin` to use `vLLMParameters`
github.com/vllm-project/vllm - dsikka opened this pull request 2 months ago
[misc] add commit id in collect env
github.com/vllm-project/vllm - youkaichao opened this pull request 2 months ago
[Usage]: KV Cache Warning for `gemma2`
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this issue 2 months ago
[Doc] add instructions about building vLLM with VLLM_TARGET_DEVICE=empty
github.com/vllm-project/vllm - tomeras91 opened this pull request 2 months ago
[Core] Move detokenization to front-end process
github.com/vllm-project/vllm - njhill opened this pull request 2 months ago
[Bug]: Dockerfile build error
github.com/vllm-project/vllm - palash-fin opened this issue 2 months ago
[Bug]: Bug in quantization/awq/gemm_kernels.cu gemm_forward_4bit_cuda_m16nXk32: more results have been written
github.com/vllm-project/vllm - mengsoso opened this issue 2 months ago
[Bug]: Bug in vllm/csrc/quantization/awq/gemm_kernels.cu
github.com/vllm-project/vllm - mengsoso opened this issue 2 months ago
[Bugfix] Handle PackageNotFoundError when checking for xpu version
github.com/vllm-project/vllm - sasha0552 opened this pull request 2 months ago
[Misc]: Cross-attention QKV computation is inefficient
github.com/vllm-project/vllm - afeldman-nm opened this issue 2 months ago
[CI/Build] Reduce the time consumption for LoRA tests
github.com/vllm-project/vllm - jeejeelee opened this pull request 2 months ago
[Core] More-efficient cross-attention parallel QKV computation
github.com/vllm-project/vllm - afeldman-nm opened this pull request 2 months ago
[Bugfix][Frontend] Fix Issues Under High Load With `zeromq` Frontend
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 2 months ago
[Bug]: using `openai_vision_api_client.py` gets an error
github.com/vllm-project/vllm - jaffe-fly opened this issue 2 months ago
[Bugfix] Fix phi3v batch inference when images have different aspect ratio
github.com/vllm-project/vllm - Isotr0py opened this pull request 2 months ago
[Core] fix _get_num_new_tokens() for _schedule_default()
github.com/vllm-project/vllm - George-ao opened this pull request 2 months ago
[Bug]: guided regex (using outlines and lm-format-enforcer) returns a bad error description on invalid regex
github.com/vllm-project/vllm - itaybar opened this issue 2 months ago
[Bug]: `facebook/chameleon-30b` triggers assertion error while loading weights
github.com/vllm-project/vllm - jaywonchung opened this issue 2 months ago
[core] [2/N] refactor worker_base input preparation for multi-step
github.com/vllm-project/vllm - SolitaryThinker opened this pull request 2 months ago
[Kernel] [Triton] [AMD] Adding Triton implementations awq_dequantize and awq_gemm to support AWQ
github.com/vllm-project/vllm - rasmith opened this pull request 2 months ago
[Core] RequestMetrics add preempt metrics
github.com/vllm-project/vllm - zeroorhero opened this pull request 2 months ago
[Bug]: some questions regarding the usage of NCCL allreduce/broadcast/allgather/send/recv in VLLM using pycomm and torch's distributed.
github.com/vllm-project/vllm - kanghui0204 opened this issue 2 months ago
[Bug]: LLaMA 3.1 8B/70B/405B all behave poorly and differently with the completions API compared to the chat API
github.com/vllm-project/vllm - pseudotensor opened this issue 2 months ago
[Core] Add engine option to return only deltas or final output
github.com/vllm-project/vllm - njhill opened this pull request 2 months ago
[Core] Fix edge case in chunked prefill + block manager v2
github.com/vllm-project/vllm - cadedaniel opened this pull request 2 months ago
[Performance]: vLLM inference on a CPU instance generates < 10 tokens/second
github.com/vllm-project/vllm - gracequeen opened this issue 2 months ago
[Bug]: prefill/prefix FP8 triton kernel for opt-125m - an illegal memory access was encountered
github.com/vllm-project/vllm - jon-chuang opened this issue 2 months ago
[Misc] Add numpy implementation of `compute_slot_mapping`
github.com/vllm-project/vllm - Yard1 opened this pull request 2 months ago
[Bugfix] Fix `PerTensorScaleParameter` weight loading for fused models
github.com/vllm-project/vllm - dsikka opened this pull request 2 months ago
[Usage]: Getting empty text from llm.generate with the Mixtral-8x7B-Instruct AWQ model
github.com/vllm-project/vllm - ab6995 opened this issue 2 months ago
[Bug]: Tensor Parallel > 1 causes desc_act=True GPTQ models to give bad output on ROCm
github.com/vllm-project/vllm - TNT3530 opened this issue 2 months ago
[Bugfix] Fix ITL recording in serving benchmark
github.com/vllm-project/vllm - ywang96 opened this pull request 2 months ago
[Feature]: Support block manager v2 for chunked prefill
github.com/vllm-project/vllm - comaniac opened this issue 2 months ago
[Misc]: Improving vLLM KV cache transfer efficiency with NCCL P2P communication
github.com/vllm-project/vllm - liweiqing1997 opened this issue 2 months ago
[CI/Build][ROCm] Enabling LoRA tests on ROCm
github.com/vllm-project/vllm - alexeykondrat opened this pull request 2 months ago
[Installation]: git clone cutlass fails
github.com/vllm-project/vllm - paolovic opened this issue 2 months ago
[Usage]: how to use LLM class with AsyncLLMEngine
github.com/vllm-project/vllm - henry-y opened this issue 2 months ago
[RFC]: Encoder/decoder models & feature compatibility
github.com/vllm-project/vllm - afeldman-nm opened this issue 2 months ago
[FrontEnd] Keep RPC server tcp protocol
github.com/vllm-project/vllm - esmeetu opened this pull request 2 months ago
[Performance] e2e overheads reduction: Small followup diff
github.com/vllm-project/vllm - alexm-neuralmagic opened this pull request 2 months ago
Create speculative decode dynamic parallel strategy
github.com/vllm-project/vllm - vladislavkruglikov opened this pull request 2 months ago
Create parallel scorer
github.com/vllm-project/vllm - vladislavkruglikov opened this pull request 2 months ago
Allow model executor to return many next tokens
github.com/vllm-project/vllm - vladislavkruglikov opened this pull request 2 months ago
[Bugfix] Fix reinit procedure in ModelInputForGPUBuilder
github.com/vllm-project/vllm - alexm-neuralmagic opened this pull request 2 months ago
Create draft from random tokens from prompt
github.com/vllm-project/vllm - vladislavkruglikov opened this pull request 2 months ago
Save speculative decoding states
github.com/vllm-project/vllm - vladislavkruglikov opened this pull request 2 months ago
[Bug]: `gemma-2-27b-it-GGUF`: `Architecture gemma2 not supported`
github.com/vllm-project/vllm - alllexx88 opened this issue 2 months ago
[Bug]: vllm.engine.async_llm_engine.AsyncEngineDeadError: Background loop has errored already.
github.com/vllm-project/vllm - yitianlian opened this issue 2 months ago
[Bug]: Using the LLM engine to run inference on the MiniCPM-V-2_6 model gives wrong results
github.com/vllm-project/vllm - orderer0001 opened this issue 2 months ago
[VLM][Doc] Add `stop_token_ids` to InternVL example
github.com/vllm-project/vllm - Isotr0py opened this pull request 2 months ago
[Feature]: continuous batching for vllm.LLM
github.com/vllm-project/vllm - babu111 opened this issue 2 months ago
[Bug]: CUDA out of memory for Llama 3.1 70B GPTQ, while Llama 3 70B GPTQ doesn't run out
github.com/vllm-project/vllm - orellavie1212 opened this issue 2 months ago
Create speculative decode dynamic parallel strategy
github.com/vllm-project/vllm - vladislavkruglikov opened this issue 2 months ago
[Feature]: Why does the vllm CLI not provide a config arg?
github.com/vllm-project/vllm - dsp6414 opened this issue 2 months ago
[Bug]: internvl2-8b loops infinitely when asked a question
github.com/vllm-project/vllm - haoduoyu1203 opened this issue 2 months ago
[Bug]: internvl2-8b answers in an infinite loop when asked a question
github.com/vllm-project/vllm - haoduoyu1203 opened this issue 2 months ago
[Frontend] Disallow passing `model` as both argument and option
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 2 months ago
[Core] RequestMetrics add preempt metrics
github.com/vllm-project/vllm - zeroorhero opened this pull request 2 months ago
[Bug]: `vllm server` gets errors while `python3 -m vllm.entrypoints.openai.api_server` works correctly
github.com/vllm-project/vllm - jaffe-fly opened this issue 2 months ago
[Bug]: Llama 3 LoRA load failed
github.com/vllm-project/vllm - victorlwchen opened this issue 2 months ago
[Misc] Add quantization config support for speculative model.
github.com/vllm-project/vllm - ShangmingCai opened this pull request 2 months ago
[TPU] Use mark_dynamic to reduce compilation time
github.com/vllm-project/vllm - WoosukKwon opened this pull request 2 months ago
[Feature]: Small model, large latency compared to SGLang and TensorRT-LLM
github.com/vllm-project/vllm - CambioML opened this issue 2 months ago
Enable FusedSDPA for prompt attention with env VLLM_PREFILL_USE_FUSEDSDPA=true
github.com/vllm-project/vllm - libinta opened this pull request 2 months ago
[Bug]: Extra body doesn't work when response_format is also sent for serving
github.com/vllm-project/vllm - HwwwwwwwH opened this issue 2 months ago
[Core] Streamline stream termination in `AsyncLLMEngine`
github.com/vllm-project/vllm - njhill opened this pull request 2 months ago
[Misc] Update Fused MoE weight loading
github.com/vllm-project/vllm - dsikka opened this pull request 2 months ago
[Usage]: Does vLLM support dynamic quantization?
github.com/vllm-project/vllm - garyyang85 opened this issue 2 months ago
[Core] Factor out input preprocessing to a separate class
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 2 months ago
[Bug]: Endless generation with fine tuned llama 3.1 model
github.com/vllm-project/vllm - shreshtshettybs opened this issue 2 months ago
[CI/Build] Add e2e correctness in oai
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 2 months ago
[Usage]: How to configure the parameters to support higher concurrency when deploying the qwen2-7b model as an API on an 8-GPU A800 (80G) server?
github.com/vllm-project/vllm - ybdesire opened this issue 2 months ago
[Misc/Testing] Use `torch.testing.assert_close`
github.com/vllm-project/vllm - jon-chuang opened this pull request 2 months ago