Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.

vLLM
vLLM is a high-throughput and memory-efficient inference and serving engine for large language models (LLMs).
Collective: https://opencollective.com/vllm (host: opensource)
Code: https://github.com/vllm-project/vllm
Add missing ignored_seq_groups in _schedule_chunked_prefill
github.com/vllm-project/vllm - JamesLim-sy opened this pull request 5 months ago
[Core][Distributed] add coordinator to reduce code duplication in tp and pp
github.com/vllm-project/vllm - youkaichao opened this pull request 5 months ago
[Hardware] Initial TPU integration
github.com/vllm-project/vllm - WoosukKwon opened this pull request 5 months ago
[Misc] Skip for logits_scale == 1.0
github.com/vllm-project/vllm - WoosukKwon opened this pull request 5 months ago
[Usage]: the docker image v0.4.3 cannot work
github.com/vllm-project/vllm - BUJIDAOVS opened this issue 5 months ago
[Misc] Missing error message for custom ops import
github.com/vllm-project/vllm - DamonFool opened this pull request 5 months ago
[Bug]: Regression in predictions in v0.4.3
github.com/vllm-project/vllm - hibukipanim opened this issue 5 months ago
[Model] Dynamic image size support for LLaVA-NeXT
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 5 months ago
[Frontend] OpenAI API server: Add `add_special_tokens` to ChatCompletionRequest (default False)
github.com/vllm-project/vllm - tomeras91 opened this pull request 5 months ago
[Core] Dynamic image size support for VLMs
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 5 months ago
[Kernel] Update Cutlass int8 kernel configs for SM80
github.com/vllm-project/vllm - varun-sundar-rabindranath opened this pull request 5 months ago
[Bug]: high gpu_memory_utilization with 'OOM' and low gpu_memory_utilization with 'No available memory for the cache blocks'
github.com/vllm-project/vllm - mars-ch opened this issue 5 months ago
[Bug]: chatglm3 with lora adapter
github.com/vllm-project/vllm - Qingyuncookie opened this issue 5 months ago
[Bug]: When I call the speculative model through the vllm interface, an error is reported: TypeError: 'type' object is not subscriptable
github.com/vllm-project/vllm - YuCheng-Qi opened this issue 5 months ago
[Misc] Fix docstring of get_attn_backend
github.com/vllm-project/vllm - WoosukKwon opened this pull request 5 months ago
[Usage]: How to load a model with less CPU memory
github.com/vllm-project/vllm - liulfy opened this issue 5 months ago
vllm inference with THUDM/chatglm3-6b-128k fails to stop
github.com/vllm-project/vllm - linzm1007 opened this issue 5 months ago
[Bug]: Pending but Avg generation throughput: 0.0 tokens/s
github.com/vllm-project/vllm - hitsz-zxw opened this issue 5 months ago
[Usage]:how to get the output embedding for a text generation model using vllm
github.com/vllm-project/vllm - Apricot1225 opened this issue 5 months ago
[Bugfix] Destroy PP groups properly
github.com/vllm-project/vllm - andoorve opened this pull request 5 months ago
[Bug]: prompt_logprobs doesn't work with openai compatible server
github.com/vllm-project/vllm - Some-random opened this issue 5 months ago
[misc] benchmark_serving.py -- add ITL results and tweak TPOT results
github.com/vllm-project/vllm - tlrmchlsmth opened this pull request 5 months ago
[Kernel] Allow 8-bit outputs for cutlass_scaled_mm
github.com/vllm-project/vllm - tlrmchlsmth opened this pull request 5 months ago
[Misc] Add CustomOp interface for device portability
github.com/vllm-project/vllm - WoosukKwon opened this pull request 5 months ago
[Bugfix] Fix `MultiprocessingGPUExecutor.check_health` when world_size == 1
github.com/vllm-project/vllm - jsato8094 opened this pull request 5 months ago
[CI/Build] Add `is_quant_method_supported` to control quantization test configurations
github.com/vllm-project/vllm - mgoin opened this pull request 5 months ago
[Speculative Decoding] Add `ProposerWorkerBase` abstract class
github.com/vllm-project/vllm - njhill opened this pull request 5 months ago
[Misc]: vllm ONLY allocate KVCache on the first device in CUDA_VISIBLE_DEVICES
github.com/vllm-project/vllm - CatYing opened this issue 5 months ago
how to compile with GLIBCXX_USE_CXX11_ABI=1
github.com/vllm-project/vllm - demonatic opened this issue 5 months ago
[BugFix]Fix the problem that StopChecker assumes a single token produ…
github.com/vllm-project/vllm - IcyFeather233 opened this pull request 5 months ago
[Kernel] Add back batch size 1536 and 3072 to MoE tuning
github.com/vllm-project/vllm - WoosukKwon opened this pull request 5 months ago
[CI/Build] Reducing CPU CI execution time
github.com/vllm-project/vllm - bigPYJ1151 opened this pull request 5 months ago
[Bug]: Tokenizer setter of LLM without CachedTokenizer adapter
github.com/vllm-project/vllm - DriverSong opened this issue 5 months ago
[Performance]: Speculative Performance almost same or lower
github.com/vllm-project/vllm - tolry418 opened this issue 5 months ago
[Kernel] Re-tune Mixtral MoE configurations for FP8 on H100
github.com/vllm-project/vllm - pcmoritz opened this pull request 5 months ago
[Frontend] Add OpenAI Vision API Support
github.com/vllm-project/vllm - ywang96 opened this pull request 5 months ago
[Bug]: LLM.generate() collapse with some padding side
github.com/vllm-project/vllm - kevin3314 opened this issue 5 months ago
[Bugfix] Add warmup for prefix caching example
github.com/vllm-project/vllm - zhuohan123 opened this pull request 5 months ago
[Feature]: Add efficient interface for evaluating probabilities of fixed prompt-completion pairs
github.com/vllm-project/vllm - xinyangz opened this issue 5 months ago
Bugfix: fix broken of download models from modelscope
github.com/vllm-project/vllm - liuyhwangyh opened this pull request 5 months ago
[Feature]: vllm-flash-attn cu118 compatibility
github.com/vllm-project/vllm - epark001 opened this issue 5 months ago
[Model] Correct Mixtral FP8 checkpoint loading
github.com/vllm-project/vllm - comaniac opened this pull request 5 months ago
[Core][Doc] Default to multiprocessing for single-node distributed case
github.com/vllm-project/vllm - njhill opened this pull request 5 months ago
[Bugfix] Fix torch.compile() error when using MultiprocessingGPUExecutor
github.com/vllm-project/vllm - zifeitong opened this pull request 5 months ago
[Feature]: Custom attention masks
github.com/vllm-project/vllm - ojus1 opened this issue 5 months ago
[Usage]: How to start inference serving through `LLM` object
github.com/vllm-project/vllm - Jiayi-Pan opened this issue 5 months ago
[Bugfix] Fix prompt_logprobs when SamplingParams.detokenize is set to False
github.com/vllm-project/vllm - zifeitong opened this pull request 5 months ago
[Misc] Adding Speculative decoding to Throughput Benchmarking script
github.com/vllm-project/vllm - abhibambhaniya opened this pull request 5 months ago
[Usage]: RuntimeError: CUDA error: uncorrectable ECC error encountered
github.com/vllm-project/vllm - DJCoolDev opened this issue 5 months ago
[Doc]: Update the vllm distributed Inference and Serving with the new MultiprocessingGPUExecutor
github.com/vllm-project/vllm - rcarrata opened this issue 5 months ago
[Bug]: Mixtral-8x22 request cancelled by cancel scope when client sends multiple concurrent requests
github.com/vllm-project/vllm - markovalexander opened this issue 5 months ago
[Bug]: Mistral 7B crashes on NVidia Tesla P100 with a CUDA Error
github.com/vllm-project/vllm - oe3gwu opened this issue 5 months ago
Support W4A8 quantization for vllm
github.com/vllm-project/vllm - HandH1998 opened this pull request 5 months ago
[Bugfix] Support `prompt_logprobs==0`
github.com/vllm-project/vllm - toslunar opened this pull request 5 months ago
[CI/Build] Add inputs tests
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 5 months ago
[Core] Registry for processing model inputs
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 5 months ago
[Bug]: prompt_logprobs=0 raises AssertionError
github.com/vllm-project/vllm - toslunar opened this issue 5 months ago
[Installation]: Failed to build punica
github.com/vllm-project/vllm - asinglestep opened this issue 5 months ago
[Usage]: how to terminal a vllm model and free or release gpu memory
github.com/vllm-project/vllm - wellcasa opened this issue 5 months ago
[Bugfix]: During testing, use pytest monkeypatch for safely overriding the env var that indicates the vLLM backend
github.com/vllm-project/vllm - afeldman-nm opened this pull request 5 months ago
[Feature]: Support for Mirostat, Dynamic Temperature, and Quadratic Sampling
github.com/vllm-project/vllm - Emmie411 opened this issue 5 months ago
[Bug]: VLLM_ATTENTION_BACKEND set to ROCM_FLASH only in GHA environment, overriding automatic backend selection; this breaks other kernel unit tests.
github.com/vllm-project/vllm - afeldman-nm opened this issue 5 months ago
[BugFix] Apply get_cached_tokenizer to the tokenizer setter of LLM
github.com/vllm-project/vllm - DriverSong opened this pull request 5 months ago
[Feature]: Option to override HuggingFace's configurations
github.com/vllm-project/vllm - DarkLight1337 opened this issue 5 months ago
[Feature]: inconsistent vocab_sizes support for draft and target workers while using Speculative Decoding
github.com/vllm-project/vllm - ShangmingCai opened this issue 5 months ago
[Feature]: Speculative edits
github.com/vllm-project/vllm - Muhtasham opened this issue 5 months ago
[Bug]: Issues with Applying LoRA in vllm on a T4 GPU
github.com/vllm-project/vllm - rikitomo opened this issue 5 months ago
[Bug]: Issues with Applying LoRA in vllm on a T4 GPU
github.com/vllm-project/vllm - rikioka-tomokazu opened this issue 5 months ago
[Frontend] Customizable RoPE theta
github.com/vllm-project/vllm - sasha0552 opened this pull request 5 months ago
[Usage]: how to use the gpu_cache_usage_perc as a custom metric in k8s HPA?
github.com/vllm-project/vllm - chakpongchung opened this issue 5 months ago
[Misc] Improve error message when LoRA parsing fails
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 5 months ago
[Usage]: How can I deploy llama3-70b on a server with 8 3090 GPUs with lora and CUDA graph.
github.com/vllm-project/vllm - AlphaINF opened this issue 5 months ago
[Core] Support loading GGUF model
github.com/vllm-project/vllm - Isotr0py opened this pull request 5 months ago
[Bug]: loading squeezellm model
github.com/vllm-project/vllm - yuhuixu1993 opened this issue 5 months ago
[Core][Prefix Caching] Fix hashing logic for non-full blocks
github.com/vllm-project/vllm - zhuohan123 opened this pull request 5 months ago
[Bugfix] [Frontend] vLLM api_server.py when using with prompt_token_ids causes error.
github.com/vllm-project/vllm - TikZSZ opened this pull request 5 months ago
[Bug]: vLLM api_server.py when using with prompt_token_ids causes error.
github.com/vllm-project/vllm - TikZSZ opened this issue 5 months ago
[Feature]: MoE kernels (Mixtral-8x22B-Instruct-v0.1) are not yet supported on CPU only ?
github.com/vllm-project/vllm - xxll88 opened this issue 5 months ago
[BugFix] Prevent `LLM.encode` for non-generation Models
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 5 months ago
[Kernel] Switch fp8 layers to use the CUTLASS kernels
github.com/vllm-project/vllm - tlrmchlsmth opened this pull request 5 months ago
[Bug]: Offline Inference with the OpenAI Batch file format yields unnecessary `asyncio.exceptions.CancelledError`
github.com/vllm-project/vllm - jlcmoore opened this issue 5 months ago
[Bug]: The Offline Inference Embedding Example Fails
github.com/vllm-project/vllm - cuizhuyefei opened this issue 5 months ago
[Bugfix]: Fix issues related to prefix caching example (#5177)
github.com/vllm-project/vllm - Delviet opened this pull request 5 months ago
[Feature]: BERT models for embeddings
github.com/vllm-project/vllm - mevince opened this issue 5 months ago
[Model] LoRA support added for command-r
github.com/vllm-project/vllm - sergey-tinkoff opened this pull request 5 months ago
[Bug]: Incorrect Example for the Inference with Prefix
github.com/vllm-project/vllm - Delviet opened this issue 5 months ago
[Usage]: Prefix caching in VLLM
github.com/vllm-project/vllm - Abhinay2323 opened this issue 5 months ago
[Bugfix] Remove deprecated @abstractproperty
github.com/vllm-project/vllm - zhuohan123 opened this pull request 5 months ago
Bugfix: CUDA out of memory leads to 'AsyncEngineDeadError: Background loop has errored already'
github.com/vllm-project/vllm - charent opened this pull request 5 months ago
Adding fp8 gemm computation
github.com/vllm-project/vllm - charlifu opened this pull request 5 months ago
[Doc] Add checkmark for GPTBigCodeForCausalLM LoRA support
github.com/vllm-project/vllm - njhill opened this pull request 5 months ago
[Bug]: Model Launch Hangs with 16+ Ranks in vLLM
github.com/vllm-project/vllm - wushidonguc opened this issue 5 months ago