Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
vLLM
vLLM is a high-throughput and memory-efficient inference and serving engine for large language models (LLMs).
Collective: https://opencollective.com/vllm
Host: opensource
Code: https://github.com/vllm-project/vllm
[Bug]: resource_tracker unregister error with 2*3090
github.com/vllm-project/vllm - xuhao916 opened this issue 4 months ago
[Doc] Update debug docs
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 4 months ago
[Doc] Update LLaVA docs
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 4 months ago
[Bug]: get the degree of the `outlines FSM` compilation progress from vllm 0.5.0 engine (via a route)
github.com/vllm-project/vllm - syGOAT opened this issue 4 months ago
`compressed-tensors` marlin 24 support
github.com/vllm-project/vllm - dsikka opened this pull request 4 months ago
[Feature]: PagedAttention for CPU-memory constrained environments?
github.com/vllm-project/vllm - peeteeman opened this issue 4 months ago
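For context on the request above: PagedAttention stores the KV cache in fixed-size physical blocks indexed by a per-sequence block table, so memory is allocated on demand rather than reserved up front. A toy sketch of that bookkeeping (all names hypothetical, not vLLM's actual code):

```python
# Toy sketch of paged KV-cache bookkeeping (hypothetical, not vLLM's code).
class BlockAllocator:
    def __init__(self, num_blocks: int, block_size: int):
        self.block_size = block_size
        self.free = list(range(num_blocks))      # pool of free physical blocks
        self.tables: dict[str, list[int]] = {}   # per-sequence block tables

    def append_token(self, seq_id: str, pos: int) -> int:
        """Return the physical block holding token `pos`, allocating on demand."""
        table = self.tables.setdefault(seq_id, [])
        if pos // self.block_size >= len(table):  # a new logical block is needed
            table.append(self.free.pop())
        return table[pos // self.block_size]

alloc = BlockAllocator(num_blocks=8, block_size=4)
blocks = [alloc.append_token("seq0", i) for i in range(6)]  # 6 tokens span 2 blocks
```

The point of the feature request is that the same indirection could, in principle, let blocks live in CPU memory when GPU memory is constrained.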
[Feature]: Add guided-* Parameters to Sampling Parameters
github.com/vllm-project/vllm - zhanghx0905 opened this issue 4 months ago
[ Misc ] Rs/compressed tensors cleanup
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 4 months ago
[Feature]: Support [RecurrentGemmaForCausalLM]
github.com/vllm-project/vllm - sung-ho-moon opened this issue 4 months ago
[Bugfix] fix lora_dtype value type in arg_utils.py - part 2
github.com/vllm-project/vllm - c3-ali opened this pull request 4 months ago
[Docs] [Spec decode] Fix docs error in code example
github.com/vllm-project/vllm - cadedaniel opened this pull request 4 months ago
[Feature]: ci test with vGPU
github.com/vllm-project/vllm - youkaichao opened this issue 4 months ago
[Frontend] Add "input speed" to tqdm postfix alongside output speed
github.com/vllm-project/vllm - mgoin opened this pull request 4 months ago
[Bug]: CUDA out of memory when setting prompt_logprobs with larger batch_size
github.com/vllm-project/vllm - qaz-wsx-1 opened this issue 4 months ago
[RFC]: Improve guided decoding (logit_processor) APIs and performance.
github.com/vllm-project/vllm - rkooo567 opened this issue 4 months ago
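For readers unfamiliar with the mechanism behind this RFC: guided decoding is typically implemented as a logits processor that masks out tokens the grammar or FSM disallows at each step, leaving only valid continuations sampleable. A minimal pure-Python sketch (hypothetical names, not vLLM's API):

```python
import math

# Hypothetical sketch of a guided-decoding logits processor:
# tokens outside the allowed set get -inf so they can never be sampled.
def mask_logits(logits: list[float], allowed: set[int]) -> list[float]:
    return [x if i in allowed else -math.inf for i, x in enumerate(logits)]

masked = mask_logits([0.1, 2.0, 0.5, 1.5], allowed={0, 2})
best = max(range(len(masked)), key=masked.__getitem__)  # greedy pick among allowed tokens
```

The RFC's performance concern is that computing the allowed set (e.g. advancing an FSM) on every step can dominate decode latency if done naively.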
[Hardware][AMD][CI/Build][Doc] Upgrade to ROCm 6.1, Dockerfile improvements, test fixes
github.com/vllm-project/vllm - mawong-amd opened this pull request 4 months ago
[Bug]: Automatic Prefix caching not working while hitting same request multiple times
github.com/vllm-project/vllm - Abhinay2323 opened this issue 4 months ago
[Bug]: vllm deployment of GLM-4V reports KeyError: 'transformer.vision.transformer.layers.45.mlp.fc2.weight'
github.com/vllm-project/vllm - zhaobu opened this issue 4 months ago
[Bug]: Small context lengths consume more memory than large context lengths
github.com/vllm-project/vllm - majestichou opened this issue 4 months ago
[Usage]: How do you specify a specific branch on huggingface to use when downloading a model?
github.com/vllm-project/vllm - fake-name opened this issue 4 months ago
[Speculative Decoding] Support draft model on different tensor-parallel size than target model
github.com/vllm-project/vllm - wooyeonlee0 opened this pull request 4 months ago
[Bugfix] We have fixed the bug that occurred when using FlashInfer as the backend in vLLM Speculative Decoding.
github.com/vllm-project/vllm - bong-furiosa opened this pull request 4 months ago
[Bugfix]Fix evict v2 with long context length
github.com/vllm-project/vllm - puf147 opened this pull request 4 months ago
[Doc] add debugging tips
github.com/vllm-project/vllm - youkaichao opened this pull request 4 months ago
[Core] Refactor Worker and ModelRunner to consolidate control plane communication
github.com/vllm-project/vllm - stephanie-wang opened this pull request 4 months ago
[Performance]: Qwen2-72B-Instruction-GPTQ-Int4 Openai Server Request Problem
github.com/vllm-project/vllm - syngokhan opened this issue 4 months ago
hidden-states from final (or middle layers)
github.com/vllm-project/vllm - janphilippfranken opened this issue 4 months ago
[Bug]: The vllm service takes two hours to start because of NCCL
github.com/vllm-project/vllm - zhaotyer opened this issue 4 months ago
[Bug]: topk=1 and temperature=0 cause different output in vllm
github.com/vllm-project/vllm - rangehow opened this issue 4 months ago
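Background for the report above: top-k=1 and temperature→0 should both reduce to greedy argmax over the logits, so divergent outputs usually point at floating-point tie-breaking or kernel nondeterminism rather than the sampling settings themselves. A pure-Python illustration of the expected equivalence:

```python
import math

def top1(logits):
    """top-k=1: pick the single highest-logit token."""
    return max(range(len(logits)), key=logits.__getitem__)

def low_temperature_argmax(logits, t=1e-6):
    """temperature -> 0: softmax mass concentrates entirely on the max logit."""
    m = max(logits)  # subtract max for numerical stability
    probs = [math.exp((x - m) / t) for x in logits]
    return max(range(len(probs)), key=probs.__getitem__)

logits = [1.2, 3.4, 0.5]
assert top1(logits) == low_temperature_argmax(logits) == 1
```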
[Doc][Typo] Fixing Missing Comma
github.com/vllm-project/vllm - ywang96 opened this pull request 4 months ago
[Bugfix] Add device assertion to TorchSDPA
github.com/vllm-project/vllm - bigPYJ1151 opened this pull request 4 months ago
[Kernel] Suppress mma.sp warning on CUDA 12.5 and later
github.com/vllm-project/vllm - tlrmchlsmth opened this pull request 4 months ago
[Speculative decoding] Initial spec decode docs
github.com/vllm-project/vllm - cadedaniel opened this pull request 4 months ago
[Core][Distributed] add shm broadcast
github.com/vllm-project/vllm - youkaichao opened this pull request 4 months ago
[Bugfix] fix lora_dtype value type in arg_utils.py
github.com/vllm-project/vllm - c3-ali opened this pull request 4 months ago
[Bug]: EngineArgs missing value type for `lora_dtype`
github.com/vllm-project/vllm - c3-ali opened this issue 4 months ago
[Kernel] Vectorized FP8 quantize kernel
github.com/vllm-project/vllm - comaniac opened this pull request 4 months ago
[Bug]: Llama3 output limited to around 10 tokens
github.com/vllm-project/vllm - arifsaeed opened this issue 4 months ago
[ci] Fix Buildkite agent path
github.com/vllm-project/vllm - khluu opened this pull request 4 months ago
[Kernel] Factor out epilogues from cutlass kernels
github.com/vllm-project/vllm - tlrmchlsmth opened this pull request 4 months ago
[Kernel] Adding fused bias add to cutlass_scaled_mm_dq kernel
github.com/vllm-project/vllm - cyang49 opened this pull request 4 months ago
[Misc] Remove VLLM_BUILD_WITH_NEURON env variable
github.com/vllm-project/vllm - WoosukKwon opened this pull request 4 months ago
[Doc] Add documentation for FP8 W8A8
github.com/vllm-project/vllm - mgoin opened this pull request 4 months ago
[Kernel] `w4a16` support for `compressed-tensors`
github.com/vllm-project/vllm - dsikka opened this pull request 4 months ago
Bump version to v0.5.0
github.com/vllm-project/vllm - simon-mo opened this pull request 4 months ago
[Docs] Add Docs on Limitations of VLM Support
github.com/vllm-project/vllm - ywang96 opened this pull request 4 months ago
[CI] Upgrade codespell version.
github.com/vllm-project/vllm - rkooo567 opened this pull request 4 months ago
[Hardware][Intel] OpenVINO vLLM backend
github.com/vllm-project/vllm - ilya-lavrenov opened this pull request 4 months ago
[RFC]: OpenVINO vLLM backend
github.com/vllm-project/vllm - ilya-lavrenov opened this issue 4 months ago
0.4.3 error CUDA error: an illegal memory access was encountered
github.com/vllm-project/vllm - maxin9966 opened this issue 4 months ago
[Core][Distributed] add same-node detection
github.com/vllm-project/vllm - youkaichao opened this pull request 4 months ago
[Misc] Various simplifications and typing fixes
github.com/vllm-project/vllm - njhill opened this pull request 4 months ago
[WIP][Core] Support tensor parallel division with remainder of attention heads
github.com/vllm-project/vllm - NadavShmayo opened this pull request 4 months ago
[Bug]: Docker image starts vllm.entrypoints.openai.api_server , Docker opens port 8000 but vllm isn't listening on 8000
github.com/vllm-project/vllm - elabz opened this issue 4 months ago
[Bug]: load nvidia/Llama3-ChatQA-1.5-8B model 15 min
github.com/vllm-project/vllm - JJplane opened this issue 4 months ago
[CI/Build] Add nightly benchmarking for tgi, tensorrt-llm and lmdeploy
github.com/vllm-project/vllm - KuntaiDu opened this pull request 4 months ago
[Bug]: Multi GPU setup for VLLM in Openshift still does not work
github.com/vllm-project/vllm - jayteaftw opened this issue 4 months ago
[Model] Add GLM-4v support
github.com/vllm-project/vllm - songxxzp opened this pull request 4 months ago
[CI/Test] improve robustness of test by replacing del with context manager (vllm_runner)
github.com/vllm-project/vllm - youkaichao opened this pull request 4 months ago
[Kernel][RFC] Initial commit containing new Triton kernels for multi lora serving.
github.com/vllm-project/vllm - FurtherAI opened this pull request 4 months ago
[Bugfix] Take the VRAM usage of prompt_logprobs into account
github.com/vllm-project/vllm - Conless opened this pull request 4 months ago
[Core][Distributed] merge two broadcast_tensor_dict
github.com/vllm-project/vllm - youkaichao opened this pull request 4 months ago
[Misc][Breaking] Change FP8 checkpoint format from act_scale -> input_scale
github.com/vllm-project/vllm - mgoin opened this pull request 4 months ago
[Bug Fix] Fix the support check for FP8 CUTLASS
github.com/vllm-project/vllm - cli99 opened this pull request 4 months ago
[Bug]: TorchSDPAMetadata is out of date
github.com/vllm-project/vllm - Reichenbachian opened this issue 4 months ago
[Misc] Update to comply with the new `compressed-tensors` config
github.com/vllm-project/vllm - dsikka opened this pull request 4 months ago
[Bugfix][Core] fix broken state for recompute
github.com/vllm-project/vllm - youkaichao opened this pull request 4 months ago
[Speculative Decoding 2/2 ] Integrate typical acceptance sampler into Spec Decode Worker
github.com/vllm-project/vllm - sroy745 opened this pull request 4 months ago
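For context on the PR above: speculative decoding has a small draft model propose several tokens, which the target model then verifies in one pass, accepting the matching prefix and substituting its own token at the first disagreement. A toy sketch of greedy verification (a simplification; the typical acceptance sampler in the PR is probabilistic, not this exact rule):

```python
def accept_prefix(draft: list[int], target: list[int]) -> list[int]:
    """Greedy-verification sketch: accept draft tokens while they match the
    target model's choices, then take the target's first disagreement."""
    out = []
    for d, t in zip(draft, target):
        if d == t:
            out.append(d)          # draft token confirmed by the target model
        else:
            out.append(t)          # correction token from the target model
            break
    return out

assert accept_prefix([5, 7, 9], [5, 7, 2]) == [5, 7, 2]  # 2 accepted + 1 corrected
```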
[CI/Test] improve robustness of test by replacing del with context manager (hf_runner)
github.com/vllm-project/vllm - youkaichao opened this pull request 4 months ago
[RFC]: Refactor MoE
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this issue 4 months ago
[Misc] Remove unused cuda_utils.h in CPU backend
github.com/vllm-project/vllm - DamonFool opened this pull request 4 months ago
[Bug]: with `--enable-prefix-caching` , `/completions` crashes server with `echo=True` above certain prompt length
github.com/vllm-project/vllm - hibukipanim opened this issue 4 months ago
[Bug]: Qwen2 MoE: AttributeError: 'MergedColumnParallelLinear' object has no attribute 'weight'. Did you mean: 'qweight'?
github.com/vllm-project/vllm - geekwish opened this issue 4 months ago
fix DbrxFusedNormAttention missing cache_config
github.com/vllm-project/vllm - Calvinnncy97 opened this pull request 4 months ago
[Performance]: [Automatic Prefix Caching] When hitting the KV cached blocks, the first execute is slow, and then is fast.
github.com/vllm-project/vllm - soacker opened this issue 4 months ago
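Context for the report above: automatic prefix caching keys KV blocks by the token prefix, so the first request must still compute and populate the cache; only subsequent requests with the same prefix skip that work. A toy sketch of the lookup (hypothetical, not vLLM's implementation):

```python
# Toy prefix cache keyed by the token-id prefix (hypothetical sketch).
class PrefixCache:
    def __init__(self):
        self.store: dict[tuple, str] = {}
        self.hits = 0

    def get_or_compute(self, tokens: list[int]) -> str:
        key = tuple(tokens)
        if key in self.store:            # warm path: reuse cached KV blocks
            self.hits += 1
        else:                            # cold path: compute prefill, then cache
            self.store[key] = f"kv-for-{len(tokens)}-tokens"
        return self.store[key]

cache = PrefixCache()
cache.get_or_compute([1, 2, 3])  # first request: miss (the slow execution)
cache.get_or_compute([1, 2, 3])  # repeated request: hit (fast)
```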
[Usage]: How to quiet the terminal 'Info' outputs in vllm
github.com/vllm-project/vllm - rohitnanda1443 opened this issue 4 months ago
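One common answer to the usage question above (standard Python logging, not a vLLM-specific flag) is to raise the level of the library's logger; "vllm" is assumed here to be the logger name, so check your version if it differs:

```python
import logging

# Suppress INFO-level chatter from the library's logger by raising its
# threshold to WARNING. "vllm" is the assumed logger name.
logging.getLogger("vllm").setLevel(logging.WARNING)

# INFO records are now filtered out for this logger and its children.
assert not logging.getLogger("vllm").isEnabledFor(logging.INFO)
```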
[Bug]: non-deterministic Python gc order leads to flaky tests
github.com/vllm-project/vllm - youkaichao opened this issue 4 months ago
[Bug]: Getting an empty string ('') for every call on fine-tuned Code-Llama-7b-hf model
github.com/vllm-project/vllm - arthbohra opened this issue 4 months ago
[Misc] Add args for selecting distributed executor to benchmarks
github.com/vllm-project/vllm - BKitor opened this pull request 4 months ago
[Bug]: Unexpected prompt token logprob behaviors of llama 2 when setting echo=True for openai-api server
github.com/vllm-project/vllm - fywalter opened this issue 4 months ago
[Misc][Utils] allow get_open_port to be called for multiple times
github.com/vllm-project/vllm - youkaichao opened this pull request 4 months ago
[Core] Fix sharing of stateful logits processors
github.com/vllm-project/vllm - maxdebayser opened this pull request 4 months ago
[MISC] Upgrade dependency to PyTorch 2.3.1
github.com/vllm-project/vllm - comaniac opened this pull request 4 months ago
[Doc] Add an automatic prefix caching section in vllm documentation
github.com/vllm-project/vllm - KuntaiDu opened this pull request 4 months ago
[AMD][ROCm][CI] unit tests fixes or skip
github.com/vllm-project/vllm - hongxiayang opened this pull request 4 months ago
[Usage]: Streaming Response from vLLM 0.4.2 -> 0.4.3
github.com/vllm-project/vllm - BiboyQG opened this issue 4 months ago
[Feature][Frontend]: Continued `stream_options` implementation also in CompletionRequest
github.com/vllm-project/vllm - Etelis opened this pull request 4 months ago
[New Model]: mistralai/Codestral-22B-v0.1
github.com/vllm-project/vllm - eduardozamudio opened this issue 4 months ago
[Installation]: Compiling VLLM for cpu only.
github.com/vllm-project/vllm - Zibri opened this issue 4 months ago
[Performance]: gptq and awq quantization do not improve the performance
github.com/vllm-project/vllm - aaronlyt opened this issue 4 months ago
[Bugfix] OpenAI entrypoint limits logprobs while ignoring server defined --max-logprobs
github.com/vllm-project/vllm - maor-ps opened this pull request 4 months ago
[Bugfix] If the content starts with ":" (response of ping), client should i…
github.com/vllm-project/vllm - sywangyi opened this pull request 5 months ago
[Installation]: Building editable for vllm fails (pip install -e .)
github.com/vllm-project/vllm - felixzhu555 opened this issue 5 months ago
[Bug]: Cannot request more than 5 logprobs
github.com/vllm-project/vllm - coder109 opened this issue 5 months ago