Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
vLLM
vLLM is a high-throughput and memory-efficient inference and serving engine for large language models (LLMs).
Collective: https://opencollective.com/vllm
Host: opensource
Code: https://github.com/vllm-project/vllm
[Bug]: resource_tracker unregister error with 2*3090
github.com/vllm-project/vllm - xuhao916 opened this issue 4 months ago
[Doc] Update debug docs
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 4 months ago
[Doc] Update LLaVA docs
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 4 months ago
[Bug]: get the degree of the `outlines FSM` compilation progress from vllm 0.5.0 engine (via a route)
github.com/vllm-project/vllm - syGOAT opened this issue 4 months ago
`compressed-tensors` marlin 24 support
github.com/vllm-project/vllm - dsikka opened this pull request 4 months ago
[Feature]: PagedAttention for CPU-memory constrained environments?
github.com/vllm-project/vllm - peeteeman opened this issue 4 months ago
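For context on the request above: PagedAttention stores the KV cache in fixed-size physical blocks indexed by a per-sequence block table, so memory is allocated on demand rather than reserved up front. A toy sketch of that bookkeeping (all names hypothetical, not vLLM's actual code):

```python
# Toy sketch of paged KV-cache bookkeeping (hypothetical, not vLLM's code).
class BlockAllocator:
    def __init__(self, num_blocks: int, block_size: int):
        self.block_size = block_size
        self.free = list(range(num_blocks))      # pool of free physical blocks
        self.tables: dict[str, list[int]] = {}   # per-sequence block tables

    def append_token(self, seq_id: str, pos: int) -> int:
        """Return the physical block holding token `pos`, allocating on demand."""
        table = self.tables.setdefault(seq_id, [])
        if pos // self.block_size >= len(table):  # a new logical block is needed
            table.append(self.free.pop())
        return table[pos // self.block_size]

alloc = BlockAllocator(num_blocks=8, block_size=4)
blocks = [alloc.append_token("seq0", i) for i in range(6)]  # 6 tokens span 2 blocks
```

The point of the feature request is that the same indirection could, in principle, let blocks live in CPU memory when GPU memory is constrained.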
[Feature]: Add guided-* Parameters to Sampling Parameters
github.com/vllm-project/vllm - zhanghx0905 opened this issue 4 months ago
[ Misc ] Rs/compressed tensors cleanup
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 4 months ago
[Feature]: Support [RecurrentGemmaForCausalLM]
github.com/vllm-project/vllm - sung-ho-moon opened this issue 4 months ago
[Bugfix] fix lora_dtype value type in arg_utils.py - part 2
github.com/vllm-project/vllm - c3-ali opened this pull request 4 months ago
[Docs] [Spec decode] Fix docs error in code example
github.com/vllm-project/vllm - cadedaniel opened this pull request 4 months ago
[Feature]: ci test with vGPU
github.com/vllm-project/vllm - youkaichao opened this issue 4 months ago
[Frontend] Add "input speed" to tqdm postfix alongside output speed
github.com/vllm-project/vllm - mgoin opened this pull request 4 months ago
[Bug]: CUDA out of memory when setting prompt_logprobs with larger batch_size
github.com/vllm-project/vllm - qaz-wsx-1 opened this issue 4 months ago
[RFC]: Improve guided decoding (logit_processor) APIs and performance.
github.com/vllm-project/vllm - rkooo567 opened this issue 4 months ago
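For readers unfamiliar with the mechanism behind this RFC: guided decoding is typically implemented as a logits processor that masks out tokens the grammar or FSM disallows at each step, leaving only valid continuations sampleable. A minimal pure-Python sketch (hypothetical names, not vLLM's API):

```python
import math

# Hypothetical sketch of a guided-decoding logits processor:
# tokens outside the allowed set get -inf so they can never be sampled.
def mask_logits(logits: list[float], allowed: set[int]) -> list[float]:
    return [x if i in allowed else -math.inf for i, x in enumerate(logits)]

masked = mask_logits([0.1, 2.0, 0.5, 1.5], allowed={0, 2})
best = max(range(len(masked)), key=masked.__getitem__)  # greedy pick among allowed tokens
```

The RFC's performance concern is that computing the allowed set (e.g. advancing an FSM) on every step can dominate decode latency if done naively.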
[Hardware][AMD][CI/Build][Doc] Upgrade to ROCm 6.1, Dockerfile improvements, test fixes
github.com/vllm-project/vllm - mawong-amd opened this pull request 4 months ago
[Bug]: Automatic Prefix caching not working while hitting same request multiple times
github.com/vllm-project/vllm - Abhinay2323 opened this issue 4 months ago
[Bug]: vllm deployment of GLM-4V reports KeyError: 'transformer.vision.transformer.layers.45.mlp.fc2.weight'
github.com/vllm-project/vllm - zhaobu opened this issue 4 months ago
[Bug]: Small context lengths consume more memory than large context lengths
github.com/vllm-project/vllm - majestichou opened this issue 4 months ago
[Usage]: How do you specify a specific branch on huggingface to use when downloading a model?
github.com/vllm-project/vllm - fake-name opened this issue 4 months ago
[Speculative Decoding] Support draft model on different tensor-parallel size than target model
github.com/vllm-project/vllm - wooyeonlee0 opened this pull request 4 months ago
[Bugfix] We have fixed the bug that occurred when using FlashInfer as the backend in vLLM Speculative Decoding.
github.com/vllm-project/vllm - bong-furiosa opened this pull request 4 months ago
[Bugfix]Fix evict v2 with long context length
github.com/vllm-project/vllm - puf147 opened this pull request 4 months ago
[Doc] add debugging tips
github.com/vllm-project/vllm - youkaichao opened this pull request 4 months ago
[Core] Refactor Worker and ModelRunner to consolidate control plane communication
github.com/vllm-project/vllm - stephanie-wang opened this pull request 4 months ago
[Performance]: Qwen2-72B-Instruction-GPTQ-Int4 Openai Server Request Problem
github.com/vllm-project/vllm - syngokhan opened this issue 4 months ago
hidden-states from final (or middle layers)
github.com/vllm-project/vllm - janphilippfranken opened this issue 4 months ago
[Bug]: The vllm service takes two hours to start because of NCCL
github.com/vllm-project/vllm - zhaotyer opened this issue 4 months ago
[Bug]: topk=1 and temperature=0 cause different output in vllm
github.com/vllm-project/vllm - rangehow opened this issue 4 months ago
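Background for the report above: top-k=1 and temperature→0 should both reduce to greedy argmax over the logits, so divergent outputs usually point at floating-point tie-breaking or kernel nondeterminism rather than the sampling settings themselves. A pure-Python illustration of the expected equivalence:

```python
import math

def top1(logits):
    """top-k=1: pick the single highest-logit token."""
    return max(range(len(logits)), key=logits.__getitem__)

def low_temperature_argmax(logits, t=1e-6):
    """temperature -> 0: softmax mass concentrates entirely on the max logit."""
    m = max(logits)  # subtract max for numerical stability
    probs = [math.exp((x - m) / t) for x in logits]
    return max(range(len(probs)), key=probs.__getitem__)

logits = [1.2, 3.4, 0.5]
assert top1(logits) == low_temperature_argmax(logits) == 1
```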
[Doc][Typo] Fixing Missing Comma
github.com/vllm-project/vllm - ywang96 opened this pull request 4 months ago
[Bugfix] Add device assertion to TorchSDPA
github.com/vllm-project/vllm - bigPYJ1151 opened this pull request 4 months ago
[Kernel] Suppress mma.sp warning on CUDA 12.5 and later
github.com/vllm-project/vllm - tlrmchlsmth opened this pull request 4 months ago
[Speculative decoding] Initial spec decode docs
github.com/vllm-project/vllm - cadedaniel opened this pull request 4 months ago
[Core][Distributed] add shm broadcast
github.com/vllm-project/vllm - youkaichao opened this pull request 4 months ago
[Bugfix] fix lora_dtype value type in arg_utils.py
github.com/vllm-project/vllm - c3-ali opened this pull request 4 months ago
[Bug]: EngineArgs missing value type for `lora_dtype`
github.com/vllm-project/vllm - c3-ali opened this issue 4 months ago
[Kernel] Vectorized FP8 quantize kernel
github.com/vllm-project/vllm - comaniac opened this pull request 4 months ago
[Bug]: Llama3 output limited to around 10 tokens
github.com/vllm-project/vllm - arifsaeed opened this issue 4 months ago
[ci] Fix Buildkite agent path
github.com/vllm-project/vllm - khluu opened this pull request 4 months ago
[Kernel] Factor out epilogues from cutlass kernels
github.com/vllm-project/vllm - tlrmchlsmth opened this pull request 4 months ago
[Kernel] Adding fused bias add to cutlass_scaled_mm_dq kernel
github.com/vllm-project/vllm - cyang49 opened this pull request 4 months ago
[Misc] Remove VLLM_BUILD_WITH_NEURON env variable
github.com/vllm-project/vllm - WoosukKwon opened this pull request 4 months ago
[Doc] Add documentation for FP8 W8A8
github.com/vllm-project/vllm - mgoin opened this pull request 4 months ago
[Kernel] `w4a16` support for `compressed-tensors`
github.com/vllm-project/vllm - dsikka opened this pull request 4 months ago
Bump version to v0.5.0
github.com/vllm-project/vllm - simon-mo opened this pull request 4 months ago
[Docs] Add Docs on Limitations of VLM Support
github.com/vllm-project/vllm - ywang96 opened this pull request 4 months ago
[CI] Upgrade codespell version.
github.com/vllm-project/vllm - rkooo567 opened this pull request 4 months ago
[Hardware][Intel] OpenVINO vLLM backend
github.com/vllm-project/vllm - ilya-lavrenov opened this pull request 4 months ago
[RFC]: OpenVINO vLLM backend
github.com/vllm-project/vllm - ilya-lavrenov opened this issue 4 months ago
0.4.3 error CUDA error: an illegal memory access was encountered
github.com/vllm-project/vllm - maxin9966 opened this issue 4 months ago
[Core][Distributed] add same-node detection
github.com/vllm-project/vllm - youkaichao opened this pull request 4 months ago
[Misc] Various simplifications and typing fixes
github.com/vllm-project/vllm - njhill opened this pull request 4 months ago
[WIP][Core] Support tensor parallel division with remainder of attention heads
github.com/vllm-project/vllm - NadavShmayo opened this pull request 4 months ago
[Bug]: Docker image starts vllm.entrypoints.openai.api_server , Docker opens port 8000 but vllm isn't listening on 8000
github.com/vllm-project/vllm - elabz opened this issue 4 months ago
[Bug]: load nvidia/Llama3-ChatQA-1.5-8B model 15 min
github.com/vllm-project/vllm - JJplane opened this issue 4 months ago
[CI/Build] Add nightly benchmarking for tgi, tensorrt-llm and lmdeploy
github.com/vllm-project/vllm - KuntaiDu opened this pull request 4 months ago
[Bug]: Multi GPU setup for VLLM in Openshift still does not work
github.com/vllm-project/vllm - jayteaftw opened this issue 4 months ago
[Model] Add GLM-4v support
github.com/vllm-project/vllm - songxxzp opened this pull request 4 months ago
[CI/Test] improve robustness of test by replacing del with context manager (vllm_runner)
github.com/vllm-project/vllm - youkaichao opened this pull request 4 months ago
[Kernel][RFC] Initial commit containing new Triton kernels for multi lora serving.
github.com/vllm-project/vllm - FurtherAI opened this pull request 4 months ago
[Bugfix] Take the VRAM usage of prompt_logprobs into account
github.com/vllm-project/vllm - Conless opened this pull request 4 months ago
[Core][Distributed] merge two broadcast_tensor_dict
github.com/vllm-project/vllm - youkaichao opened this pull request 4 months ago
[Misc][Breaking] Change FP8 checkpoint format from act_scale -> input_scale
github.com/vllm-project/vllm - mgoin opened this pull request 4 months ago
[Bug Fix] Fix the support check for FP8 CUTLASS
github.com/vllm-project/vllm - cli99 opened this pull request 4 months ago
[Bug]: TorchSDPAMetadata is out of date
github.com/vllm-project/vllm - Reichenbachian opened this issue 4 months ago
[Misc] Update to comply with the new `compressed-tensors` config
github.com/vllm-project/vllm - dsikka opened this pull request 4 months ago
[Bugfix][Core] fix broken state for recompute
github.com/vllm-project/vllm - youkaichao opened this pull request 4 months ago
[Speculative Decoding 2/2 ] Integrate typical acceptance sampler into Spec Decode Worker
github.com/vllm-project/vllm - sroy745 opened this pull request 4 months ago
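For context on the PR above: speculative decoding has a small draft model propose several tokens, which the target model then verifies in one pass, accepting the matching prefix and substituting its own token at the first disagreement. A toy sketch of greedy verification (a simplification; the typical acceptance sampler in the PR is probabilistic, not this exact rule):

```python
def accept_prefix(draft: list[int], target: list[int]) -> list[int]:
    """Greedy-verification sketch: accept draft tokens while they match the
    target model's choices, then take the target's first disagreement."""
    out = []
    for d, t in zip(draft, target):
        if d == t:
            out.append(d)          # draft token confirmed by the target model
        else:
            out.append(t)          # correction token from the target model
            break
    return out

assert accept_prefix([5, 7, 9], [5, 7, 2]) == [5, 7, 2]  # 2 accepted + 1 corrected
```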
[CI/Test] improve robustness of test by replacing del with context manager (hf_runner)
github.com/vllm-project/vllm - youkaichao opened this pull request 4 months ago
[RFC]: Refactor MoE
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this issue 4 months ago
[Misc] Remove unused cuda_utils.h in CPU backend
github.com/vllm-project/vllm - DamonFool opened this pull request 4 months ago
[Bug]: with `--enable-prefix-caching` , `/completions` crashes server with `echo=True` above certain prompt length
github.com/vllm-project/vllm - hibukipanim opened this issue 4 months ago
[Bug]: Qwen2 MoE: AttributeError: 'MergedColumnParallelLinear' object has no attribute 'weight'. Did you mean: 'qweight'?
github.com/vllm-project/vllm - geekwish opened this issue 4 months ago
fix DbrxFusedNormAttention missing cache_config
github.com/vllm-project/vllm - Calvinnncy97 opened this pull request 4 months ago
[Performance]: [Automatic Prefix Caching] When hitting the KV cached blocks, the first execute is slow, and then is fast.
github.com/vllm-project/vllm - soacker opened this issue 4 months ago
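Context for the report above: automatic prefix caching keys KV blocks by the token prefix, so the first request must still compute and populate the cache; only subsequent requests with the same prefix skip that work. A toy sketch of the lookup (hypothetical, not vLLM's implementation):

```python
# Toy prefix cache keyed by the token-id prefix (hypothetical sketch).
class PrefixCache:
    def __init__(self):
        self.store: dict[tuple, str] = {}
        self.hits = 0

    def get_or_compute(self, tokens: list[int]) -> str:
        key = tuple(tokens)
        if key in self.store:            # warm path: reuse cached KV blocks
            self.hits += 1
        else:                            # cold path: compute prefill, then cache
            self.store[key] = f"kv-for-{len(tokens)}-tokens"
        return self.store[key]

cache = PrefixCache()
cache.get_or_compute([1, 2, 3])  # first request: miss (the slow execution)
cache.get_or_compute([1, 2, 3])  # repeated request: hit (fast)
```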
[Usage]: How to quiet the terminal 'Info' outputs in vllm
github.com/vllm-project/vllm - rohitnanda1443 opened this issue 4 months ago
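One common answer to the usage question above (standard Python logging, not a vLLM-specific flag) is to raise the level of the library's logger; "vllm" is assumed here to be the logger name, so check your version if it differs:

```python
import logging

# Suppress INFO-level chatter from the library's logger by raising its
# threshold to WARNING. "vllm" is the assumed logger name.
logging.getLogger("vllm").setLevel(logging.WARNING)

# INFO records are now filtered out for this logger and its children.
assert not logging.getLogger("vllm").isEnabledFor(logging.INFO)
```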
[Bug]: non-deterministic Python gc order leads to flaky tests
github.com/vllm-project/vllm - youkaichao opened this issue 4 months ago
[Bug]: Getting an empty string ('') for every call on fine-tuned Code-Llama-7b-hf model
github.com/vllm-project/vllm - arthbohra opened this issue 4 months ago
[Misc] Add args for selecting distributed executor to benchmarks
github.com/vllm-project/vllm - BKitor opened this pull request 4 months ago
[Bug]: Unexpected prompt token logprob behaviors of llama 2 when setting echo=True for openai-api server
github.com/vllm-project/vllm - fywalter opened this issue 4 months ago
[Misc][Utils] allow get_open_port to be called for multiple times
github.com/vllm-project/vllm - youkaichao opened this pull request 4 months ago
[Core] Fix sharing of stateful logits processors
github.com/vllm-project/vllm - maxdebayser opened this pull request 4 months ago
[MISC] Upgrade dependency to PyTorch 2.3.1
github.com/vllm-project/vllm - comaniac opened this pull request 4 months ago
[Doc] Add an automatic prefix caching section in vllm documentation
github.com/vllm-project/vllm - KuntaiDu opened this pull request 4 months ago
[AMD][ROCm][CI] unit tests fixes or skip
github.com/vllm-project/vllm - hongxiayang opened this pull request 4 months ago
[Usage]: Streaming Response from vLLM 0.4.2 -> 0.4.3
github.com/vllm-project/vllm - BiboyQG opened this issue 4 months ago
[Feature][Frontend]: Continued `stream_options` implementation also in CompletionRequest
github.com/vllm-project/vllm - Etelis opened this pull request 4 months ago
[New Model]: mistralai/Codestral-22B-v0.1
github.com/vllm-project/vllm - eduardozamudio opened this issue 4 months ago
[Installation]: Compiling VLLM for cpu only.
github.com/vllm-project/vllm - Zibri opened this issue 4 months ago
[Performance]: gptq and awq quantization do not improve the performance
github.com/vllm-project/vllm - aaronlyt opened this issue 4 months ago
[Bugfix] OpenAI entrypoint limits logprobs while ignoring server defined --max-logprobs
github.com/vllm-project/vllm - maor-ps opened this pull request 4 months ago
[Bugfix] If the content starts with ":" (response of ping), client should i…
github.com/vllm-project/vllm - sywangyi opened this pull request 5 months ago
[Installation]: Building editable for vllm fails (pip install -e .)
github.com/vllm-project/vllm - felixzhu555 opened this issue 5 months ago
[Bug]: Cannot request more than 5 logprobs
github.com/vllm-project/vllm - coder109 opened this issue 5 months ago