Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
github.com/vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
https://github.com/vllm-project/vllm
[Misc] Improve type annotations for `support_torch_compile`
DarkLight1337 opened this pull request 23 days ago
support download Lora Model from ModelScope and download private mode…
AlphaINF opened this pull request 23 days ago
TypeError: ChatGLMTokenizer._pad() got an unexpected keyword argument 'padding_side'
wenruihua opened this issue 23 days ago
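[Usage]: Can the model's output be added to vLLM's log output? Sometimes the client cannot see the result even though the model has already finished inference, so I'd like to view the model's output on the server side.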
WangJianQ-0118 opened this issue 23 days ago
[Usage]: ValueError: Model architectures ['Qwen2ForCausalLM'] failed to be inspected. Please check the logs for more details.
despzcm opened this issue 23 days ago
[Feature]: ChatCompletionRequest get default value from generation_config.json
zhaotyer opened this issue 23 days ago
[platform] Add verify_quantization in platform.
wangxiyuan opened this pull request 23 days ago
[Misc]: Qwen2VL Vision ID Support
yusufani opened this issue 23 days ago
[Usage]: How to use `use_image_id` and `max_slice_num` parameter
2U1 opened this issue 23 days ago
[Feature]: Beam search: top_p, min_p and logit processors
denadai2 opened this issue 23 days ago
[Bugfix] Prevent benchmark_throughput.py from using duplicated random prompts
mgoin opened this pull request 24 days ago
[Feature]: Enable `/score` endpoint for all embedding models
maxdebayser opened this issue 24 days ago
[Model] Clean up MiniCPMV
DarkLight1337 opened this pull request 24 days ago
Configuration of the model parallelism does not make sense
fajavadi opened this pull request 24 days ago
[Misc][XPU] Avoid torch compile for XPU platform
yma11 opened this pull request 24 days ago
[Bug]: Making a request to the OpenAI API server with n=2 and best_of=2 fails
payoto opened this issue 24 days ago
[Misc] typo find in sampling_metadata.py
noooop opened this pull request 24 days ago
[Model] Add has_weight to RMSNorm and re-enable weights loading tracker for Mamba
Isotr0py opened this pull request 24 days ago
[Bug]: (vllm serve Qwen/Qwen2.5-1.5B-Instruct) is generating error torch.distributed.DistBackendError: File name too long and same thing is happening with other models
Diksha06122 opened this issue 24 days ago
[V1] Optimize the CPU overheads in FlashAttention custom op
WoosukKwon opened this pull request 24 days ago
[doc]Update config docstring
wangxiyuan opened this pull request 24 days ago
[Core] Refactoring disaggregated prefilling/decoding using Mooncake Transfer Engine
alogfans opened this pull request 24 days ago
[WIP][V1] Ray executor
rkooo567 opened this pull request 24 days ago
[Doc]: BNB 8 bit quantization is undocumented
molereddy opened this issue 24 days ago
[Bugfix] Fix BNB loader target_modules
jeejeelee opened this pull request 25 days ago
[Model] Update multi-modal processor to support Mantis(LLaVA) model
DarkLight1337 opened this pull request 25 days ago
[Bug]: VLLM run very very slow in ARM cpu
feikiss opened this issue 25 days ago
[WIP][CI]add genai-perf benchmark in nightly benchmark
jikunshang opened this pull request 25 days ago
[V1] Initial support of multimodal models for V1 re-arch
ywang96 opened this pull request 25 days ago
[Bug]: v0.6.4.post1 Qwen2-VL-7B-Instruct-AWQ crash:shape mismatch
wciq1208 opened this issue 25 days ago
[V1] Adding min tokens/repetition/presence/frequence penalties to V1 sampler
sroy745 opened this pull request 25 days ago
[Model] Implement merged input processor for LLaVA model
DarkLight1337 opened this pull request 26 days ago
[RFC]: Make any vLLM model a pooling model
DarkLight1337 opened this issue 26 days ago
[Doc] Add github links for source code references
russellb opened this pull request 26 days ago
[Feature]: Integrate with XGrammar for zero-overhead structured generation in LLM inference.
choisioo opened this issue 26 days ago
[V1] VLM - Run the mm_mapper preprocessor in the frontend process
alexm-neuralmagic opened this pull request 27 days ago
[Model] Enable optional prefix when loading embedding models
DarkLight1337 opened this pull request 27 days ago
[Usage]: What should the chat template for the `meta-llama/Llama-3.2-3B` be?
mrakgr opened this issue 27 days ago
[Bug]: Crash with Qwen2-Audio Model in vLLM During Audio Processing
jiahansu opened this issue 27 days ago
[Core][Bugfix] Use correct device to initialize GPU data during CUDA-graph-capture
IdoAsraff opened this pull request 28 days ago
[Bug]: When applying prompt_logprobs for the OpenAI server, the prompt_logprobs field in the response does not show which token is chosen
DIYer22 opened this issue 28 days ago
[Bug]: Authorization ignored when root_path is set
chaunceyjiang opened this pull request 28 days ago
[Usage]: Why `use_beam_search` is eliminated in `vllm.SamplingParams` from v0.6.3?
BAI-Yeqi opened this issue 28 days ago
[fix] Correct num_accepted_tokens counting
KexinFeng opened this pull request 28 days ago
[doc] update the code to add models
youkaichao opened this pull request 28 days ago
[Usage]: How to make model response information appear in the vllm backend logs
nora647 opened this issue 28 days ago
Revert "[CI/Build] Print running script to enhance CI log readability"
youkaichao opened this pull request 28 days ago
[Bug]: GGUF Model Output Repeats Nonsensically
Mayflyyh opened this issue 28 days ago
[model][utils] add extract_layer_index utility function
youkaichao opened this pull request 28 days ago
[Usage]: While loading model get 'layers.0.mlp.down_proj.weight' after merge_and_unload()
alex2romanov opened this issue 28 days ago
[Misc]Further reduce BNB static variable
jeejeelee opened this pull request 28 days ago
[CI/Build] Print running script to enhance CI log readability
jeejeelee opened this pull request 29 days ago
[Bugfix] Avoid import AttentionMetadata explicitly in Mllama and fix openvino import
Isotr0py opened this pull request 29 days ago
[Interleaved ATTN] Support for Mistral-8B
patrickvonplaten opened this pull request 29 days ago
[Bug] Streaming output error of tool calling has still not been resolved.
Sala8888 opened this issue 29 days ago
[Kernel] Remove hard-dependencies of Speculative decode to CUDA workers
xuechendi opened this pull request 29 days ago
[Bug]: Duplicate request_id breaks the engine
tjohnson31415 opened this issue 29 days ago
[Core] Update to outlines > 0.1.4
russellb opened this pull request 30 days ago
[Installation]: Segmentation fault when building Docker container on WSL
nlsferrara opened this issue 30 days ago
[V1] Refactor model executable interface for multimodal models
ywang96 opened this pull request 30 days ago
[Hardware][Intel-Gaudi] Enable LoRA support for Intel Gaudi (HPU)
SanjuCSudhakaran opened this pull request 30 days ago
[Docs] Add dedicated tool calling page to docs
mgoin opened this pull request about 1 month ago
[Usage]: Can we extend the context length of gemma2 model or other models?
hahmad2008 opened this issue about 1 month ago
[Feature]: Support for Registering Model-Specific Default Sampling Parameters
yansh97 opened this issue about 1 month ago
[Usage]: How to use ROPE scaling for llama3.1 and gemma2?
hahmad2008 opened this issue about 1 month ago
[CI][Installation] Avoid uploading CUDA 11.8 wheel
cermeng opened this pull request about 1 month ago
[Usage]: Fail to load config.json
dequeueing opened this issue about 1 month ago
[Bug]: vllm failed to run two instance with one gpu
pandada8 opened this issue about 1 month ago
Add Sageattention backend
flozi00 opened this pull request about 1 month ago
[Bug]: Authorization ignored when root_path is set
OskarLiew opened this issue about 1 month ago
[Misc] Suppress duplicated logging regarding multimodal input pipeline
ywang96 opened this pull request about 1 month ago
[8/N] enable cli flag without a space
youkaichao opened this pull request about 1 month ago
[V1] Fix Compilation config & Enable CUDA graph by default
WoosukKwon opened this pull request about 1 month ago
[Usage]: Optimizing TTFT for Qwen2.5-72B Model Deployment on A800 GPUs for RAG Application
zhanghx0905 opened this issue about 1 month ago
[Feature]: Additional possible value for `tool_choice`: `required`
fahadh4ilyas opened this issue about 1 month ago
[Bug]: Gemma2 becomes a fool.
Foreist opened this issue about 1 month ago
fix the issue that len(tokenizer(prompt)["input_ids"]) > prompt_len
sywangyi opened this pull request about 1 month ago
[Kernel] Register punica ops directly
jeejeelee opened this pull request about 1 month ago
[Usage]: When I set --tensor-parallel-size 4, the OpenAI server does not work and reports a new exception
Geek-Peng opened this issue about 1 month ago
[platforms] improve error message for unspecified platforms
youkaichao opened this pull request about 1 month ago
[Misc] Enable vLLM to Dynamically Load LoRA from a Remote Server
angkywilliam opened this pull request about 1 month ago
[Model] Expose `dynamic_image_size` as mm_processor_kwargs for InternVL2 models
Isotr0py opened this pull request about 1 month ago
[Usage]: What's the relationship between KV cache and MAX_SEQUENCE_LENGTH.
GRuuuuu opened this issue about 1 month ago
[Bug]: Model does not split in multiple Gpus instead it occupy same memory on each GPU
anilkumar0502 opened this issue about 1 month ago
[Feature]: Manually inject Prefix KV Cache
toilaluan opened this issue about 1 month ago
[Model]: Add support for Aria model
xffxff opened this pull request about 1 month ago
[Doc] fix a small typo in docstring of llama_tool_parser
FerdinandZhong opened this pull request about 1 month ago
[core] overhaul memory profiling and fix backward compatibility
youkaichao opened this pull request about 1 month ago
[Feature]: Multimodel prefix-caching features
justzhanghong opened this issue about 1 month ago
[Usage]:
Lukas-123 opened this issue about 1 month ago
[Platforms] Add `device_type` in `Platform`
MengqingCao opened this pull request about 1 month ago
[WIP][v1] Refactor KVCacheManager for more hash input than token ids
rickyyx opened this pull request about 1 month ago
Need to update the jax and jaxlib version
vanbasten23 opened this pull request about 1 month ago
Turn on V1 for H200 build
simon-mo opened this pull request about 1 month ago
Metrics model name when using multiple loras
mces89 opened this issue about 1 month ago
[Model] Add OLMo November 2024 model
2015aroras opened this pull request about 1 month ago
[Core] Implement disagg prefill by StatelessProcessGroup
KuntaiDu opened this pull request about 1 month ago
Setting default for EmbeddingChatRequest.add_generation_prompt to False
noamgat opened this pull request about 1 month ago