Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
vLLM
vLLM is a high-throughput and memory-efficient inference and serving engine for large language models (LLMs).
Collective: https://opencollective.com/vllm (host: opensource)
Code: https://github.com/vllm-project/vllm
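As a serving engine, vLLM exposes an OpenAI-compatible HTTP API. The sketch below builds a chat-completion request body of the kind that such a server accepts; the model name and any server address are illustrative assumptions, not taken from this page.

```python
import json

# Sketch of a chat-completion request payload for vLLM's
# OpenAI-compatible server. The model name here is an example;
# substitute whichever model the server was launched with.
payload = {
    "model": "mistralai/Mistral-7B-v0.1",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 64,
    "temperature": 0.7,
}

# Serialize to the JSON body that would be POSTed to the server.
body = json.dumps(payload)
print(body)
```

In practice this body would be POSTed to the server's /v1/chat/completions endpoint (for example with curl or an OpenAI client library pointed at the vLLM host).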
[Installation]: VLLM is impossible to install.
github.com/vllm-project/vllm - GPaolo opened this issue 6 months ago
[Bug]: Qwen1.5-14B-Chat deployed with vllm==0.3.3 on a Tesla V100-PCIE-32GB GPU outputs only exclamation marks, no results
github.com/vllm-project/vllm - li995495592 opened this issue 6 months ago
[Usage]: flash_attn vs xformers
github.com/vllm-project/vllm - VeryVery opened this issue 6 months ago
[Performance]: FP8 KV Cache performance loss on FP16 models in ROCm
github.com/vllm-project/vllm - TNT3530 opened this issue 6 months ago
[Bug]: Command R+ GPTQ bad output on ROCm
github.com/vllm-project/vllm - TNT3530 opened this issue 6 months ago
[Core][Frontend][Doc] Initial support for LLaVA-NeXT and GPT-4V Chat Completions API
github.com/vllm-project/vllm - DarkLight1337 opened this pull request 6 months ago
[Feature]: Tree attention about Speculative Decoding
github.com/vllm-project/vllm - yukavio opened this issue 6 months ago
[CI/Build] Reduce race condition in docker build
github.com/vllm-project/vllm - youkaichao opened this pull request 6 months ago
[Bug]: StableLM 12b head size incorrect
github.com/vllm-project/vllm - bjoernpl opened this issue 6 months ago
[Model] LoRA gptbigcode implementation
github.com/vllm-project/vllm - raywanb opened this pull request 6 months ago
[Bugfix] Fix inappropriate content of model_name tag in Prometheus metrics
github.com/vllm-project/vllm - DearPlanet opened this pull request 6 months ago
[Usage]: How to offload some layers to CPU?
github.com/vllm-project/vllm - cheney369 opened this issue 6 months ago
[Model] Initialize Fuyu-8B support
github.com/vllm-project/vllm - Isotr0py opened this pull request 6 months ago
[Bug]: Cannot use FlashAttention because the package is not found. Please install it for better performance.
github.com/vllm-project/vllm - pseudotensor opened this issue 6 months ago
[Bug]: Getting subprocess.CalledProcessError: Command '['/usr/bin/gcc',] error message.
github.com/vllm-project/vllm - yk287 opened this issue 6 months ago
[Usage]: I have two GPUs; how do I make my model run on 2 GPUs
github.com/vllm-project/vllm - hxujal opened this issue 6 months ago
[Kernel] PyTorch Labs Fused MoE Kernel Integration
github.com/vllm-project/vllm - robertgshaw2-neuralmagic opened this pull request 6 months ago
[Feature]: Support Ray-free multi-node distributed inference on resource managers like Kubernetes
github.com/vllm-project/vllm - Jeffwan opened this issue 6 months ago
[Bug]: AttributeError: 'MergedColumnParallelLinear' object has no attribute 'weight'
github.com/vllm-project/vllm - guangweiShaw opened this issue 6 months ago
[Usage]: How to determine whether the vllm engine is saturated with requests
github.com/vllm-project/vllm - man2machine opened this issue 6 months ago
[Bug]: killed due to high memory usage
github.com/vllm-project/vllm - xiewf1990 opened this issue 6 months ago
[Bug]: Cannot load lora adapters in WSL 2
github.com/vllm-project/vllm - invokeinnovation opened this issue 6 months ago
[Bug]: vllm 0.4.0.post1 crashed when loading dbrx-instruct on AMD MI250x
github.com/vllm-project/vllm - vgod-dbx opened this issue 7 months ago
[Doc/Feature]: Llava 1.5 in OpenAI compatible server
github.com/vllm-project/vllm - stikkireddy opened this issue 7 months ago
[Roadmap] vLLM Roadmap Q2 2024
github.com/vllm-project/vllm - simon-mo opened this issue 7 months ago
[Misc]: Can we remove `vllm/entrypoints/api_server.py`?
github.com/vllm-project/vllm - hmellor opened this issue 7 months ago
[Frontend] openAI entrypoint dynamic adapter load
github.com/vllm-project/vllm - DavidPeleg6 opened this pull request 7 months ago
[Bug]: Error happens in async_llm_engine when using multiple GPUs
github.com/vllm-project/vllm - for-just-we opened this issue 7 months ago
[Misc]: Implement CPU/GPU swapping in BlockManagerV2
github.com/vllm-project/vllm - Kaiyang-Chen opened this pull request 7 months ago
[Core] :loud_sound: Improve request logging truncation
github.com/vllm-project/vllm - joerunde opened this pull request 7 months ago
[Model] Cohere CommandR+
github.com/vllm-project/vllm - saurabhdash2512 opened this pull request 7 months ago
[Hardware][Intel GPU] Add initial Intel GPU (XPU) inference backend
github.com/vllm-project/vllm - jikunshang opened this pull request 7 months ago
[Bug]: RuntimeError: No suitable kernel. h_in=16 h_out=3424 dtype=Float out_dtype=BFloat16
github.com/vllm-project/vllm - Edisonwei54 opened this issue 7 months ago
[Feature]: Add OpenTelemetry distributed tracing
github.com/vllm-project/vllm - ronensc opened this issue 7 months ago
[Feature]: cuda12.2 support
github.com/vllm-project/vllm - s-natsubori opened this issue 7 months ago
[Bug]: vllm-0.4.0.post1+neuron213; ModuleNotFoundError: No module named 'vllm._C'
github.com/vllm-project/vllm - MojHnd opened this issue 7 months ago
Best server cmd for mistralai/Mistral-7B-v0.1
github.com/vllm-project/vllm - sshleifer opened this issue 7 months ago
[Bug]: Qwen-14B-Chat-Int4 with guided_json error
github.com/vllm-project/vllm - xunfeng1980 opened this issue 7 months ago
[Bug]: n_inner divisible by number of GPUs
github.com/vllm-project/vllm - aliozts opened this issue 7 months ago
[Bug]: Starting vllm in Docker with host_IP configured, but still getting [W socket.cpp:663] [c10d] The client socket has failed to connect to [::ffff:172.16.8.232]:39623 (errno: 110 - Connection timed out)
github.com/vllm-project/vllm - huyang19881115 opened this issue 7 months ago
[Core] Eliminate parallel worker per-step task scheduling overhead
github.com/vllm-project/vllm - njhill opened this pull request 7 months ago
[WIP][Core] Fully composable launcher/task/coordinator/communicator design and implementation
github.com/vllm-project/vllm - youkaichao opened this pull request 7 months ago
[Usage]: Expected output when prompt_logprobs=1
github.com/vllm-project/vllm - thefirebanks opened this issue 7 months ago
[Bug]: Trying to run vllm inference behind FastAPI's server, but it gets stuck
github.com/vllm-project/vllm - sigridjineth opened this issue 7 months ago
[Bug]: CUDA error: invalid argument
github.com/vllm-project/vllm - qingjiaozyn opened this issue 7 months ago
[Model][Misc] Add e5-mistral-7b-instruct and Embedding API
github.com/vllm-project/vllm - CatherineSue opened this pull request 7 months ago
[CI/Build] A perplexity-computing test for the FP8 KV cache system. Originally used in the context of PR #3290
github.com/vllm-project/vllm - Alexei-V-Ivanov-AMD opened this pull request 7 months ago
[Model Loading] Speedup model loading with distributed loading
github.com/vllm-project/vllm - chestnut-Q opened this pull request 7 months ago
[Misc]: Cohere models are not working due to an outdated transformers library?
github.com/vllm-project/vllm - Playerrrrr opened this issue 7 months ago
[Bug]: RuntimeError: CUDA error: invalid device ordinal with multi node multi gpus
github.com/vllm-project/vllm - kn1011 opened this issue 7 months ago
[Usage]: Can vllm be hosted offline, without an internet connection?
github.com/vllm-project/vllm - juud79 opened this issue 7 months ago
[Feature]: An instruction/chat method for the offline LLM class
github.com/vllm-project/vllm - simon-mo opened this issue 7 months ago
[Bug]: Custom all-reduce does not work.
github.com/vllm-project/vllm - esmeetu opened this issue 7 months ago
[Usage]: Segmentation fault (core dumped) error while testing asynchronous high concurrency
github.com/vllm-project/vllm - alex1996-ljl opened this issue 7 months ago
[Misc]: Implement CPU/GPU swapping in BlockManagerV2
github.com/vllm-project/vllm - cadedaniel opened this issue 7 months ago
[Hardware][AMD][Kernel] Add custom kernel for vector query on ROCm
github.com/vllm-project/vllm - charlifu opened this pull request 7 months ago
[Bug]: ChatCompletion prompt_logprobs does not work
github.com/vllm-project/vllm - noamgat opened this issue 7 months ago
[RFC] Initial Support for CPUs
github.com/vllm-project/vllm - bigPYJ1151 opened this issue 7 months ago
[Usage]: Generate specified number of tokens for each request individually
github.com/vllm-project/vllm - oximi123 opened this issue 7 months ago
[Kernel] Use flash-attn for decoding
github.com/vllm-project/vllm - skrider opened this pull request 7 months ago
[Misc] add the "download-dir" option to the latency/throughput benchmarks
github.com/vllm-project/vllm - AmadeusChan opened this pull request 7 months ago
[RFC] Initial Support for Cloud TPUs
github.com/vllm-project/vllm - WoosukKwon opened this issue 7 months ago
parent_child_dict[sample.parent_seq_id].append(sample) KeyError: 4
github.com/vllm-project/vllm - Stosan opened this issue 7 months ago
ModuleNotFoundError: No module named 'transformers_modules' with API serving using phi-2b
github.com/vllm-project/vllm - haining78zhang opened this issue 7 months ago
[BugFix] Fix Falcon tied embeddings
github.com/vllm-project/vllm - WoosukKwon opened this pull request 7 months ago
[RFC]: Interface and Abstraction for Distributed Inference Environment
github.com/vllm-project/vllm - youkaichao opened this issue 7 months ago
[Misc]: Throughput/Latency for guided_json with ~100% GPU cache utilization
github.com/vllm-project/vllm - jens-create opened this issue 7 months ago
[Feature]: Offload Model Weights to CPU
github.com/vllm-project/vllm - chenqianfzh opened this issue 7 months ago
[New Model]: Phi-2 support for LoRA
github.com/vllm-project/vllm - andykhanna opened this issue 7 months ago
[Feature]: Dynamic Memory Compression: Retrofitting LLMs for Accelerated Inference
github.com/vllm-project/vllm - tchaton opened this issue 7 months ago
[Usage]: punica LoRA kernels could not be imported. If you built vLLM from source, make sure VLLM_INSTALL_PUNICA_KERNELS=1 env var was set.
github.com/vllm-project/vllm - nlp-learner opened this issue 7 months ago
[Feature]: Support Guided Decoding in `LLM` entrypoint
github.com/vllm-project/vllm - simon-mo opened this issue 7 months ago
[Bug]: When installing vllm via pip, some errors happened.
github.com/vllm-project/vllm - finylink opened this issue 7 months ago
[Usage]: How to run inference on a model with multiple GPUs
github.com/vllm-project/vllm - ckj18 opened this issue 7 months ago
[Kernel] Full Tensor Parallelism for LoRA Layers
github.com/vllm-project/vllm - FurtherAI opened this pull request 7 months ago
[Bug]: aisingapore/sea-lion-7b-instruct fails with assert config.embedding_fraction == 1.0
github.com/vllm-project/vllm - pseudotensor opened this issue 7 months ago
[Feature]: Support distributing serving with KubeRay's autoscaler
github.com/vllm-project/vllm - TrafalgarZZZ opened this issue 7 months ago
[Bug]: vllm slows down after a long run
github.com/vllm-project/vllm - momomobinx opened this issue 7 months ago
[New Model]: Please support CogVLM
github.com/vllm-project/vllm - kietna1809 opened this issue 7 months ago
[Misc] Add attention sinks
github.com/vllm-project/vllm - felixzhu555 opened this pull request 7 months ago
[BugFix][Frontend] Use correct, shared tokenizer in OpenAI server
github.com/vllm-project/vllm - njhill opened this pull request 7 months ago
[Core] Add generic typing to `LRUCache`
github.com/vllm-project/vllm - njhill opened this pull request 7 months ago
[Usage]: Set dtype for VLLM using YAML
github.com/vllm-project/vllm - telekoteko opened this issue 7 months ago
[Feature]: Compute and log the serving FLOPs
github.com/vllm-project/vllm - zhuohan123 opened this issue 7 months ago
[Usage]: Why does increasing max-num-seqs use less memory
github.com/vllm-project/vllm - TaChao opened this issue 7 months ago
[Bug]: DynamicNTKScalingRotaryEmbedding implementation is different from Transformers
github.com/vllm-project/vllm - killawhale2 opened this issue 7 months ago
[Frontend] [Core] feat: Add model loading using `tensorizer`
github.com/vllm-project/vllm - sangstar opened this pull request 7 months ago
[Frontend] Support complex message content for chat completions endpoint
github.com/vllm-project/vllm - fgreinacher opened this pull request 7 months ago
[Core] Multiprocessing executor for single-node multi-GPU deployment
github.com/vllm-project/vllm - njhill opened this pull request 7 months ago
[Misc] add HOST_IP env var
github.com/vllm-project/vllm - youkaichao opened this pull request 7 months ago
Unable to load LoRA fine-tuned LLM from HF (AssertionError)
github.com/vllm-project/vllm - oscar-martin opened this issue 7 months ago
Multi-LoRA - Support for providing /load and /unload API
github.com/vllm-project/vllm - gauravkr2108 opened this issue 7 months ago
Question regarding GPU memory allocation
github.com/vllm-project/vllm - wx971025 opened this issue 7 months ago
lm-evaluation-harness broken on master
github.com/vllm-project/vllm - pcmoritz opened this issue 7 months ago
Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU)
github.com/vllm-project/vllm - AdrianAbeyta opened this pull request 7 months ago