Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
vLLM
vLLM is a high-throughput and memory-efficient inference and serving engine for large language models (LLMs).
Collective: https://opencollective.com/vllm
Host: opensource
Code: https://github.com/vllm-project/vllm
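vLLM is driven primarily through its Python API. A minimal offline-inference sketch, assuming vllm is installed (the model name is only an example; substitute any checkpoint vLLM supports):

from vllm import LLM, SamplingParams

prompts = ["Hello, my name is", "The capital of France is"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

llm = LLM(model="facebook/opt-125m")  # example model, not a recommendation
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    # Each RequestOutput carries the prompt and one or more generated completions.
    print(output.prompt, "->", output.outputs[0].text)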
No CUDA GPUs are available Error with vLLM in JupyterLab
github.com/vllm-project/vllm - SafeyahShemali opened this issue about 1 year ago
how to use chat function
github.com/vllm-project/vllm - zhangzai666 opened this issue about 1 year ago
chatglm3 vllm/vllm/model_executor/models/chatglm.py", line 53, in __init__ assert self.total_num_kv_heads % tp_size == 0 AssertionError
github.com/vllm-project/vllm - Changjy1997nb opened this issue about 1 year ago
Tensor parallelism on ray cluster
github.com/vllm-project/vllm - baojunliu opened this issue about 1 year ago
Adding support for switch-transformer / NLLB-MoE
github.com/vllm-project/vllm - yl3469 opened this issue about 1 year ago
[Bug] prompt_logprobs = 1 OOM problem
github.com/vllm-project/vllm - shunxing1234 opened this issue over 1 year ago
Error: When using OpenAI-Compatible Server, the server is available but cannot be accessed from the same terminal.
github.com/vllm-project/vllm - LuristheSun opened this issue over 1 year ago
Support W8A8 inference in vllm
github.com/vllm-project/vllm - AniZpZ opened this pull request over 1 year ago
Support int8 KVCache Quant in Vllm
github.com/vllm-project/vllm - AniZpZ opened this pull request over 1 year ago
Added logits processor API to sampling params
github.com/vllm-project/vllm - noamgat opened this pull request over 1 year ago
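The pull request above adds a logits-processor hook to SamplingParams. A rough sketch of how such a hook might be used, assuming each processor is a callable that receives the previously generated token ids plus the next-token logits and returns the modified logits (the banned token id below is purely illustrative):

from vllm import LLM, SamplingParams

BANNED_TOKEN_ID = 1234  # illustrative token id to suppress

def ban_token(token_ids, logits):
    # Force the banned token's logit to -inf so it can never be sampled.
    logits[BANNED_TOKEN_ID] = float("-inf")
    return logits

llm = LLM(model="facebook/opt-125m")  # example model
params = SamplingParams(temperature=0.7, logits_processors=[ban_token])
outputs = llm.generate(["Hello, my name is"], params)
print(outputs[0].outputs[0].text)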
ImportError: cannot import name 'MistralConfig' from 'transformers'
github.com/vllm-project/vllm - peter-ch opened this issue over 1 year ago
Adding Locally Typical Sampling (i.e. typical_p in transformers and TGI)
github.com/vllm-project/vllm - seongminp opened this issue over 1 year ago
Does vllm support the Mac/Metal/MPS?
github.com/vllm-project/vllm - Phil-U-U opened this issue over 1 year ago
Using VLLM with a Tesla T4 on SageMaker Studio (ml.g4dn.xlarge instance)
github.com/vllm-project/vllm - paulovasconcellos-hotmart opened this issue over 1 year ago
[question] Does vllm support macos M1 or M2 chip?
github.com/vllm-project/vllm - acekingke opened this issue over 1 year ago
Could not build wheels for vllm, which is required to install pyproject.toml-based projects
github.com/vllm-project/vllm - ABooth01 opened this issue over 1 year ago
Make multi replicas to make a balancer.
github.com/vllm-project/vllm - linkedlist771 opened this issue over 1 year ago
How to deploy vllm model across multiple nodes in kubernetes?
github.com/vllm-project/vllm - Ryojikn opened this issue over 1 year ago
What is the max number prompts that the generate() method can take
github.com/vllm-project/vllm - hxue3 opened this issue over 1 year ago
Low VRAM batch processing mode
github.com/vllm-project/vllm - viktor-ferenczi opened this issue over 1 year ago
Feature request. Allow a few model instances in one GPU if they can feet in VRAM.
github.com/vllm-project/vllm - agrogov opened this issue over 1 year ago
feat: demonstrate using regex for suffix matching
github.com/vllm-project/vllm - wsxiaoys opened this pull request over 1 year ago
workaround of AWQ for Turing GPUs
github.com/vllm-project/vllm - twaka opened this pull request over 1 year ago
Generate nothing from VLLM output
github.com/vllm-project/vllm - FocusLiwen opened this issue over 1 year ago
[Discussion] Will vLLM consider using Speculative Sampling to accelerating LLM decoding?
github.com/vllm-project/vllm - gesanqiu opened this issue over 1 year ago
vLLM to add a locally trained model
github.com/vllm-project/vllm - atanikan opened this issue over 1 year ago
vllm.engine.async_llm_engine.AsyncEngineDeadError: Task finished unexpectedly.
github.com/vllm-project/vllm - MUZAMMILPERVAIZ opened this issue over 1 year ago
AWQ: bfloat16 not supported? And `--dtype` arg doesn't allow specifying float16
github.com/vllm-project/vllm - TheBloke opened this issue over 1 year ago
Waiting sequence group should have only one prompt sequence.
github.com/vllm-project/vllm - Link-Li opened this issue over 1 year ago
Inconsistent results between HuggingFace Transformers and vllm
github.com/vllm-project/vllm - normster opened this issue over 1 year ago
How to deploy api server as https
github.com/vllm-project/vllm - yilihtien opened this issue over 1 year ago
vllm hangs when reinitializing ray
github.com/vllm-project/vllm - nelson-liu opened this issue over 1 year ago
How to use vllm to compute ppl score for input text?
github.com/vllm-project/vllm - yinochaos opened this issue over 1 year ago
AsyncEngineDeadError / RuntimeError: CUDA error: an illegal memory access was encountered
github.com/vllm-project/vllm - xingyaoww opened this issue over 1 year ago
It seems that SamplingParams doesnt support the bad_words_ids parameter when generating
github.com/vllm-project/vllm - mengban opened this issue over 1 year ago
can model Qwen/Qwen-VL-Chat work well?
github.com/vllm-project/vllm - wangschang opened this issue over 1 year ago
vllm reducing quality when loading local fine tuned Llama-2-13b-hf model
github.com/vllm-project/vllm - BerndHuber opened this issue over 1 year ago
I want to close kv cache. if i set gpu_memory_utilization is 0. Does it means that i close the kv cache?
github.com/vllm-project/vllm - amulil opened this issue over 1 year ago
Is there authentication supported?
github.com/vllm-project/vllm - mluogh opened this issue over 1 year ago
Loading Model through Multi-Node Ray Cluster Fails
github.com/vllm-project/vllm - VarunSreenivasan16 opened this issue over 1 year ago
SIGABRT - Fatal Python error: Aborted when running vllm on llama2-7b with --tensor-parallel-size 2
github.com/vllm-project/vllm - dhritiman opened this issue over 1 year ago
Sagemaker support for inference
github.com/vllm-project/vllm - Tarun3679 opened this issue over 1 year ago
start vllm.entrypoints.api_server model vicuna-13b-v1.3 error: Fatal Python error: Bus error
github.com/vllm-project/vllm - luefei opened this issue over 1 year ago
Support for RLHF (ILQL)-trained Models
github.com/vllm-project/vllm - ojus1 opened this issue over 1 year ago
Stream Tokens operation integration into LLM class (which uses LLMEngine behind the scenes)
github.com/vllm-project/vllm - orellavie1212 opened this issue over 1 year ago
pip installation error - ERROR: Failed building wheel for vllm
github.com/vllm-project/vllm - dxlong2000 opened this issue over 1 year ago
Stuck in Initializing an LLM engine
github.com/vllm-project/vllm - EvilCalf opened this issue over 1 year ago
Feature request: Support for embedding models
github.com/vllm-project/vllm - mantrakp04 opened this issue over 1 year ago
test qwen-7b-chat model and output incorrect
github.com/vllm-project/vllm - dachengai opened this issue over 1 year ago
What a fast tokenizer can be used for Baichuan-13b?
github.com/vllm-project/vllm - FURYFOR opened this issue over 1 year ago
Issue with raylet error
github.com/vllm-project/vllm - ZihanWang314 opened this issue over 1 year ago
Memory leak while using tensor_parallel_size>1
github.com/vllm-project/vllm - haiasd opened this issue over 1 year ago
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
github.com/vllm-project/vllm - jinfengfeng opened this issue over 1 year ago
Best effort support for all Hugging Face transformers models
github.com/vllm-project/vllm - dwyatte opened this issue over 1 year ago
ValueError: The number of GPUs per node is not divisible by the number of tensor parallelism.
github.com/vllm-project/vllm - beratcmn opened this issue over 1 year ago
Cannot get a simple example working with multi-GPU
github.com/vllm-project/vllm - brevity2021 opened this issue over 1 year ago
ModuleNotFoundError: No module named 'transformers_modules' with API serving using baichuan-7b
github.com/vllm-project/vllm - McCarrtney opened this issue over 1 year ago
vLLM stops all processing when CPU KV cache is used, has to be shut down and restarted.
github.com/vllm-project/vllm - TheBloke opened this issue over 1 year ago
LlaMA 2: Input prompt (2664 tokens) is too long and exceeds limit of 2048/2560
github.com/vllm-project/vllm - foamliu opened this issue over 1 year ago
[Feature Request] Support input embedding in `LLM.generate()`
github.com/vllm-project/vllm - KimmiShi opened this issue over 1 year ago
Decode error while inferencing a batch of prompts
github.com/vllm-project/vllm - SiriusNEO opened this issue over 1 year ago
Feature request:support ExLlama
github.com/vllm-project/vllm - alanxmay opened this issue over 1 year ago
Require a "Wrapper" feature
github.com/vllm-project/vllm - jeffchy opened this issue over 1 year ago
Remove Ray for the dependency
github.com/vllm-project/vllm - lanking520 opened this issue over 1 year ago
Adding support for encoder-decoder models, like T5 or BART
github.com/vllm-project/vllm - shermansiu opened this issue over 1 year ago
Can I directly obtain the logits here?
github.com/vllm-project/vllm - SparkJiao opened this issue over 1 year ago
Build failure due to CUDA version mismatch
github.com/vllm-project/vllm - WoosukKwon opened this issue over 1 year ago
Add docstrings to some modules and classes
github.com/vllm-project/vllm - WoosukKwon opened this pull request over 1 year ago
Minor code cleaning for SamplingParams
github.com/vllm-project/vllm - WoosukKwon opened this pull request over 1 year ago
Add performance comparison figures on A100, V100, T4
github.com/vllm-project/vllm - WoosukKwon opened this issue over 1 year ago
Enhance SamplingParams
github.com/vllm-project/vllm - WoosukKwon opened this pull request over 1 year ago
Implement presence and frequency penalties
github.com/vllm-project/vllm - WoosukKwon opened this pull request over 1 year ago
Support top-k sampling
github.com/vllm-project/vllm - WoosukKwon opened this pull request over 1 year ago
Avoid sorting waiting queue & Minor code cleaning
github.com/vllm-project/vllm - WoosukKwon opened this pull request over 1 year ago
Support string-based stopping conditions
github.com/vllm-project/vllm - WoosukKwon opened this issue over 1 year ago
Rename variables and methods
github.com/vllm-project/vllm - WoosukKwon opened this pull request over 1 year ago