Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
vLLM
vLLM is a high-throughput and memory-efficient inference and serving engine for large language models (LLMs).
Collective: https://opencollective.com/vllm
Host: opensource
Code: https://github.com/vllm-project/vllm
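To make the project description above concrete, here is a minimal offline-inference sketch using vLLM's Python API; the model name, prompt, and sampling values are placeholders, assuming a recent vLLM release.

```python
# Minimal sketch of vLLM offline inference; model and sampling values are illustrative.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # any Hugging Face model vLLM supports
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

outputs = llm.generate(["The capital of France is"], params)
for out in outputs:
    print(out.prompt, "->", out.outputs[0].text)
```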
Workaround for AWQ on Turing GPUs
github.com/vllm-project/vllm - twaka opened this pull request over 1 year ago
vLLM generates nothing in its output
github.com/vllm-project/vllm - FocusLiwen opened this issue over 1 year ago
[Discussion] Will vLLM consider using Speculative Sampling to accelerate LLM decoding?
github.com/vllm-project/vllm - gesanqiu opened this issue over 1 year ago
Adding a locally trained model to vLLM
github.com/vllm-project/vllm - atanikan opened this issue over 1 year ago
vllm.engine.async_llm_engine.AsyncEngineDeadError: Task finished unexpectedly.
github.com/vllm-project/vllm - MUZAMMILPERVAIZ opened this issue over 1 year ago
AWQ: bfloat16 not supported? And `--dtype` arg doesn't allow specifying float16
github.com/vllm-project/vllm - TheBloke opened this issue over 1 year ago
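For the AWQ dtype issue above, one hedged sketch is to load an AWQ-quantized checkpoint while pinning half precision; the model name is a placeholder and the `quantization`/`dtype` keyword arguments assume a recent vLLM release.

```python
# Sketch: load an AWQ checkpoint and force float16 (AWQ kernels may not support bfloat16).
from vllm import LLM

llm = LLM(
    model="TheBloke/Llama-2-13B-AWQ",  # placeholder AWQ checkpoint
    quantization="awq",
    dtype="float16",
)
```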
Waiting sequence group should have only one prompt sequence.
github.com/vllm-project/vllm - Link-Li opened this issue over 1 year ago
Inconsistent results between HuggingFace Transformers and vllm
github.com/vllm-project/vllm - normster opened this issue over 1 year ago
How to deploy the API server over HTTPS
github.com/vllm-project/vllm - yilihtien opened this issue over 1 year ago
vLLM hangs when reinitializing Ray
github.com/vllm-project/vllm - nelson-liu opened this issue over 1 year ago
How to use vLLM to compute a perplexity (PPL) score for input text?
github.com/vllm-project/vllm - yinochaos opened this issue over 1 year ago
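For the perplexity question above, one possible approach is to score the prompt via its per-token log-probabilities. This is only a sketch: it assumes a vLLM version whose SamplingParams accepts prompt_logprobs and whose request outputs expose prompt_token_ids and prompt_logprobs; field layouts differ across releases.

```python
# Sketch: estimate prompt perplexity from prompt log-probabilities (API assumptions noted above).
import math
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # placeholder model
params = SamplingParams(max_tokens=1, prompt_logprobs=0)  # score the prompt, generate almost nothing

out = llm.generate(["The quick brown fox jumps over the lazy dog."], params)[0]

logprobs = []
for token_id, entry in zip(out.prompt_token_ids, out.prompt_logprobs or []):
    if entry is None:  # the first prompt token has no conditional log-probability
        continue
    lp = entry[token_id]
    logprobs.append(lp.logprob if hasattr(lp, "logprob") else lp)  # Logprob object or plain float

ppl = math.exp(-sum(logprobs) / max(len(logprobs), 1))
print(f"prompt perplexity ≈ {ppl:.2f}")
```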
AsyncEngineDeadError / RuntimeError: CUDA error: an illegal memory access was encountered
github.com/vllm-project/vllm - xingyaoww opened this issue over 1 year ago
It seems that SamplingParams doesn't support the bad_words_ids parameter when generating
github.com/vllm-project/vllm - mengban opened this issue over 1 year ago
Can the Qwen/Qwen-VL-Chat model work well?
github.com/vllm-project/vllm - wangschang opened this issue over 1 year ago
vLLM reduces quality when loading a locally fine-tuned Llama-2-13b-hf model
github.com/vllm-project/vllm - BerndHuber opened this issue over 1 year ago
I want to disable the KV cache. If I set gpu_memory_utilization to 0, does that disable the KV cache?
github.com/vllm-project/vllm - amulil opened this issue over 1 year ago
Is authentication supported?
github.com/vllm-project/vllm - mluogh opened this issue over 1 year ago
Loading Model through Multi-Node Ray Cluster Fails
github.com/vllm-project/vllm - VarunSreenivasan16 opened this issue over 1 year ago
SIGABRT - Fatal Python error: Aborted when running vllm on llama2-7b with --tensor-parallel-size 2
github.com/vllm-project/vllm - dhritiman opened this issue over 1 year ago
SageMaker support for inference
github.com/vllm-project/vllm - Tarun3679 opened this issue over 1 year ago
Starting vllm.entrypoints.api_server with model vicuna-13b-v1.3 fails: Fatal Python error: Bus error
github.com/vllm-project/vllm - luefei opened this issue over 1 year ago
Support for RLHF (ILQL)-trained Models
github.com/vllm-project/vllm - ojus1 opened this issue over 1 year ago
Integrate token streaming into the LLM class (which uses LLMEngine behind the scenes)
github.com/vllm-project/vllm - orellavie1212 opened this issue over 1 year ago
pip installation error - ERROR: Failed building wheel for vllm
github.com/vllm-project/vllm - dxlong2000 opened this issue over 1 year ago
Stuck in Initializing an LLM engine
github.com/vllm-project/vllm - EvilCalf opened this issue over 1 year ago
Feature request: Support for embedding models
github.com/vllm-project/vllm - mantrakp04 opened this issue over 1 year ago
Testing the qwen-7b-chat model produces incorrect output
github.com/vllm-project/vllm - dachengai opened this issue over 1 year ago
Which fast tokenizer can be used for Baichuan-13b?
github.com/vllm-project/vllm - FURYFOR opened this issue over 1 year ago
Issue with raylet error
github.com/vllm-project/vllm - ZihanWang314 opened this issue over 1 year ago
Memory leak while using tensor_parallel_size>1
github.com/vllm-project/vllm - haiasd opened this issue over 1 year ago
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
github.com/vllm-project/vllm - jinfengfeng opened this issue over 1 year ago
Best effort support for all Hugging Face transformers models
github.com/vllm-project/vllm - dwyatte opened this issue over 1 year ago
ValueError: The number of GPUs per node is not divisible by the number of tensor parallelism.
github.com/vllm-project/vllm - beratcmn opened this issue over 1 year ago
Cannot get a simple example working with multi-GPU
github.com/vllm-project/vllm - brevity2021 opened this issue over 1 year ago
ModuleNotFoundError: No module named 'transformers_modules' with API serving using baichuan-7b
github.com/vllm-project/vllm - McCarrtney opened this issue over 1 year ago
vLLM stops all processing when CPU KV cache is used, has to be shut down and restarted.
github.com/vllm-project/vllm - TheBloke opened this issue over 1 year ago
LlaMA 2: Input prompt (2664 tokens) is too long and exceeds limit of 2048/2560
github.com/vllm-project/vllm - foamliu opened this issue over 1 year ago
[Feature Request] Support input embedding in `LLM.generate()`
github.com/vllm-project/vllm - KimmiShi opened this issue over 1 year ago
Decode error while inferencing a batch of prompts
github.com/vllm-project/vllm - SiriusNEO opened this issue over 1 year ago
Feature request: support ExLlama
github.com/vllm-project/vllm - alanxmay opened this issue over 1 year ago
Require a "Wrapper" feature
github.com/vllm-project/vllm - jeffchy opened this issue over 1 year ago
Remove Ray from the dependencies
github.com/vllm-project/vllm - lanking520 opened this issue over 1 year ago
Adding support for encoder-decoder models, like T5 or BART
github.com/vllm-project/vllm - shermansiu opened this issue over 1 year ago
Can I directly obtain the logits here?
github.com/vllm-project/vllm - SparkJiao opened this issue over 1 year ago
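Regarding direct access to logits above, the usual route in vLLM is per-token log-probabilities rather than raw logits. A hedged sketch, assuming the SamplingParams logprobs field and CompletionOutput.logprobs layout of recent releases:

```python
# Sketch: request top-k log-probabilities for each generated token (API assumptions noted above).
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # placeholder model
params = SamplingParams(max_tokens=8, logprobs=5)  # keep top-5 candidates per generated token

out = llm.generate(["Deep learning is"], params)[0].outputs[0]
for step in out.logprobs:  # one dict per generated token: {token_id: log-probability info}
    print({tid: getattr(lp, "logprob", lp) for tid, lp in step.items()})
```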
Build failure due to CUDA version mismatch
github.com/vllm-project/vllm - WoosukKwon opened this issue over 1 year ago
Add docstrings to some modules and classes
github.com/vllm-project/vllm - WoosukKwon opened this pull request over 1 year ago
Minor code cleaning for SamplingParams
github.com/vllm-project/vllm - WoosukKwon opened this pull request over 1 year ago
Add performance comparison figures on A100, V100, T4
github.com/vllm-project/vllm - WoosukKwon opened this issue over 1 year ago
Enhance SamplingParams
github.com/vllm-project/vllm - WoosukKwon opened this pull request over 1 year ago
Implement presence and frequency penalties
github.com/vllm-project/vllm - WoosukKwon opened this pull request over 1 year ago
Support top-k sampling
github.com/vllm-project/vllm - WoosukKwon opened this pull request over 1 year ago
Avoid sorting waiting queue & Minor code cleaning
github.com/vllm-project/vllm - WoosukKwon opened this pull request over 1 year ago
Support string-based stopping conditions
github.com/vllm-project/vllm - WoosukKwon opened this issue over 1 year ago
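For the string-based stopping-condition request above, the corresponding usage in today's SamplingParams looks roughly like the following; the stop strings and model are arbitrary examples, assuming a recent vLLM release.

```python
# Sketch: stop generation as soon as any of the given strings is produced.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # placeholder model
params = SamplingParams(max_tokens=128, stop=["\n\n", "###"])  # arbitrary example stop strings

print(llm.generate(["Q: What is vLLM?\nA:"], params)[0].outputs[0].text)
```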
Rename variables and methods
github.com/vllm-project/vllm - WoosukKwon opened this pull request over 1 year ago
Update example prompts in `simple_server.py`
github.com/vllm-project/vllm - WoosukKwon opened this pull request over 1 year ago
Support various sampling parameters
github.com/vllm-project/vllm - WoosukKwon opened this issue over 1 year ago
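Tying together the sampling-related items above (top-k sampling, presence/frequency penalties, general sampling parameters), a hedged sketch of how these knobs are expressed through SamplingParams in a recent vLLM release; the numeric values are arbitrary.

```python
# Sketch: combine several sampling controls in one SamplingParams; values are illustrative.
from vllm import LLM, SamplingParams

params = SamplingParams(
    temperature=0.7,
    top_p=0.9,
    top_k=40,               # top-k sampling
    presence_penalty=0.5,   # penalize tokens that have appeared at all
    frequency_penalty=0.5,  # penalize tokens in proportion to how often they appeared
    max_tokens=64,
)

llm = LLM(model="facebook/opt-125m")  # placeholder model
print(llm.generate(["Write a haiku about GPUs."], params)[0].outputs[0].text)
```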
Make sure the system can run on T4 and V100
github.com/vllm-project/vllm - WoosukKwon opened this issue over 1 year ago
Clean up the scheduler code
github.com/vllm-project/vllm - WoosukKwon opened this issue over 1 year ago
Add a system logger
github.com/vllm-project/vllm - WoosukKwon opened this pull request over 1 year ago
Use slow tokenizer for LLaMA
github.com/vllm-project/vllm - WoosukKwon opened this pull request over 1 year ago
Enhance model loader
github.com/vllm-project/vllm - WoosukKwon opened this pull request over 1 year ago
Refactor system architecture
github.com/vllm-project/vllm - WoosukKwon opened this pull request over 1 year ago
Use runtime profiling to replace manual memory analyzers
github.com/vllm-project/vllm - zhuohan123 opened this pull request over 1 year ago
Bug in LLaMA fast tokenizer
github.com/vllm-project/vllm - WoosukKwon opened this issue over 1 year ago
[Minor] Fix a dtype bug
github.com/vllm-project/vllm - WoosukKwon opened this pull request over 1 year ago
Specify Python package dependencies in requirements.txt
github.com/vllm-project/vllm - WoosukKwon opened this pull request over 1 year ago
Clean up Megatron-LM code
github.com/vllm-project/vllm - WoosukKwon opened this issue over 1 year ago
Dangerous floating point comparison
github.com/vllm-project/vllm - merrymercy opened this issue over 1 year ago
Replace FlashAttention with xformers
github.com/vllm-project/vllm - WoosukKwon opened this pull request over 1 year ago
Decrease the default size of swap space
github.com/vllm-project/vllm - WoosukKwon opened this issue over 1 year ago
Fix a bug in attention kernel
github.com/vllm-project/vllm - WoosukKwon opened this pull request over 1 year ago
Use O3 optimization instead of O2 for CUDA compilation?
github.com/vllm-project/vllm - WoosukKwon opened this issue over 1 year ago