Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
github.com/lm-sys/FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Does it support batch inference?
kangsan0420 opened this issue over 1 year ago
Add Code Llama Support and Fix empty system prompt for llama 2
woshiyyya opened this pull request over 1 year ago
Reduce gradio overhead
merrymercy opened this pull request over 1 year ago
2048 context length limit about qwen-7b-chat
Hspix opened this issue over 1 year ago
Improve gradio demo
merrymercy opened this pull request over 1 year ago
Can I finetune Llama-2-70B using 16 × A10 (16 × 23G)?
babytdream opened this issue over 1 year ago
Added support of google/flan models
wangzhen263 opened this pull request over 1 year ago
Allow register custom OpenAI compatible models
merrymercy opened this pull request over 1 year ago
Code Llama answers are all blank; is there any way to fix it?
Puzzledyy opened this issue over 1 year ago
Error when trying to finetune `lmsys/vicuna-7b-v1.5` with 6 A100 40G GPUs
kunqian-58 opened this issue over 1 year ago
How to finetune baichuan 13b?
renmengjie7 opened this issue over 1 year ago
RuntimeError when tuning 70B Llama 2: shape is invalid for input of size
PhanTask opened this issue over 1 year ago
Optimize for proper flash attn causal handling
siddartha-RE opened this pull request over 1 year ago
Vicuna v1.5 giving wrong responses in a different language when trying to do a vanilla inference
Akshay1-6180 opened this issue over 1 year ago
How to process requests with the FastChat API in parallel or in batch style?
BigAndSweet opened this issue over 1 year ago
vicuna-7b-v1.5 err
gavinju opened this issue over 1 year ago
Fix the issue of API not stopping when passing in stop
Trangle opened this pull request over 1 year ago
"finish_reason": "length" --> how to increase max_new_tokens
2533245542 opened this issue over 1 year ago
Support codellama
obitolyz opened this issue over 1 year ago
Why does the lmsys/vicuna-13b-v1.3 model output contain the prompt? And how to prepare finetuning data for this behavior?
kunqian-58 opened this issue over 1 year ago
Flash Attention Monkey Patch not working with CodeLlama-34B
michaelroyzen opened this issue over 1 year ago
Update conversation.py
epec254 opened this pull request over 1 year ago
Why is lmsys/vicuna-13b-v1.5 giving Chinese answers to small questions based on a custom code template?
Akshay1-6180 opened this issue over 1 year ago
Peft loading
Heckler-Dark opened this issue over 1 year ago
Correct prompt for Vicuna v1.5 7b in the case of RAG
Matthieu-Tinycoaching opened this issue over 1 year ago
Is it safe and faster to use multiprocessing to call response = openai.ChatCompletion.create()?
BigAndSweet opened this issue over 1 year ago
Error when using gpt-3.5-turbo llm_api
zqt996 opened this issue over 1 year ago
XVERSE-13B need support!
tms2003 opened this issue over 1 year ago
Make the arena page as the default page
merrymercy opened this pull request over 1 year ago
AssertionError: Torch not compiled with CUDA enabled
Heckler-Dark opened this issue over 1 year ago
Support no user message in llama2
zeyugao opened this pull request over 1 year ago
Add new model to the arena
renatz opened this pull request over 1 year ago
Make chatglm2-6b load-8bit work on Mac M2 with MPS (fix bfloatxx error)
vaxilicaihouxian opened this issue over 1 year ago
Add new model to the arena
renatz opened this pull request over 1 year ago
webui is very slow, but api is normal
hustwyk opened this issue over 1 year ago
Make all tensors to be on the same device
fan-chao opened this pull request over 1 year ago
support custom API endpoints for gen_api_answer.py in llm-judge
imoneoi opened this pull request over 1 year ago
Bug on llama2-chinese conversation templates
fan-chao opened this pull request over 1 year ago
add-realm-to-the-arena
renatz opened this pull request over 1 year ago
Make all tensors to be on the same device
fan-chao opened this pull request over 1 year ago
leetcode dataset
kkkparty opened this issue over 1 year ago
[Bug] Expected all tensors to be on the same device, but found at least two devices, cuda:6 and cuda:0!
fan-chao opened this issue over 1 year ago
Async method calls sync code while also using a semaphore
colinguozizhong opened this issue over 1 year ago
Consider using a fixed version of GPT-4 for llm_judge
imoneoi opened this issue over 1 year ago
Why does sequentially calling requests.post() without attaching conversation history still achieve good few-shot learning?
BigAndSweet opened this issue over 1 year ago
Make embedding API compatible with OpenAI
Trangle opened this pull request over 1 year ago
Bug on llama2-chinese conversation templates
fan-chao opened this issue over 1 year ago
output text is not a complete sentence
wqn1 opened this issue over 1 year ago
Add conversation support for VMware's OpenLLaMa OpenInstruct models
nicobasile opened this pull request over 1 year ago
Update compression: support multi-device when using compression with args.num_gpus and args.max_gpu_memory
hzg0601 opened this pull request over 1 year ago
Update openai_api_server.py
ArtificialZeng opened this pull request over 1 year ago
release v0.2.25
merrymercy opened this pull request over 1 year ago
Fix typos
merrymercy opened this pull request over 1 year ago
switch to aiohttp post request mode
leiwen83 opened this pull request over 1 year ago
[Minor] Style clean up & Fix embedding
merrymercy opened this pull request over 1 year ago
The ConnectionError can't run
YuamLu opened this issue over 1 year ago
What's qvk in flash attention patch file?
DqEDC opened this issue over 1 year ago
lmsys/longchat-7b-v1.5-32k transformer version problem
JACKHAHA363 opened this issue over 1 year ago
Why does the deepspeed command given in the documentation get affected by the position of the parameters?
liyifo opened this issue over 1 year ago
chatglm2-6b-32k cannot output properly when it runs on multiple GPUs
dream20201212 opened this issue over 1 year ago
twitter --> x
ut-kr opened this pull request over 1 year ago
How can I use the functions attribute provided by OpenAI with open-source models?
necromorph98 opened this issue over 1 year ago
Vicuna-1.5 Quantized using AWQ Not Working - CUDA Illegal Memory Access
mmaaz60 opened this issue over 1 year ago
Add group kv support and fix past kv from cache
siddartha-RE opened this pull request over 1 year ago
load peft-model error (gradio_web_server)
jackaihfia2334 opened this issue over 1 year ago
--device cpu --load-8bit ends in TypeError
leolivier opened this issue over 1 year ago
feat: consider template's stop_token_ids in gen_model_answer
congchan opened this pull request over 1 year ago
Improve indentation in openai_api_server.py
ArtificialZeng opened this pull request over 1 year ago
Does the FastChat model worker support exposing metrics?
leyao-daily opened this issue over 1 year ago
Does FastChat consider AbortController?
qftie opened this issue over 1 year ago
poor performance of httpx.AsyncClient in openai_api_server.py
leiwen83 opened this issue over 1 year ago
Official evaluation scores of QWen-7B-Chat
Lukeming-tsinghua opened this issue over 1 year ago
Fix support for GPU selection using CLI argument
laidybug opened this pull request over 1 year ago
Chatbot
Taichi331213 opened this issue over 1 year ago
Measure API Load
brandonbiggs opened this issue over 1 year ago
WizardCoder hallucinations or bug in inference settings?
Extremys opened this issue over 1 year ago
QLoRA accidentally results in CUDA Out of Memory
zycheiheihei opened this issue over 1 year ago
Can we use FastChat for full-parameter fine-tuning based on DeepSpeed?
mpdpey043 opened this issue over 1 year ago
[Minor] Update the warning to follow the new conv_template file
persistz opened this pull request over 1 year ago
Add Intel AMX/AVX512 support to accelerate inference
LeiZhou-97 opened this pull request over 1 year ago
How do model_workers do load balancing?
linpan opened this issue over 1 year ago
ModuleNotFoundError: No module named 'packaging'
sxunix opened this issue over 1 year ago
Update embedding logic
Trangle opened this pull request over 1 year ago
About the function of assign special token ids to the model.config object
auroua opened this issue over 1 year ago
AsyncLLMEngine not found in vllm
HelloCard opened this issue over 1 year ago
make fastchat api server run in multiprocessing easily
liunux4odoo opened this pull request over 1 year ago
ChatGLM errors from version 0.2.18 to 0.2.23
luefei opened this issue over 1 year ago
Update llama2 and starchat templates
Cyrilvallez opened this pull request over 1 year ago
[Minor] Fix typos
merrymercy opened this pull request over 1 year ago
How to specify gpu id?
fan-chao opened this issue over 1 year ago
Add support for Vigogne models
bofenghuang opened this pull request over 1 year ago
Can it run on an ARM64 CPU?
zds-yyds opened this issue over 1 year ago
AssertionError
haof-github opened this issue over 1 year ago
Will there be 70b Vicuna-v1.5?
PhanTask opened this issue over 1 year ago
Does FastChat support models like GPT3?
shuqike opened this issue over 1 year ago
Add conversation template parameter to vllm worker
alanxmay opened this pull request over 1 year ago
Is it possible to finetune with llama2-70b?
lanfengmo opened this issue over 1 year ago
Adjusting Token Limit in Fastchat with Llama2 Model
coding-alt opened this issue over 1 year ago