Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
github.com/lm-sys/FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
https://github.com/lm-sys/FastChat
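Many of the issues and pull requests listed below concern FastChat's OpenAI-compatible API server. For orientation, here is a minimal sketch of a chat-completion request against such a server, assuming a controller, a model worker registered as vicuna-7b-v1.5, and fastchat.serve.openai_api_server are already running on localhost:8000; the model name and port are illustrative assumptions, not values taken from this page.

```python
import requests

# Assumption: a FastChat OpenAI-compatible server is listening on localhost:8000
# and a worker is serving a model registered under the name "vicuna-7b-v1.5".
API_BASE = "http://localhost:8000/v1"

response = requests.post(
    f"{API_BASE}/chat/completions",
    json={
        "model": "vicuna-7b-v1.5",
        "messages": [
            {"role": "system", "content": "You are a concise assistant."},
            {"role": "user", "content": "Summarize FastChat in one sentence."},
        ],
        "temperature": 0.7,
    },
    timeout=120,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```

Because the endpoint mirrors the OpenAI request schema, the same message format (system plus user roles) applies to the role-related questions that appear further down the list.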
question about your implementation of xformers
tjlujianqiao opened this issue over 1 year ago
Support multi-GPU when implementing compression
hzg0601 opened this pull request over 1 year ago
fix: llm_judge resume from breakpoint when judging
so2liu opened this pull request over 1 year ago
API streaming output requests.exceptions.ChunkedEncodingError: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))
qimingyangyang opened this issue over 1 year ago
Does llama2 support the embedding api?
YLongJin opened this issue over 1 year ago
[bug] Loading vicuna 1.5 with vllm worker fails
fozziethebeat opened this issue over 1 year ago
fastchat
Dg0230 opened this issue over 1 year ago
Bug on 2 conversation templates
Cyrilvallez opened this issue over 1 year ago
how to use model_worker to load base model + lora model?
LawlightXY opened this issue over 1 year ago
Add Llama2-Chinese model support
Rayrtfr opened this pull request over 1 year ago
[Bug] Bug in generating the Llama2 conversation template
amulil opened this issue over 1 year ago
[Feature] Do you have plans to support multimodal mode?
thiner opened this issue over 1 year ago
is there a possibility that fastchat can support llama2-70b quantized with gptq?
Joeking-around opened this issue over 1 year ago
This version does not support function_call yet?
dscrlc opened this issue over 1 year ago
Add required python library.
lokinko opened this pull request over 1 year ago
Improve flash attn monkey patch
merrymercy opened this pull request over 1 year ago
Release v0.2.24
merrymercy opened this pull request over 1 year ago
Improve docs
merrymercy opened this pull request over 1 year ago
Adjust model names/rates/info
merrymercy opened this pull request over 1 year ago
deepspeed_config_s2.json-ValueError
byy-git opened this issue over 1 year ago
cannot find -lcuda
njhouse365 opened this issue over 1 year ago
controller architecture question
yunll opened this issue over 1 year ago
test_tokenization_{{cookiecutter.lowercase_modelname}}.py: Filename too long
luanxw opened this issue over 1 year ago
Are system prompt/user roles supported when using the openai api?
fedshyvana opened this issue over 1 year ago
Add support for the bge model family for embedding generation
Extremys opened this pull request over 1 year ago
Explain --max-gpu-memory parameter
alongLFB opened this pull request over 1 year ago
Failed to find the code to finetune Vicuna on Arena as a judge
ownstyledu opened this issue over 1 year ago
Modify vllm compatible empty special token, and revise qwen
Trangle opened this pull request over 1 year ago
[BUG] Worker did not release the occupied memory after encountering OOM
birdhackor opened this issue over 1 year ago
Accelerated performance of vllm
yl17104265 opened this issue over 1 year ago
Add a python script to shut down serve
hzg0601 opened this pull request over 1 year ago
feat: get_context_length support rope_scaling
congchan opened this pull request over 1 year ago
How to add a presaved prompt for vicuna-7b models
ratan opened this issue over 1 year ago
feat: support BAAI/AquilaChat-7B.
gesanqiu opened this pull request over 1 year ago
revise qwen adapter
Trangle opened this pull request over 1 year ago
Maybe a bug in vicuna's tokenizer
xmy0916 opened this issue over 1 year ago
Support batching for chatglm models.
kimanli opened this issue over 1 year ago
json.decoder.JSONDecodeError: Invalid \uXXXX escape: line 1 column 1442 (char 1441)
zlht812 opened this issue over 1 year ago
How to specify which GPU to run the model in background mode on?
zlht812 opened this issue over 1 year ago
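The GPU-selection question above is commonly answered (outside this thread) by restricting which devices CUDA exposes before the worker process initializes CUDA; a minimal sketch, assuming GPU index 1 is the intended device:

```python
import os

# Assumption: the worker should use only the GPU with index 1.
# CUDA_VISIBLE_DEVICES must be set before the first CUDA initialization,
# i.e. before torch is imported in the worker process.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import torch  # imported after the environment variable is set on purpose

# The selected physical GPU is now visible to this process as cuda:0.
print(torch.cuda.device_count())  # expected output: 1
```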
baichuan-13b-chat updated the model, FastChat needs to sync
Tomorrowxxy opened this issue over 1 year ago
[OPENAI_API Error] Error when calling vicuna13b_V1.5-16K.
cason0126 opened this issue over 1 year ago
[important feature] PDF uploader support
Dandelionym opened this issue over 1 year ago
OpenAI API error. thanks~
SaraiQX opened this issue over 1 year ago
Worker does not stop even if the api request was cancelled
jasoncaojingren opened this issue over 1 year ago
vicuna-13b-v1.5-16k is repeating word as output
Extremys opened this issue over 1 year ago
adding support for Llama2 and Cohere models with litellm
krrishdholakia opened this pull request over 1 year ago
1. add shell scripts for shutting down serve; 2. add a feature to launch all serve components related to openai-api-server in one cmd;
hzg0601 opened this pull request over 1 year ago
Fixes for FlashAttention
tmm1 opened this pull request over 1 year ago
Fine-tuning Llama 2 based model
mikefrandsen opened this issue over 1 year ago
Finetuning of LLaMA does not work in any setting (mem, lora)
sergsb opened this issue over 1 year ago
gen_model_answer.py runs into a problem with a peft adapter
Clemente-H opened this issue over 1 year ago
RuntimeError: CUDA error: device-side assert triggered
lw3259111 opened this issue over 1 year ago
Cost associated with running MT bench
dhyani15 opened this issue over 1 year ago
help! how to load safetensors model?
rainbownmm opened this issue over 1 year ago
the model keeps repeating the same answer all the time
aiot-tech opened this issue over 1 year ago
Bugs with Falcon-40B-Instruct.
Kong-Aobo opened this issue over 1 year ago
Nearly impossible to use arena voting: the interface is too slow and hangs
Anixx opened this issue over 1 year ago
Llama-2 loss and learning rate are always 0 after the first step
jerryjalapeno opened this issue over 1 year ago
Could not call v1/chat/completion successfully in new langchain endpoint in openai-compatible server
zeyusuntt opened this issue over 1 year ago
gradio web server stuck in infinite model loading
o-evgeny opened this issue over 1 year ago
THIS SESSION HAS BEEN INACTIVE FOR TOO LONG. PLEASE REFRESH THIS PAGE.
Art-Man opened this issue over 1 year ago
No available workers for vicuna-7b-v1.3
andreasbinder opened this issue over 1 year ago
how to run pretrain?
lucasjinreal opened this issue over 1 year ago
Speed comparison with https://huggingface.co/chat/
surak opened this issue over 1 year ago
The loss is abnormal when fine-tuning meta-llama/Llama-2-7b-hf
Tianranse opened this issue over 1 year ago
training loss curve looks like stairs
gombumsoo opened this issue over 1 year ago
Where to add embedding function?
hypily123 opened this issue over 1 year ago
The fastchat-t5-3b-v1.0 model cannot be run locally
elmoss opened this issue over 1 year ago
Error with longchat-13b-16k
Lufffya opened this issue over 1 year ago
flash attention 2
jiangix-paper opened this issue over 1 year ago
Experiencing intermittent 400 and 500 HTTP errors when making requests to server
zxia545 opened this issue over 1 year ago
Fine Tuning trust_remote_code=True
Minniesse opened this issue over 1 year ago
Loss reaches 0 when finetuning 7B model using 2xA100 80G
rootally opened this issue over 1 year ago
Langchain documentation conflicts with gradio web server.
surak opened this issue over 1 year ago
'CUDA out of memory' when QLoRA fine-tuning vicuna-7b on 4*24G gpus.
lj976264709 opened this issue over 1 year ago
python -m fastchat.serve.gradio_web_server
suntinsion opened this issue over 1 year ago
Error when loading the Baichuan-13b-chat model
Zhang-star-master opened this issue over 1 year ago
Answering in Chinese even with an English question.
soap117 opened this issue over 1 year ago
Support for https protocol calls between controller node and worker node
Victorwz opened this issue over 1 year ago
inference truncation causes “output_ids” to be incorrectly sliced
lyy-zz opened this issue over 1 year ago
Allow Peft models to share their base model
fozziethebeat opened this pull request over 1 year ago
MT-bench results are different today
imoneoi opened this issue over 1 year ago
google search
Bluemist76 opened this issue over 1 year ago
Using past for speeding up generation
mmdalix opened this issue over 1 year ago
Why save just cpu weights?
Hins opened this issue over 1 year ago
Does Vicuna have plans to expand its Chinese vocabulary?
PolarPeak opened this issue over 1 year ago
[Feature] Safe save with FSDP, slurm examples
zhisbug opened this pull request over 1 year ago
Modified loss for Multi-turn conversations
staticpunch opened this issue over 1 year ago
[Bug] SFT of vicuna-7b-v1.3 with train_mem.py (with flash-attention) does not work
lindylin1817 opened this issue over 1 year ago
Invalid JSON Response Error When Running Langchain Use Cases with AutoGPT
Hannune-tech opened this issue over 1 year ago
Use gradio state instead of IP address to track session expiration time
lpfhs opened this pull request over 1 year ago
worker cannot start on mac
sydowma opened this issue over 1 year ago
Mac M2: Memory usage growing by 1g per 4-5 tokens generated
ericskiff opened this issue over 1 year ago
Text generation early stop problem with Vicuna 33B v1.3
iibw opened this issue over 1 year ago
`TypeError: not a string` after pressing delete on special characters
FANGOD opened this issue over 1 year ago
how to use vicuna-33b with multiple nodes? e.g. three nodes, each with a V100 GPU, thanks
zhiyongLiu1114 opened this issue over 1 year ago
support for 4bit quantization from the transformers library.
harpomaxx opened this issue over 1 year ago
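Regarding the 4-bit quantization request above: the Hugging Face transformers library supports this through BitsAndBytesConfig when bitsandbytes is installed; a minimal sketch, using lmsys/vicuna-7b-v1.5 purely as an illustrative model id and assuming a CUDA GPU plus the accelerate package for device_map="auto":

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Illustrative model id; any causal LM on the Hugging Face Hub works the same way.
model_id = "lmsys/vicuna-7b-v1.5"

# Load weights in 4-bit NF4 while computing in fp16 (requires bitsandbytes).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # requires the accelerate package
)

inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```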