Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

github.com/mozilla/translations

The code, training pipeline, and models that power Firefox Translations
https://github.com/mozilla/translations

This is a dummy change to poke at broken actions on PR decisions.

gabrielBusta opened this pull request 12 months ago
[meta] Train harder to segment languages, like CJK languages

gregtatum opened this issue 12 months ago
Monolingual data has a word splitter that won't work for CJK

gregtatum opened this issue 12 months ago
Support training without a monolingual corpus

marco-c opened this issue 12 months ago
Incompatible python version

AmitMY opened this issue 12 months ago
GPU workers still not always handling preemptions properly

bhearsum opened this issue 12 months ago
Add Hugging Face data importer

eu9ene opened this issue 12 months ago
Consider switching to Docker for all tasks

eu9ene opened this issue 12 months ago
Add community contribution guidelines

eu9ene opened this issue 12 months ago
[Experiment] Train en-ca - Feb 2024

gregtatum opened this pull request 12 months ago
Bump bicleaner-ai dependency

marco-c opened this issue almost 1 year ago
Test OpusFilter with OpusCleaner

eu9ene opened this issue about 1 year ago
Investigate automatic generation of cleaning rules using OpusFilter

marco-c opened this issue about 1 year ago
[meta] Ship 30 languages

gregtatum opened this issue about 1 year ago
Interactive task's shell disconnects periodically

eu9ene opened this issue about 1 year ago
Use real time publication for parser generic publishers

vrigal opened this issue about 1 year ago
Publish experiment metrics separately from Marian training logs

vrigal opened this issue about 1 year ago
Show parser publication failures

vrigal opened this issue about 1 year ago
Refactor `translations_parser.cli.experiments.main`

vrigal opened this issue about 1 year ago
Publish training charts from Taskcluster

eu9ene opened this issue about 1 year ago
Investigate if we can use datasets from instruction tuning

marco-c opened this issue about 1 year ago
Use SEACrowd datasets

marco-c opened this issue about 1 year ago
Use CommonVoice datasets as monolingual datasets

marco-c opened this issue about 1 year ago
Publish original Marian configs

La0 opened this issue about 1 year ago
[meta] Make the pipeline reliable enough to train many languages

gregtatum opened this issue about 1 year ago
allow for runtime selection of GPU workers

bhearsum opened this issue about 1 year ago
allow `dataset-thresholds` to be empty

bhearsum opened this issue about 1 year ago
Current DAG in the documentation is missing `translate` steps

bhearsum opened this issue about 1 year ago
Support monolingual data from OPUS

bhearsum opened this issue about 1 year ago
`merge-corpus` and `merge-mono` do not work when no cleaning is done

gregtatum opened this issue about 1 year ago
quantize often dumps core since browsermt-marian was updated

bhearsum opened this issue about 1 year ago
Logs disappear after a task fails

eu9ene opened this issue about 1 year ago
Support training continuation for the failed or preempted tasks

eu9ene opened this issue about 1 year ago
`make tensorboard` fails

AmitMY opened this issue about 1 year ago
Help get more datasets in OPUS

marco-c opened this issue about 1 year ago
"Cancel all" action doesn't work

eu9ene opened this issue about 1 year ago
Consider using monocleaner for cleaning monolingual corpuses

marco-c opened this issue about 1 year ago
[meta] Improve translation robustness

eu9ene opened this issue about 1 year ago
Consider integrating more data sources

eu9ene opened this issue about 1 year ago
Investigate distillation quality gap

eu9ene opened this issue about 1 year ago
[meta] General translation quality improvements

eu9ene opened this issue over 1 year ago
Improve translation of short sentences

eu9ene opened this issue over 1 year ago
Add full support of the custom filters for OpusCleaner

eu9ene opened this issue over 1 year ago
Task artifact expiration and long-term data storage

gabrielBusta opened this issue over 1 year ago
Evaluate translation capabilities of LLMs

eu9ene opened this issue over 1 year ago
Experiment with using more monolingual data

eu9ene opened this issue over 1 year ago
Re-train existing models with larger datasets

marco-c opened this issue over 1 year ago
Add comparisons with teacher models

marco-c opened this issue over 1 year ago
Investigate using larger student models

marco-c opened this issue over 1 year ago
Improve translation of sentences containing numbers

marco-c opened this issue over 1 year ago
Detect and prevent toxicity in models' output

marco-c opened this issue over 1 year ago
[meta] Track experiments with the tracking platform W&B

marco-c opened this issue over 1 year ago
add testing & linting for taskcluster directory

bhearsum opened this issue over 1 year ago
Issues with casing

marco-c opened this issue over 1 year ago
`download_mono` fails to load custom datasets with `/` in name

AmitMY opened this issue over 2 years ago
clean-corpus: remove `--no-notice` from parallel

AmitMY opened this pull request over 2 years ago
clean_corpus has no clear error message

AmitMY opened this issue over 2 years ago
Support training separate source/target SentencePiece Models

radinplaid opened this issue over 2 years ago
Move configuration to profiles

eu9ene opened this pull request over 2 years ago
Simplify configuration

eu9ene opened this issue over 2 years ago
Create Dockerfile - use firefox-translations-training in docker

AmitMY opened this pull request over 2 years ago
Strip HTML from ELRC luxembourg data

jelmervdl opened this pull request over 2 years ago
Translation settings for backtranslation are suboptimal

XapaJIaMnu opened this issue over 2 years ago
Jobarray support

eu9ene opened this pull request over 2 years ago
mono_trg using wrong vocab file

kpu opened this issue over 2 years ago
max-length and max-length-crop considered harmful

kpu opened this issue almost 3 years ago
`make test` command fails, despite `dry-run` succeeding

AmitMY opened this issue almost 3 years ago
CI: add automatic test workflow on every push

AmitMY opened this pull request almost 3 years ago
Update README.md - fix test config path

AmitMY opened this pull request almost 3 years ago
Running snakemake does not recognize mamba

AmitMY opened this issue almost 3 years ago
Fine-tune teachers to parallel corpora

lisskor opened this pull request almost 3 years ago
Preemptable marian training on slurm.

ugermann opened this pull request almost 3 years ago
Minor fixes

lisskor opened this pull request almost 3 years ago
Fix unbound CUDA_VISIBLE_DEVICES in bicleaner.sh

lisskor opened this pull request almost 3 years ago
Checkpointing training

eu9ene opened this pull request almost 3 years ago
Fix unbound LD_LIBRARY_PATH

lisskor opened this pull request almost 3 years ago
Do not continue training if evaluation quality is too low

eu9ene opened this issue almost 3 years ago
Bicleaner won't work on HPC because of time limits

eu9ene opened this issue almost 3 years ago
Add BCP 47 code support in mtdata importer.

khoisan25 opened this issue almost 3 years ago
Poor performance when translating ALL CAPS sentences

XapaJIaMnu opened this issue almost 3 years ago
Add support of Mozilla slurm cluster

eu9ene opened this pull request almost 3 years ago
Quality of teacher model degraded for Russian

eu9ene opened this issue almost 3 years ago
Integrate deduplication in the pipeline

XapaJIaMnu opened this pull request almost 3 years ago
Remove soft hyphen character

kpu opened this issue almost 3 years ago
hotfix: the biclean() function doesn't have access to global variables

XapaJIaMnu opened this pull request almost 3 years ago
bicleaner-ai parallelism fixes

XapaJIaMnu opened this pull request almost 3 years ago
Dataset deduplication issues.

XapaJIaMnu opened this issue almost 3 years ago
Add support for Mozilla slurm cluster

eu9ene opened this issue almost 3 years ago