Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
github.com/mozilla/translations
The code, training pipeline, and models that power Firefox Translations
https://github.com/mozilla/translations
This is a dummy change to poke at broken actions on PR decisions.
gabrielBusta opened this pull request 12 months ago
[meta] Train harder to segment languages, like CJK languages
gregtatum opened this issue 12 months ago
Monolingual data has a word splitter that won't work for CJK
gregtatum opened this issue 12 months ago
Support training without a monolingual corpus
marco-c opened this issue 12 months ago
Incompatible python version
AmitMY opened this issue 12 months ago
Tune workspace dashboard to enable comparison across models and experiments
eu9ene opened this issue 12 months ago
GPU workers still not always handling preemptions properly
bhearsum opened this issue 12 months ago
Add Hugging Face data importer
eu9ene opened this issue 12 months ago
Consider switching to Docker for all tasks
eu9ene opened this issue 12 months ago
Ensure monolingual corpus is de-duplicated from the parallel corpus
gregtatum opened this issue 12 months ago
Add community contribution guidelines
eu9ene opened this issue 12 months ago
[Experiment] Train en-ca - Feb 2024
gregtatum opened this pull request 12 months ago
Bump bicleaner-ai dependency
marco-c opened this issue almost 1 year ago
Test OpusFilter with OpusCleaner
eu9ene opened this issue about 1 year ago
Investigate automatic generation of cleaning rules using OpusFilter
marco-c opened this issue about 1 year ago
[meta] Ship 30 languages
gregtatum opened this issue about 1 year ago
Interactive task's shell disconnects periodically
eu9ene opened this issue about 1 year ago
Use real time publication for parser generic publishers
vrigal opened this issue about 1 year ago
Publish experiment metrics separately from Marian training logs
vrigal opened this issue about 1 year ago
Show parser publication failures
vrigal opened this issue about 1 year ago
Refactor `translations_parser.cli.experiments.main`
vrigal opened this issue about 1 year ago
Publish training charts from Taskcluster
eu9ene opened this issue about 1 year ago
Investigate if we can use datasets from instruction tuning
marco-c opened this issue about 1 year ago
Use SEACrowd datasets
marco-c opened this issue about 1 year ago
Use CommonVoice datasets as monolingual datasets
marco-c opened this issue about 1 year ago
Publish original Marian configs
La0 opened this issue about 1 year ago
[meta] Make the pipeline reliable enough to train many languages
gregtatum opened this issue about 1 year ago
allow for runtime selection of GPU workers
bhearsum opened this issue about 1 year ago
allow `dataset-thresholds` to be empty
bhearsum opened this issue about 1 year ago
Current DAG in the documentation is missing `translate` steps
bhearsum opened this issue about 1 year ago
Support monolingual data from OPUS
bhearsum opened this issue about 1 year ago
`merge-corpus` and `merge-mono` do not work no cleaning is done
gregtatum opened this issue about 1 year ago
quantize often dumps core since browsermt-marian was updated
bhearsum opened this issue about 1 year ago
In the cleaning task, output statistics about filtered out/kept sentences and maybe attach list of filtered out sentences as an artifact
marco-c opened this issue about 1 year ago
Logs disappear after a task fails
eu9ene opened this issue about 1 year ago
Support training continuation for the failed or preempted tasks
eu9ene opened this issue about 1 year ago
`make tensorboard` fails
AmitMY opened this issue about 1 year ago
Help get more datasets in OPUS
marco-c opened this issue about 1 year ago
"Cancel all" action doesn't work
eu9ene opened this issue about 1 year ago
Consider using monocleaner for cleaning monolingual corpuses
marco-c opened this issue about 1 year ago
[meta] Improve translation robustness
eu9ene opened this issue about 1 year ago
Consider integrating more data sources
eu9ene opened this issue about 1 year ago
Investigate distillation quality gap
eu9ene opened this issue about 1 year ago
Dig deeper in sentences we translate badly to try and identify common failure themes
marco-c opened this issue about 1 year ago
[meta] General translation quality improvements
eu9ene opened this issue over 1 year ago
Improve translation of short sentences
eu9ene opened this issue over 1 year ago
Add full support of the custom filters for OpusCleaner
eu9ene opened this issue over 1 year ago
Task artifact expiration and long-term data storage
gabrielBusta opened this issue over 1 year ago
Evaluate translation capabilities of LLMs
eu9ene opened this issue over 1 year ago
Experiment with using more monolingual data
eu9ene opened this issue over 1 year ago
Support training a student model from an already existing teacher model
marco-c opened this issue over 1 year ago
Re-train existing models with larger datasets
marco-c opened this issue over 1 year ago
Ensure presence or absence of periods doesn't trigger wrong translations
marco-c opened this issue over 1 year ago
Ensure random whitespaces between characters don't trigger wrong translations
marco-c opened this issue over 1 year ago
Add comparisons with teacher models
marco-c opened this issue over 1 year ago
Investigate using larger student models
marco-c opened this issue over 1 year ago
Improve translation of sentences containing numbers
marco-c opened this issue over 1 year ago
Investigate performance, memory usage and translation quality without using a lexical shortlist
marco-c opened this issue over 1 year ago
Some words should be passing through untranslated (e.g. IPA characters, emojis, etc.)
marco-c opened this issue over 1 year ago
Detect and prevent toxicity in models' output
marco-c opened this issue over 1 year ago
Investigate CTranslate2 for translating sentences with teacher model
marco-c opened this issue over 1 year ago
[meta] Track experiments with the tracking platform W&B
marco-c opened this issue over 1 year ago
add testing & linting for taskcluster directory
bhearsum opened this issue over 1 year ago
Issues with casing
marco-c opened this issue over 1 year ago
`download_mono` fails to load custom datasets with `/` in name
AmitMY opened this issue over 2 years ago
clean-corpus: remove `--no-notice` from parallel
AmitMY opened this pull request over 2 years ago
clean_corpus has no clear error message
AmitMY opened this issue over 2 years ago
Support training separate source/target SentencePiece Models
radinplaid opened this issue over 2 years ago
Move configuration to profiles
eu9ene opened this pull request over 2 years ago
Simplify configuration
eu9ene opened this issue over 2 years ago
Create Dockerfile - use firefox-translations-training in docker
AmitMY opened this pull request over 2 years ago
Strip HTML from ELRC Luxembourg data
jelmervdl opened this pull request over 2 years ago
Translation settings for backtranslation are suboptimal
XapaJIaMnu opened this issue over 2 years ago
Jobarray support
eu9ene opened this pull request over 2 years ago
mono_trg using wrong vocab file
kpu opened this issue over 2 years ago
max-length and max-length-crop considered harmful
kpu opened this issue almost 3 years ago
`make test` command fails, despite `dry-run` succeeding
AmitMY opened this issue almost 3 years ago
CI: add automatic test workflow on every push
AmitMY opened this pull request almost 3 years ago
Update README.md - fix test config path
AmitMY opened this pull request almost 3 years ago
Running snakemake does not recognize mamba
AmitMY opened this issue almost 3 years ago
Fine-tune teachers to parallel corpora
lisskor opened this pull request almost 3 years ago
Preemptable marian training on slurm.
ugermann opened this pull request almost 3 years ago
Minor fixes
lisskor opened this pull request almost 3 years ago
Fix unbound CUDA_VISIBLE_DEVICES in bicleaner.sh
lisskor opened this pull request almost 3 years ago
Checkpointing training
eu9ene opened this pull request almost 3 years ago
Fix unbound LD_LIBRARY_PATH
lisskor opened this pull request almost 3 years ago
Do not continue training if evaluation quality is too low
eu9ene opened this issue almost 3 years ago
Bicleaner won't work on HPC because of time limits
eu9ene opened this issue almost 3 years ago
Add BCP 47 code support in mtdata importer.
khoisan25 opened this issue almost 3 years ago
Teacher does not continue training after pretraining on augmented corpus
eu9ene opened this issue almost 3 years ago
tedx download via sacrebleu fails due to directionality of the data set
kpu opened this issue almost 3 years ago
Poor performance when translating ALL CAPS sentences
XapaJIaMnu opened this issue almost 3 years ago
Add support of Mozilla slurm cluster
eu9ene opened this pull request almost 3 years ago
Quality of teacher model degraded for Russian
eu9ene opened this issue almost 3 years ago
Integrate deduplication in the pipeline
XapaJIaMnu opened this pull request almost 3 years ago
Remove soft hyphen character
kpu opened this issue almost 3 years ago
hotfix: the biclean() function doesn't have access to global variables
XapaJIaMnu opened this pull request almost 3 years ago
bicleaner-ai parallelism fixes
XapaJIaMnu opened this pull request almost 3 years ago
Dataset deduplication issues.
XapaJIaMnu opened this issue almost 3 years ago
Add support for Mozilla slurm cluster
eu9ene opened this issue almost 3 years ago