Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
github.com/mozilla/translations
The code, training pipeline, and models that power Firefox Translations
https://github.com/mozilla/translations
Skip logging missing alignments in training
eu9ene opened this issue 8 months ago
Delete old cleaning scripts
gregtatum opened this issue 8 months ago
Use a virtual environment per requirements.txt file in run_task
gregtatum opened this pull request 8 months ago
Update tracking unit tests to support new CLI parameter wandb_publication
La0 opened this pull request 8 months ago
Set wandb-publication argument in publication tests args namespace
vrigal opened this pull request 8 months ago
Parse evaluation data from .metrics artifacts in taskcluster
vrigal opened this pull request 8 months ago
test_tracking_cli.py is currently failing on main
bhearsum opened this issue 8 months ago
WIP: add worker configuration for snakepit machines
bhearsum opened this pull request 8 months ago
OOM looks like a preemption
eu9ene opened this issue 8 months ago
Re-enable generic-worker for CPU tasks
bhearsum opened this pull request 8 months ago
Replace print by sys.stdout.buffer.write
La0 opened this pull request 8 months ago
Explore quality estimation methods for data filtering
eu9ene opened this issue 8 months ago
Publish evaluation metrics
La0 opened this pull request 8 months ago
Add a docker marker
gregtatum opened this pull request 8 months ago
Fix the opus test
gregtatum opened this pull request 8 months ago
Add missing wandb_publication parameter on finetune-student task.
La0 opened this pull request 8 months ago
[tracking ERROR] Publication failed: Invalid config section: while scanning a simple key
bhearsum opened this issue 8 months ago
CI pollutes W&B
eu9ene opened this issue 8 months ago
training runs don't log properly to stdout/stderr anymore
bhearsum opened this issue 8 months ago
Do not show empty opus datasets, and fix the URLs
gregtatum opened this pull request 8 months ago
[Experiment] Retrain en-ru with latest cleaning Apr 2024
eu9ene opened this pull request 8 months ago
CI often fails with "Could not resolve host: github.com"
eu9ene opened this issue 8 months ago
Add a preflight check for URL mounts
gregtatum opened this pull request 8 months ago
Custom cleaning
eu9ene opened this pull request 8 months ago
Wrap train-taskcluster.sh in train_taskcluster.py
bhearsum opened this pull request 8 months ago
Always use vocab.spm from artifacts directory in training steps
bhearsum opened this pull request 8 months ago
Improve caching of teacher ensembles
eu9ene opened this issue 8 months ago
Add file overrides to the training continuation, and refactor the implementation
gregtatum opened this pull request 8 months ago
Support training continuation (pretrained models) for teacher ensembles
gregtatum opened this issue 8 months ago
Always use vocab.spm from artifacts directory in training steps
bhearsum opened this pull request 8 months ago
Update the training continuation docs
gregtatum opened this pull request 8 months ago
Fix the preflight check to use the proper config
gregtatum opened this pull request 8 months ago
docker tasks on generic worker sometimes hit issues with caches
bhearsum opened this issue 9 months ago
Integrate HPLT Datasets v1.2 as a monolingual dataset
gregtatum opened this issue 9 months ago
Revert change to generic-worker for CPU tasks
bhearsum opened this pull request 9 months ago
Revert unnecessary change to docker image
bhearsum opened this pull request 9 months ago
Support extra metrics from Tensorboard
eu9ene opened this issue 9 months ago
Switch CPU tasks to generic-worker/d2g images (fixes #473)
bhearsum opened this pull request 9 months ago
allow runtime selection of worker classes through training config (fixes #300)
bhearsum opened this pull request 9 months ago
Investigate using the c4 dataset as a monolingual data source.
gregtatum opened this issue 9 months ago
Make downloads more robust
gregtatum opened this pull request 9 months ago
Display corpus size in W&B
eu9ene opened this issue 9 months ago
Upgrade to BicleanerAI 3.0
eu9ene opened this issue 9 months ago
use shorter names for tasks that have custom datasets
bhearsum opened this issue 9 months ago
[meta] Train RTL languages like Arabic and Hebrew
gregtatum opened this issue 9 months ago
[meta] Train easy to segment LTR languages
gregtatum opened this issue 9 months ago
Add more sources support for utils/run_model.py
gregtatum opened this issue 9 months ago
Publish task logs from Taskcluster
eu9ene opened this issue 9 months ago
Publish experiment config from Taskcluster
eu9ene opened this issue 9 months ago
Publish Marian config from Taskcluster
eu9ene opened this issue 9 months ago
Publish evals from Taskcluster
eu9ene opened this issue 9 months ago
Issues with uploaded experiments
eu9ene opened this issue 9 months ago
[Experiment] Data cleaning Apr 2024
eu9ene opened this pull request 9 months ago
Support training continuation for student models
gregtatum opened this issue 9 months ago
The old cleaning script breaks on small datasets
gregtatum opened this issue 9 months ago
Improve implementation of alignments
eu9ene opened this issue 9 months ago
Add a `binaries` marker to pytests
gregtatum opened this issue 9 months ago
Mono data downloading got stuck
eu9ene opened this issue 9 months ago
Add COMET to the evaluation steps
gregtatum opened this issue 10 months ago
Taskcluster publication
La0 opened this pull request 10 months ago
[meta] issues before primary maintenance of taskgraph code in this repository is handed off to translations engineers
bhearsum opened this issue 10 months ago
[Experiment] Train en cs - Mar 2024
gregtatum opened this pull request 10 months ago
Investigate monolingual cleaning
eu9ene opened this issue 10 months ago
enable memory monitoring on CPU workers
bhearsum opened this issue 10 months ago
Teacher model does not continue training on original corpus
eu9ene opened this issue 10 months ago
automatically upload important artifacts to a GCP bucket
bhearsum opened this issue 11 months ago
Investigate optimizing the CI training run
gregtatum opened this issue 11 months ago
evaluate-quantized step fails in CI
eu9ene opened this issue 11 months ago
[meta] Cost efficiency
eu9ene opened this issue 11 months ago
[meta] issues blocking us from using spot instance for training tasks
bhearsum opened this issue 11 months ago
[Experiment] Inline noise Feb 2024
eu9ene opened this pull request 11 months ago
Add kind that demonstrates how to modify the upstream graph in a transform
bhearsum opened this pull request 11 months ago
Fully switch to zstd compression
eu9ene opened this issue 11 months ago
This is a dummy change to poke at broken actions on PR decisions.
gabrielBusta opened this pull request 11 months ago
[meta] Train harder to segment languages, like CJK languages
gregtatum opened this issue 11 months ago
Monolingual data has a word splitter that won't work for CJK
gregtatum opened this issue 11 months ago
Support training without a monolingual corpus
marco-c opened this issue 11 months ago
Incompatible python version
AmitMY opened this issue 11 months ago
Tune workspace dashboard to enable comparison across models and experiments
eu9ene opened this issue 11 months ago
GPU workers still not always handling preemptions properly
bhearsum opened this issue 12 months ago
Add Hugging Face data importer
eu9ene opened this issue 12 months ago
Consider switching to Docker for all tasks
eu9ene opened this issue 12 months ago
Ensure monolingual corpus is de-duplicated from the parallel corpus
gregtatum opened this issue 12 months ago
Add community contribution guidelines
eu9ene opened this issue 12 months ago
[Experiment] Train en-ca - Feb 2024
gregtatum opened this pull request 12 months ago
Bump bicleaner-ai dependency
marco-c opened this issue 12 months ago
Test OpusFilter with OpusCleaner
eu9ene opened this issue 12 months ago
Investigate automatic generation of cleaning rules using OpusFilter
marco-c opened this issue 12 months ago
[meta] Ship 30 languages
gregtatum opened this issue 12 months ago
Interactive task's shell disconnects periodically
eu9ene opened this issue 12 months ago
Use real time publication for parser generic publishers
vrigal opened this issue 12 months ago
Publish experiment metrics separately from Marian training logs
vrigal opened this issue 12 months ago
Show parser publication failures
vrigal opened this issue 12 months ago
Refactor `translations_parser.cli.experiments.main`
vrigal opened this issue 12 months ago
Publish training charts from Taskcluster
eu9ene opened this issue about 1 year ago
Investigate if we can use datasets from instruction tuning
marco-c opened this issue about 1 year ago
Use SEACrowd datasets
marco-c opened this issue about 1 year ago
Use CommonVoice datasets as monolingual datasets
marco-c opened this issue about 1 year ago
Publish original Marian configs
La0 opened this issue about 1 year ago
[meta] Make the pipeline reliable enough to train many languages
gregtatum opened this issue about 1 year ago