Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

github.com/mozilla/translations

The code, training pipeline, and models that power Firefox Translations
https://github.com/mozilla/translations

Skip logging missing alignments in training
eu9ene opened this issue 8 months ago

Delete old cleaning scripts
gregtatum opened this issue 8 months ago

Use a virtual environment per requirements.txt file in run_task
gregtatum opened this pull request 8 months ago

Set wandb-publication argument in publication tests args namespace
vrigal opened this pull request 8 months ago

Parse evaluation data from .metrics artifacts in taskcluster
vrigal opened this pull request 8 months ago

test_tracking_cli.py is currently failing on main
bhearsum opened this issue 8 months ago

WIP: add worker configuration for snakepit machines
bhearsum opened this pull request 8 months ago

OOM looks like a preemption
eu9ene opened this issue 8 months ago

Re-enable generic-worker for CPU tasks
bhearsum opened this pull request 8 months ago

Replace print by sys.stdout.buffer.write
La0 opened this pull request 8 months ago

Explore quality estimation methods for data filtering
eu9ene opened this issue 8 months ago

Publish evaluation metrics
La0 opened this pull request 8 months ago

Add a docker marker
gregtatum opened this pull request 8 months ago

Fix the opus test
gregtatum opened this pull request 8 months ago

Add missing wandb_publication parameter on finetune-student task.
La0 opened this pull request 8 months ago

CI pollutes W&B
eu9ene opened this issue 8 months ago

training runs don't log properly to stdout/stderr anymore
bhearsum opened this issue 8 months ago

Do not show empty opus datasets, and fix the URLs
gregtatum opened this pull request 8 months ago

[Experiment] Retrain en-ru with latest cleaning Apr 2024
eu9ene opened this pull request 8 months ago

CI often fails with "Could not resolve host: github.com"
eu9ene opened this issue 8 months ago

Add a preflight check for URL mounts
gregtatum opened this pull request 8 months ago

Custom cleaning
eu9ene opened this pull request 8 months ago

Wrap train-taskcluster.sh in train_taskcluster.py
bhearsum opened this pull request 8 months ago

Always use vocab.spm from artifacts directory in training steps
bhearsum opened this pull request 8 months ago

Improve caching of teacher ensembles
eu9ene opened this issue 8 months ago

Always use vocab.spm from artifacts directory in training steps
bhearsum opened this pull request 8 months ago

Update the training continuation docs
gregtatum opened this pull request 8 months ago

Fix the preflight check to use the proper config
gregtatum opened this pull request 8 months ago

docker tasks on generic worker sometimes hit issues with caches
bhearsum opened this issue 9 months ago

Integrate HPLT Datasets v1.2 as a monolingual dataset
gregtatum opened this issue 9 months ago

Revert change to generic-worker for CPU tasks
bhearsum opened this pull request 9 months ago

Revert unnecessary change to docker image
bhearsum opened this pull request 9 months ago

Support extra metrics from Tensorboard
eu9ene opened this issue 9 months ago

Switch CPU tasks to generic-worker/d2g images (fixes #473)
bhearsum opened this pull request 9 months ago

Investigate using the c4 dataset as a monolingual data source.
gregtatum opened this issue 9 months ago

Make downloads more robust
gregtatum opened this pull request 9 months ago

Display corpus size in W&B
eu9ene opened this issue 9 months ago

Upgrade to BicleanerAI 3.0
eu9ene opened this issue 9 months ago

use shorter names for tasks that have custom datasets
bhearsum opened this issue 9 months ago

[meta] Train RTL languages like Arabic and Hebrew
gregtatum opened this issue 9 months ago

[meta] Train easy to segment LTR languages
gregtatum opened this issue 9 months ago

Add more sources support for utils/run_model.py
gregtatum opened this issue 9 months ago

Publish task logs from Taskcluster
eu9ene opened this issue 9 months ago

Publish experiment config from Taskcluster
eu9ene opened this issue 9 months ago

Publish Marian config from Taskcluster
eu9ene opened this issue 9 months ago

Publish evals from Taskcluster
eu9ene opened this issue 9 months ago

Issues with uploaded experiments
eu9ene opened this issue 9 months ago

[Experiment] Data cleaning Apr 2024
eu9ene opened this pull request 9 months ago

Support training continuation for student models
gregtatum opened this issue 9 months ago

The old cleaning script breaks on small datasets
gregtatum opened this issue 9 months ago

Improve implementation of alignments
eu9ene opened this issue 9 months ago

Add a `binaries` marker to pytests
gregtatum opened this issue 9 months ago

Mono data downloading got stuck
eu9ene opened this issue 9 months ago

Add COMET to the evaluation steps
gregtatum opened this issue 10 months ago

Taskcluster publication
La0 opened this pull request 10 months ago

[Experiment] Train en cs - Mar 2024
gregtatum opened this pull request 10 months ago

Investigate monolingual cleaning
eu9ene opened this issue 10 months ago

enable memory monitoring on CPU workers
bhearsum opened this issue 10 months ago

Teacher model does not continue training on original corpus
eu9ene opened this issue 10 months ago

automatically upload important artifacts to a GCP bucket
bhearsum opened this issue 11 months ago

Investigate optimizing the CI training run
gregtatum opened this issue 11 months ago

evaluate-quantized step fails in CI
eu9ene opened this issue 11 months ago

[meta] Cost efficiency
eu9ene opened this issue 11 months ago

[Experiment] Inline noise Feb 2024
eu9ene opened this pull request 11 months ago

Add kind that demonstrates how to modify the upstream graph in a transform
bhearsum opened this pull request 11 months ago

Fully switch to zstd compression
eu9ene opened this issue 11 months ago

This is a dummy change to poke at broken actions on PR decisions.
gabrielBusta opened this pull request 11 months ago

[meta] Train harder to segment languages, like CJK languages
gregtatum opened this issue 11 months ago

Monolingual data has a word splitter that won't work for CJK
gregtatum opened this issue 11 months ago

Support training without a monolingual corpus
marco-c opened this issue 11 months ago

Incompatible python version
AmitMY opened this issue 11 months ago

GPU workers still not always handling preemptions properly
bhearsum opened this issue 12 months ago

Add Hugging Face data importer
eu9ene opened this issue 12 months ago

Consider switching to Docker for all tasks
eu9ene opened this issue 12 months ago

Add community contribution guidelines
eu9ene opened this issue 12 months ago

[Experiment] Train en-ca - Feb 2024
gregtatum opened this pull request 12 months ago

Bump bicleaner-ai dependency
marco-c opened this issue 12 months ago

Test OpusFilter with OpusCleaner
eu9ene opened this issue 12 months ago

[meta] Ship 30 languages
gregtatum opened this issue 12 months ago

Interactive task's shell disconnects periodically
eu9ene opened this issue 12 months ago

Use real time publication for parser generic publishers
vrigal opened this issue 12 months ago

Show parser publication failures
vrigal opened this issue 12 months ago

Refactor `translations_parser.cli.experiments.main`
vrigal opened this issue 12 months ago

Publish training charts from Taskcluster
eu9ene opened this issue about 1 year ago

Investigate if we can use datasets from instruction tuning
marco-c opened this issue about 1 year ago

Use SEACrowd datasets
marco-c opened this issue about 1 year ago

Use CommonVoice datasets as monolingual datasets
marco-c opened this issue about 1 year ago

Publish original Marian configs
La0 opened this issue about 1 year ago

[meta] Make the pipeline reliable enough to train many languages
gregtatum opened this issue about 1 year ago