Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

github.com/mozilla/translations

The code, training pipeline, and models that power Firefox Translations
https://github.com/mozilla/translations

Add pyright coverage for tests/**/*

gregtatum opened this pull request 3 days ago
Combine poetry's utils and test groups

gregtatum opened this issue 16 days ago
Train a real smaller teacher to be used in CTranslate2

gregtatum opened this issue 16 days ago
Train a small teacher

gregtatum opened this pull request 16 days ago
Remove expired-after

eu9ene opened this pull request 17 days ago
Update dependencies to upstream OpusTrainer when released

eu9ene opened this issue 17 days ago
feat: add cron task that runs the minimal training pipeline nightly

bhearsum opened this pull request 18 days ago
Error on resuming training after preemption

eu9ene opened this issue 18 days ago
Do not output empty alignments

eu9ene opened this pull request 18 days ago
Rename "all" to "all-pipeline"

gregtatum opened this pull request 19 days ago
Errors in Marian logs parser

eu9ene opened this issue 19 days ago
Rollback target renaming

eu9ene opened this pull request 19 days ago
[Experiment] CJK with ICU segmenter

eu9ene opened this pull request 19 days ago
Fix Space Omission for Korean Translations

nordzilla opened this pull request 22 days ago
Investigate marian-decoder memory usage

gregtatum opened this issue 23 days ago
Fix space omission when translating into CJK languages

nordzilla opened this pull request 24 days ago
Add a util for local remote settings

gregtatum opened this pull request 25 days ago
fix: use base repo name for project in pull requests

bhearsum opened this pull request 30 days ago
fix: set permission for train action

bhearsum opened this pull request about 1 month ago
Add pyright to check python types

gregtatum opened this pull request about 1 month ago
Add a warning when train- is the target stage rather than evaluate-

gregtatum opened this pull request about 1 month ago
feat: upgrade cpu workers to ubuntu 24.04 generic-worker image

bhearsum opened this pull request about 1 month ago
scrape and upload 2024 training artifacts

bhearsum opened this issue about 1 month ago
Add Mozilla's clang-format rules to CI

nordzilla opened this issue about 1 month ago
Use Snapshot Testing for WASM Inference Tests

nordzilla opened this issue about 1 month ago
Use `Intl.Segmenter` instead of `ssplit` for segmentation in WASM builds

nordzilla opened this pull request about 1 month ago
Japanese is missing in OpusCleaner

eu9ene opened this issue about 1 month ago
MTData fails to unpack some datasets

eu9ene opened this issue about 1 month ago
Autogenerated config doesn't work

eu9ene opened this issue about 1 month ago
Corpora exclusion rules

ZJaume opened this issue about 1 month ago
Switch to ICU tokenizer

eu9ene opened this pull request about 1 month ago
CJK corpora fixes

ZJaume opened this pull request about 1 month ago
[skip ci] Fix typo in the rebuild docker-images/toolchains docs

gabrielBusta opened this pull request about 2 months ago
Adjust default values for batching

gregtatum opened this pull request about 2 months ago
Linters need to ignore node_modules

gregtatum opened this issue about 2 months ago
Experiment with distillation data inference

gregtatum opened this issue about 2 months ago
Add a tsconfig.json file for JS code within this repository

nordzilla opened this issue about 2 months ago
Use PyMarian for COMET evaluations

marco-c opened this issue about 2 months ago
Single-side deduplication

ZJaume opened this issue about 2 months ago
Test WASM Translations in CI

nordzilla opened this pull request about 2 months ago
Ctranslate2 ci 2

gregtatum opened this pull request about 2 months ago
CI Run check

gregtatum opened this pull request about 2 months ago
Create an `analyze-datasets` step in the pipeline

gregtatum opened this issue about 2 months ago
Investigate merging document sentences in HPLT

eu9ene opened this issue about 2 months ago
Rewrite the train scripts and add config support for ctranslate2

gregtatum opened this pull request about 2 months ago
Make `npm` available to `local` and `inference` docker images

nordzilla opened this pull request 2 months ago
Setup WASM test infrastructure for CI

nordzilla opened this pull request 2 months ago
Add --run-as-user flag to docker-run.py

nordzilla opened this pull request 2 months ago
Add emsdk as a git submodule

nordzilla opened this pull request 2 months ago
Add better support for reporting training continuation values

gregtatum opened this pull request 2 months ago
Rename docker tags following repository rename

nordzilla opened this pull request 2 months ago
Rename repo

gregtatum opened this pull request 2 months ago
Allow for split vocabs

gregtatum opened this issue 2 months ago
[meta] Kick off a 2024-H2 training run

gregtatum opened this issue 2 months ago
Do not use WMTNews as training!

ZJaume opened this issue 2 months ago
More corpora specific fixes

ZJaume opened this issue 2 months ago
Fix shortlist pruning for CJK

eu9ene opened this pull request 2 months ago
Switch bestbleu to chrF

eu9ene opened this pull request 2 months ago
Use GCP standard instances for alignment tasks

eu9ene opened this pull request 2 months ago
Configure vocab for CJK

eu9ene opened this pull request 2 months ago
Limit the amount of data used for distillation

gregtatum opened this issue 2 months ago
Update training to support CJK

eu9ene opened this pull request 2 months ago
Rework wasm build scripts for gecko

nordzilla opened this pull request 2 months ago
Remove max_words filtering from data importers

eu9ene opened this pull request 2 months ago
Adjust data cleaning for CJK

eu9ene opened this pull request 2 months ago
Investigate word-based filtering for CJK

eu9ene opened this issue 2 months ago
Update data importer to support CJK

eu9ene opened this pull request 2 months ago
Add support for Chinese Traditional

eu9ene opened this issue 2 months ago
Fix taskcluster train scripts

eu9ene opened this pull request 2 months ago
Experiment with student model parameters

gregtatum opened this issue 2 months ago
Student training continuation is regressed

gregtatum opened this issue 2 months ago
Disable bicleaner hard rules completely

eu9ene opened this pull request 3 months ago
[meta] Retrain older models

eu9ene opened this issue 3 months ago
Fine-tune students with 8-bit

ZJaume opened this issue 3 months ago
Consider adding NTREX-128 for evaluation

ZJaume opened this issue 3 months ago
Disable use of `bicleaner-hardrules`

ZJaume opened this issue 3 months ago
Vocabulary construction

ZJaume opened this issue 3 months ago
Reduce monolingual data experiment

gregtatum opened this pull request 3 months ago
Use HPLT 2.0

eu9ene opened this issue 3 months ago
Kick off training from the command line

gregtatum opened this pull request 3 months ago
Use our localization data for training

marco-c opened this issue 3 months ago
Add student base configuration option

eu9ene opened this pull request 3 months ago
Consider harvesting short sentences from parallel data

gregtatum opened this issue 3 months ago
Don't translate idioms literally

zcorpan opened this issue 3 months ago
Models are missing in group logs

eu9ene opened this issue 3 months ago
Migrate Taskcluster UI tools to this repo

eu9ene opened this issue 3 months ago
Add Inference Tasks to CI

nordzilla opened this pull request 3 months ago