Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
github.com/mozilla/translations
The code, training pipeline, and models that power Firefox Translations
https://github.com/mozilla/translations
Add pyright coverage for tests/**/*
gregtatum opened this pull request 3 days ago
Combine poetry's utils and test groups
gregtatum opened this issue 16 days ago
Train a real smaller teacher to be used in CTranslate2
gregtatum opened this issue 16 days ago
Train a small teacher
gregtatum opened this pull request 16 days ago
Remove expired-after
eu9ene opened this pull request 17 days ago
Update dependencies to upstream OpusTrainer when released
eu9ene opened this issue 17 days ago
feat: add cron task that runs the minimal training pipeline nightly
bhearsum opened this pull request 18 days ago
Investigate translating mixed Simplified/Traditional script
eu9ene opened this issue 18 days ago
Error on resuming training after preemption
eu9ene opened this issue 18 days ago
Do not output empty alignments
eu9ene opened this pull request 18 days ago
Rename "all" to "all-pipeline"
gregtatum opened this pull request 19 days ago
Errors in Marian logs parser
eu9ene opened this issue 19 days ago
Rollback target renaming
eu9ene opened this pull request 19 days ago
[Experiment] CJK with ICU segmenter
eu9ene opened this pull request 19 days ago
Fix Space Omission for Korean Translations
nordzilla opened this pull request 22 days ago
Investigate marian-decoder memory usage
gregtatum opened this issue 23 days ago
Fix space omission when translating into CJK languages
nordzilla opened this pull request 24 days ago
Add a util for local remote settings
gregtatum opened this pull request 25 days ago
fix: use base repo name for project in pull requests
bhearsum opened this pull request 30 days ago
fix: set permission for train action
bhearsum opened this pull request about 1 month ago
Add pyright to check python types
gregtatum opened this pull request about 1 month ago
Add a warning when train- is the target stage rather than evaluate-
gregtatum opened this pull request about 1 month ago
Investigate better ways to determine which language scripts require or omit sentence-separating whitespace
nordzilla opened this issue about 1 month ago
feat: upgrade cpu workers to ubuntu 24.04 generic-worker image
bhearsum opened this pull request about 1 month ago
scrape and upload 2024 training artifacts
bhearsum opened this issue about 1 month ago
Add Mozilla's clang-format rules to CI
nordzilla opened this issue about 1 month ago
Use Snapshot Testing for WASM Inference Tests
nordzilla opened this issue about 1 month ago
Use `Intl.Segmenter` instead of `ssplit` for segmentation in WASM builds
nordzilla opened this pull request about 1 month ago
dataset-hplt-mono_v1_2-zh failed due to a too large fluency score in the config
eu9ene opened this issue about 1 month ago
Japanese is missing in OpusCleaner
eu9ene opened this issue about 1 month ago
MTData fails to unpack some datasets
eu9ene opened this issue about 1 month ago
Autogenerated config doesn't work
eu9ene opened this issue about 1 month ago
Corpora exclusion rules
ZJaume opened this issue about 1 month ago
Switch to ICU tokenizer
eu9ene opened this pull request about 1 month ago
Cjk corpora fixes
ZJaume opened this pull request about 1 month ago
[skip ci] Fix typo in the rebuild docker-images/toolchains docs
gabrielBusta opened this pull request about 2 months ago
Adjust default values for batching
gregtatum opened this pull request about 2 months ago
decoding-teacher config property is not being used in translate.sh or translate-nbest.sh
gregtatum opened this issue about 2 months ago
Linters need to ignore node_modules
gregtatum opened this issue about 2 months ago
Experiment with distillation data inference
gregtatum opened this issue about 2 months ago
Add a tsconfig.json file for JS code within this repository
nordzilla opened this issue about 2 months ago
Use PyMarian for COMET evaluations
marco-c opened this issue about 2 months ago
Single-side deduplication
ZJaume opened this issue about 2 months ago
Test WASM Translations in CI
nordzilla opened this pull request about 2 months ago
Ctranslate2 ci 2
gregtatum opened this pull request about 2 months ago
CI Run check
gregtatum opened this pull request about 2 months ago
Create an `analyze-datasets` step in the pipeline
gregtatum opened this issue about 2 months ago
Investigate merging document sentences in HPLT
eu9ene opened this issue about 2 months ago
Rewrite the train scripts and add config support for ctranslate2
gregtatum opened this pull request about 2 months ago
Make `npm` available to `local` and `inference` docker images
nordzilla opened this pull request 2 months ago
Setup WASM test infrastructure for CI
nordzilla opened this pull request 2 months ago
Add --run-as-user flag to docker-run.py
nordzilla opened this pull request 2 months ago
Add emsdk as a git submodule
nordzilla opened this pull request 2 months ago
Add better support for reporting training continuation values
gregtatum opened this pull request 2 months ago
Rename docker tags following repository rename
nordzilla opened this pull request 2 months ago
Reduce monolingual data for en-lt to investigate distillation performance
gregtatum opened this issue 2 months ago
Rename repo
gregtatum opened this pull request 2 months ago
Allow for split vocabs
gregtatum opened this issue 2 months ago
[meta] Kick off a 2024-H2 training run
gregtatum opened this issue 2 months ago
Do not use WMTNews as training!
ZJaume opened this issue 2 months ago
More corpora specific fixes
ZJaume opened this issue 2 months ago
Fix shortlist pruning for CJK
eu9ene opened this pull request 2 months ago
Switch bestbleu to chrF
eu9ene opened this pull request 2 months ago
Use GCP standard instances for alignment tasks
eu9ene opened this pull request 2 months ago
Configure vocab for CJK
eu9ene opened this pull request 2 months ago
Limit the amount of data used for distillation
gregtatum opened this issue 2 months ago
Update training to support CJK
eu9ene opened this pull request 2 months ago
Check if issues with short sentences were caused by bicleaner hard rules
eu9ene opened this issue 2 months ago
Rework wasm build scripts for gecko
nordzilla opened this pull request 2 months ago
Remove max_words filtering from data importers
eu9ene opened this pull request 2 months ago
Adjust data cleaning for CJK
eu9ene opened this pull request 2 months ago
Investigate word-based filtering for CJK
eu9ene opened this issue 2 months ago
Run dhat or similar memory tools on a natively built version of the browsermt marian-dev fork
gregtatum opened this issue 2 months ago
Update data importer to support CJK
eu9ene opened this pull request 2 months ago
Add support for Chinese Traditional
eu9ene opened this issue 2 months ago
Fix taskcluster train scripts
eu9ene opened this pull request 2 months ago
Experiment with student model parameters
gregtatum opened this issue 2 months ago
Student training continuation is regressed
gregtatum opened this issue 2 months ago
Disable bicleaner hard rules completely
eu9ene opened this pull request 3 months ago
[meta] Retrain older models
eu9ene opened this issue 3 months ago
Fine-tune students with 8-bit
ZJaume opened this issue 3 months ago
Consider adding NTREX-128 for evaluation
ZJaume opened this issue 3 months ago
Disable use of `bicleaner-hardrules`
ZJaume opened this issue 3 months ago
Vocabulary construction
ZJaume opened this issue 3 months ago
Reduce monolingual data experiment
gregtatum opened this pull request 3 months ago
Compute the standard deviation of COMET scores for training student models
gregtatum opened this issue 3 months ago
Use HPLT 2.0
eu9ene opened this issue 3 months ago
Kick off training from the command line
gregtatum opened this pull request 3 months ago
Use our localization data for training
marco-c opened this issue 3 months ago
Add student base configuration option
eu9ene opened this pull request 3 months ago
Consider statistically translating short sentences from monolingual datasets.
gregtatum opened this issue 3 months ago
Consider harvesting short sentences from parallel data
gregtatum opened this issue 3 months ago
Consider using data augmentation to synthesize one word translations
gregtatum opened this issue 3 months ago
temp: switch to temporary gpu worker image that issues dnsmasq to sanity check it
bhearsum opened this pull request 3 months ago
Don't translate idioms literally
zcorpan opened this issue 3 months ago
Tracking does not support overriding a run: wandb [409] run was previously created and deleted
vrigal opened this issue 3 months ago
Models are missing in group logs
eu9ene opened this issue 3 months ago
Offline uploader does not follow the correct structure for config in group logs
eu9ene opened this issue 3 months ago
Migrate Taskcluster UI tools to this repo
eu9ene opened this issue 3 months ago
Add Inference Tasks to CI
nordzilla opened this pull request 3 months ago