Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
github.com/FedML-AI/FedML
FEDML - The unified and scalable ML library for large-scale distributed training, model serving, and federated learning. FEDML Launch, a cross-cloud scheduler, further enables running any AI jobs on any GPU cloud or on-premise cluster. Built on this library, TensorOpera AI (https://TensorOpera.ai) is your generative AI platform at scale.
https://github.com/FedML-AI/FedML
Alexleung/dev v070
fedml-alex opened this pull request 9 months ago
Dev/v0.7.0
fedml-alex opened this pull request 9 months ago
[Deploy] Set setting to "DEPLOYED" when no action needed.
Raphael-Jin opened this pull request 9 months ago
[Deploy] Refine Autoscaling Algorithm
Raphael-Jin opened this pull request 9 months ago
Refine metrics collection
Raphael-Jin opened this pull request 9 months ago
Refine metrics collection
Raphael-Jin opened this pull request 9 months ago
Update remote_storage.py
zhouLion opened this pull request 9 months ago
Update Launch Job Docker Image name
alaydshah opened this pull request 9 months ago
Hotfix to make detect status logging less noisy
alaydshah opened this pull request 9 months ago
[Deploy] Refine Database Readability; Format Code.
Raphael-Jin opened this pull request 9 months ago
Logging Timestamp with nanosecond granularity
alaydshah opened this pull request 9 months ago
Rearranging checking conditions of the autoscaler's prediction operations
fedml-dimitris opened this pull request 9 months ago
[Deploy] Support fail rollback for scale out.
Raphael-Jin opened this pull request 9 months ago
Fix logging + minor bugs
alaydshah opened this pull request 9 months ago
Changing the help option displayed for adding user metadata to storag…
bhargav191098 opened this pull request 9 months ago
Autoscaler hotfix
fedml-dimitris opened this pull request 9 months ago
[Deploy] Fix Proxy inference; Simplify Logs
Raphael-Jin opened this pull request 9 months ago
[Deploy] typo: modify end point to endpoint
ASCE1885 opened this pull request 9 months ago
[Deploy] Support autoscaling
Raphael-Jin opened this pull request 9 months ago
Refactored Logging Test
alaydshah opened this pull request 9 months ago
MQTT Refactoring
alaydshah opened this pull request 9 months ago
[Deploy] Fix endless rollback issue when multiple replica.
Raphael-Jin opened this pull request 9 months ago
Sync dev to alex-branch-latest-swap
fedml-alex opened this pull request 9 months ago
Alexleung/dev branch online
fedml-alex opened this pull request 9 months ago
Alexleung/dev branch online
fedml-alex opened this pull request 9 months ago
sync dev to my branch
fedml-alex opened this pull request 9 months ago
[DevOps] update devops files.
fedml-alex opened this pull request 9 months ago
Pr update fail rollback
Raphael-Jin opened this pull request 9 months ago
[Deploy] Hotfix crosstalk issue
Raphael-Jin opened this pull request 9 months ago
Test/v0.7.0
fedml-alex opened this pull request 9 months ago
Dev/v0.7.0
fedml-alex opened this pull request 9 months ago
add the option to disable http inference when the firewall is enabled in the machines.
fedml-alex opened this pull request 9 months ago
[Deploy] Support update fail-rollback.
Raphael-Jin opened this pull request 9 months ago
Logging timestamp with millisecond granularity
alaydshah opened this pull request 9 months ago
Unified Log Prefix for Logs Inside Inference Container
Raphael-Jin opened this pull request 9 months ago
Raphael/refactor inf runtime logging
Raphael-Jin opened this pull request 9 months ago
Minor fixes + Test
alaydshah opened this pull request 10 months ago
Rotating Upload Fix Initialization
alaydshah opened this pull request 10 months ago
possible bug in python/fedml/core/distributed/communication/trpc/utils.py
bene-ges opened this issue 10 months ago
Alexleung/dev branch latest sync
fedml-alex opened this pull request 10 months ago
Alexleung/dev branch latest sync
fedml-alex opened this pull request 10 months ago
Dev/v0.7.0
fedml-alex opened this pull request 10 months ago
[WIP] [Deploy] Refactor Logging System for deploy
Raphael-Jin opened this pull request 10 months ago
typo "salve" instead of "slave" in identifiers
bene-ges opened this issue 10 months ago
Sync the workflow to dev
fedml-alex opened this pull request 10 months ago
Update Launch Driver Example
alaydshah opened this pull request 10 months ago
Refactor Logging + Fix Rotating Log Upload Bug
alaydshah opened this pull request 10 months ago
Which communication protocol and serialization method is supported?
Rene36 opened this issue 10 months ago
Dev/v0.7.0
fedml-alex opened this pull request 10 months ago
Scheduler Logging Nits
alaydshah opened this pull request 10 months ago
Enhance Workflow
alaydshah opened this pull request 10 months ago
Fix Race Conditions
Raphael-Jin opened this pull request 10 months ago
[CoreEngine] Use the original url to download packages.
fedml-alex opened this pull request 10 months ago
Dev/v0.7.0
fedml-alex opened this pull request 10 months ago
Fail loudly and terminate if version upgrade fails
alaydshah opened this pull request 10 months ago
[Deploy] Expand topic to avoid MQTT crosstalk. Deprecated status topic; Clean Logs.
Raphael-Jin opened this pull request 10 months ago
[CoreEngine] replace the direct function call with posting launching …
fedml-alex opened this pull request 10 months ago
Test/v0.7.0
fedml-alex opened this pull request 10 months ago
[CoreEngine] close the ota.
fedml-alex opened this pull request 10 months ago
Test/v0.7.0
fedml-alex opened this pull request 10 months ago
Dev/v0.7.0
fedml-alex opened this pull request 10 months ago
[Deploy] Add replica_no column if missing.
Raphael-Jin opened this pull request 10 months ago
[Deploy] Add replica_no column if missing.
Raphael-Jin opened this pull request 10 months ago
[DevOps] update devops files.
fedml-alex opened this pull request 10 months ago
Dev/v0.7.0
fedml-alex opened this pull request 10 months ago
Refactor Replica Logic
Raphael-Jin opened this pull request 10 months ago
Support Replica Logic
Raphael-Jin opened this pull request 10 months ago
[CoreEngine] update the subscribed topics for reporting device info t…
fedml-alex opened this pull request 10 months ago
[CoreEngine] update the subscribed topics for reporting device info t…
fedml-alex opened this pull request 10 months ago
[CoreEngine] subscribe topics for reporting device info to mlops.
fedml-alex opened this pull request 10 months ago
Dev/v0.7.0
fedml-alex opened this pull request 10 months ago
Rookie question
Salterwater23 opened this issue 10 months ago
Fix generating environment variables from job configurations for multi-level nesting
alaydshah opened this pull request 10 months ago
Enhance Build Packaging
alaydshah opened this pull request 10 months ago
[unitedllm, train.llm] migrate & integrate UnitedLLM
fedml-zijianhu opened this pull request 10 months ago
sync `fedml.train.llm`: bugfix falcon issue, update flash attention integrations
fedml-zijianhu opened this pull request 10 months ago
[CoreEngine] 1. fix the issue which the gpu id is not released into t…
fedml-alex opened this pull request 10 months ago
avoid remove the endpoint when cannot match GPU resource
fedml-alex opened this pull request 11 months ago
Dev/v0.7.0
fedml-alex opened this pull request 11 months ago
add the workflow with connected inputs and outputs.
fedml-alex opened this pull request 11 months ago
[CoreEngine] add the workflow with connected inputs and outputs.
fedml-alex opened this pull request 11 months ago
[Deploy] Download template from s3; Using home sign.
Raphael-Jin opened this pull request 11 months ago
Make run cleanup idempotent
alaydshah opened this pull request 11 months ago
Fix occupy_gpu_ids race condition
alaydshah opened this pull request 11 months ago
Fix
alaydshah opened this pull request 11 months ago
Alaydshah/fix/race condition
alaydshah opened this pull request 11 months ago
Dev/v0.7.0
chaoyanghe opened this pull request 11 months ago
Fix job processor logs
alaydshah opened this pull request 11 months ago
Share trace id between api calls
alaydshah opened this pull request 11 months ago
log_file_dir arg not work
flylzj opened this issue 11 months ago
Update trim unavailable gpu id logic
alaydshah opened this pull request 11 months ago
Remove sys and process utils logs
alaydshah opened this pull request 11 months ago
Bump test to 27a1
alaydshah opened this pull request 11 months ago
Correct dev version
alaydshah opened this pull request 11 months ago
Bump Dev
alaydshah opened this pull request 11 months ago
Merge Test to Prod
alaydshah opened this pull request 11 months ago
Merge Dev To Test
alaydshah opened this pull request 11 months ago
Fix: Related to Serializable Issue
alaydshah opened this pull request 11 months ago
Make setting log levels more straightforward
alaydshah opened this pull request 11 months ago