Compare commits

...

8 Commits

Author SHA1 Message Date
Goku Mohandas
3361aeb8dd release: trigger release prior to agentic workflow updates 2026-03-04 15:44:17 -08:00
Goku Mohandas
66e8e43711 Merge pull request #248 from GokuMohandas/dev
updated cluster env and local catch for efs
2023-12-07 11:37:37 -08:00
GokuMohandas
be3d2732a8 updated cluser compute, environment and workloads 2023-12-07 11:36:49 -08:00
GokuMohandas
11ab35b251 added local catch for EFS OS/permissions issues 2023-12-07 11:34:26 -08:00
Goku Mohandas
540754392a Merge pull request #241 from GokuMohandas/dev
updating to Ray 2.7
2023-09-19 13:56:37 -07:00
GokuMohandas
f682fbffa4 updating to Ray 2.7 2023-09-19 13:56:16 -07:00
GokuMohandas
14671a438a updated project id for workloads 2023-09-18 22:08:25 -07:00
GokuMohandas
b98bd5b1ae updated to Ray 2.7 2023-09-18 22:03:20 -07:00
25 changed files with 3483 additions and 2031 deletions

1
.gitignore vendored
View File

@@ -4,6 +4,7 @@ stores/
mlflow/
results/
workspaces/
efs/
# VSCode
.vscode/

View File

@@ -2,7 +2,7 @@
# See https://pre-commit.com/hooks.html for more hooks
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.4.0
rev: v4.5.0
hooks:
- id: trailing-whitespace
- id: end-of-file-fixer

View File

@@ -12,6 +12,7 @@ style:
# Cleaning
.PHONY: clean
clean: style
python notebooks/clear_cell_nums.py
find . -type f -name "*.DS_Store" -ls -delete
find . | grep -E "(__pycache__|\.pyc|\.pyo)" | xargs rm -rf
find . | grep -E ".pytest_cache" | xargs rm -rf

View File

@@ -83,7 +83,7 @@ We'll start by setting up our cluster with the environment and compute configura
- Project: `madewithml`
- Cluster environment name: `madewithml-cluster-env`
# Toggle `Select from saved configurations`
- Compute config: `madewithml-cluster-compute`
- Compute config: `madewithml-cluster-compute-g5.4xlarge`
```
> Alternatively, we can use the [CLI](https://docs.anyscale.com/reference/anyscale-cli) to create the workspace via `anyscale workspace create ...`
@@ -101,17 +101,6 @@ We'll start by setting up our cluster with the environment and compute configura
</details>
### Credentials
```bash
touch .env
```
```bash
# Inside .env
GITHUB_USERNAME="CHANGE_THIS_TO_YOUR_USERNAME" # ← CHANGE THIS
```bash
source .env
```
### Git setup
Create a repository by following these instructions: [Create a new repository](https://github.com/new) → name it `Made-With-ML` → Toggle `Add a README file` (**very important** as this creates a `main` branch) → Click `Create repository` (scroll down)
@@ -120,8 +109,19 @@ Now we're ready to clone the repository that has all of our code:
```bash
git clone https://github.com/GokuMohandas/Made-With-ML.git .
git remote set-url origin https://github.com/$GITHUB_USERNAME/Made-With-ML.git # <-- CHANGE THIS to your username
git checkout -b dev
```
### Credentials
```bash
touch .env
```
```bash
# Inside .env
GITHUB_USERNAME="CHANGE_THIS_TO_YOUR_USERNAME" # ← CHANGE THIS
```
```bash
source .env
```
### Virtual environment
@@ -289,7 +289,6 @@ python madewithml/evaluate.py \
### Inference
```bash
# Get run ID
export EXPERIMENT_NAME="llm"
export RUN_ID=$(python madewithml/predict.py get-best-run-id --experiment-name $EXPERIMENT_NAME --metric val_loss --mode ASC)
python madewithml/predict.py predict \
@@ -387,7 +386,8 @@ export RUN_ID=$(python madewithml/predict.py get-best-run-id --experiment-name $
pytest --run-id=$RUN_ID tests/model --verbose --disable-warnings
# Coverage
python3 -m pytest tests/code --cov madewithml --cov-report html --disable-warnings
python3 -m pytest tests/code --cov madewithml --cov-report html --disable-warnings # html report
python3 -m pytest tests/code --cov madewithml --cov-report term --disable-warnings # terminal report
```
## Production
@@ -423,7 +423,7 @@ anyscale cluster-env build deploy/cluster_env.yaml --name $CLUSTER_ENV_NAME
The compute configuration determines **what** resources our workloads will be executes on. We've already created this [compute configuration](./deploy/cluster_compute.yaml) for us but this is how we can create it ourselves.
```bash
export CLUSTER_COMPUTE_NAME="madewithml-cluster-compute"
export CLUSTER_COMPUTE_NAME="madewithml-cluster-compute-g5.4xlarge"
anyscale cluster-compute create deploy/cluster_compute.yaml --name $CLUSTER_COMPUTE_NAME
```
@@ -485,17 +485,23 @@ We're not going to manually deploy our application every time we make a change.
<img src="https://madewithml.com/static/images/mlops/cicd/cicd.png">
</div>
1. We'll start by adding the necessary credentials to the [`/settings/secrets/actions`](https://github.com/GokuMohandas/Made-With-ML/settings/secrets/actions) page of our GitHub repository.
1. Create a new github branch to save our changes to and execute CI/CD workloads:
```bash
git remote set-url origin https://github.com/$GITHUB_USERNAME/Made-With-ML.git # <-- CHANGE THIS to your username
git checkout -b dev
```
2. We'll start by adding the necessary credentials to the [`/settings/secrets/actions`](https://github.com/GokuMohandas/Made-With-ML/settings/secrets/actions) page of our GitHub repository.
``` bash
export ANYSCALE_HOST=https://console.anyscale.com
export ANYSCALE_CLI_TOKEN=$YOUR_CLI_TOKEN # retrieved from https://console.anyscale.com/o/madewithml/credentials
```
2. Now we can make changes to our code (not on `main` branch) and push them to GitHub. But in order to push our code to GitHub, we'll need to first authenticate with our credentials before pushing to our repository:
3. Now we can make changes to our code (not on `main` branch) and push them to GitHub. But in order to push our code to GitHub, we'll need to first authenticate with our credentials before pushing to our repository:
```bash
git config --global user.name "Your Name" # <-- CHANGE THIS to your name
git config --global user.name $GITHUB_USERNAME # <-- CHANGE THIS to your username
git config --global user.email you@example.com # <-- CHANGE THIS to your email
git add .
git commit -m "" # <-- CHANGE THIS to your message
@@ -504,13 +510,13 @@ git push origin dev
Now you will be prompted to enter your username and password (personal access token). Follow these steps to get personal access token: [New GitHub personal access token](https://github.com/settings/tokens/new) → Add a name → Toggle `repo` and `workflow` → Click `Generate token` (scroll down) → Copy the token and paste it when prompted for your password.
3. Now we can start a PR from this branch to our `main` branch and this will trigger the [workloads workflow](/.github/workflows/workloads.yaml). If the workflow (Anyscale Jobs) succeeds, this will produce comments with the training and evaluation results directly on the PR.
4. Now we can start a PR from this branch to our `main` branch and this will trigger the [workloads workflow](/.github/workflows/workloads.yaml). If the workflow (Anyscale Jobs) succeeds, this will produce comments with the training and evaluation results directly on the PR.
<div align="center">
<img src="https://madewithml.com/static/images/mlops/cicd/comments.png">
</div>
4. If we like the results, we can merge the PR into the `main` branch. This will trigger the [serve workflow](/.github/workflows/serve.yaml) which will rollout our new service to production!
5. If we like the results, we can merge the PR into the `main` branch. This will trigger the [serve workflow](/.github/workflows/serve.yaml) which will rollout our new service to production!
### Continual learning

View File

@@ -1,12 +1,12 @@
cloud: madewithml-us-east-2
region: us-east2
cloud: education-us-west-2
region: us-west-2
head_node_type:
name: head_node_type
instance_type: m5.2xlarge # 8 CPU, 0 GPU, 32 GB RAM
instance_type: g5.4xlarge
worker_node_types:
- name: gpu_worker
instance_type: g4dn.xlarge # 4 CPU, 1 GPU, 16 GB RAM
min_workers: 0
instance_type: g5.4xlarge
min_workers: 1
max_workers: 1
use_spot: False
aws:

View File

@@ -1,4 +1,4 @@
base_image: anyscale/ray:2.6.0-py310-cu118
base_image: anyscale/ray:2.7.0optimized-py310-cu118
env_vars: {}
debian_packages:
- curl

View File

@@ -1,6 +1,5 @@
#!/bin/bash
export PYTHONPATH=$PYTHONPATH:$PWD
export RAY_AIR_REENABLE_DEPRECATED_SYNC_TO_HEAD_NODE=1
mkdir results
# Test data

View File

@@ -1,5 +1,5 @@
name: workloads
project_id: prj_v9izs5t1d6b512ism8c5rkq4wm
project_id: prj_wn6el5cu9dqwktk6t4cv54x8zh
cluster_env: madewithml-cluster-env
compute_config: madewithml-cluster-compute
runtime_env:

View File

@@ -1,5 +1,5 @@
name: madewithml
project_id: prj_v9izs5t1d6b512ism8c5rkq4wm
project_id: prj_wn6el5cu9dqwktk6t4cv54x8zh
cluster_env: madewithml-cluster-env
compute_config: madewithml-cluster-compute
ray_serve_config:

View File

@@ -5,13 +5,17 @@ import sys
from pathlib import Path
import mlflow
import pretty_errors # NOQA: F401 (imported but unused)
# Directories
ROOT_DIR = Path(__file__).parent.parent.absolute()
LOGS_DIR = Path(ROOT_DIR, "logs")
LOGS_DIR.mkdir(parents=True, exist_ok=True)
EFS_DIR = Path(f"/efs/shared_storage/madewithml/{os.environ.get('GITHUB_USERNAME', '')}")
try:
Path(EFS_DIR).mkdir(parents=True, exist_ok=True)
except OSError:
EFS_DIR = Path(ROOT_DIR, "efs")
Path(EFS_DIR).mkdir(parents=True, exist_ok=True)
# Config MLflow
MODEL_REGISTRY = Path(f"{EFS_DIR}/mlflow")

View File

@@ -5,7 +5,6 @@ import numpy as np
import pandas as pd
import ray
from ray.data import Dataset
from ray.data.preprocessor import Preprocessor
from sklearn.model_selection import train_test_split
from transformers import BertTokenizer
@@ -135,13 +134,18 @@ def preprocess(df: pd.DataFrame, class_to_index: Dict) -> Dict:
return outputs
class CustomPreprocessor(Preprocessor):
class CustomPreprocessor:
"""Custom preprocessor class."""
def _fit(self, ds):
def __init__(self, class_to_index={}):
self.class_to_index = class_to_index or {} # mutable defaults
self.index_to_class = {v: k for k, v in self.class_to_index.items()}
def fit(self, ds):
tags = ds.unique(column="tag")
self.class_to_index = {tag: i for i, tag in enumerate(tags)}
self.index_to_class = {v: k for k, v in self.class_to_index.items()}
return self
def _transform_pandas(self, batch): # could also do _transform_numpy
return preprocess(batch, class_to_index=self.class_to_index)
def transform(self, ds):
return ds.map_batches(preprocess, fn_kwargs={"class_to_index": self.class_to_index}, batch_format="pandas")

View File

@@ -8,13 +8,13 @@ import ray
import ray.train.torch # NOQA: F401 (imported but unused)
import typer
from ray.data import Dataset
from ray.train.torch.torch_predictor import TorchPredictor
from sklearn.metrics import precision_recall_fscore_support
from snorkel.slicing import PandasSFApplier, slicing_function
from typing_extensions import Annotated
from madewithml import predict, utils
from madewithml.config import logger
from madewithml.predict import TorchPredictor
# Initialize Typer CLI app
app = typer.Typer()
@@ -133,8 +133,8 @@ def evaluate(
y_true = np.stack([item["targets"] for item in values])
# y_pred
z = predictor.predict(data=ds.to_pandas())["predictions"]
y_pred = np.stack(z).argmax(1)
predictions = preprocessed_ds.map_batches(predictor).take_all()
y_pred = np.array([d["output"] for d in predictions])
# Metrics
metrics = {

View File

@@ -1,13 +1,20 @@
import json
import os
from pathlib import Path
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import BertModel
class FinetunedLLM(nn.Module): # pragma: no cover, torch model
"""Model architecture for a Large Language Model (LLM) that we will fine-tune."""
class FinetunedLLM(nn.Module):
def __init__(self, llm, dropout_p, embedding_dim, num_classes):
super(FinetunedLLM, self).__init__()
self.llm = llm
self.dropout_p = dropout_p
self.embedding_dim = embedding_dim
self.num_classes = num_classes
self.dropout = torch.nn.Dropout(dropout_p)
self.fc1 = torch.nn.Linear(embedding_dim, num_classes)
@@ -17,3 +24,36 @@ class FinetunedLLM(nn.Module): # pragma: no cover, torch model
z = self.dropout(pool)
z = self.fc1(z)
return z
@torch.inference_mode()
def predict(self, batch):
self.eval()
z = self(batch)
y_pred = torch.argmax(z, dim=1).cpu().numpy()
return y_pred
@torch.inference_mode()
def predict_proba(self, batch):
self.eval()
z = self(batch)
y_probs = F.softmax(z, dim=1).cpu().numpy()
return y_probs
def save(self, dp):
with open(Path(dp, "args.json"), "w") as fp:
contents = {
"dropout_p": self.dropout_p,
"embedding_dim": self.embedding_dim,
"num_classes": self.num_classes,
}
json.dump(contents, fp, indent=4, sort_keys=False)
torch.save(self.state_dict(), os.path.join(dp, "model.pt"))
@classmethod
def load(cls, args_fp, state_dict_fp):
with open(args_fp, "r") as fp:
kwargs = json.load(fp=fp)
llm = BertModel.from_pretrained("allenai/scibert_scivocab_uncased", return_dict=False)
model = cls(llm=llm, **kwargs)
model.load_state_dict(torch.load(state_dict_fp, map_location=torch.device("cpu")))
return model

View File

@@ -1,19 +1,20 @@
import json
from pathlib import Path
from typing import Any, Dict, Iterable, List
from urllib.parse import urlparse
import numpy as np
import pandas as pd
import ray
import torch
import typer
from numpyencoder import NumpyEncoder
from ray.air import Result
from ray.train.torch import TorchPredictor
from ray.train.torch.torch_checkpoint import TorchCheckpoint
from typing_extensions import Annotated
from madewithml.config import logger, mlflow
from madewithml.data import CustomPreprocessor
from madewithml.models import FinetunedLLM
from madewithml.utils import collate_fn
# Initialize Typer CLI app
app = typer.Typer()
@@ -48,25 +49,51 @@ def format_prob(prob: Iterable, index_to_class: Dict) -> Dict:
return d
def predict_with_proba(
df: pd.DataFrame,
predictor: ray.train.torch.torch_predictor.TorchPredictor,
class TorchPredictor:
def __init__(self, preprocessor, model):
self.preprocessor = preprocessor
self.model = model
self.model.eval()
def __call__(self, batch):
results = self.model.predict(collate_fn(batch))
return {"output": results}
def predict_proba(self, batch):
results = self.model.predict_proba(collate_fn(batch))
return {"output": results}
def get_preprocessor(self):
return self.preprocessor
@classmethod
def from_checkpoint(cls, checkpoint):
metadata = checkpoint.get_metadata()
preprocessor = CustomPreprocessor(class_to_index=metadata["class_to_index"])
model = FinetunedLLM.load(Path(checkpoint.path, "args.json"), Path(checkpoint.path, "model.pt"))
return cls(preprocessor=preprocessor, model=model)
def predict_proba(
ds: ray.data.dataset.Dataset,
predictor: TorchPredictor,
) -> List: # pragma: no cover, tested with inference workload
"""Predict tags (with probabilities) for input data from a dataframe.
Args:
df (pd.DataFrame): dataframe with input features.
predictor (ray.train.torch.torch_predictor.TorchPredictor): loaded predictor from a checkpoint.
predictor (TorchPredictor): loaded predictor from a checkpoint.
Returns:
List: list of predicted labels.
"""
preprocessor = predictor.get_preprocessor()
z = predictor.predict(data=df)["predictions"]
y_prob = torch.tensor(np.stack(z)).softmax(dim=1).numpy()
preprocessed_ds = preprocessor.transform(ds)
outputs = preprocessed_ds.map_batches(predictor.predict_proba)
y_prob = np.array([d["output"] for d in outputs.take_all()])
results = []
for i, prob in enumerate(y_prob):
tag = decode([z[i].argmax()], preprocessor.index_to_class)[0]
tag = preprocessor.index_to_class[prob.argmax()]
results.append({"prediction": tag, "probabilities": format_prob(prob, preprocessor.index_to_class)})
return results
@@ -125,11 +152,10 @@ def predict(
# Load components
best_checkpoint = get_best_checkpoint(run_id=run_id)
predictor = TorchPredictor.from_checkpoint(best_checkpoint)
# preprocessor = predictor.get_preprocessor()
# Predict
sample_df = pd.DataFrame([{"title": title, "description": description, "tag": "other"}])
results = predict_with_proba(df=sample_df, predictor=predictor)
sample_ds = ray.data.from_items([{"title": title, "description": description, "tag": "other"}])
results = predict_proba(ds=sample_ds, predictor=predictor)
logger.info(json.dumps(results, cls=NumpyEncoder, indent=2))
return results

View File

@@ -3,11 +3,9 @@ import os
from http import HTTPStatus
from typing import Dict
import pandas as pd
import ray
from fastapi import FastAPI
from ray import serve
from ray.train.torch import TorchPredictor
from starlette.requests import Request
from madewithml import evaluate, predict
@@ -21,7 +19,7 @@ app = FastAPI(
)
@serve.deployment(route_prefix="/", num_replicas="1", ray_actor_options={"num_cpus": 8, "num_gpus": 0})
@serve.deployment(num_replicas="1", ray_actor_options={"num_cpus": 8, "num_gpus": 0})
@serve.ingress(app)
class ModelDeployment:
def __init__(self, run_id: str, threshold: int = 0.9):
@@ -30,8 +28,7 @@ class ModelDeployment:
self.threshold = threshold
mlflow.set_tracking_uri(MLFLOW_TRACKING_URI) # so workers have access to model registry
best_checkpoint = predict.get_best_checkpoint(run_id=run_id)
self.predictor = TorchPredictor.from_checkpoint(best_checkpoint)
self.preprocessor = self.predictor.get_preprocessor()
self.predictor = predict.TorchPredictor.from_checkpoint(best_checkpoint)
@app.get("/")
def _index(self) -> Dict:
@@ -55,11 +52,10 @@ class ModelDeployment:
return {"results": results}
@app.post("/predict/")
async def _predict(self, request: Request) -> Dict:
# Get prediction
async def _predict(self, request: Request):
data = await request.json()
df = pd.DataFrame([{"title": data.get("title", ""), "description": data.get("description", ""), "tag": ""}])
results = predict.predict_with_proba(df=df, predictor=self.predictor)
sample_ds = ray.data.from_items([{"title": data.get("title", ""), "description": data.get("description", ""), "tag": ""}])
results = predict.predict_proba(ds=sample_ds, predictor=self.predictor)
# Apply custom logic
for i, result in enumerate(results):

View File

@@ -1,6 +1,7 @@
import datetime
import json
import os
import tempfile
from typing import Tuple
import numpy as np
@@ -10,21 +11,23 @@ import torch
import torch.nn as nn
import torch.nn.functional as F
import typer
from ray.air import session
from ray.air.config import (
from ray.air.integrations.mlflow import MLflowLoggerCallback
from ray.data import Dataset
from ray.train import (
Checkpoint,
CheckpointConfig,
DatasetConfig,
DataConfig,
RunConfig,
ScalingConfig,
)
from ray.air.integrations.mlflow import MLflowLoggerCallback
from ray.data import Dataset
from ray.train.torch import TorchCheckpoint, TorchTrainer
from ray.train.torch import TorchTrainer
from torch.nn.parallel.distributed import DistributedDataParallel
from transformers import BertModel
from typing_extensions import Annotated
from madewithml import data, models, utils
from madewithml import data, utils
from madewithml.config import EFS_DIR, MLFLOW_TRACKING_URI, logger
from madewithml.models import FinetunedLLM
# Initialize Typer CLI app
app = typer.Typer()
@@ -106,18 +109,18 @@ def train_loop_per_worker(config: dict) -> None: # pragma: no cover, tested via
lr = config["lr"]
lr_factor = config["lr_factor"]
lr_patience = config["lr_patience"]
batch_size = config["batch_size"]
num_epochs = config["num_epochs"]
batch_size = config["batch_size"]
num_classes = config["num_classes"]
# Get datasets
utils.set_seeds()
train_ds = session.get_dataset_shard("train")
val_ds = session.get_dataset_shard("val")
train_ds = train.get_dataset_shard("train")
val_ds = train.get_dataset_shard("val")
# Model
llm = BertModel.from_pretrained("allenai/scibert_scivocab_uncased", return_dict=False)
model = models.FinetunedLLM(llm=llm, dropout_p=dropout_p, embedding_dim=llm.config.hidden_size, num_classes=num_classes)
model = FinetunedLLM(llm=llm, dropout_p=dropout_p, embedding_dim=llm.config.hidden_size, num_classes=num_classes)
model = train.torch.prepare_model(model)
# Training components
@@ -126,7 +129,8 @@ def train_loop_per_worker(config: dict) -> None: # pragma: no cover, tested via
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="min", factor=lr_factor, patience=lr_patience)
# Training
batch_size_per_worker = batch_size // session.get_world_size()
num_workers = train.get_context().get_world_size()
batch_size_per_worker = batch_size // num_workers
for epoch in range(num_epochs):
# Step
train_loss = train_step(train_ds, batch_size_per_worker, model, num_classes, loss_fn, optimizer)
@@ -134,9 +138,14 @@ def train_loop_per_worker(config: dict) -> None: # pragma: no cover, tested via
scheduler.step(val_loss)
# Checkpoint
metrics = dict(epoch=epoch, lr=optimizer.param_groups[0]["lr"], train_loss=train_loss, val_loss=val_loss)
checkpoint = TorchCheckpoint.from_model(model=model)
session.report(metrics, checkpoint=checkpoint)
with tempfile.TemporaryDirectory() as dp:
if isinstance(model, DistributedDataParallel): # cpu
model.module.save(dp=dp)
else:
model.save(dp=dp)
metrics = dict(epoch=epoch, lr=optimizer.param_groups[0]["lr"], train_loss=train_loss, val_loss=val_loss)
checkpoint = Checkpoint.from_directory(dp)
train.report(metrics, checkpoint=checkpoint)
@app.command()
@@ -183,7 +192,6 @@ def train_model(
num_workers=num_workers,
use_gpu=bool(gpu_per_worker),
resources_per_worker={"CPU": cpu_per_worker, "GPU": gpu_per_worker},
_max_cpu_fraction_per_node=0.8,
)
# Checkpoint config
@@ -201,7 +209,7 @@ def train_model(
)
# Run config
run_config = RunConfig(callbacks=[mlflow_callback], checkpoint_config=checkpoint_config, storage_path=EFS_DIR)
run_config = RunConfig(callbacks=[mlflow_callback], checkpoint_config=checkpoint_config, storage_path=EFS_DIR, local_dir=EFS_DIR)
# Dataset
ds = data.load_data(dataset_loc=dataset_loc, num_samples=train_loop_config["num_samples"])
@@ -210,14 +218,13 @@ def train_model(
train_loop_config["num_classes"] = len(tags)
# Dataset config
dataset_config = {
"train": DatasetConfig(fit=False, transform=False, randomize_block_order=False),
"val": DatasetConfig(fit=False, transform=False, randomize_block_order=False),
}
options = ray.data.ExecutionOptions(preserve_order=True)
dataset_config = DataConfig(datasets_to_split=["train"], execution_options=options)
# Preprocess
preprocessor = data.CustomPreprocessor()
train_ds = preprocessor.fit_transform(train_ds)
preprocessor = preprocessor.fit(train_ds)
train_ds = preprocessor.transform(train_ds)
val_ds = preprocessor.transform(val_ds)
train_ds = train_ds.materialize()
val_ds = val_ds.materialize()
@@ -230,7 +237,7 @@ def train_model(
run_config=run_config,
datasets={"train": train_ds, "val": val_ds},
dataset_config=dataset_config,
preprocessor=preprocessor,
metadata={"class_to_index": preprocessor.class_to_index},
)
# Train

View File

@@ -73,7 +73,6 @@ def tune_models(
num_workers=num_workers,
use_gpu=bool(gpu_per_worker),
resources_per_worker={"CPU": cpu_per_worker, "GPU": gpu_per_worker},
_max_cpu_fraction_per_node=0.8,
)
# Dataset
@@ -90,7 +89,8 @@ def tune_models(
# Preprocess
preprocessor = data.CustomPreprocessor()
train_ds = preprocessor.fit_transform(train_ds)
preprocessor = preprocessor.fit(train_ds)
train_ds = preprocessor.transform(train_ds)
val_ds = preprocessor.transform(val_ds)
train_ds = train_ds.materialize()
val_ds = val_ds.materialize()
@@ -102,7 +102,7 @@ def tune_models(
scaling_config=scaling_config,
datasets={"train": train_ds, "val": val_ds},
dataset_config=dataset_config,
preprocessor=preprocessor,
metadata={"class_to_index": preprocessor.class_to_index},
)
# Checkpoint configuration
@@ -118,7 +118,7 @@ def tune_models(
experiment_name=experiment_name,
save_artifact=True,
)
run_config = RunConfig(callbacks=[mlflow_callback], checkpoint_config=checkpoint_config, storage_path=EFS_DIR)
run_config = RunConfig(callbacks=[mlflow_callback], checkpoint_config=checkpoint_config, storage_path=EFS_DIR, local_dir=EFS_DIR)
# Hyperparameters to start with
initial_params = json.loads(initial_params)

View File

@@ -58,7 +58,7 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": null,
"id": "e2c96931-d511-4c6e-b582-87d24455a11e",
"metadata": {
"tags": []
@@ -79,7 +79,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": null,
"id": "953a577e-3cd0-4c6b-81f9-8bc32850214d",
"metadata": {
"tags": []
@@ -101,7 +101,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": null,
"id": "1790e2f5-6b8b-425c-8842-a2b0ea8f3f07",
"metadata": {
"tags": []
@@ -113,7 +113,7 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": null,
"id": "6b9bfadb-ba49-4f5a-b216-4db14c8888ab",
"metadata": {
"tags": []
@@ -208,7 +208,7 @@
"4 A PyTorch Implementation of \"Watch Your Step: ... other "
]
},
"execution_count": 4,
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
@@ -222,7 +222,7 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": null,
"id": "aa5b95d5-d61e-48e4-9100-d9d2fc0d53fa",
"metadata": {
"tags": []
@@ -234,7 +234,7 @@
"['computer-vision', 'other', 'natural-language-processing', 'mlops']"
]
},
"execution_count": 5,
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
@@ -247,7 +247,7 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": null,
"id": "3c828129-8248-4e38-93a4-cabb097e7ba5",
"metadata": {
"tags": []
@@ -279,7 +279,7 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": null,
"id": "8e3c3f44-2c19-4c32-9bc5-e9a7a917d19d",
"metadata": {},
"outputs": [],
@@ -295,7 +295,7 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": null,
"id": "4950bdb4",
"metadata": {},
"outputs": [
@@ -337,7 +337,7 @@
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": null,
"id": "b2aae14c-9870-4a27-b5ad-90f339686620",
"metadata": {
"tags": []
@@ -364,7 +364,7 @@
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": null,
"id": "03ee23e5",
"metadata": {},
"outputs": [
@@ -401,7 +401,7 @@
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": null,
"id": "71c43e8c",
"metadata": {},
"outputs": [
@@ -416,7 +416,7 @@
" 'description': 'A PyTorch implementation of \"Capsule Graph Neural Network\" (ICLR 2019).'}]"
]
},
"execution_count": 11,
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
@@ -429,7 +429,7 @@
},
{
"cell_type": "code",
"execution_count": 12,
"execution_count": null,
"id": "c9359a91-ac19-48a4-babb-e65d53f39b42",
"metadata": {
"tags": []
@@ -462,7 +462,7 @@
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": null,
"id": "5fac795e",
"metadata": {},
"outputs": [
@@ -486,7 +486,7 @@
"['other', 'computer-vision', 'computer-vision']"
]
},
"execution_count": 13,
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
@@ -507,7 +507,7 @@
},
{
"cell_type": "code",
"execution_count": 14,
"execution_count": null,
"id": "e4cb38a8-44cb-4cea-828c-590f223d4063",
"metadata": {
"tags": []
@@ -543,7 +543,7 @@
},
{
"cell_type": "code",
"execution_count": 15,
"execution_count": null,
"id": "de2d0416",
"metadata": {},
"outputs": [],
@@ -576,7 +576,7 @@
},
{
"cell_type": "code",
"execution_count": 16,
"execution_count": null,
"id": "ff3c37fb",
"metadata": {},
"outputs": [],
@@ -618,7 +618,7 @@
},
{
"cell_type": "code",
"execution_count": 17,
"execution_count": null,
"id": "972fee2f-86e2-445e-92d0-923f5690132a",
"metadata": {},
"outputs": [],
@@ -647,7 +647,7 @@
},
{
"cell_type": "code",
"execution_count": 18,
"execution_count": null,
"id": "9ee4e745-ef56-4b76-8230-fcbe56ac46aa",
"metadata": {
"tags": []
@@ -663,7 +663,7 @@
},
{
"cell_type": "code",
"execution_count": 20,
"execution_count": null,
"id": "73780054-afeb-4ce6-8255-51bf91f9f820",
"metadata": {
"tags": []
@@ -709,7 +709,7 @@
},
{
"cell_type": "code",
"execution_count": 21,
"execution_count": null,
"id": "24af6d04-d29e-4adb-a289-4c34c2cc7ec8",
"metadata": {
"tags": []
@@ -780,7 +780,7 @@
},
{
"cell_type": "code",
"execution_count": 22,
"execution_count": null,
"id": "e22ed1e1-b34d-43d1-ae8b-32b1fd5be53d",
"metadata": {
"tags": []
@@ -815,7 +815,7 @@
" 'tag': 'mlops'}]"
]
},
"execution_count": 22,
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
@@ -833,7 +833,7 @@
},
{
"cell_type": "code",
"execution_count": 23,
"execution_count": null,
"id": "294548a5-9edf-4dea-ab8d-dc7464246810",
"metadata": {
"tags": []
@@ -864,7 +864,7 @@
},
{
"cell_type": "code",
"execution_count": 24,
"execution_count": null,
"id": "29bca273-3ea8-4ce0-9fa9-fe19062b7c5b",
"metadata": {
"tags": []
@@ -917,7 +917,7 @@
},
{
"cell_type": "code",
"execution_count": 26,
"execution_count": null,
"id": "3e59a3b9-69d9-4bb5-8b88-0569fcc72f0c",
"metadata": {
"tags": []
@@ -1001,7 +1001,7 @@
},
{
"cell_type": "code",
"execution_count": 27,
"execution_count": null,
"id": "15ea136e",
"metadata": {},
"outputs": [],
@@ -1020,7 +1020,7 @@
},
{
"cell_type": "code",
"execution_count": 28,
"execution_count": null,
"id": "ec0b498a-97c1-488c-a6b9-dc63a8a9df4d",
"metadata": {
"tags": []
@@ -1065,7 +1065,7 @@
},
{
"cell_type": "code",
"execution_count": 29,
"execution_count": null,
"id": "4cc80311",
"metadata": {},
"outputs": [],
@@ -1080,7 +1080,7 @@
},
{
"cell_type": "code",
"execution_count": 30,
"execution_count": null,
"id": "6771b1d2",
"metadata": {},
"outputs": [

View File

@@ -0,0 +1,23 @@
from pathlib import Path
import nbformat
def clear_execution_numbers(nb_path):
with open(nb_path, "r", encoding="utf-8") as f:
nb = nbformat.read(f, as_version=4)
for cell in nb["cells"]:
if cell["cell_type"] == "code":
cell["execution_count"] = None
for output in cell["outputs"]:
if "execution_count" in output:
output["execution_count"] = None
with open(nb_path, "w", encoding="utf-8") as f:
nbformat.write(nb, f)
if __name__ == "__main__":
NOTEBOOK_DIR = Path(__file__).parent
notebook_fps = list(NOTEBOOK_DIR.glob("**/*.ipynb"))
for fp in notebook_fps:
clear_execution_numbers(fp)

File diff suppressed because one or more lines are too long

View File

@@ -7,9 +7,8 @@ nltk==3.8.1
numpy==1.24.3
numpyencoder==0.3.0
pandas==2.0.1
pretty-errors==1.2.25
python-dotenv==1.0.0
ray[air]==2.6.0
ray[air]==2.7.0
scikit-learn==1.2.2
snorkel==0.9.9
SQLAlchemy==1.4.48

View File

@@ -54,5 +54,7 @@ def test_preprocess(df, class_to_index):
def test_fit_transform(dataset_loc, preprocessor):
ds = data.load_data(dataset_loc=dataset_loc)
preprocessor.fit_transform(ds)
preprocessor = preprocessor.fit(ds)
preprocessed_ds = preprocessor.transform(ds)
assert len(preprocessor.class_to_index) == 4
assert ds.count() == preprocessed_ds.count()

View File

@@ -4,6 +4,7 @@ from pathlib import Path
import numpy as np
import pytest
import torch
from ray.train.torch import get_device
from madewithml import utils
@@ -42,9 +43,9 @@ def test_collate_fn():
}
processed_batch = utils.collate_fn(batch)
expected_batch = {
"ids": torch.tensor([[1, 2, 0], [1, 2, 3]], dtype=torch.int32),
"masks": torch.tensor([[1, 1, 0], [1, 1, 1]], dtype=torch.int32),
"targets": torch.tensor([3, 1], dtype=torch.int64),
"ids": torch.as_tensor([[1, 2, 0], [1, 2, 3]], dtype=torch.int32, device=get_device()),
"masks": torch.as_tensor([[1, 1, 0], [1, 1, 1]], dtype=torch.int32, device=get_device()),
"targets": torch.as_tensor([3, 1], dtype=torch.int64, device=get_device()),
}
for k in batch:
assert torch.allclose(processed_batch[k], expected_batch[k])

View File

@@ -1,7 +1,7 @@
import pytest
from ray.train.torch.torch_predictor import TorchPredictor
from madewithml import predict
from madewithml.predict import TorchPredictor
def pytest_addoption(parser):

View File

@@ -1,12 +1,9 @@
import numpy as np
import pandas as pd
import ray
from madewithml import predict
def get_label(text, predictor):
df = pd.DataFrame({"title": [text], "description": "", "tag": "other"})
z = predictor.predict(data=df)["predictions"]
preprocessor = predictor.get_preprocessor()
label = predict.decode(np.stack(z).argmax(1), preprocessor.index_to_class)[0]
return label
sample_ds = ray.data.from_items([{"title": text, "description": "", "tag": "other"}])
results = predict.predict_proba(ds=sample_ds, predictor=predictor)
return results[0]["prediction"]