updated to Ray 2.7

commit b98bd5b1ae
parent 71b3d50a05
README.md
@@ -108,6 +108,7 @@ touch .env
 ```bash
 # Inside .env
 GITHUB_USERNAME="CHANGE_THIS_TO_YOUR_USERNAME"  # ← CHANGE THIS
+```
 ```bash
 source .env
 ```
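
For Python sessions, the same variables can be loaded without `source .env` via the `python-dotenv` package already pinned in `requirements.txt`; a minimal sketch (the variable name comes from the snippet above):

```python
import os

from dotenv import load_dotenv  # python-dotenv, pinned in requirements.txt

load_dotenv()  # reads .env from the current working directory
print(os.environ["GITHUB_USERNAME"])
```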
@@ -120,8 +121,6 @@ Now we're ready to clone the repository that has all of our code:
 
 ```bash
 git clone https://github.com/GokuMohandas/Made-With-ML.git .
-git remote set-url origin https://github.com/$GITHUB_USERNAME/Made-With-ML.git  # <-- CHANGE THIS to your username
-git checkout -b dev
 ```
 
 ### Virtual environment
@@ -289,7 +288,6 @@ python madewithml/evaluate.py \
 
 ### Inference
 ```bash
-# Get run ID
 export EXPERIMENT_NAME="llm"
 export RUN_ID=$(python madewithml/predict.py get-best-run-id --experiment-name $EXPERIMENT_NAME --metric val_loss --mode ASC)
 python madewithml/predict.py predict \
@@ -485,17 +483,23 @@ We're not going to manually deploy our application every time we make a change.
 <img src="https://madewithml.com/static/images/mlops/cicd/cicd.png">
 </div>
 
-1. We'll start by adding the necessary credentials to the [`/settings/secrets/actions`](https://github.com/GokuMohandas/Made-With-ML/settings/secrets/actions) page of our GitHub repository.
+1. Create a new GitHub branch to save our changes to and execute CI/CD workloads:
+```bash
+git remote set-url origin https://github.com/$GITHUB_USERNAME/Made-With-ML.git  # <-- CHANGE THIS to your username
+git checkout -b dev
+```
+
+2. We'll start by adding the necessary credentials to the [`/settings/secrets/actions`](https://github.com/GokuMohandas/Made-With-ML/settings/secrets/actions) page of our GitHub repository.
 
 ``` bash
 export ANYSCALE_HOST=https://console.anyscale.com
 export ANYSCALE_CLI_TOKEN=$YOUR_CLI_TOKEN  # retrieved from https://console.anyscale.com/o/madewithml/credentials
 ```
 
-2. Now we can make changes to our code (not on `main` branch) and push them to GitHub. But in order to push our code to GitHub, we'll need to first authenticate with our credentials before pushing to our repository:
+3. Now we can make changes to our code (not on the `main` branch) and push them to GitHub. But in order to push our code to GitHub, we'll need to first authenticate with our credentials before pushing to our repository:
 
 ```bash
-git config --global user.name "Your Name"  # <-- CHANGE THIS to your name
+git config --global user.name $GITHUB_USERNAME  # <-- CHANGE THIS to your username
 git config --global user.email you@example.com  # <-- CHANGE THIS to your email
 git add .
 git commit -m ""  # <-- CHANGE THIS to your message
@@ -504,13 +508,13 @@ git push origin dev
 
 Now you will be prompted to enter your username and password (personal access token). Follow these steps to get personal access token: [New GitHub personal access token](https://github.com/settings/tokens/new) → Add a name → Toggle `repo` and `workflow` → Click `Generate token` (scroll down) → Copy the token and paste it when prompted for your password.
 
-3. Now we can start a PR from this branch to our `main` branch and this will trigger the [workloads workflow](/.github/workflows/workloads.yaml). If the workflow (Anyscale Jobs) succeeds, this will produce comments with the training and evaluation results directly on the PR.
+4. Now we can start a PR from this branch to our `main` branch and this will trigger the [workloads workflow](/.github/workflows/workloads.yaml). If the workflow (Anyscale Jobs) succeeds, this will produce comments with the training and evaluation results directly on the PR.
 
 <div align="center">
 <img src="https://madewithml.com/static/images/mlops/cicd/comments.png">
 </div>
 
-4. If we like the results, we can merge the PR into the `main` branch. This will trigger the [serve workflow](/.github/workflows/serve.yaml) which will rollout our new service to production!
+5. If we like the results, we can merge the PR into the `main` branch. This will trigger the [serve workflow](/.github/workflows/serve.yaml) which will rollout our new service to production!
 
 ### Continual learning
 
@@ -5,7 +5,6 @@ import sys
 from pathlib import Path
 
 import mlflow
-import pretty_errors  # NOQA: F401 (imported but unused)
 
 # Directories
 ROOT_DIR = Path(__file__).parent.parent.absolute()
@@ -5,7 +5,6 @@ import numpy as np
 import pandas as pd
 import ray
 from ray.data import Dataset
-from ray.data.preprocessor import Preprocessor
 from sklearn.model_selection import train_test_split
 from transformers import BertTokenizer
 
@@ -135,13 +134,18 @@ def preprocess(df: pd.DataFrame, class_to_index: Dict) -> Dict:
     return outputs
 
 
-class CustomPreprocessor(Preprocessor):
+class CustomPreprocessor:
     """Custom preprocessor class."""
 
-    def _fit(self, ds):
+    def __init__(self, class_to_index={}):
+        self.class_to_index = class_to_index or {}  # mutable defaults
+        self.index_to_class = {v: k for k, v in self.class_to_index.items()}
+
+    def fit(self, ds):
         tags = ds.unique(column="tag")
         self.class_to_index = {tag: i for i, tag in enumerate(tags)}
         self.index_to_class = {v: k for k, v in self.class_to_index.items()}
+        return self
 
-    def _transform_pandas(self, batch):  # could also do _transform_numpy
-        return preprocess(batch, class_to_index=self.class_to_index)
+    def transform(self, ds):
+        return ds.map_batches(preprocess, fn_kwargs={"class_to_index": self.class_to_index}, batch_format="pandas")
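
For context, a minimal sketch of how the rewritten preprocessor is used now that it no longer subclasses `ray.data.preprocessor.Preprocessor`: `fit()` returns `self`, and `transform()` maps `preprocess` over the dataset. The sample rows are illustrative; the real project loads them with `data.load_data()`.

```python
import ray

from madewithml.data import CustomPreprocessor

# Illustrative rows only; real data comes from data.load_data().
ds = ray.data.from_items([
    {"title": "Attention is all you need", "description": "transformers", "tag": "natural-language-processing"},
    {"title": "Segment anything", "description": "image masks", "tag": "computer-vision"},
])

preprocessor = CustomPreprocessor().fit(ds)   # fit() now returns self, so calls can chain
preprocessed_ds = preprocessor.transform(ds)  # ds.map_batches(preprocess, ...) under the hood
print(preprocessor.class_to_index)            # e.g. {"natural-language-processing": 0, "computer-vision": 1}
```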
@@ -8,13 +8,13 @@ import ray
 import ray.train.torch  # NOQA: F401 (imported but unused)
 import typer
 from ray.data import Dataset
-from ray.train.torch.torch_predictor import TorchPredictor
 from sklearn.metrics import precision_recall_fscore_support
 from snorkel.slicing import PandasSFApplier, slicing_function
 from typing_extensions import Annotated
 
 from madewithml import predict, utils
 from madewithml.config import logger
+from madewithml.predict import TorchPredictor
 
 # Initialize Typer CLI app
 app = typer.Typer()
@@ -133,8 +133,8 @@ def evaluate(
     y_true = np.stack([item["targets"] for item in values])
 
     # y_pred
-    z = predictor.predict(data=ds.to_pandas())["predictions"]
-    y_pred = np.stack(z).argmax(1)
+    predictions = preprocessed_ds.map_batches(predictor).take_all()
+    y_pred = np.array([d["output"] for d in predictions])
 
     # Metrics
     metrics = {
@ -1,13 +1,20 @@
|
|||||||
|
import json
|
||||||
|
import os
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
import torch
|
import torch
|
||||||
import torch.nn as nn
|
import torch.nn as nn
|
||||||
|
import torch.nn.functional as F
|
||||||
|
from transformers import BertModel
|
||||||
|
|
||||||
|
|
||||||
class FinetunedLLM(nn.Module): # pragma: no cover, torch model
|
class FinetunedLLM(nn.Module):
|
||||||
"""Model architecture for a Large Language Model (LLM) that we will fine-tune."""
|
|
||||||
|
|
||||||
def __init__(self, llm, dropout_p, embedding_dim, num_classes):
|
def __init__(self, llm, dropout_p, embedding_dim, num_classes):
|
||||||
super(FinetunedLLM, self).__init__()
|
super(FinetunedLLM, self).__init__()
|
||||||
self.llm = llm
|
self.llm = llm
|
||||||
|
self.dropout_p = dropout_p
|
||||||
|
self.embedding_dim = embedding_dim
|
||||||
|
self.num_classes = num_classes
|
||||||
self.dropout = torch.nn.Dropout(dropout_p)
|
self.dropout = torch.nn.Dropout(dropout_p)
|
||||||
self.fc1 = torch.nn.Linear(embedding_dim, num_classes)
|
self.fc1 = torch.nn.Linear(embedding_dim, num_classes)
|
||||||
|
|
||||||
@@ -17,3 +24,36 @@ class FinetunedLLM(nn.Module):  # pragma: no cover, torch model
         z = self.dropout(pool)
         z = self.fc1(z)
         return z
+
+    @torch.inference_mode()
+    def predict(self, batch):
+        self.eval()
+        z = self(batch)
+        y_pred = torch.argmax(z, dim=1).cpu().numpy()
+        return y_pred
+
+    @torch.inference_mode()
+    def predict_proba(self, batch):
+        self.eval()
+        z = self(batch)
+        y_probs = F.softmax(z, dim=1).cpu().numpy()
+        return y_probs
+
+    def save(self, dp):
+        with open(Path(dp, "args.json"), "w") as fp:
+            contents = {
+                "dropout_p": self.dropout_p,
+                "embedding_dim": self.embedding_dim,
+                "num_classes": self.num_classes,
+            }
+            json.dump(contents, fp, indent=4, sort_keys=False)
+        torch.save(self.state_dict(), os.path.join(dp, "model.pt"))
+
+    @classmethod
+    def load(cls, args_fp, state_dict_fp):
+        with open(args_fp, "r") as fp:
+            kwargs = json.load(fp=fp)
+        llm = BertModel.from_pretrained("allenai/scibert_scivocab_uncased", return_dict=False)
+        model = cls(llm=llm, **kwargs)
+        model.load_state_dict(torch.load(state_dict_fp, map_location=torch.device("cpu")))
+        return model
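
A sketch of the save/load round trip these new methods enable, assuming a freshly constructed model (the hyperparameter values here are illustrative):

```python
import tempfile
from pathlib import Path

from transformers import BertModel

from madewithml.models import FinetunedLLM

llm = BertModel.from_pretrained("allenai/scibert_scivocab_uncased", return_dict=False)
model = FinetunedLLM(llm=llm, dropout_p=0.5, embedding_dim=llm.config.hidden_size, num_classes=4)

with tempfile.TemporaryDirectory() as dp:
    model.save(dp=dp)  # writes args.json (hyperparameters) and model.pt (state dict)
    restored = FinetunedLLM.load(Path(dp, "args.json"), Path(dp, "model.pt"))
```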
@@ -1,19 +1,20 @@
 import json
+from pathlib import Path
 from typing import Any, Dict, Iterable, List
 from urllib.parse import urlparse
 
 import numpy as np
-import pandas as pd
 import ray
-import torch
 import typer
 from numpyencoder import NumpyEncoder
 from ray.air import Result
-from ray.train.torch import TorchPredictor
 from ray.train.torch.torch_checkpoint import TorchCheckpoint
 from typing_extensions import Annotated
 
 from madewithml.config import logger, mlflow
+from madewithml.data import CustomPreprocessor
+from madewithml.models import FinetunedLLM
+from madewithml.utils import collate_fn
 
 # Initialize Typer CLI app
 app = typer.Typer()
@@ -48,25 +49,51 @@ def format_prob(prob: Iterable, index_to_class: Dict) -> Dict:
     return d
 
 
-def predict_with_proba(
-    df: pd.DataFrame,
-    predictor: ray.train.torch.torch_predictor.TorchPredictor,
+class TorchPredictor:
+    def __init__(self, preprocessor, model):
+        self.preprocessor = preprocessor
+        self.model = model
+        self.model.eval()
+
+    def __call__(self, batch):
+        results = self.model.predict(collate_fn(batch))
+        return {"output": results}
+
+    def predict_proba(self, batch):
+        results = self.model.predict_proba(collate_fn(batch))
+        return {"output": results}
+
+    def get_preprocessor(self):
+        return self.preprocessor
+
+    @classmethod
+    def from_checkpoint(cls, checkpoint):
+        metadata = checkpoint.get_metadata()
+        preprocessor = CustomPreprocessor(class_to_index=metadata["class_to_index"])
+        model = FinetunedLLM.load(Path(checkpoint.path, "args.json"), Path(checkpoint.path, "model.pt"))
+        return cls(preprocessor=preprocessor, model=model)
+
+
+def predict_proba(
+    ds: ray.data.dataset.Dataset,
+    predictor: TorchPredictor,
 ) -> List:  # pragma: no cover, tested with inference workload
     """Predict tags (with probabilities) for input data from a dataframe.
 
     Args:
         df (pd.DataFrame): dataframe with input features.
-        predictor (ray.train.torch.torch_predictor.TorchPredictor): loaded predictor from a checkpoint.
+        predictor (TorchPredictor): loaded predictor from a checkpoint.
 
     Returns:
         List: list of predicted labels.
     """
     preprocessor = predictor.get_preprocessor()
-    z = predictor.predict(data=df)["predictions"]
-    y_prob = torch.tensor(np.stack(z)).softmax(dim=1).numpy()
+    preprocessed_ds = preprocessor.transform(ds)
+    outputs = preprocessed_ds.map_batches(predictor.predict_proba)
+    y_prob = np.array([d["output"] for d in outputs.take_all()])
     results = []
     for i, prob in enumerate(y_prob):
-        tag = decode([z[i].argmax()], preprocessor.index_to_class)[0]
+        tag = preprocessor.index_to_class[prob.argmax()]
         results.append({"prediction": tag, "probabilities": format_prob(prob, preprocessor.index_to_class)})
     return results
 
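
Putting the pieces together, a minimal sketch of batch inference with the new `TorchPredictor` (assumes `checkpoint` is a Ray `Checkpoint` from a finished run whose metadata carries `class_to_index`, as set up in the training changes below):

```python
import ray

from madewithml.predict import TorchPredictor, predict_proba

predictor = TorchPredictor.from_checkpoint(checkpoint)  # `checkpoint` assumed from a finished run

ds = ray.data.from_items([{"title": "Transfer learning with BERT", "description": "", "tag": "other"}])
results = predict_proba(ds=ds, predictor=predictor)
print(results[0]["prediction"], results[0]["probabilities"])
```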
@@ -125,11 +152,10 @@ def predict(
     # Load components
     best_checkpoint = get_best_checkpoint(run_id=run_id)
     predictor = TorchPredictor.from_checkpoint(best_checkpoint)
-    # preprocessor = predictor.get_preprocessor()
 
     # Predict
-    sample_df = pd.DataFrame([{"title": title, "description": description, "tag": "other"}])
-    results = predict_with_proba(df=sample_df, predictor=predictor)
+    sample_ds = ray.data.from_items([{"title": title, "description": description, "tag": "other"}])
+    results = predict_proba(ds=sample_ds, predictor=predictor)
     logger.info(json.dumps(results, cls=NumpyEncoder, indent=2))
     return results
 
@@ -3,11 +3,9 @@ import os
 from http import HTTPStatus
 from typing import Dict
 
-import pandas as pd
 import ray
 from fastapi import FastAPI
 from ray import serve
-from ray.train.torch import TorchPredictor
 from starlette.requests import Request
 
 from madewithml import evaluate, predict
@@ -21,7 +19,7 @@ app = FastAPI(
 )
 
 
-@serve.deployment(route_prefix="/", num_replicas="1", ray_actor_options={"num_cpus": 8, "num_gpus": 0})
+@serve.deployment(num_replicas="1", ray_actor_options={"num_cpus": 8, "num_gpus": 0})
 @serve.ingress(app)
 class ModelDeployment:
     def __init__(self, run_id: str, threshold: int = 0.9):
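
Dropping `route_prefix` from the decorator matches newer Ray Serve versions, which expect it at deploy time instead; a hedged sketch of the corresponding `serve.run` call (argument values are illustrative):

```python
from ray import serve

# route_prefix now belongs to serve.run rather than @serve.deployment.
serve.run(ModelDeployment.bind(run_id=run_id, threshold=0.9), route_prefix="/")
```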
@@ -30,8 +28,7 @@ class ModelDeployment:
         self.threshold = threshold
         mlflow.set_tracking_uri(MLFLOW_TRACKING_URI)  # so workers have access to model registry
         best_checkpoint = predict.get_best_checkpoint(run_id=run_id)
-        self.predictor = TorchPredictor.from_checkpoint(best_checkpoint)
-        self.preprocessor = self.predictor.get_preprocessor()
+        self.predictor = predict.TorchPredictor.from_checkpoint(best_checkpoint)
 
     @app.get("/")
     def _index(self) -> Dict:
@@ -55,11 +52,10 @@ class ModelDeployment:
         return {"results": results}
 
     @app.post("/predict/")
-    async def _predict(self, request: Request) -> Dict:
-        # Get prediction
+    async def _predict(self, request: Request):
         data = await request.json()
-        df = pd.DataFrame([{"title": data.get("title", ""), "description": data.get("description", ""), "tag": ""}])
-        results = predict.predict_with_proba(df=df, predictor=self.predictor)
+        sample_ds = ray.data.from_items([{"title": data.get("title", ""), "description": data.get("description", ""), "tag": ""}])
+        results = predict.predict_proba(ds=sample_ds, predictor=self.predictor)
 
         # Apply custom logic
         for i, result in enumerate(results):
@ -1,6 +1,7 @@
|
|||||||
import datetime
|
import datetime
|
||||||
import json
|
import json
|
||||||
import os
|
import os
|
||||||
|
import tempfile
|
||||||
from typing import Tuple
|
from typing import Tuple
|
||||||
|
|
||||||
import numpy as np
|
import numpy as np
|
||||||
@@ -10,21 +11,23 @@ import torch
 import torch.nn as nn
 import torch.nn.functional as F
 import typer
-from ray.air import session
-from ray.air.config import (
+from ray.air.integrations.mlflow import MLflowLoggerCallback
+from ray.data import Dataset
+from ray.train import (
+    Checkpoint,
     CheckpointConfig,
-    DatasetConfig,
+    DataConfig,
     RunConfig,
     ScalingConfig,
 )
-from ray.air.integrations.mlflow import MLflowLoggerCallback
-from ray.data import Dataset
-from ray.train.torch import TorchCheckpoint, TorchTrainer
+from ray.train.torch import TorchTrainer
+from torch.nn.parallel.distributed import DistributedDataParallel
 from transformers import BertModel
 from typing_extensions import Annotated
 
-from madewithml import data, models, utils
+from madewithml import data, utils
 from madewithml.config import EFS_DIR, MLFLOW_TRACKING_URI, logger
+from madewithml.models import FinetunedLLM
 
 # Initialize Typer CLI app
 app = typer.Typer()
@@ -106,18 +109,18 @@ def train_loop_per_worker(config: dict) -> None:  # pragma: no cover, tested via
     lr = config["lr"]
     lr_factor = config["lr_factor"]
     lr_patience = config["lr_patience"]
-    batch_size = config["batch_size"]
     num_epochs = config["num_epochs"]
+    batch_size = config["batch_size"]
     num_classes = config["num_classes"]
 
     # Get datasets
     utils.set_seeds()
-    train_ds = session.get_dataset_shard("train")
-    val_ds = session.get_dataset_shard("val")
+    train_ds = train.get_dataset_shard("train")
+    val_ds = train.get_dataset_shard("val")
 
     # Model
     llm = BertModel.from_pretrained("allenai/scibert_scivocab_uncased", return_dict=False)
-    model = models.FinetunedLLM(llm=llm, dropout_p=dropout_p, embedding_dim=llm.config.hidden_size, num_classes=num_classes)
+    model = FinetunedLLM(llm=llm, dropout_p=dropout_p, embedding_dim=llm.config.hidden_size, num_classes=num_classes)
     model = train.torch.prepare_model(model)
 
     # Training components
@@ -126,7 +129,8 @@ def train_loop_per_worker(config: dict) -> None:  # pragma: no cover, tested via
     scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="min", factor=lr_factor, patience=lr_patience)
 
     # Training
-    batch_size_per_worker = batch_size // session.get_world_size()
+    num_workers = train.get_context().get_world_size()
+    batch_size_per_worker = batch_size // num_workers
     for epoch in range(num_epochs):
         # Step
         train_loss = train_step(train_ds, batch_size_per_worker, model, num_classes, loss_fn, optimizer)
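
This reflects the Ray 2.7 context API: `session.get_world_size()` becomes `train.get_context().get_world_size()`. The pattern in isolation, as a sketch (only valid inside a Ray Train worker loop):

```python
from ray import train

def per_worker_batch_size(global_batch_size: int) -> int:
    # The context (world size, rank, etc.) is only available inside train_loop_per_worker.
    num_workers = train.get_context().get_world_size()
    return global_batch_size // num_workers
```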
@@ -134,9 +138,14 @@ def train_loop_per_worker(config: dict) -> None:  # pragma: no cover, tested via
         scheduler.step(val_loss)
 
         # Checkpoint
-        metrics = dict(epoch=epoch, lr=optimizer.param_groups[0]["lr"], train_loss=train_loss, val_loss=val_loss)
-        checkpoint = TorchCheckpoint.from_model(model=model)
-        session.report(metrics, checkpoint=checkpoint)
+        with tempfile.TemporaryDirectory() as dp:
+            if isinstance(model, DistributedDataParallel):  # cpu
+                model.module.save(dp=dp)
+            else:
+                model.save(dp=dp)
+            metrics = dict(epoch=epoch, lr=optimizer.param_groups[0]["lr"], train_loss=train_loss, val_loss=val_loss)
+            checkpoint = Checkpoint.from_directory(dp)
+            train.report(metrics, checkpoint=checkpoint)
 
 
 @app.command()
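
This is the Ray 2.7 checkpointing idiom: serialize into a temporary directory, wrap it with `Checkpoint.from_directory`, and hand it to `train.report` alongside the metrics. Stripped down to just that idiom, as a sketch:

```python
import tempfile

from ray import train
from ray.train import Checkpoint

def report_with_checkpoint(model, metrics: dict) -> None:
    with tempfile.TemporaryDirectory() as dp:
        model.save(dp=dp)  # assumes the save(dp) method added to FinetunedLLM above
        train.report(metrics, checkpoint=Checkpoint.from_directory(dp))
```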
@@ -183,7 +192,6 @@ def train_model(
         num_workers=num_workers,
         use_gpu=bool(gpu_per_worker),
         resources_per_worker={"CPU": cpu_per_worker, "GPU": gpu_per_worker},
-        _max_cpu_fraction_per_node=0.8,
     )
 
     # Checkpoint config
@@ -201,7 +209,7 @@ def train_model(
     )
 
     # Run config
-    run_config = RunConfig(callbacks=[mlflow_callback], checkpoint_config=checkpoint_config, storage_path=EFS_DIR)
+    run_config = RunConfig(callbacks=[mlflow_callback], checkpoint_config=checkpoint_config, storage_path=EFS_DIR, local_dir=EFS_DIR)
 
     # Dataset
     ds = data.load_data(dataset_loc=dataset_loc, num_samples=train_loop_config["num_samples"])
|
|||||||
train_loop_config["num_classes"] = len(tags)
|
train_loop_config["num_classes"] = len(tags)
|
||||||
|
|
||||||
# Dataset config
|
# Dataset config
|
||||||
dataset_config = {
|
options = ray.data.ExecutionOptions(preserve_order=True)
|
||||||
"train": DatasetConfig(fit=False, transform=False, randomize_block_order=False),
|
dataset_config = DataConfig(datasets_to_split=["train"], execution_options=options)
|
||||||
"val": DatasetConfig(fit=False, transform=False, randomize_block_order=False),
|
|
||||||
}
|
|
||||||
|
|
||||||
# Preprocess
|
# Preprocess
|
||||||
preprocessor = data.CustomPreprocessor()
|
preprocessor = data.CustomPreprocessor()
|
||||||
train_ds = preprocessor.fit_transform(train_ds)
|
preprocessor = preprocessor.fit(train_ds)
|
||||||
|
train_ds = preprocessor.transform(train_ds)
|
||||||
val_ds = preprocessor.transform(val_ds)
|
val_ds = preprocessor.transform(val_ds)
|
||||||
train_ds = train_ds.materialize()
|
train_ds = train_ds.materialize()
|
||||||
val_ds = val_ds.materialize()
|
val_ds = val_ds.materialize()
|
||||||
@@ -230,7 +237,7 @@ def train_model(
         run_config=run_config,
         datasets={"train": train_ds, "val": val_ds},
         dataset_config=dataset_config,
-        preprocessor=preprocessor,
+        metadata={"class_to_index": preprocessor.class_to_index},
     )
 
     # Train
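
Passing `metadata=` to `TorchTrainer` replaces the old `preprocessor=` argument: the dict gets attached to every checkpoint the run produces, which is how `TorchPredictor.from_checkpoint` above recovers the class mapping. A sketch of reading it back from a finished run:

```python
# `result` is assumed to be the ray.train.Result returned by trainer.fit().
checkpoint = result.checkpoint
class_to_index = checkpoint.get_metadata()["class_to_index"]  # set via TorchTrainer(metadata=...)
```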
@@ -73,7 +73,6 @@ def tune_models(
         num_workers=num_workers,
         use_gpu=bool(gpu_per_worker),
         resources_per_worker={"CPU": cpu_per_worker, "GPU": gpu_per_worker},
-        _max_cpu_fraction_per_node=0.8,
     )
 
     # Dataset
@@ -90,7 +89,8 @@ def tune_models(
 
     # Preprocess
     preprocessor = data.CustomPreprocessor()
-    train_ds = preprocessor.fit_transform(train_ds)
+    preprocessor = preprocessor.fit(train_ds)
+    train_ds = preprocessor.transform(train_ds)
     val_ds = preprocessor.transform(val_ds)
     train_ds = train_ds.materialize()
     val_ds = val_ds.materialize()
@@ -102,7 +102,7 @@ def tune_models(
         scaling_config=scaling_config,
         datasets={"train": train_ds, "val": val_ds},
         dataset_config=dataset_config,
-        preprocessor=preprocessor,
+        metadata={"class_to_index": preprocessor.class_to_index},
     )
 
     # Checkpoint configuration
@@ -118,7 +118,7 @@ def tune_models(
         experiment_name=experiment_name,
         save_artifact=True,
     )
-    run_config = RunConfig(callbacks=[mlflow_callback], checkpoint_config=checkpoint_config, storage_path=EFS_DIR)
+    run_config = RunConfig(callbacks=[mlflow_callback], checkpoint_config=checkpoint_config, storage_path=EFS_DIR, local_dir=EFS_DIR)
 
     # Hyperparameters to start with
     initial_params = json.loads(initial_params)
File diff suppressed because one or more lines are too long
@@ -7,7 +7,6 @@ nltk==3.8.1
 numpy==1.24.3
 numpyencoder==0.3.0
 pandas==2.0.1
-pretty-errors==1.2.25
 python-dotenv==1.0.0
 ray[air]==2.6.0
 scikit-learn==1.2.2
@@ -54,5 +54,7 @@ def test_preprocess(df, class_to_index):
 
 def test_fit_transform(dataset_loc, preprocessor):
     ds = data.load_data(dataset_loc=dataset_loc)
-    preprocessor.fit_transform(ds)
+    preprocessor = preprocessor.fit(ds)
+    preprocessed_ds = preprocessor.transform(ds)
     assert len(preprocessor.class_to_index) == 4
+    assert ds.count() == preprocessed_ds.count()
@@ -4,6 +4,7 @@ from pathlib import Path
 import numpy as np
 import pytest
 import torch
+from ray.train.torch import get_device
 
 from madewithml import utils
 
@@ -42,9 +43,9 @@ def test_collate_fn():
     }
     processed_batch = utils.collate_fn(batch)
     expected_batch = {
-        "ids": torch.tensor([[1, 2, 0], [1, 2, 3]], dtype=torch.int32),
-        "masks": torch.tensor([[1, 1, 0], [1, 1, 1]], dtype=torch.int32),
-        "targets": torch.tensor([3, 1], dtype=torch.int64),
+        "ids": torch.as_tensor([[1, 2, 0], [1, 2, 3]], dtype=torch.int32, device=get_device()),
+        "masks": torch.as_tensor([[1, 1, 0], [1, 1, 1]], dtype=torch.int32, device=get_device()),
+        "targets": torch.as_tensor([3, 1], dtype=torch.int64, device=get_device()),
     }
     for k in batch:
         assert torch.allclose(processed_batch[k], expected_batch[k])
@@ -1,7 +1,7 @@
 import pytest
-from ray.train.torch.torch_predictor import TorchPredictor
 
 from madewithml import predict
+from madewithml.predict import TorchPredictor
 
 
 def pytest_addoption(parser):
@@ -1,12 +1,9 @@
-import numpy as np
-import pandas as pd
+import ray
 
 from madewithml import predict
 
 
 def get_label(text, predictor):
-    df = pd.DataFrame({"title": [text], "description": "", "tag": "other"})
-    z = predictor.predict(data=df)["predictions"]
-    preprocessor = predictor.get_preprocessor()
-    label = predict.decode(np.stack(z).argmax(1), preprocessor.index_to_class)[0]
-    return label
+    sample_ds = ray.data.from_items([{"title": text, "description": "", "tag": "other"}])
+    results = predict.predict_proba(ds=sample_ds, predictor=predictor)
+    return results[0]["prediction"]
 