At the time of writing (June 6, 2025), the SG4D100M API on Vertex AI is still under development. The following content was prepared before the release date and may differ from the final implementation.
Getting Started with SG4D100M
Objective
This notebook demonstrates how to use the Vertex AI API to generate molecular embeddings with SG4D100M.
Setting up
First, let’s set up the required configuration:
- API_ENDPOINT: the URL endpoint for sending your requests
- ACCESS_TOKEN: the bearer token for authenticating your HTTP requests
In [1]:
```python
import subprocess

# Replace these with your actual project details
PROJECT_NAME = "__YOUR_PROJECT_NAME_GOES_HERE__"
PROJECT_LOCATION = "__YOUR_PROJECT_LOCATION_GOES_HERE__"

# Construct the API endpoint URL.
# Segments carry no leading/trailing slashes, so '/'.join() does not produce "//".
API_ENDPOINT = '/'.join([
    f"https://{PROJECT_LOCATION}-aiplatform.googleapis.com/v1",
    f"projects/{PROJECT_NAME}",
    f"locations/{PROJECT_LOCATION}",
    "publishers/syntheticgestalt",
    "models/sg4d100m:rawPredict",
])

# Get the access token using the gcloud CLI
result = subprocess.run(
    args=["gcloud", "auth", "print-access-token"],
    capture_output=True,
    check=True,  # raise if gcloud fails instead of sending an empty token
)
ACCESS_TOKEN = result.stdout.decode("utf-8").strip()
```
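The endpoint URL is assembled from fixed path segments, and `'/'.join()` inserts a separator between every pair, so any segment that already starts or ends with `/` yields a doubled slash in the URL. The sketch below is one hedged way to make that construction robust; `build_endpoint` and its keyword defaults are hypothetical, with default values copied from the setup cell.

```python
def build_endpoint(project: str, location: str,
                   publisher: str = "syntheticgestalt",
                   model: str = "sg4d100m",
                   method: str = "rawPredict") -> str:
    """Assemble the Vertex AI model URL from its path segments (hypothetical helper)."""
    segments = [
        f"https://{location}-aiplatform.googleapis.com/v1",
        f"projects/{project}",
        f"locations/{location}",
        f"publishers/{publisher}",
        f"models/{model}:{method}",
    ]
    # Strip stray leading/trailing slashes so the join never yields "//"
    return "/".join(s.strip("/") for s in segments)
```

For example, `build_endpoint("my-project", "us-central1")` (placeholder values) returns `https://us-central1-aiplatform.googleapis.com/v1/projects/my-project/locations/us-central1/publishers/syntheticgestalt/models/sg4d100m:rawPredict`.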
Running Online Inference
Model Input
Let’s create embeddings for the following compounds:
In [2]:
```python
import polars as pl

# Example molecules (caffeine and theanine)
request_df = pl.DataFrame([
    {"smiles": "CN1C=NC2=C1C(=O)N(C(=O)N2C)C"},  # Caffeine
    {"smiles": "CCNC(=O)CC[C@H](N)C(=O)O"},  # Theanine
])
request_df
```
shape: (2, 1)

| smiles (str) |
|---|
| "CN1C=NC2=C1C(=O)N(C(=O)N2C)C" |
| "CCNC(=O)CC[C@H](N)C(=O)O" |
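`request_df.to_dicts()` turns each DataFrame row into a dictionary, so the `messages` field of the request body is a list of `{"smiles": ...}` records. The sketch below builds the same body with the standard library only; `build_payload` is a hypothetical helper, and the field names and version string are copied from the API invocation cell, which may still change before release.

```python
import json

def build_payload(smiles_list, version="sg4d100m-2025-06-06"):
    """Build the request body: one {"smiles": ...} record per molecule (hypothetical helper)."""
    return {
        "sg4d100m_version": version,
        # Same row-oriented shape that request_df.to_dicts() produces
        "messages": [{"smiles": s} for s in smiles_list],
    }

payload = build_payload([
    "CN1C=NC2=C1C(=O)N(C(=O)N2C)C",  # Caffeine
    "CCNC(=O)CC[C@H](N)C(=O)O",      # Theanine
])
print(json.dumps(payload, indent=2))
```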
API Invocation
We request predictions by sending an HTTP POST request to the endpoint:
In [8]:
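```python
import requests

# Send the molecules to the endpoint as a JSON POST request
response = requests.post(
    url=API_ENDPOINT,
    headers={
        "Authorization": f"Bearer {ACCESS_TOKEN}",
        "Content-Type": "application/json",
    },
    json={
        "sg4d100m_version": "sg4d100m-2025-06-06",
        "messages": request_df.to_dicts(),  # one {"smiles": ...} dict per row
    },
)
response.raise_for_status()  # surface HTTP errors (e.g. 401, 404) before parsing the body
```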
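If you prefer to avoid the `requests` dependency, the same call can be made with Python's standard library. In this sketch, `make_request` and `send` are hypothetical names; the headers and JSON body mirror the `requests.post` call above, and `send` is the only part that touches the network.

```python
import json
import urllib.request

def make_request(endpoint: str, token: str, payload: dict) -> urllib.request.Request:
    """Assemble (but do not send) a POST with the same headers and JSON body."""
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def send(req: urllib.request.Request, timeout: float = 60.0):
    """Send the request and decode the JSON response; raises HTTPError on 4xx/5xx."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage would look like `send(make_request(API_ENDPOINT, ACCESS_TOKEN, body))`, where `body` is the same dictionary passed as `json=` above. Note that `urllib` normalizes header names, so `Content-Type` is stored as `Content-type`.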
Model Output
The model returns molecular embeddings that can be converted to a DataFrame:
In [9]:
```python
response_df = pl.from_dicts(response.json())
response_df
```
Next Steps
- Use the embeddings for similarity search
- Perform clustering analysis
- Predict molecular properties
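For the similarity-search step, a common choice is cosine similarity between embedding vectors. This is a minimal sketch, assuming each embedding has already been pulled out of `response_df` as a plain list of floats; the response column names are not documented here, so that extraction step is left out, and `cosine_similarity` and `top_k_similar` are hypothetical helpers.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)

def top_k_similar(query, embeddings, k=5):
    """Return (index, score) pairs for the k embeddings most similar to query."""
    scored = [(i, cosine_similarity(query, e)) for i, e in enumerate(embeddings)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

This is a brute-force linear scan, which is fine for a handful of molecules; for large compound libraries a dedicated vector index would be the more practical choice.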