The Rime Partner Service is available in CLI version 1.39.0 and later.
Setup
- Create a Rime account and get an API key. Add the key as a secret in Cerebrium with the name “RIME_API_KEY”.
- Create a Cerebrium app with the CLI:
- Rime services use a simplified TOML configuration with the `[cerebrium.runtime.rime]` section. Create a `cerebrium.toml` file with the following:
Disable auth because the Rime API key in the header handles authentication.
The Rime Server validates the API key directly.
- Run `cerebrium deploy` to deploy the Rime service. The output should appear as follows:
- Send requests to the HTTP Rime service using the deployment URL from the output:
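As a minimal Python sketch of such a request: the deployment URL below is a placeholder, and the JSON fields (`text`, `speaker`) and the bearer-token header format are assumptions for illustration, not confirmed by this page. Check the Rime API reference for the actual request schema.

```python
import json
import urllib.request

# Hypothetical placeholder: replace with the URL printed by `cerebrium deploy`.
DEPLOYMENT_URL = "https://example.invalid/rime"


def build_rime_request(url: str, api_key: str, text: str, speaker: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the deployed Rime service."""
    # Assumed payload shape; these field names are illustrative only.
    body = json.dumps({"text": text, "speaker": speaker}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            # Assumption: the Rime API key is passed as a bearer token.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def send_rime_request(request: urllib.request.Request) -> bytes:
    """Send the request and return the raw response body."""
    with urllib.request.urlopen(request) as response:
        return response.read()
```

To use it, read the key from the `RIME_API_KEY` environment variable, call `build_rime_request(DEPLOYMENT_URL, key, "Hello", "some_speaker")`, and pass the result to `send_rime_request`. Building and sending are kept separate so the headers and body can be inspected before anything goes over the network.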
Runtime Configuration
The `[cerebrium.runtime.rime]` section supports the following parameters:
| Option | Type | Default | Description |
|---|---|---|---|
| `port` | integer | required | Port the Rime server listens on. Typically 8001. |
| `model_name` | string | none | Rime model to load (e.g. "arcana", "mist", "mistv2"). Defaults to Rime’s server default if not set. |
| `language` | string | none | Language code for the model (e.g. "en", "es"). Defaults to Rime’s server default if not set. |
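The parameters above could be assembled into a section like the one this sketch renders. The helper `render_rime_runtime` is hypothetical (it is not part of the Cerebrium CLI) and only illustrates which keys are required and which may be omitted.

```python
from typing import Optional


def render_rime_runtime(port: int,
                        model_name: Optional[str] = None,
                        language: Optional[str] = None) -> str:
    """Render a [cerebrium.runtime.rime] TOML section from the documented options."""
    if not 1 <= port <= 65535:
        raise ValueError("port must be between 1 and 65535")
    lines = ["[cerebrium.runtime.rime]", f"port = {port}"]
    # Optional keys are left out so the Rime server falls back to its own defaults.
    if model_name is not None:
        lines.append(f'model_name = "{model_name}"')
    if language is not None:
        lines.append(f'language = "{language}"')
    return "\n".join(lines) + "\n"


print(render_rime_runtime(8001, model_name="arcana", language="en"))
```

Calling `render_rime_runtime(8001)` with no optional arguments yields a section containing only `port`, which matches the table: `model_name` and `language` fall back to the server defaults when unset.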
Scaling and Concurrency
Rime services support independent scaling configurations:

- `min_replicas`: Minimum instances to maintain (0 for scale-to-zero). Recommended: 1.
- `max_replicas`: Maximum instances during high load.
- `replica_concurrency`: Concurrent requests per instance. Recommended: 3.
- `cooldown`: Time window (in seconds) that must pass at reduced concurrency before scaling down. Recommended: 50.
- `compute`: Instance type. Recommended: `AMPERE_A10`.
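To get a rough feel for how these settings interact, the sketch below estimates how many replicas a given number of concurrent requests would need. It is a simplified model, not Cerebrium's actual autoscaling logic, and the `max_replicas` default of 5 is an arbitrary value chosen for illustration.

```python
import math


def replicas_needed(concurrent_requests: int,
                    replica_concurrency: int = 3,
                    min_replicas: int = 1,
                    max_replicas: int = 5) -> int:
    """Estimate replicas for a given load, clamped to the configured bounds."""
    if replica_concurrency < 1:
        raise ValueError("replica_concurrency must be at least 1")
    if min_replicas > max_replicas:
        raise ValueError("min_replicas must not exceed max_replicas")
    # Each replica serves up to `replica_concurrency` requests at once.
    wanted = math.ceil(concurrent_requests / replica_concurrency)
    return max(min_replicas, min(max_replicas, wanted))


def max_capacity(replica_concurrency: int = 3, max_replicas: int = 5) -> int:
    """Upper bound on requests served concurrently at full scale."""
    return replica_concurrency * max_replicas
```

With the recommended `replica_concurrency` of 3, seven simultaneous requests would call for three replicas under this model, and anything beyond `max_capacity()` would have to wait or be rejected, depending on how the platform handles overflow.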