Once you’ve created an orbital object, you can deploy it to a database by creating a TABLE or VIEW. Both approaches use the same underlying SQL generation, but they have different tradeoffs:
Tables store pre-computed predictions. They’re fast to query and work well for large datasets or complex models. The tradeoff is that predictions become stale when data changes, so you’ll need a scheduled job to refresh them.
Views compute predictions on-the-fly each time they’re queried. Predictions are always fresh, but query performance depends on model complexity and data size. Views are useful for prototyping, smaller datasets, or when you need real-time predictions.
Setup
We start by loading our packages and creating a simple fitted workflow.
library(orbital)
library(recipes)
library(parsnip)
library(workflows)
library(dplyr)
library(DBI)
library(duckdb)
rec_spec <- recipe(mpg ~ disp + wt + hp, data = mtcars) |>
  step_normalize(all_numeric_predictors())
lm_spec <- linear_reg()
wf_spec <- workflow(rec_spec, lm_spec)
wf_fit <- fit(wf_spec, data = mtcars)
Then create our orbital object.
orbital_obj <- orbital(wf_fit)
orbital_obj
Connect to a database
We’ll use DuckDB as our example database since it runs in-memory and requires no external setup. The same pattern works with other databases like PostgreSQL, Snowflake, SQL Server, and Spark.
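The chunks below assume an in-memory DuckDB connection holding mtcars as a table named mtcars_data (the name the generated SQL refers to), with a lazy dbplyr reference to it. A minimal setup might look like:

```r
library(DBI)
library(duckdb)
library(dplyr)

# Create an in-memory DuckDB connection
con <- dbConnect(duckdb())

# Copy mtcars into the database; the generated SQL below
# references this table as "mtcars_data"
dbWriteTable(con, "mtcars_data", mtcars)

# A lazy dbplyr reference to the table
mtcars_db <- tbl(con, "mtcars_data")
```

For another backend, only the dbConnect() call changes; the rest of the article works the same way.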
Generating the prediction SQL
Both tables and views start the same way: use orbital_inline() with dplyr::mutate() to apply predictions to a database table, then extract the generated SQL.
# Apply predictions to the table
predictions <- mtcars_db |>
  mutate(!!!orbital_inline(orbital_obj))
# View the lazy query
predictions
#> # Source: SQL [?? x 12]
#> # Database: DuckDB 1.4.4 [unknown@Linux 6.14.0-1017-azure:R 4.5.2/:memory:]
#> mpg cyl disp hp drat wt qsec vs am gear
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 21 6 -0.571 -0.535 3.9 -0.610 16.5 0 1 4
#> 2 21 6 -0.571 -0.535 3.9 -0.350 17.0 0 1 4
#> 3 22.8 4 -0.990 -0.783 3.85 -0.917 18.6 1 1 4
#> 4 21.4 6 0.220 -0.535 3.08 -0.00230 19.4 1 0 3
#> 5 18.7 8 1.04 0.413 3.15 0.228 17.0 0 0 3
#> 6 18.1 6 -0.0462 -0.608 2.76 0.248 20.2 1 0 3
#> 7 14.3 8 1.04 1.43 3.21 0.361 15.8 0 0 3
#> 8 24.4 4 -0.678 -1.24 3.69 -0.0278 20 1 0 4
#> 9 22.8 4 -0.726 -0.754 3.92 -0.0687 22.9 1 0 4
#> 10 19.2 6 -0.509 -0.345 3.92 0.228 18.3 1 0 4
#> # ℹ more rows
#> # ℹ 2 more variables: carb <dbl>, .pred <dbl>
We can extract the SQL query using dbplyr::remote_query().
library(dbplyr)
generated_sql <- remote_query(predictions)
generated_sql
#> <SQL> SELECT
#> q01.*,
#> ((20.090625 + (disp * -0.116131681667974)) + (wt * -3.71900968057122)) + (hp * -2.13618249713439) AS ".pred"
#> FROM (
#> SELECT
#> mpg,
#> cyl,
#> (disp - 230.721875) / 123.938693831382 AS disp,
#> (hp - 146.6875) / 68.5628684893206 AS hp,
#> drat,
#> (wt - 3.21725) / 0.978457442989697 AS wt,
#> qsec,
#> vs,
#> am,
#> gear,
#> carb
#> FROM mtcars_data
#> ) q01
Creating a table
Tables store predictions at a point in time. They’re fast to query and work well with large datasets or complex models.
table_name <- "mtcars_predictions"
table_sql <- paste("CREATE OR REPLACE TABLE", table_name, "AS", generated_sql)
dbExecute(con, table_sql)
#> [1] 32
tbl(con, table_name) |>
  collect()
#> # A tibble: 32 × 12
#> mpg cyl disp hp drat wt qsec vs am gear
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 21 6 -0.571 -0.535 3.9 -0.610 16.5 0 1 4
#> 2 21 6 -0.571 -0.535 3.9 -0.350 17.0 0 1 4
#> 3 22.8 4 -0.990 -0.783 3.85 -0.917 18.6 1 1 4
#> 4 21.4 6 0.220 -0.535 3.08 -0.00230 19.4 1 0 3
#> 5 18.7 8 1.04 0.413 3.15 0.228 17.0 0 0 3
#> 6 18.1 6 -0.0462 -0.608 2.76 0.248 20.2 1 0 3
#> 7 14.3 8 1.04 1.43 3.21 0.361 15.8 0 0 3
#> 8 24.4 4 -0.678 -1.24 3.69 -0.0278 20 1 0 4
#> 9 22.8 4 -0.726 -0.754 3.92 -0.0687 22.9 1 0 4
#> 10 19.2 6 -0.509 -0.345 3.92 0.228 18.3 1 0 4
#> # ℹ 22 more rows
#> # ℹ 2 more variables: carb <dbl>, .pred <dbl>
The table contains predictions computed at the time it was created. To keep predictions fresh, you would schedule a job (e.g., daily or hourly) to drop and recreate the table, or use an incremental update strategy that only scores new rows.
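The refresh job itself can be as small as re-running the CREATE OR REPLACE statement. A sketch (the refresh_predictions helper is ours, not part of orbital; the scheduling would live in a tool such as cron or a workflow orchestrator):

```r
# Recreate the prediction table from the current contents of the
# source table; run this on a schedule to keep predictions fresh
refresh_predictions <- function(con, table_name, generated_sql) {
  sql <- paste("CREATE OR REPLACE TABLE", table_name, "AS", generated_sql)
  dbExecute(con, sql)
}

refresh_predictions(con, "mtcars_predictions", generated_sql)
```

Because the SQL is a plain string, the same statement can also be handed to any external scheduler that can execute SQL against your database, with no R process involved.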
Creating a view
Views compute predictions on-the-fly, providing always-fresh results without needing a refresh job.
view_name <- "mtcars_predictions_view"
view_sql <- paste("CREATE OR REPLACE VIEW", view_name, "AS", generated_sql)
dbExecute(con, view_sql)
#> [1] 0
tbl(con, view_name) |>
  collect()
#> # A tibble: 32 × 12
#> mpg cyl disp hp drat wt qsec vs am gear
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 21 6 -0.571 -0.535 3.9 -0.610 16.5 0 1 4
#> 2 21 6 -0.571 -0.535 3.9 -0.350 17.0 0 1 4
#> 3 22.8 4 -0.990 -0.783 3.85 -0.917 18.6 1 1 4
#> 4 21.4 6 0.220 -0.535 3.08 -0.00230 19.4 1 0 3
#> 5 18.7 8 1.04 0.413 3.15 0.228 17.0 0 0 3
#> 6 18.1 6 -0.0462 -0.608 2.76 0.248 20.2 1 0 3
#> 7 14.3 8 1.04 1.43 3.21 0.361 15.8 0 0 3
#> 8 24.4 4 -0.678 -1.24 3.69 -0.0278 20 1 0 4
#> 9 22.8 4 -0.726 -0.754 3.92 -0.0687 22.9 1 0 4
#> 10 19.2 6 -0.509 -0.345 3.92 0.228 18.3 1 0 4
#> # ℹ 22 more rows
#> # ℹ 2 more variables: carb <dbl>, .pred <dbl>
Keep in mind that the prediction SQL runs every time the view is queried. For complex models or large tables, this can be slow.
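To see the freshness property in action, append a row to the source table and query the view again; the new row is scored on the next query with no refresh step (a sketch; the predictor values are made up):

```r
# Append one new row to the source table
new_car <- mtcars[1, ]
new_car$disp <- 200
new_car$wt <- 3
new_car$hp <- 120
dbAppendTable(con, "mtcars_data", new_car)

# The view now returns one more row, scored automatically
tbl(con, "mtcars_predictions_view") |>
  tally()
```

The table created earlier would still report the original row count until it is refreshed.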
Selecting specific columns
In production, you often want to include only the prediction column and an identifier column, rather than all the intermediate calculations.
# Select only ID-like columns and the prediction
predictions_slim <- mtcars_db |>
  mutate(row_id = row_number(), !!!orbital_inline(orbital_obj)) |>
  select(row_id, .pred)
slim_sql <- remote_query(predictions_slim)
table_sql <- paste("CREATE OR REPLACE TABLE mtcars_pred_slim AS", slim_sql)
dbExecute(con, table_sql)
#> [1] 32
tbl(con, "mtcars_pred_slim") |>
  collect()
#> # A tibble: 32 × 2
#> row_id .pred
#> <dbl> <dbl>
#> 1 1 23.6
#> 2 2 22.6
#> 3 3 25.3
#> 4 4 21.2
#> 5 5 18.2
#> 6 6 20.5
#> 7 7 15.6
#> 8 8 22.9
#> 9 9 22.0
#> 10 10 20.0
#> # ℹ 22 more rows
Database-specific considerations
Consider versioning your prediction tables or views (e.g., model_v1, model_v2) so you can compare predictions across model versions or roll back if needed.
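One lightweight way to do this is to bake a version string into the table name (a sketch; the model_version value is arbitrary):

```r
# Version the prediction table by suffixing the model version
model_version <- "v2"
versioned_name <- paste0("mtcars_predictions_", model_version)

dbExecute(con, paste("CREATE OR REPLACE TABLE", versioned_name, "AS", generated_sql))
```

A stable view (e.g., mtcars_predictions pointing at the current versioned table) then lets consumers keep a fixed name while you swap versions underneath.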
The pattern shown above works across most SQL databases, but there are some differences to be aware of:
- Column naming: Some databases (e.g., Databricks) don’t allow column names with dots. Use orbital(wf_fit, prefix = "pred") to generate columns named pred instead of .pred.
- SQL Server: Uses CREATE OR ALTER VIEW instead of CREATE OR REPLACE VIEW (requires SQL Server 2016 or later).
- SQLite: Uses DROP VIEW IF EXISTS followed by CREATE VIEW since it doesn’t support CREATE OR REPLACE.
- Function support: Complex models using functions like log(), exp(), or probability functions may behave differently across databases. Always test your generated SQL on your target database.
- Query complexity: Very large models (e.g., tree ensembles with many trees) may generate SQL that exceeds database-specific limits on expression depth or query length.
This article covers the basics. Production deployments often involve additional considerations like access controls, monitoring, logging, and integration with your organization’s data infrastructure.
