Quick Start¶

Deploy Redshift Spectra and make your first API call in under 10 minutes.

Overview¶

flowchart LR
    A[1. Configure] --> B[2. Build]
    B --> C[3. Deploy]
    C --> D[4. Query]

This guide walks you through the fastest path from zero to a working Redshift Spectra deployment. By the end, you'll have a fully functional API that executes queries against your Amazon Redshift cluster.

Prerequisites¶

Before starting, ensure you have:

AWS Account with administrative access
Amazon Redshift cluster (Serverless or Provisioned)
AWS CLI configured with appropriate credentials
Terraform 1.11+ and Terragrunt 0.99+ installed
Python 3.11+ for building Lambda packages
Task for running build commands

Step 1: Configure Environment¶

Clone the repository and set up your configuration:

# Clone the repository
git clone https://github.com/your-org/redshift-spectra.git
cd redshift-spectra

# Copy the environment template
cp .env.template .env

Edit .env with your AWS and Redshift settings:

# AWS Configuration
AWS_REGION=us-east-1

# Redshift Configuration
SPECTRA_REDSHIFT_CLUSTER_ID=my-redshift-cluster
SPECTRA_REDSHIFT_DATABASE=analytics
SPECTRA_REDSHIFT_SECRET_ARN=arn:aws:secretsmanager:us-east-1:123456789012:secret:redshift/credentials

# S3 Configuration (for Bulk API exports)
SPECTRA_S3_BUCKET_NAME=my-spectra-exports

# Multi-tenant Configuration
SPECTRA_TENANT_ISOLATION=database  # Options: none, schema, database

Secrets Manager Setup

Your Redshift credentials should be stored in AWS Secrets Manager with the following structure:

{
  "username": "spectra_user",
  "password": "your-password",
  "host": "cluster.xxxxx.region.redshift.amazonaws.com",
  "port": 5439,
  "database": "analytics"
}

Step 2: Build Lambda Packages¶

Build the Lambda layer and function packages:

# Install development dependencies
task setup:install-dev

# Build all Lambda packages
task build:all

This creates the following artifacts:

dist/lambda/
├── layer.zip          # Shared dependencies (~50MB)
├── api-handler.zip    # API handler code
└── authorizer.zip     # Custom authorizer code

Step 3: Deploy Infrastructure¶

Deploy the infrastructure to your AWS account:

# Initialize Terragrunt
task infra:init-dev

# Preview changes
task infra:plan-dev

# Deploy infrastructure
task infra:apply-dev

After deployment, retrieve your API endpoint:

task infra:output-dev

Example output:

api_endpoint = "https://abc123.execute-api.us-east-1.amazonaws.com/v1"
api_key = "spectra_demo_abc123xyz"

Deployment Time

Initial deployment typically takes 5-10 minutes to provision all AWS resources.

Step 4: Make Your First Query¶

Now let's execute a query against your Redshift cluster!

Set Environment Variables¶

export API_URL="https://abc123.execute-api.us-east-1.amazonaws.com/v1"
export API_KEY="spectra_demo_abc123xyz"
export TENANT_ID="demo-tenant"

Execute a Query¶

The Query API is synchronous — you submit a query and receive results immediately in the response:

curl -X POST "$API_URL/queries" \
  -H "Authorization: Bearer $API_KEY" \
  -H "X-Tenant-ID: $TENANT_ID" \
  -H "Content-Type: application/json" \
  -d '{
    "sql": "SELECT table_name, table_type FROM information_schema.tables LIMIT 5"
  }'

Response:

{
  "data": [
    {"table_name": "users", "table_type": "BASE TABLE"},
    {"table_name": "orders", "table_type": "BASE TABLE"},
    {"table_name": "products", "table_type": "BASE TABLE"},
    {"table_name": "inventory", "table_type": "BASE TABLE"},
    {"table_name": "sales_view", "table_type": "VIEW"}
  ],
  "metadata": {
    "columns": ["table_name", "table_type"],
    "column_types": ["VARCHAR", "VARCHAR"],
    "row_count": 5,
    "total_rows": 5,
    "truncated": false,
    "execution_time_ms": 156,
    "query_id": "abc123-def456"
  }
}

That's it! Your query executed on Redshift and returned results directly in the response.

Understanding the Flow¶

sequenceDiagram
    participant Client
    participant API as API Gateway
    participant Lambda
    participant Redshift

    Client->>API: POST /queries
    API->>Lambda: Invoke handler
    Lambda->>Lambda: Validate SQL
    Lambda->>Lambda: Apply tenant context
    Lambda->>Redshift: Execute query
    Redshift-->>Lambda: Return results
    Lambda-->>API: Format response
    API-->>Client: Return data + metadata

The synchronous design provides:

Simplicity: Single request-response, no polling required
Low latency: Results returned immediately
Predictability: Automatic LIMIT enforcement prevents runaway queries

Query Examples¶

Aggregate Query¶

curl -X POST "$API_URL/queries" \
  -H "Authorization: Bearer $API_KEY" \
  -H "X-Tenant-ID: $TENANT_ID" \
  -H "Content-Type: application/json" \
  -d '{
    "sql": "SELECT region, COUNT(*) as order_count, SUM(amount) as total FROM orders GROUP BY region"
  }'

With Custom Limit¶

curl -X POST "$API_URL/queries" \
  -H "Authorization: Bearer $API_KEY" \
  -H "X-Tenant-ID: $TENANT_ID" \
  -H "Content-Type: application/json" \
  -d '{
    "sql": "SELECT * FROM large_table",
    "parameters": {
      "limit": 100
    }
  }'

With Timeout¶

curl -X POST "$API_URL/queries" \
  -H "Authorization: Bearer $API_KEY" \
  -H "X-Tenant-ID: $TENANT_ID" \
  -H "Content-Type: application/json" \
  -d '{
    "sql": "SELECT * FROM complex_view WHERE date > CURRENT_DATE - 30",
    "parameters": {
      "timeout_seconds": 120
    }
  }'

Handling Large Result Sets¶

When your query returns more rows than the configured limit, the response indicates truncation:

{
  "data": [...],
  "metadata": {
    "row_count": 10000,
    "total_rows": 150000,
    "truncated": true
  }
}

For complete data export, use the Bulk API:

# Create a bulk export job
curl -X POST "$API_URL/bulk/jobs" \
  -H "Authorization: Bearer $API_KEY" \
  -H "X-Tenant-ID: $TENANT_ID" \
  -H "Content-Type: application/json" \
  -d '{
    "operation": "query",
    "object": "orders",
    "query": "SELECT * FROM orders WHERE year = 2024",
    "content_type": "CSV",
    "compression": "GZIP"
  }'

See the Bulk API Guide for complete documentation.

Next Steps¶

Now that you have a working API, explore these topics:

Configuration

Customize settings for your deployment

Configuration Guide
Query API

Learn the full Query API capabilities

Query API Guide
Multi-Tenancy

Set up isolated tenant environments

Multi-Tenancy
Security

Configure authentication and authorization

Security Overview

Troubleshooting¶

Query fails with 'Access Denied'

Ensure your Redshift credentials in Secrets Manager have permission to execute queries. The database user needs SELECT permission on the target tables.

Also verify the Lambda execution role has permission to access Secrets Manager.

Query times out

The default timeout is 60 seconds. For longer-running queries:

Increase timeout_seconds in the request (max: 300 seconds)
For very long queries, use the Bulk API which supports 24-hour timeouts
Optimize your query or add appropriate indexes

Results are truncated unexpectedly

The Query API has a maximum row limit of 10,000 rows. If you need more:

Use the Bulk API for complete data export
Add more specific WHERE clauses to reduce result size
Use aggregation to summarize data

Authentication fails with 401

Verify your API key is correct and the Authorization header format:

Authorization: Bearer your-api-key

Also ensure the X-Tenant-ID header is present for multi-tenant deployments.

SQL validation error

The API blocks potentially dangerous SQL patterns. Ensure your query:

Uses only SELECT statements (no DDL/DML)
Doesn't contain comments that could hide injection attempts
Doesn't use system tables that are restricted

Cleanup¶

To remove all deployed resources:

# Destroy infrastructure
task infra:destroy-dev

# Clean build artifacts
task setup:clean

Data Loss

Destroying infrastructure will delete all DynamoDB tables and S3 data. Ensure you've exported any important data before cleanup.