About Redshift

Introduction to Amazon Redshift

Tens of thousands of customers today rely on Amazon Redshift to analyze exabytes of data and run complex analytical queries, making it a widely used cloud data warehouse.

How it works

Amazon Redshift uses SQL to analyze structured and semi-structured data across data warehouses, operational databases, and data lakes, and relies on AWS-designed hardware and machine learning to deliver the best price performance at any scale.

Features and benefits

Each year we release hundreds of features and product improvements, driven by customer use cases and feedback.

Analyze all your data

Get integrated insights by running real-time and predictive analytics on complex, scaled data across your operational databases, data lakes, data warehouses, and thousands of third-party datasets.

Federated query: With the federated query capability in Amazon Redshift, you can reach into your operational relational databases. Query live data across one or more Amazon RDS for PostgreSQL, Aurora PostgreSQL, Amazon RDS for MySQL, and Aurora MySQL databases to get instant visibility into full business operations without requiring data movement. You can join data from your Redshift data warehouses, data in your data lakes, and data in your operational stores to make better data-driven decisions. Amazon Redshift offers optimizations to reduce data movement over the network and complements them with massively parallel data processing for high-performance queries.
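
As an illustration, a federated query setup might look like the following sketch; the schema, endpoint, role ARN, secret, and table names here are placeholders, not a definitive configuration.

    -- Map an Aurora PostgreSQL database to an external schema, then join its
    -- live data with a local Redshift table (all identifiers are placeholders).
    CREATE EXTERNAL SCHEMA apg_sales
    FROM POSTGRES
    DATABASE 'sales' SCHEMA 'public'
    URI 'my-aurora-cluster.cluster-abc123.us-east-1.rds.amazonaws.com' PORT 5432
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyFederatedRole'
    SECRET_ARN 'arn:aws:secretsmanager:us-east-1:123456789012:secret:apg-creds';

    -- Join live operational rows with warehouse data; nothing is copied or moved.
    SELECT o.order_id, o.status, c.lifetime_value
    FROM apg_sales.orders AS o
    JOIN dim_customers AS c ON o.customer_id = c.customer_id
    WHERE o.status = 'open';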

Data Sharing: Amazon Redshift data sharing extends the ease of use, performance, and cost benefits of a single Amazon Redshift cluster to multi-cluster deployments while letting you share data. Data sharing enables instant, granular, and fast data access across Redshift clusters without the need to copy or move data, and it provides live access so your users always see the most current and consistent information as it is updated in the data warehouse. You can securely share live data with Redshift clusters in the same AWS account, in different accounts, or across AWS Regions.
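
For example, sharing data between a producer and a consumer cluster can be sketched as follows; the share name, schema, and namespace GUIDs are placeholders.

    -- On the producer cluster: create a datashare, add objects, and grant
    -- access to the consumer cluster's namespace (placeholder GUID).
    CREATE DATASHARE sales_share;
    ALTER DATASHARE sales_share ADD SCHEMA sales;
    ALTER DATASHARE sales_share ADD ALL TABLES IN SCHEMA sales;
    GRANT USAGE ON DATASHARE sales_share
    TO NAMESPACE 'b1c2d3e4-5678-90ab-cdef-111122223333';

    -- On the consumer cluster: surface the share as a database and query it live.
    CREATE DATABASE sales_remote FROM DATASHARE sales_share
    OF NAMESPACE 'a1b2c3d4-5678-90ab-cdef-000011112222';
    SELECT COUNT(*) FROM sales_remote.sales.orders;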

AWS Data Exchange for Amazon Redshift: Query third-party Amazon Redshift datasets from your own Redshift cluster without extracting, transforming, and loading (ETL) the data. You can subscribe to Redshift cloud data warehouse products in AWS Data Exchange, and as soon as a provider makes an update, the change is visible to subscribers. If you are a data provider, access is granted automatically when a subscription starts and revoked when it ends, invoices are generated automatically when payments are due, and payments are collected through AWS. You can license access to flat files, data in Amazon Redshift, and data delivered through APIs, all with a single subscription.
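
On the consumer side, working with a subscription might look like the following sketch; the account ID, namespace, share, and table names are placeholders.

    -- Discover datashares licensed through AWS Data Exchange subscriptions.
    SELECT share_name, producer_account, producer_namespace
    FROM svv_datashares
    WHERE share_type = 'INBOUND';

    -- Mount a licensed datashare as a local database and query it in place,
    -- with no ETL (all identifiers are placeholders).
    CREATE DATABASE market_data FROM DATASHARE marketfeed_share
    OF ACCOUNT '555566667777' NAMESPACE 'c1d2e3f4-5678-90ab-cdef-222233334444';
    SELECT * FROM market_data.public.daily_prices LIMIT 10;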

Redshift ML: Redshift ML makes it easy for data analysts, data scientists, BI professionals, and developers to create, train, and deploy Amazon SageMaker models using SQL. With Redshift ML, you can use SQL statements to create and train Amazon SageMaker models on your data in Amazon Redshift and then use those models for predictions such as churn detection, financial forecasting, personalization, and risk scoring directly in your queries and reports.
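
For instance, a churn model could be created and used roughly as follows; the table, columns, role ARN, and bucket are placeholders, and CREATE MODEL accepts further settings not shown here.

    -- Train a SageMaker model from SQL; Redshift exports the training data,
    -- runs training, and registers a prediction function (placeholder names).
    CREATE MODEL customer_churn
    FROM (SELECT age, plan_type, monthly_spend, is_churned
          FROM customer_activity
          WHERE signup_date < '2024-01-01')
    TARGET is_churned
    FUNCTION predict_churn
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftMLRole'
    SETTINGS (S3_BUCKET 'my-redshift-ml-bucket');

    -- Score rows directly in a query or report.
    SELECT customer_id, predict_churn(age, plan_type, monthly_spend)
    FROM customer_activity;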

Amazon Redshift Integration for Apache Spark: This feature makes it easy to build and run Apache Spark applications on Amazon Redshift data, opening up the data warehouse to a broader set of analytics and machine learning solutions. With Amazon Redshift Integration for Apache Spark, developers using AWS analytics and ML services such as Amazon EMR, AWS Glue, Amazon Athena for Apache Spark, and Amazon SageMaker can get started in seconds and build Apache Spark applications that read from and write to their Amazon Redshift data warehouse without compromising application performance or the transactional consistency of the data. The integration also makes it easier to monitor and troubleshoot performance issues in Apache Spark applications that work with Amazon Redshift.

Amazon Aurora Zero-ETL to Amazon Redshift: This no-code integration between Amazon Aurora and Amazon Redshift enables Aurora customers to use Amazon Redshift for near real-time analytics and machine learning on petabytes of transactional data. Within seconds of transactional data being written into Amazon Aurora, the integration seamlessly makes the data available in Amazon Redshift, eliminating the need for customers to build and maintain complex data pipelines that perform extract, transform, and load (ETL) operations. The integration reduces operational burden and cost, and it lets customers focus on improving their applications. With near real-time access to transactional data, customers can use Amazon Redshift's analytics and machine learning capabilities to derive insights from transactional and other data and respond effectively to critical, time-sensitive events.
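
On the Amazon Redshift side, consuming such an integration can be sketched as follows; the integration ID and object names are placeholders, and the exact statement may vary by setup.

    -- Surface the replicated Aurora data as a Redshift database
    -- (placeholder integration ID).
    CREATE DATABASE aurora_analytics FROM INTEGRATION
    'a1b2c3d4-5678-90ab-cdef-EXAMPLE11111';

    -- Near real-time analytics over transactional rows replicated from Aurora.
    SELECT order_status, COUNT(*) AS orders
    FROM aurora_analytics.myshop.orders
    GROUP BY order_status;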

Streaming Ingestion: Data engineers, data analysts, and big data developers use real-time streaming engines to improve customer responsiveness. With the streaming ingestion capability in Amazon Redshift, you can use SQL to connect to and directly ingest data from Amazon Kinesis Data Streams and Amazon Managed Streaming for Apache Kafka (Amazon MSK). Streaming ingestion also makes it easy to create and manage downstream pipelines by letting you create materialized views directly on top of streams, and those materialized views can include SQL transformations as part of your ELT (extract, load, transform) pipeline. You can manually refresh defined materialized views to query the most recent streaming data, which lets you perform downstream processing and transformations of streaming data using existing, familiar tools at no additional cost.
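
For example, ingesting from Kinesis Data Streams might be sketched as follows; the schema, role ARN, and stream name are placeholders.

    -- Map a Kinesis data stream into an external schema (placeholder role).
    CREATE EXTERNAL SCHEMA kds
    FROM KINESIS
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyStreamingRole';

    -- Materialize the stream for SQL access; JSON_PARSE turns each record
    -- into a queryable SUPER value.
    CREATE MATERIALIZED VIEW clickstream_mv AS
    SELECT approximate_arrival_timestamp,
           JSON_PARSE(kinesis_data) AS payload
    FROM kds."my-click-stream";

    -- Pull in the latest stream records on demand.
    REFRESH MATERIALIZED VIEW clickstream_mv;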

Query and export data to and from your data lake: No other cloud data warehouse makes it as easy to both query data and write data back to your data lake in open formats. You can query open file formats such as Parquet, ORC, JSON, Avro, CSV, and more directly in Amazon S3 using familiar ANSI SQL. To export data to your data lake, simply use the Amazon Redshift UNLOAD command in your SQL code and specify Parquet as the file format, and Amazon Redshift automatically takes care of data formatting and data movement into S3. This gives you the flexibility to store highly structured, frequently accessed data and semi-structured data in an Amazon Redshift data warehouse, while keeping up to exabytes of structured, semi-structured, and unstructured data in Amazon S3. Exporting data from Amazon Redshift back to your data lake lets you analyze the data further with AWS services such as Amazon Athena, Amazon EMR, and Amazon SageMaker.
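
For example, an export to the data lake might look like the following sketch; the bucket, role ARN, and table are placeholders.

    -- Write query results back to S3 as Parquet; Redshift handles the data
    -- formatting and movement (placeholder paths and role).
    UNLOAD ('SELECT order_id, order_date, total
             FROM orders
             WHERE order_date >= ''2024-01-01''')
    TO 's3://my-data-lake/orders/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyUnloadRole'
    FORMAT AS PARQUET
    PARTITION BY (order_date);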

AWS services integration: Native integration with AWS services, databases, and machine learning services makes it easier to handle complete analytics workflows without friction. For example, AWS Lake Formation is a service that makes it easy to set up a secure data lake in days. AWS Glue can extract, transform, and load (ETL) data into Amazon Redshift. Amazon Kinesis Data Firehose is the easiest way to capture, transform, and load streaming data into Amazon Redshift for near real-time analytics. You can use Amazon EMR to process data using Hadoop/Spark and load the output into Amazon Redshift for BI and analytics. Amazon QuickSight is the first BI service with pay-per-session pricing that you can use to create reports, visualizations, and dashboards on Redshift data. You can use Amazon Redshift to prepare your data to run machine learning (ML) workloads with Amazon SageMaker. To accelerate migrations to Amazon Redshift, you can use the AWS Schema Conversion Tool (AWS SCT) and the AWS Database Migration Service (AWS DMS). Amazon Redshift is also deeply integrated with AWS Key Management Service (AWS KMS) and Amazon CloudWatch for security, monitoring, and compliance. You can also use Lambda user-defined functions (UDFs) to invoke an AWS Lambda function from your SQL queries as if you were invoking a native UDF in Amazon Redshift. You can write Lambda UDFs to integrate with AWS Partner services and to access other popular AWS services such as Amazon DynamoDB and Amazon SageMaker.
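
As an illustration of Lambda UDFs, a function might be registered and called as follows; the function, role ARN, and table names are placeholders.

    -- Register an AWS Lambda function as an external UDF (placeholder names),
    -- then invoke it inline like any other SQL function.
    CREATE EXTERNAL FUNCTION score_address(varchar)
    RETURNS float
    VOLATILE
    LAMBDA 'my-address-scoring-fn'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyLambdaUdfRole';

    SELECT customer_id, score_address(shipping_address)
    FROM customers;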

Partner console integration: You can accelerate data onboarding and create valuable business insights in minutes by integrating with select partner solutions in the Amazon Redshift console. With these solutions, you can bring data from applications such as Salesforce, Google Analytics, Facebook Ads, Slack, Jira, Splunk, and Marketo into your Redshift data warehouse in an efficient and streamlined way. You can also join these disparate datasets and analyze them together to produce actionable insights.

Auto-copy from Amazon S3: Amazon Redshift supports auto-copy to simplify and automate data loading from Amazon S3, reducing the time and effort needed to build custom solutions or manage third-party services. With this feature, Amazon Redshift eliminates the need to manually and repeatedly run COPY statements by automating file ingestion and taking care of continuous data loading steps under the hood. Auto-copy makes it easy for line-of-business users and data analysts without data engineering knowledge to create ingestion rules and configure the location of the data they wish to load from Amazon S3. As new data lands in the specified Amazon S3 folders, the ingestion process is triggered automatically based on user-defined configurations. Auto-copy supports all file formats accepted by the Redshift COPY command, including CSV, JSON, Parquet, and Avro.
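
A continuous ingestion job might be defined roughly as follows; the table, bucket, role ARN, and job name are placeholders, and the COPY JOB options may vary by Redshift version.

    -- Create a copy job that automatically loads new files as they land in
    -- the S3 prefix (placeholder identifiers).
    COPY sales_events
    FROM 's3://my-ingest-bucket/sales/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyCopyRole'
    FORMAT AS PARQUET
    JOB CREATE sales_autocopy_job
    AUTO ON;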