AWS Analytics
Big data analysis using AWS managed services such as Amazon Athena, Amazon EMR, Amazon Redshift, Amazon OpenSearch, and Amazon QuickSight.
Amazon Athena
This article will help you understand Amazon Athena, along with its use cases and best practices.
Updated 01 Oct, 2021
Query S3 Data Using Amazon Athena
Amazon Athena is a serverless, interactive query service that makes it easy to analyze large-scale datasets in Amazon S3 using standard SQL. Athena is easy to use: point to your data in Amazon S3, define the schema, and start querying with standard SQL.
Updated 01 Oct, 2021
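The Athena workflow described above (point to data in S3, define a schema, query with SQL) can be sketched roughly as follows. This is a minimal illustration, not the article's own code: the database `demo_db`, the table `trips`, its columns, and the bucket `s3://example-bucket/` are all hypothetical, and the data is assumed to be comma-delimited text. The `run_query` helper uses the boto3 Athena client and requires AWS credentials, so it is defined but not called here.

```python
"""Sketch: build the Athena DDL and query strings for a CSV dataset in S3."""


def build_create_table_ddl(database, table, columns, s3_location):
    """Return a CREATE EXTERNAL TABLE statement for comma-delimited files."""
    column_defs = ",\n  ".join(f"`{name}` {dtype}" for name, dtype in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        f"  {column_defs}\n"
        ")\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        f"LOCATION '{s3_location}'"
    )


def run_query(sql, database, output_location):
    """Submit a statement to Athena; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above runs without it

    athena = boto3.client("athena")
    response = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]


# Hypothetical names, for illustration only.
ddl = build_create_table_ddl(
    database="demo_db",
    table="trips",
    columns=[("trip_id", "string"), ("distance_km", "double")],
    s3_location="s3://example-bucket/trips/",
)
query = "SELECT COUNT(*) AS trip_count FROM demo_db.trips"
print(ddl)
```

With credentials configured, `run_query(ddl, "demo_db", "s3://example-bucket/athena-results/")` would register the table, and a second call with `query` would run the count. The results location is also an assumed name.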
Analyze Athena Datasource using QuickSight
QuickSight is Amazon’s pay-per-session business intelligence service, which allows you to create and publish interactive dashboards and charts. QuickSight can query data with Athena to provide easy-to-understand insights.
Updated 01 Oct, 2021
Amazon EMR
Amazon EMR (Elastic MapReduce) is a managed cluster platform that simplifies running big data frameworks, such as Apache Hadoop and Apache Spark, on AWS to process and analyze vast amounts of data.
Updated 18 Oct, 2021
Create Amazon EMR Cluster
With Amazon EMR, we can easily set up and launch a cluster to process and analyze data with various big data frameworks.
Updated 18 Oct, 2021
Process Data as Job Steps in EMR
We can submit jobs and interact directly with the data frameworks that are installed on the Amazon EMR cluster. Alternatively, we can submit one or more ordered steps to an Amazon EMR cluster. Each step is a unit of work that contains instructions to manipulate data for processing by the data framework installed on the cluster.
Updated 12 Oct, 2021
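To make the idea of an EMR step concrete, here is a small sketch of how one step could be described and submitted with boto3. It assumes a Spark job launched through `command-runner.jar`; the step name, the script location `s3://example-bucket/jobs/wordcount.py`, and the argument are hypothetical placeholders. The `submit_steps` helper needs AWS credentials and a running cluster, so it is defined but not called.

```python
"""Sketch: describe a Spark job as an Amazon EMR step definition."""


def build_spark_step(name, script_s3_uri, script_args=()):
    """Return a step definition that runs spark-submit on the cluster."""
    return {
        "Name": name,
        # Keep the cluster and any later steps running if this step fails.
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "spark-submit",
                "--deploy-mode",
                "cluster",
                script_s3_uri,
                *script_args,
            ],
        },
    }


def submit_steps(cluster_id, steps):
    """Add ordered steps to a running cluster; needs boto3 and credentials."""
    import boto3  # imported lazily so the builder above runs without it

    emr = boto3.client("emr")
    response = emr.add_job_flow_steps(JobFlowId=cluster_id, Steps=steps)
    return response["StepIds"]


# Hypothetical job, for illustration only.
step = build_spark_step(
    name="word-count",
    script_s3_uri="s3://example-bucket/jobs/wordcount.py",
    script_args=["s3://example-bucket/input/"],
)
print(step["HadoopJarStep"]["Args"])
```

Steps are passed as a list, and EMR runs them in the order given, which matches the "ordered steps" described above.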
Amazon Redshift
Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the AWS cloud to efficiently analyze all your data using your existing business intelligence tools.
Updated 15 Oct, 2021
Create Amazon Redshift Cluster
Amazon Redshift is a fast, scalable data warehouse that makes it simple and cost-effective to analyze data across your data warehouse and data lake. With a few clicks, we can create an Amazon Redshift cluster in minutes.
Updated 16 Oct, 2021
Data Analysis using Redshift
In this article, we query Amazon Redshift for data analytics and perform data visualization using Metabase.
Updated 19 Oct, 2021
Amazon Kinesis Services
Amazon Kinesis services help to collect, process, and analyze data streams in real time.
Updated 23 Oct, 2021
Streaming Data Analytics with Amazon Kinesis
In today's fast-paced digital landscape, processing and analyzing real-time data streams is crucial for timely decision-making. Amazon Kinesis, a robust data streaming service, facilitates this by collecting, processing, and analyzing streaming data. This article demonstrates how to set up and utilize Amazon Kinesis for real-time analytics using Twitter as a data source.
Updated 02 Jul, 2024
Amazon MSK
Amazon Managed Streaming for Apache Kafka is a fully managed, highly available, and secure Apache Kafka service to process streaming data.
Updated 13 Aug, 2023
Streaming Data Analytics with Amazon MSK
In the era of big data, the ability to analyze streaming data in real-time is essential for gaining timely insights and making data-driven decisions. Amazon Managed Streaming for Apache Kafka (MSK) is a powerful solution for this purpose. This article guides you through setting up Amazon MSK, ingesting data from Twitter, and analyzing it using Apache Flink.
Updated 02 Jul, 2024
How to create AWS Lambda Layer
As the demand for serverless computing continues to grow, AWS Lambda has become a popular choice for developers looking to build scalable and efficient applications. One of the key features of AWS Lambda is its support for layers, which allow you to package and reuse code across multiple functions. In this article, we'll walk through the process of creating an AWS Lambda layer using Python on macOS.
Updated 19 Apr, 2026
Install & Configure PostgreSQL on Amazon Linux
As a developer, having a reliable database management system is crucial for storing and managing data. PostgreSQL is one such popular open-source relational database management system that offers robust features and scalability. In this article, we will walk you through the process of installing and configuring PostgreSQL on Amazon Linux.
Updated 19 Apr, 2026
Airflow Installation on AWS EC2
Are you looking for a hassle-free way to set up Apache Airflow on your Amazon Web Services (AWS) Elastic Compute Cloud (EC2) instance? Look no further! In this article, we'll walk you through the process of installing Airflow using a simple script that can be added as part of the user data while launching an EC2 instance.
Updated 19 Apr, 2026
MLflow Installation on AWS EC2
Are you looking for a comprehensive guide on how to install the MLflow tracking server in an AWS EC2 instance? Look no further! In this article, we will walk you through the process of setting up an MLflow tracking server in an EC2 instance, including creating an S3 bucket and assigning an IAM role.
Updated 19 Apr, 2026
Kafka Installation on AWS EC2
In this article, we will explore how to deploy a Confluent Kafka cluster using Docker Compose. We will create a comprehensive configuration file that includes the necessary services and dependencies to provision a fully functional Kafka cluster on an AWS EC2 instance for a demo or PoC use case.
Updated 19 Apr, 2026
Automating AWS Infrastructure Provisioning with Terraform and GitHub Actions
Automating the provisioning of AWS infrastructure is essential for ensuring consistency and minimizing human errors during deployments. With Terraform and GitHub Actions, you can implement a Continuous Delivery (CD) pipeline that deploys to multiple environments (like staging and production) across different AWS accounts.
Updated 19 Apr, 2026
Simplifying AWS Access in GitHub Actions with OIDC Provider
Managing AWS access keys for GitHub Actions can be a challenge, especially when ensuring security and ease of access. Traditionally, AWS IAM user access keys have been used to grant GitHub Actions the permissions needed to interact with AWS resources. However, there is a more secure and manageable way: using OpenID Connect (OIDC) identity providers to obtain temporary AWS credentials.
Updated 19 Apr, 2026
GitHub Self-Hosted Private Runners on AWS
Using GitHub’s default runners may not always be ideal, particularly if you need custom configurations, enhanced security, or cost-efficiency. Hosting self-managed GitHub runners on AWS offers flexibility and control over your CI/CD processes. In this guide, we'll walk through the process of setting up GitHub self-hosted private runners on AWS.
Updated 19 Apr, 2026
Docker Kafka Connect Container for AWS MSK cluster
Are you looking for a seamless way to integrate your Apache Kafka cluster on Amazon Managed Streaming for Apache Kafka (MSK) with other data sources and sinks? Look no further! In this article, we'll guide you through the process of setting up a Docker Kafka Connect container on macOS to work with your AWS MSK cluster.
Updated 19 Apr, 2026
Iceberg Data Lake on Amazon S3 with AWS Glue Catalog
Apache Iceberg is a high-performance open table format for analytic datasets. Iceberg brings the reliability and simplicity of SQL tables to big data, while making it possible for engines like Spark, Trino, Flink, Presto, Hive and Impala to safely work with the same tables, at the same time. Iceberg provides ACID compliance, schema evolution, and time travel for data lakes.
Updated 19 Apr, 2026