![[AWS] What is EMR?](https://aws.darcy-it.com/wp-content/uploads/2021/03/2021-04-23_00h42_40.png)
What is EMR?
Amazon EMR is a platform for big data processing.
It can leverage open source tools such as Apache Spark, Apache Hive, Apache HBase, Apache Flink, Apache Hudi, and Presto to process vast amounts of data.
As a managed cluster platform, it processes and analyzes large volumes of data with big data frameworks such as Apache Hadoop and Apache Spark.
EMR can be used for data processing, analytics, and business intelligence workloads.
EMR Features
Amazon EMR uses open source frameworks such as Apache Spark and Hadoop to quickly and cost-effectively process and analyze vast amounts of data.
Amazon EMR and Hive can be used to quickly and efficiently process large amounts of data, including data stored in DynamoDB.
This approach is meant for high-speed data processing rather than for report generation.
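As a rough sketch of how Hive on EMR can be pointed at a DynamoDB table, the helper below renders a `CREATE EXTERNAL TABLE` statement that uses the DynamoDB storage handler shipped with EMR. The function name, table names, and columns are hypothetical and chosen only for illustration; the statement is printed, not executed.

```python
def hive_dynamodb_table_ddl(hive_table, dynamodb_table, columns):
    """Render a Hive external-table statement backed by a DynamoDB table.

    columns: list of (hive_column, hive_type, dynamodb_attribute) tuples.
    """
    # Hive column definitions, e.g. "order_id string, total double".
    column_defs = ", ".join(f"{name} {hive_type}" for name, hive_type, _ in columns)
    # Mapping from Hive column names to DynamoDB attribute names.
    mapping = ",".join(f"{name}:{attr}" for name, _, attr in columns)
    return (
        f"CREATE EXTERNAL TABLE {hive_table} ({column_defs})\n"
        "STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'\n"
        f'TBLPROPERTIES ("dynamodb.table.name" = "{dynamodb_table}",\n'
        f'"dynamodb.column.mapping" = "{mapping}");'
    )


# Hypothetical table and attribute names.
ddl = hive_dynamodb_table_ddl(
    "orders_hive",
    "Orders",
    [("order_id", "string", "OrderId"), ("total", "double", "Total")],
)
print(ddl)
```

Once such a table exists, ordinary HiveQL queries against it read from DynamoDB, which is what makes bulk processing of DynamoDB data from the cluster possible.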
Amazon EMR is an industry-leading cloud big data platform that processes huge amounts of data using open source tools such as Apache Spark, Apache Hive, Apache HBase, Apache Flink, Apache Hudi, and Presto.
Amazon EMR provides a managed Hadoop framework.
However, an Amazon EMR cluster is built from EC2 instances.
Therefore, the operating system and other aspects of the EC2 instances that make up the cluster are accessible to the user.
The EC2 instances in a cluster are called nodes. Each node has a role in the cluster, referred to as its node type.
Amazon EMR also installs various software components on each node type, giving each node a role in distributed applications such as Apache Hadoop.
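To make the idea of node types more concrete, the sketch below builds the instance-group part of a cluster request in the shape that boto3's `run_job_flow` accepts under `Instances.InstanceGroups`. The roles `MASTER`, `CORE`, and `TASK` are EMR's node types; the function name, group names, instance type, and counts are assumptions picked for the example, and nothing is sent to AWS.

```python
def build_instance_groups(core_count=2, task_count=0, instance_type="m5.xlarge"):
    """Return instance groups for one primary node plus core and optional task nodes.

    The instance type and counts are illustrative defaults, not recommendations.
    """
    groups = [
        # The primary (master) node coordinates the cluster.
        {"Name": "Primary", "InstanceRole": "MASTER",
         "InstanceType": instance_type, "InstanceCount": 1},
        # Core nodes run tasks and store data in HDFS.
        {"Name": "Core", "InstanceRole": "CORE",
         "InstanceType": instance_type, "InstanceCount": core_count},
    ]
    if task_count > 0:
        # Task nodes only run tasks; they do not store HDFS data.
        groups.append({"Name": "Task", "InstanceRole": "TASK",
                       "InstanceType": instance_type, "InstanceCount": task_count})
    return groups


for group in build_instance_groups(core_count=3, task_count=2):
    print(group["InstanceRole"], group["InstanceCount"])
```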
When setting up your cluster, choose a purchasing option for its EC2 instances.
You can use On-Demand Instances, Spot Instances, or both.
Spot Instances in Amazon EMR let you purchase Amazon EC2 capacity at a lower cost than On-Demand purchases, which reduces the cost of a cluster.
The disadvantage of Spot Instances is that they can be terminated unexpectedly, for example when the Spot price rises above your maximum price or EC2 reclaims the capacity.
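The trade-off can be sketched with a little arithmetic. The example below marks task nodes as `SPOT` and keeps the other nodes `ON_DEMAND` (one common arrangement, assumed here rather than taken from the text), then estimates the hourly EC2 cost. The helper names and all prices are hypothetical; real prices vary by instance type and region, and the EMR service charge is not included.

```python
def with_spot_task_nodes(groups):
    """Return a copy of the groups with TASK nodes on Spot and the rest On-Demand."""
    result = []
    for group in groups:
        market = "SPOT" if group["InstanceRole"] == "TASK" else "ON_DEMAND"
        result.append({**group, "Market": market})
    return result


def hourly_cost(groups, on_demand_price, spot_price):
    """Estimate the hourly EC2 cost of the groups from per-instance prices."""
    total = 0.0
    for group in groups:
        # Groups without a Market key are treated as On-Demand.
        is_spot = group.get("Market", "ON_DEMAND") == "SPOT"
        price = spot_price if is_spot else on_demand_price
        total += price * group["InstanceCount"]
    return total


example_groups = [
    {"InstanceRole": "MASTER", "InstanceCount": 1},
    {"InstanceRole": "CORE", "InstanceCount": 2},
    {"InstanceRole": "TASK", "InstanceCount": 4},
]
mixed = with_spot_task_nodes(example_groups)

# Hypothetical prices in USD per instance-hour, for illustration only.
all_on_demand = hourly_cost(example_groups, on_demand_price=0.20, spot_price=0.06)
with_spot = hourly_cost(mixed, on_demand_price=0.20, spot_price=0.06)
print(f"On-Demand only: {all_on_demand:.2f} USD/h, with Spot task nodes: {with_spot:.2f} USD/h")
```

Keeping the primary and core nodes On-Demand in this sketch reflects the termination risk described above: losing a task node only removes compute capacity, while losing other nodes is more disruptive.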
Amazon EMR is a managed cluster platform that runs big data frameworks such as Apache Hadoop and Apache Spark on AWS to simplify processing and analysis of large volumes of data.
Amazon EMR can be used to transform and analyze large amounts of data in AWS data stores such as S3 and DynamoDB, and to move that data between them.
Therefore, it is a strong choice for processing and analyzing log files in S3.
S3 Select is not suitable for analyzing large numbers of log files, because each S3 Select request queries only a single object.
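As an illustration of the log-analysis use case, the sketch below builds a single EMR step definition that would run a Spark script over a prefix of log files in S3. The dictionary follows the step shape used by boto3's `add_job_flow_steps`; the function name, bucket, and script path are hypothetical, and the code only constructs the definition without contacting AWS.

```python
def spark_log_step(script_uri, input_uri, output_uri):
    """Build one EMR step that runs a Spark script over log files in S3."""
    return {
        "Name": "Analyze S3 logs",
        # Keep the cluster running the remaining steps if this one fails.
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            # command-runner.jar lets a step invoke spark-submit on the cluster.
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster",
                     script_uri, input_uri, output_uri],
        },
    }


step = spark_log_step(
    "s3://example-bucket/scripts/analyze_logs.py",  # hypothetical Spark script
    "s3://example-bucket/logs/",                    # hypothetical input prefix
    "s3://example-bucket/results/",                 # hypothetical output prefix
)
print(step["HadoopJarStep"]["Args"])
```

A definition like this would be passed in the `Steps` list when submitting work to a running cluster, so the whole log prefix is processed in parallel across the nodes instead of one object at a time.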
Reference Site
https://aws.darcy-it.com/amazon_redshift%e3%81%a8%e3%81%af%ef%bc%9f/