AWS Spark

What is Spark?

Apache Spark is an open-source distributed cluster-computing framework.
Spark provides several components, such as Spark Core, Spark SQL, and Spark Streaming, all of which enable massively parallel in-memory computing.
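
To make the description above concrete, here is a minimal sketch in PySpark. It is an illustration only, not taken from any of the posts listed on this page. It assumes PySpark is installed and runs in local mode. The sample data, the app name, and the helper names parse_event and run_example are all hypothetical.

```python
def parse_event(line):
    """Split a 'user,amount' line into a (user, amount) pair."""
    user, amount = line.split(",")
    return user.strip(), int(amount)


def run_example():
    # Imported here so this file still loads where PySpark is not installed.
    from pyspark.sql import SparkSession

    # Local mode is an assumption; on a cluster (e.g. EMR) the master is
    # normally supplied by the environment instead.
    spark = (
        SparkSession.builder
        .master("local[*]")
        .appName("spark-intro-sketch")  # hypothetical app name
        .getOrCreate()
    )
    try:
        lines = ["alice,3", "bob,5", "alice,7"]  # made-up sample data

        # Spark Core: distribute the records and parse them in parallel.
        events = spark.sparkContext.parallelize(lines).map(parse_event)

        # Ask Spark to keep the parsed records in memory, so the two
        # computations below reuse them instead of re-parsing.
        events.cache()

        # Aggregate per user with the RDD API.
        totals = dict(events.reduceByKey(lambda a, b: a + b).collect())

        # Spark SQL: expose the same records as a table and query them.
        df = spark.createDataFrame(events, ["user", "amount"])
        df.createOrReplaceTempView("events")
        spark.sql(
            "SELECT user, SUM(amount) AS total FROM events GROUP BY user"
        ).show()

        return totals
    finally:
        spark.stop()
```

Calling run_example() starts a local Spark session, prints the per-user totals from the SQL query, and returns the same totals computed through the RDD API. The cache() call is what the "in-memory" part refers to: the parsed records stay in executor memory between the two computations.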

Spark advantages:

Fast data processing
In-memory computation

Spark use cases:

Complex data pipelines
Streaming

Spark anti-pattern use cases:

Operational DB

Our Spark Blogs

Fixing small files performance issues in Apache Spark, using DataFlint
Working with RStudio and a remote Spark cluster (SparkR)
AWS S3 caching while working with Hive spark SQL and External table | LLAP
How to work with maximize resource allocation and Spark dynamic allocation [ AWS EMR Spark ]
Spark performance tuning demystified
Securing Spark JDBC + thrift connection (SSL) @ AWS EMR (demystified)
How to connect via JDBC to Spark SQL EMR on AWS
Converting TPCH data from row based to columnar Via Hive or SparkSQL and run ad hoc queries via Athena on columnar data

Architectures and meetups that include Spark

English

AWS Big Data in 200KM/h
AWS Big Data Demystified – Part 1 [English]

Hebrew

200KM/h overview on Big Data in AWS | Part 1
200KM/h overview on Big Data in AWS | Part 2
AWS Big Data Demystified – Part 3