AWS Spark
What is Spark?
Apache Spark is an open-source distributed cluster-computing framework.
Spark provides several components, such as Spark Core, Spark SQL and Spark Streaming, all of which enable massively parallel “in-memory” computing.
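To make the "in-memory parallel computing" idea more concrete, here is a minimal sketch in plain Python. It is not Spark API code. It is a toy model, running on one machine with threads, of the pattern Spark applies across a cluster: split the data into partitions, process each partition in memory in parallel, then combine the partial results. The function names (split_into_partitions, count_words, parallel_word_count) and the sample lines are hypothetical and exist only for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(records, num_partitions):
    """Split a list into num_partitions chunks (round-robin), one per worker."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def count_words(partition):
    """Per-partition work: count the words in the lines held in memory."""
    counts = Counter()
    for line in partition:
        counts.update(line.lower().split())
    return counts


def parallel_word_count(lines, num_partitions=4):
    """Process every partition in parallel, then merge the partial counts."""
    partitions = split_into_partitions(lines, num_partitions)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partial_counts = list(pool.map(count_words, partitions))
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


if __name__ == "__main__":
    # Hypothetical sample data, kept entirely in memory.
    sample = ["spark is fast", "spark keeps data in memory", "memory is fast"]
    print(parallel_word_count(sample, num_partitions=2))
```

In a real Spark job the partitions live on different cluster nodes and Spark handles scheduling, data movement and failure recovery. The sketch only shows the shape of the computation.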
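Spark advantages:
- Fast data processing
- In-memory computation: intermediate results can be kept in memory instead of being written to disk between steps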
Spark use cases:
- Complex data pipelines
- Streaming
Spark anti-pattern use cases:
- Operational DB (Spark is a processing engine, not a database for serving transactional reads and writes)
Our Spark Blogs
- Working with R studio and a remote Spark cluster (SPARK R)
- AWS S3 caching while working with Hive spark SQL and External table | LLAP
- How to work with maximize resource allocation and Spark dynamic allocation [ AWS EMR Spark ]
- Spark performance tuning demystified
- Securing Spark JDBC + thrift connection (SSL) @ AWS EMR (demystified)
- How to connect via JDBC to Spark SQL EMR on AWS
- Converting TPCH data from row based to columnar Via Hive or SparkSQL and run ad hoc queries via Athena on columnar data