Learning Spark 2nd Edition 114663

Name: Learning Spark 2nd Edition
Brand: O'Reilly
SKU: 114663
Price: 830 UAH
Availability: InStock
Rating: 5 (1 review)
ISBN: 978-93-8588-905-9

Product code: 114663 (print book)
ISBN: 978-93-8588-905-9
Brand: O'Reilly
Author: Jules Damji, Denny Lee, Brooke Wenig, Tathagata Das
Year: 2020
Language: English
Illustrations: Black and white

Data is getting bigger, arriving faster, and coming in varied formats, and it all needs to be processed at scale for analytics or machine learning. How can you process such varied data workloads efficiently? Enter Apache Spark.

Updated to emphasize new features in Spark 2.4, this second edition shows data engineers and scientists why structure and unification in Spark matter. Specifically, this book explains how to perform simple and complex data analytics and employ machine-learning algorithms. Through discourse, code snippets, and notebooks, you'll be able to:

Learn the high-level Python, SQL, Scala, or Java APIs: DataFrames and Datasets
Peek under the hood of the Spark SQL engine to understand Spark transformations and performance
Inspect, tune, and debug your Spark operations with Spark configurations and the Spark UI
Connect to data sources: JSON, Parquet, CSV, Avro, ORC, Hive, S3, or Kafka
Perform analytics on batch and streaming data using Structured Streaming
Build reliable data pipelines with open source Delta Lake and Spark
Develop machine learning pipelines with MLlib and productionize models using MLflow
Use the open source pandas framework Koalas and Spark for data transformation and feature engineering

About the Author

Jules S. Damji is an Apache Spark Community and Developer Advocate at Databricks. He is a hands-on developer with over 20 years of experience and has worked at leading companies, such as Sun Microsystems, Netscape, @Home, LoudCloud/Opsware, VeriSign, ProQuest, and Hortonworks, building large-scale distributed systems. He holds a B.Sc. and an M.Sc. in Computer Science and an MA in Political Advocacy and Communication from Oregon State University, Cal State, and Johns Hopkins University, respectively.

Denny Lee is a Technical Product Manager at Databricks. He is a hands-on distributed systems and data sciences engineer with extensive experience developing internet-scale infrastructure, data platforms, and predictive analytics systems for both on-premises and cloud environments. He also has a Master's in Biomedical Informatics from Oregon Health & Science University and has architected and implemented data solutions for enterprise healthcare customers. His current technical focuses include distributed systems, Apache Spark, deep learning, machine learning, and genomics.

Brooke Wenig is the Machine Learning Practice Lead at Databricks. She guides and assists customers in implementing machine learning pipelines and teaches Distributed Machine Learning and Deep Learning courses. She received an MS in Computer Science from UCLA with a focus on distributed machine learning. She speaks Mandarin Chinese fluently and enjoys cycling.

Tathagata Das is an Apache Spark committer and a member of the PMC. He's the lead developer behind Spark Streaming and currently develops Structured Streaming. Previously, he was a grad student at the UC Berkeley AMPLab, where he conducted research on data-center frameworks and networks with Scott Shenker and Ion Stoica.

Monobank: up to 10 payments, from 93 ₴ / month

Nova Poshta: free from 3,000.00 ₴
Ukrposhta: free from 1,000.00 ₴
Meest Poshta: free from 3,000.00 ₴

Other O'Reilly books

244786

Robust Python: Write Clean and Maintainable Code. 1st Ed.

Patrick Viafore

1'900 ₴

153338

Data Science on AWS. Implementing End-to-End, Continuous AI and Machine Learning Pipelines

Chris Fregly, Antje Barth

2'200 ₴

67141

Optimized C++: Proven Techniques for Heightened Performance 1st Edition

Kurt Guntheroth

2'882 ₴

114613

Effective TypeScript: 62 Specific Ways to Improve Your TypeScript 1st Edition

Dan Vanderkam

950 ₴

269111

The Engineering Executive's Primer: Impactful Technical Leadership 1st Edition

Will Larson

1'200 ₴

244778

Networking and Kubernetes: A Layered Approach. 1st Ed.

James Strong, Vallery Lancey

2'100 ₴

114640

gRPC: Up and Running: Building Cloud Native Applications with Go and Java for Docker and Kubernetes 1st Edition

Kasun Indrasiri, Danesh Kuruppu

1'100 ₴

13076

Embedded Android: Porting, Extending, and Customizing

Karim Yaghmour

810 ₴

160097

Hybrid Cloud Apps with OpenShift and Kubernetes: Delivering Highly Available Applications and Services

Michael Elder, Jake Kitchener, Dr. Brad Topol

1'900 ₴

114655

Arduino Cookbook: Recipes to Begin, Expand, and Enhance Your Projects 3rd Edition

Michael Margolis, Brian Jepson

1'300 ₴

Specifications

Brand: O'Reilly
Author: Jules Damji, Denny Lee, Brooke Wenig, Tathagata Das
Category: Computer literature
Edition: 2nd ed.
Year: 2020
Pages: 300
Format: 170 × 240 mm
Cover: Softcover
Paper type: Offset
Language: English
Illustrations: Black and white
