{"product_id":"learning-spark-lightning-fast-data-analytics-paperback","title":"Learning Spark: Lightning-Fast Data Analytics - Paperback","description":"\u003cp\u003eby \u003cb\u003eJules Damji\u003c\/b\u003e (Author), \u003cb\u003eBrooke Wenig\u003c\/b\u003e (Author), \u003cb\u003eTathagata Das\u003c\/b\u003e (Author), \u003cb\u003eDenny Lee\u003c\/b\u003e (Author)\u003c\/p\u003e\u003cp\u003eData is bigger, arrives faster, and comes in a variety of formats, and all of it needs to be processed at scale for analytics or machine learning. But how can you process such varied workloads efficiently? Enter Apache Spark.\u003c\/p\u003e\u003cp\u003eUpdated to include Spark 3.0, this second edition shows data engineers and data scientists why structure and unification in Spark matter. Specifically, this book explains how to perform simple and complex data analytics and employ machine learning algorithms. 
Through step-by-step walk-throughs, code snippets, and notebooks, you'll be able to:\u003c\/p\u003e\u003cul\u003e \u003cli\u003eLearn Python, SQL, Scala, or Java high-level Structured APIs \u003c\/li\u003e\n\u003cli\u003eUnderstand Spark operations and SQL Engine \u003c\/li\u003e\n\u003cli\u003eInspect, tune, and debug Spark operations with Spark configurations and Spark UI \u003c\/li\u003e\n\u003cli\u003eConnect to data sources: JSON, Parquet, CSV, Avro, ORC, Hive, S3, or Kafka \u003c\/li\u003e\n\u003cli\u003ePerform analytics on batch and streaming data using Structured Streaming \u003c\/li\u003e\n\u003cli\u003eBuild reliable data pipelines with open source Delta Lake and Spark \u003c\/li\u003e\n\u003cli\u003eDevelop machine learning pipelines with MLlib and productionize models using MLflow \u003c\/li\u003e\n\u003c\/ul\u003e\u003ch3\u003eAuthor Biography\u003c\/h3\u003e\u003cp\u003eJules S. Damji is a senior developer advocate at Databricks and an MLflow contributor. He is a hands-on developer with over 20 years of experience and has worked as a software engineer at leading companies such as Sun Microsystems, Netscape, @Home, Loudcloud\/Opsware, Verisign, ProQuest, and Hortonworks, building large-scale distributed systems. He holds a B.Sc. and an M.Sc. in computer science and an MA in political advocacy and communication from Oregon State University, Cal State, and Johns Hopkins University, respectively.\u003c\/p\u003e\u003cp\u003eBrooke Wenig is a machine learning practice lead at Databricks. She leads a team of data scientists who develop large-scale machine learning pipelines for customers, and she teaches courses on distributed machine learning best practices. Previously, she was a principal data science consultant at Databricks. She holds an M.S. 
in computer science from UCLA with a focus on distributed machine learning.\u003c\/p\u003e\u003cp\u003eTathagata Das is a staff software engineer at Databricks, an Apache Spark committer, and a member of the Apache Spark Project Management Committee (PMC). He is one of the original developers of Apache Spark, the lead developer of Spark Streaming (DStreams), and is currently one of the core developers of Structured Streaming and Delta Lake. Tathagata holds an M.S. in computer science from UC Berkeley.\u003c\/p\u003e\u003cp\u003eDenny Lee is a staff developer advocate at Databricks who has been working with Apache Spark since 0.6. He is a hands-on distributed systems and data sciences engineer with extensive experience developing internet-scale infrastructure, data platforms, and predictive analytics systems for both on-premises and cloud environments. He also has an M.S. in biomedical informatics from Oregon Health and Sciences University and has architected and implemented powerful data solutions for enterprise healthcare customers.\u003c\/p\u003e\n            \u003cdiv\u003e\n\u003cstrong\u003eNumber of Pages:\u003c\/strong\u003e 397\u003c\/div\u003e\n            \u003cdiv\u003e\n\u003cstrong\u003eDimensions:\u003c\/strong\u003e 0.9 x 9.2 x 7 IN\u003c\/div\u003e\n            \u003cdiv\u003e\n\u003cstrong\u003ePublication Date:\u003c\/strong\u003e August 25, 2020\u003c\/div\u003e\n            ","brand":"BooksCloud","offers":[{"title":"Default Title","offer_id":47212562645241,"sku":"9781492050049","price":79.99,"currency_code":"USD","in_stock":true}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/0789\/2782\/3097\/files\/ZlNyVDJWc05Sa2ZJTUh5UlJHd2Jpdz09.webp?v=1768094472","url":"https:\/\/bookscloud.io\/products\/learning-spark-lightning-fast-data-analytics-paperback","provider":"BooksCloud Book Dropshipping","version":"1.0","type":"link"}