
URL: http://github.com/neha-dev-dot/Pyspark-Tutorial

GitHub - neha-dev-dot/Pyspark-Tutorial: This repository is part of my journey to learn **PySpark**, the Python API for Apache Spark. I explored the fundamentals of distributed data processing using Spark and practiced with real-world data transformation and querying use cases.

neha-dev-dot/Pyspark-Tutorial


🔥 PySpark Essentials

This project is a hands-on collection of notebooks, code snippets, and exercises for learning Apache Spark with Python (PySpark). It contains my notes and experiments on core Spark concepts, including transformations, actions, and the DataFrame API.


🚀 What is PySpark?

PySpark is the Python API for Apache Spark, an open-source distributed computing engine for large-scale data processing and analytics. It lets you write Spark jobs in Python and run them in parallel across a cluster.


📘 Topics Covered

  • ✅ Introduction to Spark & PySpark
  • ✅ SparkContext & SparkSession
  • ✅ RDDs (Resilient Distributed Datasets)
  • ✅ DataFrames & Datasets
  • ✅ Transformations vs Actions
  • ✅ Reading/Writing: JSON, CSV, Parquet
  • ✅ PySpark SQL & Queries
  • ✅ GroupBy, Aggregations, Joins
  • ✅ Handling Nulls & Missing Data
  • ✅ User-Defined Functions (UDFs)
  • ✅ Window Functions
  • ✅ Data Partitioning & Performance Optimization
  • ✅ Intro to MLlib (Optional)
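
Of the topics above, the split between transformations and actions is often the least intuitive at first. The sketch below is a plain-Python analogy, not PySpark code: `LazyDataset` is a hypothetical toy class that uses only the standard library. Its method names (`map`, `filter`, `collect`, `count`) mirror the RDD methods of the same names so the idea carries over. Transformations only record a step; an action runs the recorded plan.

```python
from typing import Any, Callable, Iterable, Iterator, List, Tuple


class LazyDataset:
    """Toy stand-in for a distributed dataset (hypothetical, not a Spark class)."""

    def __init__(self, source: Iterable[Any], steps: Tuple = ()) -> None:
        self._source = source
        self._steps = steps  # recorded transformations, not yet executed

    # Transformations: return a new LazyDataset and compute nothing.
    def map(self, fn: Callable[[Any], Any]) -> "LazyDataset":
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, pred: Callable[[Any], bool]) -> "LazyDataset":
        return LazyDataset(self._source, self._steps + (("filter", pred),))

    # Helper that replays the recorded steps over the source.
    def _run(self) -> Iterator[Any]:
        items = iter(self._source)
        for kind, fn in self._steps:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return items

    # Actions: execute the plan and return a concrete result.
    def collect(self) -> List[Any]:
        return list(self._run())

    def count(self) -> int:
        return sum(1 for _ in self._run())


calls = []  # records every time the mapped function actually runs


def double(x: int) -> int:
    calls.append(x)
    return x * 2


# Building the plan runs no user code.
plan = LazyDataset([1, 2, 3, 4]).map(double).filter(lambda x: x > 4)
calls_before_action = len(calls)  # 0: only a plan exists so far

# The action executes the whole plan.
collected = plan.collect()  # [6, 8]
```

Calling `plan.count()` afterwards replays the plan from the source, so `double` runs again for every element. Real Spark behaves similarly unless a dataset is cached, which is one reason caching shows up under performance optimization.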

✍️ How I Learn

I follow a "Learn by Doing" approach. Each notebook contains:

✅ Detailed explanations

🧪 Hands-on code examples

📌 Real-world case studies

About

This repository is part of my journey to learn **PySpark**, the Python API for Apache Spark. I explored the fundamentals of distributed data processing using Spark and practiced with real-world data transformation and querying use cases.
