ADD: markdowns
MoustafaAMahmoud committed Jun 6, 2024
1 parent 3b3600f commit 290c909
Showing 10 changed files with 1,921 additions and 2,376 deletions.
30 changes: 0 additions & 30 deletions Code/24-GroupByKey-Vs-ReduceByKey/24-GroupByKey-Vs-ReduceByKey.md
@@ -9,33 +9,3 @@ jupyter:
jupytext_version: 1.16.2
---

# Aggregation: groupByKey vs. reduceByKey

```python

# Example 3: Group By Transformation
pairs_rdd = sc.parallelize([("A", 1), ("B", 1), ("A", 2), ("B", 2), ("A", 3)] * 5000000)
print(f"Original Pairs RDD result: {pairs_rdd.take(10)}")

```

```python
import time

# Measure performance of groupByKey followed by a per-key sum.
# groupByKey shuffles every (key, value) pair before any summing happens.
start_time = time.perf_counter()
grouped_rdd = pairs_rdd.groupByKey().mapValues(sum)
grouped_result = grouped_rdd.collect()
group_by_key_duration = time.perf_counter() - start_time
print(f"GroupByKey duration: {group_by_key_duration:.4f} seconds")
print(f"Grouped RDD result (sum): {grouped_result[:10]}")  # Display only the first 10 results for brevity
```

```python
# Measure performance of reduceByKey, which sums values per key.
# reduceByKey combines values within each partition before the shuffle.
start_time = time.time()
reduced_rdd = pairs_rdd.reduceByKey(lambda x, y: x + y)
reduced_result = reduced_rdd.collect()
reduce_by_key_duration = time.time() - start_time
print(f"ReduceByKey duration: {reduce_by_key_duration:.4f} seconds")
print(f"Reduced RDD result: {reduced_result[:10]}")  # Display only the first 10 results for brevity
```
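
The timing cells above do not show why the two approaches differ. The sketch below is a plain-Python illustration, not Spark code and not part of the original notebook: it models partitions as lists and counts how many records would cross the shuffle under each strategy. The helper names `group_by_key_sum` and `reduce_by_key_sum`, the four-partition split, and the tiny dataset are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical miniature of the dataset above, split into 4 simulated partitions.
data = [("A", 1), ("B", 1), ("A", 2), ("B", 2), ("A", 3)] * 4
partitions = [data[i::4] for i in range(4)]


def group_by_key_sum(partitions):
    """groupByKey-style: every record is shuffled, then summed per key."""
    shuffled = [pair for part in partitions for pair in part]
    totals = defaultdict(int)
    for key, value in shuffled:
        totals[key] += value
    return dict(totals), len(shuffled)


def reduce_by_key_sum(partitions):
    """reduceByKey-style: sum inside each partition, shuffle only partial sums."""
    shuffled = []
    for part in partitions:
        partial = defaultdict(int)
        for key, value in part:
            partial[key] += value
        # At most one record per key leaves each partition.
        shuffled.extend(partial.items())
    totals = defaultdict(int)
    for key, value in shuffled:
        totals[key] += value
    return dict(totals), len(shuffled)


grouped_totals, grouped_shuffled = group_by_key_sum(partitions)
reduced_totals, reduced_shuffled = reduce_by_key_sum(partitions)

print(f"Totals match: {grouped_totals == reduced_totals}")
print(f"Records shuffled (groupByKey-style): {grouped_shuffled}")
print(f"Records shuffled (reduceByKey-style): {reduced_shuffled}")
```

Both strategies produce the same per-key sums, but the pre-aggregating version sends at most one record per key per partition through the shuffle. With only two distinct keys, as in the notebook's dataset, that reduction is likely the main reason `reduceByKey` tends to finish faster than `groupByKey` followed by `sum`.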
Binary file added Code/25-Join-RDDs/25-Joining-RDDs.dbc
Binary file not shown.
44 changes: 44 additions & 0 deletions Code/25-Join-RDDs/25-Joining-RDDs.html

Large diffs are not rendered by default.
