feat: publish some new notes
harrylowkey committed Jul 21, 2024
1 parent 9b12666 commit a1319ed
Showing 5 changed files with 374 additions and 51 deletions.
188 changes: 188 additions & 0 deletions notes/backend/python/concurreny_implementation.md
@@ -0,0 +1,188 @@
# Python Concurrency Implementation

<!-- published_date: 21 Jul, 2024 -->
<!-- description: Python Concurrency Implementation -->
<!-- tags: python, concurrency, parallelism -->

Implementing concurrency in Python can be done in several ways depending on the task at hand and the level of concurrency required. Here are the main approaches:

### 1. **Threading**
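The `threading` module is used for creating and working with threads. Threads are lighter than processes and share the same memory space. In standard CPython builds, the global interpreter lock (GIL) lets only one thread execute Python bytecode at a time, so threads are useful mainly for I/O-bound tasks, where they spend most of their time waiting.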

```python
import threading
import time

def print_numbers():
    for i in range(5):
        print(i)
        time.sleep(1)

def print_letters():
    for letter in 'abcde':
        print(letter)
        time.sleep(1)

thread1 = threading.Thread(target=print_numbers)
thread2 = threading.Thread(target=print_letters)

thread1.start()
thread2.start()

thread1.join()
thread2.join()
```

### 2. **Multiprocessing**
The `multiprocessing` module allows you to create processes. Each process has its own memory space and its own Python interpreter, so processes can run in parallel on multiple CPU cores, which makes them useful for CPU-bound tasks.

```python
import multiprocessing
import time

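def print_numbers():
    for i in range(5):
        print(i)
        time.sleep(1)

def print_letters():
    for letter in 'abcde':
        print(letter)
        time.sleep(1)

# The guard is needed on platforms that start child processes with "spawn"
# (e.g. Windows and macOS), where each child re-imports this module.
if __name__ == "__main__":
    process1 = multiprocessing.Process(target=print_numbers)
    process2 = multiprocessing.Process(target=print_letters)

    process1.start()
    process2.start()

    process1.join()
    process2.join()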
```

### 3. **Asyncio**
The `asyncio` module is used for asynchronous programming and is useful for I/O-bound tasks, particularly when dealing with network operations.
`asyncio` is a framework for writing single-threaded, concurrent code using the `async` and `await` keywords.
It does not inherently use multiple threads. Instead, it uses an event loop to manage asynchronous operations within a single thread.

*Note*: if the library you call does not support asynchronous I/O (for example, boto3), its blocking I/O calls will block the event loop; use `threading` or `multiprocessing` for those operations instead.

```python
import asyncio

async def print_numbers():
    for i in range(5):
        print(i)
        await asyncio.sleep(1)

async def print_letters():
    for letter in 'abcde':
        print(letter)
        await asyncio.sleep(1)

async def main():
    task1 = asyncio.create_task(print_numbers())
    task2 = asyncio.create_task(print_letters())
    await task1
    await task2

asyncio.run(main())
```

### 4. **Concurrent.Futures**
The `concurrent.futures` module provides a high-level interface for asynchronously executing callables using threads or processes.

#### Using ThreadPoolExecutor: (uses a pool of threads behind the scenes)
```python
from concurrent.futures import ThreadPoolExecutor
import time

def print_numbers():
    for i in range(5):
        print(i)
        time.sleep(1)

def print_letters():
    for letter in 'abcde':
        print(letter)
        time.sleep(1)

with ThreadPoolExecutor() as executor:
    executor.submit(print_numbers)
    executor.submit(print_letters)
```

#### Using ProcessPoolExecutor: (uses the `multiprocessing` module behind the scenes)
```python
from concurrent.futures import ProcessPoolExecutor
import time

def print_numbers():
    for i in range(5):
        print(i)
        time.sleep(1)

def print_letters():
    for letter in 'abcde':
        print(letter)
        time.sleep(1)

# The guard keeps child processes from re-running the pool setup on import.
if __name__ == "__main__":
    with ProcessPoolExecutor() as executor:
        executor.submit(print_numbers)
        executor.submit(print_letters)
```

Each method has its own use cases, strengths, and weaknesses. Threading is suitable for I/O-bound tasks, multiprocessing for CPU-bound tasks, asyncio for managing a large number of network connections, and concurrent.futures for a higher-level interface for threading and multiprocessing.
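
The examples above submit work but never read anything back. As a minimal sketch of the higher-level interface, the following collects return values from futures as they complete. `fetch_length` and `fetch_all` are hypothetical names, and the `time.sleep` call only stands in for a real I/O-bound operation.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import time


def fetch_length(word: str) -> int:
    """Hypothetical stand-in for an I/O-bound call: waits, then returns a value."""
    time.sleep(0.1)
    return len(word)


def fetch_all(words: list[str]) -> dict[str, int]:
    results = {}
    with ThreadPoolExecutor(max_workers=4) as executor:
        # Map each Future back to the input that produced it.
        future_to_word = {executor.submit(fetch_length, w): w for w in words}
        for future in as_completed(future_to_word):
            # result() returns the value, or re-raises the worker's exception.
            results[future_to_word[future]] = future.result()
    return results


if __name__ == "__main__":
    print(fetch_all(["threading", "asyncio", "multiprocessing"]))
```

Swapping `ThreadPoolExecutor` for `ProcessPoolExecutor` keeps the same `submit`/`result` interface, which is the main reason to prefer this module over managing threads or processes by hand.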


### Combining `asyncio` with Threads
While `asyncio` itself does not use threads for running tasks, you can run blocking code in a separate thread using `loop.run_in_executor`. This is useful for offloading blocking I/O operations to a thread pool while keeping the event loop responsive.

```python
import asyncio
import concurrent.futures
import time

def blocking_io():
    time.sleep(3)
    print("Blocking I/O finished")

async def main():
    loop = asyncio.get_running_loop()
    with concurrent.futures.ThreadPoolExecutor() as pool:
        # main() pauses here, but the event loop itself is not blocked:
        # other tasks can run while the worker thread sleeps.
        await loop.run_in_executor(pool, blocking_io)
        print("Back in main(); the event loop stayed responsive while the thread was blocked")

asyncio.run(main())
```

In this example, `blocking_io` is a blocking function that runs in a separate thread, allowing the event loop to continue executing other tasks.

### Key Points
- `asyncio` uses an event loop to manage asynchronous tasks in a single thread.
- `asyncio` does not use multiple threads for its event loop and task management.
- You can use `loop.run_in_executor` to run blocking code in a separate thread or process.

In summary, `asyncio` is designed for concurrency within a single thread using an event loop, but it provides mechanisms to integrate with threading or multiprocessing when necessary.
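
To make the "event loop stays responsive" claim observable, here is a small sketch that runs a blocking function in a thread next to an ordinary coroutine. It uses `asyncio.to_thread` (Python 3.9+), which submits the call to the default thread pool. `blocking_io`, `heartbeat`, and `run_both` are hypothetical names chosen for illustration.

```python
import asyncio
import time


def blocking_io(delay: float) -> str:
    """Stand-in for a blocking call such as a synchronous SDK request."""
    time.sleep(delay)
    return "blocking done"


async def heartbeat(ticks: int, interval: float) -> int:
    """Counts ticks; it only makes progress if the event loop is not blocked."""
    count = 0
    for _ in range(ticks):
        await asyncio.sleep(interval)
        count += 1
    return count


async def run_both() -> tuple[str, int, float]:
    start = time.perf_counter()
    # The blocking call runs in a worker thread while heartbeat() runs on the loop.
    result, ticks = await asyncio.gather(
        asyncio.to_thread(blocking_io, 0.5),
        heartbeat(5, 0.1),
    )
    return result, ticks, time.perf_counter() - start


if __name__ == "__main__":
    print(asyncio.run(run_both()))
```

Both pieces of work take about half a second each, so an elapsed time near 0.5 s rather than 1 s indicates that they overlapped.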


## When to use threading, multiprocessing, or asyncio?

- Use multiprocessing for CPU-bound tasks
- Use threading for I/O-bound tasks
- Use asyncio for supported asynchronous I/O-bound tasks

Let's take the example of uploading files to S3 storage. We have two options:
- Using threading
- Using asyncio

Which one is suitable for this use case?

+) If we are using a synchronous library like boto3, we should go with threading.
The uploads run in spawned worker threads, so the blocking I/O blocks only those threads and not the main thread.

+) If we're using an asynchronous library like aioboto3, we can go with asyncio to achieve non-blocking I/O operations.

+) If we need to use asyncio with boto3, we can offload the blocking calls to threads using `loop.run_in_executor` to keep the event loop responsive.
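
The threading option can be sketched as follows. To keep the example self-contained, `fake_upload` is a hypothetical stand-in for the real upload; with boto3 it could wrap a call such as `s3_client.upload_file(path, bucket, key)`, where the bucket and key are values you would supply.

```python
from concurrent.futures import ThreadPoolExecutor
import time
from typing import Callable


def fake_upload(path: str) -> str:
    """Hypothetical stand-in for a blocking upload (not a real S3 call)."""
    time.sleep(0.1)  # simulates network latency
    return f"uploaded {path}"


def upload_many(
    upload: Callable[[str], str], paths: list[str], max_workers: int = 8
) -> list[str]:
    # Each upload blocks only its own worker thread. executor.map returns
    # the results in the same order as `paths`.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(upload, paths))


if __name__ == "__main__":
    print(upload_many(fake_upload, ["a.txt", "b.txt", "c.txt"]))
```

Passing the upload function as a parameter is a design choice for this sketch only: it lets the same helper be exercised with a fake in tests and with a real client call in production.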


@@ -0,0 +1,3 @@
# ELB Deployment

https://assets-pt.media.datacumulus.com/aws-dva-pt/assets/pt1-q15-i1.jpg
74 changes: 74 additions & 0 deletions notes/certifications/aws-certificated-developer/test_1.md
@@ -0,0 +1,74 @@
# Test 1

1. A developer is defining the signers that can create signed URLs for their Amazon CloudFront distributions.
Which of the following statements should the developer consider while defining the signers? (Select two)

A. When you create a signer, the public key is with CloudFront and private key is used to sign a portion of URL
B. When you use the root user to manage CloudFront key pairs, you can only have up to two active CloudFront key pairs per AWS account


## Services

- Amazon Kinesis Data Streams:
allows you to continuously capture gigabytes of data per second from hundreds of thousands of sources such as website clickstreams,
database event streams, financial transactions, social media feeds, IT logs, and location-tracking events.

- Amazon Kinesis Firehose:
fully managed service for delivering real-time streaming data to destinations such as Amazon S3, Amazon Redshift,
Amazon OpenSearch Service (formerly Elasticsearch Service), and third-party services like Datadog and Splunk.
It simplifies the process of capturing, transforming, and loading streaming data into these destinations.

- Security Group: stateful
- Network ACL: stateless
- SQS: The queue along with all its contents has to be deleted after testing - *Delete Queue*, not *Remove Queue*

- X-Ray:
- VPC Flow Logs:
- CloudTrail:

- CDK deployment: Create the app from a template provided by AWS CDK not by AWS CloudFormation.

- Load Balancer: can target EC2 instances only within an AWS Region.

- ASG: An Auto Scaling group can contain EC2 instances in *one or more Availability Zones* within the *same Region* but cannot span *multiple Regions*.

- CDK: AWS CDK is a framework for defining cloud infrastructure in code and provisioning it through AWS CloudFormation.
The CDK Toolkit also has regional limitations and is not supported in every Region.

- SAM sections required: Transform, Resources

- EBS volumes: are AZ locked

- KMS Encryption: can encrypt up to 4 kilobytes (4096 bytes) of arbitrary data

- RDS: Automated backups are limited to a single AWS Region while manual snapshots and Read Replicas are supported across multiple Regions.


## Questions:

[?] You are running workloads on AWS and have embedded RDS database connection strings within each web server hosting your applications.
After failing a security audit, you are looking at a different approach to store your secrets securely and automatically rotate the database credentials.
Which AWS service can you use to address this use-case?
[A]: Secrets Manager

[?] CodeCommit is a managed version control service that hosts private Git repositories in the AWS cloud.
Which of the following credential types is NOT supported by IAM for CodeCommit?
[A]: IAM username and password

[?] Question 9
Incorrect
A developer has an application that stores data in an Amazon S3 bucket. The application uses an HTTP API to store and retrieve objects. When the PutObject API operation adds objects to the S3 bucket the developer must encrypt these objects at rest by using server-side encryption with Amazon S3-managed keys (SSE-S3).
Which solution will guarantee that any upload request without the mandated encryption is not processed?
[A]: Invoke the PutObject API operation and set the `x-amz-server-side-encryption` header to `AES256` (not `SSE-S3`).
Use an S3 bucket policy to deny permission to upload an object unless the request has this header.
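
As a rough sketch of the bucket policy this answer describes, the following builds the policy document as a Python dict. The function name and the bucket name `example-bucket` are hypothetical; the first statement denies uploads whose header is not `AES256`, and the second denies uploads that omit the header entirely.

```python
import json


def deny_unencrypted_uploads_policy(bucket_name: str) -> dict:
    """Builds a bucket policy that denies PutObject requests without SSE-S3."""
    resource = f"arn:aws:s3:::{bucket_name}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyWrongEncryptionHeader",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": resource,
                "Condition": {
                    "StringNotEquals": {"s3:x-amz-server-side-encryption": "AES256"}
                },
            },
            {
                "Sid": "DenyMissingEncryptionHeader",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": resource,
                "Condition": {
                    "Null": {"s3:x-amz-server-side-encryption": "true"}
                },
            },
        ],
    }


if __name__ == "__main__":
    # "example-bucket" is a placeholder, not a real bucket.
    print(json.dumps(deny_unencrypted_uploads_policy("example-bucket"), indent=2))
```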

[?] Which of the following security credentials can only be created by the AWS Account root user?
[A]: CloudFront Key pairs

[?]
As part of his development work, an AWS Certified Developer Associate is creating policies and attaching them to IAM identities.
After creating necessary Identity-based policies, he is now creating Resource-based policies.
Which is the only resource-based policy that the IAM service supports?
[A] Trust policy


54 changes: 54 additions & 0 deletions notes/databases/partining_vs_sharding.md
@@ -0,0 +1,54 @@
# Partitioning vs Sharding

Database partitioning, sharding, and replication are techniques used to manage data in a database to improve performance, scalability, and availability. Here's an overview of each:

### Database Partitioning

- One database
- Horizontal partitioning: by rows
- Vertical partitioning: by columns

Partitioning involves dividing a large database into smaller, more manageable pieces called partitions. Each partition can be managed and accessed separately, which can improve performance and manageability. Partitioning can be done in several ways:

1. **Horizontal Partitioning (e.g. Range Partitioning)**: Rows are divided into different partitions (often separate tables) based on the value of a key; with range partitioning, each partition holds a range of values. For example, a table with sales data could be partitioned by date, with each partition containing data for a specific month or year.

2. **Vertical Partitioning**: Columns are divided into different tables. For example, a customer table could be split into two tables: one containing frequently accessed columns like customer IDs and names, and another with less frequently accessed information like addresses and phone numbers.

3. **List Partitioning**: Data is divided based on a list of values. For example, a table of products might be partitioned based on product categories.

4. **Hash Partitioning**: A hash function is applied to a key to determine the partition in which to place the row. This can help evenly distribute data across partitions.
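
Hash partitioning can be sketched in a few lines. This is a minimal illustration, not how any particular database implements it; `partition_for` is a hypothetical helper. It uses SHA-256 rather than Python's built-in `hash()`, because string hashes from `hash()` are randomized per process and would not give a stable placement.

```python
import hashlib


def partition_for(key: str, num_partitions: int) -> int:
    """Maps a key to a partition index in [0, num_partitions)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer, then reduce modulo the count.
    return int.from_bytes(digest[:8], "big") % num_partitions


if __name__ == "__main__":
    for customer_id in ["cust-1", "cust-2", "cust-3"]:
        print(customer_id, "->", partition_for(customer_id, 4))
```

One consequence of the modulo step is that changing `num_partitions` moves most keys to a different partition, which is why repartitioning a hash-partitioned table tends to be expensive.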

### Sharding

- Large dataset across multiple databases or servers
- Sharding is one implementation of horizontal partitioning

Example:
- Geo-based sharding: user location
- Range-based sharding: user ID
- Hash-based sharding
- Manual and automatic sharding

Sharding is a type of horizontal partitioning, but it involves splitting a *large dataset across multiple databases or servers*, known as shards. Each shard holds a subset of the data, and together they form the complete dataset. Sharding can improve performance and scalability, as queries can be executed in parallel across shards. There are different approaches to sharding:

1. **Key-Based Sharding (Hash Sharding)**: A hash function is used to determine the shard placement based on a shard key. This method helps distribute data evenly but can be complex to implement.

2. **Range-Based Sharding**: Data is divided into shards based on ranges of a key. This can be simpler to implement but may lead to uneven data distribution if the data is not uniformly distributed.

3. **Geographic Sharding**: Data is partitioned based on geographic location, which can be useful for applications with users distributed across different regions.
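
For range-based sharding, the routing step amounts to finding which range a key falls into. The sketch below assumes a hypothetical layout of four shards split by user ID; the boundaries and shard names are made up for illustration.

```python
from bisect import bisect_right

# Hypothetical layout: each boundary is the first user ID of the next shard.
SHARD_BOUNDARIES = [1_000_000, 2_000_000, 3_000_000]
SHARD_NAMES = ["shard-0", "shard-1", "shard-2", "shard-3"]


def shard_for_user(user_id: int) -> str:
    """Returns the shard whose ID range contains user_id."""
    # bisect_right counts how many boundaries are <= user_id,
    # which is exactly the index of the owning shard.
    return SHARD_NAMES[bisect_right(SHARD_BOUNDARIES, user_id)]


if __name__ == "__main__":
    for uid in (42, 1_000_000, 2_500_000, 9_999_999):
        print(uid, "->", shard_for_user(uid))
```

If most active users have recent (high) IDs, the last shard receives most of the traffic, which is the uneven distribution mentioned above.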

### Replication

- Copying data from one database to another

Replication involves copying data from one database to another to ensure high availability and fault tolerance. There are different types of replication:

1. **Master-Slave Replication**: One database (the master) receives all write operations, and changes are replicated to one or more read-only databases (slaves). This setup can improve read performance and provide redundancy.

2. **Master-Master Replication**: Multiple databases (masters) can accept write operations, and changes are replicated between them. This can provide high availability and allow for distributed writes but requires conflict resolution mechanisms.

3. **Multi-Master Replication**: Similar to master-master replication but involves more than two databases, providing even greater redundancy and availability.

4. **Synchronous vs. Asynchronous Replication**: In synchronous replication, changes are replicated immediately, ensuring consistency but potentially slowing down write operations. In asynchronous replication, changes are propagated after a delay, which can improve performance but might lead to temporary inconsistencies.
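
The difference between synchronous and asynchronous replication can be illustrated with a toy in-memory model. `Primary`, `Replica`, and `flush` are hypothetical names; real systems ship changes over the network, and `flush` here simply stands in for the replication delay.

```python
class Replica:
    def __init__(self) -> None:
        self.data: dict[str, str] = {}

    def apply(self, key: str, value: str) -> None:
        self.data[key] = value


class Primary:
    def __init__(self, replicas: list[Replica], synchronous: bool) -> None:
        self.data: dict[str, str] = {}
        self.replicas = replicas
        self.synchronous = synchronous
        self.pending: list[tuple[str, str]] = []  # changes not yet replicated

    def write(self, key: str, value: str) -> None:
        self.data[key] = value
        if self.synchronous:
            # The write returns only after every replica has applied it.
            for replica in self.replicas:
                replica.apply(key, value)
        else:
            # The write returns immediately; replicas catch up later.
            self.pending.append((key, value))

    def flush(self) -> None:
        """Ships queued changes to the replicas (the delay in async mode)."""
        for key, value in self.pending:
            for replica in self.replicas:
                replica.apply(key, value)
        self.pending.clear()


if __name__ == "__main__":
    replica = Replica()
    primary = Primary([replica], synchronous=False)
    primary.write("user:1", "alice")
    print(replica.data)  # still empty: the replica is temporarily stale
    primary.flush()
    print(replica.data)  # now contains the write
```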

These techniques can be used individually or in combination, depending on the specific needs and architecture of the database system.