
Async, Non-Blocking and Blocking I/O



Process, Thread and Memory Management

Modern computers run programs simultaneously. You can listen to Spotify, browse the internet in Chrome and play video games at the same time. Spotify doesn't stop when you play a video game, and video games don't stop when you receive messages in Telegram. Do those programs run in parallel? Let's try to figure it out.

  • Every program is executed in one or multiple processes.
  • Each process consists of one or multiple threads.
  • Each CPU core can run only one process at a time.

A computer runs only N programs in parallel, where N is the number of CPU cores.
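You can check N on your machine; in Python, for example:

import os

# Number of CPU cores, i.e. the upper bound on truly parallel programs.
print(os.cpu_count())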

Let's suppose we have a computer with a single-core CPU. Even this computer can run multiple programs simultaneously: Spotify doesn't stop when you copy a 10GB video. How is that possible?

The audio player doesn't stop, it pauses. It pauses for a very short time to give control to another program. The other program makes progress and pauses again. The pauses are too short to be noticeable. This is called multitasking.

Multitasking -- the illusion that programs run in parallel.

The OS-level component that defines the way processes use the CPU is called the scheduler. It defines the method of assigning resources to processes.

When the scheduler replaces the executing process, it needs to save the process's state, load the state of the next process and resume it. The process runs for some time and the algorithm repeats. Saving and restoring process state is called context switching.

Process

  • an instance of a program
  • has one or many threads
  • contains the program code and its current activity

Thread

  • smallest primitive managed by a scheduler
  • part of the process

Scheduler

  • assigns work to resources that complete the work
  • work may be virtual computation elements such as threads, processes, data flows
  • which are scheduled onto hardware resources such as CPUs and GPUs

Multitasking

  • method of sharing processor and system resources
  • allows a processor to switch between tasks
  • switches are performed on I/O requests, hardware interrupts or time quotas

Time-Sharing

  • most common form of multitasking
  • illusion that applications run in parallel

Process - Representation

  • machine code of the executable program
  • memory
    • executable code
    • input-output data
    • call stack
    • heap
  • system descriptors (files, handles, data sources and sinks)
  • security attributes
  • processor state (context): registers, memory addresses

Security

  • OS keeps processes separate
  • processes don't interfere with each other
  • OS allocates the resources processes need

Process vs Thread

  • processes are independent - threads are part of process
  • processes carry all system information - threads share the process state, memory and resources
  • processes have separate address spaces, whereas threads share their address space (a small demonstration follows this list)
  • processes interact only through the system provided inter-process communication mechanisms
  • processes context switching is expensive - threads context switching is cheaper
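A minimal sketch demonstrating the address-space difference (the counter variable is just for illustration). The thread mutates the parent's memory, while the child process only mutates its own copy:

import threading
import multiprocessing

counter = 0

def increment():
    global counter
    counter += 1

if __name__ == '__main__':
    # A thread shares the process's address space: the change is visible.
    t = threading.Thread(target=increment)
    t.start()
    t.join()
    print(counter)  # 1

    # A child process gets its own address space: the parent's copy is unchanged.
    counter = 0
    p = multiprocessing.Process(target=increment)
    p.start()
    p.join()
    print(counter)  # still 0 in the parent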

Context switching

When it is time to give CPU time to another process, the OS saves the state of the currently running process, loads the state of another process and resumes it on the CPU. This is called context switching.

The scheduler works with both processes and threads. Performing a process context switch is more expensive than performing a thread context switch: threads share memory and resources with their parent process, so a thread switch requires fewer changes to the current state.
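To make the idea concrete, here is a toy round-robin "scheduler" in Python. It is only an analogy: generators stand in for processes and yield stands in for a context switch, which is not how a real OS scheduler works:

def program(name):
    for step in range(3):
        print(name, 'step', step)
        yield  # pause: local state is saved here and restored on resume

# Round-robin: run each "process" for one slice, then put it back in the queue.
ready_queue = [program('spotify'), program('game')]
while ready_queue:
    current = ready_queue.pop(0)
    try:
        next(current)                # resume until the next pause
        ready_queue.append(current)  # back to the end of the queue
    except StopIteration:
        pass                         # this "process" has finished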

Async, Non-Blocking and Blocking I/O

Programs read and write data. The I/O model determines how efficiently a program uses the CPU. For example, Apache Web Server uses blocking I/O. When you request a web page, it reads data from the filesystem or the network, and the process is blocked until the operation is complete.

Let's imagine that we have 10000 simultaneous users requesting different pages. While one process is blocked, all 10000 would have to wait for it to complete. That is not workable.

That's why Apache Web Server forks a new process per open connection (a minimal sketch of this model is shown after the list below). When one process blocks, it doesn't affect the other processes. It works, but this approach has two problems:

  • 10000 processes cost a lot of memory
  • 10000 processes waste a lot of time on context switching
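Here is a minimal sketch of the process-per-connection model (Unix-only; the port and hard-coded response are made up, and real Apache is far more sophisticated). The parent only accepts connections, and each forked child is free to block on its own client:

import os
import socket

server = socket.socket()
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(('0.0.0.0', 8080))
server.listen()

while True:
    conn, addr = server.accept()
    if os.fork() == 0:  # child: handle exactly one connection, then exit
        conn.recv(4096)  # blocks only this child
        conn.sendall(b'HTTP/1.0 200 OK\r\n\r\nhello\r\n')
        conn.close()
        os._exit(0)
    conn.close()  # parent: close its copy and go back to accepting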

The next example is the Nginx web server. It uses an async I/O model and its worker is single-threaded. It can handle many more open connections and requests per second while using fewer resources.

The basic idea is that there is one worker thread. I/O reads and writes don't block the worker thread. Instead, those calls only start an operation and execution continues. As soon as the result is ready, it is added to the event queue. The worker thread takes events from the event queue and handles them. This way the server doesn't waste memory and CPU on thousands of blocked processes.

I/O forms:

  • Blocking
  • Non-blocking
  • Asynchronous

Blocking I/O

  • the I/O operation blocks the execution context (thread or process)
# Open a socket, send a request, read the response and print it.
import socket

sock = socket.create_connection(('example.com', 80))
sock.sendall(b'GET / HTTP/1.0\r\nHost: example.com\r\n\r\n')
data = sock.recv(4096)  # Execution is blocked until the read operation is complete
print(data)

Non-Blocking I/O --- Polling

  • I/O operations don't block execution
  • the result of the operation has to be polled
  • polling eats CPU cycles
# Open a socket, poll until it is ready, then read data and print it.
import select
import socket

sock = socket.create_connection(('example.com', 80))
sock.sendall(b'GET / HTTP/1.0\r\nHost: example.com\r\n\r\n')
sock.setblocking(False)
ready = []
while not ready:  # Execution is not blocked by the system, but spins in the loop.
    ready, _, _ = select.select([sock], [], [], 0)
data = sock.recv(4096)  # recv immediately returns a result: the data is ready.
print(data)

Async I/O --- Callback

  • I/O operations don't block execution
  • completion of the operation notifies the execution context
# Open a socket, create a callback, register it and run the event loop.
import asyncio
import socket

loop = asyncio.new_event_loop()
sock = socket.create_connection(('example.com', 80))
sock.sendall(b'GET / HTTP/1.0\r\nHost: example.com\r\n\r\n')
sock.setblocking(False)

def handler():  # callback function, invoked when the socket becomes readable
    print(sock.recv(4096))
    loop.stop()

loop.add_reader(sock, handler)
loop.run_forever()

Event loop. Libuv. Node.js. Twisted


Event Table

  • each async operation added to the event table
  • maps events to handler functions
  • doesn't execute handlers

Event queue

  • list of events ready for handling

Event loop

  • a loop constantly spinning inside the process
  • each iteration handles one event from the event queue
  • runs the event's handler from the event table
# Pseudocode illustrating the work of an event loop
while True:
    event = event_queue.get()
    if event:
        handler = event_table[event.id]
        handler(event)
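And here is a runnable toy version tying the three pieces together (the event id, the handler and the payload are made up for illustration):

import queue
from collections import namedtuple

Event = namedtuple('Event', ['id', 'data'])

# Event table: maps event ids to handlers; it never executes them itself.
event_table = {'data_ready': lambda event: print('handling', event.data)}

# Event queue: events that are ready for handling.
event_queue = queue.Queue()
event_queue.put(Event('data_ready', 'hello'))

# Event loop: each iteration takes one event and runs its handler.
while not event_queue.empty():
    event = event_queue.get()
    handler = event_table.get(event.id)
    if handler:
        handler(event)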

Libuv

A cross-platform library focused on async I/O. It is written in C and was developed primarily for Node.js, but nowadays it is widely used outside of its initial scope.

  • built around the event loop
  • event loop tied to a single thread

Features

  • cross-platform asynchronous I/O library
  • abstractions on top of epoll, kqueue, IOCP, event ports
  • async TCP / UDP
  • async DNS
  • async fs operations
  • child processes
  • thread pool

Network I/O

I/O performed on non-blocking sockets:

  • epoll on Linux
  • kqueue on OSX and other BSDs
  • event ports on SunOS
  • IOCP on Windows
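Python's standard selectors module provides the same kind of abstraction: DefaultSelector picks the best mechanism available on the platform, much like libuv does. A small sketch (example.com and the raw HTTP request are only for illustration):

import selectors
import socket

selector = selectors.DefaultSelector()  # epoll on Linux, kqueue on macOS/BSD, ...

sock = socket.create_connection(('example.com', 80))
sock.sendall(b'GET / HTTP/1.0\r\nHost: example.com\r\n\r\n')
sock.setblocking(False)

selector.register(sock, selectors.EVENT_READ)
for key, mask in selector.select():  # waits efficiently, no busy-polling
    print(key.fileobj.recv(4096))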

Thread Pool

Not all operations in libuv are async. Because of the lack of OS-level async APIs for file I/O, libuv uses a thread pool under the hood for the following operations:

  • File system operations
  • DNS functions
  • User specified code
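Asyncio mirrors this design: blocking file I/O can be pushed onto a thread pool so the event loop stays responsive. A minimal sketch (the file path is just an example):

import asyncio

def read_file(path):
    with open(path) as f:  # ordinary blocking file I/O
        return f.read()

async def main():
    loop = asyncio.get_running_loop()
    # Offload the blocking call to the default thread pool executor.
    content = await loop.run_in_executor(None, read_file, '/etc/hostname')
    print(content)

asyncio.run(main())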

Node.js

Node is an open-source platform that allowed JavaScript to move outside of the browser. It was created by Ryan Dahl in 2009 as a result of his frustration with the Apache HTTP server. The Apache HTTP server uses a process-per-connection model: for each open connection, the server spawns one more child process. This model is very limited if you want to handle a huge number of concurrent connections.

Node.js consists of two parts

  • Google V8 JavaScript engine
  • Libuv library for managing I/O

JavaScript is one of the best-suited languages for the async programming model: it has always been async in the browser, and the whole ecosystem of libraries already supported the async model.

Python

Asyncio

A library for writing concurrent code using the async/await syntax. It is the foundation for writing high-performance, I/O-heavy applications in Python.

Asyncio is implemented around the Python select module, which provides access to select, poll, devpoll, epoll and kqueue.

Uvloop is a Python wrapper around libuv that can be used as a replacement for the default asyncio event loop. It can make asyncio faster; more about this in the article "uvloop: Blazing fast Python networking" by Yury Selivanov.
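Switching to uvloop is a small change; assuming the uvloop package is installed, install() replaces the default event loop policy:

import asyncio
import uvloop

uvloop.install()  # asyncio event loops created after this run on libuv

async def main():
    await asyncio.sleep(1)

asyncio.run(main())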

Asyncio - Hello World

import asyncio

async def main():
    print('Hello ...')
    await asyncio.sleep(1)
    print('... World!')

asyncio.run(main())
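The payoff comes when several coroutines run concurrently. For example, with asyncio.gather two one-second sleeps complete in about one second total, not two:

import asyncio

async def say(delay, text):
    await asyncio.sleep(delay)
    print(text)

async def main():
    # Both coroutines sleep concurrently on the same event loop.
    await asyncio.gather(say(1, 'Hello ...'), say(1, '... World!'))

asyncio.run(main())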

Asyncio - Querying PostgreSQL

import asyncio
import asyncpg

async def run():
    conn = await asyncpg.connect(...)
    values = await conn.fetch('''SELECT * FROM cats''')
    await conn.close()
    print(values)

asyncio.run(run())

Sanic (Async Flask)

Sanic is an amazing library heavily inspired by Flask and built around asyncio.

from sanic import Sanic
from sanic.response import json
from asyncpg import create_pool

## Init Sanic
app = Sanic(__name__)

## Database management
@app.listener('before_server_start')
async def connect_to_database(app, loop):
    pool = await create_pool(user='postgres', password='postgres',
                             database='postgres', host='127.0.0.1',
                             loop=loop)
    app.config['pool'] = pool

@app.listener('after_server_stop')
async def close_connection(app, loop):
    # Close the whole pool, which closes all of its connections.
    await app.config['pool'].close()

## Database queries
async def get_tweets(pool):
    sql = '''
        SELECT id, text FROM tweets;
    '''
    async with pool.acquire() as conn:
        rows = await conn.fetch(sql)
    # Convert asyncpg Records to dicts so they are JSON-serializable.
    return [dict(row) for row in rows]

async def get_tweet(pool, id):
    sql = '''
        SELECT id, text FROM tweets
        WHERE id = $1;
    '''
    async with pool.acquire() as conn:
        rows = await conn.fetch(sql, id)
    return [dict(row) for row in rows]

## Http handlers
@app.route("/api/tweets/<id>")
async def handle_tweet(req, id):
    pool = req.app.config['pool']
    tweet = await get_tweet(pool, int(id))
    return json(tweet)

@app.route("/api/tweets")
async def handle_tweets(req):
    pool = req.app.config['pool']
    tweets = await get_tweets(pool)
    return json(tweets)

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)