Modern computers run programs simultaneously. You can listen to Spotify, browse the internet in Chrome, and play video games at the same time. Spotify doesn't stop when you play a video game, and games don't stop when you receive messages in Telegram. Do those programs actually run in parallel? Let's figure it out.
A computer runs at most N programs truly in parallel, where N is the number of CPU cores.
Let's suppose we have a computer with a single-core CPU. Even this computer can run multiple programs simultaneously: Spotify doesn't stop when you copy a 10 GB video. How is that possible?
The audio player doesn't stop, it pauses. It pauses for a very short time to give control to another program. That program makes progress and pauses in turn. The pauses are too short to be noticeable. This is called Multitasking.
Multitasking -- the illusion that programs run in parallel.
The OS-level component that decides how processes use the CPU is called the Scheduler. It defines the policy for assigning CPU time to processes.
When the Scheduler replaces the executing process, it needs to save that process's state, load the state of the next process, and resume it. The process runs for some time and the algorithm repeats. Saving and restoring process state is called Context Switching.
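The time-slicing idea can be sketched in Python with generators: each "process" runs until it yields, its state is saved implicitly, and the scheduler resumes the next one. This is a toy cooperative model, not a real preemptive OS scheduler; all names here are illustrative.

```python
# Toy cooperative scheduler: each generator is a "process" that
# voluntarily pauses at every yield, and the scheduler round-robins
# between them. Real OS schedulers preempt processes instead.
from collections import deque

def task(name, steps):
    for i in range(steps):
        yield f"{name} step {i}"  # pause point: the "context switch" happens here

def scheduler(tasks):
    ready = deque(tasks)
    log = []
    while ready:
        current = ready.popleft()
        try:
            log.append(next(current))  # resume the task, run one time slice
            ready.append(current)      # save it for the next round
        except StopIteration:
            pass                       # task finished, drop it
    return log

print(scheduler([task("A", 2), task("B", 2)]))
# Tasks interleave: A step 0, B step 0, A step 1, B step 1
```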
Process
Thread
Scheduler
Multitasking
Time-Sharing
Process - Representation
Security
When it is time to give CPU time to another process, the OS saves the state of the currently running process, loads the state of another process, and resumes it on the CPU. This is called Context Switching.
The scheduler works with both processes and threads. A process context switch is more expensive than a thread context switch: threads share memory and resources with their parent process, so switching between them requires fewer changes to the current state.
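A quick way to see why thread switches are cheaper is that threads live in the same address space. A minimal sketch: two threads mutating the same list, which separate processes could not do without explicit inter-process communication.

```python
# Threads share the parent process's memory: both workers append to
# the very same list object. Separate processes would each get their own copy.
import threading

shared = []
lock = threading.Lock()

def worker(n):
    for _ in range(1000):
        with lock:              # protect the shared structure from races
            shared.append(n)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(shared))  # 2000: both threads wrote into one shared list
```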
Programs read and write data, and the I/O model determines how efficiently a program uses the CPU. For example, Apache Web Server uses blocking I/O: when you request a web page, it reads data from the filesystem or the network, and the process is blocked until the operation completes.
Now imagine 10,000 simultaneous users requesting different pages. With a single process, while it is blocked on I/O, all 10,000 users have to wait for it to finish. That is not usable.
That's why Apache Web Server forks a new process per open connection. When one process blocks, it doesn't affect the others. It works, but each process costs memory and context switches, so this approach scales poorly to a huge number of concurrent connections.
The next example is the Nginx web server. It uses an async I/O model and is single-threaded, yet it can handle many more open connections and requests per second while using fewer resources.
The basic idea is a single worker thread. I/O reads and writes don't block it: these calls only start an operation, and execution continues. As soon as a result is ready, it is added to the event queue. The worker thread takes events from the queue and handles them. This way the server doesn't waste memory or CPU.
I/O forms:
Blocking I/O
# Open socket, read data from socket, print it.
socket = network.open()
data = socket.read()  # Execution blocked until the read operation is complete
print(data)
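A runnable Python version of the blocking sketch, using a socketpair so it is self-contained; `recv` blocks until data arrives, exactly like the pseudocode's `socket.read()`.

```python
# Blocking I/O: recv() does not return until there is data to read.
import socket

a, b = socket.socketpair()      # connected pair, stands in for a network socket
a.sendall(b'hello')             # the peer writes first, so recv below returns at once
data = b.recv(1024)             # would block here if no data had arrived yet
print(data.decode())
a.close()
b.close()
```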
Non-Blocking I/O --- Polling
# Open socket, wait until the socket is ready, read data and print it.
socket = network.open()
ready = False
while not ready:
    # Execution is not blocked by the system, but by the loop.
    ready = network.poll(socket, 'input')
data = socket.read()  # Read immediately returns the result.
print(data)
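The polling sketch maps onto Python's `select` module: set the socket non-blocking and loop until `select` reports it readable. A minimal sketch, again with a socketpair standing in for the network.

```python
# Non-blocking I/O with polling: the loop, not the kernel, does the waiting.
import select
import socket

a, b = socket.socketpair()
b.setblocking(False)            # recv on b now raises instead of blocking
a.sendall(b'payload')

ready = False
while not ready:
    # select returns immediately here because data is already queued;
    # with an empty socket this loop would spin until data arrived
    readable, _, _ = select.select([b], [], [], 0.1)
    ready = b in readable

data = b.recv(1024)             # guaranteed not to block: select said ready
print(data.decode())
a.close()
b.close()
```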
Async I/O --- Callback
# Open socket, create callback, initiate read and pass callback.
loop = create_event_loop()
socket = network.open(loop)

def handler(err, data):  # callback function
    if err:
        return print('Error reading data, {}'.format(err))
    print(data)

socket.read(handler)
loop.run()
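The callback style from the sketch exists in Python's asyncio as `loop.add_reader`: register a callback and the event loop invokes it once the socket becomes readable. A minimal sketch using a socketpair; it assumes a selector-based event loop (the default on Unix).

```python
# Async I/O with a callback: add_reader registers a handler that the
# event loop fires when the socket has data, instead of blocking on read.
import asyncio
import socket

a, b = socket.socketpair()
b.setblocking(False)

loop = asyncio.new_event_loop()
result = []

def on_readable():
    result.append(b.recv(1024))   # safe: called only when data is ready
    loop.remove_reader(b)
    loop.stop()                   # nothing else to wait for

loop.add_reader(b, on_readable)
a.sendall(b'event')
loop.run_forever()                # runs until on_readable stops it
loop.close()
a.close()
b.close()
print(result[0].decode())
```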
Event Table
Event queue
Event loop
# Pseudocode illustrating the work of the event loop
while True:
    event = event_queue.get()
    if event:
        handler = event_table[event.id]
        handler(event)
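The pseudocode becomes runnable with a plain queue and a dict as the event table. The event names and payloads here are hypothetical, just to show the dispatch mechanics.

```python
# Minimal event loop: events queue up, the loop pops each one and
# dispatches it to the handler registered in the event table.
from collections import deque

handled = []

event_table = {                       # event id -> handler
    'data_ready': lambda ev: handled.append(('read', ev['payload'])),
    'timer':      lambda ev: handled.append(('tick', ev['payload'])),
}

event_queue = deque([
    {'id': 'data_ready', 'payload': b'chunk'},
    {'id': 'timer', 'payload': 1},
])

while event_queue:                    # a real loop blocks waiting for new events
    event = event_queue.popleft()
    handler = event_table[event['id']]
    handler(event)

print(handled)
```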
Libuv is a cross-platform library focused on async I/O. It was developed primarily in C for Node.js and is now widely used outside its initial scope.
Features
Network I/O
I/O performed on non-blocking sockets:
Thread Pool
Not all operations in libuv are async. Because of the lack of OS-level APIs for async file I/O, libuv uses a thread pool under the hood for the following operations:
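Python's asyncio has the same limitation and the same workaround: file I/O has no portable async OS API, so blocking calls are pushed to a thread pool with `run_in_executor`. A minimal sketch writing and reading a temp file.

```python
# Like libuv, asyncio offloads blocking file I/O to a thread pool:
# run_in_executor(None, ...) uses the loop's default ThreadPoolExecutor.
import asyncio
import os
import tempfile

def blocking_read(path):
    with open(path) as f:          # ordinary blocking file I/O
        return f.read()

async def main():
    fd, path = tempfile.mkstemp()
    os.write(fd, b'from the thread pool')
    os.close(fd)
    loop = asyncio.get_running_loop()
    # The read runs in a worker thread; the event loop stays free.
    text = await loop.run_in_executor(None, blocking_read, path)
    os.remove(path)
    return text

print(asyncio.run(main()))
```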
Node.js is an open-source platform that allowed JavaScript to move outside the browser. It was created by Ryan Dahl in 2009 out of his frustration with the Apache HTTP Server. Apache uses a process-per-connection model: for each open connection, the server spawns another child process. This model is very limited if you want to handle a huge number of concurrent connections.
Node.js consists of two main parts: the V8 JavaScript engine and libuv.
JavaScript is one of the best-suited languages for the async programming model: it has always been async, and its whole ecosystem of libraries already supported that model.
Asyncio
A library for writing concurrent code using async/await syntax, and the foundation for high-performance, I/O-heavy applications in Python.
Asyncio is implemented around the Python select module, which provides access to select, poll, devpoll, epoll, and kqueue.
Uvloop is a Python wrapper around libuv that can be used as a replacement for the default event loop built on the select module. It can make asyncio considerably faster; more about this in the article "uvloop: Blazing fast Python networking" by Yury Selivanov.
Asyncio - Hello World
import asyncio

async def main():
    print('Hello ...')
    await asyncio.sleep(1)
    print('... World!')

asyncio.run(main())
Asyncio - Querying PostgreSQL
import asyncio
import asyncpg

async def run():
    conn = await asyncpg.connect(...)
    values = await conn.fetch('''SELECT * FROM cats''')
    await conn.close()
    print(values)

asyncio.run(run())
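Where asyncio really pays off is running several such queries concurrently: `asyncio.gather` awaits them together, so their waiting overlaps. A sketch with `asyncio.sleep` standing in for the database round-trips.

```python
# Two "queries" that each take ~0.1s finish in ~0.1s total, not 0.2s,
# because the event loop interleaves them while they wait.
import asyncio
import time

async def fake_query(name):
    await asyncio.sleep(0.1)       # stands in for awaiting the database
    return f"{name} done"

async def main():
    start = time.monotonic()
    results = await asyncio.gather(fake_query('q1'), fake_query('q2'))
    elapsed = time.monotonic() - start
    return results, elapsed

results, elapsed = asyncio.run(main())
print(results, round(elapsed, 2))
```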
Sanic -- an amazing library heavily inspired by Flask and built around asyncio.
from sanic import Sanic
from sanic.response import json
from asyncpg import create_pool

## Init Sanic
app = Sanic(__name__)

## Database management
@app.listener('before_server_start')
async def connect_to_database(app, loop):
    pool = await create_pool(user='postgres', password='postgres',
                             database='postgres', host='127.0.0.1',
                             loop=loop)
    app.config['pool'] = pool

@app.listener('after_server_stop')
async def close_connection(app, loop):
    pool = app.config['pool']
    async with pool.acquire() as conn:
        await conn.close()

## Database queries
async def get_tweets(pool):
    sql = '''SELECT id, text FROM tweets;'''
    async with pool.acquire() as conn:
        rows = await conn.fetch(sql)
    return rows

async def get_tweet(pool, id):
    sql = '''SELECT id, text FROM tweets
             WHERE id = $1;'''
    async with pool.acquire() as conn:
        rows = await conn.fetch(sql, id)
    return rows

## Http handlers
@app.route("/api/tweets/<id>")
async def handle_tweet(req, id):
    pool = req.app.config['pool']
    tweet = await get_tweet(pool, int(id))
    return json(tweet)

@app.route("/api/tweets")
async def handle_tweets(req):
    pool = req.app.config['pool']
    tweets = await get_tweets(pool)
    return json(tweets)

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)