Ubuntu Studio Plans a Reboot for 18.10 Release

Do you use Ubuntu Studio? Chances are you don’t — but the team behind the creatively curated community spin hope that’ll change in the near future. Plans are afoot to “reboot” Ubuntu Studio for its upcoming October release. There’s talk of new software, a new look, a new website, and a renewed sense of purpose. […]

This post, Ubuntu Studio Plans a Reboot for 18.10 Release, was written by Joey Sneddon and first appeared on OMG! Ubuntu!.

Bassel Khartabil Free Fellowship, GNOME 3.28.1 Release, New Version of Mixxx and More

News briefs for April 16, 2018.

The Bassel Khartabil Free Fellowship was awarded yesterday to Majd
Al-shihabi, a Palestinian-Syrian engineer and urban planning graduate based
in Beirut, Lebanon: “The Fellowship will support Majd’s efforts in
building a unified platform for Syrian and Palestinian oral history archives,
as well as the digitizing and release of previously forgotten 1940s era
public domain maps of Palestine.” Creative Commons also announced the first
three winners of the Bassel Khartabil Memorial Fund: Egypt-based The Mosireen
Collective, and Lebanon-based Sharq.org and ASI-REM/ADEF Lebanon. For all
the details, see the announcement on the Creative Commons website.

GNOME 3.28 is ready for prime time after receiving its first point
release on Friday, which includes numerous improvements and bug fixes. See
the announcement for all the details on version 3.28.1.

Apache Subversion 1.10 has been released. This version is “a superset of
all previous Subversion releases, and is as of the time of its release
considered the current ‘best’ release. Any feature or bugfix in 1.0.x through
1.9.x is also in 1.10, but 1.10 contains features and bugfixes not present in
any earlier release. The new features will eventually be documented in a 1.10
version of the free Subversion book.” New features include improved
path-based authorization, a new interactive conflict resolver, added support
for LZ4 compression and more. See the release notes for more information.

A new version of Mixxx, the free and open-source DJ software, was released
today. Version 2.1 has “new and improved controller mappings, updated Deere
and LateNight skins, overhauled effects system, and much more”.

Kayenta, a new open-source project from Google and Netflix for automated
deployment monitoring, was announced recently. GeekWire reports that the
project’s goal is “to help other companies that want to modernize their
application deployment practices but don’t exactly have the same budget and
expertise to build their own solution.”
Multiprocessing in Python

Reuven M. Lerner
Mon, 04/16/2018 – 09:20

Python’s “multiprocessing” module feels like threads, but actually launches
processes.

Many people, when they start to work with Python, are excited to hear
that the language supports threading. And, as I’ve discussed in previous
articles,
Python does indeed support native-level threads
with an easy-to-use and convenient interface.

However, there is a downside to these threads—namely the global
interpreter lock (GIL), which ensures that only one thread runs at a
time. Because a thread cedes the GIL whenever it uses I/O, this means
that although threads are a bad idea in CPU-bound Python programs, they’re a
good idea when you’re dealing with I/O.
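
Here’s a minimal sketch of that limitation. The count_down function and
its workload are invented for illustration, but on a typical CPython
build, the threaded version takes about as long as (or longer than) the
sequential one:


#!/usr/bin/env python3

# A sketch of the GIL's effect on CPU-bound code: only one thread
# holds the GIL at a time, so threads add no speedup here.

import threading
import time

def count_down(n):
    while n > 0:
        n -= 1

N = 10000000

# Two runs, one after the other
start = time.time()
count_down(N)
count_down(N)
print("sequential: {0:.2f}s".format(time.time() - start))

# The same two runs in two threads
start = time.time()
t1 = threading.Thread(target=count_down, args=(N,))
t2 = threading.Thread(target=count_down, args=(N,))
t1.start()
t2.start()
t1.join()
t2.join()
print("threaded: {0:.2f}s".format(time.time() - start))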

But even when you’re using lots of I/O, you might prefer to take full
advantage of a multicore system. And in the world of Python, that
means using processes.

In my article “Launching External Processes in Python”, I described how
you can launch processes from within a Python program, but those examples
all demonstrated launching an external program in a separate process.
Normally, when people talk about processes, they mean something that works
much like a thread, but that is even more independent (and has more
overhead, as well).

So, it’s something of a dilemma: do you launch easy-to-use
threads, even though they don’t really run in parallel? Or, do you
launch new processes, over which you have little control?

The answer is somewhere in the middle. The Python standard library
comes with “multiprocessing”, a module that gives the feeling of
working with threads, but that actually works with processes.

So in this article, I look at the “multiprocessing” library and describe some of the basic things it can do.

Multiprocessing Basics

The “multiprocessing” module is designed to look and feel like the
“threading” module, and it largely succeeds in doing so. For example,
the following is a simple example of a multithreaded program:


#!/usr/bin/env python3

import threading
import time
import random

def hello(n):
    time.sleep(random.randint(1,3))
    print("[{0}] Hello!".format(n))

for i in range(10):
    threading.Thread(target=hello, args=(i,)).start()

print("Done!")

In this example, there is a function (hello) that prints “Hello!” along
with whatever argument is passed. The program then runs a for loop that
invokes hello ten times, each time in an independent thread.

But wait. Before the function prints its output, it first sleeps for a
few seconds. When you run this program, you end up with output
demonstrating that the threads run concurrently, and not necessarily in
the order in which they were invoked:


$ ./thread1.py
Done!
[2] Hello!
[0] Hello!
[3] Hello!
[6] Hello!
[9] Hello!
[1] Hello!
[5] Hello!
[8] Hello!
[4] Hello!
[7] Hello!

If you want to be sure that “Done!” is printed after all the threads
have finished running, you can use join. To do that, you need to grab
each instance of threading.Thread, put it in a list, and then invoke
join on each thread:


#!/usr/bin/env python3

import threading
import time
import random

def hello(n):
    time.sleep(random.randint(1,3))
    print("[{0}] Hello!".format(n))

threads = [ ]
for i in range(10):
    t = threading.Thread(target=hello, args=(i,))
    threads.append(t)
    t.start()

for one_thread in threads:
    one_thread.join()

print("Done!")

The only difference in this version is that it puts each thread object
in a list (“threads”) and then iterates over that list, joining the
threads one by one.

But wait a second—I promised that I’d talk about
“multiprocessing”, not threading. What gives?

Well, “multiprocessing” was designed to give the feeling of working
with threads. So much so, in fact, that you can basically do some
search-and-replace on the program I just presented:

  • threading → multiprocessing
  • Thread → Process
  • threads → processes
  • thread → process

The result is as follows:


#!/usr/bin/env python3

import multiprocessing
import time
import random

def hello(n):
    time.sleep(random.randint(1,3))
    print("[{0}] Hello!".format(n))

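# Note: this top-level code is fine on Linux, where child processes
# are forked. On platforms that spawn instead (Windows, and macOS by
# default on recent Pythons), the launching code below must sit under
# "if __name__ == '__main__':", or each child's re-import of the
# module would try to launch processes again.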
processes = [ ]
for i in range(10):
    t = multiprocessing.Process(target=hello, args=(i,))
    processes.append(t)
    t.start()

for one_process in processes:
    one_process.join()

print("Done!")

In other words, you can run a function in a new process, with full
concurrency, and take advantage of multiple cores, using
multiprocessing.Process. It works very much like a thread, including
the use of join on the Process objects you create. Each instance of
Process represents a process running on the computer, which
you can see
using ps, and which you can (in theory) stop with
kill.
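
As a quick sketch of that last point (the wait_forever worker is made
up for this demo), a Process object exposes its pid, the same ID that
ps shows, and its terminate method sends the SIGTERM signal, just as
kill does by default:


#!/usr/bin/env python3

# A sketch of inspecting and stopping a Process from Python.

import multiprocessing
import time

def wait_forever():
    while True:
        time.sleep(1)

p = multiprocessing.Process(target=wait_forever)
p.start()
print("child pid: {0}".format(p.pid))    # check it with: ps -p <pid>

p.terminate()                            # like: kill <pid>
p.join()
print("exit code: {0}".format(p.exitcode))  # -15 means killed by SIGTERM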

What’s the Difference?

What’s amazing to me is that the API is almost identical, and yet two
very different things are happening behind the scenes. Let me try to
make the distinction clearer with another pair of examples.

Perhaps the biggest difference, at least to anyone programming with
threads and processes, is the fact that threads share global
variables. By contrast, separate processes are completely separate; one
process cannot affect another’s variables. (In a future article, I plan
to look at how
to get around that.)

Here’s a simple example of how a function running in a thread can
modify a global variable (note that I’m doing this to prove a
point; if you really want to modify global variables from within a
thread, you should use a lock, as sketched a bit further on):


#!/usr/bin/env python3

import threading
import time
import random

mylist = [ ]

def hello(n):
    time.sleep(random.randint(1,3))
    mylist.append(threading.get_ident())   # bad in real code!
    print("[{0}] Hello!".format(n))

threads = [ ]
for i in range(10):
    t = threading.Thread(target=hello, args=(i,))
    threads.append(t)
    t.start()

for one_thread in threads:
    one_thread.join()

print("Done!")
print(len(mylist))
print(mylist)

The program is basically unchanged, except that it defines a new, empty
list (mylist) at the top. The function appends the current thread’s ID
to that list and then returns.

Now, the way that I’m doing this isn’t so wise, because Python data
structures aren’t thread-safe, and appending to a list from within
multiple threads eventually will catch up with you. But the point
here isn’t to demonstrate threads, but rather to contrast them with
processes.

When I run the above code, I get:


$ ./th-update-list.py
[0] Hello!
[2] Hello!
[6] Hello!
[3] Hello!
[1] Hello!
[4] Hello!
[5] Hello!
[7] Hello!
[8] Hello!
[9] Hello!
Done!
10
[123145344081920, 123145354592256, 123145375612928,
 123145359847424, 123145349337088, 123145365102592,
 123145370357760, 123145380868096, 123145386123264,
 123145391378432]

So, you can see that the global variable mylist is shared by the
threads, and that when one thread modifies the list, that change is
visible to all the other threads.
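
Incidentally, here is roughly what the lock mentioned earlier would
look like around that append. This is a minimal sketch of the same
program, with an added mylist_lock, rather than anything from the
original article:


#!/usr/bin/env python3

# A sketch of protecting the shared list with a lock: the "with"
# block ensures only one thread at a time can append.

import threading
import time
import random

mylist = [ ]
mylist_lock = threading.Lock()

def hello(n):
    time.sleep(random.randint(1,3))
    with mylist_lock:
        mylist.append(threading.get_ident())
    print("[{0}] Hello!".format(n))

threads = [ ]
for i in range(10):
    t = threading.Thread(target=hello, args=(i,))
    threads.append(t)
    t.start()

for one_thread in threads:
    one_thread.join()

print(len(mylist))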

But if you change the program to use “multiprocessing”, things
look a bit different:


#!/usr/bin/env python3

import multiprocessing
import time
import random
import os

mylist = [ ]

def hello(n):
    time.sleep(random.randint(1,3))
    mylist.append(os.getpid())
    print("[{0}] Hello!".format(n))

processes = [ ]
for i in range(10):
    t = multiprocessing.Process(target=hello, args=(i,))
    processes.append(t)
    t.start()

for one_process in processes:
    one_process.join()

print("Done!")
print(len(mylist))
print(mylist)

Aside from the switch to multiprocessing, the biggest change in this
version of the program is the use of os.getpid to get the current
process ID.

The output from this program is as follows:


$ ./proc-update-list.py
[0] Hello!
[4] Hello!
[7] Hello!
[8] Hello!
[2] Hello!
[5] Hello!
[6] Hello!
[9] Hello!
[1] Hello!
[3] Hello!
Done!
0
[]

Everything seems great until the end when it checks the value of
mylist. What happened to it? Didn’t the program append to it?

Sort of. The thing is, there is no “it” in this program. Each time
the program creates a new process with “multiprocessing”, that process
gets its own copy of the global mylist list. Each process thus appends
to its own list, and that list disappears when the process exits.

This means the call to mylist.append succeeds, but it succeeds in
ten different processes. When the function returns from executing in
its own process, there is no trace left of that process’s list. The
mylist variable in the main process remains empty, because nothing
ever appended to it.

Queues to the Rescue

In the world of threaded programs, even when you’re able to append to
the global mylist variable, you shouldn’t do it. That’s because
Python’s data structures aren’t thread-safe. Indeed, only one data
structure is guaranteed to be safe for this sort of concurrent use:
the Queue class, which comes in a thread-oriented flavor (the queue
module) and a process-oriented flavor (the multiprocessing module).

Queues are FIFOs (that is, “first in, first out”). Whoever wants to add
data to a queue invokes the put method on the queue. And whoever
wants to retrieve data from a queue uses the get method.
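
To make the FIFO behavior concrete, here’s a tiny sketch (the strings
are arbitrary):


#!/usr/bin/env python3

# A sketch of FIFO ordering: items come out of the queue in the
# order they were put in.

from multiprocessing import Queue

q = Queue()
for word in ["first", "second", "third"]:
    q.put(word)

print(q.get())   # first
print(q.get())   # second
print(q.get())   # third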

Now, queues in the world of multithreaded programs prevent issues
having to do with thread safety. But in the world of multiprocessing,
queues allow you to bridge the gap among your processes, sending data
back to the main process. For example:


#!/usr/bin/env python3

import multiprocessing
import time
import random
import os
from multiprocessing import Queue

q = Queue()

def hello(n):
    time.sleep(random.randint(1,3))
    q.put(os.getpid())
    print("[{0}] Hello!".format(n))

processes = [ ]
for i in range(10):
    t = multiprocessing.Process(target=hello, args=(i,))
    processes.append(t)
    t.start()

for one_process in processes:
    one_process.join()

mylist = [ ]
while not q.empty():
    mylist.append(q.get())

print("Done!")
print(len(mylist))
print(mylist)

In this version of the program, I don’t create mylist until late in
the game. However, I create an instance of multiprocessing.Queue very
early on. That Queue instance is designed to be shared across the
different processes. Moreover, it can handle any type of Python data
that can be stored using “pickle”, which covers almost any data
structure.
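
As a small illustration of that flexibility, here’s a sketch that
sends an invented dict (the report function and its payload aren’t
from the article) across the process boundary:


#!/usr/bin/env python3

# A sketch showing that anything picklable, not just numbers, can
# travel through a multiprocessing.Queue.

import os
from multiprocessing import Process, Queue

def report(q):
    q.put({"pid": os.getpid(), "results": [1, 2, 3]})

q = Queue()
p = Process(target=report, args=(q,))
p.start()
print(q.get())   # e.g. {'pid': 12345, 'results': [1, 2, 3]}
p.join()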

In the hello function, the call to mylist.append is replaced with a
call to q.put, placing the current process’s ID number on the queue.
Each of the ten processes the program creates adds its own PID to the
queue.

Note that this program takes place in stages. First it launches ten
processes, then they all do their work in parallel, and then it waits
for them to complete (with join), so that it can process the
results. It pulls data off the queue, puts it onto mylist, and then
reports on the data it has retrieved.

The implementation of queues is so smooth and pleasant to work with
that it’s easy to forget these queues are using some serious
behind-the-scenes operating-system machinery to keep things
coordinated. It’s easy to think that you’re working with threads,
but that’s just the point of multiprocessing; it might feel like
threads, but each process runs separately. This gives you true
parallelism within your program, something threads in CPython cannot
offer.
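
To see that parallelism concretely, here’s a sketch that reuses the
count_down workload from the earlier threading sketch. On a machine
with at least two free cores, the two-process version should finish
in roughly half the sequential time:


#!/usr/bin/env python3

# A sketch of real parallelism: the same CPU-bound workload that
# threads couldn't speed up does speed up across two processes.

import multiprocessing
import time

def count_down(n):
    while n > 0:
        n -= 1

N = 10000000

start = time.time()
p1 = multiprocessing.Process(target=count_down, args=(N,))
p2 = multiprocessing.Process(target=count_down, args=(N,))
p1.start()
p2.start()
p1.join()
p2.join()
print("two processes: {0:.2f}s".format(time.time() - start))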

Conclusion

Threading is easy to work with, but threads don’t truly execute in parallel.
The “multiprocessing” module provides an API that’s almost identical
to that of the “threading” module. That doesn’t paper over all of the
differences between threads and processes, but it goes a long way
toward keeping them manageable.

Use Font Finder to Quickly Browse & Install Google Fonts on Ubuntu

If you’re looking for a decent font finder app for Linux then you need look no further than the perfectly named ‘Font Finder’. Font Finder is a free, open-source app for Linux that lets you sift through, sort and install fonts from the huge Google Fonts archive, all from the comfort of your desktop. […]

This post, Use Font Finder to Quickly Browse & Install Google Fonts on Ubuntu, was written by Joey Sneddon and first appeared on OMG! Ubuntu!.

Xen Project Contributor Spotlight: Stefano Stabellini

The Xen Project is made up of a diverse set of member companies and contributors that are committed to the growth and success of the Xen Project Hypervisor. The Xen Project Hypervisor is a staple technology for server and cloud vendors, and is gaining traction in the embedded, security and automotive space. This blog series highlights the companies contributing to the changes and growth being made to the Xen Project, and how the Xen Project technology bolsters their business.

Capital One: Open Source in a Regulated Environment

Most people know Capital One as one of the largest credit card companies in the U.S. Some also know that we’re one of the nation’s largest banks — number 8 in the U.S. by assets. But Capital One is also a technology-focused digital bank that is proud to be disrupting the financial services industry through our commitment to cutting-edge technologies and innovative digital products. Like all U.S.
