
Planet Python

Feed: Planet Python.

Last update: September 20, 2016 04:51 AM

September 19, 2016


Curtis Miller

An Introduction to Stock Market Data Analysis with Python (Part 1)

This post is the first in a two-part series on stock data analysis using Python, based on a lecture I gave on the subject for MATH 3900 (Data Science) at the University of Utah. In these posts, I will discuss basics such as obtaining the data from Yahoo! Finance using pandas, visualizing stock data, moving…

September 19, 2016 03:00 PM


Andre Roberge

Backward incompatible change in handling permalinks with Reeborg coming soon

About two years ago, I implemented a permalink scheme which was intended to facilitate sharing various programming tasks in Reeborg’s World. As I added new capabilities, the number of possible items to include grew tremendously. In fact, for rich enough worlds, the permalink can be too long for the browser to handle. To deal with such situations, I had to implement a clumsy way to import and

September 19, 2016 02:17 PM


Doug Hellmann

dbm — Unix Key-Value Databases — PyMOTW 3

dbm is a front-end for DBM-style databases that use simple string values as keys to access records containing strings. It uses whichdb() to identify databases, then opens them with the appropriate module. It is used as a back-end for shelve, which stores objects in a DBM database using pickle.
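
As a quick illustration of the API described above, here is a minimal sketch (the file name is illustrative; PyMOTW 3 targets Python 3, where the module is spelled dbm):

import dbm

# Store and retrieve simple string/bytes values by key ('c' creates the file if needed).
with dbm.open('example.db', 'c') as db:
    db['title'] = 'An example record'
    print(db['title'])          # values come back as bytes, e.g. b'An example record'

# whichdb() reports which DBM variant created the file.
print(dbm.whichdb('example.db'))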

September 19, 2016 01:00 PM


Python Piedmont Triad User Group

PYPTUG Monthly meeting September 27th 2016 (Just bring Glue)

Come join PYPTUG at our next monthly meeting (September 27th 2016) to learn more about the Python programming language, modules and tools. Python is the perfect language to learn if you’ve never programmed before, and at the other end, it is also the perfect tool that no expert would do without. Monthly meetings are in addition to our project nights.

What

Meeting will start at 6:00pm.

We will open with an intro to PYPTUG and how to get started with Python, followed by PYPTUG activities and member projects (in particular some updates on the Quadcopter project), then on to news from the community.



Main Talk: Just Bring Glue – Leveraging Multiple Libraries To Quickly Build Powerful New Tools

by Rob Agle

Bio:

Rob Agle is a software engineer at Inmar, where he works on the high-availability REST APIs powering the organization’s digital promotions network. His technical interests include application and network security, machine learning and natural language processing.

Abstract:

It has never been easier for developers to create simple-yet-powerful data-driven or data-informed tools. Through case studies, we’ll explore a few projects that use a number of open source libraries or modules in concert. Next, we’ll cover strategies for learning these new tools. Finally, we wrap up with pitfalls to keep in mind when gluing powerful things together quickly.

Lightning talks! 

We will have some time for extemporaneous “lightning talks” of 5-10 minutes’ duration. If you’d like to give one, some talk suggestions were provided here if you are looking for inspiration, or talk about a project you are working on.

When

Tuesday, September 27th 2016
Meeting starts at 6:00PM

Where

Wake Forest University, close to Polo Rd and University Parkway:

Wake Forest University, Winston-Salem, NC 27109

And speaking of parking: parking after 5pm is on a first-come, first-served basis. The official parking policy is:

Visitors can park in any general parking lot on campus. Visitors should avoid reserved spaces, faculty/staff lots, fire lanes or other restricted areas on campus. Frequent visitors should contact Parking and Transportation to register for a parking permit.

Mailing List

Don’t forget to sign up to our user group mailing list:

It is the only step required to become a PYPTUG member.

RSVP on meetup:

https://www.meetup.com/PYthon-Piedmont-Triad-User-Group-PYPTUG/events/233759543/

September 19, 2016 12:58 PM


Mike Driscoll

PyDev of the Week: Benedikt Eggers

This week we welcome Benedikt Eggers (@be_eggers) as our PyDev of the Week. Benedikt is one of the core developers working on the IronPython project. IronPython is the version of Python that is integrated with Microsoft’s .NET framework, much like Jython is integrated with Java. If you’re interested in seeing what Benedikt has been up to lately, you might want to check out his GitHub profile. Let’s take a few minutes to get to know our fellow Pythoneer!

Could you tell us a little about yourself (hobbies, education, etc):

My name is Benedikt Eggers and I was born and live in Germany (23 years old). I’ve been working as a software developer and engineer and studied business informatics. In my little spare time I do sports and work on open source projects, like IronPython.

Why did you start using Python?

To be honest, I started using Python by searching for a script engine for .NET. That way I came to IronPython and established it in our company. There we use it to extend our software, writing and using Python modules in both worlds. After a while I got more into Python and thought it’s a great take on a dynamic language. So it’s a good contrast to C#. It is perfect for scripting and other nice and quick stuff.

What other programming languages do you know and which is your favorite?

The language I’m most familiar with is C#. To be honest, this is also my “partly” favorite language for writing larger applications and complex products. But I also like Python/IronPython very much, because it allows me to achieve my goals very quickly with less and more readable code. So a favorite language is hard to pick, because I like to use the best technology in its specific environment (the same could be said about relational and document-based databases, …)

What projects are you working on now?

Mostly I’m working on my projects at work. We (http://simplic-systems.com/) are continuously creating more and more open source projects and also contributing to other open source projects, so I spend a lot of time there. But I can also use a lot of this time to work on IronPython, so I’m able to mix things up and work on a few projects in parallel. Spending time working on IronPython is something I really like, so I’m doing it because I enjoy it.

Which Python libraries are your favorite (core or 3rd party)?

I really like requests and all the packages that make it easy to work with web services and other modern technologies. On the other side, I use a lot of Python modules in our continuous integration environment to automate our build process. There I also use the core libraries to move and rename files by reading JSON configurations and so on. So there are a lot of libraries I like, because they make my life much easier every day.

Is there anything else you’d like to say?

Yes – I love seeing how fast we are growing and that we have found people who are willing to contribute to IronPython. I think we are on a good path and I hope that we can achieve all of our goals. I hope that IronPython 3 and all the other releases are coming soon. Furthermore I’d like to thank Jeff Hardy a lot, who has contributed to the project in the past years and is always very helpful. Finally, thanks also go to Alex Earl, who has been working on this project in the last years too and now wants to bring it back together with the community. I think we will work great together!

Thanks so much for doing the interview!

September 19, 2016 12:30 PM


Wesley Chun

Accessing Gmail from Python (plus BONUS)

NOTE: The code covered in this blogpost is also available in a video walkthrough here.

UPDATE (Aug 2016): The code has been modernized to use oauth2client.tools.run_flow() instead of the deprecated oauth2client.tools.run(). You can read more about that change here.

Introduction

The last several posts have illustrated how to connect to public/simple and authorized Google APIs. Today, we’re going to demonstrate accessing the Gmail (another authorized) API. Yes, you read that correctly… “API.” In the old days, you accessed mail services with standard Internet protocols such as IMAP/POP and SMTP. However, while they are standards, they haven’t kept up with modern-day email usage and the developers’ needs that go along with it. In comes the Gmail API, which provides CRUD access to email threads and drafts along with messages, search queries, management of labels (like folders), and domain administration features that are an extra concern for enterprise developers.

Earlier posts demonstrate the structure and “how-to” use Google APIs in general, so the most recent posts, including this one, focus on solutions and apps, and use of specific APIs. Once you review the earlier material, you’re ready to start with Gmail scopes then see how to use the API itself.

Gmail API Scopes

Below are the Gmail API authorization scopes. We’re listing them in most-to-least restrictive order because that’s the order you should consider using them in: use the most restrictive scope you possibly can while still allowing your app to do its work. This makes your app more secure and may prevent inadvertently going over any quotas, or accessing, destroying, or corrupting data. Also, users are less hesitant to install your app if it asks only for more restricted access to their inboxes.

  • 'https://www.googleapis.com/auth/gmail.readonly' — Read-only access to all resources + metadata
  • 'https://www.googleapis.com/auth/gmail.send' — Send messages only (no inbox read nor modify)
  • 'https://www.googleapis.com/auth/gmail.labels' — Create, read, update, and delete labels only
  • 'https://www.googleapis.com/auth/gmail.insert' — Insert and import messages only
  • 'https://www.googleapis.com/auth/gmail.compose' — Create, read, update, delete, and send email drafts and messages
  • 'https://www.googleapis.com/auth/gmail.modify' — All read/write operations except for immediate & permanent deletion of threads & messages
  • 'https://mail.google.com/' — All read/write operations (use with caution)

Using the Gmail API

We’re going to create a sample Python script that goes through your Gmail threads and looks for those which have more than 2 messages, for example, if you’re seeking particularly chatty threads on mailing lists you’re subscribed to. Since we’re only peeking at inbox content, the only scope we’ll request is ‘gmail.readonly’, the most restrictive scope. The API string is ‘gmail’ which is currently on version 1, so here’s the call to apiclient.discovery.build() you’ll use:

GMAIL = discovery.build('gmail', 'v1', http=creds.authorize(Http()))

Note that the code above this line is predominantly boilerplate (explained in earlier posts). Anyway, once you have an established service endpoint with build(), you can use the list() method of the threads service to request the thread data. The one required parameter is the user’s Gmail address. A special value of ‘me’ has been set aside for the currently authenticated user.

threads = GMAIL.users().threads().list(userId='me').execute().get('threads', [])

If all goes well, the (JSON) response payload will (not be empty or missing and) contain a sequence of threads that we can loop over. For each thread, we need to fetch more info, so we issue a second API call for that. Specifically, we care about the number of messages in a thread:

for thread in threads:
    tdata = GMAIL.users().threads().get(userId='me', id=thread['id']).execute()
    nmsgs = len(tdata['messages'])

We keep only threads with more than 2 (that is, at least 3) messages, discarding the rest. If a thread meets that criterion, we scan the first message and cycle through its email headers looking for the “Subject” line to display to users, skipping the remaining headers as soon as we find it:

    if nmsgs > 2:
        msg = tdata['messages'][0]['payload']
        subject = ''
        for header in msg['headers']:
            if header['name'] == 'Subject':
                subject = header['value']
                break
        if subject:
            print('%s (%d msgs)' % (subject, nmsgs))

If you’re on many mailing lists, this may give you more threads than desired, so feel free to raise the threshold from 2 to 50, 100, or whatever makes sense for you. (In that case, you should use a variable.) Regardless, that’s pretty much the entire script, save for the OAuth2 code that we’re familiar with from previous posts. The script is posted below in its entirety, and if you run it, you’ll see an interesting collection of threads… YMMV depending on what messages are in your inbox:

$ python3 gmail_threads.py
[Tutor] About Python Module to Process Bytes (3 msgs)
Core Python book review update (30 msgs)
[Tutor] scratching my head (16 msgs)
[Tutor] for loop for long numbers (10 msgs)
[Tutor] How to show the listbox from sqlite and make it searchable? (4 msgs)
[Tutor] find pickle and retrieve saved data (3 msgs)

BONUS: Python 3!

As of Mar 2015 (formally in Apr 2015 when the docs were updated), support for Python 3 was added to Google APIs Client Library (3.3+)! This update was a long time coming (relevant GitHub thread), and allows Python 3 developers to write code that accesses Google APIs. If you’re already running 3.x, you can use its pip command (pip3) to install the Client Library:

$ pip3 install -U google-api-python-client

Because of this, unlike previous blogposts, we’re deliberately going to avoid use of the print statement and switch to the print() function instead. If you’re still running Python 2, be sure to add the following import so that the code will also run in your 2.x interpreter:

from __future__ import print_function

Conclusion

To find out more about the input parameters as well as all the fields that are in the response, take a look at the docs for threads().list(). For more information on what other operations you can execute with the Gmail API, take a look at the reference docs and check out the companion video for this code sample. That’s it!

Below is the entire script for your convenience which runs on both Python 2 and Python 3 (unmodified!):

from __future__ import print_function

from apiclient import discovery
from httplib2 import Http
from oauth2client import file, client, tools

SCOPES = 'https://www.googleapis.com/auth/gmail.readonly'
store = file.Storage('storage.json')
creds = store.get()
if not creds or creds.invalid:
    flow = client.flow_from_clientsecrets('client_secret.json', SCOPES)
    creds = tools.run_flow(flow, store)
GMAIL = discovery.build('gmail', 'v1', http=creds.authorize(Http()))

threads = GMAIL.users().threads().list(userId='me').execute().get('threads', [])
for thread in threads:
    tdata = GMAIL.users().threads().get(userId='me', id=thread['id']).execute()
    nmsgs = len(tdata['messages'])

    if nmsgs > 2:
        msg = tdata['messages'][0]['payload']
        subject = ''
        for header in msg['headers']:
            if header['name'] == 'Subject':
                subject = header['value']
                break
        if subject:
            print('%s (%d msgs)' % (subject, nmsgs))


You can now customize this code for your own needs, for a mobile frontend, a server-side backend, or to access other Google APIs. If you want to see another example of using the Gmail API (displaying all your inbox labels), check out the Python Quickstart example in the official docs or its equivalent in Java (server-side, Android), iOS (Objective-C, Swift), C#/.NET, PHP, Ruby, JavaScript (client-side, Node.js), or Go. That’s it… hope you find these code samples useful in helping you get started with the Gmail API!

EXTRA CREDIT: To test your skills and challenge yourself, try writing code that allows users to perform a search across their email, or perhaps creating an email draft, adding attachments, then sending it! Note that to prevent spam, there are strict Program Policies that you must abide by… any abuse could rate-limit your account or get it shut down. Check out those rules plus other Gmail terms of use here.

September 19, 2016 12:04 PM


Jeff Knupp

Writing Idiomatic Python Video Four Is Out!

After an unplanned two-year hiatus, the fourth video in the Writing Idiomatic Python Video Series is out! This was long overdue, and for that I sincerely apologize. All I can do now is continue to produce the rest at a steady clip and get them out as quickly as possible. I hope you find the video useful! Part 5 will be out soon…

September 19, 2016 06:07 AM

September 18, 2016


Omaha Python Users Group

September 21 Meeting

Lightning talks, discussion, and topic selection for this season’s meetings.

Event Details:

  • Where: DoSpace @ 7205 Dodge Street / Meeting Room #2
  • When: September 21, 2016 @ 6:30pm – 8:00pm
  • Who: People interested in programming with Python

September 18, 2016 11:37 PM


Experienced Django

KidsTasks – Working on Models

This is part two in the KidsTasks series where we’re designing and implementing a django app to manage daily task lists for my kids. See part 1 for details on requirements and goals.

Model Design Revisited

As I started coding up the models and the corresponding admin pages for the design I presented in the last section it became clear that there were several bad assumptions and mistakes in that design. (I plan to write up a post about designs and their mutability in the coming weeks.)

The biggest conceptual problem I had was the difference between “python objects” and “django models”. Django models correspond to database tables and thus do not map easily to things like “I want a list of Tasks in my DayOfWeekSchedule”.

After building up a subset of the models described in part 1, I found that the CountedTask model wasn’t going to work the way I had envisioned. Creating it as a direct subclass of Task caused unexpected (initially, at least) behavior in that all CountedTasks were also Tasks and thus showed up in all lists where Tasks could be added. While this behavior makes sense, it doesn’t fit the model I was working toward. After blundering with a couple of other ideas, it finally occurred to me that the main problem was the fundamental design. If something seems really cumbersome to implement it might be pointing to a design error.

Stepping back, it occurred to me that the idea of a “Counted” task was putting information at the wrong level. An individual task shouldn’t care if it’s one of many similar tasks in a Schedule, nor should it know how many there are. That information should be part of the Schedule models instead.

Changing this took more experimenting than I wanted, largely due to a mismatch between my thinking and how django models work. The key to working through this level of confusion was trying to figure out how to add multiple Tasks of the same type to a Schedule. That led me to this Stack Overflow question, which describes using an intermediate model to relate the two items. This does exactly what I’m looking for, allowing me to say that Kid1 needs to Practice Piano twice on Tuesdays without the need for a CountedTask model.

Changing this created problems for our current admin.py, however. I found ideas for how to clean that up here, which describes how to use inlines as part of the admin pages.

Using inlines and intermediate models, I was able to build up a schedule for a kid in a manner similar to my initial vision.  The next steps will be to work on views for this model and see where the design breaks!
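
As a rough sketch of what that intermediate-model approach can look like (model and field names here are my own illustration, not the exact KidsTasks code):

from django.db import models


class Task(models.Model):
    name = models.CharField(max_length=100)


class DayOfWeekSchedule(models.Model):
    day_name = models.CharField(max_length=20)
    tasks = models.ManyToManyField(Task, through='ScheduleTask')


class ScheduleTask(models.Model):
    # The intermediate ("through") model carries the extra information:
    # how many times a given task appears in a given schedule.
    task = models.ForeignKey(Task, on_delete=models.CASCADE)
    schedule = models.ForeignKey(DayOfWeekSchedule, on_delete=models.CASCADE)
    count = models.PositiveIntegerField(default=1)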

Wrap Up

I’m going to stop this session here but I want to add a few interesting points and tidbits I’ve discovered on the way:

  • If you make big changes to the model and you don’t yet have any significant data, you can wipe out the database easily and start over with the following steps:
$ rm db.sqlite3 tasks/migrations/ -rf
$ ./manage.py makemigrations tasks
$ ./manage.py migrate
$ ./manage.py createsuperuser
$ ./manage.py runserver
  • For models, it’s definitely worthwhile to add a __str__ (note: maybe __unicode__?) method and a Meta class to each one.  The __str__ method controls how the class is described, at least in the admin pages. The Meta class allows you to control the ordering when items of this model are listed. Cool!
  • I found (and forgot to note where) in the official docs an example of using a single char to store the day of the week while displaying the full day name. It looks like this:
    day_of_week_choices = (
        ('M', 'Monday'),
        ('T', 'Tuesday'),
        ('W', 'Wednesday'),
        ('R', 'Thursday'),
        ('F', 'Friday'),
        ('S', 'Saturday'),
        ('N', 'Sunday'),
    )
	...
    day_name = models.CharField(max_length=1, choices=day_of_week_choices)
  • NOTE that we’re going to have to tie the “name” fields in many of these models to the kid with which they’re associated. I’m considering whether the kid can be combined into the schedule, but I don’t think that’s quite right. Certainly changes are coming to that part of the design.

That’s it! The state of the code at the point I’m writing this can be found here:

git@github.com:jima80525/KidTasks.git
git checkout blog/02-Models-first-steps

Thanks for reading!

September 18, 2016 10:48 PM


François Dion

Something for your mind: Polymath Podcast launched

Some episodes will have more Art content, some will have more Business content, some will have more Science content, and some will be a nice blend of different things. But for sure, the show will live up to its name and provide you with “something for your mind”. It might raise more questions than it answers, and that is fine too.
Episode 000
Listen to Something for your mind on http://Artchiv.es
Francois Dion
@f_dion

September 18, 2016 09:23 PM


Weekly Python Chat

Tips for learning Django

Making mistakes is a great way to learn, but some mistakes are kind of painful to make. Special guest Melanie Crutchfield and I are going to chat about things you’ll wish you knew earlier when making your first website with Django.

September 18, 2016 05:00 PM


Krzysztof Żuraw

Python & WebDAV- part two

In the last post, I set up ownCloud with a WebDAV server. Now it’s time to use it.


I was searching for a good Python library to work with WebDAV for a long time.
I finally found it: easywebdav. It works
nicely, but the problem is that it doesn’t support Python 3. Let’s jump quickly
to my simple project for a CLI tool: the webdav editor.

I decided to create a CLI tool to work with a WebDAV server: the webdav editor. Right now
it supports only basic commands like login, listing the content of directories, and uploading
and downloading files.

I started by creating the file webdav_utility.py:

import pickle
from urlparse import urlparse

import easywebdav


class Client(object):

    def login(self, *args):
        argparse_namespace = args[0]
        url_components = urlparse(argparse_namespace.server)
        host, port = url_components.netloc.split(':')
        webdav_client = easywebdav.connect(
            host=host,
            port=port,
            path=url_components.path,
            username=argparse_namespace.user,
            password=argparse_namespace.password
        )
        pickle.dump(webdav_client, open('webdav_login', 'wb'))

    def list_content(self, *args):
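        # NOTE: webdav_client here is the session created in login(); restoring it
        # from the pickled login data is discussed later in the post.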
        argparse_namespace = args[0]
        print [i.name for i in webdav_client.ls(argparse_namespace.path)]

    def upload_file(self, *args):
        argparse_namespace = args[0]
        webdav_client.upload(
            argparse_namespace.from_path, argparse_namespace.to_path
        )

    def download_file(self, *args):
        argparse_namespace = args[0]
        webdav_client.download(
            argparse_namespace.from_path, argparse_namespace.to_path
        )

In the Client class, I write simple functions that are wrappers around the easywebdav
API. In login I parse the provided URL, in a form like localhost:8888/owncloud/remote.php/webdav,
to get the host, port and path that easywebdav.connect needs to establish a proper connection.

Another method worth mentioning is list_content, where I retrieve the names of files under a
directory on the WebDAV server. Every method takes a *args argument carrying the argparse_namespace,
which leads to another component of the application: the cli.py module:

import argparse

from webdav_utility import Client

client = Client()

parser = argparse.ArgumentParser(description='Simple command line utility for WebDAV')
subparsers = parser.add_subparsers(help='Commands')

login_parser = subparsers.add_parser('login', help='Authenticate with WebDAV')
login_parser.add_argument('-s', '--server', required=True)
login_parser.add_argument('-u', '--user', required=True)
login_parser.add_argument('-p', '--password', required=True)
login_parser.set_defaults(func=client.login)

ls_parser = subparsers.add_parser('ls', help='List content of directory under WebDAV')
ls_parser.add_argument('-p', '--path', required=True)
ls_parser.set_defaults(func=client.list_content)

upload_parser = subparsers.add_parser('upload', help='Upload files to WebDAV')
upload_parser.add_argument('-f', '--from', metavar='PATH')
upload_parser.add_argument('-t', '--to', metavar='PATH')
upload_parser.set_defaults(func=client.upload_file)

download_parser = subparsers.add_parser('download', help='Download files from WebDAV')
download_parser.add_argument('-f', '--from', metavar='PATH')
download_parser.add_argument('-t', '--to', metavar='PATH')
download_parser.set_defaults(func=client.download_file)

if __name__ == '__main__':
    args = parser.parse_args()
    args.func(args)

There I use argparse. I create the main parser
with four additional subparsers for login, ls, upload and download. Thanks to that
I have a different namespace for each of the previously mentioned subparsers.

The problem is that this
solution is not generic enough: after running my command with the login parameter I get
Namespace(server='localhost:8888', user='admin', password='admin'), while running the same command
with ls I receive Namespace(path='path_to_file'). To handle that I used set_defaults for
every subparser, telling argparse to invoke the function specified by the func keyword (which is different for every command).
Thanks to that I only need to call this code once:

if __name__ == '__main__':
    args = parser.parse_args()
    args.func(args)

That’s the reason I introduced argparse_namespace in Client.

OK, the tool now works nicely, but there is no place to store information about whether I am logged in or not. So
calling python cli.py login -s localhost -u admin -p admin works, but python cli.py ls -p / does not.
To overcome that I came up with the idea of pickling webdav_client like this:

class Client(object):

  def login(self, *args):
    # login user etc
    pickle.dump(webdav_client, open('webdav_login', 'wb'))

  def list_content(self, *args):
    webdav_client = pickle.load(open('webdav_login', 'rb'))
    # rest of the code

Then I can easily run:

$ python cli.py login --server example.org/owncloud/remote.php/webdav --user admin --password admin
$ python cli.py ls --path '/'
['/owncloud/remote.php/webdav/', '/owncloud/remote.php/webdav/Documents/', '/owncloud/remote.php/webdav/Photos/', '/owncloud/remote.php/webdav/ownCloud%20Manual.pdf']

In this series, I set up an ownCloud server and wrote a simple tool just to show the capabilities of WebDAV. I believe
that some work can still be done, especially on the webdav editor CLI: a better way to handle user auth than pickle,
and separating the Client class from its argparse dependencies. If you have additional comments or thoughts please
write a comment! Thank you for reading.

Other blog posts in this series:

Github repo for this blog post: link.

Special thanks to Kasia for being editor for this post. Thank you.

Cover image by kleuske under CC BY-SA 2.0.

September 18, 2016 08:00 AM


Michał Bultrowicz

Choosing a CI service for your open-source project

I host my code on GitHub, as probably many of you do.
The easiest way to have it automatically tested in a clean environment (which everyone should do)
is, of course, to use one of the hosted CI services integrated with GitHub.

September 18, 2016 12:00 AM

September 17, 2016


Philip Semanchuk

Thanks for PyData Carolinas

My PyData Pass

Thanks to all who made PyData Carolinas 2016 a success! I had conversations about eating well while on the road, conveyor belts, and a Fortran algorithm to calculate the interaction of charged particles. Great stuff!

My talk was on getting Python to talk to compiled languages; specifically C, Fortran, and C++.

Once the video is online I’ll update this post with a link.

September 17, 2016 10:02 PM


Abu Ashraf Masnun

Python: Using the `requests` module to download large files efficiently

If you use Python regularly, you might have come across the wonderful requests library. I use it almost every day to read URLs or make POST requests. In this post, we shall see how we can download a large file using the requests module with low memory consumption.

To Stream or Not to Stream

When downloading large files/data, we probably would prefer the streaming mode while making the get call. If we use the stream parameter and set it to True, the download will not immediately start. The file download will start when we try to access the content property or try to iterate over the content using iter_content / iter_lines.

If we set stream to False, all the content is downloaded immediately and put into memory. If the file size is large, this can soon cause issues with higher memory consumption. On the other hand, if we set stream to True, the content is not downloaded right away, but the headers are downloaded and the connection is kept open. We can then choose to proceed with downloading the file or simply cancel it.

But we must also remember that if we decide to stream the file, the connection will remain open and cannot go back to the connection pool. If we’re working with many large files, this might lead to some inefficiency. So we should carefully choose when to stream, and we should take proper care to close the connections and dispose of any unused resources in such scenarios.

Iterating The Content

By setting the stream parameter, we have delayed the download and avoided taking up large chunks of memory. The headers have been downloaded but the body of the file still awaits retrieval. We can now get the data by accessing the content property or choosing to iterate over the content. Accessing the content directly would read the entire response data to memory at once. That is a scenario we want to avoid when our target file is quite large.

So we are left with the choice to iterate over the content. We can use iter_content, where the content is read chunk by chunk, or iter_lines, where it is read line by line. Either way, the entire file will not be loaded into memory at once, keeping memory usage down.

Code Example

import requests

# url and target_path are assumed to be defined elsewhere
response = requests.get(url, stream=True)
handle = open(target_path, "wb")
for chunk in response.iter_content(chunk_size=512):
    if chunk:  # filter out keep-alive new chunks
        handle.write(chunk)

The code should be self-explanatory. We open the URL with stream set to True, and then open a file handle to the target_path (where we want to save our file). Then we iterate over the content, chunk by chunk, and write the data to the file.
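
If you want to make sure the connection is always released back to the pool, even when an error occurs mid-download, one option is to wrap the response in a context manager. Here is a minimal sketch (the download helper name is mine, not part of the requests API):

import contextlib

import requests


def download(url, target_path, chunk_size=512):
    # contextlib.closing() calls response.close() on exit, so the connection
    # is released even if writing to disk fails part-way through.
    with contextlib.closing(requests.get(url, stream=True)) as response:
        response.raise_for_status()
        with open(target_path, "wb") as handle:
            for chunk in response.iter_content(chunk_size=chunk_size):
                if chunk:  # filter out keep-alive chunks
                    handle.write(chunk)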

That’s it!

September 17, 2016 09:48 PM


Glyph Lefkowitz

Hitting The Wall

I’m an introvert.

I say that with a full-on appreciation of
just how awful
thinkpieces on “introverts” are.

However, I feel compelled to write about this today because of a certain type
of social pressure that a certain type of introvert faces. Specifically, I am
a high-energy introvert.

Cementing this piece’s place in the hallowed halls of just awful thinkpieces,
allow me to compare my mild cognitive fatigue with the plight of those
suffering from chronic illness and disability. There’s a social phenomenon
associated with many chronic illnesses,
“but you don’t LOOK sick”, where
well-meaning people will look at someone who is suffering, with no obvious
symptoms, and imply that they really ought to be able to “be normal”.

As a high-energy introvert, I frequently participate in social events. I go to
meet-ups and conferences and I engage in plenty of
public speaking. I am, in a sense,
comfortable extemporizing in front of large groups of strangers.

This all sounds like extroverted behavior, I know. But there’s a key
difference.

Let me posit two axes for personality type: on the X axis, “introvert” to
“extrovert”, and on the Y, “low energy” up to “high energy”.

The X axis describes what kinds of activities give you energy, and the Y axis
describes how large your energy reserves are for the other type.

Notice that I didn’t say which type of activity you enjoy.

Most people who would self-describe as “introverts” are in the
low-energy/introvert quadrant. They have a small amount of energy available
for social activities, which they need to frequently re-charge by doing
solitary activities. As a result of frequently running out of energy for
social activities, they don’t enjoy social activities.

Most people who would self-describe as “extroverts” are also on the
“low-energy” end of the spectrum. They have low levels of patience for
solitary activity, and need to re-charge by spending time with friends, going
to parties, etc, in order to have the mental fortitude to sit still for a while
and focus. Since they can endlessly get more energy from the company of
others, they tend to enjoy social activities quite a bit.

Therefore we have certain behaviors we expect to see from “introverts”. We
expect them to be shy, and quiet, and withdrawn. When someone who behaves this
way has to bail on a social engagement, this is expected. There’s a certain
affordance for it. If you spend a few hours with them, they may be initially
friendly but will visibly become uncomfortable and withdrawn.

This “energy” model of personality is of course an oversimplification – it’s my
personal belief that everyone needs some balance of privacy and socialization
and solitude and eventually overdoing one or the other will be bad for anyone –
but it’s a useful one.

As a high-energy introvert, my behavior often confuses people. I’ll show up
at a week’s worth of professional events, be the life of the party, go out to
dinner at all of them, and then disappear for a month. I’m not visibly shy –
quite the opposite, I’m a gregarious raconteur. In fact, I quite visibly
enjoy the company of friends. So, usually, when I try to explain that I am
quite introverted, this claim is met with (quite understandable) skepticism.

In fact, I am quite functionally what society expects of an “extrovert” – until
I hit the wall.


In endurance sports, one is said to
“hit the wall” at the point
where all the short-term energy reserves in one’s muscles are exhausted, and
there is a sudden, dramatic loss of energy. Regardless, many people enjoy
endurance sports; part of the challenge of them is properly managing your
energy.

This is true for me and social situations. I do enjoy social situations
quite a bit! But they are nevertheless quite taxing for me, and without
prolonged intermissions of solitude, eventually I get to the point where I can
no longer behave as a normal social creature without an excruciating level of
effort and anxiety.

Several years ago, I attended a prolonged social event where I hit the
wall, hard. The event itself was several hours too long for me, involved
meeting lots of strangers, and in the lead-up to it I hadn’t had a weekend to
myself for a few weeks due to work commitments and family stuff. Towards the
end I noticed I was developing a completely
flat affect, and had to
start very consciously performing even basic body language, like looking at
someone while they were talking or smiling. I’d never been so exhausted and
numb in my life; at the time I thought I was just stressed from work.

Afterwards though, I started having a lot of weird nightmares,
even during the daytime.
This concerned me, since I’d never had such a severe reaction to a social
situation, and I didn’t have good language to describe it. It was also a
little perplexing that what was effectively a nice party, the first half of
which had even been fun for me, would cause such a persistent negative reaction
after the fact. After some research, I eventually discovered that such
involuntary thoughts are
a hallmark of PTSD.

While I’ve managed to avoid this level of exhaustion before or since, this was
a real learning experience for me that the consequences of incorrectly managing
my level of social interaction can be quite severe.

I’d rather not do that again.


The reason I’m writing this, though, is not to avoid future anxiety. My
social energy reserves are quite large enough, and I now have enough
self-knowledge, that it is extremely unlikely I’d ever find myself in that
situation again.

The reason I’m writing is to help people understand that I’m not blowing them
off because I don’t like them
. Many times now, I’ve declined or bailed an
invitation from someone, and later heard that they felt hurt that I was
passive-aggressively refusing to be friendly.

I certainly understand this reaction. After all, if you see someone at a party
and they’re clearly having a great time and chatting with everyone, but then
when you invite them to do something, they say “sorry, too much social
stuff”, that seems like a pretty passive-aggressive way to respond.

You might even still be skeptical after reading this. “Glyph, if you were
really an introvert, surely, I would have seen you looking a little shy and
withdrawn. Surely I’d see some evidence of stage fright before your talks.”

But that’s exactly the problem here: no, you wouldn’t.

At a social event, since I have lots of energy to begin with, I’ll build up a
head of steam on burning said energy that no low-energy introvert would ever
risk. If I were to run out of social-interaction-juice, I’d be in the middle
of a big crowd telling a long and elaborate story when I find myself exhausted.
If I hit the wall in that situation, I can’t feel a little awkward and make
excuses and leave; I’ll be stuck creepily faking a smile like a sociopath and
frantically looking for a way out of the conversation for an hour, as the
pressure from a large crowd of people rapidly builds up months worth of
nightmare fuel from my spiraling energy deficit.

Given that I know that’s what’s going to happen, you won’t see me when I’m
close to that line. You won’t be there at my desk when I silently sit and type
for a whole day, or on my couch when I quietly read a book for ten hours at a
time. My solitary side is, by definition, hidden.

But, if I don’t show up to your party, I promise: it’s not you, it’s me.

September 17, 2016 09:18 PM


Podcast.__init__

Episode 75 – Sandstorm.io with Asheesh Laroia

Summary

Sandstorm.io is an innovative platform that aims to make self-hosting applications easier and more maintainable for the average individual. This week we spoke with Asheesh Laroia about why running your own services is desirable, how they have made security a first priority, how Sandstorm is architected, and what the installation process looks like.

Brief Introduction

  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • I would like to thank everyone who has donated to the show. Your contributions help us make the show sustainable. For details on how to support the show you can visit our site at pythonpodcast.com
  • Linode is sponsoring us this week. Check them out at linode.com/podcastinit and get a $20 credit to try out their fast and reliable Linux virtual servers for your next project
  • We are also sponsored by Rollbar. Rollbar is a service for tracking and aggregating your application errors so that you can find and fix the bugs in your application before your users notice they exist. Use the link rollbar.com/podcastinit to get 90 days and 300,000 errors for free on their bootstrap plan.
  • Hired has also returned as a sponsor this week. If you’re looking for a job as a developer or designer then Hired will bring the opportunities to you. Sign up at hired.com/podcastinit to double your signing bonus.
  • Visit our site to subscribe to our show, sign up for our newsletter, read the show notes, and get in touch.
  • To help other people find the show you can leave a review on iTunes, or Google Play Music, and tell your friends and co-workers
  • Join our community! Visit discourse.pythonpodcast.com for your opportunity to find out about upcoming guests, suggest questions, and propose show ideas.
  • I would also like to mention that the organizers of PyCon Zimbabwe are looking to the global Python community for help in supporting their event. If you would like to donate the link will be in the show notes.
  • Your hosts as usual are Tobias Macey and Chris Patti
  • Today we’re interviewing Asheesh Laroia about Sandstorm.io, a project that is trying to make self-hosted applications easy and secure for everyone.
Linode Sponsor Banner

Use the promo code podcastinit20 to get a $20 credit when you sign up!

Rollbar Logo

I’m excited to tell you about a new sponsor of the show, Rollbar.

One of the frustrating things about being a developer, is dealing with errors… (sigh)

  • Relying on users to report errors
  • Digging thru log files trying to debug issues
  • A million alerts flooding your inbox ruining your day…

With Rollbar’s full-stack error monitoring, you get the context, insights and control you need to find and fix bugs faster. It’s easy to get started tracking the errors and exceptions in your stack. You can start tracking production errors and deployments in 8 minutes or less, and Rollbar works with all major languages and frameworks, including Ruby, Python, JavaScript, PHP, Node, iOS, Android and more. You can integrate Rollbar into your existing workflow, such as sending error alerts to Slack or Hipchat, or automatically creating new issues in GitHub, JIRA, Pivotal Tracker, etc.

We have a special offer for Podcast.__init__ listeners. Go to rollbar.com/podcastinit, sign up, and get the Bootstrap Plan free for 90 days. That’s 300,000 errors tracked for free. Loved by developers at awesome companies like Heroku, Twilio, Kayak, Instacart, Zendesk, Twitch and more. Help support Podcast.__init__ and give Rollbar a try today. Go to rollbar.com/podcastinit

Hired Logo

On Hired, software engineers & designers can get 5+ interview requests in a week, and each offer has salary and equity upfront. With full-time and contract opportunities available, users can view the offers and accept or reject them before talking to any company. Work with over 2,500 companies, from startups to large public companies, hailing from 12 major tech hubs in North America and Europe. Hired is totally free for users, and if you get a job you’ll get a $2,000 “thank you” bonus. If you use our special link to sign up, then that bonus will double to $4,000 when you accept a job. If you’re not looking for a job but know someone who is, you can refer them to Hired and get a $1,337 bonus when they accept a job.

Interview with Asheesh Laroia

  • Introductions
  • How did you get introduced to Python? – Tobias
  • Can you start by telling everyone about the Sandstorm project and how you got involved with it? – Tobias
  • What are some of the reasons that an individual would want to self-host their own applications rather than using comparable services available through third parties? – Tobias
  • How does Sandstorm try to make the experience of hosting these various applications simple and enjoyable for the broadest variety of people? – Tobias
  • What does the system architecture for Sandstorm look like? – Tobias
  • I notice that Sandstorm requires a very recent Linux kernel version. What motivated that choice and how does it affect adoption? – Chris
  • One of the notable aspects of Sandstorm is the security model that it uses. Can you explain the capability-based authorization model and how it enables Sandstorm to ensure privacy for your users? – Tobias
  • What are some of the most difficult challenges facing you in terms of software architecture and design? – Tobias
  • What is involved in setting up your own server to run Sandstorm and what kinds of resources are required for different use cases? – Tobias
  • You have a number of different applications available for users to install. What is involved in making a project compatible with the Sandstorm runtime environment? Are there any limitations in terms of languages or application architecture for people who are targeting your platform? – Tobias
  • How much of Sandstorm is written in Python and what other languages does it use? – Tobias

Keep In Touch

Picks

  • Tobias
  • Chris
  • Asheesh

Links

The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

September 17, 2016 08:52 PM


BangPypers

Ansible Workshop – BangPypers September Meetup

The September BangPypers meetup happened at the Red Hat office on Bannerghatta Road. 31 people attended the event.

In the previous meetup Abraham presented a talk on Ansible. Many participants were interested in it, so we planned a workshop this time.

Abraham started the workshop with a brief explanation of VirtualBox, Vagrant, and Ansible. He helped participants set them up.

After that he explained simple Ansible modules like ping and shell, and how to run them on target machines.
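
For readers who weren’t there, ad-hoc runs of those modules look roughly like this (the inventory file name and host group are illustrative):

$ ansible webservers -i hosts -m ping
$ ansible webservers -i hosts -m shell -a "uptime"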

Later he explained Ansible playbooks and how to configure them.

We had a lunch break of about 30 minutes. After resuming from the break, he showed a demo of deploying a Django web app. Here he used 4 machines (1 load balancer and 3 web apps) and showed how to automatically configure and orchestrate them.

Then he showed how to update all the webservers with zero downtime.

Here are a few photos from the workshop.


Workshop content can be found on GitHub.

Thanks to Abraham for conducting the workshop and Red Hat for hosting the event.

September 17, 2016 06:26 PM


End Point

Executing Custom SQL in Django Migrations

Since version 1.7, Django has natively supported database migrations similar to Rails migrations. The biggest difference fundamentally between the two is the way the migrations are created: Rails migrations are written by hand, specifying changes you want made to the database, while Django migrations are usually automatically generated to mirror the database schema in its current state.

Usually, Django’s automatic schema detection works quite nicely, but occasionally you will have to write some custom migration that Django can’t properly generate, such as a functional index in PostgreSQL.

Creating an empty migration

To create a custom migration, it’s easiest to start by generating an empty migration. In this example, it’ll be for an application called blog:

$ ./manage.py makemigrations blog --empty -n create_custom_index
Migrations for 'blog':
  0002_create_custom_index.py:

This generates a file at blog/migrations/0002_create_custom_index.py that will look something like this:

# -*- coding: utf-8 -*-
# Generated by Django 1.9.4 on 2016-09-17 17:35
from __future__ import unicode_literals

from django.db import migrations


class Migration(migrations.Migration):

    dependencies = [
        ('blog', '0001_initial'),
    ]

    operations = [
    ]

Adding Custom SQL to a Migration

The best way to run custom SQL in a migration is through the migrations.RunSQL operation. RunSQL allows you to write code for migrating forwards and backwards—that is, applying migrations and unapplying them. In this example, the first string passed to RunSQL is the forward SQL, the second is the reverse SQL.

# -*- coding: utf-8 -*-
# Generated by Django 1.9.4 on 2016-09-17 17:35
from __future__ import unicode_literals

from django.db import migrations


class Migration(migrations.Migration):

    dependencies = [
        ('blog', '0001_initial'),
    ]

    operations = [
        migrations.RunSQL(
            "CREATE INDEX i_active_posts ON posts(id) WHERE active",
            "DROP INDEX i_active_posts"
        )
    ]

Unless you’re using Postgres for your database, you’ll need to install the sqlparse library, which allows Django to break the SQL strings into individual statements.

Running the Migrations

Running your migrations is easy:

$ ./manage.py migrate
Operations to perform:
  Apply all migrations: blog, sessions, auth, contenttypes, admin
Running migrations:
  Rendering model states... DONE
  Applying blog.0002_create_custom_index... OK

Unapplying migrations is also simple. Just provide the name of the app to migrate and the id of the migration you want to go to, or “zero” to reverse all migrations on that app:

$ ./manage.py migrate blog 0001
Operations to perform:
  Target specific migration: 0001_initial, from blog
Running migrations:
  Rendering model states... DONE
  Unapplying blog.0002_create_custom_index... OK

Hand-written migrations can be used for many other operations, including data migrations. Full documentation for migrations can be found in the Django documentation.
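For example, a data migration can be written with the migrations.RunPython operation. The sketch below is hypothetical (the Post model and its author field are illustrative, not part of the example app above); note that RunPython hands you the historical model state via apps.get_model, and a second callable (here RunPython.noop) serves as the reverse operation.

# -*- coding: utf-8 -*-
from __future__ import unicode_literals

from django.db import migrations


def set_default_author(apps, schema_editor):
    # Use the historical model state, not a direct import of the model.
    Post = apps.get_model('blog', 'Post')
    Post.objects.filter(author='').update(author='staff')


class Migration(migrations.Migration):

    dependencies = [
        ('blog', '0002_create_custom_index'),
    ]

    operations = [
        # The second argument is the reverse operation; noop lets the
        # migration be unapplied without undoing the data change.
        migrations.RunPython(set_default_author, migrations.RunPython.noop),
    ]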


(This post originally covered South migrations and was updated by Phin Jensen to illustrate the now-native Django migrations.)

September 17, 2016 03:20 PM


Anatoly Techtonik

Python Usability Bugs: subprocess.Popen executable

subprocess.Popen seems to be designed as a “swiss army knife” of managing external processes, and while that task is pretty hard to solve in a cross-platform way, the people who have contributed to it did manage to achieve it. But it still came with some drawbacks and complications. Let’s study one of them, the one I think is worst from a usability point of view, because it confuses people a lot.

I’ve got a simple program that prints its name and its own arguments (forgive me for the Windows code, as I was debugging the issue on Windows, but this works the same on Linux too). The program is written in Go to get a single executable, because subprocess has special handling for child Python processes (another usability bug for another time).

>argi.exe 1 2 3 4
prog: E:\argi.exe
args: [1 2 3 4]
Let’s execute it with subprocess.Popen, and for that I almost always look up the official documentation for the Popen prototype:

subprocess.Popen(args, bufsize=0, executable=None, stdin=None, stdout=None, stderr=None, preexec_fn=None, close_fds=False, shell=False, cwd=None, env=None, universal_newlines=False, startupinfo=None, creationflags=0)

Quite scary, right? But let’s skip the confusing part and quickly figure something out of it (because time is scarce). Looks like this should do the trick:

import subprocess

args = "1 2 3 4".split()
p = subprocess.Popen(args, executable="argi.exe")
p.communicate()

After saving this code to “subs.py” and running it, you’d probably expect something like this:

> python subs.py
prog: E:\argi.exe
args: [1 2 3 4]

And… you won’t get this. What you get is this:

> python subs.py
prog: 1
args: [2 3 4]

And that’s kind of crazy: not only was the executable renamed, but the first argument was lost, and it turns out this is actually documented behavior. So let’s define a Python Usability Bug as something that is documented but not expected (by most folks who are going to read the code). The trick to get the code to do what is expected is to never use the executable argument to subprocess.Popen:

import subprocess

args = "1 2 3 4".split()
args.insert(0, "argi.exe")
p = subprocess.Popen(args)
p.communicate()

>python suby.py
prog: argi.exe
args: [1 2 3 4]

The explanation for the former “misbehavior” is that executable is a hack that allows you to rename the program when running a subprocess. It should be named substitute, or, even better, altname, to work as an alternative name to pass to the child process (instead of providing an alternative executable for the former name). To make subprocess.Popen even more intuitive, the args argument should have been named command.
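To make that concrete, here is a sketch of what happens when you do pass executable (the output shown assumes the same argi.exe test program as above): args[0] only becomes the name the child process sees, while the binary actually launched is the one named by executable.

import subprocess

# args[0] is only the "display name" handed to the child process;
# the program actually launched comes from the executable argument.
args = ["some-name", "1", "2", "3", "4"]
p = subprocess.Popen(args, executable="argi.exe")
p.communicate()

# Expected output (same test program as above):
# prog: some-name
# args: [1 2 3 4]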

From a high level design point of view, the drawbacks of this function are that it *does way too much*, and its arguments are not always intuitive – it takes *a lot of time to grok the official docs*, and I need to reread them *every time*, because there are too many little but important details of Popen behavior (has anybody tried to create its state machine?), so over the last 5 years I have kept discovering various problems with it. Today I just wanted to save you some of the hours that I’ve wasted myself while debugging pymake on Windows.

That’s it for now. Bonus points to update this post with links when I get more time / mana for it:

  • [ ] people who have contributed to it
  • [ ] it came with drawbacks
  • [ ] have anybody tried to create its state machine?
  • [ ] subprocess has special handling for child Python processes

September 17, 2016 11:46 AM


Weekly Python StackOverflow Report

(xxxvii) stackoverflow python report

These are the ten most rated questions at Stack Overflow last week.
Between brackets: [question score / answers count]
Build date: 2016-09-17 10:31:28 GMT


  1. What’s the closest I can get to calling a Python function using a different Python version? – [13/4]
  2. How to traverse cyclic directed graphs with modified DFS algorithm – [11/2]
  3. Difference between generators and functions returning generators – [8/3]
  4. Why does python’s datetime.datetime.strptime(‘201412’, ‘%Y%m%d’) not raise a ValueError? – [8/3]
  5. Accessing the choices passed to argument in argparser? – [7/3]
  6. Where’s the logic that returns an instance of a subclass of OSError exception class? – [7/2]
  7. TypeError: str object is not an iterator – [6/7]
  8. condensing multiple if statements in python – [6/4]
  9. How to fillna() with value 0 after calling resample? – [6/3]
  10. How to read strange csv files in Pandas? – [6/3]

September 17, 2016 10:32 AM


Nick Coghlan

The Python Packaging Ecosystem

From Development to Deployment

There have been a few recent articles reflecting on the current status of
the Python packaging ecosystem from an end user perspective, so it seems
worthwhile for me to write up my perspective as one of the lead architects
for that ecosystem on how I characterise the overall problem space of software
publication and distribution, where I think we are at the moment, and where I’d
like to see us go in the future.

For context, the specific articles I’m replying to are:

These are all excellent pieces considering the problem space from different
perspectives, so if you’d like to learn more about the topics I cover here,
I highly recommend reading them.

Since it heavily influences the way I think about packaging system design in
general, it’s worth stating my core design philosophy explicitly:

  • As a software consumer, I should be able to consume libraries, frameworks,
    and applications in the binary format of my choice, regardless of whether
    or not the relevant software publishers directly publish in that format
  • As a software publisher working in the Python ecosystem, I should be able to
    publish my software once, in a single source-based format, and have it be
    automatically consumable in any binary format my users care to use

This is emphatically not the way many software packaging systems work – for a
great many systems, the publication format and the consumption format are
tightly coupled, and the folks managing the publication format or the
consumption format actively seek to use it as a lever of control over a
commercial market (think operating system vendor controlled application stores,
especially for mobile devices).

While we’re unlikely to ever pursue the specific design documented in the
rest of the PEP (hence the “Deferred” status), the
Development, Distribution, and Deployment of Python Software
section of PEP 426 provides additional details on how this philosophy applies
in practice.

I’ll also note that while I now work on software supply chain management
tooling at Red Hat, that wasn’t the case when I first started actively
participating in the upstream Python packaging ecosystem
design process. Back then I was working
on Red Hat’s main
hardware integration testing system, and
growing increasingly frustrated with the level of effort involved in
integrating new Python level dependencies into Beaker’s RPM based development
and deployment model. Getting actively involved in tackling these problems on
the Python upstream side of things then led to also getting more actively
involved in addressing them on the
Red Hat downstream side.

When talking about the design of software packaging ecosystems, it’s very easy
to fall into the trap of only considering the “direct to peer developers” use
case, where the software consumer we’re attempting to reach is another developer
working in the same problem domain that we are, using a similar set of
development tools. Common examples of this include:

  • Linux distro developers publishing software for use by other contributors to
    the same Linux distro ecosystem
  • Web service developers publishing software for use by other web service
    developers
  • Data scientists publishing software for use by other data scientists

In these more constrained contexts, you can frequently get away with using a
single toolchain for both publication and consumption:

  • Linux: just use the system package manager for the relevant distro
  • Web services: just use the Python Packaging Authority’s twine for publication
    and pip for consumption
  • Data science: just use conda for everything

For newer languages that start in one particular domain with a preferred
package manager and expand outwards from there, the apparent simplicity arising
from this homogeneity of use cases may frequently be attributed as an essential
property of the design of the package manager, but that perception of inherent
simplicity will typically fade if the language is able to successfully expand
beyond the original niche its default package manager was designed to handle.

In the case of Python, for example, distutils was designed as a consistent
build interface for Linux distro package management, setuptools for plugin
management in the Open Source Application Foundation’s Chandler project, pip
for dependency management in web service development, and conda for local
language-independent environment management in data science.
distutils and setuptools haven’t fared especially well from a usability
perspective when pushed beyond their original design parameters (hence the
current efforts to make it easier to use full-fledged build systems like
Scons and Meson as an alternative when publishing Python packages), while pip
and conda both seem to be doing a better job of accommodating increases in
their scope of application.

This history helps illustrate that where things really have the potential to
get complicated (even beyond the inherent challenges of domain-specific
software distribution) is when you start needing to cross domain boundaries.
For example, as the lead maintainer of contextlib in the Python
standard library, I’m also the maintainer of the contextlib2 backport
project on PyPI. That’s not a domain specific utility – folks may need it
regardless of whether they’re using a self-built Python runtime, a pre-built
Windows or Mac OS X binary they downloaded from python.org, a pre-built
binary from a Linux distribution, a CPython runtime from some other
redistributor (homebrew, pyenv, Enthought Canopy, ActiveState,
Continuum Analytics, AWS Lambda, Azure Machine Learning, etc), or perhaps even
a different Python runtime entirely (PyPy, PyPy.js, Jython, IronPython,
MicroPython, VOC, Batavia, etc).

Fortunately for me, I don’t need to worry about all that complexity in the
wider ecosystem when I’m specifically wearing my contextlib2 maintainer
hat – I just publish an sdist and a universal wheel file to PyPI, and the rest
of the ecosystem has everything it needs to take care of redistribution
and end user consumption without any further input from me.

However, contextlib2 is a pure Python project that only depends on the
standard library, so it’s pretty much the simplest possible case from a
tooling perspective (the only reason I needed to upgrade from distutils to
setuptools was so I could publish my own wheel files, and the only reason I
haven’t switched to using the much simpler pure-Python-only flit instead of
either of them is that that doesn’t yet easily support publishing backwards
compatible setup.py based sdists).

This means that things get significantly more complex once we start wanting to
use and depend on components written in languages other than Python, so that’s
the broader context I’ll consider next.

When it comes to handling the software distribution problem in general, there
are two main ways of approaching it:

  • design a plugin management system that doesn’t concern itself with the
    management of the application framework that runs the plugins
  • design a platform component manager that not only manages the plugins
    themselves, but also the application frameworks that run them

This “plugin manager or platform component manager?” question shows up over and
over again in software distribution architecture designs, but the case of most
relevance to Python developers is in the contrasting approaches that pip and
conda have adopted to handling the problem of external dependencies for Python
projects:

  • pip is a plugin manager for Python runtimes. Once you have a Python runtime
    (any Python runtime), pip can help you add pieces to it. However, by design,
    it won’t help you manage the underlying Python runtime (just as it wouldn’t
    make any sense to try to install Mozilla Firefox as a Firefox Add-On, or
    Google Chrome as a Chrome Extension)
  • conda, by contrast, is a component manager for a cross-platform platform
    that provides its own Python runtimes (as well as runtimes for other
    languages). This means that you can get pre-integrated components, rather
    than having to do your own integration between plugins obtained via pip and
    language runtimes obtained via other means

What this means is that pip, on its own, is not in any way a direct
alternative to conda. To get comparable capabilities to those offered by conda,
you have to add in a mechanism for obtaining the underlying language runtimes,
which means the alternatives are combinations like:

  • apt-get + pip
  • dnf + pip
  • yum + pip
  • pyenv + pip
  • homebrew (Mac OS X) + pip
  • python.org Windows installer + pip
  • Enthought Canopy
  • ActiveState’s Python runtime + PyPM

This is the main reason why “just use conda” is excellent advice to any
prospective Pythonista that isn’t already using one of the platform component
managers mentioned above: giving that answer replaces an otherwise operating
system dependent or Python specific answer to the runtime management problem
with a cross-platform and (at least somewhat) language neutral one.

It’s an especially good answer for Windows users, as chocolatey/OneGet/Windows
Package Management isn’t remotely comparable to pyenv or homebrew at this point
in time, other runtime managers don’t work on Windows, and getting folks
bootstrapped with MinGW, Cygwin or the new (still experimental) Windows
Subsystem for Linux is just another hurdle to place between them and whatever
goal they’re learning Python for in the first place.

However, conda’s pre-integration based approach to tackling the external
dependency problem is also why “just use conda for everything” isn’t a
sufficient answer for the Python software ecosystem as a whole.

If you’re working on an operating system component for Fedora, Debian, or any
other distro, you actually want to be using the system provided Python
runtime, and hence need to be able to readily convert your upstream Python
dependencies into policy compliant system dependencies.

Similarly, if you’re wanting to support folks that deploy to a preconfigured
Python environment in services like AWS Lambda, Azure Cloud Functions, Heroku,
OpenShift or Cloud Foundry, or that use alternative Python runtimes like PyPy
or MicroPython, then you need a publication technology that doesn’t tightly
couple your releases to a specific version of the underlying language runtime.

As a result, pip and conda end up existing at slightly different points in the
system integration pipeline:

  • Publishing and consuming Python software with pip is a matter of “bring your
    own Python runtime”. This has the benefit that you can readily bring your
    own runtime (and manage it using whichever tools make sense for your use
    case), but also has the downside that you must supply your own runtime
    (which can sometimes prove to be a significant barrier to entry for new
    Python users, as well as being a pain for cross-platform environment
    management).
  • Like Linux system package managers before it, conda takes away the
    requirement to supply your own Python runtime by providing one for you.
    This is great if you don’t have any particular preference as to which
    runtime you want to use, but if you do need to use a different runtime
    for some reason, you’re likely to end up fighting against the tooling, rather
    than having it help you. (If you’re tempted to answer “Just add another
    interpreter to the pre-integrated set!” here, keep in mind that doing so
    without the aid of a runtime independent plugin manager like pip acts as a
    multiplier on the platform level integration testing needed, which can be a
    significant cost even when it’s automated)

In case it isn’t already clear from the above, I’m largely happy with the
respective niches that pip and conda are carving out for themselves as a
plugin manager for Python runtimes and as a cross-platform platform focused
on (but not limited to) data analysis use cases.

However, there’s still plenty of scope to improve the effectiveness of the
collaboration between the upstream Python Packaging Authority and downstream
Python redistributors, as well as to reduce barriers to entry for participation
in the ecosystem in general, so I’ll go over some of the key areas I see for
potential improvement.

Sustainability and the bystander effect

It’s not a secret that the core PyPA infrastructure (PyPI, pip, twine,
setuptools) is
nowhere near as well-funded
as you might expect given its criticality to the operations of some truly
enormous organisations.

The biggest impact of this is that even when volunteers show up ready and
willing to work, there may not be anybody in a position to effectively wrangle
those volunteers, and help keep them collaborating effectively and moving in a
productive direction.

To secure long term sustainability for the core Python packaging infrastructure,
we’re only talking about amounts on the order of a few hundred thousand dollars a
year – enough to cover some dedicated operations and publisher support staff for
PyPI (freeing up the volunteers currently handling those tasks to help work on
ecosystem improvements), as well as to fund targeted development directed at
some of the other problems described below.

However, rather than being a true
“tragedy of the commons”,
I personally chalk this situation up to a different human cognitive bias: the
bystander effect.

The reason I think that is that we have so many potential sources of the
necessary funding that even folks that agree there’s a problem that needs to be
solved are assuming that someone else will take care of it, without actually
checking whether or not that assumption is entirely valid.

The primary responsibility for correcting that oversight falls squarely on the
Python Software Foundation, which is why the Packaging Working Group was
formed in order to investigate possible sources of additional funding, as well
as to determine how any such funding can be spent most effectively.

However, a secondary responsibility also falls on customers and staff of
commercial Python redistributors, as this is exactly the kind of ecosystem
level risk that commercial redistributors are being paid to manage on behalf of
their customers, and they’re currently not handling this particular situation
very well. Accordingly, anyone that’s actually paying for CPython, pip, and
related tools (either directly or as a component of a larger offering), and
expecting them to be supported properly as a result, really needs to be asking
some very pointed questions of their suppliers right about now. (Here’s a sample
question: “We pay you X dollars a year, and the upstream Python ecosystem is
one of the things we expect you to support with that revenue. How much of what
we pay you goes towards maintenance of the upstream Python packaging
infrastructure that we rely on every day?”).

One key point to note about the current situation is that as a 501(c)(3) public
interest charity, any work the PSF funds will be directed towards better
fulfilling that public interest mission, and that means focusing primarily on
the needs of educators and non-profit organisations, rather than those of
private for-profit entities.

Commercial redistributors are thus far better positioned to properly
represent their customers’ interests in areas where their priorities may
diverge from those of the wider community (closing the “insider threat”
loophole in PyPI’s current security model is a particular case that comes to
mind – see Making PyPI security independent of SSL/TLS).

Migrating PyPI to pypi.org

An instance of the new PyPI implementation (Warehouse) is up and running at
https://pypi.org/ and connected directly to the
production PyPI database, so folks can already explicitly opt-in to using it
over the legacy implementation if they prefer to do so.

However, there’s still a non-trivial amount of design, development and QA work
needed on the new version before all existing traffic can be transparently
switched over to using it.

Getting at least this step appropriately funded and a clear project management
plan in place is the main current focus of the PSF’s Packaging Working Group.

Making the presence of a compiler on end user systems optional

Between the wheel format and the manylinux1 usefully-distro-independent
ABI definition, this is largely handled now, with conda available as an
option to handle the relatively small number of cases that are still a problem
for pip.

The main unsolved problem is to allow projects to properly express the
constraints they place on target environments so that issues can be detected
at install time or repackaging time, rather than only being detected as
runtime failures. Such a feature will also greatly expand the ability to
correctly generate platform level dependencies when converting Python
projects to downstream package formats like those used by conda and Linux
system package managers.

Making PyPI security independent of SSL/TLS

PyPI currently relies entirely on SSL/TLS to protect the integrity of the link
between software publishers and PyPI, and between PyPI and software consumers.
The only protections against insider threats from within the PyPI
administration team are ad hoc usage of GPG artifact signing by some projects,
personal vetting of new team members by existing team members and 3rd party
checks against previously published artifact hashes unexpectedly changing.

A credible design for end-to-end package signing that adequately accounts for
the significant usability issues that can arise around publisher and consumer
key management has been available for almost 3 years at this point (see
Surviving a Compromise of PyPI
and
Surviving a Compromise of PyPI: the Maximum Security Edition).

However, implementing that solution has been gated not only on being able to
first retire the legacy infrastructure, but also on the PyPI administrators being
able to credibly commit to the key management obligations of operating the
signing system, as well as to ensuring that the system-as-implemented actually
provides the security guarantees of the system-as-designed.

Accordingly, this isn’t a project that can realistically be pursued until the
underlying sustainability problems have been suitably addressed.

Automating wheel creation

While redistributors will generally take care of converting upstream Python
packages into their own preferred formats, the Python-specific wheel format
is currently a case where it is left up to publishers to decide whether or
not to create them, and if they do decide to create them, how to automate that
process.

Having PyPI take care of this process automatically is an obviously desirable
feature, but it’s also an incredibly expensive one to build and operate.

Thus, it currently makes sense to defer this cost to individual projects, as
there are quite a few commercial continuous integration and continuous
deployment service providers willing to offer free accounts to open source
projects, and these can also be used for the task of producing release
artifacts. Projects also remain free to only publish source artifacts, relying
on pip’s implicit wheel creation and caching and the appropriate use of
private PyPI mirrors and caches to meet the needs of end users.

For downstream platform communities already offering shared build
infrastructure to their members (such as Linux distributions and conda-forge),
it may make sense to offer Python wheel generation as a supported output option
for cross-platform development use cases, in addition to the platform’s native
binary packaging format.

September 17, 2016 03:46 AM

September 16, 2016


pythonwise

Simple Object Pools

Sometimes we need object pools to limit the number of resources consumed. The most common example is database connections.

In Go we sometimes use a buffered channel as a simple object pool.

In Python, we can do something similar with a Queue. Python’s context managers make the resource handling automatic, so clients don’t need to remember to return the object.
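A minimal sketch of such a pool in Python (assuming a fixed set of pre-created resources; the names are illustrative, not taken from the original programs):

import threading
from contextlib import contextmanager

try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2


class ResourcePool(object):
    def __init__(self, resources):
        # A thread-safe queue pre-filled with the shared resources.
        self._queue = queue.Queue()
        for resource in resources:
            self._queue.put(resource)

    @contextmanager
    def acquire(self):
        # Block until a resource is free; always put it back afterwards.
        resource = self._queue.get()
        try:
            yield resource
        finally:
            self._queue.put(resource)


def worker(pool, worker_id):
    with pool.acquire() as resource:
        print("worker %d got resource %d" % (worker_id, resource))


if __name__ == "__main__":
    pool = ResourcePool([1, 2, 3])
    threads = [threading.Thread(target=worker, args=(pool, i)) for i in range(10)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()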

Here’s the output of both programs:


$ go run pool.go
worker 7 got resource 0
worker 0 got resource 2
worker 3 got resource 1
worker 8 got resource 2
worker 1 got resource 0
worker 9 got resource 1
worker 5 got resource 1
worker 4 got resource 0
worker 2 got resource 2
worker 6 got resource 1

$ python pool.py
worker 5 got resource 1
worker 8 got resource 2
worker 1 got resource 3
worker 4 got resource 1
worker 0 got resource 2
worker 7 got resource 3
worker 6 got resource 1
worker 3 got resource 2
worker 9 got resource 3
worker 2 got resource 1

September 16, 2016 04:49 PM


Enthought

Canopy Data Import Tool: New Updates

In May of 2016 we released the Canopy Data Import Tool, a significant new feature of our Canopy graphical analysis environment software. With the Data Import Tool, users can now quickly and easily import CSVs and other structured text files into Pandas DataFrames through a graphical interface, manipulate the data, and create reusable Python scripts to speed future data wrangling.

Watch a 2-minute demo video to see how the Canopy Data Import Tool works:

With the latest version of the Data Import Tool released this month (v. 1.0.4), we’ve added new capabilities and enhancements, including:

  1. The ability to select and import a specific table from among multiple tables on a webpage,
  2. Intelligent alerts regarding the saved state of exported Python code, and
  3. Unlimited file sizes supported for import.

Download Canopy and start a free 7 day trial of the data import tool

New: Choosing from multiple tables on a webpage

Example of page with multiple tables for selection

The latest release of the Canopy Data Import Tool supports the selection of a specific table from a webpage for import, such as this Wikipedia page

In addition to CSVs and structured text files, the Canopy Data Import Tool (the Tool) provides the ability to load tables from a webpage. If the webpage contains multiple tables, by default the Tool loads the first table.

With this release, we provide the user with the ability to choose from multiple tables to import using a scrollable index parameter to select the table of interest for import.

Example: loading and working with tables from a Wikipedia page

For example, let’s try to load a table from the Demography of the UK wiki page using the Tool. In total, there are 10 tables on that wiki page.

  • As you can see in the screenshot below, the Tool initially loads the first table on the wiki page.
  • However, we are interested in loading the table ‘Vital statistics since 1960’, which is the fifth table on the page. (Note that indexing starts at 0. For a quick history lesson on why Python uses zero-based indexing, see Guido van Rossum’s explanation here.)
  • After the initial read-in, we can click on the ‘Table index on page’ scroll bar, choose ‘4’ and click on ‘Refresh Data’ to load the table of interest in the Data Import Tool.

See how the Canopy Data Import Tool loads a table from a webpage and prepares the data for manipulation and interaction:

The Data Import Tool allows you to select a specific table from a webpage where multiple are present, with a simple drop-down menu. Once you’ve selected your table, you can readily toggle between 3 views: the Pandas DataFrame generated by the Tool, the raw data, and the corresponding auto-generated Python code. You can then export the DataFrame to the IPython console for plotting and further analysis.

  • Further, as you can see, the first row contains column names and the first column looks like an index for the DataFrame. Therefore, you can select the ‘First row is column names’ checkbox and again click on ‘Refresh Data’ to prompt the Tool to re-read the table but, this time, use the data in the first row as column names. Then, we can right-click on the first column and select the ‘Set as Index’ option to make column 0 the index of the DataFrame.
  • You can toggle between the DataFrame, Raw Data and Python Code tabs in the Tool, to peek at the raw data being loaded by the Tool and the corresponding Python code auto-generated by the Tool (a rough pandas equivalent is sketched after this list).
  • Finally, you can click on the ‘Use DataFrame’ button, in the bottom right, to send the DataFrame to the IPython kernel in the Canopy User Environment, for plotting and further analysis.
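For comparison, the steps above map roughly onto the following pandas calls. This is a hypothetical sketch: the Wikipedia URL is assumed, and the code the Tool actually generates will differ.

import pandas as pd

# Parse every table on the page; keep table index 4 and use its first
# row as the column names ('First row is column names').
url = "https://en.wikipedia.org/wiki/Demography_of_the_United_Kingdom"
df = pd.read_html(url, header=0)[4]

# 'Set as Index' on column 0.
df = df.set_index(df.columns[0])
print(df.head())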

New: Keeping track of exported Python scripts

The Tool generates Python commands for all operations performed by the user and provides the user with the ability to save the generated Python script. With this new update, the Tool keeps track of the saved and current states of the generated Python script and intelligently alerts the user if he/she clicks on the ‘Use DataFrame’ button without saving changes in the Python script.

New: Unlimited file sizes supported for import

In the initial release, we chose to limit the file sizes that can be imported using the Tool to 70 MB, to ensure optimal performance. With this release, we removed that restriction and allow files of any size to be uploaded with the tool. For files over 70 MB we now provide the user with a warning that interaction, manipulation and operations on the imported Data Frame might be slower than normal, and allow them to select whether to continue or begin with a smaller subset of data to develop a script to be applied to the larger data set.

Additions and Fixes

Along with the feature additions discussed above, based on continued user feedback, we implemented a number of UI/UX improvements and bug fixes in this release. For a complete list of changes introduced in version 1.0.4 of the Data Import Tool, please refer to the Release Notes page in the Tool’s documentation. If you have any feedback regarding the Data Import Tool, we’d love to hear from you at canopy.support@enthought.com.

Additional resources:

Download Canopy and start a free 7 day trial of the data import tool

See the Webinar “Fast Forward Through Data Analysis Dirty Work” for examples of how the Canopy Data Import Tool accelerates data munging:

September 16, 2016 04:47 PM


CubicWeb

Monitor all the things! … and early too!

Following the “release often, release early” mantra, I thought it
might be a good idea to apply it to monitoring on one of our client
projects. So right from the demo stage where we deliver a new version
every few weeks (and sometimes every few days), we setup some
monitoring.

https://www.cubicweb.org/file/15338085/raw/66511658.jpg

Monitoring performance

The project is an application built with the CubicWeb platform, with
some ElasticSearch for indexing and searching. As with any complex
stack, there are a great number of places where one could monitor
performance metrics.

https://www.cubicweb.org/file/15338628/raw/Screenshot_2016-09-16_12-19-21.png

Here are a few things we have decided to monitor, and with what tools.

Monitoring CubicWeb

To monitor our running Python code, we have decided to use statsd, since it is already built into
CubicWeb’s core. Out of the box, you can configure a
statsd server address in your all-in-one.conf configuration. That will
send out some timing statistics about some core functions.

The statsd server (there are numerous implementations; we use a simple
one: python-pystatsd) gets the raw metrics and outputs
them to carbon which
stores the time series data in whisper files (which can be
swapped out for a different technology if need be).

https://www.cubicweb.org/file/15338392/raw/Screenshot_2016-09-16_11-56-44.png

If we are curious about a particular function or view that might be
taking too long to generate or slow down the user experience, we can
just add the @statsd_timeit
decorator there. Done. It’s monitored.

statsd monitoring is a fire-and-forget UDP type of monitoring, so it
should not have any impact on the performance of what you are
monitoring.
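To make the “fire-and-forget” point concrete, here is a minimal sketch of what a statsd timing metric looks like on the wire. This illustrates the protocol only, not CubicWeb’s @statsd_timeit implementation; the server address is assumed to be the statsd default.

import socket
import time
from contextlib import contextmanager


@contextmanager
def statsd_timer(metric, addr=("127.0.0.1", 8125)):
    # A timing metric is a single UDP datagram ("name:value|ms"), so a lost
    # packet or an unreachable server never slows down the code being timed.
    start = time.time()
    try:
        yield
    finally:
        elapsed_ms = int((time.time() - start) * 1000)
        payload = "%s:%d|ms" % (metric, elapsed_ms)
        try:
            sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
            sock.sendto(payload.encode("ascii"), addr)
            sock.close()
        except socket.error:
            pass  # fire and forget


# usage: time an arbitrary block of code
with statsd_timer("myapp.someview.render"):
    time.sleep(0.1)  # stand-in for the code being measured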

Monitoring Apache

Simply enough we re-use the statsd approach by plugging in an apache
module
to time the HTTP responses sent back by apache. With nginx
and varnish, this is also really easy.

https://www.cubicweb.org/file/15338407/raw/Screenshot_2016-09-16_11-56-54.png

One of the nice things about this part is that we can then get
graphs of errors since we will differentiate OK 200 type codes from
500 type codes (HTTP codes).

Monitoring ElasticSearch

ElasticSearch comes with some metrics in the GET /_stats endpoint; the
same goes for individual nodes, individual indices and even at cluster
level. Some popular tools can be installed through the ElasticSearch
plugin system or with Kibana (plugin system there too).

We decided on a different approach that fitted well with our other
tools (and demonstrates their flexibility!) : pull stats out of
ElasticSearch with SaltStack,
push them to Carbon, pull them out with Graphite and display
them in Grafana (next to our other metrics).

https://www.cubicweb.org/file/15338399/raw/Screenshot_2016-09-16_11-56-34.png

On the SaltStack side, we wrote a two line execution module (elasticsearch.py)

import requests
def stats():
    return requests.get('http://localhost:9200/_stats').json()

This gets shipped using the custom execution modules mechanism
(_modules and saltutil.sync_modules), and is executed every minute
(or less) in the salt scheduler. The
resulting dictionary is fed to the carbon returner that is configured
to talk to a carbon server somewhere nearby.

# salt demohost elasticsearch.stats
[snip]
  { "indextime_inmillis" : 30,
[snip]

Monitoring web metrics

To evaluate parts of the performance of a web page we can look at some
metrics such as the number of assets the browser will need to
download, the size of the assets (js, css, images, etc) and even
things such as the number of subdomains used to deliver assets. You
can take a look at such metrics in most developer tools available in
the browser, but we want to graph this over time. A nice tool for this
is sitespeed.io (written in javascript
with phantomjs). Out of the box, it has a graphite outputter so
we just have to add --graphiteHost FQDN. sitespeed.io even
recommends using grafana to visualize the
results and publishes some example dashboards that can be adapted to
your needs.

https://www.cubicweb.org/file/15338109/raw/sitespeed-logo-2c.png

The sitespeed.io command is configured and run by salt using pillars
and its scheduler.

We will have to take a look at using their jenkins plugin with our
jenkins continuous integration instance.

Monitoring crashes / errors / bugs

Applications will have bugs (in particular when released often to get
a client to validate some design choices early). Level 0 is having
your client calling you up saying the application has crashed. The
next level is watching some log somewhere to see those errors pop
up. The next level is centralised logs on which you can monitor the
numerous pieces of your application (rsyslog over UDP helps here,
graylog might be a good solution for
visualisation).

https://www.cubicweb.org/file/15338139/raw/Screenshot_2016-09-16_11-30-53.png

When it starts getting useful and usable is when your bugs get
reported with some rich context. That’s where sentry comes in. It’s free software developed on github (although the website does not
really show that) and it is written in python, so it was a good match
for our culture. And it is pretty awesome too.

We plug sentry into our WSGI pipeline (thanks to cubicweb-pyramid) by installing
and configuring the sentry cube: cubicweb-sentry. This will catch
rich context bugs and provide us with vital information about what the user
was doing when the crash occurred.

This also helps sharing bug information within a team.

The sentry cube reports on errors being raised when using the web
application, but can also catch some errors when running some
maintenance or import commands (ccplugins in CubicWeb). In this
particular case, a lot of importing is being done and Sentry can
detect and help us triage the import errors with context on which
files are failing.

Monitoring usage / client side

This part is a bit neglected for the moment. Client side we can use
Javascript to monitor usage. Some basic metrics can come from piwik which is usually used for audience
statistics. To get more precise statistics we’ve been told Boomerang has an interesting
approach, enabling a closer look at how fast a page was displayed
client side, how much time was spent on DNS, etc.

On the client side, we’re also looking at two features of the Sentry
project : the raven-js
client which reports Javascript errors directly from the browser to
the Sentry server, and the user feedback form which captures some
context when something goes wrong or a user/client wants to report
that something should be changed on a given page.

Load testing – coverage

To wrap up, we also often generate traffic to catch some bugs and
performance metrics automatically:

  • wget --mirror $URL
  • linkchecker $URL
  • for search_term in `cat corpus`; do wget $URL/$search_term ; done
  • wapiti $URL --scope page
  • nikto $URL

Then watch the graphs and the errors in Sentry… Fix them. Restart.

Graphing it in Grafana

We’ve spent little time on the dashboard so far, since we’re concentrating on collecting the metrics for now. But here is a glimpse of the “work in progress” dashboard, which combines various data sources and various metrics on the same screen and the same time scale.

https://www.cubicweb.org/file/15338648/raw/Screenshot_2016-09-13_09-41-45.png

Further plans

  • internal health checks, we’re taking a look at python-hospital and healthz: Stop
    reverse engineering applications and start monitoring from the
    inside (Monitorama)
    (the idea is to
    distinguish between “the app is running” and “the app is serving its
    purpose”), and pyramid_health
  • graph the number of Sentry errors and the number of types of errors:
    the sentry API should be able to give us this information. Feed it to
    Salt and Carbon.
  • set up some alerting: next versions of Grafana will be doing that, or with elastalert
  • set up “release version X” events in Graphite that are displayed in
    Grafana, maybe with some manual command or a postcreate command when
    using docker-compose up?
  • make it easier for devs to have this kind of setup. Using this suite
    of tools in development might sometimes be overkill, but can be
    useful.

September 16, 2016 11:34 AM

