etcdobj: A Minimal etcd Object Mapper for Python

I didn’t have a lot on my agenda Friday. I wanted to review and return emails, do some reading, get some minor hacking on etcdobj done (more on that…), eat more calories than normal in an attempt to screw with my metabolism (nailed it!), catch up with a few coworkers, play some video games, and, apparently, accidentally order an air purifier from Amazon. I succeeded at all of it. But on to this etcdobj thing…

While working on Commissaire I started to feel a bit dirty over storing json documents in keys. It’s not uncommon, but it felt like it would be so much better if a document was broken into three layers:

  • Python: Classes/Objects
  • Transport: For saving/retrieving objects
  • etcd: A single or series of keys

By splitting what is normally one json document into a series of keys, two clients changing different parts of an object won’t collide, and neither is forced to fail, fetch, update, and then try saving again. I searched the Internet for a library that would provide this and came up wanting. It seems most people stick with either simple keys/values or shoving json into a key.

etcdobj is truly minimal. Partly because it’s new, partly because being small should make it easier to build upon or even bundle (it’s got a very permissive license), and partly because I’ve never written an ORM-like library before and don’t want to build too much on what could be a shaky foundation. That’s why I’m hoping this post will encourage some more eyes and help with the code.

Current Example

To create a representation of data a class must subclass EtcdObj and follow a few rules.

  1. __name__ must be provided as it will be the parent in the key path.
  2. Fields are class level variables and must be set to an instance that subclasses etcdobj.Field.
  3. The name of a field is the next layer in the key path and does not need to match the name of the class level variable.

from etcdobj import EtcdObj, fields

class Example(EtcdObj):
    __name__ = 'example' # The parent key
    # Fields all take a name that will be used as their key
    anint = fields.IntField('anint')
    astr = fields.StrField('astr')
    adict = fields.DictField('adict')

Creating a new object and saving it to etcd is pretty easy.

server = Server()

ex = Example(anint=1, astr="hello", adict={"hi": "there"})
ex.anint = 100  # update the value of anint
server.save(ex)
# Would save like so:
# /example/anint = "100"
# /example/astr = "hello"
# /example/adict/hi = "there"

As is retrieving the data.

new_ex = server.read(Example())
# new_ex.anint = 100
# new_ex.astr = "hello"
# new_ex.adict = {"hi": "there"}
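
As a follow-on sketch (assuming the Server and field behavior shown above), updating a nested value maps back onto the single key it came from, which is what makes the per-key layout friendlier to concurrent edits than one big json blob:

new_ex.adict['hi'] = 'again'  # only this value changes
server.save(new_ex)
# In the key layout above this value lives at its own key:
# /example/adict/hi = "again"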

Ideas

Some ideas for the future include:

  • Object watching (if data changes on the server it changes in the local instance)
  • Object to json structure
  • Deep DictField value casting/validation
  • Library level logging

Lend a Hand

The code base is currently around 416 lines, including documentation and the license header. If etcdobj sounds like something you’d use, come take a look and help make it something better than I can produce all by my lonesome.

From Gevent to CherryPy

I’ve been working on a project for the last few months on GitHub called Commissaire along with some other smart folks. Without getting too deep into what the software is supposed to do, just know it’s a REST service which needs to handle some asynchronous tasks. When prototyping the web service I started utilizing gevent for its WSGI server and coroutines but, as it turns out, it didn’t end up being the best fit. This is not a post about gevent sucking, because it doesn’t suck. gevent is pretty awesome, but it’s not for every use case.

The Problem

One of the asynchronous tasks we do in Commissaire utilizes Ansible. We use the Ansible python API to handle part of bootstrapping a new host. Under the covers Ansible uses the multiprocessing module when executing its work. Specifically, this occurs when the TaskQueueManager starts its run. Under normal circumstances this is no problem, but when gevent is in use its monkey patching ends up causing some problems. As has been noted in other write-ups, using monkey.patch_all(thread=False, socket=False) can be a solution. What this ends up doing is patching everything except thread and socket. But even this wasn’t enough for us to get past the problems we were facing between multiprocessing, gevent, and Ansible. The closest we came was to also disable patching of os, subprocess, and a few other things, making most of gevent’s great features unavailable. At this point it seemed pretty obvious gevent was not going to be a good fit.
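
For illustration, a minimal sketch of that selective patching looks like this; which flags a given workload can actually tolerate is something you only find out by testing:

from gevent import monkey

# Patch everything except thread and socket so that code relying on the
# real threading and socket modules (such as multiprocessing workers)
# keeps its normal behavior.
monkey.patch_all(thread=False, socket=False)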

Looking Elsewhere

There is no lack of options when looking for a Python web application server. Here are the requirements I figured we would need:

Requirements

  • Importable as a library
  • Supports WSGI
  • Supports TLS
  • Active user base
  • Active development
  • Does not require a reverse proxy
  • Does not require greenlets
  • Supports Python 2 and 3

Based on the name of this post you already know we chose CherryPy. It hit all the requirements and came with a few added benefits. The plugin system, which allows calls to be published over an internal bus, lets us decouple our data-saving internals (though it couples us to CherryPy, as it is doing the abstraction). The server is also already available in many Linux distributions at new enough versions. That’s a big boon when hoping to have software easily installed via traditional means.
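
As a rough sketch of that decoupling (not Commissaire’s actual code; the channel name and plugin below are made up for illustration), a plugin can subscribe to a custom channel on the CherryPy bus and request handlers publish to that channel instead of calling the storage code directly:

import cherrypy
from cherrypy.process import plugins


class StoragePlugin(plugins.SimplePlugin):
    """Listens on a custom bus channel and performs the actual save."""

    def start(self):
        self.bus.subscribe('store-save', self.save)

    def stop(self):
        self.bus.unsubscribe('store-save', self.save)

    def save(self, key, value):
        # Real persistence logic would go here.
        self.bus.log('Saving {0}={1}'.format(key, value))
        return True


StoragePlugin(cherrypy.engine).subscribe()

# Inside a request handler, publish over the bus instead of importing
# the storage code directly:
# cherrypy.engine.publish('store-save', 'example-key', {'hi': 'there'})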

The runner-up was Waitress. Unlike CherryPy, which assumes you are developing within the CherryPy web framework, Waitress assumes WSGI. Unfortunately, Waitress requires a reverse proxy for TLS. If it had first-class support for TLS we would probably have picked it.

Going back to a more traditional threading server is definitely not as sexy as utilizing greenlets/coroutines, but it has provided consistent results when paired with a multiprocessing worker process, and that is what matters.

Porting Time

Porting to a different library can be an annoying task and can feel like busy work. It can be even worse when you liked the library you were using in the first place, as I did (and still do!) with gevent.

Initial porting of the main functionality from gevent to CherryPy took roughly four hours. After that it took about another six hours to iron out some rough edges and update the unit tests. Really, the unit test updates ended up being more work, in terms of time, than the actual functionality. A lot of that was our fault in how we use mock, but I digress. That’s really not much time!

So What

So far I’m happy with the results. The application functionality works as expected, the request/response speeds are more than acceptable, and CherryPy as a server has been fun to work with. Assuming no crazy corner cases crop up, I don’t see us moving off CherryPy anytime soon.

Flask-Track-Usage 1.1.0 Released

A few years ago the initial Flask-Track-Usage release was announced via my blog. At the time I thought I’d probably be the only user. I’m glad to say I was wrong! Today I’m happy to announce the release of Flask-Track-Usage 1.1.0, which sports a number of enhancements and bug fixes.

Unfortunately, some changes are not backwards compatible. However, I believe the backwards incompatible changes make the overall experience better. If you would like to stick with the previous version of Flask-Track-Usage, make sure to pin the version in your requirements file/section:

flask_track_usage==1.0.1

Version 1.1.0 brings changes requested by the community as well as a few bug fixes. These include:

  • Addition of the X-Forwarded-For header as xforwardedfor in storage. Requested by jamylak.
  • Configurable GeoIP endpoint support. Requested by jamylak.
  • Migration from pymongo.Connection to pymongo.MongoClient.
  • Better SQLStorage metadata handling. Requested by gouthambs.
  • SQLStorage implementation redesign. Requested and implemented by gouthambs.
  • Updated documentation for 1.1.0.
  • Better unit testing.

I’d like to thank Gouthaman Balaraman who has been a huge help authoring the SQLStorage based on the SQLAlchemy ORM and providing feedback and support on Flask-Track-Usage design.

As always, please report bugs and feature requests on the GitHub Issues Page.

Why I Chose NewsBlur

Not all that long ago Google Reader closed its doors, pushing millions of users off the platform. Many users were frustrated to lose their long-time place to get their news, not all that different from someone in yesteryear losing their favorite newspaper. The whole thing was far from ideal, but it did teach users that you can’t expect cloud services to last forever (which is a good wake-up call). But in the fall of Google Reader came many possible replacements, each adding its own spin on how one reads news. Feedly, The Old Reader, and NetVibes were a few of the popular replacements. But I settled on NewsBlur and eventually became a paid user.

NewsBlur is mainly written by Samuel Clay (more on why I say mainly later). He seems like a friendly, hard-working fellow. He responds to bug reports and is active in his product’s community. While this may seem like common sense, just take a few minutes to look at random SaaS products on the Internet. You’ll find many of the developers are hidden behind customer service groups which, at worst, are outsourced and are more of a dead end than a way to get things fixed. Long story short, it seems like Samuel really cares about his product.

It is possible to have a free account on NewsBlur. While you are limited to a specific number of feeds, many people will find the limit is higher than the feed counts they had in Google Reader. At the time of writing the limit is 64 sites.

There are some social features provided by NewsBlur, yet these features are not required nor forced into the general workflow. For instance, there is the concept of the BlurBlog, which looks like it could be fun. But I tend to read the news and share elsewhere. If I ever decide to use the BlurBlog functionality it’s there. Otherwise I can just use NewsBlur as a fantastic reader.

NewsBlur is Open Source under the MIT license (also known as the Expat License). This gives me peace of mind knowing that if Samuel ever decided he was done with NewsBlur I could export my feeds, set up my own instance, and continue using the product on my own infrastructure. Yeah, it’s not trivial, but it’s possible, which is a huge advantage given that the last reader I used shut down.

No software is without its bugs, but Samuel does a good job of squashing them. And if you are a developer who wants to give a hand, you can patch the issue yourself and submit the fix (another win for Open Source). At the time of writing there are 43 development contributors to NewsBlur. This is a much better solution than waiting for a customer service representative to reinterpret your bug submission to a developer so that the fix may be done someday in the future.

If you are still looking for a replacement for Google Reader, give NewsBlur a chance, even if it’s a second chance, as the application seems to be enhanced weekly. If you like it, consider becoming a paid user. Besides, you can’t say no to Shiloh:

Introducing Flask-Track-Usage

A little while ago one of the guys on a project I work on was asking how many people were using the project’s public web service. My first thought was to go grepping through logs. After all, the requests are right there and pretty consumable with a bit of Unix command line magic. But after a little discussion it became clear that would get old after a while. What about a week from now? How about a month or a year? Few people want to go run commands and then manually correlate the results. This led to us looking around for common solutions. The most obvious one was Google Analytics. To be honest, I don’t much care for those systems. While that one may (or may not) be intrusive for users, I just don’t feel all that comfortable forcing people to be subjected to a third party of a third party unless there is no other good choice. Luckily, being that the metrics are service related, the javascript/cookie/pixel based approach wouldn’t have worked very well anyway.

So it was off to look at what others had made, with a heavy eye towards Flask based solutions so it would match the framework we were already using. Flask-Analytics came up in a search. I liked its simple design, but the extension is aimed more at using cookies to track users through an application, while we wanted to track overall usage. I figured it was time to roll something ourselves and provide it back to the community in case others could use it as well.

Here it is in all its simplistic glory: Flask-Track-Usage. It doesn’t use cookies nor javascript and can store the results in any system for which you provide a callable or Storage object. There is also FreeGeoIP integration for those who want to track where users are coming from. The code comes with a MongoDB Storage object for those who want to store the content in their MongoDB. Want to know a bit more of the technical details? Check out the README or the project page. Patches welcome!
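
As a quick sketch of how it wires in (the callable and config key below are illustrative; check the README for the authoritative details), you hand TrackUsage the app and a storage target:

from flask import Flask
from flask_track_usage import TrackUsage

app = Flask(__name__)
app.config['TRACK_USAGE_USE_FREEGEOIP'] = False  # opt in to geo lookups if wanted


def log_usage(data):
    # Receives the collected metrics for a tracked request; a real
    # deployment would write them somewhere durable (e.g. MongoDB).
    print(data)


track = TrackUsage(app, log_usage)


@app.route('/')
def index():
    return 'hello'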

Raspberry Pi and Arduino: Good Friends

I have a Raspberry Pi and it’s pretty great. I have an Arduino Esplora and an Arduino Micro and they are fun. Though, as I’ve played more with the Arduino, I’ve found one totally understandable drawback: it’s more or less local only. I mean that the data coming back from sensors, or from the items being controlled, is only sent back over serial or USB/serial. It makes sense, but it also limits what can be done with the board all by itself. One of my early ideas for a project was to use a light sensor and a temperature sensor to keep an eye on aging homebrew beer. Nothing super fancy, just something that records the information for viewing and alerts when the sensors see data outside the accepted norms. I could do this with some LEDs, a buzzer, or a display that would notify me when things were off, but that isn’t really the type of alerting I’d like to see. That type of alerting would require me to go look at the box for information; I might as well do the sensor gathering manually with my eyes and by feeling inside the box. It also means the data would be lost on every iteration. Data from 10 minutes ago could only be gathered if I was present 10 minutes back. Enter the Raspberry Pi.

The Raspberry Pi is able to power the Arduino Esplora and, likely, the Arduino Micro. Since communication is over USB/serial the Raspberry Pi can collect the data from the sensors and provide a networked view into the data. For instance, a web interface showing temperature and light graphs. And, best yet, it’s simple to add a USB wireless adapter to the Pi to avoid running an ethernet cable back to the network.
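
As a sketch of the collection side (the device path, baud rate, and line format are assumptions that depend on the sketch running on the Arduino), the Pi can read the sensor values with pyserial:

import serial

# Device path and baud rate depend on how the Arduino shows up and how
# its sketch is configured; /dev/ttyACM0 at 9600 baud is a common setup.
conn = serial.Serial('/dev/ttyACM0', 9600, timeout=5)

while True:
    # Assume the Arduino prints one "temperature,light" reading per line.
    line = conn.readline().strip()
    if line:
        temperature, light = line.split(b',')
        print(temperature, light)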

Now, from what I read, it’s possible to use the Raspberry Pi by itself, without an Arduino, to collect data and control devices, but it requires an ADC for analog input/output. Still, there is something that seems more proper about separating the physical logic (C on the Arduino) from the notification and reporting logic (Python on the Raspberry Pi). It feels almost MVC-like.

In any case, if you are looking at doing some analog and digital work with a Raspberry Pi, know that adding a small Arduino makes life easy, and if you decide to switch to another device for providing the network views it should be a simple changeover.