flask-track-usage 2.0.0

flask-track-usage 2.0.0 has been released! Thanks to everyone who helped provide patches and testing. Note: 2.0.0 is the recommended upgrade from 1.1.0; 1.1.1 was released for those who are unable to make the changes needed to move to 2.x. You can check out the latest docs over at readthedocs.

The changes include:

  • MANIFEST.in: Add alembic to dist
  • CONTRIBUTORS: Add John Dupuy
  • py3: Fix import issue with summarization
  • .travis: Change mysql driver
  • test: Fix summarize tests for py3
  • travis: Add 3.6
  • docs: Quick fixes
  • README.md: Update docs to rtd
  • Use parens for multilines
  • Update versions to 2.0.0
  • sql: Increase ip_info from 128 to 1024
  • alembic: Upgrade ip_info from 128 to 1024
  • alembic: Support for upgrading SQL schema
  • sql: Create table if it is not present
  • couchdb: Add track_var and username
  • redis: Add track_var and username
  • Adding user_defined variable field to storage
  • Hooks: add new hooks system
  • test: Skip mongoengine if connection can not be made
  • storage: Rename to PrinterWriter
  • output: Add OutputWriter
  • storage: Create base class and Writer
  • requirements: Added six
  • Copyright now a range
  • Add CONTRIBUTORS
  • doc: Add note about py2 and 3
  • py3: Fix most obvious offenders
  • Move mongoengine ref in Travis CI config
  • Update Travis CI config to include mongoengine lib
  • pep8 fixes
  • MongoEngineStorage: updated docs; added get_usage
  • added testing
  • moved MongoEngineStorage to mongo.py
  • doc: Minor updates for a future release
  • Initial support for multiple storage backends
  • Update versions to denote moving towards 2.0.0
  • Added MongoEngineStorage code; adding test next.
  • docs: Update version to 1.1.1
  • release: v1.1.1
  • Updates for freegeoip function
  • test: Update sqlalchemy test for updated flask
  • test: Update mongo test for updated flask
  • test: test_data works with current Flask results
  • travis: Force pymongo version for travis
  • storage: Minor doc and structure updates for new backends.
  • Redis support
  • Added CouchDB integration. (#30)

flask-track-usage 2.0.0 testing

flask-track-usage is nearing a new milestone: 2.0.0. This new release will include a few bugfixes and a number of enhancements. The two that stick out are time aggregate functions in the form of hooks, split between Storage and Writer classes, and the ability to store custom data via a track_var global (a short sketch follows the list below). Currently the work is housed in a branch, but it will be updated and merged once 2.0.0 has had a little time for testing. Some highlights in 2.0.0 include:

  • alembic: Support for upgrading SQL schema
  • sql: Create table if it is not present
  • couchdb: Add track_var and username
  • redis: Add track_var and username
  • Adding user_defined variable field to storage
  • Hooks: add new hooks system
  • test: Skip mongoengine if connection can not be made
  • storage: Rename to PrinterWriter
  • output: Add OutputWriter
  • storage: Create base class and Writer
  • requirements: Added six
  • doc: Add note about py2 and 3
  • freegeoip: Fix missing attribute
  • py3: Fix most obvious offenders
  • Move mongoengine ref in Travis CI config
  • Update Travis CI config to include mongoengine lib
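
For a taste of the track_var enhancement, here is a minimal sketch against the 2.0.0 branch. The writer class and import path are assumptions pieced together from the changelog entries above, so treat the details as provisional:

from flask import Flask, g
from flask_track_usage import TrackUsage
from flask_track_usage.storage.printer import PrintWriter  # assumed writer and import path

app = Flask(__name__)

# Attach usage tracking with a simple writer that prints each record
t = TrackUsage(app, [PrintWriter()])

@t.include
@app.route('/')
def index():
    # Stash custom data for this request; it is stored with the usage record
    g.track_var['username'] = 'steve'
    return 'Hello'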

Help test the 2.0.0 branch by cloning and installing in your development environment!

Programmatic Ansible Middle Ground


Foreword

About a year ago serversforhackers posted a great article on how to run Ansible programmatically. Since then Ansible has had a major release which introduced changes within the Python API.

Simulating The CLI

Not that long ago Jason DeTiberus and I were talking about how to use Ansible from within other Python packages. One of the things he said was that it should be possible to reuse the command-line code instead of the internal API if you hook into the right place. I finally had some time to take a look and it seems he’s right!

If you take a look at the 2.0 API you’ll see there is a lot more power handed over to you as the developer, but with that comes a lot of code. Code that, for many, will be nearly copy/paste style code taken directly from the command-line interface. So when there is no need for the extra power, why not just reuse code that already exists?

import os  # Used for expanding paths

from ansible.cli.playbook import PlaybookCLI
from ansible.errors import AnsibleOptionsError, AnsibleParserError

def execute_playbook(playbook, hosts, args=None):
    """
    :param playbook: Full path to the playbook to execute.
    :type playbook: str
    :param hosts: A host or hosts to target the playbook against.
    :type hosts: str, list, or tuple
    :param args: Other arguments to pass to the run.
    :type args: list or None
    :returns: The exit code from the playbook run.
    :rtype: int
    """
    args = args or []  # Avoid sharing a mutable default list between calls
    # Set hosts args up right for the ansible parser. It likes to have trailing ,'s
    if isinstance(hosts, basestring):
        hosts = hosts + ','
    elif hasattr(hosts, '__iter__'):
        hosts = ','.join(hosts) + ','
    else:
        raise AnsibleParserError('Can not parse hosts of type {}'.format(
            type(hosts)))

    # Create the cli object
    cli_args = ['playbook'] + args + ['-i', hosts, os.path.realpath(playbook)]
    print('Executing: {}'.format(' '.join(cli_args)))
    cli = PlaybookCLI(cli_args)
    # Parse args and run it
    try:
        cli.parse()
        # Return the result:
        # 0: Success
        # 1: "Error"
        # 2: Host failed
        # 3: Unreachable
        # 4: Parser Error
        # 5: Options error
        return cli.run()
    except (AnsibleParserError, AnsibleOptionsError) as error:
        print('{}: {}'.format(type(error), error))
        raise error


Breaking It Down

The function starts off with some host parsing. This is not strictly needed, but it does make the function easier to work with. On the command line, Ansible likes to have a comma at the end of hosts passed in. This chunk of code makes sure that, whether a list or a string is given for hosts, the resulting host string is properly formatted.

    # Set hosts args up right for the ansible parser. It likes to have trailing ,'s
    if isinstance(hosts, basestring):
        hosts = hosts + ','
    elif hasattr(hosts, '__iter__'):
        hosts = ','.join(hosts) + ','
    else:
        raise AnsibleParserError('Can not parse hosts of type {}'.format(type(hosts)))

The Real Code

This chunk of code is what actually calls Ansible. It creates the command-line argument list, creates a PlaybookCLI instance, parses the arguments, and then executes the playbook.

    # Create the cli object
    cli_args = ['playbook'] + args + ['-i', hosts, os.path.realpath(playbook)]
    print('Executing: {}'.format(' '.join(cli_args)))
    cli = PlaybookCLI(cli_args)
    # Parse args and run it
    try:
        cli.parse()
        # Return the result:
        # 0: Success
        # 1: "Error"
        # 2: Host failed
        # 3: Unreachable
        # 4: Parser Error
        # 5: Options error
        return cli.run()
    except (AnsibleParserError, AnsibleOptionsError) as error:
        print('{}: {}'.format(type(error), error))
        raise error

Using The Function

# Execute /tmp/test.yaml with 2 hosts
result = execute_playbook('/tmp/test.yaml', ['192.168.152.100', '192.168.152.101'])

# Execute /tmp/test.yaml with 1 host and add the -v flag
result = execute_playbook('/tmp/test.yaml', '192.168.152.101', ['-v'])

Intercepting The Output

One drawback of using the command-line interface code directly is that the output is expected to go to the user in the standard way. That is to say, it’s sent to the screen and colorized. This will probably be fine for some, but others may want to grab the output and use it in some form. While it is possible to change output through the configuration options, it is also possible to monkey patch display and intercept the output for your own use cases. As an example, here is a Display class which forwards all output that is not marked screen-only to our logging.info method.

# MONKEY PATCH to catch output. This must happen at the start of the code!
import logging

from ansible.utils.display import Display

# Set up our logging
logger = logging.getLogger('transport')
logger.setLevel(logging.INFO)
handler = logging.StreamHandler()
handler.formatter = logging.Formatter('%(name)s - %(message)s')
logger.addHandler(handler)

class LogForward(Display):
    """
    Quick hack of a log forwarder
    """

    def display(self, msg, screen_only=None, *args, **kwargs):
        """
        Pass display data to the logger.
        :param msg: The message to log.
        :type msg: str
        :param args: All other non-keyword arguments.
        :type args: list
        :param kwargs: All other keyword arguments.
        :type kwargs: dict
        """
        # Ignore if it is screen only output
        if screen_only:
            return
        logging.getLogger('transport').info(msg)

    # Forward it all to display
    info = display
    warning = display
    error = display
    # Ignore debug
    debug = lambda s, *a, **k: True

# By simply setting display Ansible will slurp it in as the display instance
display = LogForward()
# END MONKEY PATCH. Add code after this line.

Putting It All Together

If you want to use it all together it should look like this:

# MONKEY PATCH to catch output. This must happen at the start of the code!
import logging

from ansible.utils.display import Display

# Set up our logging
logger = logging.getLogger('transport')
logger.setLevel(logging.INFO)
handler = logging.StreamHandler()
handler.formatter = logging.Formatter('%(name)s - %(message)s')
logger.addHandler(handler)

class LogForward(Display):
    """
    Quick hack of a log forwarder
    """

    def display(self, msg, screen_only=None, *args, **kwargs):
        """
        Pass display data to the logger.
        :param msg: The message to log.
        :type msg: str
        :param args: All other non-keyword arguments.
        :type args: list
        :param kwargs: All other keyword arguments.
        :type kwargs: dict
        """
        # Ignore if it is screen only output
        if screen_only:
            return
        logging.getLogger('transport').info(msg)

    # Forward it all to display
    info = display
    warning = display
    error = display
    # Ignore debug
    debug = lambda s, *a, **k: True

# By simply setting display Ansible will slurp it in as the display instance
display = LogForward()
# END MONKEY PATCH. Add code after this line.

import os  # Used for expanding paths

from ansible.cli.playbook import PlaybookCLI
from ansible.errors import AnsibleOptionsError, AnsibleParserError

def execute_playbook(playbook, hosts, args=None):
    """
    :param playbook: Full path to the playbook to execute.
    :type playbook: str
    :param hosts: A host or hosts to target the playbook against.
    :type hosts: str, list, or tuple
    :param args: Other arguments to pass to the run.
    :type args: list or None
    :returns: The exit code from the playbook run.
    :rtype: int
    """
    args = args or []  # Avoid sharing a mutable default list between calls
    # Set hosts args up right for the ansible parser. It likes to have trailing ,'s
    if isinstance(hosts, basestring):
        hosts = hosts + ','
    elif hasattr(hosts, '__iter__'):
        hosts = ','.join(hosts) + ','
    else:
        raise AnsibleParserError('Can not parse hosts of type {}'.format(
            type(hosts)))

    # Create the cli object
    cli_args = ['playbook'] + args + ['-i', hosts, os.path.realpath(playbook)]
    logger.info('Executing: {}'.format(' '.join(cli_args)))
    cli = PlaybookCLI(cli_args)
    # Parse args and run it
    try:
        cli.parse()
        # Return the result:
        # 0: Success
        # 1: "Error"
        # 2: Host failed
        # 3: Unreachable
        # 4: Parser Error
        # 5: Options error
        return cli.run()
    except (AnsibleParserError, AnsibleOptionsError) as error:
        logger.error('{}: {}'.format(type(error), error))
        raise error


Pros and Cons

Of course nothing is without drawbacks. Here are some negatives with this method:

  • No direct access to “TaskQueueManager”
  • If the CLI changes the code must change
  • Monkey patching …. ewww

But the positives seem to be worth it so far:

  • You don’t have to deal with “TaskQueueManager” and all of the construction code
  • The CLI doesn’t seem to change often
  • The same commands one would run on the CLI can easily be extrapolated and even run manually

etcdobj: A Minimal etcd Object Mapper for Python

I didn’t have a lot on my agenda Friday. I wanted to review and return emails, do some reading, get some minor hacking on etcdobj done (more on that…), eat more calories than normal in an attempt to screw with my metabolism (nailed it!), catch up with a few coworkers, play some video games, and, apparently, accidentally order an air purifier from Amazon. I succeeded in all of it. But on to this etcdobj thing…

While working on Commissaire I started to feel a bit dirty over storing json documents in keys. It’s not uncommon, but it felt like it would be so much better if a document was broken into three layers:

  • Python: Classes/Objects
  • Transport: For saving/retrieving objects
  • etcd: A single or series of keys

By splitting what is normally one json document into a series of keys, two clients can change different parts of the same object without a collision, and without requiring a client to fail, fetch, update, then try saving again. I searched the Internet for a library that would provide this and came up wanting. It seems that either simple keys/values or shoving json into a key is what most people stick with.
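
To make the collision point concrete, here is a minimal sketch using the python-etcd client (an assumption for the transport layer) in which two clients update different fields of the same logical object and neither overwrites the other:

import etcd  # python-etcd client, assumed as the transport layer

client_a = etcd.Client()
client_b = etcd.Client()

# One logical object spread over multiple keys instead of one json blob
client_a.write('/example/anint', '100')    # client A updates one field
client_b.write('/example/astr', 'hello')   # client B updates another

# Neither write clobbers the other; with the whole object stored as json
# under a single key, the second writer would have overwritten the first.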

etcdobj is truly minimal. Partly because it’s new, partly because being small should make it easier to build upon or even bundle (it’s got a very permissive license), and partly because I’ve never written an ORM-like library before and don’t want to build too much on what could be a shaky foundation. That’s why I’m hoping this post will encourage some more eyes and help with the code.

Current Example

To create a representation of data a class must subclass EtcdObj and follow a few rules.

  1. __name__ must be provided as it will be the parent in the key path.
  2. Fields are class level variables and must be set to an instance that subclasses etcdobj.Field.
  3. The name of a field is the next layer in the key path and does not need to be the same as the class level variable.

from etcdobj import EtcdObj, fields

class Example(EtcdObj):
    __name__ = 'example' # The parent key
    # Fields all take a name that will be used as their key
    anint = fields.IntField('anint')
    astr = fields.StrField('astr')
    adict = fields.DictField('adict')

Creating a new object and saving it to etcd is pretty easy.

from etcdobj import Server  # assumed import path for the transport class

server = Server()

ex = Example(anint=1, astr="hello", adict={"hi": "there"})
ex.anint = 100  # update the value of anint
server.save(ex)
# Would save like so:
# /example/anint = "100"
# /example/astr = "hello"
# /example/adict/hi = "there"

As is retrieving the data.

new_ex = server.read(Example())
# new_ex.anint = 100
# new_ex.astr = "hello"
# new_ex.adict = {"hi": "there"}

Ideas

Some ideas for the future include:

  • Object watching (if data changes on the server it changes in the local instance)
  • Object to json structure
  • Deep DictField value casting/validation
  • Library level logging

Lend a Hand

The code base is currently around 416 lines of code including documentation and license header. If etcdobj sounds like something you’d use, come take a look and help make it something better than I can produce all by my lonesome.

From Gevent to CherryPy

I’ve been working on a project for the last few months on GitHub called Commissaire along with some other smart folks. Without getting too deep into what the software is supposed to do, just know it’s a REST service which needs to handle some asynchronous tasks. When prototyping the web service I started utilizing gevent for its WSGI server and coroutines but, as it turns out, it didn’t end up being the best fit. This is not a post about gevent sucking, because it doesn’t suck. gevent is pretty awesome, but it’s not for every use case.

The Problem

One of the asynchronous tasks we do in Commissaire utilizes Ansible. We use the Ansible python API to handle part of the bootstrapping of a new host. Under the covers Ansible uses the multiprocessing module when executing its work. Specifically, this occurs when the TaskQueueManager starts its run. Under normal circumstances this is no problem, but when gevent is in use its monkey patching ends up causing some problems. As noted in the post, using monkey.patch_all(thread=False, socket=False) can be a solution. What this ends up doing is patching everything except thread and socket. But even this wasn’t enough for us to get past the problems we were facing between multiprocessing, gevent, and Ansible. The closest patch we found was to also disable os, subprocess, and a few other things, making most of gevent’s great features unavailable. At this point it seemed pretty obvious gevent was not going to be a good fit.
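
For reference, the partial patch looks like this; the extra exclusions we experimented with (os, subprocess, and so on) are passed the same way:

from gevent import monkey

# Patch everything except thread and socket to avoid fighting with the
# multiprocessing module Ansible uses under the covers.
monkey.patch_all(thread=False, socket=False)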

Looking Elsewhere

There is no lack of options when looking for a Python web application server. Here are the requirements I figured we would need:

Requirements

  • Importable as a library
  • Supports WSGI
  • Supports TLS
  • Active user base
  • Active development
  • Does not require a reverse proxy
  • Does not require greenlets
  • Supports Python 2 and 3

Based on the name of this post you already know we chose CherryPy. It hit all the requirements and came with a few added benefits. The plugin system, which allows for calls to be published over an internal bus, lets us decouple our data saving internals (though it couples us with CherryPy, as it is doing the abstraction). The server is also already available in many Linux distributions at new enough versions. That’s a big boon when hoping to have software easily installed via traditional means.
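
As a rough illustration of the fit, here is a minimal sketch (not our actual Commissaire code) of serving a plain WSGI application with CherryPy’s server alone, no framework usage or reverse proxy required:

import cherrypy

def application(environ, start_response):
    # A bare WSGI callable; a REST service sits behind an interface like this
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'Hello from WSGI under CherryPy\n']

# Graft the WSGI app onto the tree and run the built-in threaded server
cherrypy.tree.graft(application, '/')
cherrypy.config.update({
    'server.socket_host': '127.0.0.1',
    'server.socket_port': 8000,
})
cherrypy.engine.start()
cherrypy.engine.block()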

The runner up was Waitress. Unlike CherryPy, which assumes you are developing within the CherryPy web framework, Waitress assumes WSGI. Unfortunately, Waitress requires a reverse proxy for TLS. If it had first class support for TLS we would probably have picked it.

Going back to a more traditional threading server is definitely not as sexy as utilizing greenlets/coroutines, but it has provided a consistent result when paired with a multiprocessing worker process, and that is what matters.

Porting Time

Porting to a different library can be an annoying task and can feel like busy work. Porting can be even worse when you liked the library in use in the first place as I did (and still do!) with gevent.

Initial porting of the main functionality from gevent to CherryPy took roughly four hours. After that, it took about another six hours to iron out some rough edges, followed by updating the unit tests. Really, the unit testing updates ended up being more work, in terms of time, than the actual functionality. A lot of that was our fault in how we use mock, but I digress. That’s really not much time!

So What

So far I’m happy with the results. The application functionality works as expected, the request/response speeds are more than acceptable, and CherryPy as a server has been fun to work with. Assuming no crazy corner cases crop up, I don’t see us moving off CherryPy anytime soon.

I Wish Fossil Would Take Off

Fossil is the coolest distributed SCM you are not using. No seriously. It boasts features not found in any of the common distributed SCMs used by nearly every developer today.

About 5 or 6 years ago I started getting a little frustrated with Git. The main complaint I kept coming back to over and over was that, to use Git effectively, one needed to use GitHub, Trac, or some other tool to add an interface for issues and information. There was also the problem of getting many CVS/SVN folks comfortable with Git terminology, which fueled my recommendation of Mercurial, but I digress. It was around this time that a friend of mine and I started looking at a way we could include issues within a Git repository. At that time we looked at projects like BugsEverywhere which provide a separate tool to track bugs within the repository. We gave it a go for a little while but eventually fell away from it as, at the time, it really felt like a second class citizen in the Git toolchain. We spent a little time developing our own solution but then gave up, realizing that Git was so tied to the GitHub way.

Around this time one of us found Fossil and started to play around with it. I was blown away at how it took care of code, issues, wiki, tracking, and code hosting. You essentially get a distributed version of Trac for every clone. All the data comes along, and you are able to update documentation, code, issues, etc., all as part of a fossil push.

As of the time of writing Fossil boasts (from the main page):

  1. Integrated Bug Tracking, Wiki, and Technotes
  2. Built-In Web Interface
  3. Self-Contained
  4. Simple Networking
  5. CGI/SCGI Enabled
  6. Autosync
  7. Robust & Reliable
  8. Free and Open-Source

I touched a little bit on 1 and 2, but 3 is also a pretty cool feature. If you do an install of Git you really are installing a bit more than you may realize. For example, Fedora’s Git package requires:

  1. asciidoc
  2. desktop-file-utils
  3. emacs
  4. expat-devel
  5. gettext
  6. libcurl-devel
  7. libgnome-keyring-devel
  8. openssl-devel
  9. pcre-devel
  10. perl(Error)
  11. perl(ExtUtils::MakeMaker)
  12. pkgconfig(bash-completion)
  13. python
  14. rpmlib(CompressedFileNames) <= 3.0.4-1
  15. rpmlib(FileDigests) <= 4.6.0-1
  16. systemd
  17. xmlto
  18. zlib-devel >= 1.2

In other words, you need a specific editor, 2 languages available on the system, a specific init system, and a part of GNOME. Plain Git directly from source requires less, but still more than one would think. Fossil notes its dependencies as:

Fossil needs to be linked against zlib. If the HTTPS option is enabled, then it will also need to link against the appropriate SSL implementation. And, of course, Fossil needs to link against the standard C library. No other libraries or external dependencies are used.

Philosophy

Fossil and Git have very different philosophies. The most interesting point to me when reading up on the differences was this:

Git puts a lot of emphasis on maintaining a “clean” check-in history. Extraneous and experimental branches by individual developers often never make it into the main repository. And branches are often rebased before being pushed, to make it appear as if development had been linear. Git strives to record what the development of a project should have looked like had there been no mistakes.

Fossil, in contrast, puts more emphasis on recording exactly what happened, including all of the messy errors, dead-ends, experimental branches, and so forth. One might argue that this makes the history of a Fossil project “messy”. But another point of view is that this makes the history “accurate”. In actual practice, the superior reporting tools available in Fossil mean that the added “mess” is not a factor.

One commentator has mused that Git records history according to the victors, whereas Fossil records history as it actually happened.

While pretty, (nearly) linear history is a simple read, it rarely is actually true.


Using Fossil

There is a pretty decent quick start to get one going. At first run through it feels clunky. For instance, when doing a checkout you have to open the repository with fossil open. But then again, people felt (and some still feel) that git add $FILES, git commit, git push $PLACE $BRANCH feels wrong. I think that with enough time one can be just as comfortable with fossil’s commands and flow as they would be with git’s.

Truth Be Told

My biggest want for Fossil to take off is to be able to take bugs/issues and documentation offline and merge them without forcing everyone to adopt third party tools that integrate with an SCM. I also would like to keep my hands on my keyboard rather than logging into GitHub to review stuff (yeah, I know there are keyboard shortcuts …). Anyway, here is hoping more people will give Fossil a try!

Flask-Track-Usage 1.1.0 Released

A few years ago the initial Flask-Track-Usage release was announced via my blog. At the time I thought I’d probably be the one user. I’m glad to say I was wrong! Today I’m happy to announce the release of Flask-Track-Usage 1.1.0, which sports a number of enhancements and bug fixes.

Unfortunately, some changes are not backwards compatible. However, I believe the backwards incompatible changes make the overall experience better. If you would like to stick with the previous version of Flask-Track-Usage, make sure to pin the version in your requirements file/section:

flask_track_usage==1.0.1

Version 1.1.0 has made changes requested by the community as well as a few bug fixes; a brief configuration sketch follows the list. These include:

  • Addition of the X-Forwarded-For header as xforwardedfor in storage. Requested by jamylak.
  • Configurable GeoIP endpoint support. Requested by jamylak.
  • Migration from pymongo.Connection to pymongo.MongoClient.
  • Better SQLStorage metadata handling. Requested by gouthambs.
  • SQLStorage implementation redesign. Requested and implemented by gouthambs.
  • Updated documentation for 1.1.0.
  • Better unittesting.
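
As a quick illustration, here is a minimal sketch of wiring 1.1.0 up with the redesigned SQLStorage. The connection argument and import path are assumptions, so double-check them against the 1.1.0 documentation:

from flask import Flask
from flask_track_usage import TrackUsage
from flask_track_usage.storage.sql import SQLStorage  # assumed import path

app = Flask(__name__)
app.config['TRACK_USAGE_USE_FREEGEOIP'] = False  # flip to True for GeoIP lookups

# SQLStorage rides on top of the SQLAlchemy ORM; conn_str is assumed here
storage = SQLStorage(conn_str='sqlite:////tmp/usage.db')

# Each request is recorded, including the new xforwardedfor field when the
# X-Forwarded-For header is present
t = TrackUsage(app, storage)

@app.route('/')
def index():
    return 'Hello'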

I’d like to thank Gouthaman Balaraman who has been a huge help authoring the SQLStorage based on the SQLAlchemy ORM and providing feedback and support on Flask-Track-Usage design.

As always, please report bugs and feature requests on the GitHub Issues Page.