Programmatic Ansible Middle Ground


Foreword

About a year ago, serversforhackers posted a great article on how to run Ansible programmatically. Since then, Ansible has had a major release (2.0) which introduced changes to its Python API.

Simulating The CLI

Not long ago, Jason DeTiberus and I were talking about how to use Ansible from within other Python packages. One of the things he said was that, instead of using the internal API, it should be possible to reuse the command-line code if you hook into the right place. I finally had some time to take a look, and it seems he’s right!

If you have looked at the 2.0 API, you’ll have seen that it hands a lot more power over to you as the developer, but with that comes a lot of code. For many people that code will be a near copy/paste of what the command-line interface already does. So when you don’t need the extra power, why not just reuse the code that already exists?

import os  # Used for expanding paths

from ansible.cli.playbook import PlaybookCLI
from ansible.errors import AnsibleOptionsError, AnsibleParserError

def execute_playbook(playbook, hosts, args=None):
    """
    :param playbook: Full path to the playbook to execute.
    :type playbook: str
    :param hosts: A host or hosts to target the playbook against.
    :type hosts: str, list, or tuple
    :param args: Other arguments to pass to the run.
    :type args: list
    :returns: The exit code of the run (see the comments below).
    :rtype: int
    """
    # Copy args so a shared mutable default is never used or modified
    args = list(args or [])
    # Set hosts args up right for the ansible parser. It likes to have trailing ,'s
    if isinstance(hosts, basestring):
        hosts = hosts + ','
    elif hasattr(hosts, '__iter__'):
        hosts = ','.join(hosts) + ','
    else:
        raise AnsibleParserError('Can not parse hosts of type {}'.format(
            type(hosts)))

    # Create the cli object
    cli_args = ['playbook'] + args + ['-i', hosts, os.path.realpath(playbook)]
    print('Executing: {}'.format(' '.join(cli_args)))
    cli = PlaybookCLI(cli_args)
    # Parse args and run it
    try:
        cli.parse()
        # Return the result:
        # 0: Success
        # 1: "Error"
        # 2: Host failed
        # 3: Unreachable
        # 4: Parser Error
        # 5: Options error
        return cli.run()
    except (AnsibleParserError, AnsibleOptionsError) as error:
        print('{}: {}'.format(type(error), error))
        raise


Breaking It Down

The function starts off by normalizing the hosts argument. This isn’t strictly needed, but it does make the function easier to work with. When the value passed to -i contains a comma, Ansible treats it as an inline list of hosts rather than as the path to an inventory file, which is why even a single host needs a trailing comma. This chunk of code makes sure that, whether a string or a list is given, the resulting hosts string is formatted that way.

    # Set hosts args up right for the ansible parser. It likes to have trailing ,'s
    if isinstance(hosts, basestring):
        hosts = hosts + ','
    elif hasattr(hosts, '__iter__'):
        hosts = ','.join(hosts) + ','
    else:
        raise AnsibleParserError('Can not parse hosts of type {}'.format(type(hosts)))
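On Python 3, where basestring no longer exists, the same normalization might look like the sketch below. hosts_to_inventory is a hypothetical helper, not part of Ansible. It also splits comma-separated strings so an input such as "a,b," does not end up with a doubled comma, and it raises TypeError rather than AnsibleParserError so it runs without Ansible installed.

```python
def hosts_to_inventory(hosts):
    """Return an inline inventory string such as 'host1,host2,'.

    The trailing comma makes ansible-playbook's -i option treat the value
    as a list of hosts instead of a path to an inventory file.
    """
    if isinstance(hosts, str):
        # Accept 'host' as well as an already comma-separated 'a,b,'
        hosts = hosts.split(',')
    elif not hasattr(hosts, '__iter__'):
        raise TypeError('Can not parse hosts of type {}'.format(type(hosts)))
    # Drop empty entries left over from stray commas or whitespace
    names = [host.strip() for host in hosts if host.strip()]
    if not names:
        raise ValueError('At least one host is required')
    return ','.join(names) + ','
```

For example, hosts_to_inventory(['192.168.152.100', '192.168.152.101']) returns '192.168.152.100,192.168.152.101,'.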

The Real Code

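This chunk of code is what actually calls Ansible. It builds the command-line argument list, creates a PlaybookCLI instance, has it parse the arguments, and then runs the playbook. The leading 'playbook' entry takes the place of the program name that would normally be argv[0].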

    # Create the cli object
    cli_args = ['playbook'] + args + ['-i', hosts, os.path.realpath(playbook)]
    print('Executing: {}'.format(' '.join(cli_args)))
    cli = PlaybookCLI(cli_args)
    # Parse args and run it
    try:
        cli.parse()
        # Return the result:
        # 0: Success
        # 1: "Error"
        # 2: Host failed
        # 3: Unreachable
        # 4: Parser Error
        # 5: Options error
        return cli.run()
    except (AnsibleParserError, AnsibleOptionsError) as error:
        print('{}: {}'.format(type(error), error))
        raise

Using The Function

# Execute /tmp/test.yaml with 2 hosts
result = execute_playbook('/tmp/test.yaml', ['192.168.152.100', '192.168.152.101'])

# Execute /tmp/test.yaml with 1 host and add the -v flag
result = execute_playbook('/tmp/test.yaml', '192.168.152.101', ['-v'])
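Because execute_playbook returns the integer from cli.run(), the caller still decides what counts as a failure. A minimal sketch, assuming only the codes listed in the function’s comments; describe_result and check_result are hypothetical helpers.

```python
# Result codes as listed in the comments of execute_playbook
RESULT_CODES = {
    0: 'Success',
    1: 'Error',
    2: 'Host failed',
    3: 'Unreachable',
    4: 'Parser error',
    5: 'Options error',
}


def describe_result(code):
    """Return a readable description of a cli.run() result code."""
    return RESULT_CODES.get(code, 'Unknown result code {}'.format(code))


def check_result(code):
    """Return code unchanged on success, otherwise raise RuntimeError."""
    if code != 0:
        raise RuntimeError('Playbook run failed: {} ({})'.format(
            describe_result(code), code))
    return code
```

Wrapping a call as check_result(execute_playbook('/tmp/test.yaml', '192.168.152.101')) turns a failed run into an exception instead of a number that is easy to ignore.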

Intercepting The Output

One drawback of using the command-line interface code directly is that the output is expected to go to the user in the standard way: it’s sent to the screen and colorized. That will be fine for some, but others may want to grab the output and use it in some other form. While it is possible to change output through configuration options, it is also possible to monkey patch display and intercept the output for your own use cases. As an example, here is a Display class which forwards all output that is not screen-only to the transport logger’s info method.

# MONKEY PATCH to catch output. This must happen at the start of the code!
import logging

from ansible.utils.display import Display

# Set up our logging
logger = logging.getLogger('transport')
logger.setLevel(logging.INFO)
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter('%(name)s - %(message)s'))
logger.addHandler(handler)

class LogForward(Display):
    """
    Quick hack of a log forwarder
    """

    def display(self, msg, screen_only=None, *args, **kwargs):
        """
        Pass display data to the logger.
        :param msg: The message to log.
        :type msg: str
        :param args: All other non-keyword arguments.
        :type args: list
        :param kwargs: All other keyword arguments.
        :type kwargs: dict
        """
        # Ignore if it is screen only output
        if screen_only:
            return
        logging.getLogger('transport').info(msg)

    # Forward it all to display
    info = display
    warning = display
    error = display
    # Ignore debug
    debug = lambda s, *a, **k: True

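# Ansible modules try "from __main__ import display" when they are imported and
# only create their own Display() if that fails, so defining display here, in the
# main script and before the rest of Ansible is imported, swaps in LogForward.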
display = LogForward()
# END MONKEY PATCH. Add code after this line.
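LogForward sends everything to the transport logger, which in the example above simply prints it. To actually keep the output for later use, one option is to attach another handler to that logger. ListHandler below is a hypothetical helper built on the standard logging.Handler class; it is not part of Ansible.

```python
import logging


class ListHandler(logging.Handler):
    """Keep every formatted log message in memory."""

    def __init__(self):
        logging.Handler.__init__(self)
        self.messages = []

    def emit(self, record):
        # format() falls back to '%(message)s' when no formatter is set
        self.messages.append(self.format(record))


capture = ListHandler()
transport = logging.getLogger('transport')
transport.setLevel(logging.INFO)
transport.addHandler(capture)

# Stand-in for a line LogForward.display() would forward during a run
transport.info('PLAY [all]')
```

After a playbook run, capture.messages holds the forwarded output as a list of strings that can be searched, stored, or returned to a caller.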

Putting It All Together

Put together, the whole script looks like this:

# MONKEY PATCH to catch output. This must happen at the start of the code!
import logging

from ansible.utils.display import Display

# Set up our logging
logger = logging.getLogger('transport')
logger.setLevel(logging.INFO)
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter('%(name)s - %(message)s'))
logger.addHandler(handler)

class LogForward(Display):
    """
    Quick hack of a log forwarder
    """

    def display(self, msg, screen_only=None, *args, **kwargs):
        """
        Pass display data to the logger.
        :param msg: The message to log.
        :type msg: str
        :param args: All other non-keyword arguments.
        :type args: list
        :param kwargs: All other keyword arguments.
        :type kwargs: dict
        """
        # Ignore if it is screen only output
        if screen_only:
            return
        logging.getLogger('transport').info(msg)

    # Forward it all to display
    info = display
    warning = display
    error = display
    # Ignore debug
    debug = lambda s, *a, **k: True

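# Ansible modules try "from __main__ import display" when they are imported and
# only create their own Display() if that fails, so defining display here, in the
# main script and before the rest of Ansible is imported, swaps in LogForward.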
display = LogForward()
# END MONKEY PATCH. Add code after this line.

import os  # Used for expanding paths

from ansible.cli.playbook import PlaybookCLI
from ansible.errors import AnsibleOptionsError, AnsibleParserError

def execute_playbook(playbook, hosts, args=None):
    """
    :param playbook: Full path to the playbook to execute.
    :type playbook: str
    :param hosts: A host or hosts to target the playbook against.
    :type hosts: str, list, or tuple
    :param args: Other arguments to pass to the run.
    :type args: list
    :returns: The exit code of the run (see the comments below).
    :rtype: int
    """
    # Copy args so a shared mutable default is never used or modified
    args = list(args or [])
    # Set hosts args up right for the ansible parser. It likes to have trailing ,'s
    if isinstance(hosts, basestring):
        hosts = hosts + ','
    elif hasattr(hosts, '__iter__'):
        hosts = ','.join(hosts) + ','
    else:
        raise AnsibleParserError('Can not parse hosts of type {}'.format(
            type(hosts)))

    # Create the cli object
    cli_args = ['playbook'] + args + ['-i', hosts, os.path.realpath(playbook)]
    logger.info('Executing: {}'.format(' '.join(cli_args)))
    cli = PlaybookCLI(cli_args)
    # Parse args and run it
    try:
        cli.parse()
        # Return the result:
        # 0: Success
        # 1: "Error"
        # 2: Host failed
        # 3: Unreachable
        # 4: Parser Error
        # 5: Options error
        return cli.run()
    except (AnsibleParserError, AnsibleOptionsError) as error:
        logger.error('{}: {}'.format(type(error), error))
        raise


Pros and Cons

Of course, nothing is without drawbacks. Here are some negatives of this method:

  • No direct access to “TaskQueueManager”
  • If the CLI changes, the code must change
  • Monkey patching… ewww

But the positives seem to be worth it so far:

  • You don’t have to deal with “TaskQueueManager” and all of its construction code
  • The CLI doesn’t seem to change often
  • The same commands one would run on the CLI can easily be extrapolated and even run manually
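That last point can be made concrete: the cli_args list that execute_playbook builds maps directly onto an ansible-playbook command line. A small sketch for Python 3 (on Python 2, pipes.quote plays the role of shlex.quote); to_shell_command is a hypothetical helper.

```python
import shlex


def to_shell_command(cli_args, program='ansible-playbook'):
    """Render a cli_args list as a copy/paste-able shell command.

    cli_args[0] only stands in for the program name ('playbook'),
    so it is replaced with the real executable before quoting.
    """
    words = [program] + list(cli_args[1:])
    return ' '.join(shlex.quote(word) for word in words)
```

Logging to_shell_command(cli_args) next to the "Executing:" line gives a command that can be pasted into a terminal to reproduce a run by hand.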

etcdobj: A Minimal etcd Object Mapper for Python

I didn’t have a lot on my agenda Friday. I wanted to review and return emails, do some reading, get some minor hacking on etcdobj done (more on that…), eat more calories than normal in an attempt to screw with my metabolism (nailed it!), catch up with a few coworkers, play some video games, and, apparently, accidentally order an air purifier from Amazon. I succeeded in all of it. But on to this etcdobj thing…

While working on Commissaire I started to feel a bit dirty about storing JSON documents in keys. It’s not uncommon, but it felt like it would be so much better if a document were broken into three layers:

  • Python: Classes/Objects
  • Transport: For saving/retrieving objects
  • etcd: A single or series of keys

By splitting what would normally be JSON data into a series of keys, two clients changing different parts of the same object won’t collide or force one client to fail, fetch, update, and then try saving again. I searched the Internet for a library that would provide this and came up empty. It seems that most people stick with either simple keys/values or shoving JSON into a key.
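The key layout described above can be illustrated with plain dictionaries. This is a minimal sketch, not etcdobj’s implementation: flatten and unflatten are hypothetical helpers, and a real client would write each pair as its own etcd key. Values come back as strings, which is why typed fields such as etcdobj’s IntField are needed to cast them.

```python
def flatten(data, prefix=''):
    """Turn nested dicts into {'/parent/child': 'value'} key paths."""
    keys = {}
    for name, value in data.items():
        path = '{}/{}'.format(prefix, name)
        if isinstance(value, dict):
            # Nested dicts become deeper key paths rather than JSON blobs
            keys.update(flatten(value, path))
        else:
            # etcd stores strings, so every leaf value is converted
            keys[path] = str(value)
    return keys


def unflatten(keys, prefix=''):
    """Rebuild nested dicts from the key paths found under prefix."""
    data = {}
    for path, value in keys.items():
        if not path.startswith(prefix + '/'):
            continue  # key belongs to some other object
        parts = path[len(prefix) + 1:].split('/')
        node = data
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return data


saved = flatten({'anint': 100, 'astr': 'hello', 'adict': {'hi': 'there'}},
                '/example')
```

Because /example/astr and /example/anint are separate keys, a client updating one never rewrites the other.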

etcdobj is truly minimal. Partly because it’s new, partly because being small should make it easier to build upon or even bundle (it’s got a very permissive license), and partly because I’ve never written an ORM-like library before and don’t want to build too much on what could be a shaky foundation. That’s why I’m hoping this post will encourage some more eyes and help with the code.

Current Example

To create a representation of data a class must subclass EtcdObj and follow a few rules.

  1. __name__ must be provided as it will be the parent in the key path.
  2. Fields are class level variables and must be set to an instance that subclasses etcdobj.Field.
  3. The name of a field is the next layer in the key path and does not need to be the same as the class-level variable.

from etcdobj import EtcdObj, fields

class Example(EtcdObj):
    __name__ = 'example' # The parent key
    # Fields all take a name that will be used as their key
    anint = fields.IntField('anint')
    astr = fields.StrField('astr')
    adict = fields.DictField('adict')

Creating a new object and saving it to etcd is pretty easy.

server = Server()

ex = Example(anint=1, astr="hello", adict={"hi": "there"})
ex.anint = 100  # update the value of anint
server.save(ex)
# Would save like so:
# /example/anint = "100"
# /example/astr = "hello"
# /example/adict/hi = "there"

As is retrieving the data.

new_ex = server.read(Example())
# new_ex.anint = 100
# new_ex.astr = "hello"
# new_ex.adict = {"hi": "there"}

Ideas

Some ideas for the future include:

  • Object watching (if data changes on the server it changes in the local instance)
  • Object to json structure
  • Deep DictField value casting/validation
  • Library level logging

Lend a Hand

The code base is currently around 416 lines of code including documentation and license header. If etcdobj sounds like something you’d use come take a look and help make it something better than I can produce all by my lonesome.

From Gevent to CherryPy

I’ve been working on a project called Commissaire on GitHub for the last few months along with some other smart folks. Without getting too deep into what the software is supposed to do, just know it’s a REST service which needs to handle some asynchronous tasks. When prototyping the web service I started utilizing gevent for its WSGI server and coroutines but, as it turns out, it didn’t end up being the best fit. This is not a post about gevent sucking, because it doesn’t suck. gevent is pretty awesome, but it’s not for every use case.

The Problem

One of the asynchronous tasks we do in Commissaire utilizes Ansible. We use the Ansible Python API to handle part of bootstrapping a new host. Under the covers Ansible uses the multiprocessing module when executing its work; specifically, this occurs when the TaskQueueManager starts its run. Under normal circumstances this is no problem, but when gevent is in use its monkey patching ends up causing some problems. As noted in the post, using monkey.patch_all(thread=False, socket=False) can be a solution. This patches everything except thread and socket. But even this wasn’t enough for us to get past the problems we were facing between multiprocessing, gevent, and Ansible. The closest patch we found was to also disable os, subprocess, and a few other things, making most of gevent’s great features unavailable. At this point it seemed pretty obvious gevent was not going to be a good fit.

Looking Elsewhere

There is no lack of options when looking for a Python web application server. Here are the requirements I figured we would need:

Requirements

  • Importable as a library
  • Supports WSGI
  • Supports TLS
  • Active user base
  • Active development
  • Does not require a reverse proxy
  • Does not require greenlets
  • Supports Python 2 and 3

Based on the name of this post you already know we chose CherryPy. It hit all the requirements and came with a few added benefits. The plugin system, which allows calls to be published over an internal bus, lets us decouple our data-saving internals (though it couples us to CherryPy, as it is doing the abstraction). The server is also already available at new enough versions in many Linux distributions. That’s a big boon when hoping to have software easily installed via traditional means.
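CherryPy’s engine provides subscribe() and publish() for its bus. The sketch below only imitates the pattern in plain Python so it runs without CherryPy; Bus, the 'store-save' channel, and save_host are hypothetical names, not CherryPy or Commissaire code. The point is that the code publishing a save does not need to know what does the saving.

```python
from collections import defaultdict


class Bus(object):
    """Toy publish/subscribe bus in the spirit of cherrypy.engine (not its code)."""

    def __init__(self):
        self._listeners = defaultdict(list)

    def subscribe(self, channel, callback):
        """Register callback to be called for messages on channel."""
        self._listeners[channel].append(callback)

    def publish(self, channel, *args, **kwargs):
        """Call every listener on channel and return their results as a list."""
        return [callback(*args, **kwargs) for callback in self._listeners[channel]]


bus = Bus()
store = {}  # stands in for whatever storage backend a plugin wraps


def save_host(address, status):
    """Hypothetical storage plugin: the only code that touches store."""
    store[address] = status
    return True


bus.subscribe('store-save', save_host)

# A request handler only needs to know the channel name.
results = bus.publish('store-save', '192.168.152.100', 'active')
```

Swapping the storage backend then means subscribing a different callback, without touching the request handlers.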

The runner-up was Waitress. Unlike CherryPy, which assumes you are developing within the CherryPy web framework, Waitress assumes WSGI. Unfortunately, Waitress requires a reverse proxy for TLS. If it had first-class support for TLS we would probably have picked it.

Going back to a more traditional threading server is definitely not as sexy as utilizing greenlets/coroutines but it has provided a consistent result when paired with a multiprocessing worker process and that is what matters.

Porting Time

Porting to a different library can be an annoying task and can feel like busy work. Porting can be even worse when you liked the library in use in the first place as I did (and still do!) with gevent.

Initial porting of the main functionality from gevent to CherryPy took roughly four hours. After that, it took about another six hours to iron out some rough edges, followed by updating the unit tests. Really, the unit testing updates ended up being more work, in terms of time, than the actual functionality. A lot of that was our fault in how we use mock, but I digress. That’s really not much time!

So What

So far I’m happy with the results. The application functionality works as expected, the request/response speeds are more than acceptable, and CherryPy as a server has been fun to work with. Assuming no crazy corner cases crop up, I don’t see us moving off CherryPy anytime soon.

I Wish Fossil Would Take Off

Fossil is the coolest distributed SCM you are not using. No seriously. It boasts features not found in any of the common distributed SCMs used by nearly every developer today.

About five or six years ago I started getting a little frustrated with Git. The main complaint I kept coming back to, over and over, was that to use Git effectively one needed to use GitHub, Trac, or some other way to add an interface with issues and information. There was also the problem of getting many CVS/SVN folks comfortable with Git terminology, which fueled my recommendation of Mercurial, but I digress. It was around this time that a friend of mine and I started looking for a way to include issues within a Git repository. At that time we looked at projects like BugsEverywhere, which provide a separate tool to track bugs within the repository. We gave it a go for a little while but eventually fell away from it as, at the time, it really felt like a second-class citizen in the Git toolchain. We spent a little time developing our own solution but then gave up, realizing that Git was so tied to the GitHub way.

Around this time one of us found Fossil and started to play around with it. I was blown away at how it took care of code, issues, wiki, tracking, and code hosting. You essentially get a distributed version of Trac with every clone. All the data comes along, and you are able to update documentation, code, issues, etc., all as part of a fossil push.

As of the time of writing Fossil boasts (from the main page):

  1. Integrated Bug Tracking, Wiki, and Technotes
  2. Built-In Web Interface
  3. Self-Contained
  4. Simple Networking
  5. CGI/SCGI Enabled
  6. Autosync
  7. Robust & Reliable
  8. Free and Open-Source

I touched a little bit on 1 and 2, but 3 is also a pretty cool feature. If you do an install of Git you really are installing a bit more than you may realize. For example, Fedora’s Git package requires:

  1. asciidoc
  2. desktop-file-utils
  3. emacs
  4. expat-devel
  5. gettext
  6. libcurl-devel
  7. libgnome-keyring-devel
  8. openssl-devel
  9. pcre-devel
  10. perl(Error)
  11. perl(ExtUtils::MakeMaker)
  12. pkgconfig(bash-completion)
  13. python
  14. rpmlib(CompressedFileNames) <= 3.0.4-1
  15. rpmlib(FileDigests) <= 4.6.0-1
  16. systemd
  17. xmlto
  18. zlib-devel >= 1.2

In other words, you need a specific editor, two languages available on the system, a specific init system, and a part of GNOME. Plain Git built directly from source requires less, but still more than one would think. Fossil notes its dependencies as:

Fossil needs to be linked against zlib. If the HTTPS option is enabled, then it will also need to link against the appropriate SSL implementation. And, of course, Fossil needs to link against the standard C library. No other libraries or external dependences are used.

Philosophy

Fossil and Git have very different philosophies. The most interesting point to me when reading up on the differences was this:

Git puts a lot of emphasis on maintaining a “clean” check-in history. Extraneous and experimental branches by individual developers often never make it into the main repository. And branches are often rebased before being pushed, to make it appear as if development had been linear. Git strives to record what the development of a project should have looked like had there been no mistakes.

Fossil, in contrast, puts more emphasis on recording exactly what happened, including all of the messy errors, dead-ends, experimental branches, and so forth. One might argue that this makes the history of a Fossil project “messy”. But another point of view is that this makes the history “accurate”. In actual practice, the superior reporting tools available in Fossil mean that the added “mess” is not a factor.

One commentator has mused that Git records history according to the victors, whereas Fossil records history as it actually happened.

While a pretty, (nearly) linear history is a simple read, it is rarely actually true.


Using Fossil

There is a pretty decent quick start guide to get going. On a first run-through it feels clunky. For instance, when doing a checkout you have to open the repository with fossil open. But then again, people felt (and some still feel) that git add $FILES, git commit, git push $PLACE $BRANCH was wrong. I think that with enough time one can be just as comfortable with Fossil’s commands and flow as with Git’s.

Truth Be Told

My biggest reason for wanting Fossil to take off is being able to work offline and merge bugs/issues and documentation without forcing everyone to adopt third-party tools that integrate with an SCM. I would also like to keep my hands on my keyboard rather than logging into GitHub to review stuff (yeah, I know there are keyboard shortcuts…). Anyway, here’s hoping more people will give Fossil a try!

Cloud Message Queues

More and more of my personal work utilizes message queues (MQ) to integrate systems or to spread longer-running work across pools of workers. AMQP is the 300-pound gorilla in the room when it comes to message queuing; implementations such as RabbitMQ, Qpid, Red Hat MRG, etc., abound. However, when you are the little guy on the field, it can be economical to use a cloud service so you can focus directly on your product. Can cloud message queues be a good replacement for running an MQ yourself?

What I Expect

I’m going to assume that the service will be available, that messages will not disappear (unless I set them to do so), and that minor network latency is acceptable.

These are features I expect in priority order:

  1. Central connection point (+5)
  2. FIFO support (+4)
  3. Delivery to first available consumer (+3)
  4. Basic publish/subscribe support (+2)
  5. Push message support (+1)

Options

As it turns out, there are more players in the cloud messaging space than I would have thought! A quick search turned up the obvious Amazon SQS along with IronMQ, stormmq, Softlayer Message Queue, and Marconi.

Amazon SQS

I tend to think Amazon’s SQS is probably the default MQ as a Service. So many people use AWS and it’s right there ready to be used.

  1. Central connection point: Yes (+5)
  2. FIFO support: No
  3. Delivery to first available consumer: Sort of… (+1)
  4. Basic publish/subscribe support: Yes (+2)
  5. Push message support: Sort of… (+0)

There is no doubt that Amazon’s SQS is a great system but right off the bat it’s obvious it doesn’t meet what I expect. According to the FAQ:

Q: Does Amazon SQS provide first-in-first-out (FIFO) access to messages?

No, Amazon SQS does not guarantee FIFO access to messages in Amazon SQS queues, mainly because of the distributed nature of the Amazon SQS. If you require specific message ordering, you should design your application to handle it.
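Handling ordering in the application usually means the producer attaches a sequence number and the consumer buffers out-of-order messages until the gap is filled. A minimal sketch, assuming producers number messages 0, 1, 2, … (a convention SQS itself does not provide); Reorderer is a hypothetical class. It also drops sequence numbers it has already released, which helps when a message is delivered twice.

```python
import heapq
import itertools


class Reorderer(object):
    """Release message bodies strictly in sequence-number order."""

    def __init__(self):
        self._next = 0                      # next sequence number to release
        self._pending = []                  # min-heap of (sequence, tiebreak, body)
        self._tiebreak = itertools.count()  # avoids comparing bodies on equal sequences

    def push(self, sequence, body):
        """Accept one message and return the bodies that are now in order."""
        if sequence < self._next:
            return []  # already released: a duplicate delivery
        heapq.heappush(self._pending, (sequence, next(self._tiebreak), body))
        ready = []
        while self._pending and self._pending[0][0] <= self._next:
            seq, _, item = heapq.heappop(self._pending)
            if seq == self._next:
                ready.append(item)
                self._next += 1
            # seq < self._next means a duplicate of something just released
        return ready
```

The cost of this design is memory and latency: a single lost message stalls everything behind it, so a real consumer would also need a timeout policy.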

The first consumer who makes an API request will get the next message. That is sort of delivery to the first available consumer but not exactly.

Push messaging is not directly supported but long polling is available. This is close enough that I’d consider it.

Q: What is SQS Long Polling?

SQS long polling is a new way to retrieve messages from your SQS queues. While the traditional SQS short polling returns immediately, even if the queue being polled is empty, SQS long polling doesn’t return a response until a message arrives in the queue, or the long poll times out. SQS long polling makes it easy and inexpensive to retrieve messages from your SQS queue as soon as they are available.

Feature Result: 8

Result: 8/15

IronMQ

Comparing IronMQ to Amazon SQS was interesting. Unlike SQS, IronMQ uses a REST interface which I feel simplifies MQ as a web service. I played a little bit with the service and it was much speedier than I thought it would be! I even tried the beanstalkd support but wasn’t able to get it to fully work.

  1. Central connection point: Yes (+5)
  2. FIFO support: Yes (+4)
  3. Delivery to first available consumer: Sort of… (+1)
  4. Basic publish/subscribe support: Yes (+2)
  5. Push message support: Not really…

Again there is a hiccup on delivery to the first available consumer. Just like SQS, IronMQ is based off requests from the clients. It could match if a client requests only when it can fully dedicate itself to the next message.

Unfortunately, the push support in IronMQ doesn’t cut it for me. If I’m reading the documentation correctly, it assumes that the consumers are all listening HTTP servers. I see the use case for this, but I also wouldn’t want to make some or all of my consumers publicly listening on the Internet and spinning off work in another thread or process. I’d rather have long polling.

Feature Result: 12

Result: 12/15

stormmq

This didn’t get any testing whatsoever. According to the features page it will reach GA in Q1 2013 (which was earlier this year). I sent a message via Twitter to find out if I’m just seeing old data on their site. For now I will assume that the statement that it’s AMQP 1.0 is true.

  1. Central connection point: Yes (+5)
  2. FIFO support: Yes (+4)
  3. Delivery to first available consumer: Yes (+3)
  4. Basic publish/subscribe support: Yes (+2)
  5. Push message support: Yes (+1)

This was the first service which would meet all of my wants. However, I can’t sign up for it or use it so it kind of knocks it off the list.

Feature Result: 15

Penalty for not being available: -15

Result: 0/15

Softlayer Message Queue

  1. Central connection point: Yes (+5)
  2. FIFO support: No
  3. Delivery to first available consumer: No
  4. Basic publish/subscribe support: Yes (+2)
  5. Push message support: No

Like SQS, Softlayer Message Queue notes that FIFO is not supported:

Does SoftLayer Message Queue provide first-in-first-out (FIFO) message access?

While the system does it’s best to return messages in FIFO order, it is not guaranteed. A number of factors can influence message ordering, including message timeouts and updated visibility intervals.

The FAQ also notes that it is possible to have conditions where a consumer (or consumers?) may get the exact same message multiple times.

How can multiple readers access the same message queue, without losing messages or processing them many times?

Queues can be accessed by any number of consumers/clients. When a consumer requests messages from the queue, each message is marked invisible—this prevents other consumers from receiving the same message. However, the distributed nature of the message queue cannot guarantee single delivery of any one message. While the system makes a best effort, clients should expect the possibility of receiving the same message multiple times.
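Since redelivery can happen, consumers are usually written to be idempotent. One simple approach, sketched here, remembers the IDs of messages that were already handled. DedupConsumer and the id/body message shape are hypothetical; a real deployment would keep the seen IDs in shared, expiring storage rather than an in-memory set.

```python
class DedupConsumer(object):
    """Wrap a handler so redelivered messages are processed only once."""

    def __init__(self, handler):
        self.handler = handler
        self.seen = set()  # ids of messages handled successfully

    def receive(self, message):
        """Process message unless its id was already handled.

        Returns True if the handler ran, False for a duplicate.
        """
        if message['id'] in self.seen:
            return False
        self.handler(message['body'])
        # Record the id only after success so a failed attempt can be retried
        self.seen.add(message['id'])
        return True
```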

Since the MQ service is web service (REST) based there is no pushing of messages, only pulling. I didn’t see anything noting long polling or even push to HTTP servers like IronMQ.

Feature Result: 7

Result: 7/15

Marconi

  1. Central connection point: Yes (+5)
  2. FIFO support: Yes (+4)
  3. Delivery to first available consumer: Sort of… (+1)
  4. Basic publish/subscribe support: Yes (+2)
  5. Push message support: In progress…

Since Marconi is primarily web service based (REST) it has the same issues as Amazon SQS and IronMQ.

Push messaging is not currently available from what I can tell, but there are ZMQ bindings in the works, with AMQP support coming later.

The negative part to using Marconi is that one either needs to be using an OpenStack based service already or they will need to set up the service themselves. This does add some overhead to using it. Also I keep wanting to say macaroni.

Feature Result: 12

Bonus for being Open Source: +2

Penalty for needing specific software: -5

Result: 9/15
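The arithmetic behind these results is simple enough to check with a few lines of Python. The feature keys below are shorthand invented for this sketch; the weights, partial credit, bonuses, and penalties are the ones given above.

```python
# Weights from "What I Expect", highest priority first.
WEIGHTS = {'central': 5, 'fifo': 4, 'first_consumer': 3, 'pubsub': 2, 'push': 1}
MAX_SCORE = sum(WEIGHTS.values())  # 15

# (points per feature, bonus/penalty) as given in each section above;
# features that scored nothing are simply left out.
SERVICES = {
    'Amazon SQS': ({'central': 5, 'first_consumer': 1, 'pubsub': 2}, 0),
    'IronMQ': ({'central': 5, 'fifo': 4, 'first_consumer': 1, 'pubsub': 2}, 0),
    'stormmq': (dict(WEIGHTS), -15),  # full marks, minus 15 for being unavailable
    'Softlayer Message Queue': ({'central': 5, 'pubsub': 2}, 0),
    # +2 for being open source, -5 for needing specific software
    'Marconi': ({'central': 5, 'fifo': 4, 'first_consumer': 1, 'pubsub': 2}, 2 - 5),
}


def score(points, adjustment):
    """Sum the feature points, then apply any bonus or penalty."""
    for feature, value in points.items():
        if not 0 <= value <= WEIGHTS[feature]:
            raise ValueError('{} scored {} (max {})'.format(
                feature, value, WEIGHTS[feature]))
    return sum(points.values()) + adjustment


for name, (points, adjustment) in sorted(SERVICES.items()):
    print('{}: {}/{}'.format(name, score(points, adjustment), MAX_SCORE))
```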

Conclusion

So can cloud message queuing be a good replacement for running your own service? It highly depends on what you are doing, but if you don’t need all the features of modern MQ systems then you can run with a cloud MQ and cut some time/cost.

It may seem unfair, but I am wary of using stormmq because of the old GA info on the site even if they went GA today. At a later date after GA I’d consider trying them but I’d want to see a bit better messaging (no pun intended) via their site.

For me it makes the most sense to use IronMQ until I end up on an OpenStack based system at which point, if ZMQ or AMQP is available, I’d likely switch over. Having ZMQ/AMQP’s push ability would be worth the move.