Check out John Dupuy’s video on getting up and running with flask-track-usage 2.0.0!
flask-track-usage 2.0.0 testing
flask-track-usage is nearing a new milestone: 2.0.0. This new release will include a few bugfixes and a number of enhancements. The two that stick out are time aggregate functions in the form of hooks, split between Storage and Writer classes, and the ability to store custom data via a track_var global. Currently the work is being housed in a branch but will be updated and merged once 2.0.0 has a little time for testing. Some highlights in 2.0.0 include:
- alembic: Support for upgrading SQL schema
- sql: Create table if it is not present
- couchdb: Add track_var and username
- redis: Add track_var and username
- user_defined variable field to storage
- Hooks: add new hooks system
- test: Skip mongoengine if connection can not be made
- storage: Rename to PrinterWriter
- output: Add OutputWriter
- storage: Create base class and Writer
- requirements: Added six
- doc: Add note about py2 and 3
- freegeoip: Fix missing attribute
- py3: Fix most obvious offenders
- Move mongoengine ref in Travis CI config
- Update Travis CI config to include mongoengine lib
Help test the 2.0.0 branch by cloning and installing in your development environment!
Programmatic Ansible Middle Ground
Foreword
About a year ago serversforhackers posted a great article on how to run Ansible programmatically. Since then Ansible has had a major release which introduced changes within the Python API.
Simulating The CLI
Not that long ago Jason DeTiberus and I were talking about how to use Ansible from within other Python packages. One of the things he said was that it should be possible to reuse the command line code instead of the internal API if you hook into the right place. I finally had some time to take a look and it seems he’s right!
If you take a look at the 2.0 API you’ll see there is a lot more power handed over to you as the developer, but with that comes a lot of code: code that, for many, will be nearly copy/paste from the command-line interface code. So when there is no need for the extra power, why not just reuse code that already exists?
```python
import os  # Used for expanding paths

from ansible.cli.playbook import PlaybookCLI
from ansible.errors import AnsibleOptionsError, AnsibleParserError


def execute_playbook(playbook, hosts, args=[]):
    """
    :param playbook: Full path to the playbook to execute.
    :type playbook: str
    :param hosts: A host or hosts to target the playbook against.
    :type hosts: str, list, or tuple
    :param args: Other arguments to pass to the run.
    :type args: list
    :returns: The return code of the run.
    :rtype: int
    """
    # Set hosts args up right for the ansible parser. It likes to have trailing ,'s
    if isinstance(hosts, basestring):
        hosts = hosts + ','
    elif hasattr(hosts, '__iter__'):
        hosts = ','.join(hosts) + ','
    else:
        raise AnsibleParserError('Can not parse hosts of type {}'.format(
            type(hosts)))

    # Create the cli object
    cli_args = ['playbook'] + args + ['-i', hosts, os.path.realpath(playbook)]
    print('Executing: {}'.format(' '.join(cli_args)))
    cli = PlaybookCLI(cli_args)
    # Parse args and run it
    try:
        cli.parse()
        # Return the result:
        # 0: Success
        # 1: "Error"
        # 2: Host failed
        # 3: Unreachable
        # 4: Parser Error
        # 5: Options error
        return cli.run()
    except (AnsibleParserError, AnsibleOptionsError) as error:
        print('{}: {}'.format(type(error), error))
        raise error
```
Breaking It Down
The function starts off with some hosts parsing. This is not really needed but it does make the function easier to work with. On the command line Ansible likes to have a comma at the end of hosts passed in. This chunk of code makes sure that if a list or string is given for a host that the resulting host string is properly formatted.
```python
# Set hosts args up right for the ansible parser. It likes to have trailing ,'s
if isinstance(hosts, basestring):
    hosts = hosts + ','
elif hasattr(hosts, '__iter__'):
    hosts = ','.join(hosts) + ','
else:
    raise AnsibleParserError('Can not parse hosts of type {}'.format(type(hosts)))
```
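On Python 3 basestring no longer exists, so the same normalization can be sketched as a standalone helper. This is only a minimal sketch and not part of Ansible: the name format_hosts is made up for illustration, and it raises a plain TypeError instead of AnsibleParserError so that it runs without Ansible installed.

```python
def format_hosts(hosts):
    """Return hosts as a comma separated string with a trailing comma."""
    # A single host given as a string only needs the trailing comma
    if isinstance(hosts, str):
        return hosts + ','
    # Lists, tuples and other iterables of strings are joined first
    if hasattr(hosts, '__iter__'):
        return ','.join(hosts) + ','
    # Anything else can not be turned into a host string
    raise TypeError('Can not parse hosts of type {}'.format(type(hosts)))
```

Checking for str before checking for \_\_iter\_\_ matters on Python 3, since strings are iterable there and would otherwise be joined character by character.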
The Real Code
This chunk of code is what is actually calling Ansible. It creates the command line argument list, creates a PlaybookCLI instance, has it parsed, and then executes the playbook.
```python
# Create the cli object
cli_args = ['playbook'] + args + ['-i', hosts, os.path.realpath(playbook)]
print('Executing: {}'.format(' '.join(cli_args)))
cli = PlaybookCLI(cli_args)
# Parse args and run it
try:
    cli.parse()
    # Return the result:
    # 0: Success
    # 1: "Error"
    # 2: Host failed
    # 3: Unreachable
    # 4: Parser Error
    # 5: Options error
    return cli.run()
except (AnsibleParserError, AnsibleOptionsError) as error:
    print('{}: {}'.format(type(error), error))
    raise error
```
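To see exactly what gets handed to PlaybookCLI without running anything, the argument construction can be pulled out on its own. A small sketch under the assumption that only the list building matters here; build_cli_args is a hypothetical helper name and no Ansible code is imported or called.

```python
import os


def build_cli_args(playbook, hosts, args=None):
    """Return the argument list in the same order the function above builds it."""
    # The leading 'playbook' element mirrors the original snippet
    extra = list(args or [])
    # Extra flags go first, then the inventory string, then the playbook path
    return ['playbook'] + extra + ['-i', hosts, os.path.realpath(playbook)]


# Roughly what a shell user would type, minus the program name
example_args = build_cli_args('/tmp/test.yaml', '192.168.152.101,', ['-v'])
```

Keeping this step separate makes it easy to log or compare the arguments against the command one would run by hand.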
Using The Function
```python
# Execute /tmp/test.yaml with 2 hosts
result = execute_playbook('/tmp/test.yaml', ['192.168.152.100', '192.168.152.101'])

# Execute /tmp/test.yaml with 1 host and add the -v flag
result = execute_playbook('/tmp/test.yaml', '192.168.152.101', ['-v'])
```
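Since the result is just a number, a lookup table can make it friendlier to log. This is a sketch only: the meanings are copied from the comments in execute_playbook, while RETURN_CODES and describe_result are names invented for this example.

```python
# Meanings taken from the comments in execute_playbook
RETURN_CODES = {
    0: 'Success',
    1: 'Error',
    2: 'Host failed',
    3: 'Unreachable',
    4: 'Parser Error',
    5: 'Options error',
}


def describe_result(code):
    """Return a readable description for a playbook return code."""
    # Fall back to a generic message for codes not listed above
    return RETURN_CODES.get(code, 'Unknown return code {}'.format(code))
```

With that in place, something like describe_result(result) could be passed to a logger after each run.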
Intercepting The Output
One drawback of using the command-line interface code directly is that the output is expected to go to the user in the standard way. That is to say, it’s sent to the screen and colorized. This will probably be fine for some, but others may want to grab the output and use it in some form. While it is possible to change output through the configuration options, it is also possible to monkey patch display and intercept the output for your own use cases. As an example, here is a Display class which forwards all output, except output meant only for the screen, to our logging.info method.
```python
# MONKEY PATCH to catch output. This must happen at the start of the code!
import logging

from ansible.utils.display import Display

# Set up our logging
logger = logging.getLogger('transport')
logger.setLevel(logging.INFO)
handler = logging.StreamHandler()
handler.formatter = logging.Formatter('%(name)s - %(message)s')
logger.addHandler(handler)


class LogForward(Display):
    """
    Quick hack of a log forwarder
    """

    def display(self, msg, screen_only=None, *args, **kwargs):
        """
        Pass display data to the logger.

        :param msg: The message to log.
        :type msg: str
        :param args: All other non-keyword arguments.
        :type args: list
        :param kwargs: All other keyword arguments.
        :type kwargs: dict
        """
        # Ignore if it is screen only output
        if screen_only:
            return
        logging.getLogger('transport').info(msg)

    # Forward it all to display
    info = display
    warning = display
    error = display
    # Ignore debug
    debug = lambda s, *a, **k: True


# By simply setting display Ansible will slurp it in as the display instance
display = LogForward()

# END MONKEY PATCH. Add code after this line.
```
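If the goal is to use the output rather than just print it, a second handler on the same logger can collect the messages. Here is a minimal sketch using only the standard library. ListHandler, collector, and the sample message are made up for illustration, and no Ansible code is involved.

```python
import logging


class ListHandler(logging.Handler):
    """Keep every formatted log message in a list."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        # Store the formatted text instead of writing it to a stream
        self.messages.append(self.format(record))


collector = ListHandler()
collector.setFormatter(logging.Formatter('%(name)s - %(message)s'))

# Attach to the same logger name the forwarder writes to
transport_logger = logging.getLogger('transport')
transport_logger.setLevel(logging.INFO)
transport_logger.addHandler(collector)

# Stand-in for a line that LogForward would pass along
transport_logger.info('TASK [ping]')
```

After a run, collector.messages holds everything that was forwarded, ready to be parsed or stored.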
Putting It All Together
If you want to use it all together it should look like this:
```python
# MONKEY PATCH to catch output. This must happen at the start of the code!
import logging

from ansible.utils.display import Display

# Set up our logging
logger = logging.getLogger('transport')
logger.setLevel(logging.INFO)
handler = logging.StreamHandler()
handler.formatter = logging.Formatter('%(name)s - %(message)s')
logger.addHandler(handler)


class LogForward(Display):
    """
    Quick hack of a log forwarder
    """

    def display(self, msg, screen_only=None, *args, **kwargs):
        """
        Pass display data to the logger.

        :param msg: The message to log.
        :type msg: str
        :param args: All other non-keyword arguments.
        :type args: list
        :param kwargs: All other keyword arguments.
        :type kwargs: dict
        """
        # Ignore if it is screen only output
        if screen_only:
            return
        logging.getLogger('transport').info(msg)

    # Forward it all to display
    info = display
    warning = display
    error = display
    # Ignore debug
    debug = lambda s, *a, **k: True


# By simply setting display Ansible will slurp it in as the display instance
display = LogForward()

# END MONKEY PATCH. Add code after this line.

import os  # Used for expanding paths

from ansible.cli.playbook import PlaybookCLI
from ansible.errors import AnsibleOptionsError, AnsibleParserError


def execute_playbook(playbook, hosts, args=[]):
    """
    :param playbook: Full path to the playbook to execute.
    :type playbook: str
    :param hosts: A host or hosts to target the playbook against.
    :type hosts: str, list, or tuple
    :param args: Other arguments to pass to the run.
    :type args: list
    :returns: The return code of the run.
    :rtype: int
    """
    # Set hosts args up right for the ansible parser. It likes to have trailing ,'s
    if isinstance(hosts, basestring):
        hosts = hosts + ','
    elif hasattr(hosts, '__iter__'):
        hosts = ','.join(hosts) + ','
    else:
        raise AnsibleParserError('Can not parse hosts of type {}'.format(
            type(hosts)))

    # Create the cli object
    cli_args = ['playbook'] + args + ['-i', hosts, os.path.realpath(playbook)]
    logger.info('Executing: {}'.format(' '.join(cli_args)))
    cli = PlaybookCLI(cli_args)
    # Parse args and run it
    try:
        cli.parse()
        # Return the result:
        # 0: Success
        # 1: "Error"
        # 2: Host failed
        # 3: Unreachable
        # 4: Parser Error
        # 5: Options error
        return cli.run()
    except (AnsibleParserError, AnsibleOptionsError) as error:
        logger.error('{}: {}'.format(type(error), error))
        raise error
```
Pros and Cons
Of course nothing is without drawbacks. Here are some negatives with this method:
- No direct access to TaskQueueManager
- If the CLI changes the code must change
- Monkey patching …. ewww
But the positives seem to be worth it so far:
- You don’t have to deal with TaskQueueManager and all of the construction code
- The CLI doesn’t seem to change often
- The same commands one would run on the CLI can easily be extrapolated and even run manually
etcdobj: A Minimal etcd Object Mapper for Python
I didn’t have a lot on my agenda Friday. I wanted to review and return emails, do some reading, get some minor hacking on etcdobj done (more on that…), eat more calories than normal in an attempt to screw with my metabolism (nailed it!), catch up with a few coworkers, play some video games, and, apparently, accidentally order an air purifier from Amazon. I succeeded in all of it. But on to this etcdobj thing…
While working on Commissaire I started to feel a bit dirty over storing json documents in keys. It’s not uncommon, but it felt like it would be so much better if a document was broken into three layers:
- Python: Classes/Objects
- Transport: For saving/retrieving objects
- etcd: A single or series of keys
By splitting what would normally be json data into a series of keys, two clients can change different parts of an object without a collision and without either client having to fail, fetch, update, and then try saving again. I searched the Internet for a library that would provide this and came up wanting. It seems that either simple keys/values or shoving json into a key is what most people stick with.
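The difference is easy to see with a toy model. In this sketch plain dictionaries stand in for the etcd key space, which is an assumption made purely for illustration; nothing here talks to etcd or uses etcdobj. Without a compare-and-swap style check the json version silently loses a write, and with one the second client would have to fail and retry.

```python
import json

# A plain dict stands in for the etcd key space
store = {'/example': json.dumps({'anint': 1, 'astr': 'hello'})}

# Both clients read the same document before either one writes
client_a = json.loads(store['/example'])
client_b = json.loads(store['/example'])
client_a['anint'] = 100
client_b['astr'] = 'goodbye'

# Each client writes back the whole document
store['/example'] = json.dumps(client_a)
store['/example'] = json.dumps(client_b)  # Overwrites the change from client A
json_result = json.loads(store['/example'])

# With one key per field each client only writes the key it changed
split_store = {'/example/anint': '1', '/example/astr': 'hello'}
split_store['/example/anint'] = '100'     # Client A
split_store['/example/astr'] = 'goodbye'  # Client B
```

In the json case the update to anint is gone, while the split layout keeps both changes.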
etcdobj is truly minimal. Partly because it’s new, partly because being small should make it easier to build upon or even bundle (it’s got a very permissive license), and partly because I’ve never written an ORM-like library before and don’t want to build too much on what could be a shaky foundation. That’s why I’m hoping this post will encourage some more eyes and help with the code.
Current Example
To create a representation of data a class must subclass EtcdObj and follow a few rules.
- __name__ must be provided as it will be the parent in the key path.
- Fields are class level variables and must be set to an instance that subclasses etcdobj.Field.
- The name of a field is the next layer in the key path and does not need to be the same as the class level variable.
```python
from etcdobj import EtcdObj, fields


class Example(EtcdObj):
    __name__ = 'example'  # The parent key
    # Fields all take a name that will be used as their key
    anint = fields.IntField('anint')
    astr = fields.StrField('astr')
    adict = fields.DictField('adict')
```
Creating a new object and saving it to etcd is pretty easy.
```python
server = Server()

ex = Example(anint=1, astr="hello", adict={"hi": "there"})
ex.anint = 100  # update the value of anint
server.save(ex)
# Would save like so:
# /example/anint = "100"
# /example/astr = "hello"
# /example/adict/hi = "there"
```
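The mapping from an object to its keys can be pictured as a recursive walk over the data. The following is only a rough sketch of that idea and is not how etcdobj itself is implemented; flatten is a hypothetical helper that works on plain dictionaries.

```python
def flatten(parent, data):
    """Return a dict mapping etcd style key paths to string values."""
    keys = {}
    for name, value in data.items():
        path = '{}/{}'.format(parent, name)
        if isinstance(value, dict):
            # Nested dictionaries become deeper layers in the key path
            keys.update(flatten(path, value))
        else:
            # Leaf values are stored as strings
            keys[path] = str(value)
    return keys


layout = flatten('/example', {'anint': 100, 'astr': 'hello', 'adict': {'hi': 'there'}})
```

The resulting layout matches the three keys listed in the comments of the save example.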
As is retrieving the data.
```python
new_ex = server.read(Example())
# new_ex.anint = 100
# new_ex.astr = "hello"
# new_ex.adict = {"hi": "there"}
```
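Reading is the reverse trip: key paths are split back into layers and values are cast to the type the field expects. Again this is a sketch under assumptions rather than the library’s code; unflatten and its casts argument are invented for the example, and a dictionary stands in for what the server would return.

```python
def unflatten(parent, keys, casts=None):
    """Rebuild a nested dict from the key paths found under parent."""
    casts = casts or {}
    result = {}
    prefix = parent.rstrip('/') + '/'
    for path, value in keys.items():
        if not path.startswith(prefix):
            continue  # Not part of this object
        parts = path[len(prefix):].split('/')
        # Walk (and create) the intermediate dictionaries
        node = result
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        # Cast by relative path, leaving values as strings by default
        cast = casts.get('/'.join(parts), str)
        node[parts[-1]] = cast(value)
    return result


stored = {
    '/example/anint': '100',
    '/example/astr': 'hello',
    '/example/adict/hi': 'there',
    '/other/key': 'ignored',
}
restored = unflatten('/example', stored, casts={'anint': int})
```

The casts mapping plays the role the field classes play in the real thing: it records which keys should come back as something other than a string.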
Ideas
Some ideas for the future include:
- Object watching (if data changes on the server it changes in the local instance)
- Object to json structure
- Deep DictField value casting/validation
- Library level logging
Lend a Hand
The code base is currently around 416 lines of code including documentation and license header. If etcdobj sounds like something you’d use, come take a look and help make it something better than I can produce all by my lonesome.
From Gevent to CherryPy
I’ve been working on a project for the last few months on GitHub called Commissaire along with some other smart folks. Without getting too deep into what the software is supposed to do, just know it’s a REST service which needs to handle some asynchronous tasks. When prototyping the web service I started utilizing gevent for its WSGI server and coroutines but, as it turns out, it didn’t end up being the best fit. This is not a post about gevent sucking, because it doesn’t suck. gevent is pretty awesome but it’s not for every use case.
The Problem
One of the asynchronous tasks we do in Commissaire utilizes Ansible. We use the Ansible Python API to handle part of the bootstrapping of a new host. Under the covers Ansible uses the multiprocessing module when executing its work. Specifically, this occurs when the TaskQueueManager starts its run. Under normal circumstances this is no problem, but when gevent is in use its monkey patching ends up causing some problems. As noted in the post, using monkey.patch_all(thread=False, socket=False) can be a solution. What this ends up doing is patching everything except thread and socket. But even this wasn’t enough for us to get past the problems we were facing between multiprocessing, gevent, and Ansible. The closest patch we found was to also disable os, subprocess and a few other things, making most of gevent’s great features unavailable. At this point it seemed pretty obvious gevent was not going to be a good fit.
Looking Elsewhere
There is no lack of options when looking for a Python web application server. Here are the requirements that I figured we would need:
Requirements
- Importable as a library
- Supports WSGI
- Supports TLS
- Active user base
- Active development
- Does not require a reverse proxy
- Does not require greenlets
- Supports Python 2 and 3
Based on the name of this post you already know we chose CherryPy. It hit all the requirements and came with a few added benefits. The plugin system, which allows for calls to be published over an internal bus, lets us decouple our data saving internals (though it couples us with CherryPy as it is doing the abstraction). The server is also already available in many Linux distributions at new enough versions. That’s a big boon when hoping to have software easily installed via traditional means.
The runner up was Waitress. Unlike CherryPy, which assumes you are developing within the CherryPy web framework, Waitress assumes WSGI. Unfortunately, Waitress requires a reverse proxy for TLS. If it had first class support for TLS we would probably have picked it.
Going back to a more traditional threading server is definitely not as sexy as utilizing greenlets/coroutines but it has provided a consistent result when paired with a multiprocessing worker process and that is what matters.
Porting Time
Porting to a different library can be an annoying task and can feel like busy work. Porting can be even worse when you liked the library in use in the first place as I did (and still do!) with gevent.
Initial porting of main functionality from gevent to CherryPy took roughly four hours. After that, it took about another six hours to iron out some rough edges, followed by updating unit tests. Really, the unit testing updates ended up being more work, in terms of time, than the actual functionality. A lot of that was our fault in how we use mock, but I digress. That’s really not much time!
So What
So far I’m happy with the results. The application functionality works as expected, the request/response speeds are more than acceptable, and CherryPy as a server has been fun to work with. Assuming no crazy corner cases crop up, I don’t see us moving off CherryPy anytime soon.
I Wish Fossil Would Take Off
Fossil is the coolest distributed SCM you are not using. No seriously. It boasts features not found in any of the common distributed SCMs used by nearly every developer today.
About 5 or 6 years ago I started getting a little frustrated with Git. The main complaint I kept coming back to over and over was that, to use Git effectively, one needed to use GitHub, Trac, or some other tool to add an interface for issues and information. There was also the problem of getting many CVS/SVN folks comfortable with Git terminology, which fueled my recommendation of Mercurial, but I digress. It was around this time that a friend of mine and I started looking at a way we could include issues within a Git repository. At that time we looked at projects like BugsEverywhere, which provide a separate tool to track bugs within the repository. We gave it a go for a little while but eventually fell away from it as, at the time, it really felt like a second class citizen in the Git toolchain. We spent a little time developing our own solution but then gave up, realizing that Git was so tied to the GitHub way.
Around this time one of us found Fossil and started to play around with it. I was blown away at how it took care of code, issues, wiki, tracking, and code hosting. You essentially get a distributed version of Trac for every clone. All the data comes along and you are able to update documentation, code, issues, etc., all as part of a fossil push.
As of the time of writing Fossil boasts (from the main page):
- Integrated Bug Tracking, Wiki, and Technotes
- Built-In Web Interface
- Self-Contained
- Simple Networking
- CGI/SCGI Enabled
- Autosync
- Robust & Reliable
- Free and Open-Source
I touched a little bit on the first two, but the third, being self-contained, is also a pretty cool feature. If you do an install of Git you really are installing a bit more than you may realize. For example, Fedora’s Git package requires:
- asciidoc
- desktop-file-utils
- emacs
- expat-devel
- gettext
- libcurl-devel
- libgnome-keyring-devel
- openssl-devel
- pcre-devel
- perl(Error)
- perl(ExtUtils::MakeMaker)
- pkgconfig(bash-completion)
- python
- rpmlib(CompressedFileNames) <= 3.0.4-1
- rpmlib(FileDigests) <= 4.6.0-1
- systemd
- xmlto
- zlib-devel >= 1.2
In other words you need a specific editor, 2 languages available on the system, a specific init system, and a part of GNOME. Plain Git directly from source requires less, but still more than one would think. Fossil notes its dependencies as:
Fossil needs to be linked against zlib. If the HTTPS option is enabled, then it will also need to link against the appropriate SSL implementation. And, of course, Fossil needs to link against the standard C library. No other libraries or external dependences are used.
Philosophy
Fossil and Git have very different philosophies. The most interesting point to me when reading up on the differences was this:
Git puts a lot of emphasis on maintaining a “clean” check-in history. Extraneous and experimental branches by individual developers often never make it into the main repository. And branches are often rebased before being pushed, to make it appear as if development had been linear. Git strives to record what the development of a project should have looked like had there been no mistakes.
Fossil, in contrast, puts more emphasis on recording exactly what happened, including all of the messy errors, dead-ends, experimental branches, and so forth. One might argue that this makes the history of a Fossil project “messy”. But another point of view is that this makes the history “accurate”. In actual practice, the superior reporting tools available in Fossil mean that the added “mess” is not a factor.
One commentator has mused that Git records history according to the victors, whereas Fossil records history as it actually happened.
While a pretty, (nearly) linear history is a simple read, it is rarely actually true.
Using Fossil
There is a pretty decent quick start to get one started. At first run through it feels clunky. For instance, when doing a checkout you have to open the repository with fossil open. But then again, people felt (and some still feel) that git add $FILES, git commit, git push $PLACE $BRANCH feels wrong. I think that with enough time one can be just as comfortable with Fossil’s commands and flow as they would be with Git.
Truth Be Told
My biggest reason for wanting Fossil to take off is to be able to work offline and merge bugs/issues and documentation without forcing everyone to adopt third party tools to integrate with an SCM. I also would like to keep my hands on my keyboard rather than logging into GitHub to review stuff (yeah, I know there are keyboard shortcuts …). Anyway, here is hoping more people will give Fossil a try!
Flask-Track-Usage 1.1.0 Released
A few years ago the initial Flask-Track-Usage release was announced via my blog. At the time I thought I’d probably be the one user. I’m glad to say I was wrong! Today I’m happy to announce the release of Flask-Track-Usage 1.1.0 which sports a number of enhancements and bug fixes.
Unfortunately, some changes are not backwards compatible. However, I believe the backwards incompatible changes make the overall experience better. If you would like to stick with the previous version of Flask-Track-Usage make sure to version pin in your requirements file/section:
flask_track_usage==1.0.1
Version 1.1.0 has made changes requested by the community as well as a few bug fixes. These include:
- Addition of the X-Forwarded-For header as xforwardedfor in storage. Requested by jamylak.
- Configurable GeoIP endpoint support. Requested by jamylak.
- Migration from pymongo.Connection to pymongo.MongoClient.
- Better SQLStorage metadata handling. Requested by gouthambs.
- SQLStorage implementation redesign. Requested and implemented by gouthambs.
- Updated documentation for 1.1.0.
- Better unittesting.
I’d like to thank Gouthaman Balaraman who has been a huge help authoring the SQLStorage based on the SQLAlchemy ORM and providing feedback and support on Flask-Track-Usage design.
As always, please report bugs and feature requests on the GitHub Issues Page.
Red Hat Developer Blog: Git Bonsai, or Keeping Your Branches Well Pruned
Code repositories are the final resting place for code, acting as equal parts bank vault, museum, and graveyard. Unlike a vault, content is almost always added faster than it is removed, but much like a graveyard, there is a definite miasma when a repository becomes too full. That smell is developer frustration, as they have to search through dozens, or eventually, hundreds of branches to find the one they want.
We’ve had sporadic cases where branches did not get merged into masters (and sometimes fixes were overwritten in later releases) and have wasted collectively hundreds of developer hours on “which branch is this in?” exchanges.
Sam and I talk about a simple yet helpful git tool to squash bad branches over at the Red Hat Developer Blog.
Red Hat Developer Blog: Feeling Developer Pain
The rest of this post describes our journey from initially trying to implement a simple solution to improve the day-to-day lives of developers, through the technical limitations we experienced along the way, and finally arrives at the empathy for our developers we’ve gained from that experience. We’ll wrap up with a note on how Red Hat Software Collections (announced as GA in September) would’ve simplified our development process.
Read the whole post Tim and I wrote over at the Red Hat Developer Blog.