-rw-r--r--  README.python                         |  87
-rw-r--r--  TODO.python                           |  65
-rw-r--r--  baserockimport/app.py                 |   3
-rwxr-xr-x  baserockimport/exts/python.find_deps  | 124
-rwxr-xr-x  baserockimport/exts/python.to_lorry   |  73
-rwxr-xr-x  baserockimport/exts/python_run_pip    |  32
6 files changed, 173 insertions, 211 deletions
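The find_deps changes below are built around a context manager that creates a throwaway virtualenv and guarantees it is removed again, even on error. A minimal sketch of that cleanup pattern (`temporary_workdir` is a hypothetical stand-in: it only creates and removes the scratch directory, whereas the real `temporary_virtualenv()` in the patch also shells out to `virtualenv` and logs its output):

```python
import contextlib
import os
import shutil
import tempfile

@contextlib.contextmanager
def temporary_workdir():
    # Create a scratch directory, hand it to the caller, and always
    # remove it again afterwards -- even if the caller raises.
    tempdir = tempfile.mkdtemp()
    try:
        yield tempdir
    finally:
        shutil.rmtree(tempdir)

with temporary_workdir() as workdir:
    saved = workdir
    # caller can litter the directory freely
    open(os.path.join(workdir, 'scratch.txt'), 'w').close()
print(os.path.exists(saved))  # prints False: cleanup already happened
```

The try/finally around the `yield` is what makes the pattern robust: a failed `pip install` inside the `with` block still tears the virtualenv down.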
diff --git a/README.python b/README.python
index a22f517..579034a 100644
--- a/README.python
+++ b/README.python
@@ -1,32 +1,43 @@
 README
 ------
-Most (nearly all) python packages use setuptools, for detailed information on
-setuptools see the setuptools docs[1]. If you're not familiar with setuptools
-you should read the docs[1][2] before continuing.
-
 Please note that this tool expects any python packages to be on pypi, you
 cannot currently import packages from other places.
 
-This import tool uses a combination of pypi metadata,
-pip and setuptools commands to extract dependency information
-to create a set of definitions useable with Baserock. This is not a stable
-process and will not work smoothly in many cases: because setup.py
-is just an ordinary Python script it's possible for a setup.py to do things that
-break the import tool's means to extract dependencies, for example, some packages
-bypass parts of setuptools and subclass parts of distutils's core instead.
-Another problem with importing python packages is that packages are uploaded
-to pypi as tarballs rather than as repositories and as a result the import tool
-generates a lot of tarball lorries which is the least desireable kind of lorry
-to use with Baserock. To avoid this the import tool looks through parts of the
-package metadata for links to real repos, this detection is currently extremely
-basic and will hopefully be improved in future to allow the tool to reduce the
-number of tarball lorries it generates. Some python packages
-only declare their dependency information in a human readable form within a
-README, this tool cannot do anything to extract dependency
-information that is not encoded in a machine readable fashion. At the time of
-writing numpy is an example of such a package: running the import tool on numpy
-will yield a stratum that contains numpy and none of its dependencies.
+This import tool uses PyPI metadata and setuptools commands to extract
+dependency information.
+
+To get runtime dependency information for a package, it sets up a 'virtualenv'
+environment, installs the package from PyPI using 'pip', and then uses 'pip
+freeze' to get a list of exactly what packages have been installed. This is
+pretty inefficient, in terms of computing resources: installation involves
+downloading and sometimes compiling C source code. However, it is the most
+reliable and maintainable approach we have found so far.
+
+Python packaging metadata is something of a free-for-all. Reusing the code of
+Pip is great because Pip is probably the most widely-tested consumer of Python
+packaging metadata. We did submit a patch to add an 'install
+--list-dependencies' mode to Pip, but this wasn't accepted. The current
+approach should work with pretty much any version of Pip, no special patches
+required.
+
+Most (nearly all) python packages use setuptools, for detailed information on
+setuptools see the setuptools docs[1].
+
+Python packages are uploaded to PyPI as tarballs, and PyPI doesn't have a
+standard way of indicating where the canonical source repository for a package
+is. The python.to_lorry import extension just generates .lorry files pointing
+to those tarballs, as this is the most reliable thing it can do. It would be
+possible to guess at where the source repo is, but this can have random failure
+cases and you end up mirroring a project's home page instead of its source code
+sometimes.
+
+Some python packages only declare their dependency information in a human
+readable form within a README, this tool cannot do anything to extract
+dependency information that is not encoded in a machine readable fashion. At
+the time of writing, 'numpy' is an example of such a package: running the import
+tool on 'numpy' will yield a stratum that contains 'numpy' and none of its
+dependencies.
 
 Python packages may require other packages to be present for build/installation
 to proceed, in setuptools these are called setup requirements.
@@ -34,26 +45,28 @@
 Setup requirements naturally translate to Baserock build dependencies, in
 practice most python packages don't have any setup requirements, so the lists
 of build-depends for each chunk will generally be empty lists.
 
-Many python packages require additional (in addition to a python interpreter)
-packages to be present at runtime, in setuptools parlance these are install
-requirements. The import tool uses pip to recursively extract runtime
-dependency information for a given package, each dependency is added to the
-same stratum as the package we're trying to import. All packages implicitly
-depend on a python interpreter, the import tool encodes this by making all
-strata build depend on core, which at the time of writing contains cpython.
 
 Traps
 -----
 
 * Because pip executes setup.py commands to determine dependencies
 and some packages' setup.py files invoke compilers, the import tool may end up
-running compilers.
+running compilers. You can pass `--log=/dev/stdout` to get detailed progress
+information on the console, which will show you if this is happening.
 
-* pip puts errors on stdout, some import tool errors may be vague: if it's
-not clear what's going on you can check the log, if you're using
---log-level=debug then the import tool will log the output of all the commands
-it executes to obtain dependency information.
-
-[1]: https://pythonhosted.org/setuptools/
-[2]: https://pythonhosted.org/an_example_pypi_project/setuptools.html
+Good testcases
+--------------
+
+Here are some interesting test cases:
+
+ - ftw.blog (fails because needs .zip import)
+ - MySQL-python (fails because needs .zip import)
+ - nixtla (long, unnecessary compilation involved)
+ - python-keystoneclient (~40 deps, needs --force-stratum due to missing tag)
+ - rejester (~24 deps)
+ - requests (~26 deps, mostly not actually needed at runtime)
+
+
+[1]: https://pythonhosted.org/setuptools/
diff --git a/TODO.python b/TODO.python
index 16b7889..6c276f7 100644
--- a/TODO.python
+++ b/TODO.python
@@ -25,68 +25,3 @@
 this will be confusing, we should emit nice version numbers. i.e. nixtla
 
 * add a test runner
-
-* Importing python packages that use pbr fails, see
-https://bitbucket.org/pypa/setuptools/issue/73/typeerror-dist-must-be-a-distribution#comment-7267980
-The most sensible option would seem to be to make use of the sane environment
-that pbr provides: just read the dependency information from the text files
-that pbr projects provide, see, http://docs.openstack.org/developer/pbr/
-
-Results from running the import tool on various python packages follow:
-
-* Imports tested so far (stratum is generated)
-    * SUCCEEDS
-        * nixtla: fine but requires compilation
-        * ryser
-        * Twisted
-        * Django
-        * textdata
-        * whale-agent
-        * virtualenv
-        * lxml
-        * nose
-        * six
-        * simplejson
-        * pika
-        * MarkupSafe
-        * zc.buildout
-        * Paste
-        * pycrypto
-        * Jinja2
-        * Flask
-        * bcdoc
-        * pymongo
-
-    * FAILS
-        * python-keystoneclient
-            * All openstack stuff requires pbr, pbr does not play nicely with
-              current setuptools see:
-              [Issue 73](https://bitbucket.org/pypa/setuptools/issue/73/typeerror-dist-must-be-a-distribution#comment-7267980)
-              we can either fix setuptools/pbr or make use of the sane environment
-              pbr provides.
-        * persistent-pineapple
-            * Git repo[1] has different layout to tarball[2] downloadeable from pypi,
-              git repo's layout isn't 'installable' by pip, so dependencies can
-              not be determined.
-              [1]: https://github.com/JasonAUnrein/Persistent-Pineapple
-              [2]: https://pypi.python.org/packages/source/p/persistent_pineapple/persistent_pineapple-1.0.0.tar.gz
-        * ftw.blog
-            * cannot satisfy dependencies
-        * boto
-            * cannot satisfy dependencies
-        * jmespath
-            * cannot satisfy dependencies
-        * rejester
-            * its setup.py subclasses distutils.core
-        * requests
-            * cannot satisfy dependencies
-        * MySQL-python
-            * egg_info blows up,
-            * python setup.py install doesn't even work
-            * maybe the user's expected to do some manual stuff first, who knows
-        * rejester (its setup.py subclasses distutils.core)
-        * redis-jobs (succeeded at first, no longer exists on pypi)
-        * coverage (stratum couldn't be generated because some tags are missing)
-
-* Imports completely tested, built, deployed and executed successfully:
-
-    * Flask
diff --git a/baserockimport/app.py b/baserockimport/app.py
index 5f3d435..ae95d58 100644
--- a/baserockimport/app.py
+++ b/baserockimport/app.py
@@ -227,7 +227,8 @@ class BaserockImportApplication(cliapp.Application):
         loop = baserockimport.mainloop.ImportLoop(app=self,
                                                   goal_kind='python',
                                                   goal_name=package_name,
-                                                  goal_version=package_version)
+                                                  goal_version=package_version,
+                                                  ignore_version_field=True)
         loop.enable_importer('python',
                              strata=['strata/core.morph'],
                              package_comp_callback=comp)
         loop.run()
diff --git a/baserockimport/exts/python.find_deps b/baserockimport/exts/python.find_deps
index 91a9e39..b173110 100755
--- a/baserockimport/exts/python.find_deps
+++ b/baserockimport/exts/python.find_deps
@@ -24,6 +24,7 @@
 from __future__ import print_function
 
+import contextlib
 import sys
 import subprocess
 import os
 
@@ -32,6 +33,7 @@
 import tempfile
 import logging
 import select
 import signal
+import shutil
 import pkg_resources
 import xmlrpclib
@@ -262,65 +264,109 @@ def find_build_deps(source, name, version=None):
     return build_deps
 
+
+@contextlib.contextmanager
+def temporary_virtualenv():
+    tempdir = tempfile.mkdtemp()
+
+    logging.debug('Creating virtualenv in %s', tempdir)
+    p = subprocess.Popen(['virtualenv', tempdir], stdout=subprocess.PIPE,
+                         stderr=subprocess.STDOUT)
+
+    while True:
+        line = p.stdout.readline()
+        if line == '':
+            break
+
+        logging.debug(line.rstrip('\n'))
+
+    p.wait()  # even with eof, wait for termination
+
+    try:
+        yield tempdir
+    finally:
+        logging.debug('Removing virtualenv at %s', tempdir)
+        shutil.rmtree(tempdir)
+
+
 def find_runtime_deps(source, name, version=None,
                       use_requirements_file=False):
     logging.debug('Finding runtime dependencies for %s%s at %s'
                   % (name, ' %s' % version if version else '', source))
 
-    # Run our patched pip to get a list of installed deps
-    # Run pip install . --list-dependencies=instdeps.txt with cwd=source
-
     # Some temporary file needed for storing the requirements
     tmpfd, tmppath = tempfile.mkstemp()
     logging.debug('Writing install requirements to: %s', tmppath)
 
-    args = ['pip', 'install', '.', '--list-dependencies=%s' % tmppath]
-    if use_requirements_file:
-        args.insert(args.index('.') + 1, '-r')
-        args.insert(args.index('.') + 2, 'requirements.txt')
+    with temporary_virtualenv() as virtenv_path:
+        shutil.copytree(source, os.path.join(virtenv_path, 'source'))
 
-    logging.debug('Running pip, args: %s' % args)
+        pip_runner = os.path.join(os.path.dirname(os.path.abspath(__file__)),
+                                  'python_run_pip')
+        logging.debug('pip_runner: %s', pip_runner)
 
-    p = subprocess.Popen(args, cwd=source, stdout=subprocess.PIPE,
-                         stderr=subprocess.STDOUT)
+        subprocess_env = os.environ.copy()
+        subprocess_env['TMPDIR'] = os.path.join(virtenv_path, 'tmp')
+        logging.debug('Using %s as TMPDIR', subprocess_env['TMPDIR'])
+        p = subprocess.Popen(pip_runner, cwd=virtenv_path, env=subprocess_env,
+                             stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
 
-    output = []
-    while True:
-        line = p.stdout.readline()
-        if line == '':
-            break
+        output = []
+        while True:
+            line = p.stdout.readline()
+            if line == '':
+                break
 
-        logging.debug(line.rstrip('\n'))
-        output.append(line)
+            logging.debug(line.rstrip('\n'))
+            output.append(line)
 
-    p.wait()  # even with eof, wait for termination
+        p.wait()  # even with eof, wait for termination
 
-    logging.debug('pip exited with code: %d' % p.returncode)
+        logging.debug('pip exited with code: %d' % p.returncode)
 
-    if p.returncode != 0:
-        error('Failed to get runtime dependencies for %s%s at %s. Output from '
-              'Pip: %s' % (name, ' %s' % version if version else '', source,
-                           ' '.join(output)))
+        if p.returncode != 0:
+            error('Failed to get runtime dependencies for %s%s at %s. Output '
+                  'from Pip: %s' % (name, ' %s' % version if version else '',
                  source, ' '.join(output)))
+
+        # Now run pip freeze
+        logging.debug('Running pip freeze')
+
+        p = subprocess.Popen(['/bin/bash', '-c',
+                              'source bin/activate; '
+                              'pip freeze --disable-pip-version-check'],
+                             cwd=virtenv_path,
+                             stdout=tmpfd, stderr=subprocess.PIPE)
 
-    with os.fdopen(tmpfd) as tmpfile:
-        ss = resolve_specs(pkg_resources.parse_requirements(tmpfile))
-        logging.debug("Resolved specs for %s: %s" % (name, ss))
+        _, err = p.communicate()
+        os.close(tmpfd)
 
-        logging.debug("Removing root package from specs")
+        if p.returncode != 0:
+            error('failed to get runtime dependencies for %s%s at %s: %s'
+                  % (name, ' %s' % version if version else '', source, err))
 
-        # filter out "root" package
-        # hyphens and underscores are treated as equivalents
-        # in distribution names
-        specsets = {k: v for (k, v) in ss.iteritems()
-                    if k not in [name, name.replace('_', '-')]}
+        with open(tmppath) as f:
+            logging.debug(f.read())
 
-        versions = resolve_versions(specsets)
-        logging.debug('Resolved versions: %s' % versions)
+        with open(tmppath) as tmpfile:
+            ss = resolve_specs(pkg_resources.parse_requirements(tmpfile))
+            logging.debug("Resolved specs for %s: %s" % (name, ss))
 
-        # Since any of the candidates in versions should satisfy
-        # all specs, we just pick the first version we see
-        runtime_deps = {name: vs[0] for (name, vs) in versions.iteritems()}
+            logging.debug("Removing root package from specs")
 
-    os.remove(tmppath)
+            # filter out "root" package
+            # hyphens and underscores are treated as equivalents
+            # in distribution names
+            specsets = {k: v for (k, v) in ss.iteritems()
+                        if k not in [name, name.replace('_', '-')]}
+
+            versions = resolve_versions(specsets)
+            logging.debug('Resolved versions: %s' % versions)
+
+            # Since any of the candidates in versions should satisfy
+            # all specs, we just pick the first version we see
+            runtime_deps = {name: vs[0] for (name, vs) in versions.iteritems()}
+
+    os.remove(tmppath)
 
     if (len(runtime_deps) == 0 and not use_requirements_file
             and os.path.isfile(os.path.join(source, 'requirements.txt'))):
@@ -360,6 +406,8 @@ class PythonFindDepsExtension(ImportExtension):
 
         root = {'python': deps}
 
+        logging.debug('Returning %s', root)
+
         print(json.dumps(root))
 
 if __name__ == '__main__':
diff --git a/baserockimport/exts/python.to_lorry b/baserockimport/exts/python.to_lorry
index ebde27a..30f38e7 100755
--- a/baserockimport/exts/python.to_lorry
+++ b/baserockimport/exts/python.to_lorry
@@ -37,58 +37,6 @@
 from importer_python_common import *
 from utils import warn, error
 import utils
 
-
-def find_repo_type(url):
-
-    debug_vcss = False
-
-    # Don't bother with detection if we can't get a 200 OK
-    logging.debug("Getting '%s' ..." % url)
-
-    status_code = requests.get(url).status_code
-    if status_code != 200:
-        logging.debug('Got %d status code from %s, aborting repo detection'
-                      % (status_code, url))
-        return None
-
-    logging.debug('200 OK for %s' % url)
-    logging.debug('Finding repo type for %s' % url)
-
-    vcss = [('git', 'clone'), ('hg', 'clone'),
-            ('svn', 'checkout'), ('bzr', 'branch')]
-
-    for (vcs, vcs_command) in vcss:
-        logging.debug('Trying %s %s' % (vcs, vcs_command))
-        tempdir = tempfile.mkdtemp()
-
-        p = subprocess.Popen([vcs, vcs_command, url], stdout=subprocess.PIPE,
-                             stderr=subprocess.STDOUT, stdin=subprocess.PIPE,
-                             cwd=tempdir)
-
-        # We close stdin on parent side to prevent the child from blocking
-        # if it reads on stdin
-        p.stdin.close()
-
-        while True:
-            line = p.stdout.readline()
-            if line == '':
-                break
-
-            if debug_vcss:
-                logging.debug(line.rstrip('\n'))
-
-        p.wait()  # even with eof on both streams, we still wait
-
-        shutil.rmtree(tempdir)
-
-        if p.returncode == 0:
-            logging.debug('%s is a %s repo' % (url, vcs))
-            return vcs
-
-    logging.debug("%s doesn't seem to be a repo" % url)
-
-    return None
-
 def filter_urls(urls):
     allowed_extensions = ['tar.gz', 'tgz', 'tar.Z', 'tar.bz2', 'tbz2',
                           'tar.lzma', 'tar.xz', 'tlz', 'txz', 'tar']
@@ -101,7 +49,7 @@ def filter_urls(urls):
 
 def get_releases(client, package_name):
     try:
-        releases = client.package_releases(package_name)
+        releases = client.package_releases(package_name, True)
     except Exception as e:
         error("Couldn't fetch release data:", e)
@@ -185,23 +133,8 @@ class PythonLorryExtension(ImportExtension):
 
             logging.debug('Treating %s as %s' % (package_name, new_proj_name))
             package_name = new_proj_name
 
-        try:
-            metadata = self.fetch_package_metadata(package_name)
-        except Exception as e:
-            error("Couldn't fetch package metadata: ", e)
-
-        info = metadata.json()['info']
-
-        repo_type = (find_repo_type(info['home_page'])
-                     if 'home_page' in info else None)
-
-        if repo_type:
-            # TODO: Don't hardcode extname here.
-            print(utils.str_repo_lorry('python', lorry_prefix, package_name,
-                                       repo_type, info['home_page']))
-        else:
-            print(generate_tarball_lorry(lorry_prefix, client,
-                                         package_name, version))
+        print(generate_tarball_lorry(lorry_prefix, client,
+                                     package_name, version))
 
 if __name__ == '__main__':
     PythonLorryExtension().run()
diff --git a/baserockimport/exts/python_run_pip b/baserockimport/exts/python_run_pip
new file mode 100755
index 0000000..f3877b4
--- /dev/null
+++ b/baserockimport/exts/python_run_pip
@@ -0,0 +1,32 @@
+#!/bin/bash
+#
+# Copyright © 2015 Codethink Limited
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; version 2 of the License.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License along
+# with this program; if not, write to the Free Software Foundation, Inc.,
+# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301  USA.
+
+set -e
+
+pip_install="pip install --disable-pip-version-check ."
+
+source bin/activate
+cd source
+
+if [[ -e requirements.txt ]]
+then
+    echo "Running $pip_install -r requirements.txt"
+    $pip_install -r requirements.txt
+else
+    echo "Running $pip_install"
+    $pip_install
+fi
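After `python_run_pip` has installed the package, the patched find_deps feeds the `pip freeze` output through `pkg_resources.parse_requirements` and filters out the root package before resolving versions. A simplified, hedged sketch of that parsing step (`runtime_deps_from_freeze` and the Flask pin list are illustrative, not from the patch; the real code additionally resolves competing version specs via `resolve_specs`/`resolve_versions`):

```python
# Sketch: turn 'pip freeze' output into a {name: version} mapping,
# dropping the package being imported itself.
import pkg_resources

def runtime_deps_from_freeze(freeze_text, root_name):
    deps = {}
    for req in pkg_resources.parse_requirements(freeze_text):
        # 'pip freeze' pins exact versions, so each line is 'name==version'
        name = req.project_name
        # hyphens and underscores are interchangeable in distribution names
        if name.lower() in (root_name.lower(),
                            root_name.replace('_', '-').lower()):
            continue  # filter out the "root" package
        deps[name] = req.specs[0][1]
    return deps

freeze_output = "Flask==0.10.1\nJinja2==2.7.3\nWerkzeug==0.9.6\n"
print(runtime_deps_from_freeze(freeze_output, 'Flask'))
```

With the example input above, Flask itself is dropped and only its pinned dependencies (Jinja2 and Werkzeug) remain, which is exactly the set the tool puts into the same stratum as the imported package.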