author    Richard Ipsum <richard.ipsum@codethink.co.uk>  2015-06-19 15:17:33 +0100
committer Richard Ipsum <richard.ipsum@codethink.co.uk>  2015-08-26 14:38:11 +0000
commit    5957c58e8113439e3ef36ff646eb52695085046d (patch)
tree      f1009b3a8a1a3cb7ef02d6cb7a1759b60796e095
parent    ca1d08d765cc298bb4eeee9a2182aa67de657f5f (diff)
download  import-5957c58e8113439e3ef36ff646eb52695085046d.tar.gz
Use virtualenv and Pip to find runtime deps; remove searching for upstream
Co-authored-by: Sam Thursfield <sam.thursfield@codethink.co.uk>

Since upstream pip does not want to merge
https://github.com/pypa/pip/pull/2371, we should avoid depending on
that pull request. To find runtime dependencies we now run 'pip
install' inside a virtualenv and then run 'pip freeze' to obtain the
dependency set; this has the advantage that nearly all of the work is
done by pip itself.

Originally the python extensions were designed to look for upstream
git repos. In practice this is unreliable and is not compatible with
obtaining dependencies using 'pip install', so the downside of this
approach is that all lorries will be tarballs. The upshot is that we
can now automatically import many packages that we couldn't import
before.

Another upshot of this approach is that we may be able to remove a lot
of the spec processing and validation code, if we are willing to worry
less about build dependencies; we are not yet sure whether we should
be willing to do that.

We've had encouraging results with this patch so far: we can now
import, without user intervention, packages that failed previously,
such as boto, persistent-pineapple, jmespath and coverage. requests
also almost imported successfully, but appears to require a release of
pytest that is uploaded as a zip.

Change-Id: I705c6f6bd722df041d17630287382f851008e97a
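
A minimal sketch of the approach described above, assuming a hypothetical
helper named list_runtime_deps (the real logic lives in
baserockimport/exts/python.find_deps and python_run_pip in this patch):

    import os
    import shutil
    import subprocess
    import tempfile

    def list_runtime_deps(source_dir):
        # Create a throwaway virtualenv, install the package into it with
        # pip, then ask pip what ended up installed.
        venv = tempfile.mkdtemp()
        try:
            subprocess.check_call(['virtualenv', venv])
            pip = os.path.join(venv, 'bin', 'pip')
            # 'pip install .' pulls in the package plus all of its runtime
            # dependencies from PyPI.
            subprocess.check_call([pip, 'install', '.'], cwd=source_dir)
            # 'pip freeze' reports everything installed in the virtualenv,
            # one 'name==version' line per package.
            return subprocess.check_output([pip, 'freeze']).splitlines()
        finally:
            shutil.rmtree(venv)
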
-rw-r--r--  README.python                          87
-rw-r--r--  TODO.python                            65
-rw-r--r--  baserockimport/app.py                   3
-rwxr-xr-x  baserockimport/exts/python.find_deps  124
-rwxr-xr-x  baserockimport/exts/python.to_lorry    73
-rwxr-xr-x  baserockimport/exts/python_run_pip     32
6 files changed, 173 insertions, 211 deletions
diff --git a/README.python b/README.python
index a22f517..579034a 100644
--- a/README.python
+++ b/README.python
@@ -1,32 +1,43 @@
README
------
-Most (nearly all) python packages use setuptools, for detailed information on
-setuptools see the setuptools docs[1]. If you're not familiar with setuptools
-you should read the docs[1][2] before continuing.
-
Please note that this tool expects any python packages to be on pypi; you
cannot currently import packages from other places.
-This import tool uses a combination of pypi metadata,
-pip and setuptools commands to extract dependency information
-to create a set of definitions useable with Baserock. This is not a stable
-process and will not work smoothly in many cases: because setup.py
-is just an ordinary Python script it's possible for a setup.py to do things that
-break the import tool's means to extract dependencies, for example, some packages
-bypass parts of setuptools and subclass parts of distutils's core instead.
-Another problem with importing python packages is that packages are uploaded
-to pypi as tarballs rather than as repositories and as a result the import tool
-generates a lot of tarball lorries which is the least desireable kind of lorry
-to use with Baserock. To avoid this the import tool looks through parts of the
-package metadata for links to real repos, this detection is currently extremely
-basic and will hopefully be improved in future to allow the tool to reduce the
-number of tarball lorries it generates. Some python packages
-only declare their dependency information in a human readable form within a
-README, this tool cannot do anything to extract dependency
-information that is not encoded in a machine readable fashion. At the time of
-writing numpy is an example of such a package: running the import tool on numpy
-will yield a stratum that contains numpy and none of its dependencies.
+This import tool uses PyPI metadata and setuptools commands to extract
+dependency information.
+
+To get runtime dependency information for a package, it sets up a 'virtualenv'
+environment, installs the package from PyPI using 'pip', and then uses 'pip
+freeze' to get a list of exactly what packages have been installed. This is
+pretty inefficient, in terms of computing resources: installation involves
+downloading and sometimes compiling C source code. However, it is the most
+reliable and maintainable approach we have found so far.
+
+Python packaging metadata is something of a free-for-all. Reusing the code of
+Pip is great because Pip is probably the most widely-tested consumer of Python
+packaging metadata. We did submit a patch to add an 'install
+--list-dependencies' mode to Pip, but this wasn't accepted. The current
+approach should work with pretty much any version of Pip, no special patches
+required.
+
+Most (nearly all) python packages use setuptools; for detailed information on
+setuptools see the setuptools docs[1].
+
+Python packages are uploaded to PyPI as tarballs, and PyPI doesn't have a
+standard way of indicating where the canonical source repository for a package
+is. The python.to_lorry import extension just generates .lorry files pointing
+to those tarballs, as this is the most reliable thing it can do. It would be
+possible to guess at where the source repo is, but this fails in unpredictable
+ways and you can end up mirroring a project's home page instead of its source
+code.
+
+Some python packages only declare their dependency information in a
+human-readable form within a README; this tool cannot extract dependency
+information that is not encoded in a machine-readable fashion. At the time of
+writing, 'numpy' is an example of such a package: running the import tool on
+'numpy' will yield a stratum that contains 'numpy' and none of its
+dependencies.
Python packages may require other packages to be present for
build/installation to proceed; in setuptools these are called setup
requirements.
@@ -34,26 +45,28 @@ Setup requirements naturally translate to Baserock build dependencies, in
practice most python packages don't have any setup requirements, so the lists
of build-depends for each chunk will generally be empty lists.
-Many python packages require additional (in addition to a python interpreter)
-packages to be present at runtime, in setuptools parlance these are install
-requirements. The import tool uses pip to recursively extract runtime
-dependency information for a given package, each dependency is added to the
-same stratum as the package we're trying to import. All packages implicitly
-depend on a python interpreter, the import tool encodes this by making all
-strata build depend on core, which at the time of writing contains cpython.
Traps
-----
* Because pip executes setup.py commands to determine dependencies
and some packages' setup.py files invoke compilers, the import tool may end up
-running compilers.
+running compilers. You can pass `--log=/dev/stdout` to get detailed progress
+information on the console, which will show you if this is happening.
-* pip puts errors on stdout, some import tool errors may be vague: if it's
-not clear what's going on you can check the log, if you're using
---log-level=debug then the import tool will log the output of all the commands
-it executes to obtain dependency information.
-[1]: https://pythonhosted.org/setuptools/
-[2]: https://pythonhosted.org/an_example_pypi_project/setuptools.html
+Good test cases
+---------------
+
+Here are some interesting test cases:
+
+ - ftw.blog (fails because needs .zip import)
+ - MySQL-python (fails because needs .zip import)
+ - nixtla (long, unnecessary compilation involved)
+ - python-keystoneclient (~40 deps, needs --force-stratum due to missing tag)
+ - rejester (~24 deps)
+ - requests (~26 deps, mostly not actually needed at runtime)
+
+
+[1]: https://pythonhosted.org/setuptools/
diff --git a/TODO.python b/TODO.python
index 16b7889..6c276f7 100644
--- a/TODO.python
+++ b/TODO.python
@@ -25,68 +25,3 @@ this will be confusing, we should emit nice version numbers.
i.e. nixtla
* add a test runner
-
-* Importing python packages that use pbr fails, see
-https://bitbucket.org/pypa/setuptools/issue/73/typeerror-dist-must-be-a-distribution#comment-7267980
-The most sensible option would seem to be to make use of the sane environment
-that pbr provides: just read the dependency information from the text files
-that pbr projects provide, see, http://docs.openstack.org/developer/pbr/
-
-Results from running the import tool on various python packages follow:
-
-* Imports tested so far (stratum is generated)
- * SUCCEEDS
- * nixtla: fine but requires compilation
- * ryser
- * Twisted
- * Django
- * textdata
- * whale-agent
- * virtualenv
- * lxml
- * nose
- * six
- * simplejson
- * pika
- * MarkupSafe
- * zc.buildout
- * Paste
- * pycrypto
- * Jinja2
- * Flask
- * bcdoc
- * pymongo
-
- * FAILS
- * python-keystoneclient
- * All openstack stuff requires pbr, pbr does not play nicely with
- current setuptools see: [Issue 73](https://bitbucket.org/pypa/setuptoolsissue/73/typeerror-dist-must-be-a-distribution#comment-7267980)
- we can either fix setuptools/pbr or make use of the sane environment
- pbr provides.
- * persistent-pineapple
- * Git repo[1] has different layout to tarball[2] downloadeable from pypi,
- git repo's layout isn't 'installable' by pip, so dependencies can
- not be determined.
- [1]: https://github.com/JasonAUnrein/Persistent-Pineapple
- [2]: https://pypi.python.org/packages/source/p/persistent_pineapple/persistent_pineapple-1.0.0.tar.gz
- * ftw.blog
- * cannot satisfy dependencies
- * boto
- * cannot satisfy dependencies
- * jmespath
- * cannot satisfy dependencies
- * rejester
- * its setup.py subclasses distutils.core
- * requests
- * cannot satisfy dependencies
- * MySQL-python
- * egg_info blows up,
- * python setup.py install doesn't even work
- * maybe the user's expected to do some manual stuff first, who knows
- * rejester (its setup.py subclasses distutils.core)
- * redis-jobs (succeeded at first, no longer exists on pypi)
- * coverage (stratum couldn't be generated because some tags are missing)
-
-* Imports completely tested, built, deployed and executed successfully:
-
- * Flask
diff --git a/baserockimport/app.py b/baserockimport/app.py
index 5f3d435..ae95d58 100644
--- a/baserockimport/app.py
+++ b/baserockimport/app.py
@@ -227,7 +227,8 @@ class BaserockImportApplication(cliapp.Application):
loop = baserockimport.mainloop.ImportLoop(app=self,
goal_kind='python',
goal_name=package_name,
- goal_version=package_version)
+ goal_version=package_version,
+ ignore_version_field=True)
loop.enable_importer('python', strata=['strata/core.morph'],
package_comp_callback=comp)
loop.run()
diff --git a/baserockimport/exts/python.find_deps b/baserockimport/exts/python.find_deps
index 91a9e39..b173110 100755
--- a/baserockimport/exts/python.find_deps
+++ b/baserockimport/exts/python.find_deps
@@ -24,6 +24,7 @@
from __future__ import print_function
+import contextlib
import sys
import subprocess
import os
@@ -32,6 +33,7 @@ import tempfile
import logging
import select
import signal
+import shutil
import pkg_resources
import xmlrpclib
@@ -262,65 +264,109 @@ def find_build_deps(source, name, version=None):
return build_deps
+
+@contextlib.contextmanager
+def temporary_virtualenv():
+ tempdir = tempfile.mkdtemp()
+
+ logging.debug('Creating virtualenv in %s', tempdir)
+ p = subprocess.Popen(['virtualenv', tempdir], stdout=subprocess.PIPE,
+ stderr=subprocess.STDOUT)
+
+ while True:
+ line = p.stdout.readline()
+ if line == '':
+ break
+
+ logging.debug(line.rstrip('\n'))
+
+ p.wait() # even with eof, wait for termination
+
+ try:
+ yield tempdir
+ finally:
+ logging.debug('Removing virtualenv at %s', tempdir)
+ shutil.rmtree(tempdir)
+
+
def find_runtime_deps(source, name, version=None, use_requirements_file=False):
logging.debug('Finding runtime dependencies for %s%s at %s'
% (name, ' %s' % version if version else '', source))
- # Run our patched pip to get a list of installed deps
- # Run pip install . --list-dependencies=instdeps.txt with cwd=source
-
# Some temporary file needed for storing the requirements
tmpfd, tmppath = tempfile.mkstemp()
logging.debug('Writing install requirements to: %s', tmppath)
- args = ['pip', 'install', '.', '--list-dependencies=%s' % tmppath]
- if use_requirements_file:
- args.insert(args.index('.') + 1, '-r')
- args.insert(args.index('.') + 2, 'requirements.txt')
+ with temporary_virtualenv() as virtenv_path:
+ shutil.copytree(source, os.path.join(virtenv_path, 'source'))
- logging.debug('Running pip, args: %s' % args)
+ pip_runner = os.path.join(os.path.dirname(os.path.abspath(__file__)),
+ 'python_run_pip')
+ logging.debug('pip_runner: %s', pip_runner)
- p = subprocess.Popen(args, cwd=source, stdout=subprocess.PIPE,
- stderr=subprocess.STDOUT)
+ subprocess_env = os.environ.copy()
+ subprocess_env['TMPDIR'] = os.path.join(virtenv_path, 'tmp')
+ logging.debug('Using %s as TMPDIR', subprocess_env['TMPDIR'])
+ p = subprocess.Popen(pip_runner, cwd=virtenv_path, env=subprocess_env,
+ stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
- output = []
- while True:
- line = p.stdout.readline()
- if line == '':
- break
+ output = []
+ while True:
+ line = p.stdout.readline()
+ if line == '':
+ break
- logging.debug(line.rstrip('\n'))
- output.append(line)
+ logging.debug(line.rstrip('\n'))
+ output.append(line)
- p.wait() # even with eof, wait for termination
+ p.wait() # even with eof, wait for termination
- logging.debug('pip exited with code: %d' % p.returncode)
+ logging.debug('pip exited with code: %d' % p.returncode)
- if p.returncode != 0:
- error('Failed to get runtime dependencies for %s%s at %s. Output from '
- 'Pip: %s' % (name, ' %s' % version if version else '', source,
- ' '.join(output)))
+ if p.returncode != 0:
+ error('Failed to get runtime dependencies for %s%s at %s. Output '
+ 'from Pip: %s' % (name, ' %s' % version if version else '',
+ source, ' '.join(output)))
+
+ # Now run pip freeze
+ logging.debug('Running pip freeze')
+
+ p = subprocess.Popen(['/bin/bash', '-c',
+ 'source bin/activate; '
+ 'pip freeze --disable-pip-version-check'],
+ cwd=virtenv_path,
+ stdout=tmpfd, stderr=subprocess.PIPE)
- with os.fdopen(tmpfd) as tmpfile:
- ss = resolve_specs(pkg_resources.parse_requirements(tmpfile))
- logging.debug("Resolved specs for %s: %s" % (name, ss))
+ _, err = p.communicate()
+ os.close(tmpfd)
- logging.debug("Removing root package from specs")
+ if p.returncode != 0:
+ error('failed to get runtime dependencies for %s%s at %s: %s'
+ % (name, ' %s' % version if version else '', source, err))
- # filter out "root" package
- # hyphens and underscores are treated as equivalents
- # in distribution names
- specsets = {k: v for (k, v) in ss.iteritems()
- if k not in [name, name.replace('_', '-')]}
+ with open(tmppath) as f:
+ logging.debug(f.read())
- versions = resolve_versions(specsets)
- logging.debug('Resolved versions: %s' % versions)
+ with open(tmppath) as tmpfile:
+ ss = resolve_specs(pkg_resources.parse_requirements(tmpfile))
+ logging.debug("Resolved specs for %s: %s" % (name, ss))
- # Since any of the candidates in versions should satisfy
- # all specs, we just pick the first version we see
- runtime_deps = {name: vs[0] for (name, vs) in versions.iteritems()}
+ logging.debug("Removing root package from specs")
- os.remove(tmppath)
+ # filter out "root" package
+ # hyphens and underscores are treated as equivalents
+ # in distribution names
+ specsets = {k: v for (k, v) in ss.iteritems()
+ if k not in [name, name.replace('_', '-')]}
+
+ versions = resolve_versions(specsets)
+ logging.debug('Resolved versions: %s' % versions)
+
+ # Since any of the candidates in versions should satisfy
+ # all specs, we just pick the first version we see
+ runtime_deps = {name: vs[0] for (name, vs) in versions.iteritems()}
+
+ os.remove(tmppath)
if (len(runtime_deps) == 0 and not use_requirements_file
and os.path.isfile(os.path.join(source, 'requirements.txt'))):
@@ -360,6 +406,8 @@ class PythonFindDepsExtension(ImportExtension):
root = {'python': deps}
+ logging.debug('Returning %s', root)
+
print(json.dumps(root))
if __name__ == '__main__':
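
As a rough illustration of what the code above does with the 'pip freeze'
output: each line is a 'name==version' pin, and pkg_resources turns those
lines into Requirement objects. A simplified sketch (the hypothetical
freeze_lines_to_versions helper stands in for the real resolve_specs /
resolve_versions path, which also filters out the root package):

    import pkg_resources

    def freeze_lines_to_versions(freeze_lines):
        versions = {}
        for req in pkg_resources.parse_requirements(freeze_lines):
            # req.specs is a list of (operator, version) pairs; 'pip freeze'
            # output pins each package with a single '=='.
            pinned = [v for op, v in req.specs if op == '==']
            if pinned:
                versions[req.project_name] = pinned[0]
        return versions

    # freeze_lines_to_versions(['six==1.9.0', 'requests==2.7.0'])
    #   -> {'six': '1.9.0', 'requests': '2.7.0'}
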
diff --git a/baserockimport/exts/python.to_lorry b/baserockimport/exts/python.to_lorry
index ebde27a..30f38e7 100755
--- a/baserockimport/exts/python.to_lorry
+++ b/baserockimport/exts/python.to_lorry
@@ -37,58 +37,6 @@ from importer_python_common import *
from utils import warn, error
import utils
-
-def find_repo_type(url):
-
- debug_vcss = False
-
- # Don't bother with detection if we can't get a 200 OK
- logging.debug("Getting '%s' ..." % url)
-
- status_code = requests.get(url).status_code
- if status_code != 200:
- logging.debug('Got %d status code from %s, aborting repo detection'
- % (status_code, url))
- return None
-
- logging.debug('200 OK for %s' % url)
- logging.debug('Finding repo type for %s' % url)
-
- vcss = [('git', 'clone'), ('hg', 'clone'),
- ('svn', 'checkout'), ('bzr', 'branch')]
-
- for (vcs, vcs_command) in vcss:
- logging.debug('Trying %s %s' % (vcs, vcs_command))
- tempdir = tempfile.mkdtemp()
-
- p = subprocess.Popen([vcs, vcs_command, url], stdout=subprocess.PIPE,
- stderr=subprocess.STDOUT, stdin=subprocess.PIPE,
- cwd=tempdir)
-
- # We close stdin on parent side to prevent the child from blocking
- # if it reads on stdin
- p.stdin.close()
-
- while True:
- line = p.stdout.readline()
- if line == '':
- break
-
- if debug_vcss:
- logging.debug(line.rstrip('\n'))
-
- p.wait() # even with eof on both streams, we still wait
-
- shutil.rmtree(tempdir)
-
- if p.returncode == 0:
- logging.debug('%s is a %s repo' % (url, vcs))
- return vcs
-
- logging.debug("%s doesn't seem to be a repo" % url)
-
- return None
-
def filter_urls(urls):
allowed_extensions = ['tar.gz', 'tgz', 'tar.Z', 'tar.bz2', 'tbz2',
'tar.lzma', 'tar.xz', 'tlz', 'txz', 'tar']
@@ -101,7 +49,7 @@ def filter_urls(urls):
def get_releases(client, package_name):
try:
- releases = client.package_releases(package_name)
+ releases = client.package_releases(package_name, True)
except Exception as e:
error("Couldn't fetch release data:", e)
@@ -185,23 +133,8 @@ class PythonLorryExtension(ImportExtension):
logging.debug('Treating %s as %s' % (package_name, new_proj_name))
package_name = new_proj_name
- try:
- metadata = self.fetch_package_metadata(package_name)
- except Exception as e:
- error("Couldn't fetch package metadata: ", e)
-
- info = metadata.json()['info']
-
- repo_type = (find_repo_type(info['home_page'])
- if 'home_page' in info else None)
-
- if repo_type:
- # TODO: Don't hardcode extname here.
- print(utils.str_repo_lorry('python', lorry_prefix, package_name,
- repo_type, info['home_page']))
- else:
- print(generate_tarball_lorry(lorry_prefix, client,
- package_name, version))
+ print(generate_tarball_lorry(lorry_prefix, client,
+ package_name, version))
if __name__ == '__main__':
PythonLorryExtension().run()
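
The extra argument passed to package_releases above is PyPI's show_hidden
flag, so that older releases are returned as well as the latest one. A
hedged sketch of that XML-RPC call (the endpoint URL and the way 'client'
is constructed are assumptions; the hunk does not show them):

    import xmlrpclib

    # Assumed PyPI XML-RPC endpoint; client construction is not shown in
    # this diff.
    client = xmlrpclib.ServerProxy('https://pypi.python.org/pypi')

    # show_hidden=True asks PyPI to list every release of the package,
    # not just the most recent one.
    releases = client.package_releases('requests', True)
    print(releases)
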
diff --git a/baserockimport/exts/python_run_pip b/baserockimport/exts/python_run_pip
new file mode 100755
index 0000000..f3877b4
--- /dev/null
+++ b/baserockimport/exts/python_run_pip
@@ -0,0 +1,32 @@
+#!/bin/bash
+#
+# Copyright © 2015 Codethink Limited
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; version 2 of the License.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License along
+# with this program; if not, write to the Free Software Foundation, Inc.,
+# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+
+set -e
+
+pip_install="pip install --disable-pip-version-check ."
+
+source bin/activate
+cd source
+
+if [[ -e requirements.txt ]]
+then
+ echo "Running $pip_install -r requirements.txt"
+ $pip_install -r requirements.txt
+else
+ echo "Running $pip_install"
+ $pip_install
+fi