Resolving build errors with Python lxml on low-memory machines
I use the Python lxml module regularly, and it’s one of the few modules where I’ve encountered build problems. The problems were caused by its memory requirements during compilation, and the errors logged during the build didn’t help with troubleshooting. There are a few comments on Stack Overflow about similar errors under Linux, but they were hard to find, and there wasn’t anything about building on OpenBSD. To be fair, this isn’t a reflection on lxml; it’s just the unfortunate soul that exposed these system limits. My solution is below, hopefully with enough of the error messages included that someone else with the same problem can find it.
Here’s how it manifests on a CentOS 6 VM with 512 MB RAM and no swap (this is how an EC2 t1.micro comes when you use the official CentOS 6 AMI, and how a Rackspace 512 MB Standard instance comes when you use their CentOS 6 image):
$ pip install lxml
Collecting lxml
Using cached lxml-3.4.1.tar.gz
/usr/local/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'bugtrack_url'
warnings.warn(msg)
Building lxml version 3.4.1.
Building without Cython.
Using build configuration of libxslt 1.1.26
Building against libxml2/libxslt in the following directory: /usr/lib64
Installing collected packages: lxml
[snip]
building 'lxml.etree' extension
creating build/temp.linux-x86_64-2.7
creating build/temp.linux-x86_64-2.7/src
creating build/temp.linux-x86_64-2.7/src/lxml
gcc -pthread -fno-strict-aliasing -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/usr/include/libxml2 -I/tmp/pip-build-rlPuyA/lxml/src/lxml/includes -I/usr/local/include/python2.7 -c src/lxml/lxml.etree.c -o build/temp.linux-x86_64-2.7/src/lxml/lxml.etree.o -w
{standard input}: Assembler messages:
{standard input}:491197: Warning: end of file not at end of a line; newline inserted
{standard input}:492215: Error: unknown pseudo-op: `.strin'
gcc: Internal error: Killed (program cc1)
Please submit a full bug report.
See <http://bugzilla.redhat.com/bugzilla> for instructions.
error: command 'gcc' failed with exit status 1
----------------------------------------
Command "/home/esteele/.virtualenvs/test27/bin/python2.7 -c "import setuptools, tokenize;__file__='/tmp/pip-build-rlPuyA/lxml/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-3YNmLG-record/install-record.txt --single-version-externally-managed --compile --install-headers /home/esteele/.virtualenvs/test27/include/site/python2.7" failed with error code 1 in /tmp/pip-build-rlPuyA/lxml
The error messages didn’t help much, so let me help: the compiler was killed because the system ran out of memory. The only hint in the build output is the “Killed (program cc1)” line. You can confirm this easily in the system log:
$ sudo tail /var/log/messages | grep -B1 Killed
Jan 25 15:45:49 localhost kernel: Out of memory: Kill process 6979 (cc1) score 676 or sacrifice child
Jan 25 15:45:49 localhost kernel: Killed process 6979, UID 1003, (cc1) total-vm:456244kB, anon-rss:338792kB, file-rss:8kB
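The check above can be wrapped in a small helper. This is a minimal sketch, assuming the kernel log lives at /var/log/messages as it does on this CentOS 6 machine (other systems may log elsewhere); the function name oom_kills is made up for illustration.

```shell
# oom_kills [LOGFILE]: print kernel OOM-killer lines found in a log file.
# The function name is hypothetical. The default path is the CentOS 6
# location used above and may differ on other systems.
oom_kills() {
    log="${1:-/var/log/messages}"
    # Match both kinds of line the kernel wrote in the example above.
    grep -E 'Out of memory: Kill process|Killed process' "$log"
}
```

Called with no argument it reads /var/log/messages, which usually needs root. It prints nothing and returns non-zero when no OOM kill was logged.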
So I added 512 MB of swap:
$ sudo dd if=/dev/zero of=/swapfile bs=1024 count=500000
500000+0 records in
500000+0 records out
512000000 bytes (512 MB) copied, 1.8003 s, 284 MB/s
$ sudo mkswap /swapfile
mkswap: /swapfile: warning: don't erase bootbits sectors
on whole disk. Use -f to force.
Setting up swapspace version 1, size = 499996 KiB
no label, UUID=c3ca02fa-e36f-4275-b452-42f0675b89b5
$ sudo swapon /swapfile
$ free
total used free shared buffers cached
Mem: 502220 101812 400408 0 5288 48560
-/+ buffers/cache: 47964 454256
Swap: 499992 0 499992
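Note that count=500000 with bs=1024 writes 512,000,000 bytes, which is about 488 MiB rather than a full 512 MiB; it was enough here. The arithmetic and the steps above can be sketched as a pair of functions. This is an illustration under assumptions, not a tested provisioning script: the function names are hypothetical, and the chmod is an addition that was not part of the commands I ran.

```shell
# swap_blocks SIZE_MIB: number of 1024-byte blocks dd must write
# for a swap file of SIZE_MIB mebibytes. (Hypothetical helper name.)
swap_blocks() {
    echo $(( $1 * 1024 ))
}

# make_swapfile PATH SIZE_MIB: the same steps as above, to be run as root.
make_swapfile() {
    path="$1"
    dd if=/dev/zero of="$path" bs=1024 count="$(swap_blocks "$2")" || return 1
    # Addition to the original steps: stop other users reading swapped-out memory.
    chmod 600 "$path" || return 1
    mkswap "$path" || return 1
    swapon "$path"
}
```

With these, make_swapfile /swapfile 512 would create a full 512 MiB file. Swap enabled with swapon alone does not survive a reboot, which suits a one-off build; swapoff /swapfile followed by deleting the file undoes it.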
And now it installs:
$ pip install lxml
Collecting lxml
Using cached lxml-3.4.1.tar.gz
/usr/local/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'bugtrack_url'
warnings.warn(msg)
Building lxml version 3.4.1.
Building without Cython.
Using build configuration of libxslt 1.1.26
Building against libxml2/libxslt in the following directory: /usr/lib64
Installing collected packages: lxml
[snip]
Successfully installed lxml-3.4.1
However, on an OpenBSD 5.6 machine with the same 512 MB of RAM and even more swap (768 MB), it still wouldn’t install; something else was going on.
$ sysctl -a | grep -i physmem; pstat -sk
hw.physmem=520028160
Device 1K-blocks Used Avail Capacity Priority
/dev/wd0b 769984 10468 759516 1% 0
$ pip install lxml
Collecting lxml
Using cached lxml-3.4.1.tar.gz
/usr/local/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'bugtrack_url'
warnings.warn(msg)
Building lxml version 3.4.1.
Building without Cython.
Using build configuration of libxslt 1.1.28
Building against libxml2/libxslt in the following directory: /usr/local/lib
Installing collected packages: lxml
[snip]
building 'lxml.etree' extension
creating build/temp.openbsd-5.6-amd64-2.7
creating build/temp.openbsd-5.6-amd64-2.7/src
creating build/temp.openbsd-5.6-amd64-2.7/src/lxml
cc -pthread -fno-strict-aliasing -O2 -pipe -DNDEBUG -O2 -pipe -fPIC -fPIC -I/usr/local/include -I/usr/local/include/libxml2 -I/tmp/pip-build-YCP0o0/lxml/src/lxml/includes -I/usr/local/include/python2.7 -c src/lxml/lxml.etree.c -o build/temp.openbsd-5.6-amd64-2.7/src/lxml/lxml.etree.o -w
cc1: out of memory allocating 4072 bytes after a total of 0 bytes
error: command 'cc' failed with exit status 1
----------------------------------------
Command "/home/esteele/.virtualenvs/test27/bin/python2.7 -c "import setuptools, tokenize;__file__='/tmp/pip-build-YCP0o0/lxml/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-TRArBR-record/install-record.txt --single-version-externally-managed --compile --install-headers /home/esteele/.virtualenvs/test27/include/site/python2.7" failed with error code 1 in /tmp/pip-build-YCP0o0/lxml
After a bit of troubleshooting, it was clear that this is a different situation from Linux: the compiler isn’t being killed by an OOM killer; instead, its memory allocation fails because the process has hit its resource limit on the data area (the data(kbytes) line below).
$ ulimit -a
time(cpu-seconds) unlimited
file(blocks) unlimited
coredump(blocks) unlimited
data(kbytes) 524288
stack(kbytes) 4096
lockedmem(kbytes) 161612
memory(kbytes) 483220
nofiles(descriptors) 512
processes 128
And an attempt to increase it fails: the soft limit can’t be raised above the hard limit, which is also 524288 kB because the user is in the default login class (I’d not come across BSD login classes before, so I’d chosen the default when creating the account).
$ ulimit -Sd 1000000
ksh: ulimit: bad -d limit: Invalid argument
$ ulimit -aH | grep data
data(kbytes) 524288
So I changed the login class and restarted the shell:
$ sudo usermod -L staff esteele
At that point limits can be increased (I put this in my .profile):
$ ulimit -Sd 1000000
$ ulimit -a | grep data
data(kbytes) 1000000
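A .profile entry for this could look something like the following. It is a minimal sketch, assuming the shell’s ulimit accepts -H, -S and -d as the ksh above does; the function names are hypothetical. It caps the request at the hard limit, so a login does not print the “bad -d limit” error if the account ever ends up back in a class with a lower hard limit.

```shell
# capped_limit WANT HARD: print WANT, or HARD if HARD is a smaller number.
# HARD may be the literal string "unlimited", as printed by ulimit.
capped_limit() {
    if [ "$2" = "unlimited" ] || [ "$1" -le "$2" ]; then
        echo "$1"
    else
        echo "$2"
    fi
}

# set_data_limit WANT_KB: set the soft data limit to WANT_KB kilobytes,
# or to the hard limit if that is lower.
set_data_limit() {
    hard=$(ulimit -Hd)
    ulimit -Sd "$(capped_limit "$1" "$hard")"
}
```

In .profile this would be followed by a call such as set_data_limit 1000000, matching the value used above.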
And it builds fine:
$ pip install lxml
Collecting lxml
Using cached lxml-3.4.1.tar.gz
/usr/local/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'bugtrack_url'
warnings.warn(msg)
Building lxml version 3.4.1.
Building without Cython.
Using build configuration of libxslt 1.1.28
Building against libxml2/libxslt in the following directory: /usr/local/lib
Installing collected packages: lxml
[snip]
Successfully installed lxml-3.4.1
So the same root cause (insufficient memory) manifested in two different ways, and in both cases the cause was not immediately obvious to me. Hopefully this saves someone else a little time.
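The two failure signatures from the build output above can be told apart mechanically. This is a rough sketch, assuming the pip output has been saved to a file; the function name and the labels it prints are made up for illustration. Neither message proves the cause on its own, so the kernel log or the ulimit output is still the real confirmation.

```shell
# classify_build_log FILE: guess which of the two failures a saved build log shows.
classify_build_log() {
    if grep -q 'Killed (program cc1)' "$1"; then
        # Seen on CentOS 6: the compiler was killed from outside; check the kernel log.
        echo "killed"
    elif grep -q 'out of memory allocating' "$1"; then
        # Seen on OpenBSD: the compiler's own allocation failed; check ulimit -a.
        echo "alloc-failed"
    else
        echo "unknown"
    fi
}
```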