
Memory use in CPython and MicroPython


By Jake Edge
June 21, 2017
PyCon

At PyCon 2017, Kavya Joshi looked at some of the differences between the Python reference implementation (known as "CPython") and that of MicroPython. In particular, she described the differences in memory use and handling between the two. Those differences are part of what allows MicroPython to run on the severely memory-constrained microcontrollers it targets—an environment that could never support CPython.

CPython is the standard and default implementation that everyone loves and uses, she said. But it has a bad reputation when it comes to memory use. That has led to some alternate implementations, including MicroPython.

[Kavya Joshi]

As the "leanest and meanest" of the alternative implementations, MicroPython targets microcontrollers, so that Python can be used to control hardware. MicroPython can run in 16KB of RAM using 256KB of ROM; it implements most of Python 3.4. Some things have been left out of the language and standard library, since they do not make sense for microcontrollers, such as metaclasses and multiprocessing.

CPython and MicroPython are fairly similar under the hood, Joshi said. They are both bytecode interpreters for stack-based virtual machines. They are also both C programs. Yet they are on the opposite ends of the memory-use spectrum.

As a simple experiment, and not any kind of formal benchmark, she tested Python 3.6 and MicroPython 1.8 on a 64-bit Ubuntu 16.10 system with 4GB of RAM. Her test was to create 200,000 objects of various types (integers, strings, lists) and to prevent them from being garbage collected. She then measured the heap use of the programs from within Python itself.
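For illustration, a minimal sketch of such a measurement on CPython might look like the following, using the integer test as an example. The tracemalloc module is a stand-in here, since the talk did not say which tool she used; on MicroPython, something like gc.mem_alloc() from its gc module would play the same role.

    import tracemalloc

    tracemalloc.start()
    objs = []                        # keep references so nothing is collected
    for i in range(200000):
        objs.append(10**10 + i)      # consecutive integers starting at 10**10
        if i % 10000 == 0:
            current, peak = tracemalloc.get_traced_memory()
            print(i, current)        # heap use sampled from within Python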

CPython and MicroPython both use custom allocators to manage their heaps. But CPython will grow its heap on demand, while MicroPython has a fixed-size heap. Since Joshi is measuring heap use, that fixed size is the upper bound for the measurements. But, as we will see, MicroPython does not use the heap in the same way that CPython does, which will also affect the measurements.

She showed graphs of the heap usage, starting with the integer test (Joshi's slides are available at Speaker Deck). The test created 200,000 integer objects for consecutive integers starting at 10¹⁰ (10**10 in Python-speak). The CPython graph shows a linear increase in heap use, while the MicroPython graph is completely flat. The graphs look the same for string objects. The CPython heap use is more than that of MicroPython, but the actual numbers do not matter all that much, she said, as it is the shape of the graphs that she is interested in.

Lists show something a bit different, however. The test creates a list that it keeps appending small integers to; the graphs show the heap use as the list gets bigger. Both graphs show a step function for heap usage, though the MicroPython steps seem to double in depth and height, while the increase in step size for CPython is more gradual. One of the questions she would like to answer is why the two interpreters "have such vastly different memory footprints and memory profiles".

Objects and memory

To explain what is going on, she needed to show how objects are implemented internally for both interpreters. Since CPython uses more memory in these tests, there must be an underlying reason. Either CPython objects are larger or CPython allocates more of them—or both.

CPython allocates all of its objects on the heap, so the statement "x = 1" will create an object for "1" on the heap. Each object has two parts, the header (PyObject_HEAD) and some variable object-specific fields. The header is the overhead for the object and consists of two eight-byte fields for a reference count and a pointer to the type of object it represents. That means 16 bytes is the lower bound on the size of a CPython object.

For an integer, the object-specific field consists of an eight-byte length indicating how many four-byte values will follow (remember that Python can represent arbitrary-sized integers). So to store an integer object that has a value of 2³⁰-1 or less (two bits are used for other purposes), it takes 28 bytes, 24 of which are overhead. For the values in the test (10**10 and up), it took 32 bytes per object, so 200,000 integer objects consumed 6MB of space.
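Those sizes can be checked from the interpreter with sys.getsizeof(); the numbers below are from a 64-bit CPython 3.6 build and will vary by version and platform:

    import sys

    print(sys.getsizeof(0))          # 24: header plus length, no value words
    print(sys.getsizeof(1))          # 28: one four-byte value word
    print(sys.getsizeof(10**10))     # 32: two value words for this magnitude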

In MicroPython, though, each "object" is only eight bytes in size. Those eight bytes could be used as a pointer to another data structure or used more directly. "Pointer tagging" allows for multiple uses of eight-byte objects. It is used to encode some extra information in the objects; since all addresses are aligned to eight-byte boundaries, the three low-order bits are available to store tag values. Not all of the tag bits are used that way, though; if the low-order bit is one, that indicates a "small" integer, which is stored in the other 63 bits. A 10₂ in the low-order two bits means there is a 62-bit index to an interned string in the rest of the value. 00₂ in those bits indicates the other bits are a pointer to a concrete object (i.e. anything that is not a small integer or interned string).
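A rough sketch of that decoding logic, expressed in Python terms (the real code is C operating on raw machine words, and negative integers are ignored here for simplicity):

    def decode(word):
        if word & 0b1:                      # low bit 1: a "small" integer
            return ('small-int', word >> 1)
        if word & 0b11 == 0b10:             # low bits 10: interned-string index
            return ('string-index', word >> 2)
        return ('pointer', word)            # low bits 00: pointer to an object

    print(decode((42 << 1) | 0b1))          # ('small-int', 42)
    print(decode((7 << 2) | 0b10))          # ('string-index', 7)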

Since the values stored in the test will fit as small integers, only eight bytes are used per object. Those are stored on the stack, rather than the heap, which explains the flat heap usage graph line for MicroPython. Even if you measure the stack usage, which is a little more complicated to do, Joshi said, MicroPython is using way less memory than CPython for these objects.

For the string experiment, which created 200,000 strings of ten characters or less, the explanation is similar. CPython has 48 bytes of overhead for an ASCII string (PyASCIIObject), so a one-character, two-byte string (e.g. "a" plus a null terminator) takes up 50 bytes; a ten-character string would take up 59 bytes. MicroPython stores small strings (254 characters or less) as arrays with three bytes of overhead. So the one-character string takes five bytes, which is 1/10 of the storage that CPython needs.
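Again, the CPython numbers are easy to reproduce with sys.getsizeof() on a 64-bit 3.6 build:

    import sys

    print(sys.getsizeof(''))             # 49: 48-byte header plus terminator
    print(sys.getsizeof('a'))            # 50
    print(sys.getsizeof('abcdefghij'))   # 59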

That still doesn't explain the flat line in the graph, however. Strings in MicroPython are stored in a pre-allocated array, she said. There are no new heap allocations when a string object is created. Effectively, the allocation has just been moved in time so it doesn't appear as an increase in heap size in the graph.
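The idea behind interned strings can be sketched as a lookup table that maps each distinct string to a small index. This toy version is only illustrative; MicroPython's real table of interned strings is built ahead of time and largely lives in ROM:

    _pool = []      # index -> string
    _index = {}     # string -> index

    def intern(s):
        """Return a small integer that stands in for the string."""
        if s not in _index:
            _index[s] = len(_pool)
            _pool.append(s)
        return _index[s]

    i = intern('hello')
    assert _pool[i] == 'hello'
    assert intern('hello') == i   # same string, same index, no new storage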

Mutable objects

Strings and integers are both immutable objects, but the third experiment used lists, which are mutable objects. In CPython, that adds overhead in the form of a PyGC_HEAD structure. That structure is used for garbage-collection tracking and is 24 bytes in size. In all, a PyListObject is 64 bytes in size; but you also have to add pointers for each of the objects in the list, which are stored as an array, and the storage for each object.
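That total is visible from Python, since sys.getsizeof() includes the garbage-collection header; on a 64-bit CPython 3.6 build:

    import sys

    print(sys.getsizeof([]))   # 64 for an empty list, including PyGC_HEAD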

The steps in the graph for the list experiment indicate dynamic resizing of the array of list elements. If that resizing were done on every append, the graph would increase linearly, as the integer and string graphs did; instead, CPython overallocates the array in anticipation of further append operations. That has the effect of amortizing the cost of the resizing over multiple append operations.
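The overallocation is easy to observe by watching when the reported size of a growing list jumps:

    import sys

    lst, last = [], None
    for i in range(50):
        lst.append(i)
        size = sys.getsizeof(lst)
        if size != last:               # a jump means the array was resized
            print(len(lst), size)
            last = size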

The garbage collection mechanism for MicroPython is completely different from that of CPython and does not use reference counting. So MicroPython concrete objects, which are used for lists, do not have a reference count; they still use a type pointer, though, as CPython does. Removing the reference count saves eight bytes compared to CPython; on top of that, MicroPython has no per-object garbage-collection header, which saves another 24 bytes. Thus, mutable objects in MicroPython are 32 bytes smaller than their CPython counterparts.

So it turns out that CPython objects are larger than those of MicroPython, generally substantially larger. The differences for other object types, and classes in particular, are not as clear cut when compared to CPython 3.6, Joshi said, as there have been some CPython optimizations that substantially reduce the overhead. In addition, CPython allocates more objects in these tests than MicroPython does. MicroPython stores small integers directly on the stack, rather than allocating them from the heap as CPython does.

Garbage collection

The overhead incurred to track CPython objects for garbage collection is probably making the audience wonder how that works, Joshi said. CPython uses reference counting, which simply tracks references to an object; every time an object is assigned to a variable, placed on a list, or inserted into a dictionary, for example, its count is increased. When a reference goes away, because of a del statement or a variable going out of scope, for instance, the count is decreased. When the count reaches zero, the object is deallocated.
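The counts can be observed with sys.getrefcount(), which reports one extra reference for its own argument:

    import sys

    x = [1, 2, 3]
    print(sys.getrefcount(x))   # 2: the variable x, plus getrefcount's argument
    y = x                       # a second reference to the same list
    print(sys.getrefcount(x))   # 3
    del y                       # dropping a reference decrements the count
    print(sys.getrefcount(x))   # 2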

But that doesn't work in all cases, which is where the PyGC_HEAD structure for mutable objects comes into play. If objects refer to themselves or indirectly contain a reference to themselves, it results in a reference cycle. She gave an example:

    x = [1, 2, 3]
    x.append(x)

If x subsequently goes out of scope or del x is executed, the reference count will not drop to zero, so the object would never be freed. CPython handles that with a cyclic garbage collector that detects and breaks reference cycles. Since only mutable objects can have these cycles, they are the only objects tracked using the PyGC_HEAD information.
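The cyclic collector can be watched in action from the gc module:

    import gc

    x = [1, 2, 3]
    x.append(x)          # the list now refers to itself
    del x                # the refcount stays at 1; nothing is freed yet

    print(gc.collect())  # the collector finds the orphaned cycle and
                         # reports at least one unreachable object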

MicroPython tracks allocations in the heap using a bitmap. It breaks the heap up into 32-byte allocation units, each of which has its free or in-use state tracked with two bits. There is a mark-and-sweep garbage collector that maintains the bitmap when it periodically runs; it manages all of the objects that are allocated on the heap. In that way, the mark-and-sweep approach is effectively trading execution time for a reduction in the overhead needed for tracking memory.
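As a rough illustration of the scheme (and only that; the real allocator is in MicroPython's C source), here is a toy bitmap with two bits per 32-byte block, where an allocation marks its first block as a "head" and the rest as "tails":

    BLOCK = 32
    FREE, HEAD, TAIL = 0b00, 0b01, 0b10     # two-bit states per block
    nblocks = 1024
    bitmap = bytearray(nblocks // 4)        # four two-bit entries per byte

    def get_state(b):
        return (bitmap[b // 4] >> ((b % 4) * 2)) & 0b11

    def set_state(b, state):
        shift = (b % 4) * 2
        bitmap[b // 4] = (bitmap[b // 4] & ~(0b11 << shift)) | (state << shift)

    def alloc(nbytes):
        need = -(-nbytes // BLOCK)          # round up to whole blocks
        run = 0
        for b in range(nblocks):
            run = run + 1 if get_state(b) == FREE else 0
            if run == need:                 # found a large-enough free run
                start = b - need + 1
                set_state(start, HEAD)
                for t in range(start + 1, start + need):
                    set_state(t, TAIL)
                return start * BLOCK        # offset of the new allocation
        raise MemoryError

    print(alloc(100))   # 100 bytes -> four 32-byte blocks, at offset 0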

Joshi said that she had "barely scratched the surface" of the optimizations that MicroPython has to reduce its memory use. But the code is available; it is clear, easy-to-follow C that anyone who is interested should go and look at.

Differing approaches

She completed the talk with an evaluation of why CPython and MicroPython chose the approaches they did. Typically, there is a trade-off between memory use and performance, but that is not really the case here. MicroPython performs better than CPython for most benchmarks, Joshi said, though it does do worse on some, especially those involving large dictionaries. The trade-off here is functionality; MicroPython does not implement all of the features of CPython. That means, for example, that users do not have access to the entire ecosystem of third-party packages that CPython users enjoy.

Different design decisions were made by the CPython and MicroPython projects in part because of the philosophies behind their development. Back in the 1980s and early 1990s when CPython was being created, its developers wanted "simplicity of implementation and design first"; they would "worry about performance later".

That is exactly the opposite of the philosophy behind MicroPython. It is a fairly modern implementation, so it could borrow ideas from Lua, JavaScript, and other languages that have come about in the decades since CPython was created. MicroPython is also solely targeted at these highly constrained microcontrollers. Effectively, they are targeting very different kinds of systems and use cases, which helps explain the choices that have been made.

A YouTube video of Joshi's talk is available.

[I would like to thank the Linux Foundation for travel assistance to Portland for PyCon.]

Index entries for this article
Conference: PyCon/2017



Memory use in CPython and MicroPython

Posted Jun 21, 2017 19:52 UTC (Wed) by pbonzini (subscriber, #60935) [Link]

I am very surprised to hear that CPython does not tag small integers. Does it have other optimizations to speed up arithmetic?

Memory use in CPython and MicroPython

Posted Jun 24, 2017 0:12 UTC (Sat) by eklitzke (subscriber, #36426) [Link]

Small integers are interned, although obviously that's quite different than tagging.
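(The interning is visible from the REPL; the cached range of -5 to 256 is a CPython implementation detail rather than a language guarantee:)

    a, b = int('256'), int('256')   # built at runtime, to avoid constant folding
    print(a is b)                   # True: values from -5 to 256 are cached
    c, d = int('257'), int('257')
    print(c is d)                   # False: larger values get fresh objects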

Mainly this is due to how the C API works. Everything in Python is represented as a PyObject, which makes working with the C API very uniform and easy. Implementing optimizations like tagged integers would improve the speed of the interpreter, but at the cost of complexity in the C API. The issue with reference counting is similar: it's uncommon for new interpreters to use reference counting (rather than mark and sweep), but working with a reference-counted GC is much easier to do in C than working with a more advanced GC.

There's a reason why Python has such an extensive ecosystem of C libraries. This is both its greatest strength and its greatest weakness. Python has a really amazing ecosystem of data-science libraries (NumPy, SciPy, Pandas, Tensorflow, etc.), which is largely due to how easy it is to port existing C and Fortran libraries to Python (in many cases it can be trivially automated). There are other places where C bindings are used very extensively, again due to how easy the C API is to use. On the other hand, the huge reliance on C extensions makes it difficult for alternative Python implementations to get widespread adoption.

Memory use in CPython and MicroPython

Posted Jun 27, 2017 18:12 UTC (Tue) by lhastings (guest, #66451) [Link]

CPython 2 had two integer types: the "int" and the "long". An "int" was a fast machine-native integer; a "long" was an arbitrary-precision integer. This bifurcation resulted in occasional dumb bugs (isinstance(x, int) would fail if x was a long, for example).

For CPython 3, Guido decreed that "long" was the new "int". The machine-native integer type was removed, and "int" was the arbitrary-precision integer. This was proposed/discussed in PEP 237:

https://www.python.org/dev/peps/pep-0237/

Tagged ints would add an enormous amount of complexity in CPython, and I suspect it would break the C API. In general, CPython's goal is to make life nicer for the programmer, and if that means spending more CPU / memory getting there, so be it. I've never asked Guido about tagged ints in CPython but I think we have prima facie evidence he's not interested--if he was, they probably would have gone in at some point over the last 25 years.

Memory use in CPython and MicroPython

Posted Jun 28, 2017 5:01 UTC (Wed) by foom (subscriber, #14868) [Link]

I actually implemented tagged integers in CPython a decade ago on a whim. The performance improvement seemed relatively promising, IMO.

https://mail.python.org/pipermail/python-dev/2004-July/04...

It was very nearly a trivial change. It didn't add much complexity to CPython, and only _barely_ broke the C API -- in basically the same manner that CPython broke it for other reasons in the meantime: "foo->ob_type" and "foo->ob_refcnt" should no longer be accessed directly, but through a macro.

That global search&replace of direct field access to use of a macro was almost the entire extent of the diff.

More recent Python versions already introduced macros for those (Py_REFCNT and Py_TYPE), which are used in most of the codebase, because the definition of PyObject_HEAD changed incompatibly a while back. So, making this change is probably even easier nowadays. It's conceivable it's more valuable, too, given the removal of the int type. But I don't really track Python development anymore.

Memory use in CPython and MicroPython

Posted Jul 2, 2017 16:07 UTC (Sun) by pboddie (subscriber, #50784) [Link]

That's a very interesting reference you provide. It is disappointing that the response was more or less "we don't like people misusing pointers because we mess around with pointers a lot ourselves".

It's true that CPython (at least 2.x) can get a lot of performance benefits from just special-casing integer operations. The virtual machine effectively has explicit tests for integers in all the operator bytecode handlers, and upon finding integer operands, just executes the fast integer-only operation (with overflow tests, of course). (It is amusing that people in the referenced discussion questioned the need for integer-related optimisations given the presence of these existing ones in CPython.)

I found it disheartening when evaluating integer-heavy benchmark performance of a Python-like language compiler/translator that I've written, testing it against CPython, because CPython's "fast path" trickery does remove a lot of the normal dispatch overhead. I had originally only intended to have a faster generic dispatch mechanism and had assumed that it would outperform that of CPython in most cases.

So I introduced tagged integers in order to avoid allocation overhead for integer objects, which CPython addresses by employing lists of preallocated objects, and by also short-circuiting the usual dispatch mechanism. Well, why not play by the same rules?

Memory use in CPython and MicroPython

Posted Jun 21, 2017 20:21 UTC (Wed) by ejr (subscriber, #51652) [Link]

MicroPython's techniques hark back to days long before CPython. Once upon a time, people used machines with kilobytes of memory, much like the smallest microcontrollers (which are being phased out because it's cheaper to include more memory now). Implementations of dynamic languages like Lisp on those machines employed many fun memory-saving techniques. Smalltalk and ZIL added yet more fun memory-saving bits for different object models.

The difference is that CPython was targeting environments that have, or had, ever-increasing memory and sequential computational power. The latter has flat-lined, but memory has continued to increase. Their effort likely is best spent addressing the lack of sequential performance increase... Fine-tuning the octet packing would be more relevant for using vector units in current cell-phone and above processors.

Memory use in CPython and MicroPython

Posted Jun 22, 2017 9:54 UTC (Thu) by pbonzini (subscriber, #60935) [Link]

It depends. L2 cache and TLB sizes have not increased as fast as main memory; L3 caches have, but they are shared among the many cores in the same package. In particular, better octet packing leads to more efficient GC, and often to less pointer chasing, which also improves sequential performance.

Memory use in CPython and MicroPython

Posted Jun 22, 2017 11:50 UTC (Thu) by ejr (subscriber, #51652) [Link]

I'd argue that L1 & L2 are more tied to the feeds and speeds of the cores and hence should stagnate. The TLB is an excellent example for using space more effectively. I do think that memory optimizations are important, but less so for CPython than any parallelization story. I'm still not a fan of Python (particularly with the intentional lack of floating-point semantics), but it has wide adoption.

Memory use in CPython and MicroPython

Posted Jun 21, 2017 20:25 UTC (Wed) by SEJeff (guest, #51588) [Link]

Micropython runs absolutely fantastic on the esp8266 and esp32 teensy microcontrollers. Huge fan of the work they've put in for this.

Memory use in CPython and MicroPython

Posted Jun 22, 2017 1:19 UTC (Thu) by flussence (subscriber, #85566) [Link]

I'm not a Python person but those numbers are pretty amazing! I don't think even QBasic was that memory-efficient.

Memory use in CPython and MicroPython

Posted Jun 23, 2017 17:56 UTC (Fri) by ignacio.hernandez (guest, #56157) [Link]

Nice work!

I had some demo code that brute-forces prime numbers, and this interpreter cuts that (horrible) test's running time from 7.5s to 3.3s. I would think having better baseline performance is a welcome feature, especially on small systems.


Copyright © 2017, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds