8/1/2022, Monday

Python Slots Memory

PEP: 583
Title: A Concurrency Memory Model for Python
Version: 56116
Last-Modified: 2007-06-28 12:53:41 -0700 (Thu, 28 Jun 2007)
Author: Jeffrey Yasskin <jyasskin at google.com>
Status: Withdrawn
Type: Informational
Created: 22-Mar-2008
Post-History:

Contents

  • Two simple memory models
    • Happens-before consistency
  • Surprising behaviors with races
  • The rules for Python
  • Implementation Details

This PEP describes how Python programs may behave in the presence of concurrent reads and writes to shared variables from multiple threads. We use a happens-before relation to define when variable accesses are ordered or concurrent. Nearly all programs should simply use locks to guard their shared variables, and this PEP highlights some of the strange things that can happen when they don't, but programmers often assume that it's ok to do 'simple' things without locking, and it's somewhat unpythonic to let the language surprise them. Unfortunately, avoiding surprise often conflicts with making Python run quickly, so this PEP tries to find a good tradeoff between the two.

So far, we have 4 major Python implementations -- CPython, Jython[10], IronPython[11], and PyPy[12] -- as well as lots of minor ones. Some of these already run on platforms that do aggressive optimizations. In general, these optimizations are invisible within a single thread of execution, but they can be visible to other threads executing concurrently. CPython currently uses a GIL[13] to ensure that other threads see the results they expect, but this limits it to a single processor. Jython and IronPython run on Java's or .NET's threading system respectively, which allows them to take advantage of more cores but can also show surprising values to other threads.

So that threaded Python programs continue to be portable between implementations, implementers and library authors need to agree on some ground rules.

Variable
A name that refers to an object. Variables are generally introduced by assigning to them, and may be destroyed by passing them to del. Variables are fundamentally mutable, while objects may not be. There are several varieties of variables: module variables (often called 'globals' when accessed from within the module), class variables, instance variables (also known as fields), and local variables. All of these can be shared between threads (the local variables if they're saved into a closure). The object in which the variables are scoped notionally has a dict whose keys are the variables' names.
Object
A collection of instance variables (a.k.a. fields) and methods. At least, that'll do for this PEP.
Program Order
The order that actions (reads and writes) happen within a thread, which is very similar to the order they appear in the text.
Conflicting actions
Two actions on the same variable, at least one of which is a write.
Data race
A situation in which two conflicting actions happen at the same time. 'The same time' is defined by the memory model.

Two simple memory models

Before talking about the details of data races and the surprising behaviors they produce, I'll present two simple memory models. The first is probably too strong for Python, and the second is probably too weak.

Sequential Consistency

In a sequentially-consistent concurrent execution, actions appear to happen in a global total order with each read of a particular variable seeing the value written by the last write that affected that variable. The total order for actions must be consistent with the program order. A program has a data race on a given input when one of its sequentially consistent executions puts two conflicting actions next to each other.

This is the easiest memory model for humans to understand, although it doesn't eliminate all confusion, since operations can be split in odd places.

Happens-before consistency

The program contains a collection of synchronization actions, which in Python currently include lock acquires and releases and thread starts and joins. Synchronization actions happen in a global total order that is consistent with the program order (they don't have to happen in a total order, but it simplifies the description of the model). A lock release synchronizes with all later acquires of the same lock. Similarly, given t = threading.Thread(target=worker):

  • A call to t.start() synchronizes with the first statement in worker().
  • The return from worker() synchronizes with the return from t.join().
  • If the return from t.start() happens before (see below) a call to t.isAlive() that returns False, the return from worker() synchronizes with that call.

We call the source of the synchronizes-with edge a release operation on the relevant variable, and we call the target an acquire operation.

The happens-before order is the transitive closure of the program order with the synchronizes-with edges. That is, action A happens before action B if:

  • A falls before B in the program order (which means they run in the same thread)
  • A synchronizes with B
  • You can get to B by following happens-before edges from A.

An execution of a program is happens-before consistent if each read R sees the value of a write W to the same variable such that:

  • R does not happen before W, and
  • There is no other write V that overwrote W before R got a chance to see it. (That is, it can't be the case that W happens before V happens before R.)

You have a data race if two conflicting actions aren't related by happens-before.
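The definitions above can be checked mechanically on a small trace. The following is a minimal sketch, not part of the PEP: the event list, the thread names T1 to T3, and the helper names transitive_closure and conflicting are all made up for illustration. It builds happens-before as the transitive closure of program order plus synchronizes-with edges, then reports conflicting accesses that the relation leaves unordered.

```python
from itertools import product

# Hypothetical trace. Each event is (thread, kind, variable).
events = [
    ("T1", "write", "x"),       # 0
    ("T1", "release", "lock"),  # 1
    ("T2", "acquire", "lock"),  # 2
    ("T2", "read", "x"),        # 3
    ("T3", "write", "x"),       # 4: never synchronizes with anyone
]

# Program order: consecutive events within one thread.
program_order = {(0, 1), (2, 3)}
# Synchronizes-with: T1's release precedes T2's later acquire of the same lock.
synchronizes_with = {(1, 2)}


def transitive_closure(edges, n):
    """Warshall-style closure; k (the intermediate node) varies slowest."""
    reach = set(edges)
    for k, i, j in product(range(n), repeat=3):
        if (i, k) in reach and (k, j) in reach:
            reach.add((i, j))
    return reach


happens_before = transitive_closure(program_order | synchronizes_with, len(events))


def conflicting(a, b):
    """Two accesses to the same variable, at least one of them a write."""
    (_, kind_a, var_a), (_, kind_b, var_b) = events[a], events[b]
    both_accesses = {kind_a, kind_b} <= {"read", "write"}
    return both_accesses and var_a == var_b and "write" in (kind_a, kind_b)


races = [
    (a, b)
    for a in range(len(events))
    for b in range(a + 1, len(events))
    if conflicting(a, b)
    and (a, b) not in happens_before
    and (b, a) not in happens_before
]
print(races)
```

In this trace the write in T1 is ordered before the read in T2 through the lock, so that pair is not a race, while both accesses race with the unsynchronized write in T3.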

An example

Let's use the rules from the happens-before model to prove that the following program prints '[7]':
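The program listing is not shown here. The sketch below is a reconstruction that matches the names used in the proof (myqueue, l, cond, put, get, worker1, worker2, thread1, thread2, x, y), written for Python 3. The exact class body is an assumption, and the results list and the join() calls are additions that only make the outcome easy to check.

```python
import threading


class Queue:
    def __init__(self):
        self.l = []
        self.cond = threading.Condition()

    def get(self):
        with self.cond:               # cond.acquire() on entry, cond.release() on exit
            while not self.l:
                self.cond.wait()      # releases cond, then re-acquires it when woken
            ret = self.l[0]
            self.l = self.l[1:]
            return ret

    def put(self, x):
        with self.cond:
            self.l.append(x)
            self.cond.notify()


myqueue = Queue()
results = []                          # added for checking; not needed by the proof


def worker1():
    x = [7]
    myqueue.put(x)


def worker2():
    y = myqueue.get()
    results.append(y)
    print(y)


thread1 = threading.Thread(target=worker1)
thread2 = threading.Thread(target=worker2)
thread2.start()
thread1.start()
thread1.join()
thread2.join()
```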

  1. Because myqueue is initialized in the main thread before thread1 or thread2 is started, that initialization happens before worker1 and worker2 begin running, so there's no way for either to raise a NameError, and both myqueue.l and myqueue.cond are set to their final objects.
  2. The initialization of x in worker1 happens before it calls myqueue.put(), which happens before it calls myqueue.l.append(x), which happens before the call to myqueue.cond.release(), all because they run in the same thread.
  3. In worker2, myqueue.cond will be released and re-acquired until myqueue.l contains a value (x). The call to myqueue.cond.release() in worker1 happens before that last call to myqueue.cond.acquire() in worker2.
  4. That last call to myqueue.cond.acquire() happens before myqueue.get() reads myqueue.l, which happens before myqueue.get() returns, which happens before print y, again all because they run in the same thread.
  5. Because happens-before is transitive, the list initially stored in x in thread1 is initialized before it is printed in thread2.

Usually, we wouldn't need to look all the way into a thread-safe queue's implementation in order to prove that uses were safe. Its interface would specify that puts happen before gets, and we'd reason directly from that.

Surprising behaviors with races

Lots of strange things can happen when code has data races. It's easy to avoid all of these problems by just protecting shared variables with locks. This is not a complete list of race hazards; it's just a collection that seem relevant to Python.

In all of these examples, variables starting with r are local variables, and other variables are shared between threads.

Zombie values

This example comes from the Java memory model[16]:

Initially p is q and p.x == 0.

Thread 1        Thread 2
--------------  --------------
r1 = p          r6 = p
r2 = r1.x       r6.x = 3
r3 = q
r4 = r3.x
r5 = r1.x

Can produce r2 == r5 == 0 but r4 == 3, proving that p.x went from 0 to 3 and back to 0.

A good compiler would like to optimize out the redundant load of p.x in initializing r5 by just re-using the value already loaded into r2. We get the strange result if thread 1 sees memory in this order:

Evaluation      Computes        Why
--------------  --------------  ------------------------------------------------
r1 = p
r2 = r1.x       r2 == 0
r3 = q          r3 is p
p.x = 3                         Side-effect of thread 2
r4 = r3.x       r4 == 3
r5 = r2         r5 == 0         Optimized from r5 = r1.x because r2 == r1.x.
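The order above can be replayed in a single thread to confirm the values in the Computes column. This is only an illustrative sketch: the class name Obj is a stand-in, and the "optimization" is written out by hand as r5 = r2.

```python
class Obj:
    """Stand-in for the shared object; only the attribute x matters."""

    def __init__(self):
        self.x = 0


p = Obj()
q = p                 # initially p is q and p.x == 0

# Thread 1 as rewritten by the hypothetical optimizer, with thread 2's
# write interleaved at the point shown in the table.
r1 = p
r2 = r1.x             # reads 0
r3 = q                # same object as p
r6 = p
r6.x = 3              # thread 2 runs here
r4 = r3.x             # reads 3
r5 = r2               # reuses the earlier load instead of re-reading r1.x

print(r2, r4, r5)
```

A fresh read of r1.x at the last step would have returned 3, so the stale 0 in r5 comes only from reusing r2.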

Inconsistent Orderings

From N2177: Sequential Consistency for Atomics[14], and also known as Independent Read of Independent Write (IRIW).

Initially, a == b == 0.

Thread 1    Thread 2    Thread 3    Thread 4
----------  ----------  ----------  ----------
r1 = a      r3 = b      a = 1       b = 1
r2 = b      r4 = a

We may get r1 == r3 == 1 and r2 == r4 == 0, proving both that a was written before b (thread 1's data), and that b was written before a (thread 2's data). See Special Relativity for a real-world example.

This can happen if thread 1 and thread 3 are running on processors that are close to each other, but far away from the processors that threads 2 and 4 are running on and the writes are not being transmitted all the way across the machine before becoming visible to nearby threads.

Neither acquire/release semantics nor explicit memory barriers can help with this. Making the orders consistent without locking requires detailed knowledge of the architecture's memory model, but Java requires it for volatiles so we could use documentation aimed at its implementers.

A happens-before race that's not a sequentially-consistent race

From the POPL paper about the Java memory model [1].

Initially, x == y == 0.

Thread 1        Thread 2
--------------  --------------
r1 = x          r2 = y
if r1 != 0:     if r2 != 0:
    y = 42          x = 42

Can r1 == r2 == 42???

In a sequentially-consistent execution, there's no way to get an adjacent read and write to the same variable, so the program should be considered correctly synchronized (albeit fragile), and should only produce r1 == r2 == 0. However, the following execution is happens-before consistent:

Statement       Value   Thread
--------------  ------  ------
r1 = x          42      1
if r1 != 0:     true    1
y = 42                  1
r2 = y          42      2
if r2 != 0:     true    2
x = 42                  2

WTF, you are asking yourself. Because there were no inter-thread happens-before edges in the original program, the read of x in thread 1 can see any of the writes from thread 2, even if they only happened because the read saw them. There are data races in the happens-before model.

We don't want to allow this, so the happens-before model isn't enough for Python. One rule we could add to happens-before that would prevent this execution is:

    If there are no data races in any sequentially-consistent execution of a program, the program should have sequentially consistent semantics.

Java gets this rule as a theorem, but Python may not want all of the machinery you need to prove it.
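The claim that every sequentially consistent execution of this example ends with r1 == r2 == 0 can be checked by brute force. The sketch below is an illustration only: it models each thread as two atomic steps (the read, then the conditional write), which is a simplifying assumption, and the helper names interleavings, t1_read, and so on are invented here.

```python
def interleavings(a, b):
    """Yield every merge of lists a and b that preserves each list's order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


# Each step mutates one dict holding both shared and local variables.
def t1_read(s):
    s["r1"] = s["x"]


def t1_write(s):
    if s["r1"] != 0:
        s["y"] = 42


def t2_read(s):
    s["r2"] = s["y"]


def t2_write(s):
    if s["r2"] != 0:
        s["x"] = 42


thread1 = [t1_read, t1_write]
thread2 = [t2_read, t2_write]

outcomes = set()
for schedule in interleavings(thread1, thread2):
    state = {"x": 0, "y": 0, "r1": None, "r2": None}
    for step in schedule:
        step(state)
    outcomes.add((state["r1"], state["r2"]))

print(outcomes)
```

All six interleavings give the same result, which is why the 42/42 outcome can only come from a model weaker than sequential consistency.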

Self-justifying values

Also from the POPL paper about the Java memory model [1].

Initially, x == y == 0.

Thread 1        Thread 2
--------------  --------------
r1 = x          r2 = y
y = r1          x = r2

Can x == y == 42???

In a sequentially consistent execution, no. In a happens-before consistent execution, yes: The read of x in thread 1 is allowed to see the value written in thread 2 because there are no happens-before relations between the threads. This could happen if the compiler or processor transforms the code into:

Thread 1        Thread 2
--------------  --------------
y = 42          r2 = y
r1 = x          x = r2
if r1 != 42:
    y = r1

It can produce a security hole if the speculated value is a secret object, or points to the memory that an object used to occupy. Java cares a lot about such security holes, but Python may not.

Uninitialized values (direct)

From several classic double-checked locking examples.

Initially, d == None.

Thread 1            Thread 2
------------------  --------------
while not d: pass   d = [3, 4]
assert d[1] == 4

This could raise an IndexError, fail the assertion, or, without some care in the implementation, cause a crash or other undefined behavior.

Thread 2 may actually be implemented as:

Because the assignment to d and the item assignments are independent, the compiler and processor may optimize that to:

Which is obviously incorrect and explains the IndexError. If we then look deeper into the implementation of r1.append(3), we may find that it and d[1] cannot run concurrently without causing their own race conditions. In CPython (without the GIL), those race conditions would produce undefined behavior.
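The two listings referred to above are not shown. The following is a plausible reconstruction, assuming the list is built through a local r1 with two append calls (the text mentions r1.append(3)). The publish callback and the observe helper are hypothetical additions that record what a reader would see at the moment d is assigned.

```python
def thread2_as_written(publish):
    r1 = list()
    r1.append(3)
    r1.append(4)
    publish(r1)            # d = r1


def thread2_reordered(publish):
    r1 = list()
    publish(r1)            # d = r1, hoisted above the appends
    r1.append(3)
    r1.append(4)


def observe(thread2):
    """Return a snapshot of the list as it looks when d is assigned."""
    seen = []
    thread2(lambda d: seen.append(list(d)))
    return seen[0]


as_written = observe(thread2_as_written)
reordered = observe(thread2_reordered)
print(as_written, reordered)
```

In the reordered version the reader can pick up an empty list, and indexing it with d[1] raises the IndexError the text describes.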

There's also a subtle issue on the reading side that can cause the value of d[1] to be out of date. Somewhere in the implementation of list, it stores its contents as an array in memory. This array may happen to be in thread 1's cache. If thread 1's processor reloads d from main memory without reloading the memory that ought to contain the values 3 and 4, it could see stale values instead. As far as I know, this can only actually happen on Alphas and maybe Itaniums, and we probably have to prevent it anyway to avoid crashes.

Uninitialized values (flag)

From several more double-checked locking examples.

Initially, d == dict() and initialized == False.

Thread 1                     Thread 2
---------------------------  ------------------
while not initialized: pass  d['a'] = 3
r1 = d['a']                  initialized = True
r2 = r1 == 3
assert r2

This could raise a KeyError, fail the assertion, or, without some care in the implementation, cause a crash or other undefined behavior.

Because d and initialized are independent (except in the programmer's mind), the compiler and processor can rearrange these almost arbitrarily, except that thread 1's assertion has to stay after the loop.
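One way to avoid this hazard is to replace the plain boolean with a synchronization object. The sketch below uses threading.Event, which in CPython is built on a lock-protected condition. Under the happens-before model described earlier, that should order the write to d before the read, but treating Event.set() and Event.wait() as a release/acquire pair is an assumption here, not something the PEP states. The result list is added only so the outcome can be checked.

```python
import threading

d = dict()
initialized = threading.Event()    # replaces the unsynchronized boolean flag
result = []                        # added for checking


def thread1():
    initialized.wait()             # blocks until thread 2 calls set()
    r1 = d['a']
    result.append(r1 == 3)


def thread2():
    d['a'] = 3
    initialized.set()


t1 = threading.Thread(target=thread1)
t2 = threading.Thread(target=thread2)
t1.start()
t2.start()
t1.join()
t2.join()
print(result)
```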

Inconsistent guarantees from relying on data dependencies

This is a problem with Java final variables and the proposed data-dependency ordering[15] in C++0x.

First execute:

Then in two threads:

Thread 1               Thread 2
---------------------  --------------
while not h: pass      r1 = Init()
assert h == [1,2,3]    freeze(r1)
assert h == g          h = r1

If h has semantics similar to a Java final variable (except for being write-once), then even though the first assertion is guaranteed to succeed, the second could fail.

Data-dependent guarantees like those final provides only work if the access is through the final variable. It's not even safe to access the same object through a different route. Unfortunately, because of how processors work, final's guarantees are only cheap when they're weak.

The rules for Python

The first rule is that Python interpreters can't crash due to race conditions in user code. For CPython, this means that race conditions can't make it down into C. For Jython, it means that NullPointerExceptions can't escape the interpreter.

Presumably we also want a model at least as strong as happens-before consistency because it lets us write a simple description of how concurrent queues and thread launching and joining work.

Other rules are more debatable, so I'll present each one with pros and cons.

Data-race-free programs are sequentially consistent

We'd like programmers to be able to reason about their programs as if they were sequentially consistent. Since it's hard to tell whether you've written a happens-before race, we only want to require programmers to prevent sequential races. The Java model does this through a complicated definition of causality, but if we don't want to include that, we can just assert this property directly.

No security holes from out-of-thin-air reads

If the program produces a self-justifying value, it could expose access to an object that the user would rather the program not see. Again, Java's model handles this with the causality definition. We might be able to prevent these security problems by banning speculative writes to shared variables, but I don't have a proof of that, and Python may not need those security guarantees anyway.

Restrict reorderings instead of defining happens-before

The .NET [3] and x86 [4] memory models are based on defining which reorderings compilers may allow. I think that it's easier to program to a happens-before model than to reason about all of the possible reorderings of a program, and it's easier to insert enough happens-before edges to make a program correct, than to insert enough memory fences to do the same thing. So, although we could layer some reordering restrictions on top of the happens-before base, I don't think Python's memory model should be entirely reordering restrictions.

Atomic, unordered assignments

Assignments of primitive types are already atomic. If you assign 3<<72 + 5 to a variable, no thread can see only part of the value. Jeremy Manson suggested that we extend this to all objects. This allows compilers to reorder operations to optimize them, without allowing some of the more confusing uninitialized values. The basic idea here is that when you assign a shared variable, readers can't see any changes made to the new value before the assignment, or to the old value after the assignment. So, if we have a program like:

Initially, (d.a, d.b) == (1, 2), and (e.c, e.d) == (3, 4). We also have class Obj(object): pass.

Thread 1        Thread 2
--------------  ----------------------
r1 = Obj()      r3 = d
r1.a = 3        r4, r5 = r3.a, r3.b
r1.b = 4        r6 = e
d = r1          r7, r8 = r6.c, r6.d
r2 = Obj()
r2.c = 6
r2.d = 7
e = r2

(r4, r5) can be (1, 2) or (3, 4) but nothing else, and (r7, r8) can be either (3, 4) or (6, 7) but nothing else. Unlike if writes were releases and reads were acquires, it's legal for thread 2 to see (e.c, e.d) == (6, 7) and (d.a, d.b) == (1, 2) (out of order).

This allows the compiler a lot of flexibility to optimize without allowing users to see some strange values. However, because it relies on data dependencies, it introduces some surprises of its own. For example, the compiler could freely optimize the above example to:

Thread 1        Thread 2
--------------  ----------------------
r1 = Obj()      r3 = d
r2 = Obj()      r6 = e
r1.a = 3        r4, r7 = r3.a, r6.c
r2.c = 6        r5, r8 = r3.b, r6.d
r2.d = 7
e = r2
r1.b = 4
d = r1

As long as it didn't let the initialization of e move above any of the initializations of members of r2, and similarly for d and r1.
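The publication pattern in the first table can be tried out with real threads. The sketch below runs on CPython, whose GIL already provides at least this much, so it illustrates the intended guarantee and does not test the proposal itself. The seen set, the loop count, and the thread variable names are additions for illustration.

```python
import threading


class Obj(object):
    pass


d = Obj()
d.a, d.b = 1, 2


def writer():
    global d
    r1 = Obj()             # build the new value privately...
    r1.a = 3
    r1.b = 4
    d = r1                 # ...then publish it with a single assignment


seen = set()


def reader():
    for _ in range(10000):
        r3 = d                     # read the shared variable once
        seen.add((r3.a, r3.b))     # then read fields through the local copy


reader_thread = threading.Thread(target=reader)
writer_thread = threading.Thread(target=writer)
reader_thread.start()
writer_thread.start()
reader_thread.join()
writer_thread.join()
print(seen)
```

Because the reader takes one snapshot of d and reads both fields through it, it can only observe the old pair or the new pair, never a mixture.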

This also helps to ground happens-before consistency. To see the problem, imagine that the user unsafely publishes a reference to an object as soon as she gets it. The model needs to constrain what values can be read through that reference. Java says that every field is initialized to 0 before anyone sees the object for the first time, but Python would have trouble defining 'every field'. If instead we say that assignments to shared variables have to see a value at least as up to date as when the assignment happened, then we don't run into any trouble with early publication.

Two tiers of guarantees

Most other languages with any guarantees for unlocked variables distinguish between ordinary variables and volatile/atomic variables. They provide many more guarantees for the volatile ones. Python can't easily do this because we don't declare variables. This may or may not matter, since python locks aren't significantly more expensive than ordinary python code. If we want to get those tiers back, we could:

  1. Introduce a set of atomic types similar to Java's [5] or C++'s [6]. Unfortunately, we couldn't assign to them with =.
  2. Without requiring variable declarations, we could also specify that all of the fields on a given object are atomic.
  3. Extend the __slots__ mechanism [7] with a parallel __volatiles__ list, and maybe a __finals__ list.
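For context on the third option, this is how __slots__ behaves today: it is the one place where a Python class declares its instance variables up front, which is why it is a natural hook for per-field guarantees. The __volatiles__ and __finals__ lists are proposals only and do not exist in Python. The class name Point and the variables has_dict and rejected are made up for this sketch.

```python
class Point:
    __slots__ = ("x", "y")     # the only instance attributes allowed

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)
has_dict = hasattr(p, "__dict__")   # slotted instances carry no per-instance dict

try:
    p.z = 3                         # not declared in __slots__
    rejected = False
except AttributeError:
    rejected = True

print(has_dict, rejected)
```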

Sequential Consistency

We could just adopt sequential consistency for Python. This avoids all of the hazards mentioned above, but it prohibits lots of optimizations too. As far as I know, this is the current model of CPython, but if CPython learned to optimize out some variable reads, it would lose this property.

If we adopt this, Jython's dict implementation may no longer be able to use ConcurrentHashMap because that only promises to create appropriate happens-before edges, not to be sequentially consistent (although maybe the fact that Java volatiles are totally ordered carries over). Both Jython and IronPython would probably need to use AtomicReferenceArray or the equivalent for any __slots__ arrays.

Adapt the x86 model

The x86 model is:

  1. Loads are not reordered with other loads.
  2. Stores are not reordered with other stores.
  3. Stores are not reordered with older loads.
  4. Loads may be reordered with older stores to different locations but not with older stores to the same location.
  5. In a multiprocessor system, memory ordering obeys causality (memory ordering respects transitive visibility).
  6. In a multiprocessor system, stores to the same location have a total order.
  7. In a multiprocessor system, locked instructions have a total order.
  8. Loads and stores are not reordered with locked instructions.

In acquire/release terminology, this appears to say that every store is a release and every load is an acquire. This is slightly weaker than sequential consistency, in that it allows inconsistent orderings, but it disallows zombie values and the compiler optimizations that produce them. We would probably want to weaken the model somehow to explicitly allow compilers to eliminate redundant variable reads. The x86 model may also be expensive to implement on other platforms, although because x86 is so common, that may not matter much.

Upgrading or downgrading to an alternate model

We can adopt an initial memory model without totally restricting future implementations. If we start with a weak model and want to get stronger later, we would only have to change the implementations, not programs. Individual implementations could also guarantee a stronger memory model than the language demands, although that could hurt interoperability. On the other hand, if we start with a strong model and want to weaken it later, we can add a from __future__ import weak_memory statement to declare that some modules are safe.

Implementation Details

The required model is weaker than any particular implementation. This section tries to document the actual guarantees each implementation provides, and should be updated as the implementations change.

CPython

Uses the GIL to guarantee that other threads don't see funny reorderings, and does few enough optimizations that I believe it's actually sequentially consistent at the bytecode level. Threads can switch between any two bytecodes (instead of only between statements), so two threads that concurrently execute:

with i initially 0 could easily end up with i == 1 instead of the expected i == 2. If they execute:

instead, CPython 2.6 will always give the right answer, but it's easy to imagine another implementation in which this statement won't be atomic.
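The statements this paragraph refers to are not shown. Assuming they are increments of a shared global such as i = i + 1, the sketch below uses the standard dis module to show that such a statement compiles to several bytecodes, so a thread switch can fall between the load and the store. It then shows the lock-based form that gives the expected total on any implementation. The function names and the loop count are illustrative, and the exact opcode names vary between CPython versions.

```python
import dis
import threading

i = 0


def increment():
    global i
    i = i + 1


# The single statement expands to separate load, add, and store instructions.
opnames = [ins.opname for ins in dis.get_instructions(increment)]
print(opnames)

lock = threading.Lock()


def locked_worker(n):
    global i
    for _ in range(n):
        with lock:             # makes the read-modify-write indivisible
            i = i + 1


threads = [threading.Thread(target=locked_worker, args=(10000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(i)
```

With the lock held around each increment, the two workers always leave i at 20000, regardless of where the interpreter switches threads.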

PyPy

Also uses a GIL, but probably does enough optimization to violate sequential consistency. I know very little about this implementation.

Jython

Provides true concurrency under the Java memory model[16] and stores all object fields (except for those in __slots__?) in a ConcurrentHashMap, which provides fairly strong ordering guarantees. Local variables in a function may have fewer guarantees, which would become visible if they were captured into a closure that was then passed to another thread.

IronPython

Provides true concurrency under the CLR memory model, which probably protects it from uninitialized values. IronPython uses a locked map to store object fields, providing at least as many guarantees as Jython.

[1] The Java Memory Model, by Jeremy Manson, Bill Pugh, and Sarita Adve (http://www.cs.umd.edu/users/jmanson/java/journal.pdf). This paper is an excellent introduction to memory models in general and has lots of examples of compiler/processor optimizations and the strange program behaviors they can produce.
[2] N2480: A Less Formal Explanation of the Proposed C++ Concurrency Memory Model, Hans Boehm (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2480.html)
[3] Memory Models: Understand the Impact of Low-Lock Techniques in Multithreaded Apps, Vance Morrison (http://msdn2.microsoft.com/en-us/magazine/cc163715.aspx)
[4] Intel(R) 64 Architecture Memory Ordering White Paper (http://www.intel.com/products/processor/manuals/318147.pdf)
[5] Package java.util.concurrent.atomic (http://java.sun.com/javase/6/docs/api/java/util/concurrent/atomic/package-summary.html)
[6] C++ Atomic Types and Operations, Hans Boehm and Lawrence Crowl (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2427.html)
[7] __slots__ (http://docs.python.org/ref/slots.html)
[8] Alternatives to SC, a thread on the cpp-threads mailing list, which includes lots of good examples. (http://www.decadentplace.org.uk/pipermail/cpp-threads/2007-January/001287.html)
[9] python-safethread, a patch by Adam Olsen for CPython that removes the GIL and statically guarantees that all objects shared between threads are consistently locked. (http://code.google.com/p/python-safethread/)
[10] http://www.jython.org/
[11] http://www.codeplex.com/Wiki/View.aspx?ProjectName=IronPython
[12] http://codespeak.net/pypy/dist/pypy/doc/home.html
[13] http://en.wikipedia.org/wiki/Global_Interpreter_Lock
[14] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2177.html
[15] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2556.html
[16] http://java.sun.com/docs/books/jls/third_edition/html/memory.html

Thanks to Jeremy Manson and Alex Martelli for detailed discussions on what this PEP should look like.

This document has been placed in the public domain.

Source: https://github.com/python/peps/blob/master/pep-0583.rst