python slots

Posted on 2015-11-08 | Category: Things About Python

The previous post was a translation of an article on using __slots__ in Python to save memory. Today we look at how Python implements __slots__ internally.

>>> class A(object):
...     __slots__ = ["v"]
...
>>> a = A() 
>>> a.v = 1
>>> a.b = 1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'A' object has no attribute 'b'
>>> type(A.v)
<type 'member_descriptor'>

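In the code above, class A declares __slots__. Assigning to a.v works, but assigning to a.b raises an AttributeError. A closer look at A.v shows that it is a descriptor (when I have time I may write a post on Python descriptors, since many Python features are built on them).
So what exactly does Python do when it builds class A? Let's work through it step by step.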

Creating class A

The bytecode that builds a class in Python is BUILD_CLASS:

/* in ceval.c (CPython 2.x) */
case BUILD_CLASS:
      u = TOP();
      v = SECOND();
      w = THIRD();
      STACKADJ(-2);
      x = build_class(u, v, w);
      SET_TOP(x);
      Py_DECREF(u);
      Py_DECREF(v);
      Py_DECREF(w);

Following the calls from this entry point, we eventually arrive at a function named type_new:

  /* in typeobject.c */
  static PyObject *
  type_new(PyTypeObject *metatype, PyObject *args, PyObject *kwds) {
        ....
        // dict can be thought of as the locals of the class being created; it contains the __slots__ entry
        if (!PyArg_ParseTupleAndKeywords(args, kwds, "SO!O!:type", kwlist,
                                 &name,
                                 &PyTuple_Type, &bases,
                                 &PyDict_Type, &dict))
            return NULL;
        ....
        // fetch the __slots__ entry
        slots = PyDict_GetItemString(dict, "__slots__");
        nslots = 0;
        add_dict = 0;    // whether to add __dict__
        add_weak = 0;    // whether to add __weakref__

        if (slots == NULL) {
              //... ignore
        } else {
              // slots is processed here, including computing nslots; the details do not affect our understanding of the implementation
        }

        // allocate space for the class being created
        type = (PyTypeObject *)metatype->tp_alloc(metatype, nslots);
        ....
        // cast the pointer to PyHeapTypeObject; a PyHeapTypeObject can be seen as containing a PyTypeObject (the details deserve a post of their own)
        et = (PyHeapTypeObject *)type;
        ....
        // turn the names listed in slots into members
        mp = PyHeapType_GET_MEMBERS(et);
        slotoffset = base->tp_basicsize;
        if (slots != NULL) {
            for (i = 0; i < nslots; i++, mp++) {
                mp->name = PyString_AS_STRING(
                              PyTuple_GET_ITEM(slots, i));
                mp->type = T_OBJECT_EX;
                mp->offset = slotoffset;
                ....
                slotoffset += sizeof(PyObject *);
             }
         }
         ....
         type->tp_members = PyHeapType_GET_MEMBERS(et);
         ....
         PyType_Ready(type);
         ....
}

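The listing above omits some unrelated code. Its main job is to store the entries defined by __slots__ at the end of the type object and to point the type's tp_members field at that area. These entries are used later in PyType_Ready.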
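The effect of `slotoffset` starting at `base->tp_basicsize` and growing by `sizeof(PyObject *)` per slot can be observed from Python. The following is a minimal sketch, assuming CPython 3 (not the 2.x source quoted in this post) on a platform where the layout still reserves one pointer per slot after the base layout. The class names are made up for illustration.

```python
import struct

class NoSlots(object):
    __slots__ = ()          # no extra storage beyond the base object

class TwoSlots(object):
    __slots__ = ("a", "b")  # two slots, so two extra pointers per instance

# Size of a C pointer on this platform, i.e. sizeof(PyObject *).
pointer_size = struct.calcsize("P")

# __basicsize__ exposes tp_basicsize; each slot adds one pointer to it.
extra = TwoSlots.__basicsize__ - NoSlots.__basicsize__
print(extra, 2 * pointer_size)
```

The two printed numbers should match, which is consistent with each slot occupying exactly one `PyObject *` in the instance.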
We skip the body of PyType_Ready here and only note that it calls add_members. Here is the implementation of add_members:

/* in typeobject.c */
static int
add_members(PyTypeObject *type, PyMemberDef *memb) {
    PyObject *dict = type->tp_dict;
    for (; memb->name != NULL; memb++) {
          PyObject *descr;
          if (PyDict_GetItemString(dict, memb->name))
              continue;
          descr = PyDescr_NewMember(type, memb);
          if (descr == NULL)
              return -1;
          if (PyDict_SetItemString(dict, memb->name, descr) < 0)
              return -1;
          Py_DECREF(descr);
      }
      return 0;
}

The logic above is easy to follow: each entry in memb (that is, each slot name) is wrapped in a member_descriptor and added to tp_dict.
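The result of add_members is visible from Python: the class dictionary holds a descriptor for each slot, and that descriptor can be driven by hand. This is a small sketch, assuming CPython 3; class A mirrors the one at the top of the post.

```python
class A(object):
    __slots__ = ["v"]

# The entry that add_members stored in the class dictionary (tp_dict).
descr = A.__dict__["v"]
descr_type_name = type(descr).__name__

a = A()
descr.__set__(a, 1)           # same effect as a.v = 1
value = descr.__get__(a, A)   # same effect as reading a.v

# Instances of a slotted class have no per-instance __dict__.
has_instance_dict = hasattr(a, "__dict__")
print(descr_type_name, value, has_instance_dict)
```

Calling `__set__` and `__get__` directly is only for demonstration; ordinary attribute access goes through the same descriptor.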

Setting attributes on instances of class A

Attribute assignment on an instance ends up in PyObject_GenericSetAttr, which in turn calls _PyObject_GenericSetAttrWithDict. Let's see what _PyObject_GenericSetAttrWithDict does to produce the behavior shown at the start of this post.

int
_PyObject_GenericSetAttrWithDict(PyObject *obj, PyObject *name,
                             PyObject *value, PyObject *dict) {
    PyTypeObject *tp = Py_TYPE(obj);
    descr = _PyType_Lookup(tp, name);
    f = NULL;
    if (descr != NULL &&
        PyType_HasFeature(descr->ob_type,Py_TPFLAGS_HAVE_CLASS)) {
        f = descr->ob_type->tp_descr_set;
        // attributes declared in __slots__ take this branch
        if (f != NULL && PyDescr_IsData(descr)) {
            res = f(descr, obj, value);
            goto done;
        }
    }
    ....
    if (dict == NULL) {
        // for attributes not provided by __slots__, try the object's own dict;
        // but because the class was built with __slots__, tp_dictoffset is 0,
        // so dictptr is NULL
        dictptr = _PyObject_GetDictPtr(obj);
        if (dictptr != NULL) {
              .... 
        }
    }
    if (dict != NULL) {....}
    if (f != NULL) {....}
    if (descr == NULL) {
        // report the error here
        PyErr_Format(PyExc_AttributeError,
                 "'%.100s' object has no attribute '%.200s'",
                 tp->tp_name, PyString_AS_STRING(name));
        goto done;
    }
    ....
}

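For a name defined in __slots__, the lookup finds the corresponding descriptor and calls its set method. For a name not defined in __slots__, execution eventually reaches the branch that raises the exception.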
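The lookup order described above can be modeled in a few lines of Python. This is a simplified sketch, not a port of the C function: `generic_setattr` is a hypothetical helper, it ignores the caller-supplied dict argument, and it does not reproduce the separate "read-only" error that CPython raises for non-data descriptors. It assumes Python 3.

```python
def generic_setattr(obj, name, value):
    """Simplified model of the store path: descriptor first, then __dict__."""
    tp = type(obj)
    # 1. Look the name up on the type and its bases (the role of _PyType_Lookup).
    descr = None
    for klass in tp.__mro__:
        if name in vars(klass):
            descr = vars(klass)[name]
            break
    # 2. A data descriptor (its type defines __set__) handles the store itself.
    if descr is not None and hasattr(type(descr), "__set__"):
        type(descr).__set__(descr, obj, value)
        return
    # 3. Otherwise fall back to the instance __dict__, if the object has one.
    try:
        instance_dict = object.__getattribute__(obj, "__dict__")
    except AttributeError:
        instance_dict = None
    if instance_dict is not None:
        instance_dict[name] = value
        return
    # 4. No usable descriptor and no __dict__: the store fails.
    raise AttributeError(
        "%r object has no attribute %r" % (tp.__name__, name))


class A(object):
    __slots__ = ["v"]

class B(object):
    pass

a, b = A(), B()
generic_setattr(a, "v", 1)   # handled by the member_descriptor for v
generic_setattr(b, "x", 2)   # stored in b.__dict__
```

Calling `generic_setattr(a, "b", 1)` reaches step 4 and raises AttributeError, which matches the traceback at the top of the post.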

Summary

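The implementation of __slots__ is fairly easy to follow. When Python builds a class, it checks whether __slots__ is defined. If it is, Python creates a member_descriptor for each slot name and stores it in tp_dict. The class's tp_dictoffset is 0, so its instances do not have their own __dict__.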

The libuv Timer module

Posted on 2015-11-08

Timer handles are used to schedule callbacks to be called in the future.

The Timer module is one of the simpler parts of libuv, so this walkthrough of libuv starts there.

Data structures

The handle type of the Timer module (think of it as the data structure of a timer instance) is uv_timer_s, defined as follows:

struct uv_timer_s {
    UV_HANDLE_FIELDS
    UV_TIMER_PRIVATE_FIELDS
};

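#define UV_TIMER_PRIVATE_FIELDS                                               \
  uv_timer_cb timer_cb;    /* callback invoked when the timer expires */      \
  void* heap_node[3];      /* node linking the timer into the min-heap */     \
  uint64_t timeout;        /* timer's timeout: the loop time at which it fires */ \
  uint64_t repeat;         /* repeat interval; 0 means fire only once */      \
  uint64_t start_id;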

API

1. uv_timer_init

The implementation of uv_timer_init is very simple.

/* file: timer.c */
int uv_timer_init(uv_loop_t* loop, uv_timer_t* handle) {
    uv__handle_init(loop, (uv_handle_t*)handle, UV_TIMER);   /* [1] */
    handle->timer_cb = NULL;
    handle->repeat = 0;
    return 0;
}

Line [1] initializes the uv_handle_t. Here is the definition of the uv__handle_init macro.

#define uv__handle_init(loop_, h, type_)                                      \
    do {                                                                      \
        (h)->loop = (loop_);                                                  \
        (h)->type = (type_);                                                  \
        (h)->flags = UV__HANDLE_REF;                                          \
        QUEUE_INSERT_TAIL(&(loop_)->handle_queue, &(h)->handle_queue); /* [1] */ \
        uv__handle_platform_init(h);                                          \
    }                                                                         \
    while (0)

The most important step here is appending h (a uv_handle_t) to the loop->handle_queue queue.

2. uv_timer_start

The main job of uv_timer_start is to insert the uv_timer_t handle into a min-heap maintained by the loop (timer_heap).

int uv_timer_start(uv_timer_t* handle,
               uv_timer_cb cb,
               uint64_t timeout,
               uint64_t repeat) {
    uint64_t clamped_timeout;

    if (cb == NULL)
        return -EINVAL;

    if (uv__is_active(handle))
        uv_timer_stop(handle);

    ......
    heap_insert((struct heap*) &handle->loop->timer_heap,
                (struct heap_node*) &handle->heap_node,
                timer_less_than);
    uv__handle_start(handle);

    return 0;
}
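To make the role of the min-heap concrete, here is a toy model in Python built on the standard heapq module. It is only a sketch of the idea, not libuv's API: `TimerLoop`, `timer_start` and `run_timers` are hypothetical names, and the ordering by due time and then by a start counter is an assumption about what timer_less_than compares.

```python
import heapq
import itertools


class TimerLoop:
    """Toy stand-in for the timer part of an event loop (not libuv's API)."""

    def __init__(self):
        self.time = 0                      # current loop time, in ms
        self._heap = []                    # min-heap of pending timers
        self._counter = itertools.count()  # plays the role of start_id

    def timer_start(self, cb, timeout, repeat=0):
        if cb is None:
            raise ValueError("callback is required")
        due = self.time + timeout
        # Entries are ordered by due time, then by insertion order.
        heapq.heappush(self._heap, (due, next(self._counter), cb, repeat))

    def run_timers(self, now):
        """Advance the clock and fire every timer that is due."""
        self.time = now
        while self._heap and self._heap[0][0] <= now:
            _, _, cb, repeat = heapq.heappop(self._heap)
            cb()
            if repeat:
                # Re-arm a repeating timer relative to the current time.
                self.timer_start(cb, repeat, repeat)


fired = []
loop = TimerLoop()
loop.timer_start(lambda: fired.append("slow"), 20)
loop.timer_start(lambda: fired.append("fast"), 10)
loop.timer_start(lambda: fired.append("fast-2"), 10)
loop.run_timers(15)   # fired is now ['fast', 'fast-2']
```

Because the earliest timer always sits at the top of the heap, the loop only has to inspect one entry to know whether anything is due, and inserting a timer costs O(log n).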

3. uv_timer_stop

uv_timer_stop does even less: it removes the timer from timer_heap and stops the handle.

Summary

Timer is a very simple module within libuv.

Notes on the Python inspect module

Posted on 2015-11-08 | Category: Things About Python

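As its name suggests, Python's inspect module is used to examine basic information about live modules and objects. With it we can do many interesting things. This post looks at a few parts of the module.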

inspect.getmembers

def getmembers(object, predicate=None):
    """Return all members of an object as (name, value) pairs sorted by name.
    Optionally, only return members that satisfy a given predicate."""
    results = []
    # use the built-in dir() to collect all attribute names
    for key in dir(object):
        try:
            value = getattr(object, key)
        except AttributeError:
            continue
        # apply the predicate if one was given
        if not predicate or predicate(value):
            results.append((key, value))
    results.sort()
    return results

The implementation of getmembers is very simple: internally it is built on the built-in function dir.
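A short usage sketch shows how the predicate narrows the result. It assumes Python 3, where functions defined in a class body satisfy `inspect.isfunction` when looked up on the class; the `Shape` class is invented for the example.

```python
import inspect


class Shape(object):
    sides = 4

    def area(self):
        return 0

    def perimeter(self):
        return 0


# Without a predicate every attribute reported by dir() is returned.
all_names = [name for name, _ in inspect.getmembers(Shape)]

# With a predicate only the members it accepts are kept.
method_names = [name for name, _ in
                inspect.getmembers(Shape, inspect.isfunction)]
print(method_names)
```

The names come back sorted, so `method_names` lists `area` before `perimeter`, while `all_names` also contains `sides` and the inherited dunder attributes.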

inspect.currentframe

currentframe is implemented on top of sys._getframe. When using currentframe, take care to avoid creating reference cycles.

import inspect

def handle_stackframe_without_leak():
    frame = inspect.currentframe()
    try:
        # do something with the frame
        pass
    finally:
        # drop the reference so the frame and its locals do not form a cycle
        del frame
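The relationship with sys._getframe can be checked directly. The sketch below assumes CPython, where sys._getframe is available; the two helper functions are hypothetical names used only for this demonstration.

```python
import inspect
import sys


def current_function_name():
    """Read the running function's name from its own frame."""
    frame = inspect.currentframe()
    try:
        return frame.f_code.co_name
    finally:
        # Same pattern as above: release the frame reference promptly.
        del frame


def same_frame():
    # Both calls should return the frame of same_frame itself.
    return inspect.currentframe() is sys._getframe()


print(current_function_name(), same_frame())
```

On implementations without sys._getframe, inspect.currentframe returns None, so code that must be portable should check for that before touching frame attributes.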

Using inspect

Getting the instance of the calling function

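# file: util.py
import inspect

def get_caller():
    """Return the first local (conventionally self) of the function two frames up."""
    frame = inspect.currentframe()
    try:
        # f_back is the function that called get_caller(); f_back.f_back is its caller
        call_frame = frame.f_back.f_back
        # the first local variable of a method is its instance
        call_frame_name = call_frame.f_code.co_varnames[0]
        call_frame_self = call_frame.f_locals.get(call_frame_name, None)
    except Exception:
        # no such frame, or the frame has no local variables
        call_frame_self = None
    finally:
        del frame
    return call_frame_self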
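A usage sketch makes the two-frame hop clearer. The block repeats a compact version of get_caller so that it runs on its own; `audit` and `Service` are hypothetical names, and the example relies on the convention that the first local variable of a method is the instance.

```python
import inspect


def get_caller():
    """Compact copy of the helper above, repeated so this block is runnable."""
    frame = inspect.currentframe()
    try:
        call_frame = frame.f_back.f_back
        name = call_frame.f_code.co_varnames[0]
        return call_frame.f_locals.get(name)
    except Exception:
        return None
    finally:
        del frame


def audit():
    # One hop: get_caller -> audit. Two hops: audit -> whoever called audit.
    return get_caller()


class Service(object):
    def run(self):
        return audit()


svc = Service()
caller = svc.run()
print(caller is svc)
```

Here `audit` does not need to receive the instance as an argument: the frame two levels up belongs to `Service.run`, whose first local is `self`.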