A neat defaultdict trick I discovered recently

>>> from collections import defaultdict
>>> ind = defaultdict(lambda: len(ind))
>>> ind["test_a"]
0
>>> ind["test_b"]
1
>>> ind["test_a"]
0
>>> ind["test_z"]
2

15 replies 2021-03-09 17:19:40 +08:00

ClericPy

2021-03-07 00:52:50 +08:00

Interesting, it gets used a bit like an Enum.

aijam

2021-03-07 08:36:46 +08:00

shutongxinq

2021-03-07 09:15:57 +08:00

Interesting, nice one!

iConnect

2021-03-07 10:00:23 +08:00 via Android

Let's think it through: what are the use cases, and how is the performance?

laoyuan

2021-03-07 10:22:38 +08:00

Performance. Performance is the key.

24bit

2021-03-07 10:39:21 +08:00

Interesting.

Contextualist

2021-03-07 11:14:43 +08:00

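@iConnect @laoyuan
As far as CPython is concerned, defaultdict and dict are implemented almost identically; the former only adds a method for handling missing keys (__missing__). This means: 1) if the key being looked up exists, the lookup is as efficient as with a plain dict; 2) otherwise it calls len (O(1), because the size is an attribute the object maintains itself) and inserts a key-value pair.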
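
To make the mechanism above concrete, here is a minimal sketch. The class name `IndexDict` is hypothetical (it is not part of the standard library); it only mimics what defaultdict does through `__missing__`:

```python
class IndexDict(dict):
    """Hypothetical stand-in for defaultdict(lambda: len(ind))."""

    def __missing__(self, key):
        # dict.__getitem__ calls this only when the key is absent.
        value = len(self)  # O(1): the dict keeps track of its own size
        self[key] = value  # store it so later lookups skip __missing__
        return value


ind = IndexDict()
first = ind["test_a"]   # missing key, gets the next free index
second = ind["test_b"]  # missing key, gets the next free index
again = ind["test_a"]   # existing key, ordinary dict lookup
```

Existing keys never reach `__missing__`, which is why lookups of known keys should cost about the same as with a plain dict.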

abersheeran

2021-03-07 12:17:17 +08:00

Very interesting.

abersheeran

2021-03-07 12:18:09 +08:00

I have used this as a cache before. https://github.com/abersheeran/baize/blob/master/baize/wsgi.py#L40

jokeface

2021-03-07 14:47:30 +08:00

Why doesn't this raise an error? It feels like the variable ind shouldn't exist yet.

xoyo

2021-03-07 21:05:48 +08:00

@jokeface Deferred evaluation: the lambda body only looks up ind when it is called, and by then ind has been assigned.
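
A small sketch of that point, using a made-up variable `factory` to separate creating the lambda from calling it:

```python
from collections import defaultdict

# Creating the lambda does not run its body, so `ind` need not exist yet.
factory = lambda: len(ind)

try:
    factory()  # calling it before `ind` is bound does fail
except NameError:
    failed_early = True
else:
    failed_early = False

ind = defaultdict(factory)  # from here on the name `ind` is bound
value = ind["test_a"]       # the factory resolves `ind` at call time
```

The name is resolved each time the factory runs, not when the lambda is defined, so the original one-liner works as long as nothing asks for a missing key before the assignment finishes.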

xiaolinjia

2021-03-08 11:03:00 +08:00

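Since dicts became ordered in py37, you can get the index by insertion order like this. It has to be converted to a list first, though, so performance is poor.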
>>> a = {'a': 'aaa', 'b': 'bbb'}
>>> list(a).index('a')
0

no1xsyzy

2021-03-08 12:56:50 +08:00

@iConnect @laoyuan
Performance should even be better than
if key not in ind: ind[key] = len(ind)
because the process of calling __missing__ does not need to go through Python code.
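
For anyone who wants to measure this rather than guess, here is a rough timing sketch. The workload (`keys`) and the function names are invented for illustration, and the numbers will vary with the machine, the Python version, and the ratio of hits to misses:

```python
import timeit
from collections import defaultdict

# Made-up workload: 10,000 lookups over 1,000 distinct keys.
keys = [f"key_{i % 1000}" for i in range(10_000)]


def with_defaultdict():
    ind = defaultdict(lambda: len(ind))
    for key in keys:
        ind[key]  # a missing key triggers the factory
    return dict(ind)


def with_explicit_check():
    ind = {}
    for key in keys:
        if key not in ind:
            ind[key] = len(ind)
    return ind


t_default = timeit.timeit(with_defaultdict, number=20)
t_explicit = timeit.timeit(with_explicit_check, number=20)
print(f"defaultdict: {t_default:.4f}s  explicit check: {t_explicit:.4f}s")
```

Both functions build the same mapping, so any difference in the printed times comes from how the lookup and the fallback are dispatched.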

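What about thread safety, though? Non-preemptive coroutine scheduling such as asyncio is fine, but preemptively scheduled threads would probably cause problems.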

Contextualist

2021-03-08 14:03:50 +08:00

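@no1xsyzy Good point. I looked into it, and (in CPython) this does indeed seem not to be thread-safe, because if the factory function is Python code, the act of calling it is a thread switch point. See https://stackoverflow.com/a/17682555 for details. Going by the hint in that answer, maybe the inelegant version below would work?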
ind = defaultdict()
ind.default_factory = ind.__len__
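
If relying on that is uncomfortable, a different and more conservative approach is to guard every lookup with a lock. This is only a sketch under that assumption; `LockedIndex` and `worker` are hypothetical names, not something from the linked answer:

```python
import threading
from collections import defaultdict


class LockedIndex:
    """Hypothetical wrapper that serializes every lookup with a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._ind = defaultdict(lambda: len(self._ind))

    def __getitem__(self, key):
        # Only one thread at a time may look up or insert a key.
        with self._lock:
            return self._ind[key]


index = LockedIndex()


def worker():
    for i in range(1000):
        index[f"key_{i}"]


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# With the lock held, each distinct key gets exactly one index.
values = sorted(index._ind.values())
```

The lock adds overhead to every access, including hits, so this trades speed for not having to reason about where the interpreter may switch threads.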

vegetableChick

2021-03-09 17:19:40 +08:00

Awesome, awesome.