Document section

How to Use yield in Python: Optimizing the Handling of Huge Iterations

青鸾之旅
Published 2013/08/02 09:02

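I had long been puzzled by how to use yield. Today I came across a good explanation. It is entirely in English, but that doesn't matter: you can skip the prose and go straight to the code, which is enough to understand it.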

Improve Your Python: 'yield' and Generators Explained

Posted on Apr 07, 2013 by Jeff Knupp

Prior to beginning tutoring sessions, I ask new students to fill out a brief self-assessment where they rate their understanding of various Python concepts. Some topics ("control flow with if/else" or "defining and using functions") are understood by a majority of students before ever beginning tutoring. There are a handful of topics, however, that almost all students report having no knowledge or very limited understanding of. Of these, "generators and the yield keyword" is one of the biggest culprits. I'm guessing this is the case for most novice Python programmers.

Many report having difficulty understanding generators and the yield keyword even after making a concerted effort to teach themselves the topic. I want to change that. In this post, I'll explain what the yield keyword does, why it's useful, and how to use it.

Note: In recent years, generators have grown more powerful as features have been added through PEPs. In my next post, I'll explore the true power of yield with respect to coroutines, cooperative multitasking and asynchronous I/O (especially their use in the "tulip" prototype implementation GvR has been working on). Before we get there, however, we need a solid understanding of how the yield keyword and generators work.

Coroutines and Subroutines

When we call a normal Python function, execution starts at the function's first line and continues until a return statement, exception, or the end of the function (which is seen as an implicit return None) is encountered. Once a function returns control to its caller, that's it. Any work done by the function and stored in local variables is lost. A new call to the function creates everything from scratch.

This is all very standard when discussing functions (more generally referred to as subroutines) in computer programming. There are times, though, when it's beneficial to have the ability to create a "function" which, instead of simply returning a single value, is able to yield a series of values. To do so, such a function would need to be able to "save its work," so to speak.

I said, "yield a series of values" because our hypothetical function doesn't "return" in the normal sense. return implies that the function is returning control of execution to the point where the function was called. "Yield," however, implies that the transfer of control is temporary and voluntary, and our function expects to regain it in the future.

In Python, "functions" with these capabilities are called generators, and they're incredibly useful. Generators (and the yield statement) were initially introduced to give programmers a more straightforward way to write code responsible for producing a series of values. Previously, creating something like a random number generator required a class or module that both generated values and kept track of state between calls. With the introduction of generators, this became much simpler.
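To make the "class that keeps track of state" point concrete, here is a minimal sketch. The names Countdown and countdown are made up for illustration and do not come from the original article; the generator syntax used in the second half is explained later in the post. Both versions should produce the same values.

```python
class Countdown:
    """Hypothetical iterator class that stores its state on the instance."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


def countdown(start):
    """The same sequence as a generator function; the state lives in local variables."""
    current = start
    while current > 0:
        yield current
        current -= 1


print(list(Countdown(3)))  # [3, 2, 1]
print(list(countdown(3)))  # [3, 2, 1]
```

The class needs explicit bookkeeping (an attribute plus two special methods), while the generator function keeps the same state in an ordinary local variable.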

To better understand the problem generators solve, let's take a look at an example. Throughout the example, keep in mind the core problem being solved: generating a series of values.

Note: Outside of Python, all but the simplest generators would be referred to as coroutines. I'll use the latter term later in the post. The important thing to remember is, in Python, everything described here as a coroutine is still a generator. Python formally defines the term generator; coroutine is used in discussion but has no formal definition in the language.

Example: Fun With Prime Numbers

Suppose our boss asks us to write a function that takes a list of ints and returns some Iterable containing the elements which are prime numbers.

Remember, an Iterable is just an object capable of returning its members one at a time.
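As a quick illustration of "returning its members one at a time", the sketch below walks a plain list by hand using the built-in iter() and next(). The variable names are arbitrary.

```python
numbers = [2, 3, 5]        # a list is an iterable

iterator = iter(numbers)   # ask the iterable for an iterator
first = next(iterator)     # 2
second = next(iterator)    # 3
third = next(iterator)     # 5

print(first, second, third)
```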

"Simple," we say, and we write the following:

    import math

    def get_primes(input_list):
        result_list = list()
        for element in input_list:
            if is_prime(element):
                result_list.append(element)

        return result_list

    # or better yet...

    def get_primes(input_list):
        return (element for element in input_list if is_prime(element))

    # not germane to the example, but here's a possible implementation of
    # is_prime...

    def is_prime(number):
        if number > 1:
            if number == 2:
                return True
            if number % 2 == 0:
                return False
            for current in range(3, int(math.sqrt(number) + 1), 2):
                if number % current == 0:
                    return False
            return True
        return False

Either get_primes implementation above fulfills the requirements, so we tell our boss we're done. She reports our function works and is exactly what she wanted.

Dealing With Infinite Sequences

Well, not quite exactly. A few days later, our boss comes back and tells us she's run into a small problem: she wants to use our get_primes function on a very large list of numbers. In fact, the list is so large that merely creating it would consume all of the system's memory. To work around this, she wants to be able to call get_primes with a start value and get all the primes larger than start (perhaps she's solving Project Euler problem 10).

Once we think about this new requirement, it becomes clear that it requires more than a simple change to get_primes. Clearly, we can't return a list of all the prime numbers from start to infinity (operating on infinite sequences, though, has a wide range of useful applications). The chances of solving this problem using a normal function seem bleak.

Before we give up, let's determine the core obstacle preventing us from writing a function that satisfies our boss's new requirements. Thinking about it, we arrive at the following: functions only get one chance to return results, and thus must return all results at once. It seems pointless to make such an obvious statement; "functions just work that way," we think. The real value lies in asking, "but what if they didn't?"

Imagine what we could do if get_primes could simply return the next value instead of all the values at once. It wouldn't need to create a list at all. No list, no memory issues. Since our boss told us she's just iterating over the results, she wouldn't know the difference.

Unfortunately, this doesn't seem possible. Even if we had a magical function that allowed us to iterate from n to infinity, we'd get stuck after returning the first value:

    def get_primes(start):
        for element in magical_infinite_range(start):
            if is_prime(element):
                return element

Imagine get_primes is called like so:

    def solve_number_10():
        # She *is* working on Project Euler #10, I knew it!
        total = 2
        for next_prime in get_primes(3):
            if next_prime < 2000000:
                total += next_prime
            else:
                print(total)
                return

Clearly, in get_primes, we would immediately hit the case where element = 3 and return at line 4. Instead of return, we need a way to generate a value and, when asked for the next one, pick up where we left off.

Functions, though, can't do this. When they return, they're done for good. Even if we could guarantee a function would be called again, we have no way of saying, "OK, now, instead of starting at the first line like we normally do, start up where we left off at line 4." Functions have a single entry point: the first line.
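The dead end can be reproduced in a few lines. This is only a sketch: itertools.count stands in for the article's imaginary magical_infinite_range, get_first_prime is a hypothetical name, and is_prime is a compact trial-division check rather than the implementation shown earlier.

```python
import itertools
import math

def is_prime(number):
    """Compact trial-division check, used here only to keep the sketch short."""
    if number < 2:
        return False
    return all(number % divisor for divisor in range(2, int(math.sqrt(number)) + 1))

def get_first_prime(start):
    # itertools.count(start) plays the role of magical_infinite_range(start)
    for element in itertools.count(start):
        if is_prime(element):
            return element  # the function is finished as soon as this runs

# Every call starts again from the first line, so we never get past 3.
results = [get_first_prime(3) for _ in range(3)]
print(results)  # [3, 3, 3]
```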

Enter the Generator

This sort of problem is so common that a new construct was added to Python to solve it: the generator. A generator "generates" values. Creating generators was made as straightforward as possible through the concept of generator functions, introduced simultaneously.

A generator function is defined like a normal function, but whenever it needs to generate a value, it does so with the yield keyword rather than return. If the body of a def contains yield, the function automatically becomes a generator function (even if it also contains a return statement). There's nothing else we need to do to create one.
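One way to see this rule in action is with the standard inspect module. The sketch below uses two throwaway functions (the names are hypothetical) and also shows that calling a generator function does not run its body right away.

```python
import inspect

def plain_function():
    return 1

def generator_function():
    print("body started")   # not printed until next() is called
    yield 1

gen = generator_function()  # nothing is printed here: the body has not run yet

print(inspect.isgeneratorfunction(plain_function))      # False
print(inspect.isgeneratorfunction(generator_function))  # True
print(inspect.isgenerator(gen))                         # True
print(next(gen))  # prints "body started", then 1
```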

Generator functions create generator iterators. That's the last time you'll see the term generator iterator, though, since they're almost always referred to as "generators". Just remember that a generator is a special type of iterator. To be considered an iterator, generators must define a few methods, one of which is __next__(). To get the next value from a generator, we use the same built-in function as for iterators: next().

This point bears repeating: to get the next value from a generator, we use the same built-in function as for iterators: next().

(next() takes care of calling the generator's __next__() method). Since a generator is a type of iterator, it can be used in a for loop.
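The claim that a generator is an iterator can be checked directly. In this small sketch, the generator returns itself from iter(), and the built-in next() and the __next__() method advance the same object.

```python
def simple_generator_function():
    yield 1
    yield 2

gen = simple_generator_function()

# An iterator returns itself from iter() and has a __next__ method.
is_own_iterator = iter(gen) is gen
has_next = hasattr(gen, "__next__")

first = next(gen)          # the built-in next() ...
second = gen.__next__()    # ... calls this method for us

print(is_own_iterator, has_next, first, second)
```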

So whenever next() is called on a generator, the generator is responsible for passing back a value to whoever called next(). It does so by calling yield along with the value to be passed back (e.g. yield 7). The easiest way to remember what yield does is to think of it as return (plus a little magic) for generator functions.

Again, this bears repeating: yield is just return (plus a little magic) for generator functions.

Here's a simple generator function:

    >>> def simple_generator_function():
    ...     yield 1
    ...     yield 2
    ...     yield 3

And here are two simple ways to use it:

    >>> for value in simple_generator_function():
    ...     print(value)
    1
    2
    3
    >>> our_generator = simple_generator_function()
    >>> next(our_generator)
    1
    >>> next(our_generator)
    2
    >>> next(our_generator)
    3

Magic?

What's the magic part? Glad you asked! When a generator function calls yield, the "state" of the generator function is frozen; the values of all variables are saved and the next line of code to be executed is recorded until next() is called again. Once it is, the generator function simply resumes where it left off. If next() is never called again, the state recorded during the yield call is (eventually) discarded.
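The freeze-and-resume behavior is easier to believe when the order of execution is written down. The sketch below records events in a list (events and counter are hypothetical names introduced only for this demonstration) so the interleaving between caller and generator becomes visible.

```python
events = []  # log used only to make the order of execution visible

def counter():
    count = 0
    while count < 2:
        events.append("about to yield {}".format(count))
        yield count
        # execution resumes here on the next call to next()
        events.append("resumed with count == {}".format(count))
        count += 1

gen = counter()
events.append("caller: created generator")
first = next(gen)
events.append("caller: got {}".format(first))
second = next(gen)
events.append("caller: got {}".format(second))

for line in events:
    print(line)
```

Note that "caller: created generator" is logged before anything from inside counter, and that count still holds its old value when the generator resumes.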

Let's rewrite get_primes as a generator function. Notice that we no longer need the magical_infinite_range function. Using a simple while loop, we can create our own infinite sequence:

    def get_primes(number):
        while True:
            if is_prime(number):
                yield number
            number += 1

If a generator function calls return or reaches the end of its definition, a StopIteration exception is raised. This signals to whoever was calling next() that the generator is exhausted (this is normal iterator behavior). It is also the reason the while True: loop is present in get_primes. If it weren't, the first time next() was called we would check if the number is prime and possibly yield it. If next() were called again, we would uselessly add 1 to number and hit the end of the generator function (causing StopIteration to be raised). Once a generator has been exhausted, calling next() on it will result in an error, so you can only consume all the values of a generator once. The following will not work:

    >>> our_generator = simple_generator_function()
    >>> for value in our_generator:
    ...     print(value)

    >>> # our_generator has been exhausted...
    >>> print(next(our_generator))
    Traceback (most recent call last):
      File "<ipython-input-13-7e48a609051a>", line 1, in <module>
        next(our_generator)
    StopIteration

    >>> # however, we can always create a new generator
    >>> # by calling the generator function again...

    >>> new_generator = simple_generator_function()
    >>> print(next(new_generator)) # perfectly valid
    1

Thus, the while loop is there to make sure we never reach the end of get_primes. It allows us to generate a value for as long as next() is called on the generator. This is a common idiom when dealing with infinite series (and generators in general).
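Because the generator never ends on its own, the caller decides how many values to take. One common way to do that, sketched here, is itertools.islice from the standard library. The compact is_prime below is a stand-in for the implementation shown earlier, not the article's own code.

```python
import itertools
import math

def is_prime(number):
    """Compact trial-division check, used here only to keep the sketch short."""
    if number < 2:
        return False
    return all(number % divisor for divisor in range(2, int(math.sqrt(number)) + 1))

def get_primes(number):
    while True:
        if is_prime(number):
            yield number
        number += 1

# Take only the first five values from the infinite sequence.
first_five = list(itertools.islice(get_primes(3), 5))
print(first_five)  # [3, 5, 7, 11, 13]
```

Calling list(get_primes(3)) directly would never finish, which is why the slice is applied before building the list.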

Visualizing the flow

Let's go back to the code that was calling get_primes: solve_number_10.

    def solve_number_10():
        # She *is* working on Project Euler #10, I knew it!
        total = 2
        for next_prime in get_primes(3):
            if next_prime < 2000000:
                total += next_prime
            else:
                print(total)
                return

It's helpful to visualize how the first few elements are created when we call get_primes in solve_number_10's for loop. When the for loop requests the first value from get_primes, we enter get_primes as we would in a normal function.

  1. We enter the while loop on line 2
  2. The if condition holds (3 is prime)
  3. We yield the value 3 and control to solve_number_10.

Then, back in solve_number_10:

  1. The value 3 is passed back to the for loop
  2. The for loop assigns next_prime to this value
  3. next_prime is added to total
  4. The for loop requests the next element from get_primes

This time, though, instead of entering get_primes back at the top, we resume at line 5, where we left off.

    def get_primes(number):
        while True:
            if is_prime(number):
                yield number
            number += 1 # <<<<<<<<<<

Most importantly, number still has the same value it did when we called yield (i.e. 3). Remember, yield both passes a value to whoever called next(), and saves the "state" of the generator function. Clearly, then, number is incremented to 4, we hit the top of the while loop, and keep incrementing number until we hit the next prime number (5). Again we yield the value of number to the for loop in solve_number_10. This cycle continues until the for loop stops (at the first prime greater than 2,000,000).
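The same flow can be run end to end with a smaller bound, which makes the result easy to check by hand. This is a sketch: sum_primes_below is a hypothetical, parameterized variant of solve_number_10 that returns the total instead of printing it and assumes limit is greater than 2; is_prime is a compact stand-in.

```python
import math

def is_prime(number):
    """Compact trial-division check, used here only to keep the sketch short."""
    if number < 2:
        return False
    return all(number % divisor for divisor in range(2, int(math.sqrt(number)) + 1))

def get_primes(number):
    while True:
        if is_prime(number):
            yield number
        number += 1

def sum_primes_below(limit):
    """Sum all primes below limit (assumes limit > 2, since 2 is counted up front)."""
    total = 2
    for next_prime in get_primes(3):
        if next_prime < limit:
            total += next_prime
        else:
            # First prime at or above the limit: stop asking the generator for more.
            return total

print(sum_primes_below(100))  # 1060
```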

Moar Power

In PEP 342, support was added for passing values into generators. PEP 342 gave generators the power to yield a value (as before), receive a value, or both yield a value and receive a (possibly different) value in a single statement.

To illustrate how values are sent to a generator, let's return to our prime number example. This time, instead of simply printing every prime number greater than number, we'll find the smallest prime number greater than successive powers of a number (i.e. for 10, we want the smallest prime greater than 10, then 100, then 1000, etc.). We start in the same way as get_primes:

    def print_successive_primes(iterations, base=10):
        # like normal functions, a generator function
        # can be assigned to a variable

        prime_generator = get_primes(base)
        # missing code...
        for power in range(iterations):
            # missing code...

    def get_primes(number):
        while True:
            if is_prime(number):
            # ... what goes here?

The next line of get_primes takes a bit of explanation. While yield number would yield the value of number, a statement of the form other = yield foo means, "yield foo and, when a value is sent to me, set other to that value." You can "send" values to a generator using the generator's send method.
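Before applying this form to get_primes, it may help to see it in a smaller setting. The generator below, running_total, is a hypothetical example and not part of the original article: each sent value is bound to received, and the updated total is what the next yield hands back.

```python
def running_total():
    """Hypothetical generator: yields the total of all values sent so far."""
    total = 0
    while True:
        # Yield the current total, pause, and bind whatever is sent in to `received`.
        received = yield total
        total += received

gen = running_total()
start = next(gen)        # run to the first yield; yields 0
after_5 = gen.send(5)    # received = 5, total becomes 5, yields 5
after_10 = gen.send(10)  # received = 10, total becomes 15, yields 15

print(start, after_5, after_10)
```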

    def get_primes(number):
        while True:
            if is_prime(number):
                number = yield number
            number += 1

In this way, we can set number to a different value each time the generator yields. We can now fill in the missing code in print_successive_primes:

    def print_successive_primes(iterations, base=10):
        prime_generator = get_primes(base)
        prime_generator.send(None)
        for power in range(iterations):
            print(prime_generator.send(base ** power))

Two things to note here: First, we're printing the result of generator.send, which is possible because send both sends a value to the generator and returns the value yielded by the generator (mirroring how yield works from within the generator function).

Second, notice the prime_generator.send(None) line. When you're using send to "start" a generator (that is, execute the code from the first line of the generator function up to the first yield statement), you must send None. This makes sense, since by definition the generator hasn't gotten to the first yield statement yet, so if we sent a real value there would be nothing to "receive" it. Once the generator is started, we can send values as we do above.
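The "must send None first" rule can be observed directly. In this sketch, echo is a hypothetical generator used only for the demonstration; sending a real value to a generator that has not started raises TypeError, while a primed generator accepts it.

```python
def echo():
    """Hypothetical generator that yields back whatever was last sent to it."""
    received = None
    while True:
        received = yield received

fresh = echo()
try:
    fresh.send("too early")  # no yield is waiting to receive this value
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)

primed = echo()
primed.send(None)             # same effect as next(primed): runs to the first yield
reply = primed.send("hello")  # now a yield is waiting, so the value is received
print(reply)
```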

Round-up

In the second half of this series, we'll discuss the various ways in which generators have been enhanced and the power they gained as a result. yield has become one of the most powerful keywords in Python. Now that we've built a solid understanding of how yield works, we have the knowledge necessary to understand some of the more "mind-bending" things that yield can be used for.

Believe it or not, we've barely scratched the surface of the power of yield. For example, while send does work as described above, it's almost never used when generating simple sequences like our example. Below, I've pasted a small demonstration of one common way send is used. I'll not say any more about it as figuring out how and why it works will be a good warm-up for part two.

    import random

    def get_data():
        """Return 3 random integers between 0 and 9"""
        return random.sample(range(10), 3)

    def consume():
        """Displays a running average across lists of integers sent to it"""
        running_sum = 0
        data_items_seen = 0

        while True:
            data = yield
            data_items_seen += len(data)
            running_sum += sum(data)
            print('The running average is {}'.format(running_sum / float(data_items_seen)))

    def produce(consumer):
        """Produces a set of values and forwards them to the pre-defined consumer
        function"""
        while True:
            data = get_data()
            print('Produced {}'.format(data))
            consumer.send(data)
            yield

    if __name__ == '__main__':
        consumer = consume()
        consumer.send(None)
        producer = produce(consumer)

        for _ in range(10):
            print('Producing...')
            next(producer)

Remember...

There are a few key ideas I hope you take away from this discussion:

  • Generators are used to generate a series of values
  • yield is like the return of generator functions
  • The only other thing yield does is save the "state" of a generator function
  • A generator is just a special type of iterator
  • Like iterators, we can get the next value from a generator using next()
    • for gets values by calling next() implicitly
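The last point in the list above can be spelled out by hand. This sketch shows roughly what a for loop does with a generator: it obtains an iterator, calls next() repeatedly, and stops quietly when StopIteration is raised.

```python
def simple_generator_function():
    yield 1
    yield 2
    yield 3

# What "for value in simple_generator_function():" does, written out manually.
collected = []
iterator = iter(simple_generator_function())
while True:
    try:
        value = next(iterator)
    except StopIteration:
        break  # the for loop ends silently at this point
    collected.append(value)

print(collected)  # [1, 2, 3]
```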

I hope this post was helpful. If you had never heard of generators, I hope you now understand what they are, why they're useful, and how to use them. If you were somewhat familiar with generators, I hope any confusion is now cleared up.

Reposted from: http://www.jeffknupp.com/blog/2013/04/07/improve-your-python-yield-and-generators-explained/


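Comments (3)

青鸾之旅: Oh, thanks. I came across it by chance, thought it was good, and posted it here.
Dyllian: This English article has already been translated on this site (oschina) under the title "提高你的Python: 解释‘yield’和‘Generators(生成器)’"; you can search for it.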