Python 3: Multiprocessing, Multithreading, and Coroutines
Programs today generally run into two kinds of I/O: disk I/O and network I/O. This article compares the efficiency of processes, threads, and coroutines in Python 3 for the network-I/O case. Processes use a multiprocessing.Pool process pool, threads use a hand-rolled thread pool, and coroutines use the gevent library. The built-in urllib.request is also compared against the open-source requests library. The code is as follows:
```python
import urllib.request
import requests
import time
import multiprocessing
import threading
import queue

def startTimer():
    return time.time()

def ticT(startTime):
    # Elapsed seconds since startTime, rounded to milliseconds.
    useTime = time.time() - startTime
    return round(useTime, 3)

def download_urllib(url):
    req = urllib.request.Request(url, headers={'user-agent': 'Mozilla/5.0'})
    res = urllib.request.urlopen(req)
    data = res.read()
    try:
        data = data.decode('gbk')
    except UnicodeDecodeError:
        data = data.decode('utf8', 'ignore')
    return res.status, data

def download_requests(url):
    req = requests.get(url, headers={'user-agent': 'Mozilla/5.0'})
    return req.status_code, req.text

class threadPoolManager:
    def __init__(self, urls, workNum=10000, threadNum=20):
        self.workQueue = queue.Queue()
        self.threadPool = []
        self.__initWorkQueue(urls)
        self.__initThreadPool(threadNum)

    def __initWorkQueue(self, urls):
        for i in urls:
            self.workQueue.put((download_requests, i))

    def __initThreadPool(self, threadNum):
        for i in range(threadNum):
            self.threadPool.append(work(self.workQueue))

    def waitAllComplete(self):
        for i in self.threadPool:
            if i.is_alive():  # isAlive() was removed in Python 3.9
                i.join()

class work(threading.Thread):
    def __init__(self, workQueue):
        threading.Thread.__init__(self)
        self.workQueue = workQueue
        self.start()

    def run(self):
        while True:
            if self.workQueue.qsize():
                do, args = self.workQueue.get(block=False)
                do(args)
                self.workQueue.task_done()
            else:
                break

urls = ['http://www.ustchacker.com'] * 10
urllibL = []
requestsL = []
multiPool = []
threadPool = []
N = 20
PoolNum = 100

for i in range(N):
    print('start %d try' % i)

    # Serial downloads with urllib.request.
    urllibT = startTimer()
    jobs = [download_urllib(url) for url in urls]
    urllibL.append(ticT(urllibT))
    print('1')

    # Serial downloads with requests.
    requestsT = startTimer()
    jobs = [download_requests(url) for url in urls]
    requestsL.append(ticT(requestsT))
    print('2')

    # Concurrent downloads with a multiprocessing pool.
    requestsT = startTimer()
    pool = multiprocessing.Pool(PoolNum)
    data = pool.map(download_requests, urls)
    pool.close()
    pool.join()
    multiPool.append(ticT(requestsT))
    print('3')

    # Concurrent downloads with the hand-rolled thread pool.
    requestsT = startTimer()
    pool = threadPoolManager(urls, threadNum=PoolNum)
    pool.waitAllComplete()
    threadPool.append(ticT(requestsT))
    print('4')

import matplotlib.pyplot as plt

x = list(range(1, N + 1))
plt.plot(x, urllibL, label='urllib')
plt.plot(x, requestsL, label='requests')
plt.plot(x, multiPool, label='requests MultiPool')
plt.plot(x, threadPool, label='requests threadPool')
plt.xlabel('test number')
plt.ylabel('time(s)')
plt.legend()
plt.show()
```
The results are plotted below:

[Figure: per-run timing for urllib, requests, the multiprocessing pool, and the thread pool]
As the plot shows, the built-in urllib.request is still less efficient than the open-source requests library. The multiprocessing pool gives a clear speedup, but it still trails the hand-rolled thread pool, partly because creating and scheduling processes costs more than creating threads (the test deliberately includes that setup cost in each measurement).
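That creation-cost claim can be checked directly. Here is a minimal sketch (the `noop` worker and the worker count of 20 are illustrative, not from the original benchmark) that times starting and joining a batch of threads versus a batch of processes:

```python
import multiprocessing
import threading
import time

def noop():
    pass

def startup_cost(factory, n=20):
    # Measure wall time to create, start, and join n workers.
    t0 = time.time()
    workers = [factory(target=noop) for _ in range(n)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return time.time() - t0

if __name__ == '__main__':
    t_threads = startup_cost(threading.Thread)
    t_procs = startup_cost(multiprocessing.Process)
    # On a typical machine, t_procs is noticeably larger than t_threads.
    print('threads: %.3fs  processes: %.3fs' % (t_threads, t_procs))
```

The exact numbers depend on the OS and the process start method (fork is much cheaper than spawn), but the gap in setup cost is usually visible even at 20 workers.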
One caveat: on Windows, any code that spawns processes must sit under an `if __name__ == '__main__':` guard in the current .py file, or the multiprocessing module will not work correctly. Unix/Linux does not require this.
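A minimal sketch of the required layout (the `square` worker is illustrative):

```python
import multiprocessing

def square(x):
    # On Windows, the worker must be defined at module level so that
    # child processes can find it when they re-import this file.
    return x * x

if __name__ == '__main__':
    # Without this guard, Windows' spawn start method would re-run the
    # pool-creating code in every child process, recursing endlessly.
    with multiprocessing.Pool(2) as pool:
        print(pool.map(square, [1, 2, 3]))  # prints [1, 4, 9]
```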
The gevent test code follows:
```python
import urllib.request
import requests
import time
import gevent.pool
import gevent.monkey

# Patch the standard library so blocking socket calls yield to gevent.
gevent.monkey.patch_all()

def startTimer():
    return time.time()

def ticT(startTime):
    # Elapsed seconds since startTime, rounded to milliseconds.
    useTime = time.time() - startTime
    return round(useTime, 3)

def download_urllib(url):
    req = urllib.request.Request(url, headers={'user-agent': 'Mozilla/5.0'})
    res = urllib.request.urlopen(req)
    data = res.read()
    try:
        data = data.decode('gbk')
    except UnicodeDecodeError:
        data = data.decode('utf8', 'ignore')
    return res.status, data

def download_requests(url):
    req = requests.get(url, headers={'user-agent': 'Mozilla/5.0'})
    return req.status_code, req.text

urls = ['http://www.ustchacker.com'] * 10
urllibL = []
requestsL = []
reqPool = []
reqSpawn = []
N = 20
PoolNum = 100

for i in range(N):
    print('start %d try' % i)

    # Serial downloads with urllib.request.
    urllibT = startTimer()
    jobs = [download_urllib(url) for url in urls]
    urllibL.append(ticT(urllibT))
    print('1')

    # Serial downloads with requests.
    requestsT = startTimer()
    jobs = [download_requests(url) for url in urls]
    requestsL.append(ticT(requestsT))
    print('2')

    # Concurrent downloads with a gevent pool.
    requestsT = startTimer()
    pool = gevent.pool.Pool(PoolNum)
    data = pool.map(download_requests, urls)
    reqPool.append(ticT(requestsT))
    print('3')

    # Concurrent downloads with raw gevent greenlets.
    requestsT = startTimer()
    jobs = [gevent.spawn(download_requests, url) for url in urls]
    gevent.joinall(jobs)
    reqSpawn.append(ticT(requestsT))
    print('4')

import matplotlib.pyplot as plt

x = list(range(1, N + 1))
plt.plot(x, urllibL, label='urllib')
plt.plot(x, requestsL, label='requests')
plt.plot(x, reqPool, label='requests geventPool')
plt.plot(x, reqSpawn, label='requests Spawn')
plt.xlabel('test number')
plt.ylabel('time(s)')
plt.legend()
plt.show()
```
The results are plotted below:

[Figure: per-run timing for urllib, requests, the gevent pool, and gevent spawn]
As the plot shows, gevent delivers a large performance boost for I/O-bound tasks. Because coroutines are far cheaper to create and schedule than threads, gevent's spawn mode and Pool mode perform almost identically.
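gevent is one way to get cheap coroutines; as a stdlib-only illustration of the same point (this sketch uses asyncio rather than gevent, purely to show how inexpensive coroutine creation and scheduling are), the following schedules ten thousand coroutines, something that would be prohibitively expensive with ten thousand threads or processes:

```python
import asyncio
import time

async def noop():
    pass

async def run_many(n):
    # Create and await n coroutines; no threads or processes involved.
    t0 = time.time()
    await asyncio.gather(*(noop() for _ in range(n)))
    return time.time() - t0

elapsed = asyncio.run(run_many(10000))
print('10000 coroutines finished in %.3fs' % elapsed)
```

On typical hardware this completes in a small fraction of a second, which is why spawning one greenlet per URL (the Spawn mode above) costs about the same as reusing a pool.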
gevent needs its monkey patch to reach full performance, but the patch interferes with multiprocessing. To use both in one program, you have to patch selectively:

```python
gevent.monkey.patch_all(thread=False, socket=False, select=False)
```

This, however, sacrifices much of gevent's advantage, so the multiprocessing pool, the thread pool, and the gevent pool cannot be compared fairly within a single program. Comparing the two plots, though, the thread pool and gevent come out on top, with the process pool next. And as a side observation, the requests library really does outperform urllib.request. :-)