I need to implement an API endpoint. The input parameter is:
{"pdf_file": "http://myserver/somefolder/somefile.pdf"}
and the output is:
{"key1": "value1", "key2":"value2"} ~~ PDF 文件为几页到几十页的文字 现在测试结果是 下载这个文件大约需要 5 秒 处理大约需要 5 秒 问题时如何加快这个过程? 网上搜到的一个例子是: ~~~python class Job(BaseModel): uid: UUID = Field(default=uuid4()) status: str = "in_progress" result: int = None jobs: Dict[UUID, Job] = {} async def run_in_process(fn, *args): loop = asyncio.get_event_loop() return await loop.run_in_executor(app.state.executor, fn, *args) # wait and return result async def start_cpu_bound_task(uid: UUID, stream: io.BufferedReader, filename: str) -> None: jobs[uid].result = await run_in_process(cpu_bound_func, stream, filename) jobs[uid].status = "complete" @app.post("/new_cpu_bound_task/", status_code=HTTPStatus.ACCEPTED) async def task_handler(req: InsurancePoliciesExtractionReqeust, background_tasks: BackgroundTasks): new_task = Job() jobs[new_task.uid] = new_task ## 以下是我改动的 cOntent= None async with aiohttp.ClientSession() as session: async with session.get(req.pdf_file) as resp: if resp.status == 200: cOntent= io.BytesIO(await resp.read()) filename = os.path.basename(req.pdf_file) ## 以上是我的改动 background_tasks.add_task(start_cpu_bound_task, new_task.uid, content, filename) return new_task @app.get("/status/{uid}") async def status_handler(uid: UUID): return jobs[uid] @app.on_event("startup") async def startup_event(): app.state.executor = ProcessPoolExecutor() @app.on_event("shutdown") async def on_shutdown(): app.state.executor.shutdown()
1  nyxsonsleep  2022-06-24 00:36:26 +08:00

Use multiple threads for the download. As for the "processing" part, I don't follow at all: you are just downloading a file, and the downloaded copy only has to match the original, so what is there to process? Even with 4 download threads, merging the chunks costs only a few milliseconds.
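For reference, the "multithreaded download" idea in reply #1 usually means splitting the file into byte ranges and fetching them in parallel. A self-contained sketch using concurrent aiohttp requests (coroutines rather than OS threads) might look like the following; it only helps when the server supports `Range` requests and the transfer is bandwidth-limited rather than latency-limited, which is not a given for a single small PDF:

~~~python
import asyncio
import io

import aiohttp


async def download_ranged(url: str, parts: int = 4) -> io.BytesIO:
    """Fetch `url` in `parts` concurrent HTTP Range requests and merge the chunks.

    Sketch only: falls back to a single GET when the server does not
    advertise byte-range support or a Content-Length.
    """
    async with aiohttp.ClientSession() as session:
        async with session.head(url) as head:
            size = int(head.headers.get("Content-Length", 0))
            ranges_ok = head.headers.get("Accept-Ranges") == "bytes"

        if size == 0 or not ranges_ok:
            async with session.get(url) as resp:
                return io.BytesIO(await resp.read())

        async def fetch(start: int, end: int) -> bytes:
            headers = {"Range": f"bytes={start}-{end}"}
            async with session.get(url, headers=headers) as resp:
                return await resp.read()

        step = -(-size // parts)  # ceiling division
        chunks = await asyncio.gather(
            *(fetch(start, min(start + step - 1, size - 1))
              for start in range(0, size, step))
        )
    # As the reply notes, merging the chunks is a cheap in-memory concatenation.
    return io.BytesIO(b"".join(chunks))
~~~

In the handler above, `content = await download_ranged(req.pdf_file)` (a hypothetical drop-in) would replace the single `session.get` call.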