
transformers error: bert-base-chinese cannot be loaded via torch.hub because github.com is unreachable

2022-04-05 17:32:43



Reference: https://blog.csdn.net/weixin_37935970/article/details/123238677

 

Environment setup:

pip install transformers==3.0.2

pip install torch==1.3.1

pip install huggingface_hub
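
To confirm the pinned versions are the ones active in the environment (a quick check, not part of the original post):

import torch
import transformers

print(torch.__version__)         # expected: 1.3.1 (per the pins above)
print(transformers.__version__)  # expected: 3.0.2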

The failing call (line 5 of bert_chinese_encode.py, per the traceback below):

tokenizer = torch.hub.load('huggingface/pytorch-transformers', 'tokenizer', 'bert-base-chinese')

(torch1.3) root@iZ2zedmeg2gi9atq5khtlgZ:~/online_doctor/bert_server# python bert_chinese_encode.py
Downloading: "https://github.com/huggingface/pytorch-transformers/archive/main.zip" to /root/.cache/torch/hub/main.zip
Traceback (most recent call last):
File "bert_chinese_encode.py", line 5, in <module>
tokenizer = torch.hub.load('huggingface/pytorch-transformers', 'tokenizer', 'bert-base-chinese')
File "/root/torch1.3/lib/python3.6/site-packages/torch/hub.py", line 399, in load
model = _load_local(repo_or_dir, model, *args, **kwargs)
File "/root/torch1.3/lib/python3.6/site-packages/torch/hub.py", line 427, in _load_local
entry = _load_entry_from_hubconf(hub_module, model)
File "/root/torch1.3/lib/python3.6/site-packages/torch/hub.py", line 230, in _load_entry_from_hubconf
_check_dependencies(m)
File "/root/torch1.3/lib/python3.6/site-packages/torch/hub.py", line 219, in _check_dependencies
raise RuntimeError('Missing dependencies: {}'.format(', '.join(missing_deps)))
RuntimeError: Missing dependencies: huggingface_hub

The hub repo's hubconf.py declares huggingface_hub as a dependency, so install it:

pip install huggingface_hub
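
A quick way to confirm the dependency is now visible to Python (a minimal check, not part of the original post):

import huggingface_hub

# torch.hub's _check_dependencies simply tries to import each name
# declared in hubconf.py, so a successful import here means that check passes
print(huggingface_hub.__version__)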

 

Network issues:

Editing /etc/hosts (adding github.com entries, or deleting stale ones) may help; after countless retries the download finally started, though a hosts entry is not guaranteed to work. Up-to-date GitHub IP mappings for /etc/hosts are maintained at https://gitee.com/ineo6/hosts.

Downloading: "https://github.com/huggingface/pytorch-transformers/archive/main.zip" to /root/.cache/torch/hub/main.zip

Alternatively, download the archive manually on a Windows machine and upload it to the .cache directory on the server.
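
To make the manual route concrete, the sketch below (an assumption about your setup, not from the original post) unpacks a hand-uploaded main.zip into the directory name torch.hub expects, which is the path shown later in the "Using cache found in ..." log line:

import os
import zipfile

hub_dir = os.path.expanduser("~/.cache/torch/hub")
zip_path = os.path.join(hub_dir, "main.zip")  # uploaded by hand
cache_dir = os.path.join(hub_dir, "huggingface_pytorch-transformers_main")

with zipfile.ZipFile(zip_path) as zf:
    # GitHub archives contain a single top-level folder, e.g. <repo>-main/
    top_level = zf.namelist()[0].split("/")[0]
    zf.extractall(hub_dir)

# Rename the extracted folder to the cache name torch.hub looks for
os.rename(os.path.join(hub_dir, top_level), cache_dir)

Note that in this torch version _get_cache_or_reload calls _parse_repo_info, which probes github.com to resolve the default branch, before it checks the cache; that probe is the failure visible in the next run: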

 

(torch1.3) root@iZ2zedmeg2gi9atq5khtlgZ:~/online_doctor/bert_server# python bert_chinese_encode.py
============ huggingface pytorch-transformers
Downloading: "https://github.com/huggingface/pytorch-transformers/archive/main.zip" to /root/.cache/torch/hub/main.zip
============ huggingface pytorch-transformers
Traceback (most recent call last):
File "bert_chinese_encode.py", line 7, in <module>
model = torch.hub.load('huggingface/pytorch-transformers', 'model', 'bert-base-chinese')
File "/root/torch1.3/lib/python3.6/site-packages/torch/hub.py", line 397, in load
repo_or_dir = _get_cache_or_reload(repo_or_dir, force_reload, verbose, skip_validation)
File "/root/torch1.3/lib/python3.6/site-packages/torch/hub.py", line 165, in _get_cache_or_reload
repo_owner, repo_name, branch = _parse_repo_info(github)
File "/root/torch1.3/lib/python3.6/site-packages/torch/hub.py", line 119, in _parse_repo_info
with urlopen(f"https://github.com/{repo_owner}/{repo_name}/tree/main/"):
File "/usr/lib/python3.6/urllib/request.py", line 223, in urlopen
return opener.open(url, data, timeout)
File "/usr/lib/python3.6/urllib/request.py", line 526, in open
response = self._open(req, data)
File "/usr/lib/python3.6/urllib/request.py", line 544, in _open
'_open', req)
File "/usr/lib/python3.6/urllib/request.py", line 504, in _call_chain
result = func(*args)
File "/usr/lib/python3.6/urllib/request.py", line 1392, in https_open
context=self._context, check_hostname=self._check_hostname)
File "/usr/lib/python3.6/urllib/request.py", line 1352, in do_open
r = h.getresponse()
File "/usr/lib/python3.6/http/client.py", line 1383, in getresponse
response.begin()
File "/usr/lib/python3.6/http/client.py", line 320, in begin
version, status, reason = self._read_status()
File "/usr/lib/python3.6/http/client.py", line 289, in _read_status
raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response
(torch1.3) root@iZ2zedmeg2gi9atq5khtlgZ:~/online_doctor/bert_server# python bert_chinese_encode.py
============ huggingface pytorch-transformers
Using cache found in /root/.cache/torch/hub/huggingface_pytorch-transformers_main
============ huggingface pytorch-transformers
Using cache found in /root/.cache/torch/hub/huggingface_pytorch-transformers_main
Downloading: 66%|████████████████████████████████████████████████████████████████████▏ | 270M/412M [00:21<00:11, 12.1MB/s
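
Note that torch.hub is only used here to fetch the repo's hubconf.py from GitHub; the actual loading is done by the transformers package itself. If github.com is the only blocker, a sketch of an alternative (assuming transformers==3.0.2 as pinned above) that skips GitHub entirely and pulls the weights from Hugging Face's servers instead:

from transformers import BertModel, BertTokenizer

# Load the same tokenizer and model directly from transformers,
# with no torch.hub / github.com round trip involved
tokenizer = BertTokenizer.from_pretrained('bert-base-chinese')
model = BertModel.from_pretrained('bert-base-chinese')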

The full working script, bert_chinese_encode.py:

"""
pip install transformers==3.0.2

pip install torch==1.3.1

pip install huggingface_hub
"""

import torch
import torch.nn as nn

# 使用torch.hub加载bert中文模型的字映射器
tokenizer = torch.hub.load('huggingface/pytorch-transformers', 'tokenizer', 'bert-base-chinese')
# 使用torch.hub加载bert中文模型
model = torch.hub.load('huggingface/pytorch-transformers', 'model', 'bert-base-chinese')


# Function that returns BERT encodings for a pair of sentences
def get_bert_encode(text_1, text_2, mark=102, max_len=10):
    '''
    Encode a pair of input texts with the Chinese BERT model.
    text_1: the first sentence
    text_2: the second sentence
    mark: the separator token id; 102 is the id of BERT's special [SEP]
          token, which the tokenizer inserts between the two texts
    max_len: maximum length per sentence; longer sequences are truncated,
             shorter ones are zero-padded
    return: the BERT encoding of the input texts
    '''
    # Step 1: map both texts to token ids with the tokenizer
    indexed_tokens = tokenizer.encode(text_1, text_2)
    # To pad or truncate each sentence, first locate the separator token
    k = indexed_tokens.index(mark)

    # Step 2: process the first sentence, indexed_tokens[:k]
    if len(indexed_tokens[:k]) >= max_len:
        # longer than max_len: truncate
        indexed_tokens_1 = indexed_tokens[:max_len]
    else:
        # shorter than max_len: zero-pad the remainder
        indexed_tokens_1 = indexed_tokens[:k] + (max_len - len(indexed_tokens[:k])) * [0]

    # Step 3: process the second sentence, indexed_tokens[k:]
    if len(indexed_tokens[k:]) >= max_len:
        # longer than max_len: truncate
        indexed_tokens_2 = indexed_tokens[k:k+max_len]
    else:
        # shorter than max_len: zero-pad the remainder
        indexed_tokens_2 = indexed_tokens[k:] + (max_len - len(indexed_tokens[k:])) * [0]

    # Concatenate the two processed halves
    indexed_tokens = indexed_tokens_1 + indexed_tokens_2

    # Build the segment-id list that tells the model which tokens belong to
    # which sentence: 0 marks the first sentence, 1 marks the second.
    # Both halves have been normalized to max_len tokens each.
    segments_ids = [0] * max_len + [1] * max_len

    # Wrap both lists into tensors
    tokens_tensor = torch.tensor([indexed_tokens])
    segments_tensor = torch.tensor([segments_ids])

    # Encode without tracking gradients
    with torch.no_grad():
        # Run the BERT model on tokens_tensor and segments_tensor;
        # encoded_layers holds the final-layer hidden states
        encoded_layers, _ = model(tokens_tensor, token_type_ids=segments_tensor)

    return encoded_layers


text_1 = "人生该如何起头"
text_2 = "改变要如何起手"

encoded_layers = get_bert_encode(text_1, text_2)
print(encoded_layers)
print(encoded_layers.shape)
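
With the defaults above (max_len=10 per sentence, 20 tokens total), the printed shape should be torch.Size([1, 20, 768]): in transformers 3.0.2 the model returns the final-layer hidden states as the first element of its output tuple, and bert-base-chinese has a hidden size of 768.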

Source: https://www.cnblogs.com/xhzd/p/16103020.html
