Image classification is an important area of computer vision; its goal is to assign an image to one of a set of predefined labels. In recent years researchers have proposed many kinds of neural networks and greatly improved the performance of classification algorithms. Using a dataset I built myself, recognizing contestants from Youth With You 2 (青春有你2), this article shows how to run an image classification task with PaddleHub.
# For CPU environments, be sure to run this command first
%set_env CPU_NUM=1
env: CPU_NUM=1
# Install PaddleHub
!pip install paddlehub==1.6.0 -i https://pypi.tuna.tsinghua.edu.cn/simple
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting paddlehub==1.6.0
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/7f/9f/6617c2b8e9c5d847803ae89924b58bccd1b8fb2c98aa00e16531540591f2/paddlehub-1.6.0-py3-none-any.whl (206kB)
Installing collected packages: paddlehub
Found existing installation: paddlehub 1.5.0
Uninstalling paddlehub-1.5.0:
Successfully uninstalled paddlehub-1.5.0
Successfully installed paddlehub-1.6.0
Load the data files
Import the Python packages
!unzip -o ./data/train.zip -d ./dataset/train
!ls
Archive:  ./data/train.zip
import paddlehub as hub
Next we pick a suitable pretrained model from PaddleHub to fine-tune. Since this is an image classification task, we use the classic ResNet-50 as the pretrained model. PaddleHub offers a rich set of image classification pretrained models, including PNASNet from the recent neural architecture search family; we encourage you to try different pretrained models to get better performance.
import os

def generate_train_list():
    # Directory to scan
    result = []
    path = "dataset/train"
    # Names to look up, mapped to their label indices
    stars = {'yushuxin': 0, 'xujiaqi': 1, 'zhaoxiaotang': 2, 'anqi': 3, 'wangchengxuan': 4}
    for root, dirs, files in os.walk(path):
        for f in files:
            ff = os.path.join(root, f)
            # Path relative to the dataset root, e.g. "train/anqi/anqi0.jpg"
            rel = os.path.relpath(ff, "dataset")
            # The label name is the name of the enclosing folder
            name = os.path.basename(root)
            result.append('%s %d' % (rel, stars[name]))
    return result
train_list = generate_train_list()
with open("./dataset/train_list.txt", "w") as f:
    for line in train_list:
        print(line)
        f.write(line + "\n")
train/anqi/anqi4.jpg 3
train/anqi/anqi8.jpg 3
train/anqi/anqi7.jpg 3
train/anqi/anqi3.jpg 3
train/anqi/anqi0.jpg 3
train/anqi/anqi5.jpg 3
train/anqi/anqi2.jpg 3
train/anqi/anqi6.jpg 3
train/anqi/anqi9.jpg 3
train/anqi/anqi1.jpg 3
train/wangchengxuan/wangchengxuan5.jpg 4
train/wangchengxuan/wangchengxuan7.jpg 4
train/wangchengxuan/wangchengxuan4.jpg 4
train/wangchengxuan/wangchengxuan2.jpg 4
train/wangchengxuan/wangchengxuan9.jpg 4
train/wangchengxuan/wangchengxuan8.jpg 4
train/wangchengxuan/wangchengxuan3.jpg 4
train/wangchengxuan/wangchengxuan0.jpg 4
train/wangchengxuan/wangchengxuan6.jpg 4
train/wangchengxuan/wangchengxuan1.jpg 4
train/yushuxin/yushuxin0.jpg 0
train/yushuxin/yushuxin9.jpg 0
train/yushuxin/yushuxin5.jpg 0
train/yushuxin/yushuxin4.jpg 0
train/yushuxin/yushuxin3.jpg 0
train/yushuxin/yushuxin2.jpg 0
train/yushuxin/yushuxin7.jpg 0
train/yushuxin/yushuxin6.jpg 0
train/yushuxin/yushuxin8.jpg 0
train/yushuxin/yushuxin1.jpg 0
train/xujiaqi/xujiaqi4.jpg 1
train/xujiaqi/xujiaqi2.jpg 1
train/xujiaqi/xujiaqi6.jpg 1
train/xujiaqi/xujiaqi1.jpg 1
train/xujiaqi/xujiaqi9.jpg 1
train/xujiaqi/xujiaqi8.jpg 1
train/xujiaqi/xujiaqi5.jpg 1
train/xujiaqi/xujiaqi3.jpg 1
train/xujiaqi/xujiaqi0.jpg 1
train/xujiaqi/xujiaqi7.jpg 1
train/zhaoxiaotang/zhaoxiaotang7.jpg 2
train/zhaoxiaotang/zhaoxiaotang6.jpg 2
train/zhaoxiaotang/zhaoxiaotang1.jpg 2
train/zhaoxiaotang/zhaoxiaotang9.jpg 2
train/zhaoxiaotang/zhaoxiaotang2.jpg 2
train/zhaoxiaotang/zhaoxiaotang0.jpg 2
train/zhaoxiaotang/zhaoxiaotang4.jpg 2
train/zhaoxiaotang/zhaoxiaotang5.jpg 2
train/zhaoxiaotang/zhaoxiaotang8.jpg 2
train/zhaoxiaotang/zhaoxiaotang3.jpg 2
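The DemoDataset defined below also expects a label_list.txt (and a test_list.txt) under dataset/. A minimal sketch for writing the label list, assuming one label name per line in the same 0-4 index order as the dataset label map logged further down:

# Hedged sketch: the names and their order are taken from the label map printed later.
labels = ['虞书欣', '许佳琪', '赵小棠', '安崎', '王承渲']
with open("./dataset/label_list.txt", "w") as f:
    f.write("\n".join(labels) + "\n")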
module = hub.Module(name="resnet_v2_50_imagenet")
[2020-04-26 13:03:09,478] [ INFO] - Installing resnet_v2_50_imagenet module
[2020-04-26 13:03:09,498] [ INFO] - Module resnet_v2_50_imagenet already installed in /home/aistudio/.paddlehub/modules/resnet_v2_50_imagenet
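As suggested above, trying a different backbone is a one-line change: just pass another image classification module name to hub.Module. A minimal sketch, assuming the alternative module (mobilenet_v2_imagenet here, chosen only as an example from the PaddleHub model zoo) is available in your PaddleHub version:

# Hedged sketch: swap in another ImageNet-pretrained backbone by name.
# "mobilenet_v2_imagenet" is an example, not the module used in this tutorial.
module = hub.Module(name="mobilenet_v2_imagenet")

The rest of the pipeline (reader, RunConfig, ImageClassifierTask) stays unchanged, since the reader queries the module for its expected image size and normalization statistics.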
Next we need to load the image dataset. Here we use our own custom data; see the "Adapting to a custom dataset" guide for details.
from paddlehub.dataset.base_cv_dataset import BaseCVDataset

class DemoDataset(BaseCVDataset):
    def __init__(self):
        # Where the dataset is stored
        self.dataset_dir = "dataset"
        super(DemoDataset, self).__init__(
            base_path=self.dataset_dir,
            train_list_file="train_list.txt",
            # validate_list_file="validate_list.txt",
            test_list_file="test_list.txt",
            label_list_file="label_list.txt",
        )

dataset = DemoDataset()
Next we create an image classification reader. The reader preprocesses the dataset's samples, organizes them into the required format, and feeds them to the model for training.
When creating an image classification reader, we need to specify the size of the input images.
data_reader = hub.reader.ImageClassificationReader(
image_width=module.get_expected_image_width(),
image_height=module.get_expected_image_height(),
images_mean=module.get_pretrained_images_mean(),
images_std=module.get_pretrained_images_std(),
dataset=dataset)
[2020-04-26 13:07:58,826] [ INFO] - Dataset label map = {'虞书欣': 0, '许佳琪': 1, '赵小棠': 2, '安崎': 3, '王承渲': 4}
Before fine-tuning we can set some runtime options. The configuration in the code below means:
use_cuda: whether to train on GPU. Set it to False to train on CPU; if your machine has a GPU and you installed the GPU build of PaddlePaddle, we recommend setting it to True (as we do here);
num_epoch: the number of fine-tuning epochs;
batch_size: how many samples are fed to the model per step. The model processes a batch in parallel, so a larger batch_size trains more efficiently, but it also raises memory use; an oversized batch_size can run out of memory and abort training, so picking a suitable value is an important step;
log_interval: print a training log every 10 steps;
eval_interval: how often (in steps) to evaluate on the validation set; it is commented out below, so the default of evaluating every 100 steps applies;
checkpoint_dir: save the trained parameters and data to the cv_finetune_turtorial_demo directory;
strategy: fine-tune with the DefaultFinetuneStrategy strategy.
For more runtime options, see RunConfig.
PaddleHub also provides many optimization strategies, such as AdamWeightDecayStrategy, ULMFiTStrategy, and DefaultFinetuneStrategy; see the strategy documentation for details (a sketch of swapping in AdamWeightDecayStrategy follows the configuration code below).
config = hub.RunConfig(
    use_cuda=True,                                   # whether to train on GPU; defaults to False
    num_epoch=3,                                     # number of fine-tuning epochs
    checkpoint_dir="cv_finetune_turtorial_demo",     # checkpoint directory; auto-generated if not given
    batch_size=3,                                    # training batch size; adjust it to your memory if using GPU
    # eval_interval=3,                               # evaluation interval; defaults to every 100 steps
    log_interval=10,                                 # print a training log every 10 steps
    strategy=hub.finetune.strategy.DefaultFinetuneStrategy())  # fine-tuning strategy
[2020-04-26 13:06:01,681] [ INFO] - Checkpoint dir: cv_finetune_turtorial_demo
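If you would like to try one of the other strategies mentioned above, only the strategy argument changes. A minimal sketch using AdamWeightDecayStrategy, with placeholder hyperparameters that are not tuned for this dataset (the checkpoint directory name is also just an example):

# Hedged sketch: an alternative fine-tuning strategy; the values are illustrative.
strategy = hub.AdamWeightDecayStrategy(
    learning_rate=1e-4,
    weight_decay=0.01,
    warmup_proportion=0.1,
    lr_scheduler="linear_decay")

config = hub.RunConfig(
    use_cuda=True,
    num_epoch=3,
    checkpoint_dir="cv_finetune_adamw_demo",   # separate checkpoint dir for this run (assumed name)
    batch_size=3,
    log_interval=10,
    strategy=strategy)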
With a suitable pretrained model and the dataset to transfer to in hand, we can now assemble a Task.
Our dataset is a five-class task, while the classification module we downloaded is a 1000-class model trained on ImageNet, so we need to lightly adapt the model and turn it into a five-class classifier:
input_dict, output_dict, program = module.context(trainable=True)
img = input_dict["image"]
feature_map = output_dict["feature_map"]
feed_list = [img.name]
task = hub.ImageClassifierTask(
data_reader=data_reader,
feed_list=feed_list,
feature=feature_map,
num_classes=dataset.num_labels,
config=config)
[2020-04-26 13:06:04,337] [ INFO] - 267 pretrained paramaters loaded by PaddleHub
We use the finetune_and_eval interface to train the model. During fine-tuning, this interface periodically evaluates the model so that we can see how performance evolves over the course of training.
run_states = task.finetune_and_eval()
[2020-04-26 13:06:10,841] [ INFO] - Strategy with slanted triangle learning rate, L2 regularization, 
[2020-04-26 13:06:10,873] [ INFO] - Try loading checkpoint from cv_finetune_turtorial_demo/ckpt.meta
[2020-04-26 13:06:10,874] [ INFO] - PaddleHub model checkpoint not found, start from scratch...
[2020-04-26 13:06:10,909] [ INFO] - PaddleHub finetune start
[2020-04-26 13:06:12,540] [ TRAIN] - step 10 / 50: loss=0.89901 acc=0.73333 [step/sec: 7.16]
[2020-04-26 13:06:13,915] [ TRAIN] - step 20 / 50: loss=0.38457 acc=1.00000 [step/sec: 10.22]
[2020-04-26 13:06:15,447] [ TRAIN] - step 30 / 50: loss=0.11394 acc=1.00000 [step/sec: 6.95]
[2020-04-26 13:06:16,902] [ TRAIN] - step 40 / 50: loss=0.06314 acc=1.00000 [step/sec: 7.48]
[2020-04-26 13:06:18,353] [ TRAIN] - step 50 / 50: loss=0.04763 acc=1.00000 [step/sec: 7.62]
[2020-04-26 13:06:18,432] [ INFO] - Load the best model from cv_finetune_turtorial_demo/best_model
[2020-04-26 13:06:18,433] [ INFO] - Evaluation on test dataset start
share_vars_from is set, scope is ignored.
[2020-04-26 13:06:19,083] [ EVAL] - [test dataset evaluation result] loss=0.00011 acc=1.00000 [step/sec: 19.32]
[2020-04-26 13:06:19,085] [ INFO] - Saving model checkpoint to cv_finetune_turtorial_demo/step_51
[2020-04-26 13:06:20,090] [ INFO] - PaddleHub finetune finished.
After fine-tuning finishes, we use the model to make predictions. First, collect the test images with the following code:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
with open("dataset/test_list.txt","r") as f:
filepath = f.readlines()
# print(filepath)
data = [filepath[0].split(" ")[0],filepath[1].split(" ")[0],filepath[2].split(" ")[0],filepath[3].split(" ")[0],filepath[4].split(" ")[0]]
print(data)
label_map = dataset.label_dict()
index = 0
run_states = task.predict(data=data)
results = [run_state.run_results for run_state in run_states]
print(results)
print(50*'*')
for batch_result in results:
print(batch_result)
print(50*'*')
batch_result = np.argmax(batch_result, axis=2)[0]
print(batch_result)
for result in batch_result:
index += 1
result = label_map[result]
print("input %i is %s, and the predict result is %s" %
(index, data[index - 1], result))
[2020-04-26 13:08:42,487] [ INFO] - PaddleHub predict start
[2020-04-26 13:08:42,487] [ INFO] - Load the best model from cv_finetune_turtorial_demo/best_model
['dataset/test/yushuxin.jpg', 'dataset/test/xujiaqi.jpg', 'dataset/test/zhaoxiaotang.jpg', 'dataset/test/anqi.jpg', 'dataset/test/wangchengxuan.jpg']
[[array([[9.99820173e-01, 7.76551133e-06, 3.23944623e-05, 1.07691012e-04,
3.19731771e-05],
[2.23369602e-06, 9.99973655e-01, 9.16190515e-07, 2.26883185e-05,
5.21339530e-07],
[2.94848701e-06, 1.81872983e-05, 9.99861002e-01, 8.90642877e-06,
1.08885535e-04]], dtype=float32)], [array([[2.0881200e-06, 4.9753922e-05, 5.9350541e-06, 9.9994111e-01,
1.0736142e-06],
[3.4572211e-05, 3.4797737e-05, 3.2656546e-05, 3.2816235e-05,
9.9986517e-01]], dtype=float32)]]
**************************************************
[array([[9.99820173e-01, 7.76551133e-06, 3.23944623e-05, 1.07691012e-04,
3.19731771e-05],
[2.23369602e-06, 9.99973655e-01, 9.16190515e-07, 2.26883185e-05,
5.21339530e-07],
[2.94848701e-06, 1.81872983e-05, 9.99861002e-01, 8.90642877e-06,
1.08885535e-04]], dtype=float32)]
**************************************************
[0 1 2]
input 1 is dataset/test/yushuxin.jpg, and the predict result is 虞书欣
input 2 is dataset/test/xujiaqi.jpg, and the predict result is 许佳琪
input 3 is dataset/test/zhaoxiaotang.jpg, and the predict result is 赵小棠
[array([[2.0881200e-06, 4.9753922e-05, 5.9350541e-06, 9.9994111e-01,
1.0736142e-06],
[3.4572211e-05, 3.4797737e-05, 3.2656546e-05, 3.2816235e-05,
9.9986517e-01]], dtype=float32)]
**************************************************
[3 4]
input 4 is dataset/test/anqi.jpg, and the predict result is 安崎
input 5 is dataset/test/wangchengxuan.jpg, and the predict result is 王承渲
share_vars_from is set, scope is ignored.
[2020-04-26 13:08:42,793] [ INFO] - PaddleHub predict finished.
Step 1: scrape the comment data of iQIYI's Youth With You 2 (《青春有你2》) (reference link: https://www.iqiyi.com/v_19ryfkiv8w.html#curid=15068699100_9f9bab7e0d1e30c494622af777f4ba39)
Step 2: count word frequencies and visualize them
Step 3: draw a word cloud (a sketch of steps 2 and 3 follows this list)
Step 4: use PaddleHub to run content moderation on the comments
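A minimal sketch of steps 2 and 3, assuming the scraped comments have already been saved to a local file (comments.txt is a placeholder name) and that the jieba and wordcloud packages plus a Chinese-capable font file are available:

from collections import Counter

import jieba
from wordcloud import WordCloud

# Hedged sketch: comments.txt and simhei.ttf are placeholder assumptions.
with open("comments.txt", encoding="utf-8") as f:
    text = f.read()

# Tokenize the Chinese text and count word frequencies, skipping 1-character tokens
words = [w for w in jieba.lcut(text) if len(w) > 1]
freq = Counter(words)
print(freq.most_common(10))

# A Chinese-capable font file is required for the word cloud to render correctly
wc = WordCloud(font_path="simhei.ttf", width=800, height=600, background_color="white")
wc.generate_from_frequencies(freq)
wc.to_file("wordcloud.png")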
To be honest, the first PaddlePaddle (飞桨) course I took was the CV course, also seven days long. I wasn't working yet at the time, so I had plenty of time, but my fundamentals were weak, keeping up was hard, and it was quite a grind. After I started working, on April 22 I joined the 《Python小白逆袭大神》 course. In just seven days of study I deepened my grasp of Python and got what I came for. Below I would like to share my takeaways.
The first class's homework was simple: a multiplication table and listing the files in a directory. I rushed to finish it over lunch and was, I think, the third person to submit, the earliest I have ever handed in an assignment. I was exhausted and anxious, worried about submitting late. After finishing, I also posted it on CSDN;
it was simple, yet it reached 683 views, the first time that many people read one of my blog posts.
The second class was about collecting Youth With You 2 contestant data, including scraping the images and saving them. Most of the code was already provided; all we had to do was fill in the key pieces. Following the pattern, I completed the code and scraped the images from Baidu Baike, 482 images in total.
Assignment 2: https://blog.csdn.net/livingbody/article/details/105719799
For this assignment, many BeautifulSoup usages are worth committing to memory, since they will come up often in future data scraping. The coding style of the provided code is also well worth borrowing.
The third class was contestant data analysis, built on the data scraped in the second class. The provided code analyzed the contestants' regional distribution and drew a pie chart; for us the task was simple: follow the same pattern to analyze the contestants' weight data and produce a pie chart.
Assignment 3: https://blog.csdn.net/livingbody/article/details/105729897
This exercise was mainly practice with numpy, pandas, matplotlib, json, and so on.
One thing that needs special attention is font handling (see the sketch below).
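A minimal sketch of that font step, assuming a Chinese-capable .ttf file is available locally (the path below is a placeholder):

import matplotlib.pyplot as plt
from matplotlib import font_manager

# Hedged sketch: register a Chinese-capable font so chart labels render;
# the font path is an assumed placeholder.
font = font_manager.FontProperties(fname="/home/aistudio/simhei.ttf")

plt.figure(figsize=(5, 5))
plt.pie([10, 20, 30], labels=["组A", "组B", "组C"], textprops={"fontproperties": font})
plt.title("示例饼图", fontproperties=font)
plt.savefig("pie_demo.png")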
This assignment was PaddleHub recognition of the five Youth With You 2 contestants: the usual approach was to scrape more images and then train on them. What I did was study the image data augmentation material carefully, use augmentation to generate a dataset, and then train the recognizer, which honestly feels a bit like cheating......
This part was mainly about learning the finetune_and_eval interface, and I stepped into quite a few pitfalls. (A rough sketch of the augmentation idea follows the link below.)
Assignment 4: https://blog.csdn.net/livingbody/article/details/105766968
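A minimal sketch of offline augmentation with Pillow, under the assumption that a few extra copies of each training image (flipped, slightly rotated, brightness-jittered) are written back into the class folder; this is an illustrative approach, not necessarily the exact one used in class:

import os
import random

from PIL import Image, ImageEnhance

# Hedged sketch: augment one class folder; "anqi" is only an example class.
src_dir = "dataset/train/anqi"
for name in os.listdir(src_dir):
    if not name.endswith(".jpg") or name.startswith("aug"):
        continue
    img = Image.open(os.path.join(src_dir, name)).convert("RGB")
    for i in range(3):
        aug = img.transpose(Image.FLIP_LEFT_RIGHT) if random.random() < 0.5 else img
        aug = aug.rotate(random.uniform(-15, 15))                              # small random rotation
        aug = ImageEnhance.Brightness(aug).enhance(random.uniform(0.8, 1.2))   # brightness jitter
        aug.save(os.path.join(src_dir, "aug%d_%s" % (i, name)))

If you generate images this way, rerun generate_train_list() afterwards so the new files show up in train_list.txt.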
Since the course opened on April 22, I have done my best to keep up.
This time the task was scraping the iQIYI Youth With You 2 comment data: use the browser's network tools to find the comment API URL, fetch the JSON data and save it, then tokenize, clean, and count the comments and produce a word cloud; the comments were also checked for pornographic content (a sketch of that check follows the link below).
Big assignment: https://blog.csdn.net/livingbody/article/details/105802156
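A minimal sketch of that content check, assuming the PaddleHub porn_detection_lstm module and its detection() interface are available in your PaddleHub version; the sample comments are placeholders:

import paddlehub as hub

# Hedged sketch: text content moderation with a PaddleHub NLP module.
porn_detector = hub.Module(name="porn_detection_lstm")

comments = ["这舞台太好看了", "小姐姐加油"]   # placeholder comments
results = porn_detector.detection(texts=comments, use_gpu=False, batch_size=1)
for r in results:
    print(r)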
This part was mainly about learning how to use PaddleHub and getting to know some creative PaddleHub projects. In my own modest attempt, I used portrait segmentation to process a scraped photo collection of 井上玲音 (Rei Inoue), and ended up with a folder full of pretty cutouts......
Class monitor, please don't hit me.
Portrait cutouts: https://blog.csdn.net/livingbody/article/details/105827538 (a segmentation sketch follows below)
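A minimal sketch of that portrait cutout, assuming the deeplabv3p_xception65_humanseg module and the PaddleHub 1.x data={"image": [...]} calling convention; the image paths are placeholders:

import paddlehub as hub

# Hedged sketch: human segmentation ("portrait cutout") with PaddleHub 1.x.
humanseg = hub.Module(name="deeplabv3p_xception65_humanseg")

test_img_paths = ["photos/img1.jpg", "photos/img2.jpg"]   # placeholder paths
results = humanseg.segmentation(data={"image": test_img_paths})
print(results)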
This final session was mainly a wrap-up......
Thanks to the two lecturers of this course, to the class monitor, to Teacher Qian, to our teaching assistants, and to my lovely classmates!!!
It was your dedication that made this intensive course run so efficiently; I still need to go back over the material again and again.
Thanks again!!!
PS: I never won any of the lucky draws, so I bought myself a 小度在家 smart display (from a China Mobile store) to encourage myself and keep up my motivation.
Then tonight the class monitor posted the scores, and it turned out I had won a prize after all.
What I really wanted was the book 《深度学习导论与应用实践》.
Heh, I'm rambling now.
The general deep learning workflow starts with collecting data; gathering labeled, high-quality data in particular is expensive work.
Web scraping is the process of imitating a browser's behavior: send a request to the target site, receive the server's response, extract the information you need, and save it.
Python provides ready-made tools for building a scraper: the requests module and the BeautifulSoup library (a small sketch follows below).
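A minimal sketch of that request/parse/save loop, assuming a placeholder URL and that requests and beautifulsoup4 are installed:

import requests
from bs4 import BeautifulSoup

# Hedged sketch: the URL and the selector are placeholders.
url = "https://example.com/page"
headers = {"User-Agent": "Mozilla/5.0"}   # imitate a browser

resp = requests.get(url, headers=headers, timeout=10)
resp.raise_for_status()

soup = BeautifulSoup(resp.text, "html.parser")
titles = [a.get_text(strip=True) for a in soup.select("a")]

# Save the extracted information
with open("titles.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(titles))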
Installing the font files
The first time you use the font icons, you need to install the FontAwesome font files on your machine: click the font download button above to get the font package, unzip it, open the Fonts folder inside, and install FontAwesome.otf or fontawesome-webfont.ttf. On Windows, double-clicking a font file brings up the install prompt; on macOS, look up how to install fonts if you are unsure. After the fonts are installed, restart Axure; if 【Fontawesome】 appears in the font list when you select a font, the installation succeeded.