Mask R-CNN installation notes

apt-get install docker.io

nvidia-docker is needed as well; it lives at: https://github.com/NVIDIA/nvidia-docker

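Updating the apt sources was extremely slow...

So I downloaded the release package directly; on Ubuntu the .rpm has to be installed with alien.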

wget  https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.1/nvidia-docker-1.0.1-1.x86_64.rpm 

apt-get install alien
alien -i nvidia-docker-1.0.1-1.x86_64.rpm
nvidia-docker run caffe2ai/caffe2:c2v0.8.1.cuda8.cudnn7.ubuntu16.04

This fails with:

docker: Error response from daemon: create nvidia_driver_387.26: create nvidia_driver_387.26: Error looking up volume plugin nvidia-docker: legacy plugin: plugin not found.

sudo nvidia-docker-plugin

nvidia-docker-plugin | 2018/01/25 18:30:47 Loading NVIDIA unified memory
nvidia-docker-plugin | 2018/01/25 18:30:47 Loading NVIDIA management library
nvidia-docker-plugin | 2018/01/25 18:30:47 Discovering GPU devices
nvidia-docker-plugin | 2018/01/25 18:30:48 Error: cuda: out of memory
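The plugin gives no detail beyond "out of memory", so before retrying it may help to see how much GPU memory is actually free. A minimal sketch, assuming the driver's nvidia-smi is on the PATH; the helper name gpu_free_mib is made up for this note:

```shell
# Read "used, total" pairs in MiB (one GPU per line) on stdin and
# print the free memory per GPU. gpu_free_mib is a hypothetical helper.
gpu_free_mib() {
    awk -F', *' '{ printf "GPU %d: %d MiB free of %d MiB\n", NR - 1, $2 - $1, $2 }'
}

# Usage on a machine with the NVIDIA driver installed:
# nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits | gpu_free_mib
```

If another process is holding most of the memory, stopping it first is worth a try; that is a guess on my part, not something confirmed here.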

sudo systemctl start nvidia-docker   fails as well:

Jan 26 11:51:34 bj-s-19 systemd[1]: nvidia-docker.service: Main process exited, code=exited, status=217/USER
Jan 26 11:51:34 bj-s-19 systemd[1]: nvidia-docker.service: Control process exited, code=exited status=217
Jan 26 11:51:34 bj-s-19 systemd[1]: nvidia-docker.service: Control process exited, code=exited status=217

vim /usr/lib/systemd/system/nvidia-docker.service

Change the user (nvidia-docker) in that unit file to root.

sudo systemctl start nvidia-docker   now succeeds.
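Exit status 217/USER is what systemd reports when it cannot switch to the account the unit asks for, which fits the fix above. A rough sketch for checking this before editing anything, assuming the unit names the account with a User= line; unit_user and account_exists are made-up helper names:

```shell
# Print the account named by User= in a systemd unit file (empty if unset).
unit_user() {
    sed -n 's/^User=[[:space:]]*//p' "$1" | tail -n 1
}

# Succeed if the account exists on this machine.
account_exists() {
    id -u "$1" >/dev/null 2>&1
}

# Usage (unit path taken from the note above):
# u=$(unit_user /usr/lib/systemd/system/nvidia-docker.service)
# account_exists "$u" || echo "account '$u' does not exist"
```

If the account turns out to be missing, creating it is probably the cleaner alternative to running the service as root, though I only tried the root change.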

Continue with  nvidia-docker run caffe2ai/caffe2:c2v0.8.1.cuda8.cudnn7.ubuntu16.04

This fails too; it looks like I pulled the wrong Caffe2 docker image.

Detectron ops lib not found at '/usr/local/lib/libcaffe2_detectron_ops_gpu.so'; make sure that your Caffe2 version includes Detectron module

Pull it again:

nvidia-docker pull caffe2ai/caffe2

nvidia-docker run  -d -it caffe2ai/caffe2  # to leave an attached container: Ctrl+P, then Ctrl+Q (in that order)
nvidia-docker run -it caffe2ai/caffe2:latest python -m caffe2.python.operator_test.relu_op_test  # test the new docker image

Same problem. After a lot of searching I still could not figure out where libcaffe2_detectron_ops_gpu.so comes from.
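The error message points at a file under a lib directory (/usr/local/lib here), so a quick way to tell whether an image contains the Detectron ops at all is to look for that file under a few install prefixes. A minimal sketch; the helper name and the list of prefixes in the usage line are assumptions, not something Detectron documents:

```shell
# Print the first "<prefix>/lib/libcaffe2_detectron_ops_gpu.so" that exists
# among the prefixes given as arguments; return non-zero if none is found.
find_detectron_ops_lib() {
    for prefix in "$@"; do
        candidate="$prefix/lib/libcaffe2_detectron_ops_gpu.so"
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Usage inside the container (prefixes are guesses):
# find_detectron_ops_lib /usr/local /usr "$HOME/caffe2/build" || echo "not in this image"
```

When nothing is found, the Caffe2 in that image was most likely built without the Detectron module, and no amount of re-pulling the same tag will change that.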

============  Starting over  ==============

Let's try building caffe2 myself...

Prepare the environment

https://caffe2.ai/docs/getting-started.html?platform=ubuntu&configuration=compile

http://blog.csdn.net/zziahgf/article/details/79141879

git clone --recursive https://github.com/caffe2/caffe2.git && cd caffe2
cd docker/ubuntu-16.04-cuda8-cudnn6-all-options
sed -i -e 's/ --branch v0.8.1//g' Dockerfile 
docker build -t caffe2:cuda8-cudnn6-all-options .
cd $DETECTRON/docker   # $DETECTRON is the path of a clone of https://github.com/facebookresearch/Detectron
docker build -t detectron:c2-cuda8-cudnn6 .
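The sed step is the important one: it removes the v0.8.1 branch pin from the Dockerfile so that the build clones the current caffe2 source, which presumably is what brings in the Detectron module that the prebuilt v0.8.1 image lacked. A small sketch of what the substitution does; the sample Dockerfile line is an assumption for illustration, not copied from the real file:

```shell
# Apply the same substitution as the sed command above to text on stdin.
# strip_branch_pin is a hypothetical helper name.
strip_branch_pin() {
    sed -e 's/ --branch v0.8.1//g'
}

# Usage with an assumed Dockerfile line:
# printf 'RUN git clone --branch v0.8.1 --recursive https://github.com/caffe2/caffe2.git\n' | strip_branch_pin
```

Running it on the Dockerfile without -i first (strip_branch_pin < Dockerfile) is a cheap way to confirm the pin is really gone before starting a long build.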

It runs successfully.

nvidia-docker run --rm -it detectron:c2-cuda8-cudnn6 python2 tests/test_batch_permutation_op.py
E0131 17:08:40.230015     1 init_intrinsics_check.cc:54] CPU feature avx is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
E0131 17:08:40.230031     1 init_intrinsics_check.cc:54] CPU feature avx2 is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
E0131 17:08:40.230036     1 init_intrinsics_check.cc:54] CPU feature fma is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
..
----------------------------------------------------------------------
Ran 2 tests in 0.745s

OK
nvidia-docker run -d -it detectron:c2-cuda8-cudnn6
nvidia-docker ps
nvidia-docker attach xxxx # the container ID shown by the previous step

python2 tools/infer_simple.py \
    --cfg configs/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml \
    --output-dir /tmp/detectron-visualizations \
    --image-ext jpg \
    --wts https://s3-us-west-2.amazonaws.com/detectron/35861858/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml.02_32_51.SgT4y1cO/output/train/coco_2014_train:coco_2014_valminusminival/generalized_rcnn/model_final.pkl \
    demo

Check the results.
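Note that the PDFs are written to --output-dir inside the container, so they are not visible on the host by default. One way around that is to mount a host directory over the output path when starting the container. A rough sketch, assuming the image tag and output path used above; run_with_output_dir and the DOCKER override variable are made up for this note:

```shell
# Run a command in the detectron image with a host directory mounted at
# /tmp/detectron-visualizations, so the output survives the container.
# DOCKER is a hypothetical override (e.g. DOCKER=echo for a dry run).
run_with_output_dir() {
    host_dir=$1
    shift
    mkdir -p "$host_dir"
    ${DOCKER:-nvidia-docker} run --rm -it \
        -v "$(cd "$host_dir" && pwd):/tmp/detectron-visualizations" \
        detectron:c2-cuda8-cudnn6 "$@"
}

# Usage: pass the infer_simple.py command from above as the arguments, e.g.
# run_with_output_dir ./detectron-out python2 tools/infer_simple.py --output-dir /tmp/detectron-visualizations ...
```

For a container that is already running, docker cp <container-id>:/tmp/detectron-visualizations . should achieve the same thing after the fact.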

19 comments on "Mask R-CNN installation notes"

      1. Hi admin,
        --output-dir /home/ray/Detectron/demo/output \
        no PDF files are written to this folder.
        I am using nvidia-docker too. How do you view the PDFs?

  1. Hello, what causes the "out of memory Error from operator" message?

    E0206 15:16:13.353634 10920 init_intrinsics_check.cc:54] CPU feature avx is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
    E0206 15:16:13.353668 10920 init_intrinsics_check.cc:54] CPU feature avx2 is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
    E0206 15:16:13.353675 10920 init_intrinsics_check.cc:54] CPU feature fma is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
    WARNING cnn.py: 40: [====DEPRECATE WARNING====]: you are creating an object from CNNModelHelper class which will be deprecated soon. Please use ModelHelper object with brew module. For more information, please refer to caffe2.ai and python/brew.py, python/brew_test.py for more information.
    INFO net.py: 57: Loading weights from: /tmp/detectron-download-cache/35861858/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml.02_32_51.SgT4y1cO/output/train/coco_2014_train:coco_2014_valminusminival/generalized_rcnn/model_final.pkl
    I0206 15:16:18.946543 10920 net_dag_utils.cc:118] Operator graph pruning prior to chain compute took: 0.000200573 secs
    I0206 15:16:18.946779 10920 net_dag.cc:61] Number of parallel execution chains 63 Number of operators = 402
    I0206 15:16:18.967525 10920 net_dag_utils.cc:118] Operator graph pruning prior to chain compute took: 0.000145573 secs
    I0206 15:16:18.967732 10920 net_dag.cc:61] Number of parallel execution chains 30 Number of operators = 358
    I0206 15:16:18.969389 10920 net_dag_utils.cc:118] Operator graph pruning prior to chain compute took: 1.1131e-05 secs
    I0206 15:16:18.969425 10920 net_dag.cc:61] Number of parallel execution chains 5 Number of operators = 18
    INFO infer_simple.py: 111: Processing demo/24274813513_0cfd2ce6d0_k.jpg -> /tmp/detectron-visualizations/24274813513_0cfd2ce6d0_k.jpg.pdf
    terminate called after throwing an instance of 'caffe2::EnforceNotMet'
    what(): [enforce fail at context_gpu.cu:343] error == cudaSuccess. 2 vs 0. Error at: /home/wn/caffe2/caffe2/core/context_gpu.cu:343: out of memory Error from operator:
    input: "gpu_0/res4_0_branch2b" input: "gpu_0/res4_0_branch2c_w" output: "gpu_0/res4_0_branch2c" name: "" type: "Conv" arg { name: "kernel" i: 1 } arg { name: "exhaustive_search" i: 0 } arg { name: "stride" i: 1 } arg { name: "pad" i: 0 } arg { name: "order" s: "NCHW" } arg { name: "dilation" i: 1 } device_option { device_type: 1 cuda_gpu_id: 0 } engine: "CUDNN"
    *** Aborted at 1517901380 (unix time) try "date -d @1517901380" if you are using GNU date ***
    PC: @ 0x7f2179ea3428 gsignal
    *** SIGABRT (@0x2aa8) received by PID 10920 (TID 0x7f2119f50700) from PID 10920; stack trace: ***
    @ 0x7f2179ea34b0 (unknown)
    @ 0x7f2179ea3428 gsignal
    @ 0x7f2179ea502a abort
    @ 0x7f217398f84d __gnu_cxx::__verbose_terminate_handler()
    @ 0x7f217398d6b6 (unknown)
    @ 0x7f217398d701 std::terminate()
    @ 0x7f21739b8d38 (unknown)
    @ 0x7f217a23f6ba start_thread
    @ 0x7f2179f7541d clone
    @ 0x0 (unknown)
    Aborted (core dumped)

      1. root@wn:/opt/detectron# nvidia-smi
        Tue Feb 6 16:43:54 2018
        +-----------------------------------------------------------------------------+
        | NVIDIA-SMI 384.111                Driver Version: 384.111                   |
        |-------------------------------+----------------------+----------------------+
        | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
        | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
        |===============================+======================+======================|
        |   0  GeForce GT 720      Off  | 00000000:01:00.0 N/A |                  N/A |
        | 43%   41C    P0    N/A /  N/A |    177MiB /  1998MiB |     N/A      Default |
        +-------------------------------+----------------------+----------------------+

        +-----------------------------------------------------------------------------+
        | Processes:                                                       GPU Memory |
        |  GPU       PID   Type   Process name                             Usage      |
        |=============================================================================|
        |    0                    Not Supported                                       |
        +-----------------------------------------------------------------------------+

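          1. I lowered it to 1 and it still fails. Then I rebuilt the CPU-only version and got: AssertionError: Detectron ops lib not found at '/home/wn/caffe2/build/lib/libcaffe2_detectron_ops_gpu.so'; make sure that your Caffe2 version includes Detectron module. I see your post mentions this error too.

          1. Isn't the order to build caffe2 first and then Detectron? Can the infer_simple.py example only run with the GPU build? The program loads GPU-related libraries.

          1. Hi, is my problem also too little GPU memory? I have 6 GB of GPU memory.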
            python tools/infer_simple.py \
            > --cfg configs/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml \
            > --output-dir demo/output \
            > --image-ext jpg \
            > --wts \
            > https://s3-us-west-2.amazonaws.com/detectron/35861858/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml.02_32_51.SgT4y1cO/output/train/coco_2014_train:coco_2014_valminusminival/generalized_rcnn/model_final.pkl \
            > demo
            Found Detectron ops lib: /usr/local/lib/libcaffe2_detectron_ops_gpu.so
            E0604 01:15:39.683818 4569 init_intrinsics_check.cc:43] CPU feature avx is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
            E0604 01:15:39.683842 4569 init_intrinsics_check.cc:43] CPU feature avx2 is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
            E0604 01:15:39.683848 4569 init_intrinsics_check.cc:43] CPU feature fma is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
            INFO io.py: 67: Downloading remote file https://s3-us-west-2.amazonaws.com/detectron/35861858/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml.02_32_51.SgT4y1cO/output/train/coco_2014_train:coco_2014_valminusminival/generalized_rcnn/model_final.pkl to /tmp/detectron-download-cache/35861858/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml.02_32_51.SgT4y1cO/output/train/coco_2014_train:coco_2014_valminusminival/generalized_rcnn/model_final.pkl
            [============================================================] 100.0% of 490.5MB file
            WARNING cnn.py: 25: [====DEPRECATE WARNING====]: you are creating an object from CNNModelHelper class which will be deprecated soon. Please use ModelHelper object with brew module. For more information, please refer to caffe2.ai and python/brew.py, python/brew_test.py for more information.
            INFO net.py: 59: Loading weights from: /tmp/detectron-download-cache/35861858/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml.02_32_51.SgT4y1cO/output/train/coco_2014_train:coco_2014_valminusminival/generalized_rcnn/model_final.pkl
            I0604 01:26:37.565393 4569 net_dag_utils.cc:102] Operator graph pruning prior to chain compute took: 0.000131357 secs
            I0604 01:26:37.565599 4569 net_dag.cc:46] Number of parallel execution chains 63 Number of operators = 402
            I0604 01:26:37.573647 4569 net_dag_utils.cc:102] Operator graph pruning prior to chain compute took: 0.000112656 secs
            I0604 01:26:37.573818 4569 net_dag.cc:46] Number of parallel execution chains 30 Number of operators = 358
            I0604 01:26:37.574615 4569 net_dag_utils.cc:102] Operator graph pruning prior to chain compute took: 1.1352e-05 secs
            I0604 01:26:37.574636 4569 net_dag.cc:46] Number of parallel execution chains 5 Number of operators = 18
            INFO infer_simple.py: 113: Processing demo/19064748793_bb942deea1_k.jpg -> demo/output/19064748793_bb942deea1_k.jpg.pdf
            E0604 01:26:38.643280 4917 net_dag.cc:195] Exception from operator chain starting at '' (type 'Conv'): caffe2::EnforceNotMet: [enforce fail at context_gpu.h:91] status == CUDNN_STATUS_SUCCESS. 4 vs 0. , Error at: /home/cs/pytorch/caffe2/core/context_gpu.h:91: CUDNN_STATUS_INTERNAL_ERROR Error from operator:
            input: "gpu_0/data" input: "gpu_0/conv1_w" output: "gpu_0/conv1" name: "" type: "Conv" arg { name: "kernel" i: 7 } arg { name: "exhaustive_search" i: 0 } arg { name: "pad" i: 3 } arg { name: "order" s: "NCHW" } arg { name: "stride" i: 2 } device_option { device_type: 1 cuda_gpu_id: 0 } engine: "CUDNN"
            WARNING workspace.py: 185: Original python traceback for operator `0` in network `generalized_rcnn` in exception above (most recent call last):
            WARNING workspace.py: 190: File "tools/infer_simple.py", line 149, in <module>
            WARNING workspace.py: 190: File "tools/infer_simple.py", line 101, in main
            WARNING workspace.py: 190: File "/home/cs/temporary/detectron/detectron/core/test_engine.py", line 328, in initialize_model_from_cfg
            WARNING workspace.py: 190: File "/home/cs/temporary/detectron/detectron/modeling/model_builder.py", line 124, in create
            WARNING workspace.py: 190: File "/home/cs/temporary/detectron/detectron/modeling/model_builder.py", line 89, in generalized_rcnn
            WARNING workspace.py: 190: File "/home/cs/temporary/detectron/detectron/modeling/model_builder.py", line 229, in build_generic_detection_model
            WARNING workspace.py: 190: File "/home/cs/temporary/detectron/detectron/modeling/optimizer.py", line 54, in build_data_parallel_model
            WARNING workspace.py: 190: File "/home/cs/temporary/detectron/detectron/modeling/model_builder.py", line 169, in _single_gpu_build_func
            WARNING workspace.py: 190: File "/home/cs/temporary/detectron/detectron/modeling/FPN.py", line 63, in add_fpn_ResNet101_conv5_body
            WARNING workspace.py: 190: File "/home/cs/temporary/detectron/detectron/modeling/FPN.py", line 104, in add_fpn_onto_conv_body
            WARNING workspace.py: 190: File "/home/cs/temporary/detectron/detectron/modeling/ResNet.py", line 48, in add_ResNet101_conv5_body
            WARNING workspace.py: 190: File "/home/cs/temporary/detectron/detectron/modeling/ResNet.py", line 98, in add_ResNet_convX_body
            WARNING workspace.py: 190: File "/home/cs/temporary/detectron/detectron/modeling/ResNet.py", line 251, in basic_bn_stem
            WARNING workspace.py: 190: File "/home/cs/pytorch/build/caffe2/python/cnn.py", line 97, in Conv
            WARNING workspace.py: 190: File "/home/cs/pytorch/build/caffe2/python/brew.py", line 107, in scope_wrapper
            WARNING workspace.py: 190: File "/home/cs/pytorch/build/caffe2/python/helpers/conv.py", line 186, in conv
            WARNING workspace.py: 190: File "/home/cs/pytorch/build/caffe2/python/helpers/conv.py", line 139, in _ConvBase
            Traceback (most recent call last):
            File "tools/infer_simple.py", line 149, in <module>
            main(args)
            File "tools/infer_simple.py", line 119, in main
            model, im, None, timers=timers
            File "/home/cs/temporary/detectron/detectron/core/test.py", line 66, in im_detect_all
            model, im, cfg.TEST.SCALE, cfg.TEST.MAX_SIZE, boxes=box_proposals
            File "/home/cs/temporary/detectron/detectron/core/test.py", line 158, in im_detect_bbox
            workspace.RunNet(model.net.Proto().name)
            File "/home/cs/pytorch/build/caffe2/python/workspace.py", line 217, in RunNet
            StringifyNetName(name), num_iter, allow_fail,
            File "/home/cs/pytorch/build/caffe2/python/workspace.py", line 178, in CallWithExceptionIntercept
            return func(*args, **kwargs)
            RuntimeError: [enforce fail at context_gpu.h:91] status == CUDNN_STATUS_SUCCESS. 4 vs 0. , Error at: /home/cs/pytorch/caffe2/core/context_gpu.h:91: CUDNN_STATUS_INTERNAL_ERROR Error from operator:
            input: "gpu_0/data" input: "gpu_0/conv1_w" output: "gpu_0/conv1" name: "" type: "Conv" arg { name: "kernel" i: 7 } arg { name: "exhaustive_search" i: 0 } arg { name: "pad" i: 3 } arg { name: "order" s: "NCHW" } arg { name: "stride" i: 2 } device_option { device_type: 1 cuda_gpu_id: 0 } engine: "CUDNN"

    1. For me, adding sudo was enough 0.0 although I still get
      E0716 10:30:30.054626 11016 init_intrinsics_check.cc:43] CPU feature avx is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
      WARNING cnn.py: 25: [====DEPRECATE WARNING====]: you are creating an object from CNNModelHelper class which will be deprecated soon. Please use ModelHelper object with brew module. For more information, please refer to caffe2.ai and python/brew.py, python/brew_test.py for more information.
      but I do get results now.
