apt-get install docker.io
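The project lives at: https://github.com/NVIDIA/nvidia-docker
Updating the package sources is extremely slow...
So download the package and install it directly; on Ubuntu the RPM has to be installed with alien.
wget https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.1/nvidia-docker-1.0.1-1.x86_64.rpm
apt-get install alien
alien -i nvidia-docker-1.0.1-1.x86_64.rpm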
nvidia-docker run caffe2ai/caffe2:c2v0.8.1.cuda8.cudnn7.ubuntu16.04
This fails with:
docker: Error response from daemon: create nvidia_driver_387.26: create nvidia_driver_387.26: Error looking up volume plugin nvidia-docker: legacy plugin: plugin not found.
sudo nvidia-docker-plugin
nvidia-docker-plugin | 2018/01/25 18:30:47 Loading NVIDIA unified memory
nvidia-docker-plugin | 2018/01/25 18:30:47 Loading NVIDIA management library
nvidia-docker-plugin | 2018/01/25 18:30:47 Discovering GPU devices
nvidia-docker-plugin | 2018/01/25 18:30:48 Error: cuda: out of memory
sudo systemctl start nvidia-docker
This fails with:
Jan 26 11:51:34 bj-s-19 systemd[1]: nvidia-docker.service: Main process exited, code=exited, status=217/USER
Jan 26 11:51:34 bj-s-19 systemd[1]: nvidia-docker.service: Control process exited, code=exited status=217
Jan 26 11:51:34 bj-s-19 systemd[1]: nvidia-docker.service: Control process exited, code=exited status=217
vim /usr/lib/systemd/system/nvidia-docker.service
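Change the user (nvidia-docker) to root. Status 217/USER indicates that systemd could not resolve the user the unit is configured to run as.
sudo systemctl start nvidia-docker
Now it starts successfully.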
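The manual edit of the unit file can be sketched as a plain text substitution. This is only an illustration: `set_service_user` is a hypothetical helper, and the unit content below is made up for the example rather than copied from the file the package installs.

```python
import re


def set_service_user(unit_text, new_user):
    """Return unit_text with every `User=` line pointing at new_user."""
    # (?m) makes ^ and $ match at each line, so only whole User= lines change.
    return re.sub(r"(?m)^User=.*$", lambda m: "User=" + new_user, unit_text)


# Made-up unit content for illustration; the real file may differ.
example_unit = """[Service]
Type=simple
User=nvidia-docker
ExecStart=/usr/bin/nvidia-docker-plugin
"""

patched = set_service_user(example_unit, "root")
print(patched)
```

After a unit file changes on disk, systemd needs `sudo systemctl daemon-reload` before the service is started again. Running the service as root is a workaround; creating the missing account would probably be the less intrusive fix, though that was not tried here.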
Continuing:
nvidia-docker run caffe2ai/caffe2:c2v0.8.1.cuda8.cudnn7.ubuntu16.04
It fails; I suspect I pulled the wrong Caffe2 Docker image.
Detectron ops lib not found at '/usr/local/lib/libcaffe2_detectron_ops_gpu.so'; make sure that your Caffe2 version includes Detectron module
Pull it again:
nvidia-docker pull caffe2ai/caffe2
nvidia-docker run -d -it caffe2ai/caffe2   # to detach, press Ctrl+P then Ctrl+Q (P and Q in that order)
nvidia-docker run -it caffe2ai/caffe2:latest python -m caffe2.python.operator_test.relu_op_test   # test the new Docker image
Same problem. After a long search I still could not figure out where libcaffe2_detectron_ops_gpu.so comes from.
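The error message itself says the library belongs to a Caffe2 build that includes the Detectron module, so a first check is whether the file exists anywhere under the install prefixes. The sketch below is a hypothetical helper, not Detectron's own lookup code, and the `<prefix>/lib` layout is an assumption based on the path in the error message. It demonstrates the check on a throwaway directory so it runs anywhere.

```python
import os
import tempfile

LIB_NAME = "libcaffe2_detectron_ops_gpu.so"


def find_ops_lib(prefixes, lib_name=LIB_NAME):
    """Return the first existing <prefix>/lib/<lib_name>, or None."""
    for prefix in prefixes:
        candidate = os.path.join(prefix, "lib", lib_name)
        if os.path.isfile(candidate):
            return candidate
    return None


# Demonstration on a temporary tree that mimics an install prefix.
demo_prefix = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_prefix, "lib"))
open(os.path.join(demo_prefix, "lib", LIB_NAME), "w").close()

found = find_ops_lib(["/nonexistent-prefix", demo_prefix])
print(found)
```

Inside the container one would pass real prefixes instead, for example `find_ops_lib(["/usr/local"])`. A result of `None` suggests the image was built without the Detectron operators.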
============ Starting over ==============
Let me try building caffe2 myself...
Prepare the environment:
https://caffe2.ai/docs/getting-started.html?platform=ubuntu&configuration=compile
http://blog.csdn.net/zziahgf/article/details/79141879
git clone --recursive https://github.com/caffe2/caffe2.git && cd caffe2
cd docker/ubuntu-16.04-cuda8-cudnn6-all-options
sed -i -e 's/ --branch v0.8.1//g' Dockerfile
docker build -t caffe2:cuda8-cudnn6-all-options .
cd $DETECTRON/docker
docker build -t detectron:c2-cuda8-cudnn6 .
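The `sed` step removes the version pin from the Dockerfile, so the image is built from the repository's default branch instead of the v0.8.1 tag. Presumably that is what brings in the Detectron operators, since the pinned image failed with the missing library error. The Python equivalent below shows the effect on a single line; the Dockerfile line is a hypothetical example, and the real file may be worded differently.

```python
def drop_branch_pin(dockerfile_text, pin=" --branch v0.8.1"):
    """Remove every occurrence of the pin, like sed 's/ --branch v0.8.1//g'."""
    return dockerfile_text.replace(pin, "")


# Hypothetical Dockerfile line, for illustration only.
before = "RUN git clone --branch v0.8.1 --recursive https://github.com/caffe2/caffe2.git"
after = drop_branch_pin(before)
print(after)
```

Lines that do not contain the pin pass through unchanged, which matches the behaviour of the `sed` expression.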
It runs successfully.
nvidia-docker run --rm -it detectron:c2-cuda8-cudnn6 python2 tests/test_batch_permutation_op.py
E0131 17:08:40.230015 1 init_intrinsics_check.cc:54] CPU feature avx is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
E0131 17:08:40.230031 1 init_intrinsics_check.cc:54] CPU feature avx2 is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
E0131 17:08:40.230036 1 init_intrinsics_check.cc:54] CPU feature fma is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
..
----------------------------------------------------------------------
Ran 2 tests in 0.745s

OK
nvidia-docker run -d -it detectron:c2-cuda8-cudnn6
nvidia-docker ps
nvidia-docker attach xxxx   # the container ID reported by the previous step
python2 tools/infer_simple.py \
    --cfg configs/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml \
    --output-dir /tmp/detectron-visualizations \
    --image-ext jpg \
    --wts https://s3-us-west-2.amazonaws.com/detectron/35861858/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml.02_32_51.SgT4y1cO/output/train/coco_2014_train:coco_2014_valminusminival/generalized_rcnn/model_final.pkl \
    demo
Test the result.
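When inference runs inside the container, the PDFs are written to the container's filesystem. One possible alternative to attaching is a one-shot run with the output directory bind-mounted from the host. The sketch below only assembles and prints such a command. The helper name and the host directory are hypothetical, `-v` is the standard Docker bind-mount flag, and it is assumed that the image's working directory is the Detectron root, as the relative path in the test command suggests.

```python
def docker_infer_command(image, host_out, container_out, cfg, weights, image_dir):
    """Build an argv list for a one-shot inference run with a mounted output dir."""
    return [
        "nvidia-docker", "run", "--rm",
        "-v", "{}:{}".format(host_out, container_out),  # host dir -> container dir
        image,
        "python2", "tools/infer_simple.py",
        "--cfg", cfg,
        "--output-dir", container_out,
        "--image-ext", "jpg",
        "--wts", weights,
        image_dir,
    ]


WEIGHTS_URL = (
    "https://s3-us-west-2.amazonaws.com/detectron/35861858/12_2017_baselines/"
    "e2e_mask_rcnn_R-101-FPN_2x.yaml.02_32_51.SgT4y1cO/output/train/"
    "coco_2014_train:coco_2014_valminusminival/generalized_rcnn/model_final.pkl"
)

cmd = docker_infer_command(
    image="detectron:c2-cuda8-cudnn6",
    host_out="/tmp/detectron-out",  # hypothetical host directory
    container_out="/tmp/detectron-visualizations",
    cfg="configs/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml",
    weights=WEIGHTS_URL,
    image_dir="demo",
)
print(" ".join(cmd))
```

The printed command can be pasted into a shell on the host; the results should then appear in the host directory once the run finishes.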
Why doesn't mine produce any .pdf files?
python2 tools/infer_simple.py \
> --cfg configs/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml \
> --output-dir /home/ray/Detectron/demo/output \
> --image-ext jpg \
> --wts https://s3-us-west-2.amazonaws.com/detectron/35861858/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml.02_32_51.SgT4y1cO/output/train/coco_2014_train:coco_2014_valminusminival/generalized_rcnn/model_final.pkl \
> demo
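Friend, please post the error message. From the command alone I can't tell what went wrong.
Hello admin,
--output-dir /home/ray/Detectron/demo/output \
No PDF files are written to this folder.
I am also using nvidia-docker. How do you view the PDFs?
I attached to the Docker container and ran it there. The PDFs are right in that directory.
I suggest checking the directory permissions.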
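The permission check suggested above can be sketched as follows. The log lines in this post show each result being written as `<output-dir>/<image name>.pdf`, so the helpers list which of those files are missing and whether the directory is writable. Both helper names are hypothetical, and the demo uses temporary directories so that it runs anywhere.

```python
import os
import tempfile


def expected_pdfs(image_dir, output_dir, image_ext="jpg"):
    """Map each input image to the PDF path suggested by the log lines."""
    names = sorted(n for n in os.listdir(image_dir) if n.endswith("." + image_ext))
    return [os.path.join(output_dir, n + ".pdf") for n in names]


def report(image_dir, output_dir, image_ext="jpg"):
    """Return (writable, missing) for output_dir and its expected PDFs."""
    writable = os.path.isdir(output_dir) and os.access(output_dir, os.W_OK)
    missing = [p for p in expected_pdfs(image_dir, output_dir, image_ext)
               if not os.path.isfile(p)]
    return writable, missing


# Demonstration on temporary directories.
images = tempfile.mkdtemp()
out = tempfile.mkdtemp()
open(os.path.join(images, "a.jpg"), "w").close()

writable, missing = report(images, out)
print(writable, missing)
```

The check has to run in the same place as the inference. If the script ran inside a container, a host path such as /home/ray/... only receives files when it is mounted into that container.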
Hello, what causes the "out of memory Error from operator" message?
E0206 15:16:13.353634 10920 init_intrinsics_check.cc:54] CPU feature avx is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
E0206 15:16:13.353668 10920 init_intrinsics_check.cc:54] CPU feature avx2 is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
E0206 15:16:13.353675 10920 init_intrinsics_check.cc:54] CPU feature fma is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
WARNING cnn.py: 40: [====DEPRECATE WARNING====]: you are creating an object from CNNModelHelper class which will be deprecated soon. Please use ModelHelper object with brew module. For more information, please refer to caffe2.ai and python/brew.py, python/brew_test.py for more information.
INFO net.py: 57: Loading weights from: /tmp/detectron-download-cache/35861858/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml.02_32_51.SgT4y1cO/output/train/coco_2014_train:coco_2014_valminusminival/generalized_rcnn/model_final.pkl
I0206 15:16:18.946543 10920 net_dag_utils.cc:118] Operator graph pruning prior to chain compute took: 0.000200573 secs
I0206 15:16:18.946779 10920 net_dag.cc:61] Number of parallel execution chains 63 Number of operators = 402
I0206 15:16:18.967525 10920 net_dag_utils.cc:118] Operator graph pruning prior to chain compute took: 0.000145573 secs
I0206 15:16:18.967732 10920 net_dag.cc:61] Number of parallel execution chains 30 Number of operators = 358
I0206 15:16:18.969389 10920 net_dag_utils.cc:118] Operator graph pruning prior to chain compute took: 1.1131e-05 secs
I0206 15:16:18.969425 10920 net_dag.cc:61] Number of parallel execution chains 5 Number of operators = 18
INFO infer_simple.py: 111: Processing demo/24274813513_0cfd2ce6d0_k.jpg -> /tmp/detectron-visualizations/24274813513_0cfd2ce6d0_k.jpg.pdf
terminate called after throwing an instance of 'caffe2::EnforceNotMet'
what(): [enforce fail at context_gpu.cu:343] error == cudaSuccess. 2 vs 0. Error at: /home/wn/caffe2/caffe2/core/context_gpu.cu:343: out of memory Error from operator:
input: "gpu_0/res4_0_branch2b" input: "gpu_0/res4_0_branch2c_w" output: "gpu_0/res4_0_branch2c" name: "" type: "Conv" arg { name: "kernel" i: 1 } arg { name: "exhaustive_search" i: 0 } arg { name: "stride" i: 1 } arg { name: "pad" i: 0 } arg { name: "order" s: "NCHW" } arg { name: "dilation" i: 1 } device_option { device_type: 1 cuda_gpu_id: 0 } engine: "CUDNN"
*** Aborted at 1517901380 (unix time) try "date -d @1517901380" if you are using GNU date ***
PC: @ 0x7f2179ea3428 gsignal
*** SIGABRT (@0x2aa8) received by PID 10920 (TID 0x7f2119f50700) from PID 10920; stack trace: ***
@ 0x7f2179ea34b0 (unknown)
@ 0x7f2179ea3428 gsignal
@ 0x7f2179ea502a abort
@ 0x7f217398f84d __gnu_cxx::__verbose_terminate_handler()
@ 0x7f217398d6b6 (unknown)
@ 0x7f217398d701 std::terminate()
@ 0x7f21739b8d38 (unknown)
@ 0x7f217a23f6ba start_thread
@ 0x7f2179f7541d clone
@ 0x0 (unknown)
Aborted (core dumped)
Not enough GPU memory. Which card are you running on?
root@wn:/opt/detectron# nvidia-smi
Tue Feb 6 16:43:54 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.111                Driver Version: 384.111                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GT 720      Off  | 00000000:01:00.0 N/A |                  N/A |
| 43%   41C    P0    N/A /  N/A |    177MiB /  1998MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0                    Not Supported                                       |
+-----------------------------------------------------------------------------+
2 GB of GPU memory is indeed rather small.
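To put a number on it, the memory column of the nvidia-smi output can be parsed to see how much is free before inference starts. This is a small sketch: `parse_memory_usage` is a hypothetical helper, and the row is the one pasted in this thread.

```python
import re


def parse_memory_usage(line):
    """Return (used_mib, total_mib) from an nvidia-smi row, or None."""
    match = re.search(r"(\d+)MiB\s*/\s*(\d+)MiB", line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


row = "| 43%   41C    P0    N/A /  N/A |    177MiB /  1998MiB |     N/A      Default |"
used, total = parse_memory_usage(row)
print("free: {} MiB of {} MiB".format(total - used, total))
# prints: free: 1821 MiB of 1998 MiB
```

For scripted use, `nvidia-smi --query-gpu=memory.used,memory.total --format=csv` reports the same figures without the table layout.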
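Then how do I switch to running in CPU mode? Do I need to recompile?
I read online that lowering BATCH_SIZE_PER_IM should help. It was 512; I lowered it to 256 and it still fails.
Lower it a bit more.
I lowered it to 1 and it still fails. Then I recompiled a CPU-only build, which fails with: AssertionError: Detectron ops lib not found at '/home/wn/caffe2/build/lib/libcaffe2_detectron_ops_gpu.so'; make sure that your Caffe2 version includes Detectron module. I see your post mentions this error too.
You need a Caffe2 build that includes the Detectron library. Enable Detectron in the build options.
Isn't the order to build caffe2 first and then build Detectron? Or can the infer_simple.py example only run on a GPU build? The program calls GPU-related libraries.
I looked it up: Detectron indeed runs on GPU only.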
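A quick way to see whether the Caffe2 build in use can reach a GPU at all is to ask it for the device count. The sketch below wraps `workspace.NumCudaDevices()`, the call used in the Caffe2 installation check, and returns `None` when Caffe2 cannot be imported so that it also runs on a machine without Caffe2. The wrapper name is hypothetical.

```python
def count_cuda_devices():
    """Return the number of CUDA devices Caffe2 sees, or None without Caffe2."""
    try:
        from caffe2.python import workspace
    except ImportError:
        return None
    return workspace.NumCudaDevices()


devices = count_cuda_devices()
if devices is None:
    print("Caffe2 is not importable in this environment")
elif devices == 0:
    print("Caffe2 sees no CUDA device (CPU-only build or driver problem)")
else:
    print("Caffe2 sees {} CUDA device(s)".format(devices))
```

A count of 0 on a machine that has a GPU would be consistent with the CPU-only build described in this thread.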
Hello blogger, is my GPU memory too small as well? I have 6 GB.
python tools/infer_simple.py \
> --cfg configs/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml \
> --output-dir demo/output \
> --image-ext jpg \
> --wts \
> https://s3-us-west-2.amazonaws.com/detectron/35861858/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml.02_32_51.SgT4y1cO/output/train/coco_2014_train:coco_2014_valminusminival/generalized_rcnn/model_final.pkl \
> demo
Found Detectron ops lib: /usr/local/lib/libcaffe2_detectron_ops_gpu.so
E0604 01:15:39.683818 4569 init_intrinsics_check.cc:43] CPU feature avx is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
E0604 01:15:39.683842 4569 init_intrinsics_check.cc:43] CPU feature avx2 is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
E0604 01:15:39.683848 4569 init_intrinsics_check.cc:43] CPU feature fma is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
INFO io.py: 67: Downloading remote file https://s3-us-west-2.amazonaws.com/detectron/35861858/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml.02_32_51.SgT4y1cO/output/train/coco_2014_train:coco_2014_valminusminival/generalized_rcnn/model_final.pkl to /tmp/detectron-download-cache/35861858/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml.02_32_51.SgT4y1cO/output/train/coco_2014_train:coco_2014_valminusminival/generalized_rcnn/model_final.pkl
[============================================================] 100.0% of 490.5MB file
WARNING cnn.py: 25: [====DEPRECATE WARNING====]: you are creating an object from CNNModelHelper class which will be deprecated soon. Please use ModelHelper object with brew module. For more information, please refer to caffe2.ai and python/brew.py, python/brew_test.py for more information.
INFO net.py: 59: Loading weights from: /tmp/detectron-download-cache/35861858/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml.02_32_51.SgT4y1cO/output/train/coco_2014_train:coco_2014_valminusminival/generalized_rcnn/model_final.pkl
I0604 01:26:37.565393 4569 net_dag_utils.cc:102] Operator graph pruning prior to chain compute took: 0.000131357 secs
I0604 01:26:37.565599 4569 net_dag.cc:46] Number of parallel execution chains 63 Number of operators = 402
I0604 01:26:37.573647 4569 net_dag_utils.cc:102] Operator graph pruning prior to chain compute took: 0.000112656 secs
I0604 01:26:37.573818 4569 net_dag.cc:46] Number of parallel execution chains 30 Number of operators = 358
I0604 01:26:37.574615 4569 net_dag_utils.cc:102] Operator graph pruning prior to chain compute took: 1.1352e-05 secs
I0604 01:26:37.574636 4569 net_dag.cc:46] Number of parallel execution chains 5 Number of operators = 18
INFO infer_simple.py: 113: Processing demo/19064748793_bb942deea1_k.jpg -> demo/output/19064748793_bb942deea1_k.jpg.pdf
E0604 01:26:38.643280 4917 net_dag.cc:195] Exception from operator chain starting at '' (type 'Conv'): caffe2::EnforceNotMet: [enforce fail at context_gpu.h:91] status == CUDNN_STATUS_SUCCESS. 4 vs 0. , Error at: /home/cs/pytorch/caffe2/core/context_gpu.h:91: CUDNN_STATUS_INTERNAL_ERROR Error from operator:
input: "gpu_0/data" input: "gpu_0/conv1_w" output: "gpu_0/conv1" name: "" type: "Conv" arg { name: "kernel" i: 7 } arg { name: "exhaustive_search" i: 0 } arg { name: "pad" i: 3 } arg { name: "order" s: "NCHW" } arg { name: "stride" i: 2 } device_option { device_type: 1 cuda_gpu_id: 0 } engine: "CUDNN"
WARNING workspace.py: 185: Original python traceback for operator `0` in network `generalized_rcnn` in exception above (most recent call last):
WARNING workspace.py: 190: File "tools/infer_simple.py", line 149, in <module>
WARNING workspace.py: 190: File "tools/infer_simple.py", line 101, in main
WARNING workspace.py: 190: File "/home/cs/temporary/detectron/detectron/core/test_engine.py", line 328, in initialize_model_from_cfg
WARNING workspace.py: 190: File "/home/cs/temporary/detectron/detectron/modeling/model_builder.py", line 124, in create
WARNING workspace.py: 190: File "/home/cs/temporary/detectron/detectron/modeling/model_builder.py", line 89, in generalized_rcnn
WARNING workspace.py: 190: File "/home/cs/temporary/detectron/detectron/modeling/model_builder.py", line 229, in build_generic_detection_model
WARNING workspace.py: 190: File "/home/cs/temporary/detectron/detectron/modeling/optimizer.py", line 54, in build_data_parallel_model
WARNING workspace.py: 190: File "/home/cs/temporary/detectron/detectron/modeling/model_builder.py", line 169, in _single_gpu_build_func
WARNING workspace.py: 190: File "/home/cs/temporary/detectron/detectron/modeling/FPN.py", line 63, in add_fpn_ResNet101_conv5_body
WARNING workspace.py: 190: File "/home/cs/temporary/detectron/detectron/modeling/FPN.py", line 104, in add_fpn_onto_conv_body
WARNING workspace.py: 190: File "/home/cs/temporary/detectron/detectron/modeling/ResNet.py", line 48, in add_ResNet101_conv5_body
WARNING workspace.py: 190: File "/home/cs/temporary/detectron/detectron/modeling/ResNet.py", line 98, in add_ResNet_convX_body
WARNING workspace.py: 190: File "/home/cs/temporary/detectron/detectron/modeling/ResNet.py", line 251, in basic_bn_stem
WARNING workspace.py: 190: File "/home/cs/pytorch/build/caffe2/python/cnn.py", line 97, in Conv
WARNING workspace.py: 190: File "/home/cs/pytorch/build/caffe2/python/brew.py", line 107, in scope_wrapper
WARNING workspace.py: 190: File "/home/cs/pytorch/build/caffe2/python/helpers/conv.py", line 186, in conv
WARNING workspace.py: 190: File "/home/cs/pytorch/build/caffe2/python/helpers/conv.py", line 139, in _ConvBase
Traceback (most recent call last):
  File "tools/infer_simple.py", line 149, in <module>
    main(args)
  File "tools/infer_simple.py", line 119, in main
    model, im, None, timers=timers
  File "/home/cs/temporary/detectron/detectron/core/test.py", line 66, in im_detect_all
    model, im, cfg.TEST.SCALE, cfg.TEST.MAX_SIZE, boxes=box_proposals
  File "/home/cs/temporary/detectron/detectron/core/test.py", line 158, in im_detect_bbox
    workspace.RunNet(model.net.Proto().name)
  File "/home/cs/pytorch/build/caffe2/python/workspace.py", line 217, in RunNet
    StringifyNetName(name), num_iter, allow_fail,
  File "/home/cs/pytorch/build/caffe2/python/workspace.py", line 178, in CallWithExceptionIntercept
    return func(*args, **kwargs)
RuntimeError: [enforce fail at context_gpu.h:91] status == CUDNN_STATUS_SUCCESS. 4 vs 0. , Error at: /home/cs/pytorch/caffe2/core/context_gpu.h:91: CUDNN_STATUS_INTERNAL_ERROR Error from operator:
input: "gpu_0/data" input: "gpu_0/conv1_w" output: "gpu_0/conv1" name: "" type: "Conv" arg { name: "kernel" i: 7 } arg { name: "exhaustive_search" i: 0 } arg { name: "pad" i: 3 } arg { name: "order" s: "NCHW" } arg { name: "stride" i: 2 } device_option { device_type: 1 cuda_gpu_id: 0 } engine: "CUDNN"
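Hello, my laptop has a GTX 1050 with 4 GB of GPU memory, and I also get the out of memory error when running on the COCO dataset. Is the 4 GB notebook version of the GTX 1050 not enough either?
The GPU memory is too small.
@BenQ Chan Not enough compute power.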
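On the memory question: as far as I can tell, BATCH_SIZE_PER_IM is a training setting (the number of RoIs sampled per image), so lowering it may not change inference memory. The traceback earlier in the comments shows `cfg.TEST.SCALE` and `cfg.TEST.MAX_SIZE` being passed to the detection call, and the test image size is the more likely lever. The sketch below assumes the usual shorter-side/longer-side resizing rule and example values of 800 and 1333. Both are assumptions, not values read from the config in this post.

```python
def resize_scale(height, width, target_size, max_size):
    """Scale the shorter side to target_size, capping the longer side at max_size."""
    short_side = min(height, width)
    long_side = max(height, width)
    scale = float(target_size) / short_side
    if round(scale * long_side) > max_size:
        scale = float(max_size) / long_side
    return scale


def scaled_pixels(height, width, target_size, max_size):
    """Number of pixels the network sees after resizing."""
    scale = resize_scale(height, width, target_size, max_size)
    return int(round(height * scale)) * int(round(width * scale))


full = scaled_pixels(768, 1024, 800, 1333)  # assumed default-like setting
half = scaled_pixels(768, 1024, 400, 667)   # both limits roughly halved
print(full, half, float(full) / half)
```

Halving both limits cuts the input to about a quarter of the pixels, and the convolutional activations shrink roughly in proportion. Whether that is enough for a 2 GB or 4 GB card was not tested here, and detection quality will likely drop at the smaller scale.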
Adding sudo fixed it for me 0.0, although I still get
E0716 10:30:30.054626 11016 init_intrinsics_check.cc:43] CPU feature avx is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
WARNING cnn.py: 25: [====DEPRECATE WARNING====]: you are creating an object from CNNModelHelper class which will be deprecated soon. Please use ModelHelper object with brew module. For more information, please refer to caffe2.ai and python/brew.py, python/brew_test.py for more information.
But I do get results now.
The warnings can simply be ignored.
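When reading long logs it can help to hide the two notices that appear in every run shown in this post. The filter below is a minimal sketch; the marker strings are taken from the log lines above, and treating them as harmless follows the advice given in this thread.

```python
BENIGN_MARKERS = (
    "init_intrinsics_check.cc",  # AVX/AVX2/FMA not compiled into the binary
    "DEPRECATE WARNING",         # CNNModelHelper deprecation notice
)


def drop_benign(lines, markers=BENIGN_MARKERS):
    """Keep only the lines that contain none of the benign markers."""
    return [line for line in lines if not any(m in line for m in markers)]


log = [
    "E0716 10:30:30.054626 11016 init_intrinsics_check.cc:43] CPU feature avx is "
    "present on your machine, but the Caffe2 binary is not compiled with it.",
    "WARNING cnn.py: 25: [====DEPRECATE WARNING====]: you are creating an object "
    "from CNNModelHelper class which will be deprecated soon.",
    "INFO infer_simple.py: 113: Processing demo/19064748793_bb942deea1_k.jpg -> "
    "demo/output/19064748793_bb942deea1_k.jpg.pdf",
]

remaining = drop_benign(log)
print("\n".join(remaining))
```

Real errors such as the `EnforceNotMet` out of memory failure contain neither marker, so they still show up after filtering.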