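Update: This article was written a year ago and parts of it no longer apply. I recently upgraded my deep learning server, configured the environment again, and wrote new articles that can be read alongside this one: "Building a Deep Learning Server from Scratch: Base Environment Setup (Ubuntu + GTX 1080 TI + CUDA + cuDNN)" and "Building a Deep Learning Server from Scratch: Installing Deep Learning Tools (TensorFlow + PyTorch + Torch)".
This series now spans several articles. Here is an index of the related posts, for reference:
- Notes on Assembling a Deep Learning Machine
- Deep Learning Machine Environment Setup: Ubuntu16.04+Nvidia GTX 1080+CUDA8.0
- Deep Learning Machine Environment Setup: Ubuntu16.04+GeForce GTX 1080+TensorFlow
- Deep Learning Server Environment Setup: Ubuntu17.04+Nvidia GTX 1080+CUDA 9.0+cuDNN 7.0+TensorFlow 1.3
- Building a Deep Learning Server from Scratch: Choosing the Hardware
- Building a Deep Learning Server from Scratch: Base Environment Setup (Ubuntu + GTX 1080 TI + CUDA + cuDNN)
- Building a Deep Learning Server from Scratch: Installing Deep Learning Tools (TensorFlow + PyTorch + Torch)
- Building a Deep Learning Server from Scratch: Installing Deep Learning Tools (Theano + MXNet)
- Building a Deep Learning Server from Scratch: Four 1080TI Cards in Parallel (Ubuntu16.04+CUDA9.2+cuDNN7.1+TensorFlow+Keras)
Continuing from the previous article, "Deep Learning Machine Environment Setup: Ubuntu16.04+Nvidia GTX 1080+CUDA8.0", we now install TensorFlow with support for the GeForce GTX 1080 graphics card.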
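1 Download and install cuDNN
cuDNN, short for CUDA Deep Neural Network library, is a GPU acceleration library that NVIDIA designed specifically for deep neural networks. It is widely used by deep learning frameworks such as Caffe, TensorFlow, Theano, Torch, and CNTK.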
The NVIDIA CUDA® Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks. cuDNN provides highly tuned implementations for standard routines such as forward and backward convolution, pooling, normalization, and activation layers. cuDNN is part of the NVIDIA Deep Learning SDK.
Deep learning researchers and framework developers worldwide rely on cuDNN for high-performance GPU acceleration. It allows them to focus on training neural networks and developing software applications rather than spending time on low-level GPU performance tuning. cuDNN accelerates widely used deep learning frameworks, including Caffe, TensorFlow, Theano, Torch, and CNTK. See supported frameworks for more details.
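First, download cuDNN by picking a version from NVIDIA's official download page. As with CUDA, you need to log in and fill out a short survey before downloading: https://developer.nvidia.com/rdp/cudnn-download. I chose cuDNN v5 for CUDA 8.0. A 5.1 release for CUDA 8 also appears among the download options, but it only shows the notice: "cuDNN 5.1 RC for CUDA 8RC will be available soon - please check back again."
Installing cuDNN is straightforward: extract the archive and copy the files into the corresponding CUDA directories: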
tar -zxvf cudnn-8.0-linux-x64-v5.0-ga.tgz
cuda/include/cudnn.h
cuda/lib64/libcudnn.so
cuda/lib64/libcudnn.so.5
cuda/lib64/libcudnn.so.5.0.5
cuda/lib64/libcudnn_static.a
sudo cp cuda/include/cudnn.h /usr/local/cuda/include/
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64/
sudo chmod a+r /usr/local/cuda/include/cudnn.h
sudo chmod a+r /usr/local/cuda/lib64/libcudnn*
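After copying the files, it can be useful to confirm which cuDNN version the installed header describes. The following is a minimal Python sketch, assuming the header defines the macros CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL; the helper name parse_cudnn_version and the sample text are illustrative and not part of the original steps.

```python
import re


def parse_cudnn_version(header_text):
    """Return (major, minor, patch) parsed from the text of cudnn.h, or None."""
    fields = {}
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        # Match lines such as "#define CUDNN_MAJOR      5".
        match = re.search(r"^\s*#define\s+%s\s+(\d+)" % name, header_text, re.MULTILINE)
        if match is None:
            return None
        fields[name] = int(match.group(1))
    return (fields["CUDNN_MAJOR"], fields["CUDNN_MINOR"], fields["CUDNN_PATCHLEVEL"])


# Hypothetical excerpt shaped like the version macros in cudnn.h.
sample_header = """
#define CUDNN_MAJOR      5
#define CUDNN_MINOR      0
#define CUDNN_PATCHLEVEL 5
"""
print(parse_cudnn_version(sample_header))
```

To check a real installation, pass in the contents of /usr/local/cuda/include/cudnn.h instead of the sample text. The result should agree with the libcudnn.so.5.0.5 file name listed by tar above.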
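2 Build and install the GPU version of TensorFlow from source
Installing the CPU version of TensorFlow is simple: on Ubuntu it can be installed with pip, as described in the official TensorFlow installation documentation. Here we build TensorFlow 0.9 from source so that it supports our GPU, the GTX 1080.
1) Prepare the Python environment
I use Python 2.7. Install the dependencies on Ubuntu 16.04: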
sudo apt-get install python-pip
sudo apt-get install python-numpy swig python-dev python-wheel
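2) Install Bazel, Google's build tool
Bazel is a build tool open-sourced by Google to support Google's software development model. It has the following characteristics:
Multi-language support: Bazel supports Java, Objective-C, and C++, and can be extended to support arbitrary programming languages
High-level build language: Projects are described in the BUILD language, a concise text format that describes a project as a set of small, interconnected libraries, binaries, and tests. By contrast, tools like Make require you to describe individual files and compiler invocations
Multi-platform support: The same tool and the same BUILD files can be used to build software for different architectures and platforms. At Google, we use Bazel to build both the server applications running in our data centers and the client apps running on mobile phones.
Reproducibility: In BUILD files, every library, test, and binary must specify its direct dependencies completely. When a source file changes, Bazel uses this dependency information to determine what must be rebuilt and which tasks can run in parallel. This means all builds are incremental and always produce the same result.
Scalability: Bazel can handle very large builds. At Google, it is common for a server binary to have more than 100k source files, and a build in which no files have changed takes about 200ms
Download the latest Linux release from Bazel's GitHub page: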
wget https://github.com/bazelbuild/bazel/releases/download/0.3.0/bazel-0.3.0-installer-linux-x86_64.sh
Once the download finishes, run:
chmod +x bazel-0.3.0-installer-linux-x86_64.sh
./bazel-0.3.0-installer-linux-x86_64.sh --user
This fails with an error:
Java not found, please install the corresponding package
See http://bazel.io/docs/install.html for more information on
The cause is a missing Java environment: Bazel requires Java JDK 8. On Ubuntu 16.04 it can be installed directly with apt-get:
sudo apt-get update
sudo apt-get install default-jre
sudo apt-get install default-jdk
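Since Bazel needs JDK 8, it can help to check which Java version ended up installed before rerunning the installer. This is a small Python sketch that extracts the major version from the text printed by `java -version`; the function name java_major_version and the sample string are hypothetical, and the sample is not output captured from the original machine.

```python
import re


def java_major_version(version_output):
    """Extract the Java major version from `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    # Releases up to Java 8 report themselves as 1.x (for example 1.8.0_91).
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


# Hypothetical sample text, not taken from the article's machine.
sample = 'openjdk version "1.8.0_91"\nOpenJDK Runtime Environment (build 1.8.0_91-b14)'
major = java_major_version(sample)
print("JDK 8 or newer:", major is not None and major >= 8)
```

Note that `java -version` writes to standard error, so capturing its text from a script means reading stderr rather than stdout.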
After that, run the Bazel installer script again:
./bazel-0.3.0-installer-linux-x86_64.sh --user
Bazel installer
---------------
# Release 0.3.0 (2016-06-10)
Baseline: a9301fa
Cherry picks:
+ ff30a73: Turn --legacy_external_runfiles back on by default
+ aeee3b8: Fix delete[] warning on fsevents.cc
Incompatible changes:
- The --cwarn command line option is not supported anymore. Use
--copt instead.
New features:
- On OSX, --watchfs now uses FsEvents to be notified of changes
from the filesystem (previously, this flag had no effect on OS X).
- add support for the '-=', '*=', '/=', and'%=' operators to
skylark. Notably, we do not support '|=' because the semantics
of skylark sets are sufficiently different from python sets.
Important changes:
- Use singular form when appropriate in blaze's test result summary
message.
- Added supported for Android NDK revision 11
- --objc_generate_debug_symbols is now deprecated.
- swift_library now generates an Objective-C header for its @objc
interfaces.
- new_objc_provider can now set the USES_SWIFT flag.
- objc_framework now supports dynamic frameworks.
- Symlinks in zip files are now unzipped correctly by http_archive,
download_and_extract, etc.
- swift_library is now able to import framework rules such as
objc_framework.
- Adds "jre_deps" attribute to j2objc_library.
- Release apple_binary rule, for creating multi-architecture
("fat") objc/cc binaries and libraries, targeting ios platforms.
- Aspects documentation added.
- The --ues_isystem_for_includes command line option is not
supported anymore.
- global function 'provider' is removed from .bzl files. Providers
can only be accessed through fields in a 'target' object.
## Build informations
- [Build log](http://ci.bazel.io/job/Bazel/JAVA_VERSION=1.8,PLATFORM_NAME=linux-x86_64/595/)
- [Commit](https://github.com/bazelbuild/bazel/commit/e671d29)
Uncompressing......
Extracting Bazel installation...
.
Bazel is now installed!
Make sure you have "/home/textminer/bin" in your path. You can also activate bash
completion by adding the following line to your ~/.bashrc:
source /home/textminer/.bazel/bin/bazel-complete.bash
See http://bazel.io/docs/getting-started.html to start a new project!
Then append the following to ~/.bashrc:
source /home/textminer/.bazel/bin/bazel-complete.bash
export PATH=$PATH:/home/textminer/.bazel/bin
The reason for the first of these lines is given here:
Bazel comes with a bash completion script. To install it:
Build it with Bazel: bazel build //scripts:bazel-complete.bash.
Copy the script bazel-bin/scripts/bazel-complete.bash to your completion folder (/etc/bash_completion.d directory under Ubuntu). If you don't have a completion folder, you can copy it wherever suits you and simply insert source /path/to/bazel-complete.bash in your ~/.bashrc file (under OS X, put it in your ~/.bash_profile file).
Finally, run
source ~/.bashrc
Bazel is now installed.
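If the shell still cannot find bazel after sourcing ~/.bashrc, the PATH entry is the first thing to inspect. Here is a minimal Python sketch of that check, assuming Python 3 (for shutil.which) rather than the Python 2.7 used elsewhere in this article; dir_on_path is a hypothetical helper name.

```python
import os
import shutil


def dir_on_path(directory, path_value):
    """Return True if `directory` is one of the entries in a PATH-style string."""
    wanted = os.path.normpath(os.path.expanduser(directory))
    entries = [
        os.path.normpath(os.path.expanduser(entry))
        for entry in path_value.split(os.pathsep)
        if entry
    ]
    return wanted in entries


bazel_bin = "/home/textminer/.bazel/bin"  # the directory appended to PATH in this article
print("on PATH:", dir_on_path(bazel_bin, os.environ.get("PATH", "")))
print("bazel found at:", shutil.which("bazel"))  # None if no bazel executable is on PATH
```

Run it from a new shell, or after `source ~/.bashrc`, so that the updated PATH is the one being inspected.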
3) Build and install TensorFlow:
First, clone the latest TensorFlow code from GitHub:
git clone https://github.com/tensorflow/tensorflow
Once the code is downloaded, enter the tensorflow root directory and run:
./configure
A series of prompts follows:
Please specify the location of python. [Default is /usr/bin/python]:
Do you wish to build TensorFlow with Google Cloud Platform support? [y/N] y
Google Cloud Platform support will be enabled for TensorFlow
ERROR: It appears that the development version of libcurl is not available. Please install the libcurl3-dev package.
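Answering y to the second prompt, whether to enable Google Cloud Platform support, produced an error: libcurl is required and can be installed with apt-get. Given network conditions in mainland China, answering no to this prompt is also a reasonable choice: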
sudo apt-get install libcurl3 libcurl3-dev
After installing it, run again:
./configure
Apart from the two yes/no prompts, press Enter to accept the default everywhere else:
Please specify the location of python. [Default is /usr/bin/python]:
Do you wish to build TensorFlow with Google Cloud Platform support? [y/N] y
Google Cloud Platform support will be enabled for TensorFlow
Do you wish to build TensorFlow with GPU support? [y/N] y
GPU support will be enabled for TensorFlow
Please specify which gcc nvcc should use as the host compiler. [Default is /usr/bin/gcc]:
Please specify the Cuda SDK version you want to use, e.g. 7.0. [Leave empty to use system default]:
Please specify the location where CUDA toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify the Cudnn version you want to use. [Leave empty to use system default]:
Please specify the location where cuDNN library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size.
[Default is: "3.5,5.2"]:
Setting up Cuda include
Setting up Cuda lib64
Setting up Cuda bin
Setting up Cuda nvvm
Setting up CUPTI include
Setting up CUPTI lib64
Configuration finished
The last step is to build with Bazel:
bazel build -c opt --config=cuda //tensorflow/cc:tutorials_example_trainer
During this step, Google protobuf and boringssl are downloaded via git and compiled:
INFO: Cloning https://github.com/google/protobuf: Receiving objects
INFO: Cloning https://github.com/google/boringssl.git: Receiving objects
....
The first attempt, however, failed with an error:
configure: error: zlib not installed
Target //tensorflow/cc:tutorials_example_trainer failed to build
A quick Google search showed that zlib1g-dev needs to be installed:
sudo apt-get install zlib1g-dev
After that, TensorFlow built without problems, although it takes a while.
When the TensorFlow build finishes successfully, it prints:
......
Target //tensorflow/cc:tutorials_example_trainer up-to-date:
bazel-bin/tensorflow/cc/tutorials_example_trainer
INFO: Elapsed time: 897.845s, Critical Path: 533.72s
Run the example from the official TensorFlow documentation to check that the GTX 1080 is actually used:
bazel-bin/tensorflow/cc/tutorials_example_trainer --use_gpu
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.835
pciBusID 0000:01:00.0
Total memory: 7.92GiB
Free memory: 7.65GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0)
000003/000006 lambda = 1.841570 x = [0.669396 0.742906] y = [3.493999 -0.669396]
000006/000007 lambda = 1.841570 x = [0.669396 0.742906] y = [3.493999 -0.669396]
000009/000006 lambda = 1.841570 x = [0.669396 0.742906] y = [3.493999 -0.669396]
000009/000004 lambda = 1.841570 x = [0.669396 0.742906] y = [3.493999 -0.669396]
000000/000005 lambda = 1.841570 x = [0.669396 0.742906] y = [3.493999 -0.669396]
000000/000004 lambda = 1.841570 x = [0.669396 0.742906] y = [3.493999 -0.669396]
......
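The device log above reports "major: 6 minor: 1", which is the card's CUDA compute capability, while the configure step earlier accepted the default list "3.5,5.2". The build in this article worked with that default. As a hedged aside, the Python sketch below shows how the value could be read from such a log and formatted the way the compute capabilities prompt expects it; compute_capability is a hypothetical helper, and entering 6.1 at the prompt is a suggestion that was not tried in the original walkthrough.

```python
import re


def compute_capability(device_log):
    """Return 'major.minor' parsed from a TensorFlow 'Found device' log block, or None."""
    match = re.search(r"major:\s*(\d+)\s+minor:\s*(\d+)", device_log)
    if match is None:
        return None
    return "%s.%s" % (match.group(1), match.group(2))


# Lines copied from the device log shown in this article.
log = """name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.835
pciBusID 0000:01:00.0"""
print(compute_capability(log))
```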
Everything works, which shows that building TensorFlow from source with GPU support succeeded. Next, try importing TensorFlow in Python:
import tensorflow as tf
This raises an error:
ImportError: cannot import name pywrap_tensorflow
Although the TensorFlow we built from source works, the Python package is not ready yet, so we continue:
bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
sudo pip install /tmp/tensorflow_pkg/tensorflow-0.9.0-py2-none-any.whl
Requirement already satisfied (use --upgrade to upgrade): setuptools in /usr/lib/python2.7/dist-packages (from protobuf==3.0.0b2->tensorflow==0.9.0)
Installing collected packages: six, funcsigs, pbr, mock, protobuf, tensorflow
Successfully installed funcsigs-1.0.2 mock-2.0.0 pbr-1.10.0 protobuf-3.0.0b2 six-1.10.0 tensorflow-0.9.0
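The wheel file name depends on the TensorFlow and Python versions, so the exact name passed to pip above will differ on other setups. A small Python sketch for locating the freshly built wheel, assuming the output directory /tmp/tensorflow_pkg used above; newest_wheel is a hypothetical helper name.

```python
import glob
import os


def newest_wheel(package_dir):
    """Return the most recently modified tensorflow wheel in package_dir, or None."""
    wheels = glob.glob(os.path.join(package_dir, "tensorflow-*.whl"))
    if not wheels:
        return None
    return max(wheels, key=os.path.getmtime)


wheel = newest_wheel("/tmp/tensorflow_pkg")
print(wheel if wheel else "no wheel found; run build_pip_package first")
```

The printed path can then be handed to pip install in place of the hard-coded file name.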
Open ipython again and try the official TensorFlow example:
Python 2.7.12 (default, Jul 1 2016, 15:12:24)
Type "copyright", "credits" or "license" for more information.
IPython 2.4.1 -- An enhanced Interactive Python.
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
object? -> Details about 'object', use 'object??' for extra details.
In [1]: import tensorflow as tf
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
In [2]: import numpy as np
In [3]: x_data = np.random.rand(100).astype(np.float32)
In [4]: y_data = x_data * 0.1 + 0.3
In [5]: W = tf.Variable(tf.random_uniform([1], -1.0, 1.0))
In [6]: b = tf.Variable(tf.zeros([1]))
In [7]: y = W * x_data + b
In [8]: loss = tf.reduce_mean(tf.square(y - y_data))
In [9]: optimizer = tf.train.GradientDescentOptimizer(0.5)
In [10]: train = optimizer.minimize(loss)
In [11]: init = tf.initialize_all_variables()
In [12]: sess = tf.Session()
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.835
pciBusID 0000:01:00.0
Total memory: 7.92GiB
Free memory: 7.65GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0)
In [13]: sess.run(init)
In [14]: for step in range(201):
   ....:     sess.run(train)
   ....:     if step % 20 == 0:
   ....:         print(step, sess.run(W), sess.run(b))
   ....:
(0, array([-0.10331395], dtype=float32), array([ 0.62236434], dtype=float32))
(20, array([ 0.03067014], dtype=float32), array([ 0.3403711], dtype=float32))
(40, array([ 0.08353967], dtype=float32), array([ 0.30958495], dtype=float32))
(60, array([ 0.09609199], dtype=float32), array([ 0.30227566], dtype=float32))
(80, array([ 0.09907217], dtype=float32), array([ 0.3005403], dtype=float32))
(100, array([ 0.09977971], dtype=float32), array([ 0.30012828], dtype=float32))
(120, array([ 0.0999477], dtype=float32), array([ 0.30003047], dtype=float32))
(140, array([ 0.0999876], dtype=float32), array([ 0.30000722], dtype=float32))
(160, array([ 0.09999706], dtype=float32), array([ 0.30000171], dtype=float32))
(180, array([ 0.09999929], dtype=float32), array([ 0.30000043], dtype=float32))
(200, array([ 0.09999985], dtype=float32), array([ 0.3000001], dtype=float32))
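To make clear what the session above computes, here is a plain-Python sketch of the same fit: gradient descent on the mean squared error of y = W * x + b with learning rate 0.5 for 201 steps. It is an illustration written for this article, not TensorFlow's implementation; fit_line and the fixed random seeds are assumptions made so the run is repeatable.

```python
import random


def fit_line(xs, ys, learning_rate=0.5, steps=201, seed=0):
    """Fit y = W * x + b by gradient descent on mean squared error."""
    rng = random.Random(seed)
    w = rng.uniform(-1.0, 1.0)  # same initial range as tf.random_uniform([1], -1.0, 1.0)
    b = 0.0                     # same starting value as tf.zeros([1])
    n = float(len(xs))
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients of mean((W * x + b - y) ** 2) with respect to W and b.
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2.0 * sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


data_rng = random.Random(42)
x_data = [data_rng.random() for _ in range(100)]
y_data = [x * 0.1 + 0.3 for x in x_data]
w, b = fit_line(x_data, y_data)
print(w, b)
```

As in the TensorFlow run, W and b should end up close to 0.1 and 0.3, the slope and intercept used to generate y_data.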
Finally it works. From here on you can enjoy the GPU build of TensorFlow on the GTX 1080.
References:
TensorFlow: Installing from sources
Tensorflow on Ubuntu 16.04 with Nvidia GTX 1080
TensorFlow, Caffe, Chainer と Deep Learning大御所を一気に source code build で GPU向けに setupしてみた
GTX-1080でTensorFlow
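Jack47's series of articles explaining Bazel
Note: This is an original article. When reposting, please credit the source and keep the link to "我爱自然语言处理" (52nlp): https://www.52nlp.cn
Permalink: Deep Learning Machine Environment Setup: Ubuntu16.04+GeForce GTX 1080+TensorFlow https://www.52nlp.cn/?p=9285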
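An excellent tutorial! Following your steps, I successfully built and installed TensorFlow r0.9 on Ubuntu 16.04 x64 + GTX 960 + CUDA 7.5 + GCC 5.4.0.
The only hiccup was that the first Bazel build complained that identifier "__builtin_ia32_mwaitx" is undefined, which I resolved with the help of this link:
https://github.com/tensorflow/tensorflow/issues/1066
Thanks to the author for sharing!
[Reply]
52nlp replied:
19 July 2016 at 11:24
Thank you as well for sharing your experience.
[Reply]
I wrote an online word segmentation demo based on hanlp:
Online Chinese word segmentation demo: http://lanniu.me/nlp/standard, provided by 蓝牛 online tools
[Reply]
I installed tensorflow r0.10 from source, but when using bazel to build the pip package, it reported that a component does not support gcc 5.4. Do I really have to build gcc 5.3 from source?
ubuntu 16.04.1, cuda 8.0, gcc 5.4, python 3.5
[Reply]
52nlp replied:
1 August 2016 at 11:04
You don't need to build it from source; Ubuntu can install it with apt-get install. I ran into a similar problem earlier when installing CUDA, so I installed older versions, gcc and g++ 4.9, and swapped them in for the default ones via symlinks. A Google search will turn up plenty of ways to do this.
[Reply]
Hi, thanks for sharing. I've hit a problem: the final Bazel build needs to git clone protobuf online, but my machine (CentOS) has no internet access. Is there any solution?
[Reply]
52nlp replied:
12 August 2016 at 08:55
Without internet access it gets troublesome, because several other third-party libraries are also downloaded and compiled during the build. You could try downloading and installing Google protobuf and the others offline first, then add the option --fetch=false when building; see this issue: https://github.com/bazelbuild/bazel/issues/251
I haven't used it myself, but you can give it a try.
[Reply]
Hello, I'm a beginner.
I have a question: how do you evaluate the correctness of keywords extracted by natural language processing? For example, when extracting the "what" in 4W1H, how do you judge that the extracted "what" is correct? I'd like to learn about the relevant techniques. Thanks.
[Reply]
52nlp replied:
24 August 2016 at 11:15
There's no standard for this. Academia has some evaluation sets that score against manually extracted keywords. I think it still depends on your application scenario.
[Reply]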
To the author: everything was fine up through installing Bazel, but when I ran TensorFlow's configure script, my output differed from yours. It did not print the following:
Setting up Cuda include
Setting up Cuda lib64
Setting up Cuda bin
Setting up Cuda nvvm
Setting up CUPTI include
Setting up CUPTI lib64
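It went straight to "Configuration finished"! In between, it automatically cloned protobuf and grpc.git,
Downloading from http://pilotfiber.dl.sourceforge.net/project/boost/boost/1.61.0\
/boost_1_61_0.tar.gz: 50MB
and then the build failed with errors. How can I fix this? I'm a newbie.
[Reply]
52nlp replied:
5 September 2016 at 18:55
Could you paste the error message? I can't tell what the problem is from this.
[Reply]
Hello, I've recently been debating whether to set up a GPU environment, but I don't know how much performance it would gain. Could you run a test for me?
This is the path of the test code in the source tree: tensorflow/tensorflow/models/image/mnist/convolutional.py
Just run it and check how long it takes once it finishes (timing it yourself).
[Reply]
52nlp replied:
8 September 2016 at 21:46
The first run includes time to download the data; on a second run, training takes a little over 50 seconds.
[Reply]
wangty replied:
10 September 2016 at 14:06
Thanks! That's quite a big improvement... it takes about 18 minutes on my machine.
[Reply]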
Hello, I am new to TensorFlow and encountered an issue when installing TensorFlow using ./configure. Here is my error output. I need help.
sudo ./configure
~/tensorflow ~/tensorflow
Please specify the location of python. [Default is /usr/bin/python]:
Do you wish to build TensorFlow with Google Cloud Platform support? [y/N] n
No Google Cloud Platform support will be enabled for TensorFlow
Found possible Python library paths:
/usr/local/lib/python2.7/dist-packages
/usr/lib/python2.7/dist-packages
Please input the desired Python library path to use. Default is [/usr/local/lib/python2.7/dist-packages]
/usr/local/lib/python2.7/dist-packages
Do you wish to build TensorFlow with GPU support? [y/N] y
GPU support will be enabled for TensorFlow
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
Please specify the Cuda SDK version you want to use, e.g. 7.0. [Leave empty to use system default]: 8.0
Please specify the location where CUDA 8.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify the Cudnn version you want to use. [Leave empty to use system default]: 4
Please specify the location where cuDNN 4 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size.
Found stale PID file (pid=3471). Server probably died abruptly, continuing...
..
INFO: Starting clean (this may take a while). Consider using --expunge_async if the clean takes more than several minutes.
.
ERROR: /home/keithyin/tensorflow/tensorflow/contrib/session_bundle/BUILD:134:1: no such target '//tensorflow/core:android_lib_lite': target 'android_lib_lite' not declared in package 'tensorflow/core' defined by /home/keithyin/tensorflow/tensorflow/core/BUILD and referenced by '//tensorflow/contrib/session_bundle:session_bundle_lite'.
ERROR: /home/keithyin/tensorflow/tensorflow/core/platform/default/build_config/BUILD:56:1: no such package '@jpeg_archive//': Error downloading from http://www.ijg.org/files/jpegsrc.v9a.tar.gz to /home/keithyin/.cache/bazel/_bazel_root/9192340d7b606ddb9ea35b29a97154c1/external/jpeg_archive: Error downloading http://www.ijg.org/files/jpegsrc.v9a.tar.gz to /home/keithyin/.cache/bazel/_bazel_root/9192340d7b606ddb9ea35b29a97154c1/external/jpeg_archive/jpegsrc.v9a.tar.gz: Connection timed out and referenced by '//tensorflow/core/platform/default/build_config:platformlib'.
ERROR: Evaluation of query "deps((//... union @bazel_tools//tools/jdk:toolchain))" failed: errors were encountered while computing transitive closure.
Configuration finished
[Reply]
yuanpu replied:
17 October 2016 at 11:13
Here is what I did; tested and it works.
1. Keep running wget -c " http://www.ijg.org/files/jpegsrc.v9a.tar.gz " until it succeeds;
2. cp the downloaded jpegsrc.v9a.tar.gz to /var/www/html/files/jpegsrc.v9a.tar.gz;
3. service httpd start
4. vi /etc/hosts and add "<local machine IP> http://www.ijg.org"
5. Run configure again
Running convolutional.py, the example from the TensorFlow getting-started docs:
CPU: Step 400 (epoch 0.47), 112.5 ms
GPU (GTX 1060): Step 400 (epoch 0.47), 6.8 ms
[Reply]
bruce replied:
3 November 2016 at 17:32
Hello, I ran into the same problem.
I followed your method and got
http: unrecognized service
After that I installed apache2, started it with /etc/init.d/apache2 start, edited hosts, and ran configure,
but it still fails:
ERROR: /root/jichen/models/syntaxnet/tensorflow/tensorflow/core/platform/default/build_config/BUILD:56:1: no such package '@jpeg_archive//': Error downloading from http://www.ijg.org/files/jpegsrc.v9a.tar.gz to /root/.cache/bazel/_bazel_root/5430e1fe82152e7cfcaeed1cf65b08cf/external/jpeg_archive: Error downloading http://www.ijg.org/files/jpegsrc.v9a.tar.gz to /root/.cache/bazel/_bazel_root/5430e1fe82152e7cfcaeed1cf65b08cf/external/jpeg_archive/jpegsrc.v9a.tar.gz: Read timed out and referenced by '//tensorflow/core/platform/default/build_config:platformlib'.
ERROR: /root/jichen/models/syntaxnet/tensorflow/tensorflow/core/platform/default/build_config/BUILD:56:1: no such package '@jpeg_archive//': Error downloading from http://www.ijg.org/files/jpegsrc.v9a.tar.gz to /root/.cache/bazel/_bazel_root/5430e1fe82152e7cfcaeed1cf65b08cf/external/jpeg_archive: Error downloading http://www.ijg.org/files/jpegsrc.v9a.tar.gz to /root/.cache/bazel/_bazel_root/5430e1fe82152e7cfcaeed1cf65b08cf/external/jpeg_archive/jpegsrc.v9a.tar.gz: Read timed out and referenced by '//tensorflow/core/platform/default/build_config:platformlib'.
ERROR: Evaluation of query "deps((//tensorflow/... union @bazel_tools//tools/jdk:toolchain))" failed: errors were encountered while computing transitive closure.
[Reply]
Very well written, thanks for sharing!!
[Reply]
To the author:
Hello. I also got as far as the ./configure step, and it went straight to
INFO: All external dependencies fetched successfully.
Configuration finished
without printing
Setting up Cuda include
Setting up Cuda lib64
Setting up Cuda bin
Setting up Cuda nvvm
Setting up CUPTI include
Setting up CUPTI lib64
Configuration finished
Then the build failed with:
./tensorflow/core/framework/allocator.h(155): warning: missing return statement at end of non-void function "tensorflow::Allocator::RequestedSize"
./tensorflow/core/framework/allocator.h(155): warning: missing return statement at end of non-void function "tensorflow::Allocator::RequestedSize"
./tensorflow/core/framework/allocator.h(155): warning: missing return statement at end of non-void function "tensorflow::Allocator::RequestedSize"
gcc-4.9.real: error trying to exec 'as': execvp: No such file or directory
ERROR: /home/wuzongze/tensorflow/tensorflow/core/kernels/BUILD:1601:1: output 'tensorflow/core/kernels/_objs/batch_space_ops_gpu/tensorflow/core/kernels/spacetobatch_functor_gpu.cu.pic.o' was not created.
ERROR: /home/wuzongze/tensorflow/tensorflow/core/kernels/BUILD:1601:1: not all outputs were created.
Target //tensorflow/cc:tutorials_example_trainer failed to build
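I'm a beginner and have gone through a lot of material without solving this. I hope you can give me some pointers. Thanks.
[Reply]
dongleecsu replied:
28 September 2016 at 19:57
I ran into the same problem. The fix: in tensorflow/third_party/gpus/crosstool/CROSSTOOL.tpl, at around line 65, I added cxx_builtin_include_directory: "/usr/local/cuda-8.0/include" to resolve the GPU-related build errors.
[Reply]
wuzongze replied:
28 September 2016 at 22:45
Hello, thanks for your reply. Did you rerun configure and build after making this change, or run configure first and then edit?
I first added this line at line 65 of the file (the original line 65 moved down one line). Then I ran configure, and the result was still "INFO: All external dependencies fetched successfully. Configuration finished", with no "Setting up" messages. I checked line 65 of CROSSTOOL.tpl and it had not been overwritten.
Then I ran the build command and got exactly the same error.
Could it be a Python lib problem? For some reason I have two Python paths, one under /usr/local/lib... and one under /usr/lib/..., and I chose the former, the default.
[Reply]
dongleecsu replied:
29 September 2016 at 08:42
I ran ./configure after making the change; it then automatically downloaded a bunch of things, and after that I ran the build.
You can use "$ which python" to see which python is currently in use and whether it matches the one you chose at configure time.
My guess is that python isn't the problem. My machine's setup at the time was:
1. configure failed after git clone of tensorflow, so I switched to the rc0.10 version
2. Downgraded g++ and gcc from 5.4 to 4.8
3. Hit the same error as you, and added the cuda include at around line 65 of the CROSSTOOL file (where there is a long run of cxx_builtin_include_directory entries)
4. Ran ./configure and build again, and it worked
PS. These are two threads I consulted: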
https://github.com/tensorflow/tensorflow/issues/3589
https://github.com/tensorflow/tensorflow/issues/3226
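Thanks for sharing, this was very useful. I also installed tensorflow 0.10 successfully with 2x GTX1080. The difference was that the downloads started automatically once ./configure finished, and there were build errors. Besides switching the version to r0.10 with git, I also added cxx_builtin_include_directory: "/usr/local/cuda-8.0/include" at around line 65 of tensorflow/third_party/gpus/crosstool/CROSSTOOL.tpl to resolve the GPU-related build errors.
[Reply]
52nlp replied:
28 September 2016 at 21:08
Big thumbs up, thanks.
[Reply]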
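The CROSSTOOL.tpl change described in the comments above amounts to adding one more cxx_builtin_include_directory entry next to the existing ones. The Python sketch below illustrates that edit on a text string. It is only a sketch: add_builtin_include_dir and the sample excerpt are hypothetical, the real file layout may differ between TensorFlow versions, and the include path is the one quoted by the commenter.

```python
def add_builtin_include_dir(crosstool_text, include_dir):
    """Insert a cxx_builtin_include_directory entry after the first run of such entries."""
    key = "cxx_builtin_include_directory:"
    new_entry = '%s "%s"' % (key, include_dir)
    lines = crosstool_text.splitlines()
    if any(new_entry in line for line in lines):
        return crosstool_text  # already patched, leave the text untouched
    first = None
    for index, line in enumerate(lines):
        if line.strip().startswith(key):
            first = index
            break
    if first is None:
        raise ValueError("no cxx_builtin_include_directory entries found")
    # Walk to the end of the contiguous block of existing entries.
    end = first
    while end + 1 < len(lines) and lines[end + 1].strip().startswith(key):
        end += 1
    indent = lines[end][: len(lines[end]) - len(lines[end].lstrip())]
    lines.insert(end + 1, indent + new_entry)
    return "\n".join(lines) + "\n"


# Hypothetical excerpt, not copied from the real CROSSTOOL.tpl.
sample = '''  cxx_builtin_include_directory: "/usr/lib/gcc/"
  cxx_builtin_include_directory: "/usr/include"
  tool_path { name: "gcov" path: "/usr/bin/gcov" }
'''
patched = add_builtin_include_dir(sample, "/usr/local/cuda-8.0/include")
print(patched)
```

According to the commenters, ./configure has to be rerun after the file is edited, followed by the build.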
Less Shallow, more insighted.
[Reply]
Moving over from WeChat.... the original question follows
First of all, thanks for sharing this article on installing tensorflow.... there are pitfalls everywhere.
I followed this tutorial all the way to import tensorflow as tf, which raised the error, as expected.
Then I entered the command bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package.
It printed: Error: Building with --config=cuda but Tensorflow is not configured to build with GPU support. Please re-run ./configure and enter 'Y' at the prompt to build with the GPU support.
I did choose GPU support in the earlier configure step, so I don't know how to resolve this. I searched Stack Overflow and GitHub but found no usable answer. If there is a solution, could you let me know? Many thanks!
My Bazel version is 0.3.1. The rest of my environment matches the article.
[Reply]
52nlp replied:
13 October 2016 at 17:18
I did a quick Google search on your problem. Others on GitHub and Stack Overflow have indeed hit similar issues, but there doesn't seem to be a clear-cut fix yet. This Stack Overflow question:
http://stackoverflow.com/questions/39769050/failed-to-install-tensorflow-with-ubuntu-16-04-and-cuda-8-0
has an answer that seems to offer some hints:
“I just installed it and it works fine. One thing you need to do is patch the crosstool file: marcnu.github.io/2016-08-17/… (it is named slightly differently now in tensorflow master, I think CROSSTOOL.tpl) but that does not seem like errors that you have. Maybe you want to clean ~/.cache before running configure. – etarion Sep 29 ”
The article it refers to: https://marcnu.github.io/2016-08-17/Tensorflow-v0.10-installed-from-scratch-Ubuntu-16.04-CUDA8.0RC-cuDNN5.1-1080GTX/
makes this point right at the start:
“While Tensorflow has a great documentation, you have quite a lot of details that are not obvious, especially the part about setting up Nvidia libraries and installing Bazel as you need to read external install guides. There is also a CROSSTOOL change to make to fix an include directory issue. So here is a guide, explaining everything from scratch in a single page.”
Both point to modifying the “CROSSTOOL.tpl” file. Correspondingly, “dongleecsu” gave a solution in the comments above:
“Thanks for sharing, this was very useful. I also installed tensorflow 0.10 successfully with 2x GTX1080. The difference was that the downloads started automatically once ./configure finished, and there were build errors. Besides switching the version to r0.10 with git, I also added cxx_builtin_include_directory: “/usr/local/cuda-8.0/include” at around line 65 of tensorflow/third_party/gpus/crosstool/CROSSTOOL.tpl to resolve the GPU-related build errors.”
You can refer to the answers above and that English article.
[Reply]
52nlp replied:
13 October 2016 at 17:22
Another answer on Stack Overflow suggests clearing Bazel's cache before running configure again:
“The patch does not work too and occured the same error. Furthermore, I always deleted the path ~/.cache/bazel before run ./configure. ”
[Reply]
lsuni1234 replied:
14 October 2016 at 09:32
Thanks a lot, problem solved.
I deleted ~/.cache/bazel, downloaded tensorflow again, and ran ./configure.
After that the build went through.
Hello, I ran into the following problem at ./configure. How can I fix it?
INFO: Starting clean (this may take a while). Consider using --expunge_async if the clean takes more than several minutes.
.
ERROR: /home/xzx/tensorflow/tensorflow/tensorflow.bzl:592:26: Traceback (most recent call last):
File "/home/xzx/tensorflow/tensorflow/tensorflow.bzl", line 586
rule(attrs = {"srcs": attr.label_list..."), )}, )
File "/home/xzx/tensorflow/tensorflow/tensorflow.bzl", line 592, in rule
attr.label_list(cfg = "data", allow_files = True)
expected ConfigurationTransition or NoneType for 'cfg' while calling label_list but got string instead: data.
ERROR: com.google.devtools.build.lib.packages.BuildFileContainsErrorsException: error loading package '': Extension file 'tensorflow/tensorflow.bzl' has errors
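[Reply]
52nlp replied:
19 October 2016 at 23:00
I googled your problem; there is a long discussion in this issue: https://github.com/tensorflow/tensorflow/issues/4319 . One option is to upgrade bazel, which you could try.
[Reply]
Exceptionally well written
[Reply]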
wlq@wlq-laptop:~/tensorflow$ ./configure
~/tensorflow ~/tensorflow
Please specify the location of python. [Default is /usr/bin/python]:
Do you wish to build TensorFlow with Google Cloud Platform support? [y/N] y
Google Cloud Platform support will be enabled for TensorFlow
Do you wish to build TensorFlow with Hadoop File System support? [y/N] y
Hadoop File System support will be enabled for TensorFlow
Found possible Python library paths:
/usr/local/lib/python2.7/dist-packages
/usr/lib/python2.7/dist-packages
Please input the desired Python library path to use. Default is [/usr/local/lib/python2.7/dist-packages]
/usr/local/lib/python2.7/dist-packages
Do you wish to build TensorFlow with GPU support? [y/N] y
GPU support will be enabled for TensorFlow
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
Please specify the Cuda SDK version you want to use, e.g. 7.0. [Leave empty to use system default]:
Please specify the location where CUDA toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify the Cudnn version you want to use. [Leave empty to use system default]:
Please specify the location where cuDNN library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size.
[Default is: "3.5,5.2"]:
INFO: Starting clean (this may take a while). Consider using --expunge_async if the clean takes more than several minutes.
.
ERROR: /home/wlq/tensorflow/tensorflow/tensorflow.bzl:592:26: Traceback (most recent call last):
File "/home/wlq/tensorflow/tensorflow/tensorflow.bzl", line 586
rule(attrs = {"srcs": attr.label_list...)}, )
File "/home/wlq/tensorflow/tensorflow/tensorflow.bzl", line 592, in rule
attr.label_list(cfg = "data", allow_files = True)
expected ConfigurationTransition or NoneType for 'cfg' while calling label_list but got string instead: data.
ERROR: com.google.devtools.build.lib.packages.BuildFileContainsErrorsException: error loading package '': Extension file 'tensorflow/tensorflow.bzl' has errors.
wlq@wlq-laptop:~/tensorflow$
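Author, could you tell me what this error means? I followed your tutorial step by step.
[Reply]
52nlp replied:
28 October 2016 at 12:54
I'm not sure exactly. From a quick Google search, it seems you need to upgrade bazel:
https://github.com/tensorflow/tensorflow/issues/4319
PS: tensorflow has already moved to r0.11, so following the tutorial won't necessarily get you all the way through. If you hit problems, please Google them.
[Reply]
wlq replied:
28 October 2016 at 12:57
I later installed it successfully with the pip method from the official site, version r0.11
[Reply]
林智能 replied:
3 November 2016 at 11:41
Hello, are you using cuda8.0? Can the GPU version of tensorflow be installed directly with pip? Doesn't the tensorflow site say it has to be built from source?
林智能 replied:
3 November 2016 at 11:54
Hello, I built tensorflow from source and saw no errors during the build, but at the end, import tensorflow fails.
My build environment is Ubuntu14.04+GeForce GTX 1080+cuda8.0+cudnn5.1+the latest TensorFlow.
import tensorflow prints the following: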
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so.8.0 locally
Traceback (most recent call last):
File "convolutional.py", line 35, in <module>
import tensorflow as tf
File "/usr/local/lib/python2.7/dist-packages/tensorflow/__init__.py", line 23, in <module>
from tensorflow.python import *
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/__init__.py", line 63, in <module>
from tensorflow.core.framework.graph_pb2 import *
File "/usr/local/lib/python2.7/dist-packages/tensorflow/core/framework/graph_pb2.py", line 9, in <module>
from google.protobuf import symbol_database as _symbol_database
File "/usr/local/lib/python2.7/dist-packages/google/protobuf/symbol_database.py", line 165, in <module>
_DEFAULT = SymbolDatabase(pool=descriptor_pool.Default())
AttributeError: 'module' object has no attribute 'Default'
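I couldn't find a solution by Googling. Any suggestions would be much appreciated. Thanks!
xgzzzzz replied:
30 October 2016 at 14:09
Upgrading to bazel 0.3.2 fixes it.
The bug arises because TensorFlow 0.11 changed several local variables in order to improve "compatibility"...
[Reply]
52nlp replied:
31 October 2016 at 09:40
thanks
[Reply]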
52nlp replied:
3 November 2016 at 13:42
It seems to be caused by the Python protobuf library not being the latest version:
https://github.com/bazelbuild/bazel/issues/1209
https://github.com/pogodevorg/pgoapi/issues/26
Could you try upgrading it?
pip install protobuf --upgrade
[Reply]
林智能 replied:
7 November 2016 at 19:15
Thank you. After upgrading protobuf the import succeeded, and I was also able to run the MNIST example.
[Reply]
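For readers who hit the same AttributeError, a quick check along the following lines may help confirm whether the installed protobuf is the culprit before and after running pip install protobuf --upgrade. This is only a sketch: the helper name protobuf_status is made up for illustration, and the only thing it relies on from the thread is that the failing attribute is descriptor_pool.Default.

```python
import importlib


def protobuf_status():
    """Report the installed protobuf version and whether descriptor_pool.Default exists.

    Returns None when the protobuf package cannot be imported at all.
    (Hypothetical helper, not part of TensorFlow or protobuf.)
    """
    try:
        protobuf = importlib.import_module("google.protobuf")
        descriptor_pool = importlib.import_module("google.protobuf.descriptor_pool")
    except ImportError:
        return None
    return {
        # Fall back to "unknown" if this protobuf build does not expose a version string.
        "version": getattr(protobuf, "__version__", "unknown"),
        # The traceback in the thread fails exactly on this attribute lookup.
        "has_default_pool": hasattr(descriptor_pool, "Default"),
    }


if __name__ == "__main__":
    status = protobuf_status()
    if status is None:
        print("protobuf is not installed")
    elif status["has_default_pool"]:
        print("protobuf %s provides descriptor_pool.Default" % status["version"])
    else:
        print("protobuf %s lacks descriptor_pool.Default; try: pip install protobuf --upgrade"
              % status["version"])
```

If the script reports that Default is missing, the upgrade suggested above is the likely fix; if it reports that Default is present and the import still fails, the cause is probably something else.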
Hello, I installed TensorFlow 0.11 with pip and the small "hello, tensorflow" test ran successfully, but the following run fails:
$ python mnist_with_summaries.py
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so locally
Extracting /tmp/tensorflow/mnist/input_data/train-images-idx3-ubyte.gz
Extracting /tmp/tensorflow/mnist/input_data/train-labels-idx1-ubyte.gz
Extracting /tmp/tensorflow/mnist/input_data/t10k-images-idx3-ubyte.gz
Extracting /tmp/tensorflow/mnist/input_data/t10k-labels-idx1-ubyte.gz
I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 0 with properties:
name: GeForce GTX TITAN X
major: 5 minor: 2 memoryClockRate (GHz) 1.076
pciBusID 0000:02:00.0
Total memory: 11.92GiB
Free memory: 11.71GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x36a0930
I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 1 with properties:
name: GeForce GTX TITAN X
major: 5 minor: 2 memoryClockRate (GHz) 1.076
pciBusID 0000:03:00.0
Total memory: 11.92GiB
Free memory: 11.81GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x3b34200
I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 2 with properties:
name: GeForce GTX TITAN X
major: 5 minor: 2 memoryClockRate (GHz) 1.076
pciBusID 0000:82:00.0
Total memory: 11.92GiB
Free memory: 11.81GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x3fc8230
I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 3 with properties:
name: GeForce GTX TITAN X
major: 5 minor: 2 memoryClockRate (GHz) 1.076
pciBusID 0000:83:00.0
Total memory: 11.92GiB
Free memory: 11.81GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:855] cannot enable peer access from device ordinal 0 to device ordinal 2
I tensorflow/core/common_runtime/gpu/gpu_device.cc:855] cannot enable peer access from device ordinal 0 to device ordinal 3
I tensorflow/core/common_runtime/gpu/gpu_device.cc:855] cannot enable peer access from device ordinal 1 to device ordinal 2
I tensorflow/core/common_runtime/gpu/gpu_device.cc:855] cannot enable peer access from device ordinal 1 to device ordinal 3
I tensorflow/core/common_runtime/gpu/gpu_device.cc:855] cannot enable peer access from device ordinal 2 to device ordinal 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:855] cannot enable peer access from device ordinal 2 to device ordinal 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:855] cannot enable peer access from device ordinal 3 to device ordinal 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:855] cannot enable peer access from device ordinal 3 to device ordinal 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:972] DMA: 0 1 2 3
I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 0: Y Y N N
I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 1: Y Y N N
I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 2: N N Y Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 3: N N Y Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:02:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:1) -> (device: 1, name: GeForce GTX TITAN X, pci bus id: 0000:03:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:2) -> (device: 2, name: GeForce GTX TITAN X, pci bus id: 0000:82:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:3) -> (device: 3, name: GeForce GTX TITAN X, pci bus id: 0000:83:00.0)
Traceback (most recent call last):
File "mnist_with_summaries.py", line 201, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 30, in run
sys.exit(main(sys.argv[:1] + flags_passthrough))
File "mnist_with_summaries.py", line 182, in main
train()
File "mnist_with_summaries.py", line 50, in train
tf.summary.image('input', image_shaped_input, 10)
AttributeError: 'module' object has no attribute 'image'
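Is this a problem with platform? How should I fix it? Thanks!
[Reply]
zyw replied:
23 November 2016 at 22:26
P.S. mnist.py and mnist_softmax.py both run fine.
[Reply]
52nlp replied:
24 November 2016 at 11:51
The error shows:
tf.summary.image('input', image_shaped_input, 10)
AttributeError: 'module' object has no attribute 'image'
That appears to be how the file is currently written on GitHub master; you could try the r0.11 version of this file instead: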
https://github.com/tensorflow/tensorflow/blob/r0.11/tensorflow/examples/tutorials/mnist/mnist_with_summaries.py
Or simply try replacing that call from master with image_summary:
tf.image_summary('input', image_shaped_input, 10)
[Reply]
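The mismatch discussed in this thread (the example on master calls tf.summary.image, while the installed r0.11 only has tf.image_summary) can also be bridged with a small shim so the same script runs against either version. The sketch below is an illustration only: image_summary_compat is a hypothetical helper, and it assumes both functions accept the three positional arguments exactly as they appear in the traceback and in the reply above. Later TensorFlow releases changed the summary API again, so it is not meant for them.

```python
def image_summary_compat(tf, name, tensor, max_images):
    """Call tf.summary.image when it exists, otherwise fall back to tf.image_summary.

    `tf` is the already imported tensorflow module, passed in explicitly so this
    helper does not import TensorFlow itself. (Hypothetical helper for illustration.)
    """
    summary_module = getattr(tf, "summary", None)
    if hasattr(summary_module, "image"):
        # Spelling used by the example on master at the time of the thread.
        return summary_module.image(name, tensor, max_images)
    # Spelling available in r0.11, as suggested in the reply.
    return tf.image_summary(name, tensor, max_images)
```

In mnist_with_summaries.py the failing line would then become image_summary_compat(tf, 'input', image_shaped_input, 10). Using the r0.11 copy of the example, as suggested above, remains the simpler route.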
Hello! I'm still a complete beginner. At the ./configure step I got the error below and can't tell what went wrong. Could you please help me analyze it? Many thanks!
root@alex://tensorflow# ./configure
//tensorflow //tensorflow
Please specify the location of python. [Default is /usr/bin/python]:
Do you wish to build TensorFlow with Google Cloud Platform support? [y/N] y
Google Cloud Platform support will be enabled for TensorFlow
Do you wish to build TensorFlow with Hadoop File System support? [y/N] y
Hadoop File System support will be enabled for TensorFlow
Found possible Python library paths:
/usr/local/lib/python2.7/dist-packages
/usr/lib/python2.7/dist-packages
Please input the desired Python library path to use. Default is [/usr/local/lib/python2.7/dist-packages]
Using python library path: /usr/local/lib/python2.7/dist-packages
Do you wish to build TensorFlow with OpenCL support? [y/N] n
No OpenCL support will be enabled for TensorFlow
Do you wish to build TensorFlow with CUDA support? [y/N] y
CUDA support will be enabled for TensorFlow
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to use system default]:
Please specify the location where CUDA toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify the Cudnn version you want to use. [Leave empty to use system default]:
Please specify the location where cuDNN library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size.
[Default is: "3.5,5.2"]:
Extracting Bazel installation...
Sending SIGTERM to previous Bazel server (pid=16451)... done.
.
INFO: Starting clean (this may take a while). Consider using --expunge_async if the clean takes more than several minutes.
.
ERROR: com.google.devtools.build.lib.packages.BuildFileContainsErrorsException: error loading package '': Encountered error while reading extension file 'cuda/build_defs.bzl': no such package '@local_config_cuda//cuda': Traceback (most recent call last):
File "/tensorflow/third_party/gpus/cuda_configure.bzl", line 709
_create_cuda_repository(repository_ctx)
File "/tensorflow/third_party/gpus/cuda_configure.bzl", line 627, in _create_cuda_repository
_get_cuda_config(repository_ctx)
File "/tensorflow/third_party/gpus/cuda_configure.bzl", line 496, in _get_cuda_config
_cuda_toolkit_path(repository_ctx)
File "/tensorflow/third_party/gpus/cuda_configure.bzl", line 137, in _cuda_toolkit_path
str(repository_ctx.path(cuda_toolkit...)
File "/tensorflow/third_party/gpus/cuda_configure.bzl", line 137, in str
repository_ctx.path(cuda_toolkit_path).realpath
Object of type 'path' has no field "realpath".
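[Reply]
Alex replied:
2 December 2016 at 11:50
I tried upgrading bazel, both 0.3.2 and 0.4.1, and it still doesn't solve the problem. Could it be related to how bazel was installed?
[Reply]
52nlp replied:
2 December 2016 at 15:29
It still looks bazel-related. I searched a bit; take a look at this issue: https://github.com/tensorflow/tensorflow/issues/5319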
Replacing Bazel 0.3.0 with 0.3.2 resolves this issue on my side.
My system info:
Nvidia driver: 367.48-0ubuntu1
CUDA: cuda-repo-ubuntu1604_8.0.44-1_amd64.deb
cuDNN: cuDNN 5.1
[Reply]
Alex replied:
3 December 2016 at 15:32
Thank you! It runs now!!!
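Both bazel-related errors in these comments were resolved by moving to bazel 0.3.2 or newer, so it can be worth checking which bazel is actually on the PATH before running ./configure, especially when several install methods have been mixed. The following is a rough sketch: the function names are invented for illustration, and it assumes that the output of bazel version contains a line of the form "Build label: 0.3.2".

```python
import re
import subprocess


def parse_build_label(version_output):
    """Extract the numeric version tuple from `bazel version` output, or None."""
    match = re.search(r"Build label:\s*(\d+(?:\.\d+)*)", version_output)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def bazel_is_at_least(version_output, minimum=(0, 3, 2)):
    """Return True/False for a parsed version, or None if no version was found.

    The default minimum of 0.3.2 is the version reported to work in the thread.
    """
    version = parse_build_label(version_output)
    if version is None:
        return None
    return version >= minimum


def read_bazel_version_output():
    """Run `bazel version` and return its text output, or None if bazel cannot be run."""
    try:
        raw = subprocess.check_output(["bazel", "version"])
    except (OSError, subprocess.CalledProcessError):
        return None
    return raw.decode("utf-8", "replace")
```

A typical use would be to call read_bazel_version_output() and, when it returns text, pass that text to bazel_is_at_least(). A False result suggests that an older bazel is still being picked up.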