A Detailed Look at the Read Operations in TensorFlow's CIFAR-10 Example

2020-02-15 21:18:25

Source: reposted

Contributor: site user

Preface

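The convolutional neural network chapter of the official TensorFlow documentation includes an experiment that uses the CIFAR-10 image dataset. Building the convolutional network itself is not hard, but the cifar10_input file took me real effort. Reading it alongside the official documentation I understood most of it, though a few parts are still unclear to me. I have marked those for later; this post records what I have figured out so far.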

Analysis

The read operations in cifar10_input.py come down mainly to the following code:

    if not eval_data:
        filenames = [os.path.join(data_dir, 'data_batch_%d.bin' % i)
                     for i in xrange(1, 6)]
        num_examples_per_epoch = NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN
    else:
        filenames = [os.path.join(data_dir, 'test_batch.bin')]
        num_examples_per_epoch = NUM_EXAMPLES_PER_EPOCH_FOR_EVAL
    ...
    filename_queue = tf.train.string_input_producer(filenames)
    ...
    label_bytes = 1  # 2 for CIFAR-100
    result.height = 32
    result.width = 32
    result.depth = 3
    image_bytes = result.height * result.width * result.depth
    # Every record consists of a label followed by the image, with a
    # fixed number of bytes for each.
    record_bytes = label_bytes + image_bytes

    # Read a record, getting filenames from the filename_queue. No
    # header or footer in the CIFAR-10 format, so we leave header_bytes
    # and footer_bytes at their default of 0.
    reader = tf.FixedLengthRecordReader(record_bytes=record_bytes)
    result.key, value = reader.read(filename_queue)
    ...
    if shuffle:
        images, label_batch = tf.train.shuffle_batch(
            [image, label],
            batch_size=batch_size,
            num_threads=num_preprocess_threads,
            capacity=min_queue_examples + 3 * batch_size,
            min_after_dequeue=min_queue_examples)
    else:
        images, label_batch = tf.train.batch(
            [image, label],
            batch_size=batch_size,
            num_threads=num_preprocess_threads,
            capacity=min_queue_examples + 3 * batch_size)
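As an aside, the fixed-length record layout described in the comments (1 label byte followed by 32 * 32 * 3 = 3072 image bytes, 3073 bytes per record) can be illustrated without TensorFlow. The following is only a minimal NumPy sketch, not the code from cifar10_input.py: the names parse_records and make_fake_records are hypothetical, and the records are random bytes rather than real CIFAR-10 data. It assumes the channel-major pixel order (all red values, then green, then blue) documented for the CIFAR-10 binary files.

```python
import numpy as np

LABEL_BYTES = 1                           # 2 for CIFAR-100, per the excerpt
HEIGHT, WIDTH, DEPTH = 32, 32, 3
IMAGE_BYTES = HEIGHT * WIDTH * DEPTH      # 3072
RECORD_BYTES = LABEL_BYTES + IMAGE_BYTES  # 3073


def parse_records(raw):
    """Split concatenated fixed-length records into labels and images.

    Hypothetical helper: `raw` is a bytes object with no header or footer.
    """
    if len(raw) % RECORD_BYTES != 0:
        raise ValueError("input length is not a multiple of the record size")
    data = np.frombuffer(raw, dtype=np.uint8).reshape(-1, RECORD_BYTES)
    # The first byte of each record is the label.
    labels = data[:, 0].astype(np.int32)
    # The remaining bytes are stored as (depth, height, width);
    # transpose them to the more common (height, width, depth) layout.
    images = data[:, LABEL_BYTES:].reshape(-1, DEPTH, HEIGHT, WIDTH)
    images = images.transpose(0, 2, 3, 1)
    return labels, images


def make_fake_records(n, seed=0):
    """Build n random records in the same layout (illustration only)."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, 10, size=(n, 1), dtype=np.uint8)
    pixels = rng.integers(0, 256, size=(n, IMAGE_BYTES), dtype=np.uint8)
    return np.hstack([labels, pixels]).tobytes()


if __name__ == "__main__":
    labels, images = parse_records(make_fake_records(4))
    print(labels.shape, images.shape)
```

Because every record has the same length, a reader only needs record_bytes to know where each example starts, which is why the excerpt passes nothing else to tf.FixedLengthRecordReader.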

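At first I had no idea what the cifar10_input.py excerpt was for, and the more I read it the more confused I became. Until then, the most I had done in TensorFlow was feed data through tf.placeholder(); I had never used TensorFlow's built-in reading operations, so that code was hard going. Then I found the following in the How-To section of the official documentation: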

Batching

    def read_my_file_format(filename_queue):
        reader = tf.SomeReader()
        key, record_string = reader.read(filename_queue)
        example, label = tf.some_decoder(record_string)
        processed_example = some_processing(example)
        return processed_example, label

    def input_pipeline(filenames, batch_size, num_epochs=None):
        filename_queue = tf.train.string_input_producer(
            filenames, num_epochs=num_epochs, shuffle=True)
        example, label = read_my_file_format(filename_queue)
        # min_after_dequeue defines how big a buffer we will randomly sample
        #   from -- bigger means better shuffling but slower start up and more
        #   memory used.
        # capacity must be larger than min_after_dequeue and the amount larger
        #   determines the maximum we will prefetch.  Recommendation:
        #   min_after_dequeue + (num_threads + a small safety margin) * batch_size
        min_after_dequeue = 10000
        capacity = min_after_dequeue + 3 * batch_size
        example_batch, label_batch = tf.train.shuffle_batch(
            [example, label], batch_size=batch_size, capacity=capacity,
            min_after_dequeue=min_after_dequeue)
        return example_batch, label_batch
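The roles of capacity and min_after_dequeue described in the comments above can be illustrated with a plain Python shuffle buffer. This is only a rough single-threaded sketch of the idea, not how tf.train.shuffle_batch is implemented: the function name shuffle_batches is hypothetical, there are no queue-runner threads, and, as a simplification, the input is finite and the last partial batch is emitted.

```python
import random


def shuffle_batches(examples, batch_size, capacity, min_after_dequeue, seed=None):
    """Yield shuffled batches drawn from a bounded random buffer.

    Hypothetical helper that mimics the buffering idea only.
    """
    if capacity <= min_after_dequeue:
        raise ValueError("capacity must be larger than min_after_dequeue")
    rng = random.Random(seed)
    source = iter(examples)
    buffer, batch = [], []
    exhausted = False
    while True:
        # Prefetch until the buffer holds `capacity` items or input runs out.
        while not exhausted and len(buffer) < capacity:
            try:
                buffer.append(next(source))
            except StopIteration:
                exhausted = True
        if not buffer:
            break
        # While input remains, dequeue only down to min_after_dequeue so each
        # pick is drawn from a large pool; once input is gone, drain fully.
        floor = 0 if exhausted else min_after_dequeue
        while len(buffer) > floor:
            index = rng.randrange(len(buffer))
            # Swap the chosen item to the end so that pop() is O(1).
            buffer[index], buffer[-1] = buffer[-1], buffer[index]
            batch.append(buffer.pop())
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:
        yield batch  # final partial batch (a simplification of this sketch)


if __name__ == "__main__":
    for b in shuffle_batches(range(25), batch_size=4, capacity=10,
                             min_after_dequeue=6, seed=0):
        print(b)
```

A larger min_after_dequeue means each example is picked from a bigger pool, which mixes the data better but delays the first batch and uses more memory; the gap between capacity and min_after_dequeue bounds how much is prefetched at a time.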
