
Deep Learning, Part 3: Applying Deep Reinforcement Learning (DQN, Deep Q Network) to Flappy Bird

  • Language: Python
  • Example size: 0.80M
  • Downloads: 35
  • Views: 908
  • Published: 2019-08-09
  • Category: Python language basics
  • Publisher: 617279194
  • File format: .rar
  • Points required: 1
 Related tags: flappy bird, deep learning, app

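Example description

【Introduction】

Contents

1. Goals

2. Approach

   2.1. Reinforcement learning (RL)

   2.2. Deep learning (convolutional neural network, CNN)

3. Pitfalls encountered

4. Code implementation (Python 3.5)

5. Results and analysis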


【Core code】

#!/usr/bin/env python
from __future__ import print_function

import tensorflow as tf
import cv2
import sys
sys.path.append("game/")
# Try the package-relative import first, then fall back to a plain import
# when the file is run as a script.
try:
    from . import wrapped_flappy_bird as game
except Exception:
    import wrapped_flappy_bird as game
import random
import numpy as np
from collections import deque

'''
Observe for a while first (OBSERVE = 1000; it must not be too large),
collect the state (4 consecutive frames) => enter the training phase (no upper limit) => action
'''

GAME = 'bird' # the name of the game being played for log files
ACTIONS = 2 # number of valid actions (up, down)
GAMMA = 0.99 # decay rate of past observations
OBSERVE = 1000. # timesteps to observe before training
EXPLORE = 3000000. # frames over which to anneal epsilon
FINAL_EPSILON = 0.0001 # final value of epsilon (exploration rate)
INITIAL_EPSILON = 0.1 # starting value of epsilon
REPLAY_MEMORY = 50000 # number of previous transitions to remember
BATCH = 32 # size of minibatch
FRAME_PER_ACTION = 1

# GAME = 'bird' # the name of the game being played for log files
# ACTIONS = 2 # number of valid actions
# GAMMA = 0.99 # decay rate of past observations
# OBSERVE = 100000. # timesteps to observe before training
# EXPLORE = 2000000. # frames over which to anneal epsilon
# FINAL_EPSILON = 0.0001 # final value of epsilon
# INITIAL_EPSILON = 0.0001 # starting value of epsilon
# REPLAY_MEMORY = 50000 # number of previous transitions to remember
# BATCH = 32 # size of minibatch
# FRAME_PER_ACTION = 1

 

def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev = 0.01)
    return tf.Variable(initial)

def bias_variable(shape):
    initial = tf.constant(0.01, shape = shape)
    return tf.Variable(initial)

# padding = SAME  => new_height = new_width = W / S (rounded up)
# padding = VALID => new_height = new_width = (W - F + 1) / S (rounded up)
def conv2d(x, W, stride):
    return tf.nn.conv2d(x, W, strides = [1, stride, stride, 1], padding = "SAME")

def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize = [1, 2, 2, 1], strides = [1, 2, 2, 1], padding = "SAME")

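"""
 Data flow: 80 * 80 * 4
 conv1 (8 * 8 * 4 * 32, stride 4) -> 20 * 20 * 32, pool (stride 2) -> 10 * 10 * 32 (height = width = 80 / 4 = 20, then 20 / 2 = 10)
 conv2 (4 * 4 * 32 * 64, stride 2) -> 5 * 5 * 64 (pooling this would give 3 * 3 * 64, but the code below leaves h_pool2 unused)
 conv3 (3 * 3 * 64 * 64, stride 1) on h_conv2 -> 5 * 5 * 64, pool (stride 2) -> 3 * 3 * 64 = 576
 576 is needed when defining the size of h_pool3_flat, so that the fully connected (FC) layer can be applied
"""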

 

def createNetwork():
    # network weights
    W_conv1 = weight_variable([8, 8, 4, 32])
    b_conv1 = bias_variable([32])

    W_conv2 = weight_variable([4, 4, 32, 64])
    b_conv2 = bias_variable([64])

    W_conv3 = weight_variable([3, 3, 64, 64])
    b_conv3 = bias_variable([64])

    W_fc1 = weight_variable([576, 512])
    b_fc1 = bias_variable([512])
    # W_fc1 = weight_variable([1600, 512])
    # b_fc1 = bias_variable([512])

    W_fc2 = weight_variable([512, ACTIONS])
    b_fc2 = bias_variable([ACTIONS])

    # input layer: a batch of 80 x 80 states with 4 stacked frames each
    s = tf.placeholder("float", [None, 80, 80, 4])

    # hidden layers
    h_conv1 = tf.nn.relu(conv2d(s, W_conv1, 4) + b_conv1)
    h_pool1 = max_pool_2x2(h_conv1)

    h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2, 2) + b_conv2)
    h_pool2 = max_pool_2x2(h_conv2)  # defined but not used below

    # conv3 reads h_conv2 (5 x 5 x 64), so h_pool3 is 3 x 3 x 64 = 576
    h_conv3 = tf.nn.relu(conv2d(h_conv2, W_conv3, 1) + b_conv3)
    h_pool3 = max_pool_2x2(h_conv3)

    h_pool3_flat = tf.reshape(h_pool3, [-1, 576])
    #h_conv3_flat = tf.reshape(h_conv3, [-1, 1600])

    h_fc1 = tf.nn.relu(tf.matmul(h_pool3_flat, W_fc1) + b_fc1)
    #h_fc1 = tf.nn.relu(tf.matmul(h_conv3_flat, W_fc1) + b_fc1)

    # readout layer
    readout = tf.matmul(h_fc1, W_fc2) + b_fc2

    return s, readout, h_fc1

 

def trainNetwork(s, readout, h_fc1, sess):
    # define the cost function
    a = tf.placeholder("float", [None, ACTIONS])
    y = tf.placeholder("float", [None])
    # reduction_indices = axis  0 :   1:
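
The layer sizes quoted in the data-flow note of the listing can be checked without TensorFlow. The sketch below applies the two padding formulas from the listing's comments to the layers in the order createNetwork actually chains them (conv1, pool1, conv2, conv3, pool3). The helper names out_size and flattened_size are hypothetical and are not part of the original project.

```python
import math


def out_size(size, kernel, stride, padding):
    """Output height/width of a conv or pooling layer (hypothetical helper).

    Uses the two formulas from the listing's comments:
    SAME  -> ceil(W / S)
    VALID -> ceil((W - F + 1) / S)
    """
    if padding == "SAME":
        return math.ceil(size / stride)
    if padding == "VALID":
        return math.ceil((size - kernel + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")


def flattened_size(input_size=80, channels=64):
    """Follow the layers in the order createNetwork wires them."""
    size = out_size(input_size, 8, 4, "SAME")  # conv1: 80 -> 20
    size = out_size(size, 2, 2, "SAME")        # pool1: 20 -> 10
    size = out_size(size, 4, 2, "SAME")        # conv2: 10 -> 5
    size = out_size(size, 3, 1, "SAME")        # conv3:  5 -> 5
    size = out_size(size, 2, 2, "SAME")        # pool3:  5 -> 3
    return size * size * channels


if __name__ == "__main__":
    print(flattened_size())  # 576, the first dimension of W_fc1
```

Dropping the final pooling step leaves 5 * 5 * 64 = 1600, which matches the commented-out W_fc1 shape in the listing.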

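
The constants INITIAL_EPSILON, FINAL_EPSILON, OBSERVE and EXPLORE describe an exploration rate that is annealed over a fixed number of frames, but the truncated listing does not show how. The sketch below assumes a linear schedule that starts once the observation phase ends, which is a common choice for DQN and not necessarily what the original code does. The function names epsilon_at and choose_action are hypothetical.

```python
import random

# Values copied from the listing's hyperparameters.
INITIAL_EPSILON = 0.1
FINAL_EPSILON = 0.0001
OBSERVE = 1000.
EXPLORE = 3000000.


def epsilon_at(t):
    """Exploration rate at timestep t, assuming linear annealing (an assumption)."""
    if t <= OBSERVE:
        return INITIAL_EPSILON
    progress = min((t - OBSERVE) / EXPLORE, 1.0)
    return INITIAL_EPSILON - progress * (INITIAL_EPSILON - FINAL_EPSILON)


def choose_action(q_values, epsilon, rng=random):
    """Epsilon-greedy: a random action with probability epsilon, else the best one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])


if __name__ == "__main__":
    for t in (0, 1000, 1501000, 3001000):
        print(t, epsilon_at(t))
```

With these values the rate stays at 0.1 for the first 1000 steps and reaches 0.0001 after a further 3,000,000 steps, so the agent acts almost greedily late in training.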
