

Date: 2024-07-06



COMP9414 24T2
Artificial Intelligence
Assignment 2 - Reinforcement Learning
Due: Week 9, Wednesday, 24 July 2024, 11:55 PM.
1 Problem context
Taxi Navigation with Reinforcement Learning: In this assignment,
you are asked to implement the Q-learning and SARSA methods for a taxi
navigation problem. To run your experiments and test your code, you should
use the Gym library, an open-source Python library for developing
and comparing reinforcement learning algorithms. You can install Gym on
your computer with the following command in your command
prompt:
pip install gym
In the taxi navigation problem, there are four designated locations in the
grid world indicated by R(ed), G(reen), Y(ellow), and B(lue). When the
episode starts, one taxi starts off at a random square and the passenger is
at a random location (one of the four specified locations). The taxi drives
to the passenger’s location, picks up the passenger, drives to the passenger’s
destination (another one of the four specified locations), and then drops off
the passenger. Once the passenger is dropped off, the episode ends. To show
the taxi grid world environment, you can use the following code:

import gym

env = gym.make("Taxi-v3", render_mode="ansi").env
state, info = env.reset()  # Gym >= 0.26: reset() returns (observation, info)
rendered_env = env.render()
print(rendered_env)
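The integer `state` returned by `reset()` indexes one of 500 discrete states. The sketch below shows how those states can be packed and unpacked, assuming it mirrors the `encode()`/`decode()` logic in Gym's `taxi.py`: taxi row and column in the 5x5 grid, passenger location 0-3 (assumed order R, G, Y, B) or 4 when the passenger is in the taxi, and destination 0-3. The helper names are illustrative only.

```python
# Sketch of how Taxi-v3 packs its 500 discrete states into one integer.
# Assumption: this mirrors encode()/decode() in Gym's taxi.py
# (5 rows x 5 columns x 5 passenger locations x 4 destinations).
GRID_ROWS, GRID_COLS = 5, 5
N_PASSENGER_LOCS = 5   # R, G, Y, B, or 4 = passenger inside the taxi
N_DESTINATIONS = 4     # R, G, Y, B


def encode(taxi_row, taxi_col, passenger_loc, destination):
    """Pack the four state variables into a single state index."""
    index = taxi_row
    index = index * GRID_COLS + taxi_col
    index = index * N_PASSENGER_LOCS + passenger_loc
    index = index * N_DESTINATIONS + destination
    return index


def decode(index):
    """Invert encode(): return (taxi_row, taxi_col, passenger_loc, destination)."""
    destination = index % N_DESTINATIONS
    index //= N_DESTINATIONS
    passenger_loc = index % N_PASSENGER_LOCS
    index //= N_PASSENGER_LOCS
    taxi_col = index % GRID_COLS
    taxi_row = index // GRID_COLS
    return taxi_row, taxi_col, passenger_loc, destination


N_STATES = GRID_ROWS * GRID_COLS * N_PASSENGER_LOCS * N_DESTINATIONS

print(N_STATES)            # 500
print(encode(3, 1, 2, 0))  # 328
print(decode(328))         # (3, 1, 2, 0)
```

This is why a tabular Q-table for Taxi-v3 has 500 rows, one per state, and one column per action.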
To render the environment, there are three modes: “human”, “rgb_array”,
and “ansi”. The “human” mode visualizes the environment in a way suitable
for human viewing; the output is a graphical window that displays the
current state of the environment (see Fig. 1). The “rgb_array” mode provides
the environment’s state as an RGB image; the output is a NumPy array holding
the pixels of that image. The “ansi” mode provides a text-based representation
of the environment’s state; the output is a string that represents the
current state of the environment using ASCII characters (see Fig. 2).
Figure 1: “human” mode presentation for the taxi navigation problem in the
Gym library.
You are free to choose either the “human” or the “ansi” presentation mode,
but for simplicity we recommend the “ansi” mode. The environment has six
discrete, deterministic actions, listed in Table 1.
Figure 2: “ansi” mode presentation for the taxi navigation problem in the Gym
library. Gold represents the taxi location, blue is the pickup location, and
purple is the drop-off location.

Table 1: Six possible actions in the taxi navigation environment.
Action                  Action number
Move South              0
Move North              1
Move East               2
Move West               3
Pickup Passenger        4
Drop off Passenger      5

For this assignment, you need to implement the Q-learning and SARSA
algorithms for the taxi navigation environment. The main objective is for
the agent (taxi) to learn how to navigate the grid world and deliver the
passenger in the minimum possible number of steps. To accomplish the
learning task, you should empirically determine the hyperparameters of your
algorithm, e.g., the learning rate α, the exploration parameters (such as ε
or T), and the discount factor γ. Your agent should be penalized -1 for each
step it takes, receive a +20 reward for delivering the passenger, and incur a
-10 penalty for executing the “pickup” and “drop-off” actions illegally. You
should try different exploration parameters to find the best balance between
exploration and exploitation.
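The role of α, γ, ε and T can be made concrete with a small sketch. It assumes a NumPy Q-table of shape (n_states, n_actions), shows ε-greedy and softmax (Boltzmann, temperature T) action selection, and contrasts the Q-learning target, which bootstraps from the best next action, with the SARSA target, which bootstraps from the next action actually chosen. The function names and default values are placeholders, not tuned settings.

```python
import numpy as np

# Sketch of exploration and update rules, assuming a NumPy Q-table of shape
# (n_states, n_actions), e.g. (500, 6) for Taxi-v3.
# Default alpha/gamma values are placeholders, not tuned values.


def epsilon_greedy(q_table, state, epsilon, rng):
    """With probability epsilon take a random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(q_table.shape[1]))
    return int(np.argmax(q_table[state]))


def softmax_action(q_table, state, temperature, rng):
    """Boltzmann exploration: P(a) is proportional to exp(Q(s, a) / T)."""
    prefs = q_table[state] / temperature
    prefs = prefs - prefs.max()  # shift by the max for numerical stability
    probs = np.exp(prefs)
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))


def q_learning_update(q_table, s, a, r, s_next, terminated, alpha=0.1, gamma=0.99):
    """Off-policy target: bootstrap from the best action in the next state."""
    # Pass `terminated` (not `truncated`) so a time-limit cut-off still bootstraps.
    target = r if terminated else r + gamma * np.max(q_table[s_next])
    q_table[s, a] += alpha * (target - q_table[s, a])


def sarsa_update(q_table, s, a, r, s_next, a_next, terminated, alpha=0.1, gamma=0.99):
    """On-policy target: bootstrap from the action the agent will actually take next."""
    target = r if terminated else r + gamma * q_table[s_next, a_next]
    q_table[s, a] += alpha * (target - q_table[s, a])
```

A small ε (or a low T) favours exploitation; a large ε (or a high T) favours exploration.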
As an outcome, you should plot the accumulated reward per episode and
the number of steps taken by the agent in each episode for at least 1000
learning episodes, for both the Q-learning and SARSA algorithms. Examples
of these two plots are shown in Figures 3–6. Note that these plots are only
examples; your plots will not look exactly like them, because your learning
parameters will differ.
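The data behind these plots can be collected during training. The sketch below is one way to do it, assuming the Gym >= 0.26 API (`reset()` returns `(state, info)`, `step()` returns `(state, reward, terminated, truncated, info)`), ε-greedy exploration, and placeholder hyperparameters; `train` is a hypothetical name, and the same loop serves both algorithms by switching the bootstrap term.

```python
import numpy as np

# Sketch of a training loop that records, per episode, the accumulated reward
# and the number of steps. Assumes the Gym >= 0.26 reset()/step() signatures;
# `train` and its defaults are hypothetical placeholders.


def train(env, n_states, n_actions, method="q_learning", episodes=1000,
          alpha=0.1, gamma=0.99, epsilon=0.1, max_steps=200, seed=0):
    rng = np.random.default_rng(seed)
    q = np.zeros((n_states, n_actions))
    episode_rewards, episode_steps = [], []

    def choose(state):
        # epsilon-greedy behaviour policy
        if rng.random() < epsilon:
            return int(rng.integers(n_actions))
        return int(np.argmax(q[state]))

    for _episode in range(episodes):
        state, _info = env.reset()
        action = choose(state)
        total_reward, steps = 0.0, 0
        while steps < max_steps:
            next_state, reward, terminated, truncated, _info = env.step(action)
            next_action = choose(next_state)
            if method == "q_learning":
                # off-policy: next_action only drives behaviour, not the target
                bootstrap = np.max(q[next_state])
            else:
                # SARSA (on-policy): bootstrap from the action actually taken next
                bootstrap = q[next_state, next_action]
            target = reward if terminated else reward + gamma * bootstrap
            q[state, action] += alpha * (target - q[state, action])
            state, action = next_state, next_action
            total_reward += reward
            steps += 1
            if terminated or truncated:
                break
        episode_rewards.append(total_reward)
        episode_steps.append(steps)
    return q, episode_rewards, episode_steps
```

With Gym this might be called as `train(env, env.observation_space.n, env.action_space.n, method="sarsa")`; the two returned lists can then be plotted against the episode index, for example with matplotlib's `plt.plot`.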
Figure 3: Q-learning reward. Figure 4: Q-learning steps.
Figure 5: SARSA reward. Figure 6: SARSA steps.

After training your algorithm, you should save your Q-values. Using your
saved Q-tables, both the Q-learning and SARSA algorithms will be tested on at
least 100 random grid-world scenarios with the same characteristics as the
taxi environment, using the greedy action selection method. Your Q-table is
therefore not updated during testing.
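Since the assignment leaves the Q-table format open, one simple option is NumPy's `.npy` format. The sketch below assumes that choice; the helper names and file name are hypothetical. At test time the loaded table is only read, via greedy (argmax) action selection.

```python
import numpy as np

# Sketch of saving and loading a Q-table, assuming NumPy's .npy format.
# Helper names and file names are hypothetical.


def save_q_table(q_table, path):
    """Write the Q-table to a binary .npy file (path should end in .npy)."""
    np.save(path, q_table)


def load_q_table(path):
    """Read a Q-table previously written by save_q_table()."""
    return np.load(path)


def greedy_action(q_table, state):
    """Test-time policy: always take the highest-valued action, no exploration."""
    return int(np.argmax(q_table[state]))
```

For example, `save_q_table(q, "q_learning_qtable.npy")` after training and `load_q_table("q_learning_qtable.npy")` before the test episodes. Note that `np.save` appends `.npy` if the path lacks that extension.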
Your code should be able to visualize the trained agent for both the
Q-learning and SARSA algorithms. This means you should render the “Taxi-v3”
environment (you can use the “ansi” mode) and run your trained agent from a
random position. You should show the steps your agent takes and how the
reward changes from one state to the next. An example of the visualized
agent is shown in Fig. 7, where only the first six steps of the taxi are
displayed.
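A step-by-step visualization can be sketched as follows, assuming the environment was created with `render_mode="ansi"` (so `env.render()` returns a string) and the Gym >= 0.26 `reset()`/`step()` signatures. `show_episode` is a hypothetical helper, and `ACTION_NAMES` follows the numbering in Table 1.

```python
import numpy as np

# Sketch of visualizing a trained greedy agent in "ansi" mode.
# Assumes env.render() returns a string and the Gym >= 0.26 API.
# show_episode is hypothetical; ACTION_NAMES follows Table 1.

ACTION_NAMES = ["Move South", "Move North", "Move East", "Move West",
                "Pickup Passenger", "Drop off Passenger"]


def show_episode(env, q_table, max_steps=100):
    state, _info = env.reset()
    print(env.render())
    total_reward, steps = 0.0, 0
    while steps < max_steps:
        action = int(np.argmax(q_table[state]))  # greedy, no exploration
        state, reward, terminated, truncated, _info = env.step(action)
        total_reward += reward
        steps += 1
        print(f"Step {steps}: {ACTION_NAMES[action]}, "
              f"reward {reward}, accumulated reward {total_reward}")
        print(env.render())
        if terminated or truncated:
            break
    return steps, total_reward
```

Printing the rendered frame after every step shows both the taxi's movement and how the accumulated reward evolves.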
2 Testing and discussing your code
As part of the assignment evaluation, your code will be tested by your tutor,
together with you, in a discussion held during your Week 10 tutorial session.
The assignment is worth 25 marks in total. The discussion is mandatory: we
will not mark any assignment that has not been discussed with a tutor.
Before your discussion session, you should prepare the necessary code by
loading your Q-table and the “Taxi-v3” environment. You should be able to
calculate the average number of steps per episode and the average
accumulated reward (with a maximum of 100 steps per episode) over the test
episodes, using the greedy action selection method.

Figure 7: The first six steps of a trained agent (taxi) based on the
Q-learning algorithm.
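The averaging described above can be sketched as a short evaluation routine. It assumes the Gym >= 0.26 API (including the `seed` keyword of `reset()`), greedy actions, a 100-step cap per episode, and no Q-table updates; `evaluate` is a hypothetical name.

```python
import numpy as np

# Sketch of the test-time evaluation: greedy episodes capped at max_steps,
# returning the average steps and average accumulated reward.
# Assumes the Gym >= 0.26 API; `evaluate` is a hypothetical name.


def evaluate(env, q_table, episodes=100, max_steps=100, seed=None):
    all_steps, all_rewards = [], []
    for i in range(episodes):
        # Optional per-episode seeding makes the test scenarios reproducible.
        state, _info = env.reset(seed=None if seed is None else seed + i)
        total_reward, steps = 0.0, 0
        while steps < max_steps:
            action = int(np.argmax(q_table[state]))  # greedy; Q-table is not updated
            state, reward, terminated, truncated, _info = env.step(action)
            total_reward += reward
            steps += 1
            if terminated or truncated:
                break
        all_steps.append(steps)
        all_rewards.append(total_reward)
    return float(np.mean(all_steps)), float(np.mean(all_rewards))
```

An episode that never finishes counts as the full 100 steps, which pulls both averages down. This is why a policy that loops indefinitely quickly fails the thresholds.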
You are expected to propose and build your algorithms for the taxi
navigation task. You will receive marks for each of the subsections shown in
Table 2. Beyond the requirements described in the previous section, you may
include any other outcomes that highlight particular aspects of your work
when testing and discussing your code with your tutor.
For both the Q-learning and SARSA algorithms, your tutor will consider the
average accumulated reward and the average number of steps over the test
episodes, with a maximum of 100 steps per episode. For your Q-learning
algorithm, the agent should take at most 13 steps per episode on average and
obtain an average accumulated reward of at least 7. Results worse than these
will score 0 marks for that section. For your SARSA algorithm, the agent
should take at most 15 steps per episode on average and obtain an average
accumulated reward of at least 5. Results worse than these will score 0 marks
for that section.
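The two thresholds are related through the reward structure. Assuming, as in Gym's Taxi-v3, that the successful drop-off step returns +20 in place of the -1 step penalty, an episode of n steps with no illegal actions returns -(n - 1) + 20 = 21 - n. Thirteen steps therefore give 8 and fifteen steps give 6, slightly above the required 7 and 5. The sketch below encodes this arithmetic and the pass check; the helper names are hypothetical.

```python
# Worked check of the pass thresholds (hypothetical helper names).
# Assumption: as in Gym's Taxi-v3, the successful drop-off step yields +20
# instead of -1, so a clean episode of n steps returns -(n - 1) + 20 = 21 - n.

THRESHOLDS = {
    # algorithm: (maximum average steps, minimum average accumulated reward)
    "q_learning": (13, 7),
    "sarsa": (15, 5),
}


def clean_episode_return(n_steps):
    """Return of an n-step episode with no illegal actions, ending in a drop-off."""
    return -(n_steps - 1) + 20


def meets_threshold(algorithm, avg_steps, avg_reward):
    """True if both averages satisfy the pass criteria for this algorithm."""
    max_steps, min_reward = THRESHOLDS[algorithm]
    return avg_steps <= max_steps and avg_reward >= min_reward


print(clean_episode_return(13))                  # 8
print(clean_episode_return(15))                  # 6
print(meets_threshold("q_learning", 12.5, 8.5))  # True
print(meets_threshold("sarsa", 16.0, 5.0))       # False
```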
Finally, you will receive 1 mark for code readability for each task, and
your tutor will also award up to 5 marks for each task depending on your
level of code understanding: 5 = Outstanding, 4 = Great, 3 = Fair, 2 = Low,
1 = Deficient, 0 = No answer.
Table 2: Marks for each task.
Task                                                                          Marks
Results obtained from agent learning
  Accumulated rewards and steps per episode plots for Q-learning algorithm   2
  Accumulated rewards and steps per episode plots for SARSA algorithm        2
Results obtained from testing the trained agent
  Average accumulated rewards and average steps per episode for
  Q-learning algorithm                                                        2.5
  Average accumulated rewards and average steps per episode for
  SARSA algorithm                                                             2.5
  Visualizing the trained agent for Q-learning algorithm                      2
  Visualizing the trained agent for SARSA algorithm                           2
Code understanding and discussion
  Code readability for Q-learning algorithm                                   1
  Code readability for SARSA algorithm                                        1
  Code understanding and discussion for Q-learning algorithm                  5
  Code understanding and discussion for SARSA algorithm                       5
Total marks                                                                   25
3 Submitting your assignment
The assignment must be done individually. You must submit your assignment
solution via Moodle. The submission is a single .zip file containing three
files: the .ipynb Jupyter notebook and your saved Q-tables for Q-learning and
SARSA (you can choose the format for the Q-tables). Remember that your
Q-table files will be loaded during your discussion session to run the test
episodes, so your submitted Python code should also include a script that
performs these tests. Additionally, your code should include short text
descriptions to help markers understand it. Please be mindful that providing
clean, easy-to-read code is part of your assignment.
Please indicate your full name and your zID at the top of the file as a
comment. You can submit as many times as you like before the deadline;
later submissions overwrite earlier ones. After submitting your file, it is
good practice to take a screenshot of the submission for future reference.
Late submission penalty: UNSW has a standard late submission
penalty of 5% per day from your mark, capped at five days from the
assessment deadline; after that, students cannot submit the assignment.
4 Deadline and questions
Deadline: Week 9, Wednesday 24 July 2024, 11:55pm. Please use the
forum on Moodle to ask questions related to the project. We will prioritise
questions asked in the forum. However, you should not share your code there,
to avoid making it public and enabling plagiarism. If your question requires
sharing code, use the course email cs9414@cse.unsw.edu.au instead.
Although we try to answer questions as quickly as possible, we might take
up to 1 or 2 business days to reply, so last-minute questions might not be
answered in time.
For any questions regarding the discussion sessions, please contact your
tutor directly. Tutor email addresses are listed in Table 3.
5 Plagiarism policy
Your program must be entirely your own work. Plagiarism detection software
might be used to compare submissions pairwise (including submissions for
any similar projects from previous years) and serious penalties will be applied,
particularly in the case of repeat offences.
Do not copy from others. Do not allow anyone to see your code.
Please refer to the UNSW Policy on Academic Honesty and Plagiarism if you
require further clarification on this matter.