IEMS 5730 Spring 2024 Homework 2
Release date: Feb 23, 2024
Due date: Mar 11, 2024 (Monday) 11:59:00 pm
We will discuss the solution soon after the deadline. No late homework will be accepted!
Every Student MUST include the following statement, together with his/her signature in the submitted homework.
I declare that the assignment submitted on Elearning system is original except for source material explicitly acknowledged, and that the same or related material has not been previously submitted for another course. I also acknowledge that I am aware of University policy and regulations on honesty in academic work, and of the disciplinary guidelines and procedures applicable to breaches of such policy and regulations, as contained in the website http://www.cuhk.edu.hk/policy/academichonesty/.
Signed (Student_________________________) Date:______________________________ Name_________________________________ SID_______________________________
Submission notice:
● Submit your homework via the elearning system.
● All students are required to submit this assignment.
General homework policies:
A student may discuss the problems with others. However, the work a student turns in must be created COMPLETELY by oneself ALONE. A student may not share ANY written work or pictures, nor may one copy answers from any source other than one’s own brain.
Each student MUST LIST on the homework paper the name of every person he/she has discussed or worked with. If the answer includes content from any other source, the student MUST STATE THE SOURCE. Failure to do so is cheating and will result in sanctions. Copying answers from someone else is cheating even if one lists their name(s) on the homework.
If there is information you need to solve a problem, but the information is not stated in the problem, try to find the data somewhere. If you cannot find it, state what data you need, make a reasonable estimate of its value, and justify any assumptions you make. You will be graded not only on whether your answer is correct, but also on whether you have done an intelligent analysis.
Submit your output, explanation, and your commands/ scripts in one SINGLE pdf file.

 Q1 [20 marks + 5 Bonus marks]: Basic Operations of Pig
You are required to perform some simple analysis using Pig on the n-grams dataset of Google books. An ‘n-gram’ is a phrase with n words. The dataset lists all n-grams present in books from books.google.com along with some statistics.
In this question, you will only use the Google Books 1-grams. Please go to References [1] and [2] to download the two datasets. Each line in these two files has the following format (TAB separated):
bigram year match_count volume_count
An example for 1-grams would be:
circumvallate 1978 335 91
circumvallate 1979 261 95
This means that in 1978 (1979), the word "circumvallate" occurred 335 (261) times overall, from 91 (95) distinct books.
(a) [Bonus 5 marks] Install Pig in your Hadoop cluster. You can reuse your Hadoop cluster from IEMS 5730 HW#0 and refer to the following link to install Pig 0.17.0 on the master node of your Hadoop cluster:
http://pig.apache.org/docs/r0.17.0/start.html#Pig+Setup
Submit the screenshot(s) of your installation process.
If you choose not to do the bonus question in (a), you can use any well-installed Hadoop cluster, e.g., the IE DIC or a Hadoop cluster provided by Google Cloud/AWS [5, 6, 7], to complete the following parts of the question:
(b) [5 marks] Upload these two files to HDFS and join them into one table.
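For concreteness, a minimal Pig sketch of part (b) is given below; the HDFS paths (e.g. /user/ngrams/file_a) and field names are assumed placeholders, not values given in the assignment, and since the two files share the same schema, "joining" them into one table is done here with a UNION. The files themselves can first be copied onto HDFS with hdfs dfs -put.

-- load the two TAB-separated 1-gram files (paths are placeholders)
gramsA = LOAD '/user/ngrams/file_a' USING PigStorage('\t')
         AS (bigram:chararray, year:int, match_count:long, volume_count:long);
gramsB = LOAD '/user/ngrams/file_b' USING PigStorage('\t')
         AS (bigram:chararray, year:int, match_count:long, volume_count:long);
-- combine the two relations into a single table
allGrams = UNION gramsA, gramsB;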
(c) [5 marks] For each unique bigram, compute its average number of occurrences per year. In the above example, the result is:
circumvallate (335 + 261) / 2 = 298
Notes: The denominator is the number of years in which that word has appeared. Assume the data set contains all the 1-grams of the last 100 years, and the above records are the only records for the word 'circumvallate'. Then the average value is
(335 + 261) / 2 = 298, instead of (335 + 261) / 100 = 5.96.
(d) [10 marks] Output the 20 bigrams with the highest average number of occurrences per year, along with their corresponding average values, sorted in descending order. If multiple bigrams have the same average value, write down any one you like (that is, break ties as you wish).
You need to write a Pig script to perform this task and save the output into HDFS.
Hints:
● This problem is very similar to the word-counting example shown in the lecture notes of Pig. You can use the code there and just make some minor changes to perform this task.
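Continuing from the sketch in part (b), one possible (illustrative, not prescribed) way to compute parts (c) and (d) in Pig; the relation names and output path below are assumptions:

-- each record is one (word, year) pair, so COUNT gives the number of years
-- in which the word appeared, as required for the denominator in part (c)
grouped  = GROUP allGrams BY bigram;
averages = FOREACH grouped GENERATE group AS bigram,
           (double)SUM(allGrams.match_count) / COUNT(allGrams) AS avg_per_year;
-- part (d): sort by the average in descending order and keep the first 20
ordered  = ORDER averages BY avg_per_year DESC;
top20    = LIMIT ordered 20;
STORE top20 INTO '/user/ngrams/output_top20' USING PigStorage('\t');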
Q2 [20 marks + 5 bonus marks]: Basic Operations of Hive
In this question, you are asked to repeat Q1 using Hive and then compare the performance between Hive and Pig.
(a) [Bonus 5 marks] Install Hive on top of your own Hadoop cluster. You can reuse your Hadoop cluster from IEMS 5730 HW#0 and refer to the following link to install Hive 2.3.8 on the master node of your Hadoop cluster.
https://cwiki.apache.org/confluence/display/Hive/GettingStarted
Submit the screenshot(s) of your installation process.
If you choose not to do the bonus question in (a), you can use any well-installed Hadoop cluster, e.g., the IE DIC or a Hadoop cluster provided by Google Cloud/AWS [5, 6, 7].
(b) [20 marks] Write a Hive script to perform exactly the same task as that of Q1 on the same datasets stored in HDFS. Rerun the Pig script on this cluster, compare the performance of Pig and Hive in terms of overall run-time, and explain your observation. (A rough HiveQL sketch is given after the hints below.)
Hints:
● Hive will store its tables on HDFS and those locations need to be bootstrapped:
$ hdfs dfs -mkdir /tmp
$ hdfs dfs -mkdir /user/hive/warehouse
$ hdfs dfs -chmod g+w /tmp
$ hdfs dfs -chmod g+w /user/hive/warehouse
● While working with the interactive shell (or otherwise), you should first test on a small subset of the data instead of the whole data set. Once your Hive commands/scripts work as desired, you can then run them on the complete data set.
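As a rough, non-authoritative HiveQL sketch of part (b), assuming the same placeholder HDFS directory as in the Q1 sketch and made-up table/column names:

-- external table over the directory that already holds the two uploaded 1-gram files
CREATE EXTERNAL TABLE ngrams (bigram STRING, `year` INT, match_count BIGINT, volume_count BIGINT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/user/ngrams';

-- the same computation as the Pig script: average occurrences per year, top 20
SELECT bigram, SUM(match_count) / COUNT(*) AS avg_per_year
FROM ngrams
GROUP BY bigram
ORDER BY avg_per_year DESC
LIMIT 20;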
 
 Q3 [30 marks + 10 Bonus marks]: Similar Users Detection in the MovieLens Dataset using Pig
Similar-user detection, which aims to group users with similar interests, behaviors, actions, or general patterns, has drawn a lot of attention in the machine learning field. In this homework, you will implement a similar-users-detection algorithm for an online movie rating system. Basically, users who give similar scores to the same movies may have common tastes or interests and can be grouped as similar users.
To detect similar users, we need to calculate the similarity between each user pair. In this homework, the similarity between a given pair of users (e.g. A and B) is measured as the total number of movies both A and B have watched divided by the total number of movies watched by either A or B. The following is the formal definition of similarity: Let M(A) be the set of all the movies user A has watched. Then the similarity between user A and user B is defined as:
Similarity(A, B) = |M(A) ∩ M(B)| / |M(A) ∪ M(B)|          ..........(**)
where |S| means the cardinality of set S.
(Note: if |𝑀(𝐴)∪𝑀(𝐵)| = 0, we set the similarity to be 0.)
[Figure omitted: illustration of the similarity computation between a pair of users.]
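As a purely illustrative example with made-up movie IDs: if M(A) = {m1, m2, m3} and M(B) = {m2, m3, m4}, then |M(A) ∩ M(B)| = 2 and |M(A) ∪ M(B)| = 4, so Similarity(A, B) = 2 / 4 = 0.5.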
Two datasets [3][4] with different sizes are provided by MovieLens. Each user is represented by its unique userID and each movie is represented by its unique movieID. The format of the data set is as follows:
<userID>, <movieID>
Write a program in Pig to detect the TOP K similar users for each user. You can use the cluster you built for Q1 and Q2, or you can use the IE DIC or a cluster provided by Google Cloud/AWS [5, 6, 7].
(a) [10 marks] For each pair of users in datasets [3] and [4], output the number of movies they have both watched.
For your homework submission, you need to submit i) the Pig script and ii) the list of the 10 pairs of users having the largest number of movies watched by both users in the pair within the corresponding dataset (a rough Pig sketch is given after the format below). The format of your answer should be as follows:
<userID A>, <userID B>, <the number of movies both A and B have watched> //top 1 ...
<userID X>, <userID Y>, <the number of movies both X and Y have watched> //top 10
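A minimal Pig sketch of part (a) follows; the input path, the comma delimiter, and all alias names are assumptions rather than part of the assignment:

-- load the <userID>,<movieID> pairs twice so the relation can be joined with itself
r1 = LOAD '/user/movielens/data.csv' USING PigStorage(',') AS (userA:long, movie:long);
r2 = LOAD '/user/movielens/data.csv' USING PigStorage(',') AS (userB:long, movie:long);
-- self-join on movieID: every matched row is one movie watched by both users
joined  = JOIN r1 BY movie, r2 BY movie;
-- keep each unordered user pair exactly once
pairs   = FILTER joined BY userA < userB;
grouped = GROUP pairs BY (userA, userB);
coWatch = FOREACH grouped GENERATE FLATTEN(group) AS (userA, userB),
          COUNT(pairs) AS num_both;
-- the 10 pairs with the largest number of commonly watched movies
ordered = ORDER coWatch BY num_both DESC;
top10   = LIMIT ordered 10;
STORE top10 INTO '/user/movielens/output_q3a' USING PigStorage(',');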
(b) [20 marks] By modifying/extending part of your code in part (a), find the Top-K (K=3) most similar users (as defined by Equation (**)) for every user in datasets [3] and [4]. If multiple users have the same similarity, you can just pick any three of them.
(c)
Hint:
1. In part (b), to facilitate the computation of the similarity measure as defined in (**), you can use the inclusion-exclusion principle, i.e. |M(A) ∪ M(B)| = |M(A)| + |M(B)| − |M(A) ∩ M(B)|.
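Building on the part (a) sketch above, one illustrative Pig continuation (again with assumed alias and path names) that applies the inclusion-exclusion hint and keeps the Top-3 per user:

-- |M(X)| for every user X (reusing r1 and coWatch from the part (a) sketch)
byUser  = GROUP r1 BY userA;
userCnt = FOREACH byUser GENERATE group AS user, COUNT(r1) AS num_watched;

-- attach |M(A)| and |M(B)| to every co-watch record
jA    = JOIN coWatch BY userA, userCnt BY user;
withA = FOREACH jA GENERATE coWatch::userA AS userA, coWatch::userB AS userB,
        coWatch::num_both AS num_both, userCnt::num_watched AS cntA;
jB    = JOIN withA BY userB, userCnt BY user;

-- inclusion-exclusion: |M(A) ∪ M(B)| = |M(A)| + |M(B)| − |M(A) ∩ M(B)|
sim   = FOREACH jB GENERATE withA::userA AS userA, withA::userB AS userB,
        (double)num_both / (cntA + userCnt::num_watched - num_both) AS similarity;

-- similarity is symmetric, so emit both orientations before taking the Top-K per user
mirror = FOREACH sim GENERATE userB AS userA, userA AS userB, similarity;
allSim = UNION sim, mirror;
byA    = GROUP allSim BY userA;
top3   = FOREACH byA {
             srt = ORDER allSim BY similarity DESC;
             lim = LIMIT srt 3;
             GENERATE FLATTEN(lim);
         };
STORE top3 INTO '/user/movielens/output_q3b' USING PigStorage(',');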