Question # 1
Problem Scenario 51 : You have been given the below code snippet.
val a = sc.parallelize(List(1, 2, 1, 3), 1)
val b = a.map((_, "b"))
val c = a.map((_, "c"))
Operation_xyz
Write a correct code snippet for Operation_xyz which will produce the output below.
Output:
Array[(Int, (Iterable[String], Iterable[String]))] = Array(
(2,(ArrayBuffer(b),ArrayBuffer(c))),
(3,(ArrayBuffer(b),ArrayBuffer(c))),
(1,(ArrayBuffer(b, b),ArrayBuffer(c, c)))
)
Answer: See the explanation for the step-by-step solution and configuration.
Explanation:
Solution : b.cogroup(c).collect
cogroup [Pair], groupWith [Pair] : a very powerful set of functions that allow grouping of up to three key-value RDDs together using their keys.
Another example:
val x = sc.parallelize(List((1, "apple"), (2, "banana"), (3, "orange"), (4, "kiwi")), 2)
val y = sc.parallelize(List((5, "computer"), (1, "laptop"), (1, "desktop"), (4, "iPad")), 2)
x.cogroup(y).collect
Array[(Int, (Iterable[String], Iterable[String]))] = Array(
(4,(ArrayBuffer(kiwi),ArrayBuffer(iPad))),
(2,(ArrayBuffer(banana),ArrayBuffer())),
(3,(ArrayBuffer(orange),ArrayBuffer())),
(1,(ArrayBuffer(apple),ArrayBuffer(laptop, desktop))),
(5,(ArrayBuffer(),ArrayBuffer(computer))))
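For reference, here is a minimal spark-shell sketch (assuming an existing SparkContext named sc, as in the scenario) that reproduces the expected output with cogroup; newer Spark releases print CompactBuffer where the question shows ArrayBuffer.

// Minimal sketch, assuming a running spark-shell with SparkContext `sc`.
val a = sc.parallelize(List(1, 2, 1, 3), 1)
val b = a.map((_, "b"))   // (1,"b"), (2,"b"), (1,"b"), (3,"b")
val c = a.map((_, "c"))   // (1,"c"), (2,"c"), (1,"c"), (3,"c")
// cogroup groups both RDDs by key; each key maps to a pair of Iterables,
// one holding the values from b and one holding the values from c.
val grouped = b.cogroup(c)
grouped.collect.foreach(println)
// e.g. (1,(CompactBuffer(b, b),CompactBuffer(c, c))) on newer releases,
// shown as ArrayBuffer on the older Spark version used in the question.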
Question # 2
Problem Scenario 88 : You have been given the below three files.
product.csv (create this file in HDFS):
productID,productCode,name,quantity,price,supplierid
1001,PEN,Pen Red,5000,1.23,501
1002,PEN,Pen Blue,8000,1.25,501
1003,PEN,Pen Black,2000,1.25,501
1004,PEC,Pencil 2B,10000,0.48,502
1005,PEC,Pencil 2H,8000,0.49,502
1006,PEC,Pencil HB,0,9999.99,502
2001,PEC,Pencil 3B,500,0.52,501
2002,PEC,Pencil 4B,200,0.62,501
2003,PEC,Pencil 5B,100,0.73,501
2004,PEC,Pencil 6B,500,0.47,502
supplier.csv:
supplierid,name,phone
501,ABC Traders,88881111
502,XYZ Company,88882222
503,QQ Corp,88883333
products_suppliers.csv:
productID,supplierID
2001,501
2002,501
2003,501
2004,502
2001,503
Now accomplish all the queries given in the solution.
1. It is possible that the same product is supplied by multiple suppliers. Find each product and its price according to each supplier.
2. Find all the supplier names who supply 'Pencil 3B'.
3. Find all the products which are supplied by ABC Traders.
Answer: See the explanation for the step-by-step solution and configuration.
Explanation:
Solution :
Step 1 : It is possible that the same product is supplied by multiple suppliers. Find each product and its price according to each supplier.
val results = sqlContext.sql("""SELECT products.name AS 'Product Name', price, suppliers.name AS 'Supplier Name'
FROM products_suppliers
JOIN products ON products_suppliers.productID = products.productID
JOIN suppliers ON products_suppliers.supplierID = suppliers.supplierID""")
results.show()
Step 2 : Find all the supplier names who supply 'Pencil 3B'.
val results = sqlContext.sql("""SELECT p.name AS 'Product Name', s.name AS 'Supplier Name'
FROM products_suppliers AS ps
JOIN products AS p ON ps.productID = p.productID
JOIN suppliers AS s ON ps.supplierID = s.supplierID
WHERE p.name = 'Pencil 3B'""")
results.show()
Step 3 : Find all the products which are supplied by ABC Traders.
val results = sqlContext.sql("""SELECT p.name AS 'Product Name', s.name AS 'Supplier Name'
FROM products AS p, products_suppliers AS ps, suppliers AS s
WHERE p.productID = ps.productID
AND ps.supplierID = s.supplierID
AND s.name = 'ABC Traders'""")
results.show()
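The queries above assume that the three CSV files have already been loaded and registered as temporary tables named products, suppliers and products_suppliers. A minimal setup sketch for the Spark 1.x shell is shown below; the HDFS paths and case-class names are illustrative assumptions, not part of the original answer.

// Sketch only: load the CSVs and register the temp tables used by the queries above.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._

case class Product(productID: Int, productCode: String, name: String, quantity: Int, price: Double, supplierid: Int)
case class Supplier(supplierid: Int, name: String, phone: String)
case class ProductSupplier(productID: Int, supplierID: Int)

// the filter(...) calls drop the header line of each file
val products = sc.textFile("product.csv").filter(!_.startsWith("productID")).map(_.split(","))
  .map(p => Product(p(0).toInt, p(1), p(2), p(3).toInt, p(4).toDouble, p(5).toInt)).toDF()
val suppliers = sc.textFile("supplier.csv").filter(!_.startsWith("supplierid")).map(_.split(","))
  .map(s => Supplier(s(0).toInt, s(1), s(2))).toDF()
val productsSuppliers = sc.textFile("products_suppliers.csv").filter(!_.startsWith("productID")).map(_.split(","))
  .map(ps => ProductSupplier(ps(0).toInt, ps(1).toInt)).toDF()

products.registerTempTable("products")
suppliers.registerTempTable("suppliers")
productsSuppliers.registerTempTable("products_suppliers")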
Question # 3
Problem Scenario 76 : You have been given a MySQL DB with the following details.
user=retail_dba
password=cloudera
database=retail_db
table=retail_db.orders
table=retail_db.order_items
jdbc URL = jdbc:mysql://quickstart:3306/retail_db
Columns of the orders table : (order_id, order_date, order_customer_id, order_status) .....
Please accomplish the following activities.
1. Copy the "retail_db.orders" table to HDFS in a directory p91_orders.
2. Once the data is copied to HDFS, use pyspark to calculate the number of orders for each status.
3. Use all of the following methods to calculate the number of orders for each status. (You need to know all these functions and their behaviour for the real exam.)
- countByKey()
- groupByKey()
- reduceByKey()
- aggregateByKey()
- combineByKey()
Answer: See the explanation for the step-by-step solution and configuration.
Explanation:
Solution :
Step 1 : Import a single table.
sqoop import --connect jdbc:mysql://quickstart:3306/retail_db --username=retail_dba --password=cloudera --table=orders --target-dir=p91_orders
Note : Make sure there is no space before or after the '=' sign. Sqoop uses the MapReduce framework to copy data from the RDBMS to HDFS.
Step 2 : Read the data from one of the partitions created by the above command.
hadoop fs -cat p91_orders/part-m-00000
Step 3 : countByKey
# Number of orders by status
allOrders = sc.textFile("p91_orders")
# Generate key-value pairs (the key is the order status and the value is an empty string)
keyValue = allOrders.map(lambda line: (line.split(",")[3], ""))
# Using countByKey, aggregate the data with status as the key
output = keyValue.countByKey().items()
for line in output: print(line)
Step 4 : groupByKey
# Generate key-value pairs (the key is the order status and the value is one)
keyValue = allOrders.map(lambda line: (line.split(",")[3], 1))
# Using groupByKey, aggregate the data with status as the key
output = keyValue.groupByKey().map(lambda kv: (kv[0], sum(kv[1])))
for line in output.collect(): print(line)
Step 5 : reduceByKey
# Generate key-value pairs (the key is the order status and the value is one)
keyValue = allOrders.map(lambda line: (line.split(",")[3], 1))
# Using reduceByKey, aggregate the data with status as the key
output = keyValue.reduceByKey(lambda a, b: a + b)
for line in output.collect(): print(line)
Step 6 : aggregateByKey
# Generate key-value pairs (the key is the order status and the value is the whole line)
keyValue = allOrders.map(lambda line: (line.split(",")[3], line))
output = keyValue.aggregateByKey(0, lambda acc, value: acc + 1, lambda acc1, acc2: acc1 + acc2)
for line in output.collect(): print(line)
Step 7 : combineByKey
# Generate key-value pairs (the key is the order status and the value is the whole line)
keyValue = allOrders.map(lambda line: (line.split(",")[3], line))
output = keyValue.combineByKey(lambda value: 1, lambda acc, value: acc + 1, lambda acc1, acc2: acc1 + acc2)
for line in output.collect(): print(line)
# Watch the Spark Professional Training provided by www.ABCTECH.com to understand more about each of the above functions. (These are very important functions for the real exam.)
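The scenario asks for pyspark, but for readers working in the Scala shell, a rough equivalent sketch of the same count by status (my own comparison, not part of the original answer) using reduceByKey and aggregateByKey would look like this:

// Scala sketch of the same counts, shown only for comparison with the pyspark solution above.
val allOrders = sc.textFile("p91_orders")
val keyValue = allOrders.map(line => (line.split(",")(3), 1))

// reduceByKey: a single associative function merges the values
val byReduce = keyValue.reduceByKey(_ + _)

// aggregateByKey: separate functions for adding a value to an accumulator
// and for merging two accumulators across partitions
val byAggregate = keyValue.aggregateByKey(0)((acc, _) => acc + 1, (a, b) => a + b)

byReduce.collect().foreach(println)
byAggregate.collect().foreach(println)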
Question # 4
Problem Scenario 43 : You have been given the following code snippet.
val grouped = sc.parallelize(Seq(((1, "two"), List((3, 4), (5, 6)))))
val flattened = grouped.flatMap { A => groupValues.map { value => B } }
You need to generate the following output, hence replace A and B.
Array((1,two,3,4), (1,two,5,6))
Answer: See the explanation for the step-by-step solution and configuration.
Explanation:
Solution :
A : case (key, groupValues)
B : (key._1, key._2, value._1, value._2)
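Putting A and B back into the snippet gives the following runnable spark-shell sketch (a minimal illustration using the same grouped RDD as in the scenario):

// The scenario's snippet with A and B substituted in.
val grouped = sc.parallelize(Seq(((1, "two"), List((3, 4), (5, 6)))))
val flattened = grouped.flatMap { case (key, groupValues) =>
  // one output tuple per (Int, Int) pair in the grouped list
  groupValues.map { value => (key._1, key._2, value._1, value._2) }
}
flattened.collect()
// Array((1,two,3,4), (1,two,5,6))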
Question # 5
Problem Scenario 44 : You have been given 4 files, with the content as given below:
spark11/file1.txt
Apache Hadoop is an open-source software framework written in Java for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware. All the modules in Hadoop are designed with a fundamental assumption that hardware failures are common and should be automatically handled by the framework.
spark11/file2.txt
The core of Apache Hadoop consists of a storage part known as Hadoop Distributed File System (HDFS) and a processing part called MapReduce. Hadoop splits files into large blocks and distributes them across nodes in a cluster. To process data, Hadoop transfers packaged code for nodes to process in parallel based on the data that needs to be processed.
spark11/file3.txt
This approach takes advantage of data locality, nodes manipulating the data they have access to, to allow the dataset to be processed faster and more efficiently than it would be in a more conventional supercomputer architecture that relies on a parallel file system where computation and data are distributed via high-speed networking.
spark11/file4.txt
Apache Storm is focused on stream processing, or what some call complex event processing. Storm implements a fault-tolerant method for performing a computation or pipelining multiple computations on an event as it flows into a system. One might use Storm to transform unstructured data as it flows into a system into a desired format.
(spark11/file1.txt) (spark11/file2.txt) (spark11/file3.txt) (spark11/file4.txt)
Write a Spark program which will give you the highest-occurring word in each file, along with the file name.
Answer: See the explanation for the step-by-step solution and configuration.
Explanation:
Solution :
Step 1 : Create all 4 files first, using Hue, in HDFS.
Step 2 : Load each file as an RDD.
val file1 = sc.textFile("spark11/file1.txt")
val file2 = sc.textFile("spark11/file2.txt")
val file3 = sc.textFile("spark11/file3.txt")
val file4 = sc.textFile("spark11/file4.txt")
Step 3 : Do the word count for each file and sort it in descending order of count.
val content1 = file1.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _).map(item => item.swap).sortByKey(false).map(e => e.swap)
val content2 = file2.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _).map(item => item.swap).sortByKey(false).map(e => e.swap)
val content3 = file3.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _).map(item => item.swap).sortByKey(false).map(e => e.swap)
val content4 = file4.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _).map(item => item.swap).sortByKey(false).map(e => e.swap)
Step 4 : Create an RDD for each file holding the file name and its highest-occurring word.
val file1word = sc.makeRDD(Array(file1.name + "->" + content1.first._1 + "-" + content1.first._2))
val file2word = sc.makeRDD(Array(file2.name + "->" + content2.first._1 + "-" + content2.first._2))
val file3word = sc.makeRDD(Array(file3.name + "->" + content3.first._1 + "-" + content3.first._2))
val file4word = sc.makeRDD(Array(file4.name + "->" + content4.first._1 + "-" + content4.first._2))
Step 5 : Union all the RDDs.
val unionRDDs = file1word.union(file2word).union(file3word).union(file4word)
Step 6 : Save the results in a text file as below.
unionRDDs.repartition(1).saveAsTextFile("spark11/union.txt")
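As an alternative (my own sketch, not part of the graded answer above), sc.wholeTextFiles pairs each file name with its full content, which avoids repeating the word count four times. It assumes the spark11 directory contains only the four input files.

// Alternative sketch using wholeTextFiles, which yields (fileName, fileContent) pairs.
val topWords = sc.wholeTextFiles("spark11")
  .map { case (fileName, content) =>
    // count the words in this file and keep the highest-occurring one
    val counts = content.split("\\s+").filter(_.nonEmpty).groupBy(identity).mapValues(_.length)
    val (word, count) = counts.maxBy(_._2)
    fileName + "->" + word + "-" + count
  }
topWords.collect().foreach(println)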
Question # 6
Problem Scenario 12 : You have been given the following MySQL database details as well as other info.
user=retail_dba
password=cloudera
database=retail_db
jdbc URL = jdbc:mysql://quickstart:3306/retail_db
Please accomplish the following.
1. Create a table in retail_db with the following definition.
CREATE table departments_new (department_id int(11), department_name varchar(45), created_date TIMESTAMP DEFAULT NOW());
2. Now insert records from the departments table into departments_new.
3. Now import data from the departments_new table to HDFS.
4. Insert the following 5 records into the departments_new table.
Insert into departments_new values(110, "Civil", null);
Insert into departments_new values(111, "Mechanical", null);
Insert into departments_new values(112, "Automobile", null);
Insert into departments_new values(113, "Pharma", null);
Insert into departments_new values(114, "Social Engineering", null);
5. Now do the incremental import based on the created_date column.
Answer: See the explanation for the step-by-step solution and configuration.
Explanation:
Solution :
Step 1 : Login to the MySQL db.
mysql --user=retail_dba --password=cloudera
show databases;
use retail_db;
show tables;
Step 2 : Create the table as given in the problem statement.
CREATE table departments_new (department_id int(11), department_name varchar(45), created_date TIMESTAMP DEFAULT NOW());
show tables;
Step 3 : Insert records from the departments table into departments_new.
insert into departments_new select a.*, null from departments a;
Step 4 : Import data from the departments_new table to HDFS.
sqoop import \
--connect jdbc:mysql://quickstart:3306/retail_db \
--username=retail_dba \
--password=cloudera \
--table departments_new \
--target-dir /user/cloudera/departments_new \
--split-by department_id
Step 5 : Check the imported data.
hdfs dfs -cat /user/cloudera/departments_new/part*
Step 6 : Insert the following 5 records into the departments_new table.
Insert into departments_new values(110, "Civil", null);
Insert into departments_new values(111, "Mechanical", null);
Insert into departments_new values(112, "Automobile", null);
Insert into departments_new values(113, "Pharma", null);
Insert into departments_new values(114, "Social Engineering", null);
commit;
Step 7 : Import the incremental data based on the created_date column.
sqoop import \
--connect jdbc:mysql://quickstart:3306/retail_db \
--username=retail_dba \
--password=cloudera \
--table departments_new \
--target-dir /user/cloudera/departments_new \
--append \
--check-column created_date \
--incremental lastmodified \
--split-by department_id \
--last-value "2016-01-30 12:07:37.0"
Step 8 : Check the imported values.
hdfs dfs -cat /user/cloudera/departments_new/part*
Question # 7
Problem Scenario 89 : You have been given the below patient data in CSV format.
patientID,name,dateOfBirth,lastVisitDate
1001,Ah Teck,1991-12-31,2012-01-20
1002,Kumar,2011-10-29,2012-09-20
1003,Ali,2011-01-30,2012-10-21
Accomplish the following activities.
1. Find all the patients whose lastVisitDate is between the current time and '2012-09-15'.
2. Find all the patients who were born in 2011.
3. Find the age of all the patients.
4. List patients whose last visit was more than 60 days ago.
5. Select patients 18 years old or younger.
Answer: See the explanation for the step-by-step solution and configuration.
Explanation:
Solution :
Step 1 : Put the file into HDFS.
hdfs dfs -mkdir sparksql3
hdfs dfs -put patients.csv sparksql3/
Step 2 : Now in the spark shell:
// SQLContext entry point for working with structured data
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
// this is used to implicitly convert an RDD to a DataFrame
import sqlContext.implicits._
// Import Spark SQL data types and Row
import org.apache.spark.sql._
// load the data into a new RDD
val patients = sc.textFile("sparksql3/patients.csv")
// Return the first element in this RDD
patients.first()
// define the schema using a case class
case class Patient(patientid: Integer, name: String, dateOfBirth: String, lastVisitDate: String)
// create an RDD of Patient objects
val patRDD = patients.map(_.split(",")).map(p => Patient(p(0).toInt, p(1), p(2), p(3)))
patRDD.first()
patRDD.count()
// change the RDD of Patient objects to a DataFrame
val patDF = patRDD.toDF()
// register the DataFrame as a temp table
patDF.registerTempTable("patients")
// Select data from the table
val results = sqlContext.sql("""SELECT * FROM patients""")
// display the DataFrame in a tabular format
results.show()
// Find all the patients whose lastVisitDate is between the current time and '2012-09-15'
val results = sqlContext.sql("""SELECT * FROM patients WHERE TO_DATE(CAST(UNIX_TIMESTAMP(lastVisitDate, 'yyyy-MM-dd') AS TIMESTAMP)) BETWEEN '2012-09-15' AND current_timestamp() ORDER BY lastVisitDate""")
results.show()
// Find all the patients who were born in 2011
val results = sqlContext.sql("""SELECT * FROM patients WHERE YEAR(TO_DATE(CAST(UNIX_TIMESTAMP(dateOfBirth, 'yyyy-MM-dd') AS TIMESTAMP))) = 2011""")
results.show()
// Find the age of all the patients
val results = sqlContext.sql("""SELECT name, dateOfBirth, datediff(current_date(), TO_DATE(CAST(UNIX_TIMESTAMP(dateOfBirth, 'yyyy-MM-dd') AS TIMESTAMP)))/365 AS age FROM patients""")
results.show()
// List patients whose last visit was more than 60 days ago
val results = sqlContext.sql("""SELECT name, lastVisitDate FROM patients WHERE datediff(current_date(), TO_DATE(CAST(UNIX_TIMESTAMP(lastVisitDate, 'yyyy-MM-dd') AS TIMESTAMP))) > 60""")
results.show()
// Select patients 18 years old or younger
// In MySQL this could be written as:
// SELECT * FROM patients WHERE TO_DATE(CAST(UNIX_TIMESTAMP(dateOfBirth, 'yyyy-MM-dd') AS TIMESTAMP)) > DATE_SUB(current_date(), INTERVAL 18 YEAR);
val results = sqlContext.sql("""SELECT * FROM patients WHERE TO_DATE(CAST(UNIX_TIMESTAMP(dateOfBirth, 'yyyy-MM-dd') AS TIMESTAMP)) > DATE_SUB(current_date(), 18*365)""")
results.show()
val results = sqlContext.sql("""SELECT DATE_SUB(current_date(), 18*365) FROM patients""")
results.show()
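For the age calculation, the same result can also be obtained through the DataFrame API. The sketch below is my own illustration, not part of the original answer; it assumes Spark 1.5+ and reuses the patDF DataFrame defined above.

// Sketch only: age in whole years via the DataFrame API, assuming patDF from the steps above.
import org.apache.spark.sql.functions._

val withAge = patDF.withColumn(
  "age",
  floor(datediff(current_date(), to_date(col("dateOfBirth"))) / 365)
)
withAge.select("name", "dateOfBirth", "age").show()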
Cloudera CCA175 Exam Dumps
5 out of 5
Pass your CCA Spark and Hadoop Developer exam on the first attempt with CCA175 exam dumps. Real CCA Spark and Hadoop Developer exam questions, as in the actual exam!
— 96 Questions With Valid Answers
— Last Updated : 16-Dec-2024
— Free CCA175 Updates for 90 Days
— 98% CCA Spark and Hadoop Developer Exam Passing Rate
- Number 1 Cloudera CCA Spark and Hadoop Developer study material online
- Regular CCA175 dumps updates for free.
- CCA Spark and Hadoop Developer Exam practice exam questions with their answers and explanations.
- Our commitment to your success continues through your exam with 24/7 support.
- Free CCA175 exam dumps updates for 90 days
- 97% more cost effective than traditional training
- CCA Spark and Hadoop Developer Exam Practice test to boost your knowledge
- 100% correct CCA Spark and Hadoop Developer questions answers compiled by senior IT professionals
Cloudera CCA175 Braindumps
Realbraindumps.com provides CCA Spark and Hadoop Developer CCA175 braindumps that are accurate, high quality, and verified by a team of experts. The Cloudera CCA175 dumps comprise CCA Spark and Hadoop Developer Exam questions and answers, available as printable PDF files and online practice tests. Our best-recommended and most economical package is the CCA Spark and Hadoop Developer PDF file + test engine discount package, along with 3 months of free updates to the CCA175 exam questions. We have compiled a CCA Spark and Hadoop Developer exam dumps question-and-answer PDF file for you so that you can easily prepare for your exam. Our Cloudera braindumps will help you in the exam. Obtaining valuable professional Cloudera CCA Spark and Hadoop Developer certifications with CCA175 exam questions and answers will always benefit IT professionals by enhancing their knowledge and boosting their careers.
Yes, really, it is not as tough as it used to be. Websites like Realbraindumps.com play a significant role in making it possible, in this competitive world, to pass exams with the help of CCA Spark and Hadoop Developer CCA175 dumps questions. We are here to encourage your ambition and help you in every possible way. Our excellent and incomparable Cloudera CCA Spark and Hadoop Developer Exam questions and answers study material will help you get through your CCA175 certification exam on the first attempt.
Pass your exam with Cloudera CCA Spark and Hadoop Developer dumps. We at Realbraindumps are committed to providing you CCA Spark and Hadoop Developer Exam braindumps questions and answers online. We recommend you prepare from our study material and boost your knowledge. You can also get a discount on our Cloudera CCA175 dumps: just talk with our support representatives and ask for a special discount on CCA Spark and Hadoop Developer exam braindumps. Our latest CCA175 exam dumps contain all Cloudera CCA Spark and Hadoop Developer Exam dumps questions, written to the highest standards of technical accuracy, and can be instantly downloaded and accessed by candidates once purchased. Practicing the online CCA Spark and Hadoop Developer CCA175 braindumps will help you get fully prepared and familiar with the real exam conditions. Free CCA Spark and Hadoop Developer exam braindumps demos are available for your satisfaction before placing a purchase order.
Send us an email if you want to check a Cloudera CCA175 CCA Spark and Hadoop Developer Exam DEMO before your purchase, and our support team will send it to you by email.
If you don't find your dumps here, you can request what you need and we will provide it to you.
Jessica Doe
CCA Spark and Hadoop Developer
We provide Cloudera CCA175 braindumps with practice exam questions and answers. These will help you prepare for your CCA Spark and Hadoop Developer Exam. Buy CCA Spark and Hadoop Developer CCA175 dumps and boost your knowledge.