Question # 1
Problem Scenario 48 : You have been given the below Python code snippet, with intermediate output. We want to take a list of records about people and then sum up their ages and count them. For this example the type in the RDD will be a dictionary in the format {name: NAME, age: AGE, gender: GENDER}. The result type will be a tuple that looks like (Sum of Ages, Count).
people = []
people.append({'name':'Amit', 'age':45, 'gender':'M'})
people.append({'name':'Ganga', 'age':43, 'gender':'F'})
people.append({'name':'John', 'age':28, 'gender':'M'})
people.append({'name':'Lolita', 'age':33, 'gender':'F'})
people.append({'name':'Dont Know', 'age':18, 'gender':'T'})
peopleRdd = sc.parallelize(people)  # Create an RDD
peopleRdd.aggregate((0,0), seqOp, combOp)  # Output of the above line: (167, 5)
Now define two operations, seqOp and combOp, such that seqOp sums the ages of all people and counts them within each partition, and combOp combines the results from all partitions.
Answer: See the explanation for the step-by-step solution and configuration.
Explanation: Solution :
seqOp = (lambda x, y: (x[0] + y['age'], x[1] + 1))
combOp = (lambda x, y: (x[0] + y[0], x[1] + y[1]))
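Since the rest of this guide uses Scala, here is a rough Scala adaptation of the same aggregation (an assumption on my part: the dictionaries are replaced with (name, age, gender) tuples, and note that the Scala aggregate API is curried):
val people = Seq(("Amit", 45, "M"), ("Ganga", 43, "F"), ("John", 28, "M"), ("Lolita", 33, "F"), ("Dont Know", 18, "T"))
val peopleRdd = sc.parallelize(people)
val seqOp = (acc: (Int, Int), p: (String, Int, String)) => (acc._1 + p._2, acc._2 + 1)  // add age, increment count
val combOp = (a: (Int, Int), b: (Int, Int)) => (a._1 + b._1, a._2 + b._2)               // merge partition results
peopleRdd.aggregate((0, 0))(seqOp, combOp)  // expected: (167, 5)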
Question # 2
Problem Scenario 46 : You have been given the below list in Scala, containing (name, sex, cost) for each piece of work done:
List( ("Deeapak" , "male", 4000), ("Deepak" , "male", 2000), ("Deepika" , "female", 2000), ("Deepak" , "female", 2000), ("Deepak" , "male", 1000), ("Neeta" , "female", 2000))
Now write a Spark program to load this list as an RDD and compute the sum of cost for each combination of name and sex (as the key).
Answer: See the explanation for the step-by-step solution and configuration.
Explanation: Solution :
Step 1 : Create an RDD out of this list
val rdd = sc.parallelize(List(("Deeapak", "male", 4000), ("Deepak", "male", 2000), ("Deepika", "female", 2000), ("Deepak", "female", 2000), ("Deepak", "male", 1000), ("Neeta", "female", 2000)))
Step 2 : Convert this RDD into a pair RDD
val byKey = rdd.map({ case (name, sex, cost) => (name, sex) -> cost })
Step 3 : Now group by key
val byKeyGrouped = byKey.groupByKey
Step 4 : Now sum the cost for each group
val result = byKeyGrouped.map { case ((id1, id2), values) => (id1, id2, values.sum) }
Step 5 : Save the results
result.repartition(1).saveAsTextFile("spark12/result.txt")
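For larger datasets, a minimal alternative sketch (same input list) is to sum with reduceByKey instead of groupByKey, so only partial sums are shuffled rather than every individual record:
val rdd = sc.parallelize(List(("Deeapak", "male", 4000), ("Deepak", "male", 2000), ("Deepika", "female", 2000), ("Deepak", "female", 2000), ("Deepak", "male", 1000), ("Neeta", "female", 2000)))
val summed = rdd.map { case (name, sex, cost) => ((name, sex), cost) }.reduceByKey(_ + _)  // sum per (name, sex)
summed.map { case ((name, sex), total) => (name, sex, total) }.collect().foreach(println)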
Question # 3
Problem Scenario 85 : In continuation of the previous question, please accomplish the following activities.
1. Select all the columns from the products table with the output header as below: productID AS ID, code AS Code, name AS Description, price AS 'Unit Price'
2. Select code and name both, separated by ' - ', with the header name 'Product Description'.
3. Select all distinct prices.
4. Select distinct price and name combinations.
5. Select all price data sorted by both code and productID combination.
6. Count the number of products.
7. Count the number of products for each code.
Answer: See the explanation for the step-by-step solution and configuration.
Explanation: Solution :
Step 1 : Select all the columns from the products table with the output header as below: productID AS ID, code AS Code, name AS Description, price AS 'Unit Price'
val results = sqlContext.sql("""SELECT productID AS ID, code AS Code, name AS Description, price AS `Unit Price` FROM products ORDER BY ID""")
results.show()
Step 2 : Select code and name both, separated by ' - ', with the header name 'Product Description'.
val results = sqlContext.sql("""SELECT CONCAT(code, ' - ', name) AS `Product Description`, price FROM products""")
results.show()
Step 3 : Select all distinct prices.
val results = sqlContext.sql("""SELECT DISTINCT price AS `Distinct Price` FROM products""")
results.show()
Step 4 : Select distinct price and name combinations.
val results = sqlContext.sql("""SELECT DISTINCT price, name FROM products""")
results.show()
Step 5 : Select all price data sorted by both code and productID combination.
val results = sqlContext.sql("""SELECT * FROM products ORDER BY code, productID""")
results.show()
Step 6 : Count the number of products.
val results = sqlContext.sql("""SELECT COUNT(*) AS `Count` FROM products""")
results.show()
Step 7 : Count the number of products for each code.
val results = sqlContext.sql("""SELECT code, COUNT(*) FROM products GROUP BY code""")
results.show()
val results = sqlContext.sql("""SELECT code, COUNT(*) AS count FROM products GROUP BY code ORDER BY count DESC""")
results.show()
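The same distinct selection and per-code counts can also be expressed through the DataFrame API; a minimal sketch, assuming the products table from the previous scenario is registered with sqlContext:
import org.apache.spark.sql.functions.desc
val products = sqlContext.table("products")
products.select("price").distinct.show()                        // all distinct prices
products.groupBy("code").count().orderBy(desc("count")).show()  // product count per code, highest first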
Question # 4
Problem Scenario 95 : You have to run your Spark application on YARN, with each executor having a maximum heap size of 512MB and 1 processor core allocated per executor. Your main application requires three values as input arguments: V1 V2 V3. Please replace XXX, YYY, ZZZ:
./bin/spark-submit --class com.hadoopexam.MyTask --master yarn-cluster --num-executors 3 --driver-memory 512m XXX YYY lib/hadoopexam.jar ZZZ
Answer: See the explanation for the step-by-step solution and configuration.
Explanation: Solution :
XXX : --executor-memory 512m
YYY : --executor-cores 1
ZZZ : V1 V2 V3
Notes : spark-submit options on YARN
--archives : Comma-separated list of archives to be extracted into the working directory of each executor. The path must be globally visible inside your cluster; see Advanced Dependency Management.
--executor-cores : Number of processor cores to allocate on each executor. Alternatively, you can use the spark.executor.cores property.
--executor-memory : Maximum heap size to allocate to each executor. Alternatively, you can use the spark.executor.memory property.
--num-executors : Total number of YARN containers to allocate for this application. Alternatively, you can use the spark.executor.instances property.
--queue : YARN queue to submit to. For more information, see Assigning Applications and Queries to Resource Pools. Default: default.
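Putting the replacements back into the command from the scenario, the full submission looks like this:
./bin/spark-submit --class com.hadoopexam.MyTask --master yarn-cluster --num-executors 3 --driver-memory 512m --executor-memory 512m --executor-cores 1 lib/hadoopexam.jar V1 V2 V3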
Question # 5
Problem Scenario 26 : You need to implement a near-real-time solution for collecting information as it is submitted in files with the below content. You have been given the directory location /tmp/nrtcontent (if not available, create it). Assume your department's upstream service is continuously committing data into this directory as new files (not a stream of data, because it is a near-real-time solution). As soon as a file is committed in this directory, it needs to be available in HDFS under /tmp/flume.
Data:
echo "I am preparing for CCA175 from ABCTECH.com" > /tmp/nrtcontent/.he1.txt
mv /tmp/nrtcontent/.he1.txt /tmp/nrtcontent/he1.txt
After a few minutes:
echo "I am preparing for CCA175 from TopTech.com" > /tmp/nrtcontent/.qt1.txt
mv /tmp/nrtcontent/.qt1.txt /tmp/nrtcontent/qt1.txt
Write a Flume configuration file named flume6.conf and use it to load data into HDFS with the following additional properties:
1. Spool /tmp/nrtcontent
2. The file prefix in HDFS should be events
3. The file suffix should be .log
4. If a file is not committed and still in use, it should have _ as a prefix.
5. Data should be written as text to HDFS
Answer: See the explanation for the step-by-step solution and configuration.
Explanation: Solution :
Step 1 : Create the directory
mkdir /tmp/nrtcontent
Step 2 : Create the Flume configuration file, with the below configuration for source, sink and channel, and save it as flume6.conf.
agent1.sources = source1
agent1.sinks = sink1
agent1.channels = channel1
agent1.sources.source1.channels = channel1
agent1.sinks.sink1.channel = channel1
agent1.sources.source1.type = spooldir
agent1.sources.source1.spoolDir = /tmp/nrtcontent
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = /tmp/flume
agent1.sinks.sink1.hdfs.filePrefix = events
agent1.sinks.sink1.hdfs.fileSuffix = .log
agent1.sinks.sink1.hdfs.inUsePrefix = _
agent1.sinks.sink1.hdfs.fileType = DataStream
Step 3 : Run the below command, which will use this configuration file and append data in HDFS. Start the Flume service:
flume-ng agent --conf /home/cloudera/flumeconf --conf-file /home/cloudera/flumeconf/flume6.conf --name agent1
Step 4 : Open another terminal and create a file in /tmp/nrtcontent
echo "I am preparing for CCA175 from ABCTECH.com" > /tmp/nrtcontent/.he1.txt
mv /tmp/nrtcontent/.he1.txt /tmp/nrtcontent/he1.txt
After a few minutes:
echo "I am preparing for CCA175 from TopTech.com" > /tmp/nrtcontent/.qt1.txt
mv /tmp/nrtcontent/.qt1.txt /tmp/nrtcontent/qt1.txt
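Note that the configuration above does not define a channel type, which Flume requires before the agent will start; a minimal sketch of the missing line (a memory channel is an assumption, chosen for simplicity), followed by commands to verify the output in HDFS:
agent1.channels.channel1.type = memory
hdfs dfs -ls /tmp/flume
hdfs dfs -cat /tmp/flume/events*.log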
Question # 6
Problem Scenario 34 : You have been given a file named spark6/user.csv. Data is given below:
user.csv
id,topic,hits
Rahul,scala,120
Nikita,spark,80
Mithun,spark,1
myself,cca175,180
Now write Spark code in Scala which will remove the header part and create an RDD of values as below, for all rows. Also, if the id is "myself" then filter out that row.
Map(id -> Rahul, topic -> scala, hits -> 120)
Answer: See the explanation for the step-by-step solution and configuration.
Explanation: Solution :
Step 1 : Create the file in HDFS (we will do this using Hue). However, you can first create it in the local filesystem and then upload it to HDFS.
Step 2 : Load user.csv from HDFS and create pair RDDs
val csv = sc.textFile("spark6/user.csv")
Step 3 : Split and clean the data
val headerAndRows = csv.map(line => line.split(",").map(_.trim))
Step 4 : Get the header row
val header = headerAndRows.first
Step 5 : Filter out the header (we need to check whether the first value matches the first header name)
val data = headerAndRows.filter(_(0) != header(0))
Step 6 : Zip each row with the header to build the maps (header/value pairs)
val maps = data.map(splits => header.zip(splits).toMap)
Step 7 : Filter out the user "myself"
val result = maps.filter(map => map("id") != "myself")
Step 8 : Save the output as a text file.
result.saveAsTextFile("spark6/result.txt")
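To make Step 6 concrete, here is a minimal sketch of what header.zip(splits).toMap produces for the first data row of user.csv:
val header = Array("id", "topic", "hits")
val row = Array("Rahul", "scala", "120")
header.zip(row).toMap  // Map(id -> Rahul, topic -> scala, hits -> 120)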
Question # 7
Problem Scenario 40 : You have been given sample data as below in a file called spark15/file1.txt
3070811,1963,1096,,"US","CA",,1,
3022811,1963,1096,,"US","CA",,1,56
3033811,1963,1096,,"US","CA",,1,23
Below is the code snippet to process this file:
val field = sc.textFile("spark15/file1.txt")
val mapper = field.map(x => A)
mapper.map(x => x.map(x => {B})).collect
Please fill in A and B so it generates the below final output:
Array(Array(3070811, 1963, 1096, 0, "US", "CA", 0, 1, 0), Array(3022811, 1963, 1096, 0, "US", "CA", 0, 1, 56), Array(3033811, 1963, 1096, 0, "US", "CA", 0, 1, 23))
Answer: See the explanation for the step-by-step solution and configuration.
Explanation: Solution :
A. x.split(",", -1)
B. if (x.isEmpty) 0 else x
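Substituting A and B back into the snippet gives the following (splitting with limit -1 keeps trailing empty fields, so the row ending with a comma still yields nine elements, and every empty field is replaced with 0):
val field = sc.textFile("spark15/file1.txt")
val mapper = field.map(x => x.split(",", -1))
mapper.map(x => x.map(x => if (x.isEmpty) 0 else x)).collect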
Cloudera CCA175 Exam Dumps
Pass your CCA Spark and Hadoop Developer (CCA175) exam on the first attempt with CCA175 exam dumps. Real CCA Spark and Hadoop Developer exam questions, as in the actual exam!
— 96 Questions With Valid Answers
— Update Date : 24-Feb-2025
— Free CCA175 Updates for 90 Days
— 98% CCA Spark and Hadoop Developer Exam Passing Rate
- Number 1 Cloudera CCA Spark and Hadoop Developer study material online
- Regular CCA175 dumps updates for free.
- CCA Spark and Hadoop Developer Exam practice exam questions with their answers and explanations.
- Our commitment to your success continues through your exam with 24/7 support.
- Free CCA175 exam dumps updates for 90 days
- 97% more cost effective than traditional training
- CCA Spark and Hadoop Developer Exam Practice test to boost your knowledge
- 100% correct CCA Spark and Hadoop Developer questions answers compiled by senior IT professionals
Cloudera CCA175 Braindumps
Realbraindumps.com provides CCA Spark and Hadoop Developer CCA175 braindumps which are accurate, high-quality, and verified by a team of experts. The Cloudera CCA175 dumps comprise CCA Spark and Hadoop Developer Exam questions and answers, available in printable PDF files and online practice test formats. Our best-recommended and most economical package is the CCA Spark and Hadoop Developer PDF file + test engine discount package, along with 3 months of free updates of CCA175 exam questions. We have compiled a CCA Spark and Hadoop Developer exam dumps questions-and-answers PDF file for you so that you can easily prepare for your exam. Our Cloudera braindumps will help you in the exam. Obtaining valuable professional Cloudera CCA Spark and Hadoop Developer certifications with CCA175 exam questions and answers will always benefit IT professionals by enhancing their knowledge and boosting their careers.
Yes, really, it is not as tough as before. Websites like Realbraindumps.com play a significant role in this competitive world, making it possible to pass exams with the help of CCA Spark and Hadoop Developer CCA175 dumps questions. We are here to encourage your ambition and help you in all possible ways. Our excellent and incomparable Cloudera CCA Spark and Hadoop Developer Exam questions-and-answers study material will help you get through your CCA175 certification exam on the first attempt.
Pass the exam with Cloudera CCA Spark and Hadoop Developer dumps. We at Realbraindumps are committed to providing you CCA Spark and Hadoop Developer Exam braindumps questions and answers online. We recommend you prepare from our study material and boost your knowledge. You can also get a discount on our Cloudera CCA175 dumps: just talk with our support representatives and ask for a special discount on CCA Spark and Hadoop Developer exam braindumps. We have the latest CCA175 exam dumps, with all Cloudera CCA Spark and Hadoop Developer Exam dumps questions written to the highest standards of technical accuracy, and they can be instantly downloaded and accessed by candidates once purchased. Practicing the online CCA Spark and Hadoop Developer CCA175 braindumps will help you get fully prepared and familiar with real exam conditions. Free CCA Spark and Hadoop Developer exam braindumps demos are available for your satisfaction before you place a purchase order.
Send us an email if you want to check the Cloudera CCA175 CCA Spark and Hadoop Developer Exam demo before your purchase, and our support team will send it to you by email.
If you don't find your dumps here then you can request what you need and we shall provide it to you.
We provide Cloudera CCA175 braindumps with practice exam questions and answers. These will help you prepare for your CCA Spark and Hadoop Developer exam. Buy CCA Spark and Hadoop Developer CCA175 dumps and boost your knowledge.