Hadoop Certification - CCA - Pyspark - Reading and Saving Sequence Files

Related videos

Comments

Hi, thanks for your videos. I am getting the exception below when running these
commands:
Read: dataRDD=sc.sequenceFile("/user/cloudera/pyspark/departmentsSeq", "org.apache.hadoop.io.IntWritable", "org.apache.hadoop.io.Text")

Save: dataRDD.map(lambda x: tuple(x.split(", ", 1))).saveAsNewAPIHadoopFile("/user/cloudera/pyspark/departmentsSequence", "org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat", keyClass="org.apache.hadoop.io.IntWritable", valueClass="org.apache.hadoop.io.Text")

Exception: AttributeError: 'tuple' object has no attribute 'split'

Can you please suggest what the cause might be?
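A likely cause, offered as a sketch rather than a confirmed diagnosis: sc.sequenceFile returns an RDD whose records are already (key, value) tuples. The .split call in the map lambda is therefore applied to a tuple, which is exactly what the AttributeError reports. Splitting a line into a pair only makes sense for sc.textFile input, where each record is a string. The snippet below uses plain Python lists as stand-ins for the two kinds of RDD contents so the functions can run without a cluster. The sample department rows, the function names parse_line and pass_through, and the int() conversion of the key (chosen so it matches keyClass=IntWritable) are all assumptions for illustration.

```python
# Hypothetical sample data standing in for RDD contents (no Spark needed).
# sc.textFile(...) yields one str per line.
text_records = ["2, Fitness", "3, Footwear"]
# sc.sequenceFile(...) with IntWritable/Text yields (int, str) tuples.
seq_records = [(2, "Fitness"), (3, "Footwear")]


def parse_line(line):
    """Split a text line into an (int, str) pair, as the original lambda intended.

    Converting the key with int() is an assumption so it matches keyClass=IntWritable.
    """
    key, value = line.split(", ", 1)
    return (int(key), value)


def pass_through(record):
    """Sequence-file records are already (key, value) pairs; no split is needed."""
    key, value = record
    return (key, value)


parsed_text = [parse_line(r) for r in text_records]
parsed_seq = [pass_through(r) for r in seq_records]

# Reproduce the reported error: the lambda calls .split on a tuple.
error_message = None
try:
    seq_records[0].split(", ", 1)
except AttributeError as err:
    error_message = str(err)

print(parsed_text)
print(parsed_seq)
print(error_message)
```

In PySpark, the same kind of function would be passed to map. For example, sc.textFile(path).map(parse_line) could run before saveAsNewAPIHadoopFile(...). For sequence-file input, dataRDD could likely be saved directly without any split. The ", " delimiter is carried over from the comment and may not match the actual data.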

sanjibdharitd

For any technical discussions or doubts, please use our forum: discuss.itversity.com
To practice on a state-of-the-art big data cluster, please sign up at labs.itversity.com
The lab is under free preview until 12/31/2016; after that, subscription
charges are $14.99 per 31 days, $34.99 per 93 days and $54.99 per 185 days.

itversity

Hello Sir,
Do we also have to load and store Avro data files? If so, please help me find the solution.

"Convert a set of data values in a given format stored in HDFS into new data values and/or a new data format and write them into HDFS." I wonder whether some of the files stored in HDFS could be Avro.

Thank you
Uma

umak

Hadoop Certification - CCA - Introduction

Hadoop Certification - CCA - Data Analysis introduction

Hadoop Certification - CCA - Spark Introduction

Hadoop Certification - CCA - Using Cloudera Quickstart VM (Add on)

Hadoop Certification - CCA - How Combiner Works?

Hadoop Certification - CCA - Using Cloudera Quickstart VM

Hadoop Certification - CCA - Submitting pyspark applications

Hadoop Certification - CCA - Common Issues - Connection Refused

Hadoop Certification - CCA - Word Count Explained

Hadoop Certification - CCA - Submitting scala applications

Hadoop Certification - CCA - Copying data from HDFS

Hadoop Certification - CCA - Conclusion and Best of luck

Should You Do A Spark Or Hadoop Certification?

CCA 175 - Hadoop & Spark Developer Certification | Cloudera CCA 175 Exam | Intellipaat

Hadoop Certification - CCA - Hive Metastore

Hadoop Certification - CCA - 01 Flume Introduction

Hadoop Certification - CCA - Pyspark - Filtering data

Hadoop Certification - CCA - Setup Spark 1.2.1 on Quickstart VM

Hadoop Certification - CCA - Scala - 01 Joining Data Sets

Hadoop Certification - CCA - Copying data into HDFS

Hadoop Certification - CCA - Flume - Ingest real time data into HDFS

Hadoop Certification - CCA - Flume - Using HDFS Sink

Hadoop Certification - CCA - Scala - Reading and Saving Sequence Files

Hadoop Certification - CCA - Pyspark - 03 Aggregating Data by key - Introduction