Celebrate the Big Data Problems – #1

How can we replace special or required delimiters during a Hive import (ingress) from a relational database?

Every day we face many big data problems in production, in PoCs, and elsewhere. Do we have a common repository to collect and share them? No, as far as we know there is none. As always, dataottam looks forward to sharing its learnings with the community, so that others can celebrate the same or similar problems. And if you have a new kind of big data problem, we can debate and experiment on it together to celebrate it.

So we at dataottam have come up with a blog-sharing initiative called “Celebrate the Big Data Problems”. In this series of blogs we will share our big data problems using the CPS (Context, Problem, Solutions) framework.

Context:

Moving data remains a challenge, whether it is a small collection of selfies between apps or a very large data set. Hadoop is one of the big data problem solvers, but transferring data to and from relational databases remained a challenge even after Hadoop arrived. Hence SQL to Hadoop (Sqoop) was created to perform bidirectional data transfer between Hadoop and external structured data sources.

Problem:

How can we replace special or required delimiters during a Hive import (ingress) from a relational database?

Solutions:

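If we import the data with the --hive-import option and then check the record count at the destination, we may find more records than in the source. This happens when string columns contain characters that Hive treats as delimiters, such as newlines, so a single source row is split into several Hive rows.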
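The row-count inflation can be reproduced without Sqoop or Hive. The snippet below is a minimal sketch with made-up data: it assumes a two-row extract whose second row has a newline inside a comment column, uses \001 (Ctrl-A, Hive's default field delimiter) between columns, and counts lines the way a line-oriented text table would. The variable names are hypothetical and only for illustration.

```shell
# Hypothetical two-row extract. Columns are separated by \001 (Ctrl-A).
# The second row's comment column contains an embedded newline.
rows=$(printf '1\001first comment\n2\001second\ncomment\n')

# Number of rows in the (imaginary) source table.
source_rows=2

# A line-oriented text format treats every newline as a row boundary,
# so the embedded newline produces one extra row.
hive_rows=$(printf '%s\n' "$rows" | wc -l | tr -d ' ')

echo "source rows: $source_rows, rows seen after import: $hive_rows"
```

Here the two source rows show up as three rows after the import, which is the mismatch described above.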

So we can instruct Sqoop to clean the data automatically using --hive-drop-import-delims, which removes all \n, \r, and \01 characters from string-based columns. Alternatively, we can replace those characters with a string of our choice using --hive-delims-replacement.
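To make the effect of the two options concrete, here is a rough simulation in plain shell. It is not Sqoop's implementation. It only mimics, with tr, what happens to one string value, assuming the characters involved are \n, \r, and \01 as listed in the Sqoop user guide. The sample value and variable names are invented for the example.

```shell
# One string column value containing a newline, a carriage return, and \001.
value=$(printf 'line one\nline two\r\001end')

# Roughly what --hive-drop-import-delims does: delete those characters.
dropped=$(printf '%s' "$value" | tr -d '\n\r\001')

# Roughly what --hive-delims-replacement ' ' does: swap each one for a space.
replaced=$(printf '%s' "$value" | tr '\n\r\001' '   ')

printf 'dropped:  %s\n' "$dropped"
printf 'replaced: %s\n' "$replaced"
```

Dropping is the simpler choice when the characters carry no meaning. Replacing them with a space (or another marker) keeps word boundaries intact, which may matter for free-text columns.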

sqoop import \
--connect jdbc:mysql://mysql.example.com/sqoop \
--username dataottam \
--password dataottam \

  Continue Reading…
