Celebrate the Big Data Problems – #1

How can we replace or remove special delimiter characters during a Hive import (ingress) from a relational database?

Every day we face big data problems in production, in PoCs, and elsewhere. Do we have a common repository to collect and share them? As far as we know, no. As always, dataottam looks forward to sharing its learnings with the community so that others can celebrate the same or similar problems. And if you have a new kind of big data problem, we can debate and experiment on it together and celebrate that one too.

So we at dataottam have come up with a blog-sharing initiative called “Celebrate the Big Data Problems”. In this series of blogs we will share our big data problems using the CPS (Context, Problem, Solutions) framework.

Context:

Moving data remains a challenge, whether it is a small collection of selfies between apps or a very large data set. Hadoop is one of the big data problem solvers, but transferring data to and from relational databases remained a challenge even after Hadoop arrived. Hence Sqoop (SQL to Hadoop) was created to perform bidirectional data transfer between Hadoop and external structured data sources.

Problem:

How can we replace or remove special delimiter characters during a Hive import (ingress) from a relational database?

Solutions:

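If we use the --hive-import option to import the data and then check the record count at the destination, we may find more records than in the source. This happens when string columns contain characters that Hive treats as delimiters: an embedded newline, for example, splits one source row into two Hive rows.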
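The effect can be reproduced without Sqoop or Hive at all. The following is a minimal sketch, assuming Hive's default text layout (newline as row terminator, \001 as field separator); the function name emit_rows and the sample values are made up for illustration.

```shell
# Write two logical source rows the way a plain text import would.
# \001 (Ctrl-A) separates the id column from the text column.
emit_rows() {
  printf '1\001hello\nworld\n'   # one source row whose text column contains a newline
  printf '2\001plain\n'          # one ordinary source row
}

source_rows=2
# Hive counts rows by line terminators, so count the lines it would see.
hive_rows=$(emit_rows | wc -l | tr -d ' ')

echo "source rows: $source_rows, rows Hive would see: $hive_rows"
```

The embedded newline in the first row is indistinguishable from a row terminator, which is why the destination count comes out higher than the source count.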

So we can instruct Sqoop to clean our data automatically using --hive-drop-import-delims, which removes all \n, \r, and \01 characters from string-based columns. Alternatively, we can replace those characters with a string of our choice using --hive-delims-replacement.
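To see what the two options mean for a single column value, here is a rough imitation using tr. It is not Sqoop itself, only a sketch of the described behavior; the variable names and the sample value are assumptions, and a single space is assumed as the replacement string.

```shell
# A string column value as it might sit in the source database,
# containing a newline, a carriage return and a \001 character.
field=$(printf 'line one\nline two\r\001tail')

# Roughly what --hive-drop-import-delims does: delete \n, \r and \001.
dropped=$(printf '%s' "$field" | tr -d '\n\r\001')

# Roughly what --hive-delims-replacement ' ' does: swap each of those
# characters for the replacement (one space per character here).
replaced=$(printf '%s' "$field" | tr '\n\r\001' '   ')

printf 'dropped:  %s\n' "$dropped"
printf 'replaced: %s\n' "$replaced"
```

Dropping glues neighbouring words together ("oneline"), while replacing keeps them apart, so the replacement option may be the better choice when the text still has to be readable afterwards.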

sqoop import \
  --connect jdbc:mysql://mysql.example.com/sqoop \
  --username dataottam \
  --password dataottam \

  Continue Reading…
