What is the hype about the buzzword “Data Science”?
Data science turns data into real-world action. It draws on machine learning, database technologies, statistics, programming, and domain-specific technologies. It’s about collecting, analyzing, modeling, and communicating the data around us in a productive way. Data science includes experimentation, exploratory analysis, analytics, business intelligence, and more, and the most popular parts are machine learning and AI. Machine learning and AI dominate because they are the trend on which new companies and startups want to build their structure or base. Big companies like Google and Facebook are already well established, so they focus on improving every bit of their products and services. Being a data scientist is not about how advanced your model is but about how much impact you create with your work. You don’t only collect data; you analyze it and strategize to solve problems or to help the product reach further. This is why so many product-based companies are hiring data scientists: with all the resources now available, you can actually make a drastic change in the industry.
How to Become One?
In today’s world, the internet is a chaotic place. There are plenty of resources to dive into. Picking the best course and sticking to it is the most difficult task for anyone. I have wasted a lot of my time switching between different courses, only to end up feeling lost. I guess nowadays this happens with every skill you want to learn. So the difficult part is not learning a skill but sticking to one well-planned and organized learning path.
Where do we start?
Programming :
It goes without saying that programming is the fundamental skill, not only for data scientists but for any job related, but not limited, to the tech world. The most important question is which language to work with. Some people will argue that R is very good at mathematical modeling. I wouldn’t disagree, but data science continues to be about much, MUCH more than math and statistics. Python will give you more mileage across a broad spectrum of work. With Python, you get a lot more return on your learning investment and can do a larger range of tasks, like data wrangling and setting up web services. Another reason I would suggest Python is that it is easy to learn and implement. You can automate a lot of tasks and do some cool things with it.
For learning the syntax and getting comfortable with it, you can refer to the Python Docs. Some books: “Learn Python 3 The Hard Way” and “Automate The Boring Stuff With Python” (these books will make you a Python god if you finish them). I’m not really a fan of reading books, though, and prefer to learn in more interactive ways, like watching lectures or tutorials. These are some links for learning Python :
Get familiar with NumPy, Pandas, and Matplotlib. Learn how to load, manipulate, and visualize data. Mastery of these libraries will be crucial to your personal projects. The only way you will learn these libraries is by using them. Don’t feel like you have to memorize every method or function name; that comes with practice. If you forget, Google it.
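To make the load, manipulate, visualize loop concrete, here is a minimal sketch. The CSV contents, the column names (city, month, sales), and the choice of a median fill are all made up for illustration; with a real file you would pass its path to pd.read_csv instead.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Load: a tiny invented CSV stands in for a real file on disk.
csv_text = """city,month,sales
Delhi,1,120
Delhi,2,135
Mumbai,1,200
Mumbai,2,
Pune,1,80
Pune,2,95
"""
df = pd.read_csv(io.StringIO(csv_text))

# Manipulate: fill the one missing value with the column median,
# then add a derived column computed with NumPy.
df["sales"] = df["sales"].fillna(df["sales"].median())
df["log_sales"] = np.log(df["sales"])

# Summarize: total sales per city.
totals = df.groupby("city")["sales"].sum()

# Visualize: bar chart of the totals, rendered to an in-memory PNG.
fig, ax = plt.subplots()
totals.plot(kind="bar", ax=ax)
ax.set_xlabel("city")
ax.set_ylabel("total sales")
png_buffer = io.BytesIO()
fig.savefig(png_buffer, format="png")
plt.close(fig)

print(totals)
```

In a notebook you would usually drop the Agg line and the buffer and just call plt.show().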
Quick Tip : Python is a pretty amazing language, and it can be overwhelming because you can do almost anything with it. My advice is not to spend too much time exploring Python itself; you only need to get familiar with its syntax. Data science goes well beyond that.
Mathematics :
A mathematical foundation is the key skill every data scientist must have. You won’t be able to understand even a bit of the problem without it. Statistics, probability, and linear algebra are prerequisites for all machine learning and data analysis work. Luckily, Indians are assumed to be very good at mathematics (I’m the exception). If you already have a solid understanding, spend a week or two brushing up on key concepts. Focus especially hard on descriptive statistics. Being able to understand a data set is a skill worth its weight in gold. Some of the useful links are :
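As a rough illustration of what descriptive statistics look like in practice, the sketch below summarizes a small sample with Python’s built-in statistics module. The numbers are invented for the example; the point is only to show how a single outlier pulls the mean away from the median.

```python
import statistics

# Hypothetical sample: daily order counts for one week (invented values).
orders = [12, 15, 11, 15, 40, 13, 14]

mean = statistics.mean(orders)      # sensitive to the outlier (40)
median = statistics.median(orders)  # robust to the outlier
mode = statistics.mode(orders)      # most frequent value
stdev = statistics.stdev(orders)    # sample standard deviation

# Quartiles split the sorted data into four equal-sized groups.
q1, q2, q3 = statistics.quantiles(orders, n=4)

print(f"mean={mean:.2f} median={median} mode={mode} stdev={stdev:.2f}")
print(f"quartiles: {q1}, {q2}, {q3}")
```

When the mean sits well above the median, as it does here, that is usually a hint to look for outliers or a skewed distribution before modeling anything.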
Machine Learning :
Now that you have learned the basic math of data science, you’ll be able to understand various machine learning algorithms in depth. You need these algorithms to derive insights and correlations and to classify and cluster data. They will help you understand how a real-world problem can be solved and how the data can be analyzed and manipulated accordingly. These algorithms are like weapons for tackling massive problems with humongous data. Although there can be multiple approaches to a problem, after working on a few projects you’ll have a pretty good idea of which algorithm to use in which case. Some useful tutorials are :
- Learning From Data (Introductory Machine Learning)
- Machine Learning (by Andrew Ng; one of the best courses, recommended by everyone)
Some example data sets to play around with: Machine Learning Repository
The Scikit-learn documentation has excellent tutorials on the application of common algorithms.
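For a first taste of what such tutorials cover, here is a minimal classification sketch. It assumes the Iris data set that ships with scikit-learn and a logistic regression model; both are just convenient picks for illustration, not the only or best choice for a given problem.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load a small, bundled data set: 150 flowers, 4 measurements each, 3 species.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the data so the model is scored on examples it has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit a simple classifier; max_iter is raised so the solver converges on unscaled features.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Evaluate on the held-out data.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"test accuracy: {accuracy:.2f}")
```

Swapping in another estimator is mostly a one-line change, which makes this a handy template for comparing algorithms on the same split.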
After getting familiar with various machine learning algorithms, you can move on to some deep learning techniques. Deep learning is concerned with algorithms inspired by the structure and function of the brain, called artificial neural networks. You’ll find it more interesting the deeper you go.
Database :
Learning how to work with databases is necessary, as the data you’ll be working with will eventually reside there. Database manipulation is therefore a required skill for any data scientist working in the industry. You give yourself an extra edge if you also learn how to set up and manage these databases on a cloud server. Various courses for this are :
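Here is a small sketch of basic database manipulation, using Python’s built-in sqlite3 module with an in-memory database. The users table and its rows are hypothetical; a database on a cloud server would need a different driver and connection settings, but the SQL itself looks much the same.

```python
import sqlite3

# An in-memory database: nothing is written to disk, handy for experiments.
conn = sqlite3.connect(":memory:")

# Create a hypothetical table and insert a few invented rows.
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
)
conn.executemany(
    "INSERT INTO users (name, country) VALUES (?, ?)",  # ? placeholders avoid SQL injection
    [("Asha", "IN"), ("Ben", "US"), ("Chitra", "IN")],
)
conn.commit()

# Query: count users per country.
rows = conn.execute(
    "SELECT country, COUNT(*) FROM users GROUP BY country ORDER BY country"
).fetchall()
conn.close()

for country, count in rows:
    print(country, count)
```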
Various tools for Big Data :
Last Advice :
Learning online is not an easy task, and with this massive number of resources it is very easy to get side-tracked. Data science is a different path from the other fields of computer science, and at some point you’ll be frustrated that you have spent all this time learning and have nothing to show for it. But don’t be disheartened. Data science is like a marathon, and you’ll eventually have to invest a lot of time in it, leaving other fields aside. You will feel like friends and colleagues learning other things are far ahead of you. But after all the self-driven education and regular, organized time investment, you’ll eventually be at another level than any of your competitors. It won’t be easy; to drive your own education you will need perseverance and discipline. With those, you can break into the data science industry no matter what your situation is.
Don’t settle for just learning a concept and then moving to the next thing. The process of learning doesn’t stop until you can apply a concept to the real world. Work on some projects to put these things into practice. Kaggle has plenty of resources and competitions where you can apply your knowledge to real-world projects and learn from them. You’ll feel much more motivated after solving these problems, and who knows, you might win some cash prizes too!
The applications of data science are endless. It is a lifelong learning process, so enjoy your educational and adventurous journey to make some impact in the real world and make this world a better place. I wish you all the best for your wonderful future as a data scientist.
Want to connect? Github | LinkedIn | Twitter | Facebook | Quora