
WHAT IS DATA SCIENCE?

Data Science is the science of -
  •  Collecting data 
  •  Storing data
  •  Processing data
  •  Describing data
  •  Modelling data

A data scientist takes raw data, whether from daily users or survey statistics, and uncovers hidden insights that help companies make smarter business decisions.



COLLECTING DATA -


Data collection is the process of gathering and measuring information on targeted variables in an established system, which then enables one to answer relevant questions and evaluate outcomes.

Where does data come from?

Traditional data may come from basic customer records or historical stock price information.

Now a steadily growing number of companies and industries use and generate big data. Consider online communities such as Facebook, Google and LinkedIn, or financial trading data. Temperature-measuring grids in various geographical locations, as well as machine data from sensors in industrial equipment, also count as data. And, of course, so does data from wearable tech such as calorie counters and heart-rate monitors.

Note: Data is the foundation of data science; it is the material on which all the analyses are based.


STORING DATA -

Data storage in a data science process refers to storing the useful data from which you will later dig out actionable insights.

How is the data stored?

Small or traditional data is structured and usually stored in databases that we manage ourselves, so we have full control over it.
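As a rough illustration of small, structured data kept in a database we control, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names and sample rows are made up for the example and are not from any real system.

```python
import sqlite3

# An in-memory database keeps the example self-contained; a file path
# could be passed instead to persist the data on disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table of customer records (structured: fixed, typed columns).
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT)"
)

# Insert a couple of invented sample rows using parameter placeholders.
conn.executemany(
    "INSERT INTO customers (name, city) VALUES (?, ?)",
    [("Asha", "Pune"), ("Ben", "Leeds")],
)
conn.commit()

# Read the rows back in insertion order.
rows = conn.execute("SELECT name, city FROM customers ORDER BY id").fetchall()
print(rows)

conn.close()
```

Because the schema is fixed in advance, every row has the same shape, which is what makes this kind of data easy to query and control.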

Big data normally needs to be stored across different servers and comes from multiple sources, some of which generate huge amounts of data continuously. It is normally unstructured and contains a lot of noise.

PROCESSING DATA -


Data processing is the conversion of data into a usable and desired form.

Data processing includes -
  •  Data wrangling or data munging
  •  Data cleaning
  •  Data scaling, normalising and standardising
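To make the cleaning and scaling steps above more concrete, here is a minimal sketch in plain Python. It assumes "normalising" means min-max rescaling to the range 0 to 1 and "standardising" means converting to z-scores (mean 0, standard deviation 1), which are common readings of those terms. The function names and sample readings are invented for illustration.

```python
from statistics import mean, pstdev

def min_max_normalise(values):
    """Rescale values linearly so the smallest maps to 0.0 and the largest to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: there is no spread to rescale, so return zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def standardise(values):
    """Shift and scale values to mean 0 and population standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Hypothetical raw readings with one missing entry (None).
raw = [10.0, None, 20.0, 30.0, 40.0]

# Data cleaning (simplest possible form): drop the missing values.
cleaned = [v for v in raw if v is not None]

normalised = min_max_normalise(cleaned)
standardised = standardise(cleaned)
print(normalised)
print(standardised)
```

Dropping missing values is only one cleaning strategy; filling them in with an estimate is another, and which is appropriate depends on the data.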

Note: When working with big data, standardising a very large amount of data requires distributed processing. Software such as Hadoop allows us to do so.
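The idea behind distributed standardisation can be sketched in plain Python, without Hadoop itself. Each chunk of data (imagined here as living on a separate server) is reduced to a small summary, and the summaries are then combined to get the global mean and standard deviation. This is only an illustration of the map-and-reduce pattern under that assumption, not Hadoop's API; the function names and chunks are hypothetical.

```python
import math

def summarise_chunk(chunk):
    """'Map' step: reduce one chunk to (count, sum, sum of squares)."""
    return (len(chunk), sum(chunk), sum(v * v for v in chunk))

def combine(a, b):
    """'Reduce' step: merge two partial summaries by adding them component-wise."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])

def global_mean_std(chunks):
    """Compute the overall mean and population standard deviation from chunk summaries."""
    total = (0, 0.0, 0.0)
    for chunk in chunks:
        total = combine(total, summarise_chunk(chunk))
    n, s, ss = total
    mu = s / n
    # Var = E[x^2] - (E[x])^2; clamp at 0 to guard against rounding error.
    variance = max(ss / n - mu * mu, 0.0)
    return mu, math.sqrt(variance)

# Hypothetical data split across three "servers".
chunks = [[10.0, 20.0], [30.0], [40.0]]
mu, sigma = global_mean_std(chunks)

# Each chunk can now be standardised independently using the shared mu and sigma.
standardised_chunks = [[(v - mu) / sigma for v in chunk] for chunk in chunks]
print(mu, sigma)
```

Only the three-number summaries need to travel between machines, never the raw data. The sum-of-squares formula is chosen here for brevity; it can lose precision on very large values, where a more numerically stable method would be preferable.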


DESCRIBING DATA -


Describing and graphing data results in better analysis and presentation.

Data can be summarized numerically by presenting it in tables and calculating statistics for central tendency, variability, and distribution.
Data can also be displayed graphically, using line graphs, bar graphs, histograms, and frequency polygons.
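As a small illustration of numerical and graphical description, the sketch below uses Python's standard statistics module to compute measures of central tendency and variability, and prints a crude text histogram from a frequency table. The sample scores are invented for the example.

```python
from collections import Counter
from statistics import mean, median, mode, pstdev

# Hypothetical sample: scores from eight survey responses.
scores = [2, 3, 3, 4, 5, 5, 5, 7]

summary = {
    "mean": mean(scores),                  # central tendency
    "median": median(scores),              # central tendency
    "mode": mode(scores),                  # most frequent value
    "range": max(scores) - min(scores),    # variability
    "std_dev": pstdev(scores),             # variability (population std. deviation)
}
print(summary)

# Frequency table, shown as a simple text histogram (one '#' per occurrence).
frequency = Counter(scores)
for value in sorted(frequency):
    print(f"{value:>2} | {'#' * frequency[value]}")
```

In practice a plotting library would usually draw the histogram, but the frequency table underneath it is the same.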


MODELLING DATA -


A data model determines how data is exposed to the end user. The role of data modelling is to create and structure database tables so that they answer business questions, setting the stage for the best possible data analysis by exposing the end user to the most relevant data.

Statistical modelling is generally used for simple and intuitive models, whereas algorithmic modelling is generally used for complex and flexible models.
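As one example of a simple and intuitive statistical model, the sketch below fits a straight line to a handful of points using ordinary least squares. The choice of linear regression, the function name and the data are assumptions made for illustration; the post does not prescribe a particular model.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data lying exactly on the line y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)

# Use the fitted model to predict y for a new x.
prediction = slope * 5.0 + intercept
print(prediction)
```

A model like this has only two parameters, each with a direct interpretation, which is what makes it easy to reason about compared with more flexible algorithmic models.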
