Data Science is the science of -
- Collecting data
- Storing data
- Processing data
- Describing data
- Modelling data
A data scientist takes raw data, whether from daily users or surveyed statistics, and uncovers hidden insights that help companies make smarter business decisions.
COLLECTING DATA -
Data collection is the process of gathering and measuring information on targeted variables in an established system, which then enables one to answer relevant questions and evaluate outcomes.
Where does data come from?
Traditional data may come from basic customer records or historical stock price information. A steadily growing number of companies and industries now use and generate big data. Consider social media communities such as Facebook, Google and LinkedIn, or financial trading data. Temperature measuring grids in various geographical locations, as well as machine data from sensors in industrial equipment, also count as data sources. And, of course, so does wearable tech such as calorie counters and heart-rate monitors.
Note- Data is the foundation of data science; it is the material on which all the analyses are based.
STORING DATA -
Data storing in a data science process refers to storing the useful data that you may later use to dig actionable insights out of it.
How is the data stored?
Small or traditional data is structured and stored in databases that you usually manage yourself, so you have full control over it.
Big data normally needs to be stored across different servers and comes from multiple sources, some of which continuously generate huge volumes of data. It is normally unstructured and contains a lot of noise.
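For the small, structured case, storage in a database can be sketched in a few lines. This is only an illustration using Python's built-in sqlite3 module; the table name, column names and customer records below are made up for the example and are not from any real system.

```python
import sqlite3

# Hypothetical customer records; names and cities are illustrative only.
customers = [
    (1, "Asha", "Pune"),
    (2, "Ben", "Leeds"),
    (3, "Chen", "Shanghai"),
]

# An in-memory database keeps the sketch self-contained.
# Passing a file path instead would persist the data to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", customers)
conn.commit()

# Because the data is structured, we can ask precise questions of it.
rows = conn.execute(
    "SELECT name FROM customers WHERE city = ?", ("Leeds",)
).fetchall()
print(rows)

conn.close()
```

Big data would not fit this pattern: it is spread over many servers and is often unstructured, so a single local database like this one is only suitable for the small-data case.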
PROCESSING DATA -
Data processing is the conversion of data into usable and desired form.
Data processing includes -
- Data wrangling or data munging
- Data cleaning
- Data scaling, normalising and standardising
Note- When working on big data, standardising a very large amount of data requires distributed processing. Software such as Hadoop allows us to do so.
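To make the scaling step concrete, here is a minimal sketch of normalising and standardising a small list of numbers on a single machine. It assumes "normalising" means min-max rescaling to the range 0 to 1 and "standardising" means shifting to mean 0 and scaling to unit standard deviation; the function names and the sample heights are made up for the example.

```python
from statistics import mean, pstdev


def min_max_normalise(values):
    """Rescale values linearly so the smallest becomes 0 and the largest 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardise(values):
    """Shift values to mean 0 and scale to unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


heights_cm = [150, 160, 170, 180, 190]  # made-up sample values
normalised = min_max_normalise(heights_cm)
standardised = standardise(heights_cm)
print(normalised)
print(standardised)
```

On big data the same arithmetic applies, but the minimum, maximum, mean and standard deviation have to be computed across many machines first, which is where distributed processing comes in.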
DESCRIBING DATA -
Describing and graphing the data results in better analysis and presentation.
Data can be summarized numerically by presenting it in tables and calculating statistics for central tendency, variability, and distribution.
Data can also be displayed graphically, using line graphs, bar graphs, histograms, and frequency polygons.
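As a small illustration of numerical and graphical description, the sketch below computes a few common summary statistics and prints a rough text histogram. It uses only Python's standard library; the scores are made-up sample values, and the choice of statistics is just one reasonable selection.

```python
from collections import Counter
from statistics import mean, median, mode, pstdev

scores = [4, 7, 7, 8, 5, 7, 9, 4, 6, 8]  # made-up sample values

# Central tendency (mean, median, mode) and variability (range, std dev).
summary = {
    "mean": mean(scores),
    "median": median(scores),
    "mode": mode(scores),
    "range": max(scores) - min(scores),
    "std_dev": pstdev(scores),
}
for name, value in summary.items():
    print(f"{name}: {value}")

# Distribution: count how often each value occurs and draw one '#' per occurrence.
frequency = Counter(scores)
for value in sorted(frequency):
    print(f"{value}: {'#' * frequency[value]}")
```

A plotting library would normally be used for proper line graphs or histograms; the text version here only shows the idea of a frequency distribution.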
MODELLING DATA -
A data model determines how data is exposed to the end user. The role of data modelling is to create and structure database tables optimally so that they answer business questions, setting the stage for the best possible data analysis by exposing the end user to the most relevant data.
Statistical modelling is generally used for simple and intuitive models, whereas algorithmic modelling is generally used for complex and flexible models.
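One example of a simple, intuitive statistical model is a straight line fitted by ordinary least squares. The sketch below is an illustration only: the function name fit_line and the ad-spend and sales figures are made up, and the data is chosen to lie exactly on a line so the result is easy to check by hand.

```python
from statistics import mean


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


ad_spend = [1, 2, 3, 4, 5]   # made-up inputs
sales = [3, 5, 7, 9, 11]     # made-up outputs, exactly 2 * x + 1

slope, intercept = fit_line(ad_spend, sales)
prediction = slope * 6 + intercept  # predicted sales for an ad spend of 6
print(slope, intercept, prediction)
```

The fitted slope and intercept can be read directly as "how much the output changes per unit of input" and "the output when the input is zero", which is what makes this kind of model intuitive. Algorithmic models trade that readability for flexibility.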