Understanding data is the first and most important step in data science.
Before analysis, modeling, or machine learning, a data scientist must know what type of data they are working with and how it was collected.
In Part 2 of our Statistics for Data Science series, you’ll learn:
- Different types of data used in data science
- How data is collected in real-world projects
- Sampling methods and their importance
- Hands-on examples to classify datasets
🎯 Goal of This Post
Understand your data before analyzing it.
Incorrect data understanding leads to:
- Wrong statistical methods
- Poor model performance
- Misleading insights
📌 Types of Data in Data Science
Data can be classified in multiple ways depending on its nature and usage.
🔹 Qualitative vs Quantitative Data
📘 Qualitative Data (Categorical Data)
Qualitative data describes qualities or characteristics and is non-numeric.
Examples:
- Gender (Male/Female)
- Product category
- Customer feedback (Good, Bad, Average)
- City names
📌 Used for:
- Classification
- Sentiment analysis
- Grouping and segmentation
📗 Quantitative Data (Numerical Data)
Quantitative data represents numbers and measurable values.
Examples:
- Age
- Salary
- Temperature
- Number of purchases
📌 Used for:
- Statistical calculations
- Regression models
- Forecasting
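As a rough illustration of the split above, here is a minimal sketch that labels a column by inspecting its Python values. The function name `data_kind` and the sample values are hypothetical, and the rule is only a heuristic: numeric codes such as postal codes would be labeled quantitative even though they behave like categories.

```python
def data_kind(values):
    """Return 'quantitative' if every value is an int or float, else 'qualitative'."""
    # bool is a subclass of int in Python, so it is excluded explicitly.
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    return "quantitative" if numeric else "qualitative"


# Made-up sample columns, for illustration only.
ages = [23, 35, 41, 29]
feedback = ["Good", "Bad", "Average"]

print(data_kind(ages))      # quantitative
print(data_kind(feedback))  # qualitative
```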
🔹 Discrete vs Continuous Data
📘 Discrete Data
Discrete data consists of countable values.
Examples:
- Number of customers
- Number of defects
- Number of website visits
📌 Values are whole numbers.
📗 Continuous Data
Continuous data can take any value within a range.
Examples:
- Height
- Weight
- Time
- Temperature
📌 Can have decimal values.
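One simple way to tell the two apart in practice is to check whether every value is a whole number. The sketch below assumes that rule; `numeric_kind` and the sample values are hypothetical. Keep in mind that a continuous measurement that was rounded before storage (for example, height in whole centimeters) would be labeled discrete by this check.

```python
def numeric_kind(values):
    """Label numeric values 'discrete' if all are whole numbers, else 'continuous'."""
    whole = all(float(v).is_integer() for v in values)
    return "discrete" if whole else "continuous"


# Made-up sample values, for illustration only.
defects = [3, 0, 7, 2]              # counts
heights_cm = [172.4, 165.0, 180.2]  # measurements

print(numeric_kind(defects))     # discrete
print(numeric_kind(heights_cm))  # continuous
```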
🔹 Structured vs Unstructured Data
📘 Structured Data
Structured data is organized in rows and columns.
Examples:
- Excel files
- SQL tables
- CSV datasets
📌 Easy to analyze using SQL, Excel, Python, or BI tools.
📗 Unstructured Data
Unstructured data has no predefined format.
Examples:
- Text documents
- Emails
- Images
- Videos
- Social media posts
📌 Requires advanced processing (NLP, Computer Vision).
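To make the contrast concrete, the sketch below reads a tiny CSV and a short piece of free text using only the Python standard library. The column names, amounts, and review text are invented for the example. The structured data can be summed right after parsing, while the text needs a tokenization step before anything can be counted.

```python
import csv
import io
import re
from collections import Counter

# Structured: rows and named columns, ready for calculation after parsing.
csv_text = "order_id,amount\n1,250\n2,120\n3,400\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
total = sum(int(row["amount"]) for row in rows)
print(total)  # 770

# Unstructured: free text has to be broken into tokens first.
review = "Great phone, great battery. Delivery was slow."
words = re.findall(r"[a-z']+", review.lower())
top_word = Counter(words).most_common(1)
print(top_word)  # [('great', 2)]
```

Real NLP pipelines do far more than split words, but even this small step shows the extra processing unstructured data needs.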
📌 Data Collection Methods in Data Science
Understanding how data is collected helps assess data quality and bias.
🔹 Common Data Collection Techniques
1️⃣ Surveys & Questionnaires
- Online forms
- Feedback surveys
- Market research
📌 Risk: Response bias
2️⃣ Observational Data
- Website click tracking
- User behavior logs
- Sensor data
📌 Often real-time, but not automatically unbiased (watch for selection bias and confounding)
3️⃣ Experiments (A/B Testing)
- Marketing experiments
- Product feature testing
📌 Controlled: random assignment helps isolate cause and effect
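The key ingredient of an A/B test is random assignment of users to groups. Here is a minimal sketch, assuming users are identified by integer IDs. The function `assign_groups` and the fixed seed are hypothetical choices made for reproducibility; this is not a full experimentation framework.

```python
import random


def assign_groups(user_ids, seed=42):
    """Shuffle the users and split them into two equally sized groups."""
    rng = random.Random(seed)  # fixed seed so the split can be reproduced
    shuffled = list(user_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"A": shuffled[:half], "B": shuffled[half:]}


groups = assign_groups(range(1, 11))
print(len(groups["A"]), len(groups["B"]))  # 5 5
```

Because the split is random, differences between the groups that are unrelated to the tested change tend to even out as the number of users grows.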
4️⃣ Transactional Data
- Sales records
- Banking transactions
- E-commerce logs
📌 Highly structured and reliable
5️⃣ Third-Party Data
- Government datasets
- APIs
- External vendors
📌 Verify credibility and freshness
📌 Sampling Methods in Statistics
Sampling allows us to study a subset of data instead of the entire population.
🔹 Types of Sampling Methods
📘 Random Sampling
- Every unit has an equal chance of being selected
- Reduces bias
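A simple random sample can be drawn with Python's standard `random` module. The sketch below assumes a population of 100 numbered units and a sample size of 10; both numbers are arbitrary choices for the example.

```python
import random

population = list(range(1, 101))  # 100 hypothetical unit IDs
rng = random.Random(0)            # fixed seed for reproducibility

# sample() draws without replacement, so no unit is picked twice.
sample = rng.sample(population, k=10)
print(sample)
```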
📘 Stratified Sampling
- Population divided into groups (strata)
- Sample taken from each group
📌 Used in surveys and finance
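Stratified sampling can be sketched as "group first, then sample within each group". The example below assumes proportional allocation, where each stratum contributes the same fraction of its members. The function `stratified_sample`, the `segment` field, and the 80/20 customer split are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(records, key, fraction, seed=0):
    """Draw the same fraction from every stratum defined by records[key]."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[record[key]].append(record)

    sample = []
    for group in strata.values():
        # Take at least one unit so that small strata are still represented.
        k = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, k))
    return sample


# Made-up population: 80 retail customers and 20 corporate customers.
customers = [
    {"id": i, "segment": "retail" if i < 80 else "corporate"} for i in range(100)
]
picked = stratified_sample(customers, key="segment", fraction=0.1)
print(len(picked))  # 10
```

With a 10% fraction, the sample keeps the 80/20 balance of the population (8 retail, 2 corporate), which a small simple random sample would not guarantee.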
📘 Systematic Sampling
- Every nth observation selected
📌 Simple and efficient
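In code, systematic sampling is a slice with a fixed step. The sketch below assumes a random starting point inside the first interval, which is a common convention; `systematic_sample` and the step of 10 are hypothetical choices.

```python
import random


def systematic_sample(items, step, seed=0):
    """Pick every step-th item, starting at a random offset below step."""
    start = random.Random(seed).randrange(step)
    return items[start::step]


observations = list(range(100))  # 100 hypothetical observations in order
chosen = systematic_sample(observations, step=10)
print(chosen)
```

One caveat: if the data has a repeating pattern with the same period as the step (for example, daily records sampled every 7th row), the sample can end up biased.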
📘 Convenience Sampling
- Easily available data
📌 Risk: High bias
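The bias risk is easy to see with a toy example. The sketch below uses synthetic values that increase with position and treats "the first 10 rows" as the conveniently available data. The numbers are invented purely to show the effect.

```python
from statistics import mean

# Synthetic population whose values trend upward with position.
population = list(range(1, 101))

# Convenience sample: whatever is easiest to grab, here the first 10 rows.
convenience = population[:10]

print(mean(population))   # 50.5
print(mean(convenience))  # 5.5
```

The sample mean is far from the population mean because the easily available rows are not representative of the whole.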
📌 Why Sampling Matters in Data Science
- Saves time and cost
- Makes large datasets manageable
- Enables faster experimentation
- Supports inferential statistics
🧪 Hands-On: Classify Sample Datasets
Let’s classify real-world datasets.
| Dataset | Qualitative / Quantitative | Discrete / Continuous | Structured / Unstructured |
|---|---|---|---|
| Customer Gender | Qualitative | N/A | Structured |
| Monthly Salary | Quantitative | Continuous | Structured |
| Product Reviews | Qualitative | N/A | Unstructured |
| Number of Orders | Quantitative | Discrete | Structured |
| Website Session Time | Quantitative | Continuous | Structured |
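The first two columns of the table can be approximated with a small heuristic. The sketch below is one possible approach, not a definitive classifier: `classify` is a hypothetical helper, the sample values are made up, and the rule (numeric means quantitative, whole numbers mean discrete) will mislabel numeric codes and rounded measurements.

```python
def classify(values):
    """Return (qualitative/quantitative, discrete/continuous) for a column."""
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    if not numeric:
        # The discrete/continuous split only applies to numeric data.
        return ("Qualitative", "N/A")
    whole = all(float(v).is_integer() for v in values)
    return ("Quantitative", "Discrete" if whole else "Continuous")


# Made-up sample values for four of the datasets in the table.
samples = {
    "Monthly Salary": [42000.50, 58500.75, 61000.25],
    "Product Reviews": ["Great phone", "Battery drains fast"],
    "Number of Orders": [3, 0, 7],
    "Website Session Time": [12.4, 3.75, 48.0],
}

results = {name: classify(values) for name, values in samples.items()}
for name, labels in results.items():
    print(name, labels)
```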
🧠 Key Takeaways
✔ Always identify data type before analysis
✔ Choose statistical methods based on data nature
✔ Understand data collection to avoid bias
✔ Sampling impacts accuracy and conclusions
🔗 What’s Next in This Series?
👉 Part 3: Descriptive Statistics – Mean, Median, Mode & Variability
If you have any questions, please leave a comment or write to us at datahark12@gmail.com