📊 Types of Data & Data Collection Methods in Data Science (Part 2)

Understanding data is the first and most important step in data science.

Before analysis, modeling, or machine learning, a data scientist must know what type of data they are working with and how it was collected.

In Part 2 of our Statistics for Data Science series, you’ll learn:

  • Different types of data used in data science
  • How data is collected in real-world projects
  • Sampling methods and their importance
  • Hands-on examples to classify datasets



🎯 Goal of This Post

Understand your data before analyzing it.

Incorrect data understanding leads to:

  • Wrong statistical methods
  • Poor model performance
  • Misleading insights


📌 Types of Data in Data Science

Data can be classified in multiple ways depending on its nature and usage.


🔹 Qualitative vs Quantitative Data

📘 Qualitative Data (Categorical Data)

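Qualitative data describes qualities or characteristics. It is usually non-numeric, and even when categories are stored as numbers (for example, ZIP codes), arithmetic on them is meaningless.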

Examples:

  • Gender (Male/Female)
  • Product category
  • Customer feedback (Good, Bad, Average)
  • City names

📌 Used for:

  • Classification
  • Sentiment analysis
  • Grouping and segmentation


📗 Quantitative Data (Numerical Data)

Quantitative data represents numbers and measurable values.

Examples:

  • Age
  • Salary
  • Temperature
  • Number of purchases

📌 Used for:

  • Statistical calculations
  • Regression models
  • Forecasting
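
To make the distinction concrete, here is a minimal sketch that labels a column as qualitative or quantitative by checking whether every value is a number. The function name `classify_column` and the sample columns are hypothetical, invented for illustration.

```python
def classify_column(values):
    """Return 'quantitative' if every value is a number, else 'qualitative'."""
    def is_number(v):
        # bool is a subclass of int in Python, so exclude it explicitly
        return isinstance(v, (int, float)) and not isinstance(v, bool)
    return "quantitative" if all(is_number(v) for v in values) else "qualitative"

# Hypothetical sample columns, invented for illustration
columns = {
    "city": ["Pune", "Delhi", "Mumbai"],
    "age": [25, 31, 47],
    "salary": [42000.0, 58500.5, 61000.0],
    "feedback": ["Good", "Bad", "Average"],
}

kinds = {name: classify_column(vals) for name, vals in columns.items()}
print(kinds)
# {'city': 'qualitative', 'age': 'quantitative', 'salary': 'quantitative', 'feedback': 'qualitative'}
```

This is only a first-pass heuristic: numeric codes such as postal codes or customer IDs would be labeled quantitative here even though they are really categories, so the result still needs a human check.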


🔹 Discrete vs Continuous Data

📘 Discrete Data

Discrete data consists of countable values.

Examples:

  • Number of customers
  • Number of defects
  • Number of website visits

📌 Values are typically whole-number counts, with no values in between.


📗 Continuous Data

Continuous data can take any value within a range.

Examples:

  • Height
  • Weight
  • Time
  • Temperature

📌 Can have decimal values.
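
A rough way to tell the two apart in code is to check whether every value in a numeric column is a whole number. The sketch below assumes the column is already numeric; `classify_numeric` and the sample lists are hypothetical names used for illustration.

```python
def classify_numeric(values):
    """Label a numeric column 'discrete' if all values are whole numbers, else 'continuous'."""
    return "discrete" if all(float(v).is_integer() for v in values) else "continuous"

# Hypothetical sample data, invented for illustration
orders_per_customer = [3, 0, 7, 2]
session_minutes = [4.25, 12.8, 0.5]

print(classify_numeric(orders_per_customer))  # discrete
print(classify_numeric(session_minutes))      # continuous
```

Treat the result as a hint rather than a rule: a continuous measurement that was rounded when recorded (such as age in whole years) would look discrete to this check.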


🔹 Structured vs Unstructured Data

📘 Structured Data

Structured data is organized in rows and columns.

Examples:

  • Excel files
  • SQL tables
  • CSV datasets

📌 Easy to analyze using SQL, Excel, Python, or BI tools.


📗 Unstructured Data

Unstructured data has no predefined format.

Examples:

  • Text documents
  • Emails
  • Images
  • Videos
  • Social media posts

📌 Requires advanced processing (NLP, Computer Vision).
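
The difference shows up as soon as you try to compute something. In the sketch below, the structured CSV can be summed by column name right away, while the free-text review first has to be turned into features (here, a simple word count). Both inputs are hypothetical, and word counting is only a stand-in for real NLP processing.

```python
import csv
import io
from collections import Counter

# Structured: rows and named columns (hypothetical CSV content)
csv_text = "order_id,amount\n1,250.0\n2,99.5\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
total_amount = sum(float(row["amount"]) for row in rows)
print(total_amount)  # 349.5

# Unstructured: free text must be processed before it can be analyzed
review = "Great phone, great battery, poor camera"
words = [w.strip(",.").lower() for w in review.split()]
word_counts = Counter(words)
print(word_counts["great"])  # 2
```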


📌 Data Collection Methods in Data Science

Understanding how data is collected helps assess data quality and bias.


🔹 Common Data Collection Techniques

1️⃣ Surveys & Questionnaires

  • Online forms
  • Feedback surveys
  • Market research

📌 Risk: Response bias


2️⃣ Observational Data

  • Website click tracking
  • User behavior logs
  • Sensor data

📌 Often captured in real time and free of self-report bias, but still open to selection bias and tracking gaps


3️⃣ Experiments (A/B Testing)

  • Marketing experiments
  • Product feature testing

📌 Controlled: random assignment to groups supports cause-and-effect conclusions
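
What makes an A/B test controlled is that users are assigned to variants at random. Here is a minimal sketch of that assignment step, assuming a hypothetical list of 1,000 user IDs and a fixed seed so the split is reproducible.

```python
import random

# Hypothetical user IDs, invented for illustration
users = [f"user_{i}" for i in range(1000)]

rng = random.Random(42)   # fixed seed so the split can be reproduced
shuffled = users[:]       # copy so the original order is untouched
rng.shuffle(shuffled)

# First half sees variant A, second half sees variant B
group_a = shuffled[:500]
group_b = shuffled[500:]

print(len(group_a), len(group_b))  # 500 500
```

Shuffling and then splitting guarantees equal group sizes; assigning each user with an independent coin flip would also work but gives slightly unequal groups.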


4️⃣ Transactional Data

  • Sales records
  • Banking transactions
  • E-commerce logs

📌 Highly structured and usually reliable, because records are generated by the system itself


5️⃣ Third-Party Data

  • Government datasets
  • APIs
  • External vendors

📌 Verify credibility and freshness


📌 Sampling Methods in Statistics

Sampling allows us to study a subset of data instead of the entire population.


🔹 Types of Sampling Methods

📘 Random Sampling

  • Every unit has an equal chance of being selected
  • Reduces selection bias
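
A minimal sketch of simple random sampling using Python's standard library. The population of 100 numbered units and the sample size of 10 are assumptions chosen for illustration.

```python
import random

# Hypothetical population: 100 units numbered 1 to 100
population = list(range(1, 101))

rng = random.Random(0)                            # fixed seed for reproducibility
random_sample = rng.sample(population, k=10)      # sampling without replacement

print(len(random_sample))  # 10
```

`sample` draws without replacement, so no unit can appear twice in the result.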


📘 Stratified Sampling

  • Population divided into groups (strata)
  • Sample taken from each group

📌 Used when every subgroup must be represented, for example in surveys and finance
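
The two steps above can be sketched as follows: group the records by a stratum key, then draw the same fraction from each group. The helper `stratified_sample`, the customer records, and the 10% fraction are all hypothetical choices for illustration.

```python
import random
from collections import defaultdict

def stratified_sample(records, key, fraction, seed=0):
    """Draw roughly `fraction` of the records from each stratum defined by `key`."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[rec[key]].append(rec)   # step 1: divide the population into strata
    sample = []
    for group in strata.values():
        k = max(1, round(len(group) * fraction))  # at least one unit per stratum
        sample.extend(rng.sample(group, k))       # step 2: sample within each stratum
    return sample

# Hypothetical population: 80 retail and 20 corporate customers
customers = [
    {"id": i, "segment": "retail" if i < 80 else "corporate"}
    for i in range(100)
]

picked = stratified_sample(customers, key="segment", fraction=0.1)
print(len(picked))  # 10 (8 retail + 2 corporate)
```

Because each segment is sampled separately, the small corporate group is guaranteed to appear in the sample, which a simple random draw of 10 customers would not guarantee.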


📘 Systematic Sampling

  • Every nth observation is selected, usually starting from a random point

📌 Simple and efficient
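
Here is a minimal sketch of systematic sampling: compute the step size from the desired sample size, pick a random starting point within the first step, then take every nth item. The function name `systematic_sample` and the 100-item list are assumptions for illustration.

```python
import random

def systematic_sample(items, n, seed=0):
    """Select n items by taking every (len(items) // n)-th item from a random start."""
    if not 0 < n <= len(items):
        raise ValueError("n must be between 1 and len(items)")
    step = len(items) // n
    start = random.Random(seed).randrange(step)  # random start within the first step
    return items[start::step][:n]

# Hypothetical ordered records, invented for illustration
records = list(range(100))
every_tenth = systematic_sample(records, n=10)

print(len(every_tenth))  # 10
```

One caution: if the data has a repeating pattern whose period matches the step (for example, daily records sampled every 7th row), the sample can end up systematically skewed.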


📘 Convenience Sampling

  • Uses whatever data is easiest to obtain

📌 Risk: High bias
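
A small illustration of why the bias can be high. Suppose the records are stored in sorted order and we simply take the first 10 because they are the easiest to grab. The population below is hypothetical and deliberately ordered to make the effect obvious.

```python
from statistics import mean

# Hypothetical population: values 1 to 100, stored in ascending order
population = list(range(1, 101))

# Convenience sample: just the first 10 records
convenience = population[:10]

print(mean(population))   # 50.5
print(mean(convenience))  # 5.5
```

The convenience sample's mean is nowhere near the population mean, because the units that were easy to reach are not representative of the rest.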


📌 Why Sampling Matters in Data Science

  • Saves time and cost
  • Makes large datasets manageable
  • Enables faster experimentation
  • Supports inferential statistics


🧪 Hands-On: Classify Sample Datasets

Let’s classify real-world datasets.

Dataset              | Qualitative / Quantitative | Discrete / Continuous | Structured / Unstructured
Customer Gender      | Qualitative                | N/A                   | Structured
Monthly Salary       | Quantitative               | Continuous            | Structured
Product Reviews      | Qualitative                | N/A                   | Unstructured
Number of Orders     | Quantitative               | Discrete              | Structured
Website Session Time | Quantitative               | Continuous            | Structured

📌 Discrete / Continuous applies only to quantitative data, so it is marked N/A for qualitative rows.

🧠 Key Takeaways

✔ Always identify data type before analysis
✔ Choose statistical methods based on data nature
✔ Understand data collection to avoid bias
✔ Sampling impacts accuracy and conclusions


🔗 What’s Next in This Series?

👉 Part 3: Descriptive Statistics – Mean, Median, Mode & Variability
