Which of the following provides a standard API for doing computation with MongoDB:-

Machine learning is a subset of ____________:-

pandas includes moving window statistics:-

Every column in a table represents:-

pandas includes an integrated group-by engine for aggregating and transforming data sets:-

ML is a field of AI consisting of learning algorithms that:-

Which of the following of a random variable is a measure of spread:-

An arrangement of attributes into a hierarchy is called:-

How many categories of data models are there in DBMS:-

Positive residuals and negative residuals are below the line:-

Which of the following can potentially change the dtype of a Series:-

Which is a subset of machine learning:-

Most statistical datasets are data frames made up of rows and columns:-

Hierarchical clustering should be mainly used for exploration:-

Data visualisation decreases insight and leads to slower decisions:-

Which is a correct feature of a DataFrame:-

Which one is not considered an example of a data type in MS Access:-

Values in object-oriented models are stored in:-

Binomial random variables are obtained as the sum of iid Gaussian trials:-

Which areas are affected by BI:-

Which of the following is also referred to as a random variable:-

A member function that changes the state is a:-

Which technology is used to process and analyze large-scale data sets in data science:-

  1. What is the primary goal of data preprocessing in data science?

    • A. Data visualization
    • B. Data cleaning
    • C. Model training
    • D. Feature extraction

    Correct Answer: B. Data cleaning

  2. In machine learning, what does the term “overfitting” refer to?

    • A. Model fitting the training data well but failing on new data
    • B. Model generalizing well to new data
    • C. Model underperforming on both training and new data
    • D. Model having an optimal fit on the training data

    Correct Answer: A. Model fitting the training data well but failing on new data
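
The answer to question 2 can be made concrete with a toy sketch. Everything here is hypothetical: the data follow the made-up rule `x >= 5`, the `memorizer` model simply stores the training pairs, and `threshold_model` stands in for a simpler model that generalizes. It is only meant to show the gap between training and test accuracy, not a realistic training procedure.

```python
def accuracy(model, examples):
    """Fraction of (x, label) pairs the model predicts correctly."""
    return sum(model(x) == label for x, label in examples) / len(examples)

# Hypothetical data: the true label is simply `x >= 5`.
train = [(x, x >= 5) for x in (0, 2, 4, 6, 8)]
test = [(x, x >= 5) for x in (1, 3, 5, 7, 9)]

# An overfitted "model": it memorizes the training pairs and
# falls back to False for any input it has never seen.
memory = dict(train)

def memorizer(x):
    return memory.get(x, False)

# A simpler model that captures the underlying rule.
def threshold_model(x):
    return x >= 5

print("memorizer train/test:", accuracy(memorizer, train), accuracy(memorizer, test))
print("threshold train/test:", accuracy(threshold_model, train), accuracy(threshold_model, test))
```

The memorizer is perfect on the training set but does poorly on the unseen test inputs, which is the pattern option A describes.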

  3. Which statistical measure provides a central tendency of a dataset?

    • A. Standard deviation
    • B. Median
    • C. Range
    • D. Variance

    Correct Answer: B. Median
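
As a small illustration of question 3, the sketch below uses Python's standard `statistics` module on a made-up sample (the numbers are an assumption chosen to include one outlier). The median and mean describe the centre of the data, while the standard deviation, variance and range describe its spread.

```python
import statistics

# Hypothetical sample with one large outlier.
values = [2, 3, 3, 4, 100]

# Measures of central tendency.
mean_value = statistics.mean(values)
median_value = statistics.median(values)

# Measures of spread (these are the distractors in the question).
std_dev = statistics.pstdev(values)       # population standard deviation
variance = statistics.pvariance(values)   # population variance
value_range = max(values) - min(values)

print("mean:", mean_value, "median:", median_value)
print("std dev:", std_dev, "variance:", variance, "range:", value_range)
```

Note how the outlier pulls the mean far above the median, which stays at the middle value of the sorted sample.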

  4. What is the purpose of the k-nearest neighbors (KNN) algorithm?

    • A. Classification
    • B. Regression
    • C. Clustering
    • D. Feature scaling

    Correct Answer: A. Classification
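
A minimal sketch of KNN used for classification, as in question 4, follows. The function name `knn_predict`, the 2-D points and the colour labels are all hypothetical; the sketch uses Euclidean distance and a simple majority vote, which is one common setup rather than the only one. (KNN can also be adapted to regression by averaging the neighbours' values.)

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of ((x, y), label) pairs.
    """
    # Sort training points by Euclidean distance to the query, keep the k closest.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical training data: two well-separated groups.
train = [
    ((1.0, 1.0), "red"), ((1.5, 2.0), "red"), ((2.0, 1.0), "red"),
    ((6.0, 6.0), "blue"), ((7.0, 5.5), "blue"), ((6.5, 7.0), "blue"),
]

print(knn_predict(train, (1.2, 1.4)))
print(knn_predict(train, (6.4, 6.1)))
```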

  5. Which data type is used to represent categorical variables in Pandas?

    • A. Float
    • B. String
    • C. Object
    • D. Int

    Correct Answer: C. Object
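
For question 5, the sketch below assumes pandas is installed and uses a made-up column of colour names. It stores the strings with the `object` dtype (set explicitly here, since the default for strings can differ between pandas versions) and then converts them to the dedicated `category` dtype that pandas also provides for categorical data.

```python
import pandas as pd

# Hypothetical categorical column stored as generic Python objects.
colors = pd.Series(["red", "green", "red", "blue"], dtype=object)
print(colors.dtype)

# Converting to the dedicated categorical dtype.
as_category = colors.astype("category")
print(as_category.dtype)

# Each distinct value becomes a category; rows store small integer codes.
categories = as_category.cat.categories.tolist()
codes = as_category.cat.codes.tolist()
print(categories, codes)
```

The `category` dtype is usually worth considering when a column has few distinct values repeated many times.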

  6. What does the acronym SQL stand for in the context of databases?

    • A. Structured Question Language
    • B. System Query Language
    • C. Standardized Query Language
    • D. Structured Query Language

    Correct Answer: D. Structured Query Language

  7. Which of the following is a supervised learning algorithm?

    • A. K-means clustering
    • B. Decision tree
    • C. Apriori algorithm
    • D. Principal Component Analysis (PCA)

    Correct Answer: B. Decision tree

  8. What is the purpose of the term “bagging” in ensemble learning?

    • A. Boosting the model performance
    • B. Reducing model complexity
    • C. Training multiple models independently
    • D. Feature selection

    Correct Answer: C. Training multiple models independently
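
The idea in question 8 can be sketched in plain Python: each model is trained independently on its own bootstrap sample (drawn with replacement), and predictions are combined by majority vote. The helper names (`fit_stump`, `bagging_fit`, `bagging_predict`), the one-feature data set and the choice of a threshold "stump" as the base model are all assumptions made for illustration.

```python
import random
from collections import Counter

def fit_stump(sample):
    """Return the threshold t that minimises errors for the rule `x >= t`."""
    xs = sorted({x for x, _ in sample})
    candidates = xs + [xs[-1] + 1]  # extra candidate: predict False everywhere

    def errors(t):
        return sum((x >= t) != label for x, label in sample)

    return min(candidates, key=errors)

def bagging_fit(data, n_models=25, seed=0):
    """Train n_models stumps, each on its own bootstrap sample."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap: draw len(data) examples with replacement.
        bootstrap = [rng.choice(data) for _ in data]
        models.append(fit_stump(bootstrap))
    return models

def bagging_predict(models, x):
    """Combine the independent models by majority vote."""
    votes = Counter(x >= t for t in models)
    return votes.most_common(1)[0][0]

# Hypothetical data: the label is True when x >= 5.
data = [(x, x >= 5) for x in range(10)]
models = bagging_fit(data)
print(bagging_predict(models, 0), bagging_predict(models, 9))
```

Because the models never see each other's samples or outputs during training, they could be trained in parallel, which is what distinguishes bagging from boosting.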

  9. What is the main advantage of using a NoSQL database over a traditional relational database?

    • A. ACID compliance
    • B. Schema flexibility
    • C. Strict consistency
    • D. Tabular data representation

    Correct Answer: B. Schema flexibility

  10. In a confusion matrix, what does the term “precision” measure?

    • A. Proportion of true positives to actual positives
    • B. Proportion of true negatives to actual negatives
    • C. Proportion of true positives to predicted positives
    • D. Proportion of true negatives to predicted negatives

    Correct Answer: C. Proportion of true positives to predicted positives
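
To make the definitions in question 10 concrete, here is a short sketch that computes precision (option C) alongside recall (option A) from confusion-matrix counts. The function names and the counts are hypothetical, and returning 0.0 when the denominator is zero is a convention chosen for this sketch.

```python
def precision_score(tp, fp):
    """True positives divided by all predicted positives."""
    predicted_positives = tp + fp
    if predicted_positives == 0:
        return 0.0  # convention for this sketch: nothing was predicted positive
    return tp / predicted_positives

def recall_score(tp, fn):
    """True positives divided by all actual positives."""
    actual_positives = tp + fn
    if actual_positives == 0:
        return 0.0  # convention for this sketch: there were no actual positives
    return tp / actual_positives

# Hypothetical confusion-matrix counts.
tp, fp, fn, tn = 40, 10, 20, 30

print("precision:", precision_score(tp, fp))
print("recall:", recall_score(tp, fn))
```

With these counts, 50 items were predicted positive and 40 of them were correct, while 60 items were actually positive, so precision and recall differ.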

