What Are Data Filters?
Data Filters – a feature that lets users select, sort, and view only the information that matters most to them. Filtering often makes large quantities of data easier to digest.
Data filtering is the process of choosing a smaller part of your data set and using that subset for viewing or analysis. Filtering is generally (but not always) temporary – the complete data set is kept, but only part of it is used for the calculation.
Filtering may be used to:
- Look at results for a particular period of time.
- Calculate results for particular groups of interest.
- Exclude erroneous or “bad” observations from an analysis.
- Train and validate statistical models.
Filtering requires you to specify a rule or logic to identify the cases you want to include in your analysis. Filtering can also be referred to as “subsetting” data, or a data “drill-down”.
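A rule-based filter like the ones above can be sketched in a few lines of plain Python. The field names (`year`, `region`, `sales`) and the sample records are hypothetical, chosen only to illustrate the idea of keeping cases that match a rule:

```python
# Hypothetical data set: one dict per observation.
records = [
    {"year": 2021, "region": "North", "sales": 100},
    {"year": 2022, "region": "South", "sales": 150},
    {"year": 2022, "region": "North", "sales": 200},
]

# The filter rule: keep only 2022 observations from the North region.
# The complete data set (records) is untouched; subset is a temporary view.
subset = [r for r in records if r["year"] == 2022 and r["region"] == "North"]

print(subset)  # [{'year': 2022, 'region': 'North', 'sales': 200}]
```

The same pattern applies whether the rule selects a time period, a group of interest, or anything else you can express as a condition on each case.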
One reason for filtering data is to remove observations that may contain errors or are undesirable for analysis. For example, you may want to remove respondents who did not complete the survey, respondents who raced through the survey and selected answers without paying attention to the questions (“speeders”), or cases where manually entered data contains mistakes. In other areas of research, a multivariate technique may only be applicable to cases with complete information for all the variables measured, so a filter may be constructed to remove cases where some observations are missing.
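The last case, keeping only fully observed records, is sometimes called a “complete case” filter. A minimal sketch, again with hypothetical field names and using `None` to mark missing values:

```python
# Hypothetical survey responses; None marks a missing value.
respondents = [
    {"id": 1, "age": 34, "income": 52000},
    {"id": 2, "age": None, "income": 61000},  # missing age -> excluded
    {"id": 3, "age": 45, "income": None},     # missing income -> excluded
    {"id": 4, "age": 29, "income": 48000},
]

# Keep only cases where every measured variable is present.
complete_cases = [r for r in respondents
                  if all(v is not None for v in r.values())]

print([r["id"] for r in complete_cases])  # [1, 4]
```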
Filtering can be used to evaluate the performance of statistical algorithms and models. The basic idea is to split the sample into two or more groups, apply the analysis independently to each group, and compare the results. This kind of filtering selects cases from the data at random, rather than using a rule based on the data, which ensures a valid comparison. This process is often referred to as training, testing, and validation.
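A random split of this kind can be sketched with the standard library. The 80/20 proportion and the seed are arbitrary choices for illustration, not something the text prescribes:

```python
import random

random.seed(42)  # fixed seed so the split is reproducible

cases = list(range(100))  # stand-in for 100 observations
random.shuffle(cases)     # random order, independent of the data's values

split = int(0.8 * len(cases))          # 80% for training, 20% for testing
train, test = cases[:split], cases[split:]

print(len(train), len(test))  # 80 20
```

Each case lands in exactly one group, so the model can be fit on `train` and evaluated on `test` without the two overlapping.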