August 6, 2015

Roadmap

  • Motivation
  • About Our Data
  • Analytic Strategy
    • Methods
    • Study Design
  • Results
    • Overall Model Fit
    • Data use across schools and relationship to accountability system results
  • Lessons Learned
    • Implications for Building Leaders
    • Implications for IT Leadership
    • Implications for LEA Leadership
  • Next Steps
    • Mixed Methods Approaches
    • Higher Resolution Replication
    • How to do this in your organization (time permitting)

Motivation

Why Metadata?

  • Tyler & McNamara (2011) used system log files to measure educators' uptake of student data dashboards in Cleveland, OH.
  • The data are generated in background processes and stored automatically on servers regardless of whether or not we use or analyze them.
  • Metadata not only provide base measures of uptake, but can also provide a rich set of contextual and content variables that are well suited for data mining and machine learning applications.
  • What questions needed answers?
    • How are we evaluating data use, analytic capacity, and needs across the organization?
    • Should we reinvest resources into existing data tools?
    • What structures and systems support educators' and leaders' using data to improve outcomes for students?
    • How can we use our human capital assets more strategically with regards to data analysis, usage, and incorporating data into decision making practices?
  • Due to resource constraints, we were not able to fully address each of the questions.
    • We focused on trying to answer the first two questions through our analyses.
    • However, we used the last two questions to guide our recommendations for moving the work forward and when considering the implications of the work more broadly.

Process at a glance

About Our Data

Sample

  • What data were used for our work?
    • The Northwest Evaluation Association (NWEA) provided Fayette County Public Schools with a CSV file they prepared from their system log files.
      • Although a nice gesture, the tools we were using for exploratory data analysis actually make it easier to work with the raw log data.
    • Each record included a user ID and a timestamp.
    • We have a total of 36,000 user interaction/system logged records across 1,800 educators from November 2014 through May 2015.
    • 3,865 staff did not use the data system at all.
    • The staff worked across 62 distinct schools/programs.
  • How were the data used to classify/group users? (a reshaping sketch in R follows this list)
    • First we classified how educators used the system (e.g., classifying the sessions of usage/interactions, or within-user classes).
    • Then we classified the educators themselves (e.g., classifying the users into groups, or between-user classes).
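
A minimal sketch of that reshaping step, written in R. The file name and column names (user_id, timestamp, report_type) are hypothetical stand-ins for whatever the NWEA extract actually contains; the point is simply to show how raw log records become a user-by-report-type count table that a classification model can work with.

    # Read the NWEA log extract (hypothetical file and column names)
    logs <- read.csv("nwea_system_logs.csv", stringsAsFactors = FALSE)

    # Count how many times each user viewed each report type:
    # one row per educator, one column per report type
    use_counts <- as.data.frame.matrix(table(logs$user_id, logs$report_type))

    # Educators with no log records do not appear here at all; for the
    # "All Staff" models they would need to be appended as all-zero rows
    use_counts$any_use <- as.integer(rowSums(use_counts) > 0)

    head(use_counts)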

So, what do these groups of educators look like and how would you classify them?

  • How would you classify the users in your district?
    • What metrics would you use to inform your classification?
    • How many types of users (e.g., groups) would you believe to exist a priori?
  • What characteristics do you think the members of the groups would have in common?
    • Would members of the same group typically view the same material?
    • view reports at the same time?
    • spend the same amount of time viewing reports?
    • access the same number of reports on a daily or overall basis?
  • Does it matter whether the behavior causes the groups, the groups cause the observed behavior, or some mixture of the two?
  • Would you feel comfortable making all of these assumptions a priori? OR
  • Would you feel comfortable making an assumption along the lines of:
    • Membership in a group/cluster causes the pattern of observed behaviors in the data.

Classifying users

  • So far, each of the user groups seems to have some type of behavior that clearly distinguishes it from other users (e.g., one group views a report type more than other groups - on average - over time).
  • However, the reports viewed by the users aren't completely orthogonal.
    • In other words, a single report isn't correlated with only a single group, but is instead correlated with all of the groups to different degrees.
  • If you wanted to see a better example of how a single report can be correlated with multiple groups in a way that isn't quite so easy to distinguish:

How does all of the data factor into the way we classify users?

  • Visualizing multidimensional data can be very difficult and can be confusing to explain to diverse stakeholder groups.
    • But heatmaps can provide a high level of abstraction that we can relate to familiar displays of data.
    • Imagine a crosstabulation where each row in the table represents a group of users and each column represents some factor of interest (e.g., report types viewed).
    • Each cell would then show the number of observations classified by that row and column value (e.g., for a given group - or row - how many times did they access a particular report - or column); a sketch of building such a display follows this list.
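
As a concrete illustration of the crosstab-to-heatmap idea, here is a minimal R sketch. It assumes a data frame named classified with hypothetical columns user_group (the class each record was assigned to) and report_type (the report viewed); base R's heatmap() is just one of several ways to draw such a display.

    # Crosstabulation: rows are user groups, columns are report types,
    # cells count how often each group accessed each report type
    tab <- table(classified$user_group, classified$report_type)

    # Shade each cell by its row-scaled count so within-group patterns stand out;
    # Rowv/Colv = NA keep the rows and columns in their original order
    heatmap(as.matrix(tab), Rowv = NA, Colv = NA, scale = "row",
            xlab = "Report type", ylab = "User group")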

Analytic Strategy

Study Design

  • Tools used:
    • Elasticsearch, Logstash, and Kibana technology stack (the ELK Stack) used for Exploratory Data Analysis
    • Stata 14 MP8 used to clean and prep final data used for analyses
    • Mplus input files and data source created with StatTransfer 13
    • R used to automate fitting models in Mplus and to create slide deck
  • We fitted 32 distinct Latent Class models using Mplus 7.3 (a sketch of the R-to-Mplus automation follows this list)
    • For each combination of:
      • Single vs Multi-level
      • Without Covariates vs With Covariates
      • All Staff vs Users Only
      • Two, Three, Four, and Five Latent Classes (this only varied between users in the multi-level models)
  • After fitting and testing our models, we also examined some correlations between school level accountability system results from the previous year and our results aggregated to the school level (e.g., number/proportions of user groups, report views, and/or job types at the school level)
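
A rough sketch of the R-to-Mplus automation using the MplusAutomation package. The variable names (rpt1-rpt8), file names, and the simple single-level LCA specification shown here are illustrative assumptions only; the actual 32-model grid also varied the single- vs. multi-level structure, the covariates, and the All Staff vs. Users Only samples.

    library(MplusAutomation)

    # Loop over candidate class counts and hand each specification to Mplus
    for (k in 2:5) {
      lca <- mplusObject(
        TITLE        = paste0("LCA with ", k, " classes;"),
        VARIABLE     = paste0("CATEGORICAL = rpt1-rpt8;\nCLASSES = c(", k, ");"),
        ANALYSIS     = "TYPE = MIXTURE;",
        OUTPUT       = "TECH11 TECH14;",       # class-enumeration LRT tests
        usevariables = paste0("rpt", 1:8),     # hypothetical report-view indicators
        rdata        = use_counts              # data frame built from the log extract
      )
      mplusModeler(lca, dataout = "use_counts.dat",
                   modelout = paste0("lca_", k, "class.inp"), run = 1L)
    }

    # Pull the results back into R; each element's $summaries slot holds
    # the log-likelihood, BIC, entropy, and the TECH11/TECH14 tests
    fits <- readModels(target = ".")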

Methods

  • Potential alternatives to our approach:
    • Broadly, what we were doing was clustering the data - or using the data to identify groups where patterns were more similar to each other (e.g., within group) than they were to others (e.g., between groups).
    • Clustering algorithms (e.g., K-Means/K-Medians, hierarchical agglomerative clustering visualized with dendrograms, etc…) are built around the assumption that the data are independently and randomly sampled from the population of interest.
    • Clustering algorithms also make no assumptions about the underlying structure/nature leading to the observed data points (e.g., the groups could cause the observed behavior, the behavior could define the groups, or they could simply be correlated).
  • What is Latent Class Analysis (LCA) and how is it different?
    • LCA, like other latent variable models, is derived from a set of causal assumptions; more specifically, we assume that the data we are able to observe (e.g., viewing NWEA report types) are caused by some underlying variable that cannot be directly measured.
    • Unlike clustering algorithms, LCA is a method that is both robust enough to deal with the repeated-measures nature of our data and flexible enough to allow us to specify the relationships between the observed and latent variables.
    • Unlike other latent variable models, LCA does not assume that the latent variable is continuous and \(N(\mu,\sigma)\) distributed, but instead allows us to estimate a nominal scale variable identifying a set of mutually exclusive and completely exhaustive groups - or classes (the general form of the model is written out below).
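
For reference, the general (unconditional) form of the model can be written out as follows; this is the textbook statement of LCA rather than our exact Mplus specification. With \(J\) observed indicators and \(K\) latent classes, the probability of an observed response pattern \(\mathbf{y} = (y_1, \ldots, y_J)\) is

\[
P(\mathbf{Y} = \mathbf{y}) = \sum_{k=1}^{K} \pi_k \prod_{j=1}^{J} P\left(Y_j = y_j \mid C = k\right),
\]

where \(\pi_k\) is the probability of membership in class \(k\) and the indicators are assumed to be conditionally independent given class membership (local independence).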

LCA and Covariates

  • One of the challenges with LCA is how covariates are allowed to enter the model and whether or not the covariates define class membership.
    • You can think of the approach used in LCA as fairly analogous to the way IRT methods are applied to test data to estimate a scaled score.
    • We wouldn't want student demographics to define the scaled scores, so the measurement model - IRT in the case of test data - is fitted to the data without any covariates in the model.
    • Once those values are predicted, we can test how well demographic characteristics predict the test scores.
  • In our case, we fitted the LCA model to the data, used the estimated coefficients as parameter constraints (e.g., to hold the measurement model constant), and then added covariates to see how well they predicted class membership; the general form of that covariate step is written out after this list.
    • Henry and Muthén (2010) suggest "Once a multilevel latent class structure is specified, covariates can be introduced at both Level 1 and Level 2." (p. 202)
    • Vermunt (2010) and Gudicha and Vermunt (2013) also found evidence that the manner in which covariates are introduced to the model can have substantial effects on the quality and correctness of the classification.
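
With the measurement model held constant, the covariate step is essentially a multinomial logistic regression of class membership on the covariates. In general form (again a textbook statement rather than our exact specification), for a covariate vector \(\mathbf{x}\):

\[
P(C = k \mid \mathbf{x}) = \frac{\exp\left(\gamma_{0k} + \boldsymbol{\gamma}_{k}'\mathbf{x}\right)}{\sum_{m=1}^{K} \exp\left(\gamma_{0m} + \boldsymbol{\gamma}_{m}'\mathbf{x}\right)},
\]

with one class serving as the reference (its \(\gamma\) parameters fixed to zero for identification).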

How are school staff interacting with the data system overall?

Results

Overall Model Fit

  • Sophisticated statistical models are attractive to analytic staff, but is there any indication that the methods are appropriate for the data?
    • Single-level models fail to retrieve good estimates when heterogeneity exists at both the first and second hierarchical levels, and our results are consistent with the existing literature on the topic (Muthén & Asparouhov, 2009).
  • With limited exception, we see marginal gains in model fit after adjusting for educator covariates that do not predict class membership but seem to reduce error variance.
    • These findings are consistent with the known properties of the estimators discussed in Lubke and Muthén (2007).
    • We retained the last model in the table on the next slide based on the existing literature on model fit selection criteria (Muthén, 2011); one commonly used criterion is written out after this list.
  • Results for the parameters that were not of primary interest are suppressed, but available upon request.
  • Results for the parameters of interest provide varying degrees of support for our intuition:
    • Both the total number of days accessed and the indicator for elementary educators displayed a strong correlation with some of the between-user classes.
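
One criterion commonly used for this kind of class-enumeration decision is the Bayesian Information Criterion (BIC), which penalizes the maximized likelihood \(\hat{L}\) by the number of free parameters \(p\) and the sample size \(n\):

\[
\mathrm{BIC} = -2\ln\hat{L} + p\,\ln(n),
\]

with lower values indicating a better trade-off between fit and complexity. We note it here as a general reference point alongside the criteria discussed in the literature cited above.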

Identifying the best fitting model

  • What patterns do we observe between prior years' accountability system results, the distribution of staff members classified by user groups, and the proportion of reports viewed by school data users?

Are the Latent Classes invariant across user characteristics?

  • Could job type and/or total number of days the user accessed the system significantly predict how the users were classified?
    • Is it better to have a classification model that is dependent or independent of those characteristics?

What do the correlations between prior years' accountability system results, the distribution of staff members classified by user groups, and the proportion of reports viewed by school data users look like?

Lessons Learned

Implications for Building Leaders

  • How are we evaluating data use, analytic capacity, and needs across the organization?
    • We saw that across these distinct user groups there is limited overlap in the data they tend to use.
    • If our intent is to develop more robust site-based teams of data users and expertise, building leaders should consider building educator teams in such a way as to maximize the mix of user types within teams (e.g., a team of five educators that includes each of these user types).
  • Should we reinvest resources into existing data tools?
    • Data use, expertise, and comfort clearly varies across school sites.
    • If the only end users working with the system are those who are already tech savvy, it may be worthwhile to discuss options that are more inviting and accessible to a wider set of users.
    • If the building staff generally feel comfortable accessing the system, it would be better to consider advocating for additional and more in-depth training.

Implications for IT Leadership

  • How are we evaluating data use, analytic capacity, and needs across the organization?
    • IT/IS Teams cannot provide customer service without knowing the customer.
    • Although IT/IS staff may be less likely to interact with end-users on a regular basis, using tools like the ELK Stack can provide them with an easy to maintain system to collect, clean, and store data they can use to identify deficiencies and strengths of the system.
    • Most importantly, these data are immediately actionable in decision-making processes affecting development and investment choices.
  • Should we reinvest resources into existing data tools?
    • If we had observed consistently long lapses between reports being viewed, it could indicate system performance issues.
    • If the tool does not prioritize the user experience, we should not expect the users to prioritize the tool. In other words, if the platform is unresponsive, sluggish, and generally does not provide a good user experience, we should likely look for an alternative solution.
    • Consider insourcing vs. outsourcing curves when determining how best to move forward for the district.

Implications for LEA Leadership

  • How are we evaluating data use, analytic capacity, and needs across the organization?
    • Monitoring and analyzing usage is an efficient way to begin understanding whether, how, and what data use positively affects student outcomes.
    • Tools that can automate the collection, cleaning, and storage process can save the organization significant resources (particularly the time that analytic staff would otherwise spend cleaning and parsing data).
    • However, organizations should make a concerted effort to go beyond solely quantitative measures and be deliberate in collecting qualitative data to provide context to the interpretation of the results.
  • Should we reinvest resources into existing data tools?
    • At a broader level, we should have a better understanding of how the data use affects student outcomes in order to have a reasonable cost/benefit analysis to drive the decision.
    • Barring that, we really need to know more about why some educators do or do not use the data system.
    • If the reason for the lack of uptake is training, it could be more effective to invest in additional support and training.
    • If the reason for the lack of uptake is users' negative experiences with the system, we may want to consider investing in either new software platforms and/or hardware upgrades to improve the user experience.

Next Steps & Future Directions

  • We found some correlations between data use and the prior year's accountability system results, but data were not available for the current year.
    • Higher scores on the KY State Accountability System Science Gap metric were positively correlated with the number of days users accessed the data systems and the number of reports accessed; there were also small/weak correlations with higher Science Gap accountability scores and the proportion of group 1 and group 2 users.
    • There were positive correlations between the proportion of group 2 users and KY's accountability system scores.
    • The correlations between the proportion of different user types and the prior year's accountability system results vary in sign and magnitude across both user types and accountability system metrics.

Mixed Methods Approaches

  • We cannot overstate the importance of including robust qualitative research alongside the quantitative research in this area.
    • We are attempting to capture characteristics of people that are not directly measurable and that lack a widely accepted a priori definition.
    • Using qualitative data can provide the much-needed context in which the quantitative results should be interpreted and considered.
  • There are several different strategies we feel are viable implementation candidates:
    • Use a combination of survey, interview, and/or focus groups with users to better understand their interactions with the system.
    • Use a Master observer model to score and classify user sessions; these data could then be used as a training set for machine learning algorithms to automate the processes in the future.

Higher Resolution Replication

  • We couldn't model the exact relationship we wanted given the time available.
    • We don't believe that it is just a matter of which reports an individual views independent of time, and believe modeling the use path (e.g., the conditional probability of viewing a given report next) would be more informative; a transition-matrix sketch of this idea follows this list.
    • Modeling use paths could also be used to develop recommender systems that would act as immediate intervention systems (e.g., providing recommendations for which report to view next) for users who may be struggling with the system.
  • Estimate relationships between data access/usage of classroom educators and interim/benchmark performance measures.
    • Focus on the process of data use (e.g., specific pathways in which educators interact with the data systems) and how those distinct pathways affect student outcomes.
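
A minimal sketch of the use-path idea as a first-order transition matrix in R, reusing the hypothetical logs data frame (columns user_id, timestamp, report_type) from the earlier sketch. A fuller treatment would model these transitions within the latent class structure rather than pooling all users.

    # Order the log records by user and time
    logs <- logs[order(logs$user_id, logs$timestamp), ]

    # Pair each view with the next view, keeping only pairs from the same user
    n            <- nrow(logs)
    current_view <- logs$report_type[-n]
    next_view    <- logs$report_type[-1]
    same_user    <- logs$user_id[-n] == logs$user_id[-1]

    # Row-normalize the pair counts to estimate P(next report | current report)
    transitions <- prop.table(table(current_view[same_user], next_view[same_user]),
                              margin = 1)
    round(transitions, 2)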

Extras

How to do this in your organization (time permitting)

  • A quick start style installer is available at: Installation Script
  • When our group met yesterday, we asked Lisa to install things on my computer. So…Lisa…how hard was it to install all of the software?