Working in the [Data] Mines

Billy Buchanan
Director of Data, Research, and Accountability
Fayette County Public Schools

Slides are available at:
https://wbuchanan.github.io/sdpDataMining

Code examples are available at:
https://github.com/wbuchanan/sdpDataMining

Overview

  • Background
  • Use Cases
  • Doing data mining
  • Your Work

Ask questions as you think of them.

Honestly, I don't mind and don't bite.

I'll be switching back and forth between the slide deck and some programming environments. If you need me to go back over something or flip back to the previous screen, feel free to ask.

Background

Why should you care about this?

  1. $1,507,849
  2. $2,913,122
  3. $548,435
  4. $625,452
  5. $770,304

What is data mining?

  • Exploratory Data Analysis
  • Text/Sentiment Analysis
  • Machine Learning/Inferential Statistics

Types of ML

  • Supervised machine learning is used to build predictive models when the outcomes are observed.
  • Unsupervised machine learning is used to classify observations into unobserved groups.
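
A minimal sketch of the difference, using made-up attendance rates and outcomes (none of these numbers come from real data). The supervised piece is a nearest-centroid classifier that learns from the observed outcome; the unsupervised piece is a two-group k-means that never sees the outcome. Both are stand-ins chosen for brevity, and the names `rates` and `graduated` are hypothetical.

```python
from statistics import mean

# Hypothetical toy data: attendance rates for six students (made up).
rates = [0.62, 0.70, 0.66, 0.93, 0.97, 0.90]
graduated = [0, 0, 0, 1, 1, 1]  # observed outcome, used only by the supervised model


def fit_nearest_centroid(xs, ys):
    """Supervised: learn one centroid per observed outcome label."""
    return {label: mean(x for x, y in zip(xs, ys) if y == label) for label in set(ys)}


def predict(centroids, x):
    """Predict the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))


def two_means(xs, iterations=10):
    """Unsupervised: split xs into two groups without using any outcome labels."""
    low, high = min(xs), max(xs)  # deterministic starting centers
    for _ in range(iterations):
        groups = [0 if abs(x - low) <= abs(x - high) else 1 for x in xs]
        low = mean(x for x, g in zip(xs, groups) if g == 0)
        high = mean(x for x, g in zip(xs, groups) if g == 1)
    return groups


model = fit_nearest_centroid(rates, graduated)
print(predict(model, 0.68))  # predicted outcome for a new student
print(two_means(rates))      # group membership discovered from the rates alone
```

The groups found by `two_means` have no meaning attached to them; interpreting what group 0 and group 1 represent is still up to the analyst.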

Use Cases


Quantifying uptake of new tools

Meta Matters

  • Were staff actually using the assessment data the district was paying so much for?
  • Could we classify users into distinct groups based on observable behavior?
  • How can we learn more about the user experience without having to do tons of surveys?


Understanding and Improving the User Experience

If you're developing reports, you should want to answer these questions:

  • Are users making use of the interactive components?
  • Are there specific pathways that lead users to leverage some of these tools more frequently?
  • What effect does changing one of the user interface elements have on the overall user experience?
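
For the last question, one common way to compare two versions of a user interface element is a two-proportion z-test on how often each version gets used. The sketch below assumes you already have counts of views and clicks for each version; the counts are invented for illustration, and other tests may fit your data better.

```python
import math


def two_proportion_z_test(clicks_a, views_a, clicks_b, views_b):
    """Return (z, two-sided p-value) for the difference in click rates B - A."""
    rate_a = clicks_a / views_a
    rate_b = clicks_b / views_b
    # Pooled rate under the null hypothesis that both versions perform the same.
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (rate_b - rate_a) / std_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: version A of a filter widget vs. a redesigned version B.
z, p = two_proportion_z_test(clicks_a=120, views_a=1000, clicks_b=150, views_b=1000)
print(round(z, 2), round(p, 3))
```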

Monitoring. The Vendor shall describe in detail what capabilities the BI platform has to collect usage metrics from end-users using the product. This should specifically address the level of detail that is possible to record (e.g., can the BI platform allow us to track when specific elements in the user-interface are used, can this be time and user stamped).

A/B Testing. The Vendor shall also discuss whether or not the BI platform provides any native support or functionality to implement A/B testing. If the platform does not support this functionality, the Vendor shall discuss how they support clients testing and revising user interface elements and interaction functionality in reports and dashboards.

UI/UX Collection. It is expected that [we] will have the ability to collect extensive telemetry data to develop a more robust understanding of the user experience, for continuous improvement of the reports delivered to end-users, and to more carefully evaluate how data use and access affects teaching and learning. The Vendor shall describe in detail what capabilities exist to collect these data, how those data would be stored, and the level of detail at which these data can be collected. If the proposed solution does not include the capacity to capture and store data on interactions with user interface “widgets”, controllers, forms, and event-listener firings, the Vendor shall propose methods by which [we] may accomplish this goal.

Funnel/Click-Thru. The logging and data collection capabilities of the system must allow [us] to conduct funnel and/or click-thru analyses. The collection requirements at this level are time- and user-stamped records that include the name/id of the report being viewed and a session ID that is sequential by user and day. These data should be collected from the point at which the user logs into the system and continue until the user logs out or a timeout disconnects the user from the system.
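
To make the funnel requirement concrete, here is a rough sketch of what those records allow. It assumes a log of (user, timestamp, report) tuples in which a `login` event opens a session; the records, report names, and helper functions are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical log records: (user, ISO timestamp, report name). All values are made up.
events = [
    ("teacher1", "2017-03-01T08:00:00", "login"),
    ("teacher1", "2017-03-01T08:01:00", "school_summary"),
    ("teacher1", "2017-03-01T08:03:00", "student_detail"),
    ("teacher1", "2017-03-01T13:00:00", "login"),
    ("teacher1", "2017-03-01T13:02:00", "school_summary"),
    ("teacher2", "2017-03-01T09:00:00", "login"),
]


def assign_sessions(records):
    """Number sessions 1, 2, ... per user per day; each 'login' opens a new session."""
    counters = defaultdict(int)
    sessions = []
    for user, stamp, report in sorted(records, key=lambda r: (r[0], r[1])):
        day = datetime.fromisoformat(stamp).date()
        if report == "login":
            counters[(user, day)] += 1
        sessions.append((user, day, counters[(user, day)], report))
    return sessions


def is_subsequence(wanted, path):
    """True if the steps in wanted appear in path in the same order."""
    remaining = iter(path)
    return all(step in remaining for step in wanted)


def funnel(sessions, steps):
    """Count the sessions that reached each step of the funnel, in order."""
    paths = defaultdict(list)
    for user, day, session_id, report in sessions:
        paths[(user, day, session_id)].append(report)
    counts = []
    for depth in range(1, len(steps) + 1):
        reached = sum(1 for path in paths.values() if is_subsequence(steps[:depth], path))
        counts.append((steps[depth - 1], reached))
    return counts


sessions = assign_sessions(events)
print(funnel(sessions, ["login", "school_summary", "student_detail"]))
# [('login', 3), ('school_summary', 2), ('student_detail', 1)]
```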



Analyzing the Analyzers

  • How similar are the research/data requests that your department gets?
  • Are there seasonal components in the requests (e.g., specific requests happen regularly at certain points of the year)?
  • How does the quality/clarity of the request/research questions vary by end users?
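
One simple way to start on the first question is to score pairs of requests by how many words they share. The sketch below uses Jaccard similarity on word sets; the request text is invented, and a real analysis would likely want stemming, stop-word removal, or a proper search index.

```python
import re


def tokens(text):
    """Lowercase a request and split it into a set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Share of distinct words that two requests have in common (0.0 to 1.0)."""
    words_a, words_b = tokens(a), tokens(b)
    if not words_a and not words_b:
        return 0.0  # two empty requests carry no evidence of similarity
    return len(words_a & words_b) / len(words_a | words_b)


# Hypothetical request descriptions.
request_1 = "Attendance rates by school for 2016"
request_2 = "Attendance rates by grade for 2016"
request_3 = "Teacher survey results"

print(round(jaccard(request_1, request_2), 3))  # five shared words out of seven
print(jaccard(request_1, request_3))            # no shared words
```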


Conversion/Click-Thru

  • When users go to your organization's website, where do they go from there?
  • Do highly effective educators navigate dashboard systems in similar ways to less experienced/skilled educators?
  • How long are users taking to view specific dashboards/reports?
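
The viewing-time question can be approximated from time-stamped page views alone: the time spent on a report is the gap until the next event in the same session. This is a sketch under that assumption, with invented session IDs and report names; it cannot tell whether the user was actually looking at the screen.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median

# Hypothetical page views: (session ID, ISO timestamp, report name).
views = [
    ("s1", "2017-03-01T08:00:00", "school_summary"),
    ("s1", "2017-03-01T08:02:30", "student_detail"),
    ("s1", "2017-03-01T08:03:00", "logout"),
    ("s2", "2017-03-01T09:00:00", "school_summary"),
    ("s2", "2017-03-01T09:00:30", "logout"),
]


def median_dwell_seconds(records):
    """Median seconds spent on each report, measured up to the next event in the session."""
    by_session = defaultdict(list)
    for session, stamp, report in records:
        by_session[session].append((datetime.fromisoformat(stamp), report))
    durations = defaultdict(list)
    for visits in by_session.values():
        visits.sort()
        # Pair each visit with the one that follows it in the same session.
        for (start, report), (end, _) in zip(visits, visits[1:]):
            durations[report].append((end - start).total_seconds())
    return {report: median(times) for report, times in durations.items()}


print(median_dwell_seconds(views))
# {'school_summary': 90.0, 'student_detail': 30.0}
```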


Current/Historical Legislative Trends

  • How easy is it for leaders in your organization to quickly identify and see proposed legislative changes?
  • Using open APIs like LegiScan, we can track state legislation and mine the text to identify trends in the proposed legislation.
  • This can also make it easier to keep your parents and community members informed about proposed changes to laws that affect the education sector in your community.
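
As a rough idea of what "mining the text" can look like at its simplest, the sketch below counts how many bills mention a keyword each year. The bill records are invented, and the `year` field is an assumption; only the `description` field name mirrors the queries used later in this deck.

```python
from collections import Counter

# Hypothetical bill records (not real legislation).
bills = [
    {"year": 2015, "description": "An act relating to charter school funding"},
    {"year": 2016, "description": "An act relating to charter schools and teacher pay"},
    {"year": 2016, "description": "An act relating to highway maintenance"},
    {"year": 2016, "description": "Charter school authorizer requirements"},
]


def mentions_by_year(records, keyword):
    """Count the bills whose description contains the keyword, grouped by year."""
    keyword = keyword.lower()
    counts = Counter()
    for bill in records:
        if keyword in bill["description"].lower():
            counts[bill["year"]] += 1
    return dict(sorted(counts.items()))


print(mentions_by_year(bills, "charter"))
# {2015: 1, 2016: 2}
```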

Doing data mining

Tools

The Elastic Stack

  • Elasticsearch is a Not Only Structured Query Language (NoSQL) database built on Apache Lucene
  • Logstash is the tool that is missing from your system administrators' lives
  • Kibana makes everything prettier and a bit more user-friendly

Once you have data loaded into Elasticsearch, you can query the text from Kibana to do some exploratory data analysis.

If the data are rectangularized from the example code:

http://localhost:9200/legislation/bill/_search?q=description:education

Or, if the data are based on the mapping with nested objects:

http://localhost:9200/legiscan/bill_doc/_search?q=description:education


Looking at user interaction data:

These user interaction data are all generated from an R/Shiny-based application I put together to help visualize Mississippi's Statewide Accountability System (MSAS).

The code for the tool is available here: MSASExplorer



Three methods to query the data

Using the curl command line tool:

curl -XGET '127.0.0.1:9200/shinyapp/userData/_search?q=*'  

Directly from the web browser

http://127.0.0.1:9200/shinyapp/userData/_search?q=* 

Using the Sense extension/Kibana application


GET shinyapp/userData/_search
{
  "query": {
    "match_all": {}
  }
}
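
If you would rather send the same query from a script, a fourth option is any HTTP client. The sketch below uses only the Python standard library and only builds the request; the helper name `build_search_request` is hypothetical, and the host is assumed to be the same local Elasticsearch instance used above.

```python
import json
import urllib.request


def build_search_request(index, doc_type, query, host="http://127.0.0.1:9200"):
    """Build (but do not send) a _search request carrying a JSON query body."""
    url = f"{host}/{index}/{doc_type}/_search"
    body = json.dumps(query).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


request = build_search_request("shinyapp", "userData", {"query": {"match_all": {}}})
print(request.full_url)

# To run it against a live cluster (not executed here):
# with urllib.request.urlopen(request) as response:
#     hits = json.load(response)["hits"]["hits"]
```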
						


You can also put together more complex queries and restrict the fields that are returned:


GET /research_requests/request/_search
{
    "query": {
        "bool": {
            "must": {
                "match" : {
                    "InternalExternal" : "Internal"
                }
            },
            "must_not": {
                "match" : {
                    "Status" : "Completed"
                }
            }
        }
    },
    "fields": [
       "StatusDate", "AssignedTo", "Status", "DataUsedFor", "DataNeeded"
    ]
}
						


Find education legislation sponsored by Democrats that was passed into law


GET /legiscan/bill_doc/_search
{
    "from": 0, "size": 500,
    "query" : { "bool" : {
        "must": [ { "match": { "description": "education" } },
            { "nested": { "path": "progress",
                "query": { "match": { "progress.event": "Passed" } } }
            },
            { "nested": { "path": "sponsors",
                "query": { "match": { "sponsors.party": "D" } } }
            }] }
    }
}
						


Or find charter laws sponsored by Republicans


GET /legiscan/bill_doc/_search
{
    "from": 0, "size": 500,
    "query" : { "bool" : {
        "must": [ { "match": { "description": "charter" } },
            { "match": { "description": "school" } },
            { "nested": { "path": "sponsors",
               "query": { "match": { "sponsors.party": "R" } } }
            }] }
    }
}
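
The two legislation queries share most of their structure, so if you write many of them it may be worth generating the JSON. This is a sketch of one way to do that; the helper names are hypothetical, and the field names (`description`, `progress.event`, `sponsors.party`) are taken from the queries above.

```python
import json


def match(field, value):
    """A single match clause."""
    return {"match": {field: value}}


def nested_match(path, field, value):
    """A match clause on a field that lives inside a nested object."""
    return {"nested": {"path": path, "query": match(f"{path}.{field}", value)}}


def bill_query(keywords, sponsor_party=None, progress_event=None, size=500):
    """Build a bool query: all keywords in the description, plus optional nested filters."""
    must = [match("description", word) for word in keywords]
    if progress_event:
        must.append(nested_match("progress", "event", progress_event))
    if sponsor_party:
        must.append(nested_match("sponsors", "party", sponsor_party))
    return {"from": 0, "size": size, "query": {"bool": {"must": must}}}


# Reproduces the body of the charter school query above.
print(json.dumps(bill_query(["charter", "school"], sponsor_party="R"), indent=2))
```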
						

You can also query records based on the data underlying the position of the mouse in the visualization


GET shinyapp/userData/_search
{
  "query" : {
    "range": {
      "msasplot_mouse_over.data.x": {
        "gte": 40,
        "lte": 92
      }
    }
  }
}
						

Other tools

Future Directions



Recommender Systems

  • Recommend click-thru patterns of effective educators to other educators in dashboard/reporting systems
  • Already being used in personalized learning contexts like School of One
  • Could also be used to recommend optimal student-teacher dyads for student assignments
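
A very small version of the first idea: count which report effective educators tend to open next, then suggest the most common next step to everyone else. The click paths and report names below are invented, and a production recommender would need far more than first-order transition counts.

```python
from collections import Counter, defaultdict

# Hypothetical click paths from sessions of educators identified as effective.
paths = [
    ["home", "school_summary", "student_detail"],
    ["home", "school_summary", "growth_report"],
    ["home", "school_summary", "student_detail"],
    ["home", "attendance"],
]


def transition_counts(click_paths):
    """Count how often each report is followed directly by each other report."""
    counts = defaultdict(Counter)
    for path in click_paths:
        for current, following in zip(path, path[1:]):
            counts[current][following] += 1
    return counts


def recommend(counts, current, n=1):
    """Suggest the n reports most often opened right after the current one."""
    return [report for report, _ in counts.get(current, Counter()).most_common(n)]


counts = transition_counts(paths)
print(recommend(counts, "school_summary"))  # ['student_detail']
print(recommend(counts, "home"))            # ['school_summary']
```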


Differentiation

  • No need to differentiate instruction based on unidimensional frameworks
  • Using LCA/LTA and/or other clustering algorithms, we can suggest empirically derived groups based on data from multiple dimensions
  • We can use the same techniques to identify heterogeneous groups of educators so we can improve our service delivery model with regards to supporting their use of data.
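
LCA/LTA needs a dedicated package, but the general idea of deriving groups from several dimensions at once can be sketched with plain k-means. The usage numbers below are invented, the three dimensions are assumptions, and the starting centers are picked by hand to keep the example deterministic.

```python
from statistics import mean, pstdev

# Hypothetical educator usage: (logins per month, reports viewed, minutes per session).
usage = [
    (2, 3, 4.0), (3, 4, 5.0), (2, 2, 3.5),
    (20, 35, 18.0), (22, 40, 20.0), (19, 33, 17.0),
]


def standardize(rows):
    """Rescale each column to mean 0 and standard deviation 1 so no dimension dominates."""
    stats = [(mean(col), pstdev(col) or 1.0) for col in zip(*rows)]
    return [tuple((v - m) / s for v, (m, s) in zip(row, stats)) for row in rows]


def distance_sq(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(rows, centers, iterations=20):
    """Assign each row to its nearest center, then move each center to its group's mean."""
    for _ in range(iterations):
        labels = [
            min(range(len(centers)), key=lambda k: distance_sq(row, centers[k]))
            for row in rows
        ]
        updated = []
        for k, old in enumerate(centers):
            members = [row for row, label in zip(rows, labels) if label == k]
            # Keep the old center if a group ends up empty.
            updated.append(tuple(mean(col) for col in zip(*members)) if members else old)
        centers = updated
    return labels


scaled = standardize(usage)
labels = k_means(scaled, centers=[scaled[0], scaled[-1]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```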

Your Work

Oh...and just in case I forgot to mention this already....

I'm Hiring

If you or anyone you know might be interested in moving to Wildcat country to change how data is used in education, let me know or click on this link to see the posting.