jsonio now features several options for loading JSON data into Stata. Part of this process involves a depth-first traversal of the JSON tree which is used to flatten the JSON data into a two dimensional key-value pair object. This process also renames the nodes in such a way that each terminal node (e.g., a node containing no children or a scalar value only) can be uniquely identified based on its ancestry/lineage. Doing so allows users to use simple regular expressions to query collections of nodes from the data. The examples below attempt to show these features, as well as illustrate the difference between the key-value and row-value modes that define how the data will be loaded into Stata.

Examples

The examples below are all focused on the ingest side of jsonio. Currently, two modes are supported (one which loads the data in an n x 2 structure of key and value pairs and the other which loads the values in separate variables in the analog of a row vector).

Example 1

Reading JSON data into a single row vector

. // Start the profiler
. profiler on

.
. // Shows example of reading a JSON file from Disk and loading it into individual variables in the dataset
. // The "(legs_[0-9]/((start)|(end))_location/((lat)|(lng)))" argument passed to the elements parameter
. // is used to query only the latitude or longitude values for start or end locations defined for 
. // individual legs of the trip
. jsonio rv, file("~/Desktop/waypointsResponse.json") nourl ob(1) elem("(legs_[0-9]/((start)|(end))_location/((lat)|(lng)))")

. 
. // Describe the parsed/returned data
. desc 

Contains data
  obs:             1                          
 vars:            12                          
 size:            96                          
--------------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
--------------------------------------------------------------------------------------
jsonvar1        double  %10.0g                /routes_1/legs_1/end_location/lat
jsonvar2        double  %10.0g                /routes_1/legs_1/end_location/lng
jsonvar3        double  %10.0g                /routes_1/legs_1/start_location/lat
jsonvar4        double  %10.0g                /routes_1/legs_1/start_location/lng
jsonvar5        double  %10.0g                /routes_1/legs_2/end_location/lat
jsonvar6        double  %10.0g                /routes_1/legs_2/end_location/lng
jsonvar7        double  %10.0g                /routes_1/legs_2/start_location/lat
jsonvar8        double  %10.0g                /routes_1/legs_2/start_location/lng
jsonvar9        double  %10.0g                /routes_1/legs_3/end_location/lat
jsonvar10       double  %10.0g                /routes_1/legs_3/end_location/lng
jsonvar11       double  %10.0g                /routes_1/legs_3/start_location/lat
jsonvar12       double  %10.0g                /routes_1/legs_3/start_location/lng
--------------------------------------------------------------------------------------
Sorted by: 
     Note: Dataset has changed since last saved.

. 
. // Show the returned elements
. li, compress sep(0)

     +-----------------------------------------------------------------------------------------------------------------------------------------------------+
     |  jsonvar1     jsonvar2    jsonvar3     jsonvar4    jsonvar5     jsonvar6    jsonvar7     jsonvar8    jsonvar9    jsonvar10   jsonvar11    jsonvar12 |
     |-----------------------------------------------------------------------------------------------------------------------------------------------------|
  1. | 42.378175   -71.060226   42.359824   -71.059812   42.442609   -71.229336   42.378175   -71.060226   42.460387   -71.348931   42.442609   -71.229336 |
     +-----------------------------------------------------------------------------------------------------------------------------------------------------+

. 
. // Clear data from memory
. clear

Example 2

Same as above, but uses the key-value mode instead.

. 
. // Load the same data into Stata in a key-value pair structure
. jsonio kv, file("~/Desktop/waypointsResponse.json") nourl elem("(legs_[0-9]/((start)|(end))_location/((lat)|(lng)))")

. 
. // Describe the data set in memory
. desc 

Contains data
  obs:            12                          
 vars:             2                          
 size:           624                          
------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
------------------------------------------------------------------
key             str44   %44s                  
value           double  %10.0g                
------------------------------------------------------------------
Sorted by: 
     Note: Dataset has changed since last saved.

. 
. // Show the data
. li, fast sep(0)

     +-----------------------------------------------------------+
     |                                          key        value |
     |-----------------------------------------------------------|
  1. |            /routes_1/legs_1/end_location/lat    42.378175 |
  2. |            /routes_1/legs_1/end_location/lng   -71.060226 |
  3. |          /routes_1/legs_1/start_location/lat    42.359824 |
  4. |          /routes_1/legs_1/start_location/lng   -71.059812 |
  5. |            /routes_1/legs_2/end_location/lat    42.442609 |
  6. |            /routes_1/legs_2/end_location/lng   -71.229336 |
  7. |          /routes_1/legs_2/start_location/lat    42.378175 |
  8. |          /routes_1/legs_2/start_location/lng   -71.060226 |
  9. |            /routes_1/legs_3/end_location/lat    42.460387 |
 10. |            /routes_1/legs_3/end_location/lng   -71.348931 |
 11. |          /routes_1/legs_3/start_location/lat    42.442609 |
 12. |          /routes_1/legs_3/start_location/lng   -71.229336 |
     +-----------------------------------------------------------+

. 
. // Clear data from memory
. clear

Example 3

Handling of terminal nodes stored in Array objects.

. 
. // This shows how arrays containing terminal nodes are handled
. jsonio kv, file("~/Desktop/waypointsResponse.json") nourl elem("types") 

. 
. // Show the returned/parsed data elements
. li

     +------------------------------------------------------+
     |                                   key          value |
     |------------------------------------------------------|
  1. | /geocoded_waypoints_1/types/element_1       locality |
  2. | /geocoded_waypoints_1/types/element_2      political |
  3. | /geocoded_waypoints_2/types/element_1   neighborhood |
  4. | /geocoded_waypoints_2/types/element_2      political |
  5. | /geocoded_waypoints_3/types/element_1       locality |
     |------------------------------------------------------|
  6. | /geocoded_waypoints_3/types/element_2      political |
  7. | /geocoded_waypoints_4/types/element_1       locality |
  8. | /geocoded_waypoints_4/types/element_2      political |
     +------------------------------------------------------+

. 
. // Clear data from memory
. clear

Example 4

Reading JSON directly from webservices/REST APIs

. 
. // Read data directly from a URL
. jsonio kv, file("http://maps.googleapis.com/maps/api/directions/json?origin=1250+W+Broadway+Ave,+Minneapolis,+MN&destination=4+Yawkey+Way,+Boston,+MA&waypoint
> s=optimize:true|Chicago,IL|Lexington,KY|Providence,RI") 

. 
. // Describe the data set in memory
. desc

Contains data
  obs:         1,327                          
 vars:             2                          
 size:       368,762                          
-----------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
-----------------------------------------------------------------
key             str44   %44s                  
value           strL    %9s                   
-----------------------------------------------------------------
Sorted by: 
     Note: Dataset has changed since last saved.

. 
. // Show the number of elements retrieved from the request
. count
  1,327

. 
. // List some of the entries from the returned results
. li in 20/30, fast sep(0)

     +----------------------------------------------------------+
     |                                          key       value |
     |----------------------------------------------------------|
 20. |               /routes_1/bounds/northeast/lng   -71.030.. |
 21. |               /routes_1/bounds/southwest/lat   38.0067.. |
 22. |               /routes_1/bounds/southwest/lng   -93.296.. |
 23. |                         /routes_1/copyrights   Map dat.. |
 24. |               /routes_1/legs_1/distance/text      413 mi |
 25. |              /routes_1/legs_1/distance/value      664140 |
 26. |               /routes_1/legs_1/duration/text   6 hours.. |
 27. |              /routes_1/legs_1/duration/value       22846 |
 28. |                 /routes_1/legs_1/end_address   Chicago.. |
 29. |            /routes_1/legs_1/end_location/lat   41.8781.. |
 30. |            /routes_1/legs_1/end_location/lng   -87.629.. |
     +----------------------------------------------------------+

In addition to the flexibility provided with these features, the programs is also extremely efficient at the parsing/structuring of the data elements. Below is the print out from the Stata profiler

. 
. // Report the profiling of the commands
. profiler report
clear
     4    0.021  clear
label
     4    0.000  label
jsonio
     4    0.000  jsonio
     1    0.002  rowval
     3    0.217  keyval
          0.219  Total
desc
     3    0.000  desc
describe
     3    0.000  describe
Overall total count =     22
Overall total time  =      0.240 (sec)

It is clear that the key-value mode is a bit less efficient. The is the result of trying to handle the type casting automatically for users; it requires verifying that all of the requested nodes are of the same type.