Latest update to JSONIO adds import support

This update makes no changes to the export functionality or to the ggeocode command. Instead, the jsonio command gains new subcommands that support importing JSON.

Two import modes

jsonio can load JSON data into Stata in two ways.

Key-Value mode

Key-value mode loads the data into two variables named key and value. This is useful when you want to work with a single JSON payload or dataset that you plan to restructure further. One disadvantage concerns types. If the values are not all of the same type, every value is loaded as a string. If they are all of the same type, they are automatically cast to the appropriate data type.
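The type rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not jsonio's actual code. It assumes the values have already been parsed with Python's standard `json` module. The helper names `json_type` and `resolve_value_column` are hypothetical. The sketch keeps the values unchanged when they all share one JSON type and otherwise turns every value into a string.

```python
import json


def json_type(value):
    """Classify a parsed JSON scalar by its JSON type (hypothetical helper)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if value is None:
        return "null"
    return "string"


def resolve_value_column(values):
    """Return (values, type) for the value variable.

    Assumed rule: if every value shares one JSON type, keep the values as
    they are; otherwise render every value as a string.
    """
    kinds = {json_type(v) for v in values}
    if len(kinds) == 1:
        return list(values), kinds.pop()
    # Mixed types: strings stay as-is, everything else becomes JSON text.
    return [v if isinstance(v, str) else json.dumps(v) for v in values], "string"


if __name__ == "__main__":
    print(resolve_value_column([1, 2.5, 3]))     # → ([1, 2.5, 3], 'number')
    print(resolve_value_column([1, "a", True]))  # → (['1', 'a', 'true'], 'string')
```

The sketch treats integers and floats as a single "number" type, which is an assumption. Stata itself distinguishes several numeric storage types (byte, int, long, float, double).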

Key                                          Value
/grandparent/parent_1/child_1/terminal_node  node value
/grandparent/parent_1/child_2/terminal_node  node value
/grandparent/parent_1/child_3/terminal_node  node value
/grandparent/parent_1/child_4/terminal_node  node value
/grandparent/parent_2/child_1/terminal_node  node value
/grandparent/parent_2/child_2/terminal_node  node value
/grandparent/parent_3/child_1/terminal_node  node value
/grandparent/parent_3/child_2/terminal_node  node value
/grandparent/parent_3/child_3/terminal_node  node value

The table above shows, in generic form, how the program renames nodes internally. Terminal nodes (except array elements) carry no numeric IDs, but their ancestors do. These numeric suffixes record the order in which each node appears in the traversal. They also make every name unique, so each key identifies exactly one value.
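The flattening step can be illustrated with a short recursive traversal in Python. This is a sketch, not jsonio's implementation. The function name `flatten` is hypothetical, and so is the naming rule. Under that rule, each element of a JSON array gets a 1-based positional suffix, while object keys and terminal scalars keep their bare names. The sample JSON below is likewise invented so that its output reproduces the generic table above.

```python
import json


def flatten(node, path=""):
    """Return (path, value) pairs for every terminal node, in traversal order.

    Hypothetical naming rule (not necessarily jsonio's): each element of a
    JSON array gets a 1-based positional suffix; object keys and terminal
    scalars keep their bare names.
    """
    pairs = []
    if isinstance(node, dict):
        for name, child in node.items():
            if isinstance(child, list):
                # Number array elements so that sibling paths stay unique.
                for i, element in enumerate(child, start=1):
                    pairs.extend(flatten(element, f"{path}/{name}_{i}"))
            else:
                pairs.extend(flatten(child, f"{path}/{name}"))
    elif isinstance(node, list):
        # An array nested directly inside another array (or at the top level).
        for i, element in enumerate(node, start=1):
            pairs.extend(flatten(element, f"{path}/{i}"))
    else:
        # Scalars (string, number, boolean, null) are terminal nodes.
        pairs.append((path, node))
    return pairs


# Invented sample document whose structure matches the generic table.
document = json.loads("""
{"grandparent": {"parent": [
  {"child": [{"terminal_node": "value"}, {"terminal_node": "value"},
             {"terminal_node": "value"}, {"terminal_node": "value"}]},
  {"child": [{"terminal_node": "value"}, {"terminal_node": "value"}]},
  {"child": [{"terminal_node": "value"}, {"terminal_node": "value"},
             {"terminal_node": "value"}]}
]}}
""")

for key, value in flatten(document):
    print(key, value)  # first line: /grandparent/parent_1/child_1/terminal_node value
```

Under this rule a terminal array element, such as each item of `{"tags": ["a", "b"]}`, also receives a suffix (`/tags_1`, `/tags_2`). That matches the exception for array elements noted above.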

Row-Value mode

In row-value mode, the same hypothetical example is loaded as a single 1 x 9 row of values. This can be useful when making API calls based on the values in individual records.

jsonval1  jsonval2  jsonval3  jsonval4  jsonval5  jsonval6  jsonval7  jsonval8  jsonval9
value     value     value     value     value     value     value     value     value

Note that the variable names differ. They are formed from a stub (jsonval in the example above), which users can specify, followed by a sequence number. The lineage of each node is still retained when the data are loaded in this format. The difference is that in row-value mode, the lineage is stored in the variable labels:

Variable Name  Variable Label
jsonval1       /grandparent/parent_1/child_1/terminal_node
jsonval2       /grandparent/parent_1/child_2/terminal_node
jsonval3       /grandparent/parent_1/child_3/terminal_node
jsonval4       /grandparent/parent_1/child_4/terminal_node
jsonval5       /grandparent/parent_2/child_1/terminal_node
jsonval6       /grandparent/parent_2/child_2/terminal_node
jsonval7       /grandparent/parent_3/child_1/terminal_node
jsonval8       /grandparent/parent_3/child_2/terminal_node
jsonval9       /grandparent/parent_3/child_3/terminal_node
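The row-value layout can be sketched the same way in Python. This is illustrative only, and `to_row` is a hypothetical name. It takes a list of (path, value) pairs, such as a key-value flattening would produce. It returns one record whose variable names are the stub plus a sequence number, along with a label for each variable holding that node's path. In Stata terms, the record corresponds to a single observation and the labels to variable labels.

```python
def to_row(pairs, stub="jsonval"):
    """Arrange (path, value) pairs as one record plus variable labels.

    Sketch of the row-value layout: the i-th terminal node becomes variable
    f"{stub}{i}", and its full path is kept as that variable's label.
    """
    row, labels = {}, {}
    for i, (path, value) in enumerate(pairs, start=1):
        name = f"{stub}{i}"
        row[name] = value    # the single observation's value for this variable
        labels[name] = path  # lineage preserved in the variable label
    return row, labels


# Rebuild the generic example: parent number -> number of children.
layout = {1: 4, 2: 2, 3: 3}
pairs = [(f"/grandparent/parent_{p}/child_{c}/terminal_node", "value")
         for p, n in layout.items() for c in range(1, n + 1)]

row, labels = to_row(pairs)
for name in row:
    print(name, row[name], labels[name])
```

Because every terminal node becomes its own variable in this sketch, a large or deeply nested payload produces a correspondingly wide single observation.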
   

Only limited testing has been done so far. Even so, the traversal method should be fairly robust to arbitrarily structured JSON. Whatever input it receives, it should be able to flatten it and load it in either key-value or row-value mode. If you find a problem, please submit a reproducible issue in the project repository that includes the JSON that triggers it. Without that JSON, the problem is difficult to diagnose and fix.