Data Mining Volunteer - Daniel Kesner

CARR strives to develop and grow as an evidence-based advocacy organization, and evidence means numbers and data mining. As an intern with CARR, Daniel used his technical background to analyze and process data that could answer real-world questions related to CARR’s advocacy goals. He first took the raw, unformatted datasets provided by the California Secretary of State and the Community Care Licensing Division (CCL) and formatted them so that they were easy to interpret and to extract information from.

The project used these datasets to build a list of legal entities that are licensees of facilities. To ensure accuracy, the list was limited to facilities listed in both the Secretary of State and the CCL datasets. To do this, he developed an algorithm that found matching entities across the two separate datasets by comparing key words and phrases and choosing the most likely matches. The algorithm identified over 3,000 direct matches between the two databases. The list was later refined to include only entries meeting CARR's additional parameters. Data visualization tools (R and MATLAB) were used to create charts showing various properties of the data. The Java code used to develop and implement this algorithm can be found on GitHub at https://github.com/danielkesner/CARR. Daniel says he learned a lot about how to translate real-world problems into software and how to communicate the results back to the people who need them. His volunteer project for CARR extended beyond his graduation from SDSU, and his work added to the growing body of data CARR has available to expand its advocacy efforts.
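
For readers curious about what matching entities by key words can look like in practice, here is a minimal Java sketch. It is not the code from the GitHub repository. It assumes a simple approach in which each entity name is reduced to a set of keywords and two names are scored by the share of keywords they have in common (Jaccard similarity). The class and method names, the stop-word list, the 0.5 threshold, and the entity names are all invented for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

// Illustrative sketch only; not the algorithm from the CARR repository.
public class EntityMatcher {
    // Assumed stop words: legal suffixes and articles that say little about identity.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("inc", "llc", "corp", "corporation", "co", "the"));

    // Lowercase the name, split on anything that is not a letter or digit,
    // and keep the remaining words that are not stop words.
    public static Set<String> keywords(String name) {
        Set<String> words = new HashSet<>();
        for (String token : name.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                words.add(token);
            }
        }
        return words;
    }

    // Jaccard similarity: shared keywords divided by all distinct keywords.
    public static double similarity(String a, String b) {
        Set<String> wordsA = keywords(a);
        Set<String> wordsB = keywords(b);
        if (wordsA.isEmpty() || wordsB.isEmpty()) {
            return 0.0;
        }
        Set<String> shared = new HashSet<>(wordsA);
        shared.retainAll(wordsB);
        Set<String> all = new HashSet<>(wordsA);
        all.addAll(wordsB);
        return (double) shared.size() / all.size();
    }

    // Return the highest-scoring candidate, or empty if none reaches the threshold.
    public static Optional<String> bestMatch(String name, List<String> candidates,
                                             double threshold) {
        String best = null;
        double bestScore = 0.0;
        for (String candidate : candidates) {
            double score = similarity(name, candidate);
            if (score > bestScore) {
                bestScore = score;
                best = candidate;
            }
        }
        if (best != null && bestScore >= threshold) {
            return Optional.of(best);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Invented names standing in for entries from the two datasets.
        List<String> licensees =
                Arrays.asList("Sunrise Recovery Homes, Inc.", "Harbor View Care LLC");
        System.out.println(bestMatch("SUNRISE RECOVERY HOMES INC", licensees, 0.5));
    }
}
```

Raising the threshold in a sketch like this trades fewer false matches for more missed ones, which is one reason a matched list may still need later refinement.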