Argos-AI’s Applied Research program provides data-driven risk research for short- and long-term projects, aimed at solving some of the most pressing business and governmental challenges. To accelerate our success, we partner with private industry, academia, and other research organizations whose deep domain knowledge complements our Cyber Risk and Data Science expertise.
Argos-AI and its partners pursue game-changing advances in machine learning and artificial intelligence to mitigate operational risk. Our research combines advances in data engineering (data pipelines, data lakes) with innovations in machine learning, natural language processing, and human-computer-data interaction to solve some of the toughest challenges in operational risk management.
Our data engineering framework is at the core of the skill development, research, and services we deliver. The framework is problem-, data-, and technology-agnostic and comprises six steps: planning, identification, collection, processing, analysis, and visualization.
1. Planning is the most important step of all, and at its center is defining the problem to be solved. Without a clearly identifiable problem and/or questions to be answered, the framework will not work. We then define the resources and scope of the project.
2. Identification is where we break the problem down to its most basic elements to identify key IT/OT data and data sources. The data types are the internal and external data needed to deliver the project: structured and unstructured data; OS, application, network, and event logs; vulnerability scans; open-source intelligence (OSINT); and dark data found in enterprise application systems.
3. Collection of the identified data is done with database and web-scraping tools, custom scripts, log and file collectors, and OSINT feeds; the collected data is then stored. Problems during collection can degrade data integrity and introduce data bias.
4. Processing includes transforming the data into a usable format, then cleaning, enriching, calculating, aggregating, and normalizing it. Effective data processing is often time-consuming but is critical to future modeling.
5. Analysis is done using diverse data science techniques drawn from machine learning, natural language processing, and deep learning. We apply descriptive, diagnostic, predictive, or prescriptive models to answer our key problem questions.
6. Visualization of analyzed data is critical to analysts and decision makers. We recognize that results are consumed in different ways — through dashboards, reports, business intelligence tools, ad hoc queries, and APIs — and deliver accordingly. At this stage we assess the quality of the framework, data, and modeling, and the process is continuously repeated.
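The six steps above can be sketched as a minimal, linear pipeline. This is an illustrative sketch only: the `Pipeline` class, step functions, and data source names are hypothetical assumptions for demonstration, not Argos-AI's actual framework implementation.

```python
# Illustrative sketch of the six-step framework as a linear pipeline.
# All names (Pipeline, planning, etc.) are hypothetical, not Argos-AI's API.
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]

@dataclass
class Pipeline:
    """Runs the framework steps in order, threading a shared context dict."""
    steps: List[Callable[[Context], Context]] = field(default_factory=list)

    def run(self, context: Context) -> Context:
        for step in self.steps:
            context = step(context)
        return context

# 1. Planning: fail fast if no clear problem statement is defined.
def planning(ctx: Context) -> Context:
    if not ctx.get("problem"):
        raise ValueError("A clear problem statement is required before proceeding.")
    return ctx

# 2. Identification: break the problem into required data sources (examples only).
def identification(ctx: Context) -> Context:
    ctx["sources"] = ["event_logs", "vulnerability_scans", "osint_feeds"]
    return ctx

# 3. Collection: gather raw records from each identified source.
def collection(ctx: Context) -> Context:
    ctx["raw"] = [{"source": s, "value": "  RAW  "} for s in ctx["sources"]]
    return ctx

# 4. Processing: clean and normalize the raw records into a usable format.
def processing(ctx: Context) -> Context:
    ctx["clean"] = [{**r, "value": r["value"].strip().lower()} for r in ctx["raw"]]
    return ctx

# 5. Analysis: apply a (here trivially descriptive) model to the clean data.
def analysis(ctx: Context) -> Context:
    ctx["result"] = {"records": len(ctx["clean"])}
    return ctx

# 6. Visualization: render results for analysts and decision makers.
def visualization(ctx: Context) -> Context:
    ctx["report"] = f"Processed {ctx['result']['records']} records for: {ctx['problem']}"
    return ctx

pipeline = Pipeline([planning, identification, collection,
                     processing, analysis, visualization])
out = pipeline.run({"problem": "detect anomalous logins"})
print(out["report"])  # Processed 3 records for: detect anomalous logins
```

Because each step takes and returns the same context dict, steps can be swapped or repeated, which matches the framework's emphasis on continuously reassessing and rerunning the process.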