Introduction
Data is the lifeblood of risk assessment and analytics. Every data scientist dreams of having access to clean, reliable, unbiased, and labeled data in large volumes. Data is ubiquitous, yet elusive: it is scattered across various sources, often fragmented and incomplete. It's no surprise that, by some estimates, over 90% of the time spent on any AI project is dedicated to data preparation.
To conduct a thorough risk assessment, we must analyze risk signals embedded in a variety of data sources, both micro and macro. In this blog, I explore the significance of each in risk assessment, then dive into the concept of aggregated intelligence, highlighting its advantages and how the data network effect enhances the efficacy of overall risk assessment.
Micro vs. Macro Data
Micro data sources offer detailed, individual-level information about a business, such as firmographic and technographic data. Macro data, on the other hand, provides much broader context, such as industry trends, economic indicators, or threat intelligence, offering an external market view of the risk.
While having direct access to specific data points for a business is valuable, relying on it alone leads to a narrow view of the risk: it is limited in scope and does not capture the broader trends or patterns that shape risk. Macro data, with its large-scale view, fills these gaps and provides a more holistic understanding. Incorporating macro context, such as industry trends or the threat landscape, significantly improves the overall picture of risk.
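To make this concrete, here is a minimal sketch of blending a firm-level (micro) score with market-level (macro) signals into one risk view. The function name, weights, and scores are illustrative assumptions, not our production model:

```python
# A hedged sketch of blending micro and macro risk signals.
# All names, weights, and values here are hypothetical.
def blended_risk_score(micro_score: float,
                       industry_trend: float,
                       threat_level: float,
                       w_micro: float = 0.6,
                       w_macro: float = 0.4) -> float:
    """Combine a firm-level (micro) score with market-level (macro) context."""
    macro_score = (industry_trend + threat_level) / 2
    return w_micro * micro_score + w_macro * macro_score

# A firm that looks clean in isolation (low micro score) can still be
# elevated by a deteriorating industry threat landscape.
print(blended_risk_score(micro_score=0.2, industry_trend=0.7, threat_level=0.8))
```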
Aggregated Intelligence
Aggregated intelligence is the practice of gathering and analyzing large amounts of data from various sources to derive insights and make informed decisions. It involves combining data from multiple sources to identify patterns, trends, and correlations that are not apparent when looking at individual data points. Aggregated intelligence benefits from the Data Network Effect (DNE): each additional data point increases the value of the dataset as a whole.
Businesses in the same industry vertical often encounter similar cyber threats and vulnerabilities: they tend to follow common industry practices, use similar technologies, and comply with the same regulatory requirements. Even people, often considered the weakest link in the cyber kill chain, move between roles and organizations within the industry, carrying the same habits and exposures with them. At Cowbell, we have over forty-five million businesses in our risk pool, and the collective intelligence of the pool improves as more businesses join or as each business contributes more. This collective intelligence is akin to macro data, but it is derived directly from the micro data of the businesses in the pool.
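As a toy illustration of the data network effect, the sketch below shows how a pooled estimate of a hypothetical industry incident rate becomes more accurate as more businesses contribute observations. All numbers are synthetic:

```python
# A toy sketch of the data network effect: the pooled estimate of a
# risk signal gets more accurate as more businesses contribute data.
import random

random.seed(0)
TRUE_INCIDENT_RATE = 0.08  # hypothetical industry-wide incident rate

for pool_size in (100, 10_000, 1_000_000):
    # Each business contributes one observation: incident or not.
    observations = [random.random() < TRUE_INCIDENT_RATE for _ in range(pool_size)]
    estimate = sum(observations) / pool_size
    print(f"pool of {pool_size:>9,} businesses -> estimated rate {estimate:.4f}")
```

The larger the pool, the closer the estimate lands to the true rate, which is the same dynamic that makes each new business in the risk pool valuable to every other.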
Example: Aggregating claims and cyber incident data across various industries helps us understand commonly exploited vulnerabilities. This aggregated intelligence is then used to assess the cyber risk of individual businesses within those industries.
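A minimal sketch of that aggregation step, assuming hypothetical sources, column names, and CVE values, might look like this in pandas:

```python
# A minimal sketch of aggregated intelligence using pandas.
# The sources, column names, and values are hypothetical.
import pandas as pd

# Incident records from two independent (hypothetical) sources.
claims = pd.DataFrame({
    "industry": ["healthcare", "retail", "healthcare"],
    "vulnerability": ["CVE-2021-44228", "CVE-2017-0144", "CVE-2021-44228"],
})
scans = pd.DataFrame({
    "industry": ["healthcare", "finance", "retail"],
    "vulnerability": ["CVE-2021-44228", "CVE-2019-19781", "CVE-2017-0144"],
})

# Individually, each source is sparse; pooled together, per-industry
# exploitation patterns start to emerge.
pooled = pd.concat([claims, scans], ignore_index=True)
pattern = (pooled.groupby(["industry", "vulnerability"])
                 .size()
                 .rename("observations")
                 .reset_index()
                 .sort_values("observations", ascending=False))
print(pattern)
```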
A data imputation model is a statistical method for filling in missing values in a dataset, and missing data is a common issue in data science projects. Imputation models leverage aggregated intelligence: patterns and trends learned from a large dataset are used to predict and fill in missing values more accurately. By pairing imputation models with macro and micro data, we can make more informed decisions and better manage the risk in our portfolios.
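As one possible sketch, scikit-learn's KNNImputer fills each gap from the most similar rows in the pool; the features and values below are hypothetical:

```python
# A minimal sketch of model-based imputation, using scikit-learn's
# KNNImputer as one concrete choice (feature names are hypothetical).
import numpy as np
from sklearn.impute import KNNImputer

# Rows are businesses; columns are risk features, e.g.
# [employee_count, open_ports, patch_cadence_days]. np.nan = unknown.
X = np.array([
    [120.0,  14.0,  30.0],
    [ 95.0,  11.0, np.nan],   # patch cadence not observed
    [110.0, np.nan,  28.0],   # port scan unavailable
    [400.0,  52.0,  90.0],
])

# Each gap is filled from the k most similar businesses in the pool,
# so the imputation improves as the pool grows.
imputer = KNNImputer(n_neighbors=2)
X_filled = imputer.fit_transform(X)
print(np.round(X_filled, 1))
```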
Example: Knowing which technology vendors are used by businesses in a specific vertical helps us infer the vendors likely used by other businesses of similar size in the same industry. This is one example of a data imputation model filling gaps in our understanding.
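A hedged sketch of that idea: fill a business's unknown vendor with the most common vendor among its industry and size-band peers. The data, column names, and helper function are hypothetical:

```python
# A sketch of categorical imputation by peer group (hypothetical data).
import pandas as pd

firms = pd.DataFrame({
    "industry":     ["retail", "retail", "retail", "finance", "finance"],
    "size_band":    ["small",  "small",  "small",  "mid",     "mid"],
    "email_vendor": ["Acme",   "Acme",   None,     "Globex",  None],
})

def peer_mode(s: pd.Series) -> pd.Series:
    """Fill missing values with the peer group's most frequent value."""
    mode = s.mode()
    return s.fillna(mode.iloc[0]) if not mode.empty else s

# Group businesses by industry and size, then impute within each group.
firms["email_vendor"] = (firms.groupby(["industry", "size_band"])["email_vendor"]
                              .transform(peer_mode))
print(firms)
```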
Negative Framing
While data imputation models are valuable, there are cases where filling in the gaps is not advisable. Sometimes the absence of data itself carries meaning and provides valuable insight, a concept known as negative framing. For instance, the absence of data on security incidents could suggest either effective security measures or inadequate monitoring, and imputing a value would erase that signal.
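In those cases, one option is to expose the missingness itself as a feature rather than imputing over it. Here is a minimal sketch using scikit-learn's MissingIndicator; the feature and data are hypothetical:

```python
# A minimal sketch of keeping absence as a signal instead of imputing:
# MissingIndicator emits a boolean "was this missing?" feature that a
# downstream risk model can weigh on its own.
import numpy as np
from sklearn.impute import MissingIndicator

# Hypothetical feature: days since last reported security incident.
# np.nan could mean "no incidents" or "no monitoring"; we let the
# model see the missingness rather than guessing a value.
X = np.array([[120.0], [np.nan], [45.0], [np.nan]])

indicator = MissingIndicator(features="all")
was_missing = indicator.fit_transform(X)
print(was_missing)  # [[False], [ True], [False], [ True]]
```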
Conclusion
The integration of aggregated intelligence and data imputation models has revolutionized risk assessment. By synthesizing information from diverse micro and macro data sources, these models offer a more holistic and accurate view of potential risks. Using advanced algorithms to impute missing data helps ensure that decisions are based on the most complete information available, significantly enhancing the reliability of risk assessments. More robust risk assessment also improves operational efficiency, supports compliance, and fosters a proactive culture of risk management.
As we continue to refine these technologies and expand our data networks, the potential to predict and mitigate risks before they materialize will only increase. This progression not only benefits data scientists and analysts but also fortifies industries against unforeseen challenges, securing a more stable and predictable future.