Background
Many insurance companies use Excel to manipulate and analyze their model point files for projecting future liabilities, pricing products, and reserving. The model point files tend to be very large, often exceeding Excel's limits.
Companies regularly try using Excel’s built-in functions and pivot tables to aggregate, transform, and analyze these model points. However, this is not sustainable due to several issues:
- Data size: Model point files can be millions of rows, leading to slow performance and frequent crashes in Excel.
- Complexity: Calculations tend to be complex, involving nested formulas, manual adjustments, and multiple linked spreadsheets.
- Errors: Manual processes introduce errors that are difficult to trace. Auditing and verifying calculations is time-consuming and error-prone.
- Collaboration: Multiple users editing different versions of spreadsheets leads to confusion and inconsistency in the results.
Benefits of a Migration
Companies that decide to migrate from Excel to a combination of SQL for data storage and manipulation, and R for complex calculations and reporting will have the following benefits:
- Handle larger data sets more efficiently.
- Automate processes to reduce manual intervention and improve accuracy.
- Improve performance for querying, updating, and transforming data.
- Enable advanced actuarial calculations (e.g., mortality modeling, profit testing) using more robust tools.
- Ensure data integrity and reproducibility across departments.
- Streamline collaboration through a centralized, auditable workflow.
Implementation
1. Data Storage in SQL:
- The company moves its model point files into a relational database, with each policy record stored in tables. SQL databases are ideal because they:
- Efficiently store and retrieve millions of rows.
- Allow complex data transformations via SQL queries.
- Scale well as the company’s data grows.
- Indexes can be applied to frequently queried fields like policy number, premium, and age to speed up queries.
2. Data Manipulation and Aggregation in SQL:
- SQL queries can replace Excel's pivot tables and formulas for summarizing and grouping data. Examples include:
- Aggregating policies by age group to calculate total premiums, claims liabilities, or exposure.
- Filtering policies to look at specific product types or regions.
- Running updates to reflect changes in assumptions (e.g., updated mortality tables).
- SQL stored procedures can be written to automate routine data manipulations, ensuring consistency and reducing manual input.
3. Complex Calculations in R:
- Once the data is queried and filtered in SQL, R can be used for complex actuarial calculations. For example:
- Survival models: The company can use R's survival package to model policyholder mortality rates.
- Profit testing: R scripts calculated expected profits by projecting future cash flows under various interest rate scenarios.
- Risk factor analysis: R’s glm function was used to build generalized linear models to assess the impact of different factors (e.g., age, gender, smoking status) on mortality or lapse rates.
- Parallel processing in R allowed for the simultaneous calculation of thousands of model point records, vastly improving processing time compared to Excel.
4. Data Visualization and Reporting:
- R’s ggplot2 package and Shiny apps can create dynamic dashboards and visualizations, allowing management to view up-to-date projections and results interactively, replacing Excel’s static charts.
- Markdown reports can be generated automatically from R scripts, reducing the need for manual report creation and ensuring consistency in presentation.
Benefits Realized
- Performance Improvements:
- SQL handled large datasets (millions of model points) much faster than Excel.
- Data retrieval that took hours or days in Excel was completed in seconds or minutes using SQL.
- R's parallel processing capabilities significantly reduced the time needed for actuarial calculations.
- Automation and Reproducibility:
- The migration to R and SQL allows companies to fully automate their processes.
- Version control for R scripts using Git enables full traceability of changes and collaboration across teams.
- Accuracy and Reduced Errors:
- SQL’s built-in data integrity features ensured that only valid data was entered.
- R scripts were more reliable and transparent than Excel formulas, reducing the risk of human errors.
- Advanced Analysis:
- R’s statistical tools enabled advanced analyses previously difficult in Excel.
- Predictive models in R were integrated with the SQL database for real-time adjustments.
- Scalability and Flexibility:
- The SQL and R solution scaled effortlessly, handling larger data sets without requiring significant changes.
- External data could be easily integrated into the SQL database and R scripts.
Challenges Faced
- Learning Curve: The actuarial team had to invest time in learning SQL and R, which paid off in improved processes.
- Data Migration: Migrating historical data from Excel to SQL required careful planning, including data cleaning.
Conclusion
By transitioning from Excel to SQL and R, the company achieved significant improvements in performance, accuracy, and scalability. This transformation enabled better decision-making, increased efficiency, and reduced the risk of errors, positioning the company for future growth.
This case exemplifies how companies handling large model point files and actuarial data can leverage the power of R and SQL to overcome the limitations of Excel, streamline processes, and enhance their analytical capabilities.