# Option Pricing Engine with Market Data Pipeline 
## 📌 Project Description

This repository implements a **production-style quantitative valuation pipeline** for equity options, combining high-performance pricing models with a full data and calibration workflow.

The system goes beyond a standalone pricer: it integrates **market data ingestion, structured storage, numerical pricing, and volatility surface calibration** into a single reproducible framework.

### The goal of this project

This project is intended as a **modular foundation for quantitative modeling and experimentation** in option pricing and financial time series.

Rather than implementing a single model, the system is designed to support:

- benchmarking different pricing approaches (analytical, simulation-based, and data-driven),
- comparing numerical methods under realistic market data conditions,
- and extending toward more advanced workflows such as statistical learning and model calibration.

A key objective is to create an environment where **new ideas from research can be implemented, tested, and evaluated within a consistent pipeline**, rather than in isolated scripts or notebooks.

This includes:

- integrating alternative pricing methodologies into a shared framework,
- analyzing model behavior across time and market regimes,
- and building reproducible pipelines for both numerical and data-driven approaches.

Ultimately, the project aims to bridge:

- **theoretical models** (e.g. stochastic processes, volatility parameterizations),
- **numerical methods** (simulation, calibration),
- and **data-driven techniques** (time-series analysis, machine learning),

within a single, extensible system that moves closer to a production-grade pipeline.

### What the system does

The system supports the following workflow:

- Ingest listed option market data (Yahoo Finance)
- Normalize and store it in a relational database (PostgreSQL)
- Compute implied volatilities from observed prices
- Calibrate parametric volatility surfaces (SVI)
- Run pricing models (Black-Scholes, Monte Carlo)
- Expose fast pricing routines via Python for analysis and research
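
The pricing and implied-volatility steps above can be sketched in a few lines of standard-library Python. This is only an illustration, not the repository's C++ engine or its `qengine` API: the function names (`bs_call_price`, `implied_vol_call`) are hypothetical, the option is assumed to be a European call on a non-dividend-paying underlying, and plain bisection is used as one possible inversion method.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call_price(spot: float, strike: float, rate: float,
                  vol: float, maturity: float) -> float:
    """Black-Scholes price of a European call (no dividends assumed)."""
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(spot / strike)
          + (rate + 0.5 * vol * vol) * maturity) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t
    return spot * norm_cdf(d1) - strike * math.exp(-rate * maturity) * norm_cdf(d2)


def implied_vol_call(price: float, spot: float, strike: float, rate: float,
                     maturity: float, lo: float = 1e-6, hi: float = 5.0,
                     tol: float = 1e-10) -> float:
    """Invert the call price for volatility by bisection on [lo, hi]."""
    # Reject quotes outside the model-free bounds; no volatility can match them.
    lower_bound = max(spot - strike * math.exp(-rate * maturity), 0.0)
    if not (lower_bound < price < spot):
        raise ValueError("price violates no-arbitrage bounds")
    # The call price is increasing in volatility, so bisection is safe.
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bs_call_price(spot, strike, rate, mid, maturity) > price:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)
```

Bisection is slower than Newton-type schemes but does not depend on vega, which makes it a reasonable baseline for deep in- or out-of-the-money quotes where vega is close to zero.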

---

This project aims to **unify these components into a coherent system**, with clear interfaces between:

- **Data layer** (ingestion, storage, schema)
- **Model layer** (C++ pricing engines)
- **Analytics layer** (Python calibration and diagnostics)
- **Execution layer** (reproducible pipelines)

---

### Technology choices

The architecture deliberately combines multiple technologies, each chosen for a specific role:

- **C++ (C++20)**  
  Used for performance-critical pricing components (Monte Carlo, closed-form models) and clean domain modeling.

- **Python**  
  Used for orchestration, data processing, calibration (SVI), and rapid experimentation.

- **pybind11**  
  Bridges C++ and Python, enabling high-performance models to be used in flexible workflows.

- **PostgreSQL + SQLAlchemy**  
  Provides structured, queryable storage for market data and supports reproducible calibration pipelines.
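
To illustrate what the performance-critical Monte Carlo component computes, here is a minimal pure-Python reference sketch. It is not the C++ implementation: it assumes a European call under geometric Brownian motion with a single terminal time step, and the function name `mc_call_price` and its parameters are hypothetical. A slow reference like this can be handy for cross-checking a compiled engine against the closed-form price.

```python
import math
import random


def mc_call_price(spot: float, strike: float, rate: float, vol: float,
                  maturity: float, n_paths: int = 100_000, seed: int = 42):
    """Return (price, standard_error) for a European call under GBM."""
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    drift = (rate - 0.5 * vol * vol) * maturity
    diffusion = vol * math.sqrt(maturity)
    discount = math.exp(-rate * maturity)

    total = 0.0
    total_sq = 0.0
    for _ in range(n_paths):
        z = rng.gauss(0.0, 1.0)
        # Exact terminal value of GBM, so no time-discretization error.
        terminal = spot * math.exp(drift + diffusion * z)
        payoff = max(terminal - strike, 0.0)
        total += payoff
        total_sq += payoff * payoff

    mean = total / n_paths
    variance = max(total_sq / n_paths - mean * mean, 0.0)
    std_error = discount * math.sqrt(variance / n_paths)
    return discount * mean, std_error
```

The standard error shrinks with the square root of the number of paths, which is the main reason this loop is worth moving to compiled, parallel code.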

---

### Key challenges addressed

This project tackles several non-trivial challenges:

- **Bridging performance and usability**  
  Integrating a C++ pricing engine into a Python-driven research pipeline.

- **Data consistency and reproducibility**  
  Designing a schema and ingestion process that supports reliable downstream calibration.

- **Implied volatility inversion and calibration**  
  Implementing stable numerical inversion and robust SVI fitting under noisy market data.

- **System design over isolated models**  
  Ensuring that data, models, and workflows interact cleanly as a unified system.
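
For orientation, the raw SVI parameterization expresses total implied variance as `w(k) = a + b * (rho * (k - m) + sqrt((k - m)^2 + sigma^2))`, where `k` is log-moneyness. The sketch below is an assumption-laden illustration rather than the code in `src/ImpliedVolatility/`: the class `RawSVI` and the helper `sum_squared_error` are hypothetical names. It evaluates one slice, enforces the standard parameter constraints, and shows the kind of least-squares objective a calibrator might minimize.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RawSVI:
    """Raw SVI parameters for a single maturity slice."""
    a: float      # vertical level of total variance
    b: float      # slope of the wings
    rho: float    # skew / rotation
    m: float      # horizontal shift
    sigma: float  # smoothness of the at-the-money region

    def __post_init__(self) -> None:
        if self.b < 0.0:
            raise ValueError("b must be non-negative")
        if abs(self.rho) >= 1.0:
            raise ValueError("|rho| must be below 1")
        if self.sigma <= 0.0:
            raise ValueError("sigma must be positive")
        # The minimum of w(k) is a + b * sigma * sqrt(1 - rho^2).
        if self.a + self.b * self.sigma * math.sqrt(1.0 - self.rho ** 2) < 0.0:
            raise ValueError("total variance would become negative")

    def total_variance(self, k: float) -> float:
        """Total implied variance w(k) at log-moneyness k."""
        d = k - self.m
        return self.a + self.b * (self.rho * d + math.sqrt(d * d + self.sigma ** 2))

    def implied_vol(self, k: float, maturity: float) -> float:
        """Implied volatility, using w(k) = vol^2 * maturity."""
        return math.sqrt(self.total_variance(k) / maturity)


def sum_squared_error(params: RawSVI, log_moneyness, market_total_variance) -> float:
    """Least-squares objective between model and observed total variance."""
    return sum((params.total_variance(k) - w) ** 2
               for k, w in zip(log_moneyness, market_total_variance))
```

These constraints only keep total variance non-negative. They do not by themselves rule out butterfly or calendar arbitrage, which is why arbitrage-free construction needs additional checks.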

---

### Future directions

Planned improvements focus on moving further toward production-grade systems:

- Arbitrage-free implied volatility surface construction
- More robust calibration and smoothing techniques
- Performance optimization (parallel Monte Carlo, batching)
- Extension to additional data sources and APIs
- Improved testing of end-to-end data and calibration pipelines
- Comparison of classical stochastic models with data-driven approaches for pricing and volatility forecasting

## What is included

- `cpp/`: core C++ pricing library (Monte Carlo + Black-Scholes closed form), DB ingestion hooks, and pybind bindings.
- `qengine/`: Python package exposing the native extension (`import qengine`).
- `src/ImpliedVolatility/`: SVI calibration and implied-volatility tooling.
- `src/data/`: data ingestion, SQL schema, and analytics helpers.
- `tests/`: C++ unit tests (GoogleTest).
- `scripts/`: operational scripts, including PostgreSQL setup.
- `docs/`: Doxygen configuration and generated API docs (ignored in git for publication).

## Quickstart

### 1) Clone and create a Python environment

```bash
# Run these commands from the root of your local clone
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -e .
pip install pandas yfinance sqlalchemy psycopg2-binary matplotlib scipy
```

### 2) Configure environment variables

```bash
cp .env.example .env
```

Then edit `.env` with your local database credentials.

### 3) Create database and schema

Use the idempotent setup script:

```bash
set -a; source .env; set +a  # export the variables so the Python process inherits them
python scripts/setup_postgres.py
```

This script creates/updates:
- database role (`DB_USER`)
- database (`DB_NAME`)
- tables/indexes from `src/data/sql/schema.sql`
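
As a rough illustration of how these settings might be turned into a connection string, the sketch below builds a SQLAlchemy-style PostgreSQL URL from environment variables. Only `DB_USER` and `DB_NAME` appear in this README; `DB_PASSWORD`, `DB_HOST`, `DB_PORT`, and the helper name `build_postgres_url` are assumptions made for the example, so check `.env.example` for the names the project actually uses.

```python
import os
from urllib.parse import quote_plus


def build_postgres_url(env) -> str:
    """Assemble a postgresql+psycopg2 URL from a mapping of settings."""
    user = env["DB_USER"]
    name = env["DB_NAME"]
    password = env.get("DB_PASSWORD", "")   # assumed variable name
    host = env.get("DB_HOST", "localhost")  # assumed variable name
    port = env.get("DB_PORT", "5432")       # assumed variable name

    # Percent-encode credentials so characters such as '@' or '/'
    # cannot be mistaken for URL delimiters.
    auth = quote_plus(user)
    if password:
        auth += ":" + quote_plus(password)
    return f"postgresql+psycopg2://{auth}@{host}:{port}/{name}"


if __name__ == "__main__":
    # Avoid printing the URL in real use, since it contains the password.
    url = build_postgres_url(os.environ)
    print("connection URL built for database:", url.rsplit("/", 1)[-1])
```

Reading the values from the environment instead of hard-coding them keeps credentials out of version control.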

### 4) Build C++ extension and run tests

```bash
cmake -S . -B build
cmake --build build -j
ctest --test-dir build --output-on-failure
```

### 5) Run Yahoo options ingestion

```bash
set -a; source .env; set +a  # export the variables so the Python process inherits them
python src/data/ingestion/ingest_yahoo_options.py
```

`PIPELINE_SYMBOLS` in `.env` controls which symbols are ingested (comma-separated, e.g. `SPY,AAPL,QQQ`).
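
A comma-separated setting like this is typically parsed along the following lines. The helper name `parse_symbols` is hypothetical, and the whitespace trimming, upper-casing, and de-duplication are assumptions about sensible behavior, not a description of what `ingest_yahoo_options.py` does.

```python
import os


def parse_symbols(raw: str) -> list[str]:
    """Split a comma-separated ticker list into clean, unique symbols."""
    symbols: list[str] = []
    for token in raw.split(","):
        symbol = token.strip().upper()
        # Skip empty entries (e.g. trailing commas) and repeated tickers.
        if symbol and symbol not in symbols:
            symbols.append(symbol)
    return symbols


# Falls back to an empty list when the variable is not set.
symbols = parse_symbols(os.environ.get("PIPELINE_SYMBOLS", ""))
```

With this sketch, a value such as `SPY, aapl,,QQQ,SPY` would be reduced to the three tickers `SPY`, `AAPL`, and `QQQ`, in their original order.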

## Generating C++ API docs

```bash
cmake --build build --target docs
```