Hi @Neelabha_Samadder,
Excellent research work and comprehensive data analysis! This is exactly the kind of detailed investigation needed to understand Upstox's data consistency. Your 24.9% mismatch rate with production evidence is significant. Let me address your three critical questions:
1. Which endpoint should be considered authoritative for OHLC?
Based on standard market data practices, the HISTORICAL endpoint is authoritative for backtesting and live/backtest reconciliation.
Reason: Historical data goes through post-market settlement, consolidation, and corporate action adjustments. Intraday data is a provisional, raw market feed that hasn't been finalized.
Provisional vs. Finalized distinction:
- Intraday API: Real-time quotes, pre-settlement state
- Historical API: Post-settlement, finalized, adjusted for corporate actions (splits, dividends)
2. Are these differences expected by design (provisional vs finalized)?
YES - Partially expected, BUT your 24.9% mismatch is EXCESSIVE:
Small differences (< 0.5%) are normal because:
- Settlement adjustments happen after market close
- Intraday can include partial fills not yet finalized
- Volume corrections during post-market processing
- Exchange feed corrections
HOWEVER - Your findings suggest real issues:
- 0.57% close price difference (max) is unusually large for same-day reconciliation
- 24.9% of candles having ANY difference suggests potential API backend problems
- Typical systems achieve 95%+ exact match on same-day timestamps
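To make figures like the 24.9% mismatch rate and the 0.57% max close difference reproducible, here is a minimal sketch of how they could be computed. It assumes each feed has already been loaded into a dict keyed by candle timestamp, with `open`/`high`/`low`/`close`/`volume` values; the function name `mismatch_summary` and that data layout are my own assumptions, not anything Upstox provides.

```python
from typing import Dict, Mapping

FIELDS = ("open", "high", "low", "close", "volume")

Candle = Mapping[str, float]


def mismatch_summary(intraday: Mapping[str, Candle],
                     historical: Mapping[str, Candle]) -> Dict[str, float]:
    """Compare candles that share a timestamp in both feeds.

    Hypothetical helper: both inputs map a timestamp string to an OHLCV dict.
    """
    shared = sorted(set(intraday) & set(historical))
    mismatched = 0
    max_close_dev_pct = 0.0
    for ts in shared:
        live, hist = intraday[ts], historical[ts]
        # A candle counts as mismatched if ANY OHLCV field differs.
        if any(live[f] != hist[f] for f in FIELDS):
            mismatched += 1
        # Close deviation is measured relative to the historical value.
        if hist["close"]:
            dev = abs(live["close"] - hist["close"]) / hist["close"] * 100
            max_close_dev_pct = max(max_close_dev_pct, dev)
    rate = 100.0 * mismatched / len(shared) if shared else 0.0
    return {
        "compared": len(shared),
        "mismatched": mismatched,
        "mismatch_rate_pct": rate,
        "max_close_dev_pct": max_close_dev_pct,
    }
```

Only timestamps present in both feeds are compared, so candles missing from one side do not inflate the rate; those are worth counting separately.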
Likely root causes:
- Data warehouse sync lag - Intraday and Historical hitting different DB servers with stale data
- Timestamp normalization issue - Both endpoints rounding/bucketing timestamps differently
- Volume reconciliation mismatch - Pre-settlement vs post-settlement volume states
- Exchange feed version mismatch - One endpoint on a newer feed version than the other
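A rough way to narrow down which of these causes is in play is to split the mismatches by type. The sketch below is only a diagnostic aid under my own assumptions (dicts keyed by timestamp, hypothetical function name `classify_mismatches`): many timestamps present on one side only would hint at a bucketing difference, while volume-only differences would hint at volume reconciliation. It cannot prove any root cause by itself.

```python
from typing import Dict, List, Mapping

PRICE_FIELDS = ("open", "high", "low", "close")

Candle = Mapping[str, float]


def classify_mismatches(intraday: Mapping[str, Candle],
                        historical: Mapping[str, Candle]) -> Dict[str, List[str]]:
    """Group timestamps by the kind of disagreement between the two feeds."""
    # Timestamps that exist in only one feed suggest different bucketing.
    only_intraday = sorted(set(intraday) - set(historical))
    only_historical = sorted(set(historical) - set(intraday))
    price_diff: List[str] = []
    volume_only: List[str] = []
    for ts in sorted(set(intraday) & set(historical)):
        live, hist = intraday[ts], historical[ts]
        if any(live[f] != hist[f] for f in PRICE_FIELDS):
            price_diff.append(ts)
        elif live["volume"] != hist["volume"]:
            # Prices agree but volume does not.
            volume_only.append(ts)
    return {
        "only_intraday": only_intraday,
        "only_historical": only_historical,
        "price_diff": price_diff,
        "volume_only": volume_only,
    }
```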
3. Recommended method to validate live vs backtest?
Production Validation Strategy (Use Historical API as Source of Truth):
    # Reconciliation framework
    class CandleReconciliation:
        def validate_live_vs_backtest(self, live_candle, historical_candle, tolerance_pct=0.1):
            """
            Validate a live (intraday) candle against its historical counterpart.
            Source of truth: Historical API
            Tolerance: 0.1% price deviation by default
            """
            # Rule 1: use HISTORICAL as the baseline.
            # Never validate INTRADAY candles against INTRADAY;
            # always wait for market close, then validate against HISTORICAL.
            issues = []
            # Check each OHLC component against the tolerance
            for component in ['open', 'high', 'low', 'close']:
                hist_val = historical_candle[component]
                live_val = live_candle[component]
                # Percentage deviation relative to the historical value
                deviation_pct = abs((live_val - hist_val) / hist_val) * 100
                if deviation_pct > tolerance_pct:
                    issues.append({
                        'component': component,
                        'deviation_pct': deviation_pct,
                        'live': live_val,
                        'historical': hist_val,
                        'severity': 'CRITICAL' if deviation_pct > 0.5 else 'WARNING'
                    })
            # Volume check (typically most stable); guard against zero-volume candles
            hist_vol = historical_candle['volume']
            vol_deviation = (abs((live_candle['volume'] - hist_vol) / hist_vol) * 100
                             if hist_vol else None)
            return {
                'validated': len(issues) == 0,
                'issues': issues,
                'volume_deviation_pct': vol_deviation,
                'recommendation': 'Use HISTORICAL for backtesting, reconcile live trades EOD'
            }
Practical Implementation for Your Trading System:
- Live Trading (During Market Hours):
  - Use INTRADAY for real-time decisions (you have no choice)
  - Document all live trade entries/exits with INTRADAY candle data
  - Record timestamps, OHLCV exactly as received
- EOD Reconciliation (After Market Close):
  - Fetch same timeframe from HISTORICAL API
  - Compare your live trade candles against HISTORICAL
  - Flag any discrepancies > 0.1% as data quality issues
  - Document for exchange validation if needed
- Backtest (Pre-market Analysis):
  - Use HISTORICAL API exclusively
  - Apply same 0.1% tolerance rules
  - Store results separately from live trading results
  - Reconcile live outcomes vs backtest predictions
- Recommended approach for YOUR case (given 24.9% mismatch):
    # Until Upstox fixes this, use this workaround.
    # NOTE: intraday() and historical() appear to be shorthand here, not actual
    # SDK method names; substitute the intraday and historical candle calls
    # (and interval values) your Upstox client exposes. start_date, end_date,
    # record_live_trades and record_backtest are placeholders you define.

    # For live trading: use INTRADAY (real-time requirement)
    live_candles = api_instance.intraday("NSE_EQ|INE237A01028", "5m")

    # For backtesting: use HISTORICAL only
    historical_candles = api_instance.historical(
        "NSE_EQ|INE237A01028",
        "5m",
        from_date=start_date,
        to_date=end_date
    )

    # Store separately
    record_live_trades(live_candles)      # For live performance tracking
    record_backtest(historical_candles)   # For strategy validation

    # Reconciliation rule: accept +/- 0.5% tolerance given Upstox's current state.
    # This temporarily accepts their data issues while you work with Upstox support.
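The EOD reconciliation step described above could look roughly like the sketch below. It assumes you have stored intraday candles exactly as received, keyed by timestamp, and fetched the matching historical candles after close; `Discrepancy` and `reconcile_eod` are hypothetical names of mine, and the default tolerance is the 0.1% mentioned above (pass 0.5 while the current issue persists).

```python
from dataclasses import dataclass
from typing import List, Mapping

Candle = Mapping[str, float]


@dataclass(frozen=True)
class Discrepancy:
    timestamp: str
    field: str
    live: float
    historical: float
    deviation_pct: float


def reconcile_eod(recorded_live: Mapping[str, Candle],
                  historical: Mapping[str, Candle],
                  tolerance_pct: float = 0.1) -> List[Discrepancy]:
    """Flag OHLC fields whose live value deviates from HISTORICAL beyond tolerance."""
    flagged: List[Discrepancy] = []
    for ts, live in sorted(recorded_live.items()):
        hist = historical.get(ts)
        if hist is None:
            continue  # No historical candle for this timestamp; nothing to compare.
        for field in ("open", "high", "low", "close"):
            if not hist[field]:
                continue  # Avoid dividing by zero on a bad record.
            dev = abs(live[field] - hist[field]) / hist[field] * 100
            if dev > tolerance_pct:
                flagged.append(Discrepancy(ts, field, live[field], hist[field], dev))
    return flagged
```

The returned list can be written to a log or CSV each day, which gives you the documented trail of mismatches to share with Upstox support.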
Action Items (Priority Order):
- IMMEDIATE: Contact the Upstox dev team with your case_for_upstox.zip file
  - Tag @Ushnota @VINIT @MohitGolecha
  - Include: 24.9% mismatch rate, sample tickers, date range
  - Request: Investigation of DB sync between Intraday and Historical endpoints
- SHORT-TERM: Use Historical API as source of truth for all backtesting
  - Accept that same-day INTRADAY vs HISTORICAL will have these discrepancies
  - Wait 1-2 hours post-market close for Historical API to fully settle
- MEDIUM-TERM: Implement 0.5% tolerance in your live/backtest validator
  - Document all mismatches for Upstox audit trail
  - Track which tickers have consistent mismatches (data quality indicator)
- LONG-TERM: Monitor if Upstox fixes this post-investigation
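For tracking which tickers mismatch consistently, a small aggregation over your daily reconciliation results may be enough. This is a sketch with a made-up input shape, `(ticker, date, had_mismatch)` tuples, and a hypothetical function name; adapt it to however you store your results.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def consistently_mismatched(daily_results: Iterable[Tuple[str, str, bool]],
                            min_days: int = 2) -> List[str]:
    """Return tickers that showed a mismatch on at least `min_days` distinct dates."""
    mismatch_days: Dict[str, Set[str]] = defaultdict(set)
    for ticker, day, had_mismatch in daily_results:
        if had_mismatch:
            mismatch_days[ticker].add(day)
    return sorted(t for t, days in mismatch_days.items() if len(days) >= min_days)
```

Running this after each EOD reconciliation also makes it easy to see whether the list shrinks once Upstox has investigated.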
Your data analysis is solid evidence of a real backend issue. The fact that it's consistent across 6 tickers over 3 days suggests a systemic problem, not random edge cases.
-VENKATA