🎲 Random Number Generation: The Art of Controlled Chaos

Imagine you're a casino owner who needs perfectly fair dice, but also needs to replay exact games for security reviews. That's random number generation in NumPy! 🎰 It gives you randomness when you need it, reproducibility when you require it, and distributions for every occasion - from simulating quantum physics to generating synthetic datasets for machine learning.

The Random Universe: More Than Just Dice Rolls 🌌

Random numbers are the lifeblood of data science. They power Monte Carlo simulations, bootstrap statistics, neural network initialization, data augmentation, A/B testing, and so much more. But here's the secret: computer-generated random numbers aren't truly random - they're pseudorandom, following deterministic algorithms that appear random. This is actually a feature, not a bug!

graph TB A[Random Number Generation] --> B[Pseudorandom] A --> C[Uses] B --> D[Seed-based] B --> E[Reproducible] B --> F[Deterministic] C --> G[Simulation] C --> H[Sampling] C --> I[Testing] C --> J[ML Training] G --> K[Monte Carlo] H --> L[Bootstrap] I --> M[Synthetic Data] J --> N[Weight Init] style A fill:#667eea style B fill:#4ecdc4 style C fill:#ffd93d

The Slot Machine Analogy 🎰

Think of NumPy's random number generator as a sophisticated slot machine. The seed is like setting the starting position of all the reels. Pull the lever (call a random function) and you get a result. Reset to the same seed, pull again, and you get the exact same sequence! This is crucial for debugging, reproducible research, and testing.

import numpy as np
import matplotlib.pyplot as plt

# Random Number Generation Fundamentals 🎲

def explore_random_basics():
    """
    Understanding the foundations of random number generation.
    Every data scientist's essential toolkit!
    """
    print("🎲 Random Number Generation Basics")
    print("=" * 60)
    
    # The Random Number Generator (RNG)
    rng = np.random.RandomState(42)  # Create a generator with seed
    
    print("Creating Random Numbers:")
    print("-" * 40)
    
    # Basic random generation
    random_float = rng.random()  # Single random float [0, 1)
    random_array = rng.random(5)  # Array of random floats
    random_matrix = rng.random((3, 3))  # Random matrix
    
    print(f"Single random float: {random_float:.6f}")
    print(f"Random array: {random_array}")
    print(f"Random matrix:\n{random_matrix}")
    
    # Different ranges
    print("\n📊 Random Numbers in Different Ranges:")
    
    # Random integers
    random_ints = rng.randint(0, 10, size=10)  # 10 random ints [0, 10)
    print(f"Random integers (0-9): {random_ints}")
    
    # Random floats in custom range
    random_range = rng.uniform(-5, 5, size=5)  # Uniform in [-5, 5]
    print(f"Random floats (-5 to 5): {random_range}")
    
    # Random choice from array
    options = ['red', 'green', 'blue', 'yellow']
    random_colors = rng.choice(options, size=10, replace=True)
    print(f"Random choices: {random_colors}")
    
    # Shuffling
    deck = np.arange(52)  # Card deck
    rng.shuffle(deck)  # In-place shuffle
    print(f"Shuffled deck (first 10): {deck[:10]}")

explore_random_basics()

# The Importance of Seeds 🌱
def demonstrate_seed_importance():
    """
    Seeds: The key to reproducibility in data science!
    """
    print("\n🌱 The Power of Seeds")
    print("=" * 60)
    
    # Without setting seed - different every time
    print("Without seed (run multiple times - different results):")
    for i in range(3):
        print(f"  Run {i+1}: {np.random.random(3).round(3)}")
    
    # With seed - same every time
    print("\nWith seed=42 (always same results):")
    for i in range(3):
        np.random.seed(42)
        print(f"  Run {i+1}: {np.random.random(3).round(3)}")
    
    # Practical example: Train/test split
    print("\n🔄 Reproducible Train/Test Split:")
    
    data = np.arange(100)
    
    # First split
    np.random.seed(123)
    np.random.shuffle(data)
    train1, test1 = data[:80], data[80:]
    
    # Reset and split again
    data = np.arange(100)
    np.random.seed(123)  # Same seed
    np.random.shuffle(data)
    train2, test2 = data[:80], data[80:]
    
    print(f"First split - test set (first 5): {test1[:5]}")
    print(f"Second split - test set (first 5): {test2[:5]}")
    print(f"Identical splits: {np.array_equal(test1, test2)}")

demonstrate_seed_importance()

Probability Distributions: Your Random Toolkit 📊

Different problems need different types of randomness. NumPy provides a rich collection of probability distributions, each with unique properties and use cases. Think of them as different dice - some fair, some weighted, some with infinite sides!

🎯 Uniform Distribution

Use Case: Equal probability for all outcomes

Real World: Dice rolls, random sampling, Monte Carlo

np.random.uniform(low=0, high=1, size=1000)

📈 Normal (Gaussian) Distribution

Use Case: Natural phenomena, measurement errors

Real World: Heights, test scores, stock returns

np.random.normal(loc=0, scale=1, size=1000)

⏱️ Exponential Distribution

Use Case: Time between events

Real World: Customer arrivals, component failures

np.random.exponential(scale=1.0, size=1000)

🎯 Binomial Distribution

Use Case: Number of successes in n trials

Real World: Coin flips, A/B testing, quality control

np.random.binomial(n=10, p=0.5, size=1000)

import numpy as np
import matplotlib.pyplot as plt

class DistributionExplorer:
    """
    Master class for exploring probability distributions.
    Your guide to the random universe! 🌟
    """
    
    def __init__(self, seed=42):
        self.rng = np.random.RandomState(seed)
        
    def uniform_distribution(self):
        """Uniform: Every outcome equally likely"""
        print("🎲 Uniform Distribution")
        print("=" * 60)
        
        # Generate uniform random numbers
        uniform_data = self.rng.uniform(0, 10, size=10000)
        
        print("Properties:")
        print(f"  Range: [0, 10)")
        print(f"  Mean (theoretical): {5.0}")
        print(f"  Mean (actual): {uniform_data.mean():.3f}")
        print(f"  Std (theoretical): {10/np.sqrt(12):.3f}")
        print(f"  Std (actual): {uniform_data.std():.3f}")
        
        # Real-world example: Random sampling
        print("\n📊 Example: Random Sampling")
        population = np.arange(1, 101)  # Population of 100 items
        sample = self.rng.choice(population, size=10, replace=False)
        print(f"Random sample of 10: {sorted(sample)}")
        
        # Dice simulation
        print("\n🎲 Example: Dice Rolling")
        dice_rolls = self.rng.randint(1, 7, size=1000)
        for face in range(1, 7):
            prob = (dice_rolls == face).mean()
            print(f"  Face {face}: {prob*100:.1f}% (expected: 16.7%)")
    
    def normal_distribution(self):
        """Normal: The bell curve - nature's favorite"""
        print("\n📊 Normal (Gaussian) Distribution")
        print("=" * 60)
        
        # Generate normal random numbers
        mean, std = 100, 15  # IQ scores example
        normal_data = self.rng.normal(mean, std, size=10000)
        
        print("Properties:")
        print(f"  Mean (μ): {mean}")
        print(f"  Std Dev (σ): {std}")
        print(f"  Actual mean: {normal_data.mean():.3f}")
        print(f"  Actual std: {normal_data.std():.3f}")
        
        # 68-95-99.7 rule
        within_1_std = np.sum(np.abs(normal_data - mean) <= std) / len(normal_data)
        within_2_std = np.sum(np.abs(normal_data - mean) <= 2*std) / len(normal_data)
        within_3_std = np.sum(np.abs(normal_data - mean) <= 3*std) / len(normal_data)
        
        print("\n📐 68-95-99.7 Rule:")
        print(f"  Within 1σ: {within_1_std*100:.1f}% (expected: 68.3%)")
        print(f"  Within 2σ: {within_2_std*100:.1f}% (expected: 95.4%)")
        print(f"  Within 3σ: {within_3_std*100:.1f}% (expected: 99.7%)")
        
        # Real-world example: Height distribution
        print("\n👥 Example: Height Distribution")
        male_heights = self.rng.normal(175, 7, size=1000)  # cm
        female_heights = self.rng.normal(162, 6, size=1000)  # cm
        
        print(f"Male heights: mean={male_heights.mean():.1f}cm, std={male_heights.std():.1f}cm")
        print(f"Female heights: mean={female_heights.mean():.1f}cm, std={female_heights.std():.1f}cm")
        print(f"Tallest 5% of males: >{np.percentile(male_heights, 95):.1f}cm")
        
        # Central Limit Theorem demonstration
        print("\n🎯 Central Limit Theorem Demo:")
        sample_means = []
        for _ in range(1000):
            # Average of 30 uniform random numbers
            sample = self.rng.uniform(0, 1, size=30)
            sample_means.append(sample.mean())
        
        sample_means = np.array(sample_means)
        print(f"Uniform samples become normal when averaged!")
        print(f"Mean of sample means: {sample_means.mean():.3f} (expected: 0.5)")
        print(f"Std of sample means: {sample_means.std():.3f}")
    
    def exponential_distribution(self):
        """Exponential: Time until next event"""
        print("\n⏱️ Exponential Distribution")
        print("=" * 60)
        
        # Customer arrival times (average 1 every 5 minutes)
        avg_time = 5.0  # minutes
        arrival_intervals = self.rng.exponential(avg_time, size=1000)
        
        print("Properties:")
        print(f"  Average interval (λ): {avg_time} minutes")
        print(f"  Actual mean: {arrival_intervals.mean():.3f} minutes")
        print(f"  Median: {np.median(arrival_intervals):.3f} minutes")
        
        # Memoryless property
        print("\n🔄 Memoryless Property:")
        print("  Probability of waiting > 5 min: "
              f"{(arrival_intervals > 5).mean()*100:.1f}%")
        print("  If already waited 5 min, prob of 5 more: "
              f"{(arrival_intervals > 10).sum() / (arrival_intervals > 5).sum()*100:.1f}%")
        
        # Real-world example: Server response times
        print("\n💻 Example: Server Response Times")
        response_times = self.rng.exponential(0.1, size=10000)  # 100ms average
        
        print(f"Average response: {response_times.mean()*1000:.1f}ms")
        print(f"95th percentile: {np.percentile(response_times, 95)*1000:.1f}ms")
        print(f"99th percentile: {np.percentile(response_times, 99)*1000:.1f}ms")
    
    def binomial_distribution(self):
        """Binomial: Counting successes"""
        print("\n🎯 Binomial Distribution")
        print("=" * 60)
        
        # Coin flips
        n_flips = 10
        p_heads = 0.5
        coin_results = self.rng.binomial(n_flips, p_heads, size=1000)
        
        print("Coin Flip Experiment:")
        print(f"  Flips per trial: {n_flips}")
        print(f"  Probability of heads: {p_heads}")
        print(f"  Expected heads: {n_flips * p_heads}")
        print(f"  Actual mean heads: {coin_results.mean():.3f}")
        
        # Distribution of results
        for k in range(11):
            prob = (coin_results == k).mean()
            print(f"  Exactly {k} heads: {prob*100:5.1f}%")
        
        # Real-world example: A/B testing
        print("\n🧪 Example: A/B Testing")
        
        # Version A: 10% conversion rate
        # Version B: 12% conversion rate
        visitors_per_version = 1000
        
        conversions_A = self.rng.binomial(1, 0.10, size=visitors_per_version)
        conversions_B = self.rng.binomial(1, 0.12, size=visitors_per_version)
        
        print(f"Version A: {conversions_A.sum()} conversions ({conversions_A.mean()*100:.1f}%)")
        print(f"Version B: {conversions_B.sum()} conversions ({conversions_B.mean()*100:.1f}%)")
        
        # Statistical significance (simplified)
        difference = conversions_B.mean() - conversions_A.mean()
        print(f"Observed lift: {difference*100:.1f}%")
    
    def poisson_distribution(self):
        """Poisson: Counting events in fixed time/space"""
        print("\n📈 Poisson Distribution")
        print("=" * 60)
        
        # Website visits per hour
        avg_visits = 100
        hourly_visits = self.rng.poisson(avg_visits, size=24)  # 24 hours
        
        print("Website Traffic Model:")
        print(f"  Average visits/hour: {avg_visits}")
        print(f"  Actual daily total: {hourly_visits.sum()}")
        print(f"  Peak hour: {hourly_visits.max()} visits")
        print(f"  Quiet hour: {hourly_visits.min()} visits")
        
        # Relationship to exponential
        print("\n🔗 Poisson-Exponential Connection:")
        print("  Poisson: Number of events in fixed time")
        print("  Exponential: Time between events")
        
        # If average is 100 visits/hour, time between visits:
        time_between = 60 / avg_visits  # minutes
        print(f"  Average time between visits: {time_between:.2f} minutes")
        
        # Real-world example: Call center
        print("\n☎️ Example: Call Center Planning")
        calls_per_minute = self.rng.poisson(3, size=60)  # 60 minutes
        
        print(f"Calls in 1 hour: {calls_per_minute.sum()}")
        print(f"Minutes with 0 calls: {(calls_per_minute == 0).sum()}")
        print(f"Minutes with >5 calls: {(calls_per_minute > 5).sum()}")
        print(f"Staff needed (1 per 2 calls): {int(np.ceil(calls_per_minute.max() / 2))}")
    
    def multivariate_normal(self):
        """Multivariate Normal: Correlated random variables"""
        print("\n🌐 Multivariate Normal Distribution")
        print("=" * 60)
        
        # Correlated variables: Height and Weight
        mean = [175, 75]  # Mean height (cm), weight (kg)
        
        # Covariance matrix
        # Variance of height: 49 (std=7)
        # Variance of weight: 100 (std=10)
        # Correlation: 0.7
        correlation = 0.7
        cov = [[49, correlation * 7 * 10],
               [correlation * 7 * 10, 100]]
        
        data = self.rng.multivariate_normal(mean, cov, size=1000)
        heights = data[:, 0]
        weights = data[:, 1]
        
        print("Correlated Variables (Height & Weight):")
        print(f"  Height: mean={heights.mean():.1f}cm, std={heights.std():.1f}cm")
        print(f"  Weight: mean={weights.mean():.1f}kg, std={weights.std():.1f}kg")
        
        # Calculate actual correlation
        actual_corr = np.corrcoef(heights, weights)[0, 1]
        print(f"  Correlation (expected): {correlation}")
        print(f"  Correlation (actual): {actual_corr:.3f}")
        
        # Conditional distribution
        tall_people = weights[heights > 185]
        short_people = weights[heights < 165]
        
        print(f"\nConditional Distributions:")
        print(f"  Weight of tall people (>185cm): {tall_people.mean():.1f}kg")
        print(f"  Weight of short people (<165cm): {short_people.mean():.1f}kg")
        print(f"  Difference: {tall_people.mean() - short_people.mean():.1f}kg")

# Explore all distributions
explorer = DistributionExplorer()
explorer.uniform_distribution()
explorer.normal_distribution()
explorer.exponential_distribution()
explorer.binomial_distribution()
explorer.poisson_distribution()
explorer.multivariate_normal()

Monte Carlo Simulations: Solving Problems with Randomness 🎰

Monte Carlo methods use random sampling to solve problems that might be deterministic in principle. It's like estimating the area of a lake by randomly throwing stones and counting how many land in the water!

graph LR A[Complex Problem] --> B[Monte Carlo Simulation] B --> C[Random Sampling] C --> D[Many Iterations] D --> E[Statistical Analysis] E --> F[Solution Estimate] G[Examples] --> H[π Estimation] G --> I[Option Pricing] G --> J[Risk Analysis] G --> K[Integration] H --> B I --> B J --> B K --> B style B fill:#4ecdc4 style F fill:#51cf66

import numpy as np

class MonteCarloSimulations:
    """
    Monte Carlo methods: Solving complex problems with random numbers.
    When in doubt, simulate it out! 🎲
    """
    
    def __init__(self, seed=42):
        self.rng = np.random.RandomState(seed)
    
    def estimate_pi(self, n_samples=1000000):
        """
        Estimate π by randomly throwing darts at a square!
        Classic Monte Carlo example.
        """
        print("🎯 Estimating π with Monte Carlo")
        print("=" * 60)
        
        # Generate random points in [0,1] x [0,1]
        x = self.rng.random(n_samples)
        y = self.rng.random(n_samples)
        
        # Check if points fall inside unit circle
        inside_circle = (x**2 + y**2) <= 1
        
        # π/4 = area of quarter circle / area of square
        pi_estimate = 4 * inside_circle.mean()
        
        print(f"Number of samples: {n_samples:,}")
        print(f"Points inside circle: {inside_circle.sum():,}")
        print(f"π estimate: {pi_estimate:.6f}")
        print(f"Actual π: {np.pi:.6f}")
        print(f"Error: {abs(pi_estimate - np.pi):.6f}")
        
        # Show convergence
        print("\n📈 Convergence Analysis:")
        sample_sizes = [100, 1000, 10000, 100000, 1000000]
        for n in sample_sizes:
            x_subset = x[:n]
            y_subset = y[:n]
            inside = (x_subset**2 + y_subset**2) <= 1
            estimate = 4 * inside.mean()
            error = abs(estimate - np.pi)
            print(f"  n={n:>7,}: π≈{estimate:.5f}, error={error:.5f}")
        
        return pi_estimate
    
    def stock_price_simulation(self, S0=100, mu=0.05, sigma=0.2, 
                              T=1.0, n_steps=252, n_paths=10000):
        """
        Simulate stock prices using Geometric Brownian Motion.
        The foundation of option pricing!
        """
        print("\n📈 Stock Price Monte Carlo Simulation")
        print("=" * 60)
        
        print(f"Initial price: ${S0}")
        print(f"Annual return (μ): {mu*100:.1f}%")
        print(f"Annual volatility (σ): {sigma*100:.1f}%")
        print(f"Time horizon: {T} year")
        print(f"Paths simulated: {n_paths:,}")
        
        # Time steps
        dt = T / n_steps
        
        # Generate random shocks
        Z = self.rng.randn(n_paths, n_steps)
        
        # Simulate paths using GBM formula
        # dS = μS dt + σS dW
        returns = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * Z
        
        # Calculate prices
        price_paths = np.zeros((n_paths, n_steps + 1))
        price_paths[:, 0] = S0
        
        for t in range(1, n_steps + 1):
            price_paths[:, t] = price_paths[:, t-1] * np.exp(returns[:, t-1])
        
        # Statistics
        final_prices = price_paths[:, -1]
        
        print(f"\n📊 Results after {T} year:")
        print(f"  Mean price: ${final_prices.mean():.2f}")
        print(f"  Median price: ${np.median(final_prices):.2f}")
        print(f"  Std deviation: ${final_prices.std():.2f}")
        
        # Percentiles (Value at Risk)
        percentiles = [1, 5, 25, 50, 75, 95, 99]
        print(f"\n📉 Price Distribution:")
        for p in percentiles:
            value = np.percentile(final_prices, p)
            print(f"  {p:>2}th percentile: ${value:.2f}")
        
        # Probability of profit/loss
        prob_profit = (final_prices > S0).mean()
        prob_double = (final_prices > 2 * S0).mean()
        prob_loss_50 = (final_prices < 0.5 * S0).mean()
        
        print(f"\n🎲 Probabilities:")
        print(f"  Profit (>$100): {prob_profit*100:.1f}%")
        print(f"  Double (>$200): {prob_double*100:.1f}%")
        print(f"  Loss >50% (<$50): {prob_loss_50*100:.1f}%")
        
        return price_paths
    
    def option_pricing_black_scholes(self, S0=100, K=105, r=0.05, 
                                    sigma=0.2, T=0.25, n_simulations=100000):
        """
        Price a European call option using Monte Carlo.
        Compare with Black-Scholes formula.
        """
        print("\n💰 Option Pricing with Monte Carlo")
        print("=" * 60)
        
        print("Option Parameters:")
        print(f"  Current price (S₀): ${S0}")
        print(f"  Strike price (K): ${K}")
        print(f"  Risk-free rate: {r*100:.1f}%")
        print(f"  Volatility: {sigma*100:.1f}%")
        print(f"  Time to expiry: {T} years")
        
        # Simulate final prices
        Z = self.rng.randn(n_simulations)
        ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * Z)
        
        # Calculate payoffs
        payoffs = np.maximum(ST - K, 0)  # Call option payoff
        
        # Discount to present value
        option_price = np.exp(-r * T) * payoffs.mean()
        
        print(f"\n📊 Monte Carlo Results:")
        print(f"  Simulations: {n_simulations:,}")
        print(f"  Option price: ${option_price:.3f}")
        
        # Standard error
        std_error = payoffs.std() / np.sqrt(n_simulations) * np.exp(-r * T)
        print(f"  Standard error: ${std_error:.4f}")
        print(f"  95% CI: [${option_price - 1.96*std_error:.3f}, "
              f"${option_price + 1.96*std_error:.3f}]")
        
        # Black-Scholes formula for comparison
        from scipy.stats import norm
        d1 = (np.log(S0/K) + (r + 0.5*sigma**2)*T) / (sigma*np.sqrt(T))
        d2 = d1 - sigma*np.sqrt(T)
        bs_price = S0*norm.cdf(d1) - K*np.exp(-r*T)*norm.cdf(d2)
        
        print(f"\n✅ Black-Scholes price: ${bs_price:.3f}")
        print(f"Monte Carlo error: ${abs(option_price - bs_price):.4f}")
        
        # Greeks estimation via Monte Carlo
        print(f"\n🔢 Greeks Estimation:")
        
        # Delta: price sensitivity to stock price
        S_up = S0 * 1.01
        Z_new = self.rng.randn(n_simulations)
        ST_up = S_up * np.exp((r - 0.5*sigma**2)*T + sigma*np.sqrt(T)*Z_new)
        payoffs_up = np.maximum(ST_up - K, 0)
        price_up = np.exp(-r*T) * payoffs_up.mean()
        
        delta = (price_up - option_price) / (S_up - S0)
        print(f"  Delta (∂V/∂S): {delta:.4f}")
        
        return option_price
    
    def portfolio_var_simulation(self, n_assets=5, n_days=252, n_simulations=10000):
        """
        Calculate Value at Risk (VaR) for a portfolio.
        Risk management through simulation.
        """
        print("\n⚠️ Portfolio Value at Risk (VaR)")
        print("=" * 60)
        
        # Portfolio setup
        weights = self.rng.dirichlet(np.ones(n_assets))  # Random weights sum to 1
        initial_value = 1000000  # $1M portfolio
        
        print(f"Portfolio Setup:")
        print(f"  Initial value: ${initial_value:,}")
        print(f"  Number of assets: {n_assets}")
        print(f"  Weights: {weights.round(3)}")
        
        # Historical returns parameters (annualized)
        mean_returns = self.rng.uniform(0.05, 0.15, n_assets)
        volatilities = self.rng.uniform(0.15, 0.30, n_assets)
        
        # Correlation matrix (realistic correlations)
        correlation = 0.3 + 0.5 * self.rng.random((n_assets, n_assets))
        np.fill_diagonal(correlation, 1)
        correlation = (correlation + correlation.T) / 2  # Symmetrize
        
        # Covariance matrix
        cov_matrix = np.outer(volatilities, volatilities) * correlation
        
        # Simulate portfolio returns
        daily_returns = self.rng.multivariate_normal(
            mean_returns/252, 
            cov_matrix/252, 
            size=(n_simulations, n_days)
        )
        
        # Calculate portfolio returns
        portfolio_returns = daily_returns @ weights
        
        # Calculate portfolio values
        portfolio_values = initial_value * np.cumprod(1 + portfolio_returns, axis=1)
        final_values = portfolio_values[:, -1]
        
        # VaR calculations
        var_95 = initial_value - np.percentile(final_values, 5)
        var_99 = initial_value - np.percentile(final_values, 1)
        cvar_95 = initial_value - final_values[final_values <= np.percentile(final_values, 5)].mean()
        
        print(f"\n📊 Risk Metrics (1 year horizon):")
        print(f"  VaR 95%: ${var_95:,.2f}")
        print(f"  VaR 99%: ${var_99:,.2f}")
        print(f"  CVaR 95% (Expected Shortfall): ${cvar_95:,.2f}")
        
        # Probability of various losses
        prob_loss = (final_values < initial_value).mean()
        prob_loss_10 = (final_values < 0.9 * initial_value).mean()
        prob_loss_25 = (final_values < 0.75 * initial_value).mean()
        
        print(f"\n🎯 Loss Probabilities:")
        print(f"  Any loss: {prob_loss*100:.1f}%")
        print(f"  Loss > 10%: {prob_loss_10*100:.1f}%")
        print(f"  Loss > 25%: {prob_loss_25*100:.1f}%")
        
        # Best and worst scenarios
        print(f"\n📈 Scenario Analysis:")
        print(f"  Best case (99th percentile): ${np.percentile(final_values, 99):,.2f}")
        print(f"  Worst case (1st percentile): ${np.percentile(final_values, 1):,.2f}")
        print(f"  Expected value: ${final_values.mean():,.2f}")
        
        return final_values

# Run Monte Carlo simulations
mc = MonteCarloSimulations()
mc.estimate_pi()
price_paths = mc.stock_price_simulation()
option_price = mc.option_pricing_black_scholes()
portfolio_values = mc.portfolio_var_simulation()

Random Sampling and Bootstrapping 🎯

Random sampling is the foundation of statistical inference. Bootstrapping takes this further - it's like creating parallel universes of your data to understand uncertainty!

import numpy as np

class RandomSampling:
    """
    Advanced sampling techniques for data science.
    From simple random sampling to sophisticated bootstrap methods!
    """
    
    def __init__(self, seed=42):
        self.rng = np.random.RandomState(seed)
    
    def sampling_methods(self):
        """Demonstrate various sampling techniques"""
        print("🎯 Sampling Methods")
        print("=" * 60)
        
        # Create a population
        population = np.arange(1, 101)  # 100 items
        
        # 1. Simple Random Sampling
        print("1️⃣ Simple Random Sampling:")
        simple_sample = self.rng.choice(population, size=10, replace=False)
        print(f"  Sample: {sorted(simple_sample)}")
        
        # 2. Systematic Sampling
        print("\n2️⃣ Systematic Sampling:")
        k = len(population) // 10  # Every kth element
        start = self.rng.randint(0, k)
        systematic_sample = population[start::k]
        print(f"  Every {k}th element starting at {start}: {systematic_sample}")
        
        # 3. Stratified Sampling
        print("\n3️⃣ Stratified Sampling:")
        # Divide into strata
        strata1 = population[:25]   # Low
        strata2 = population[25:75]  # Medium  
        strata3 = population[75:]    # High
        
        # Sample from each stratum
        sample1 = self.rng.choice(strata1, size=3, replace=False)
        sample2 = self.rng.choice(strata2, size=5, replace=False)
        sample3 = self.rng.choice(strata3, size=2, replace=False)
        
        stratified_sample = np.concatenate([sample1, sample2, sample3])
        print(f"  Low stratum: {sorted(sample1)}")
        print(f"  Medium stratum: {sorted(sample2)}")
        print(f"  High stratum: {sorted(sample3)}")
        
        # 4. Weighted Sampling
        print("\n4️⃣ Weighted Sampling:")
        # Higher numbers have higher probability
        weights = population / population.sum()
        weighted_sample = self.rng.choice(population, size=10, 
                                        replace=True, p=weights)
        print(f"  Sample (biased toward higher values): {sorted(weighted_sample)}")
        
        # 5. Reservoir Sampling (for streaming data)
        print("\n5️⃣ Reservoir Sampling (streaming):")
        
        def reservoir_sampling(stream, k):
            """Sample k items from a stream of unknown length"""
            reservoir = []
            
            for i, item in enumerate(stream):
                if i < k:
                    reservoir.append(item)
                else:
                    # Replace with probability k/(i+1)
                    j = self.rng.randint(0, i + 1)
                    if j < k:
                        reservoir[j] = item
            
            return reservoir
        
        # Simulate a stream
        stream = iter(range(1, 1001))  # 1000 items
        reservoir = reservoir_sampling(stream, 10)
        print(f"  Sample from stream: {sorted(reservoir)}")
    
    def bootstrap_confidence_intervals(self):
        """
        Bootstrap: Estimate uncertainty by resampling.
        The statistician's time machine!
        """
        print("\n🔄 Bootstrap Confidence Intervals")
        print("=" * 60)
        
        # Original sample (e.g., customer satisfaction scores)
        original_data = self.rng.normal(75, 10, size=100)
        
        print(f"Original sample:")
        print(f"  Size: {len(original_data)}")
        print(f"  Mean: {original_data.mean():.2f}")
        print(f"  Std: {original_data.std():.2f}")
        
        # Bootstrap resampling
        n_bootstrap = 10000
        bootstrap_means = []
        
        for _ in range(n_bootstrap):
            # Resample with replacement
            bootstrap_sample = self.rng.choice(original_data, 
                                             size=len(original_data), 
                                             replace=True)
            bootstrap_means.append(bootstrap_sample.mean())
        
        bootstrap_means = np.array(bootstrap_means)
        
        # Calculate confidence intervals
        confidence_levels = [90, 95, 99]
        
        print(f"\n📊 Bootstrap Results ({n_bootstrap} resamples):")
        print(f"  Bootstrap mean: {bootstrap_means.mean():.2f}")
        print(f"  Bootstrap SE: {bootstrap_means.std():.3f}")
        
        print(f"\n🎯 Confidence Intervals:")
        for conf in confidence_levels:
            alpha = (100 - conf) / 2
            lower = np.percentile(bootstrap_means, alpha)
            upper = np.percentile(bootstrap_means, 100 - alpha)
            print(f"  {conf}% CI: [{lower:.2f}, {upper:.2f}]")
        
        # Compare with theoretical CI (assuming normal)
        theoretical_se = original_data.std() / np.sqrt(len(original_data))
        theoretical_ci_95 = [
            original_data.mean() - 1.96 * theoretical_se,
            original_data.mean() + 1.96 * theoretical_se
        ]
        print(f"\n📐 Theoretical 95% CI: [{theoretical_ci_95[0]:.2f}, {theoretical_ci_95[1]:.2f}]")
        
        return bootstrap_means
    
    def permutation_testing(self):
        """
        Permutation tests: Non-parametric hypothesis testing.
        When assumptions don't hold, shuffle it!
        """
        print("\n🔀 Permutation Testing")
        print("=" * 60)
        
        # Two groups (e.g., treatment vs control)
        group_A = self.rng.normal(100, 15, size=50)  # Control
        group_B = self.rng.normal(105, 15, size=50)  # Treatment (5 point improvement)
        
        observed_diff = group_B.mean() - group_A.mean()
        
        print("Hypothesis Test: Is there a difference between groups?")
        print(f"  Group A mean: {group_A.mean():.2f}")
        print(f"  Group B mean: {group_B.mean():.2f}")
        print(f"  Observed difference: {observed_diff:.2f}")
        
        # Permutation test
        n_permutations = 10000
        combined = np.concatenate([group_A, group_B])
        permuted_diffs = []
        
        for _ in range(n_permutations):
            # Shuffle and split
            self.rng.shuffle(combined)
            perm_A = combined[:50]
            perm_B = combined[50:]
            
            # Calculate difference
            perm_diff = perm_B.mean() - perm_A.mean()
            permuted_diffs.append(perm_diff)
        
        permuted_diffs = np.array(permuted_diffs)
        
        # Calculate p-value
        p_value = (np.abs(permuted_diffs) >= np.abs(observed_diff)).mean()
        
        print(f"\n📊 Permutation Test Results:")
        print(f"  Permutations: {n_permutations}")
        print(f"  P-value: {p_value:.4f}")
        print(f"  Significant at α=0.05? {'Yes ✅' if p_value < 0.05 else 'No ❌'}")
        
        # Effect size
        pooled_std = np.sqrt((group_A.var() + group_B.var()) / 2)
        cohens_d = observed_diff / pooled_std
        print(f"  Cohen's d (effect size): {cohens_d:.3f}")
        
        return permuted_diffs
    
    def cross_validation_splits(self):
        """
        Random splits for cross-validation.
        Essential for machine learning!
        """
        print("\n✂️ Cross-Validation Splits")
        print("=" * 60)
        
        # Dataset
        n_samples = 100
        X = np.arange(n_samples).reshape(-1, 1)  # Features
        y = np.arange(n_samples)  # Labels
        
        # 1. Train/Test Split
        print("1️⃣ Train/Test Split (80/20):")
        
        indices = np.arange(n_samples)
        self.rng.shuffle(indices)
        
        train_size = int(0.8 * n_samples)
        train_idx = indices[:train_size]
        test_idx = indices[train_size:]
        
        print(f"  Train indices (first 10): {sorted(train_idx[:10])}")
        print(f"  Test indices (first 10): {sorted(test_idx[:10])}")
        
        # 2. K-Fold Cross-Validation
        print("\n2️⃣ K-Fold Cross-Validation (k=5):")
        
        k = 5
        fold_size = n_samples // k
        
        for fold in range(k):
            start = fold * fold_size
            end = start + fold_size if fold < k-1 else n_samples
            
            val_idx = indices[start:end]
            train_idx = np.concatenate([indices[:start], indices[end:]])
            
            print(f"  Fold {fold+1}: Val size={len(val_idx)}, Train size={len(train_idx)}")
        
        # 3. Stratified K-Fold (for imbalanced classes)
        print("\n3️⃣ Stratified K-Fold (preserving class distribution):")
        
        # Create imbalanced labels
        y_imbalanced = np.array([0]*70 + [1]*30)  # 70% class 0, 30% class 1
        
        # Stratified split
        class_0_idx = np.where(y_imbalanced == 0)[0]
        class_1_idx = np.where(y_imbalanced == 1)[0]
        
        self.rng.shuffle(class_0_idx)
        self.rng.shuffle(class_1_idx)
        
        fold_size_0 = len(class_0_idx) // k
        fold_size_1 = len(class_1_idx) // k
        
        for fold in range(k):
            val_0 = class_0_idx[fold*fold_size_0:(fold+1)*fold_size_0]
            val_1 = class_1_idx[fold*fold_size_1:(fold+1)*fold_size_1]
            val_idx = np.concatenate([val_0, val_1])
            
            print(f"  Fold {fold+1}: Class 0={len(val_0)}, Class 1={len(val_1)}")

# Explore sampling techniques
sampler = RandomSampling()
sampler.sampling_methods()
bootstrap_means = sampler.bootstrap_confidence_intervals()
permuted_diffs = sampler.permutation_testing()
sampler.cross_validation_splits()

Best Practices and Common Pitfalls 🎯

🌱 Always Set Seeds for Reproducibility

# Good: Reproducible
np.random.seed(42)
results = np.random.random(100)

# Better: Use RandomState
rng = np.random.RandomState(42)
results = rng.random(100)

# Best: Use new Generator (NumPy 1.17+)
rng = np.random.default_rng(42)
results = rng.random(100)

⚠️ Common Pitfalls

Not setting seeds: Results change every run
Using global random state: Functions affect each other
Wrong distribution: Using uniform when you need normal
Ignoring numerical stability: Log of zero, division issues
Sample size too small: Monte Carlo needs many samples

✅ Best Practices

Use appropriate distributions: Match the problem domain
Check convergence: Increase samples until stable
Validate with known solutions: Test on problems with analytical answers
Document random choices: Explain why you chose specific distributions
Use vectorization: Generate all random numbers at once

Summary: Your Random Number Toolkit 🎲

🎯 Key Takeaways:

Reproducibility: Always set seeds for reproducible research
Distributions Matter: Choose the right distribution for your problem
Monte Carlo Power: Solve complex problems through simulation
Bootstrap Magic: Estimate uncertainty without assumptions
Vectorize Generation: Generate all random numbers at once for speed
Test and Validate: Always verify your random procedures

# Quick Random Reference Card
import numpy as np

# SETUP
# -----
rng = np.random.RandomState(42)  # Reproducible generator

# BASIC GENERATION
# ----------------
rng.random(size)           # Uniform [0, 1)
rng.randint(low, high, size)  # Integers [low, high)
rng.choice(array, size)    # Random selection
rng.shuffle(array)         # In-place shuffle
rng.permutation(array)     # Return shuffled copy

# DISTRIBUTIONS
# -------------
rng.uniform(low, high, size)      # Uniform
rng.normal(mu, sigma, size)        # Normal/Gaussian
rng.exponential(scale, size)       # Exponential
rng.poisson(lam, size)            # Poisson
rng.binomial(n, p, size)          # Binomial
rng.multivariate_normal(mean, cov, size)  # Multivariate

# SAMPLING
# --------
# Simple random sample
sample = rng.choice(population, size=n, replace=False)

# Bootstrap sample
boot = rng.choice(data, size=len(data), replace=True)

# Stratified sample
strata_samples = [rng.choice(s, size=n) for s in strata]

# MONTE CARLO PATTERN
# -------------------
n_simulations = 10000
results = []
for _ in range(n_simulations):
    simulation = rng.random()  # Your random process
    results.append(simulation)
estimate = np.mean(results)
std_error = np.std(results) / np.sqrt(n_simulations)

print("🎲 Master randomness to master data science!")