OiO.lk Community platform!

Oio.lk is an excellent forum for developers, providing a wide range of resources, discussions, and support for those in the developer community. Join oio.lk today to connect with like-minded professionals, share insights, and stay updated on the latest trends and technologies in the development field.
  You need to log in or register to access the solved answers to this problem.
  • You have reached the maximum number of guest views allowed
  • Please register below to remove this limitation

Calculation of Expected Frequency and Chi-Square Implementation

  • Thread starter Thread starter Karandeep Saluja
  • Start date Start date
K

Karandeep Saluja

Guest
I have data in a CSV file named "Data.csv," specifically focusing on the "Air Temperature" column. I attempted to fit the data using two distributions: "gamma" and "log-normal." Now, I am unsure whether I have calculated the expected frequency correctly and whether my implementation of the Chi-Square Test is accurate.

Here are the specific issues I am facing:

Expected Frequency Calculation: I need clarification on the correct method to calculate the expected frequencies for the Chi-Square Test when fitting "gamma" and "log-normal" distributions to my data.

Chi-Square Test Implementation: I am uncertain if my implementation of the Chi-Square Test is correct. I would appreciate guidance on the proper steps and any necessary corrections.

If possible, could someone please provide the correct approach to calculating expected frequencies and the correct implementation of the Chi-Square Test for my case?

Code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats as stats
from scipy.stats import gamma, lognorm, kstest, chisquare, chi2_contingency
import warnings

warnings.simplefilter(action='ignore', category=FutureWarning)

df = pd.read_csv(f'Data.csv')

# Extract data
temperature = df['Air_Temperature(AT)']


def cs(n, y):
    return chisquare(n, np.sum(n) / np.sum(y) * y)


def interpret_goodness_of_fit(chi2_p, ks_p, alpha=0.05):
    if chi2_p > alpha or ks_p > alpha:
        return "Good fit"
    else:
        return "Not a good fit"


# Calculate bin size
number_of_bins = int(1 + np.log2(len(temperature)))
print("Number of bins:", number_of_bins)

# Fit gamma distribution
gamma_params = gamma.fit(temperature)
print("Gamma distribution parameters:", gamma_params)

# Fit log-normal distribution
lognorm_params = lognorm.fit(temperature)
print("Log-normal distribution parameters:", lognorm_params)
print("---------------")

# Generate fitted gamma distribution
x = np.linspace(min(temperature), max(temperature), 100)
gamma_pdf_fitted = gamma.pdf(x, *gamma_params)

# Generate fitted log-normal distribution
lognorm_pdf_fitted = lognorm.pdf(x, *lognorm_params)

# Plot histogram with KDE and fitted distributions
plt.figure(figsize=(10, 6))
sns.histplot(df, x="Air_Temperature(AT)", bins=number_of_bins, kde=True, stat='density', label='Data with KDE')
plt.plot(x, gamma_pdf_fitted, 'r-', label='Fitted Gamma Distribution')
plt.plot(x, lognorm_pdf_fitted, 'g-', label='Fitted Log-normal Distribution')
plt.xlabel('Air Temperature')
plt.ylabel('Density')
plt.title(f"Density Plot")
plt.legend()
plt.show()

# Chi-Square Test
observed_freq, bins = np.histogram(temperature, bins=number_of_bins, density=False)

expected_freq_gamma = len(temperature) * gamma.cdf(bins[1:], *gamma_params) - len(temperature) * gamma.cdf(bins[:-1], *gamma_params)
                                                                                                       

expected_freq_lognorm = len(temperature) * lognorm.cdf(bins[1:], *lognorm_params) - len(temperature) * lognorm.cdf(bins[:-1], *lognorm_params)


result_goodness_of_fit_gamma = cs(observed_freq, expected_freq_gamma)
result_goodness_of_fit_log_normal = cs(observed_freq, expected_freq_lognorm)
print(f"Result of Goodness of Fit of Gamma is {result_goodness_of_fit_gamma}")
print(f"Result of Goodness of Fit of Log-Normal is {result_goodness_of_fit_log_normal}")
print("---------------")

# KS-Test
ks_stat_gamma, ks_p_gamma = kstest(temperature, 'gamma', args=gamma_params)
ks_stat_lognorm, ks_p_lognorm = kstest(temperature, 'lognorm', args=lognorm_params)

print(f"Gamma KS statistic: {ks_stat_gamma}, p-value: {ks_p_gamma}")
print(f"Log-normal KS statistic: {ks_stat_lognorm}, p-value: {ks_p_lognorm}")
print("---------------")

chi2_stat_gamma, chi2_p_gamma = cs(observed_freq, expected_freq_gamma)
chi2_stat_lognorm, chi2_p_lognorm = cs(observed_freq, expected_freq_lognorm)

gamma_fit = interpret_goodness_of_fit(chi2_p_gamma, ks_p_gamma)
lognorm_fit = interpret_goodness_of_fit(chi2_p_lognorm, ks_p_lognorm)

print(f"Gamma distribution goodness of fit: {gamma_fit}")
print(f"Log-normal distribution goodness of fit: {lognorm_fit}")
<p>I have data in a CSV file named "Data.csv," specifically focusing on the "Air Temperature" column. I attempted to fit the data using two distributions: "gamma" and "log-normal." Now, I am unsure whether I have calculated the expected frequency correctly and whether my implementation of the Chi-Square Test is accurate.</p>
<p>Here are the specific issues I am facing:</p>
<p>Expected Frequency Calculation: I need clarification on the correct method to calculate the expected frequencies for the Chi-Square Test when fitting "gamma" and "log-normal" distributions to my data.</p>
<p>Chi-Square Test Implementation: I am uncertain if my implementation of the Chi-Square Test is correct. I would appreciate guidance on the proper steps and any necessary corrections.</p>
<p>If possible, could someone please provide the correct approach to calculating expected frequencies and the correct implementation of the Chi-Square Test for my case?</p>
<pre><code>import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats as stats
from scipy.stats import gamma, lognorm, kstest, chisquare, chi2_contingency
import warnings

warnings.simplefilter(action='ignore', category=FutureWarning)

df = pd.read_csv(f'Data.csv')

# Extract data
temperature = df['Air_Temperature(AT)']


def cs(n, y):
return chisquare(n, np.sum(n) / np.sum(y) * y)


def interpret_goodness_of_fit(chi2_p, ks_p, alpha=0.05):
if chi2_p > alpha or ks_p > alpha:
return "Good fit"
else:
return "Not a good fit"


# Calculate bin size
number_of_bins = int(1 + np.log2(len(temperature)))
print("Number of bins:", number_of_bins)

# Fit gamma distribution
gamma_params = gamma.fit(temperature)
print("Gamma distribution parameters:", gamma_params)

# Fit log-normal distribution
lognorm_params = lognorm.fit(temperature)
print("Log-normal distribution parameters:", lognorm_params)
print("---------------")

# Generate fitted gamma distribution
x = np.linspace(min(temperature), max(temperature), 100)
gamma_pdf_fitted = gamma.pdf(x, *gamma_params)

# Generate fitted log-normal distribution
lognorm_pdf_fitted = lognorm.pdf(x, *lognorm_params)

# Plot histogram with KDE and fitted distributions
plt.figure(figsize=(10, 6))
sns.histplot(df, x="Air_Temperature(AT)", bins=number_of_bins, kde=True, stat='density', label='Data with KDE')
plt.plot(x, gamma_pdf_fitted, 'r-', label='Fitted Gamma Distribution')
plt.plot(x, lognorm_pdf_fitted, 'g-', label='Fitted Log-normal Distribution')
plt.xlabel('Air Temperature')
plt.ylabel('Density')
plt.title(f"Density Plot")
plt.legend()
plt.show()

# Chi-Square Test
observed_freq, bins = np.histogram(temperature, bins=number_of_bins, density=False)

expected_freq_gamma = len(temperature) * gamma.cdf(bins[1:], *gamma_params) - len(temperature) * gamma.cdf(bins[:-1], *gamma_params)


expected_freq_lognorm = len(temperature) * lognorm.cdf(bins[1:], *lognorm_params) - len(temperature) * lognorm.cdf(bins[:-1], *lognorm_params)


result_goodness_of_fit_gamma = cs(observed_freq, expected_freq_gamma)
result_goodness_of_fit_log_normal = cs(observed_freq, expected_freq_lognorm)
print(f"Result of Goodness of Fit of Gamma is {result_goodness_of_fit_gamma}")
print(f"Result of Goodness of Fit of Log-Normal is {result_goodness_of_fit_log_normal}")
print("---------------")

# KS-Test
ks_stat_gamma, ks_p_gamma = kstest(temperature, 'gamma', args=gamma_params)
ks_stat_lognorm, ks_p_lognorm = kstest(temperature, 'lognorm', args=lognorm_params)

print(f"Gamma KS statistic: {ks_stat_gamma}, p-value: {ks_p_gamma}")
print(f"Log-normal KS statistic: {ks_stat_lognorm}, p-value: {ks_p_lognorm}")
print("---------------")

chi2_stat_gamma, chi2_p_gamma = cs(observed_freq, expected_freq_gamma)
chi2_stat_lognorm, chi2_p_lognorm = cs(observed_freq, expected_freq_lognorm)

gamma_fit = interpret_goodness_of_fit(chi2_p_gamma, ks_p_gamma)
lognorm_fit = interpret_goodness_of_fit(chi2_p_lognorm, ks_p_lognorm)

print(f"Gamma distribution goodness of fit: {gamma_fit}")
print(f"Log-normal distribution goodness of fit: {lognorm_fit}")
</code></pre>
 

Latest posts

Top