0

I've been given a coding exercise to get me up to speed with MATLAB, but I'm having a bit of an issue getting out what I need.

I am trying to numerically integrate the Gaussian probability function on given intervals (variable bins). I am trying to do this by using the trapezoidal rule. I would expect the sum of all those integrals to be close to one, however what I'm getting in my results is getting smaller (for example, attaining a value of 0.5 when I set the variable bins to 100).

What I'm essentially doing is subdividing the x-axis into "bins", and taking the integral of a Gaussian PDF along the points contained within the bins. So, for example, say that I have the x-axis running from 0 to 19.9, incrementing by 0.1, so I have 200 "points". Now, say that I want to have ten bins. Then, each bin will contain 20 points, and I need to perform an integral approximation between every point in each bin.

Now, I've kind of got the following happening so far;

clear

%Sets location of the Gaussian PDF
s_1 = 5;
s_2 = 15;

%Sets the domain of the x-axis
x = [0:0.1:19.9];

%Sets the number of bins, and the size of each bin
b = 100;
points = numel(x);
binsize = (points)/b;

%Arranges all of the prescribed x-values into their necessary bins
t = reshape(x,binsize,b);

%Calculates p(r_n|s_n)
for i = 1:b
    r_s1(i) = trapz(t(:,i),normpdf(t(:,i),s_1));
    r_s2(i) = trapz(t(:,i),normpdf(t(:,i),s_2));
end

Now, for a small number of bins (say, 10), the approximation comes out quite nicely, and the sums of the elements of r_s1 and r_s2 both come out relatively close to 1 (as the integral of the Gaussian PDF should be 1). However, as I start bumping up the sizes of my bins, the value of this sum starts dropping. All things considered, I'd have expected it to stay around the same value.

Is this an error in my mathematical approach to the problem I've been posed, or have I messed up my code??

9
  • Are you sure you don't want to use the Normal cumulative distribution function normcdf? Also: You didn't really specify what you are actually trying to solve (the mathematics). Commented Feb 14, 2015 at 14:09
  • I don't really know what your actual question is, but in regards to why the values will get smaller: Integrating with the trapezoidal rule you will be quite likely overestimating the integral, as the areas where you are integrating are mostly convex parts of the function. So finer integration will give more accurate results. You should obviously not really get to 1 if you don't do integration from -Inf to Inf as the support of the GPDF is infinite. So basically your integration rule is overestimating and will become more accurate with finer bins. Commented Feb 14, 2015 at 14:22
  • I guess my question is really asking what I'm doing wrong, in a sense. Inputting that code with a bin size of 10 gives me a value of about 0.9. However, if increase the bin size to say, 100, I get a value closer to 0.5. My impression was that an increase in bins would lead to an increase in the accuracy of the approximation, or at the very least, not change it too dramatically. I feel like there's an error in what I've written to make the estimation drop so much. Commented Feb 14, 2015 at 14:26
  • Which value is 0.5? To ask: What am I doing wrong, you must first explain what your goal is. I can't find it in your question. Commented Feb 14, 2015 at 14:35
  • The sum of all the elements of the r_s1 array. Same for r_s2. Commented Feb 14, 2015 at 14:36

1 Answer 1

1

The bins you are using don't partition the interval [0, 19.9]. Looking at the variable t:

t = 
         0    0.2000    0.4000    0.6000    ...
    0.1000    0.3000    0.5000    0.7000    ...

You see that your first bin ends at 0.1, but the second bin starts only at 0.2, so in this case you are only integrating over half the interval, ergo the result ~0.5.

The solution that changes the least of your code would probably be to add

t = [x(1), t(end,1:end-1); t];

After your t = reshape(x,binsize,b); reshape line.

Sign up to request clarification or add additional context in comments.

2 Comments

Do you know of any way that I could partition the set to include all the gaps within the intervals?? Would I just set up something that would take the last value in one column and the first value in the proceeding column??
@Jack: I added a quick fix to your problem, but I guess you would be better off, if you represented the bins via the two values bin(i,:) = [0, 0.1] and then use linspace(bin(i,1), bin(i,2), numberOfPointsYouWant) instead.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.