Finding the Camera pose from the projection matrix

Thread starter: Theo (Guest)
I am trying to determine the projection matrix, and from it the camera pose, using a set of object coordinates in a world frame and their corresponding pixel coordinates in an image. From online resources I have found that you can rewrite the projection equations in the form Ap = 0, where A is a matrix built from the pixel and world coordinates and p is the projection matrix flattened into a 12 by 1 vector. I have seen this solved by least-squares regression, as an eigenvalue problem, or via singular value decomposition, and all of these methods seem to give the same p.
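
For reference, the SVD variant is the minimal sketch below, assuming A is the same 2N x 12 matrix built in the code that follows; the solution is the right singular vector belonging to the smallest singular value of A.

Code:
# solve A p = 0 in the least-squares sense with ||p|| = 1:
# the last row of Vt corresponds to the smallest singular value of A
_, _, vt = np.linalg.svd(A)
p = vt[-1, :].reshape((3, 4))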

I have the following code:

Code:
import numpy as np

class cal:
    def __init__(self, coords):
        # coords rows are [u, v, x, y, z]; the DLT needs at least 6 points
        if coords.shape[0] < 6:
            raise ValueError('Not enough observation points, 6 or more required')

        # split each row into pixel coordinates (u, v) and world coordinates (x, y, z)
        uv, xyz = np.split(coords, (2,), axis=1)

        # build the DLT matrix A so that A p = 0, where p holds the 12 entries
        # of the 3x4 projection matrix (two rows of A per observation point)
        self.A = np.zeros((2*coords.shape[0], 12))
        for i in range(coords.shape[0]):
            self.A[i*2:i*2 + 2, :] = np.vstack((
                np.concatenate([xyz[i, :], np.ones(1), np.zeros(4), -uv[i, 0]*np.hstack((xyz[i, :], 1))]),
                np.concatenate([np.zeros(4), xyz[i, :], np.ones(1), -uv[i, 1]*np.hstack((xyz[i, :], 1))])))

        # solve A p = 0: p is the eigenvector of A^T A with the smallest eigenvalue
        eig_val, eig_vec = np.linalg.eig(self.A.T @ self.A)
        min_eig = np.argmin(eig_val)
        self.p = eig_vec[:, min_eig].reshape((3, 4))

        # decompose the left 3x3 block M = K R via QR of its inverse:
        # inv(M) = inv(R) inv(K), so Q gives inv(R) and the triangular factor gives inv(K)
        R_inv, k_inv = np.linalg.qr(np.linalg.inv(self.p[:, :3]))

        self.R = R_inv.T
        self.k = np.linalg.inv(k_inv)
        self.k /= self.k[-1, -1]        # normalise so that k[2, 2] = 1

        # camera position from P [T, 1]^T = 0  =>  T = -inv(M) @ p4
        self.T = -np.linalg.inv(self.p[:, :3]) @ self.p[:, -1]

        print('Camera matrix is:')
        print(self.k)
        print('Rotation matrix is:')
        print(self.R)
        print('Camera position is:')
        print(self.T)

    def get_uv(self, xyz: np.ndarray) -> np.ndarray:
        # project world points (rows of xyz) to pixel coordinates with P
        uv_homg = self.p @ np.hstack((xyz, np.ones_like(xyz[:, 0:1]))).T

        # divide out the homogeneous scale
        uv_norm = uv_homg / uv_homg[2, :]

        print('u, v are:')
        print(uv_norm[:-1, :])

        return uv_norm

    def get_xyz(self, uv, zc):
        uv_homg = np.hstack((uv, np.ones_like(uv[:, 0:1]))).T

        # method 1: back-project through k using the depth zc to get camera-frame
        # coordinates, then compute R^T (X_cam - T) to move into the world frame
        k_inv = np.linalg.inv(self.k)
        xyz_c = k_inv @ uv_homg
        xyz_c *= zc

        xyz = self.R.T @ (xyz_c - self.T[np.newaxis, :].T)
        print('x, y, z via k, r and t are:')
        print(xyz)

        # method 2: invert [u, v, 1] = P [x, y, z, 1] directly with the pseudo-inverse of P
        xyz = np.linalg.pinv(self.p) @ uv_homg * zc
        xyz /= xyz[-1, 0]

        print('x, y, z via pinv are:')
        print(xyz)

        return xyz

Using the following data (each row is [pixel coordinates u, v, world coordinates x, y, z in mm]):

Code:
coords = np.array([[678,   716,     0,     0,     0 ],
[676,   628,     0,     0,    18.67],
[675,   528,     0,     0,    37.33],
[677,   419,     0,     0,    56  ],
[745,   651,    16.67,   0,     0  ],
[804,   599,    37.33,   0,     0  ],
[854,   558,    56,     0,     0  ],
[595,   666,     0,    16.67,   0  ],
[519,   625,     0,    37.33,   0  ],
[449,   588,     0,    56,     0  ],
[748,   573,    16.67,   0,    16.67],
[808,   524,    37.33,   0,    16.67],
[859,   484,    56,     0,    16.67],
[748,   478,    16.67,   0,    37.33],
[813,   432,    37.33,   0,    37.33],
[869,   391,    56,     0,    37.33],
[752,   374,    16.67,   0,    56  ],
[819,   331,    37.33,   0,    56  ],
[877,   295,    56,     0,    56  ],
[590,   584,     0,    16.67,  16.67],
[512,   544,     0,    37.33,  16.67],
[439,   508,     0,    56,    16.67],
[585,   488,     0,    16.67,  37.33],
[501,   451,     0,    37.33,  37.33],
[425,   417,     0,    56,    37.33],
[581,   385,     0,    16.67,  56  ],
[488,   348,     0,    37.33,  56  ],
[408,   318,     0,    56,    56  ],
[660,   340,    16.67,  16.67,  56  ],
[643,   275,    37.33,  37.33,  56  ],
[631,   225,    56,    56,    56  ],
[727,   301,    37.33,  16.67,  56  ],
[568,   309,    16.67,  37.33,  56  ],
[492,   279,    16.67,  56,    56  ],
[567,   250,    37.33,  56,    56  ],
[787,   271,    56,    16.67,  56  ],
[704,   247,    56,    37.33,  56  ]])

sensx = 8.65                   # sensor size in the x direction (mm)
imgx, imgy = (1440, 1080)      # image size in pixels
mx = imgx/sensx                # pixels per mm on the sensor
index = 28                     # test point: [660, 340, 16.67, 16.67, 56]

cal = cal(coords)

# reproject the world coordinates of the test point
cal.get_uv(coords[index, 2:][np.newaxis, :])

# back-project the pixel coordinates of the test point, using the distance
# from the camera position to the point as the depth
cal.get_xyz(coords[index, :2][np.newaxis, :], np.linalg.norm(coords[index, 2:] - cal.T))

print('calculated f is')
print(cal.k[0, 0]/mx)          # focal length in mm implied by the estimated k

I get the following (testing with the point [660, 340, 16.67, 16.67, 56]):

Code:
Camera matrix is:
[[ 1.10085803e+03 -8.97333229e+00  7.11294043e+02]
 [-0.00000000e+00 -1.09218290e+03  4.21825076e+02]
 [ 0.00000000e+00  0.00000000e+00  1.00000000e+00]]
Rotation matrix is:
[[-0.60453385  0.79650852 -0.0106299 ]
 [-0.38712166 -0.30542782 -0.86997165]
 [-0.6961865  -0.52181225  0.49298715]]
Camera position is:
[-103.60456911  -83.9594023   130.69993225]
u, v are:
[[658.26719869]
 [344.14651874]]
x, y, z via k, r and t are:
[[-125.28143248]
 [  24.10266151]
 [ -64.18072346]]
x, y, z via pinv are:
[[33.74160228]
 [30.52774348]
 [46.34949045]
 [ 1.        ]]
calculated f is
8.817057341328972

Firstly, looking at the camera matrix k: in theory the skew term k[0,1] should be zero or close to it, which it is not. The image centre coordinates u_x = k[0,2] and u_y = k[1,2] should lie at the centre of the image, i.e. at imgx/2 and imgy/2; u_x is approximately correct, but u_y is quite far off (it should be about 540).
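
For reference, the structure I would expect k to have is sketched below: zero skew and the principal point at the nominal image centre (imgx/2 = 720, imgy/2 = 540); f_pix is just a placeholder for the focal length in pixels.

Code:
# expected structure of k (f_pix is only a placeholder value for illustration)
f_pix = 700.0
K_expected = np.array([[f_pix,   0.0, imgx/2],
                       [  0.0, f_pix, imgy/2],
                       [  0.0,   0.0,    1.0]])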

The camera position looks correct to me (it is the approximate position where I took the picture).

The reprojected u and v are within a couple of pixels of the measured values.

The focal length f is incorrect: I took the picture with my phone, which has a focal length of 4 mm.
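
For scale: with mx = 1440/8.65 ≈ 166.5 pixels per mm, a 4 mm focal length should show up as roughly 4 × 166.5 ≈ 666 pixels in k[0,0].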

However, when I calculate the world coordinates from the pixel coordinates (and the depth from the camera) using two methods,

  1. taking the projection of the pixels into the camera frame using the depth and k, and then transforming into the world frame using R and T
  2. directly inverting the equation [u, v, 1] = P [x, y, z, 1] using the pseudo-inverse of P

neither gives the correct world coordinates (a reprojection check on P itself is sketched right after this list).
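
As a separate sanity check on P itself (independent of the decomposition into k, R and T), I can reproject all of the calibration points and look at the pixel error; a minimal sketch, using the cal object and coords array from above:

Code:
# reproject every calibration point with the estimated P and report the pixel error
xyz_h = np.hstack((coords[:, 2:], np.ones((coords.shape[0], 1))))
uvw = cal.p @ xyz_h.T
uv_proj = (uvw[:2, :] / uvw[2, :]).T
err = np.linalg.norm(uv_proj - coords[:, :2], axis=1)
print('mean / max reprojection error in pixels:', err.mean(), err.max())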

Additionally, when I take R and T and plot the camera reference frame in the world reference frame, I get the following:

Plot of the camera reference frame in the world frame: https://i.sstatic.net/7n7Z7ceK.png

From the way I defined my axes, the z axis (in blue) is the optical axis of the camera and should therefore point (roughly) towards the origin of the world frame. It does not, which leads me to guess that the rotation matrix is incorrect.
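
A quick numerical version of this check (assuming, as the decomposition is meant to produce, that R maps world coordinates into the camera frame and T is the camera position in world coordinates):

Code:
# viewing direction of the camera expressed in world coordinates (third row of R)
axis_world = cal.R.T @ np.array([0.0, 0.0, 1.0])
# unit vector from the camera position towards the world origin
to_origin = -cal.T / np.linalg.norm(cal.T)
# close to +1 if the camera is looking roughly at the origin
print(np.dot(axis_world, to_origin))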

Does anybody know what I have done wrong?