
#### Theo

##### Guest

I am trying to determine the projection matrix, and from it the camera pose, using a set of object coordinates in a world frame and their corresponding pixel coordinates in an image. Through online resources I have found that the projection equations can be rewritten in the form **A**P = 0, where A is a matrix containing the pixel and world coordinate information and P is the projection matrix flattened into a 12-by-1 vector. I have seen this solved using least-squares regression, as an eigenvalue problem, or using singular value decomposition, and all of these methods seem to give the same P.
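For context, a minimal sketch (not the code from this post) of why the eigenvalue and SVD formulations agree: for a homogeneous system A p = 0, the least-squares solution with ||p|| = 1 is the right singular vector of A for the smallest singular value, which is the same (up to sign) as the eigenvector of AᵀA for the smallest eigenvalue.

```python
import numpy as np

# Sketch: solving A p = 0 by SVD vs. by eigen-decomposition of A^T A.
# A random 20x12 matrix stands in for the DLT matrix built from correspondences.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 12))

# Method 1: SVD -- the last row of Vt corresponds to the smallest singular value.
_, _, Vt = np.linalg.svd(A)
p_svd = Vt[-1, :]

# Method 2: eigen-decomposition of the (symmetric) normal matrix via eigh,
# which returns eigenvalues in ascending order, so column 0 is the minimiser.
w, V = np.linalg.eigh(A.T @ A)
p_eig = V[:, 0]

# The two unit vectors agree up to an overall sign.
print(min(np.linalg.norm(p_svd - p_eig), np.linalg.norm(p_svd + p_eig)))
```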

I have the following code:


```
import numpy as np

class cal:
    def __init__(self, coords):
        if coords.shape[0] < 6:
            raise ValueError('Not enough observation points, 6 or more required')
        uv, xyz = np.split(coords, (2,), axis=1)
        # Build the 2n-by-12 DLT matrix A, two rows per correspondence.
        self.A = np.zeros((2 * coords.shape[0], 12))
        for i in range(coords.shape[0]):
            self.A[i*2:i*2 + 2, :] = np.vstack((
                np.concatenate([xyz[i, :], np.ones(1), np.zeros(4),
                                -uv[i, 0] * np.hstack((xyz[i, :], 1))]),
                np.concatenate([np.zeros(4), xyz[i, :], np.ones(1),
                                -uv[i, 1] * np.hstack((xyz[i, :], 1))])))
        # Solve A p = 0: eigenvector of A^T A with the smallest eigenvalue.
        eig_val, eig_vec = np.linalg.eig(self.A.T @ self.A)
        min_eig = np.argmin(eig_val)
        self.p = eig_vec[:, min_eig].reshape((3, 4))
        # Decompose the left 3x3 block of P into K and R via QR of its inverse.
        R_inv, k_inv = np.linalg.qr(np.linalg.inv(self.p[:, :3]))
        self.R = R_inv.T
        self.k = np.linalg.inv(k_inv)
        self.k /= self.k[-1, -1]
        # Camera position from P = K [R | -R C].
        self.T = -np.linalg.inv(self.p[:, :3]) @ self.p[:, -1]
        print('Camera matrix is:')
        print(self.k)
        print('Rotation matrix is:')
        print(self.R)
        print('Camera position is:')
        print(self.T)

    def get_uv(self, xyz: np.ndarray) -> np.ndarray:
        # Project world points to pixels: [u, v, 1]^T ~ P [x, y, z, 1]^T.
        uv_homg = self.p @ np.hstack((xyz, np.ones_like(xyz[:, 0:1]))).T
        uv_norm = uv_homg / uv_homg[2, :]
        print('u, v are:')
        print(uv_norm[:-1, :])
        return uv_norm

    def get_xyz(self, uv, zc):
        uv_homg = np.hstack((uv, np.ones_like(uv[:, 0:1]))).T
        # Method 1: back-project through K, then transform with R and T.
        k_inv = np.linalg.inv(self.k)
        xyz_c = k_inv @ uv_homg
        xyz_c *= zc
        xyz = self.R.T @ (xyz_c - self.T[np.newaxis, :].T)
        print('x, y, z via k, r and t are:')
        print(xyz)
        # Method 2: pseudo-inverse of P.
        xyz = np.linalg.pinv(self.p) @ uv_homg * zc
        xyz /= xyz[-1, 0]
        print('x, y, z via pinv are:')
        print(xyz)
        return xyz
```

Using the following data (in the form [pixel coordinates u, v, world coordinates x, y, z in mm]):


```
coords = np.array([[678, 716, 0, 0, 0 ],
[676, 628, 0, 0, 18.67],
[675, 528, 0, 0, 37.33],
[677, 419, 0, 0, 56 ],
[745, 651, 16.67, 0, 0 ],
[804, 599, 37.33, 0, 0 ],
[854, 558, 56, 0, 0 ],
[595, 666, 0, 16.67, 0 ],
[519, 625, 0, 37.33, 0 ],
[449, 588, 0, 56, 0 ],
[748, 573, 16.67, 0, 16.67],
[808, 524, 37.33, 0, 16.67],
[859, 484, 56, 0, 16.67],
[748, 478, 16.67, 0, 37.33],
[813, 432, 37.33, 0, 37.33],
[869, 391, 56, 0, 37.33],
[752, 374, 16.67, 0, 56 ],
[819, 331, 37.33, 0, 56 ],
[877, 295, 56, 0, 56 ],
[590, 584, 0, 16.67, 16.67],
[512, 544, 0, 37.33, 16.67],
[439, 508, 0, 56, 16.67],
[585, 488, 0, 16.67, 37.33],
[501, 451, 0, 37.33, 37.33],
[425, 417, 0, 56, 37.33],
[581, 385, 0, 16.67, 56 ],
[488, 348, 0, 37.33, 56 ],
[408, 318, 0, 56, 56 ],
[660, 340, 16.67, 16.67, 56 ],
[643, 275, 37.33, 37.33, 56 ],
[631, 225, 56, 56, 56 ],
[727, 301, 37.33, 16.67, 56 ],
[568, 309, 16.67, 37.33, 56 ],
[492, 279, 16.67, 56, 56 ],
[567, 250, 37.33, 56, 56 ],
[787, 271, 56, 16.67, 56 ],
[704, 247, 56, 37.33, 56 ]])
sensx = 8.65 # sensor size in the x direction
imgx, imgy = (1440,1080) # image size in pixels
mx = imgx/sensx
index = 28
cal = cal(coords)
cal.get_uv(coords[index,2:][np.newaxis,:])
cal.get_xyz(coords[index,:2][np.newaxis,:], np.linalg.norm(coords[index,2:] - cal.T))
print('calculated f is')
print(cal.k[0,0]/mx)
```

Testing with the point `[660, 340, 16.67, 16.67, 56]`, I get:


```
Camera matrix is:
[[ 1.10085803e+03 -8.97333229e+00 7.11294043e+02]
[-0.00000000e+00 -1.09218290e+03 4.21825076e+02]
[ 0.00000000e+00 0.00000000e+00 1.00000000e+00]]
Rotation matrix is:
[[-0.60453385 0.79650852 -0.0106299 ]
[-0.38712166 -0.30542782 -0.86997165]
[-0.6961865 -0.52181225 0.49298715]]
Camera position is:
[-103.60456911 -83.9594023 130.69993225]
u, v are:
[[658.26719869]
[344.14651874]]
x, y, z via k, r and t are:
[[-125.28143248]
[ 24.10266151]
[ -64.18072346]]
x, y, z via pinv are:
[[33.74160228]
[30.52774348]
[46.34949045]
[ 1. ]]
calculated f is
8.817057341328972
```

Firstly, looking at the camera matrix k: in theory k[0,1] should be zero or close to it, which it is not. The image centre coordinates u_x = k[0,2] and u_y = k[1,2] should lie at the centre of the image, i.e. at imgx/2 and imgy/2. We can see that u_x is approximately correct, but u_y is quite far off (it should be 540).
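As an aside, QR/RQ-based decompositions of a projection matrix are only defined up to per-column signs, so K can come out with negative diagonal entries. A common convention (not applied in the code above; the matrices here are hypothetical) is to absorb a diagonal sign matrix into both factors so that diag(K) is positive while the product K R is unchanged:

```python
import numpy as np

# Sign-fixing convention after a QR-based K/R decomposition (sketch with
# hypothetical K and R, not the calibrated values from this post):
# D is its own inverse, so (K @ D) @ (D @ R) == K @ R.
def fix_signs(K, R):
    D = np.diag(np.sign(np.diag(K)))
    return K @ D, D @ R

K = np.array([[1100.0,     0.0, 720.0],
              [   0.0, -1092.0, 421.0],
              [   0.0,     0.0,   1.0]])
R = np.eye(3)

K2, R2 = fix_signs(K, R)
print(np.allclose(K @ R, K2 @ R2), np.all(np.diag(K2) > 0))
```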

The camera position looks correct to me (it is the approximate position where I took the picture).

The reprojected u and v are within a couple of pixels of the measured values.

The focal length f is incorrect, as I took the picture with my phone, which has a focal length of 4 mm.

However, when I calculate the world coordinates from the pixel coordinates (and the depth from the camera) using two methods:

- projecting the pixels into the camera frame using the depth and k, then transforming into the world frame using R and T
- directly inverting the equation [u,v,1] = P [x,y,z,1] using the pseudo-inverse of P

neither gives the correct world coordinates.
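For reference, here is a sketch of a round trip for the first method under the usual pinhole convention x_cam = R (X_world − C), with hypothetical K, R and camera centre C (not the calibrated values from this post). Note that the depth used for back-projection is the z-coordinate in the camera frame, i.e. distance along the optical axis:

```python
import numpy as np

# Round-trip sketch: world -> pixel -> world under x_cam = R @ (X_world - C),
# [u, v, 1]^T ~ K @ x_cam. All values below are hypothetical.
K = np.array([[1000.0,    0.0, 720.0],
              [   0.0, 1000.0, 540.0],
              [   0.0,    0.0,   1.0]])
R = np.eye(3)
C = np.array([0.0, 0.0, -500.0])
X = np.array([10.0, 20.0, 30.0])        # world point

# Project: world -> camera -> pixel.
x_cam = R @ (X - C)
uv_h = K @ x_cam
u, v = uv_h[:2] / uv_h[2]
zc = x_cam[2]                           # depth along the optical axis

# Back-project: pixel + depth -> camera -> world.
x_cam_back = zc * (np.linalg.inv(K) @ np.array([u, v, 1.0]))
X_back = R.T @ x_cam_back + C

print(np.allclose(X_back, X))           # True when the conventions match
```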

Additionally, when I take R and T and plot the camera reference frame in the world reference frame, I get the following:

![Camera reference frame plotted in the world frame](https://i.sstatic.net/7n7Z7ceK.png)

From the way I defined my axes, the z axis (in blue) is the optical axis of the camera and should therefore point (roughly) towards the origin of the world frame. It does not, which leads me to suspect that the rotation matrix is incorrect.
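One way to make that visual check numeric (a sketch with a constructed R and C, not the calibrated values): under x_cam = R (X_world − C), the optical axis expressed in world coordinates is the third row of R, i.e. R.T @ [0, 0, 1], and it can be compared against the unit vector from the camera centre to the origin.

```python
import numpy as np

# Does the optical axis point (roughly) from the camera centre C toward the
# world origin? Hypothetical tolerance of 30 degrees.
def looks_at_origin(R, C, tol_deg=30.0):
    axis = R.T @ np.array([0.0, 0.0, 1.0])   # optical axis in the world frame
    to_origin = -C / np.linalg.norm(C)       # unit vector camera -> origin
    angle = np.degrees(np.arccos(np.clip(axis @ to_origin, -1.0, 1.0)))
    return angle < tol_deg

# Construct a rotation whose third row is the camera->origin direction,
# i.e. a camera at C looking straight at the origin (hypothetical example).
C = np.array([-100.0, -80.0, 130.0])
z = -C / np.linalg.norm(C)
x = np.cross(np.array([0.0, 0.0, 1.0]), z)
x /= np.linalg.norm(x)
y = np.cross(z, x)
R = np.vstack((x, y, z))

print(looks_at_origin(R, C))             # True for this constructed R
```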

Does anybody know what I have done wrong?
