
webcam-powered 3d mouse

Products by Inhahe · just now

the mouse has a webcam on it; software figures out 3d position and orientation from the webcam feed (ignoring things in the environment that are animated).

alternatively: just make software that uses any webcam

use a random reference object (a high-contrast outline). translation of the reference outline indicates rotation; size change indicates distance from it; skewing indicates the other two dimensions relative to it. when the object turns out to be animated or goes out of view, pick a different reference object, keeping the current position/rotation assumptions.
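
the hand-off logic above can be sketched. a minimal sketch, assuming a 1-d number stands in for the full position/orientation state; `ReferenceTracker` is a hypothetical name:

```python
class ReferenceTracker:
    """pose is always (base + offset from the current reference); when the
    reference is lost, fold the current offset into the base so the reported
    pose stays continuous across the hand-off."""

    def __init__(self):
        self.base = 0.0    # pose accumulated from all previous references
        self.offset = 0.0  # pose relative to the current reference

    def update(self, offset_from_reference):
        """called each frame with the pose measured against the current reference."""
        self.offset = offset_from_reference
        return self.base + self.offset

    def reference_lost(self):
        """current reference went animated / out of view: re-anchor."""
        self.base += self.offset
        self.offset = 0.0
```

the same idea works with full transforms; the re-anchoring step just composes the two transforms instead of adding numbers.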

use an algorithm for outlining that won't be affected by camera's automatic lighting and color adjustments.
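
one way to get that robustness, as a sketch: threshold gradients relative to the scanline's own maximum instead of absolutely, so a global gain/offset from auto exposure or white balance doesn't move the detections. `edge_positions` is a hypothetical helper, one scanline only:

```python
def edge_positions(row, rel_threshold=0.5):
    """find edges in one scanline of brightness values by thresholding each
    gradient against the row's own maximum gradient. a global linear
    adjustment (brightness * gain + offset) scales all gradients equally,
    so the detected positions don't change."""
    grads = [abs(b - a) for a, b in zip(row, row[1:])]
    m = max(grads)
    return [i for i, g in enumerate(grads) if m and g >= rel_threshold * m]
```

a gain/offset change leaves the result alone: `edge_positions([10, 10, 10, 200, 200, 200])` and `edge_positions([25, 25, 25, 120, 120, 120])` (the same row after `v*0.5+20`) both report the edge at index 2.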

what to do for buttons? keyboard i guess. but some cameras have a take-picture button.

problem: gets more complicated when reference outline is based on a 3d shape.
solution: print out the reference object and put it somewhere, or print it on screen

what shape to use? i think a green triangle. each vertex has an identifying mark, probably just a color. one cyan, one magenta, one yellow.
the shape tells the software everything except for which corner is forward. the colors do that part.
the software scans the image for the green line; once it finds it, it follows it to a dot.
the green lines may extend beyond the vertexes, past the opposite sides of the colored dots.

should we use a square instead of a triangle?
a circle instead of a triangle?

green circle with 4 red lines extending from it to find it and a single magenta dot on the perimeter of the circle, or one of the lines is magenta.
straight lines might be easier to follow than a circle.

three lines emanating from the center. one red line, one green line and one blue line.
no, a square outline. one side red, one blue, one green, one black.
no, 3 black and one green.

algorithm:
  work in fractions of pixels determined by where brightness of pixel at the border is between dark and light pixels
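
the fractional-pixel idea can be sketched, assuming brightness varies linearly across the edge; `subpixel_edge` is a hypothetical helper:

```python
def subpixel_edge(dark, bright, border):
    """fractional edge position between two adjacent pixels, by linear
    interpolation: where does the border pixel's brightness fall between the
    dark and bright neighbours?  0.0 = edge sits at the dark pixel,
    1.0 = edge sits at the bright pixel."""
    return (border - dark) / (bright - dark)
```

e.g. with a dark neighbour at 10, a bright neighbour at 90, and a border pixel at 50, the edge lands exactly halfway between the two pixels.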

scan, find one of the lines
  trace the square until you get to the fourth side.
  at this point we have:
    angles of the lines
    which one is red (even if it's the last one which we didn't scan)
    lengths of the lines
  angles and sizes of opposite lines can be different, because of perspective
  we should probably base everything off of two specific lines
  that gives us three points.
    or should we use all 4 points?
  maybe we should take the algorithm that textures an image onto a 2d surface given its position and normal vector, and run it in reverse: that would give us its position and normal relative to us, which we'd then translate into our position and normal vector relative to it. won't work; texturing is per pixel.

1 2
  3 4

rotate so that 1 is above 3. we don't know how much we rotated until we know 1-3's angle.
  len 1-3 vs. len 3-4 could tell us the angle except that we don't know perspective.
  len 1-2 vs. len 3-4 might tell us perspective.

maybe make it easier and print out a grid of colored squares or dots or lines or lines+dots.
  concentric circles?
  lines and concentric circles?
    size of circle at widest point = distance from center
      not exactly true because of perspective
    the rotation around the camera's z where the ratio between bottom-to-center and top-to-center is greatest or least is the straight-up rotation
      interesting thought: geometric absolute value is x>=1?x:1/x
    once you rotate, then horizontal width indicates distance from center
    length of center to bottom, center to top and center to side indicate perspective
    larger printout = more perspective?
    start knowing the size of the image, then give an image with a different size and see if results are off
    if we know angle of view and resolution, we know the angles TVC and CVB, we know length VC, we know that TB is a straight line, we know length TC and CB which are equal.
    now what about translation
     could this make our z rotation not up?
     it would also skew our distance estimate
     we need to somehow discern between translation and angle
       a change in angle translates the image but doesn't change it.
       hence if we rotationally adjust the camera so we're looking at the image, there is no translation anymore.
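
the "geometric absolute value" aside above can be written directly (a sketch, for positive ratios):

```python
def geometric_abs(x):
    """geometric absolute value of a positive ratio: x if x >= 1 else 1/x,
    i.e. how far the ratio is from 1 multiplicatively.  useful for comparing
    CB/CT-style ratios where 2.0 and 0.5 are 'equally extreme'."""
    return x if x >= 1 else 1 / x
```

so maximizing the bottom/top ratio "greatest or least" becomes simply maximizing its geometric absolute value.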

0. sparsely scan horizontal lines of the image until we come to a green line. follow the green line until we find the perimeter of the circle. trace the circle. take note of the location of the center of the red dot; otherwise treat red as equal to black.
1. rotate image by a so that CB/CT is greatest. -a is how much the camera is rotated around z.
2. x/width of circle is our distance in inches, where x is the true diameter of circle in inches * number of pixels per inch at 1 inch away
3. deduce from angle of view and resolution angles TVC (top of circle / camera / center of circle) and BVC.  VC was determined in step 2.  TVB is a straight line, and TC=CB=TB/2.  using trig we can now get our height. 
4. now that we know our distance from the center of the circle and height we know our x angle and y angle based on how displaced the center is from the center of the image. 
5. the x distance from the center to the red dot should be fine; the y distance needs to be adjusted: y = y/cos(pi/2 - ∠BCV), i think. now take atan2(y, x) and we know how much to rotate ourselves around z=center to get our x displacement. as we rotate that, also rotate our camera's y rotation to adjust (should just be a matter of angle subtraction). now we have x, y, z, and orientation, though orientation might want to be converted to a vector. we rotated around z first, then x, then y.
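
steps 2 and 3 can be sketched under a pinhole-camera model. the constants are made-up placeholders, and this sketch only needs the far-point angle TVC (the source also mentions BVC, which could serve as a cross-check):

```python
import math

# made-up placeholder constants (not from the notes):
F_PX = 1000.0        # "pixels per inch at 1 inch away", i.e. focal length in pixels
TRUE_DIAMETER = 7.5  # the printed circle's true diameter in inches

def distance_to_center(apparent_diameter_px):
    """step 2: pinhole model, distance = true size * F_PX / apparent size."""
    return TRUE_DIAMETER * F_PX / apparent_diameter_px

def height_above_paper(vc, angle_tvc):
    """step 3 sketch.  V = camera, C = circle center, T = the far end of the
    vertical diameter, so TC = the true radius.  law of sines in triangle TVC
    gives the angle at T (acute, because T is the far point); the angle at C
    then encodes the elevation of V above the paper, and
    height = VC * sin(angle at C)."""
    r = TRUE_DIAMETER / 2
    angle_t = math.asin(vc * math.sin(angle_tvc) / r)  # angle at T, law of sines
    angle_c = math.pi - angle_tvc - angle_t            # angle at C
    return vc * math.sin(angle_c)                      # sin(angle_c) = sin(elevation)
```

step 4 then follows from ordinary trig on the known distance and height, and step 1 is a search over image rotations for the extreme CB/CT ratio.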

our orientation actually isn't right yet because our camera is supposed to be pointed downward, not forward.

we won't have much freedom to roll and pitch; parts of the circle would go out of the image. making the image smaller would work, at the possible expense of precision. using concentric circles could give us whatever precision we can get; we'd have to be sure to know which circle we're in.

a graph-paper-based solution might be better. is that harder?

we could rotate the image to make the lines straight across, but that's partially a y rotation and partially a z rotation.
use special graph paper. in the north direction the line sections go red, green, blue, red, green, blue, etc. this way we don't need a specific spot of the paper to know north.
the lines can be relatively far apart. no closer than a half an inch.
actually this solution neglects to take note of where the center of the paper is. it can be counted, however, presuming two edges are in the picture. that limits low y range though.

best solution: colored dots on paper. scan the camera image circularly from the outside in until we arrive at far-away color dots. because each dot is color coded, we know exactly what our dots' absolute locations are. (how do we distinguish between horizontal and vertical?)

printout:
    a single black non-filled circle whose diameter is just less than the width of the paper
    green cross-hairs intersecting the center of the circle, one of them goes corner-to-corner of the paper. they stop just short of the edge.
    circle and lines should probably be thick.
    a red dot, big enough to be seen from anywhere, at the north end of the circle's perimeter; its diameter matches the stroke width of the circle drawing.
    cross-hairs do not go on top of the circle.

size of printout affects only mouse pointer sensitivity (i think)

idea: turn a regular mouse into a 3d mouse.
 difficult to keep track of motion given all those contours.
   ideal way might be to have the camera below the height of the mouse and detect the laser-hole shape
     that doesn't really give enough distance-from-camera resolution because the laser hole is small.
    maybe use some object-recognition library, but it might not have a fast enough framerate.

forget all that. have the webcam in one place and use a pencil to draw. only problem: it must be calibrated, or everything is relative to the camera.

actually, color a bright green stripe down your fingernail.

from now on we compute as if home is right in front of the camera, and do the translation for the centerpoint at the end.

put your finger at your preferred centerpoint and press space, or remember your previous settings.

scans image for bright green line

assume the stripe is a particular size ← will it be wrong if the assumption is wrong?

judge distance based on apparent size relative to assumed size

problem: tilting finger in two directions will confuse the size judger.

new plan: use a colored marble

distance = 30*1000/size  # assuming the object is 30 pixels wide and appears that way at 1000 pixels' distance, for no good reason
                         # if we knew the x angle of view and x resolution (640) we could calculate the 1000. the 30 still depends on object size (marble) and dpi (96)
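
the 1000 could indeed be computed from an assumed angle of view. a sketch under the pinhole model; the 640-pixel width and 60° field of view are assumptions:

```python
import math

def focal_length_px(x_resolution, x_fov_degrees):
    """pixels-per-unit-size at unit distance ("the 1000"), from the
    horizontal resolution and horizontal angle of view, pinhole model."""
    return (x_resolution / 2) / math.tan(math.radians(x_fov_degrees) / 2)

def marble_distance(apparent_size_px, true_size, f_px):
    """distance to the marble from its apparent size in pixels."""
    return f_px * true_size / apparent_size_px
```

e.g. `focal_length_px(640, 60)` gives about 554 pixels, which would replace the hard-coded 1000, and the 30 becomes `true_size` in whatever unit we want the distance reported in.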

problem: auto-focus can't keep up with z-motion. how to solve? what the hell is wrong with infinity focus cameras?