Kinect - The Sensor

The sensor itself matches the style and appearance of - you guessed it - the new Xbox 360 S. That means its every surface is glossy black plastic which looks nice, but immediately shows fingerprints, dust, and scratches. If you’re OCD like me and don’t happen to live in a clean room, this is a constant but minor annoyance. Oh well, at least it matches the console.

The Kinect is noticeably bottom-heavy, with its center of gravity most likely somewhere in the base. This makes sense, since the horizontal arm housing the optical system and microphone array pivots vertically - the base is a stage of sorts which lets the top arm sweep through around 30 degrees, so it can adapt to placement on top of a TV or below it on a table, basically to suit your entertainment setup.

 

There’s a three-axis MEMS accelerometer onboard, ostensibly for determining the tilt of the sensor - if you put Kinect on top of a TV, for example, there’s no assurance that the surface the Kinect is resting on is level with the ground.
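
To make the tilt idea concrete, here's a minimal sketch of how a static three-axis accelerometer reading turns into tilt angles. The axis convention (x to the right, y out of the lens, z up) and the function name are assumptions for illustration, not anything documented for Kinect's firmware.

```python
import math

def tilt_degrees(ax, ay, az):
    """Return (pitch, roll) in degrees from a static accelerometer reading.

    Readings are in units of g. Assumes the only acceleration present is
    gravity, and a HYPOTHETICAL axis convention: x right, y forward, z up.
    """
    # Pitch: how far the forward axis is tipped away from horizontal.
    pitch = math.degrees(math.atan2(ay, math.sqrt(ax * ax + az * az)))
    # Roll: how far the device leans to one side.
    roll = math.degrees(math.atan2(ax, az))
    return pitch, roll

# A device tipped 10 degrees about its x axis sees gravity split between y and z.
pitch, roll = tilt_degrees(0.0, math.sin(math.radians(10)), math.cos(math.radians(10)))
print(f"pitch = {pitch:.1f} deg, roll = {roll:.1f} deg")
```

A reading of (0, 0, 1) g means the device is sitting level; any other split of gravity across the axes is the tilt that would need compensating.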

Down on the bottom you’ll notice an obvious grating, under which hides a four-microphone array for some beamforming goodness. The result is that the Kinect can sense which direction sounds are coming from and isolate a single speaker, rather than picking up the whole room the way an omnidirectional or even stereo microphone would. That's critically important for picking out voice commands while there's music and game noise blaring in the same room.
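
The geometry behind this is easy to sketch. For a far-away source, the delay between two microphones gives the arrival angle, and a delay-and-sum beamformer applies per-microphone delays to line a chosen direction up. The microphone positions below are made-up values, and this is the textbook technique rather than a description of Kinect's actual audio pipeline.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 C

def arrival_angle_deg(delay_s, mic_spacing_m):
    """Angle of a far-field source from broadside, given the delay between two mics."""
    ratio = SPEED_OF_SOUND * delay_s / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # clamp small numerical overshoot
    return math.degrees(math.asin(ratio))

def steering_delays(angle_deg, mic_positions_m):
    """Per-microphone delays (seconds) that align a plane wave arriving from angle_deg."""
    s = math.sin(math.radians(angle_deg))
    return [x * s / SPEED_OF_SOUND for x in mic_positions_m]

# HYPOTHETICAL linear array: four mics at these positions along the bar (meters).
mics = [0.0, 0.04, 0.08, 0.12]
delays = steering_delays(30.0, mics)
print([f"{d * 1e6:.0f} us" for d in delays])
```

Summing the four signals after applying those delays reinforces sound from the chosen direction, while sound from elsewhere adds up out of step and is attenuated.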

The most prominent physical feature on the Kinect, however, is its relatively sophisticated optical system. From left to right are an IR laser projector (more on this in a second), an RGB camera, and an IR camera. Between the RGB camera and IR laser is simply a green LED for status - it plays no role in the actual optical system. Also on the bottom of the sensor is a nondescript class-1 laser product notice - yes, the Kinect indeed uses an IR laser, but it’s completely eye safe at class-1.

There’s also a requisite laser warning buried in one of the three booklets that ship inside the Kinect sensor box, but you’ll notice that Microsoft has been careful to avoid drawing attention to the fact that there’s a laser in Kinect - people have a strange aversion to looking at or into perfectly eye-safe lasers. There’s no marked wavelength that I could find, but the Kinect’s IR projector is visible to my naked eye and exhibits trademark laser speckle - my guess is that the laser is somewhere between 750 and 900 nm, with 808 and 880 nm being common commercial “IR” laser diode wavelengths.

From the Xbox’s perspective, there are two separate video streams which come from the Kinect - one 640x480 (VGA) 30FPS stream from the RGB color camera, and one 640x480 11-bit 30FPS image stream which is the output 3D depth image after processing. Kinect implements a subset of PrimeSense’s natural interaction reference design, which originally specified a much higher resolution color sensor and 60FPS depth image. Another subtle difference is that PrimeSense specifies two audio streams, whereas Kinect obviously uses four, but such tweaks are ultimately the result of Microsoft having to maintain a delicate balance between performance and staying within a reasonable price point.

Original PrimeSense reference design specification
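
For a sense of scale, the raw data rates implied by those stream formats are simple arithmetic. The 11 bits per depth pixel comes from the stream description above; the 8 bits per color pixel (raw Bayer data) is an assumption, since the color format isn't stated.

```python
def stream_mbit_per_s(width, height, bits_per_pixel, fps):
    """Uncompressed data rate of a video stream in megabits per second."""
    return width * height * bits_per_pixel * fps / 1e6

# Depth stream as described: 640x480, 11 bits per pixel, 30 frames per second.
depth_rate = stream_mbit_per_s(640, 480, 11, 30)

# Color stream: 640x480 at 30 FPS; 8 bits per pixel is ASSUMED (raw Bayer).
color_rate = stream_mbit_per_s(640, 480, 8, 30)

print(f"depth: {depth_rate:.1f} Mbit/s, color: {color_rate:.1f} Mbit/s")
```

Under those assumptions the depth image alone works out to roughly 101 Mbit/s, which gives some feel for why a higher resolution sensor and a 60FPS depth image would have cost more than just a pricier camera.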

How Kinect senses depth is half of the magic - the other half is software. There’s actually not a lot of mystery behind how Kinect creates that 11-bit depth image once you understand the principle. Kinect uses a structured light IR projector and sensor system - a technique widely used in industrial manufacturing and inspection. The principle of structured light sensing is that, given a known angle and separation between emitter and sensor, depth can be recovered by simple triangulation. Expand this from a single point to a predictable projected pattern, and the shift of that pattern in the image relates directly to depth.

Example structured-light optical system, from “Spacecraft hazard avoidance utilizing structured light”
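
The triangulation step can be sketched in a few lines. For a rectified emitter/camera pair, depth is focal length times baseline divided by the observed shift (disparity). The 580-pixel focal length and 7.5 cm baseline below are placeholder values picked for illustration, not measured Kinect parameters, and a real structured-light system would typically measure the shift against a stored reference pattern rather than against infinity.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth in meters of a point whose pattern feature shifted by disparity_px."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_length_px * baseline_m / disparity_px

def depth_step_per_pixel(depth_m, focal_length_px, baseline_m):
    """Approximate change in depth for a one-pixel change in disparity."""
    # Magnitude of dz/dd for z = f*b/d, which is z^2 / (f*b).
    return depth_m ** 2 / (focal_length_px * baseline_m)

FOCAL_PX = 580.0    # ASSUMED focal length in pixels
BASELINE_M = 0.075  # ASSUMED emitter-to-camera separation ("a few inches")

for d in (58, 29, 14.5):
    z = depth_from_disparity(d, FOCAL_PX, BASELINE_M)
    step = depth_step_per_pixel(z, FOCAL_PX, BASELINE_M)
    print(f"disparity {d:5.1f} px, depth {z:.2f} m, about {step * 100:.1f} cm per pixel")
```

The second function shows the catch with triangulation: depth resolution degrades with the square of distance, so the same one-pixel error matters far more at the back of the room than up close.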

In Kinect, that system is comprised of an IR laser (which I’ve already touched on), a carefully engineered diffraction grating (in this case, the diffraction grating is actually a computer-generated hologram - CGH - with a specific periodic structure), and a relatively standard CMOS detector with a band-pass filter centered at the IR laser wavelength. Inspecting the sensor with the naked eye, you can easily see the characteristic rainbow-effect from the CGH atop the IR projector, and that shiny layer on the IR-sensitive CMOS is likely a band-pass filter.

That carefully-engineered CGH produces a specific periodic pattern of IR light when the laser shines through it. Inside is a computationally-derived structure of what are undoubtedly square cells, which diffracts the beam into that pattern. The first day I had Kinect, I immediately set out to find what the pattern was, and it’s simple to measure. Although the laser itself is intense enough to see with the naked eye at the source, the projected pattern thankfully isn’t, but the solution is trivial. Stick a Lambertian reflector (read: piece of paper) in front of Kinect, and you can see the pattern with any IR-sensitive device. My DSLR has a low-pass IR cut filter like most cameras (since IR is generally undesirable in visible imaging systems), so the most immediate device I found which was sensitive enough to photograph the pattern was a smartphone camera - an iPhone 4. The image is obviously false-color - this radiation is actually in the far red/near IR part of the spectrum. That pattern is below:

You can immediately notice some things - first, there are 9 clearly visible repeating blocks with a specific semi-random pattern of points inside. This structure is repeated across the blocks, and it’s obvious from the bright 0-order point at the center of each block that the pattern arises from a holographic structure. Note also how the structure is engineered to be spherical, which gives it that curvy edge shape - the paper is actually being held perpendicular to the projector. This projection grid defines the field of view of the Kinect sensor, and the distance between points inside the grid likewise defines the spatial resolution of the depth sensor. I’m only holding the paper about a foot away from the projector. Close up inside one of those cells you can see the structure, which consists of many small points:

The projected image doesn’t change in time - it’s fixed this way. The IR CMOS sensor images this pattern as projected onto the room and scene. Because the camera is displaced a few inches from the projector, points in the semi-random pattern appear shifted by an amount that depends on how far away the surface they land on is, and from those shifts Kinect is able to back out the corresponding depth image. That computation is done onboard the Kinect itself, and it’s entirely possible (read: likely) that the IR sensor inside the Kinect has a higher resolution than the 640x480 of the resulting image.
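
Why a semi-random pattern? Because any small window of it is effectively unique, which turns finding its shift into a simple search. Here's a toy one-dimensional version using sum-of-absolute-differences matching. This is a generic block-matching sketch with made-up data, not a description of what the Kinect's onboard processor actually does.

```python
import random

def best_shift(reference, observed, start, window, max_shift):
    """Return the shift (0..max_shift) at which a reference window best matches observed."""
    patch = reference[start:start + window]
    best, best_cost = 0, float("inf")
    for s in range(max_shift + 1):
        candidate = observed[start + s:start + s + window]
        if len(candidate) < window:
            break  # ran off the end of the observed row
        # Sum of absolute differences: 0 means a perfect match.
        cost = sum(abs(a - b) for a, b in zip(patch, candidate))
        if cost < best_cost:
            best, best_cost = s, cost
    return best

# Synthetic "projected pattern": one row of pseudo-random intensities.
rng = random.Random(0)
reference = [rng.randint(0, 255) for _ in range(200)]

# Simulate a surface that displaces the pattern by 7 pixels in the camera image.
true_shift = 7
observed = [0] * true_shift + reference[:-true_shift]

found = best_shift(reference, observed, start=50, window=16, max_shift=20)
print("recovered shift:", found)
```

Repeat that search for a window around every pixel and you get a shift map, which triangulation then converts to depth. A regular grid of dots would defeat this, since every window would match equally well at many shifts.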

When iFixit tore apart the Kinect, it was immediately apparent that Microsoft had devoted a lot of engineering to the sensor’s cooling solution, which at first thought seems strange - why do two cameras and a bunch of microphones need fans? Cue RROD jokes. Further, in the disassembly photos, I noticed a Peltier cooler on the back of the IR laser diode.

The Peltier cooler is what's being pried off - Courtesy iFixit

The reason for the cooler should now be obvious - diffraction gratings are extremely wavelength-dependent, and the Kinect functions or doesn’t based on its ability to properly detect the projected IR image. The result is that the IR diode likely needs to be kept within a window of less than 10 degrees C around some target temperature, so the laser’s peak output stays at or very close to the wavelength the CGH was designed to work at. The other consideration is that the top of most TVs, where you could conceivably place a Kinect, can be notably warm. Kinect’s somewhat overengineered thermal design now makes sense - the Peltier cooler likely gets hot on the back side, which connects to the metal base plate, the front, which touches the diode laser, gets cool, and the fan sweeps air through the whole box when required. RROD jokes aside, making sure that the system is carefully thermally regulated is an important part of the optical design.
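
The wavelength sensitivity follows directly from the grating equation, sin(theta) = m * wavelength / period. The sketch below shows how a diffracted spot moves as the diode warms up. The 808 nm nominal wavelength is one of the candidates guessed at earlier, and the 10 micron grating period and 0.3 nm per degree C drift are assumed, typical-order values rather than Kinect measurements.

```python
import math

def diffraction_angle_deg(wavelength_nm, period_um, order=1):
    """Angle of a diffraction order at normal incidence, from the grating equation."""
    s = order * (wavelength_nm * 1e-3) / period_um  # nm -> um, then m*lambda/d
    if abs(s) > 1.0:
        raise ValueError("this order does not propagate")
    return math.degrees(math.asin(s))

def wavelength_at(temp_c, nominal_nm=808.0, nominal_temp_c=25.0, drift_nm_per_c=0.3):
    """Diode wavelength at temp_c under an ASSUMED linear thermal drift."""
    return nominal_nm + drift_nm_per_c * (temp_c - nominal_temp_c)

PERIOD_UM = 10.0  # ASSUMED grating period, for illustration only

for temp in (25.0, 35.0, 45.0):
    lam = wavelength_at(temp)
    angle = diffraction_angle_deg(lam, PERIOD_UM, order=5)
    print(f"{temp:.0f} C: {lam:.1f} nm, 5th order at {angle:.3f} deg")
```

The shift per degree is small, but every point in the pattern moves by an amount that grows with its diffraction order, so the projected pattern effectively rescales as the diode heats. Since depth is recovered from sub-pixel-scale shifts, that is exactly the kind of error a thermally regulated diode avoids.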

I decided I was going to see if I could make the Kinect fans turn on, or the device overheat. I left the Kinect sensor turned on for 48+ hours atop my rather-warm LCD TV, and later an even hotter plasma TV and could never once feel it get noticeably warm, or even detect airflow through the Kinect. It’s possible that the fans were spinning, but if that’s the case I couldn’t detect it. 

72 Comments

  • Noriaki - Thursday, December 9, 2010

    Yeah, I just meant the 350 part. Referring to it as the 360 S to distinguish it from the 360 Pro makes sense to me.
  • Noriaki - Thursday, December 9, 2010

    PS: Thanks for the in-depth coverage. This is the first time I feel like I got a good idea of what having a Kinect in my living room would mean for practical things like where my couch lives.
  • Aikouka - Thursday, December 9, 2010

    Brian, have you tried out DanceMasters (DM)? I noticed you commented on how your girlfriend compared Dance Central (DC) to DDR where DanceMasters is actually created by Konami.

    I own DM and have played the DC game, and I have to say... you'd probably be disappointed in the DM menu system as it is a tad bit harder to control. Way too often I found myself skipping past the option I wanted (as all options move left or right) and you then have to raise your (right) hand to select it. The problem comes when it might see you move your hand out to the right and actually shift your choice over one right before you raise your hand.

    The part where I think DM beats DC is the actual dancing. In the demo for DC, I found it awkwardly difficult to pay attention to the way the dancer was moving (left or right, etc) and the upcoming movement that was shown on the right side of the screen. When my brother and his girlfriend played, I noticed one huge trend... we *all* would miss the first dance move after they changed from one move to another.

    This is kind of better in DM, because it uses arrows that signify how your hand (or hands) should move in a second or two. There are also circles that will appear on the screen and you must hit them with either your hands or feet (obvious depending on the location). The last movement is the "pose silhouettes" that appear on the left and right side and are green in color. They move to the center of the screen and when the two silhouettes combine, you are supposed to be in that pose. The only problem is that it's not terribly picky on what you do in between these three types of inputs and another problem is that it loves to put circles beneath your feet (so you keep moving), but they're hard to notice. I found this easy to combat by simply always moving your feet.

    Overall, there's a huge difference in the style of music between the two as well, which influenced my decision. I've never been a "Top 40s" kinda guy and I've played DDR quite a bit, so I went with the game that had the music style I was used to ( and also considering that I've heard quite a bit of Eurobeat, which DM also has ).

    The one thing that was always fun about DC was the "freestyle" section where it shows you as this sort of glowing silhouette and you just do whatever dance you want. At the end, it will play this back to you in a sort of time-lapse video (which you can then save).

    To talk about a different game, I noticed some problem with jumping in Kinect... mostly in the rail-based obstacle course. Maybe I was just doing little hops and Kinect didn't register it... maybe it was a problem with the cargo pants (khaki color, so they're fairly flesh-toned) that I was wearing. It was pretty crazy though... at one point I had to duck down, so I dropped to my knees and then needed to switch sides, so I pulled out the Starfox-esque barrel roll! Kinect did actually sense that correctly.

    The one thing I did notice is that it is *very* common in Kinect Adventures for it to yell at me about getting too close or too far... especially in the bubble popping mini-game.
  • Brian Klug - Thursday, December 9, 2010

    I haven't checked out Dance Masters, but I'm starting to think that I definitely should. I think at the time when I originally put together this list of games, that wasn't available and I overlooked it. I'll grab it and maybe update with a page or two.

    It's interesting how dancing games are quickly becoming something Kinect is very well suited for. I definitely agree about jumping and the clothing choice, I have a pair of cargo shorts that just don't work with most of the titles, and Kinect Adventures does yell a lot about position, agreed.

    -Brian
  • GSJ - Thursday, December 9, 2010

    If only I could use this with my PC...
  • ExarKun333 - Thursday, December 9, 2010

    I purchased the Kinect soon after the launch and I came to pretty much the same conclusions as Brian. The lag is there, but acceptable. More importantly, the games are FUN. MS did a great job with this launch. The hardware is easy to install and configure, and the games are easy to pick up. I agree 100% that the menus in DC are superior to the method in the Dashboard. Maybe an update at some time? :)
  • knowom - Thursday, December 9, 2010

    267ms on a DAW would be completely unthinkable; in fact, most people that play or record music try to stay below 10ms.

    I'm just using that example as a clear, easy-to-demonstrate reason why 267ms is an abysmal amount of input lag. You can completely rule out music-based Kinect games as well as any twitch/quick-reflex input games or applications.
  • SodaAnt - Thursday, December 9, 2010

    In the comment about the laser probably being 650-700 nm, that is wrong. 650nm is the wavelength of a normal red laser pointer. Normal IR diodes lase at either 780, 808 or 980nm.
  • Brian Klug - Thursday, December 9, 2010

    Hmm it's on the fringes of what I'd consider visible, but 780 probably is a much better choice. I'll update.

    -Brian
  • mcnabney - Thursday, December 9, 2010

    Okay, I am probably the only prig to bring this up, but the distances required seem to be a problem for some not often thought of reasons.

    9-12 feet of clearance appears to be required, which means the sofa will need to be 15' back (unless you move it every time you fire up the Kinect).

    I would point out that you would need to have a 55"+ HDTV to fully benefit from the 1080p image from the Kinect playing area. For regular game playing / movie watching on the sofa (behind the playing area) you would require a 100"!!!!! screen to fully resolve the 1080p image that the Xbox360 or a BluRay player is capable of displaying.

    For this thing to really work in a robust media environment (and most living rooms) it should have been able to work perfectly with the user standing 3-5' from the screen.
