U.S. patent application number 12/897758 was filed with the patent office on 2010-10-04 and published on 2011-01-27 for an augmented reality device for presenting virtual imagery registered to a viewed surface.
This patent application is currently assigned to Silverbrook Research Pty Ltd. Invention is credited to Paul Lapstun, Kia Silverbrook.
Application Number | 20110018903 12/897758 |
Family ID | 35756905 |
Filed Date | 2010-10-04 |
United States Patent Application | 20110018903 |
Kind Code | A1 |
Lapstun; Paul; et al. | January 27, 2011 |
AUGMENTED REALITY DEVICE FOR PRESENTING VIRTUAL IMAGERY REGISTERED
TO A VIEWED SURFACE
Abstract
An augmented reality device for inserting virtual imagery into a
user's view of their physical environment. The device comprises: a
see-through display device including a wavefront modulator; a
camera for imaging a surface in the physical environment; and a
controller. The controller is configured for capturing an image of
the surface; determining the virtual imagery to be displayed at a
predetermined position relative to the surface; determining a
position of the surface relative to the augmented reality device;
generating an image based on the virtual imagery and on the
position of the surface relative to the augmented reality device;
and displaying the generated image via the display device. Based on
pixel depth information, the controller modulates the wavefront
curvature of light emitted for each pixel so that the user sees the
virtual imagery at the predetermined position relative to the
surface regardless of changes in position of the user's eyes with
respect to the display device.
Inventors: | Lapstun; Paul; (Balmain, AU); Silverbrook; Kia; (Balmain, AU) |
Correspondence Address: | SILVERBROOK RESEARCH PTY LTD, 393 DARLING STREET, BALMAIN 2041, AU |
Assignee: | Silverbrook Research Pty Ltd |
Family ID: | 35756905 |
Appl. No.: | 12/897758 |
Filed: | October 4, 2010 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
11193481 | Aug 1, 2005 | |
12897758 | | |
Current U.S. Class: | 345/633 |
Current CPC Class: | G06F 3/011 20130101; G06F 3/0321 20130101; G02B 26/06 20130101; G02B 27/017 20130101; G02B 2027/0187 20130101; G02B 2027/014 20130101; G02B 27/0093 20130101; G02B 30/27 20200101; G02B 2027/0123 20130101; G06F 3/013 20130101; H04N 13/344 20180501 |
Class at Publication: | 345/633 |
International Class: | G09G 5/00 20060101 G09G005/00 |
Foreign Application Data
Date | Code | Application Number |
Aug 3, 2004 | AU | 2004904324 |
Aug 3, 2004 | AU | 2004904325 |
Aug 20, 2004 | AU | 2004904740 |
Aug 24, 2004 | AU | 2004904803 |
Sep 21, 2004 | AU | 2004905413 |
Jan 5, 2005 | AU | 2005900034 |
Claims
1. An augmented reality device for inserting virtual imagery into a
user's view of their physical environment, the device comprising: a
see-through display device through which the user can view the
physical environment, said display device including a wavefront
modulator; a camera for imaging at least one surface in the
physical environment; and a controller configured for: capturing,
using the camera, at least one image of the surface; determining,
at least partially from the at least one captured image, the
virtual imagery to be displayed at a predetermined position
relative to the surface; determining, at least partially from the
at least one captured image, a position of the surface relative to
the augmented reality device; generating at least one image based
on the virtual imagery and on the position of the surface relative
to the augmented reality device, the image including pixel depth
information; and displaying the generated image via the display
device, including modulating, based on the pixel depth information,
the wavefront curvature of the light emitted for each pixel, so
that the user sees the virtual imagery at the predetermined
position relative to the surface regardless of changes in position
of the user's eyes with respect to the display device.
2. An augmented reality device according to claim 1 wherein the
display device has two see-through displays, one for each of the
user's eyes respectively.
3. An augmented reality device according to claim 1 wherein the
display device, the camera and the controller are adapted to be
worn on the user's head.
4. An augmented reality device according to claim 1 wherein the
display device has a virtual retinal display (VRD) for each of the
user's eyes, each of the VRDs scanning at least one beam of light
into a raster pattern and modulating the or each beam to produce
spatial variations in the virtual imagery.
5. An augmented reality device according to claim 4 wherein the VRD
scans red, green and blue beams of light to produce color pixels in
the raster pattern.
6. An augmented reality device according to claim 5 wherein the
VRDs present different images to each of the user's eyes, the
differences being based on eye separation and the distance to the
predetermined position of the virtual imagery so as to create a
perception of depth via stereopsis.
7. An augmented reality device according to claim 1 wherein the
wavefront modulator uses a deformable membrane mirror, liquid
crystal phase corrector, a variable focus liquid lens or a variable
focus liquid mirror.
8. An augmented reality device according to claim 1 wherein the
virtual imagery is a movie, a computer application interface,
computer application output, hand drawn strokes, text, images or
graphics.
9. An augmented reality device according to claim 1 wherein the
display device has pupil trackers to detect an approximate point of
fixation of the user's gaze such that a virtual cursor can be
projected into the virtual imagery and navigated using gaze
direction.
10. An augmented reality device according to claim 1, further
comprising an optical range finder for determining range
information using time-of-flight measurement, wherein the
controller is configured for determining the position of the
surface relative to the augmented reality device using the range
information in combination with the at least one captured
image.
11. An augmented reality device according to claim 1 wherein the
surface has a pattern of coded data disposed thereon, and wherein
the controller is configured to at least partially identify the
virtual imagery to be displayed using at least part of the pattern
of coded data contained in the at least one captured image.
12. An augmented reality device according to claim 11, wherein the
pattern of coded data is indicative of an identity of the surface,
and wherein the controller is configured to at least partially
identify the virtual imagery to be displayed based on the identity
of the surface.
13. An augmented reality device according to claim 1, wherein the
surface has a pattern of coded data disposed thereon, and wherein
the controller is configured to at least partially determine the
position of the surface relative to the augmented reality device
using at least part of the pattern of coded data contained in the
at least one captured image.
14. An augmented reality device according to claim 13, wherein the
pattern of coded data disposed on the surface includes a grid of
target elements.
15. An augmented reality device according to claim 13, wherein the
pattern of coded data disposed on the surface is indicative of a
plurality of coordinate locations on the surface, and wherein the
controller is configured to at least partially determine the
position of the surface relative to the augmented reality device
using at least one of the coordinate locations.
Description
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This application is a continuation of U.S. application Ser.
No. 11/193,481 filed Aug. 1, 2005, which is herein
incorporated by reference.
FIELD OF THE INVENTION
[0002] The present invention relates to the fields of interactive
paper, printing systems, computer publishing, computer
applications, human-computer interfaces, information appliances,
augmented reality, and head-mounted displays.
CO-PENDING REFERENCES
[0003] Ser. Nos. 11/193,481 11/193,482 11/193,479
CROSS-REFERENCES
[0004] 6,750,901 6,476,863 6,788,336 7,249,108
6,566,858 6,331,946 6,246,970 6,442,525 7,346,586 7,685,423
6,374,354 7,246,098 6,816,968 6,757,832 6,334,190 6,745,331
7,249,109 7,197,642 7,093,139 7,509,292 7,685,424 7,743,262
7,210,038 7,401,223 7,702,926 7,716,098 7,364,256 7,258,417
7,293,853 7,328,968 7,270,395 7,461,916 7,510,264 7,334,864
7,255,419 7,284,819 7,229,148 7,258,416 7,273,263 7,270,393
6,984,017 7,347,526 7,357,477 7,465,015 7,364,255 7,357,476
7,758,148 7,284,820 7,341,328 7,246,875 7,322,669 7,243,835
10/815,630 7,703,693 10/815,638 7,251,050 10/815,642 7,097,094
7,137,549 10/815,618 7,156,292 10/815,635 7,357,323 7,654,454
7,137,566 7,131,596 7,128,265 7,207,485 7,197,374 7,175,089
10/815,617 7,537,160 7,178,719 7,506,808 7,207,483 7,296,737
7,270,266 7,605,940 7,128,270 7,784,681 7,677,445 7,506,168
7,441,712 7,663,789 11/041,609 11/041,626 7,537,157 7,801,742
7,395,963 7,457,961 7,739,509 7,467,300 7,467,299 7,565,542
7,457,007 7,150,398 7,159,777 7,450,273 7,188,769 7,097,106
7,070,110 7,243,849 6,623,101 6,406,129 6,505,916 6,457,809
6,550,895 6,457,812 7,152,962 6,428,133 7,204,941 7,282,164
7,465,342 7,278,727 7,417,141 7,452,989 7,367,665 7,138,391
7,153,956 7,423,145 7,456,277 7,550,585 7,122,076 7,148,345
7,470,315 7,572,327 7,416,280 7,252,366 7,488,051 7,360,865
6,746,105 7,156,508 7,159,972 7,083,271 7,165,834 7,080,894
7,201,469 7,090,336 7,156,489 7,413,283 7,438,385 7,083,257
7,258,422 7,255,423 7,219,980 7,591,533 7,416,274 7,367,649
7,118,192 7,618,121 7,322,672 7,077,505 7,198,354 7,077,504
7,614,724 7,198,355 7,401,894 7,322,676 7,152,959 7,213,906
7,178,901 7,222,938 7,108,353 7,104,629 7,246,886 7,128,400
7,108,355 6,991,322 7,287,836 7,118,197 7,575,298 7,364,269
7,077,493 6,962,402 7,686,429 7,147,308 7,524,034 7,118,198
7,168,790 7,172,270 7,229,155 6,830,318 7,195,342 7,175,261
7,465,035 7,108,356 7,118,202 7,510,269 7,134,744 7,510,270
7,134,743 7,182,439 7,210,768 7,465,036 7,134,745 7,156,484
7,118,201 7,111,926 7,431,433 7,018,021 7,401,901 7,468,139
7,448,729 7,246,876 7,431,431 7,419,249 7,377,623 7,334,876
7,249,901 7,477,987 7,156,289 7,178,718 7,225,979 7,540,429
7,584,402 11/084,806 7,721,948 7,079,712 6,825,945 7,330,974
6,813,039 7,190,474 6,987,506 6,824,044 7,038,797 6,980,318
6,816,274 7,102,772 7,350,236 6,681,045 6,678,499 6,679,420
6,963,845 6,976,220 6,728,000 7,110,126 7,173,722 6,976,035
6,813,558 6,766,942 6,965,454 6,995,859 7,088,459 6,720,985
7,286,113 6,922,779 6,978,019 6,847,883 7,131,058 7,295,839
7,406,445 7,533,031 6,959,298 6,973,450 7,150,404 6,965,882
7,233,924 7,707,082 7,593,899 7,175,079 7,162,259 6,718,061
7,464,880 7,012,710 6,825,956 7,451,115 7,222,098 7,590,561
7,263,508 7,031,010 6,972,864 6,862,105 7,009,738 6,989,911
6,982,807 7,518,756 6,829,387 6,714,678 6,644,545 6,609,653
6,651,879 10/291,555 7,293,240 7,467,185 7,415,668 7,044,363
7,004,390 6,867,880 7,034,953 6,987,581 7,216,224 7,506,153
7,162,269 7,162,222 7,290,210 7,293,233 7,293,234 6,850,931
6,865,570 6,847,961 10/685,583 7,162,442 10/685,584 7,159,784
7,557,944 7,404,144 6,889,896 7,174,056 6,996,274 7,162,088
7,388,985 7,417,759 7,362,463 7,259,884 7,167,270 7,388,685
6,986,459 10/954,170 7,181,448 7,590,622 7,657,510 7,324,989
7,231,293 7,174,329 7,369,261 7,295,922 7,200,591 7,693,828
11/020,260 11/020,321 11/020,319 7,466,436 7,347,357 11/051,032
7,382,482 7,602,515 7,446,893 11/082,815 7,389,423 7,401,227
6,991,153 6,991,154 7,589,854 7,551,305 7,322,524 7,068,382
7,007,851 6,957,921 6,457,883 7,044,381 7,094,910 7,091,344
7,122,685 7,038,066 7,099,019 7,062,651 6,789,194 6,789,191
7,529,936 7,278,018 7,360,089 7,526,647 7,467,416 6,644,642
6,502,614 6,622,999 6,669,385 6,827,116 7,011,128 7,416,009
6,549,935 6,987,573 6,727,996 6,591,884 6,439,706 6,760,119
7,295,332 7,064,851 6,826,547 6,290,349 6,428,155 6,785,016
6,831,682 6,741,871 6,927,871 6,980,306 6,965,439 6,840,606
7,036,918 6,977,746 6,970,264 7,068,389 7,093,991 7,190,491
7,511,847 7,663,780 10/962,412 7,177,054 7,364,282 10/965,733
10/965,933 7,728,872 7,468,809 7,180,609 7,538,793 7,466,438
7,292,363 7,515,292 6,982,798 6,870,966 6,822,639 6,474,888
6,627,870 6,724,374 6,788,982 7,263,270 6,788,293 6,946,672
6,737,591 7,091,960 7,369,265 6,792,165 7,105,753 6,795,593
6,980,704 6,768,821 7,132,612 7,041,916 6,797,895 7,015,901
7,289,882 7,148,644 10/778,056 10/778,058 7,515,186 7,567,279
10/778,062 7,096,199 7,286,887 7,400,937 7,474,930 7,324,859
7,218,978 7,245,294 7,277,085 7,187,370 7,609,410 7,660,490
10/919,379 7,019,319 7,593,604 7,660,489 7,043,096 7,148,499
7,463,250 7,590,311 11/155,557 7,055,739 7,233,320 6,830,196
6,832,717 7,182,247 7,120,853 7,082,562 6,843,420 7,793,852
6,789,731 7,057,608 6,766,944 6,766,945 7,289,103 7,412,651
7,299,969 7,264,173 7,549,595 7,111,791 7,077,333 6,983,878
7,564,605 7,134,598 7,431,219 6,929,186 6,994,264 7,017,826
7,014,123 7,134,601 7,150,396 7,469,830 7,017,823 7,025,276
7,284,701 7,080,780 7,376,884 10/492,169 7,469,062 7,359,551
7,444,021 7,308,148 7,630,962 10/531,229 7,630,553 7,630,554
10/510,391 7,660,466 7,526,128 6,957,768 7,456,820 7,170,499
7,106,888 7,123,239 6,982,701 6,982,703 7,227,527 6,786,397
6,947,027 6,975,299 7,139,431 7,048,178 7,118,025 6,839,053
7,015,900 7,010,147 7,133,557 6,914,593 7,437,671 6,938,826
7,278,566 7,123,245 6,992,662 7,190,346 7,417,629 7,468,724
7,715,035 7,221,781 11/102,843 6,593,166 7,132,679 6,940,088
7,119,357 10/727,162 7,377,608 7,399,043 7,121,639 7,165,824
7,152,942 10/727,157 7,181,572 7,096,137 7,302,592 7,278,034
7,188,282 7,592,829 10/727,179 10/727,192 7,770,008 7,707,621
7,523,111 7,573,301 7,660,998 7,783,886 10/754,938 10/727,160
7,369,270 6,795,215 7,070,098 7,154,638 6,805,419 6,859,289
6,977,751 6,398,332 6,394,573 6,622,923 6,747,760 6,921,144
7,092,112 7,192,106 7,457,001 7,173,739 6,986,560 7,008,033
7,551,324 7,195,328 7,182,422 7,374,266 7,427,117 7,448,707
7,281,330 10/854,503 7,328,956 7,735,944 7,188,928 7,093,989
7,377,609 7,600,843 10/854,498 7,390,071 10/854,526 7,549,715
7,252,353 7,607,757 7,267,417 10/854,505 7,517,036 7,275,805
7,314,261 7,281,777 7,290,852 7,484,831 7,758,143 10/854,527
7,549,718 10/854,520 7,631,190 7,557,941 7,757,086 10/854,501
7,266,661 7,243,193 10/854,518 7,448,734 7,425,050 7,364,263
7,201,468 7,360,868 7,234,802 7,303,255 7,287,846 7,156,511
10/760,264 7,258,432 7,097,291 7,645,025 10/760,248 7,083,273
7,367,647 7,374,355 7,441,880 7,547,092 10/760,206 7,513,598
10/760,270 7,198,352 7,364,264 7,303,251 7,201,470 7,121,655
7,293,861 7,232,208 7,328,985 7,344,232 7,083,272 7,621,620
7,669,961 7,331,663 7,360,861 7,328,973 7,427,121 7,407,262
7,303,252 7,249,822 7,537,309 7,311,382 7,360,860 7,364,257
7,390,075 7,350,896 7,429,096 7,384,135 7,331,660 7,416,287
7,488,052 7,322,684 7,322,685 7,311,381 7,270,405 7,303,268
7,470,007 7,399,072 7,393,076 7,681,967 7,588,301 7,249,833
7,524,016 7,490,927 7,331,661 7,524,043 7,300,140 7,357,492
7,357,493 7,566,106 7,380,902 7,284,816 7,284,845 7,255,430
7,390,080 7,328,984 7,350,913 7,322,671 7,380,910 7,431,424
7,470,006 7,585,054 7,347,534 7,441,865 7,469,989 7,367,650
6,454,482 6,808,330 6,527,365 6,474,773 6,550,997 7,093,923
6,957,923 7,131,724 7,396,177 7,168,867 7,125,098
BACKGROUND OF THE INVENTION
[0005] Virtual reality completely occludes a person's view of their
physical reality (usually with goggles or a helmet) and substitutes
an artificial, or virtual view projected on to the inside of an
opaque visor. Augmented reality changes a user's view of the
physical environment by adding virtual imagery to the user's field
of view (FOV).
[0006] Augmented reality typically relies on either a see-through
Head Mounted Display (HMD) or a video-based HMD. A video-based HMD
captures video of the user's field of view, augments it with
virtual imagery, and redisplays it for the user's eyes to see. A
see-through HMD, as discussed above, optically combines virtual
imagery with the user's actual field of view. A video-based HMD has
the advantage that registration between the real world and the
virtual imagery is relatively easy to achieve, since parallax due
to eye position relative to the HMD does not occur. It has the
disadvantage that it is typically bulky and has a narrow field of
view, and typically provides poor depth cues (i.e. a sense of depth
or the distance from the eye to an object).
[0007] A see-through HMD has the advantage that it can be
relatively less bulky with a wider field of view, and can provide
good depth cues. It has the disadvantage that registration between
the real world and the virtual imagery is difficult to achieve
without intrusive calibration procedures and sophisticated eye
tracking.
[0008] Registration between the real world and the virtual imagery
can be provided by inertial sensors to track head movement, or by
tracking fiducial markers positioned in the physical environment.
The HMD uses the fiducials as reference points for the virtual
imagery. A HMD often relies on inertial tracking to maintain
registration during head movement, but this is a somewhat
inaccurate approach.
[0009] The use of fiducials in the real world is less popular
because fiducial tracking is usually not fast enough for typical
user head movements, fiducials are typically sparsely placed making
fiducial detection complex, and the fiducial encoding capacity is
typically small which limits the number of individual fiducials
that can uniquely identify themselves. This can lead to fiducial
ambiguity in large installations.
SUMMARY OF THE INVENTION
[0010] According to a first aspect, the present invention provides
an augmented reality device for inserting virtual imagery into a
user's view of their physical environment, the device
comprising:
[0011] a display device through which the user can view the
physical environment;
[0012] an optical sensing device for sensing at least one surface
in the physical environment; and,
[0013] a controller for projecting the virtual imagery via the
display device; wherein during use,
[0014] the controller uses wavefront modulation to match the
curvature of the wavefronts of light reflected from the display
device to the user's eyes with the curvature of the wavefronts of
light that would be transmitted through the display device if the
virtual imagery were situated at a predetermined position relative
to the surface, such that the user sees the virtual imagery at the
predetermined position regardless of changes in position of the
user's eyes with respect to the see-through display.
[0015] The human visual system's ability to locate a point in space
is determined by the center and radius of curvature of the
wavefronts emitted by the point as they impinge on the eyes. A
three dimensional object can be thought of as an infinite number of
point sources in space.
[0016] The present invention puts each pixel of the virtual image
projected by the display device at a predetermined point relative
to the sensed surface with a wavefront display that adjusts the
curvature of the waves to correspond to the position of the point.
This keeps the virtual image in registration with the user's field
of view without first establishing (and maintaining) registration
between the eye and the see-through display.
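The relationship between a pixel's position and its wavefront curvature can be illustrated with a minimal sketch. It assumes an idealised point source, for which the wavefront reaching the eye is spherical with a radius equal to the distance to the point, so the curvature is the reciprocal of that distance. The function names and the list-of-lists depth map are hypothetical and are not taken from this specification.

```python
import math


def wavefront_curvature(depth_m):
    """Curvature (1/metres) of the wavefront from a point source at depth_m.

    Assumption: an ideal point source emits spherical wavefronts, so the
    radius of curvature at the eye equals the distance to the point.
    A point at infinity gives planar wavefronts (curvature 0).
    """
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    if math.isinf(depth_m):
        return 0.0
    return 1.0 / depth_m


def curvature_map(depth_map):
    """Convert per-pixel depth (metres) into per-pixel target curvature."""
    return [[wavefront_curvature(d) for d in row] for row in depth_map]


# Hypothetical 2x2 depth image: near pixels, a mid-range pixel and a far pixel.
depth_map = [[0.5, 0.5], [2.0, math.inf]]
curvatures = curvature_map(depth_map)
```

In such a scheme the curvature value for each pixel would be the set-point handed to the wavefront modulator while that pixel is being displayed.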
[0017] Optionally, the display device has a see-through display for
one of the user's eyes. Alternatively, the display device has two
see-through displays, one for each of the user's eyes
respectively.
[0018] Optionally, the surface has a pattern of coded data disposed
on it, such that the controller uses information from the coded
data to identify the virtual imagery to be displayed.
[0019] Optionally, the display device, the optical sensing device
and the controller are adapted to be worn on the user's head.
[0020] Optionally, the optical sensing device is camera-based and
during use, provides identity and position data related to the
coded surface to the controller for determining the virtual imagery
displayed.
[0021] Optionally, the display device has a virtual retinal display
(VRD) for each of the user's eyes, each of the VRDs scanning at least
one beam of light into a raster pattern and modulating the or each
beam to produce spatial variations in the virtual imagery.
Optionally, the VRD scans red, green and blue beams of light to
produce color pixels in the raster pattern.
[0022] Optionally, the VRDs present a slightly different image to
each of the user's eyes, the slight differences being based on eye
separation and the distance to the predetermined position of the
virtual imagery, to create a perception of depth via stereopsis.
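How eye separation and distance combine to produce the per-eye image difference can be sketched with simple pinhole geometry. This is an illustrative assumption rather than the method of this specification: the virtual point is taken to lie straight ahead, midway between the eyes, and the image plane distance, function names and example values are hypothetical.

```python
import math


def vergence_angle(eye_separation_m, distance_m):
    """Angle (radians) between the two lines of sight to a point straight ahead."""
    return 2.0 * math.atan2(eye_separation_m / 2.0, distance_m)


def per_eye_image_offset(eye_separation_m, distance_m, image_plane_m=1.0):
    """Horizontal image offsets (left_eye, right_eye) of a point straight ahead.

    By similar triangles, a point offset by half the eye separation from an
    eye's axis projects onto a plane at image_plane_m with the offset
    half_separation * image_plane_m / distance_m.
    """
    half = eye_separation_m / 2.0
    shift = half * image_plane_m / distance_m
    # The left eye sees the point displaced to the right (+), the right eye
    # sees it displaced to the left (-).
    return (shift, -shift)


near = vergence_angle(0.064, 0.5)  # assumed 64 mm eye separation, 0.5 m away
far = vergence_angle(0.064, 2.0)   # same eyes, point 2 m away
```

The difference between the two offsets shrinks as the distance grows, which is why the per-eye images differ only slightly for distant virtual imagery.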
[0023] Optionally, the wavefront modulator uses a deformable
membrane mirror, liquid crystal phase corrector, a variable focus
liquid lens or a variable focus liquid mirror.
[0025] Optionally, the virtual imagery is a movie, a computer
application interface, computer application output, hand drawn
strokes, text, images or graphics.
[0026] Optionally, the display device has pupil trackers to detect
an approximate point of fixation of the user's gaze such that a
virtual cursor can be projected into the virtual imagery and
navigated using gaze direction.
[0027] Additional Aspects
[0028] Related aspects of the invention are set out below together
with a discussion of their backgrounds to provide suitable
context for the broad descriptions of these aspects.
Head Mounted Display with Coded Surface Sensor
BACKGROUND
[0029] As discussed above, the use of fiducials in the real world
is less popular because fiducial tracking is usually not fast
enough for typical user head movements, fiducials are typically
sparsely placed making fiducial detection complex, and the fiducial
encoding capacity is typically small which limits the number of
individual fiducials that can uniquely identify themselves. This
can lead to fiducial ambiguity in large installations.
SUMMARY
[0030] Accordingly, this aspect provides an augmented reality
device for a user in a physical environment with a coded surface,
the device comprising:
[0031] a display device through which the user can view the
physical environment;
[0032] an optical sensing device for sensing the coded surface;
and,
[0033] a controller for determining an identity, position and
orientation of the coded surface; wherein,
[0034] the controller projects virtual imagery via the display
device such that the virtual imagery is viewed by the user in a
predetermined position with respect to the coded surface.
[0035] By providing a coded surface instead of sparse fiducials,
the invention avoids tracking and ambiguity problems. The
relatively dense coding allows the surface to be accurately
positioned and oriented to maintain registration with the virtual
imagery.
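Once the position and orientation of the coded surface are known, keeping virtual imagery at a fixed place on that surface amounts to a rigid transform from surface coordinates to device coordinates. The sketch below assumes the pose is expressed as a 3x3 rotation matrix and a translation vector; that representation, the function name and the example values are hypothetical, since this passage does not specify how the pose is represented.

```python
def surface_to_device(point_s, rotation, translation):
    """Map a 3D point from surface coordinates to device coordinates.

    Computes p_device = R * p_surface + t, where R (3x3, row-major nested
    lists) and t (length 3) describe the sensed pose of the surface
    relative to the device.
    """
    return tuple(
        sum(rotation[i][j] * point_s[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


# Hypothetical pose: surface rotated 90 degrees about the z axis and
# located 1 m in front of the device.
R = [[0, -1, 0],
     [1, 0, 0],
     [0, 0, 1]]
t = (0, 0, 1)

# A virtual point anchored 10 cm along the surface's x axis.
p_device = surface_to_device((0.1, 0, 0), R, t)
```

Re-running the transform with each newly sensed pose keeps the anchored point registered to the surface as the head moves.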
[0036] Optionally, the display device has a see-through display for
one of the user's eyes. Alternatively, the display device has two
see-through displays, one for each of the user's eyes
respectively.
[0037] Optionally, the augmented reality device further comprises a
hand-held sensor for sensing and decoding information from the
coded surface.
[0038] Optionally, the coded surface has first and second coded
data disposed on it in first and second two dimensional patterns
respectively, the first pattern having a scale sized such that the
optical sensing device can capture images with a resolution
suitable for the display device to decode the first coded data, and
the second pattern having a scale sized such that the hand-held
sensor can capture images with a resolution suitable for it to
decode the second coded data.
[0039] Optionally, the hand-held sensor is an electronic stylus
with a writing nib wherein during use, the stylus captures images
of the second pattern when the nib is in contact with, or proximate
to, the coded surface.
[0040] Optionally, the display device, the optical sensing device
and the controller are adapted to be worn on the user's head.
[0041] Optionally, the optical sensing device is camera-based and
during use, provides identity and position data related to the
coded surface to the controller for determining the virtual imagery
displayed.
[0042] Optionally, the display device has a virtual retinal display
(VRD) for each of the user's eyes, each of the VRDs scanning at least
one beam of light into a raster pattern and modulating the or each
beam to produce spatial variations in the virtual imagery.
Optionally, the VRD scans red, green and blue beams of light to
produce color pixels in the raster pattern.
[0043] Optionally, each of the virtual retinal displays has a
wavefront modulator to match the curvature of the wavefronts of
light reflected from the see-through display to the user's eyes
with the curvature of the wavefronts of light that would be
transmitted through the see-through display for that eye if the
virtual imagery were actual imagery at a predetermined position
relative to the coded surface, such that the user views the virtual
imagery at the predetermined position regardless of changes in
position of the user's eyes with respect to the see-through
display.
[0044] Optionally, each of the virtual retinal displays presents a
slightly different image to each of the user's eyes, the slight
differences being based on eye separation and the distance to the
predetermined position of the virtual imagery, to create a
perception of depth via stereopsis.
[0045] Optionally, the wavefront modulator uses a deformable
membrane mirror, liquid crystal phase corrector, a variable focus
liquid lens or a variable focus liquid mirror.
[0046] Optionally, the virtual imagery is a movie, a computer
application interface, computer application output, hand drawn
strokes, text, images or graphics.
[0047] Optionally, the display device has pupil trackers to detect
an approximate point of fixation of the user's gaze such that a
virtual cursor can be projected into the virtual imagery and
navigated using gaze direction.
Virtual Retinal Display with Occlusion Support
BACKGROUND
[0048] A virtual retinal display (VRD) projects a beam of light
onto the eye, and scans the beam rapidly across the eye in a
two-dimensional raster pattern. It modulates the intensity of the
beam during the scan, based on a source video signal, to produce a
spatially-varying image. The combination of human persistence of
vision and a sufficiently fast and bright scan creates the
perception of an object in the user's field of view.
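The raster scan described above can be sketched as a loop that visits pixel positions row by row and sets the beam intensity from the source frame at each position. The unidirectional scan order, the function names and the timing helper are illustrative assumptions; a real scanner may, for example, scan alternate rows in opposite directions.

```python
def raster_scan(frame):
    """Yield (row, column, intensity) in raster order.

    frame is a list of rows of intensity values taken from the source
    video signal; rows are visited top to bottom, pixels left to right.
    """
    for row_index, row in enumerate(frame):
        for col_index, intensity in enumerate(row):
            yield row_index, col_index, intensity


def pixel_dwell_time(rows, cols, frames_per_second):
    """Time available per pixel (seconds), ignoring retrace overhead."""
    return 1.0 / (rows * cols * frames_per_second)


# Hypothetical 2x2 frame of beam intensities.
samples = list(raster_scan([[1, 2], [3, 4]]))
```

The dwell time shows why the scan must be fast: every pixel of every frame shares the same beam, so the per-pixel time falls as resolution or frame rate rises.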
[0049] The VRD renders occlusions as part of any displayed virtual
imagery, according to the user's current viewpoint relative to
their physical environment. It does not, however, intrinsically
support occlusion parallax according to the position of the user's
eye relative to the HMD unless it uses eye tracking for this
purpose. In the absence of eye tracking, the HMD renders each VRD
view according to a nominal eye position. If the actual eye
position deviates from the assumed eye position, then the wavefront
display nature of the VRD prevents misregistration between the real
world and the virtual imagery, but in the presence of occlusions
due to real or virtual objects, it may lead to object overlap or
holes.
SUMMARY
[0050] Accordingly, this aspect provides an augmented reality
device for inserting virtual imagery into a user's view, the device
comprising:
[0051] an optical sensing device for optically sensing the user's
physical environment; and,
[0052] a display device with a virtual retinal display for
projecting a beam of light as a raster pattern of pixels, each
pixel having a wavefront of light with a curvature that provides
the user with spatial cues as to the perceived origin of the pixel
such that the user perceives the virtual imagery to be at a
predetermined location in the physical environment; wherein during
use,
[0053] the virtual retinal display accounts for any occlusions that
at least partially obscure the user's view of the perceived
location of the virtual imagery by using a spatial light modulator
that blocks occluded parts of the wavefront and allows non-occluded
parts of the wavefront to pass.
[0054] To support occlusion parallax, the VRD can be augmented with
a spatial light (amplitude) modulator (SLM) such as a digital
micromirror device (DMD). The SLM can be introduced immediately
after the wavefront modulator and before the raster scanner. The
video generator provides the SLM with an occlusion map associated
with each pixel in the raster pattern. The SLM passes non-occluded
parts of the wavefront but blocks occluded parts. The
amplitude-modulation capability of the SLM may be multi-level, and
each map entry in the occlusion map may be correspondingly
multi-level. However, in the limit case the SLM is a binary device,
i.e. either passing light or blocking light, and the occlusion map
is similarly binary.
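The effect of the occlusion map on one pixel's wavefront can be sketched as an element-wise multiplication of wavefront samples by a transmittance between 0 (blocked) and 1 (passed). Representing the wavefront as a small grid of amplitude samples, the 0.5 threshold used for the binary limit case and the function names are all assumptions made for illustration.

```python
def apply_occlusion(wavefront_amplitudes, occlusion_map):
    """Attenuate each wavefront sample by its occlusion map entry.

    Map entries are transmittances in [0, 1]: 1 passes the light,
    0 blocks it, and intermediate values model a multi-level SLM.
    """
    if len(wavefront_amplitudes) != len(occlusion_map) or any(
        len(a_row) != len(m_row)
        for a_row, m_row in zip(wavefront_amplitudes, occlusion_map)
    ):
        raise ValueError("wavefront and occlusion map must have the same shape")
    return [
        [a * m for a, m in zip(a_row, m_row)]
        for a_row, m_row in zip(wavefront_amplitudes, occlusion_map)
    ]


def binarize(occlusion_map, threshold=0.5):
    """Reduce a multi-level map to the binary limit case (pass or block)."""
    return [[1 if m >= threshold else 0 for m in row] for row in occlusion_map]


# Hypothetical 2x2 sampling of one pixel's wavefront, partly occluded.
passed = apply_occlusion([[1.0, 1.0], [1.0, 1.0]], [[1, 0], [0.5, 1]])
```

With a binary device such as a DMD, the map would first be passed through something like `binarize` so that each entry either passes or blocks its part of the wavefront.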
[0055] Optionally, the VRD projects red, green and blue beams of
light, the intensity of each beam being modulated to color each
pixel of the raster pattern.
[0056] Optionally, the VRD has a video generator for providing the
spatial light modulator with an occlusion map for each pixel of the
raster pattern.
[0057] Optionally, the display device has a controller connected to
the optical sensing device and an image generator for providing
image data to the video generator in response to the controller,
such that the virtual imagery is selected and positioned by the
controller. Optionally, the controller has a data connection to an
external source for receiving data related to the virtual
imagery.
[0058] Optionally, the display device has a see-through display
such that the VRD projects the raster pattern via the see-through
display.
[0059] In a particularly preferred form the display device has two
of the VRDs and two of the see-through displays, one VRD and
see-through display for each eye.
[0060] Optionally, the occlusion is a physical occlusion or a
virtual occlusion generated by the controller to at least partially
obscure the virtual imagery.
[0061] Optionally, the display device and the optical sensing
device are adapted to be worn on the user's head.
[0062] Optionally, the optical sensing device senses a surface in
the physical environment, the surface having a pattern of coded
data disposed on it, such that the display device uses information
from the coded data to select and position the virtual imagery to
be displayed.
[0063] Optionally, the optical sensing device is camera-based and
during use, provides identity and position data related to the
coded surface to the controller for determining the virtual imagery
displayed.
[0064] Optionally, the VRD has a wavefront modulator to match the
curvature of the wavefronts of light projected for each pixel in
the raster pattern, with the curvature of the wavefronts of light
that would be transmitted through the see-through display if the
virtual imagery were actual imagery at a predetermined position
relative to the coded surface, such that the user views the virtual
imagery at the predetermined position regardless of changes in
position of the user's eyes with respect to the see-through
display.
[0065] Optionally, the spatial light modulator uses a digital
micromirror device to create an occlusion shadow in the scanned
raster pattern.
[0066] Optionally, the camera generates an occlusion map for the
scanned raster patterns in the source video signal, and the spatial
light modulator uses the occlusion map to control the digital
micromirror device.
[0067] Optionally, each of the VRDs presents a slightly different
image to each of the user's eyes, the slight differences being
based on eye separation, and the distance to the predetermined
position of the virtual imagery to create a perception of depth via
stereopsis.
[0068] Optionally, the wavefront modulator has a deformable
membrane mirror, a liquid crystal phase corrector, a variable focus
liquid lens or a variable focus liquid mirror.
[0069] Optionally, the virtual imagery is a movie, a computer
application interface, computer application output, hand drawn
strokes, text, images or graphics.
[0070] Optionally, the display device has pupil trackers to detect
an approximate point of fixation of the user's gaze such that a
virtual cursor can be projected into the virtual imagery and
navigated using gaze direction.
BRIEF DESCRIPTION OF THE DRAWINGS
[0071] Preferred embodiments of the invention will now be described
by way of example only with reference to the accompanying drawings,
in which:
[0072] FIG. 1 shows the structure of a complete tag;
[0073] FIG. 2 shows a symbol unit cell;
[0074] FIG. 3 shows nine symbol unit cells;
[0075] FIG. 4 shows the bit ordering in a symbol;
[0076] FIG. 5 shows a tag with all bits set;
[0077] FIG. 6 shows a tag group made up of four tag types;
[0078] FIG. 7 shows the continuous tiling of tag groups;
[0079] FIG. 8 shows the interleaving of codewords A, B, C & D
within a tag;
[0080] FIG. 9 shows a codeword layout;
[0081] FIG. 10 shows a tag and its eight immediate neighbours
labelled with its corresponding bit index;
[0082] FIG. 11 shows a user wearing a HMD with single eye
display;
[0083] FIG. 12 shows a user wearing a HMD with respective displays
for each eye;
[0084] FIG. 13 is a schematic representation of a camera capturing
light rays from two point sources;
[0085] FIG. 14 is a schematic representation of a display of the
image of the two points sources captured by the camera of FIG.
13;
[0086] FIG. 15 is a schematic representation of a wavefront display
of a virtual point source of light;
[0087] FIG. 16 is a diagrammatic representation of a HMD with a
single eye display;
[0088] FIG. 17a schematically shows a wavefront display using a
DMM;
[0089] FIG. 17b schematically shows the wavefront display of FIG.
17a with the DMM deformed to diverge the projected beam;
[0090] FIG. 18a schematically shows a wavefront display using a
deformable liquid lens;
[0091] FIG. 18b schematically shows the wavefront display of FIG.
18a with the liquid lens deformed to diverge the projected
beam;
[0092] FIG. 19 diagrammatically shows the modification to the HMD
of FIG. 16 in order to support occlusions;
[0093] FIG. 20 schematically shows the wavefront display of FIG. 15
with occlusion support;
[0094] FIG. 21 schematically shows the wavefront display of FIG.
18b modified for occlusion support;
[0095] FIG. 22 is a diagrammatic representation of a HMD with a
binocular display;
[0096] FIG. 23 shows a HMD directly linked to the Netpage
server;
[0097] FIG. 24 shows the HMD linked to a Netpage Pen and a Netpage
server via a communications network;
[0098] FIG. 25 shows a HMD linked to a Netpage relay which is in
turn linked to a Netpage server via a communications network;
[0099] FIG. 26 schematically shows a HMD with image warper;
[0100] FIG. 27 shows a HMD linked to cursor navigation and
selection devices;
[0101] FIG. 28 shows a HMD with biometric sensors;
[0102] FIG. 29 shows a physical Netpage with pen-scale and
HMD-scale tag patterns;
[0103] FIG. 30 shows the SVD on a printed Netpage;
[0104] FIG. 31 shows a printed calculator with a SVD for the display
and a Netpage pen;
[0105] FIG. 32 shows a printed form with a SVD for a text field
displaying confidential information;
[0106] FIG. 33 shows the page of FIG. 29 with handwritten
annotations captured as digital ink and shown as a SVD;
[0107] FIG. 34 shows a Netpage with static and dynamic page
elements incorporated into the SVD;
[0108] FIG. 35 shows a mobile phone with display screen printed
with pen-scale and HMD-scale tag patterns;
[0109] FIG. 36 shows a mobile phone with SVD that extends beyond
the display screen;
[0110] FIG. 37 shows a mobile phone with display screen and keypad
provided by the SVD;
[0111] FIG. 38 shows a cinema screen with HMD-scale tag pattern for
screening movies as SVD's;
[0112] FIG. 39 shows a video monitor with HMD-scale tag pattern for
a SVD of a video signal from a range of sources; and
[0113] FIG. 40 shows a computer screen with pen-scale and HMD-scale
tag patterns, and a tablet with a pen-scale tag pattern for an SVD
of a keyboard.
DETAILED DESCRIPTION
[0114] As discussed above, the invention is well suited for
incorporation in the Assignee's Netpage system. In light of this,
the invention has been described as a component of a broader
Netpage architecture. However, it will be readily appreciated that
augmented reality devices have much broader application in many
different fields. Accordingly, the present invention is not
restricted to a Netpage context.
[0115] Additional cross referenced documents are listed at the end
of the Detailed Description. These documents are predominantly
non-patent literature and have been numbered for identification at
the relevant part of the description. The disclosures of these
documents are incorporated by cross reference.
Netpage Surface Coding
[0116] Introduction
[0117] This section defines a surface coding used by the Netpage
system (described in co-pending application Docket No. NPS110US as
well as many of the other cross referenced documents listed above)
to imbue otherwise passive surfaces with interactivity in
conjunction with Netpage sensing devices (described below).
[0118] When interacting with a Netpage coded surface, a Netpage
sensing device generates a digital ink stream which indicates both
the identity of the surface region relative to which the sensing
device is moving, and the absolute path of the sensing device
within the region.
[0119] Surface Coding
[0120] The Netpage surface coding consists of a dense planar tiling
of tags. Each tag encodes its own location in the plane. Each tag
also encodes, in conjunction with adjacent tags, an identifier of
the region containing the tag. In the Netpage system, the region
typically corresponds to the entire extent of the tagged surface,
such as one side of a sheet of paper.
[0121] Each tag is represented by a pattern which contains two
kinds of elements. The first kind of element is a target. Targets
allow a tag to be located in an image of a coded surface, and allow
the perspective distortion of the tag to be inferred. The second
kind of element is a macrodot. Each macrodot encodes the value of a
bit by its presence or absence.
[0122] The pattern is represented on the coded surface in such a
way as to allow it to be acquired by an optical imaging system, and
in particular by an optical system with a narrowband response in
the near-infrared. The pattern is typically printed onto the
surface using a narrowband near-infrared ink.
[0123] Tag Structure
[0124] FIG. 1 shows the structure of a complete tag 200. Each of
the four black circles 202 is a target. The tag 200, and the
overall pattern, has four-fold rotational symmetry at the physical
level.
[0125] Each square region represents a symbol 204, and each symbol
represents four bits of information. Each symbol 204 shown in the
tag structure has a unique label 216. Each label 216 has an
alphabetic prefix and a numeric suffix.
[0126] FIG. 2 shows the structure of a symbol 204. It contains four
macrodots 206, each of which represents the value of one bit by its
presence (one) or absence (zero).
[0127] The macrodot 206 spacing is specified by the parameter s
throughout this specification. It has a nominal value of 143 µm,
based on 9 dots printed at a pitch of 1600 dots per inch. However,
it is allowed to vary within defined bounds according to the
capabilities of the device used to produce the pattern.
[0128] FIG. 3 shows an array 208 of nine adjacent symbols 204. The
macrodot 206 spacing is uniform both within and between symbols
204.
[0129] FIG. 4 shows the ordering of the bits within a symbol
204.
[0130] Bit zero 210 is the least significant within a symbol 204;
bit three 212 is the most significant. Note that this ordering is
relative to the orientation of the symbol 204. The orientation of a
particular symbol 204 within the tag 200 is indicated by the
orientation of the label 216 of the symbol in the tag diagrams (see
for example FIG. 1). In general, the orientation of all symbols 204
within a particular segment of the tag 200 is the same, consistent
with the bottom of the symbol being closest to the centre of the
tag.
[0131] Only the macrodots 206 are part of the representation of a
symbol 204 in the pattern. The square outline 214 of a symbol 204
is used in this specification to more clearly elucidate the
structure of a tag 200. FIG. 5, by way of illustration, shows the
actual pattern of a tag 200 with every bit 206 set. Note that, in
practice, every bit 206 of a tag 200 can never be set.
[0132] A macrodot 206 is nominally circular with a nominal diameter
of (5/9)s. However, it is allowed to vary in size by ±10%
according to the capabilities of the device used to produce the
pattern.
[0133] A target 202 is nominally circular with a nominal diameter
of (17/9)s. However, it is allowed to vary in size by ±10%
according to the capabilities of the device used to produce the
pattern.
[0134] The tag pattern is allowed to vary in scale by up to ±10%
according to the capabilities of the device used to produce the
pattern. Any deviation from the nominal scale is recorded in the
tag data to allow accurate generation of position samples.
[0135] Tag Groups
[0136] Tags 200 are arranged into tag groups 218. Each tag group
contains four tags arranged in a square. Each tag 200 has one of
four possible tag types, each of which is labelled according to its
location within the tag group 218. The tag type labels 220 are 00,
10, 01 and 11, as shown in FIG. 6.
[0137] FIG. 7 shows how tag groups are repeated in a continuous
tiling of tags, or tag pattern 222. The tiling guarantees that any
set of four adjacent tags 200 contains one tag of each type
220.
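The tiling property can be illustrated with a short sketch. It assumes, following the footnote to Table 1, that the tag type is formed from the least significant bit of each of the x and y tag coordinates, and it reads "four adjacent tags" as any 2 × 2 window of the tiling. The function names and the ordering of the two bits within the label are illustrative assumptions, not part of the specification.

```python
# Hypothetical helpers, not taken from the specification.
# Assumption: the first label digit is the low bit of y and the second
# is the low bit of x; the specification does not fix this ordering here.

def tag_type(x: int, y: int) -> str:
    """Return the two-digit tag type label for the tag at (x, y)."""
    return f"{y & 1}{x & 1}"


def window_types(x0: int, y0: int) -> set:
    """Collect the tag types found in the 2 x 2 window anchored at (x0, y0)."""
    return {tag_type(x0 + dx, y0 + dy) for dy in (0, 1) for dx in (0, 1)}


# Every 2 x 2 window, whatever its alignment, holds one tag of each type.
ALL_TYPES = {"00", "01", "10", "11"}
windows_ok = all(
    window_types(x0, y0) == ALL_TYPES for x0 in range(8) for y0 in range(8)
)
```

Because the window is checked at every alignment, the sketch also covers windows that straddle tag group boundaries.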
[0138] Codewords
[0139] The tag contains four complete codewords. The layout of the
four codewords is shown in FIG. 8. Each codeword is of a punctured
2^4-ary (8, 5) Reed-Solomon code. The codewords are labelled A,
B, C and D. Fragments of each codeword are distributed throughout
the tag 200.
[0140] Two of the codewords are unique to the tag 200. These are
referred to as local codewords 224 and are labelled A and B. The
tag 200 therefore encodes up to 40 bits of information unique to
the tag.
[0141] The remaining two codewords are unique to a tag type, but
common to all tags of the same type within a contiguous tiling of
tags 222. These are referred to as global codewords 226 and are
labelled C and D, subscripted by tag type. A tag group 218
therefore encodes up to 160 bits of information common to all tag
groups within a contiguous tiling of tags.
Reed-Solomon Encoding
[0142] Codewords are encoded using a punctured 2^4-ary (8, 5)
Reed-Solomon code. A 2^4-ary (8, 5) Reed-Solomon code encodes
20 data bits (i.e. five 4-bit symbols) and 12 redundancy bits (i.e.
three 4-bit symbols) in each codeword. Its error-detecting capacity
is three symbols. Its error-correcting capacity is one symbol.
[0143] FIG. 9 shows a codeword 228 of eight symbols 204, with five
symbols encoding data coordinates 230 and three symbols encoding
redundancy coordinates 232. The codeword coordinates are indexed in
coefficient order, and the data bit ordering follows the codeword
bit ordering.
[0144] A punctured 2^4-ary (8,5) Reed-Solomon code is a
2^4-ary (15,5) Reed-Solomon code with seven redundancy
coordinates removed. The removed coordinates are the most
significant redundancy coordinates.
[0145] The code has the following primitive polynomial:
$p(x) = x^4 + x + 1$ (EQ 1)
[0146] The code has the following generator polynomial:
$g(x) = (x + \alpha)(x + \alpha^2) \cdots (x + \alpha^{10})$ (EQ 2)
[0147] For a detailed description of Reed-Solomon codes, refer to
Wicker, S. B. and V. K. Bhargava, eds., Reed-Solomon Codes and
Their Applications, IEEE Press, 1994, the contents of which are
incorporated herein by reference.
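The encoding described above can be sketched as follows. This is a minimal illustration, not the Netpage implementation: it builds the field from the primitive polynomial of EQ 1, forms the generator polynomial of EQ 2, encodes systematically, and then punctures. The systematic layout (redundancy in coefficients 0 to 9, data in coefficients 10 to 14) and the choice of coefficients 3 to 9 as the "most significant redundancy coordinates" are assumptions, as are all function names.

```python
# GF(2^4) log/antilog tables built from p(x) = x^4 + x + 1 (EQ 1).
PRIM = 0b10011
EXP = [0] * 30
LOG = [0] * 16
_v = 1
for _i in range(15):
    EXP[_i] = _v
    LOG[_v] = _i
    _v <<= 1
    if _v & 0x10:
        _v ^= PRIM
for _i in range(15, 30):
    EXP[_i] = EXP[_i - 15]


def gf_mul(a: int, b: int) -> int:
    """Multiply two field elements (4-bit symbols)."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists (index = power of x)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] ^= gf_mul(a, b)
    return out


def poly_eval(p, x: int) -> int:
    """Evaluate a polynomial at x using Horner's rule."""
    y = 0
    for c in reversed(p):
        y = gf_mul(y, x) ^ c
    return y


# g(x) = (x + alpha)(x + alpha^2) ... (x + alpha^10) (EQ 2).
GEN = [1]
for _i in range(1, 11):
    GEN = poly_mul(GEN, [EXP[_i], 1])


def rs_encode_15_5(data):
    """Systematically encode five symbols into a 15-symbol codeword."""
    rem = [0] * 10 + list(data)            # d(x) * x^10
    for i in range(14, 9, -1):             # long division by g(x)
        coef = rem[i]
        if coef:
            for j in range(11):
                rem[i - 10 + j] ^= gf_mul(coef, GEN[j])
    return rem[:10] + list(data)           # remainder, then data


def puncture_8_5(codeword):
    """Drop redundancy coefficients 3..9 (assumed 'most significant')."""
    return codeword[:3] + codeword[10:]
```

A full codeword produced this way evaluates to zero at each of the ten roots of g(x), which is a convenient self-check before puncturing.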
[0148] The Tag Coordinate Space
[0149] The tag coordinate space has two orthogonal axes labelled x
and y respectively. When the positive x axis points to the right,
then the positive y axis points down.
[0150] The surface coding does not specify the location of the tag
coordinate space origin on a particular tagged surface, nor the
orientation of the tag coordinate space with respect to the
surface. This information is application-specific. For example, if
the tagged surface is a sheet of paper, then the application which
prints the tags onto the paper may record the actual offset and
orientation, and these can be used to normalise any digital ink
subsequently captured in conjunction with the surface.
[0151] The position encoded in a tag is defined in units of tags.
By convention, the position is taken to be the position of the
centre of the target closest to the origin.
[0152] Tag Information Content
[0153] Table 1 defines the information fields embedded in the
surface coding. Table 2 defines how these fields map to
codewords.
TABLE 1 Field definitions
field | width | description
per codeword | |
codeword type | 2 | The type of the codeword, i.e. one of A (b'00'), B (b'01'), C (b'10') and D (b'11').
per tag | |
tag type | 2 | The type^1 of the tag, i.e. one of 00 (b'00'), 01 (b'01'), 10 (b'10') and 11 (b'11').
x coordinate | 13 | The unsigned x coordinate of the tag^2.
y coordinate | 13 | The unsigned y coordinate of the tag^2.
active area flag | 1 | A flag indicating whether the tag is a member of an active area. b'1' indicates membership.
active area map flag | 1 | A flag indicating whether an active area map is present. b'1' indicates the presence of a map (see next field). If the map is absent then the value of each map entry is derived from the active area flag (see previous field).
active area map | 8 | A map^3 of which of the tag's immediate eight neighbours are members of an active area. b'1' indicates membership.
data fragment | 8 | A fragment of an embedded data stream. Only present if the active area map is absent.
per tag group | |
encoding format | 8 | The format of the encoding. 0: the present encoding. Other values are TBA.
region flags | 8 | Flags controlling the interpretation and routing of region-related information. 0: region ID is an EPC. 1: region is linked. 2: region is interactive. 3: region is signed. 4: region includes data. 5: region relates to mobile application. Other bits are reserved and must be zero.
tag size adjustment | 16 | The difference between the actual tag size and the nominal tag size^4, in 10 nm units, in sign-magnitude format.
region ID | 96 | The ID of the region containing the tags.
CRC | 16 | A CRC^5 of tag group data.
total | 320 |
^1 corresponds to the bottom two bits of the x and y coordinates of the tag
^2 allows a maximum coordinate value of approximately 14 m
^3 FIG. 10 indicates the bit ordering of the map
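Two of the per-tag-group fields in Table 1 lend themselves to a short decoding sketch. The sketch assumes that the numbers listed for the region flags are bit positions with bit 0 least significant, and that the sign of the sign-magnitude tag size adjustment occupies the most significant of its 16 bits. Neither assumption is stated in the table, and the function and key names are illustrative.

```python
# Names for region flag bits 0..5, in the order listed in Table 1.
REGION_FLAG_NAMES = (
    "region_id_is_epc",
    "linked",
    "interactive",
    "signed",
    "includes_data",
    "mobile_application",
)


def decode_region_flags(flags: int) -> dict:
    """Split the 8-bit region flags field into named booleans."""
    if not 0 <= flags <= 0xFF:
        raise ValueError("region flags is an 8-bit field")
    if flags >> len(REGION_FLAG_NAMES):
        raise ValueError("reserved region flag bits must be zero")
    return {
        name: bool((flags >> bit) & 1)
        for bit, name in enumerate(REGION_FLAG_NAMES)
    }


def decode_tag_size_adjustment_nm(raw: int) -> int:
    """Convert the 16-bit sign-magnitude field (10 nm units) to nanometres.

    Assumption: bit 15 is the sign and bits 14..0 are the magnitude.
    """
    if not 0 <= raw <= 0xFFFF:
        raise ValueError("tag size adjustment is a 16-bit field")
    magnitude = raw & 0x7FFF
    sign = -1 if raw & 0x8000 else 1
    return sign * magnitude * 10
```

For example, under these assumptions a raw value of 0x8005 would indicate a tag 50 nm smaller than nominal.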
[0154] FIG. 10 shows a tag 200 and its eight immediate neighbours,
each labelled with its corresponding bit index in the active area
map. An active area map indicates whether the corresponding tags
are members of an active area. An active area is an area within
which any captured input should be immediately forwarded to the
corresponding Netpage server for interpretation. It also allows the
Netpage sensing device to signal to the user that the input will
have an immediate effect.
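The fallback rule from Table 1, under which map entries are derived from the active area flag when no map is present, can be sketched as below. The function name is hypothetical, and the correspondence between a bit index and a particular neighbour is that of FIG. 10, which the sketch does not attempt to reproduce.

```python
def neighbour_is_active(
    active_area_flag: int,
    map_flag: int,
    active_area_map: int,
    bit_index: int,
) -> bool:
    """Report whether the neighbour with the given map bit index is active.

    If the active area map is present (map_flag set), the answer is read
    from the map. Otherwise each entry takes the value of the tag's own
    active area flag, as described for the active area map flag field.
    """
    if not 0 <= bit_index < 8:
        raise ValueError("a tag has eight immediate neighbours")
    if map_flag:
        return bool((active_area_map >> bit_index) & 1)
    return bool(active_area_flag)
```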
TABLE 2 Mapping of fields to codewords
codeword | codeword bits | field | field width | field bits
A | 1:0 | codeword type (b'00') | 2 | all
A | 10:2 | x coordinate | 9 | 12:4
A | 19:11 | y coordinate | 9 | 12:4
B | 1:0 | codeword type (b'01') | 2 | all
B | 2 | tag type | 1 | 0
B | 5:2 | x coordinate | 4 | 3:0
B | 6 | tag type | 1 | 1
B | 9:6 | y coordinate | 4 | 3:0
B | 10 | active area flag | 1 | all
B | 11 | active area map flag | 1 | all
B | 19:12 | active area map | 8 | all
B | 19:12 | data fragment | 8 | all
C_00 | 1:0 | codeword type (b'10') | 2 | all
C_00 | 9:2 | encoding format | 8 | all
C_00 | 17:10 | region flags | 8 | all
C_00 | 19:18 | tag size adjustment | 2 | 1:0
C_01 | 1:0 | codeword type (b'10') | 2 | all
C_01 | 15:2 | tag size adjustment | 14 | 15:2
C_01 | 19:16 | region ID | 4 | 3:0
C_10 | 1:0 | codeword type (b'10') | 2 | all
C_10 | 19:2 | region ID | 18 | 21:4
C_11 | 1:0 | codeword type (b'10') | 2 | all
C_11 | 19:2 | region ID | 18 | 39:22
D_00 | 1:0 | codeword type (b'11') | 2 | all
D_00 | 19:2 | region ID | 18 | 57:40
D_01 | 1:0 | codeword type (b'11') | 2 | all
D_01 | 19:2 | region ID | 18 | 75:58
D_10 | 1:0 | codeword type (b'11') | 2 | all
D_10 | 19:2 | region ID | 18 | 93:76
D_11 | 1:0 | codeword type (b'11') | 2 | all
D_11 | 3:2 | region ID | 2 | 95:94
D_11 | 19:4 | CRC | 16 | all
^4 the nominal tag size is 1.7145 mm (based on 1600 dpi, 9 dots per macrodot, and 12 macrodots per tag)
^5 CCITT CRC-16 [7]
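The mapping of the per-tag fields onto the 20 data bits of the A and B codewords can be sketched as follows. The sketch treats each codeword's data as a 20-bit integer with bit 0 least significant and reads b'00' and b'01' as the integers 0 and 1. It also relies on the footnote to Table 1, so that the two tag type bits coincide with bit 0 of the x and y coordinates and need no separate packing. These readings, and the function names, are assumptions.

```python
def pack_local_codewords(x, y, active_flag, map_flag, payload):
    """Pack per-tag fields into the data bits of codewords A and B.

    payload is either the active area map or the data fragment, both of
    which occupy codeword B bits 19:12.
    """
    if not (0 <= x < (1 << 13) and 0 <= y < (1 << 13)):
        raise ValueError("coordinates are 13-bit unsigned values")
    a = 0b00 | ((x >> 4) << 2) | ((y >> 4) << 11)
    b = (
        0b01
        | ((x & 0xF) << 2)            # x bits 3:0; bit 2 doubles as tag type bit 0
        | ((y & 0xF) << 6)            # y bits 3:0; bit 6 doubles as tag type bit 1
        | ((active_flag & 1) << 10)
        | ((map_flag & 1) << 11)
        | ((payload & 0xFF) << 12)
    )
    return a, b


def unpack_local_codewords(a, b):
    """Recover (x, y, active_flag, map_flag, payload) from A and B."""
    if (a & 0b11) != 0b00 or (b & 0b11) != 0b01:
        raise ValueError("unexpected codeword type")
    x = (((a >> 2) & 0x1FF) << 4) | ((b >> 2) & 0xF)
    y = (((a >> 11) & 0x1FF) << 4) | ((b >> 6) & 0xF)
    return x, y, (b >> 10) & 1, (b >> 11) & 1, (b >> 12) & 0xFF
```

Splitting the coordinates this way places the slowly varying high-order bits in codeword A and the rapidly varying low-order bits in codeword B.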
[0155] Note that the tag type can be moved into a global codeword
to maximise local codeword utilization. This in turn can allow
larger coordinates and/or 16-bit data fragments (potentially
configurably in conjunction with coordinate precision). However,
this reduces the independence of position decoding from region ID
decoding and has not been included in the specification at this
time.
[0156] Embedded Data
[0157] If the "region includes data" flag in the region flags is
set then the surface coding contains embedded data. The data is
encoded in multiple contiguous tags' data fragments, and is
replicated in the surface coding as many times as it will fit.
[0158] The embedded data is encoded in such a way that a random and
partial scan of the surface coding containing the embedded data can
be sufficient to retrieve the entire data. The scanning system
reassembles the data from retrieved fragments, and reports to the
user when sufficient fragments have been retrieved without
error.
[0159] As shown in Table 3, a 200-bit data block encodes 160 bits
of data. The block data is encoded in the data fragments of a
contiguous group of 25 tags arranged in a 5 × 5 square. A tag
belongs to a block whose integer coordinate is the tag's coordinate
divided by 5. Within each block the data is arranged into tags with
increasing x coordinate within increasing y coordinate.
[0160] A data fragment may be missing from a block where an active
area map is present. However, the missing data fragment is likely
to be recoverable from another copy of the block.
[0161] Data of arbitrary size is encoded into a superblock
consisting of a contiguous set of blocks arranged in a rectangle.
The size of the superblock is encoded in each block. A block
belongs to a superblock whose integer coordinate is the block's
coordinate divided by the superblock size. Within each superblock
the data is arranged into blocks with increasing x coordinate
within increasing y coordinate.
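The block and superblock addressing described above can be sketched as integer arithmetic on tag coordinates. "Increasing x coordinate within increasing y coordinate" is read here as row-major order, and the coordinates are taken relative to the tag coordinate space origin. The function and key names are illustrative only.

```python
BLOCK_SIDE = 5  # a block is a 5 x 5 square of tags


def locate_fragment(x: int, y: int, sb_width: int, sb_height: int) -> dict:
    """Locate a tag's data fragment within its block and superblock.

    x, y are tag coordinates; sb_width and sb_height are the superblock
    dimensions in blocks, as carried in each block.
    """
    bx, by = x // BLOCK_SIDE, y // BLOCK_SIDE
    fragment_index = (y % BLOCK_SIDE) * BLOCK_SIDE + (x % BLOCK_SIDE)
    sbx, sby = bx // sb_width, by // sb_height
    block_index = (by % sb_height) * sb_width + (bx % sb_width)
    return {
        "block": (bx, by),
        "fragment_index": fragment_index,   # 0..24 within the block
        "superblock": (sbx, sby),
        "block_index": block_index,         # 0..(w*h - 1) within the superblock
    }
```

For instance, with a superblock two blocks wide and three blocks high, the tag at (7, 11) falls in block (1, 2), carries fragment 7 of that block, and that block is block 5 of superblock (0, 0).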
[0162] The superblock is replicated in the surface coding as many
times as it will fit, including partially along the edges of the
surface coding.
[0163] The data encoded in the superblock may include more precise
type information, more precise size information, and more extensive
error detection and/or correction data.
TABLE 3 Embedded data block
field | width | description
data type | 8 | The type of the data in the superblock. Values include: 0: type is controlled by region flags. 1: MIME. Other values are TBA.
superblock width | 8 | The width of the superblock, in blocks.
superblock height | 8 | The height of the superblock, in blocks.
data | 160 | The block data.
CRC | 16 | A CRC^6 of the block data.
total | 200 |
^6 CCITT CRC-16 [7]
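A reassembled 200-bit block can be parsed and checked along the following lines. The sketch makes several assumptions that Table 3 does not settle: that the 25 fragments concatenate into 25 bytes in the field order shown, that the CRC covers only the 160 data bits, that it is stored big-endian, and that "CCITT CRC-16" means the polynomial 0x1021 with an initial value of 0xFFFF and no final XOR. Other CCITT variants differ in the initial value, so the parameter is left adjustable.

```python
def crc16_ccitt(data: bytes, crc: int = 0xFFFF) -> int:
    """Bitwise CRC-16 with polynomial 0x1021, most significant bit first."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def parse_block(block: bytes) -> dict:
    """Split a 25-byte embedded data block into the fields of Table 3."""
    if len(block) != 25:
        raise ValueError("an embedded data block is 200 bits (25 bytes)")
    data = block[3:23]
    stored_crc = int.from_bytes(block[23:25], "big")
    return {
        "data_type": block[0],
        "superblock_width": block[1],
        "superblock_height": block[2],
        "data": data,
        "crc_ok": crc16_ccitt(data) == stored_crc,
    }
```

A block whose CRC fails would be discarded, with the scanning system relying on another replica of the same block.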
[0164] Cryptographic Signature of Region ID
[0165] If the "region is signed" flag in the region flags is set
then the surface coding contains a 160-bit cryptographic signature
of the region ID. The signature is encoded in a one-block
superblock.
[0166] In an online environment any signature fragment can be used,
in conjunction with the region ID, to validate the signature. In an
offline environment the entire signature can be recovered by
reading multiple tags, and can then be validated using the
corresponding public signature key. This is discussed in more
detail in the Netpage Surface Coding Security section of the cross
referenced co-pending application Docket No. NPS100US, the content
of which is incorporated within the present specification.
[0167] MIME Data
[0168] If the embedded data type is "MIME" then the superblock
contains Multipurpose Internet Mail Extensions (MIME) data
according to RFC 2045 (see Freed, N., and N. Borenstein,
"Multipurpose Internet Mail Extensions (MIME)--Part One: Format of
Internet Message Bodies", RFC 2045, November 1996), RFC 2046 (see
Freed, N., and N. Borenstein, "Multipurpose Internet Mail
Extensions (MIME)--Part Two: Media Types", RFC 2046, November 1996)
and related RFCs. The MIME data consists of a header followed by a
body. The header is encoded as a variable-length text string
preceded by an 8-bit string length. The body is encoded as a
variable-length type-specific octet stream preceded by a 16-bit
size in big-endian format.
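The length-prefixed layout described above can be sketched with the standard `struct` module. The sketch assumes the header text is ASCII and that the header immediately precedes the body with no padding. The function names are hypothetical.

```python
import struct


def encode_mime_payload(header: str, body: bytes) -> bytes:
    """Serialise a MIME header and body using the length prefixes above."""
    header_bytes = header.encode("ascii")
    if len(header_bytes) > 0xFF:
        raise ValueError("header length must fit in 8 bits")
    if len(body) > 0xFFFF:
        raise ValueError("body size must fit in 16 bits")
    return (
        struct.pack(">B", len(header_bytes)) + header_bytes
        + struct.pack(">H", len(body)) + body      # 16-bit big-endian size
    )


def decode_mime_payload(buf: bytes):
    """Recover (header, body) from a serialised payload."""
    (header_len,) = struct.unpack_from(">B", buf, 0)
    header = buf[1:1 + header_len].decode("ascii")
    (body_len,) = struct.unpack_from(">H", buf, 1 + header_len)
    start = 1 + header_len + 2
    body = buf[start:start + body_len]
    if len(body) != body_len:
        raise ValueError("payload is truncated")
    return header, body
```

Trailing bytes beyond the declared body size are ignored, which suits a superblock whose capacity exceeds the payload.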
[0169] The basic top-level media types described in RFC 2046
include text, image, audio, video and application.
[0170] RFC 2425 (see Howes, T., M. Smith and F. Dawson, "A MIME
Content-Type for Directory Information", RFC 2425, September 1998)
and RFC 2426 (see Dawson, F., and T. Howes, "vCard MIME Directory
Profile", RFC 2426, September 1998) describe a text subtype for
directory information suitable, for example, for encoding contact
information which might appear on a business card.
[0171] Encoding and Printing Considerations
[0172] The Print Engine Controller (PEC) supports the encoding of
two fixed (per-page) 2^4-ary (15,5) Reed-Solomon codewords and
six variable (per-tag) 2^4-ary (15,5) Reed-Solomon codewords.
Furthermore, PEC supports the rendering of tags via a rectangular
unit cell whose layout is constant (per page) but whose variable
codeword data may vary from one unit cell to the next. PEC does not
allow unit cells to overlap in the direction of page movement.
[0173] A unit cell compatible with PEC contains a single tag group
consisting of four tags. The tag group contains a single A codeword
unique to the tag group but replicated four times within the tag
group, and four unique B codewords. These can be encoded using five
of PEC's six supported variable codewords. The tag group also
contains eight fixed C and D codewords. One of these can be encoded
using the remaining one of PEC's variable codewords, two more can
be encoded using PEC's two fixed codewords, and the remaining five
can be encoded and pre-rendered into the Tag Format Structure (TFS)
supplied to PEC.
[0174] PEC imposes a limit of 32 unique bit addresses per TFS row.
The contents of the unit cell respect this limit. PEC also imposes
a limit of 384 on the width of the TFS. The contents of the unit
cell respect this limit.
[0175] Note that for a reasonable page size, the number of variable
coordinate bits in the A codeword is modest, making encoding via a
lookup table tractable. Encoding of the B codeword via a lookup
table may also be possible. Note that since a Reed-Solomon code is
systematic, only the redundancy data needs to appear in the lookup
table.
[0176] Imaging and Decoding Considerations
[0177] The minimum imaging field of view required to guarantee
acquisition of an entire tag has a diameter of 39.6s (i.e.
(2 × (12 + 2))√2 s), allowing for arbitrary
alignment between the surface coding and the field of view. Given a
macrodot spacing of 143 µm, this gives a required field of view
of 5.7 mm.
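The figures quoted in this section follow from the nominal printing parameters, as the short calculation below illustrates. It uses only values stated in the specification (1600 dpi, 9 dots per macrodot, 12 macrodots per tag, 13-bit coordinates). The variable names are illustrative, and the unrounded macrodot spacing is used rather than the rounded 143 µm.

```python
import math

DOTS_PER_INCH = 1600
DOTS_PER_MACRODOT = 9
MACRODOTS_PER_TAG = 12
COORDINATE_BITS = 13

# Macrodot spacing s, in micrometres (25.4 mm per inch).
s_um = 25_400 / DOTS_PER_INCH * DOTS_PER_MACRODOT

# Nominal tag size, in millimetres.
tag_size_mm = s_um * MACRODOTS_PER_TAG / 1000

# Minimum field of view diameter: (2 x (12 + 2)) x sqrt(2) macrodot spacings.
fov_in_s = 2 * (MACRODOTS_PER_TAG + 2) * math.sqrt(2)
fov_mm = fov_in_s * s_um / 1000

# Largest extent addressable with 13-bit tag coordinates, in metres.
max_extent_m = (2 ** COORDINATE_BITS) * tag_size_mm / 1000

print(f"s = {s_um:.3f} um, tag = {tag_size_mm:.4f} mm")
print(f"FOV = {fov_in_s:.1f} s = {fov_mm:.2f} mm")
print(f"max extent = {max_extent_m:.2f} m")
```

The results agree with the nominal tag size of 1.7145 mm, the field of view of about 5.7 mm, and the maximum coordinate value of approximately 14 m given elsewhere in this section.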
[0178] Table 4 gives pitch ranges achievable for the present
surface coding for different sampling rates, assuming an image
sensor size of 128 pixels.
TABLE 4 Pitch ranges achievable for present surface coding for different sampling rates; dot pitch = 1600 dpi, macrodot pitch = 9 dots, viewing distance = 30 mm, nib-to-FOV separation = 1 mm, image sensor size = 128 pixels
sampling rate | pitch range
2 | -40 to +49
2.5 | -27 to +36
3 | -10 to +18
[0179] Given the present surface coding, the corresponding decoding
sequence is as follows:
[0180] locate targets of complete tag
[0181] infer perspective transform from targets
[0182] sample and decode any one of tag's four codewords
[0183] determine codeword type and hence tag orientation
[0184] sample and decode required local (A and B) codewords
[0185] codeword redundancy is only 12 bits, so only detect errors
[0186] on decode error flag bad position sample
[0187] determine tag x-y location, with reference to tag orientation
[0188] infer 3D tag transform from oriented targets
[0189] determine nib x-y location from tag x-y location and 3D transform
[0190] determine active area status of nib location with reference to active area map
[0191] generate local feedback based on nib active area status
[0192] determine tag type from A codeword
[0193] sample and decode required global (C and D) codewords (modulo window alignment, with reference to tag type)
[0194] although codeword redundancy is only 12 bits, correct errors; subsequent CRC verification will detect erroneous error correction
[0195] verify tag group data CRC
[0196] on decode error flag bad region ID sample
[0197] determine encoding type, and reject unknown encoding
[0198] determine region flags
[0199] determine region ID
[0200] encode region ID, nib x-y location, nib active area status in digital ink
[0201] route digital ink based on region flags
[0202] Note that region ID decoding need not occur at the same rate
as position decoding.
[0203] Note that decoding of a codeword can be avoided if the
codeword is found to be identical to an already-known good
codeword.
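The shortcut noted above, in which a sampled codeword matching an already-known good codeword is not decoded again, can be sketched as a small cache. The class and the `decoder` callable are hypothetical stand-ins. In particular `decoder` represents whatever Reed-Solomon decoding routine a sensing device uses and is assumed to raise an exception on failure, so that only good codewords are remembered.

```python
class KnownGoodCodewords:
    """Remember raw codewords that have already decoded successfully."""

    def __init__(self):
        self._decoded = {}

    def decode(self, raw_symbols, decoder):
        """Return decoded data, skipping the decoder for known codewords.

        raw_symbols is the sequence of sampled 4-bit symbols; decoder is
        a callable that decodes it and raises on an uncorrectable error.
        """
        key = tuple(raw_symbols)
        if key in self._decoded:
            return self._decoded[key]      # identical to a known good codeword
        result = decoder(key)              # may raise; nothing is cached then
        self._decoded[key] = result
        return result
```

This is most useful for the global codewords, which repeat unchanged across a contiguous tiling of tags.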
Head Mounted Display
[0204] The Netpage system provides a paper- and pen-based interface
to computer-based and typically network-based information and
applications. The Netpage coding is discussed in detail above and
the Netpage pen is described in the above cross referenced
documents and in particular, a co-filed US application, temporarily
identified here by its docket NPS109US.
[0205] The Netpage Head Mounted Display is an augmented reality
device that can use surfaces coded with Netpage tag patterns to
situate a virtual image in a user's field of view. The virtual
imagery need not be in precise registration with the tagged
surface, but can be `anchored` to the tag pattern so that it
appears to be part of the user's physical environment regardless of
whether they change their direction of gaze.
[0206] Overview
[0207] A printed Netpage, when presented in a user's field of view
(FOV), can be augmented with dynamic imagery virtually projected
onto the page via a see-through head-mounted display (HMD) worn by
the user. The imagery is selected according to the unique identity
of the Netpage, and is virtually projected to match the
three-dimensional position and orientation of the page with respect
to the user. The imagery therefore appears locked to the surface of
the page, even as the position and orientation of the page changes
due to head or page movement. The HMD provides the correct
stereopsis, vergence and accommodation cues to allow fatigue-free
perception of the imagery "on" the surface. "Stereopsis",
"vergence" and "accommodation" relate to depth cues that the brain
uses for three dimensional spatial awareness of objects in the FOV.
These terms are explained below in the description of the Human
Visual System.
[0208] Although the imagery is "attached" to the surface, it can
still be three-dimensional and extend "out of" the surface. The page
is coded with identity- and position-indicating tags in the usual
way, but at a larger scale to allow longer-range acquisition. The
HMD uses a Netpage sensor to image the tags and thereby identify
the page and determine its position and orientation. If the page
also supports pen interaction, then it may be coded with two sets
of tags at different scales and utilising different infrared inks;
or it may be coded with multi-resolution tags which can be imaged
and decoded at multiple scales; or the HMD tag sensor can be
adapted to image and decode pen-scale tags. In any case the whole
page surface is ideally tagged so that it remains identifiable even
when partially obscured, such as by another page or by the user's
hand. The Netpage HMD is lightweight and portable. It uses a radio
interface to query a Netpage system and obtain static and dynamic
page data. It uses an on-board processor to determine page position
and orientation, and to project imagery in real time to minimise
display latency.
[0209] The Netpage HMD, in conjunction with a suitable Netpage,
therefore provides a situated virtual display (SVD) capability. The
display is situated in that its location and content are
page-driven. It is virtual in that it is only virtually projected
on the page and is therefore only seen by the user. Note that the
Netpage Viewer [8] and the Netpage Explorer [3] both provide
Netpage SVD capabilities, but in more constrained forms.
[0210] An SVD can be used to display a video clip embedded in a
printed news article; it can be used to show an object virtually
associated with a page, such as a "pasted" photo; it can be used to
show "secret" information associated with a page; and it can be
used to show the page itself, for example in the absence of ambient
light. More generally, an SVD can transform a page (or any surface)
into a general-purpose display device, and more generally still,
into a general-purpose computer system interface. SVDs can augment
or subsume all current "display" applications, whether they be
static or dynamic, passive or interactive, personal or shared,
including such applications as commercial print publications,
on-demand printed documents, product packaging, posters and
billboards, television, cinema, personal computers, personal
digital assistants (PDAs), mobile phones, smartphones and other
personal devices. As well as augmenting the planar surfaces of
essentially two-dimensional objects such as paper pages, SVDs can
equally augment the multi-faceted or non-planar surfaces of
three-dimensional objects.
[0211] Augmented reality in general typically relies on either a
see-through HMD or a video-based HMD [15]. A video-based HMD
captures video of the user's field of view, augments it with
virtual imagery, and redisplays it for the user's eyes to see. A
see-through HMD, as discussed above, optically combines virtual
imagery with the user's actual field of view. A video-based HMD has
the advantage that registration between the real world and the
virtual imagery is relatively easy to achieve, since parallax due
to eye position relative to the HMD doesn't occur. It has the
disadvantage that it is typically bulky and has a narrow field of
view, and typically provides poor depth cues.
[0212] As shown in FIGS. 11 and 12, a see-through HMD has the
advantage that it can be relatively less bulky with a wider field
of view, and can provide good depth cues. It has the disadvantage
that registration between the real world and the virtual imagery is
difficult to achieve without intrusive calibration procedures and
sophisticated eye tracking. A HMD often relies on inertial tracking
to maintain registration during head movement, since fiducial
tracking is usually insufficiently fast, but this is a somewhat
inaccurate approach.
[0213] In a basic form, the HMD 300 may have a single display 302
for one eye only. However, as shown in FIG. 12, by using a wavefront
display 304, 306 for each eye respectively, the Netpage HMD 300
achieves perfect registration in a see-through display without
calibration or tracking.
[0214] The use of fiducials in the real world to provide a basis
for registration is well-established in augmented reality
applications [15, 44]. However, fiducials are typically sparsely
placed, making fiducial detection complex, and the fiducial
encoding capacity is typically small, leading to a small fiducial
identity space and fiducial ambiguity in large installations.
[0215] The surface coding used by the Netpage system is dense,
overcoming sparseness issues encountered with fiducials. The
Netpage system guarantees global identifier uniqueness, overcoming
ambiguity issues encountered with fiducials. More broadly, the
Netpage system provides the first systematic and practical
mechanism for coding a significant proportion of the surfaces with
which people interact on a day-to-day basis, providing an
unprecedented opportunity to deploy augmented reality technology in
a consumer setting. The scope of Netpage applications, and the
universality of the devices used to interact with Netpage coded
surfaces, makes the acquisition and assimilation of Netpage devices
extremely attractive to consumers.
[0216] The tag image processing and decoding system developed for
Netpage operates in real time at high-quality display frame rates
(e.g. 100 Hz or higher). It therefore obviates the need for
inaccurate inertial tracking.
[0217] The Human Visual System
[0218] The human eye consists of a converging lens system, made up
of the cornea and crystalline lens, and a light-sensitive array of
photoreceptors, the retina, onto which the lens system projects a
real image of the eye's field of view. The cornea provides a fixed
amount of focus which constitutes over two thirds of the eye's
focusing power, while the crystalline lens provides variable focus
under the control of the ciliary muscles which surround it. When
the muscles are relaxed the lens is almost flat and the eye is
focused at infinity. As the muscles contract the lens bulges,
allowing the eye to focus more closely. The point of closest
achievable focus, the near point, recedes with age. It may be less
than 10 cm in a teenager, but usually exceeds 25 cm by middle
age.
[0219] A diaphragm known as the iris controls the amount of light
entering the eye and defines its entrance pupil. It can expand to
as much as 8 mm in darkness and contract to as little as 2 mm in
bright light.
[0220] The limits of the visual field of the eye are about 60
degrees upwards, 75 degrees downwards, 60 degrees inwards (in the
nasal direction), and about 90 degrees outwards (in the temporal
direction). The visual fields of the two eyes overlap by about 120
degrees centrally. This defines the region of binocular vision.
[0221] The retina consists of an uneven distribution of about 130
million photoreceptor cells. Most of these, the so-called rods,
exhibit broad spectral sensitivity in the visible spectrum. A much
smaller number (about 7 million), the so-called cones, variously
exhibit three kinds of relatively narrower spectral sensitivity,
corresponding to short, medium and long wavelength parts of the
visible spectrum. The rods confer monochrome sensitivity in low
lighting conditions, while the cones confer color sensitivity in
relatively brighter lighting conditions. The human visual system
effectively interpolates short, medium and long-wavelength cone
stimuli in order to perceive spectral color.
[0222] The highest density of cones occurs in a small central
region of the retina known as the macula. The macula contains the
fovea, which in turn contains a tiny rod-free central region known
as the foveola. The retina subtends about 3.3 degrees of visual
angle per mm. The macula, at about 5 mm, subtends about 17 degrees;
the fovea, at about 1.5 mm, about 5 degrees; and the foveola, at
about 0.4 mm, about 1.3 degrees. The density of photoreceptors in
the retina falls off gradually with eccentricity, in line with
increasing photoreceptor size. A line through the center of the
foveola and the center of the pupil defines the eye's visual axis.
The visual axis is tilted inwards (in the nasal direction) by about
5 degrees with respect to the eye's optical axis.
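The angular figures quoted above follow from the stated conversion
of about 3.3 degrees of visual angle per mm of retina. The following
is a minimal sketch of that arithmetic only; the function and
variable names are illustrative and are not part of the described
system.

```python
# Approximate conversion stated in the text: about 3.3 degrees of
# visual angle per mm of retina.
DEG_PER_MM = 3.3


def visual_angle_deg(retinal_extent_mm: float) -> float:
    """Visual angle subtended by a retinal region of the given extent."""
    return retinal_extent_mm * DEG_PER_MM


# Retinal extents quoted in the text (mm).
regions_mm = {"macula": 5.0, "fovea": 1.5, "foveola": 0.4}
angles_deg = {name: visual_angle_deg(mm) for name, mm in regions_mm.items()}

for name, angle in angles_deg.items():
    print(f"{name}: about {angle:.1f} degrees")
```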
[0223] The photoreceptors in the retina connect to about a million
retinal ganglion cells which convey visual information to the brain
via the optic nerve. The density of ganglion cells falls off
linearly with eccentricity, and much more rapidly than the density
of photoreceptors. This linear fall-off confers scale-invariant
imaging. In the foveola, each ganglion cell connects to an
individual cone. Elsewhere in the retina a single ganglion cell may
connect to many tens of rods and cones. Foveal visual acuity peaks
at around 4 cycles per degree, is a couple of orders of magnitude
less at 30 cycles per degree, and is immeasurable beyond about 60
cycles per degree [33]. This upper limit is consistent with the
maximum cone density in the foveola of around twice this number,
and the corresponding ganglion cell density. Visual acuity drops
rapidly with eccentricity. For a 5-degree visual field, it drops to
50% of peak acuity at the edges. For a 30-degree visual field, it
drops to 5%.
[0224] The human visual system provides two distinct modes of
visual perception, operating in parallel. The first supports global
analysis of the visual field, allowing an object of interest to be
detected, for example due to movement. The second supports detailed
analysis of the object of interest.
[0225] In order to perceive and analyse an object of interest in
detail, the head and/or the eyes are rapidly moved to align the
eyes' visual axes with the object of interest. This is referred to
as fixation, and allows high-resolution foveal imaging of the
object of interest. Fixational movements, or saccades, and
fixational pauses, during which foveal imaging takes place, are
interleaved to allow the brain to perceive and analyse an extended
object in detail. An initial gross saccade of arbitrary magnitude
provides initial fixation. This is followed by a series of finer
saccades, each of at most a few degrees, which scan the object onto
the foveola. Microsaccades, a fraction of a degree in extent, are
implicated in the perception of very fine detail, such as
individual text characters. An ocular tremor, known as nystagmus,
ensures continuous relative movement between the retina and a fixed
scene. Without this tremor, retinal adaptation would cause the
perceived image to fade out.
[0226] Although peripheral attention usually leads to foveal
attention via fixation, the brain is also capable of attending to a
peripheral point of interest without fixating on it.
[0227] Light emitted by a point source creates a series of
spherical wavefronts centered on the point source. When the
wavefronts impinge on the human eye, the human visual system is
able to change the shape of the crystalline lens to bring the
wavefronts to a point of focus on the retina. This is referred to
as accommodation. The curvature of each wavefront as it impinges on
the eye is the inverse of the distance from the point source to the
eye. The smaller the distance, the greater the wavefront curvature,
and the greater the accommodation required. The greater the
distance, the flatter the wavefronts, and the smaller the
accommodation required.
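The inverse relationship between source distance and wavefront
curvature can be illustrated numerically. The sketch below assumes
distances in metres, so that curvature is expressed in reciprocal
metres (dioptres); the choice of unit and the function name are
assumptions made for illustration.

```python
import math


def wavefront_curvature(distance_m: float) -> float:
    """Curvature (1/m) of a spherical wavefront reaching the eye
    from a point source at the given distance."""
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    if math.isinf(distance_m):
        return 0.0  # a source at infinity yields flat wavefronts
    return 1.0 / distance_m


# Nearer sources give more strongly curved wavefronts and therefore
# require more accommodation.
for d in (0.1, 0.25, 0.5, 2.0, math.inf):
    print(f"distance {d} m -> curvature {wavefront_curvature(d)} 1/m")
```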
[0228] In order to fixate on a point source, the human visual
system rotates each eye so that the point source is aligned with
the visual axis of each eye. This is referred to as vergence.
Vergence in turn helps control the accommodation response, and a
mismatch between vergence and accommodation cues can therefore
cause eye strain.
[0229] The state of accommodation and vergence of the eyes in turn
provides the visual system with a cue to the distance from the eyes
to the point source, i.e. with a sense of depth.
[0230] The disparity between the relative positions of multiple
point sources in the two eyes' fields of view provides the visual
system with a cue to their relative depth. This disparity is
referred to as binocular parallax. The visual system's process of
fusing the inputs from the two eyes and thereby perceiving depth is
referred to as stereopsis. Stereopsis in turn helps achieve
vergence and accommodation.
[0231] Binocular parallax and motion parallax, i.e. parallax
induced by relative motion, are the two most powerful depth cues
used by the human visual system. Note that parallax may also lead
to an occlusion disparity.
[0232] The visual system's ability to locate a point source in
space is therefore determined by the center and radius of curvature
of the wavefronts emitted by the point source as they impinge on
the eyes. Furthermore, the discussion of point sources applies
equally to extended objects in general, by considering the surface
of each extended object as consisting of an infinite number of
point sources. In practice, due to the finite resolving power of
the visual system, a finite number of point sources suffices to
model an extended object.
[0233] Persistence of vision describes the inability of the human
visual system, and the retina in particular, to detect changes in
intensity occurring above a certain critical frequency. This
critical fusion frequency (CFF) is between 50 and 60 Hz, and is
somewhat dependent on contrast and luminance conditions. It
provides the basis for the human visual system's flicker-free
perception of projected film and video.
[0234] Three-Dimensional Displays
[0235] If one imagines a spherical camera capable of capturing
three-dimensional images of its surrounding space, and a
corresponding spherical display capable of displaying them, then a
defining characteristic of the display is that it becomes invisible
when placed in the same location as the camera, no matter how it is
viewed. The display emits the same light as would have been emitted
by the space it occupies had it not been present. More
conventionally, one can imagine a camera surface capable of
recording all light penetrating it from one side, and a
corresponding display surface capable of emitting corresponding
light. This is illustrated in FIG. 13, where the camera 308 is
shown capturing a subset of rays 310 emitted by a pair of point
sources 312. FIG. 14 shows the display 314 emitting
corresponding rays 316. In reality, a larger number of rays are
captured and displayed than shown in FIG. 14, so a viewer will
perceive the point sources 312 as being correctly located at fixed
points in three-dimensional space, independently of viewing
position.
[0236] The capture and manipulation of true three-dimensional image
data has been the subject of much research in recent years, mainly
for the purpose of constructing novel views. The images captured by
an infinite collection of infinitely small spherical cameras define
the so-called plenoptic function [42], while the light penetrating
an arbitrary surface in three dimensions defines a so-called light
field [36,30]. Both functions, although theoretically continuous,
are typically discretized for practical manipulation, and are
resampled to construct novel views. Although the discussion so far
has posited a 3D camera, the camera can be virtual and a light
field can be generated from a virtual 3D model.
[0237] A light field has the advantage that it captures both
position and occlusion parallax. It has the disadvantage that it is
data-intensive compared with a traditional 2D image. Conceptually,
compared with a view-dependent 2D image, a discretized
view-independent light field is defined by an array of 2D images,
each image corresponding to a pixel in the view-dependent image.
Although a light field can be used to generate a 2D image for a
novel view, it is expensive to directly display a 2D light field.
Because of this, 3D light field displays such as the lenslet
display described in [35] only support relatively low spatial
resolution. Furthermore, although the light field samples can be
seen as samples of a suitably low-pass filtered set of wavefronts,
the discrete light field display does not reconstruct the
continuous wavefronts which the samples represent, relying instead
on approximate integration by the human visual system.
[0238] Synthetic holographic displays have similar resolution
problems [52].
[0239] FIG. 15 shows a simple wavefront display 322 of a virtual
point source of light 318. In contrast to a discrete light field
display, a wavefront display emits a set of continuous spherical
wavefronts 324. The centre of curvature of each wavefront in the
set corresponds to the virtual point source of light 318. If the
virtual point 318 were an actual point, it would be emitting
spherical wavefronts
320. The wavefronts 324 emitted from the display 322 are equivalent
to the virtual wavefronts 320 had they passed through the display
322.
[0240] The advantage of the wavefront display 322 is that the
description of the input 3D image is much smaller than the
description of the corresponding light field, since it consists of
a 2D image augmented with depth information. The disadvantage of
this representation is that it fails to represent occlusion
parallax. However, in applications where occlusion parallax is not
important, the wavefront display has clear advantages.
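The difference in description size can be made concrete with a
rough count of samples. The sketch below is illustrative only: the
spatial resolution and the number of directional views per pixel
are hypothetical figures, and one colour value plus one depth value
per pixel is assumed for the wavefront display input.

```python
def light_field_samples(width, height, views_x, views_y):
    """Samples in a discretized light field: one 2D array of
    directional views for every pixel of the view-dependent image."""
    return width * height * views_x * views_y


def image_plus_depth_samples(width, height):
    """Samples in a 2D image augmented with depth: one colour value
    and one depth value per pixel (an assumed encoding)."""
    return width * height * 2


# Hypothetical example: 2000 x 2000 pixels, 32 x 32 views per pixel.
lf = light_field_samples(2000, 2000, 32, 32)
wd = image_plus_depth_samples(2000, 2000)
print(f"light field: {lf} samples")
print(f"image plus depth: {wd} samples")
print(f"ratio: {lf // wd}")
```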
[0241] A volumetric display acts as a simple wavefront display
[24], but has the disadvantage that the volume of the display must
encompass the volume of the virtual object being displayed.
[0242] A virtual retinal display [27], as discussed in the next
section, can act as a simple wavefront display when augmented with
a wavefront modulator [43]. Unlike a volumetric display, it can
simulate arbitrary depth. It can be further augmented with a
spatial light modulator [32] to support occlusions.
[0243] Many simpler display technologies have been developed which
provide some of the cues used by the human visual system to
perceive depth. These display technologies are predominantly
stereoscopic, i.e. they present a different view to each eye and
rely on binocular disparity to stimulate depth perception. In a
stereoscopic head-mounted display, left and right views are
presented directly to each eye. Left and right views may also be
spectrally multiplexed on a conventional display and viewed through
glasses with a different filter for each eye, or time-multiplexed
on a conventional display and viewed through glasses which shutter
each eye in alternating fashion. Polarization is also commonly used
for view separation. In an autostereoscopic display, so called
because it allows stereoscopic viewing without encumbering the
viewer with headgear or eyewear, strips of the left and right view
images are typically interleaved and displayed together. When
viewed through a parallax barrier or a lenticular array, the left
eye sees only the strips comprising the left image, and the right
eye sees only the strips comprising the right image. These displays
often only provide horizontal parallax, only support limited
variation in the position and orientation of the viewer, and only
provide two viewing zones, i.e. one for each eye. As discussed
above, arrays of lenslets can be used to directly display light
fields and thus provide omnidirectional parallax [35], dynamic
parallax barrier methods can be used to support wider movement of a
single tracked viewer [50], and multi-projector lenticular displays
can be used to provide a larger number of viewing zones to multiple
simultaneous viewers [40]. In a head-mounted display, motion
parallax results from rendering views according to the tracked
position and orientation of the viewer, whereas in a multiview
autostereoscopic system, motion parallax is intrinsic although
typically of lower quality.
[0244] The Netpage Head-Mounted Display
[0245] The Netpage HMD utilises a virtual retinal display (VRD),
also referred to as a Retinal Scanning Display (RSD), for each eye.
A VRD projects a beam of light directly onto the eye, and scans the
beam rapidly across the eye in a two-dimensional raster pattern. It
modulates the intensity of the beam during the scan, based on a
source video signal, to produce a spatially-varying image. The
combination of human persistence of vision and a sufficiently fast
and bright scan creates the perception of an object in the user's
field of view.
[0246] The VRD utilises independent red, green and blue beams to
create a colour display. The tri-stimulus nature of the human
visual system allows a red-green-blue display system to stimulate
the perception of most perceptible colours. Although a colour
display capability is preferred, a monochromatic display capability
also has utility.
[0247] Rendering the image presented to each eye differently
according to eye separation and virtual object depth creates the
perception of depth via stereopsis. Adjusting the projection angle
into each eye to allow correct vergence further enhances depth
perception, as does adjusting the divergence of each beam to allow
correct accommodation. Apart from reinforcing depth perception,
consistent depth cues maximise viewer comfort.
[0248] Key to the operation of the Netpage HMD is the registration
of the image projected by the VRD with the surface of the Netpage
onto which the image is being virtually projected. By operating as
a limited wavefront display, a VRD allows this registration to be
achieved without requiring registration between the eye and the
VRD. In this regard it differs from screen-based HMDs, which
require careful calibration or monitoring of eye position relative
to the HMD to achieve and maintain registration. Thus the
view-independent nature of a wavefront display is exploited to
avoid registration between the eye and the HMD, rather than its
more conventional purpose of avoiding a HMD altogether in the
context of an autostereoscopic display. As an alternative to
exploiting a VRD for this purpose, a view-independent light field
display can also be used, using a much faster laser scan.
[0249] A VRD provides only a limited wavefront display capability
because of practical limits on the size of its exit pupil. Ideally
its exit pupil is large enough to cover the eye's maximum entrance
pupil, at any allowed position relative to the display. The
position of the eye's pupil relative to the display can vary due to
eye movements, variations in the placement of the HMD, and
variations in individual human anatomy. In practice it is
advantageous to track the approximate gaze direction of the eye
relative to the display, so that limited system resources can be
dedicated to generating display output where it will be seen and/or
at an appropriate resolution.
[0250] Tracking the pupil also allows the system to determine an
approximate point of fixation, which it can use to identify a
document of interest. In a Netpage context, projecting virtual
imagery onto the surface region to which the user is directing
foveal attention is most important. It is less critical to project
imagery into the periphery of the user's field of view. Gaze
tracking can also be used to navigate a virtual cursor, or to
indicate an object to be selected or otherwise activated, such as a
hyperlink.
[0251] In a Netpage context, the surface onto which the virtual
imagery is being projected can generally be assumed to be planar,
and for most applications the projected virtual object can
similarly be assumed to be planar. This simplifies the wavefront
display requirements of the Netpage HMD. In particular, the
wavefront curvature is not required to vary abruptly within a
scanline. Alternatively, if the curvature modulation mechanism is
slow, then the wavefront curvature can be fixed for an entire
frame, e.g. based on the average depth of the virtual object. If
the wavefront curvature cannot be varied automatically at all, then
the system may still provide the user with a manual adjustment
mechanism for setting the curvature, e.g. based on the user's
normal viewing distance. Alternatively, the wavefront curvature may
be fixed by the system based on a standard viewing distance, e.g.
50 cm, to maximise viewer comfort.
[0252] FIG. 16 shows a block diagram of a VRD suitable for use in
the Netpage HMD, similar in structure to VRDs described in [27, 28,
37 and 38].
[0253] The VRD as a whole scans a light beam across the eye 326 in
a two-dimensional raster pattern. The eye 326 focuses the beam 390
onto the retina to produce a spot which traces out the raster
pattern over time. At any given time, the intensity of the beam and
hence the spot represents the value of a single colour pixel in a
two-dimensional input image. Human persistence of vision fuses the
moving spot into the perception of a two-dimensional image. The
required pixel rate of the VRD is the product of the image
resolution and the frame rate. The frame rate in turn is at least
as high as the critical fusion frequency, and ideally higher (e.g.
100 Hz or more). By way of example, a frame rate of 100 Hz and a
spatial resolution of 2000 pixels by 2000 pixels gives a pixel rate of
400 MHz and a line rate of 200 kHz.
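The example rates above can be reproduced as follows. This is a
minimal sketch which assumes that every line and every pixel is
displayed once per frame and ignores any blanking or retrace
overhead; the function name is illustrative.

```python
def vrd_rates(width_px, height_px, frame_rate_hz):
    """Return (pixel_rate_hz, line_rate_hz) for a raster display.

    Assumes no blanking or retrace overhead.
    """
    line_rate_hz = height_px * frame_rate_hz
    pixel_rate_hz = width_px * line_rate_hz
    return pixel_rate_hz, line_rate_hz


pixel_rate, line_rate = vrd_rates(2000, 2000, 100)
print(f"pixel rate: {pixel_rate / 1e6:.0f} MHz")  # 400 MHz
print(f"line rate: {line_rate / 1e3:.0f} kHz")    # 200 kHz
```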
[0254] A video generator 328 accepts a stream of image data 330 and
generates the requisite data and control signals 332 for displaying
the image data 330.
[0255] Light beam generators 334 generate red, green and blue beams
336, 338 and 340 respectively. Each beam generator 334 has a
matching intensity modulator 342, for modulating the intensity of
each beam according to the corresponding component of the pixel
colour 344 supplied by the video generator 328.
[0256] The beam generator 334 may be a gas or solid-state laser, a
light-emitting diode (LED), or a super-luminescent LED. The
intensity modulator 342 may be intrinsic to the beam generator or
may be a separate device. For example, a gas laser may rely on a
downstream acousto-optic modulator (AOM) for intensity modulation,
while a solid-state laser or LED may intrinsically allow intensity
modulation via its drive current.
[0257] Although FIG. 16 shows multiple beam generators 334 and
colour intensity modulators 342, a single monochrome beam generator
may be utilised if color projection is not required.
[0258] Furthermore, multiple beam generators and intensity
modulators may be utilised in parallel to achieve a desired pixel
rate. In general, any component of the VRD whose fundamental
operating rate limits the achievable pixel rate may be replicated,
and the replicated components operated in parallel, to achieve a
desired pixel rate.
[0259] A beam combiner 346 combines the intensity-modulated colored
beams 348, 350 and 352 into a single beam 354 suitable for
scanning. The beam combiner may utilise multiple beam splitters.
[0260] A wavefront modulator 356 accepts the collimated input beam
354 and modulates its wavefront to induce a curvature which is the
inverse of the pixel depth signal 358 supplied by the video
generator 328. The pixel depth 358 is clipped at a reasonable
depth, beyond which the wavefront modulator 356 passes a collimated
beam. The wavefront modulator 356 may be a deformable membrane
mirror (DMM) [43, 51], a liquid-crystal phase corrector [47], a
variable focus liquid lens or mirror operating on an electrowetting
principle [16, 25], or any other suitable controllable wavefront
modulator. Depending on the time constant of the modulator 356, it
may be utilised to effect pixel-wise, line-wise or frame-wise
wavefront modulation, corresponding to pixel-wise, line-wise or
frame-wise constant depth. Furthermore, as mentioned earlier,
multiple wavefront modulators may be utilised in parallel to
achieve higher-rate wavefront modulation. If the operation of the
wavefront modulator is wavelength-dependent, then multiple
wavefront modulators may be employed beam-wise before the beams are
combined. Even if the wavefront modulator is incapable of random
pixel-wise modulation, it may still be capable of ramped modulation
corresponding to the linear change of depth within a single
scanline of the projection of a planar object.
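The mapping from pixel depth to wavefront curvature, including the
clipping behaviour and the choice between pixel-wise and line-wise
modulation, may be sketched as follows. The clip depth of 10 m, the
use of the mean depth for line-wise modulation, and all names are
assumptions made for illustration; no particular values are
prescribed.

```python
from statistics import mean

CLIP_DEPTH_M = 10.0  # hypothetical clip depth, not a prescribed value


def curvature_from_depth(depth_m, clip_depth_m=CLIP_DEPTH_M):
    """Curvature (1/m) for a pixel depth. At or beyond the clip
    depth the beam is passed collimated (zero curvature)."""
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    if depth_m >= clip_depth_m:
        return 0.0
    return 1.0 / depth_m


def line_curvatures(depths_m, mode="pixel"):
    """Curvature applied to each pixel of one scanline.

    mode="pixel": a fast modulator follows every pixel depth.
    mode="line":  a slower modulator holds one curvature per line,
                  here taken from the mean depth (an assumption).
    """
    if mode == "pixel":
        return [curvature_from_depth(d) for d in depths_m]
    if mode == "line":
        c = curvature_from_depth(mean(depths_m))
        return [c] * len(depths_m)
    raise ValueError(f"unknown mode: {mode}")


scanline = [0.25, 0.5, 0.75]  # example pixel depths in metres
print(line_curvatures(scanline, "pixel"))
print(line_curvatures(scanline, "line"))
```

Frame-wise modulation follows the same pattern, with a single
curvature computed once for the whole frame.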
[0261] FIG. 17a shows a simplified schematic of a DMM 360 used as a
wavefront modulator (see FIG. 16). When the DMM 360 is flat, i.e.
with no applied voltage (shown on the left), it reflects a
collimated beam 362. This corresponds to infinite pixel depth. FIG.
17b shows the DMM 360 deformed with an applied voltage. The
deformed DMM now reflects a converging beam 364 which becomes a
diverging beam 368 beyond the focal point 366. This corresponds to
a particular finite pixel depth.
[0262] FIG. 18a shows a simplified schematic of a variable focus
liquid lens 370 used as a wavefront modulator (and as part of the
beam expander). The lens is at rest with no applied voltage and
produces a converging beam 364 which is collimated by the second
lens 372. FIG. 18b shows the lens 370 deformed by an applied
voltage so that it produces a more converging beam 364 which is
only partially collimated by the second lens 372 to still produce a
diverging beam 368. A similar configuration can be used with a
variable focus liquid mirror instead of a liquid lens.
[0263] Referring again to FIG. 16, a horizontal scanner 374 scans
the beam in a horizontal direction, while a subsequent vertical
scanner 376 scans the beam in a vertical direction. Together they
steer the beam in a two-dimensional raster pattern. The horizontal
scanner 374 operates at the pixel rate of the VRD, while the
vertical scanner operates at the line rate. To prevent possible
beating between the frame rate and the frequency of microsaccades,
which are of the same order, it is useful for the pixel-rate scan
to occur horizontally with respect to the eye, since many
detail-oriented microsaccades, such as occur during reading, are
horizontal.
[0264] The horizontal scanner may utilise a resonant scanning
mirror, as described in [37]. Alternatively, it may utilise an
acousto-optic deflector, as described in [27,28], or any other
suitable pixel-rate scanner, replicated as necessary to achieve the
desired pixel rate.
[0265] Although FIG. 16 shows distinct horizontal and vertical
scanners, the two scanners may be combined in a single device such
as a biaxial MEMS scanner, as described in [37].
[0266] Similarly, although FIG. 16 shows the video generator 328
producing video timing signals 378 and 380, it may be convenient to
derive
video timing from the operation of the horizontal scanner 374 if it
utilises a resonant design, since a resonant scanner's frequency is
determined mechanically. Furthermore, since a resonant scanner
generates a sinusoidal scan velocity, it is crucial to vary pixel
durations accordingly to ensure that their spatial extent is
constant [54].
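One way to derive such pixel durations is sketched below. It
assumes a normalised mirror deflection of sin(2*pi*f*t), pixels of
equal extent in deflection, and a usable central fraction of the
sweep; these modelling choices, the example frequency and the
function name are assumptions and are not taken from [54].

```python
import math


def pixel_durations(n_pixels, scan_freq_hz, usable_fraction=0.9):
    """Durations (s) of n_pixels equal-width pixels across one sweep
    of a resonant scanner with deflection sin(2*pi*f*t).

    Only the central usable_fraction of the sweep is used, since the
    mirror is nearly stationary at the turning points.
    """
    if not 0.0 < usable_fraction <= 1.0:
        raise ValueError("usable_fraction must be in (0, 1]")
    omega = 2.0 * math.pi * scan_freq_hz
    # Equally spaced pixel boundaries in normalised deflection.
    edges = [usable_fraction * (2.0 * k / n_pixels - 1.0)
             for k in range(n_pixels + 1)]
    # Time at which the mirror reaches each boundary.
    times = [math.asin(x) / omega for x in edges]
    return [t1 - t0 for t0, t1 in zip(times, times[1:])]


durations = pixel_durations(2000, 100_000.0)  # hypothetical 100 kHz
print(f"edge pixel:   {durations[0]:.3e} s")
print(f"centre pixel: {durations[len(durations) // 2]:.3e} s")
```

Pixels near the edges of the sweep, where the mirror moves slowly,
receive longer durations than pixels near the centre.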
[0267] An optional eye tracker 382 determines the approximate gaze
direction 384 of the eye 326. It may image the eye to detect the
position of the pupil as well as the position of the corneal
reflection of an infrared light source, to determine the approximate
gaze direction. Typical corneal reflection eye tracking systems are
described in [20,34]. Eye tracking in general is discussed in
[23].
[0268] Multiple off-axis light sources may be positioned within the
HMD, as prefigured in [14]. These can be lit in succession, so that
each successive image of the eye contains the reflection of a
single light source. The reflection data resulting from multiple
successive images can then be combined to determine gaze direction
384, either analytically or using least squares adjustment, without
requiring prior calibration of eye position with respect to the
HMD. An image of the infrared corneal reflection of a Netpage coded
surface in the user's field of view may also serve as the basis for
un-calibrated detection of gaze direction.
[0269] If the gaze direction 384 of both eyes is tracked, then the
resultant two fixation points can be averaged to determine the
likely true fixation point.
[0270] The tracked gaze direction 384 may be low-pass filtered to
suppress fine saccades and microsaccades.
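The averaging of the two fixation points and the low-pass filtering
of the tracked gaze direction may be sketched as follows. The
filter type is not specified; first-order exponential smoothing is
used here purely as an assumption, and the names and the two-
component gaze representation are illustrative.

```python
def average_fixation(left, right):
    """Midpoint of the fixation points estimated for the two eyes."""
    return tuple((a + b) / 2.0 for a, b in zip(left, right))


class GazeLowPass:
    """First-order low-pass (exponential smoothing) of gaze samples.

    A smaller alpha suppresses fine saccades and microsaccades more
    strongly, at the cost of a slower response.
    """

    def __init__(self, alpha):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.state = None

    def update(self, sample):
        if self.state is None:
            self.state = tuple(sample)  # initialise on first sample
        else:
            self.state = tuple(
                s + self.alpha * (x - s)
                for s, x in zip(self.state, sample)
            )
        return self.state


gaze_filter = GazeLowPass(alpha=0.5)
for sample in [(0.0, 0.0), (1.0, 2.0), (1.0, 2.0)]:
    print(gaze_filter.update(sample))
```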
[0271] An optional beam offsetter 386 acts on the gaze direction
384 provided by the eye tracker 382 to align the beam with the
pupil of the eye 326. The gaze direction 384 is simultaneously used
by a high-level image generator to generate virtual imagery offset
correspondingly.
[0272] Projection optics 388 finally project the beam 390 onto the
eye 326, magnifying the scan angle to provide the required field of
view angle. The projection optics include a visor-shaped optical
combiner which simultaneously reflects the generated imagery onto
the eye while passing light from the environment. The VRD thereby
acts as a see-through display. The visor is ideally curved, so that
it magnifies the projected imagery to fill the field of view.
[0273] The HMD as a whole, discussed below, ensures that the
projected imagery is registered with a physical Netpage coded
surface in the user's field of view. The optical transmission of
the combiner may be fixed, or it may be variable in response to
active control or ambient light levels. For example, it may
incorporate a liquid-crystal layer switchable between transmissive
and opaque states, either under user or software control.
Alternatively or additionally, it may incorporate a photochromic
material whose opacity is a function of ambient light levels.
[0274] The HMD correctly renders occlusions as part of any
displayed virtual imagery, according to the user's current
viewpoint relative to a tagged surface. It does not, however,
intrinsically support occlusion parallax according to the position
of the user's eye relative to the HMD unless it uses eye tracking
for this purpose. In the absence of eye tracking, the HMD renders
each VRD view according to a nominal eye position. If the actual
eye position deviates from the assumed eye position, then the
wavefront display nature of the VRD prevents misregistration
between the real world and the virtual imagery, but in the presence
of occlusions due to real or virtual objects, it may lead to object
overlap or holes.
[0275] Referring to FIG. 19, the VRD can be further augmented with
a spatial light (amplitude) modulator (SLM) such as a digital
micromirror device (DMD) [32, 48] to support occlusion parallax.
The SLM 392 is introduced immediately after the wavefront modulator
356 and before the raster scanner 374, 376. Alternatively, the SLM
392 is introduced immediately before the wavefront modulator (but
after its beam expander). The video generator 328 provides the SLM
392 with an occlusion map 394 associated with the current pixel.
The SLM passes non-occluded parts of the wavefront but blocks
occluded parts. The amplitude-modulation capability of the SLM may
be multi-level, and each map entry in the occlusion map may be
correspondingly multi-level.
[0276] However, in the limit case the SLM is a binary device, i.e.
either passing light or blocking light, and the occlusion map is
similarly binary.
[0277] To prevent holes appearing when a nominally invisible part
of the virtual scene becomes visible due to eye movement, the HMD
can make multiple passes to display multiple depth planes in the
virtual scene. The HMD can either render and display each depth
plane in its entirety, or can render and display only enough of
each depth plane to support the maximum eye movement possible.
[0278] FIG. 20 shows the wavefront display of FIG. 15 augmented
with support for displaying an occlusion 396.
[0279] FIG. 21 shows the DMM 360 of FIGS. 17a and 17b augmented
with a DMD SLM 392 to produce a VRD with occlusion support. The
"shadow" 398 of the virtual occlusion is a gap formed by the SLM
392 in the cross-section of the beam reflected by the DMM 360.
[0280] Per-pixel occlusion maps are easily calculated during
rendering of a virtual model. They may also be derived directly
from a depth image. Where the occluding object is an object in the
real world, such as the user's hand (as discussed further below),
it may be represented as an opaque black virtual object during
rendering.
[0281] Table 5 gives examples of the viewing angle associated with
common media at various viewing distances. In the table, specified
values are shown shaded, while derived values are shown un-shaded.
For print media, various common viewing distances are specified and
corresponding viewing angles are derived. Required VRD image sizes
are then derived based on representing a maximum feature frequency of
30 cycles per degree. For display media, various common viewing
distances are specified and corresponding viewing angles (and maximum
feature frequencies) are derived. For both media types the
corresponding surface resolution is also shown.
[0282] Based on their native resolution and human visual acuity,
display media such as HDTV video monitors are suited to a viewing
angle of between 30 and 40 degrees. This is consistent with viewing
recommendations for such display media. Based on their native size
and human accommodation limits, print media such as US Letter pages
are also suited to a viewing angle of 30 to 40 degrees.
[0283] A VRD image size of around 2000 pixels by 2000 pixels is
therefore adequate for virtualising these media. Significantly less
is required if knowledge of gaze direction is used to project
non-foveated parts of the image at lower resolution.
TABLE 5 Viewing parameters for different media
[Table body is an image in the original and is not reproduced here.]
Footnotes to Table 5:
8. In units of screen height.
9. Per unit of screen height.
10. THX recommends 36 degrees in back row of theatre.
11. SMPTE EG-18-1994 recommends 30 degrees viewing angle.
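The relationship between viewing angle and required image size can
be sketched as follows. The sketch assumes two samples per cycle
(Nyquist sampling), which is an assumption of the example rather
than a figure stated for Table 5; the function names are
illustrative only.

```python
import math


def viewing_angle_deg(media_size, viewing_distance):
    """Angle subtended by a flat medium viewed square-on.

    media_size and viewing_distance must be in the same units.
    """
    return math.degrees(2.0 * math.atan(media_size / (2.0 * viewing_distance)))


def required_pixels(angle_deg, cycles_per_degree=30.0, samples_per_cycle=2.0):
    """Pixels needed across `angle_deg` to represent the given
    maximum feature frequency (two samples per cycle assumed)."""
    return angle_deg * cycles_per_degree * samples_per_cycle


low = required_pixels(30.0)   # 1800.0 pixels
high = required_pixels(40.0)  # 2400.0 pixels
```

Under these assumptions a viewing angle of 30 to 40 degrees needs
between 1800 and 2400 pixels, which brackets the figure of around
2000 pixels given above.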
[0284] FIG. 22 shows a block diagram of a Netpage HMD 300
incorporating dual VRDs 304 and 306 for binocular stereoscopic
display as shown in FIG. 14. Dual earphones 800 and 802 provide
stereophonic sound. Although dual VRDs are preferred, a single VRD
providing a monoscopic display capability also has utility (see
FIG. 13). Similarly, a single earphone also has utility.
[0285] Although VRDs or similar display devices are preferred for
incorporation in the Netpage HMD because they allow the
incorporation of wavefront curvature modulation, more conventional
display devices such as liquid crystal displays may also be
utilised, but with the added complexity of requiring more careful
head and eye position calibration or tracking. Conventional
LCD-based HMDs are described in detail in [45].
[0286] To maximise the operating range of the VRDs with respect to
eye movement, and to maximise user comfort, the optical axes of the
VRDs can be approximately aligned with the resting positions of the
two eyes by adjusting the lateral separation of the VRDs and
adjusting the tilt of the visor. This can be achieved as part of a
fitting process and/or performed manually by the user at any time.
Note again that the wavefront display capability of the VRDs means
that these adjustments are not required to achieve registration of
virtual imagery with the physical world.
[0287] A Netpage sensor 804 acquires images 806 of a Netpage coded
surface in the user's field of view. It may have a fixed viewing
direction and a relatively narrow field of view (of the order of
the minimum field of view required to acquire and decode a tag); a
variable viewing direction and a relatively narrow field of view;
or a fixed viewing direction and a relatively wide field of view
(of the order of the VRD viewing angle or even greater). In the
first case, the user is constrained to interacting with a Netpage
coded surface in the fixed and narrow field of view of the sensor,
requiring the head to be turned to face the Netpage of interest. In
the second case, the gaze-tracked fixation point can be used to
steer the image sensor's field of view, for example via a tip-tilt
mirror, allowing the user to interact with a Netpage by fixating on
it. In the third case, the gaze-tracked fixation point can be used
to select a sub-region of the sensor's field of view, again
allowing the user to interact with a Netpage by fixating on it. In
the second and third cases, and as described earlier, the user's
effective viewing angle is widened by using the tracked gaze
direction to offset the beam.
[0288] A controlling HMD processor 808 accepts image data 330 from
the Netpage sensor 804. The processor locates and decodes the tags
in the image data to generate a continuous stream of
identification, position and orientation information for the
Netpage being imaged. A suitable Netpage image sensor with an
on-board image processor, and the corresponding image processing
algorithm, tag decoding algorithm and pose (position and
orientation) estimation algorithm, are described in [9,59]. In the
HMD 300, the image sensor resolution is higher than described in
[9] to support a greater range of tag pattern scales. The sensor
utilises a small aperture to ensure good depth of field, and an
objective lens system for focusing, approximately as described in
[4].
[0289] The Netpage sensor 804 incorporates a longpass or bandpass
infrared filter matched to the absorption peak of the infrared ink
used to encode the HMD-oriented Netpage tag pattern. It also
includes a source of infrared illumination matched to the ink.
Alternatively it relies on the infrared component of ambient
illumination to adequately illuminate the tag pattern for imaging
purposes. In addition, large and/or distant SVDs (such as cinema
screens, billboards, and even video monitors) are usefully
self-illuminating, either via front or back illumination, to avoid
reliance on HMD illumination.
[0290] Alternatively or additionally to determining the actual
viewing distance of the tagged surface by analysing the scale and
perspective distortion of the tagged pattern images 806, the
Netpage sensor 804 may include an optical range finder.
Time-of-flight measurement of an encoded optical pulse train is a
well-established technique for optical range finding, and a
suitable system is described in [17].
[0291] The depth determined via the optical range finder can be
used by the HMD to estimate the expected scale of the imaged tag
pattern, thus making tag image processing more efficient, and it
can be used to fix the z depth parameter during pose estimation,
making the pose estimation process more efficient and/or accurate.
It can also be used to adjust the focus of the Netpage sensor's optics,
to provide greater effective depth of field, and can be used to
change the zoom of the Netpage sensor's optics, to allow a smaller
image sensor to be utilised across a range of viewing distances,
and to reduce the image processing burden.
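By way of a hedged illustration, the sketch below converts a
time-of-flight measurement to a distance and then estimates how many
sensor pixels a tag feature is expected to span at that distance. It
assumes a simple pinhole model with a square sensor; the function
names and parameters are hypothetical and are not taken from [17].

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def tof_distance(round_trip_seconds):
    """Distance to the surface from the round-trip time of a pulse."""
    return SPEED_OF_LIGHT * round_trip_seconds / 2.0


def pixels_per_feature(distance, feature_size, fov_deg, sensor_pixels):
    """Expected size of one tag feature on the sensor, in pixels.

    distance and feature_size share the same units. The width of the
    field of view at `distance` is 2 * distance * tan(fov / 2).
    """
    field_width = 2.0 * distance * math.tan(math.radians(fov_deg) / 2.0)
    return sensor_pixels * feature_size / field_width


# A feature scaled for a 0.3 m minimum distance, 10 degree field of
# view and 36 features across, imaged at 1.2 m by a 288-pixel sensor.
feature = 0.3 * math.tan(math.radians(5.0)) / 18.0
expected = pixels_per_feature(1.2, feature, 10.0, 288)
```

Knowing the expected feature size in advance lets the tag image
processing search a single scale rather than a range of scales.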
[0292] Zoom and/or focus control may be effected by moving a lens
element, as well as by modulating the curvature of a deformable
membrane mirror [43,51], a liquid-crystal phase corrector [47], or
other suitable device. Zoom may also be effected digitally, e.g.
simply to reduce the image processing burden.
[0293] Range-finding, whether based on pose estimation or
time-of-flight measurement, can be performed at multiple locations
on a surface to provide an estimate of surface curvature. The
available range data can be interpolated to provide range data
across the entire surface, and the virtual imagery can be projected
onto the resultant curved surface. The geometry of a tagged curved
surface may also be known a priori, allowing proper projection
without additional range-finding.
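One simple way to interpolate sparse range data across a surface is
bilinear interpolation between four corner measurements, sketched
below. This is only one possible interpolation scheme, chosen for
brevity; the function name and the normalised (u, v) surface
coordinates are assumptions of the example.

```python
def bilinear_range(r00, r10, r01, r11, u, v):
    """Interpolate range at normalised surface coordinates (u, v).

    r00, r10, r01 and r11 are ranges measured at the surface corners
    (u, v) = (0, 0), (1, 0), (0, 1) and (1, 1) respectively.
    """
    top = r00 + (r10 - r00) * u
    bottom = r01 + (r11 - r01) * u
    return top + (bottom - top) * v


centre = bilinear_range(1.0, 2.0, 3.0, 4.0, 0.5, 0.5)  # 2.5
```

With more than four measurements, a finer grid or a fitted smooth
surface would give a better estimate of curvature.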
[0294] Rather than utilising a two-dimensional image sensor, the
Netpage sensor 804 may instead utilise a scanning laser, as
described in [5]. Since the image produced by the scanning laser is
not distorted by perspective, pose estimation cannot be used to
yield the z depth of the tagged surface. Optical (or other) range
finding is therefore crucial in this case. Pose estimation may
still be performed to determine three-dimensional orientation and
two-dimensional position. The optical range finder may be
integrated with the laser scanner, utilising the same laser source
and photodetector, and operating in multiplexed fashion with
respect to scanning.
[0295] The frame rate of the Netpage sensor 804 is matched to the
frame rate of the image generator 328 (e.g. at least 50 Hz, but
ideally 100 Hz or more), so that the displayed image is always
synchronised with the position and orientation of the tagged
surface. Decoding of the page identifier embedded in the surface
coding can occur at a lower rate, since it changes much less often
than position.
[0296] Decoding of the page identifier can be triggered when a tag
pattern is re-acquired, and when the decoded position changes
significantly. Alternatively, if the least significant bits of the
page identifier are encoded in the same codewords which encode
position, then full page identifier decoding can be triggered by a
change in the least significant page identifier bits.
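The triggering rules above can be sketched as a small state machine.
The class name, the position-jump threshold and the representation
of the least significant identifier bits as an integer are all
assumptions made for illustration.

```python
import math


class PageIdDecodeTrigger:
    """Decides when a full page-identifier decode should be run."""

    def __init__(self, jump_threshold):
        self.jump_threshold = jump_threshold  # hypothetical tuning value
        self.tracking = False
        self.last_position = None
        self.last_id_lsb = None

    def update(self, position, id_lsb):
        """position is an (x, y) tuple, or None when no tag is visible.

        Returns True when the tag pattern has just been re-acquired,
        when the decoded position has jumped by more than the
        threshold, or when the identifier's low bits have changed.
        """
        if position is None:
            self.tracking = False
            return False
        reacquired = not self.tracking
        jumped = (not reacquired and
                  math.dist(position, self.last_position) > self.jump_threshold)
        lsb_changed = not reacquired and id_lsb != self.last_id_lsb
        self.tracking = True
        self.last_position = position
        self.last_id_lsb = id_lsb
        return reacquired or jumped or lsb_changed


trigger = PageIdDecodeTrigger(jump_threshold=10.0)
first = trigger.update((0.0, 0.0), 3)   # newly acquired
steady = trigger.update((1.0, 0.0), 3)  # small movement, same low bits
```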
[0297] The imaging axis of the Netpage sensor emerges from the HMD
300 between and slightly above the eyes, and is roughly normal to
the face. Alternatively, the Netpage sensor 804 is arranged to
image the back of the visor, so that its imaging axis roughly
coincides with one eye's resting optical axis.
[0298] Although the HMD 300 incorporates a single Netpage sensor
804, it may alternatively incorporate dual Netpage sensors and be
configured to perform pose estimation across both image sensors'
acquired images. It may also incorporate multiple tag sensors to
allow tag acquisition across a wider field of view.
[0299] Various scenarios for connecting the HMD 300 to a Netpage
server 812 are illustrated in FIG. 23, FIG. 24 and FIG. 25.
[0300] A radio transceiver 810 (see FIG. 22) provides a
communications interface to a server such as a video server or a
Netpage server 812. The architecture of the overall Netpage system
with which the Netpage HMD 300 communicates is described in [1,
3].
[0301] The radio interface 810 may utilise any of a number of
protocols and standards, including personal-area and local-area
standards such as Bluetooth, IEEE 802.11, 802.15, and so on; and
wide-area mobile standards such as GSM, TDMA, CDMA, GPRS, etc. It
may also utilise different standards for outgoing and incoming
communication, for example utilising a broadcast standard for
incoming data, such as a satellite, terrestrial analogue or
terrestrial digital standard.
[0302] The HMD 300 may effect communication with a server 812 in a
multi-hop fashion, for example using a personal-area or local-area
connection to communicate with a relay device 816 which in turn
communicates with a server via communications network 814 for a
longer-range connection. It may also utilise multiple layers of
protocols, for example communicating with the server via TCP/IP
overlaid on a point-to-point Bluetooth connection to a relay as
well as on the broader Internet.
[0303] Alternatively or additionally, the HMD may utilise a wired
connection to a relay or server, utilising one or more of a serial,
parallel, USB, Ethernet, Firewire, analog video, and digital video
standard.
[0304] The relay device 816 may, for example, be a mobile phone,
personal digital assistant or a personal computer. The HMD may
itself act as a relay for other Netpage devices, such as a Netpage
pen [4], or vice versa.
[0305] In the Netpage architecture, the identifier of a Netpage is
used to identify a corresponding server which is able to provide
information about the page and handle interactions with the page.
When the HMD first encounters a new page identifier, it looks up a
corresponding server, for example via the DNS. Having identified a
server, it retrieves static and/or dynamic data associated with the
page from the server. Having retrieved the page data, an image
generator 328 renders the page data stereoscopically for the two
eyes according to the position and orientation of the Netpage with
respect to the HMD, and optionally according to the gaze directions
of the eyes. The generated stereo images include per-pixel depth
information which is used by the VRDs 304 and 306 to modulate
wavefront curvature (see FIG. 22).
[0306] Static page data may include static images, text, line art
and the like. Dynamic page data may include video 822, audio 824,
and the like.
[0307] A sound generator 820 renders the corresponding audio, if
any, optionally spatialised according to the relative positions of
the HMD and the coded surface, and/or the virtual position(s) of
the sound source(s) relative to the coded surface. Suitable audio
spatialisation techniques are described in [41].
[0308] The HMD may download dynamic data such as video and audio
into a local memory or disk device, or it may obtain such data in
streaming fashion from the server, with some degree of local
buffering to decouple the local playback rate from any variations
in streaming rate due to network behaviour.
[0309] Whether the image data is static or dynamic, the image
generator 328 constantly re-renders the page data to take into
account the current position and orientation of the Netpage with
respect to the HMD 300 (and optionally according to gaze
direction).
[0310] The frame rate of the image generator 328 and the VRDs 304,
306 is at least the critical fusion frequency and is ideally
faster. The frame rate of the image generator and the VRDs may be
different from the frame rate of a video stream being displayed by
the HMD 300. Ideally the image generator utilises motion estimation
to generate intermediate frames not explicitly present in the video
stream. Applicable techniques are described in [21, 39]. If the
video stream utilises a motion-based encoding scheme such as an
MPEG variant, then the HMD uses the motion information inherent in
the encoding to generate intermediate frames.
[0311] As an alternative to the image generator in the HMD
performing full page image rendering, the server may perform page
image rendering and transmit a corresponding video sequence to the
HMD. Because of the latency between pose estimation, image
rendering and subsequent display in this scenario, it is
advantageous to still transform the resultant video stream
according to pose in the HMD at the display frame rate.
[0312] More generally, whether image generation occurs on the
server or in the HMD, a dedicated image warper 826 can be utilised
to perspective-project the video stream according to the current
pose, and to generate image data at a rate and at a resolution
appropriate to the display, independent of the rate and resolution
of the image data generated by the image generator 328. This is
illustrated in FIG. 26.
[0313] Multi-pass perspective projection techniques are described
in [58]. Single-pass techniques and systems are described in [31,
2]. General techniques based on three-dimensional texture mapping
are described in [13]. Transforming an input image to produce a
perspective-projected output image involves low-pass filtering and
sampling the input image according to the projection of each output
pixel into the space of the input image, i.e. computing the
weighted sum of input pixels which contribute to each output pixel.
In most hardware implementations, such as described in [22], this
is efficiently achieved by trilinearly interpolating an image
pyramid which represents the input image at multiple resolutions.
The image pyramid is often represented by a mipmap structure [57],
which contains all power-of-two image resolutions. A mipmap only
directly supports isotropic low-pass filtering, which leads to a
compromise between aliasing and blurring in areas where the
projection is anisotropic. However, anisotropic filtering is
commonly implemented using mipmap interpolation by computing the
weighted sum of several mipmap samples.
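The mipmap lookup described above can be illustrated with a minimal
software sketch. It assumes a square single-channel image whose side
is a power of two, a 2x2 box filter for building the pyramid, and
texel centres at half-integer positions; these conventions, and the
function names, are assumptions of the example and not those of any
particular hardware implementation.

```python
def build_mipmap(image):
    """Return the list of pyramid levels, finest first.

    Each level halves the side of the previous one by averaging
    2x2 blocks of pixels.
    """
    levels = [image]
    while len(levels[-1]) > 1:
        src = levels[-1]
        n = len(src) // 2
        levels.append([[(src[2 * y][2 * x] + src[2 * y][2 * x + 1] +
                         src[2 * y + 1][2 * x] + src[2 * y + 1][2 * x + 1]) / 4.0
                        for x in range(n)] for y in range(n)])
    return levels


def bilinear(level, u, v):
    """Sample one level at normalised coordinates u, v in [0, 1]."""
    n = len(level)
    x = min(max(u * n - 0.5, 0.0), n - 1.0)
    y = min(max(v * n - 0.5, 0.0), n - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, n - 1), min(y0 + 1, n - 1)
    fx, fy = x - x0, y - y0
    top = level[y0][x0] * (1 - fx) + level[y0][x1] * fx
    bottom = level[y1][x0] * (1 - fx) + level[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def trilinear(levels, u, v, lod):
    """Blend bilinear samples from the two levels bracketing `lod`."""
    lod = min(max(lod, 0.0), len(levels) - 1.0)
    lo = int(lod)
    hi = min(lo + 1, len(levels) - 1)
    f = lod - lo
    return bilinear(levels[lo], u, v) * (1 - f) + bilinear(levels[hi], u, v) * f


levels = build_mipmap([[0.0, 4.0], [8.0, 12.0]])
sample = trilinear(levels, 0.25, 0.25, 0.5)  # 3.0
```

The level of detail `lod` would normally be derived from the size of
each output pixel's footprint in the input image; that step is
omitted here.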
[0314] In general, image generation for or in the HMD can make
effective use of multi-resolution image formats such as the
wavelet-based JPEG2000 image format, as well as mixed-resolution
formats such as Mixed Raster Content (MRC), which treats line art
and text differently to contone image data, and which is also
incorporated in JPEG2000.
[0315] If there is noticeable latency between initial acquisition
of a surface by the HMD, and subsequent display of virtual imagery
associated with that surface, then the HMD can signal acquisition
of the surface to the user to provide immediate feedback. For
example, the HMD can highlight or outline the surface. This also
serves to distinguish Netpage tagged surfaces from un-tagged
surfaces in the user's field of view. The tags themselves can
contain an indication of the extent of the surface, to allow the
HMD to highlight or outline the surface without interaction with a
server. Alternatively, the HMD can retrieve and display extent
information from the server in parallel with retrieving full
imagery.
[0316] The HMD may be split into a head-mounted unit and a control
unit (not shown) which may, for example, be worn on a belt or other
harness. If the beam generators are compact, then the head-mounted
unit may house the entire VRDs 304 and 306. Alternatively, the
control unit may house the beam generators and modulators, and the
combined beams may be transmitted to the head-mounted unit via
optic fibers.
[0317] As described earlier, the user may utilise gaze to move a
cursor within the field of view and/or to virtually "select" an
object. For example, the object may represent a virtual control
button or a hyperlink. The HMD can incorporate an activation
button, or "clicker" 828, as shown in FIG. 27, to allow the user to
activate the currently selected object. The clicker 828 can consist
of a simple switch, and may be mounted in any of a number of
convenient locations. For example, it may be incorporated in a
belt-mounted control unit, or it may be mounted on the index finger
for activation by the thumb. Multiple activation buttons can also
be provided, analogously to the multiple buttons on a computer
mouse.
[0318] Gaze-directed cursor movement can be particularly effective
because the precision of the movement of the cursor relative to a
surface can be increased by simply bringing the surface closer to
the eye.
[0319] In the absence of precise gaze tracking, the user may move
their head to move a cursor and/or select an object, based simply
on the optical axis of the HMD itself.
[0320] The HMD can also provide cursor navigation buttons 830
and/or a joystick 832 to allow the user to move a cursor without
utilising gaze. In this case the cursor is ideally tied to the
currently active tagged surface, so that the cursor appears
attached to the surface when relative movement between the HMD and
the surface occurs. The cursor can be programmed to move at a
surface-dependent rate or a view-dependent rate or a compromise
between the two, to give the user maximum control of the
cursor.
[0321] The HMD can also incorporate a brain-wave monitor 834 to
allow the user to move the cursor, select an object and/or activate
the object by thought alone [60].
[0322] The HMD can provide a number of dedicated control buttons
836, e.g. for changing the cursor mode (e.g. between gaze-directed,
manually controlled, or none), as well as for other control
functions.
[0323] It is sometimes useful to dissociate a SVD from the physical
surface to which it is attached. The HMD can therefore provide a
control button 836 which allows the user to "lift" an SVD from a
surface and place it at a fixed location and in a fixed orientation
relative to the HMD field of view. The user may also be able to
move the lifted SVD, zoom in and zoom out etc., using virtual or
dedicated control buttons. The user may also benefit from zooming
the SVD in situ, i.e. without lifting it, for example to improve
readability without reducing the viewing distance.
[0324] Referring back to FIG. 22, the HMD can include a microphone
838 for capturing ambient audio or voice input 840 from the user,
and a still or video camera for capturing still or moving images
844 of the user's field of view. All captured audio, image and
video input can be buffered indefinitely by the HMD as well as
streamed to a Netpage or other server 812 (FIGS. 23, 24 and 25) for
permanent storage. Audio and video recording can also operate
continuously with a fixed-size circular buffer, allowing the user
to always replay recent events without having to explicitly record
them.
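A fixed-size circular buffer of this kind can be sketched with a
bounded double-ended queue. The class name and the frame-count
capacity are assumptions made for the example.

```python
from collections import deque


class RecentCapture:
    """Keeps only the most recent `capacity` captured frames."""

    def __init__(self, capacity):
        # A deque with maxlen silently discards the oldest entry
        # when a new one is appended to a full buffer.
        self._frames = deque(maxlen=capacity)

    def record(self, frame):
        self._frames.append(frame)

    def replay(self):
        """Return the buffered frames, oldest first."""
        return list(self._frames)


capture = RecentCapture(capacity=3)
for frame_number in range(5):
    capture.record(frame_number)
recent = capture.replay()  # [2, 3, 4]
```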
[0325] The still or video camera 842 can be in line with the HMD's
viewing optics, allowing the user to capture essentially what they
see. The camera can also be stereoscopic. In a simpler
configuration, a single camera is mounted centrally and has an
imaging axis parallel to the viewing axes. In a more sophisticated
configuration, using appropriate beam-steering optics coupled with
the gaze tracking mechanism, the camera can follow the user's gaze.
The camera ideally provides automatic focus, but provides the user
with zoom control. Multiple cameras pointing in different
directions can also be deployed to provide panoramic or rear-facing
capture. Direct imaging of the cornea can also capture a wide-angle
view of the world from the user's point of view [49].
[0326] If the camera is placed in line with the viewing optics,
then the corresponding beam combiner can be an LCD shutter, which
can be closed during exposure to allow the optical path to be
dedicated to the camera during exposure. If the camera is a video
camera, then display and capture can be suitably multiplexed,
although with a concomitant loss of ambient light unless the
exposure time is short.
[0327] If the HMD incorporates a video camera, then the Netpage
sensor can be configured to use it. If the HMD incorporates a
corneal imaging video camera, then it can be utilized by the
gaze-tracking system as well as the Netpage sensor.
[0328] Audio and video control buttons, for settings as well as for
recording and playback, can be provided by the HMD virtually or
physically.
[0329] Binocular disparity between the images captured by a stereo
camera can be used by the HMD to detect foreground objects, such as
the user's hand or coffee cup, occluding the Netpage surface of
interest. It can use this to suppress rendering and/or projection
of the SVD where it is occluded. The HMD can also detect occlusions
by analysing the entire visible tagging of the Netpage surface of
interest.
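Foreground detection from binocular disparity can be sketched as
follows, assuming a rectified pinhole stereo pair in which depth
equals focal length times baseline divided by disparity. The
function names, the margin parameter and the treatment of
non-positive disparity as "no measurement" are assumptions of the
example.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth in metres for a rectified stereo pair."""
    return focal_px * baseline_m / disparity_px


def foreground_mask(disparities, surface_depth_m, focal_px, baseline_m,
                    margin_m=0.05):
    """Return 1 where a pixel is nearer than the tagged surface.

    disparities: list of rows of disparities in pixels. Pixels with
    non-positive disparity are treated as having no measurement.
    """
    mask = []
    for row in disparities:
        mask_row = []
        for d in row:
            nearer = (d > 0 and
                      depth_from_disparity(d, focal_px, baseline_m)
                      < surface_depth_m - margin_m)
            mask_row.append(1 if nearer else 0)
        mask.append(mask_row)
    return mask


# Surface at 0.6 m; a disparity of 100 px corresponds to about 0.3 m.
mask = foreground_mask([[100.0, 50.0, 0.0]], 0.6, 500.0, 0.06)
```

The resulting mask marks where rendering or projection of the SVD
could be suppressed.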
[0330] An icon representing a captured image or video clip can be
projected by the HMD into the user's field of view, and the user
can select and operate on it via its icon. For example, the user
can "paste" it onto a tagged physical surface, such as a page in a
Netpage notebook. The image or clip then becomes permanently
associated with that location on the surface, as recorded by the
Netpage server, and is always shown at that location when viewed by
an authorized user through the HMD. Arbitrary virtual objects, such
as electronic documents, programs, etc., can be attached to a
Netpage surface in a similar way.
[0331] The source of an image or video clip can also be a separate
camera device associated with the user, rather than a camera
integrated with the HMD.
[0332] The HMD's microphone 838 and earphones 800, 802 allow it to
conveniently support telephony functions, whether over a local
connection such as Bluetooth or IEEE 802.11, or via a longer-range
connection such as GSM or CDMA. Voice may be carried via dedicated
voice channels, and/or over IP (VoIP). Telephony control functions,
such as dialling, answer and hangup, may be provided by the HMD via
virtual or physical buttons, may be provided by a separate physical
device associated with the HMD or more loosely with the user, or
may be provided by a virtual interface tied to a physical surface
[7].
[0333] The HMD's earphones allow it to support music playback, as
described in [8]. Audio can be copied or streamed from a server, or
played back directly from a storage device in the HMD itself.
[0334] The HMD ideally incorporates a unique identifier which is
registered to a specific user. This controls what the wearer of the
HMD is authorized to see.
[0335] The HMD can incorporate a biometric sensor, as shown in FIG.
28, to allow the system to verify the identity of the wearer. For
example, the biometric sensor may be a fingerprint sensor 846
incorporated in a belt-mounted control unit, or it may be an iris
scanner 848 incorporated in either or both the displays 304, 306
(see FIG. 22), possibly integrated with the gaze tracker 382 (see
FIG. 16).
[0336] The HMD can include optics to correct for deficiencies in a
user's vision, such as myopia, hyperopia, astigmatism, and
presbyopia, as well as non-conventional refractive errors such as
aberrations, irregular astigmatism, and ocular layer
irregularities. The HMD can incorporate fixed prescription optics,
e.g. integrated into the beam-combining visor, or adaptive optics
to measure and correct deficiencies on a continuous basis
[18,56].
[0337] The HMD can incorporate an accelerometer so that the
acceleration vector due to gravity can be detected. This can be
used to project a three-dimensional image properly if desired. For
example, during remote conferencing it may be desirable to always
render talking heads the right way up, independently of the
orientation of the surfaces to which they are attached. As a
side-effect, such projections will lean if centripetal acceleration
is detected, such as when turning a corner in a car.
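As a rough sketch of using the detected gravity vector, the function
below computes the roll angle needed to render imagery the right way
up. The axis convention (x to the wearer's right, y up, z along the
viewing axis) and the sign of the result are assumptions of the
example; a real accelerometer's conventions may differ.

```python
import math


def roll_from_gravity(gravity):
    """Roll, in degrees, of the measured 'down' direction about the
    viewing axis, relative to the HMD's own down direction.

    gravity: (gx, gy, gz) acceleration vector in the assumed HMD
    frame. An upright HMD at rest measures roughly (0, -9.81, 0).
    """
    gx, gy, _ = gravity
    return math.degrees(math.atan2(gx, -gy))


upright = roll_from_gravity((0.0, -9.81, 0.0))   # 0 degrees
cornering = roll_from_gravity((3.0, -9.81, 0.0))  # non-zero lean
```

The second call illustrates the side-effect noted above: a lateral
acceleration is indistinguishable from a tilt, so the projection
leans.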
[0338] The HMD incorporates a battery, recharged by removal and
insertion into a battery charger, or by direct connection between
the charger and the HMD. The HMD may also conveniently derive
recharging power on a continuous basis from an item of clothing
which incorporates a flexible solar cell [53]. The item may also be
in the shape of a cap or hat worn on the head, and the HMD may be
integrated with the cap or hat.
[0339] Surface Coding
[0340] The scale of the HMD-oriented Netpage tag pattern disposed
on a particular medium is matched to the minimum viewing distance
expected for that medium. The tag pattern is designed to allow the
Netpage sensor in the HMD to acquire and decode an entire tag at
the minimum supported viewing distance. The pixel resolution of the
Netpage image sensor then determines the maximum supported viewing
distance for that medium. The greater the supported maximum viewing
distance, the smaller the tag pattern projected on the image
sensor, and the greater the image sensor resolution required to
guarantee adequate sampling of the tag pattern. Surface tilt also
increases the feature frequency of the imaged tag pattern, so the
maximum supported surface tilt must also be accommodated in the
selected image sensor resolution.
[0341] The basis for a suitable Netpage tag pattern is described in
[6]. The hexagonal tag pattern described in the reference requires
a sampling field of view with a diameter of 36 features. This
requires an image sensor with a resolution of at least 72 × 72
pixels, assuming minimal two-times sampling. By way of example,
assuming arbitrarily that the Netpage sensor in the HMD has an
angular field of view of 10 degrees, and assuming the minimum
supported viewing distance for a hand-held printed page is 30 cm,
an appropriate HMD-oriented Netpage tag pattern has a scale of
about 1.5 mm per feature (i.e. 30 cm × tan(5°)/(36/2)). Further
assuming the maximum supported viewing distance is 120 cm (i.e.
4 × 30 cm), the required image sensor resolution is
288 × 288 pixels (i.e. 4 × 72). Greater image sensor
resolution allows for a greater range of viewing distances. By
comparison, assuming the minimum supported viewing distance for a
large-screen "HDTV" Netpage is 2 m, an appropriate HMD-oriented
Netpage tag pattern has a scale of about 1 cm per feature (i.e.
2 m × tan(5°)/(36/2)), and the same image sensor supports a
maximum viewing distance of 8 m (i.e. 4 × 2 m). By way of
further comparison, assuming the minimum supported viewing distance
for a billboard Netpage mounted on the side of a building is 30 m,
an appropriate HMD-oriented Netpage tag pattern has a scale of
about 15 cm per feature (i.e. 30 m × tan(5°)/(36/2)), and the
same image sensor supports a maximum viewing distance of 120 m
(i.e. 4 × 30 m).
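The arithmetic in the examples above can be reproduced with the
short sketch below. The function names and default parameters are
illustrative; the 10 degree field of view is the arbitrary figure
assumed in the text.

```python
import math


def feature_scale(min_distance, fov_deg=10.0, features_across=36):
    """Tag feature size, in the units of `min_distance`, such that
    `features_across` features fill the field of view at the minimum
    viewing distance."""
    half_width = min_distance * math.tan(math.radians(fov_deg / 2.0))
    return half_width / (features_across / 2.0)


def sensor_resolution(min_distance, max_distance, features_across=36,
                      samples_per_feature=2):
    """Pixels across the sensor needed to keep two-times sampling of
    the tag pattern at the maximum viewing distance."""
    ratio = max_distance / min_distance
    return math.ceil(features_across * samples_per_feature * ratio)


page_mm = feature_scale(300.0)       # about 1.46 mm per feature
hdtv_m = feature_scale(2.0)          # about 0.0097 m per feature
billboard_m = feature_scale(30.0)    # about 0.146 m per feature
pixels = sensor_resolution(30, 120)  # 288
```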
[0342] Although it is useful for particular media types to utilise
a consistent tag pattern scale, it is also possible for individual
users to select a tag pattern scale suited to their particular
viewing preferences. This is particularly convenient when the
Netpages in question are printed on demand.
[0343] It is useful to encode the scale of a tag pattern in the
data encoded in the pattern, so that a decoding device such as the
Netpage HMD can determine the scale and hence the absolute viewing
distance without reference to associated information. However, if
it is not convenient to encode a scale factor in the tag data, then
the scale factor can be recorded by the corresponding Netpage
server, either per page instance or per page type. The HMD then
obtains the scale factor from the server once it has identified the
page. In general, the server records the scale factor as well as an
affine transform which relates the coordinate system of the tag
pattern to the coordinate system of the physical page.
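Applying such an affine transform can be sketched as follows. The
six-parameter tuple layout is an assumption of the example and does
not reflect how a Netpage server actually stores the transform.

```python
def tag_to_page(x, y, affine):
    """Map tag-pattern coordinates to physical page coordinates.

    affine: (a, b, c, d, tx, ty), representing
        page_x = a * x + b * y + tx
        page_y = c * x + d * y + ty
    """
    a, b, c, d, tx, ty = affine
    return (a * x + b * y + tx, c * x + d * y + ty)


# Uniform scale of 1.5 with an offset of (10, 20).
page_point = tag_to_page(2.0, 4.0, (1.5, 0.0, 0.0, 1.5, 10.0, 20.0))
```

Here a uniform scale factor appears as equal diagonal terms, while
the off-diagonal terms would carry any rotation or skew between the
two coordinate systems.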
[0344] As described earlier, if a Netpage surface also supports pen
interaction, then it may be coded with two sets of tags utilising
different infrared inks, one set of tags printed at a pen-oriented
scale, and the other set of tags printed at a HMD-oriented scale,
as discussed above. Alternatively the surface may be coded with
multi-resolution tags which can be imaged and decoded at multiple
scales. In another option, if the HMD tag sensor is capable of
acquiring and decoding pen-scale tags, then a single set of tags is
sufficient. A laser scanning Netpage sensor is capable of acquiring
pen-scale tags at normal viewing distances such as 30 cm to 120
cm.
[0345] Since the virtual imagery displayed by the HMD is
effectively added to the user's view of the real world, the
physical Netpage surface region onto which the imagery is virtually
projected is ideally printed black. It is impractical to
selectively change the opacity of the HMD visor, since the beam
associated with a single pixel may cover the entire exit pupil of
the VRD, depending on its depth.
[0346] Tags are ideally disposed on a surface invisibly, e.g. by
being printed using an infrared ink. However, visible tags may be
utilised where invisibility is impractical. Although printing is an
effective mechanism for disposing tags on a surface, tags may also
be manufactured on or into a surface, such as via embossing.
Although inkjet printing is an effective printing mechanism, other
printing mechanisms may also be usefully employed, such as laser
printing, dye sublimation, thermal transfer, lithography, offset,
gravure, etc.
[0347] Neither pen-oriented nor HMD-oriented Netpage tags are
limited in their application to surfaces traditionally associated
with publications, displays and computer interfaces. For example,
tags can also be applied to skin in the form of temporary or
permanent tattoos; they can be printed on or woven into textiles
and fabric; and in general they can be applied to any physical
surface where they have utility. HMD-oriented tags, because of
their intrinsically larger scale, are more easily applied to a wide
range of surfaces than pen-oriented tags.
[0348] Applications
[0349] FIG. 29 shows a mockup of a printed page 850 containing a
typical arrangement of text 858, graphics and images 842. The page
850 also includes two invisible tag patterns 854 and 856. One tag
pattern 854 is scaled for close-range imaging by a Netpage stylus
or pen or other device typically in contact with or in close
proximity to the page 850. The other tag pattern 856 is scaled for
longer-range imaging by a Netpage HMD. Either tag pattern may be
optional on any given page.
[0350] FIG. 30 shows the page 850 of FIG. 29 augmented with a
virtual embedded video clip 860 when viewed through the Netpage
HMD, i.e. the video clip 860 is a dedicated situated virtual
display (SVD) on the page. The video clip appears with playback
controls 862. A playback control button can be activated using a
Netpage stylus or pen 8 (see FIG. 31). Alternatively a control
button can be selected and activated via the HMD's clicker as
described earlier. The control buttons 862 can also be printed on
the page 850. Alternatively still, a generic Netpage remote control
may be utilised in conjunction with the Netpage HMD. The remote
control may provide generic media playback control buttons, such as
play, pause, stop, rewind, skip forwards, skip backwards, volume
control, etc. The Netpage system can interpret playback control
commands received from a Netpage remote control associated with a
user as pertaining to the user's currently selected media object
(e.g. video clip 860).
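The routing of remote control commands to a user's currently selected media object can be sketched as follows. This is a minimal illustration only: the names MediaObject, PlaybackRouter, select and handle are hypothetical and are not part of the Netpage system as described, and only three of the generic commands are modelled.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class MediaObject:
    """Hypothetical stand-in for a media object such as video clip 860."""
    name: str
    state: str = "stopped"

    def apply(self, command: str) -> None:
        # Map a generic playback command onto a new playback state.
        transitions = {"play": "playing", "pause": "paused", "stop": "stopped"}
        if command not in transitions:
            raise ValueError(f"unsupported command: {command}")
        self.state = transitions[command]


@dataclass
class PlaybackRouter:
    """Tracks each user's currently selected media object (assumed design)."""
    selected: Dict[str, MediaObject] = field(default_factory=dict)

    def select(self, user_id: str, media: MediaObject) -> None:
        self.selected[user_id] = media

    def handle(self, user_id: str, command: str) -> bool:
        # A command from a user's remote control applies to that user's selection.
        media = self.selected.get(user_id)
        if media is None:
            return False  # nothing selected, so the command is ignored
        media.apply(command)
        return True
```

Keying the selection by user, rather than by remote control, reflects the statement that the remote control is associated with a user and that commands pertain to that user's selection.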
[0351] The video clip 860 is just one example of the use of an SVD
to augment a document. In general, an arbitrary interactive
application with a graphical user interface can make use of an SVD
in the same manner.
[0352] FIG. 31 shows a four-function calculator application 864
embedded in a page 850, with the page augmented with a virtual
display 866 for the calculator. The input buttons 868 for the
calculator are printed on the page, but could also be displayed
virtually.
[0353] FIG. 32 shows a page 850 augmented with a display 870 for
confidential information only intended for the user. As described
earlier, apart from registration of the HMD as belonging to the
user, the HMD may verify user identity via a biometric measurement.
Alternatively, the user may be required to provide a password
before the HMD will display restricted information.
[0354] FIG. 33 shows the page 850 of FIG. 29 augmented with virtual
digital ink 9 drawn using a non-marking Netpage stylus or pen 8.
Virtual digital ink has the advantage that it can be virtually
styled, e.g. with stroke width, colour, texture, opacity,
calligraphic nib orientation, or artistic style such as airbrush,
charcoal, pencil, pen, etc. It also has the advantage that it is
only seen by authorized users via their HMDs (or via Netpage
browsers).
[0355] If all "pen" input is virtual, then multiple physical
instances of the same logical Netpage page instance can be printed
and used as a basis for remote collaboration or conferencing. Any
digital ink 9 drawn virtually by one authorized user
instantaneously appears "on" the other instances of the page 850
when viewed by other authorized users.
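One way to picture this sharing is to record virtual ink against the logical page instance rather than against any one physical copy. The sketch below assumes exactly that; the class InkStore and its methods are hypothetical names chosen for illustration, and authorization checks are omitted.

```python
from collections import defaultdict
from typing import Dict, List


class InkStore:
    """Hypothetical store that keys virtual ink by logical page instance."""

    def __init__(self) -> None:
        self.logical_of: Dict[str, str] = {}  # physical copy -> logical page instance
        self.strokes: Dict[str, List[str]] = defaultdict(list)

    def register(self, physical_id: str, logical_id: str) -> None:
        # Several physical copies may map to the same logical page instance.
        self.logical_of[physical_id] = logical_id

    def draw(self, physical_id: str, stroke: str) -> None:
        # Ink drawn on any copy is stored against the shared logical instance.
        self.strokes[self.logical_of[physical_id]].append(stroke)

    def visible(self, physical_id: str) -> List[str]:
        # Every copy of the same logical instance shows the same ink.
        return list(self.strokes[self.logical_of[physical_id]])
```

Under this assumption, a stroke drawn on one participant's copy is returned by visible() for every other copy registered to the same logical instance.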
[0356] Even on different logical instances of a page a subregion
can be mapped to a shared "whiteboard" for remote collaboration and
conferencing purposes.
[0357] Physical and virtual digital ink can also co-exist on the
same physical page.
[0358] Whether Netpage pen input actually marks the page or is only
displayed virtually, and whether pen input is created relative to
page content printed physically or displayed virtually, the pen
input is captured by the Netpage system as digital ink and is
interpreted in the context of the corresponding page description.
This can include interpreting it as an annotation, as streaming
input to an application, as form input to an application (e.g.
handwriting, a drawing, a signature, or a checkmark), or as control
input to an application (e.g. a form submission, a hyperlink
activation, or a button press) [3].
[0359] FIG. 34 shows another version of the page 850 of FIG. 29,
where even the static page content 858 and 852 is virtual and is
only seen via the Netpage HMD (or the Netpage browser). In this
case the entire page can be thought of as a dedicated SVD for the
static and dynamic content of the page. Only the tag pattern(s)
854, 856 exist on the physical page, and the virtual content is
associated with the page, possibly by "printing" onto the page by
passing it through a virtual "printer" device. The virtual Netpage
printer simply determines the page ID of each page which passes
through it and associates it with the next document page. The
association between page ID and page content is still recorded by
the Netpage server in the usual way.
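The behaviour of the virtual printer can be sketched as a simple pairing of page IDs with document pages. The function name virtual_print and the use of a dictionary to stand in for the association recorded by the Netpage server are assumptions made for illustration.

```python
from typing import Dict, Iterable, Iterator, List


def virtual_print(page_ids: Iterable[str], document_pages: List[str]) -> Dict[str, str]:
    """Associate the page ID of each blank with the next document page."""
    associations: Dict[str, str] = {}
    remaining: Iterator[str] = iter(document_pages)
    for page_id in page_ids:
        content = next(remaining, None)
        if content is None:
            break  # the document has no more pages to "print"
        associations[page_id] = content
    return associations
```

No mark is made on the physical page in this model; only the recorded association changes.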
[0360] Physical pages can be manufactured from durable plastic and
can be tagged during manufacture rather than being tagged on
demand. They can be re-used repeatedly. New content can be
"printed" onto a page by passing it through a virtual Netpage
printer. Content can be wiped from a page by passing it through a
virtual Netpage shredder. Content can also be erased using various
forms of Netpage erasers. For example, a Netpage stylus or pen
operating in one eraser mode may only be capable of erasing digital
ink, while operating in another eraser mode may also be capable of
erasing page content.
[0361] Fully virtualising page content has the added advantage that
pages can be viewed and read in ambient darkness.
[0362] Although not shown in the figures, regions which are
augmented with virtual content (such as video clips and the like)
are ideally printed in black. Since the output of the Netpage HMD
is added to the page, it is ideally added to black to create color
and white. It cannot be used to subtract color from white to create
black. In regions where black is impractical, such as when
annotating physical page content with virtual digital ink, the
brightness of the HMD output is sufficiently high to be clearly
visible even with a white page in the background.
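The additive constraint can be illustrated with a simplified numerical model. The sketch assumes that the light reaching the eye in each color channel is the sum of the light reflected by the page and the light emitted by the HMD, with 1.0 denoting the level reflected by a white page; the model and the function names are assumptions, not taken from the description.

```python
from typing import Tuple

RGB = Tuple[float, float, float]


def perceived(page_rgb: RGB, hmd_rgb: RGB) -> RGB:
    """Additive model: HMD output adds to the light reflected by the page."""
    return tuple(p + h for p, h in zip(page_rgb, hmd_rgb))


def can_display(page_rgb: RGB, target_rgb: RGB) -> bool:
    """A target is reachable only if no channel has to be darker than the page."""
    return all(t >= p for p, t in zip(page_rgb, target_rgb))


BLACK: RGB = (0.0, 0.0, 0.0)
WHITE: RGB = (1.0, 1.0, 1.0)

# Over black, any color up to and including white can be produced.
assert can_display(BLACK, WHITE)
# Over white, black cannot be produced, since light cannot be subtracted.
assert not can_display(WHITE, BLACK)
```

In the same model, virtual ink over a white page is visible only to the extent that the HMD output raises one or more channels above the white level, which is why high output brightness is needed there.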
[0363] If plastic blanks are used and all page content is virtual,
then the blanks are also ideally black, and matte to prevent
specular reflection of ambient light.
[0364] FIG. 35 shows a mobile phone device 872 incorporating an
SVD. Like the document page discussed above, the display surface
874 includes a tag pattern scaled for longer-range imaging by a
Netpage HMD 856. It also optionally includes a tag pattern 854
scaled for close-range imaging by a Netpage stylus or pen 8, for
"touch-screen" operation.
[0365] The extent of the SVD 876 need not be constrained by the
physical size of the device to which it is "attached". As shown in
FIG. 36, the display 876 can protrude laterally beyond the bounds
of the device 872.
[0366] The SVD 876 can also be used to virtualise the input
functions on the device 872, such as the keypad in this case, as
shown in FIG. 37.
[0367] Generally also, the SVD 876 can overlay the conventional
display 874 of the device 872, such as an LCD or OLED. The user may
then choose to use the built-in display 874 or the SVD 876
according to circumstance.
[0368] Although the examples show a mobile phone device 872, the
same approach applies to any portable device incorporating a
display and/or a control interface, including a personal digital
assistant (PDA), a music player, A/V remote control, calculator,
still or video camera, and so on.
[0369] Since, as discussed earlier, the physical surface 874 of an
SVD 876 is ideally matte black, it provides an ideal place to
incorporate a solar cell into the device 872 for generating power
from ambient light.
[0370] FIG. 38 shows an SVD 876 used as a cinema screen 878. Note
that the scale of the HMD-oriented tag pattern 856 is much larger
than in the cases described above, because of the much larger
average viewing distance.
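The dependence of tag scale on viewing distance can be made concrete under one plausible assumption: that a tag should subtend roughly the same visual angle at the HMD's tag sensor whatever the viewing distance. The description does not state this rule, and the function name and the figures in the example are hypothetical.

```python
def scaled_tag_size(reference_size: float, reference_distance: float,
                    viewing_distance: float) -> float:
    """Scale a tag so that it subtends the same visual angle at a new distance."""
    if reference_distance <= 0 or viewing_distance <= 0:
        raise ValueError("distances must be positive")
    # Angular size depends on size / distance, so size scales linearly with distance.
    return reference_size * viewing_distance / reference_distance


# Hypothetical figures: a tag sized for viewing at 0.5 m, re-scaled for 10 m.
cinema_tag = scaled_tag_size(2.0, 0.5, 10.0)
```

Under this assumption a twentyfold increase in viewing distance calls for a twentyfold increase in tag size, consistent with the much larger tag pattern described for the cinema screen.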
[0371] The movie is virtually projected from a video source 880,
either via direct streaming from a video transmitter 882 to the
Netpage HMDs of the members of the audience 884, or via a Netpage
server 812 and an arbitrary communications network 814.
[0372] Individual delivery of content to each audience member
during an otherwise "shared" viewing experience has the advantage
that it can allow individual customisation. For example, specific
edits can be delivered according to age, culture or other
preference; each individual can specify language, subtitle display,
audio settings such as volume, picture settings such as brightness,
contrast, color and format; and each individual may be provided
with personal playback controls such as pause, rewind/replay, skip
etc.
[0373] In a public performance scenario, a Netpage-encoded printed
ticket can act as a token which gives a HMD access to the movie. The
ticket can be presented in the field of view of the tag sensor in
the HMD, and the HMD can present the scanned ticket information to
the projection system to gain access.
[0374] FIG. 39 shows an SVD used as a video monitor 886, e.g. to
display pre-recorded or live video from any number of sources
including a television (TV) receiver 888, video cassette recorder
(VCR) 890, digital versatile disc (DVD) player 892, personal video
recorder (PVR) 894, cable video receiver/decoder 896, satellite
video receiver/decoder 898, Internet/Web interface 900, or personal
computer 902. Again note that the scale of the HMD-oriented tag
pattern 856 is larger than in the page and personal device cases
described above, but smaller than in the cinema case.
[0375] The video switch 906 directs the video signal from one of
the video sources (888-902), to the Netpage HMDs 300 of one or more
users. The video is delivered via direct streaming from a video
transmitter 882 or a Netpage server 812 and an arbitrary
communications network 814.
[0376] As in the case of cinema described above, video delivered
via an SVD has the advantage that it can be individually customised.
[0377] FIG. 40 shows an SVD used as a computer monitor 914. The
monitor surface includes a tag pattern scaled for imaging by a
Netpage HMD 856. It also optionally includes a tag pattern scaled
for close-range imaging 854 by a Netpage stylus or pen 8, for
"touch-screen" operation. Video output from the personal computer
902 or workstation is delivered either via direct streaming from a
video transmitter 882 to the Netpage HMDs 300 of one or more users,
or via a Netpage server 812 and an arbitrary communications network
814.
[0378] Another input device 908 is also optionally provided, tagged
with a stylus-oriented tag pattern 854. The input device can be
used to provide a tablet and/or a virtualised keyboard 910, as well
as other functions. Input from the stylus or pen 8 is transmitted
to a Netpage server 812 in the usual way, for interpretation and
possible forwarding. Although shown separately, the Netpage server
812 may be executing on the personal computer 902.
[0379] Multiple monitors 914 may be used in combination, in various
configurations.
[0380] Advertising in public spaces, if virtually displayed, can be
targeted according to the demographic of each individual viewer.
People may be rewarded for opting in and providing a demographic
profile. Virtually displayed advertising can be more finely
segmented, both time-wise, according to how much an advertiser is
willing to pay, and according to demographic. Targeting can also
occur according to time-of-day, day-of-week, season, weather,
external event etc.
[0381] If the advertising appears in (or is attached to) a movable
object such as a magazine, newspaper, train, bus or taxi poster, or
product packaging, then the advertising content can also be
targeted according to the instantaneous location of the viewer, as
indicated by a location device associated with the user, such as a
GPS receiver.
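Targeting of this kind can be sketched as a match between each advertisement's criteria and the viewer's current context. The sketch below is illustrative only: the class Ad, the function select_ad, the attribute names, and the rule of preferring the highest bid among matching advertisements are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Ad:
    name: str
    bid: float  # what the advertiser is willing to pay, in arbitrary units
    criteria: Dict[str, str] = field(default_factory=dict)  # all must match


def select_ad(ads: List[Ad], context: Dict[str, str]) -> Optional[Ad]:
    """Return the highest-bidding ad whose criteria all match the viewer context."""
    eligible = [
        ad for ad in ads
        if all(context.get(key) == value for key, value in ad.criteria.items())
    ]
    return max(eligible, key=lambda ad: ad.bid, default=None)


ads = [
    Ad("generic", 1.0),
    Ad("commuter-coffee", 3.0, {"time_of_day": "morning", "location": "station"}),
    Ad("umbrella", 2.0, {"weather": "rain"}),
]
chosen = select_ad(ads, {"time_of_day": "morning", "location": "station"})
```

The context dictionary would combine the viewer's demographic profile with conditions such as time of day, weather and the location reported by the user's location device.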
[0382] If the HMD incorporates gaze tracking, then gaze direction
information can be used to provide statistical information to
advertisers on which elements of their advertising are catching the
gaze of viewers, i.e. to support so-called "copy testing". More
directly, gaze direction can be used to animate an advertising
element when the user's gaze strikes it.
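Gathering such statistics can be sketched as counting the gaze samples that fall within each advertising element. The sketch assumes that gaze direction has already been converted to a point in the coordinate system of the advertisement and that each element is an axis-aligned rectangle; both are simplifications, and the names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Rect = Tuple[float, float, float, float]  # (left, top, right, bottom)


def gaze_hits(elements: Dict[str, Rect],
              gaze_points: Iterable[Tuple[float, float]]) -> Counter:
    """Count how many gaze samples fall inside each advertising element."""
    hits: Counter = Counter()
    for x, y in gaze_points:
        for name, (left, top, right, bottom) in elements.items():
            if left <= x <= right and top <= y <= bottom:
                hits[name] += 1
    return hits
```

The same containment test could be used to trigger animation of an element when the user's gaze first enters it.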
[0383] The Netpage HMD can be used to search a physical space, such
as a cluttered desktop, for a particular document. The user first
identifies the desired document to the Netpage system, perhaps by
browsing a virtual filing cabinet containing all of the user's
documents. The HMD is then primed to highlight the document if it
is detected in the user's field of view. The Netpage system informs
the HMD of the relation between the tags of the desired document
and the physical extent of the document, so that the HMD can
highlight the outline of the document when detected.
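The relation between a detected tag and the physical extent of the document can be sketched in the plane of the page. The sketch assumes that a detected tag yields a page ID and the tag's own position on the page, and that the page is a rectangle of known size; the function names are hypothetical, and the projection from the page plane into the user's view is omitted.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def outline_relative_to_tag(tag_xy: Point, page_size: Point) -> List[Point]:
    """Corners of the page, expressed relative to a tag located at tag_xy."""
    tx, ty = tag_xy
    width, height = page_size
    corners = [(0.0, 0.0), (width, 0.0), (width, height), (0.0, height)]
    return [(cx - tx, cy - ty) for cx, cy in corners]


def highlight_if_target(target_page_id: str, detected_page_id: str,
                        tag_xy: Point, page_size: Point) -> Optional[List[Point]]:
    """Return the outline to highlight, or None if the tag is on another page."""
    if detected_page_id != target_page_id:
        return None
    return outline_relative_to_tag(tag_xy, page_size)
```

Because the outline is expressed relative to the tag, a single detected tag is sufficient in this model to place the whole outline, even if part of the document is hidden.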
[0384] The user's virtual filing cabinet can be extended to
contain, either actually or by reference, every document or page
the user has ever seen, as detected by the Netpage HMD. More
specifically, in conjunction with gaze tracking, the system can
mark the regions the user has actually looked at. Furthermore, by
detecting the distinctive saccades associated with reading, the
system can mark, with reasonable certainty, text passages actually
read by the user. This can subsequently be used to narrow the
context of a content search.
[0385] One of the advantages of the Netpage HMD is that it allows
the user to consume and interact with information privately, even
when in a public place. However, because each pixel is projected in
succession, a snooper can build a simple detection device to
collect each pixel in turn from any stray light emitted by the HMD,
and re-synchronise it after the fact to regenerate a sequence of
images. To combat this, the HMD can emit random stray light at the
pixel rate, to swamp any meaningful stray light from the display
itself.
[0386] A non-planar three-dimensional object, if unadorned but
tagged on some or all of its faces, may act as a proxy for a
corresponding adorned object. For example, a prototyping machine
may be used to fabricate a scale model of a concept car. Disposing
tags on the surface of the prototype then allows color, texture and
fine geometric detail to be virtually projected onto the surface of
the car when viewed through a Netpage HMD.
[0387] More simply, a pre-manufactured and pre-tagged shape such as
a sphere, ellipsoid, cube or parallelopiped of a certain size can
be used as a proxy for a more complicated shape. Virtual projection
onto its surface can be used to imbue it with apparent geometry, as
well as with color, texture and fine geometric detail.
REFERENCES
[0388] The following references are incorporated herein by
cross-reference.
[0389] [1] Lapstun, P. and K. Silverbrook, "Method and System for
Printing a Document", U.S. Pat. No. 6,728,000, issued 27 Apr. 2004
[0390] [2] Silverbrook, K. and P. Lapstun, "Digital Image Warping
System", U.S. Pat. No. 6,636,216, issued 21 Oct. 2003
[0391] [3] see Appendix A
[0392] [4] Silverbrook Research, "Sensing device for coded data",
U.S. patent application Ser. No. 10/815,636 (Docket Number HYJ001),
filed 2 Apr. 2004, claiming priority from [9,11,12]
[0393] [5] Silverbrook Research, "Laser scanner device for printed
product identification codes", U.S. patent application Ser. No.
10/815,609 (Docket Number HYT001), filed 2 Apr. 2004, claiming
priority from [11,12]
[0394] [6] Silverbrook Research, "Rotationally symmetric tags",
U.S. patent application Ser. No. 10/309,358, filed 4 Dec. 2002
[0395] [7] Silverbrook Research, "Method and system for telephone
control", U.S. patent application Ser. No. 09/721,895, filed 25
Nov. 2000
[0396] [8] Silverbrook Research, "Viewer with code sensor", U.S.
patent application Ser. No. 09/722,175, filed 25 Nov. 2000
[0397] [9] Silverbrook Research, "Image sensor with digital
framestore", U.S. patent application Ser. No. 10/778,056 (Docket
Number NPS047), filed 17 Feb. 2004, claiming priority from [10]
[0398] [10] Silverbrook Research, "Methods, systems and apparatus",
Australian Provisional Patent Application 2003900746 (Docket Number
NPS041), filed 17 Feb. 2003
[0399] [11] Silverbrook Research, "Methods and systems for object
identification and interaction", Australian Provisional Patent
Application 2003901617 (Docket Number NIR002), filed 7 Apr. 2003
[0400] [12] Silverbrook Research, "Methods and systems for object
identification and interaction", Australian Provisional Patent
Application 2003901795 (Docket Number NIR005), filed 15 Apr. 2003
[0401] [13] Akenine-Möller, T., and E. Haines, Real-Time Rendering,
Second Edition, A K Peters 2002
[0402] [14] Amir, A., M. D. Flickner, D. B. Koons and C. H.
Morimoto, "System and Method for Eye Gaze Tracking Using Corneal
Image Mapping", U.S. Pat. No. 6,659,611, issued 9 Dec. 2003
[0403] [15] Behringer, R., G. Klinker, and D. W. Mizell, eds.,
Augmented Reality: Placing Artificial Objects in Real Scenes:
Proceedings of IWAR '98, A K Peters 1999
[0404] [16] Berge, B., and J. Peseux, "Lens with variable focus",
U.S. Pat. No. 6,369,954, issued 9 Apr. 2002
[0405] [17] Bloebaum, F., "Method and Apparatus for Determining the
Light Transit Time Over a Measurement Path Arranged Between a
Measuring Apparatus and a Reflecting Object", U.S. Pat. No.
5,805,468, issued 9 Sep. 1998
[0406] [18] Blum, R. D., D. P. Dustin, and D. Katzman, "Method for
refracting and dispensing electro-active spectacles", U.S. Pat. No.
6,733,130, issued 11 May 2004
[0407] [19] Cameron, C. D., D. A. Pain, M. Stanley, and C. W.
Slinger, "Computational challenges of emerging novel true 3D
holographic displays", Critical Technologies for the Future of
Computing, Proceedings of SPIE Vol. 4109, 2000, pp. 129-140
[0408] [20] Cleveland, D., J. H. Cleveland and P. L. Norloff, "Eye
Tracking Method and Apparatus", U.S. Pat. No. 5,231,674, issued 27
Jul. 1993
[0409] [21] Demos, G. E., "System and Method for Motion
Compensation and Frame Rate Conversion", U.S. Pat. No. 6,442,203,
issued 27 Aug. 2002
[0410] [22] Dignam, D. L., "Circuit and method for trilinear
filtering using texels from only one level of detail", U.S. Pat.
No. 6,452,603, issued 17 Sep. 2002
[0411] [23] Duchowski, A. T., Eye Tracking Methodology, Theory and
Practice, Springer-Verlag 2003
[0412] [24] Favalora, G. E., J. Napoli, D. M. Hall, R. K. Dorval,
M. G. Giovinco, M. J. Richmond, and W. S. Chun, "100 Million-voxel
volumetric display", Cockpit Displays IX: Displays for Defense
Applications, Proceedings of SPIE Vol. 4712, 2002, pp. 300-312
[0413] [25] Feenstra, B. J., S. Kuiper, S. Stallinga, B. H. W.
Hendriks, and R. M. Snoeren, "Variable focus lens", PCT Patent
Application WO 03/069380, filed 24 Jan. 2003
[0414] [26] Fulton, J. T., Processes in Biological Vision,
http://www.4colorvision.com
[0415] [27] Furness III, T. A., and J. S. Kollin, "Retinal Display
Scanning of Image with Plurality of Image Sectors", U.S. Pat. No.
6,639,570, issued 28 Oct. 2003
[0416] [28] Furness III, T. A., and J. S. Kollin, "Virtual Retinal
Display", U.S. Pat. No. 5,467,104, issued 14 Nov. 1995
[0417] [29] Gerhard, G. J., C. T. Tegreene, and B. Z. Eslam,
"Scanned Display with Pinch, Timing, and Distortion Correction", 5
Aug. 1998
[0418] [30] Gortler, S. J., R. Grzeszczuk, R. Szeliski, and M. F.
Cohen, "The Lumigraph", ACM Computer Graphics Proceedings, Annual
Conference Series, 1996, pp. 43-54
[0419] [31] Heckbert, P. S., "Survey of Texture Mapping", IEEE
Computer Graphics & Applications 6(11), pp. 56-67, November 1986
[0420] [32] Hornbeck, L. J., "Active yoke hidden hinge digital
micromirror device", U.S. Pat. No. 5,535,047, issued 9 Jul. 1996
[0421] [33] Humphreys, G. W., and V. Bruce, Visual Cognition,
Lawrence Erlbaum Associates, 1989, p. 15
[0422] [34] Hutchinson, T. E., C. Lankford and P. Shannon, "Eye
Gaze Direction Tracker", U.S. Pat. No. 6,152,563, issued 28 Nov.
2000
[0423] [35] Isaksen, A., L. McMillan, and S. J. Gortler,
"Dynamically Reparameterized Light Fields", ACM Computer Graphics
Proceedings, Annual Conference Series, 2000, pp. 297-306
[0424] [36] Levoy, M. and P. Hanrahan, "Light Field Rendering", ACM
Computer Graphics Proceedings, Annual Conference Series, 1996, pp.
31-42
[0425] [37] Lewis, J. R., H. Urey and B. G. Murray, "Scanned
Imaging Apparatus with Switched Feeds", U.S. Pat. No. 6,714,331,
issued 30 Mar. 2004
[0426] [38] Lewis, J. R., and N. Nestorovic, "Personal Display with
Vision Tracking", U.S. Pat. No. 6,396,461, issued 28 May 2002
[0427] [39] Maturi, G. V., V. Bhargava, S. L. Chen, and R.-Y. Wang,
"Hybrid Hierarchial/Full-search MPEG Encoder Motion Estimation",
U.S. Pat. No. 5,731,850, issued 24 Mar. 1998
[0428] [40] Matusik, W., and H. Pfister, "3D TV: A Scalable System
for Real-Time Acquisition, Transmission, and Autostereoscopic
Display of Dynamic Scenes", ACM Computer Graphics Proceedings,
Annual Conference Series, 2004
[0429] [41] McGrath, D. S., "Methods and Apparatus for Processing
Spatialised Audio", U.S. Pat. No. 6,021,206, issued 1 Feb. 2000
[0430] [42] McMillan, L. and G. Bishop, "Plenoptic Modeling: An
Image-Based Rendering System", ACM SIGGRAPH 95, pp. 39-46
[0431] [43] McQuaide, S. C., E. J. Seibel, R. Burstein and T. A.
Furness III, "50.4: Three-dimensional virtual retinal display
system using a deformable membrane mirror", SID 02 DIGEST
[0432] [44] Meisner, J., W. P. Donnelly, and R. Roosen, "Augmented
Reality Technology", U.S. Pat. No. 6,625,299, issued 23 Sep. 2003
[0433] [45] Melzer, J. E., and K. Moffitt, Head Mounted Displays:
Designing for the User, McGraw-Hill 1997
[0434] [46] Miller, G., "Volumetric Hyper-Reality, A Computer
Graphics Holy Grail for the 21st Century?", Graphics Interface '95,
pp. 56-64
[0435] [47] Naumov, A. F., and M. Yu. Loktev, "Liquid-crystal
adaptive lenses with modal control", OPTICS LETTERS, Vol. 23, No.
13, Jul. 1, 1998, pp. 992-994
[0436] [48] Nayar, S. K., V. Branzoi, and T. E. Boult,
"Programmable Imaging using a Digital Micromirror Array",
Proceedings of the IEEE Computer Society Conference on Computer
Vision and Pattern Recognition, July 2004, pp. 436-443
[0437] [49] Nishino, K., and S. K. Nayar, "The World in an Eye",
Proceedings of IEEE Conference on Computer Vision and Pattern
Recognition, Washington DC, June 2004
[0438] [50] Perlin, K., S. Paxia, and J. S. Kollin, "An
Autostereoscopic Display", ACM Computer Graphics Proceedings,
Annual Conference Series, 2000, pp. 319-326
[0439] [51] Silverman, N. L., B. T. Schowengerdt, J. P. Kelly, and
E. J. Seibel, "58.5L: Late-News Paper: Engineering a Retinal
Scanning Laser Display with Integrated Accommodative Depth Cues",
SID 03 DIGEST, pp. 1538-1541
[0441] [52] St.-Hilaire, P., M. Lucente, J. D. Sutter, R. Pappu, C.
D. Sparrell, and S. A. Benton, "Scaling up the MIT holographic
video system", Fifth International Symposium on Display Holography,
Proceedings of SPIE Vol. 2333, 1992, pp. 374-380
[0442] [53] Sverdrup, L. H. Jr., N. F. Dessel, and A. Pelkus, "Thin
film flexible solar cell", U.S. Pat. No. 6,548,751, issued 15 Apr.
2003
[0443] [54] Urey, H., D. W. Wine, and T. D. Osborn, "Optical
performance requirements for MEMS-scanner based microdisplays",
Conference on MOEMS and Miniaturized Systems, SPIE Vol. 4178, pp.
176-185, Santa Clara, Calif. (2000)
[0444] [55] Urey, H., "Apparatus and Methods for Generating
Multiple Exit-Pupil Images in an Expanded Exit Pupil", US Patent
Application 2003/0086173, published 8 May 2003
[0445] [56] Williams, D. R., and J. Liang, "Method and apparatus
for improving vision and the resolution of retinal images", U.S.
Pat. No. 5,949,521, issued 7 Sep. 1999
[0446] [57] Williams, L., "Pyramidal Parametrics", Computer
Graphics (Proc. SIGGRAPH 1983) 17(3), July 1983, pp. 1-11
[0447] [58] Wolberg, G., Digital Image Warping, IEEE Computer
Society Press, 1988
[0448] [59] Wolf, P. R., and B. A. Dewitt, Elements of
Photogrammetry, 3rd Edition, McGraw-Hill 2000
[0449] [60] Wolpaw, J. R., and D. J. McFarland, "Communication
method and system using brain waves for multidimensional control",
U.S. Pat. No. 5,638,826, issued 17 Jun. 1997
* * * * *
References