Schubin’s Greatest Hits

Where Did We Come From?

Originally published in Videography February 2002

There are 36 video formats in the digital-television standard.  It may be comforting to know they were based on happenstance, spirituality, accident, luck, and error.

Some people draw guidance from the apparent positions of the stars and planets when they were born. Others study patterns in tea leaves or on palms. Fortunately, the foundation of the technology of videography comes from rigorous scientific research — or does it?

One of the hottest new electronic-imaging technologies is called 24p. The p stands for progressive scanning, in which each scanning line follows the previous one in numerical order, as opposed to ordinary television’s interlace (i), in which all of the odd-numbered scanning lines are captured first, followed by all the even-numbered.

The 24 stands for 24 frames per second, the rate at which almost all movies have been projected since the advent of the sound track. Before that, movies were shot at 16-18 frames per second (fps).

When we watch old silent movies today, we notice that everyone seems to move too fast. That’s because the movies were shot at a frame rate slower than the one at which we watch them.

Everyone knows that in the old days movies were projected at the slower frame rate, and the motion looked normal. Everyone is wrong.

In theory, the 16-18 fps rate of silent movies was too low for a high-fidelity sound track, so the rate was increased to 24 fps. In fact, movies were projected faster than 24 fps even in the silent era. But they were shot at 16-20 fps. Moviegoers in the old days probably saw about the same faster-than-normal motion as we see when we watch the same movies today.

What was of critical importance for the sound track was not that the film move at 90 feet per minute (24 fps in 35-mm film). That’s 18 inches per second — faster than most professional audio tape recorders. What was critical was that the speed remain constant. That demanded a motor (instead of the hand cranks that had been used previously) and a standard speed.

No critical analysis of audio frequency response or wow and flutter was performed. No careful psychophysical test data were analyzed. Stanley Watkins, of the Western Electric company, just surveyed some theaters. The larger ones were projecting films at between 20 and 24 fps; the smaller ones used even higher speeds (faster projection meant more shows per day, which meant more tickets sold). Watkins picked 24 — 90 feet of 35-mm film per minute — as a convenient figure; that’s all.
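The link between 24 fps and 90 feet per minute is plain arithmetic. The sketch below assumes standard 35-mm film with four perforations per frame and a 3/16-inch perforation pitch (both figures come up later in this article); the function and variable names are only illustrative.

```python
from fractions import Fraction

PERF_PITCH_IN = Fraction(3, 16)   # inches between sprocket holes on 35-mm film
PERFS_PER_FRAME = 4               # assumed: the usual four-perforation frame


def film_speed(fps):
    """Return (inches per second, feet per minute) for a given frame rate."""
    frame_height_in = PERF_PITCH_IN * PERFS_PER_FRAME   # 3/4 inch of film per frame
    inches_per_second = frame_height_in * fps
    feet_per_minute = inches_per_second * 60 / 12
    return inches_per_second, feet_per_minute


for fps in (16, 18, 20, 24):
    ips, fpm = film_speed(fps)
    print(f"{fps} fps -> {float(ips):.1f} in/s, {float(fpm):.1f} ft/min")
```

At 24 fps the function returns 18 inches per second and 90 feet per minute, matching the figures above.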

As sound was being introduced to movies, television was being developed. The first motor to move film at 90 feet per minute was introduced in 1925, the same year that John Logie Baird shot the first recognizable human face on video.

So, it was logical for engineers developing television to try to match its frame rate to the new standard for moving images. After all, even then it was recognized that movies would eventually be shown on television. By 1929, a group at the Radio Corporation of America (RCA) had, in fact, selected 24 fps as the frame rate for U.S. television. Unfortunately, there was a problem. Television sets had to be plugged in.

The power supply of the television set would generate an electromagnetic field alternating with the power-line frequency. If the television scanning rate didn’t also match the power-line frequency, there could be significant picture interference unless there was shielding (which would make TV sets more expensive).

When alternating-current (AC) electrical power was first being developed, frequencies ranging from 8-1/3 to 133-1/3 cycles per second (Hz) were proposed. There were pros and cons to each. Westinghouse, which had bought the patent rights to Nikola Tesla’s induction motor, had been experimenting with the highest frequency of that range, but Tesla was adamant about it being 60 Hz. Westinghouse gave in.

Tesla was a genius, but he was also strange. He had come to the conclusion that 60 Hz was the correct frequency of the spiritual Om of the universe. So, in order to keep TV sets inexpensive, RCA picked 60 images per second and came up with a means of converting 24-fps film to 60-image television (what we now call 3-2 pulldown).

When the first National Television System Committee (NTSC) met in 1940 to come up with the U.S. television standard, they did a great deal of research into such issues as the rate at which individual frames are perceived to be moving images (the fusion frequency) and the rate at which the flicker of changing images disappears (the flicker frequency). When a subcommittee voted on image rate, it was 15 to one in favor of 60 (with five absent).

Sixty frames per second would have cut down tremendously on the fineness of the detail that could be transmitted (resolution). So the committee agreed on the interlaced pictures described earlier. There would be 60 fields transmitted per second, each field consisting of either all of the odd-numbered scanning lines or all of the even, but not both. Two fields would comprise a frame, and 30 fps would be transmitted, not 60. Multi-blade shutters in film projectors typically allow each frame to appear on screen twice, similarly raising film’s image rate from 24 to 48 to reduce flicker visibility.
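The 3-2 pulldown mentioned earlier can be sketched in a few lines. This is only an illustration of the cadence: film frames are held alternately for three fields and two fields, so 24 frames fill 60 fields. The function name and the odd/even labeling of fields are assumptions made for the example, not details taken from RCA's design.

```python
def pulldown_3_2(film_frames):
    """Map film frames to video fields, holding frames for 3, 2, 3, 2... fields."""
    fields = []
    for index, frame in enumerate(film_frames):
        repeats = 3 if index % 2 == 0 else 2
        for _ in range(repeats):
            # Fields simply alternate odd, even, odd, even in transmission order.
            parity = "odd" if len(fields) % 2 == 0 else "even"
            fields.append((frame, parity))
    return fields


one_second = pulldown_3_2(range(24))
print(len(one_second))                          # 60 fields from 24 film frames
print([frame for frame, _ in pulldown_3_2("ABCD")])
```

Four film frames become ten fields, which is why every second of film (24 frames) lands exactly on 60 fields, or 30 interlaced frames.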

RCA is where 3-2 pulldown came from, and RCA is possibly where 60-image-per-second video came from. So it wouldn’t be surprising if RCA were also where interlaced television pictures came from. But it’s not.

Samuel Lavington Hart received British patent 15,720 on June 25, 1915. He’d applied for it exactly one year earlier. It’s called “Transmitting pictures of moving objects and the like to a distance electronically,” and it contains a description of interlaced video. RCA didn’t exist until 1919.

That takes care of 24-, 30-, and 60-fps. The famous (or infamous) Table 3 of Annex A of A/53, the digital television standard of the Advanced Television Systems Committee (ATSC), also lists three other frame rates, 23.976, 29.97, and 59.94. These so-called non-integer frame rates are based on multiplying the others by a factor of 1000/1001.
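The 1000/1001 factor is easiest to see with exact fractions. The snippet below is a small sketch (the names are illustrative) showing that the familiar decimal figures are rounded versions of 24000/1001, 30000/1001, and 60000/1001.

```python
from fractions import Fraction

NTSC_FACTOR = Fraction(1000, 1001)


def non_integer_rate(nominal_fps):
    """Exact frame rate after applying the 1000/1001 factor."""
    return nominal_fps * NTSC_FACTOR


for nominal in (24, 30, 60):
    exact = non_integer_rate(nominal)
    print(f"{nominal} -> {exact} = {float(exact):.3f}")
```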

The non-integer frame rates were born on June 17, 1953. That’s when a single piece of paper, page 10 of NTSC-P12-241, written by I. C. Abrahams of General Electric, was inserted into the second NTSC’s color standard. It called for a frame rate of 29.97 instead of 30. Why?

According to another Abrahams NTSC document, “In some monochrome receivers now [1952] in use, there may be insufficient attenuation of the sound carrier to prevent an objectionable 0.9 megacycle signal resulting from the beat between the sound carrier and the chrominance carrier” (emphasis added). In other words, the addition of color had the potential to make a few existing TV sets look bad. Introducing the 1000/1001 factor reduced that problem.
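The 0.9-megacycle beat can be reproduced with a little arithmetic. The figures used below are the commonly published NTSC color values and do not appear in this article, so treat them as assumptions of the sketch: the sound carrier sits 4.5 MHz above the picture carrier, the line rate is that 4.5 MHz divided by 286, and the chrominance subcarrier is 455/2 times the line rate.

```python
from fractions import Fraction

# Assumed, commonly published NTSC color figures (not taken from the article).
SOUND_OFFSET_HZ = Fraction(4_500_000)             # sound carrier above picture carrier
LINE_RATE_HZ = SOUND_OFFSET_HZ / 286              # color-era horizontal line rate
SUBCARRIER_HZ = LINE_RATE_HZ * Fraction(455, 2)   # chrominance subcarrier
FRAME_RATE_HZ = LINE_RATE_HZ / 525                # 525 lines per frame

beat_hz = SOUND_OFFSET_HZ - SUBCARRIER_HZ

print(f"line rate  : {float(LINE_RATE_HZ):.3f} Hz")
print(f"frame rate : {float(FRAME_RATE_HZ):.5f} fps")
print(f"beat       : {float(beat_hz) / 1e6:.3f} MHz")
print(f"beat / line rate = {beat_hz / LINE_RATE_HZ}")
```

Under these assumptions the frame rate works out to exactly 30000/1001, and the beat is an odd multiple of half the line rate, which is the usual explanation of why it becomes less visible on screen.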

On the other hand, the change of frame rate had the potential to make many more TVs look bad because of interference between the 60 Hz power line and the 59.94 image frequency (the familiar rolling hum bars — horizontal bands of varying brightness — seen when there’s a grounding problem). TV sets suddenly required the shielding that the 30-fps rate was supposed to eliminate in the first place. And videography became saddled with drop-frame time code, awkward audio conversion problems, and other delights of non-integer frame rates.
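Drop-frame time code is a good example of the bookkeeping those rates require. Counting 30 labels per second at 29.97 fps runs about 108 frames slow per hour, so the usual scheme skips two frame numbers at the start of every minute except each tenth minute. The function below is a minimal sketch of that widely used labeling rule, not an excerpt from any standard; the name and output format are illustrative.

```python
def drop_frame_timecode(frame_number):
    """Label a 29.97-fps frame count as drop-frame time code (HH:MM:SS;FF)."""
    frames_per_minute = 30 * 60 - 2             # 1798 frames in a minute that skips labels
    frames_per_10_minutes = 30 * 600 - 2 * 9    # 17982: nine minutes in ten skip labels

    tens, rest = divmod(frame_number, frames_per_10_minutes)
    skipped = 18 * tens                          # labels skipped in completed 10-minute blocks
    if rest >= 2:
        skipped += 2 * ((rest - 2) // frames_per_minute)

    n = frame_number + skipped                   # position in plain 30-per-second counting
    ff = n % 30
    ss = n // 30 % 60
    mm = n // 1800 % 60
    hh = n // 108000
    return f"{hh:02d}:{mm:02d}:{ss:02d};{ff:02d}"


for count in (1799, 1800, 17982, 107892):
    print(count, drop_frame_timecode(count))
```

No frames of picture are discarded; only the labels ;00 and ;01 are skipped, so that the clock reading stays close to real elapsed time.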

Table 3 isn’t restricted to frame rates. It also lists image aspect ratios (width-to-height) and resolutions. There are six frame rates (not even counting interlace and progressive differences) but only two aspect ratios. One is 4:3 (1.33…:1).

As legend would have it, George Eastman (who created Kodak) and Thomas Edison were having a talk one day about Edison’s proposed motion-picture system. Eastman asked Edison how wide the film should be, and Edison held up his thumb and forefinger and said, “About this wide.” Eastman whipped out a ruler and measured the gap, and 35-mm film was born.

It’s a lovely story. Too bad it’s not true.

Eastman was already manufacturing roll film 2.75 inches wide. Edison’s assistant, William Dickson, asked for it to be slit in two lengthwise. That came to a 1.375-inch film width — just about 35 mm.

The film was perforated on both sides, leaving about an inch in between. If the film were to be run horizontally through the camera (as some films are today), the height of each image could be one inch; if vertically, the width would be one inch. Edison and Dickson eventually chose vertical film motion.

What about the height of the image? In Edison’s 1891 patent application, he mentioned “the taking of pictures one inch in diameter….” In a 1933 reminiscence, however, Dickson wrote that they had already settled on images one inch wide by three-quarters of an inch high by 1889.

Why a 4:3 aspect ratio? It may have been purely accidental. Some Edison Kinetoscope motion pictures were actually vertically oriented, with an aspect ratio less than 1:1.

Dickson’s reminiscence is that he worked with image sizes in quarter-inch increments. But Edison’s early filings with the Patent Office specified images of 1/32 of an inch and 1/8 of an inch, the pitch between sprocket holes was (and is) 3/16 of an inch, and the film width was 1-3/8 inches.

One theory is that Dickson was trying to get as close as possible to the “Golden Rectangle” of about 1.618:1, a shape supposedly used in art throughout history (unless one really performs measurements). But he could have come closer (and saved film) by having three perforations (on each side) per frame instead of four. Another theory is that he’d copied earlier motion-picture devices (the 1868 patent for the animated flip-book shows 4:3 drawings).

The records of the first NTSC note that the 4:3 aspect ratio was known to the ancient Greeks, which is probable. The right angles of the Egyptian pyramids were kept in shape with two sides and the diagonal of a 4:3 rectangle, something simple to create with a rope divided into 12 equal segments (forming the common 3-4-5 right triangle). Maybe Dickson did something similar.

Only the horizontal nature of the 4:3 aspect ratio is specifically related to moving images. Due to gravity, people tend to move from side to side instead of up and down. To show that movement, motion picture aspect ratios favor the horizontal.

The origin of 4:3 is probably less significant than its success as a global shape. For that, the Lumiere brothers in France may have been more significant than Dickson and Edison. The Lumieres changed from an initial 5:4 aspect ratio to 4:3 to ensure worldwide compatibility.

The first NTSC adopted film’s 4:3 aspect ratio. Unfortunately, by that point, film had switched to 11:8 (1.375:1), a fact known to (but ignored by) the NTSC. Dickson’s 1.00 x 0.75 inch frame became 0.905 x 0.6795 in the first standards of the Society of Motion Picture Engineers (SMPE) in 1917. When sound tracks started impinging on the frame, however, the aspect ratio shrank to 6:5 or even less. To preserve a wider aspect ratio, the frame shrank even more. The 1932 Academy camera aperture was just 0.868 x 0.631. And the difference between 11:8 and 4:3?

It was subsumed within the slop allowed for the way different TV sets are set up. SMPE added television to become SMPTE, and the SMPTE safe-action area (an image area likely to appear on most TV sets) was just 0.713 x 0.535. The aspect-ratio difference is insignificant in comparison.

By 1913, however, movie theater exhibitors were already being urged to mask off the tops and bottoms of frames to create “a better shaped picture — more artistic,” (according to The Kinematograph and Lantern Weekly). Throughout film history, there were attempts to create wider aspect ratios.

When television appeared to be poised to hurt movie box-office revenues in the early 1950s, the film industry tried all sorts of things to lure viewers back into theaters for a better experience than they could get at home. Stereoscopic 3-D, seats wired for “shocks,” and wafted aromas were all introduced. But the only enhancement that took off was widescreen.

Unfortunately, a significant portion of film revenues came from television. RCA had come up with a way of showing 24-fps film on 30-fps video. But, when RCA’s NBC television network broadcast the 1953 CinemaScope movie How to Marry a Millionaire in 1961, it was apparent that no one knew a good way to squeeze a 2.55:1 movie aspect ratio into a 4:3 television screen.

CinemaScope eventually shrank to 2.35:1 to accommodate stereo sound tracks. If you wanted to design an electronic-cinematography frame shape that would have the minimum wasted area between 2.35:1 and 4:3, the result would be approximately 16:9 (1.78:1), but only for those particular values. It’s a mathematically derived aspect ratio based on happenstance.
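One way to read "minimum wasted area" is to look for the screen shape that leaves the same fraction of the screen unused whether it shows a 2.35:1 picture or a 4:3 picture. Under that assumption (which is an interpretation for this sketch, not a quotation of the original derivation), the answer is the geometric mean of the two aspect ratios.

```python
import math


def compromise_aspect(wide, narrow):
    """Screen aspect ratio that wastes an equal fraction for both extremes."""
    return math.sqrt(wide * narrow)


def used_fraction(picture, screen):
    """Fraction of the screen filled when the picture is fitted inside it."""
    return min(picture, screen) / max(picture, screen)


best = compromise_aspect(2.35, 4 / 3)
print(f"compromise aspect ratio: {best:.3f} (16/9 = {16 / 9:.3f})")
print(f"screen used by 2.35:1  : {used_fraction(2.35, best):.3f}")
print(f"screen used by 4:3     : {used_fraction(4 / 3, best):.3f}")
```

Change either input (say, back to CinemaScope's original 2.55:1) and the compromise shape moves away from 16:9, which is the point about happenstance.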

Of course, there is a perfectly good way to make even a 2.55:1 aspect ratio fit in 4:3. It just needs to be shrunk so that the sides of the wider film frame touch the sides of the 4:3 video frame.

For the case of a 2.55:1 picture shrunk to fit a TV set, the image would be only about half as high as the TV screen. That would be fine, unfortunately, only if the screen were big enough and detailed enough to show what viewers wanted to see.
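The "about half as high" figure follows directly from the two aspect ratios. The sketch below assumes the film picture is at least as wide as the screen shape; the function name is illustrative.

```python
def letterbox_height_fraction(film_aspect, screen_aspect=4 / 3):
    """Fraction of screen height used when the film's width fills the screen."""
    if film_aspect < screen_aspect:
        raise ValueError("picture is narrower than the screen; it would be pillarboxed")
    return screen_aspect / film_aspect


for aspect in (1.375, 1.85, 2.35, 2.55):
    fraction = letterbox_height_fraction(aspect)
    print(f"{aspect}:1 fills {fraction:.1%} of the screen height")
```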

Resolution is the third category of Table 3 figures. In the vertical direction, there are 480, 720, and 1080; in the horizontal, it’s 640, 704, 1280, and 1920.

Unlike frame rate and aspect ratio, resolution does not have origins in film. Film has no fixed resolution. For any given emulsion, a bigger film frame offers more detail than a smaller one. But a 35-mm frame in 2002 offers a lot more detail than did a 35-mm frame in 1902.

The first scanning systems introduced the concept of image definition, and they were introduced for fax transmission, not television. John Logie Baird’s first working video system in 1925 had only eight scanning lines. He went from that to 30, 60, 120, and 240. In 1935, a British parliamentary committee declared 240 scanning lines to be the minimum acceptable for “high definition television” (HDTV).

Designers of television systems juggled many characteristics to try to come up with optimum resolution, which they perceived to be roughly equivalent detail in both the vertical and horizontal directions. Consider our NTSC system, with 525 total scanning lines and 30 frames per second.

There are 15,750 scanning lines per second, which means each line lasts about 63.5 millionths of a second (63.5 µs). After the electron beam traces a scanning line from left to right, it needs time (the horizontal blanking interval) to get back to the left. That’s nominally 11 µs, so the active (image-carrying) line time is 52.5 µs.

A single cycle of a sine wave has a peak and a trough. If the amplitude is adjusted properly, in video the peak can be white and the trough black.

It takes one second for a one-cycle-per-second (1 Hz) signal to complete one cycle. It takes one millionth of a second for a one-million-cycle-per-second (1 MHz) signal to complete one cycle. So, in 52.5 µs, there can be 52.5 cycles of a 1 MHz signal.

In U.S. broadcast television, the video signal may range up to 4.2 MHz. That’s 220.5 cycles. But each cycle has a white part and a black part. If the waves line up on succeeding scanning lines, then alternating vertical stripes called TV lines of resolution are formed.

By convention, a white-black pair is called two TV lines. So we have 441 TV lines across the screen. To match vertical and horizontal resolution, however, the figure must be divided by the aspect ratio, yielding about 330 TV lines per picture height. And what’s happening vertically?
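The chain of arithmetic in the last few paragraphs can be collected into one small function. This is a sketch using the nominal figures quoted above (525 lines, 30 fps, 11 microseconds of blanking, 4.2 MHz); the parameter names are illustrative, and the unrounded results differ slightly from the rounded ones in the text.

```python
def ntsc_horizontal_resolution(total_lines=525, fps=30, h_blank_us=11.0,
                               bandwidth_mhz=4.2, aspect=4 / 3):
    """Return line time, active time, cycles, TV lines, and TV lines per picture height."""
    line_time_us = 1e6 / (total_lines * fps)     # about 63.5 microseconds per line
    active_us = line_time_us - h_blank_us        # about 52.5 microseconds of picture
    cycles = bandwidth_mhz * active_us           # MHz x microseconds = cycles per line
    tv_lines_width = 2 * cycles                  # each cycle is one white and one black line
    tv_lines_per_height = tv_lines_width / aspect
    return line_time_us, active_us, cycles, tv_lines_width, tv_lines_per_height


names = ("line time (us)", "active time (us)", "cycles per line",
         "TV lines per width", "TV lines per picture height")
for name, value in zip(names, ntsc_horizontal_resolution()):
    print(f"{name:28s} {value:8.1f}")
```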

There are 525 total lines, but, just as the electron beam needs time to return from right to left, it also needs time (41.5 lines worth) to get from the bottom to the top. That leaves 483.5 active scanning lines.

Ray Kell, a researcher at RCA, found that there was not a direct correlation between the number of scanning lines and perceived resolution. It had to be reduced by some factor. If that factor happened to be about 0.68, then vertical resolution would perfectly match horizontal in NTSC video.
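The factor that would make vertical resolution match horizontal is just a ratio of two numbers already given. The sketch below uses the article's figures (about 330 TV lines per picture height, 525 total lines, 41.5 lines of vertical blanking); the function name is illustrative.

```python
def matching_kell_factor(h_tv_lines_per_height=330.0, total_lines=525,
                         v_blanking_lines=41.5):
    """Factor that scales active scanning lines down to the horizontal resolution."""
    active_lines = total_lines - v_blanking_lines    # 483.5 active scanning lines
    return h_tv_lines_per_height / active_lines


print(f"{matching_kell_factor():.3f}")
```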

What is the value of the factor? No one knows for sure, and it probably changes a lot. Kell, himself, got confused as to whether it had been 0.64 or 0.8 in one of his experiments.

Only one of the Table 3 figures comes close to any of the NTSC resolutions; 480 is a rounded-off version of 483.5. And 640 is 480 times the 4:3 aspect ratio.

The others go back to Tesla’s 60 Hz Om. Westinghouse humored him for U.S. power systems but wasn’t committed to the figure. So, they were willing to license others to use a different rate. In 1891, the same year that Westinghouse installed its first 60 Hz system, the German company AEG installed a 50 Hz system in Berlin. Global incompatibility was born.

In the course of trying to come up with a global standard for digital television (Recommendation 601), engineers tried to find a common factor between the second NTSC’s 59.94 image rate and Europe’s 50. The result was a common digital sampling rate of 13.5 MHz and a common agreed-upon number of active samples per scanning line of 720.
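What makes 13.5 MHz "common" is that it yields a whole number of samples per scanning line in both systems. The sketch below checks that with exact fractions. The 625-line, 25-frame figure for Europe and the resulting totals per line are well-known values that do not appear in this article, so they are labeled here as assumptions of the example.

```python
from fractions import Fraction

SAMPLING_HZ = Fraction(13_500_000)

# Line rate = lines per frame x frames per second.
LINE_RATE_525 = 525 * Fraction(30000, 1001)   # 59.94 fields/s system
LINE_RATE_625 = Fraction(625 * 25)            # assumed: European 625-line, 50 fields/s system


def samples_per_total_line(line_rate_hz):
    """Samples in one full scanning line, blanking included."""
    return SAMPLING_HZ / line_rate_hz


for label, rate in (("525/59.94", LINE_RATE_525), ("625/50", LINE_RATE_625)):
    total = samples_per_total_line(rate)
    print(f"{label}: {total} samples per line, 720 of them active")
```

Both results are integers, so the same sampling clock and the same 720 active samples can serve either system.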

HDTV was supposed to have twice as much detail as ordinary TV. Twice 720 is 1440, but the 16:9 aspect ratio is 4/3 wider than 4:3, so, to maintain equivalent horizontal resolution, 1920 was chosen. And 1920 divided by 16:9 is 1080.
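That derivation of 1920 and 1080 can be written out step by step. This is a small sketch of the arithmetic as described above, with illustrative names.

```python
from fractions import Fraction


def hd_raster(sd_active_samples=720, detail_factor=2):
    """Derive the HD width and height from the 720-sample standard-definition line."""
    widen = Fraction(16, 9) / Fraction(4, 3)     # 16:9 is 4/3 wider than 4:3
    width = sd_active_samples * detail_factor * widen
    height = width / Fraction(16, 9)
    return int(width), int(height)


print(hd_raster())
```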

The 1280 x 720 format is based on the theory that a Kell-like factor of about 0.67 applies to interlace. Get rid of it, and you can supposedly achieve vertical resolution equivalent to 1080 lines with just 720.

The strangest figure in Table 3 is 704. Rec. 601 calls for 720 active samples per line. Digital television uses MPEG (Moving Picture Experts Group) data compression, which normally requires perfect divisibility by 16; 720 is divisible by 16.

It was hoped, however, that the U.S. digital-television standard would be universally used — not just for broadcast but also for cable TV. Some cable operators might want to halve horizontal resolution to squeeze more channels in.

Half of 720 is 360, not all that far from broadcast television’s limit of 440. But 360 is not evenly divisible by 16. Either 352 or 368 is. In an effort to simplify things for cable, 704 (twice 352) was selected for standard-definition television.
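The divisibility argument is easy to check. The sketch below treats 16 as the block size that picture widths should be a multiple of, as described above; the helper names are illustrative.

```python
BLOCK = 16   # picture widths are normally a multiple of 16 samples


def is_aligned(width):
    """True if the width divides evenly into 16-sample blocks."""
    return width % BLOCK == 0


def neighbors(width):
    """Nearest multiple of 16 at or below the width, and the next one up."""
    lower = width // BLOCK * BLOCK
    return lower, lower + BLOCK


for width in (720, 360, 352, 368, 704):
    print(width, "divisible by 16" if is_aligned(width) else f"not divisible; nearest {neighbors(width)}")
```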

Unfortunately, the cable-television industry decided to go its own way. There is no 352 in ATSC A/53, but 704 remains.

So 704 could be said to be in the standard by mistake; 480 is there by rounding; 24 is there by luck; 4:3 is there by error (because film had changed to 11:8); and 16:9 compounds that error. All of the other figures are based, at least in part, on Tesla’s 60 Hz power.

Tesla, genius though he was, was also strange. He couldn’t stand to touch hair. He insisted on 18 napkins at every meal. He had a closer relationship with pigeons than with people. But he did give us 60 Hz power.

Perhaps his life may be summed up as Om and deranged.