Lecture #5. Data Formats
Data Formats
Computers
Process and store all forms of data in binary format
Human communication
Includes language, images and sounds
Data formats:
Specifications for converting data into computer-usable form
Define the different ways human data may be represented, stored and processed by a computer
Sources of Data
Binary input
Begins as discrete input
Example: keyboard input such as A, 1+2=3, or math
Keyboard generates a binary number code for each key
Analog
Continuous data such as sound or images
Requires hardware to convert data into binary numbers
See Figure 3.1
Common Data Representations
Internal Data Representation
Reflects the:
Complexity of input source
Type of processing required
Trade-offs
Accuracy and resolution
Simple photo vs. painting in an art book
Compactness (storage and transmission)
More data required for improved accuracy and resolution
Compression represents data in a more compact form
Metadata: data that describes or interprets the meaning of data
Ease of manipulation:
Processing simple audio vs. high-fidelity sound
Standardization
Proprietary formats for storing and processing data (WordPerfect vs. Word)
De facto standards: proprietary standards based on general user acceptance (PostScript)
Data Types: Numeric
Used for mathematical manipulation
Add, subtract, multiply, divide
Types
Integer (whole number)
Real (contains a decimal point)
Covered in Chapters 4 and 5
Data Types: Alphanumeric
Alphanumeric:
Characters: b T
Number digits: 7 9
Punctuation marks: ! ;
Special-purpose characters: $ &
Numeric characters vs. numbers
Both entered as ordinary characters
Computer converts into numbers for calculation
Example: variables declared as numbers by the programmer (in BASIC, Salary is numeric, while Salary$ would hold text)
Treated as characters if processed as text
Examples: Phone numbers, ZIP codes
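A minimal Python sketch of the distinction: the same keystrokes behave as text or as numbers depending on how the program treats them. The variable names and the ZIP code value are illustrative.

```python
# Keystrokes arrive as characters, even when they look like digits.
typed_a = "7"
typed_b = "9"

# Processed as text: "+" joins the characters.
as_text = typed_a + typed_b               # "79"

# Converted to numbers for calculation.
as_number = int(typed_a) + int(typed_b)   # 16

# A ZIP code should stay text: converting it to a number drops the leading zero.
zip_code = "02134"
zip_as_number = int(zip_code)             # 2134

print(as_text, as_number, zip_as_number)
```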
Alphanumeric Codes
Arbitrary choice of bits to represent characters
Consistency: input and output devices must recognize the same code
Value of binary number representing character corresponds to placement in the alphabet
Facilitates sorting and searching
Representing Characters
ASCII: most widely used coding scheme
EBCDIC: IBM mainframe (legacy)
Unicode: developed for worldwide use
ASCII
Developed by ANSI (American National Standards Institute)
Represents:
Latin alphabet, Arabic numerals, standard punctuation characters
Plus, in 8-bit extensions such as Latin-1, a small set of accents and other European special characters
ASCII
7-bit code: 128 characters
ASCII Reference Table
Example: 74₁₆ = 111 0100₂ (lowercase t)
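The table lookup can be reproduced in Python with the built-in `ord()`. This sketch prints each character's code in hexadecimal and as 7 bits, split 3 + 4 to match the `111 0100` notation on the slide; the helper name `ascii_code` is hypothetical.

```python
def ascii_code(ch: str) -> tuple[str, str]:
    """Return the ASCII code of one character as (hex, 7-bit binary)."""
    code = ord(ch)
    if code > 127:
        raise ValueError(f"{ch!r} is outside the 7-bit ASCII range")
    bits = format(code, "07b")             # always 7 binary digits
    return format(code, "02X"), f"{bits[:3]} {bits[3:]}"

for ch in "tA1":
    hex_code, binary = ascii_code(ch)
    print(ch, hex_code, binary)
# t 74 111 0100
# A 41 100 0001
# 1 31 011 0001
```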
EBCDIC
Extended Binary Coded Decimal Interchange Code developed by IBM
Restricted mainly to IBM or IBM compatible mainframes
Conversion software to/from ASCII available
Common in archival data
Character codes differ from ASCII
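Conversion between EBCDIC and ASCII is available in Python's standard codecs. This sketch assumes EBCDIC code page 037 (US/Canada, codec name `cp037`); other EBCDIC code pages assign some characters differently.

```python
text = "HELLO 123"

ascii_bytes = text.encode("ascii")
ebcdic_bytes = text.encode("cp037")     # EBCDIC, code page 037 (US/Canada)

print(ascii_bytes.hex(" "))    # 48 45 4c 4c 4f 20 31 32 33
print(ebcdic_bytes.hex(" "))   # c8 c5 d3 d3 d6 40 f1 f2 f3

# Converting back to text recovers the original characters.
assert ebcdic_bytes.decode("cp037") == text
```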
Unicode
Most common 16-bit form represents 65,536 characters; the full standard defines more than a million code points
ASCII and its Latin-1 extension are a subset of Unicode
Values 0 to 255 in the Unicode table (ASCII occupies 0 to 127)
Multilingual: defines codes for
Nearly every character-based alphabet
Large set of ideographs for Chinese, Japanese and Korean
Composite characters for vowels and syllabic clusters required by some languages
Allows software modifications for local languages
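A short Python sketch showing Unicode code points for characters from several scripts, their 16-bit encoding, and the overlap with ASCII and Latin-1. The sample characters are arbitrary; `utf-16-be` is Python's name for the big-endian 16-bit form without a byte-order mark.

```python
samples = ["A", "é", "Ж", "中"]   # Latin, accented Latin-1, Cyrillic, CJK ideograph

for ch in samples:
    code_point = ord(ch)
    sixteen_bit = ch.encode("utf-16-be")    # 16-bit form, big-endian, no byte-order mark
    print(f"{ch}  U+{code_point:04X}  {sixteen_bit.hex()}")

# The first 128 values match ASCII and the first 256 match Latin-1.
assert ord("A") == "A".encode("ascii")[0]
assert ord("é") == "é".encode("latin-1")[0]
```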
Collating Sequence
Alphabetic sorting if software handles mixed upper- and lowercase codes
In ASCII, numbers collate first; in EBCDIC, last
ASCII collating sequence for string of characters
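A short Python sketch of the three orderings, using Python's `cp037` codec as a stand-in for EBCDIC; the word list is arbitrary.

```python
words = ["apple", "Banana", "42", "cherry", "Zebra"]

# ASCII: digits (0x30-0x39) < uppercase (0x41-0x5A) < lowercase (0x61-0x7A)
ascii_order = sorted(words, key=lambda w: w.encode("ascii"))

# EBCDIC (code page 037): lowercase (0x81-0xA9) < uppercase (0xC1-0xE9) < digits (0xF0-0xF9)
ebcdic_order = sorted(words, key=lambda w: w.encode("cp037"))

# Alphabetic: software folds upper- and lowercase together before comparing
alphabetic = sorted(words, key=str.lower)

print(ascii_order)    # ['42', 'Banana', 'Zebra', 'apple', 'cherry']
print(ebcdic_order)   # ['apple', 'cherry', 'Banana', 'Zebra', '42']
print(alphabetic)     # ['42', 'apple', 'Banana', 'cherry', 'Zebra']
```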
2 Classes of Codes
Printing characters
Produced on the screen or printer
Control characters
Control position of output on screen or printer
Cause action to occur
Communicate status between computer and I/O device
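In ASCII the two classes occupy fixed ranges, so software can tell them apart by code value. A minimal Python sketch; the helper name `is_control` and the sample string are illustrative.

```python
def is_control(ch: str) -> bool:
    """In ASCII, codes 0-31 and 127 (DEL) are control characters; 32-126 print."""
    code = ord(ch)
    return code < 32 or code == 127

# Printing characters, then carriage return, line feed, tab and bell (BEL).
sample = "Hi!\r\n\t\x07"
for ch in sample:
    kind = "control" if is_control(ch) else "printing"
    print(f"{ord(ch):3d} {kind}")
```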
Control Code Definitions
Keyboard Input
Scan code
Two different scan codes for each key on the keyboard
One generated when key is struck and another when key is released
Converted to Unicode, ASCII or EBCDIC by software in terminal or PC
Advantage
Easily adapted to different languages or keyboard layouts
Separate scan codes for key press/release for multiple key combinations
Examples: shift and control keys
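A minimal Python sketch of what the terminal or PC software does with scan codes. It assumes a handful of IBM PC "set 1" scan codes, in which a key's release code is its press code plus 0x80; `KEYMAP` and `translate` are hypothetical names, and only the left Shift modifier is tracked.

```python
# Hypothetical subset of a US layout: press code -> (unshifted, shifted) character.
# Codes follow IBM PC scan code set 1; a real keyboard driver maps every key.
KEYMAP = {
    0x1E: ("a", "A"),
    0x30: ("b", "B"),
    0x23: ("h", "H"),
    0x17: ("i", "I"),
    0x02: ("1", "!"),
}
LEFT_SHIFT = 0x2A
RELEASE_BIT = 0x80   # set 1: release (break) code = press code + 0x80

def translate(scan_codes: list[int]) -> str:
    """Turn a stream of press/release scan codes into characters."""
    shift_down = False
    out = []
    for code in scan_codes:
        released = bool(code & RELEASE_BIT)
        key = code & 0x7F
        if key == LEFT_SHIFT:
            shift_down = not released          # modifier state comes from press/release pairs
        elif not released and key in KEYMAP:
            out.append(KEYMAP[key][1 if shift_down else 0])
    return "".join(out)

# Shift down, h, Shift up, i, then Shift+1.
codes = [0x2A, 0x23, 0xA3, 0xAA, 0x17, 0x97, 0x2A, 0x02, 0x82, 0xAA]
print(translate(codes))   # Hi!
```

Because the same physical key always sends the same code, switching to another language only means swapping the lookup table, not the keyboard.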
Other Alphanumeric Input
OCR (optical character reader)
Scans text and inputs it as character data
Used to read specially encoded characters
Example: magnetically printed check numbers
General use limited by high error rate
Bar Code Readers
Used in applications that require fast, accurate and repetitive input with minimal employee training
Examples: supermarket checkout counters and inventory control
Alphanumeric data in bar code read optically using wand
Magnetic stripe reader: alphanumeric data from credit cards
Voice
Digitized audio recording common but conversion to alphanumeric data difficult
Requires knowledge of sound patterns in a language (phonemes) plus rules for pronunciation, grammar, and syntax
Image Data
Photographs, figures, icons, drawings, charts and graphs
Two approaches:
Bitmap or raster images of photos and paintings with continuous variation
Object or vector images composed of graphical objects like lines and curves defined geometrically
Differences include:
Quality of the image
Storage space required
Time to transmit
Ease of modification
Bitmap Images
Used for realistic images with continuous variations in shading, color, shape and texture
Examples:
Scanned photos
Clip art generated by a paint program
Preferred when image contains large amount of detail and processing requirements are fairly simple
Input devices:
Scanners
Digital cameras and video capture devices
Graphical input devices like mice and pens
Managed by photo editing software or paint software
Editing tools to make the tedious bit-by-bit process easier
Bitmap Images
Each individual pixel (picture element) in a graphic stored as a binary number
Pixel: a small area with associated coordinate location
Example: each point represented by a 4-bit code corresponding to 1 of 16 shades of gray
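With 4 bits per pixel, two pixels fit in each byte. A Python sketch of that packing; the high-nibble-first order and the 0 = black convention are assumptions of this sketch, since file formats differ.

```python
def pack_4bit(pixels: list[int]) -> bytes:
    """Store two 4-bit gray levels per byte, first pixel in the high nibble."""
    if any(not 0 <= p <= 15 for p in pixels):
        raise ValueError("4-bit pixels must be in the range 0-15")
    if len(pixels) % 2:
        pixels = pixels + [0]                  # pad an odd-length row
    return bytes((pixels[i] << 4) | pixels[i + 1] for i in range(0, len(pixels), 2))

def unpack_4bit(data: bytes, count: int) -> list[int]:
    """Recover `count` 4-bit pixel values from packed bytes."""
    values = []
    for byte in data:
        values.extend((byte >> 4, byte & 0x0F))
    return values[:count]

row = [0, 15, 7, 8, 3]          # here 0 = black and 15 = white (an assumed convention)
packed = pack_4bit(row)
print(packed.hex())             # 0f7830
```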
Bitmap Display
Monochrome: black or white
1 bit per pixel
Gray scale: black, white or 254 shades of gray
1 byte per pixel
Color graphics: 16 colors, 256 colors, or 24-bit true color (16.7 million colors)
4, 8, and 24 bits respectively
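The color counts follow from 2 raised to the number of bits per pixel. A one-function Python sketch; the name `levels` is illustrative.

```python
def levels(bits_per_pixel: int) -> int:
    """Distinct colors or gray levels representable with the given bits per pixel."""
    return 2 ** bits_per_pixel

for bits in (1, 4, 8, 24):
    print(f"{bits:2d} bits per pixel: {levels(bits):,} values")
```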
Storing Bitmap Images
Frequently large files
Example: 600 rows of 800 pixels with 1 byte for each of 3 colors: 600 × 800 × 3 = 1,440,000 bytes, about 1.5 MB
File size affected by
Resolution (the number of pixels per inch)
Amount of detail affecting clarity and sharpness of an image
Levels: number of bits for displaying shades of gray or multiple colors
Palette: color translation table that uses a code for each pixel rather than actual color value
Data compression
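The arithmetic in the example, plus the saving from a palette, as a small Python sketch. It assumes a palette of at most 256 entries of 3 bytes each, so each pixel stores a 1-byte index; real file formats add headers and may also compress the data.

```python
def true_color_size(width: int, height: int, bytes_per_pixel: int = 3) -> int:
    """Uncompressed size when every pixel stores its full color value."""
    return width * height * bytes_per_pixel

def palette_size(width: int, height: int, entries: int = 256, bytes_per_entry: int = 3) -> int:
    """Size when each pixel stores a 1-byte index into a color table."""
    if entries > 256:
        raise ValueError("a 1-byte index can address at most 256 palette entries")
    return width * height + entries * bytes_per_entry

full = true_color_size(800, 600)   # 1,440,000 bytes (about 1.5 MB)
indexed = palette_size(800, 600)   # 480,000 index bytes + 768 table bytes = 480,768
print(f"{full:,} bytes vs {indexed:,} bytes")
```

The palette cuts the size to about a third, at the cost of limiting the image to 256 distinct colors.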
GIF (Graphics Interchange Format)
First developed by CompuServe in 1987
GIF89a enabled animated images
Allows images to be displayed sequentially at fixed time intervals
Color limitation: 256 colors
Image compressed by LZW (Lempel-Ziv-Welch) algorithm
Preferred for line drawings, clip art and pictures with large blocks of solid color
Lossless compression
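GIF's LZW stage can be sketched in Python as the textbook algorithm: the dictionary starts with all single bytes, and each time a new pattern is seen it gets the next code. This is a simplified sketch; a real GIF encoder also packs codes into variable-width bit fields and uses clear and end-of-information codes, which are omitted here.

```python
def lzw_compress(data: bytes) -> list[int]:
    """Textbook LZW: emit dictionary codes, adding each new pattern as it is seen."""
    dictionary = {bytes([i]): i for i in range(256)}   # start with all single bytes
    next_code = 256
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in dictionary:
            current = candidate                       # keep extending the match
        else:
            codes.append(dictionary[current])         # emit longest known pattern
            dictionary[candidate] = next_code         # remember the new pattern
            next_code += 1
            current = bytes([byte])
    if current:
        codes.append(dictionary[current])
    return codes

def lzw_decompress(codes: list[int]) -> bytes:
    """Rebuild the same dictionary while decoding, so it never has to be stored."""
    if not codes:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = dictionary[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:                       # pattern created by this very step
            entry = previous + previous[:1]
        else:
            raise ValueError(f"invalid LZW code {code}")
        out.append(entry)
        dictionary[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return b"".join(out)

row = b"AAAAAAAABBBBBBBBAAAAAAAA"   # a scan line with large blocks of solid color
codes = lzw_compress(row)
assert lzw_decompress(codes) == row  # lossless: exact round trip
print(len(row), "bytes ->", len(codes), "codes")
```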
JPEG (Joint Photographic Experts Group)
Allows more than 16 million colors
Suitable for highly detailed photographs and paintings
Employs lossy compression algorithm that:
Discards data to decrease file size and transmission time
May reduce image resolution; tends to distort sharp lines
Other Bitmap Formats
TIFF (Tagged Image File Format): .tif (pronounced tif)
Used in high-quality image processing, particularly in publishing
BMP (BitMaPped): .bmp (pronounced dot bmp)
Device-independent format for Microsoft Windows environment: pixel colors stored independent of output device
PCX: .pcx (pronounced dot p c x)
Windows Paintbrush software
PNG (Portable Network Graphics): .png (pronounced ping)
Designed primarily to replace GIF for Internet applications
Patent-free
Improved lossless compression
No animation support
Object Images
Created by drawing packages or output from spreadsheet data graphs
Composed of lines and shapes in various colors
Computer translates geometric formulas to create the graphic
Storage space depends on image complexity
Number of instructions to create lines, shapes, fill patterns
Movies Shrek and Toy Story use object images
Object Images
Based on mathematical formulas
Easy to move, scale and rotate without losing shape and identity, as bitmap images may
Require less storage space than bitmap images
Cannot represent photos or paintings
Cannot be displayed or printed directly
Must be converted to bitmap, since output devices other than plotters are bitmap devices
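Because an object image is stored as coordinates, moving, scaling or rotating it is arithmetic on those numbers, and the shape stays exact until it is finally converted to a bitmap for output. A minimal Python sketch, assuming a shape is a list of (x, y) vertices; `transform` is a hypothetical helper, not part of any drawing package.

```python
import math

Point = tuple[float, float]

def transform(shape: list[Point], scale: float = 1.0, angle_deg: float = 0.0,
              dx: float = 0.0, dy: float = 0.0) -> list[Point]:
    """Scale, then rotate about the origin, then move every vertex of a vector shape."""
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    result = []
    for x, y in shape:
        x, y = x * scale, y * scale
        x, y = x * cos_a - y * sin_a, x * sin_a + y * cos_a
        result.append((x + dx, y + dy))
    return result

square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]   # 4 vertices, not pixels
big = transform(square, scale=100)
turned = transform(square, scale=100, angle_deg=90, dx=50)
# Every side of the rotated square is still 100 units long (up to float rounding).
print([round(math.dist(p, q), 6) for p, q in zip(turned, turned[1:] + turned[:1])])
```

Scaling a bitmap by 100 would instead have to invent or duplicate pixels, which is where the loss of shape comes from.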
Popular Object Graphics Software
Most object image formats are proprietary
File extensions include .wmf, .dxf, .mgx, and .cgm
Macromedia Flash: low-bandwidth animation
Micrografx Designer: technical drawings to illustrate products
CorelDraw: vector illustration, layout, bitmap creation, image-editing, painting and animation software
Autodesk AutoCAD: for architects, engineers, drafters, and design-related professionals
W3C SVG (Scalable Vector Graphics) based on XML, a Web description language
Not proprietary
PostScript
Page description language: list of procedures and statements that describe each of the objects to be printed on a page
Stored in ASCII or Unicode text file
Interpreter program in computer or output device reads PostScript to generate image
Scalable font support
Font outline objects specified like other objects
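Because PostScript is plain text, any program can generate it. This Python sketch writes a one-page program that draws a triangle and a line of text in a scaled Helvetica font; the operators used (`moveto`, `lineto`, `stroke`, `findfont`, `scalefont`, `setfont`, `show`, `showpage`) are standard PostScript, while the helper name and the coordinates are illustrative. The output can be rendered by a PostScript interpreter such as Ghostscript or a PostScript printer.

```python
def postscript_page(message: str) -> str:
    """Return a one-page PostScript program as plain ASCII text."""
    if not message.isascii():
        raise ValueError("this sketch handles ASCII text only")
    # Backslash and parentheses must be escaped inside a PostScript string.
    safe = message.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    lines = [
        "%!PS",
        "newpath 100 100 moveto 300 100 lineto 200 250 lineto closepath stroke",
        "/Helvetica findfont 24 scalefont setfont",   # outline font scaled like any object
        f"100 300 moveto ({safe}) show",
        "showpage",
    ]
    return "\n".join(lines) + "\n"

program = postscript_page("Hello, PostScript")
print(program)
```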
PostScript Program
Representing Characters
Characters stored in format like Unicode or ASCII
Text processed and stored primarily for content
Presentation requirements like font stored with the character
Text appearance is primary factor
Example: screen fonts in Windows
Glyphs: Macintosh coding scheme that includes both identification and presentation requirement for characters
Bitmap vs. Object Images
Video Images
Require massive amount of data
Example: video camera producing full-screen 640 × 480 pixel true-color images (3 bytes per pixel) at 30 frames/sec: 640 × 480 × 3 × 30 = 27.65 MB of data/sec
1-minute film clip: about 1.66 GB of storage
Options for reducing file size: decrease size of image, limit number of colors, reduce frame rate
Method depends on how video is delivered to users
Streaming video: video displayed as it is downloaded from the Web server
Example: video conferencing
Local data (file on DVD or downloaded onto system) for higher quality
MPEG-2: movie quality images with high compression require substantial processing capability
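The data rates on the slide follow from multiplying pixels per frame, bytes per pixel and frames per second. A Python sketch (MB and GB here mean 10⁶ and 10⁹ bytes); the reduced settings in the last calculation are an arbitrary illustration of two of the size-reduction options.

```python
def video_bytes_per_second(width: int, height: int, bytes_per_pixel: int, fps: int) -> int:
    """Uncompressed data rate: pixels per frame x bytes per pixel x frames per second."""
    return width * height * bytes_per_pixel * fps

rate = video_bytes_per_second(640, 480, 3, 30)      # 27,648,000 bytes/s, about 27.65 MB/s
one_minute = rate * 60                               # 1,658,880,000 bytes, about 1.66 GB

# Smaller image and lower frame rate (illustrative values only).
reduced = video_bytes_per_second(320, 240, 3, 15)   # 3,456,000 bytes/s
print(f"{rate:,} B/s, {one_minute:,} B per minute, reduced: {reduced:,} B/s")
```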
Audio Data
Transmission and processing requirements less demanding than those for video
Waveform audio: digital representation of sound
MIDI (Musical Instrument Digital Interface): instructions to recreate or synthesize sounds
Analog sound converted to digital values by A-to-D converter
Waveform Audio
Sampling rate normally 50 kHz
Sampling Rate
Number of times per second that sound is measured during the recording process
1000 samples per second = 1 kHz (kilohertz)
Example: Audio CD sampling rate = 44.1 kHz
Height of each sample saved as:
8-bit number for radio-quality recordings
16-bit number for high-fidelity recordings
2 x 16-bits for stereo
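A minimal Python sketch of waveform sampling: a pure tone is measured 44,100 times per second and each height is stored as a signed 16-bit integer. The 440 Hz tone and 10 ms length are arbitrary choices; the last two calculations give the CD data rate for 16-bit stereo.

```python
import math

SAMPLE_RATE = 44_100   # samples per second (audio CD)
BITS = 16              # bits per sample (high fidelity)
CHANNELS = 2           # stereo

def sample_tone(freq_hz: float, seconds: float, rate: int = SAMPLE_RATE) -> list[int]:
    """Measure a pure tone `rate` times per second; store each height as a signed 16-bit value."""
    peak = 2 ** (BITS - 1) - 1                 # 32767
    count = round(seconds * rate)
    return [round(peak * math.sin(2 * math.pi * freq_hz * i / rate)) for i in range(count)]

samples = sample_tone(440.0, 0.01)             # 10 ms of a 440 Hz tone, one channel
bytes_per_second = SAMPLE_RATE * (BITS // 8) * CHANNELS   # 176,400 bytes/s
bytes_per_minute = bytes_per_second * 60                   # 10,584,000 bytes
print(len(samples), "samples;", f"{bytes_per_second:,} bytes/s for CD stereo")
```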
MIDI
Music notation system that allows computers to communicate with music synthesizers
Instructions that MIDI instruments and MIDI sound cards use to recreate or synthesize sounds
Do not store or recreate speaking or singing voices
More compact than waveform
3 minutes = 10 KB
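MIDI stores short instructions instead of samples. This Python sketch builds the channel messages for a chord: a Note On message is the status byte 0x90 plus the channel number, followed by a note number and a velocity (each 0 to 127), and Note Off uses 0x80. Timing information, which a Standard MIDI File stores alongside each event, is omitted, and the helper names are hypothetical.

```python
def _check(channel: int, note: int, velocity: int) -> None:
    if not (0 <= channel <= 15 and 0 <= note <= 127 and 0 <= velocity <= 127):
        raise ValueError("channel must be 0-15; note and velocity must be 0-127")

def note_on(channel: int, note: int, velocity: int) -> bytes:
    """Note On: status byte 0x90 + channel, then note number and velocity."""
    _check(channel, note, velocity)
    return bytes([0x90 | channel, note, velocity])

def note_off(channel: int, note: int, velocity: int = 0) -> bytes:
    """Note Off: status byte 0x80 + channel, then note number and release velocity."""
    _check(channel, note, velocity)
    return bytes([0x80 | channel, note, velocity])

# C major chord (notes 60, 64, 67; 60 is middle C) on channel 0, then released.
chord = [note_on(0, n, 100) for n in (60, 64, 67)] + [note_off(0, n) for n in (60, 64, 67)]
total = sum(len(message) for message in chord)
print(total, "bytes")   # 18 bytes
```

For comparison, CD-quality waveform audio needs 176,400 bytes for every second of sound.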
Audio Formats
MP3
Derived from the ISO Moving Picture Experts Group (MPEG) audio standards: MPEG-1/MPEG-2 Audio Layer III
Uses psychoacoustic compression techniques to reduce storage requirements
Discards sounds that human hearing cannot perceive: lossy compression
WAV
Developed by Microsoft as part of its multimedia specification
General-purpose format for storing and reproducing small snippets of sound
.WAV Sound Format
Data Compression
Compression: recoding data so that it requires fewer bytes of storage space
Compression ratio: the factor by which the file is shrunk (original size ÷ compressed size)
Lossless: inverse algorithm restores data to exact original form
Examples: GIF, PCX, TIFF
Lossy: trades off data degradation for file size and download speed
Much higher compression ratios, often 10 to 1
Example: JPEG
Common in multimedia
MPEG-2: uses both forms for ratios of 100:1
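A Python sketch contrasting the two forms. The lossless half uses the standard `zlib` module (DEFLATE, not the LZW used by GIF); the lossy half is a toy quantizer standing in for the far more elaborate JPEG and MPEG methods, and its step size of 16 is an arbitrary choice.

```python
import zlib

# Lossless: decompressing restores exactly the original bytes.
original = b"AAAAAAAAAABBBBBBBBBB" * 50 + b" followed by less repetitive text"
packed = zlib.compress(original)
assert zlib.decompress(packed) == original
ratio = len(original) / len(packed)
print(f"lossless: {len(original)} -> {len(packed)} bytes, ratio {ratio:.1f}:1")

# Lossy (toy example): round 8-bit values to the nearest multiple of `step`.
def quantize(values: list[int], step: int = 16) -> list[int]:
    """Similar values collapse to one level; the discarded detail cannot be recovered."""
    return [min(255, round(v / step) * step) for v in values]

pixels = [12, 13, 15, 200, 201, 203]
approx = quantize(pixels)
print("lossy:", pixels, "->", approx)
```

Fewer distinct values make the data easier to compress further, which is the trade-off lossy methods exploit.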
Compression Algorithms
Repetition
0 5 8 7 0 0 0 0 3 4 0 0 0 → 0 1 5 8 7 0 4 3 4 0 3
(each run of 0s replaced by a 0 followed by the length of the run)
Example: large blocks of the same color
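The repetition example can be reproduced with a small run-length encoder in Python: every run of 0s becomes a 0 followed by the run length, and other values are copied unchanged. Capping runs at 255 so each count fits in one byte is an assumption of this sketch.

```python
def encode_zero_runs(data: list[int]) -> list[int]:
    """Replace each run of 0s with a 0 followed by the run length; copy other values."""
    out = []
    i = 0
    while i < len(data):
        if data[i] == 0:
            run = 0
            while i < len(data) and data[i] == 0 and run < 255:
                run += 1
                i += 1
            out += [0, run]
        else:
            out.append(data[i])
            i += 1
    return out

def decode_zero_runs(code: list[int]) -> list[int]:
    """Expand each (0, count) pair back into `count` zeros."""
    out = []
    i = 0
    while i < len(code):
        if code[i] == 0:
            out += [0] * code[i + 1]
            i += 2
        else:
            out.append(code[i])
            i += 1
    return out

original = [0, 5, 8, 7, 0, 0, 0, 0, 3, 4, 0, 0, 0]
encoded = encode_zero_runs(original)
print(encoded)   # [0, 1, 5, 8, 7, 0, 4, 3, 4, 0, 3]
```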
Pattern Substitution
Scans data for patterns
Substitutes new pattern,
makes dictionary entry
Example: 45 bytes to 30 bytes plus dictionary
Peter Piper picked a peck of pickled peppers.
t p a of l pp s.
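Pattern substitution with a fixed, hand-picked dictionary can be sketched in Python. The dictionary here is hypothetical and is not the one behind the 30-byte figure on the slide; each pattern is replaced by a single control character assumed not to occur in ordinary text.

```python
# Hypothetical dictionary: one control character stands for each frequent pattern.
DICTIONARY = {"\x01": "Pe", "\x02": "pi", "\x03": "ck", "\x04": "ed", "\x05": "er"}

def substitute(text: str) -> str:
    """Replace each dictionary pattern with its one-character code."""
    if any(code in text for code in DICTIONARY):
        raise ValueError("text already contains a dictionary code")
    for code, pattern in DICTIONARY.items():
        text = text.replace(pattern, code)
    return text

def expand(text: str) -> str:
    """Undo the substitution by replacing each code with its pattern."""
    for code, pattern in DICTIONARY.items():
        text = text.replace(code, pattern)
    return text

sentence = "Peter Piper picked a peck of pickled peppers."
encoded = substitute(sentence)
print(len(sentence), "->", len(encoded), "characters, plus the dictionary")
```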
Internal Computer Data Format
All data stored as binary numbers
Interpreted based on
Operations computer can perform
Data types supported by programming language used to create application
5 Simple Data Types
Boolean: 2-valued variables or constants with values of true or false
Char: variable or constant that holds an alphanumeric character
Enumerated
User-defined data types with possible values listed in definition
Type DayOfWeek = Mon, Tues, Wed, Thurs, Fri, Sat, Sun
Integer: positive or negative whole numbers
Real
Numbers with a decimal point
Numbers whose magnitude, large or small, exceeds computer’s capability to store as an integer
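The five simple types map onto Python roughly as follows. Python has no separate char type, so a one-character string stands in, and Python integers are unbounded, so the "too large for an integer" case is shown against the 64-bit limit that applies in many other languages. The variable names are illustrative.

```python
from enum import Enum

class DayOfWeek(Enum):      # Enumerated: every legal value is listed in the definition
    MON = 1
    TUES = 2
    WED = 3
    THURS = 4
    FRI = 5
    SAT = 6
    SUN = 7

is_weekend: bool = DayOfWeek.SAT in (DayOfWeek.SAT, DayOfWeek.SUN)   # Boolean: True or False
initial: str = "T"          # Char stand-in: a one-character string
balance: int = -42          # Integer: positive or negative whole number
average: float = 3.75       # Real: number with a fractional part
avogadro: float = 6.022e23  # Real: too large for a 64-bit integer (limit about 9.2e18)
electron_charge: float = 1.602e-19   # Real: too small to store as an integer at all
```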