Unit 2.2 Data Compression, Images
This lab alters images, manipulates RGB values, and reduces the number of pixels. College Board requires you to learn about lossy and lossless compression.
- Enumerate "Data" Big Idea from College Board
- Image Files and Size X
- Python Libraries and Concepts used for Jupyter and Files/Directories
- Reading and Encoding Images (2 implementations follow)
- Data Structures, Imperative Programming Style, and working with Images
- Data Structures and OOP
- Additionally, review all the imports in these three demos. Create a definition of their purpose, specifically these ...
- Hacks
- College Board Hacks
- Programming Hacks
Enumerate "Data" Big Idea from College Board
Some of the big ideas and vocab that you observe, talk about it with a partner ...
- "Data compression is the reduction of the number of bits needed to represent data"
- "Data compression is used to save transmission time and storage space."
- "lossy data can reduce data but the original data is not recovered"
- "lossless data lets you restore and recover"
The Image Lab Project contains a plethora of College Board Unit 2 data concepts. Working with Images provides many opportunities for compression and analyzing size.
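A minimal sketch of the two definitions quoted above, using only the Python standard library. The sample bytes and the choice of dropping the low 4 bits are illustrative assumptions, not part of the lab: zlib stands in for any lossless algorithm, and the hypothetical quantize helper stands in for a lossy one.

```python
import zlib

# Hypothetical sample data: a repetitive "row of pixel values" (assumption for illustration)
original = bytes([200, 200, 200, 201, 202, 202, 203, 203] * 100)

# Lossless: compress reduces the byte count and decompress restores every byte exactly
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)
print(len(original), "->", len(compressed), "bytes, restored exactly:", restored == original)

# Lossy (illustrative only): keep the top 4 bits of each value and discard the rest
def quantize(data):
    return bytes(value & 0xF0 for value in data)

lossy = quantize(original)
print("distinct values before:", len(set(original)), "after:", len(set(lossy)))
```

After quantizing, nearby values collapse into one, so there is no way to tell which original value each byte had. That is the sense in which lossy compression cannot recover the original data.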
Image Files and Size X
Here are some image files. Download these files and load them into the images directory under _notebooks in your Blog.
- Clouds Impression
Describe some of the meta data and considerations when managing Image files. Describe how these relate to Data Compression ...
- File Type, PNG and JPG are two types used in this lab
PNG uses lossless compression, so it does not discard any pixel information from the image. This keeps the image quality intact, but it does not reduce the file size as much.
JPG is lossy, meaning it can reduce the quality of the image, because it discards some information to save space.
- Size, height and width, number of pixels
All of these are factors which may determine how much space the image takes up, and how much it must be compressed to complete a certain function. These may be factored in when determining which compression method is more appropriate for the situation.
- Visual perception, lossy compression
Lossy compression can lead to a change in visual perception, sometimes even to the point of distortion.
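To make the size consideration concrete, here is a small sketch that estimates how many bytes an image needs before any compression. The helper name raw_size_bytes and the assumption of 3 one-byte channels (RGB) are mine; the dimensions are the ones reported in the scale image answer later in this post.

```python
def raw_size_bytes(width, height, channels=3):
    """Uncompressed size, assuming one byte per channel per pixel."""
    return width * height * channels

# (width, height) pairs taken from the sizes printed later in this post
for width, height in [(16, 16), (320, 240), (2792, 2094)]:
    pixels = width * height
    print(f"{width} x {height}: {pixels} pixels, {raw_size_bytes(width, height)} bytes uncompressed")
```

The largest image needs roughly 17.5 million bytes uncompressed, which is why file type and compression matter so much for photos.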
Python Libraries and Concepts used for Jupyter and Files/Directories
Introduction to displaying images in Jupyter notebook
IPython
Supports visualization of data in Jupyter notebooks. Visualization is specific to the View; for the web, the visualization needs to be converted to HTML.
pathlib
File paths are different on Windows versus Mac and Linux. This can cause problems in a project as you work and deploy on different Operating Systems (OS's), pathlib is a solution to this problem.
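A short sketch of how pathlib hides the separator difference. PurePosixPath and PureWindowsPath are used here only to show how each OS would write the same join, without touching the disk; the file name is the one used in this lab.

```python
from pathlib import Path, PurePosixPath, PureWindowsPath

# The same join written once; pathlib picks the separator for the current OS
relative = Path("images") / "clouds-impression.png"
print(relative.name, relative.suffix)

# Pure paths show how Linux/Mac and Windows would each render the join
print(PurePosixPath("images") / "clouds-impression.png")
print(PureWindowsPath("images") / "clouds-impression.png")

# Checking first gives a clearer message than a "no such file" error
if not relative.exists():
    print("missing:", relative)
```

Because the path is built with the / operator instead of a hand-typed string, the same notebook should run on either kind of machine.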
- What are commands you use in terminal to access files?
cd: change directory. You can also use ls to list the files in a certain directory.
- What are the commands you use in Windows terminal to access files?
While I don't have a Windows machine and do not use WSL, I learned from my Scrum mate that WSL also uses cd.
- What are some of the major differences?
https://www.geeksforgeeks.org/linux-vs-windows-commands/ One major difference is that in Linux a directory listing is ls -l, while in Windows it is dir. Additionally, Linux uses rm to delete a file while Windows uses del. This is an important detail, as deleting or moving a file could be important to the functioning of a program.
- Provide what you observed, struggled with, or learned while playing with this code.
- Why is path a big deal when working with images?
Without the correct path, the program reports that there is no such file/image. I ran into this error when modifying images below.
- How does the meta data source and label relate to Unit 5 topics?
Meta data might make data sets more accessible to those who do not have background knowledge in reading code or even reading statistics.
- Look up IPython, describe why this is interesting in Jupyter Notebooks for both Pandas and Images?
IPython is an interactive shell for Python, and it provides the kernel that runs Python code in Jupyter notebooks. Its IPython.display module can render rich output such as HTML and images directly in a notebook. This is useful with Pandas, where we use Python to interpret a data set in csv or json and display it in a readable format. For images, we change the image into an array, manipulate it, and then reformat it into an image; IPython is what displays the result in the notebook.
from IPython.display import Image, display
from pathlib import Path  # https://medium.com/@ageitgey/python-3-quick-tip-the-easy-way-to-deal-with-file-paths-on-windows-mac-and-linux-11a072b58d5f

# prepares a series of images
def image_data(path=Path("images/"), images=None):  # path of static images is defaulted
    if images is None:  # default image
        images = [
            {'source': "Peter Carolin", 'label': "Clouds Impression", 'file': "clouds-impression.png"},
            {'source': "Peter Carolin", 'label': "Lassen Volcano", 'file': "lassen-volcano.jpg"},
        ]
    for image in images:
        # File to open
        image['filename'] = path / image['file']  # file with path
    return images

def image_display(images):
    for image in images:
        display(Image(filename=image['filename']))

# Run this as standalone tester to see sample data printed in Jupyter terminal
if __name__ == "__main__":
    # print parameter supplied image
    green_square = image_data(images=[{'source': "Internet", 'label': "Green Square", 'file': "green-square-16.png"}])
    image_display(green_square)

    # display default images from image_data()
    default_images = image_data()
    image_display(default_images)
Reading and Encoding Images (2 implementations follow)
PIL (Python Image Library)
Pillow or PIL provides the ability to work with images in Python. Geeks for Geeks shows some ideas on working with images.
base64
Image formats (JPG, PNG) are binary file formats, which are difficult to embed directly in text-based content. Base64 therefore converts binary data (8-bit bytes) into a text encoding: every 24 bits (3 bytes) become four 6-bit Base64 digits. This is how Base64 is used to transport and embed binary images in textual assets such as HTML and CSS.
- How is Base64 similar or different to Binary and Hexadecimal?
Similar: All are used to represent either numbers or letters, and include a series of numbers in specific orders to do so.
Different: Binary uses base 2 and Hexadecimal uses base 16, while Base64 uses 64 symbols that each stand for 6 bits, so every 24 bits (3 bytes) are written as 4 characters. Base64 is used to convert binary data to readable text.
- Translate first 3 letters of your name to Base64.
S- 010010 A- 000000 N- 001101 (each letter's 6-bit position in the Base64 alphabet: S = 18, A = 0, N = 13)
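The values above are positions in the Base64 alphabet. Encoding the ASCII text itself works differently: the letters are first written as 8-bit bytes, and those bits are regrouped six at a time. The sketch below walks through that regrouping by hand and compares it with Python's base64 module; the three letters are the ones from the answer above, and the variable names are my own.

```python
import base64

text = "SAN"  # the three letters from the answer above
raw = text.encode("ascii")

# Step 1: each character is one 8-bit byte
bits = "".join(format(byte, "08b") for byte in raw)

# Step 2: regroup the 24 bits into four 6-bit values
groups = [bits[i:i + 6] for i in range(0, len(bits), 6)]

# Step 3: each 6-bit value is a position in the Base64 alphabet
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
manual = "".join(alphabet[int(group, 2)] for group in groups)

print(bits)
print(groups)
print(manual, base64.b64encode(raw).decode())
```

Both the manual steps and the library call give U0FO, so three letters of text become four Base64 characters.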
numpy
Numpy is described as "The fundamental package for scientific computing with Python". In the Image Lab, a Numpy array is created from the image data in order to simplify access and change to the RGB values of the pixels, converting pixels to grey scale.
io, BytesIO
Input and Output (I/O) is a fundamental of all Computer Programming. Input/output (I/O) buffering is a technique used to optimize I/O operations. With large quantities of data, the buffer holds the input that is queued and waiting to be processed. In this example, there is a very large picture that lags.
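As a rough illustration of how BytesIO acts as an in-memory buffer, the sketch below writes some bytes into a buffer instead of a file and then Base64-encodes the contents into an HTML image tag. The payload is placeholder bytes I made up, not a real image, so the tag would not render; the point is only the flow of data.

```python
import base64
from io import BytesIO

payload = b"placeholder image bytes"  # assumption: stands in for real PNG data

# BytesIO behaves like a file opened in binary mode, but lives in memory
with BytesIO() as buffer:
    buffer.write(payload)
    data = buffer.getvalue()  # read everything that was written

# Base64 turns the bytes into text that can sit inside HTML
encoded = base64.b64encode(data).decode()
html = '<img src="data:image/png;base64,%s">' % encoded
print(html)
```

The lab code follows the same pattern, with PIL saving the image into the buffer instead of the write call shown here.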
- Where have you been a consumer of buffering?
On online shopping websites, and sometimes canvas quizzes.
- From your consumer experience, what effects have you experienced from buffering?
I have experienced frustration, and sometimes have had to reload the page or reset the wifi.
- How do these effects apply to images?
Sometimes if an image doesn't load, I cannot use a function necessary for the website. This is why data compression is important (so that users can access what they need to use the website).
Data Structures, Imperative Programming Style, and working with Images
Introduction to creating meta data and manipulating images. Look at each procedure and explain the purpose and results of this program. Add any insights or challenges as you explored this program.
- Does this code seem like a series of steps are being performed?
Yes, I can pick out several steps and what their function/contribution to the program is.
- Describe Grey Scale algorithm in English or Pseudo code?
The image data is first collected and joined with a path so that the files can be accessed. The images are then scaled to their proper size and converted to base64. From this, I think the program loops through the pixels in the image, averages the three RGB values of each pixel, and then sets all three values of that pixel to the average (equal R, G, and B values produce a grey color).
- Describe scale image? What is before and after on pixels in three images?
I think scale image changes the number of pixels in the image so that the width is always 320, keeping the same proportions. The last image ends up with far fewer pixels than the original, the first image is enlarged, and the second image is unchanged.
Image 1: Original size: (16, 16) Scaled size: (320, 320)
Image 2: Original size: (320, 234) Scaled size: (320, 234)
Image 3: Original size: (2792, 2094) Scaled size: (320, 240)
- Is scale image a type of compression? If so, line it up with College Board terms described?
Yes, I believe it can be used as a form of compression. It lines up with what College Board described when it reduces the amount of detail/information, which makes it lossy compression. When it increases the number of pixels without adding information, it is not compression, but it might still distort the image like lossy compression does.
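The grey scale steps described above can be sketched without any image library, using a plain list of pixel tuples. The function name to_grey and the sample pixels are hypothetical; the averaging rule and the handling of a fourth (alpha) value follow the lab code below.

```python
def to_grey(pixels):
    """Replace each pixel's R, G, B with their integer average."""
    grey = []
    for pixel in pixels:
        average = (pixel[0] + pixel[1] + pixel[2]) // 3  # integer division
        if len(pixel) > 3:
            # keep the alpha value unchanged (PNG images may carry one)
            grey.append((average, average, average, pixel[3]))
        else:
            grey.append((average, average, average))
    return grey

sample = [(255, 0, 0), (10, 20, 30), (0, 128, 0, 255)]  # made-up pixels
print(to_grey(sample))
```

Since the three color values are replaced by one shared value, the original colors cannot be worked out from the grey image afterwards.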
from IPython.display import HTML, display
from pathlib import Path  # https://medium.com/@ageitgey/python-3-quick-tip-the-easy-way-to-deal-with-file-paths-on-windows-mac-and-linux-11a072b58d5f
from PIL import Image as pilImage  # as pilImage is used to avoid conflicts
from io import BytesIO
import base64
import numpy as np

# prepares a series of images
def image_data(path=Path("images/"), images=None):  # path of static images is defaulted
    if images is None:  # default image
        images = [
            {'source': "Internet", 'label': "Green Square", 'file': "green-square-16.png"},
            {'source': "Peter Carolin", 'label': "Clouds Impression", 'file': "clouds-impression.png"},
            {'source': "Peter Carolin", 'label': "Lassen Volcano", 'file': "lassen-volcano.jpg"}
        ]
    for image in images:
        # File to open
        image['filename'] = path / image['file']  # file with path
    return images

# Image scaled to baseWidth of 320, keeping the aspect ratio
def scale_image(img):
    baseWidth = 320
    scalePercent = (baseWidth/float(img.size[0]))
    scaleHeight = int((float(img.size[1])*float(scalePercent)))
    scale = (baseWidth, scaleHeight)
    return img.resize(scale)

# PIL image converted to base64
def image_to_base64(img, format):
    with BytesIO() as buffer:
        img.save(buffer, format)
        return base64.b64encode(buffer.getvalue()).decode()

# Set Properties of Image, Scale, and convert to Base64
def image_management(image):  # image is one dictionary from image_data()
    # Image open return PIL image object
    img = pilImage.open(image['filename'])

    # Python Image Library operations
    image['format'] = img.format
    image['mode'] = img.mode
    image['size'] = img.size
    # Scale the Image
    img = scale_image(img)
    image['pil'] = img
    image['scaled_size'] = img.size
    # Scaled HTML
    image['html'] = '<img src="data:image/png;base64,%s">' % image_to_base64(image['pil'], image['format'])

# Create Grey Scale Base64 representation of Image
def image_management_add_html_grey(image):
    # Reuse the scaled PIL image stored by image_management
    img = image['pil']
    format = image['format']

    img_data = img.getdata()  # Reference https://www.geeksforgeeks.org/python-pil-image-getdata/
    image['data'] = np.array(img_data)  # PIL image to numpy array
    image['gray_data'] = []  # key/value for data converted to gray scale

    # 'data' is a list of RGB data, the list is traversed and each pixel is replaced by its grey average
    for pixel in image['data']:
        # create gray scale of image, ref: https://www.geeksforgeeks.org/convert-a-numpy-array-to-an-image/
        average = (pixel[0] + pixel[1] + pixel[2]) // 3  # average pixel values and use // for integer division
        if len(pixel) > 3:
            image['gray_data'].append((average, average, average, pixel[3]))  # PNG format
        else:
            image['gray_data'].append((average, average, average))
        # end for loop for pixels

    img.putdata(image['gray_data'])
    image['html_grey'] = '<img src="data:image/png;base64,%s">' % image_to_base64(img, format)

# Jupyter Notebook Visualization of Images
if __name__ == "__main__":
    # Load the default image meta data
    images = image_data()

    # Display meta data, scaled view, and grey scale for each image
    for image in images:
        image_management(image)
        print("---- meta data -----")
        print(image['label'])
        print(image['source'])
        print(image['format'])
        print(image['mode'])
        print("Original size: ", image['size'])
        print("Scaled size: ", image['scaled_size'])

        print("-- original image --")
        display(HTML(image['html']))

        print("--- grey image ----")
        image_management_add_html_grey(image)
        display(HTML(image['html_grey']))
    print()
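The arithmetic inside scale_image above can be checked on its own, without opening any files. This sketch is an assumption-labeled variant: the helper name scaled_size is mine, and it uses integer arithmetic instead of the float math in scale_image, which could in principle differ by a pixel for some sizes. The three input sizes are the original sizes reported in the answers above.

```python
def scaled_size(size, base_width=320):
    """Width becomes base_width; height keeps the same proportions."""
    width, height = size
    return (base_width, height * base_width // width)

for size in [(16, 16), (320, 234), (2792, 2094)]:
    new_size = scaled_size(size)
    before = size[0] * size[1]
    after = new_size[0] * new_size[1]
    print(size, "->", new_size, "pixels:", before, "->", after)
```

The pixel counts make the compression question easier to answer: only the largest image loses pixels, while the 16 x 16 image grows.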
Data Structures and OOP
Most data structures classes require Object Oriented Programming (OOP). Since this class is lined up with a College Course, OOP will be talked about often. Functionality in the remainder of this Blog is the same as the prior implementation. Highlight some of the key differences you see between imperative and OOP styles.
- Read imperative and object-oriented programming on Wikipedia
In imperative programming, functions are coded to solve each step/aspect of the problem, whereas OOP focuses on objects (rather than functions/logic) to solve the problem.
- Consider how data is organized in two examples, in relations to procedures
The imperative example defines functions and has clearer iteration, more like a math equation. In contrast, OOP uses the characteristics of the object and changes them accordingly. While I don't fully understand the concept yet, I can more easily recognise an OOP program as opposed to an imperative program.
- Look at Parameters in Imperative and Self in OOP
Both define a variable or object, that will later be manipulated.
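A small side-by-side sketch of the parameter versus self comparison. The ImageInfo class is hypothetical and much smaller than the Image_Data class used in this lab; the dictionary keys match the ones in the lab's image data.

```python
# Imperative style: the data is a dictionary handed to the function as a parameter
def describe(image):
    return "%s by %s" % (image['label'], image['source'])

# OOP style: the data lives on the object and methods reach it through self
class ImageInfo:  # hypothetical class for illustration
    def __init__(self, source, label):
        self._source = source
        self._label = label

    @property
    def label(self):  # read-only access, like the properties in Image_Data
        return self._label

    def describe(self):
        return "%s by %s" % (self._label, self._source)

record = {'source': "Peter Carolin", 'label': "Clouds Impression"}
info = ImageInfo(**record)
print(describe(record))
print(info.describe())
```

Both calls print the same text. The difference is where the data is kept: in the imperative version the caller must pass it in every time, while the object carries its own data.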
Additionally, review all the imports in these three demos. Create a definition of their purpose, specifically these ...
- PIL
Python Imaging Library, this is an image processing package that allows for the modification of images. From adding text to changing the colors, it gives you control over the visuals involved in a program.
- numpy
A Python library used when working with arrays. In the context of images, an image can be changed to an array, modified, and then changed back.
- base64
Binary-to-text encoding in which each character represents 6 bits, so every 24 bits (3 bytes) are written as 4 characters.
from IPython.display import HTML, display
from pathlib import Path  # https://medium.com/@ageitgey/python-3-quick-tip-the-easy-way-to-deal-with-file-paths-on-windows-mac-and-linux-11a072b58d5f
from PIL import Image as pilImage  # as pilImage is used to avoid conflicts
from io import BytesIO
import base64
import numpy as np


class Image_Data:

    def __init__(self, source, label, file, path, baseWidth=320):
        self._source = source  # variables with self prefix become part of the object
        self._label = label
        self._file = file
        self._filename = path / file  # file with path
        self._baseWidth = baseWidth

        # Open image and scale to needs
        self._img = pilImage.open(self._filename)
        self._format = self._img.format
        self._mode = self._img.mode
        self._originalSize = self._img.size
        self.scale_image()
        self._html = self.image_to_html(self._img)
        self._html_grey = self.image_to_html_grey()

    @property
    def source(self):
        return self._source

    @property
    def label(self):
        return self._label

    @property
    def file(self):
        return self._file

    @property
    def filename(self):
        return self._filename

    @property
    def img(self):
        return self._img

    @property
    def format(self):
        return self._format

    @property
    def mode(self):
        return self._mode

    @property
    def originalSize(self):
        return self._originalSize

    @property
    def size(self):
        return self._img.size

    @property
    def html(self):
        return self._html

    @property
    def html_grey(self):
        return self._html_grey

    # Image scaled to baseWidth (default 320), keeping the aspect ratio
    def scale_image(self):
        scalePercent = (self._baseWidth/float(self._img.size[0]))
        scaleHeight = int((float(self._img.size[1])*float(scalePercent)))
        scale = (self._baseWidth, scaleHeight)
        self._img = self._img.resize(scale)

    # PIL image converted to base64
    def image_to_html(self, img):
        with BytesIO() as buffer:
            img.save(buffer, self._format)
            return '<img src="data:image/png;base64,%s">' % base64.b64encode(buffer.getvalue()).decode()

    # Create Grey Scale Base64 representation of Image
    def image_to_html_grey(self):
        img_grey = self._img.copy()  # copy so the scaled color image is not overwritten
        pixels = np.array(self._img.getdata())  # PIL image to numpy array

        grey_data = []  # list of pixels converted to gray scale
        # 'pixels' is a list of RGB data, the list is traversed and each pixel is replaced by its grey average
        for pixel in pixels:
            # create gray scale of image, ref: https://www.geeksforgeeks.org/convert-a-numpy-array-to-an-image/
            average = (pixel[0] + pixel[1] + pixel[2]) // 3  # average pixel values and use // for integer division
            if len(pixel) > 3:
                grey_data.append((average, average, average, pixel[3]))  # PNG format
            else:
                grey_data.append((average, average, average))
            # end for loop for pixels

        img_grey.putdata(grey_data)
        return self.image_to_html(img_grey)


# prepares a series of images, provides expectation for required contents
def image_data(path=Path("images/"), images=None):  # path of static images is defaulted
    if images is None:  # default image
        images = [
            {'source': "Internet", 'label': "Green Square", 'file': "green-square-16.png"},
            {'source': "Peter Carolin", 'label': "Clouds Impression", 'file': "clouds-impression.png"},
            {'source': "Peter Carolin", 'label': "Lassen Volcano", 'file': "lassen-volcano.jpg"},
        ]
    return path, images

# turns data into objects
def image_objects():
    id_Objects = []
    path, images = image_data()
    for image in images:
        id_Objects.append(Image_Data(source=image['source'],
                                     label=image['label'],
                                     file=image['file'],
                                     path=path,
                                     ))
    return id_Objects

# Jupyter Notebook Visualization of Images
if __name__ == "__main__":
    for ido in image_objects():  # ido is an Imaged Data Object

        print("---- meta data -----")
        print(ido.label)
        print(ido.source)
        print(ido.file)
        print(ido.format)
        print(ido.mode)
        print("Original size: ", ido.originalSize)
        print("Scaled size: ", ido.size)

        print("-- scaled image --")
        display(HTML(ido.html))

        print("--- grey image ---")
        display(HTML(ido.html_grey))

    print()
Hacks
Early Seed award
- Add this Blog to your own Blogging site.
- In the Blog add a Happy Face image.
- Have Happy Face Image open when Tech Talk starts, running on localhost. Don't tell anyone. Show to Teacher.
AP Prep
- In the Blog add notes and observations on each code cell that request an answer.
- In blog add College Board practice problems for 2.3
- Choose 2 images, one that will more likely result in lossy data compression and one that is more likely to result in lossless data compression. Explain.
Project Addition
- If your project has images in it, try to implement an image change that has a purpose. (Ex. An item that has been sold out could become gray scale)
Pick a programming paradigm and solve some of the following ...
- Numpy, manipulating pixels. As opposed to Grey Scale treatment, pick a couple of other types like red scale, green scale, or blue scale. We want you to be manipulating pixels in the image.
- Binary and Hexadecimal reports. Convert and produce pixels in binary and Hexadecimal and display.
- Compression and Sizing of images. Look for insights into compression Lossy and Lossless. Look at PIL library and see if there are other things that can be done.
- There are many effects you can do as well with PIL. Blur the image or write Meta Data on screen, aka Title, Author and Image size.
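As a starting point for the pixel manipulation and binary/hexadecimal report hacks, here is a library-free sketch that works on a plain list of pixel tuples. The function names red_scale and pixel_report and the sample pixels are hypothetical; in the lab the pixel list would come from the image data instead.

```python
def red_scale(pixels):
    """Keep only the red value; green and blue become 0, alpha (if any) is kept."""
    return [(p[0], 0, 0) + tuple(p[3:]) for p in pixels]

def pixel_report(pixel):
    """Return the RGB values of one pixel as hexadecimal and binary text."""
    r, g, b = pixel[:3]
    return {
        'hex': "#%02x%02x%02x" % (r, g, b),
        'binary': " ".join(format(value, "08b") for value in (r, g, b)),
    }

sample = [(255, 128, 0), (18, 52, 86, 255)]  # made-up pixels
print(red_scale(sample))
for pixel in sample:
    print(pixel, pixel_report(pixel))
```

Green scale or blue scale would follow the same pattern, keeping a different position of the tuple.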
College Board Hacks
Data Compression
- Which of the following is an advantage of a lossless compression algorithm over a lossy compression algorithm?
A lossless compression algorithm can guarantee reconstruction of original data, while a lossy compression algorithm cannot.
- A user wants to save a data file on an online storage site. The user wants to reduce the size of the file, if possible, and wants to be able to completely restore the file to its original version. Which of the following actions best supports the user’s needs?
Compressing the file using a lossless compression algorithm before uploading it.
- A programmer is developing software for a social media platform. The programmer is planning to use compression when users send attachments to other users. Which of the following is a true statement about the use of compression?
Lossy compression of an image file generally provides a greater reduction in transmission time than lossless compression does.
Using Programs with Data Quiz
- Which of the following expressions will evaluate to true if the book should be counted and evaluates to false otherwise?
(genre = "mystery") AND ((1 ≤ num) AND (cost < 10.00))
- Using only the data collected during the 7-day period, which of the following statements is true?
The total number of items purchased on a given date can be determined by searching the data for all transactions that occurred on the given date and then adding the number of items purchased for each matching transaction.
- Which of the following best explains how the data files in the table can be used to send a targeted e-mail to only those customers who have purchased products that use AA batteries to let them know about the new accessory?
Use the products file to generate a list of product IDs that use AA batteries, then use the list of product IDs to search the purchases file to generate a list of customer IDs, then use the list of customer IDs to search the customers file to generate a list of e-mail addresses
- Assume that applying either of the filters will not change the relative order of the rows remaining in the spreadsheet. Which of the following sequences of steps can be used to identify the desired entry?
Filter by photographer, then filter by year, then sort by year
Sort by subject, then sort by year, then filter by photographer
CORRECTION:
Filter by photographer, then filter by year, then sort by year
Sort by year, then filter by year, then filter by photographer
- Which of the following expressions will evaluate to true if the show should be counted and evaluates to false otherwise?
(genre = "talk") AND ((day = "Saturday") OR (day = "Sunday"))
- Which of the following explains how the two databases can be used to develop the interactive exhibit?
Both databases are needed. Each database can be searched by animal name to find all information to be displayed.
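Two of the quiz answers above translate fairly directly into Python, which can be a way to check the logic. This is only a sketch: the function names, the field names, and all of the sample records are made up for illustration, since the quiz tables themselves are not reproduced here.

```python
# Boolean expressions from the answers above, with AND/OR written as and/or
def count_book(genre, num, cost):
    return (genre == "mystery") and ((1 <= num) and (cost < 10.00))

def count_show(genre, day):
    return (genre == "talk") and ((day == "Saturday") or (day == "Sunday"))

# Hypothetical records standing in for the products, purchases, and customers files
products = [{'product_id': 1, 'battery': "AA"}, {'product_id': 2, 'battery': "AAA"}]
purchases = [{'customer_id': 10, 'product_id': 1},
             {'customer_id': 11, 'product_id': 2},
             {'customer_id': 12, 'product_id': 1}]
customers = [{'customer_id': 10, 'email': "a@example.com"},
             {'customer_id': 11, 'email': "b@example.com"},
             {'customer_id': 12, 'email': "c@example.com"}]

# products -> product IDs -> customer IDs -> e-mail addresses
aa_products = {p['product_id'] for p in products if p['battery'] == "AA"}
buyer_ids = {p['customer_id'] for p in purchases if p['product_id'] in aa_products}
emails = sorted(c['email'] for c in customers if c['customer_id'] in buyer_ids)
print(emails)
```

Each step uses the result of the previous one as its search list, which is the same chain described in the e-mail answer.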
Lossy
JPG like negatives.jpg I use below.
Lossless
PNG like the skin1.png I use below.
from PIL import Image, ImageFont, ImageDraw
# Allows image to be displayed in Jupyter notebook, only run in local
from IPython import display
# Open an Image from images directory
img = Image.open('images/skin1.png')
# Call draw Method to add 2D graphics in an image
draw = ImageDraw.Draw(img)
# Downloaded font, opened zip, dragged to fonts directory
myfont = ImageFont.truetype('fonts/Aloevera.ttf', 68)
# Add Text to an image, adjust coordinates/text/font/color
draw.text((120, 150), "We recommend...", font=myfont, fill=(0, 0, 0))
# Display edited image
img.show()
# Save image in images directory
img.save('images/skin2.png')
# Displays image in notebook
display.Image('images/skin2.png')
from PIL import Image
# numpy for performing batch processing and elementwise
# matrix operations efficiently
import numpy as np
# Allows image to be displayed in Jupyter notebook, only run in local
from IPython import display
display.Image("images/negatives.jpg")
from PIL import Image
# numpy for performing batch processing and elementwise
# matrix operations efficiently
import numpy as np
# Allows image to be displayed in Jupyter notebook, only run in local
from IPython import display
# Opening an image, and saving open image object
img = Image.open(r"images/negatives.jpg")
# Creating a numpy array out of the image object
img_arry = np.array(img)
# Maximum intensity value of the color mode
I_max = 255
# Subtracting each pixel value from 255 (max value possible in a given image
# channel) and storing the result
img_arry = I_max - img_arry
# Creating an image object from the resultant numpy array
inverted_img = Image.fromarray(img_arry)
# Saving the image under the name negatives1.jpg
inverted_img.save(r"images/negatives1.jpg")
# Displays image in notebook
print("after image:")
display.Image("images/negatives1.jpg")
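The inversion rule used above (subtract each value from 255) can be tried on a few numbers without any image file. The helper name invert and the sample values are made up for this sketch.

```python
I_MAX = 255  # maximum value of an 8-bit color channel

def invert(values):
    """Negative of each channel value: dark becomes light and light becomes dark."""
    return [I_MAX - value for value in values]

sample = [0, 64, 200, 255]  # made-up channel values
negative = invert(sample)
print(negative)
print("inverting twice restores the original:", invert(negative) == sample)
```

The inversion itself loses no information, since applying it twice gives back the starting values. Saving the result as a JPG is a separate step, and because JPG is lossy, the saved file may not match these values exactly.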