2.2 Data Compression Notes

Image Files and Size

  • We shrink files in order to save speed and storage.
    • Reduction of bits saves storage and saves transmision time
  • Lossy data reduces data but the original is not recovered
  • Lossless data lets you restore and recover
  • Often when we work with files we use c: and cd to take us towards a certain directory

Questions

  • Describe some of the meta data and considerations when managing Image files. Describe how these relate to Data Compression …
    • File Type, PNG and JPG are two types used in this lab
      • It is important to know the file type of the image file, such as PNG or JPG, as different file types have different compression techniques and levels of compression
    • Size, height and width, number of pixels
      • Size, height and width, number of pixels: The size of an image file is measured in terms of the number of pixels it contains, which is determined by its height and width.
      • The more pixels an image has, the larger its file size will be. It is important to consider the size of the image file as it can affect the loading time of the image.
    • Visual perception, lossy compression
      • When compressing image files, it is important to consider how the compression will affect the visual quality of the image. Lossy compression techniques, which reduce file size by discarding some of the image data, can result in a loss of visual quality that may be noticeable to the human eye.
      • Data compression is the process of reducing the size of an image file by removing or compressing redundant information.
        • This can be achieved through lossy or lossless compression techniques. Lossy compression techniques discard some of the image data, resulting in a smaller file size but reduced image quality, while lossless compression techniques maintain image quality but may not achieve as much compression.

Displaying images in Python Jupyter notebook

Python Libraries and Concepts used for Jupyter and Files/Directories

IPython

Support visualization of data in Jupyter notebooks. Visualization is specific to View, for the web visualization needs to be converted to HTML.

pathlib

File paths are different on Windows versus Mac and Linux. This can cause problems in a project as you work and deploy on different Operating Systems (OS’s), pathlib is a solution to this problem.

Questions

  • What are commands you use in terminal to access files?
    • ls - list the contents of the current directory
    • cd - change directory to a specified location
    • pwd - print the current working directory
    • mkdir - create a new directory
    • rm - remove a file or directory
    • cp - copy a file or directory
    • mv - move a file or directory
  • What are the command you use in Windows terminal to access files?
    • dir - list the contents of the current directory
    • cd - change directory to a specified location
    • cd.. - move up one directory level
    • md - create a new directory
    • rd - remove a directory
    • copy - copy a file or directory
    • move - move a file or directory
  • What are some of the major differences?
    • One of the major differences between the two sets of commands is the syntax used. In Linux/Unix, commands and file paths are typically separated by forward slashes, while in Windows, they are separated by backslashes.
    • Additionally, Linux/Unix commands are often shorter and more concise, using single-letter flags to modify their behavior, while Windows commands are often longer and more descriptive, using full words and phrases to specify options and parameters.
    • Another difference is that Windows commands are typically case-insensitive, while Linux/Unix commands are case-sensitive.
  • Why is path a big deal when working with images?
    • Path is a big deal when working with images because it allows the software or application to locate the image file on the computer or network. Without the correct path, the software or application will not be able to find and display the image. This is particularly important when working with images in programming, where the path is specified in code and needs to be accurate.
  • How does the meta data source and label relate to Unit 5 topics?
    • The metadata source and label are important in Unit 5 topics, particularly in the context of supervised learning. Metadata such as image dimensions, file type, and compression type can provide useful information about the image and help in preprocessing and feature extraction.
    • Labels can also provide important information about the image, such as the object or category it belongs to, which can be used to train and test machine learning models.
  • Look up IPython, describe why this is interesting in Jupyter Notebooks for both Pandas and Images?
    • IPython is an interactive command-line shell for Python that provides enhanced functionality, such as tab completion, object introspection, and inline plotting. It is particularly useful in Jupyter Notebooks, where it allows for interactive exploration and analysis of data.
      • In Jupyter Notebooks, IPython can be used with both Pandas and Images.
        • With Pandas, IPython allows for interactive data exploration and manipulation, making it easier to analyze and clean data.
        • With Images, IPython can be used to display and manipulate images inline, allowing for real-time visualization and preprocessing of images. This can be especially useful in computer vision tasks where image preprocessing and analysis is an important part of the workflow.

Reading and Encoding Images

  • When we are now working on this data and images unit

Base 64

  • converts binary encoded data into an encoded scheme of 24 bits and 6 base64 bits.
    • It is used to transport and embed binary images into textual assets such as HTML and CSS
    • First 3 letters of my name in Base64: TGlh

      Numpy

  • an artificial intelligence data science tool
    • Mort uses them to manipulate and grab images

      io, BytesIO

  • Input and Output (I/O) is a fundamental of all Computer Programming. Input/output (I/O) buffering is a technique used to optimize I/O operations.
    • In large quantities of data, how many frames of input the server currently has queued is the buffer.

      Questions

  • Where have you been a consumer of buffering?
    • As a consumer, buffering is a common issue that I often encounter. Buffering occurs when the playback of media content is interrupted due to slow internet connection, network congestion, or limited system resources. When buffering occurs, the media content will stop playing and a loading icon or message may be displayed while the content is being buffered. This is really annoying and a big part of my digital expirience.
  • From your consumer experience, what effects have you experienced from buffering?
    • Interruptions in media playback: Buffering can cause interruptions in media playback, which can be frustrating for the consumer, especially if the content is being streamed in real-time or contains important information.
    • Delayed or slow playback: Buffering can cause delayed or slow playback, which can negatively affect the user’s experience and perception of the quality of the media content.
    • Reduced image quality: Buffering can cause the image quality of media content to be reduced due to compression and other techniques used to speed up buffering.
  • How do these effects apply to images?
    • In the context of images, buffering may occur when displaying large images or when loading images from a slow network or storage device.
    • This can result in delayed or slow image loading, which can affect the user’s experience and perception of the image quality.
    • In addition, images that are compressed or resized for faster loading may lose some of their quality and detail, which can negatively impact the user’s viewing experience.

Data Structures, Imperative Programming Style, and Working with Images

Most data structures classes require Object Oriented Programming (OOP). Since this class is lined up with a College Course, OOP will be talked about often. Functionality in remainder of this Blog is the same as the prior implementation

Imperative Programing

  • a programming paradigm that emphasizes giving the computer a set of instructions to follow, which are executed sequentially.
    • It is often compared to a list of steps that a person might take to complete a task.
  • Imperative programming languages focus on the “how” of a task rather than the “what” or “why”. Examples of imperative programming languages include C, Fortran, and Assembly

OOP

  • a programming paradigm that focuses on objects as the fundamental unit of programming. OOP is based on the idea that objects are made up of data and behavior, and that objects interact with each other to perform tasks.
    • The key features of OOP include encapsulation, inheritance, and polymorphism.
    • Examples of OOP languages include Java, Python, and Ruby.

Differences

  • Data and behavior: In imperative programming, data and behavior are often separate, and functions or procedures manipulate data. In OOP, data and behavior are encapsulated within objects.
  • Procedural vs. Object-Oriented: Imperative programming is often procedural, meaning that it relies on a series of procedures or functions to accomplish a task. OOP is based on objects that interact with each other to accomplish a task.
  • Abstraction: OOP provides abstraction, meaning that objects can hide their internal workings and expose only a public interface. Imperative programming is less abstract, and typically exposes more of the internal workings of a program.
  • Flexibility: OOP is generally more flexible and adaptable than imperative programming, due to its ability to encapsulate data and behavior within objects. Imperative programming can be more rigid, as it is often based on a series of procedures or functions that are executed sequentially.

Questions

  • Does this code seem like a series of steps are being performed?
  • Describe Grey Scale algorithm in English or Pseudo code?
    • A grayscale algorithm converts a color image into a black and white image by removing the color information and retaining only the luminance or brightness values of each pixel. The algorithm works as follows:
      1. Load the color image into memory.
      2. Iterate through each pixel of the image.
      3. For each pixel, calculate the grayscale value using the formula:
      4. grayscale = (0.299 * red) + (0.587 * green) + (0.114 * blue) where red, green, and blue are the color values of the pixel.
      5. Assign the calculated grayscale value to the pixel.
      6. Save the grayscale image to disk.
  • Describe scale image? What is before and after on pixels in three images?
    • A scaled image is an image that has been resized to either be smaller or larger than its original size. Scaling an image involves changing the number of pixels that make up the image, which can affect its overall quality and clarity.
    • In order to scale an image, a mathematical formula is used to determine how the pixels should be adjusted.
    • When scaling an image to be larger, new pixels are added to the image to increase its size. Conversely, when scaling an image to be smaller, some of the original pixels are removed or combined to decrease its size.
    • Examples:
      • Image 1: Original size: 1000 x 1000 pixels Scaled size: 500 x 500 pixels (50% of original size)

      • Image 2: Original size: 1000 x 1000 pixels Scaled size: 2000 x 2000 pixels (200% of original size)

      • Image 3: Original size: 1000 x 1000 pixels Scaled size: 100 x 100 pixels (10% of original size)

    • Is scale image a type of compression? If so, line it up with College Board terms described?
      • Scaling an image is not considered a type of compression because it does not involve any reduction in the file size of the image. When an image is compressed, the file size is reduced by either removing some of the data or by encoding the data in a more efficient way.
        • Compression is usually used to reduce the file size of an image to make it easier to store, transmit or load.
      • On the other hand, scaling an image is simply a process of resizing the image to make it larger or smaller without reducing its file size or changing its format. Scaling an image can, however, impact the visual quality of the image.
        • If the image is scaled down, the reduction in the number of pixels may lead to loss of details or blurriness. If the image is scaled up, it may result in pixelation or distortion.

2.2 Data Compression Hacks