Notes

2.2

  • Files are shrunk in order to reduce the storage they take up and therefore make work faster
  • Lossy data shrinks data but is unable to recover the original data
  • Lossless data shrinks the data and is able to recover the original
  • It is important to be able to navigate through directories
  • We use various methods to go through directories on Windows and Mac

2.3

  • Pandas and dataframes are things which people can use to create their careers
  • Pandas is python tool which lets us work with dataframes
  • Headers are usually metadata and the inner area is data(more essential)
  • It is important to clean data in order to analyze and use it
  • De Morgan’s Law: the complement of the product of all the terms is equal to the sum of the complement of each term
  • When using dataframes which are long, head refers to the front of the list and tail is the end of the list

Collegeboard Questions

I completed all the collegeboard assignment and here is my reflection on the questions. I tried thinking of different applications of the knowledge.

Data Compression

Q1

  • Q: Which of the following is an advantage of a lossless compression algorithm over a lossy compression algorithm?
  • A: A lossless compression algorithm can guarantee reconstruction of original data, while a lossy compression algorithm cannot.
  • This means that the lossless compression algorithm will be able to save all the data of a file, while a lossy compression algorithm will maintain mostly the essential information.
  • This can be applied to how a lossy compression algorithm can be used to reduce files to the smallest size possible and lossless compression algorithms can be used to maintain the same data of a file even after compression.

Q2

  • Q: A user wants to save a data file on an online storage site. The user wants to reduce the size of the file, if possible, and wants to be able to completely restore the file to its original version. Which of the following actions best supports the user’s needs?
  • A: Compressing the file using a lossless compression algorithm before uploading it
  • A lossless algorithm allows the original file to be restored, which may be the reason of the name of lossless
  • A lossless algorithm can help maintain information of a file but help cut down on file size. This could be used for important things like word processing or database records.

Q3

  • Q: A programmer is developing software for a social media platform. The programmer is planning to use compression when users send attachments to other users. Which of the following is a true statement about the use of compression?
  • A: Lossy compression of an image file generally provides a greater reduction in transmission time than lossless compression does.
  • Lossy compression will reduce the image file to have only the minimum data necessary, but a lossless compression will be able to restore all data. So, it makes sense for lossy images to be able to transmit faster.
  • A lossy compressed image will be used when a lot of files need to be stored and something needs to be reduced to the minimum.

Reflection

Lossy and lossless compression are very important to understand. They both have different implications for a file. They also have different pros and cons for files. Being able to understand when to use the different types of compressions will be important as a developer in order to create quality algorithms which fit my overall purpose well.

Extracting Information from Data

Q1

  • Q: Upon compiling the data, the researcher identifies a problem due to the fact that neither data source uses a unique ID number for each student. Which of the following best describes the problem caused by the lack of unique ID numbers?
  • A: Students who have the same name may be confused with each other.
  • Since both datasets had first name and last name, but no unique identification, it would be possible to confuse two students for the same person.
  • This shows how important it is to have distinct characteristics for each piece of data. Using ID numbers can help solve this problem.

Q2

  • Q: A team of researchers wants to create a program to analyze the amount of pollution reported in roughly 3,000 counties across the United States. The program is intended to combine county data sets and then process the data. Which of the following is most likely to be a challenge in creating the program?
  • A: Different counties may organize data in different ways.
  • Not all areas would have the same way of ordering and organizing data, so there could be different forms of data sets.
  • This is why data cleaning is important for data collectors, in order to make all the data in the same style so that programs will be able to easily run on them.

Q3

  • Q: A student is creating a Web site that is intended to display information about a city based on a city name that a user enters in a text field. Which of the following are likely to be challenges associated with processing city names that users might provide as input?
  • A: Users might enter abbreviations for the names of cities.; Users might misspell the name of the city.
  • There are many types of errors which can arise when taking in user input. So, it is important to have checks in order to make sure that there is no garbage which will be able to get into the data set.
  • Implementing checks will be able to ensure that the user gets the best experience.

Q4

  • Q: Which of the following additional pieces of information would be most useful in determining the artist with the greatest attendance during a particular month?
  • A: Average ticket price
  • Given that the total dollar amount of all tickets sold is given, dividing that number by average ticket price would give a good estimate of the amount of tickets which are sold.

Q5

  • Q: A camera mounted on the dashboard of a car captures an image of the view from the driver’s seat every second. Each image is stored as data. Along with each image, the camera also captures and stores the car’s speed, the date and time, and the car’s GPS location as metadata. Which of the following can best be determined using only the data and none of the metadata?
  • A: The number of bicycles the car passed on a particular day
  • The dashboard of the car will be able to tell how much distance is traveled, the time, or the speed. These can all be estimated, but will never be accurate. However, the number of bicycles passed will be able to be compiled from the photos, thus making this the correct answer.
  • This shows how some important things like metadata may not be able to be accessed from the direct photo, but from metadata, while other things can only be found in the photo. Thus, it is important to be able to identify where to get the data from.

Q6

  • Q: Which of the following questions about the students who responded to the survey can the teacher answer by analyzing the survey results? Do students who enjoy the subject material tend to spend more time on homework each night than the other students do? Do students who spend more time on homework each night tend to spend less time studying for tests than the other students do? Do students who spend more time studying for tests tend to earn higher grades in the class than the other students do?
  • A: I and II
  • The last question can’t be answered because the data set does not include a way to reference the answers to a student, since it is anonymous. So, the third question can’t be answered.
  • It is important to be able to tell what type of data is stored in a data set to ensure there are efficient programs being used.

Reflection

These questions showed me the multiple ways that data can be received. They also show the importance of being able to go through databases and get the data which you want. This is essential to creating the algorithms which you want, because you want to be able to understand what the data you have can and can’t do.

Using Programs

Q1

  • Q: For a given row in the spreadsheet, suppose genre contains the genre as a string, num contains the number of copies in stock as a number, and cost contains the cost as a number. Which of the following expressions will evaluate to true if the book should be counted and evaluates to false otherwise?
  • A: (genre = “mystery”) AND ((1 ≤ num) AND (cost < 10.00))
  • This makes sure that all the necessary requirements are met through AND conditionals/

Q2

  • Q: Using only the data collected during the 7-day period, which of the following statements is true?
  • A: The total number of items purchased on a given date can be determined by searching the data for all transactions that occurred on the given date and then adding the number of items purchased for each matching transaction.
  • This is the only answer choice which uses data from the given dataset, so it is the only correct answer.

Q3

  • Q: A new rechargeable battery pack is available for products that use AA batteries. Which of the following best explains how the data files in the table can be used to send a targeted e-mail to only those customers who have purchased products that use AA batteries to let them know about the new accessory?
  • A: Use the products file to generate a list of product IDs that use AA batteries, then use the list of product IDs to search the purchases file to generate a list of customer IDs, then use the list of customer IDs to search the customers file to generate a list of e-mail addresses
  • This answer specifically finds customers who have purchased products using AA batteries and emails them. This is important to a efficient algorithm which helps both the business and the consumers.

Q4

  • Q: Which of the following sequences of steps can be used to identify the desired entry?
  • A: Filter by photographer, then filter by year, then sort by year; Sort by year, then filter by year, then filter by photographer
  • These both make sure that the picture selected is of a confirmed photographer and year. This ensures all requirements are met.

Q5

  • Q: For a given row in the spreadsheet, suppose genre contains the genre as a string and day contains the day as a string. Which of the following expressions will evaluate to true if the show should be counted and evaluates to false otherwise?
  • A: (genre = “talk”) AND ((day = “Saturday”) OR (day = “Sunday”))
  • This answer uses both AND and OR to meet the required conditions. This is important because one has to happen and one is either or.

Q6

  • Q: Which of the following explains how the two databases can be used to develop the interactive exhibit?
  • A: Both databases are needed. Each database can be searched by animal name to find all information to be displayed.
  • Each database has part of the information needed, but not all of it, so they need to be combined in order to help the interactive exhibit work.

Reflection

The usage of programs is very useful with databases. They help to be able to efficiently go through and get the data required. However, it is also essential to write good code which meets the requirements. It is also necessary to have a good understanding of the information the database can give you, in order to make sure you can get the most out of your code with good accuracy.