Lecture 10

Trust Models

  • A trust model describes, in computing terms, how much we trust something we receive over the internet, and on what basis. For example, do we trust that a piece of software is what it says it is? Do we trust that a provider delivers a service the way they describe, without doing other things behind the scenes?

Reflections on Trusting Trust

  • This paper was written in 1984 by Ken Thompson, one of the inventors of the Unix operating system.

  • Thompson begins the paper by discussing a program that, when run, prints out its own source code; such a program is typically called a quine in computer science. Such programs can be written in just a few lines.
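  • As an illustration (in Python rather than the C of Thompson’s paper), here is a two-line quine: running it prints exactly its own two lines. It deliberately has no comments, because a comment would not be reproduced in the output. The trick is that the string s serves twice: once as the text to print, and once, through %r, as the quoted source of itself.

```python
s = 's = %r\nprint(s %% s)'
print(s % s)
```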

  • Thompson then discusses how a program can be taught something new. His example is the C compiler, which is itself written in C and is therefore compiled by an earlier version of itself. Remember that compilers translate source code, what we write, into machine code: the zeros and ones that a computer’s processor actually executes.

    • To show how the compiler can learn, he teaches it a new escape sequence, \v, which stands for the vertical tab character. A vertical tab moves the print position down (to the next vertical tab stop) without returning to the beginning of the line, as a newline would.

    • In the paper, he walks through how one teaches the compiler what \v means. The catch is that the compiler’s own source code cannot use \v until some compiler already understands it, so he first writes the character’s numeric ASCII value (11), compiles that, and only then changes the source to use \v.

    • This works because we can write the compiler in C, compile it into a binary (a program that the computer can execute), and then use that newly created compiler binary to compile other C programs, including the next version of the compiler itself.

    • Overall, once the compiler binary has learned what \v means, every C program it compiles can use \v, even though the number 11 no longer appears anywhere in the compiler’s source code.
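  • The two stages can be modeled in a few lines of Python. This is only a sketch under an obvious limitation: Python already understands \v, so it cannot reproduce the chicken-and-egg problem itself. The hypothetical functions read_escape_v1 and read_escape_v2 stand for the escape-handling part of two successive versions of the compiler’s source, loosely following the fragment in Thompson’s paper; both produce the same character, but only the first spells out the number 11.

```python
# Hypothetical model of two successive versions of a compiler's
# escape-sequence handling (loosely after Thompson's C fragment).

def read_escape_v1(c: str) -> str:
    """Version 1: '\\v' is not understood yet, so use the raw ASCII code."""
    if c == "n":
        return "\n"
    if c == "v":
        return chr(11)   # vertical tab, spelled as its ASCII value
    raise ValueError(f"unknown escape sequence: \\{c}")


def read_escape_v2(c: str) -> str:
    """Version 2: compiled by version 1, so '\\v' can now be used directly."""
    if c == "n":
        return "\n"
    if c == "v":
        return "\v"      # the number 11 no longer appears in the source
    raise ValueError(f"unknown escape sequence: \\{c}")


print(repr(read_escape_v1("v")), repr(read_escape_v2("v")))  # '\x0b' '\x0b'
```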

  • Thompson then discusses the possibility of a computer or compiler doing more than, for example, just adding a vertical tab character.

    • For example, say that while teaching the compiler something new, we secretly add a bug (Thompson calls it a Trojan horse) to its source code. In Thompson’s paper, the bug recognizes when the compiler is compiling the Unix login command and inserts a backdoor that accepts a secret password; a second bug recognizes when the compiler is compiling itself and inserts both bugs into the new compiler binary. Once such a binary exists, the bugs can be deleted from the compiler’s source code, so no trace of them appears in any source anyone can read, yet every compiler built with that binary keeps reinserting them.
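  • A toy sketch can make the Trojan-horse idea concrete. It is written in Python rather than C, is not Thompson’s code, and uses hypothetical names: compile_source plays the compiled compiler, whose “output” is just the program text (possibly altered), and check_password plays the login program. Following Thompson’s paper, the Trojan plants a master password in the login program; the second half of his attack, where the compiler copies the Trojan into new versions of itself, appears only as a placeholder comment, because doing it for real requires the quine technique described earlier.

```python
# Toy model of a Trojan-horse compiler (hypothetical; not Thompson's code).
# "Compiling" just returns the program text, possibly with extra code added.

BACKDOOR = (
    '    if password == "letmein":  # hidden master password\n'
    "        return True\n"
)


def compile_source(source: str) -> str:
    """Return the 'compiled' program, secretly altered in two cases."""
    # Case 1: compiling the login program -> plant the master password
    # right after the function's first line.
    if "def check_password(" in source:
        header, body = source.split(":\n", 1)
        return header + ":\n" + BACKDOOR + body
    # Case 2: compiling the compiler itself -> re-insert the Trojan logic.
    # A real attack would copy this function's own code (the quine trick);
    # this sketch only marks where that copy would go.
    if "def compile_source(" in source:
        return source + "# <Trojan logic re-inserted here>\n"
    # Every other program is compiled faithfully.
    return source


# The login program's source is clean: no backdoor appears in it.
LOGIN_SOURCE = (
    "def check_password(user, password):\n"
    "    return password == stored_password(user)\n"
)

print(compile_source(LOGIN_SOURCE))
```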
  • Ultimately, the question is: is it possible to ever trust software written by someone else? Let’s look at some examples of software that we use on trust, trusting that its code functions as claimed and that it handles our information properly.

Samsung Smart TV Privacy Policy Supplement

  • A few years ago, when people discovered just how much information they were sharing with Samsung via their Smart TVs, it became a mildly scandalous news story.

  • The Samsung Smart TV can respond to voice commands, such as turning up the volume or changing the channel. To do so, it records those commands and transmits them to a third-party speech-recognition service, which may store them to improve how well it understands future commands.

  • In Samsung’s policy, it states that the device will collect IP addresses, cookies, the hardware and software configuration, and browser information.

  • Is this information different from the information we share through our browsers, though? When a browser sends a request, the server sees our IP address (from the network connection itself, and from which an approximate geographic location can often be inferred), and the HTTP headers typically reveal our operating system and browser (via the User-Agent header).

  • Samsung also allows for gesture controls. These can help people who are visually impaired or who are unable to use a remote control, since they can simply wave or make certain gestures to operate the TV. To support gesture controls, Samsung might capture faces, movements, and perhaps even details of the room. This raises several questions: do we trust Samsung? Is there a way to ensure that Samsung handles our data properly? Should there be a way for us to verify this, or is that information proprietary to Samsung?
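  • To see what a browser typically reveals through the HTTP headers mentioned above, here is a small Python sketch that splits a raw request into its headers. The request and its values are made up for illustration, and real servers use a library parser rather than code like this. Notice that the IP address is not among the headers: the server learns it from the network connection itself.

```python
# A made-up HTTP request, as a server might receive it (values are hypothetical).
RAW_REQUEST = (
    "GET /shows HTTP/1.1\r\n"
    "Host: www.example.com\r\n"
    "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBrowser/1.0\r\n"
    "Accept-Language: en-US,en;q=0.9\r\n"
    "Cookie: session=3f9a1c\r\n"
    "\r\n"
)


def parse_headers(raw: str) -> dict:
    """Return the request's headers as a {name: value} dictionary."""
    head = raw.split("\r\n\r\n", 1)[0]   # headers end at the first blank line
    lines = head.split("\r\n")[1:]       # skip the request line ("GET ...")
    headers = {}
    for line in lines:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return headers


headers = parse_headers(RAW_REQUEST)
print(headers["User-Agent"])       # operating system and browser
print(headers["Accept-Language"])  # preferred languages
```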

Intel Management Engine (IME)

  • The Intel Management Engine is a separate subsystem built into many Intel chipsets, running its own firmware. Through features such as Intel Active Management Technology, it lets network administrators access a computer remotely and issue commands to it, for example when something goes wrong. The computer listens for these commands on specific network ports.

  • If the computer is listening on a port, how can we be sure that a request it receives there is legitimate, that is, sent by an authorized administrator rather than an attacker? Without Intel’s code, we cannot truly understand how this software runs on our machines. Should Intel, then, be required to reveal that code?

    • Those who argue the affirmative might say that we have a right to know what programs are running on our computers, but those who disagree might argue that this is Intel’s intellectual property.
  • Intel also provides a tool that tells us whether our IME is configured in a way that exposes us to potential remote access. Should we trust the result of a tool that Intel itself has provided?
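  • To make “listening on a port” concrete, here is a minimal sketch using Python’s standard socket module. The helper is_listening is hypothetical: it only checks whether something accepts TCP connections on a given host and port, and reveals nothing about what the listening software does with the commands it receives, which is exactly the trust problem above. The demonstration opens its own listener on a free local port rather than probing a real machine.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something on host accepts TCP connections on port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


# Demonstration: open our own listener on a free port, then probe it.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))   # port 0 asks the OS for any free port
server.listen()
port = server.getsockname()[1]
print(port, is_listening("127.0.0.1", port))
server.close()
```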

Open-source Software and Licensing

  • Open source software is software whose source code is publicly released under a license that allows anyone to use, change, and distribute it, subject to the license’s terms.

GPLv3

  • GPLv3, or GNU General Public License version 3, is a copyleft license, and it is often criticized for that reason. Under a copyleft license, if someone incorporates GPL-licensed code into their own program and distributes that program, they must release the whole program’s source code under the GPL as well, and they may not impose further restrictions on its use or redistribution. The goal is to improve the community, since others can then use and modify that source code too.

  • This can introduce dangers as well. For example, suppose a company has just come up with an amazing idea that will transform the market. The last snippet of code they need to implement this idea has been found online! But it is GPL licensed. If they include this snippet in their product and distribute it, their whole project must be released under the GPL. They can still sell their product, but their profitability may decrease, because the source code must be made available and anyone who receives it is free to share it.

  • This is sometimes called the “GPL virus” because of how far it spreads: as soon as GPL-licensed code is incorporated into a program that is distributed, the whole program must be GPL licensed.

LGPLv3

  • LGPL is the GNU Lesser General Public License. If code is LGPL licensed, then any modifications made to that specific code must also be LGPL licensed, but a program that merely uses that code (for example, by linking to an LGPL library) does not have to be. The rest of the program can be licensed under other terms, including terms that are not open source at all.

  • This still benefits the community since changes made to the original LGPL code are open sourced.

MIT License

  • The MIT license is one of the most permissive licenses available.

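  • Code under the MIT license can be taken and changed however one likes, and these changes do not have to be re-licensed; the main requirement is that the original copyright and license notice be kept in copies of the code.

  • The MIT license is the most common license among GitHub projects that specify one.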

Other Licenses

  • In CS50, course materials are licensed under a Creative Commons license (Attribution-NonCommercial-ShareAlike), which is similar in spirit to the GPL: people who adapt the material must share their changes under the same license, and they may not use the material for commercial purposes.

Dealing with Emergent (Disruptive) Technologies

  • In this section, we’ll discuss and ask questions about how the law might keep up with emergent technologies. Sometimes these technologies are referred to as disruptive, since they materially affect the way that we interact with technology.

3D Printing

  • In 2D printing, a print head moves left and right across a piece of paper, providing x-axis movement, and sprays out ink. The paper is advanced by a feeder, providing y-axis movement.

  • In 3D printing, instead of ink, the print head uses a plastic-based filament that is heated to just above its melting point, so that it hardens quickly once it is deposited onto a surface. The print head can move across the surface, similar to a 2D printer, and it can additionally move up and down (on some printers, the build platform moves instead), providing z-axis movement, so the object is built up one layer at a time.
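  • The x-, y-, and z-axis movements can be sketched as a list of printer commands. The hypothetical Python function below emits simplified G-code, the text command format that many desktop 3D printers accept, in which G1 means “move in a straight line”; it traces the outline of a hollow square one layer at a time. Real print files also specify how much filament to extrude, temperatures, and speeds, all omitted here.

```python
def square_outline_moves(side_mm: float, height_mm: float, layer_mm: float = 0.2) -> list:
    """Return simplified G-code moves that build a hollow square layer by layer."""
    corners = [(0, 0), (side_mm, 0), (side_mm, side_mm), (0, side_mm), (0, 0)]
    layers = round(height_mm / layer_mm)
    moves = []
    for n in range(1, layers + 1):
        z = round(n * layer_mm, 3)
        moves.append(f"G1 Z{z}")            # z-axis: move up one layer
        for x, y in corners:
            moves.append(f"G1 X{x} Y{y}")   # x/y axes: trace the outline
    return moves


for move in square_outline_moves(side_mm=10, height_mm=0.4):
    print(move)
```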

  • 3D printing is considered a disruptive technology because it allows people to create items that they may not otherwise have access to.

    • For example, it is possible to 3D print a plastic gun that can fire plastic or metal bullets, and such a gun may be able to evade metal detection.
  • How might the law contend with this new technology?

    • We might allow people to do whatever they want with the technology and decide after the fact whether a particular use was acceptable.

    • We might instead be proactive, trying to prevent the production of things that we consider unethical to produce.

    • For example, current 3D printing technologies allow us to print with metal, and researchers are experimenting with printing living human cells, with the goal of creating tissues and organs. Do we want people to create these things freely, or should we regulate their production beforehand?

  • The article also discusses the concept of immunizing intermediaries, that is, shielding them from liability. Should manufacturers of 3D printers, and designers of the CAD (computer-aided design) files that describe the objects to be printed, be held accountable for misuse of 3D printing?

  • Finally, the article discusses the possibility of letting the 3D printing industry regulate itself.

    • Attorneys self-regulate, and the system seems to work well. However, social media companies similarly self-regulate; they have had limited success. Would self-regulation be successful in the 3D printing industry?
  • 3D printing also lends itself to violating copyrights, patents, and trademarks, and earlier technologies raised similar issues. Napster, a peer-to-peer music file-sharing service, was shut down after losing a copyright lawsuit. Sony was sued over its VCRs, which let people record copyrighted TV broadcasts to watch later (“time-shifting”); in 1984 the U.S. Supreme Court ruled in Sony’s favor, holding that such time-shifting was fair use.

  • If interested in the implications of 3D printing, check out these articles: “Guns, Limbs, and Toys: What Future for 3D Printing?” and “The Law and 3D Printing.”

Virtual and Augmented Reality (VR/AR)

  • Augmented reality involves superimposing graphical images onto the real world. A popular example of Augmented Reality is Pokemon Go; the mobile game allows you to view and catch Pokemon in your living room and outdoors.

  • Virtual reality is an immersive alternate reality experience. The real world around you disappears as you put on a headset. Headphones and audio allow for an even more immersive experience.

  • VR and AR allow for interaction in these digital worlds. Studies have shown that people have true, realistic feelings in these alternate realities. What would happen if someone were to commit a crime in one of these worlds?

    • If someone were to pull a gun on another person in one of these alternate worlds, would that qualify as assault? There is a perception of potential harm, but does the potential for bodily harm actually exist?

    • If malicious code altered the augmented reality we view, how would that be considered under the law? AR GPS technologies could lend themselves to this form of hacking and lead people astray into potentially dangerous circumstances.

    • Interactions can often occur across different nations. Which nation would have jurisdiction? If the crime occurs virtually, do courts have jurisdiction over the human player or solely the in-game avatar?

  • Crimes can also be technologically driven. Doxxing involves revealing someone’s personal information over the internet in an attempt to harm that individual. Swatting involves reporting a fake emergency to the police, who then send a SWAT team (hence the name) to an innocent person’s home.

  • There are some benefits for the law in virtual reality. Virtual crimes allow for actions to be more closely tracked. IP addresses can allow for investigators to more easily track down the perpetrators. People can be muted in virtual reality, thus avoiding certain possible instances of harassment.

Digital Privacy: Tracking

  • It is difficult for consumers to understand exactly what digital data is being gathered about them by corporations, particularly in the United States. Most consumers simply accept that they are being tracked when they are using the Internet. Should that be an essential part of using the Internet?

  • Cookies allow a site to remember a user who has already logged in, so a returning user does not have to enter their login credentials again every time they visit a login-protected site. But the same cookies that identify a user can also be used to track that user’s activity on the internet.

    • If this cookie is linked to a user’s IP address, the user’s activity could even be linked to a specific geographical area. Something as simple as activity on a webpage could lead to targeted “snail mail” advertisements at their home. How do we feel about that sort of invasion of privacy? We could change the way IP addresses function, but is the potential for physical advertising enough to warrant such a change?
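  • The login-cookie mechanism above can be sketched with Python’s standard http.cookies module. The in-memory sessions and visits dictionaries and the helpers log_in and handle_request are hypothetical; real sites keep sessions in a database behind a web framework. The sketch shows that the same random identifier that spares a user from logging in again also lets the site keep a record of every page they view.

```python
import secrets
from http.cookies import SimpleCookie

sessions = {}   # hypothetical store: session token -> username
visits = {}     # hypothetical tracking log: username -> pages viewed


def log_in(username: str) -> str:
    """After a successful password check, issue a random session token."""
    token = secrets.token_hex(16)           # hard-to-guess identifier
    sessions[token] = username
    cookie = SimpleCookie()
    cookie["session"] = token
    cookie["session"]["httponly"] = True    # hide the cookie from page scripts
    return cookie.output()                  # "Set-Cookie: session=...; HttpOnly"


def handle_request(cookie_header: str, page: str):
    """Identify the user from the Cookie header, and log the page viewed."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("session")
    username = sessions.get(morsel.value) if morsel is not None else None
    if username is not None:
        visits.setdefault(username, []).append(page)   # tracking happens here
    return username


set_cookie = log_in("alice")
token = set_cookie.split("session=", 1)[1].split(";")[0]
# The browser sends the token back automatically on later requests.
print(handle_request(f"session={token}", "/shoes"))  # alice
print(visits)
```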

EFF: The Problem with Mobile Phones

  • Mobile phone tracking is often considered more invasive and dangerous than other forms of digital tracking. We carry our mobile devices with us wherever we go, which allows us to be easily tracked and pinpointed at specific locations. Mobile phones also quickly become obsolete: once manufacturers stop providing firmware and security patches, the data on these devices becomes more vulnerable to theft.

  • The tracking described here does not rely on GPS but on cell towers. GPS works in one direction: satellites broadcast signals and the phone’s receiver computes its own position, so the satellites learn nothing about the device. Cell towers, by contrast, are in two-way contact with the phone, and the network can estimate the phone’s location by comparing the strength of its signal as received at several towers.
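  • The tower-based estimate can be sketched as trilateration. Assume the distance from the phone to each of three towers has already been estimated (from signal strength or timing, a noisy step not modeled here); the phone is where the three circles meet. Subtracting the circle equations pairwise leaves two linear equations, solved below with Cramer’s rule. The function name and the flat x/y coordinates are illustrative assumptions; real systems combine many noisy measurements and fit them with least squares.

```python
import math


def trilaterate(towers, distances):
    """Estimate (x, y) from three tower positions and the distance to each."""
    (x1, y1), (x2, y2), (x3, y3) = towers
    r1, r2, r3 = distances
    # Subtracting circle 2 from circle 1, and circle 3 from circle 2,
    # cancels the x**2 and y**2 terms, leaving a*x + b*y = c and d*x + e*y = f.
    a, b = 2 * (x2 - x1), 2 * (y2 - y1)
    c = r1**2 - r2**2 - x1**2 + x2**2 - y1**2 + y2**2
    d, e = 2 * (x3 - x2), 2 * (y3 - y2)
    f = r2**2 - r3**2 - x2**2 + x3**2 - y2**2 + y3**2
    det = a * e - b * d
    if det == 0:
        raise ValueError("towers are collinear; position is ambiguous")
    return ((c * e - b * f) / det, (a * f - c * d) / det)


# Hypothetical towers and a phone actually located at (3, 4).
towers = [(0, 0), (10, 0), (0, 10)]
phone = (3, 4)
distances = [math.dist(phone, t) for t in towers]
print(trilaterate(towers, distances))  # approximately (3.0, 4.0)
```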

  • Are the corporations that produce our mobile devices at the command of the federal government? Do backdoors exist in our operating systems and firmware that would allow unconsented access into our digital activity? What about access to outside activity recorded through device microphones or cameras?

  • While it is relatively easy to mask one’s IP address on a computer by using a VPN, it is much harder to hide one’s identity on a mobile phone. Burner phones are meant to be used only a limited number of times before being thrown away; however, repeated patterns in phone calls, such as calling the same set of numbers, can still be used to identify a person who switches between several devices.

AI, Machine Learning

  • Artificial intelligence and machine learning can help lawyers process large amounts of data and review documents, tasks that are typically given to contract attorneys or first-year associates.

  • Artificial intelligence is often associated with pattern recognition and the potential to make decisions based on those patterns. However, the essential feature of artificial intelligence lies in its ability to mimic the behaviors and operations of the human mind.

  • There are two major ways for artificial intelligences to learn. One is to provide large amounts of data along with the outcomes that this data should map to, so the computer can learn the mapping. The other, exemplified by neuroevolution, involves giving the computer a target (or a way to score its attempts) and allowing it to generate and refine its own candidate solutions until it reaches that objective.

  • Could a computer write Shakespeare? Let’s attempt to program a computer to arrive at the phrase: “a rose by any other name.”

    • We will do this by using a genetic algorithm, which assumes that good traits will propagate into future generations while bad traits are weeded out in each generation, leaving only the good traits in the end. Random variations, or “mutations,” are included so that the population does not stay stuck indefinitely without the traits it needs.

    • The computer will not be provided with any original data set. Instead, it will generate its own data set. To do this, we will create DNA objects; in this case, our DNA objects will be random 24-character strings (the length of our target string).

    • The computer will start with 1000 of these random strings and then determine how fit each string is. Fitness measures how favorable a string’s characteristics are, where favorable characteristics are those we would like to pass on to future generations.

      • In this case, fitness is the fraction of positions at which the generated string has the same character as the target string (the number of matching characters divided by 24).
    • As in genetics, we need a way to produce new strings. After determining which strings are most fit, we perform crossover, in which two strings are combined.

      • In this case, crossover combines the beginning of one string with the end of another, splitting them at a randomly chosen midpoint.
    • When producing the next generation, we also include random mutations: each character in a new string has a small probability (1% in the code below) of being replaced with a random character.

    • After accounting for these functions, we might have a file titled dna.py that looks like this:

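      import random

      # Genes are drawn from the printable ASCII characters (space through "~").
      chars = range(32, 127)
      target = "a rose by any other name"

      # Probability that any single character mutates in a new string.
      mutationRate = 0.01

      class DNA:

          def __init__(self):
              # Start with a random string as long as the target.
              self.genes = [chr(random.choice(chars)) for _ in range(len(target))]
              self.fitness = 0.0

          def update_fitness(self):
              # Fitness = fraction of positions that match the target exactly.
              score = 0
              for i in range(len(self.genes)):
                  if self.genes[i] == target[i]:
                      score += 1
              self.fitness = float(score) / len(target)

          def crossover(self, partner):
              # The child takes the partner's genes up to and including a
              # random midpoint, and this object's genes after it.
              child = DNA()
              midpoint = random.randrange(len(self.genes))
              for i in range(len(self.genes)):
                  if i > midpoint:
                      child.genes[i] = self.genes[i]
                  else:
                      child.genes[i] = partner.genes[i]
              return child

          def mutate(self):
              # Each character is replaced at random with probability mutationRate.
              for i in range(len(self.genes)):
                  if random.random() < mutationRate:
                      self.genes[i] = chr(random.choice(chars))

          def getPhrase(self):
              return ''.join(self.genes)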
    
    • In a separate file called script.py, we’ll use the DNA class we just wrote. The code is shown here:

      import random
      from dna import DNA

      population_size = 1000
      bestScore = 0

      # Generation 0: population_size completely random strings.
      population = [DNA() for _ in range(population_size)]

      while bestScore < 1.0:

          # Score every string, printing each new best string as it is found.
          for member in population:
              member.update_fitness()
              if member.fitness > bestScore:
                  bestScore = member.fitness
                  # Show any DEL (ASCII 127) characters as spaces.
                  print(f"{member.getPhrase()}  score:  {round(bestScore, 3)}".replace(chr(127), " "))

          # Build the mating pool: each string is added int(fitness * 100) times,
          # so fitter strings are more likely to be picked as parents.
          matingPool = []
          for member in population:
              matingPool.extend([member] * int(member.fitness * 100))

          # If no string matched even one character, let every string be a parent.
          if not matingPool:
              matingPool = population

          # Breed the next generation: crossover, then mutation.
          nextGeneration = []
          for _ in range(population_size):
              parentA = random.choice(matingPool)
              parentB = random.choice(matingPool)
              child = parentA.crossover(parentB)
              child.mutate()
              nextGeneration.append(child)
          population = nextGeneration
    
    • Note that the bestScore value starts from 0. As we get closer and closer to the desired string, the score will increase.

    • The population list stores the strings (DNA objects) in the current generation: the ones whose fitness we determine and which we cross over and mutate.

    • As long as we have not found the perfect string, where the bestScore value is 1, we loop through the process over and over again.

      • Within the loop, the matingPool list holds the candidate parents for crossover. Each string is added to it int(fitness * 100) times, so the fittest strings take part in crossover more often than less fit ones.
    • In script.py, note that we print out each new best string and its fitness, so we can watch the strings the computer iterates through. Over time, the computer learns, and the strings become closer and closer to the target string. Because the process is random, the output differs from run to run; one sample run is shown below.

      $ python script.py
      /k(%4  08W*1UC0rFYyXUo@'  score:  0.042
      B_?k/ee^76=xb/X;omR `kW&  score:  0.083
      ?@6K<$~,w]*r2Eft`]C=KaTe  score:  0.125
      Q4+6se1u^F2kz:_5>+y=KaTe  score:  0.167
      ...
      a  Vse by ]ny othe` nam$  score:  0.792
      a rPse!by Gny oth"r name  score:  0.833
      a rose by 9ny other Aa)e  score:  0.875
      a rPse by /ny other name  score:  0.917
      a rose by gny other name  score:  0.958
      a rose by any other name  score:  1.0
    
  • A famous use of machine learning occurred a few years ago with a parking ticket clearing service called “Do Not Pay.” Someone taught a computer how to argue parking tickets on someone’s behalf so they would not have to hire someone to do that same work. The data that the computer learned from were successful and unsuccessful challenges to parking tickets. This service ended up saving hundreds of thousands of dollars in legal fees to challenge parking tickets! So, is it okay for computers to be making these decisions?

Machine Bias

  • There’s a program (COMPAS, the subject of ProPublica’s “Machine Bias” investigation) that judges and prosecutors use when deciding on bail or setting conditions for parole. It produces a score estimating how likely a person is to commit future crimes.

  • The program’s inputs come from questionnaires about the person, and its scores have been found to show racial bias. Although it reportedly does not ask about race directly, it asks about factors such as the person’s socioeconomic status, the languages they speak, and whether their parents have been in prison. Factors like these can act as stand-ins for race and stereotype people in a way that we might not deem acceptable.

  • Only 20% of the people the program predicted would commit violent crimes actually went on to do so, and only about 60% of those predicted to commit any future crime did so, which is only a little better than a 50/50 guess.
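  • The 20% and 60% figures describe the people the program flagged: of those predicted to reoffend, what fraction actually did. The sketch below computes that fraction (often called precision, or positive predictive value) for ten made-up defendants; the data and the function name are purely illustrative.

```python
def fraction_flagged_who_reoffended(predicted, reoffended):
    """Of the people the tool flagged, return the fraction who did reoffend."""
    outcomes = [r for p, r in zip(predicted, reoffended) if p]
    if not outcomes:
        raise ValueError("the tool flagged no one")
    return sum(outcomes) / len(outcomes)


# Ten hypothetical defendants: True = flagged as likely to reoffend / did reoffend.
predicted = [True, True, True, True, True, False, False, False, False, False]
reoffended = [True, False, False, False, False, False, True, False, False, False]
print(fraction_flagged_who_reoffended(predicted, reoffended))  # 0.2
```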

  • Proponents of this program say that it provides useful information, while opponents say that its scores are being misused, for example, to set sentence lengths.

  • How much do we want technology to be involved in these legal decisions? Judges are now influenced by these scores; should judges reach their decisions without the use of this program?

GDPR and the “Right to be Forgotten”

  • The General Data Protection Regulation (GDPR) was passed by the European Union and took effect in May 2018. Among other things, it gives people the right to know what data is being collected about them. There is no comparable general right under U.S. federal law, although one may come into effect in the future. American businesses that have European users or customers may also be subject to the GDPR.

  • The GDPR allows users to request the personal data held about them. Personal data includes data that identifies an individual directly, as well as data that could be used to identify them even if it does not yet do so, such as the cookies and other tracking data examined earlier.

  • The GDPR imposes requirements on the controller of this data, the party that decides how and why it is processed. It defines the controller’s responsibilities for processing the data and for answering user requests about it. The controller must identify itself and be reachable for contact. It must also reveal what data it holds about the user, how and for what purpose that data is being processed, and whether the data is being shared with third parties.

    • These requirements are placed not only on explicit data from the user but also on implicit data that may be gathered through web patterns or digital activity.
  • The GDPR also gives users the ability to compel controllers to correct inaccurate data they have collected. However, it limits the right to have data deleted when keeping that data serves the public interest. What if a user simply wishes to change data they do not like, even though it is accurate? Can the user still challenge this data?

    • The GDPR might allow for individuals to delete data concerning minor, non-violent crimes that happened in the past; these records of crime often have significant negative impacts on job prospects and many other areas of life. Deleting these records would allow for individuals to gain a clean slate. Does this deletion of history pose dangerous consequences?

    • If data characterizes an aspect of a user’s personality that they do not personally agree with, can they challenge this data? For example, if a person is categorized as a compulsive spender based on their purchase history, should they be allowed to disagree and alter this data?

Net Neutrality

  • Net neutrality is based on the principle that all web traffic should be treated equally. For example, traffic from Facebook and traffic from a small business should be treated the same regardless of size. This has become a very controversial political issue in recent years.

  • If we envision the Internet as a sort of road that carries information, we can examine the issue of net neutrality by constructing another road alongside the first road. This second road is better maintained and transmits traffic more quickly, but it charges a toll to use it.

    • Proponents of net neutrality argue that offering this second road would prioritize the traffic of those who could afford to pay extra. If the Internet were designed for an equal, free flow of information, this second road would defy that principle.

    • Opponents of net neutrality base their argument on the existence of free markets. If a user wishes to prioritize their traffic, they should be able to do so by paying the necessary toll. Furthermore, free markets operate in most areas of the American economy.

  • Conceptually, implementing this second road is simple. If a business pays the toll, its IP address is associated with the purchase. The Internet service provider (ISP) that owns the infrastructure carrying this traffic then prioritizes the purchasers’ traffic over everyone else’s when transmitting data.

    • Actions such as sending an email or loading a webpage use the Transmission Control Protocol (TCP). TCP detects lost packets, for example ones dropped by an overly congested network, and resends them, so the data still arrives intact, just a little later. These less time-sensitive services do not necessarily need any prioritization of traffic.

    • Real-time services such as business calls or live video often use the User Datagram Protocol (UDP), which does not resend lost packets, since a packet that arrived late would be useless anyway. For these services, prioritizing traffic might be important to keep them running smoothly without interruptions.
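  • The “second road” can be sketched as a priority queue at a congested link: packets from addresses that paid the toll are always sent before everyone else’s. This sketch uses Python’s heapq module; the addresses (taken from ranges reserved for documentation), the paid_addresses set, and the strict two-tier policy are hypothetical simplifications, not how any particular ISP works.

```python
import heapq
from itertools import count

paid_addresses = {"203.0.113.7"}   # hypothetical subscribers to the fast lane
arrival = count()                  # tie-breaker: keeps arrival order within a tier
queue = []


def enqueue(src_ip: str, payload: str) -> None:
    """Queue a packet; paid traffic gets priority 0, everything else 1."""
    priority = 0 if src_ip in paid_addresses else 1
    heapq.heappush(queue, (priority, next(arrival), src_ip, payload))


def transmit_all() -> list:
    """Send queued packets in priority order, returning them as a list."""
    sent = []
    while queue:
        _, _, src_ip, payload = heapq.heappop(queue)
        sent.append((src_ip, payload))
    return sent


enqueue("198.51.100.20", "small business video call")
enqueue("203.0.113.7", "paid video stream")
enqueue("198.51.100.20", "small business email")
print(transmit_all())
```

Within each tier, packets keep their arrival order; across tiers, the paid stream jumps ahead even though it arrived second.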

  • In 2015, under President Barack Obama, the Democratic-controlled Federal Communications Commission (FCC) voted in favor of net neutrality and reclassified broadband Internet access as a telecommunications service under Title II of the Communications Act, which allows stricter regulation of providers as common carriers.

  • In 2017, Donald Trump appointed Ajit Pai as Chairman of the FCC. In December 2017, the FCC voted to repeal the net neutrality rules, and the repeal took effect in June 2018.

    • Some states have begun to pass their own laws enforcing net neutrality. These state laws have come into conflict with federal law concerning net neutrality.