1. #1
    Banned GennGreymane's Avatar
    10+ Year Old Account
    Join Date
    Apr 2010
    Location
    Wokeville mah dood
    Posts
    45,475

    Can big databases be kept both anonymous and useful?

    We’ll see you, anon (title altered to subtitle to give context)
    http://www.economist.com/news/scienc...Wellseeyouanon


    FREQUENT visitors to the Hustler Club, a gentlemen’s entertainment venue in New York, could not have known that they would become part of a debate about anonymity in the era of “big data”. But when, for sport, a data scientist called Anthony Tockar mined a database of taxi-ride details to see what fell out of it, it became clear that, even though the data concerned included no direct identification of the customer, there were some intriguingly clustered drop-off points at private addresses for journeys that began at the club. Stir voter-registration records into the mix to identify who lives at those addresses (which Mr Tockar did not do) and you might end up creating some rather unhappy marriages.

    The anonymisation of a data record typically means the removal from it of personally identifiable information. Names, obviously. But also phone numbers, addresses and various intimate details like dates of birth. Such a record is then deemed safe for release to researchers, and even to the public, to make of it what they will. Many people volunteer information, for example to medical trials, on the understanding that this will happen.

    But the ability to compare databases threatens to make a mockery of such protections. Participants in genomics projects, promised anonymity in exchange for their DNA, have been identified by simple comparison with electoral rolls and other publicly available information. The health records of a governor of Massachusetts were plucked from a database, again supposedly anonymous, of state-employee hospital visits using the same trick. Reporters sifting through a public database of web searches were able to correlate them in order to track down one, rather embarrassed, woman who had been idly searching for single men. And so on.
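
    The "same trick" described above is usually called a linkage attack: join the anonymised release to a public list on whatever fields the two share. Below is a minimal sketch, not the method used in any of the cases mentioned. The field names (zip, birth_date, sex), the records and the helper link_records are all invented for illustration.

```python
# Toy linkage attack. All data and field names are hypothetical.
anonymised_visits = [
    {"zip": "02138", "birth_date": "1945-07-31", "sex": "M", "diagnosis": "condition A"},
    {"zip": "02139", "birth_date": "1960-01-15", "sex": "F", "diagnosis": "condition B"},
    {"zip": "02139", "birth_date": "1971-03-02", "sex": "F", "diagnosis": "condition C"},
]
public_roll = [
    {"name": "Person One", "zip": "02138", "birth_date": "1945-07-31", "sex": "M"},
    {"name": "Person Two", "zip": "02139", "birth_date": "1960-01-15", "sex": "F"},
    {"name": "Person Three", "zip": "02139", "birth_date": "1960-01-15", "sex": "F"},
]

# Fields assumed to appear in both data sets even after names are stripped.
QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")


def link_records(anonymised, public, keys=QUASI_IDENTIFIERS):
    """Return (name, diagnosis) pairs where the shared fields match exactly one person."""
    index = {}
    for person in public:
        index.setdefault(tuple(person[k] for k in keys), []).append(person["name"])

    matches = []
    for record in anonymised:
        names = index.get(tuple(record[k] for k in keys), [])
        if len(names) == 1:  # a unique match means the record is re-identified
            matches.append((names[0], record["diagnosis"]))
    return matches


reidentified = link_records(anonymised_visits, public_roll)
```

    In this toy data only the first record is re-identified: the second matches two people on the roll and the third matches nobody. The more precise the shared fields, the more often the match is unique.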

    Each of these headline-generating stories creates a demand for more controls. But that, in turn, deals a blow to the idea of open data—that the electronic “data exhaust” people exhale more or less every time they do anything in the modern world is actually useful stuff which, were it freely available for analysis, might make that world a better place.

    Of cake, and eating it
    Modern cars, for example, record in their computers much about how, when and where the vehicle has been used. Comparing the records of many vehicles, says Viktor Mayer-Schönberger of the Oxford Internet Institute, could provide a solid basis for, say, spotting dangerous stretches of road. Similarly, an opening of health records, particularly in a country like Britain, which has a national health service, and cross-fertilising them with other personal data, might help reveal the multifarious causes of diseases like Alzheimer’s.

    This is a true dilemma. People want both perfect privacy and all the benefits of openness. But they cannot have both. The stripping of a few details as the only means of assuring anonymity, in a world choked with data exhaust, cannot work. Poorly anonymised data are only part of the problem. What may be worse is that there is no standard for anonymisation. Every American state, for example, has its own prescription for what constitutes an adequate standard.

    Worse still, devising a comprehensive standard may be impossible. Paul Ohm of Georgetown University, in Washington, DC, thinks that this is partly because the availability of new data constantly shifts the goalposts. “If we could pick an industry standard today, it would be obsolete in short order,” he says. Some data, such as those about medical conditions, are more sensitive than others. Some data sets provide great precision in time or place, others merely a year or a postcode. Each set presents its own dangers and requirements.

    Fortunately, there are a few easy fixes. Thanks in part to the headlines, many now agree that public release of anonymised data is a bad move. Data could instead be released piecemeal, or kept in-house and accessible by researchers through a question-and-answer mechanism. Or some users could be granted access to raw data, but only in strictly controlled conditions.

    All these approaches, though, are anathema to the open-data movement, because they limit the scope of studies. “If we’re making it so hard to share that only a few have access,” says Tim Althoff, a data scientist at Stanford University, “that has profound implications for science, for people being able to replicate and advance your work.”

    Purely legal approaches might mitigate that. Data might come with what have been called “downstream contractual obligations”, outlining what can be done with a given data set and holding any onward recipients to the same standards. One perhaps draconian idea, suggested by Daniel Barth-Jones, an epidemiologist at Columbia University, in New York, is to make it illegal even to attempt re-identification.

    While some level of anonymisation will remain part of any resolution of the dilemma, mathematics may change the overall equation. One approach that would shift the balance to the good is homomorphic encryption, whereby queries on an encrypted data set are themselves encrypted. The result of any inquiry is the same as the one that would have been obtained using a standard query on the unencrypted database, but the questioner never sets eyes on the data. Or there is secure multiparty computation, in which a database is divided among several repositories. Queries are thus divvied up so that no one need have access to the whole database.
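
    One common building block for secure multiparty computation is additive secret sharing, which can be sketched roughly as below. This is an illustrative toy under stated assumptions, not a production protocol: the modulus, the three-repository setup and the names split_into_shares and secure_sum are all invented here, and values are assumed to be non-negative integers smaller than the modulus.

```python
import secrets

MODULUS = 2**61 - 1  # arbitrary large modulus chosen for this sketch


def split_into_shares(value, n_parties):
    """Split value into n random-looking shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def secure_sum(values, n_parties=3):
    """Sum values while each repository only ever holds one share of each."""
    repositories = [[] for _ in range(n_parties)]
    for value in values:
        for repo, share in zip(repositories, split_into_shares(value, n_parties)):
            repo.append(share)

    # Each repository answers the query on its own shares only.
    partial_sums = [sum(repo) % MODULUS for repo in repositories]

    # Only the combined partial answers reveal the total.
    return sum(partial_sums) % MODULUS


salaries = [52000, 61000, 58000]  # hypothetical private values
total = secure_sum(salaries)
```

    Any single repository sees only uniformly random numbers, so it learns nothing about an individual value on its own. The query result appears only when the partial sums are combined.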

    These approaches are, on paper, absolute in their protections. But putting them to work on messy, real-world data is proving tricky. Another set of techniques called differential privacy seems further ahead. The idea behind it is to ensure results derived from a database would look the same whether a given individual’s data were in it or not. It works by adding a bit of noise to the data in a way that does not similarly fuzz out the statistical results.
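
    As a rough illustration of the idea, here is a sketch of the Laplace mechanism, one standard way to get differential privacy for a counting query. Note that it adds noise to the query answer rather than to the stored records, so it is one instance of the technique and not necessarily what any system mentioned in the article does. The function names, the sample ages and the epsilon value are assumptions made for this example.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records that satisfy predicate.

    Adding or removing one person changes a count by at most 1, so
    Laplace noise with scale 1/epsilon is used. Smaller epsilon means
    more noise and stronger privacy.
    """
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1 / epsilon, rng)


ages = [23, 35, 41, 52, 67, 70]  # hypothetical data set
noisy_over_50 = private_count(ages, lambda age: age > 50, epsilon=0.5)
```

    A single answer is off by a small random amount, which hides whether any one person is in the data, while averages over many queries or large populations stay close to the truth. Each query spends some of the privacy budget, so a real deployment would also have to limit how many questions can be asked.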

    Hot fuzz
    America’s Census Bureau has used differential privacy in the past for gathering commuters’ data. Google is employing it at the moment as part of a project in which a browser plug-in gathers lots of data about a user’s software, all the while guaranteeing anonymity. Cynthia Dwork, a differential-privacy pioneer at Microsoft Research, suggests a more high-profile proving ground would be data sets—such as some of those involving automobile data or genomes—that have remained locked up because of privacy concerns.

    For now, differential privacy’s difficult mathematical underpinnings make it tricky to implement more broadly. That needs to change, according to Salil Vadhan, of the Centre for Research on Computation and Society at Harvard. “The ball is in our court not just to write papers, but to produce general-purpose tools,” he says.

    Public education is also needed. Data science could well lead to safer roads and long-sought cures, but people have to understand the trade-offs. In July researchers at Britain’s Office for National Statistics (ONS), whose releases of data underpin billions of pounds of public spending, began to consult members of the public about their comfort with different types of data disclosure. There is always some risk to anonymity, says Jane Naylor of the ONS. But “there’s also a risk of not making the best use of data.”

  2. #2
    If you lived in a small town of, say, 3,000 people, and you had a small-town newspaper that made its living by advertising, is that a comparable model to what is going on on the Internet now? How would businesses and the newspaper use their knowledge of the community to offer more effective advertising?

    "This will be a fight against overwhelming odds from which survival cannot be expected. We will do what damage we can."

    -- Capt. Copeland

  3. #3
    Quote Originally Posted by Hubcap View Post
    If you lived in a small town of, say, 3,000 people, and you had a small-town newspaper that made its living by advertising, is that a comparable model to what is going on on the Internet now? How would businesses and the newspaper use their knowledge of the community to offer more effective advertising?
    Advertising can only be effective if it doesn't turn off your target market.

    Inserts in the newspaper are a great way to advertise: non-intrusive and handy.

    Personally, I think the internet needs this as well, something like a .ad domain, a place for people to peruse ads at their leisure, just like the inserts in newspapers.

  4. #4
    The Insane Kujako's Avatar
    10+ Year Old Account
    Join Date
    Oct 2009
    Location
    In the woods, doing what bears do.
    Posts
    17,987
    Short answer "no". Long answer "yes", but there's not as much money in doing it. So it's not so much that it can't be secure, as it's not worth the expense. For things like Facebook etc, there's no point since the users are the product. The whole point is to share/sell user information.

    On the user's side, the best bet is to use a firewall-level content filter. But that's fairly technical, and the few attempts at a consumer-friendly system have met with a lot of pushback (wonder why).
    It is by caffeine alone I set my mind in motion. It is by the beans of Java that thoughts acquire speed, the hands acquire shakes, the shakes become a warning.

    -Kujako-

  5. #5
    The Insane Acidbaron's Avatar
    10+ Year Old Account
    Join Date
    Oct 2010
    Location
    Belgium, Flanders
    Posts
    18,230
    Yes, you can, if you are willing to invest. You could keep everything from registration to the database on a network that is not connected to the internet and not linked to a wireless network either. There's no reason for such a database to be online. Secondly, if you want to run a website and transfer data to it, set up a separate server and limit the amount of private information on it: for example, a login issued by said bar, a password (also issued, since people are dumbasses and still pick the easiest ones to guess), and only email addresses from hosts you know to be decent. Use decent, proven website software (Joomla, not WordPress), protect the data with a firewall, and put your web server in a DMZ rather than on your internal network.

    Last but not least, you get someone who actually knows what they are doing to set it all up. Any monkey can install server software; configuration is where the actual know-how is needed.

    The problem is that cloud is cheaper, so companies go with cloud options. Actual server hard disks are expensive; NAS ones less so, but overall still more expensive.
    People don't really go for the more expensive solution, since for the most part they will always underestimate how easily they can be compromised.

  6. #6
    Void Lord Doctor Amadeus's Avatar
    10+ Year Old Account
    Join Date
    May 2011
    Location
    In Security Watching...
    Posts
    43,753
    Privacy is overrated. Just because someone has the tools to do something wrong with it doesn't mean that they are. People with Facebook and Twitter accounts who feel differently had better start to realize that the battle has been fought and lost, and people surrendered a long time ago.


    It makes no difference. What is really more important than identity, and whatever database is being kept, is that those who own those identities claim them and own them. Be aware of your identity, monitor your credit, and be careful what associations you have and who acts on your behalf in your name.
    Milli Vanilli, Bigger than Elvis

  7. #7
    The Insane Acidbaron's Avatar
    10+ Year Old Account
    Join Date
    Oct 2010
    Location
    Belgium, Flanders
    Posts
    18,230
    Quote Originally Posted by Kujako View Post
    Short answer "no". Long answer "yes", but there's not as much money in doing it. So it's not so much that it can't be secure, as it's not worth the expense. For things like Facebook etc, there's no point since the users are the product. The whole point is to share/sell user information.

    On the user's side, the best bet is to use a firewall-level content filter. But that's fairly technical, and the few attempts at a consumer-friendly system have met with a lot of pushback (wonder why).
    We are just coming to the point where people accept that a firewall and anti-virus are recommended for day-to-day PC use, and can only be skipped if you really only visit things that are secure and know what you are doing (most who believe they do, don't).

    A decent firewall is one that is highly customizable. You can't have everything pre-set, since if you do, the ports that are used will be known, for example. Also, you can lock a firewall down as tightly as you like (note that you can still use and do everything), but at the end of the day a person will still click 'yes' on something bad, and the protection they had in place has just been bypassed.

  8. #8
    Banned GennGreymane's Avatar
    10+ Year Old Account
    Join Date
    Apr 2010
    Location
    Wokeville mah dood
    Posts
    45,475
    Quote Originally Posted by HomeHoney View Post
    Advertising can only be effective if it doesn't turn off your target market.

    Inserts in the newspaper are a great way to advertise: non-intrusive and handy.

    Personally, I think the internet needs this as well, something like a .ad domain, a place for people to peruse ads at their leisure, just like the inserts in newspapers.
    I agree with this statement, since I never minded advertising in the past until I became a much heavier internet user. Ads were meh during TV shows, but they were always that way, and you knew when they would show and generally for how long. Newspaper ads were helpful because if you wanted something you would check. Even going directly to a website is helpful if you need to make a purchase. In general, however, internet ads are horrible, intrusive, loud, not needed, and generally make me avoid those products. Fuck, they even slow down the website.
