Workshop Breakouts: Track 3: Automated Indicator Sharing—Controls
This is an audio file.
ANTONIO SCURLOCK: Alrighty. We’d like to welcome you to one of the final panels of the day, and I appreciate everybody coming forward and bringing their special friends with them. That’s why there’s an empty seat between each one of you, right? I brought mine too and he’s sitting right here.
That being said, this last panel is going to talk about automated indicator sharing, and we’ve had a couple of discussions earlier today, one to speak about requirements, and for those who can’t see in this recording I actually have that in parens, because we don’t really mean something that we’re going to leverage against folks. What we’re really talking about is if we want to do this information sharing piece in an automated fashion, and what might be the future of the information sharing analysis organizations, what would be the kind of things we would have to have an understanding of—roles, responsibilities, and so on and so forth. So we had a conversation about that.
So a couple of individuals have sat in those meetings and I’ll do a brief description of them, and I’ll start off with the most important person—me. My name is Antonio Scurlock. I work for Department of Homeland Security in the National Protection and Programs Directorate, in the Strategy, Policy, and Plans Office, and I also—I’m a co-lead for the Enhanced Situational Awareness Initiative, formerly known as the Comprehensive National Cybersecurity Initiative #5. Exhale. Roger.
That being said, I’d also like to introduce you to Ed White, on your far left over there, VP of Public Sector and GHE. As such, he is responsible for developing—Centripetal? Got it—Centripetal Networks’ strategy for supporting the needs, policies affecting the Federal Government, critical infrastructure, health care communities, and he’s a 25-year veteran of the short-haired—no, I’m just kidding—25-year veteran of the federal IT industry, and he started his career in public service—you know, you’ve got to appreciate that, right? Always giving—in intelligence community. Always giving. And he has held more leadership positions than I want to read off here, and actually some pretty cool stuff. You led somewhere at Microsoft? That’s hot. That’s my place. I’m a Microsoft fanboy. Every time I see that work I get riled up. That’s pretty hot. Nothing against Apple but I’m a Microsoft fanboy.
Ms. Allison Bender, DHS OGC. Enough said. No, I’m just kidding. No, she really is OGC, and she’s been doing quite a bit for us in dealing with the automated indicator sharing, Executive Orders 13636, 13691, and also PPD 21 work, right? Okay. Roger that. And she’s been advising the NCCIC on a number of operational issues, including serving as the primary attorney for the government’s response to Heartbleed and the USIS background security clearance breach.
Do we want to talk to anybody else besides Allison? I’m just saying. Okay.
And last, but most assuredly not least, is Mark Davidson, and Mark has been pivotal in speaking to us in a couple of the other sessions. So he is the author and architect of TAXII—don’t hold it against him. He’s been admitting to it all day. That takes a lot of bravery—and an integral member of MITRE’s STIX and TAXII team, lead cyber security engineer at MITRE, and he has a cutting edge R&D team focused on the nation’s hardest cyber security challenges. His areas of expertise include threat sharing, cyber security, and software development.
So hopefully you guys have lots of hard-hitting, difficult-to-answer questions that we will take up at the next panel, and we’re going to focus on just the ones I want to ask today. [Whistles.] Nobody caught that? That was hardcore.
So let me start off with saying a couple of things. In this morning’s breakout session we talked about, like I said, participants. How do you play if you’re an ISAO? Various roles. So would that be a straight consumer of information from a particular sector, who then passes on that information to someone else? Or are you going to consume information and rinse that information and then produce unique pieces of information? Would you play in both roles, where you’re always consuming and producing information that’s enriched from your trusted community?
Are you going to actually offer up infrastructure to be shared with your community, and therefore provide some sort of shared capability for these kind of things? And then, would you be playing the role of a broker, you know, and for definitions we have a literally got all of them worked out. But in this particular case the ISAO would be working as a community entity that would take into account both the government and the non-government entities and their sharing, looking at the best interests for both sides of the house, if you will.
Good discussion about that. Some of the equities we talked about was weighing, if you will, anonymity versus having confidence in the content and context of the information being shared. Machine speed in the context of the trust relationships between the machines themselves versus the inter-organizational flow of information, as well as the intra-organizational flow of information. One of the other pieces that I kind of want to have a good piece of discussion here is, while we are moving towards this machine speed piece, we want to do that, but, you know, the question I have initially for the panel is, do we still maintain manual inputs? Do we still take non-machine-speed inputs? Or, do we see that as—the relevance of that not being quite the same as that constant flow of those trusted machines, pumping, consuming, and producing information?
So I’ll start off with that.
ED WHITE: I’ll take it. My opinion on that would be that you probably have to do both, because you have a different scale of individual capability that’s going to come across as the ISAO organizations grow. So initially, you might actually have to take the manual input and then transfer that manual input into something that’s going to be more, obviously, efficient and effective, and your ultimate goal would be to get everybody to machine rates. Obviously you can have STIX, some TAXII standard, or something to that effect. You want to be able to collect that information dynamically, update that information when it changes, and be able to consume that information at line speed or you’re going to find yourself not being able to keep up with the adversaries as they move forward.
ALLISON BENDER: I think Ed is exactly right that we do need to maintain the capability to ingest and disseminate indicators in a manual fashion, but at the same time as we’re focusing on providing timely, relevant, and actionable information, we need to decrease the amount of human review that is required, increase the amount of automation that is available, both for intelligence analysis but in a way that also maintains privacy, civil liberty, and other compliance controls, really focusing on how we can do that moving forward and bring down the time from months to milliseconds, and how we’ll be able to share information so that organizations can best protect their infrastructure.
MARK DAVIDSON: So apparently there’s a solution or an answer to this question because I agree with the other two panelists. I think that both manual and automated means are absolutely necessary. I think in terms of accepting intelligence from a, from a broad diversity of potential inputters and producers—I don’t know. I kind of tend to look at things within the technology realm and one of the axioms of implementing protocols is be generous in what you accept and be strict in what you send. And I think in following that it would, you know, allowing, really, a great variety of inputs into any indicator or threat intelligence sharing system or community that we might be developing, I think it lets anybody take the knowledge they have and put it into the system.
Now, I think that the automated mechanisms will ultimately be the more efficient, higher value way of doing it, but somewhere along the line there will be somebody who sees something and their best way of inputting it will be to just maybe go to a, a website and say, “Hey, I saw this; can you do something with it?” And then I think from there, you know, the entire subsystem can be automated, but it’s going to be interfacing with humans, so the system needs to have interfaces that are comfortable for humans to use.
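[Editorial sketch, not part of the recorded discussion: one way to read the "be generous in what you accept and be strict in what you send" axiom for indicator intake. The function name, the record layout, and the "defanged" input conventions handled here are assumptions made for illustration; they are not part of STIX, TAXII, or any system described by the panel.]

```python
import ipaddress
import re

# Hypothetical strict output format: lowercase domain names or canonical IP strings.
DOMAIN_RE = re.compile(r"(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}")


def normalize_indicator(raw):
    """Accept loosely formatted human input; return a strictly typed record.

    Raises ValueError when the input cannot be recognized, so nothing
    ambiguous is ever sent onward.
    """
    # Generous on input: tolerate whitespace, mixed case, and "defanged" dots.
    text = raw.strip().lower().replace("[.]", ".").replace("(.)", ".")
    try:
        addr = ipaddress.ip_address(text)
        return {"type": "ip", "value": str(addr)}
    except ValueError:
        pass
    if DOMAIN_RE.fullmatch(text):
        return {"type": "domain", "value": text}
    # Strict on output: refuse anything that did not parse cleanly.
    raise ValueError(f"unrecognized indicator: {raw!r}")
```

A web form for manual submissions and an automated feed could both call the same function, so that whatever enters the rest of the pipeline has one predictable shape.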
ANTONIO SCURLOCK: Roger that. I appreciate your answers on that. As a matter of fact, I’d be more than happy to take an answer from the audience if they have any feedback they’d like to offer on that particular piece.
[No audible response.]
ANTONIO SCURLOCK: Okay. Roger. We’re going to move on to the next question, which may be harder, actually. So what you heard here, and this is what I gleaned from the answers, were—and what we actually talked about earlier was economies of scale, that some organizations are going to be able to do considerably more than others, and even as you go into the ISAO, the ISAC model that’s currently existing, whether you’re sharing directly with the federal cyber sharing, like the NCCIC, one of the things that was shown—and I want to kind of have a discussion about the communities of trust piece—when General Touhill was giving his brief, there was a slide, and it showed the NCCIC, if you will, at the center of a hub-and-spoke type of information sharing, and I think with a lot of folks in the room that resonated as though that would be the only way, and the like.
And I’m not saying it is or it is not, but I’d love to hear the panel’s opinions on the idea that, do you see that as the only way to go about doing the automated indicator sharing, or could one envision multiple hub-and-spokes that are interconnected, much like we do our networking today?
MARK DAVIDSON: So I think in terms of the multiple hub-and-spoke, I think that might be kind of part of what some people’s vision is for the ISAOs. So it was mentioned earlier that there might be, over the next 3 years, maybe something like 200 different ISAOs, and each of those ISAOs is going to have their members. So you already, there, have a bunch of disparate hub-and-spoke kind of drawings, right, if you just think of the ISAO here and each of the individual organizations here. But then you’re going to kind of have like a hub-and-spoke of hub-and-spokes, where each of the ISAOs maybe all connect up to DHS or some sort of, you know, getting indicators from the government or something like that.
So I think in terms of the—there’s really maybe two different perspectives of it. There is the organizational architecture, which is each of the ISAOs and how they relate to the government and their constituents, and then in terms of automated indicator sharing, one of the ways that I look at it is the technical architecture behind all of it, and I think in terms of the technical architecture, hub-and-spoke is probably one of the ones that needs to be supported, but it should also be flexible enough to support the natural communication pathways that organizations develop.
ALLISON BENDER: So, you know, is there a way for entities to share information outside of the NCCIC essentially being the heliocentric point of all information sharing? I think that there is. I think that that is what happens now. Can the government help the private sector improve the way that they share information, whether that’s through support for infrastructure or by providing, you know, policy overlays that support greater protection of privacy and civil liberties, other compliance controls? You know, I think that we can do that. In a lot of ways the ISAO models are a way of providing a lot of flexibility for how, how the private sector chooses to organize, and I will use my swimming pool analogy that I gave earlier. Especially because it’s summer, I’ve been thinking about swimming pools a lot.
Information sharing can be like a swimming pool. Different groups have different requirements. In a sense, the NCCIC, being, you know, your local community swimming pool. It’s government-supported, you sign your name on the way in, it’s very light, easy to use, provides value, you enjoy your, you know, hot dog and cheap lounge chair, and maybe a cannonball in the deep end, right? It’s great. Everybody’s there and it’s pretty easy to do.
There are also more sophisticated ways for mature users to be able to share information. In some ways this is more like the DHS CISCP program, the Cyber Information Sharing and Collaboration Program. There’s an agreement to get in; it’s the CRADA. It’s kind of a heavy lift but it helps protect those trust community rules. There’s expectations about membership. You know, there’s not just the information sharing but there are these other activities. Maybe there’s golf. Maybe there’s tennis. But, you know, you also have to wear your coat and tie when you go to dinner at the club, right? You can go to the ATTEs. You are eligible to pursue, you know, conversations about potentially having security clearance and sitting on the NCCIC floor, being part of those more sophisticated, higher level collaborations. But it’s an investment.
ISAOs, in some ways, can be like, you know, your neighbor’s pool, your good buddy’s pool. You can self-organize the invitation. It’s still swimming. It’s still information sharing, but there’s a wide, there’s a wide range of, you know, what that could look like. Is it, you know, a really luxurious pool with a hot tub, or is it, you know, a kiddie pool in the back yard? Do you have a lot of robust information sharing architecture that’s, you know, supported and has all of the bells and whistles, or is it more of a light touch, manual exchange?
I think all of those things are possible. For the ISAO model, you know, it’s up to you. Do you want to invite the government to your pool party? We’d love to be there. We think we can help, but it’s up to you.
ED WHITE: I would add, basically, the one point that needs to be addressed, and I think Mark touched on it, is the ad hoc aspect of being able to share information bidirectionally, right? The importance of being able to do that is not lost on industry or anybody who’s even sitting in this room, trying to consider being in an ISAO or being an ISAO, right? The sense of being is what allows you to feel trust, and then that trust allows you to be able to share information more efficiently and effectively.
And if you want to get to an architecture discussion, which always ends up, since we’re all in IT, I would say a federated approach on top of kind of what Mark already kind of described with respect to the multiple hub-and-spoke arrangement would be beneficial in the sense that you would be able to have a little bit more control of the DHS side of the house and more control at the aggregate ISAO. Rather than having some rogue guys running around doing whatever they want, maybe there’s a way for us to be able to make this more controlled and effective based upon an architecture that allows you to be able to not only report and share but to be inclusive without being restrictive.
ANTONIO SCURLOCK: I appreciate your feedback on that. So, you know, it’s interesting when we talk about the hub-and-spoke, federated processes, and at some point in time you start to look at the sheer scope, size, variety of the information and the entities that are sharing. And I’ll get into some pieces, and tell me where you feel comfortable, talking about some of the architectures and infrastructure aspects of it. And from your view I’d like to have an understanding of, let’s talk about the de-duping process, if you will, the ability to kind of score, if you need to, information that flows in, whether the infrastructure can handle all of that. Right now we have what’s basically a push situation, or a subscribed situation. If we truly get into this machine-to-machine, we might get into multiple avenues of query.
I know that was basically three-fold, but, you know, tackle whatever you feel like. But I think that as we go to this trusted, interconnected model, there’s going to be some other avenues of thought that we need to take it to a place that, from a human standpoint, are easy. You know, the human sees the same piece of paper, and they go, “Oh, this is version 2. Version 1 is no longer relevant, because I can see that.” There’s a machine piece of that that needs to also plausibly happen.
MARK DAVIDSON: Sure. So I spend a lot of my day working on TAXII and other things like that, but I can speak a little bit to the process that the MITRE network defense team goes through. So basically what we like to do is we like to get indicators that are high quality, and what we do is we take the burden on ourselves of de-duplicating those indicators, and really any threat information that comes in. We have a threat intelligence analysis platform that we built ourselves and we open sourced, and I know there’s a number of organizations out there that are leveraging ours. I know there’s other organizations out there that have built similar platforms for themselves. And, really, the technology that we built, it has the ability to take in multiple things that are the same, and really all it does is it collapses them down into one entity and says I got it from these three different sources, or these four different sources. So you can track where you got it from, and partly in our information sharing efforts, that matters because where you got it from influences where you can send it to.
And right now a lot of our information sharing within MITRE is more of the ad hoc personal network-related, less automated variety. You know, as a member of the STIX and TAXII team I’m always looking for ways to get them to use STIX and TAXII and automate what they’re doing. And then I guess I’ll—so I’ll tie that up and then quickly touch on another aspect. So at least for us, as an organization, we’re completely comfortable de-duplicating indicators as they come in, at least at the volume we currently have. I can’t speak to what would happen in a future increased volume state.
The other thing, though, is in terms of the indicator sharing ecosystem as a whole, what happens if it basically becomes a big echo chamber and you have maybe three indicators getting transmitted thousands of times across a variety of ISAOs and organizations. You know, maybe if there’s a lot of organizations, I might be a member—you know, MITRE might be a member of three different ISAOs, and if you get some exponentially increasing number of the same indicator, I think that is just going to have poor implications for network health, and hopefully, in terms of the system that we all design, that’s something that can be designed out.
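[Editorial sketch, not part of the recorded discussion: a minimal illustration of collapsing duplicate indicators into one entity while tracking every source, and of forwarding an indicator only the first time it is seen so that it does not echo. The class and method names are hypothetical and do not describe MITRE's platform.]

```python
from dataclasses import dataclass, field


@dataclass
class IndicatorRecord:
    value: str
    sources: set = field(default_factory=set)


class IndicatorStore:
    """Hypothetical in-memory store keyed on the normalized indicator value."""

    def __init__(self):
        self._records = {}

    def ingest(self, value, source):
        """Record a sighting; return True only the first time a value is seen.

        A caller that forwards indicators onward only when this returns True
        will not re-broadcast the same indicator each time another ISAO
        relays it.
        """
        key = value.strip().lower()
        record = self._records.get(key)
        is_new = record is None
        if is_new:
            record = IndicatorRecord(value=key)
            self._records[key] = record
        record.sources.add(source)
        return is_new

    def sources_for(self, value):
        """List every source that has reported this value."""
        return sorted(self._records[value.strip().lower()].sources)
```

Because the source set is retained, a sharing policy that depends on where an indicator came from could consult `sources_for` before sending it anywhere else.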
ALLISON BENDER: Sure. I mean, certainly de-duplicating data is an incredibly important part of data quality, and when you’re thinking about broadly further sharing this type of information you don’t want to perpetuate information that is inaccurate, particularly if it contains information that shouldn’t be there in the first place, like certain PII, perhaps your proprietary information, other things like that. So, you know, the data quality process I think has to include, you know, some look at de-duping to prevent that echo chamber effect, but also applying technical mitigations, really a sanitization process, to understand the content, not just as you see it again, but is this content still right?
And then, lastly, you know, kind of thinking about how do we fix the echo chamber thing, one thing that DHS has been looking at very closely is using reputation scores to be able to identify like how good do we think this piece of information is. There’s always going to be a balance between anonymizing your source and protecting trust community rules, and then for the entity that’s receiving that indicator, do I trust this? I don’t know who sent it, how did they do this analysis, and, you know, reputation scores are one way that we’ve identified that might potentially help alleviate that concern.
ED WHITE: You guys answered that pretty well. To add to that, I will have to say, I don’t necessarily think it’s a technical problem that we have with respect to de-duplication or architecture of whatever platform it might be. There are tools out there, like ours, for example, that can take 5 million indicators on every packet, right, and use those to make determinations on them. You have to be able to consume the information you need to consume, and you have to update that information that you need to update, dynamically, without losing the ability in which to make an indication, or an action on an indicator that you might be receiving.
So I think, technologically, we’re probably ahead of the functional game here, and I think that’s maybe what we all struggle with. We always want to throw an answer out there that says, “I have the silver bullet that’s going to solve this problem,” when really the problem is how do we put the process together, and the flow together, that allows us to put the technology in place to be efficient and effective to actually make these things work.
ANTONIO SCURLOCK: So, as I complained, I’ll just quickly or briefly, when Allison got ahead of me on the PII, PCII discussion, actually, my next question was going to be your viewpoints on, well, okay, handling that type of information. I mean, do you let it come in? You ingest it. When you see it you chop it off, it falls to the cutting room floor and it slowly fizzles into the glass because it’s acidic and you don’t want to touch it? Is it something that you, you know, kind of induce maybe a compliance engine of some sort, that sees it and says, uh, maybe it’s necessary to take care of the situation, or maybe it’s not necessary? Your thoughts on that. We’ll start with Ed.
ED WHITE: So we had a little bit of a heated debate in our last panel around—
ANTONIO SCURLOCK: It was heated?
ED WHITE: —what an indicator is. Is it an actual artifact or is it an IOC? And if it’s going to be an artifact, which might include a PII-type of situation, there’s a lot of things, as you just mentioned, that are going to have to be considered—as a consumer of that, as a user of that, as a provider of that, whatever stage you are in that ISAO type of organization. And I think that depending upon where you sit in that organizational structure, you’re going to make those determinations based upon the risks that you’re most comfortable with.
So we’ll use the Joe’s Pizza Shop analogy that we’ve been using for the last hour or two. If they’re the ISAO for the community of interest of pizza stores, in, you know, Cambridge, Massachusetts, then the odds are he’s not going to want to take in PII, right? He’s going to say, “I’m going to cut that out, leave it on the floor, because that’s too much of a risk.” But if you were a larger organization, maybe the financial—like the financial ISAC, they might say, “I’m going to make an ISAO and I’m going to accept that information. Why? Because I need that information to actually create that transaction between those two parties that I have to help.”
So it’s really going to be dependent upon where you sit in that broader ISAO community, and in that chain as it, as actions have to happen, moving forward.
ALLISON BENDER: No, I think all of that is, is very much right. It depends on what you’re getting, right? If you’re getting unstructured data, you don’t know what’s in it, you know, unless you’ve done a manual review, and as we think about how to move from very manual, human resource-intensive processes to automated ones, you have to know what you’re getting and what you expect to get. And I think with unstructured data, you, you typically get kind of two extreme reactions. “That’s scary. Don’t share it” is one reaction, and whether that’s because there’s personally identifiable information or potentially proprietary information, other types of sensitive data. I don’t know what it is, that’s scary, don’t share it at all. And on the other hand, typically, from the cyber threat analysts and often from our law enforcement partners, they’re like, you know, “I need it. Give me everything. I don’t care what it is. I need all of it.”
I think that there’s a middle ground in between the two, which is to take a look at this unstructured data and either try and structure it or do some sort of analysis about what we’re likely to get, what you should expect in, you know, whether it’s an indicator, an artifact, and do the analysis at a more holistic level. At DHS, we do take personally identifiable information very seriously—civil rights, liberties, other compliance concerns—and protecting data is very important to us. That said, you know, even if it’s PII, or potentially PII, we don’t want to spend hours and hours redacting the bad guys’ e-mail from a spear-phishing indicator, right? So for information that is necessary to understand the cyber threat, as a matter of policy we do not apply protections to that information that would otherwise be considered PII, right? We’re not going to protect the bad guy. So what level of analysis is required to get from that unstructured data to a high level of confidence about, this is what we expect to see, this is what we expect to be able to share. So I think that we’re kind of in the process of moving towards that, but certainly I think it would be useful for other organizations, as they think about their information sharing relationships, you know, how is my data structured? Is it structured at all? How could I apply structures that would give me higher confidence in what I’m sharing, and then what sorts of technical mitigations or policy mitigations could you put in place to reduce risks in sharing information that has a compliance concern?
MARK DAVIDSON: So kind of I want to respond to two different things and then I’ll say my own thing. So first, building on what Ed was asking, is it’s basically what’s the vision for the, for the ISAO framework that’s going to be built? I think that’s really the question that we all need to answer, and, you know, I personally don’t have a great answer for that. But I think once you can define that vision and what it is, then I think it becomes easier to build out all of the things below it and all of the various factors and dimensions that we’ve all been discussing so far today.
And then, you know, you gave me a great platform to champion STIX out here. So STIX, the Structured Threat Information Expression. I know that some organizations out there—so once you have your, let’s say it’s an e-mail or something, and you decide that the, the recipient’s e-mail address is something private, because it’s, you know, a phishing e-mail that got sent to your company and you don’t want your head of finance’s information being out in the information that you share. You know, there’s—once you have your information structured in STIX, because there’s fields and it’s structured, you can apply processing and say things. Like we always want to redact this field, we sometimes want to redact that field, in certain circumstances. So STIX enables that kind of processing to happen.
There’s always edge cases where maybe somebody sticks an SSN in the body of an e-mail and that’s more of a text search, but that’s, that’s kind of a different thing.
And then, I guess I folded in the point that I was going to make, into my response, but it’s basically that in terms of the information that MITRE shares out, generally speaking we will chop off all of the information that’s MITRE-specific. We try to share enough information that the people that we share with can build detection mechanisms on top of what we share with them, so we can, you know, maybe share specific technical details about the mail clients that connected to our mail servers, who was sending them, what IP addresses they were coming from. But we typically won’t share like recipient e-mail addresses, how we think they—well, we won’t necessarily do the specific reconnaissance that they got against us but maybe we’ll share some of their methods. So MITRE has a lot of—it’s like my e-mail address is out there a lot on the STIX and TAXII discussion lists, and those are good harvesting grounds for threat actors.
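[Editorial sketch, not part of the recorded discussion: once a phishing report is held as named fields, "always redact this field, sometimes redact that field" can be expressed as a small policy. The field names below are invented for illustration and are not STIX property names, and the "sometimes" rule (redact only values that end with one of the sharer's own domains) is an assumed example of a conditional policy.]

```python
import copy

# Hypothetical policy: field name -> rule.
REDACTION_POLICY = {
    "recipient_address": "always",
    "receiving_mail_server": "if_internal",
}
REDACTED = "[REDACTED]"


def redact_for_sharing(record, internal_domains):
    """Return a copy of the record that is safe to share under the policy.

    The original record is left untouched so the full detail stays
    available to the organization that observed it.
    """
    shared = copy.deepcopy(record)
    for name, rule in REDACTION_POLICY.items():
        if name not in shared:
            continue
        value = str(shared[name])
        if rule == "always":
            shared[name] = REDACTED
        elif rule == "if_internal" and any(value.endswith(d) for d in internal_domains):
            shared[name] = REDACTED
    return shared
```

Free text, such as an SSN pasted into an e-mail body, is outside what a field policy like this can catch; that is the text-search edge case mentioned above.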
ANTONIO SCURLOCK: Okay. Roger that. So, let’s say we have a best case scenario. The infrastructure architecture is in place, we’re doing our sharing, and I’ll propose two pieces of this. One, in this utopian society, everyone is tagging their data appropriately, and, I mean, for everything—provenance, utility, dissemination, and so on and so forth. And on the consumer side, they all have the capability to read, parse, execute, implement those particular tags.
I’d like you to take on two separate roles in mind for the answer. The first role is the producer. In this environment where you’re tagging appropriately, and you understand that your consumer group can execute those tags, what type of feedback, as a producer, are you looking for from that consumer? What do you want to know that can help you plausibly get better indicators, better information, help you to enrich whatever you’re shooting back the next time? What kind of feedback would you think you’d want?
MARK DAVIDSON: So it sounds to me like the two dimensions you mentioned are somewhat orthogonal. So there’s one where it’s—I guess basically as a basic sanity check, let me know if I sent you something that according to my rules I shouldn’t have. That might be a first order of business. But I think that the general marking and resharing and controlling of indicator movement across organizations is kind of a separate process than getting feedback on iterating over and managing the quality of an indicator. Actually, now that I’ve said they’re orthogonal I’ve talked myself into believing that they’re together. I would be very interested in knowing where those controls prevented them from giving me quality feedback about an indicator. That might be a critical processing piece to know about.
ALLISON BENDER: Right. One thing I think to think about, in terms of feedback, is, you know, how close are the relationships in the first place. If part of the trust community rules that you’ve established is that the sources are going to be anonymized, you know, how do you track the value of an indicator? Because there was a unique ID? Is that sufficient for the group to have confidence about how their data is being shared? Are you asking for, you know, repeat sightings, or changes in TTPs that reference back to an original indicator? I think the trust community really has to think about, you know, how much they’re willing to share and how much feedback they want, because feedback is going to go to, most likely, being able to identify your source.
ED WHITE: And you just took my answer so I can’t actually go down that path. I would—I agree on both counts. You’ve got a situation where the only value that this piece of information has is going to—it’s only going to be beneficial to the guys as it enriches moving forward. So it may, it may come in looking at, looking like one thing, and I might give it to Allison and she might go, “Oh, my God, you know, what? This is a piece, an artifact that goes along with it.” And then she gives it to Mark, and, you know, the next thing he knows he goes, “I have the attribution source, right?” And now we have a complete picture of the attacker or the attack, and, in turn, can hopefully pass that along to the community as a whole so they can be better protected.
So, you know, looking at it from a, a provider perspective, as the ISAO owner, getting that piece of information, you’re going to want to see how that, that information is abridged over time, as you mentioned, but more importantly, who’s touched it, right, and have that be given to you and dynamically updated so everybody else can use it, and making sure that closed loop is tracked 100 percent, right? In the end, a provider is only as good as the data that he’s got, right, and the ISAO is only as good as the community is acting as one. So the closer they are, and the more information they share, and the more willingness they have in which to be able to collectively, hopefully rise to the tide, so all boats go up, is, is going to only be beneficial in the end.
ANTONIO SCURLOCK: Outstanding. That’s good, and, you know, you, you’re giving me a segue into a piece of the discussion that we were having earlier, that we kind of summarized with, and I kind of started off with. One of those was with the anonymization versus the plausibility of assigning a confidence score in your data and information sharing. And I’m going to—maybe this begs the question. I don’t know. But Allison, I’m going to reach to you to kind of answer first. If that’s the case, if that’s the balance, then as a, as a consumer of information like that—and put your consumer hat on—auto-enrichment of data. It comes in, it has a confidence score of X, and you say, “Man, I’ve also received X from here, here, here, and here. Let me enrich that and then autosend that back out to the community with a new confidence score of X.” I’m not saying that works, but I guess, give me your—if you’re the consumer of information, do you auto-enrich? Do you not? Do you see pitfalls for that?
ALLISON BENDER: Sure. I mean, I think as the data consumer, any time you receive an indicator—you know, we talked about this earlier—do you know the source, and do you know what the analysis process is? You know, and at some combination, hopefully of the two, that it’s going to give you a high level of confidence in the information that you’re receiving. A confidence score, a reputation score, could be a part of that, kind of trying to eliminate that kind of garbage-in-garbage-out potential for automated systems, or perhaps less mature analysis from people you don’t know, people you don’t have a trust basis with quite yet.
So from a data consumer perspective, are you going to know the source, are you going to know the analysis, and what trust community rules are going to be established that govern whether you know the source and whether you know the analysis? To the extent that we can use reputation scores perhaps in lieu of source, that’s one way to protect anonymization and the confidentiality of the data producer, but it has its own pitfalls as well, potentially that echo chamber possibility.
So I would say it really comes down to how each ISAO is going to structure its rule set on, you know, trust, consent to share, anonymization of source, and then how much data you’re going to put in the information you share. Will it be a thin like indicator that can be shared quickly and easily, but high volume so that you have a lot more data points, or is what you really want, you know, that big, juicy, steak-sized indicator that has all of the wonderful analysis bits of, you know, how this previous person came to arrive at that conclusion—even if you get it late, right? So fast and light and easy, probably without the source, or, you know, really dense, really helpful, probably slow. So I think that’s a balance to consider as each group thinks about what type of data they want to get and who they want to participate in that trust community with.
ED WHITE: I would agree with you, all except one point. I don’t think that there’s a fast versus a slow hard line. I think the technology is going to—it’s there today. You can do it today. You can get a heavy-weighted indicator, broadly speaking. To many people they’re very fast. I think you’ve got a situation where the, the consumer of the information, automatically enriching that information and passing that information on, they have to be very diligent and make sure—back to your point—that that indicator and that process that they use is agreed upon and, and, and executed appropriately, so that it doesn’t propagate itself out to, you know—the world’s coming to an end and everybody goes and runs screaming down the street, right? We don’t want to worry about that.
What you really are trying to get is a flexible risk score. So you want the score to be able to be managed by the individual receiving that information. So if I do have attribution from multiple sources, that I can see that I have that attribution, and because of that attribution now I have a higher risk on that individual indicator or that individual piece of intelligence, because, why? Multiple people are saying that it’s bad. Then, if you do pass it on to another consumer and that person has more information, they can enrich it and they can tag it so that it can be tracked, and appropriately stated, moving forward as it relates to each individual consumer getting that information.
Now, the de-duplication piece of it, obviously you want to de-duplicate the things that are exactly the same, but the things that are going to enrich that data you obviously want to see the progression.
The other point I’d probably say that the consumer wants to make sure that they’re clear on is the identification of what, what is important to them over what somebody else has deemed important for them, and, and maybe the distinction needs to be made back to the risk scoring, right? If DHS says it’s bad, does that necessarily mean it’s bad to your, your ISAO? Not necessarily. It could be that, just like a, a, a piece of malware on your, on your enterprise, right, that piece of malware may be attacking an OS that you don’t have deployed. Is it a piece of malware? Yes. Is it bad? Yes. But is it a threat to me? No. Right? So I shouldn’t have to worry about it. The same principle holds true here with the information that we’re actually going to use and pass on to other people.
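[Editorial illustration, not part of the spoken remarks.] The "flexible risk score" described above can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the `Indicator` fields, the 0.0 to 1.0 confidence scale, and the `per_source_boost` value are all hypothetical and come from no STIX, TAXII, or vendor schema. It shows three ideas from the discussion: exact duplicates are merged, corroboration by distinct sources raises the score (repeats from the same source do not, which limits the echo-chamber effect), and an indicator that targets a platform the consumer does not run scores zero locally.

```python
from dataclasses import dataclass, field


@dataclass
class Indicator:
    value: str              # e.g. a file hash or domain (hypothetical field)
    affected_os: str        # platform the threat targets (hypothetical field)
    base_confidence: float  # 0.0-1.0, as asserted by a producer
    sources: set = field(default_factory=set)  # distinct reporting sources


def merge_reports(reports):
    """De-duplicate reports by indicator value, keeping every distinct source."""
    merged = {}
    for value, affected_os, confidence, source in reports:
        ind = merged.get(value)
        if ind is None:
            merged[value] = Indicator(value, affected_os, confidence, {source})
        else:
            ind.sources.add(source)  # a set ignores repeats from one source
            ind.base_confidence = max(ind.base_confidence, confidence)
    return merged


def local_risk(ind, deployed_os, per_source_boost=0.1):
    """Score an indicator for one consumer's environment (assumed formula)."""
    if ind.affected_os not in deployed_os:
        return 0.0  # still malware, but not a threat to this enterprise
    boost = per_source_boost * (len(ind.sources) - 1)
    return min(1.0, ind.base_confidence + boost)


reports = [
    ("hash-a", "windows", 0.5, "org1"),
    ("hash-a", "windows", 0.5, "org2"),
    ("hash-a", "windows", 0.5, "org1"),  # same source again: no extra boost
    ("hash-b", "solaris", 0.9, "org3"),
]
merged = merge_reports(reports)
my_platforms = {"windows", "linux"}
for ind in merged.values():
    print(ind.value, sorted(ind.sources), local_risk(ind, my_platforms))
```

Because the score is computed by the receiver, two members of the same sharing community can hold the same indicator with different risk values, which is the distinction drawn above between "is it bad?" and "is it a threat to me?".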
MARK DAVIDSON: So I think in terms of enrichment, from the MITRE infosec perspective, we definitely want to enrich everything as much as possible, as soon as it comes in the door, so that within our analysis platform it’s available for correlation and, you know, creation with everything else that’s already in the system. So we do try and do as much of that as possible, so if you, you know, you dump an e-mail with an attachment into the system, it’ll pull out the attachment, look for things in the attachment, try to carve malware out of the, like, PDF, or things like that.
Then I think to Allison’s point, though, there really is, at least for us, some distinction between fast and slow, because we, you know, the, the number of malware instances in our system is closing in on like 10 million or so. So, I mean, I don’t know, I know there’s lots of people, or there are organizations with more than that. But we can’t run deep analysis on every single piece of malware that we bring in. So there’s like this quick list of things that we can do, and then depending on what an analyst thinks is valuable, they’ll kick off processes for enriching, you know, doing that deeper dive into it. And then at the extreme degree of analysis we’ll have an analyst sit down and try and reverse-engineer the piece of malware.
In terms of automated resharing, I like, I like, Ed, what you said about risk because I think, I think that’s probably a—at least the way you framed it—it seems like a critical component to me, because if it’s just, “Hey, I have this piece of malware,” you know, that might be one thing. If an adversary targeted, you know, one or two organizations with that piece of malware, they’d know that at least those organizations detected that malware. But then as you add like attribution to that malware, it becomes riskier to share because now you’re really showing how much you know and what your capabilities are.
But for MITRE, at least in terms of the current state of technology right now, the automated resharing is probably not a thing we’d do.
ANTONIO SCURLOCK: Roger that. So I’m going to bring you guys back and—let me pause for a minute. Are there any specific questions you want to take from the audience? Did you have any, based upon what we’ve covered so far?
Yes, sir. Let me get you a microphone right quick.
ATTENDEE: I was just thinking a little ways down the road as the, the information sharing operations expand, obviously there’s an awful lot of defenders, there’s an awful lot of attackers, there’s an awful lot of activity, and it’s probably never going to be any more orderly than, say, a FEMA operation, you know, during a storm. Have—you know, particularly maybe Allison from the OGC perspective, have you thought about such a thing as emergency cyber security responders, because threat information could obviously be tagged with lots of metadata so that, as it’s shared in a federated network of organizations, you could try to decide what the access rights are to it. But those are also going to be contextual as well as identity-based in nature.
ALLISON BENDER: Sure. So DHS, the NCCIC, ICE, Secret Service, as well as our other partners in government—the FBI, DC3, Energy, Treasury—we all are working to try and be able to share information with each other that is important to have situational awareness of kind of the overall cyber ecosystem. But, as you point out, you know, when things tip over from “I kept blocking this over and over again” to “this has now transitioned from just an indicator to an actual incident,” US-CERT, ICS-CERT, we provide a lot of different resources to the private sector. Call our 24/7 SOC, let us know what’s going on. Let us know what’s going on, and everything from vulnerability scanning under our NCATS team, to potential advice and guidance, sending in a malware sample, having a flyaway team come and provide, you know, behind-the-keyboard, hands-off or actual on-network assistance. All of those sorts of things are, are available, but they’re very much case by case. Call us. Call who you’re used to working with and say you want us to come too. We’d be happy to. I mean, we really are, in some ways, kind of like the FEMA of the Internet, right? We’re, you know, the fire department as opposed to the arson investigators. We really want to help you get back on your feet, get remediated, identify vulnerabilities in your system, and figure out ways to improve your overall security posture.
ATTENDEE: It just may be that the next level of our maturity is kind of going beyond case by case and beyond figuring out organizational processes for who shares what with whom, to who shares what based on what context, and things like that, and automating the, the different levels of response, based on the level of incident.
ALLISON BENDER: Sure. There’s no requirement, there’s no requirement that you call 911, and all of our services are also voluntary, so it’s kind of up to each individual entity to decide, you know, yes I need some help here. And if that’s the case, we’re happy to.
ANTONIO SCURLOCK: I just want to jump on this from a different hat, actually, not as the moderator but from the standpoint and the viewpoint of the lead for enhanced situational awareness. You know, that context rule, information sharing piece that you talk about, and moving from there, in all honesty I believe that on the government side of the house we’re kind of there, not that the machine speed and the context rule together, but definitely from the contextual to the response. In a lot of organizations, if you’re prior DoD you might call that a critical information requirement. The idea that you have a preconceived concept based upon trend analysis and actual incidents that have occurred over time. You may have some sort of insight operationally into the types of things that you may want to know, without having certain specificity, and a certain threshold for that, and as that threshold is tripped you may have already preapproved or preplanned response actions and activities, whether that be a team on site, maybe some machine speed operational piece, or even what other capabilities and capacities you bring to bear, that may even be non-cyber because of, you know, laws, regulations, and authorities and whether one can actually engage or not.
So I think the contextual information sharing is not far-fetched, in the sense of going from what’s currently, I would say, human and hybrid speed to machine speed. We’re just not quite there yet. But I definitely think it’s not just on the horizon. I think we’re at a tipping point where we almost have to get there, because of what I think Ed and Mark both mentioned is the vast pieces of information that are out there. You’re not going to be able to go looking for disparate data elements, but you’re going to have to start looking at incidents, and actually maybe look at something a little bit more than that, not quite full-on knowledge of activity but something in there that’s a little bit nebulous because you can’t necessarily identify it but it’s, it’s strongly an indicator. And I’m not sure where that ground is.
So I think we can get there.
MARK DAVIDSON: So one area, one growing area of work that MITRE is involved in is a thing called cyber exercises, and what we try and do is help organizations basically do drills for incident response, and I’m not sure I’m allowed to say the name but there was one that we participated in probably about a month ago that was a week long, and MITRE was in basically an observer and facilitator role, and we helped the, the people participating in the exercise—I want to say it was a bunch of local government and others—and we helped them apply, at a high level, threat intelligence to the processes they were doing.
So I think in terms of, you know, being able to respond to a cyber incident, I think just like any other kind of incident response, planning and exercises and drilling and, you know, basically looking at what happened after the fact and what went poorly and what went well, I think are all important pieces.
ED WHITE: I’ll add one piece, from a technical point of view. We have the capability to put in place active controls that allow you to be able to, based upon, we’ll call it a DEFCON 1 through 5 model, all right. As it progresses you can shut yourself completely off the Internet, right, and say, you know what? DHS has told me we’re at this level, so I’m not going to let my enterprise talk to anybody until that comes back down to whatever level is acceptable to me. And you can do that today. Technology allows you to do it today. Technology allows you to do it at line speed, so that you’re not propagating problems across your enterprise. And if you wanted to do that in that federated manner, in an ISAO, you could do that in the same manner, right, especially if you guys are all communicating in real time, machine to machine, right? So it comes down to both policy as well as technology, but, you know, there is the capability being able to provide you that protection, even if you want it from a, you know, a technical point of view.
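[Editorial illustration, not part of the spoken remarks.] The level-driven control described above could be expressed as a simple policy lookup. The sketch below is hypothetical throughout: the five-step `ThreatLevel` scale (1 being most severe, by analogy with DEFCON), the policy names, and the default thresholds are assumptions made for illustration and do not describe any DHS alerting scheme or vendor product.

```python
from enum import IntEnum


class ThreatLevel(IntEnum):
    """Hypothetical five-step scale; 1 is the most severe, as in DEFCON."""
    LEVEL_1 = 1
    LEVEL_2 = 2
    LEVEL_3 = 3
    LEVEL_4 = 4
    LEVEL_5 = 5


def egress_policy(current,
                  isolate_at=ThreatLevel.LEVEL_2,
                  restrict_at=ThreatLevel.LEVEL_3):
    """Map an advertised threat level to a local network policy.

    Each enterprise chooses its own thresholds, so the same advertised
    level can produce different actions at different organizations.
    """
    if current <= isolate_at:
        return "block_all"        # cut the enterprise off from the Internet
    if current <= restrict_at:
        return "allowlist_only"   # talk only to pre-approved peers
    return "allow_all"


for level in ThreatLevel:
    print(level.name, egress_policy(level))
```

Keeping the thresholds as parameters reflects the point that the acceptable level is a policy decision for each member, while the enforcement itself can run machine to machine.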
ANTONIO SCURLOCK: Roger that. I’ll take an opportunity to take another question or two. Let me get you a mic.
ATTENDEE: I have a question probably targeted at Mark. So automated threat indicator sharing just screams a great attack vector to me. Poisoning that information that gets shared across companies, that people rely on, could have pretty horrible effects. What kind of safeguards are in place in TAXII and STIX to prevent something like that from happening?
ANTONIO SCURLOCK: Oh, is that a plant, because I have a question.
ATTENDEE: I can wait.
ANTONIO SCURLOCK: No. Go ahead.
MARK DAVIDSON: So, you know, at least in my eyes, you’re completely right, you know. A big, big repository of threat indicators would be a really great thing for an adversary to get themselves on, or to, you know, hack into. In terms of STIX and TAXII, there’s, there’s not—so, so they’re both standards. So there isn’t as much necessarily built into them from a security standpoint. It’s more about the application building on top of them, and implementing those standards from a security standpoint. And also in terms of we have some reference implementations for STIX and TAXII and things like that, and, you know, we’ve had vulnerability reports for them. We’ve disclosed them, we’ve fixed them.
And then, you know, at least to, to the extent of STIX and TAXII themselves, there haven’t yet been any identified protocol or representation weaknesses that we’ve found, or that have been reported to us. So if you’re a, if you’re a vulnerability researcher and you find one, please let us know, because we, we want to fix it and we want to make it better. So maybe I can sidestep it by saying I don’t know of any weaknesses in STIX and TAXII.
Was that a suitable answer?
ANTONIO SCURLOCK: That’s hardcore.
ATTENDEE: [Speaking off mic.]
MARK DAVIDSON: That’s going to be more, I think—it sounds to me like it’s more on the back-end research part, right? We can’t accept 100 percent that if I hand an IOC into the community at large that that IOC hasn’t gone through some sort of QA process, for a lack of a better way to, you know, explain the situation. I would hope that there would be that process in line before we start to put it in, you know, in the machines that are blocking or in the machines that are saying I’m going to propagate it to all of the communities of interest for protection purposes, and then, to your point, there’s a gigantic gaping hole across the nation’s infrastructure, based upon one, one person’s threat.
So I don’t have the answer on how that can be done, but I, I would have to say that that has to be part of, of anything that we put forward as it relates to STIX and TAXII, IOC development, or anything.
ALLISON BENDER: A couple of things that, you know, information sharing analysis organizations might want to think about. One, very strong authentication requirements, if you’re doing it in an automated fashion, right? Know who’s connecting to your system if you’re going to do it at machine speed and scale. Two, you know, if it’s not people who are directly pumping noise into the system, are you using a time-to-live field, so that these things eventually go away, or are you testing them in some sort of controlled environment to see like, oh, wow, we’re just echoing this noise but it’s not real. I think it would be much harder, you know, to control adversary behavior—you know, if they decide to make a lot of noise in a lot of different places, I mean, the best, best chance we have for that is actually doing very broad, very fast indicator sharing, so that people can take appropriate action to tamp down at least that particular type of activity. But certainly if there are other ideas that you will have, that type of information would be really helpful to us.
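[Editorial illustration, not part of the spoken remarks.] The time-to-live idea mentioned above can be sketched as follows. The dictionary keys (`value`, `last_confirmed`, `ttl`) are hypothetical names chosen for this example and are not fields from STIX or any other standard; the point is only that an indicator nobody re-confirms eventually drops out of the active set, so injected noise does not persist indefinitely.

```python
from datetime import datetime, timedelta, timezone


def prune_expired(indicators, now):
    """Keep only indicators whose time-to-live has not run out."""
    return [ind for ind in indicators
            if now < ind["last_confirmed"] + ind["ttl"]]


def reconfirm(indicator, now):
    """A fresh, authenticated sighting restarts the clock (returns a copy)."""
    return {**indicator, "last_confirmed": now}


t0 = datetime(2015, 1, 1, tzinfo=timezone.utc)
feed = [
    {"value": "bad.example", "last_confirmed": t0, "ttl": timedelta(days=7)},
    {"value": "noise.example", "last_confirmed": t0, "ttl": timedelta(days=1)},
]

# Three days later only the longer-lived indicator is still active.
day3 = t0 + timedelta(days=3)
active = prune_expired(feed, day3)
print([ind["value"] for ind in active])

# Re-confirming it on day 3 keeps it alive past its original expiry.
refreshed = [reconfirm(ind, day3) for ind in active]
day9 = t0 + timedelta(days=9)
print([ind["value"] for ind in prune_expired(refreshed, day9)])
```

In practice the re-confirmation step is where the strong authentication mentioned above matters, since anyone able to refresh an indicator can also keep bad data alive.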
ANTONIO SCURLOCK: I’m going to get to the last question from the audience and then we have a final topic to wrap it up for the panel—unless they told me be on unlimited time. Roger.
ATTENDEE: So it strikes me that a lot of the discussions that we’re having about information sharing are things that the government in particular has solved in other contexts. If you strip away cyber there is information sharing in the intelligence community, and then law enforcement, and there’s automated or what you say computer-assisted information sharing of that kind of information. One example is the declassification problem. It’s something I hear discussed among cyber threat analysts is how they wrestle through the, the risk calculation of when it’s more beneficial to share among an open group, where the information might leak out and benefit the adversary, as opposed to holding in a much more close-knit group, where the adversary doesn’t learn about the information. This strikes me as a problem that’s been discussed for, you know, generations in other contexts.
So I’m wondering if the government has access to lessons learned from information sharing in other contexts, and best practice that could be shared in the cyber—to those of us working in the cyber context.
ANTONIO SCURLOCK: So just to be fair, I’ll take a small piece of this. In a nutshell, I do believe that we do, meaning “we” the government. The key element is, is that when you talk to best practices, no matter how much you give, implementation is key, right? And we have no insight into the implemented best practices until we find out that somebody didn’t implement the best practices. But I think having access to them isn’t really a problem, that I’m aware of.
ATTENDEE: [Speaking off mic.]
ANTONIO SCURLOCK: So from an intelligence community or law enforcement, I can’t speak to that, but I can definitely say that I know that CERT, both ICS and US, have that type of information available for the asking, and ICS-CERT even has an onsite program they would be more than happy to stop by at your request and engage with you to help you secure it.
ALLISON BENDER: So looking at the, the declassification issue, I mean, part of it is that the government has, in some ways, already structured its data, right—top secret, grave national security risk. You go down the tier and there’s these seven reasons why you can classify things, so the data is in some ways already structured. And then how do you declassify it? Typically there’s an associated classification guide—this plus that is this level. You know, we don’t have that level of structure that has been applied to cyber security information, because it also contains things like, you know, personally identifiable information. Is it you, the first party? Is it third-party PII, where you’re getting ready to share your customer data? You know, is it proprietary information? Is it information you’ve received from another source?
And so, you know, certainly as we’ve looked at the, the declassification issues and how we can push more timely, relevant, and actionable information, and particularly that has been derived and brought down from classified sources, that’s been helpful, but we’ve had the structure to do that. We don’t really have those structures available in cyber, and it takes you knowing your data and your risk levels to be able to get there.
MARK DAVIDSON: So I agree that those would be good things to learn from. I think also worthwhile to keep in mind is seemingly a key differentiator between what, you know, the law enforcement and intelligence communities might have done versus what we’re doing, is, you know, for hours today we were using Steve’s Pizza Shop, or Bob’s Pizza Shop, whatever it was. Joe’s. And at that point the mindset becomes, how do we have the technology to prevent end users from doing something that they don’t want to do? And I guess really it’s a, it’s a scope. There’s a big scope difference, where, in terms of the intelligence community you have, you know, you have a boundary that doesn’t include pizza shops somewhere.
So I think, in terms of the lessons learned, the scope of what’s being tried, attempted to accomplish here, is—I’ll go ahead and use the word “unprecedented,” but hopefully we can find those, those lessons learned and apply them here.
ANTONIO SCURLOCK: Okay. Well, so I appreciate everybody’s participation. If you don’t mind, just a quick hand for the panelists that were up here, providing important feedback.
ANTONIO SCURLOCK: That being said, I will at least tell you the last question I was going to ask and we’ll carry it over to the next engagement, and that is, as we look at decision trees for machine-speed information, what does it mean to have the human in the loop, whether that human is doing analysis, a deeper dive on analysis, XII review, meaning any kind of II—PCII, you name it, health care data. What does that look like and what does that mean going forward in the machine-speed environment?
So I’ll make sure that we catalog that question and bring it up at the next engagement for the ISAOs. And thank you for your time, and enjoy your break. Appreciate it, guys.