Compare commits

...

2 Commits

Author         SHA1        Message            Date
Romain Quinet  fdf987c342  updated examples   2023-10-07 00:36:03 +02:00
Romain Quinet  4e587aed9e  document metadata  2023-10-07 00:31:01 +02:00
4 changed files with 42 additions and 28 deletions

2
.gitignore vendored

@@ -1,3 +1,3 @@
-/data
+/transcripts
 /index
 /.idea


@@ -36,22 +36,28 @@ You can then ask it any questions about Darknet Diaries! Currently, it does hall
 [link to transcript](https://darknetdiaries.com/transcript/137/)
 > What is the spyware predator ?
->> The Predator is a phone-hacking software developed by a company called Cytrox. It was discovered by Citizen Lab during their investigations. Once installed on a phone, the Predator spyware has the capability to exfiltrate files from the phone, take passwords, turn on the microphone and listen in to what's going on. It is typically installed on a victim's phone through a phishing text message that prompts the user to click a link. Once the link is clicked, it triggers a series of exploits on the phone, eventually installing the spyware. The malware then erases the tracks of the infection process.
+>> The Predator is a type of spyware that is capable of infecting a phone and erasing its tracks of the infection process. Once installed, it can watch what's going on with the camera, listen in on the microphone, scrape passwords, read texts, and report the person's location. It is typically delivered through a phishing text message that encourages the recipient to click on a link, triggering a series of exploits on the phone. The Predator spyware is supposedly only sold to intelligence agencies.
 >
-> What happened to the egyptian journalists?
->> I'm sorry, but I don't have specific information about what happened to the Egyptian journalists. Could you please provide more details or context?
+> What happened with predator and the egyptian journalists?
+>> Predator spyware was found on the phones of an Egyptian politician living in exile and an Egyptian journalist. This led to the suspicion that the Egyptian government might be using this technology to spy on their civil society. The use of the spyware was not to stop terrorists or catch criminals, but seemingly to monitor the stories the journalist was working on, which was considered a misuse of the technology.
 >
-> Related to predator
->> The Egyptian journalist, along with an Egyptian politician living in exile, were found to have Predator spyware on their phones. This led to suspicions that the Egyptian government might be using this technology to spy on their civil society.
+> How and when was Predator first discovered?
+>> The Predator spyware was found on the phones of three people in Greece's civil society, including a journalist and an opposition leader. However, the specific timeline or details about the initial discovery are not provided.
 >
-> How is Bill Marczak related to this story
->> Bill Marczak is a senior researcher at the Citizen Lab at the University of Toronto. He was involved in the Predator spyware story as he and his team discovered and analyzed the Predator spyware. They found it while examining phones for another spyware called Pegasus. They were able to analyze the final payload of the spyware, understand what it was doing, and attribute it back to Predator. They also investigated the companies behind Predator, like Cytrox and Intellexa, and tried to understand the geopolitical implications of such an exploit being sold on the mercenary marketplace.
+>Which episode should I listen to in order to learn more about this topic?
+>> To learn more about the Predator spyware, you should listen to episode 137 titled "Predator" from the Darknet Diaries.
+### Episode 130: Jason's Pen Test
+[link to transcript](https://darknetdiaries.com/transcript/130/)
+> Who is jason haddix
+>> Jason Haddix is a well-known figure in the cybersecurity industry. He has held various roles in the field, including Director of Technical Operations at Bugcrowd and Head of Trust and Security at HackerOne. Haddix is particularly known for his expertise in bug bounty programs and penetration testing. He has also contributed to the cybersecurity community by developing methodologies and tools for security testing.
 >
-> When did they first discovered samples of predator ?
->> I'm sorry, but I don't have specific information about when the Predator spyware was first discovered. Could you please provide more details or context?
+> in which episode is he featured
+>> Jason Haddix is featured in Episode 130, titled "Jason's Pen Test".
 >
-> something about Payload 2...
->> Payload 2 is a suspicious process that was discovered on a phone during an investigation by Citizen Lab. It didn't match any previously-known malware that they had been tracking on phones. Upon further analysis, it was found that the arguments passed into this process when it was started up included a URL, which led to a binary file for an iPhone. This binary file was an application that was identified as spyware with capabilities to exfiltrate files from the phone, take passwords, and turn on the microphone. Through further analysis, it was attributed back to Predator, a spyware made by Cytrox.
->
-> In which episode is this topic discussed ?
->> The Predator spyware is discussed in the Darknet Diaries episode number 137.
+> Can you summarize that episode
+>> Episode 130, titled "Jason's Pen Test," features a professional penetration tester named Jason who worked for HP. Jason shares his experiences conducting hundreds of penetration tests for Fortune 500 companies, primarily focusing on network hacking. He explains the process of testing a company's security measures, both physically and digitally.
+>>
+>>One of his memorable stories involves a physical penetration test where he had to break into a former employer's building. He used his knowledge of the building's layout and security mechanisms to gain access, even falling through a ceiling into a server room at one point.
+>>
+>>The episode also covers a project where Jason was tasked with hacking into a large, worldwide bank. His job was to examine the bank's mobile app for any potential security vulnerabilities that could expose customer or sensitive information. The episode provides a detailed look into the world of penetration testing, highlighting the importance of robust security measures in both physical and digital spaces.


@@ -1,6 +1,12 @@
 import requests
+import os
 from bs4 import BeautifulSoup
+folder_path = "transcripts"
+if not os.path.exists(folder_path):
+    os.makedirs(folder_path)
 for i in range(1, 139):
     url = f"https://darknetdiaries.com/transcript/{i}"
     r = requests.get(url)
@@ -9,11 +15,10 @@ for i in range(1, 139):
     title_section = soup.find('h1')
     if pre_section:
-        text = pre_section.get_text()
-        title = title_section.get_text()
-        with open(f"data/episode_{i}.txt", "w") as f:
-            f.write(
-                f"Darknet Diaries - {title}\n" +
-                text
-            )
-        print(title)
+        transcript = pre_section.get_text()
+        ep, title = title_section.get_text().split(":", 1)
+        ep = ep.strip()
+        title = title.strip()
+        with open(f"{folder_path}/episode_{i}.txt", "w") as f:
+            f.write(f"{title}\n{transcript}")
+        print(f"{ep} {title}")
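
The new scraper code splits the `<h1>` text at the first colon and unpacks the result into two names, which raises a `ValueError` if a heading has no colon. A minimal sketch of that step in isolation, assuming headings shaped like `EP 137: Predator` (the heading format and the helper name `split_heading` are assumptions, not part of the commit):

```python
def split_heading(heading: str) -> tuple[str, str]:
    """Split a heading into (episode label, title) at the first colon.

    Hypothetical helper: the commit does this inline with split(":", 1).
    """
    # str.partition always returns three parts, so a heading without a
    # colon does not raise; sep is simply the empty string in that case.
    ep, sep, title = heading.partition(":")
    if not sep:
        # No colon found: treat the whole heading as the title.
        return "", heading.strip()
    return ep.strip(), title.strip()


# Example with an assumed heading format.
print(split_heading("EP 130: Jason's Pen Test"))
```

Only the first colon is treated as the separator, so a title that itself contains a colon stays intact, which matches the `split(":", 1)` call in the diff.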

13
main.py

@@ -12,20 +12,22 @@ service_context = ServiceContext.from_defaults(llm=llm)
set_global_service_context(service_context) set_global_service_context(service_context)
if not os.path.exists("./index/lock"): if not os.path.exists("./index/lock"):
print("Generating index...")
documents = [] documents = []
for filename in os.listdir("./data"): for filename in os.listdir("./transcripts"):
episode_number = re.search(r'\d+', filename).group() episode_number = re.search(r'\d+', filename).group()
with open("./data/" + filename, 'r') as f: with open("./transcripts/" + filename, 'r') as f:
title = f.readline().strip()
content = f.read() content = f.read()
document = Document( document = Document(
text=content, text=content,
doc_id=filename,
metadata={ metadata={
"episode_number": episode_number "episode_number": episode_number,
"episode_title": title
} }
) )
documents.append(document)
documents = SimpleDirectoryReader('./data').load_data()
parser = SimpleNodeParser.from_defaults() parser = SimpleNodeParser.from_defaults()
nodes = parser.get_nodes_from_documents(documents) nodes = parser.get_nodes_from_documents(documents)
@@ -44,6 +46,7 @@ chat_text_qa_msgs = [
             "You have been trained on the Darknet Diaries podcast transcripts with data from october 6 2023."
             "You are an expert about it and will answer as such. You know about every episode up to number 138."
             "Always answer the question, even if the context isn't helpful."
+            "Mention the number and title of the episodes you are referring to."
         )
     ),
     ChatMessage(
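
The first `main.py` hunk reads the episode title from the first line of each transcript file and takes the episode number from the filename. A minimal sketch of that parsing step on its own, without the indexing library, assuming files named like `episode_137.txt` whose first line is the title (the helper `load_transcript` and the returned dict layout are hypothetical, not part of the commit):

```python
import os
import re
import tempfile


def load_transcript(path: str) -> dict:
    """Read one transcript file into text plus metadata (hypothetical helper)."""
    filename = os.path.basename(path)
    # First run of digits in the filename, as in the diff's re.search call.
    match = re.search(r"\d+", filename)
    with open(path, "r", encoding="utf-8") as f:
        title = f.readline().strip()  # first line holds the episode title
        content = f.read()            # the rest is the transcript body
    return {
        "doc_id": filename,
        "text": content,
        "metadata": {
            "episode_number": match.group() if match else None,
            "episode_title": title,
        },
    }


# Usage with a throwaway file so the sketch runs without the real transcripts.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "episode_137.txt")
    with open(sample, "w", encoding="utf-8") as f:
        f.write("Predator\nJACK: Hello.\n")
    record = load_transcript(sample)
    print(record["metadata"])
```

Because the title is consumed by `readline()` before `read()`, it ends up only in the metadata and not in the indexed text, which appears to be the intent of the change.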