Not that much is missing from the already-collected Epstein DataSet 9.

ermstein@lemmy.world · 11 days ago


TropicalDingdong@lemmy.world · 6 days ago

I've been following your Dataset 9 thread on GitHub. I'm still having issues with the various versions and torrents I've tried to use to get this data.

I was wondering if you have a working, high-quality link or torrent for the most complete version of Dataset 9.

acelee1012@lemmy.world · 6 days ago

https://github.com/yung-megafone/Epstein-Files?tab=readme-ov-file#data-set-9-incomplete

It's in there. Look at the Data Set 9 section; Source A is currently the most complete as far as we know.

TropicalDingdong@lemmy.world · 6 days ago

So I pulled that and unzipped it, but it's fucked. Something is definitely not right with it. It only unzips to 70k documents and reports 140 GB, but somehow takes up only ~12 GB on disk?
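A gap like 140 GB reported versus ~12 GB on disk can happen when extracted files are sparse (mostly holes that read back as zeros) or when the filesystem compresses them. That cause is an assumption, not something confirmed in this thread. A minimal Python sketch for checking it compares each file's reported size (`st_size`) with the space actually allocated (`st_blocks`, which exists on Linux and macOS but not Windows). `disk_usage_report` and `print_report` are hypothetical helper names.

```python
import os


def disk_usage_report(root):
    """Return (file_count, apparent_bytes, allocated_bytes) for every file entry under root."""
    count = apparent = allocated = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # lstat so symlinks are measured themselves rather than followed
            st = os.lstat(os.path.join(dirpath, name))
            count += 1
            apparent += st.st_size
            # st_blocks is counted in 512-byte units on Linux and macOS (absent on Windows)
            allocated += st.st_blocks * 512
    return count, apparent, allocated


def print_report(root):
    """Print totals and flag a large apparent-vs-allocated gap."""
    count, apparent, allocated = disk_usage_report(root)
    gib = 1024 ** 3
    print(f"{count} files, {apparent / gib:.1f} GiB apparent, {allocated / gib:.1f} GiB allocated")
    if allocated and apparent > 2 * allocated:
        print("Apparent size far exceeds allocated space: likely sparse or compressed files.")
```

Usage would be something like `print_report("/path/to/DataSet9")` (hypothetical path). If the gap is large, spot-checking a few big files for long runs of zero bytes would help tell an incomplete download or extraction apart from plain filesystem compression.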

acelee1012@lemmy.world · 4 days ago

I am not sure what would be causing that. I am not seeing any issues on the repository reporting anything similar. I recommend opening an issue and seeing if they can help you resolve it. I am not going through the files myself, just hosting and seeding.

TropicalDingdong@lemmy.world · 4 days ago

Yeah, I don't support Microsoft, so I don't use GitHub beyond what I must…

The problem was with some of the earlier links. I was able to get one of the more recent ones. I posted about it here to one of the contributors.

gravitas_deficiency@sh.itjust.works · 12 days ago

I fucking love this community. This is good and necessary work.

Coelacanth@feddit.nu · 12 days ago

That’s good news. With the amount of people interested in these files and in data preservation it’s bound to be only a matter of time until the whole dataset 9 is restored. Someone out there’s gotta have the rest of the files.

iByteABit@lemmy.ml · 12 days ago

I know it would be ridiculously ironic, but if any CSAM is in there, could the authorities get you in trouble for possessing or even distributing it?

Coelacanth@feddit.nu · 12 days ago

Probably. CSAM is CSAM; I'm not sure the law would differentiate. That's probably one of the reasons Dataset 9 has taken a while to restore, since I believe it was said to contain some accidentally unredacted/uncensored CSAM?

Dhoard@lemmy.world · 11 days ago

Don't believe them. I have proof that in one of the documents, the "allen oren tal" brothers were redacted after dataset 9 was re-released.

CapableStaircase@lemmy.zip · 12 days ago

You rock. I didn't realize NATIVEs had a placeholder PDF. I'll try to scrape the media files tonight and add them to the existing, more complete dataset 9 archive.

CapableStaircase@lemmy.zip · 11 days ago

I could only grab ~44 of the NATIVEs you've listed, and they add up to only a tiny portion of the roughly 80 GB expected to remain. The hard part is guessing what file extension each of these files has without getting rate limited by the DOJ. I was hoping to get a copy of the zip file's EOCD (end of central directory record), but the zip is still down.

If anyone ever sees that zip come back, please try to download the last 150-200 MB. That's where the zip archive's table of contents (the central directory) will live.
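Why the tail is enough: a zip ends with the EOCD record, which gives the size of the central directory sitting just before it, and that directory lists every file name and size. A minimal Python sketch, assuming you have only the last chunk of the archive saved locally, walks it with `struct`. Assumptions to be loud about: a single-disk, unencrypted archive whose central directory immediately precedes the EOCD, and (for archives over 4 GB) a standard 56-byte ZIP64 EOCD record with no extensible data. `list_central_directory` and the other helper names are hypothetical.

```python
import struct

EOCD_SIG = b"PK\x05\x06"
ZIP64_LOCATOR_SIG = b"PK\x06\x07"
ZIP64_EOCD_SIG = b"PK\x06\x06"
CDH_SIG = b"PK\x01\x02"


def _find_eocd(tail):
    """Return the index of the EOCD; it lies within the last 22 + 65535 bytes."""
    pos = tail.rfind(EOCD_SIG, max(0, len(tail) - 22 - 0xFFFF))
    if pos < 0:
        raise ValueError("no EOCD signature found in this chunk")
    return pos


def _zip64_uncompressed(extra, uncomp):
    """If the 32-bit size overflowed, read the real one from the ZIP64 extra field (id 0x0001)."""
    if uncomp != 0xFFFFFFFF:
        return uncomp
    i = 0
    while i + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, i)
        if header_id == 0x0001:
            return struct.unpack_from("<Q", extra, i + 4)[0]
        i += 4 + size
    return uncomp


def list_central_directory(tail):
    """Yield (name, uncompressed_size) for each entry, given the last N bytes of a zip."""
    eocd = _find_eocd(tail)
    _, _, _, _, n_entries, cd_size, _cd_offset, _ = struct.unpack_from("<IHHHHIIH", tail, eocd)
    cd_end = eocd  # without ZIP64 the central directory ends where the EOCD begins
    if n_entries == 0xFFFF or cd_size == 0xFFFFFFFF:
        loc = eocd - 20  # ZIP64 EOCD locator sits right before the EOCD
        if tail[loc:loc + 4] != ZIP64_LOCATOR_SIG:
            raise ValueError("ZIP64 locator missing")
        rec = loc - 56  # assumes a standard 56-byte ZIP64 EOCD record
        if tail[rec:rec + 4] != ZIP64_EOCD_SIG:
            raise ValueError("ZIP64 EOCD record not where expected")
        fields = struct.unpack_from("<IQHHIIQQQQ", tail, rec)
        n_entries, cd_size = fields[7], fields[8]
        cd_end = rec
    pos = cd_end - cd_size
    if pos < 0:
        raise ValueError(f"chunk too short: need {-pos} more bytes before it")
    for _ in range(n_entries):
        if tail[pos:pos + 4] != CDH_SIG:
            raise ValueError(f"bad central directory header at chunk offset {pos}")
        hdr = struct.unpack_from("<IHHHHHHIIIHHHHHII", tail, pos)
        flags, uncomp = hdr[3], hdr[9]
        name_len, extra_len, comment_len = hdr[10], hdr[11], hdr[12]
        name_start = pos + 46
        raw_name = tail[name_start:name_start + name_len]
        extra = tail[name_start + name_len:name_start + name_len + extra_len]
        # Bit 11 of the flags marks UTF-8 names; otherwise the spec says CP437
        name = raw_name.decode("utf-8" if flags & 0x800 else "cp437")
        yield name, _zip64_uncompressed(extra, uncomp)
        pos = name_start + name_len + extra_len + comment_len
```

With the last chunk saved as, say, `tail.bin` (hypothetical name), `list(list_central_directory(open("tail.bin", "rb").read()))` would give every file name, extension included, which is exactly what the extension guessing is trying to recover. If the chunk is too short, the error says how many more bytes are needed.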

ermstein@lemmy.world · 11 days ago

One thing you could try is looking at the file extensions from DataSet 10’s Natives so you have fewer to guess from.

The rest of the NATIVEs could still be that large, but I'll double-check whether there are other placeholders.

CapableStaircase@lemmy.zip · 11 days ago

Can you also check and see if dataset 8/10/11 have all the native files they should based on the presence of these placeholders?

CapableStaircase@lemmy.zip · 11 days ago

I found this in a random doc today. I'll add it to your list and give it a shot tonight. It'll be slow going so I don't get rate limited again. I think if you hit too many 404s in a row, the CDN locks you out for a bit.

ermstein@lemmy.world · 11 days ago

I updated the post with the URLs I have found and the extensions I have tried. You can also track updates at https://github.com/yung-megafone/Epstein-Files/issues/4

CapableStaircase@lemmy.zip · 9 days ago

For anyone watching this post, I just dropped an update on that issue. I'll be posting a new magnet link soon for the 84 GB I was able to download.

ermstein@lemmy.world · 11 days ago

I have updated the post with a list of 2542 NATIVEs instead of 135 after finding a second placeholder size of 2433 bytes.
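Finding NATIVE placeholders by size could be sketched in Python like this, assuming the placeholders are the only PDFs with exactly those byte counts. Only the 2433-byte size is stated in this thread, so the set holds just that value; the first placeholder size would need to be added by hand. `find_placeholders` is a hypothetical name, and a size match is only a candidate, so spot-checking a few hits is worthwhile.

```python
from pathlib import Path

# 2433 bytes is the second placeholder size reported in this thread; the first
# size is not given here, so it is left out rather than guessed.
PLACEHOLDER_SIZES = {2433}


def find_placeholders(root, sizes=PLACEHOLDER_SIZES):
    """Return sorted file stems of PDFs under root whose byte size is a placeholder size."""
    hits = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() == ".pdf" and path.stat().st_size in sizes:
            hits.append(path.stem)
    return sorted(hits)
```

The returned stems are the document IDs whose real NATIVE file still has to be found under some other extension.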

CapableStaircase@lemmy.zip · 8 days ago

I took the same list provided by this post and added a few more extensions to the search. In doing so, I was able to download 2327/2542 NATIVE files. I performed the search by making a HEAD request for each URL before trying to download it with a GET request. This method also turned up 3 additional files that returned Content-Type and Content-Length in the HEAD response but then "disappeared" and returned a 404 on the GET request.

NOTE:

All MS Office files (.doc(x), .xls(x), .ppt(x)) are exactly ZERO bytes long.
There are two sqlite .db files which are password protected and I have not yet tried to crack.
Lots of jail footage.
I think the very small .avi videos with many sequential Bates numbers are actually single frames that need to be recombined into the original video. I have not done so.
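Spotting the sequential runs is the first step before any recombining. A minimal Python sketch that groups file names into runs of consecutive Bates numbers, assuming (hypothetically) that each file's stem ends in its Bates number. It only identifies candidate runs; recombining frames into video is not attempted here. `bates_runs` and the sample names are illustrative.

```python
import os
import re

# Assumption for illustration: each file's stem ends in its Bates number,
# e.g. "SAMPLE0012.avi"; adjust the pattern if the real naming differs.
DIGITS_AT_END = re.compile(r"(\d+)$")


def bates_runs(filenames):
    """Group names into runs of consecutive Bates numbers, ordered by number."""
    numbered = []
    for name in filenames:
        stem = os.path.splitext(os.path.basename(name))[0]
        match = DIGITS_AT_END.search(stem)
        if match:  # names without a trailing number are skipped
            numbered.append((int(match.group(1)), name))
    numbered.sort()

    runs = []
    prev = None
    for number, name in numbered:
        # Start a new run whenever there is a gap in the numbering.
        if prev is None or number - prev > 1:
            runs.append([])
        runs[-1].append(name)
        prev = number
    return runs
```

Feeding it only the small .avi files (for example, names from `Path(folder).glob("*.avi")` filtered by size) and keeping runs longer than one entry would give the candidate sequences worth inspecting.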

Extensions I tried:

dataset10:

avi, mp4, mov, mp3, wav, m4a, m4v, wmv, ts, vob, 3gp, amr, opus, csv, xlsx, xls, docx, doc, pluginpayloadattachment

common-audio:

m4a, mp3, wav, aac, flac, ogg, wma, aiff, opus, m4b

common-video:

mp4, mov, avi, wmv, mkv, webm, m4v, mpg, mpeg, 3gp

uncommon-audio:

ac3, amr, mka, au, ra, mid, aif, dts, caf, gsm, ape, wv, spx, mpc, snd, voc, tta, tak, dsf, dff

uncommon-video:

flv, vob, ts, ogv, m2ts, mts, asf, 3g2, f4v, divx, rm, rmvb, m2v, dv, xvid, swf, m4s, hevc, h264, h265

rare-audio:

8svx, amb, au, avr, cda, cvs, cvsd, cvu, dss, dvms, fap, fssd, gsrt, hcom, htk, ima, ircam, maud, nist, paf, prc, pvf, sd2, sds, sf, smp, sou, txw, vms, w64, wve, xa, aifc, al, ul, la, sb, sw, ub, uw

rare-video:

264, 265, 302, 3p2, 787, 890, aec, aep, aepx, ajp, ale, am, amc, amv, arcut, arf, avb, avc, avd, avp, avs, awlive, axm, bdm, bdmv, bik, bix, bmk, bnp, box, bs4, bsf, bu, camproj, camrec, ced, cine, cip, clpi, cmmp, cmmtpl, cmproj, cmrec, cpi, cst, cx3, d2v, d3v, dash, dat, dce, dck, dcr, dcr, ddat, dif, dir, dlx, dmb, dmsd, dmsd3d, dmsm, dmsm3d, dmss, dnc, dpa, dpg, dream, dsy, dv4, dvdmedia, dvr, dvr-ms, dvx, dxr, dzm, dzp, dzt, edl, evo, eye, f4p, fbr, fbz, fcp

documents:

pdf, doc, docx, txt, rtf, odt, xls, xlsx, csv, ppt, pptx, odp, html, htm, xml, json, md, tex, epub, mobi

images:

jpg, jpeg, png, gif, bmp, tiff, tif, webp, svg, ico, raw, cr2, nef, orf, sr2, psd, ai, eps, heic, heif

archives:

zip, rar, 7z, tar, gz, bz2, xz, iso, dmg, cab, lz, lzma, zst, lz4, sz, z, tgz, tbz2, txz, tlz, tar.gz, tar.bz2, tar.xz, tar.zst, tar.lz, tar.lzma, tar.lz4, tar.z, tar.sz

epstein:

apmaster, apversion, attr, bmp, bup, dat, data, db, db-journal, doc, ds\_store, f catalog, f\_catalog, ifo, images #1, images #2, iphoto, ivc, mpg, NULL, pdf, pps, ps, psb, psd, raf, tif, tiff, tropez, txt, xml

Torrent file: https://archive.org/details/data-set-9-native.tar.xz

NOTE: See the INFO folder for more information.

ZaInT@lemmy.world · 10 days ago

Damn, I've pulled ~770 of the 1983 on the list now. Hopefully it keeps up.



Update 1 (February 6):

Update 2 (February 6):