Giant Server File Folders - How Do They Access/Update Them?

Imagine you have a giant site with 1 million files in a single FTP folder.

And you need to remove one or upload one new file.

When you try to FTP into the folder, it freezes trying to load those 1 million files...

How do the big sites get around that?

I've tried this on a desktop PC running a server CPU with RAM maxed out, and the FTP client just never finishes loading. There has to be a better way. Does internet speed matter when using FTP? Would fiber optic solve that, or is it the FTP client or even the server itself shitting the bed and getting overloaded?

I've looked into backup-and-replace options, but on a very large site that's been a huge hassle with the size of the files.

There has to be a simple way... If you can use FTP with a folder that has 10,000 files, what is physically stopping it from letting you view 1 million files?
 
What kind of files are we talking about here? Images and the like can be stored on S3 or something similar with a CNAME, and you never have to worry about them choking things down.
 

Image files.
Under main .com domain
Stored on ~$1K/m standard hosting provider web server.
Hosting any files elsewhere is not an option in this case.
 
First - don't have a folder with 1 mil image files. :-)

- Can you turn off the directory file listing with the FTP client? This might stop it from choking.

- Or you could remove the image via SSH. Adding could be done with rsync over SSH (rough sketch below).

- Or have a PHP script to upload the images and move removed images to a removed-folder.

You could probably get this done with psftp or pscp
http://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html

http://the.earth.li/~sgtatham/putty/0.60/htmldoc/Chapter5.html
http://the.earth.li/~sgtatham/putty/0.60/htmldoc/Chapter6.html
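
To make the SSH/rsync route concrete, here's a rough sketch - the hostname and paths are just placeholders for whatever your setup actually uses:

Code:
# Delete one image without ever requesting a directory listing
ssh user@yourserver 'rm -f /var/www/html/images/old-photo.jpg'

# Push one new image (or a whole local folder of them) over SSH with rsync
rsync -avz /local/new-images/ user@yourserver:/var/www/html/images/

rsync only transfers files that are missing or changed, so re-running it after adding a handful of new images is cheap.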
 
Use S3 (assuming your images comply with their T&C)
Use Putty/scp
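
A quick pscp example (made-up Windows path and server name, adjust to your own):

Code:
pscp C:\images\new-photo.jpg user@yourserver:/var/www/html/images/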
 
In my experience, the cPanel file manager is much faster than FTP clients like FileZilla for tasks like this.

cPanel can't handle it either, unfortunately... Even with a desktop PC on steroids, the browser freezes and seriously lags.

(replying to the SSH / rsync / psftp suggestions and PuTTY links above)

- That would be an excellent idea for uploading new files. I wonder if I can replace files without having them all listed...

- Can SSH handle an import of 1M files at once? Say if I want to replace them all.

- WordPress' PHP media library works pretty decently. I believe I'm able to load ~2,000 files per page, and it never crashes on me. The only limitation is there's no option to replace files when new ones are uploaded. Perhaps another script would work better.

- Will check those links out, thank you.

- Still trying to understand why FTP clients can't handle it... Both my server and desktop are on steroids when it comes to performance.
 
I'd recommend just SSHing into the box. If you only need a subset of the directory listing, you can just grep for it. For example, let's say you were looking for files that contained the word "apple" in that huge directory. You could do something like:
Code:
 ls -l | grep -i apple

If you wanted to copy the whole thing, rsync can do what you want. If you have SSH already set up and ready to go, you could pull everything down from the server by using something like this:

Code:
rsync -azP remote_user@remote_host:/path/to/remote/dir /path/on/local/machine

For single files, scp should do the trick. An example to pull down a file from remote:

Code:
scp remote_user@remotehost:/path/to/some/file.txt /local/dir/to/copy/file/to

And from local to remote:
Code:
scp /local/path/to/file.txt remote_user@remotehost:/path/to/copy/to

There are several reasons that FTP clients might barf on a listing. One of the less obvious causes on a Windows box is firewall software, and even the routers themselves depending on firmware. Other causes could include performance issues related to the particular type of filesystem, network bottlenecks, etc. There's almost always a log file or two on the server that can help pinpoint the prob.
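
If you want to chase the server-side cause down, the FTP daemon's log is the place to start. The exact paths depend on the distro and which daemon is installed, so treat these as examples only:

Code:
# vsftpd commonly logs here
sudo tail -n 200 /var/log/vsftpd.log

# pure-ftpd usually logs to syslog unless an alternate log is configured
sudo grep -i ftp /var/log/syslog | tail -n 200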

As a general rule, if you have to have a lot of files in a single dir, it's best to break the dir into subdirectories (and/or use multiple dirs/mount points) using some kind of sharding scheme. It could be as easy as year/month/day to start with. At the same time, I'd be seriously looking at S3 for that many files. Hope this helps :smile:
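
As a rough illustration of that kind of sharding, here's one way to split a flat image directory into year/month subfolders based on each file's modification time. The path is an example, test on a copy first, and keep in mind that every existing URL pointing at the old flat paths would need updating:

Code:
# Move each .jpg into a YYYY/MM subfolder based on its mtime
cd /var/www/html/images || exit 1
for f in *.jpg; do
    d=$(date -r "$f" +%Y/%m)   # e.g. 2015/06
    mkdir -p "$d"
    mv -n "$f" "$d/"
done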
 
I think you have a limit problem on your server side (put in place for your safety). The FileZilla client does not place any limit on how many items of a directory it can display (other FTP clients might). However, your server's FTP service most likely has a limit in place, and it's probably enabled. For example, pureFTP defaults to limiting the files displayed for a directory to 2,000. You have to figure out which FTP service is running on your box, then figure out whether it has a default limit on the number of files it can display, and adjust it accordingly.

With all that said, @SmokeTree's method is 100% the best way of solving your problem. SSH into the box, grep the listing, and you are good to go.
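
If the box does turn out to be running pure-ftpd, the listing cap is set with its -L option (max files shown : max recursion depth), or the matching LimitRecursion directive if your distro uses a config file. The limit value below is just an example:

Code:
# Command-line form: -L max_files:max_recursion_depth
pure-ftpd -L 2000000:8

# Config-file equivalent (path varies by distro, e.g. /etc/pure-ftpd/pure-ftpd.conf):
#   LimitRecursion 2000000 8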
 