Asked  8 Months ago    Answers:  5   Viewed   36 times

I have a form that allows the user to either upload a text file or copy/paste the contents of the file into a textarea. I can easily differentiate between the two and put whichever one they entered into a string variable, but where do I go from there?

I need to iterate over each line of the string (preferably not worrying about newlines on different machines), make sure that it has exactly one token (no spaces, tabs, commas, etc.), sanitize the data, then generate an SQL query based off of all of the lines.

I'm a fairly good programmer, so I know the general idea about how to do it, but it's been so long since I worked with PHP that I feel I am searching for the wrong things and thus coming up with useless information. The key problem I'm having is that I want to read the contents of the string line-by-line. If it were a file, it would be easy.

I'm mostly looking for useful PHP functions, not an algorithm for how to do it. Any suggestions?

 Answers

80

preg_split the variable containing the text, and iterate over the returned array:

foreach(preg_split("/((\r?\n)|(\r\n?))/", $subject) as $line){
    // do stuff with $line
}
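
As a rough sketch of the remaining steps in the question (one token per line, then a query), something like the following might work. The helper name extractTokens, the "no whitespace or commas" rule, and the table and column names items and name are assumptions made for illustration, not part of the original answer. It uses \R, which matches any newline sequence, in place of the explicit alternation above.

```php
<?php
// Hypothetical helper: returns the lines of $text that consist of exactly one token.
function extractTokens(string $text): array
{
    $tokens = [];
    // \R matches any newline sequence (\r\n, \n, or \r).
    foreach (preg_split('/\R/', $text) as $line) {
        $line = trim($line);
        if ($line === '') {
            continue; // skip blank lines
        }
        // Assumed rule: a token contains no whitespace and no commas.
        if (preg_match('/^[^\s,]+$/', $line) === 1) {
            $tokens[] = $line;
        }
    }
    return $tokens;
}

$tokens = extractTokens("alpha\r\nbeta gamma\n\ndelta\rone,two");
// $tokens is ['alpha', 'delta']

// One placeholder per token; the values are bound later instead of being
// concatenated into the SQL. Table and column names are hypothetical.
// Note: an empty $tokens array would produce an invalid "IN ()" clause.
$placeholders = implode(',', array_fill(0, count($tokens), '?'));
$sql = "SELECT * FROM items WHERE name IN ($placeholders)";
```

Passing $sql and $tokens to a prepared statement (for example with PDO) is generally safer than escaping each value by hand.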
Wednesday, March 31, 2021
 
EastSw
answered 8 Months ago
46

A regex would be simplest:

$input = 'foo_left.jpg';
if(!preg_match('/_(left|right|center)/', $input, $matches)) {
    // no match
}

$pos = $matches[0]; // "_left", "_right" or "_center"

Update:

For a more defensive-minded approach (if there might be multiple instances of "_left" and friends in the filename), you can consider adding to the regex.

This will match only if the l/r/c is followed by a dot:

preg_match('/(_(left|right|center))\./', $input, $matches);

This will match only if the l/r/c is followed by the last dot in the filename (which practically means that the base name ends with the l/r/c specification):

preg_match('/(_(left|right|center))\.[^\.]*$/', $input, $matches);

And so on.

If using these regexes, you will find the result in $matches[1] instead of $matches[0].
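
A minimal sketch of the anchored variant wrapped in a function, to show where the capture ends up. The function name alignmentSuffix and the sample filenames are made up for illustration.

```php
<?php
// Hypothetical helper: returns "_left", "_right" or "_center" when it directly
// precedes the last dot in the filename, or null when there is no such suffix.
function alignmentSuffix(string $filename): ?string
{
    if (preg_match('/(_(left|right|center))\.[^.]*$/', $filename, $matches) === 1) {
        return $matches[1]; // group 1 holds the suffix without the extension
    }
    return null;
}

$a = alignmentSuffix('foo_left.jpg');           // "_left"
$b = alignmentSuffix('my_left_hand_right.png'); // "_right"
$c = alignmentSuffix('foo_left_bar.jpg');       // null
```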

Wednesday, March 31, 2021
 
braindamage
answered 8 Months ago
61

This can't work properly. Unicode contains many more characters than ANSI, so if you "convert" to ANSI, you will lose a lot of characters.

http://php.net/manual/en/function.htmlentities.php

You can use Unicode (UTF-8) charset with htmlentities:

string htmlentities ( string $string [, int $flags = ENT_COMPAT [, string $charset [, bool $double_encode = true ]]] )

htmlentities($myString, ENT_COMPAT, "UTF-8"); should work.
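
For illustration, a small sketch of that call on a sample string. The sample text is an assumption, and the source file itself is assumed to be saved as UTF-8.

```php
<?php
// Sample input with non-ASCII characters (assumes this file is UTF-8 encoded).
$myString = 'Müller & Söhne';

// Passing the charset explicitly makes htmlentities treat the input as UTF-8.
$encoded = htmlentities($myString, ENT_COMPAT, 'UTF-8');
// $encoded is 'M&uuml;ller &amp; S&ouml;hne'

// Decoding with the same charset restores the original string.
$decoded = html_entity_decode($encoded, ENT_COMPAT, 'UTF-8');
```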

Thursday, August 5, 2021
 
CoderGuy123
answered 3 Months ago
22

This works, thanks to the nice people on /r/rust:

use std::fs::File;
use std::io::BufReader;
use std::io::prelude::*;
use std::path::Path;

// True for ASCII vowels in either case.
fn is_vowel(x: &char) -> bool {
    "aAeEiIoOuU".chars().any(|y| y == *x)
}

// True for German umlauts in either case.
fn is_umlaut(x: &char) -> bool {
    "äÄüÜöÖ".chars().any(|y| y == *x)
}

// A line is valid if it has no vowels and at least two umlauts.
fn valid(line: &str) -> bool {
    line.chars().all(|c| !is_vowel(&c)) && line.chars().filter(is_umlaut).nth(1).is_some()
}

fn main() {
    // Create a path to the desired file
    let path = Path::new("c.txt");
    let display = path.display();
    // Open the path in read-only mode, returns `io::Result<File>`
    let file = match File::open(&path) {
        // `io::Error` implements `Display`, so it can be printed directly
        Err(why) => panic!("couldn't open {}: {}", display, why),
        Ok(file) => file,
    };
    let reader = BufReader::new(file);
    for line in reader.lines() {
        match line {
            Ok(line) => {
                if valid(&line) {
                    println!("{}", line)
                }
            }
            Err(e) => println!("ERROR: {}", e),
        }
    }
}
Wednesday, August 25, 2021
 
HexaGridBrain
answered 2 Months ago
66

Since you want to remove any horizontal whitespace from a Unicode string you need to use

  • \h regex escape ("any horizontal whitespace character (since PHP 5.2.4)")
  • u modifier (see Pattern Modifiers)

Use

$txt = preg_replace("/^\h+/mu", '', $txt);

Details

  • ^ - start of a line (m modifier makes ^ match all line start positions, not just string start position)
  • \h+ - one or more horizontal whitespaces
  • u modifier will make sure the Unicode text is treated as a sequence of Unicode code points, not just code units, and will make all regex escapes in the pattern Unicode aware.
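
A short sketch of the replacement on a made-up multi-line string; the sample text and the particular whitespace characters in it are assumptions chosen for illustration.

```php
<?php
// "\u{00A0}" is a no-break space and "\u{3000}" an ideographic space;
// both count as horizontal whitespace, like the plain space and the tab.
$txt = "\u{00A0} first\n\t\u{3000}second\nthird";

// Strip leading horizontal whitespace from every line.
$txt = preg_replace('/^\h+/mu', '', $txt);
// $txt is now "first\nsecond\nthird"
```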
Tuesday, August 31, 2021
 
Bruno
answered 2 Months ago